# 2.3 Correlation

Scatterplots are the best way to visualise correlation between two continuous (scale) variables, so let us start by looking at a few basic examples. The graphs below illustrate a variety of different bivariate relationships, with the horizontal axis (x-axis) representing one variable and the vertical axis (y-axis) the other. Each point represents one individual, and its position is dictated by that individual's score on each variable. Note that these examples are fabricated for the purpose of our explanation - real life is rarely this neat! Let us look at each in turn:
Several points are evident from these scatterplots.

- When one variable increases as the other increases, the correlation is positive; when one variable increases as the other decreases, the correlation is negative.
- The strongest correlations occur when data points fall exactly in a straight line (**Figure 2.3.1**).
- The correlation becomes weaker as the data points become more scattered. If the data points fall in a random pattern there is no correlation (**Figure 2.3.3**).
- Only relationships where one variable increases or decreases systematically with the other can be said to demonstrate correlation - at least for the purposes of regression analysis!
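
The patterns described in these points can be reproduced with a few lines of Python. This is only a rough sketch: the data are fabricated, just like the example figures, and the slopes, noise levels and sample size are arbitrary assumptions. It also borrows Pearson's r as a numeric check on direction and strength, which is our choice for illustration rather than something this page has introduced yet.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)  # fixed seed so the fabricated data are reproducible
x = [rng.uniform(0, 10) for _ in range(200)]

# Four made-up relationships between x and y (all values are assumptions)
patterns = {
    "perfect positive": [2 * xi + 1 for xi in x],                        # points on a rising line
    "scattered positive": [2 * xi + 1 + rng.gauss(0, 5) for xi in x],    # rising line plus noise
    "perfect negative": [-2 * xi + 1 for xi in x],                       # points on a falling line
    "random": [rng.uniform(0, 10) for _ in x],                           # y unrelated to x
}

results = {name: pearson_r(x, y) for name, y in patterns.items()}
for name, r in results.items():
    print(f"{name:>20}: r = {r:+.2f}")
```

The two straight-line patterns should give values at the extremes (+1 and -1), the noisy pattern a weaker positive value, and the random pattern a value close to zero.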
The next page explains how we can statistically represent the strength and direction of correlation, but first we should run through how to produce scatterplots using SPSS/PASW.
Let's use the LSYPE 15,000 dataset to explore the relationship between Key Stage 2 exam scores (exams students take at age 11) and Key Stage 3 exam scores (exams students take at age 14). Take the following route through SPSS:
A pop-up will now appear asking you to define your scatterplot. All this means is that you need to decide which variable will be represented on which axis by dragging the variables from the list on the left into the relevant boxes (or transferring them using the arrows). We have put the age 14 assessment score (ks3stand) on the vertical y-axis and the age 11 score (ks2stand) on the horizontal x-axis. SPSS/PASW also allows you to label or mark individual cases (data points) by a third variable, which is a useful feature, but we will not need it this time. When you are ready click on
Et Voila! The scatterplot (
Note that there is what is called a 'floor effect': no age 11 scores fall below approximately -25, so the cases with the lowest scores form a straight vertical line at the left of the scatterplot. We discuss this in more detail in Extension A, which is about transforming data. This is important, but for now you may prefer to stay focused on the general principles of simple linear regression.

Now that we know how to generate scatterplots for displaying relationships visually, it is time to learn how to understand them statistically.
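
To see why a floor shows up as a vertical line, it may help to simulate one. The sketch below is purely illustrative and does not use the LSYPE data. It assumes, hypothetically, that the recorded age 11 score is an underlying score that cannot be recorded below -25; the spread of scores and the strength of the link to age 14 scores are also made-up values.

```python
import random

FLOOR = -25  # approximate lowest age 11 score, as read off the scatterplot

rng = random.Random(0)  # fixed seed so the simulation is reproducible

# Hypothetical underlying age 11 attainment (mean 0, wide spread)
true_ks2 = [rng.gauss(0, 15) for _ in range(1000)]
# Hypothetical age 14 scores that loosely follow the underlying attainment
ks3 = [0.9 * t + rng.gauss(0, 5) for t in true_ks2]
# Assumption: the recorded age 11 score cannot go below the floor
ks2_recorded = [max(t, FLOOR) for t in true_ks2]

# Cases stuck at the floor share one x value but differ on y,
# which is what draws the vertical line in a scatterplot
at_floor = [(a, b) for a, b in zip(ks2_recorded, ks3) if a == FLOOR]
ks3_at_floor = [b for _, b in at_floor]

print(f"{len(at_floor)} of {len(ks2_recorded)} cases sit exactly at the floor")
print(f"their age 14 scores range from {min(ks3_at_floor):.1f} to {max(ks3_at_floor):.1f}")
```

Every case at the floor has the same horizontal position but a different vertical one, so plotted together they stack into a vertical line.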