# Extension A: What should I do if my continuous variables are not normally distributed?

There are occasions where your continuous variables may not be normally distributed. This is often caused by ceiling or floor effects, where data points gather at the extremes of the scale, but it can occur for a great many reasons. This can lead to many a researcher tantrum – after all, doesn’t this mean that regression analysis cannot be used? In actual fact this is not

Let’s take a close look at our histogram. Two things are notable about the distribution:

- There is a floor effect: a large number of cases with an average age 11 mark of zero. These are not missing cases. Students who were absent from the ks2 tests have already been excluded, and in the SPSS data file they have the system missing value (indicated as ‘.’). The scores of zero indicate the 214 students whose attainment is judged by their teachers to be so low that they are working at a level below that assessed by the tests. These represent about 1.5% of the students in the sample. We do not want to exclude these students as we have a genuine measure of their attainment; it is just that we are not able to differentiate between them (all we can say is that they all got the lowest possible score). While they do not fit well into a normal distribution, it would be quite an omission to exclude them from the analysis.
- Second, the distribution does not fit the normal curve very well! The data are skewed and there is a long tail of lower scores. Technically the distribution would be described as **negatively skewed**, as there are more data points than expected in the left tail of the distribution. (If there were more data points than expected in the right tail of the distribution it would be described as positively skewed.)
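
If you would like to see what "negatively skewed" means numerically, here is a minimal Python sketch (not SPSS syntax). It uses a simple moment-based skewness formula and a handful of made-up scores, not the real data; SPSS may report a slightly different, sample-adjusted statistic, so treat the exact value as illustrative only.

```python
def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance (divide by n)
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Made-up scores (an assumption, not the real file): most cases cluster
# in the 20s, with a tail of lower scores and two zeros (a floor effect).
scores = [0, 0, 15, 19, 21, 23, 25, 25, 27, 27, 27, 29, 29, 31, 31, 33]

skew = skewness(scores)
print(round(skew, 2))  # a negative value indicates a long left tail
```

A value below zero corresponds to the long tail of lower scores described above; a perfectly symmetric set of scores would give zero.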

This non-normal distribution is a significant problem if we want to use parametric statistical tests with our data, since these methods assume normally distributed continuous variables.

What can we do about this? Luckily SPSS has a number of options to

In this section we will transform our ks2 score to the normal equivalent. From the menu you would do this as follows:

SYNTAX ALERT!!! You could also do this using the following syntax, if you are so inclined:

`RANK VARIABLES=kS2score (A) /NORMAL.`

This creates a new variable prefixed by N to indicate normalised, so here the new variable is called
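
For the curious, the idea behind a rank-based normal score can be sketched in a few lines of Python (again, this is not SPSS syntax). Each score is ranked in ascending order, the rank is turned into a proportion, and that proportion is converted to the matching point on the standard normal curve. The sketch assumes Blom's proportion, (rank - 3/8) / (n + 1/4), and mean ranks for ties; we believe these match the SPSS defaults, but treat that as an assumption. The function names and the data are made up for illustration.

```python
from statistics import NormalDist


def mean_ranks(values):
    """Ascending ranks starting at 1; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def normal_scores(values):
    """Normal scores using Blom's proportion (assumed, see lead-in)."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in mean_ranks(values)]


scores = [0, 0, 15, 19, 21, 23, 25, 27, 27, 29, 31, 33]  # made-up data
z = normal_scores(scores)
print([round(v, 2) for v in z])
```

Notice that the two students scoring zero receive the same normal score: the transformation reshapes the distribution, but it cannot separate cases that share a score.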
Compare

This new variable has a mean of 0 and standard deviation of 1. However, for the purposes of our examples it will be easier to work with larger numbers without decimal places, so we will just multiply

There is a range of other transformations that can be used to correct non-normal data distributions. While many students are (understandably) suspicious of such transformations (it sounds like fudging or fixing your data!), the key principle is that you apply the same transformation to all your data points. We don’t intend to go through all the transformations here, but a good transformation should be a relatively simple and straightforward operation. For example, a square root transformation is often used when the outcome data (for example income) might be positively skewed (for example by having a small number of very high salaries for millionaires). Taking the square root of large values has more of an impact than taking the square root of small values. Consequently, taking the square root of your data points will bring any large scores closer to the centre. Other common transformations include taking the log [log(x)] or the reciprocal [1/x] of your data. See Field (2009, pp. 153-164) for a detailed discussion of transformation of data.
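
To see how these transformations pull in large values, here is a small Python sketch using made-up incomes (in thousands) with one very high earner. The data, the helper `max_to_median`, and the choice of the natural log are all assumptions for illustration.

```python
import math

# Made-up incomes in thousands; the last value is the "millionaire" outlier.
incomes = [18, 22, 25, 30, 35, 40, 400]

sqrt_incomes = [math.sqrt(x) for x in incomes]
log_incomes = [math.log(x) for x in incomes]  # natural log
recip_incomes = [1 / x for x in incomes]      # note: reverses the ordering


def max_to_median(values):
    """How many times larger the biggest value is than the middle value."""
    ordered = sorted(values)
    return ordered[-1] / ordered[len(ordered) // 2]


print(round(max_to_median(incomes), 2))
print(round(max_to_median(sqrt_incomes), 2))
print(round(max_to_median(log_incomes), 2))
```

The largest income sits much closer to the middle of the distribution after the square root, and closer still after the log. The reciprocal also compresses large values, but it reverses the order of the scores (the highest income becomes the smallest value), which you need to remember when interpreting results.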
...However, if you want to learn more about computing variables here is a brief guide to how we did it. We need to type the name of our new variable into the small window in the top left called

`COMPUTE kS2stand=RND(nks2scor*10,1).`

The descriptive statistics for ks2stand are shown below (
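
The arithmetic in that COMPUTE statement is simply "multiply by 10 and round to the nearest whole number". A minimal Python sketch of the same step follows. The helper `rnd` is our own stand-in that rounds exact halves away from zero, which is how we understand the SPSS RND function to behave (an assumption; Python's built-in `round` treats halves differently). The input values are hypothetical.

```python
import math


def rnd(x):
    """Round to the nearest integer; exact .5 cases go away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


# Hypothetical values standing in for the normalised variable.
nks2scor = [-2.31, -0.25, 0.0, 0.25, 1.04]

# Multiply by 10 so that one standard deviation equals 10 points, then round.
ks2stand = [rnd(z * 10) for z in nks2scor]
print(ks2stand)  # [-23, -3, 0, 3, 10]
```

After this step the scores keep a mean of about 0 but have a standard deviation of about 10, and they no longer have decimal places.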

We have followed the same process as described above to create transformed variables for age 14 standard score (ks3stand) and age 16 standard score (ks4stand). You will find these in the data file. As an optional exercise follow the example above to transform