Using Statistical Regression Methods in Education Research


Extension B: What do I do about missing data?

Missing data

Throughout this website we have usually assumed that missing data is what is known as ‘missing at random’. By this we mean that there is no systematic reason why data is missing for particular cases, so subgroups within our population will have roughly the same proportion of cases with missing data. In other words, missing data is spread ‘randomly’ throughout the entire sample. (Strictly speaking, this stronger assumption is usually called ‘missing completely at random’, or MCAR.)
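To make the assumption concrete, here is a minimal Python sketch that deletes each case with the same probability, whichever subgroup it belongs to. The group labels, the 10,000 cases per group and the 20% missingness rate are made up for illustration. Under this assumption every subgroup ends up with roughly the same share of missing cases.

```python
import random


def simulate_mcar(groups, n_per_group, p_missing, seed=1):
    """Mark each case as missing with the same probability p_missing,
    ignoring group membership entirely, and return the share missing per group."""
    rng = random.Random(seed)
    shares = {}
    for group in groups:
        missing = sum(rng.random() < p_missing for _ in range(n_per_group))
        shares[group] = missing / n_per_group
    return shares


shares = simulate_mcar(["A", "B", "C"], n_per_group=10_000, p_missing=0.2)
for group, share in shares.items():
    print(f"group {group}: {share:.1%} missing")
```

Each printed share should sit close to 20%. Any large gap between groups would be a warning sign that the data is not missing in this purely random way.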

Making this assumption can sometimes be unsound. For example, suppose that 100 people tried a special one-month diet designed to help them lose weight. Of those who started the diet, 50 completed it, and all of them lost a statistically significant amount of weight. Can we conclude that the diet was effective? After all, although half of our data is missing, 100% of the 50 participants who completed the diet reported weight loss. Hooray! The diet works!

But perhaps this conclusion is hasty. Why did we lose 50 participants? Did they have less willpower than the other participants? Did people drop out because they found they were actually gaining weight? Were males more likely to stick to the diet than females? Did some of the 50 people who didn't complete the diet drop out because they were admitted to hospital with malnutrition? We simply don't know, so we cannot assume that the dropout was random.
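A small simulation shows how one of these explanations could fool us. This is a minimal sketch built on invented assumptions, not data from any real study: the diet has no effect on average (mean weight change of 0 kg, standard deviation 2 kg), but 80% of people who are gaining weight drop out compared with only 20% of those who are losing it.

```python
import random
import statistics


def simulate_diet_dropout(n=100, seed=42):
    """Hypothetical diet study: the true average weight change is zero, but
    dropout depends on the (unseen) outcome. Negative change = weight lost."""
    rng = random.Random(seed)
    # Assumed weight change in kg after one month: mean 0 (no real effect), sd 2.
    changes = [rng.gauss(0.0, 2.0) for _ in range(n)]
    completers = []
    for change in changes:
        # Assumed dropout rule: gainers drop out with probability 0.8, losers with 0.2.
        p_drop = 0.8 if change > 0 else 0.2
        if rng.random() >= p_drop:
            completers.append(change)
    return statistics.mean(changes), statistics.mean(completers), len(completers)


true_mean, observed_mean, n_completed = simulate_diet_dropout()
print(f"mean change, everyone who started: {true_mean:+.2f} kg")
print(f"mean change, completers only:      {observed_mean:+.2f} kg ({n_completed} completed)")
```

Looking only at the completers suggests a clear average weight loss, even though the diet does nothing on average for the people who started it.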

The moral of this story is that you should always check for meaningful patterns in missing data: there may be a good reason why data goes missing, and that reason may be important to your analysis and interpretation!
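One simple check is to compare the proportion of missing values across subgroups. The Python sketch below does this for a handful of hypothetical records (the field names `sex` and `weight_change` are made up for illustration), with `None` standing in for a missing value.

```python
from collections import defaultdict


def missingness_by_group(records, group_key, value_key):
    """Return, for each group, the proportion of records whose value_key
    is None or absent."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec.get(value_key) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical records: None marks a missing weight-change value.
records = [
    {"sex": "female", "weight_change": -1.2},
    {"sex": "female", "weight_change": None},
    {"sex": "female", "weight_change": None},
    {"sex": "female", "weight_change": -0.4},
    {"sex": "male", "weight_change": -2.0},
    {"sex": "male", "weight_change": -0.8},
    {"sex": "male", "weight_change": None},
    {"sex": "male", "weight_change": -1.5},
]
print(missingness_by_group(records, "sex", "weight_change"))
# {'female': 0.5, 'male': 0.25}
```

Here females are missing twice as often as males, which is exactly the kind of pattern worth investigating before trusting any results. With real data you would also want a formal test, since small differences can arise by chance.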

Missing at random and missing not at random data

There were some minor issues of this type when the LSYPE data was collected. Some participants and their parents did not use English as their first language and so required a translator to take part in a few of the interviews. However, translators for a small number of languages were not available when the interviews took place, so data was missing for many of the participants from a particular ethnic background. If we were exploring the relationship between ethnicity and any variable whose missing data was concentrated in that one ethnic group, we could get a distorted picture of that relationship.
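The distortion can be illustrated with a deliberately simple Python example. The groups, scores and translator split below are entirely hypothetical, not LSYPE figures. The sketch assumes that pupils who needed a translator score lower on average than others in their group. If their data is lost, the group's observed mean rises and the apparent ordering of the two groups flips.

```python
import statistics

# Made-up records for illustration only; these are NOT LSYPE figures.
# Each record is (ethnic_group, needed_translator, score).
records = (
    [("Group X", False, 52)] * 35 + [("Group X", False, 48)] * 35  # 70 pupils, no translator needed
    + [("Group X", True, 38)] * 15 + [("Group X", True, 42)] * 15  # 30 pupils who needed a translator
    + [("Group Y", False, 50)] * 50 + [("Group Y", False, 46)] * 50  # 100 pupils, none needed one
)


def group_means(records, drop_translated):
    """Mean score per ethnic group. If drop_translated is True, cases that needed
    a translator are excluded, mimicking data lost when no translator was available."""
    scores = {}
    for group, translated, score in records:
        if drop_translated and translated:
            continue
        scores.setdefault(group, []).append(score)
    return {group: statistics.mean(values) for group, values in scores.items()}


for label, drop in [("all cases", False), ("complete cases", True)]:
    means = group_means(records, drop_translated=drop)
    print(label, {group: f"{m:.1f}" for group, m in means.items()})
# all cases {'Group X': '47.0', 'Group Y': '48.0'}
# complete cases {'Group X': '50.0', 'Group Y': '48.0'}
```

With every case included, Group X scores slightly below Group Y. After dropping the cases that needed a translator, Group X appears to score above Group Y, purely because of which cases went missing.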

Page contact: Feedback to ReStore team Last revised: Fri 4 Jun 2010