About this Resource

Single level modelling approaches, such as multiple linear and logistic regression, are valuable methods for examining the nature and extent of associations between explanatory variables and an outcome of interest. However, many populations of interest in social science have a multilevel structure. If we ignore this structure and use a single level model, our analyses may be flawed because we have ignored the context in which processes occur. Examples of multilevel populations include pupils (level 1) in schools (level 2), or people (level 1) in areas (level 2). Taking the second example, if we choose a single level modelling approach, we must decide whether to carry out the analysis at the individual level or at the area level.

If we carry out the analysis at the individual level and ignore the context, we may miss important group level effects. This problem is often referred to as the atomistic fallacy.

This may occur, for example, when we consider unemployment as an outcome of interest and relate it to individual characteristics such as gender, ethnic group and qualifications, but do not take local labour market conditions into account.

If we carry out a single level analysis at the group level and assume the results also apply at the individual level, our analyses may be flawed, because individual level inferences cannot safely be drawn from group level analyses. This phenomenon is known as the ecological fallacy. It would occur, for example, if the unemployment rate were the outcome of interest and this was related to an area level explanatory variable such as the proportion of people in rented accommodation in each area. This analysis would provide an estimate of the area level relationship between the proportion renting and the unemployment rate, but it could not be immediately inferred that this relationship holds at the individual level, that is, between being unemployed and renting.
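The gap between area level and individual level relationships can be illustrated with a small sketch in Python. The numbers below are invented purely for illustration (they are not unemployment or tenure data), and the names `areas` and `pearson` are hypothetical. Within each made-up area the individual level relationship between x and y is negative, yet the relationship between the area means is positive.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented (x, y) pairs for individuals in three hypothetical areas.
areas = {
    "A": [(1, 5), (2, 4), (3, 3)],
    "B": [(4, 8), (5, 7), (6, 6)],
    "C": [(7, 11), (8, 10), (9, 9)],
}

# Group level analysis: correlate the area means of x and y.
mean_x = [sum(x for x, _ in rows) / len(rows) for rows in areas.values()]
mean_y = [sum(y for _, y in rows) / len(rows) for rows in areas.values()]
area_level_r = pearson(mean_x, mean_y)

# Individual level analysis, carried out separately within each area.
within_area_r = {
    name: pearson([x for x, _ in rows], [y for _, y in rows])
    for name, rows in areas.items()
}

print("area level correlation:", round(area_level_r, 3))
print("within-area correlations:",
      {name: round(r, 3) for name, r in within_area_r.items()})
```

In this toy example an analyst who saw only the area means would conclude that x and y rise together, although within every area the opposite holds for individuals.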

Multilevel models have been developed to allow analysis at several levels simultaneously, rather than having to choose at which level to carry out a single level analysis. Multilevel models can be fitted to dependent variables that are interval scale or categorical. As well as allowing the relationship between the explanatory variables and the dependent variable to be estimated after taking the population structure into account, multilevel models enable the extent of variation in the outcome of interest to be measured at each level assumed in the model, both before and after the inclusion of explanatory variables.

For example, we may wish to assess the extent of variation in examination performance at age 16 at the pupil level and at the school level. This would allow us to answer the following research questions:

What proportion of variation in examination performance occurs between schools and what proportion occurs between pupils?

How much of this pupil and school level variation is explained when explanatory variables such as prior examination performance and gender are included in the model?
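The first question is usually answered with a variance partition: the share of the total variance that lies between schools. The sketch below is a rough illustration only. It uses invented exam scores for a balanced design (the same number of pupils in every school) and a simple one-way analysis of variance (method of moments) estimator, whereas multilevel software normally estimates these variances by likelihood-based methods. The name `variance_partition` is hypothetical.

```python
def variance_partition(groups):
    """Estimate between- and within-group variance for a balanced design.

    `groups` maps a group label to a list of outcome values; every group
    must contain the same number of values. Uses the one-way ANOVA
    (method of moments) estimator, not maximum likelihood.
    """
    sizes = {len(values) for values in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this sketch assumes a balanced design")
    n = sizes.pop()   # pupils per school
    j = len(groups)   # number of schools
    if n < 2 or j < 2:
        raise ValueError("need at least two schools and two pupils per school")

    group_means = {g: sum(values) / n for g, values in groups.items()}
    grand_mean = sum(group_means.values()) / j

    ss_within = sum(
        (y - group_means[g]) ** 2 for g, values in groups.items() for y in values
    )
    ss_between = n * sum((m - grand_mean) ** 2 for m in group_means.values())

    ms_within = ss_within / (j * (n - 1))
    ms_between = ss_between / (j - 1)

    var_within = ms_within
    # The moment estimator can go negative; truncate it at zero.
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between, var_within


# Invented exam scores for four pupils in each of three hypothetical schools.
scores = {
    "school_1": [40, 44, 48, 52],
    "school_2": [50, 54, 58, 62],
    "school_3": [60, 64, 68, 72],
}

var_school, var_pupil = variance_partition(scores)
share_between_schools = var_school / (var_school + var_pupil)
print("estimated share of variance between schools:",
      round(share_between_schools, 3))
```

The second question would be addressed by re-estimating both variances after explanatory variables have been added to the model and comparing them with these baseline values.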

Multilevel modelling techniques developed rapidly in the late 1980s, when the computing methods and resources for this modelling procedure improved dramatically. Much of the literature on multilevel modelling from this period focuses on educational data and explores the hierarchy of pupils, classes, schools and sometimes also local education authorities. Measures of educational performance, such as exam scores, are usually the dependent variables in this research.

The multilevel model also has other useful properties. Firstly, models can be specified to allow different relationships between the dependent variable and explanatory variables within different groups, for example a school-specific relationship between prior and current examination performance. Conceptually, this is similar to allowing a separate regression line for each school, but statistically the multilevel model is a much more efficient way to proceed than a separate regression analysis within each school.
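In commonly used notation, which may differ from the notation used later in this resource, a model of this kind for pupil i in school j can be sketched as follows, where y is current examination performance and x is prior examination performance:

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + e_{ij}
```

Here u_{0j} and u_{1j} are school level random effects that shift the intercept and the slope for school j, and e_{ij} is the pupil level residual. Setting u_{1j} to zero for every school gives a model in which the schools' lines differ only in their intercepts.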

Multilevel models are also more statistically efficient (i.e. they make better use of the available data) than an alternative fixed effects approach, which would involve adding dummy variables and their interactions to the multiple linear or logistic regression models.

Secondly, the multilevel model provides a natural and appropriate framework for combining data from different sources at one of the levels assumed in the model. For example, suppose we specify a multilevel model with individuals at level 1 and countries at level 2, and we have sample survey data for a number of countries, such as the European Social Survey (ESS). We can use this dataset to assess the associations of age, gender, employment status and so on with the chance that someone turns out to vote. If we have additional country level data, such as information from Eurostat New Cronos on social cohesion or long term unemployment, we can include this information in the model as a set of country level variables.

A standard multilevel dataset comprises a set of individual level data with group level indicators. An example would be ESS data, where data are available for individuals (level 1) and an indicator of country (level 2) is available for each individual. If additional country level data, such as the Eurostat New Cronos data, are available, these can be combined with the ESS data at country level in the multilevel model, as explained theoretically in models 5 and 6 in Section 4 and from a practical perspective in Section 5.
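Combining the two sources amounts to attaching each country level value to every individual record from that country. The following is a minimal sketch, assuming the records are held as Python dictionaries. The field names, country codes and all values are invented for illustration and are not taken from the ESS or Eurostat, and the function name `attach_country_variables` is hypothetical.

```python
# Hypothetical individual level records (level 1), each with a country indicator.
individuals = [
    {"id": 1, "country": "AA", "age": 34, "voted": 1},
    {"id": 2, "country": "AA", "age": 58, "voted": 0},
    {"id": 3, "country": "BB", "age": 41, "voted": 1},
]

# Hypothetical country level data (level 2), keyed by the same country indicator.
country_data = {
    "AA": {"long_term_unemployment": 3.1},
    "BB": {"long_term_unemployment": 5.4},
}


def attach_country_variables(people, countries):
    """Return new records with the matching country level variables added."""
    merged = []
    for person in people:
        code = person["country"]
        if code not in countries:
            raise KeyError(f"no country level data for {code!r}")
        # Build a new dictionary so the original records are left unchanged.
        merged.append({**person, **countries[code]})
    return merged


combined = attach_country_variables(individuals, country_data)
for row in combined:
    print(row)
```

Every individual in the same country receives the same country level value, which is why such variables can only help to explain variation between countries, not variation between individuals within a country.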

The University of Manchester; Mimas; ESRC; RDI

Countries and Citizens: Unit 5 Multilevel modelling using macro and micro data by Mark Tranmer, Cathie Marsh Centre for Census and Survey Research is licensed under a Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales Licence.