
7 Analysing subgroups of surveys

7.1 Introduction | 7.2 Stratification and subgroups | 7.3 The effect of subgroups | 7.4 Design effects for subgroups


7.1 Introduction

More often than not, we are interested in only part of a survey population. Geographic areas may be of interest in their own right, or we may want to analyse the answers to a question asked only of a specific sub-population (e.g. hours of use for internet users, income for single-parent families).
7.2 Stratification and subgroups

When a survey has been stratified or post-stratified, it is important to analyse subgroups correctly, as follows:

  • Define the survey design for the whole sample.
  • Use a command that requests a subsample analysis.

The reason is that the design of a stratified sample ensures that the proportions in each stratum of the sample match exactly those in the sampling frame, so no between-strata variation contributes to the variance. But when we select only a subset of the PSUs or of the respondents, this is no longer true: the number of subgroup members in each stratum varies from one possible sample to another.

The number of single parents in the sample in any particular stratum will not match exactly the number expected from the population (see FRS for more about income for single parents). We need to take account of this by defining the design for the whole sample and then analysing a sub-sample (see SHS for more about subgroup sampling).
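A small simulation makes the point concrete: under stratified sampling the sample size in each stratum is fixed, but the number of subgroup members in each stratum is not. This is a minimal sketch, assuming hypothetical strata, sample sizes and subgroup shares, and approximating sampling from a large stratum population by letting each sampled person belong to the subgroup independently with the stratum's subgroup share. The function name `simulate_stratum_counts` is illustrative only.

```python
import random

def simulate_stratum_counts(stratum_sizes, subgroup_share, n_reps, seed=1):
    """Draw a stratified sample n_reps times and record, per stratum, the
    sample size (fixed by the design) and the number of subgroup members
    (random, because membership is not controlled by the design).

    Simplifying assumption: each sampled person belongs to the subgroup
    independently with probability subgroup_share[stratum], a reasonable
    approximation when each stratum's population is large.
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_reps):
        rep = {}
        for stratum, n_h in stratum_sizes.items():
            # Count how many of the n_h sampled people fall in the subgroup
            members = sum(rng.random() < subgroup_share[stratum] for _ in range(n_h))
            rep[stratum] = (n_h, members)
        replicates.append(rep)
    return replicates

if __name__ == "__main__":
    # Hypothetical strata, sample sizes and subgroup shares
    sizes = {"North": 40, "South": 60}
    shares = {"North": 0.10, "South": 0.25}
    for rep in simulate_stratum_counts(sizes, shares, n_reps=5):
        print(rep)  # first number per stratum never changes; second usually does
```

In every replicate the stratum sizes stay at 40 and 60, while the subgroup counts typically change, which is the extra source of variation that a subgroup analysis has to allow for.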

Survey analysis packages should allow correctly for this extra uncertainty. The same issue arises with missing data: when a variable has missing values, analysing only the complete cases is equivalent to taking a sub-sample.
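The "whole design, then subsample" rule can be sketched by hand for a stratified sample without clustering (each respondent treated as its own PSU). This is a minimal illustration, not any package's implementation: it assumes a with-replacement Taylor linearization variance (no finite population correction), and the names `domain_mean`, `whole_design`, `subset_first` and the data are hypothetical. The key step is that non-members stay in their strata with a linearized score of zero, so the random number of subgroup members per stratum feeds into the variance.

```python
import math
from collections import defaultdict

# Each record: (stratum, weight, y, in_subgroup). Assumed design: stratified,
# no clustering, so every respondent is its own PSU.

def domain_mean(records):
    """Weighted subgroup mean and its linearization SE, computed with the
    design of whatever set of records is passed in."""
    n_hat = sum(w for _, w, _, d in records if d)        # estimated subgroup size
    total = sum(w * y for _, w, y, d in records if d)    # estimated subgroup total
    r = total / n_hat
    # Linearized scores: non-members score 0 but remain units in their stratum
    z_by_stratum = defaultdict(list)
    for h, w, y, d in records:
        z_by_stratum[h].append(w * (y - r) / n_hat if d else 0.0)
    var = 0.0
    for h, zs in z_by_stratum.items():
        n_h = len(zs)
        if n_h < 2:
            raise ValueError(f"stratum {h!r} needs at least 2 units")
        zbar = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in zs)
    return r, math.sqrt(var)

def whole_design(records):
    # Correct: keep every respondent so the whole-sample design is used
    return domain_mean(records)

def subset_first(records):
    # Incorrect for a stratified sample: drop non-members, then analyse
    return domain_mean([rec for rec in records if rec[3]])

# Made-up data: two strata with constant weights within each stratum
sample = [
    ("A", 10, 5, True), ("A", 10, 7, True), ("A", 10, 3, False), ("A", 10, 4, False),
    ("B", 20, 6, True), ("B", 20, 2, False), ("B", 20, 8, True), ("B", 20, 1, False),
]

if __name__ == "__main__":
    print("whole design:", whole_design(sample))
    print("subset first:", subset_first(sample))
```

With these made-up numbers both approaches give the same subgroup mean (20/3, about 6.67) but different standard errors (about 0.64 using the whole design, about 0.75 after subsetting). The direction of the difference depends on the data; the point is that only the first reflects the actual design. Setting the flag to "y is not missing" treats a complete-case analysis in the same way.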

The only exception to this rule is when the subgroups of interest are complete strata, for example when a sample is stratified by region and we want results for each region. In this case each stratum can be considered a mini-survey and analysed on its own.

7.3 The effect of sub-groups on the precision of survey estimates

For a complete survey, stratification can improve the precision of estimates, while clustering may make them less precise. These effects may be less evident for subgroups, but this depends on how the subgroups are formed.

Figure 7.1 First subsample of a survey with clustering and stratification

If the selected subgroup is a complete stratum or a group of strata, as shown here by the red squares, then the subgroup will inherit all the design properties of this part of the survey. If there is further stratification nested within the first set of strata (e.g. stratification by social class within regions), then the sub-sample will also benefit from this stratification (see FRS for more about subgroups).

Figure 7.2 Second subsample of a survey with clustering and stratification

If the subgroup of interest is spread within PSUs and also across strata, the precision of statistics for the subgroup will be closer to that of a simple random sample than to that of the whole survey. Clustering will be less damaging because the subgroup's clusters are smaller, and the benefit from stratification will not be as great. We are still working on this with the exemplars and are not yet sure that it is right.

However, if we want to compare the sub-sample with the other sample members (here, the red squares with the circles), this is the best design, because the comparisons are made within PSUs and are balanced by the stratification.

Figure 7.3 Third subsample of a survey with clustering and stratification

Here the subgroup is made up of complete PSUs selected across the strata. Estimates for this subgroup get the worst of both worlds: they do not get the full benefit of the stratification, but they pay the full price for clustering. More commonly, a subgroup will not consist of complete PSUs, but it may still be highly concentrated in some PSUs. For example, unemployed people tend to be concentrated in areas where little work is available.

In this case, the precision of comparisons between the subgroup and the other sample members will also be reduced by the clustering.
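The clustering side of Figures 7.2 and 7.3 can be put into rough numbers with Kish's well-known approximation for the design effect of a cluster sample, 1 + (m - 1) * rho, applied to a subgroup by taking m as the average number of subgroup members per PSU. This is a sketch under strong assumptions: equal PSU sizes, the same intra-cluster correlation rho for the subgroup as for the whole survey, and no allowance for stratification. The function name and figures are hypothetical.

```python
def kish_deff(avg_cluster_size, icc):
    """Kish's approximate design effect due to clustering: 1 + (m - 1) * rho,
    where m is the average number of sampled units per PSU and rho is the
    intra-cluster correlation. Stratification is ignored."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

# Hypothetical survey: 20 respondents per PSU, intra-cluster correlation 0.05,
# assumed to be the same for the subgroup as for the whole survey.
m, rho = 20, 0.05

whole = kish_deff(m, rho)            # whole survey
spread = kish_deff(m * 0.25, rho)    # subgroup is ~25% of every PSU (as in Figure 7.2)
complete = kish_deff(m, rho)         # subgroup made of complete PSUs (as in Figure 7.3)

print(f"whole survey: {whole:.2f}, spread subgroup: {spread:.2f}, complete PSUs: {complete:.2f}")
```

A subgroup forming a quarter of every PSU has about 5 members per PSU, so its clustering design effect falls from 1.95 to 1.20; a subgroup of complete PSUs keeps the full 1.95 and, although this formula does not show it, also loses much of the benefit of stratification.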

7.4 Design effects for subgroups

When a subgroup of a survey is analysed, we can define a design effect (DE) and a design factor (DF) for the subgroup. The subgroup design effects and factors will usually differ from those for the whole survey.
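In symbols (the notation is ours, not the original's): for an estimate $\hat\theta_d$ for subgroup $d$, the design effect is a ratio of variances and the design factor is its square root, i.e. a ratio of standard errors.

```latex
\mathrm{DE}(\hat\theta_d) = \frac{\operatorname{Var}_{\mathrm{design}}(\hat\theta_d)}{\operatorname{Var}_{\mathrm{SRS}}(\hat\theta_d)},
\qquad
\mathrm{DF}(\hat\theta_d) = \sqrt{\mathrm{DE}(\hat\theta_d)}
```

The two subgroup definitions differ only in which simple random sample supplies the denominator.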

There are two different ways in which design effects can be defined for subgroups.

Remember that DEs and DFs compare a survey design with what we would get from simple random sampling. For subgroups, there are two different ways of defining the comparable simple random sample:

1. A simple random sample from the whole population, of the same size as the survey, so that subgroups are represented in proportion to their population size.

2. A separate simple random sample for each subgroup, with size equal to the number of survey respondents in that subgroup.

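The first method compares the current design with a simple random sample of the whole population. It will therefore give a low design effect (good precision) for subgroups that are over-represented in the survey, because a simple random sample of the same overall size would contain fewer members of those subgroups.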

The second method gives each subgroup the design effect it would have if a separate survey had been designed for that subgroup alone, with the same number of units as the subgroup has in the current sample. In this case the DEs and DFs reflect only the design features within the subgroup.
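The two definitions can be compared with a small calculation for a subgroup mean. This is a sketch under stated assumptions: the finite population correction is ignored, method 1 uses the expected subgroup size n × P (P being the subgroup's population share) rather than its random size, and the function name `subgroup_design_effects` and all numbers are hypothetical.

```python
import math

def subgroup_design_effects(var_design, s2, n_total, n_sub, pop_share):
    """Design effect (DE) and design factor (DF) of a subgroup mean under the
    two reference simple random samples.

    var_design : estimated variance of the subgroup mean under the actual design
    s2         : element variance of the variable within the subgroup
    n_total    : total survey sample size
    n_sub      : number of survey respondents in the subgroup
    pop_share  : subgroup's share of the population
    """
    # Method 1: SRS of n_total from the whole population; the subgroup would
    # then contain about n_total * pop_share members.
    var_srs1 = s2 / (n_total * pop_share)
    # Method 2: SRS of n_sub drawn from within the subgroup itself.
    var_srs2 = s2 / n_sub
    de1 = var_design / var_srs1
    de2 = var_design / var_srs2
    # DF is the square root of DE
    return {"method1": (de1, math.sqrt(de1)), "method2": (de2, math.sqrt(de2))}

if __name__ == "__main__":
    # Hypothetical subgroup: 200 of 1,000 respondents (20%) but 10% of the population
    res = subgroup_design_effects(var_design=0.05, s2=4.0, n_total=1000, n_sub=200, pop_share=0.10)
    for method, (de, df) in res.items():
        print(f"{method}: DE = {de:.2f}, DF = {df:.2f}")
```

Because this subgroup is over-represented (20% of the sample against 10% of the population), method 1 gives the smaller design effect (1.25 against 2.5), in line with the description of the first method.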

Some packages give an option to select which of these you want.

peas project 2004/2005/2006