8. Checking the Survey Design


Methods used to analyse surveys require that we specify the survey design. Sometimes the data will not agree with the design that has been specified. Two main problems can arise: PSUs that are split across strata, and strata that contain only a single ("lonely") PSU.


8.2 Primary sampling units split across strata
[Figure: a split PSU]

If the design is correct, a PSU should never be split across strata. Splits can arise from data errors, or because the naming convention has re-used PSU identifiers in different strata (e.g. PSU 1 in stratum 1 may be a different unit from PSU 1 in stratum 2).

Most methods of analysing surveys calculate the variability of the estimates from differences between PSUs within strata, so they will fail if there are any split PSUs.

One way round the problem is to subdivide the split PSUs into sub-PSUs so that they are nested within strata. This is the right thing to do if it is just a labelling problem. If the splits are genuine data errors, subdividing may be acceptable when only a few PSUs among thousands are affected, but in general it is not good practice.
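The check and the relabelling described above can be sketched in a few lines of Python. This is a minimal illustration, not the behaviour of any of the packages discussed: the function names `find_split_psus` and `nest_psus` and the data are hypothetical, and each record is assumed to be a (stratum, PSU) pair for one respondent.

```python
from collections import defaultdict

def find_split_psus(records):
    """Return {psu: sorted strata} for every PSU that appears in more than one stratum."""
    strata_by_psu = defaultdict(set)
    for stratum, psu in records:
        strata_by_psu[psu].add(stratum)
    return {psu: sorted(s) for psu, s in strata_by_psu.items() if len(s) > 1}

def nest_psus(records):
    """Relabel each PSU as a (stratum, psu) pair so identifiers are nested within strata."""
    return [(stratum, (stratum, psu)) for stratum, psu in records]

# Hypothetical data: PSU identifier 1 has been re-used in strata 1 and 2.
records = [(1, 1), (1, 1), (1, 2), (2, 1), (2, 3), (2, 3)]

print(find_split_psus(records))             # {1: [1, 2]}
print(find_split_psus(nest_psus(records)))  # {}
```

Relabelling in this way is only appropriate when the split is a labelling problem; it hides genuine data errors rather than fixing them.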

Packages can provide two things to help:
  • different options about how they handle split PSUs
  • tools to find out where the problem is.

| Features | SPSS complex surveys | R survey package | SAS survey procedures | Stata |
|---|---|---|---|---|
| Warning about split PSUs | no | yes | no | no |
| Default action | seems to split PSUs | fails with warning | splits with log warning | splits PSUs |
| Options | none | fail or split | no | no |

Table 7.1 Packages and split PSUs

Examples of the effect of splitting PSUs on results can be found in Exemplar 1 and Exemplar 2.
8.3 Lonely PSUs
[Figure: a lonely PSU]

Having a stratum with just one PSU is not in conflict with the survey design, but it poses problems for some methods of estimating precision.

In the illustration, stratum 2 has only one PSU, so we have no estimate of the variability between PSUs there. This can cause some methods of analysis to fail. When this happens there are various options.
  • Group strata together so that no stratum is left with a single PSU. This can be a nuisance if there are just one or two lonely-PSU strata among many other strata.
  • Set the variation for this stratum to zero. This is appropriate if the PSU was selected with certainty, e.g. in a business survey where one very large firm forms a stratum by itself.
  • Use an approximate method. One example is to measure the variability of the lonely PSU from the overall mean rather than the stratum mean. This tends to give variance estimates that may be too high.
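The second and third options, along with outright failure, can be sketched for the variance of an estimated total. This is a minimal Python sketch, assuming a with-replacement approximation in which each stratum with n PSUs contributes n/(n-1) times the sum of squared deviations of its weighted PSU totals from the stratum mean. The function name `stratified_variance`, the option names and the data are hypothetical, and the "overall" branch is a simplified stand-in for an approximate method, not the formula used by any particular package.

```python
def stratified_variance(psu_totals, lonely="fail"):
    """Variance of an estimated total from weighted PSU totals grouped by stratum.

    psu_totals: {stratum: [weighted total for each PSU in that stratum]}
    lonely: "fail", "zero", or "overall" (centre a lonely PSU on the overall mean)
    """
    all_totals = [z for totals in psu_totals.values() for z in totals]
    overall_mean = sum(all_totals) / len(all_totals)
    variance = 0.0
    for stratum, totals in psu_totals.items():
        n = len(totals)
        if n > 1:
            mean = sum(totals) / n
            variance += n / (n - 1) * sum((z - mean) ** 2 for z in totals)
        elif lonely == "zero":
            continue  # lonely stratum contributes nothing (certainty PSU)
        elif lonely == "overall":
            # Assumed simplification: squared deviation from the mean of all PSU totals.
            variance += (totals[0] - overall_mean) ** 2
        else:
            raise ValueError(f"stratum {stratum!r} has a single PSU")
    return variance

# Hypothetical data: stratum 2 has a lonely PSU.
example = {1: [10.0, 14.0], 2: [20.0]}
print(stratified_variance(example, lonely="zero"))  # 16.0
print(stratified_variance(example, lonely="overall") > 16.0)  # True
```

In this sketch the "overall" option can only add to the variance obtained with "zero", which matches the remark that the approximate method tends to err on the high side.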
Packages can (though some do not):
  • give different options
  • provide diagnostics to find where the lonely PSUs are.
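Where a package offers no such diagnostic, the count of PSUs per stratum is easy to produce by hand. The following is a minimal Python sketch; the function names `psus_per_stratum` and `lonely_strata` and the data are hypothetical, and each record is assumed to be a (stratum, PSU) pair for one respondent.

```python
from collections import defaultdict

def psus_per_stratum(records):
    """Return {stratum: number of distinct PSUs observed in it}."""
    psus = defaultdict(set)
    for stratum, psu in records:
        psus[stratum].add(psu)
    return {stratum: len(ids) for stratum, ids in psus.items()}

def lonely_strata(records):
    """Return the strata that contain exactly one PSU."""
    return sorted(s for s, n in psus_per_stratum(records).items() if n == 1)

# Hypothetical data: stratum 2 contains only PSU "c".
records = [(1, "a"), (1, "b"), (2, "c"), (2, "c"), (3, "d"), (3, "e")]

print(psus_per_stratum(records))  # {1: 2, 2: 1, 3: 2}
print(lonely_strata(records))     # [2]
```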
The table below gives our current assessment (September 2004) of what survey packages do in these cases.

| Features | SPSS complex surveys | R survey package | SAS survey procedures | Stata |
|---|---|---|---|---|
| Summary of survey giving PSUs per stratum | no | yes (summary) | yes, as an option with STRATUM | yes (survey design) |
| Lonely PSUs highlighted | no | no | no | yes |
| Warnings when there are lonely PSUs | no | yes, varying by options set | yes, in log file | yes |
| Options for approximate methods | | set to zero or two approximate methods | set to zero | none |


Exemplar 2 illustrates how annoying these minor discrepancies between the design and what is required for the analysis to work can be to the analyst, and shows how they can be overcome.


8.4 Subsamples and regression

A survey with no lonely PSUs can easily acquire some when only a subsample of the data is examined. Neither of the packages that fail or warn for a whole survey (R and Stata) will fail when subgroups are defined correctly (see theory section 7.7).
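The difference between dropping records and defining a subgroup can be sketched as follows. This Python example assumes that "defining subgroups correctly" means the usual domain approach, in which every PSU stays in the design and records outside the subgroup contribute zero; that reading, the function name `psu_totals`, the field names and the data are all assumptions made for illustration.

```python
def psu_totals(rows, in_domain=None):
    """Weighted PSU totals by stratum.

    Rows outside the domain add nothing to the total, but their PSU is still
    registered, so the design keeps its full set of PSUs.
    """
    totals = {}
    for row in rows:
        stratum_totals = totals.setdefault(row["stratum"], {})
        stratum_totals.setdefault(row["psu"], 0.0)
        if in_domain is None or in_domain(row):
            stratum_totals[row["psu"]] += row["weight"] * row["y"]
    return totals

def is_female(row):
    return row["sex"] == "f"

# Hypothetical data: two strata with two PSUs each.
rows = [
    {"stratum": 1, "psu": "a", "weight": 2.0, "y": 3.0, "sex": "f"},
    {"stratum": 1, "psu": "b", "weight": 2.0, "y": 5.0, "sex": "m"},
    {"stratum": 2, "psu": "c", "weight": 1.0, "y": 4.0, "sex": "f"},
    {"stratum": 2, "psu": "d", "weight": 1.0, "y": 6.0, "sex": "f"},
]

# Dropping records first leaves stratum 1 with a lonely PSU.
subset = [row for row in rows if is_female(row)]
print(psu_totals(subset)[1])           # {'a': 6.0}

# Treating the subgroup as a domain keeps both PSUs in stratum 1.
print(psu_totals(rows, is_female)[1])  # {'a': 6.0, 'b': 0.0}
```

With the domain approach stratum 1 still has two PSU totals, one of them zero, so a within-stratum variance can still be computed.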

Regression methods are not affected by lonely PSUs in any package. The reasoning behind this is a little tricky to follow, but everyone we have spoken to seems to think it is sound. When there are a few lonely PSUs, setting up your analysis as a regression, rather than as a comparison of means or proportions, can save time and trouble.

peas project 2004/2005/2006.