Checking the survey design


8. Checking the Survey Design


8.1 Introduction

Methods used to analyse surveys require that we specify the survey design. Sometimes the data will not agree with the design that has been specified. Two main problems can arise: primary sampling units (PSUs) that are split across strata, and strata that contain only one PSU (lonely PSUs).

8.2 Primary sampling units split across strata

If the design is correct, PSUs should never be split across strata. A split can happen because of data errors, or because the naming convention has re-used PSU identifiers in different strata (e.g. PSU 1 in stratum 1 may be a different unit from PSU 1 in stratum 2).

Most methods of analysing surveys calculate the variability of the estimates from differences between PSUs within strata, so they will fail if there are any split PSUs.

One way round the problem is to subdivide the PSUs into sub-PSUs so that they are nested within strata. This is the right thing to do if it is just a labelling problem. If the PSUs really are split, subdividing may be acceptable when there are only a few such cases among thousands of PSUs, but generally it is not a good thing.
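The relabelling described above can be sketched in a few lines of Python. This is a minimal illustration, not what any of the packages does internally; the record layout (dictionaries with keys "stratum" and "psu") and the function name nest_psu_ids are assumptions made for the example.

```python
def nest_psu_ids(records):
    """Return copies of the records with PSU ids made unique within strata.

    Each record is a dict with the (hypothetical) keys "stratum" and "psu".
    """
    relabelled = []
    for rec in records:
        new_rec = dict(rec)
        # Combine the stratum and PSU labels so that PSU 1 in stratum 1
        # and PSU 1 in stratum 2 become two distinct units.
        new_rec["psu"] = f"{rec['stratum']}_{rec['psu']}"
        relabelled.append(new_rec)
    return relabelled


sample = [
    {"stratum": 1, "psu": 1, "y": 10.0},
    {"stratum": 1, "psu": 2, "y": 12.0},
    {"stratum": 2, "psu": 1, "y": 7.0},
    {"stratum": 2, "psu": 3, "y": 9.0},
]
nested = nest_psu_ids(sample)
print([rec["psu"] for rec in nested])
```

Relabelling like this is only appropriate when the repeated identifiers are a naming issue; it hides the problem rather than fixing it if a real PSU has been recorded in two strata.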

Packages can provide two things to help:

different options about how they handle split PSUs
tools to find out where the problem is.
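If your package offers no diagnostic, a check for split PSUs is easy to write yourself. The sketch below assumes the data are a list of dictionaries with keys "stratum" and "psu" (hypothetical names) and lists every PSU identifier that appears in more than one stratum.

```python
from collections import defaultdict


def find_split_psus(records):
    """Map each PSU id that occurs in several strata to those strata."""
    strata_by_psu = defaultdict(set)
    for rec in records:
        strata_by_psu[rec["psu"]].add(rec["stratum"])
    # Keep only the PSUs seen in more than one stratum.
    return {
        psu: sorted(strata)
        for psu, strata in strata_by_psu.items()
        if len(strata) > 1
    }


sample = [
    {"stratum": 1, "psu": 1},
    {"stratum": 1, "psu": 2},
    {"stratum": 2, "psu": 1},
    {"stratum": 2, "psu": 3},
]
split = find_split_psus(sample)
print(split)
```

An empty result means every PSU is nested within a single stratum.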

Features	SPSS complex surveys	R Survey Package	SAS survey procedures	Stata
warning about split PSU's	no	yes	no	no
default action	seems to split PSU's	fail with warning	splits with log warning	splits PSU's
Options	none	fail or split	no	no

Table 7.1 Packages and split PSU's

Examples of the effect of splitting PSUs on results can be found in Exemplar 1 and Exemplar 2.


8.3 Lonely PSUs

Having a stratum with just one PSU is not in conflict with the survey design, but it poses problems for some methods of estimating precision.

In a stratum with only one PSU we have no estimate of the variability between PSUs. This can cause some methods of analysis to fail. When this happens there are various options.

Group the strata together to prevent this happening. This can be a nuisance if there are just one or two lonely PSU strata among lots of other strata.
Set the variation for this stratum to zero. This is OK if this PSU has been selected with certainty, e.g. a business survey where there is just one very large firm in a stratum by itself.
Use an approximate method. One example is to take the variability of the lonely PSU from the overall mean rather than the stratum mean. This tends to give variance estimates that may be too high.
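The second and third options can be illustrated with a simplified variance calculation for an estimated total. This is a sketch under stated assumptions, not the formula used by any particular package: it treats PSUs as sampled with replacement, ignores finite population corrections, takes the "overall mean" to be the plain mean of all PSU totals, and gives a lonely PSU a scale factor of 1. The function name and the option names "fail", "zero" and "adjust" are made up for this example.

```python
from collections import defaultdict


def stratified_variance(psu_totals, lonely="fail"):
    """Between-PSU variance of an estimated total.

    psu_totals: list of (stratum, weighted PSU total) pairs.
    lonely: what to do with single-PSU strata (hypothetical option names):
        "fail"   raise an error,
        "zero"   let the stratum contribute no variance,
        "adjust" centre the lonely PSU at the overall mean of PSU totals.
    """
    by_stratum = defaultdict(list)
    for stratum, total in psu_totals:
        by_stratum[stratum].append(total)

    all_totals = [total for _, total in psu_totals]
    grand_mean = sum(all_totals) / len(all_totals)

    variance = 0.0
    for stratum, totals in by_stratum.items():
        n = len(totals)
        if n > 1:
            # Usual case: squared deviations from the stratum mean.
            mean = sum(totals) / n
            variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals)
        elif lonely == "zero":
            continue
        elif lonely == "adjust":
            # Assumed scale factor of 1 for the single PSU.
            variance += (totals[0] - grand_mean) ** 2
        else:
            raise ValueError(f"stratum {stratum!r} has only one PSU")
    return variance


totals = [(1, 10.0), (1, 14.0), (2, 20.0)]
print(stratified_variance(totals, lonely="zero"))
print(stratified_variance(totals, lonely="adjust"))
```

In this toy example the "adjust" choice gives a larger variance than "zero", because the lonely PSU total lies some way from the overall mean.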

Packages can (but some don't)

give different options
provide diagnostics to find where the lonely PSUs are.
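Where a package gives no such diagnostic, counting the PSUs in each stratum is straightforward. The sketch below again assumes records stored as dictionaries with the hypothetical keys "stratum" and "psu".

```python
from collections import defaultdict


def psus_per_stratum(records):
    """Count the distinct PSUs observed in each stratum."""
    psus = defaultdict(set)
    for rec in records:
        psus[rec["stratum"]].add(rec["psu"])
    return {stratum: len(ids) for stratum, ids in psus.items()}


def lonely_strata(records):
    """List the strata that contain exactly one PSU."""
    counts = psus_per_stratum(records)
    return sorted(s for s, n in counts.items() if n == 1)


sample = [
    {"stratum": 1, "psu": "a"},
    {"stratum": 1, "psu": "a"},
    {"stratum": 1, "psu": "b"},
    {"stratum": 2, "psu": "c"},
]
counts = psus_per_stratum(sample)
lonely = lonely_strata(sample)
print(counts, lonely)
```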

The table below gives our current assessment (September 04) of what survey packages do in these cases.

Features	SPSS complex Surveys	R survey package	SAS survey procedures	Stata
Summary of survey giving PSU's per stratum	no	yes summary	yes as an option with STRATUM	yes survey design
Lonely PSU's highlighted	no	no	no	yes
Warnings when there are lonely PSU's	no	yes varying by options set	yes - in log file	yes
Options for approximate methods	no	set to zero or two approximate methods	set to zero	none

Exemplar 2 illustrates how annoying these minor discrepancies between the design and the requirements of the analysis can be for the analyst, and shows how they can be overcome.


8.4 Subsamples and regression

A survey with no lonely PSUs can easily acquire some when only a subsample of the data is examined. Neither of the packages that give failures or warnings for a whole survey (R and Stata) will give failures when you define subgroups correctly (see theory section 7.7).
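The difference between dropping records and defining a subgroup within the full design can be seen in a small sketch. It assumes records with the hypothetical keys "stratum", "psu", "sex", "weight" and "y", and shows the general idea of subgroup (domain) analysis: records outside the subgroup contribute zero to their PSU total, so every PSU stays in the design. It is an illustration of the principle, not a description of how R or Stata implement it.

```python
from collections import defaultdict


def psu_totals(records, in_domain=None):
    """Weighted PSU totals of y, keeping every PSU in the design.

    Records outside the domain add zero instead of being dropped.
    """
    totals = defaultdict(float)
    for rec in records:
        key = (rec["stratum"], rec["psu"])
        if in_domain is None or in_domain(rec):
            totals[key] += rec["weight"] * rec["y"]
        else:
            totals[key] += 0.0  # the PSU is still registered
    return dict(totals)


sample = [
    {"stratum": 1, "psu": "a", "sex": "F", "weight": 1.0, "y": 2.0},
    {"stratum": 1, "psu": "a", "sex": "M", "weight": 1.0, "y": 3.0},
    {"stratum": 1, "psu": "b", "sex": "M", "weight": 1.0, "y": 4.0},
    {"stratum": 2, "psu": "c", "sex": "F", "weight": 1.0, "y": 5.0},
    {"stratum": 2, "psu": "d", "sex": "F", "weight": 1.0, "y": 1.0},
]

# Dropping the records first leaves stratum 1 with a single PSU.
naive = [rec for rec in sample if rec["sex"] == "F"]
naive_psus_stratum_1 = {rec["psu"] for rec in naive if rec["stratum"] == 1}

# Treating the subgroup as a domain keeps both PSUs in stratum 1.
domain = psu_totals(sample, in_domain=lambda rec: rec["sex"] == "F")
print(naive_psus_stratum_1, domain)
```

With the domain approach, PSU "b" remains in stratum 1 with a total of zero, so the stratum still has two PSUs from which to estimate variability.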

Regression methods are not affected by lonely PSUs in any package. The reasoning behind this is a little tricky to follow, but everyone we have spoken to seems to think it is acceptable. When there are a few lonely PSUs, setting up your analysis as a regression, rather than as a comparison of means or proportions, can save time and trouble.

peas project 2004/2005/2006.