Exemplar 2 - Preprocessing

The programs below are provided to allow you to see how the data were extracted from the large survey files and made ready for analysis by different packages. It is not expected that you would want to do this again, but you may find it helpful for doing similar things yourself.

The data analysed in this exemplar are taken from the Scottish Household Survey for the years 2001 and 2002. They come from the interview with the 'Random adult'.

The data for this exemplar are available in the ESRC data archive, but the data sets at the archive do not identify primary sampling units (PSUs) because of concern about the confidentiality of respondents. A data set with PSUs added was made available in SAS by the Scottish Executive (SHS team). To prevent identification of individuals we have modified the data and taken other precautions as described here. The data set includes only the variables needed for the analyses illustrated here.

Some data problems were identified: some clusters were not nested in strata, and some strata had only one PSU. To overcome these problems, data corrections were carried out in SAS.
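
As an illustration of the two problems just described, the sketch below shows one way such a check could be written in Python. It is not the SAS code used for this exemplar. The function name find_design_problems, the (stratum, PSU) record layout and the example values are all hypothetical and are not taken from the survey files.

```python
from collections import defaultdict


def find_design_problems(rows):
    """Check (stratum, psu) pairs, one pair per respondent.

    Returns a tuple of two sorted lists:
      - PSUs that occur in more than one stratum (not nested in strata)
      - strata that contain only one PSU
    """
    strata_by_psu = defaultdict(set)
    psus_by_stratum = defaultdict(set)
    for stratum, psu in rows:
        strata_by_psu[psu].add(stratum)
        psus_by_stratum[stratum].add(psu)

    non_nested = sorted(p for p, s in strata_by_psu.items() if len(s) > 1)
    single_psu = sorted(s for s, p in psus_by_stratum.items() if len(p) == 1)
    return non_nested, single_psu


# Made-up records for illustration only (not survey data).
example_rows = [
    ("S1", "P1"), ("S1", "P2"),
    ("S2", "P2"), ("S2", "P3"),  # P2 occurs in both S1 and S2
    ("S3", "P4"),                # S3 contains a single PSU
]

non_nested, single_psu = find_design_problems(example_rows)
print("PSUs in more than one stratum:", non_nested)
print("Strata with only one PSU:", single_psu)
```

A check of this kind only reports the problems. How they are then corrected (for example by relabelling PSUs or merging strata) is a separate decision, and the corrections actually applied to these data are in the SAS program named below.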

The program to extract and correct the data was written in SAS and is: ex2_prep.sas. It produces a SAS data set (ex2.sas7bdat) and a transport file to read into other systems (ex2.xpt).

To investigate how each package handles data problems, a further data set was produced from the data without correcting them. The program to produce this is: ex2_prep_nc.sas. It produces a SAS data set (ex2_nc.sas7bdat) and a transport file to read into other systems (ex2_nc.xpt).

The following steps were used to read the SAS data sets into other packages :-

The data sets can all be accessed from the main page of exemplar 2.