Exemplar 3 - Preprocessing

The data for this exemplar were obtained from the ESRC data archive. In preparing the data sets for the web, steps were taken to ensure that no individual could be identified from the data. In particular the following steps were taken:-

The programs below are provided to allow you to see how the data were extracted from the large survey files and made ready for analysis by different packages. It is not expected that you would want to do this again, but you may find it helpful for doing similar things yourself.

The data analysed in this exemplar are taken from the 1998 Scottish Health Survey. This data set is available from the ESRC data archive, but it is not obvious how to identify the primary sampling units (PSUs) from this file. The code to do this is included in the SPSS syntax to assist anyone else with access to the full data file. This will not result in any disclosure to those without access to the data. A small data set has been extracted that includes only the variables needed here.

The program to extract the data and identify the PSUs was written in SPSS, and the syntax is in: ex3_prep.SPS. It produces an SPSS data set ex3.sav and an SPSS transport format file ex2.por, as well as a SAS data set ex3_prep.sas7bdat.

The R and Stata files are produced via a SAS export file and are created with the following steps:-
  • The SAS code to read the file ex3.sas7bdat and produce a transport file ex3.xpt is in the program: ex3_prep.sas.
  • The R commands to read ex3.xpt into R and produce ex3.Rdata are in: ex3_prep.R.
  • The Stata do file to read ex3.xpt into Stata and produce ex3.dta is: ex3_prep.do.
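
For readers who want to do something similar with their own files, the R step can be sketched roughly as follows. This is not the contents of ex3_prep.R, only a minimal illustration. It assumes that the foreign package is installed, that ex3.xpt is in the working directory and holds a single data set, and that the saved object should be called ex3. The helper names save_as and xpt_to_rdata are hypothetical.

```r
library(foreign)  # provides read.xport() for SAS transport files

# Save a data frame under a chosen object name in an .Rdata file.
# (Hypothetical helper, not part of ex3_prep.R.)
save_as <- function(dat, object_name, rdata_file) {
  env <- new.env()
  assign(object_name, dat, envir = env)
  save(list = object_name, envir = env, file = rdata_file)
}

# Read a SAS transport file and store its data set as an .Rdata file.
# The object name "ex3" is an assumption made for this sketch.
xpt_to_rdata <- function(xpt_file, rdata_file, object_name = "ex3") {
  dat <- read.xport(xpt_file)
  # read.xport() returns a list when the file holds several data sets;
  # this sketch only handles the single data set case.
  stopifnot(is.data.frame(dat))
  save_as(dat, object_name, rdata_file)
  invisible(dat)
}

# Usage: runs only if the transport file is present.
if (file.exists("ex3.xpt")) {
  ex3 <- xpt_to_rdata("ex3.xpt", "ex3.Rdata")
  str(ex3)  # check variable names and types after the import
}
```

After an import of this kind it is worth checking the variable names and types with str(), since these may not come through exactly as they appeared in the original SPSS file.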

The data sets (in bold) mentioned here are restricted to the variables needed for these analyses. They can all be accessed from the main page of exemplar 3.
