* PRE-PROCESSING FOR EXEMPLAR 3 IN SPSS.
* LOTS OF CODE NEEDED TO IDENTIFY PSUs AND ALSO TO SORT OUT WHICH ONES WERE GIVEN OUT IN WHICH MONTH.
* WRITES FILE EX3 dot SAV WITH SMOKING AND DEMOGRAPHIC VARIABLES.
* WRITTEN BY GILLIAN RAAB, OCTOBER 04, FOR PEAS PROJECT.

IMPORT FILE= 'C:\Documents and settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\shs98i.por'.
*FILE='H:\aprojects\peas\exemp3\data\shs98i.por'.
save outfile='C:\Documents and settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\shs98i.sav'.

* select adults only.
select if (resptyp=2).
execute.

* make new sequential ID.
compute idno=$CASENUM.
execute.
VARIABLE LABELS IDNO "sequential ID".

* add noise to the weights (r is a standard normal deviate, so each weight is perturbed by about 10 per cent).
compute r=normal(1).
compute weighta=weighta*(0.1*r+1).
execute.

* get grossed up weight by multiplying by mid year population estimate total for age groups
* and dividing by sum of weights.
compute grosswt=weighta*3727000/8993.
VARIABLE LABELS grosswt "weight grossed to population totals".

* keep only variables needed.
SAVE OUTFILE="C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\small.sav"
  /keep=age sex region hboard carstair carstg5 cigst1 cigst2 archpt nofad sc weighta grosswt idno.
GET FILE='C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\small.sav'.
execute.
compute psu=archpt.
execute.

*******this improved code provided by Andrew Elders replaces bit below commented out***************.
*The aggregate command will
* - create an output file with one-record-per-archpt (without duplicates)
* - keep the first record within each archpt group
* - sort the file into ascending archpt order
* - save only the 3 variables (region, carstair and archpt) to the output file.
* The break variable is archpt so that psu1.sav carries archpt, which the later steps read.
aggregate outfile='psu1.sav'
  /break archpt
  /region carstair=first(region carstair).
execute.
****************************************************************************.

* Identify duplicate cases (in fact none)
* and identify first one in each group by archpt.
*SORT CASES BY PSU(A) .
*MATCH FILES /FILE = * /BY psu
*/FIRST = PrimaryFirst /LAST = PrimaryLast.
*EXECUTE.
*and save one per ARCHPT and save file with one record per archpt.
*select if PrimaryFirst=1.
*execute.
*SAVE OUTFILE='psu1.sav'
* /KEEP=region carstair archpt/COMPRESSED.

GET FILE='psu1.sav'.
SORT CASES BY region (A) carstair (A) .
execute.

* get new value for psu for pairs.
* even=1 marks the second archpt of each consecutive pair.
COMPUTE id=$CASENUM.
execute.
COMPUTE even = (trunc(id/2)*2=id ).
execute.

* attach a PSUNO record to each archpt case.
* where these are in the order on the selected file.
* and will also give ordering by months.
IF EVEN=0 psuno=1+($CASENUM-1)/2.
IF EVEN=1 PSUNO=$CASENUM/2.
if (even=1) psu=lag(archpt).
IF (EVEN=0) PSU=ARCHPT.
EXECUTE.

* now sorting by psu gives the order in which psus selected over months.
* note psu numbers are not quite sequential.
SORT CASES BY psu (A) .
execute.
if (even=1) psu=trunc($casenum/2).
if (even=0) psu=1+trunc($casenum/2).
* 26 psus are issued each month, and 3 months make a quarter.
compute month=trunc((psu-0.5)/26)+1.
compute quarter=trunc((psu-0.5)/26/3)+1.
execute.
VARIABLE LABELS archpt "label for field work 2 per psu".
VARIABLE LABELS psu "no from order of selection over months".
variable labels psuno 'no from order in region carstairs sequence'.
VARIABLE LABELS month "month address issued".
VARIABLE LABELS quarter "quarter address issued".
EXECUTE.

* now get implicit strata by sorting back in region and carstairs order.
SORT CASES BY region carstair (A) .
execute.
MATCH FILES /FILE = * /BY region /FIRST = regfirst /LAST = reglast.
EXECUTE.

* identify first in a region and set to first stratum.
* odd records (even=0) start a new psu: count it, and open a new stratum once two psus are in the current one.
* even records (even=1) take the stratum and count of the previous record, so the pair stays together.
do if ( regfirst=1).
compute stratum=1.
compute nin=1.
else if (even=0).
compute stratum=lag(stratum).
compute nin=lag(nin)+1.
do if (nin=3).
compute stratum=stratum+1.
compute nin=1.
end if.
else if (even=1).
compute stratum=lag(stratum).
compute nin=lag(nin).
end if.
execute.
* now sort in descending order to pick out the odd psus left in strata of 1 at the end of each region's sequence.
* these are merged into the preceding stratum, which then holds 3 psus.
sort cases by region stratum nin (D) reglast(D).
execute.
do if (( reglast=1) and (nin=1)).
compute stratum=stratum-1.
compute nin=3.
end if.
do if lag(nin)=3 and lag(reglast)=1.
compute stratum=stratum-1.
compute nin=3.
end if.
compute regstrat=region*100+stratum.

* now assign the number of psus in the stratum to all members of the stratum.
sort cases by regstrat (A) nin (D).
execute.
MATCH FILES /FILE = * /BY regstrat /FIRST = strfirst .
execute.
if strfirst=0 nin=lag(nin).
execute.
VARIABLE LABELS archpt "label for field work 2 per psu".
VARIABLE LABELS psu "no from order of selection over months".
variable labels psuno 'no from order in region carstairs sequence'.
VARIABLE LABELS regstrat "unique stratum no including regional id".
VARIABLE LABELS stratum "stratum no within region".
VARIABLE LABELS nin "number of psus in stratum".
EXECUTE.

SORT CASES BY ARCHPT(A) .
SAVE OUTFILE='psus.sav'
  /DROP=region carstair id even regfirst reglast strfirst
  /COMPRESSED.
*get file ='psus.sav'.

* merge the psu-level design variables back onto the respondent file by archpt.
GET FILE='C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\small.sav'.
SORT CASES BY ARCHPT(A) .
MATCH FILES /FILE=* /TABLE='psus.sav'/BY archpt.
EXECUTE.

* save original file with the new design variables PSU and REGSTRAT.
* PSUNO and the other working variables are dropped.
SAVE OUTFILE='C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\ex3.sav'
  /drop=archpt month quarter stratum nin carstair region psuno
  /COMPRESSED.

SAVE TRANSLATE OUTFILE='C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\ex3.sas7bdat'
  /TYPE=SAS /VERSION=7 /PLATFORM=WINDOWS /MAP /REPLACE
  /VALFILE='C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\ex3_formats.sas' .

SAVE TRANSLATE OUTFILE='C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\web\exemp3\data\ex3.xpt'
  /TYPE=SAS /VERSION=X /MAP /REPLACE .