/*------------------------------------------------------------
Analyse FRS data in Stata
Exemplar 1
ex1.do   version 9   July 06

HOW TO USE THIS FILE
To submit commands, highlight the line you want to submit and go to
>Do on the >Tools menu.
The results will then appear in the main Stata window.
Any text between a /* and a */ is treated as a comment.
At the Stata command window, typing "help do" will give more
information, and "help help" gives very useful information on
viewing results.

HIGHLIGHT AND do ONE LINE AT A TIME, STARTING AFTER THE comments

First get the simple mean and confidence interval of the income
(unweighted)
----------------------------------------------------------*/
ci hhinc
/*-----------------------------------------------------------
Now set up the survey description with clustering and weighting
and run svydes to get a description of it.
If you want to see the contents of the data file, go to the main
Stata window and to >Window >Data Editor
--------------------------------------------------------------*/
svyset psu [pwei=gross2]
svydes
/*--------------------------------------------------------------
This design is now part of your data file, and if you save your
data file it will be saved with it.
Now use the svy: mean procedure to get the weighted mean
with a standard error appropriate for clustered data.
HOW MANY PSUs ARE IN THIS DATA SET?
---------------------------------------------------------------*/
svy: mean hhinc
/*------------------------------------------------------------
estat is used to get design effects
----------------------------------------------------------------*/
estat effects
/*----------------------------------------------------
We will now look at what the standard error for the mean income
would have been for a weighted survey but with no clustering.
First remove all the survey design features and then set a new
design with just weighting and no clustering (a new svyset
replaces the earlier settings).
Notice how many PSUs it gives you NOW
----------------------------------------------------------------*/
svyset [pwei=gross2]
svydes
svy: mean hhinc
estat effects
/*---------------------------------------
Now testing the wrong kind of weights.
See Stata's help for weights for a good explanation.
------------------------------------------------------*/
ci hhinc [aweight=gross2]
ci hhinc [fweight=gross2]
/*------------------------------------------------------------
Now clustering but no weighting.
WHICH HAS THE LARGER EFFECT ON THE DESIGN EFFECT,
CLUSTERING OR GROSSING?
----------------------------------------------------------------*/
svyset psu
svydes
svy: mean hhinc
estat effects
/*-----------------------------------------------------------
Now subgroups.
First for the survey data with weighting only,
looking at the subgroup of single parents
---------------------------------------------------------------*/
gen lonep=(adulth==1 & depchldh>0)
svyset [pweight=gross2]
svy, subpop(lonep): mean hhinc
estat effects, srssubpop
/*--------------------------------------------------------
Now add clustering, keeping the weights
---------------------------------------------------*/
svyset psu [pweight=gross2]
svy, subpop(lonep): mean hhinc
estat effects, srssubpop
/*------------------------------------------------
Notice that we have used the srssubpop option, which assumes a
simple random sample within subpopulations, to compare our
results with what we would have found for a simple random
sample of 334 lone parents. See exemplar 3 for a discussion
of this. Without this option we would have found different
results that relate to the whole survey as an SRS.
-------------------------------------------------*/
estat effects
/*-------------------------------------------------
This survey was weighted and clustered and poststratified
at the UK level by tenure and council tax band.
To poststratify at the Scotland level we need to get, and supply
to Stata, the additional data for the population totals of these
variables. This is done by the following code.
-------- first council tax band percentages -------*/
gen cttot=0
replace cttot=24.83 if ctband==1
replace cttot=24.62 if ctband==2
replace cttot=15.45 if ctband==3
replace cttot=11.91 if ctband==4
replace cttot=11.96 if ctband==5
replace cttot=5.95 if ctband==6
replace cttot=3.94 if ctband==7
replace cttot=0.45 if ctband==8
replace cttot=0.89 if ctband==9
/*-------------- and tenure types --------------------*/
gen tentot=0
replace tentot=62.63 if tenure==1
replace tentot=21.59 if tenure==2
replace tentot=5.58 if tenure==3
replace tentot=10.20 if tenure==4
/*---------------- rescale to Scottish household totals -------*/
replace cttot=cttot/100*2242012
replace tentot=tentot/100*2242012
/*----------------------------------------------------------
The survey was matched to both totals for the UK by a method
called raking. Stata cannot do this yet, but we can look at
post-stratification by each factor separately
--------------------- first by tenure -------------------------*/
svyset psu [pwei=gross2], poststrata(tenure) postweight(tentot)
svy: mean hhinc
/*------------------ then by council tax band -----------------*/
svyset psu [pwei=gross2], poststrata(ctband) postweight(cttot)
svy: mean hhinc
/*----------------------------------------------------------
Poststratified estimators don't work with the estat effects
command, but we can see that both give lower standard errors
than before.
In Stata version 8 it was necessary to use the contributed
library with replication weights to analyse post-stratified
data correctly. In version 9 we can get these methods as part
of the svy commands. We can check that the jackknife gives
results very close to the default method used in Stata 9 for
post-stratification.
--------------------------------------------------------------*/
svyset psu [pwei=gross2], poststrata(tenure) postweight(tentot) vce(jackknife)
svy: mean hhinc
svyset psu [pwei=gross2], poststrata(ctband) postweight(cttot) vce(jackknife)
svy: mean hhinc
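/*------------------------------------------------------------
A minimal sketch, not part of the original exemplar: one way to
put the default (linearized) and jackknife standard errors side
by side for the tenure post-stratification. The stored names
lin_ten and jk_ten are illustrative only. It assumes that cttot
and tentot have been created as above, and that estimates store
and estimates table behave in your Stata version as they do for
other estimation commands. The se option asks estimates table to
show standard errors under each estimate.
--------------------------------------------------------------*/
svyset psu [pwei=gross2], poststrata(tenure) postweight(tentot)
svy: mean hhinc
estimates store lin_ten
svyset psu [pwei=gross2], poststrata(tenure) postweight(tentot) vce(jackknife)
svy: mean hhinc
estimates store jk_ten
estimates table lin_ten jk_ten, se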