Exemplar 1 Stata code
/*---------------------------------------------------
HOW TO USE THIS FILE
This is an HTML version of the Stata do file ex1.do
Comments like this are shown in green
To see output from the commands go to the Stata results.

Sections in this file
  Mean income with different design assumptions
  Subgroup lone parents
  Poststratification to match Scottish totals
  Jackknife estimation for mean
---------------------------------------------------------*/

/*------------------------------------------------------------
Analyse FRS data in Stata
Exemplar 1 ex1.do version of 25/9/04

First get the simple mean and confidence interval of the income
(unweighted)
----------------------------------------------------------*/
ci hhinc

/*-----------------------------------------------------------
Now set up the survey description with clustering and weighting
and run svydes to get a description of it.
If you want to see the contents of the data file go to the main
Stata window and to >window >data editor
--------------------------------------------------------------*/
svyset psu [pwei=gross2]
svydes

/*--------------------------------------------------------------
This design is now part of your data file and if you save your
data file it will be saved with it.
Now use the svy: mean command to get the weighted mean with a
standard error appropriate for clustered data.
HOW MANY PSUs ARE IN THIS DATA SET?
-------------------------------------------------------*/
svy: mean hhinc

/*-----------------------------------------------------
estat is used to get design effects
------------------------------------------------------*/
estat effects

/*----------------------------------------------------
We will now look at what the standard error for the mean income
would have been for a weighted survey but with no clustering.
Set a new design with just weighting and no clustering.
Notice how many PSUs it gives you NOW
------------------------------------------------------*/
svyset [pwei=gross2]
svydes
svy: mean hhinc

/*---------------------------------------
Now testing the wrong kind of weights.
See Stata's help for weights for a good explanation
------------------------------------------------------*/
ci hhinc [aweight=gross2]
ci hhinc [fweight=gross2]

/*------------------------------------------------------------
Now clustering but no weighting
WHICH HAS THE LARGER EFFECT ON THE DESIGN EFFECT,
CLUSTERING OR GROSSING?
-----------------------------------------------------*/
svyset psu
svydes
svy: mean hhinc

/*-------------------------------------------
Now subgroups
First for the survey data with weighting only and looking at
the subgroup of single parents
---------------------------------------------------------------*/
gen lonep=(adulth==1 & depchldh>0)
svyset [pweight=gross2]
svy, subpop(lonep): mean hhinc
estat effects, srssubpop

/*--------------------------------------------------------
Now add clustering
---------------------------------------------------*/
svyset psu [pweight=gross2]
svy, subpop(lonep): mean hhinc
estat effects, srssubpop

/*------------------------------------------------
Notice that we have used the srssubpop option here, which takes
a simple random sample within the subpopulation as the baseline,
to compare our results with what we would have found for a
simple random sample of 334 lone parents.
See exemplar 3 for a discussion of this. Without this option we
would have found different results that relate to the whole
survey as a SRS.
-------------------------------------------------*/
estat effects

/*-------------------------------------------------
This survey was weighted and clustered and poststratified at the
UK level by tenure and council tax band.
To poststratify at the Scotland level we need to get and supply
to Stata the additional data for the population totals of these
variables. This is done by the following code.
-------- first council tax band percentages -------*/
gen cttot=0
replace cttot=24.83 if ctband==1
replace cttot=24.62 if ctband==2
replace cttot=15.45 if ctband==3
replace cttot=11.91 if ctband==4
replace cttot=11.96 if ctband==5
replace cttot=5.95 if ctband==6
replace cttot=3.94 if ctband==7
replace cttot=0.45 if ctband==8
replace cttot=0.89 if ctband==9
/*-------------- and tenure types --------------------*/
gen tentot=0
replace tentot=62.63 if tenure==1
replace tentot=21.59 if tenure==2
replace tentot=5.58 if tenure==3
replace tentot=10.20 if tenure==4
/*---------------- rescale to Scottish household totals -------*/
replace cttot=cttot/100*2242012
replace tentot=tentot/100*2242012

/*----------------------------------------------------------
The survey was matched to both totals for the UK by a method
called raking. Stata cannot do this yet, but we can look at
post-stratification by each factor separately
--------------------- first by tenure -------------------------*/
svyset psu [pwei=gross2], poststrata(tenure) postweight(tentot)
svy: mean hhinc
/*--------------------- then by council tax band ---------------*/
svyset psu [pwei=gross2], poststrata(ctband) postweight(cttot)
svy: mean hhinc

/*----------------------------------------------------------
Poststratified estimators don't work with the estat effects
command, but we can see that both give lower standard errors
than before.
In Stata version 8 it was necessary to use the contributed
library with replication weights to analyse post-stratified data
correctly. In version 9 we can get these methods as part of the
svy commands. We can check that the jackknife, one of these
replication methods, gives results very close to the default
method used in Stata 9 for post-stratification.
--------------------------------------------------------------*/
svyset psu [pwei=gross2], poststrata(tenure) postweight(tentot) vce(jackknife)
svy: mean hhinc
svyset psu [pwei=gross2], poststrata(ctband) postweight(cttot) vce(jackknife)
svy: mean hhinc
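
/*----------------------------------------------------------
The lines below are a sketch added for illustration and are not
part of the original ex1.do. They refit three of the designs used
above (clustered and weighted, poststratified by tenure, and
poststratified by tenure with the jackknife) and store each
result, so that the means and standard errors can be read side
by side in one table.
They assume psu, gross2, tenure, tentot and hhinc exist as set up
earlier in this file. The names clustwt, pstenure and jktenure
are made up for this sketch.
----------------------------------------------------------*/
svyset psu [pwei=gross2]
svy: mean hhinc
estimates store clustwt
svyset psu [pwei=gross2], poststrata(tenure) postweight(tentot)
svy: mean hhinc
estimates store pstenure
svyset psu [pwei=gross2], poststrata(tenure) postweight(tentot) vce(jackknife)
svy: mean hhinc
estimates store jktenure
estimates table clustwt pstenure jktenure, se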
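
/*----------------------------------------------------------
The comments earlier in this file note that the survey was
matched to both sets of UK totals by raking. As a rough sketch
only (not part of the original ex1.do, and not a built-in Stata
command), raking can be imitated by rescaling the weights to the
tenure totals and to the council tax band totals in turn, and
repeating until the weights settle.
The variable rakewt, the working variables tsum and csum, and
the choice of 10 passes are assumptions made for this sketch.
Households whose tenure or ctband code is not one of those given
a total above have a target of 0 and so end up with zero weight.
The standard error from svy: mean treats rakewt as an ordinary
sampling weight, so it does not allow for the raking adjustment.
----------------------------------------------------------*/
gen double rakewt = gross2
forvalues i = 1/10 {
    * rescale so the weighted tenure totals match tentot
    egen double tsum = total(rakewt), by(tenure)
    replace rakewt = rakewt*tentot/tsum if tsum>0
    drop tsum
    * rescale so the weighted council tax band totals match cttot
    egen double csum = total(rakewt), by(ctband)
    replace rakewt = rakewt*cttot/csum if csum>0
    drop csum
}
* weighted totals by tenure, to compare by eye with tentot
tabstat rakewt, by(tenure) statistics(sum)
svyset psu [pwei=rakewt]
svy: mean hhinc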