---------------------------------------------------------------------------
       log:  C:\Documents and Settings\All Users\Documents\My Documents\peas\ex1datafiles\results\Statalog.log
  log type:  text
 opened on:  17 Jul 2006, 01:42:32

. do "C:\DOCUME~1\Gillian\LOCALS~1\Temp\STD04000000.tmp"

. /*------------------------------------------------------------
> Analyse FRS data in Stata Exemplar 1
> ex1.do version 9 July 06
>
> HOW TO USE THIS FILE
> To submit commands highlight the line you want to submit and
> go to >Do on the >Tools menu
> The results will then appear in the main Stata window
>
> Any text between a /* and a */ is treated as a comment.
>
> At the Stata command window typing "help do" will give more information
> and "help help" gives very useful information on viewing results
>
> HIGHLIGHT AND do ONE LINE AT A TIME STARTING AFTER THE comments
>
> first get the simple mean and confidence interval of
> the income (unweighted)
> ----------------------------------------------------------*/
.
. ci hhinc

    Variable |       Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       hhinc |      4695    470.8643    6.022048        459.0583    482.6704

.
. /*-----------------------------------------------------------
> now set up the survey description with clustering and weighting
> and run the svydes to get a description of it
>
> If you want to see the contents of the data file go to the main
> Stata window and to >window >data editor
> --------------------------------------------------------------*/
.
. svyset psu [pwei=gross2]

      pweight: gross2
          VCE: linearized
     Strata 1:
         SU 1: psu
        FPC 1:

. svydes

Survey: Describing stage 1 sampling units

      pweight: gross2
          VCE: linearized
     Strata 1:
         SU 1: psu
        FPC 1:

                                  #Obs per Unit
                              ----------------------------
Stratum     #Units      #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
       1       320      4695         3      14.7        23
--------  --------  --------  --------  --------  --------
       1       320      4695         3      14.7        23

.
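The ci and svydes results above can be reproduced by hand. This is a minimal sketch that assumes the incomes and PSU codes are held in plain Python lists. `ci_mean` and `describe_psus` are hypothetical helper names, not Stata commands. Stata's `ci` uses a Student t critical value with n - 1 degrees of freedom. The sketch substitutes the normal quantile from `statistics.NormalDist`, which is a close approximation with 4695 observations: it gives a lower limit of about 459.061 rather than 459.0583. The svydes "mean" column is simply observations divided by units (4695 / 320 = 14.67).

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist, mean, stdev


def ci_mean(values, level=0.95):
    """Unweighted mean, SE = s / sqrt(n), and a normal-approximation CI.

    Stata's ci uses a t critical value with n - 1 df; the normal quantile
    used here is an approximation that is very close when n is large.
    """
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m, se, (m - z * se, m + z * se)


def describe_psus(psus):
    """Single-stratum version of the svydes table: units, obs, obs per unit."""
    counts = Counter(psus).values()
    n_units = len(counts)
    n_obs = sum(counts)
    return n_units, n_obs, min(counts), n_obs / n_units, max(counts)


# Toy data (not the FRS file).
print(ci_mean([100.0, 200.0, 300.0, 400.0]))
print(describe_psus([1, 1, 1, 2, 2, 3]))  # -> (3, 6, 1, 2.0, 3)
```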
. /*--------------------------------------------------------------
> This design is now part of your data file and if you save
> your data file it will be saved with it.
>
> Now use the svy: mean procedure to get the weighted mean
> with a standard error appropriate for clustered data
>
> HOW MANY PSUs ARE IN THIS DATA SET?
> ---------------------------------------------------------------*/
.
. svy: mean hhinc
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =     320          Population size  =   2.2e+06
                                    Design df        =       319

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   483.0913   10.63943       462.159    504.0236
--------------------------------------------------------------

. /*------------------------------------------------------------
> estat is used to get design effects
> ----------------------------------------------------------------*/
.
. estat effects

----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       Deff      Deft
-------------+--------------------------------------------
       hhinc |   483.0913   10.63943     2.90045   1.70307
----------------------------------------------------------

.
. /*----------------------------------------------------
> We will now look at what the standard error for the mean income
> would have been for a weighted survey but with no clustering.
>
> First remove all the survey design features and
> then set a new design with just weighting and no clustering
>
> notice how many PSUs it gives you NOW
> ----------------------------------------------------------------*/
. svyset [pwei=gross2]

      pweight: gross2
          VCE: linearized
     Strata 1:
         SU 1:
        FPC 1:
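`svy: mean` treats the weighted mean as a ratio (sum of w*y over sum of w) and obtains its standard error by linearization. Each household contributes a residual, the residuals are totalled within PSU, and the spread of the PSU totals gives the variance. This is a minimal sketch for a single stratum, with PSUs treated as sampled with replacement and no finite population correction, matching the svyset used here. The function names are hypothetical. `estat effects` divides the design-based variance by the variance expected under simple random sampling, and in this log Deft is the square root of Deff. Dividing 10.63943 by Deft 1.70307 gives an SRS standard error of about 6.2472, which is the same value the aweighted `ci` reports for this mean.

```python
from math import sqrt


def linearized_mean_se(y, w, psu):
    """Weighted mean and linearized SE for a one-stratum cluster design.

    Assumes one stratum, PSUs treated as sampled with replacement, no FPC
    and no poststratification. Each record's linearized value is
    z_i = w_i * (y_i - ybar) / sum(w); the z_i are totalled within PSU
    and the variance is n/(n-1) times the sum of squared deviations of
    those PSU totals from their mean.
    """
    total_w = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    psu_totals = {}
    for yi, wi, p in zip(y, w, psu):
        psu_totals[p] = psu_totals.get(p, 0.0) + wi * (yi - ybar) / total_w
    n = len(psu_totals)
    zbar = sum(psu_totals.values()) / n  # zero apart from rounding error
    var = n / (n - 1) * sum((z - zbar) ** 2 for z in psu_totals.values())
    return ybar, sqrt(var)


def design_effects(se_design, se_srs):
    """Deff = V_design / V_srs and Deft = sqrt(Deff), as in this log."""
    deff = (se_design / se_srs) ** 2
    return deff, sqrt(deff)


# Passing each record as its own PSU gives the weights-only design;
# setting every weight to 1 gives the clustering-only design.
print(design_effects(10.63943, 6.2472))  # weights + clustering, Deff about 2.900
print(design_effects(7.877496, 6.2472))  # weights only, Deff about 1.590
```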
. svydes

Survey: Describing stage 1 sampling units

      pweight: gross2
          VCE: linearized
     Strata 1:
         SU 1:
        FPC 1:

                                  #Obs per Unit
                              ----------------------------
Stratum     #Units      #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
       1      4695      4695         1       1.0         1
--------  --------  --------  --------  --------  --------
       1      4695      4695         1       1.0         1

. svy: mean hhinc
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =    4695          Population size  =   2.2e+06
                                    Design df        =      4694

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   483.0913   7.877496      467.6477    498.5349
--------------------------------------------------------------

. estat effects

----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       Deff      Deft
-------------+--------------------------------------------
       hhinc |   483.0913   7.877496     1.59003   1.26096
----------------------------------------------------------

.
. /*---------------------------------------
> now testing the wrong kind of weights
> see Stata's help for weights for a good explanation
> ------------------------------------------------------*/
. ci hhinc [aweight=gross2]

    Variable |       Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       hhinc |      4695    483.0913      6.2472        470.8438    495.3387

. ci hhinc [fweight=gross2]

    Variable |       Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
       hhinc |   2236979    483.0913    .2861713        482.5304    483.6522

. /*------------------------------------------------------------
> Now clustering but no weighting
>
> WHICH HAS THE LARGER EFFECT ON THE DESIGN EFFECT:
> CLUSTERING OR GROSSING?
> ----------------------------------------------------------------*/
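The fweight result in the two ci runs above is misleading. Frequency weights tell `ci` that each household was observed gross2 times, so its standard error is based on 2,236,979 "observations". Neither version accounts for the clustering. This sketch shows both formulas, assuming Stata's conventions for `ci`: aweights are rescaled to sum to the number of observations n, and fweights are treated as replicated records with total W. `weighted_se` is a hypothetical name. Under these formulas the ratio of the two standard errors is sqrt((n - 1) / (W - 1)) whatever the data. Accordingly, 6.2472 times sqrt(4694 / 2236978) reproduces the .2861713 in the log to within rounding.

```python
from math import sqrt


def weighted_se(y, w, kind):
    """SE of a weighted mean under ci's aweight or fweight conventions.

    kind="aweight": weights rescaled to sum to n; s^2 uses n - 1, SE = s / sqrt(n).
    kind="fweight": each record counts w_i times; s^2 uses W - 1, SE = s / sqrt(W).
    """
    n, total_w = len(y), sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    ss = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    if kind == "aweight":
        s2 = (n / total_w) * ss / (n - 1)
        return sqrt(s2 / n)
    if kind == "fweight":
        s2 = ss / (total_w - 1)
        return sqrt(s2 / total_w)
    raise ValueError("kind must be 'aweight' or 'fweight'")


# Toy data (not the FRS file): the ratio is sqrt((n - 1) / (W - 1)) whatever y is.
y = [250.0, 410.0, 520.0, 760.0]
w = [300.0, 450.0, 500.0, 250.0]
ratio = weighted_se(y, w, "fweight") / weighted_se(y, w, "aweight")
print(ratio, sqrt((len(y) - 1) / (sum(w) - 1)))

# The same identity applied to the log: aweight SE 6.2472, n = 4695, W = 2236979.
print(6.2472 * sqrt(4694 / 2236978))  # about 0.2862, as in the fweight run
```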
. svyset psu

      pweight:
          VCE: linearized
     Strata 1:
         SU 1: psu
        FPC 1:

. svydes

Survey: Describing stage 1 sampling units

      pweight:
          VCE: linearized
     Strata 1:
         SU 1: psu
        FPC 1:

                                  #Obs per Unit
                              ----------------------------
Stratum     #Units      #Obs       min      mean       max
--------  --------  --------  --------  --------  --------
       1       320      4695         3      14.7        23
--------  --------  --------  --------  --------  --------
       1       320      4695         3      14.7        23

. svy: mean hhinc
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =     320          Population size  =      4695
                                    Design df        =       319

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   470.8643   8.907628      453.3392    488.3894
--------------------------------------------------------------

. estat effects

----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       Deff      Deft
-------------+--------------------------------------------
       hhinc |   470.8643   8.907628     2.18794   1.47917
----------------------------------------------------------

. /*-----------------------------------------------------------
> Now subgroups
>
> First for the survey data with weighting only
> and looking at the subgroup of single parents
> ---------------------------------------------------------------*/
. gen lonep=(adulth==1 & depchldh>0)
lonep already defined
r(110);

end of do-file

r(110);

. do "C:\DOCUME~1\Gillian\LOCALS~1\Temp\STD04000000.tmp"

. svyset[pweight=gross2]

      pweight: gross2
          VCE: linearized
     Strata 1:
         SU 1:
        FPC 1:

.
. svy, subpop(lonep): mean hhinc
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =    4695          Population size  =   2.2e+06
                                    Subpop. no. obs  =       334
                                    Subpop. size     =    131801
                                    Design df        =      4694

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   276.5555   8.212478      260.4552    292.6558
--------------------------------------------------------------

. estat effects, srssubpop

----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       Deff      Deft
-------------+--------------------------------------------
       hhinc |   276.5555   8.212478     .941584   .970352
----------------------------------------------------------

. /*--------------------------------------------------------
> now clustering but no weighting
> ---------------------------------------------------*/
. svyset psu

      pweight:
          VCE: linearized
     Strata 1:
         SU 1: psu
        FPC 1:

. svy, subpop(lonep): mean hhinc
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =     320          Population size  =      4695
                                    Subpop. no. obs  =       334
                                    Subpop. size     =       334
                                    Design df        =       319

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   281.4162   9.255717      263.2062    299.6261
--------------------------------------------------------------

. estat effects, srssubpop

----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       Deff      Deft
-------------+--------------------------------------------
       hhinc |   281.4162   9.255717     1.07308    1.0359
----------------------------------------------------------

. /*------------------------------------------------
> notice that we have used the srssubpop option here
> (simple random sampling within the subpopulation), to compare
> our results with what we would have found for a simple random
> sample of 334 lone parents.
> See exemplar 3 for a discussion of this.
>
> Without this option the design effects are computed relative
> to a simple random sample from the whole survey instead.
> -------------------------------------------------*/
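With `subpop()`, households outside the subgroup stay in the design but contribute zero to the estimate. Every PSU therefore still counts when the variance is computed. Restricting the data with `if` would instead drop the PSUs that contain no lone parents. This is why the design df above stay at the number of PSUs minus the number of strata (4695 - 1 = 4694 and 320 - 1 = 319), rather than being based on the 334 lone parents. This is a minimal sketch of the linearized subpopulation mean for one stratum, assuming a 0/1 indicator such as lonep. `subpop_mean_se` is a hypothetical name, and the code ignores strata and FPCs.

```python
from math import sqrt


def subpop_mean_se(y, w, psu, in_sub):
    """Subpopulation mean and linearized SE, keeping every PSU in the design.

    Assumes one stratum, PSUs treated as sampled with replacement, no FPC.
    Records outside the subpopulation get zero weight in the estimate,
    but their PSUs still count when the variance is computed.
    """
    d = [1.0 if s else 0.0 for s in in_sub]
    sub_w = sum(wi * di for wi, di in zip(w, d))
    ybar = sum(wi * di * yi for wi, di, yi in zip(w, d, y)) / sub_w
    totals = {p: 0.0 for p in psu}  # every PSU is listed, even with no members
    for yi, wi, di, p in zip(y, w, d, psu):
        totals[p] += wi * di * (yi - ybar) / sub_w
    n = len(totals)
    zbar = sum(totals.values()) / n
    var = n / (n - 1) * sum((z - zbar) ** 2 for z in totals.values())
    return ybar, sqrt(var)


# Toy data (not the FRS file): PSU 2 has no subgroup members but still counts.
y = [120.0, 300.0, 210.0, 500.0, 640.0, 180.0]
w = [1.0] * 6
psu = [1, 1, 2, 2, 3, 3]
lone = [1, 1, 0, 0, 1, 0]
print(subpop_mean_se(y, w, psu, lone))
```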
. estat effects

----------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.       Deff      Deft
-------------+--------------------------------------------
       hhinc |   281.4162   9.255717     1.07608   1.03734
----------------------------------------------------------

. /*-------------------------------------------------
> This survey was weighted and clustered and
> poststratified at the UK level by tenure and council tax
> band. To poststratify at the Scotland level we need to
> get and supply to Stata the additional data for
> the population totals of these variables. This is done
> by the following code.
> -------- first council tax band percentages -------*/
.
end of do-file

. do "C:\DOCUME~1\Gillian\LOCALS~1\Temp\STD04000000.tmp"

. /*----------------------------------------------------------
> The survey was matched to both totals for the UK by
> a method called raking.
> Stata cannot do this yet, but we can look at post-stratification
> by each factor separately
> --------------------- first by tenure -------------------------*/
. svyset psu[pwei=gross2], poststrata(tenure) postweight(tentot)

      pweight: gross2
          VCE: linearized
   Poststrata: tenure
   Postweight: tentot
     Strata 1:
         SU 1: psu
        FPC 1:

. svy: mean hhinc
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =     320          Population size  =   2.2e+06
N. of poststrata =       4          Design df        =       319

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   477.8756   8.728426       460.703    495.0481
--------------------------------------------------------------

. svyset psu[pwei=gross2], poststrata(ctband) postweight(cttot)

      pweight: gross2
          VCE: linearized
   Poststrata: ctband
   Postweight: cttot
     Strata 1:
         SU 1: psu
        FPC 1:
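With `poststrata()` and `postweight()`, Stata rescales the pweights within each poststratum so that they sum to the population total supplied for that group (tentot for tenure, cttot for council tax band). The log shows 4 tenure poststrata and 9 council tax poststrata. The sketch below shows only that weight adjustment, using hypothetical names and toy totals rather than the FRS figures. The linearized variance that Stata then computes for the poststratified mean is not reproduced. Raking, mentioned in the comment above, would repeat this adjustment over tenure and council tax band in turn until both sets of totals are met.

```python
from collections import defaultdict


def poststratify(w, post, pop_totals):
    """Scale weights so each poststratum's weights sum to its population total.

    w          -- original (e.g. grossing) weights
    post       -- poststratum code for each record (e.g. tenure group)
    pop_totals -- dict: poststratum code -> known population total
    """
    sums = defaultdict(float)
    for wi, h in zip(w, post):
        sums[h] += wi
    return [wi * pop_totals[h] / sums[h] for wi, h in zip(w, post)]


def weighted_mean(y, w):
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Toy example with two tenure groups (hypothetical numbers, not FRS totals).
y = [200.0, 260.0, 520.0, 610.0, 700.0]
w = [100.0, 120.0, 150.0, 130.0, 100.0]
tenure = ["rent", "rent", "own", "own", "own"]
new_w = poststratify(w, tenure, {"rent": 300.0, "own": 500.0})
print(weighted_mean(y, w), weighted_mean(y, new_w))
```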
. svy: mean hhinc
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =     320          Population size  =   2.2e+06
N. of poststrata =       9          Design df        =       319

--------------------------------------------------------------
             |             Linearized
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   477.6088   7.739039      462.3828    492.8348
--------------------------------------------------------------

. /*----------------------------------------------------------
> Poststratified estimators don't work with the
> estat effects command, but we can see that both give lower
> standard errors than the 10.64 we got with weighting and
> clustering alone.
>
> In Stata version 8 it was necessary to use the contributed
> library with replication weights to analyse post-stratified
> data correctly.
> In version 9 these replication methods are part of the svy commands.
> We can check that they give results very close to those from the
> linearized method that Stata 9 uses for post-stratification.
> --------------------------------------------------------------*/
. svyset psu[pwei=gross2], poststrata(tenure) postweight(tentot) vce(jackknife)

      pweight: gross2
          VCE: jackknife
          MSE: off
   Poststrata: tenure
   Postweight: tentot
     Strata 1:
         SU 1: psu
        FPC 1:

. svy: mean hhinc
(running mean on estimation sample)

Jackknife replications (320)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
....................

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =     320          Population size  =   2.2e+06
N. of poststrata =       4          Replications     =       320
                                    Design df        =       319

--------------------------------------------------------------
             |              Jackknife
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   477.8756    8.75012      460.6604    495.0908
--------------------------------------------------------------

. svyset psu[pwei=gross2], poststrata(ctband) postweight(cttot) vce(jackknife)

      pweight: gross2
          VCE: jackknife
          MSE: off
   Poststrata: ctband
   Postweight: cttot
     Strata 1:
         SU 1: psu
        FPC 1:

. svy: mean hhinc
(running mean on estimation sample)

Jackknife replications (320)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
....................

Survey: Mean estimation

Number of strata =       1          Number of obs    =      4695
Number of PSUs   =     320          Population size  =   2.2e+06
N. of poststrata =       9          Replications     =       320
                                    Design df        =       319

--------------------------------------------------------------
             |              Jackknife
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
       hhinc |   477.6088   7.829009      462.2058    493.0118
--------------------------------------------------------------

.
end of do-file

. log close
       log:  C:\Documents and Settings\All Users\Documents\My Documents\peas\ex1datafiles\results\Statalog.log
  log type:  text
 closed on:  17 Jul 2006, 01:46:05
---------------------------------------------------------------------------
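With `vce(jackknife)`, Stata forms 320 replicates, each leaving out one PSU. It recomputes the mean on each replicate and uses the spread of the replicates as the variance. `MSE: off` means the deviations are taken around the mean of the replicates rather than around the full-sample estimate. This is a minimal sketch of the delete-one-PSU jackknife for a single stratum. `jackknife_se` and `weighted_mean_kept` are hypothetical names. To mimic the poststratified runs, the estimate function would also have to redo the poststratification on the kept records. For a weighted mean the usual n/(n-1) rescaling of the remaining weights cancels out, so it is left out here.

```python
from math import sqrt
from statistics import mean


def jackknife_se(psu, estimate):
    """Delete-one-PSU jackknife SE for a single stratum, MSE off.

    psu      -- PSU code for each record
    estimate -- function taking a list of booleans (True = keep the record)
                and returning the point estimate from the kept records
    """
    ids = sorted(set(psu))
    n = len(ids)
    reps = [estimate([p != j for p in psu]) for j in ids]  # one replicate per PSU
    centre = mean(reps)  # MSE off: deviations around the replicate mean
    var = (n - 1) / n * sum((r - centre) ** 2 for r in reps)
    return sqrt(var)


def weighted_mean_kept(y, w, keep):
    """Weighted mean over the records flagged True in keep."""
    num = sum(wi * yi for wi, yi, k in zip(w, y, keep) if k)
    den = sum(wi for wi, k in zip(w, keep) if k)
    return num / den


# Toy data (not the FRS file): three PSUs of two households each.
y = [310.0, 450.0, 280.0, 620.0, 505.0, 390.0]
w = [2.0, 1.0, 3.0, 1.0, 2.0, 1.0]
psu = [1, 1, 2, 2, 3, 3]
print(jackknife_se(psu, lambda keep: weighted_mean_kept(y, w, keep)))
```

With one record per PSU and equal weights, this reproduces s / sqrt(n) exactly, which makes a convenient check of the formula.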