/*------------------------------------------------------------
Analyse FRS data in Stata: Exemplar 1
ex1.do version 9 July 06

HOW TO USE THIS FILE
To submit commands, highlight the line(s) you want to submit in the
Do-file Editor and choose >Do from the >Tools menu.
The results will then appear in the main Stata window.

Any text between /* and */ is treated as a comment.

In the Stata command window, typing "help do" gives more information,
and "help help" gives very useful information on viewing results.

HIGHLIGHT AND do ONE LINE AT A TIME, STARTING AFTER THESE COMMENTS.

First get the simple (unweighted) mean of income (hhinc)
and its confidence interval.
----------------------------------------------------------*/

ci hhinc

/*-----------------------------------------------------------
Now set up the survey design with clustering and weighting,
and run svydes to get a description of it.

If you want to see the contents of the data file, go to the main
Stata window and choose >Window >Data Editor.
--------------------------------------------------------------*/

svyset psu [pwei=gross2]
svydes 

/*--------------------------------------------------------------
The design is now stored as part of your data file, and it will
be saved with the data if you save the file.

Now use svy: mean to get the weighted mean,
with a standard error appropriate for clustered data.

HOW MANY PSUs ARE IN THIS DATA SET?
---------------------------------------------------------------*/

svy: mean hhinc
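/*--------------------------------------------------------------
Added sketch (not part of the original exemplar): svy stores the
number of PSUs and the number of observations it used in e(), so
they can be displayed directly after svy: mean as a check on the
question above.
---------------------------------------------------------------*/
display "PSUs: " e(N_psu) "   observations: " e(N)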
/*------------------------------------------------------------
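estat effects is used to get design effects: DEFF, the ratio of
the variance of the estimate under this design to its variance
under a simple random sample of the same size, and DEFT, its
counterpart on the standard-error scale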
----------------------------------------------------------------*/

estat effects
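/*--------------------------------------------------------------
Added sketch: svy stores the design-based variance of the mean in
e(V) and the simple-random-sampling variance in e(V_srs), so DEFF
can be reproduced by hand as their ratio. This assumes e(V_srs) is
the SRS variance that estat effects uses; the two results should
agree closely. Vdes and Vsrs are just illustrative matrix names.
---------------------------------------------------------------*/
matrix Vdes = e(V)
matrix Vsrs = e(V_srs)
display "DEFF by hand = " Vdes[1,1]/Vsrs[1,1]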

/*----------------------------------------------------
We will now look at what the standard error of the mean income
would have been for a weighted survey with no clustering.

Re-running svyset replaces all of the previous design settings,
so the command below sets a new design with weighting only and
no clustering.

Notice how many PSUs svydes reports NOW.
----------------------------------------------------------------*/
svyset [pwei=gross2]
svydes
svy: mean hhinc
estat effects

/*---------------------------------------
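Now test the wrong kinds of weights (aweights and fweights).
See Stata's help for weights ("help weight") for a good explanation.
Note that Stata accepts fweights only if they are integers, so the
fweight line will fail if gross2 has non-integer values.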
------------------------------------------------------*/
ci hhinc [aweight=gross2]
ci hhinc [fweight=gross2]
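/*--------------------------------------------------------------
Added sketch: the point estimate is the same weighted mean whatever
the weight type; only the standard error changes. The non-svy mean
command with pweights gives a robust standard error that ignores
clustering, so it should be the same as, or very close to, the
weighted-only svy: mean result above.
---------------------------------------------------------------*/
mean hhinc [pweight=gross2]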
/*------------------------------------------------------------
Now clustering but no weighting.

WHICH HAS THE LARGER EFFECT ON THE DESIGN EFFECT:
CLUSTERING OR GROSSING (WEIGHTING)?
----------------------------------------------------------------*/
svyset psu
svydes
svy: mean hhinc
estat effects
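/*--------------------------------------------------------------
Added sketch to help answer the question above: DEFF for the three
designs side by side, computed as the ratio of e(V) to e(V_srs),
the design-based and SRS variances that svy stores (assumed to be
what estat effects uses). Highlight the whole loop, from foreach
to the closing brace, to run it. The loop ends with the
clustering-only design, the same one set just above.
design, Vdes and Vsrs are illustrative names.
---------------------------------------------------------------*/
foreach design in "psu [pweight=gross2]" "[pweight=gross2]" "psu" {
    quietly svyset `design'
    quietly svy: mean hhinc
    matrix Vdes = e(V)
    matrix Vsrs = e(V_srs)
    display "svyset `design'" _col(30) "DEFF = " %6.3f Vdes[1,1]/Vsrs[1,1]
}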
/*-----------------------------------------------------------
Now subgroups

First for the survey data with weighting only
and looking at the subgroup of single parents
---------------------------------------------------------------*/
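/* Stata treats missing values as larger than any number, so
   exclude missing depchldh explicitly */
gen lonep=(adulth==1 & depchldh>0 & !missing(depchldh))
svyset [pweight=gross2]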

svy, subpop(lonep): mean hhinc
estat effects, srssubpop
/*--------------------------------------------------------
now add clustering, keeping the weights
---------------------------------------------------*/
svyset psu [pweight=gross2]
svy, subpop(lonep): mean hhinc
estat effects, srssubpop
/*------------------------------------------------
Notice that we have used the srssubpop option of estat effects,
which compares our design with a simple random sample drawn
within the subpopulation, i.e. with what we would have found
for a simple random sample of 334 lone parents.
See exemplar 3 for a discussion of this.

Without this option, the comparison is with a simple random
sample of the whole population, in which the number of lone
parents would itself vary, so the design effects differ.
-------------------------------------------------*/
estat effects
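/*--------------------------------------------------------------
Added sketch: the over() option estimates the mean for lone parents
and for all other households in one command. The standard error
for the lonep==1 group should match the subpop(lonep) result above,
since over() also treats each group as a subpopulation.
---------------------------------------------------------------*/
svy: mean hhinc, over(lonep)
estat effects, srssubpop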
/*-------------------------------------------------
This survey was weighted, clustered and
poststratified at the UK level by tenure and council tax
band. To poststratify at the Scotland level we need to
obtain, and supply to Stata, the additional data on
the population totals of these variables. This is done
by the following code.
-------- first council tax band percentages -------*/
gen cttot=0
replace cttot=24.83 if ctband==1
replace cttot=24.62 if ctband==2
replace cttot=15.45 if ctband==3
replace cttot=11.91 if ctband==4
replace cttot=11.96 if ctband==5
replace cttot=5.95 if ctband==6
replace cttot=3.94 if ctband==7
replace cttot=0.45 if ctband==8
replace cttot=0.89 if ctband==9
/*-------------- and tenure types --------------------*/
gen tentot=0
replace tentot=62.63 if tenure==1
replace tentot=21.59 if tenure==2
replace tentot=5.58 if tenure==3
replace tentot=10.20 if tenure==4
/*-------- rescale the percentages to the Scottish household total -------*/
replace cttot=cttot/100*2242012
replace tentot=tentot/100*2242012
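/*--------------------------------------------------------------
Added sketch: a sanity check on the totals. The percentages above
each sum to 100, so adding up each category's total once (tagging
one household per category with egen tag()) should give about
2,242,012 for both variables, provided every category appears in
the sample. cttag and tentag are illustrative variable names.
---------------------------------------------------------------*/
egen byte cttag = tag(ctband)
egen byte tentag = tag(tenure)
summarize cttot if cttag
display "council tax band totals sum to " %12.0fc r(sum)
summarize tentot if tentag
display "tenure totals sum to " %12.0fc r(sum)
drop cttag tentag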
/*----------------------------------------------------------
The survey was matched to both UK totals at once by
a method called raking.
Stata (as of version 9) cannot do this, but we can look at
post-stratification by each factor separately.
--------------------- first by tenure -------------------------*/
svyset psu [pweight=gross2], poststrata(tenure) postweight(tentot)
svy: mean hhinc
svyset psu [pweight=gross2], poststrata(ctband) postweight(cttot)
svy: mean hhinc
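/*--------------------------------------------------------------
Added sketch of what poststratification does to the weights: each
weight is multiplied by the population total for its poststratum
divided by the sum of the weights in that poststratum, so the
adjusted weights add up to the supplied totals. Using these weights
with a plain svyset should reproduce the tenure-poststratified
mean (assuming every household has tenure 1-4 and hhinc has no
missing values), but its standard error ignores the
poststratification, which is why the poststrata() option is
needed. sumw_ten and pswt_ten are illustrative variable names.
---------------------------------------------------------------*/
capture drop sumw_ten pswt_ten
bysort tenure: egen double sumw_ten = total(gross2)
gen double pswt_ten = gross2*tentot/sumw_ten
svyset psu [pweight=pswt_ten]
svy: mean hhinc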
/*----------------------------------------------------------
Poststratified estimators don't work with the
estat effects command, but we can see that both give lower
standard errors than the weighted and clustered design
without poststratification.

In Stata version 8 it was necessary to use the contributed
library with replication weights to analyse post-stratified
data correctly.
In version 9 these replication methods are part of the svy commands.
We can check that they give results very close to the default
(linearization) method that Stata 9 uses for post-stratification.
--------------------------------------------------------------*/
svyset psu [pweight=gross2], poststrata(tenure) postweight(tentot) vce(jackknife)
svy: mean hhinc
svyset psu [pweight=gross2], poststrata(ctband) postweight(cttot) vce(jackknife)
svy: mean hhinc
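/*--------------------------------------------------------------
Added sketch: the same comparison in one table for the tenure
poststratification, looping over the default linearized variance
and the jackknife. Highlight the whole loop, from foreach to the
closing brace, to run it. method is an illustrative local macro
name.
---------------------------------------------------------------*/
foreach method in linearized jackknife {
    quietly svyset psu [pweight=gross2], poststrata(tenure) postweight(tentot) vce(`method')
    quietly svy: mean hhinc
    display "vce(`method')" _col(20) "mean = " %9.2f _b[hhinc] _col(40) "SE = " %8.2f _se[hhinc]
}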