/*------------------------------------------------------------
Analyse  FRS data in Stata Exemplar 1
ex1.do version 8 code 25/9/04

HOW TO USE THIS FILE
To submit commands highlight the line you want to submit and 
go to >Do  on the >Tools menu
The results will then appear in the main Stata window

Any text between a /* and  a */ is treated as a comment.

At the Stata command window typing "help do" will give more information 
   and   "help help" gives very useful information on viewing results

HIGHLIGHT AND DO ONE LINE AT A TIME, STARTING AFTER THE COMMENTS

first get the simple mean and confidence interval of 
the income (unweighted)
----------------------------------------------------------*/

ci hhinc

/*-----------------------------------------------------------
now set up the survey description with clustering and weighting
and run svydes to get a description of it

If you want to see the contents of the data file go to the main 
Stata window and to >window >data editor
--------------------------------------------------------------*/

svyset [pwei=gross2],psu(psu)
svydes

/*--------------------------------------------------------------
This design is now part of your data file and if you save
your data file it will be saved with it. 

Now use the svymean procedure to get the weighted mean
with a standard error appropriate for clustered data

HOW MANY PSUs ARE IN THIS DATA SET?
---------------------------------------------------------------*/

svymean hhinc

/*----------------------------------------------------
We will now look at what the standard error for the mean income
would have been for a weighted survey but with no clustering.

First remove all the survey design features and 
then set a new design with just weighting and no clustering

notice how many PSUs it gives you NOW
----------------------------------------------------------------*/

svyset, clear(all)
svyset [pwei=gross2]
svydes
svymean hhinc

/*---------------------------------------
now testing the wrong kind of weights
see Stata's help for weights for a good explanation
------------------------------------------------------*/
ci hhinc [aweight=gross2]
ci hhinc [fweight=gross2]
/*------------------------------------------------------------
Now clustering but no weighting

WHICH HAS THE LARGER EFFECT ON THE DESIGN EFFECT,
CLUSTERING OR WEIGHTING?
----------------------------------------------------------------*/

svyset, clear(all)
svyset ,psu(psu)
svydes
svymean hhinc
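/*------------------------------------------------------------
A suggested extra step, not part of the original exemplar:
to compare the designs, ask svymean to print the design effect
(deff) and its square root (deft) alongside the mean.
This assumes the deff and deft options of svymean in Stata 8;
rerun it after each of the svyset commands above and compare.
----------------------------------------------------------------*/
svymean hhinc, deff deft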
/*-------------------------------------------
Now subgroups

First for the survey data with weighting only
and looking at the subgroup of single parents
---------------------------------------------------------------*/
gen lonep=(adulth==1 & depchldh>0)
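/*--------------------------------------------------------
A quick check, not part of the original exemplar: tabulate
the new 0/1 indicator to see how many households fall in the
single parent subgroup before using it in subpop()
----------------------------------------------------------*/
tabulate lonep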
svyset [pweight=gross2], clear(strata psu)

svymean hhinc, subpop(lonep) srssubpop
/*--------------------------------------------------------
now add clustering
---------------------------------------------------*/
svyset, psu(psu) clear(strata)

svymean hhinc, subpop(lonep) srssubpop
/*------------------------------------------------
Notice that we have used the srssubpop option, which bases
the design effect on a simple random sample within the
subpopulation. If we had left it off we would have got a
different design effect; the estimated mean and its standard
error are not affected.
See exemplar 3 for a discussion of this.

Here is the same analysis without srssubpop
-------------------------------------------------*/
svymean hhinc, subpop(lonep)
/*-------------------------------------------------
This survey was weighted and clustered and 
poststratified at the UK level. To poststratify 
at the Scotland level we need to get the additional data


Go to the data window in Stata and check how
the variable CTBAND relates to the PSUs

Now define the survey as though it were STRATIFIED
by council tax band

Look at how many PSUs you have now
---------------------------------------------------*/
svyset [pwei=gross2],psu(psu) strata(ctband)
svymean hhinc
/*------------------------------------------------
You should find that Stata has made many more
PSUs by dividing each PSU into subgroups according
to CTBAND.

Of course this is wrong. Because of the post-stratification
we need to use the SVR library for Stata, which uses
replication methods.

The code for this is in the second file for this exemplar
ex1rep.do
------------------------------------------------------------*/

/*---------------------------------------------------
           
Analyses exemplar 1 with replication methods
YOU WILL NEED ONE OF THE LARGER VERSIONS OF 
Stata  (SE or INTERCOOLED) to run these 
replication analyses


HOW TO USE THIS FILE
To submit commands highlight the line you want to submit and 
go to >Do  on the >Tools menu
The results will then appear in the main Stata window

Any text between a /* and  a */ is treated as a comment.

At the Stata command window typing "help do" will give more information 
   and   "help help" gives very useful information on viewing results

The first step is to install the svr library
You need to be connected to the internet for this
and it will take a few minutes, but you only need to do it once.
------------------------------------------------------------------------------------------------------*/
ssc install svr
/*--------------------------------------------------------------------------------------------------------
Once it has installed you can use the help for the svr commands
(for example "help survwgt") to find out about it

If you have already opened the data set by clicking on the web page
you should save it (File menu >Save) and then empty the workspace with the command
drop _all
You need to do this before you increase the memory to run this analysis
------------------------------------------------------------------------------------------------------*/
clear
set matsize 800
set memory 500M
/*--------- now reopen your saved data file-----------------------
first changing directory to where your file is located
---------------------------------------------------------------------------------------*/
 cd  "C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\ex1datafiles\exemp1\data" 
use ex1,clear
/*-----------------------------------------------------------------------------------------------
the next bit of code adds the Scotland totals by council tax band and 
tenure type so they can be used to rake the sample to match both of these
--------------------------------------------------------------------------------------------------*/
gen cttot=0
replace cttot=24.83 if ctband==1
replace cttot=24.62 if ctband==2
replace cttot=15.45 if ctband==3
replace cttot=11.91 if ctband==4
replace cttot=11.96 if ctband==5
replace cttot=5.95 if ctband==6
replace cttot=3.94 if ctband==7
replace cttot=0.45 if ctband==8
replace cttot=0.89 if ctband==9
gen tentot=0
replace tentot=62.63 if tenure==1
replace tentot=21.59 if tenure==2
replace tentot=5.58 if tenure==3
replace tentot=10.20 if tenure==4
replace cttot=cttot/100*2242012
replace tentot=tentot/100*2242012
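/*---------------------------------------------------------
A suggested check, not part of the original exemplar: any
household whose ctband or tenure code is not listed above
(including a missing code) still has a total of 0, which
would upset the raking. Both counts below should be 0.
This assumes the codes above cover every category in the data.
-----------------------------------------------------------*/
count if cttot==0
count if tentot==0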



/*------------------- ------------------------------------------------------------------
and make a set of jackknife weights for this survey
This next command will create one new weight variable for each replicate, with
one PSU dropped in each replicate, so there is one replicate per PSU.
Look at the data to check this
----------------------------------------------------------------------------------------------------------*/
survwgt create jk1, psu(psu) weight(gross2)
/*---------------- now use the survey replication commands--------------------------------*/
survwgt rake [all], by(ctband tenure) totvars(cttot tentot) replace

svrmean hhinc