/*------------------------------------------------------------
 Analyse FRS data in Stata
 Exemplar 1   ex1.do   version 8 code   25/9/04

 HOW TO USE THIS FILE
 To submit commands, highlight the line you want to submit and
 go to >Do on the >Tools menu. The results will then appear in
 the main Stata window.
 Any text between a /* and a */ is treated as a comment.
 At the Stata command window, typing "help do" will give more
 information, and "help help" gives very useful information on
 viewing results.

 HIGHLIGHT AND DO ONE LINE AT A TIME, STARTING AFTER THE COMMENTS.

 First get the simple mean and confidence interval of the
 income (unweighted).
------------------------------------------------------------*/
ci hhinc
/*------------------------------------------------------------
 Now set up the survey description with clustering and
 weighting, and run svydes to get a description of it.
 If you want to see the contents of the data file, go to the
 main Stata window and then to >Window >Data Editor.
------------------------------------------------------------*/
svyset [pweight=gross2], psu(psu)
svydes
/*------------------------------------------------------------
 This design is now part of your data file, and if you save
 your data file it will be saved with it.
 Now use the svymean command to get the weighted mean with a
 standard error appropriate for clustered data.
 HOW MANY PSUs ARE IN THIS DATA SET?
------------------------------------------------------------*/
svymean hhinc
/*------------------------------------------------------------
 We will now look at what the standard error for the mean
 income would have been for a weighted survey with no
 clustering.
 First remove all the survey design features, and then set a
 new design with just weighting and no clustering.
 Notice how many PSUs it gives you NOW.
------------------------------------------------------------*/
svyset, clear(all)
svyset [pweight=gross2]
svydes
svymean hhinc
/*------------------------------------------------------------
 Now test the wrong kinds of weights.
 See Stata's help for weights for a good explanation.
------------------------------------------------------------*/
ci hhinc [aweight=gross2]
ci hhinc [fweight=gross2]
/*------------------------------------------------------------
 Now clustering but no weighting.
 WHICH HAS THE LARGER EFFECT ON THE DESIGN EFFECT, CLUSTERING
 OR WEIGHTING?
------------------------------------------------------------*/
svyset, clear(all)
svyset, psu(psu)
svydes
svymean hhinc
/*------------------------------------------------------------
 Now subgroups.
 First for the survey data with weighting only, looking at the
 subgroup of single parents.
------------------------------------------------------------*/
gen lonep = (adulth==1 & depchldh>0)
svyset [pweight=gross2], clear(strata psu)
svymean hhinc, subpop(lonep) srssubpop
/*------------------------------------------------------------
 Now add clustering.
------------------------------------------------------------*/
svyset, psu(psu) clear(strata)
svymean hhinc, subpop(lonep) srssubpop
/*------------------------------------------------------------
 Notice that we have used the srssubpop option, which assumes
 simple random sampling within the subpopulation when the
 design effect is computed. If we had left it off we would
 have got a different design effect; see exemplar 3 for a
 discussion of this.
------------------------------------------------------------*/
svymean hhinc, subpop(lonep)
/*------------------------------------------------------------
 This survey was weighted, clustered and post-stratified at
 the UK level.
 To post-stratify at the Scotland level we need to get the
 additional data.
 Go to the Data Editor in Stata and check how the variable
 ctband relates to the PSUs.
 Now define the survey as though it were STRATIFIED by council
 tax band, and look at how many PSUs you have now.
------------------------------------------------------------*/
svyset [pweight=gross2], psu(psu) strata(ctband)
svymean hhinc
/*------------------------------------------------------------
 You should find that Stata now reports many more PSUs,
 because it has divided each PSU into subgroups according to
 ctband. Of course this is wrong: council tax band is used for
 post-stratification, not as a design stratum. To allow for
 the post-stratification we need to use the svr package for
 Stata, which uses replication methods. The code for this is
 in the second file for this exemplar, ex1rep.do.
------------------------------------------------------------*/

/*------------------------------------------------------------
 Analyse exemplar 1 with replication methods   ex1rep.do
 YOU WILL NEED ONE OF THE LARGER VERSIONS OF Stata (SE or
 INTERCOOLED) to run these replication analyses.

 HOW TO USE THIS FILE
 To submit commands, highlight the line you want to submit and
 go to >Do on the >Tools menu. The results will then appear in
 the main Stata window.
 Any text between a /* and a */ is treated as a comment.
 At the Stata command window, typing "help do" will give more
 information, and "help help" gives very useful information on
 viewing results.

 The first step is to install the svr package. You need to be
 connected to the internet for this and it will take a few
 minutes, but you only need to do it once.
------------------------------------------------------------*/
ssc install svr
/*------------------------------------------------------------
 Once it is installed, you can use the help for the svr
 commands to find out about them.
 If you have already opened the data set by clicking on the
 web page, save it (File menu >Save) and then empty the
 workspace with the command clear. You need to do this before
 you increase the memory to run this analysis.
------------------------------------------------------------*/
clear
set matsize 800
set memory 500M
/*------------------------------------------------------------
 Now reopen your saved data file, first changing to the
 directory where your file is located (edit this path to
 match your own computer).
------------------------------------------------------------*/
cd "C:\Documents and Settings\gillian raab\My Documents\aprojects\peas\ex1datafiles\exemp1\data"
use ex1, clear
/*------------------------------------------------------------
 The next block of code adds the Scotland totals by council
 tax band and by tenure type, so that they can be used to rake
 the sample to match both sets of totals. The totals are
 entered as percentages and then converted to counts out of
 2242012.
------------------------------------------------------------*/
gen cttot = 0
replace cttot = 24.83 if ctband==1
replace cttot = 24.62 if ctband==2
replace cttot = 15.45 if ctband==3
replace cttot = 11.91 if ctband==4
replace cttot = 11.96 if ctband==5
replace cttot =  5.95 if ctband==6
replace cttot =  3.94 if ctband==7
replace cttot =  0.45 if ctband==8
replace cttot =  0.89 if ctband==9
gen tentot = 0
replace tentot = 62.63 if tenure==1
replace tentot = 21.59 if tenure==2
replace tentot =  5.58 if tenure==3
replace tentot = 10.20 if tenure==4
replace cttot  = cttot/100*2242012
replace tentot = tentot/100*2242012
/*------------------------------------------------------------
 Now make a set of jackknife weights for
 this survey. The next command creates one new weight variable
 for each replicate, where each replicate drops one PSU; with
 the jk1 jackknife there is one replicate per PSU, so there
 should be as many new variables as there are PSUs.
 Look at the data to check this.
------------------------------------------------------------*/
survwgt create jk1, psu(psu) weight(gross2)
/*------------------------------------------------------------
 Now use the survey replication commands.
------------------------------------------------------------*/
survwgt rake [all], by(ctband tenure) totvars(cttot tentot) replace
svrmean hhinc
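/*------------------------------------------------------------
 Optional checks: a minimal sketch using standard Stata
 commands only. psutag is a new variable name chosen here
 for illustration and is dropped again at the end.
 First count the distinct PSUs: with the jk1 jackknife there
 is one replicate per PSU, so this is the number of replicate
 weight variables to expect from survwgt create.
------------------------------------------------------------*/
egen psutag = tag(psu)
count if psutag == 1
/*------------------------------------------------------------
 Then check that the Scotland totals were attached correctly.
 cttot and tentot are constant within each council tax band
 and each tenure type, so the mean shown for each category is
 its target total (and the standard deviation should be 0).
------------------------------------------------------------*/
tabulate ctband, summarize(cttot)
tabulate tenure, summarize(tentot)
drop psutag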