Stata OUTPUT FOR EXEMPLAR 1

HTML version of Stata do file can be viewed here

Sections in this page

Mean income with different design assumptions
Subgroup: lone parents
Raking to match Scottish totals
Jackknife estimation for mean



   . /*----------------------------------------------------------
   > First get the simple mean and confidence interval of
   > the income (unweighted)
   > ----------------------------------------------------------*/
   .
   . ci hhinc

       Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
   -------------+---------------------------------------------------------------
          hhinc |       4695    470.8643    6.022048        459.0583    482.6704
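For reference, unweighted -ci- reports the sample mean with standard error s/sqrt(n) and a t-based interval with n-1 degrees of freedom. The Python sketch below (the function name simple_ci is hypothetical) uses the normal quantile in place of t; with n = 4695 the two agree to about two decimals, so plugging in the reported mean and SE gives back the interval shown above.

```python
import math
from statistics import NormalDist, mean, stdev

def simple_ci(values, level=0.95):
    """Unweighted mean, SE = s / sqrt(n), and a normal-approximation CI."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)          # stdev uses the n-1 divisor
    z = NormalDist().inv_cdf(0.5 + level / 2)  # Stata's -ci- uses t(n-1) here
    return m, se, (m - z * se, m + z * se)

# Plugging in the reported mean and SE for hhinc:
z = NormalDist().inv_cdf(0.975)
print(round(470.8643 - z * 6.022048, 2), round(470.8643 + z * 6.022048, 2))
# 459.06 482.67
```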

   .

   .
   . svyset [pwei=gross2],psu(psu)
   pweight is gross2
   psu is psu

   . svydes

   pweight:  gross2
   Strata:   <one>
   PSU:      psu
                                         #Obs per PSU
                                 ----------------------------
     Strata     #PSUs      #Obs       min      mean       max
   --------  --------  --------  --------  --------  --------
          1       320      4695         3      14.7        23
   --------  --------  --------  --------  --------  --------
          1       320      4695         3      14.7        23

   .
   .
   . svymean hhinc

   Survey mean estimation

   pweight:  gross2                                  Number of obs    =      4695
   Strata:   <one>                                   Number of strata =         1
   PSU:      psu                                     Number of PSUs   =       320
                                                     Population size  =   2236979

   ------------------------------------------------------------------------------
       Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
   ---------+--------------------------------------------------------------------
      hhinc |   483.0913    10.63943     462.159    504.0236    2.900452
   ------------------------------------------------------------------------------

   .
   . svyset, clear(all)
   no variables are set

   . svyset [pwei=gross2]
   pweight is gross2

   . svydes

   pweight:  gross2
   Strata:   <one>
   PSU:      <observations>
                                         #Obs per PSU
                                 ----------------------------
     Strata     #PSUs      #Obs       min      mean       max
   --------  --------  --------  --------  --------  --------
          1      4695      4695         1       1.0         1
   --------  --------  --------  --------  --------  --------
          1      4695      4695         1       1.0         1

   . svymean hhinc

   Survey mean estimation

   pweight:  gross2                                  Number of obs    =      4695
   Strata:   <one>                                   Number of strata =         1
   PSU:      <observations>                          Number of PSUs   =      4695
                                                     Population size  =   2236979

   ------------------------------------------------------------------------------
       Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
   ---------+--------------------------------------------------------------------
      hhinc |   483.0913    7.877496    467.6477    498.5349    1.590031
   ------------------------------------------------------------------------------

   .
   . /*---------------------------------------
   > now testing the wrong kind of weights
   > see Stata's help for weights for a good explanation
   > ------------------------------------------------------*/
   . ci hhinc [aweight=gross2]

       Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
   -------------+---------------------------------------------------------------
          hhinc |       4695    483.0913      6.2472        470.8438    495.3387

   . ci hhinc [fweight=gross2]

       Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
   -------------+---------------------------------------------------------------
          hhinc |    2236979    483.0913    .2861713        482.5304    483.6522
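Why the two weighted runs disagree: with aweights the weights are only relative, so the sample size stays at n = 4695, whereas fweights say that each case stands for gross2 real observations, so the sum of the weights (2,236,979, the Obs shown above) is treated as the sample size and the SE collapses. The sketch below encodes that reading of the two weight types; the formulas are an assumption rather than a quotation from Stata's manual, and weighted_mean_se is a hypothetical name. Under these formulas the fweight SE equals the aweight SE times sqrt((n-1)/(N-1)), which reproduces the .2862 reported above.

```python
import math

def weighted_mean_se(x, w, kind):
    """Weighted mean and its SE under two readings of the weights.

    kind="aweight": weights are relative; the sample size stays len(x).
    kind="fweight": case i occurred w[i] times; the sample size is sum(w).
    """
    n = len(x)
    sw = sum(w)
    m = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ss = sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x))  # weighted sum of squares
    if kind == "aweight":
        var = (n / sw) * ss / (n - 1)   # weights rescaled to sum to n
        return m, math.sqrt(var / n)
    if kind == "fweight":
        var = ss / (sw - 1)             # sum(w) treated as the number of cases
        return m, math.sqrt(var / sw)
    raise ValueError("kind must be 'aweight' or 'fweight'")

# The two SEs differ by sqrt((n - 1) / (N - 1)), with N = sum(w).
# Applied to the aweight SE from the output above:
print(round(6.2472 * math.sqrt((4695 - 1) / (2236979 - 1)), 4))  # 0.2862
```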

   . /*------------------------------------------------------------
   > Now clustering but no weighting
   >
   > WHICH HAS THE LARGER EFFECT ON THE DESIGN EFFECT:
   > CLUSTERING OR WEIGHTING?
   > ----------------------------------------------------------------*/
   .
   . svyset, clear(all)
   no variables are set

   . svyset ,psu(psu)
   psu is psu

   . svydes

   pweight:  <none>
   Strata:   <one>
   PSU:      psu
                                         #Obs per PSU
                                 ----------------------------
     Strata     #PSUs      #Obs       min      mean       max
   --------  --------  --------  --------  --------  --------
          1       320      4695         3      14.7        23
   --------  --------  --------  --------  --------  --------
          1       320      4695         3      14.7        23

   . svymean hhinc

   Survey mean estimation

   pweight:  <none>                                  Number of obs    =      4695
   Strata:   <one>                                   Number of strata =         1
   PSU:      psu                                     Number of PSUs   =       320
                                                     Population size  =      4695

   ------------------------------------------------------------------------------
       Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
   ---------+--------------------------------------------------------------------
      hhinc |   470.8643    8.907628    453.3392    488.3894    2.187942
   ------------------------------------------------------------------------------
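The Deff column is the ratio of the variance under the survey design to the variance under simple random sampling. Assuming the SRS comparison is the weighted -ci- standard error for the weighted designs and the unweighted one for the clustering-only run, a few lines of Python reproduce all three Deff values to about three decimals, which puts numbers on the question in the comment above. The name deff and the dictionary labels are hypothetical.

```python
# Standard errors copied from the Stata output above.
SE_SRS_UNWEIGHTED = 6.022048   # ci hhinc
SE_SRS_WEIGHTED = 6.2472       # ci hhinc [aweight=gross2]

def deff(se_design, se_srs):
    """Design effect: variance under the design over variance under SRS."""
    return (se_design / se_srs) ** 2

designs = {
    "weights + clustering": deff(10.63943, SE_SRS_WEIGHTED),    # Stata: 2.900452
    "weights only":         deff(7.877496, SE_SRS_WEIGHTED),    # Stata: 1.590031
    "clustering only":      deff(8.907628, SE_SRS_UNWEIGHTED),  # Stata: 2.187942
}
for name, d in designs.items():
    print(f"{name:<22} {d:.3f}")
```

On these figures clustering alone (about 2.19) inflates the variance more than weighting alone (about 1.59).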
 
   . /*-------------------------------------------
   > Now subgroups
   >
   > First for the survey data with weighting only
   > and looking at the subgroup of single parents
   > ---------------------------------------------------------------*/

   . gen lonep=(adulth==1 & depchldh>0)

   . svyset[pweight=gross2] , clear(strata psu )
   pweight is gross2

   .  svymean hhinc if lonep == 1 , available

   Survey mean estimation

   pweight:  gross2                                  Number of obs    =       334
   Strata:   <one>                                   Number of strata =         1
   PSU:      <observations>                          Number of PSUs   =       334
                                                     Population size  =    131801

   ------------------------------------------------------------------------------
       Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
   ---------+--------------------------------------------------------------------
      hhinc |   276.5555    8.223924    260.3781    292.7329      .94421
   ------------------------------------------------------------------------------

   . /*--------------------------------------------------------
   > now add clustering
   > ---------------------------------------------------*/
   . svyset, psu(psu) clear(strata)
   pweight is gross2
   psu is psu

   . svymean hhinc if lonep==1, available

   Survey mean estimation

   pweight:  gross2                                  Number of obs    =       334
   Strata:   <one>                                   Number of strata =         1
   PSU:      psu                                     Number of PSUs   =       196
                                                     Population size  =    131801

   ------------------------------------------------------------------------------
       Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
   ---------+--------------------------------------------------------------------
      hhinc |   276.5555    8.503837    259.7842    293.3268    1.009579
   ------------------------------------------------------------------------------

   .
   . /*-------------------------------------------------
   > This survey was weighted and clustered and
   > poststratified at the UK level. To poststratify
   > at the Scotland level we need to get the additional data
   >
   >
   > Go to the data window in Stata and check how
   > the variable CTBAND relates to the PSUs
   >
   > Now define the survey as though it were STRATIFIED
   > by council tax band
   >
   > Look at how many PSUs you have now
   > ---------------------------------------------------*/
   . svyset [pwei=gross2],psu(psu) strata(ctband)
   pweight is gross2
   strata is ctband
   psu is psu

   . svymean hhinc

   Survey mean estimation

   pweight:  gross2                                  Number of obs    =      4695
   Strata:   ctband                                  Number of strata =         9
   PSU:      psu                                     Number of PSUs   =      1572
                                                     Population size  =   2236979

   ------------------------------------------------------------------------------
       Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
   ---------+--------------------------------------------------------------------
      hhinc |   483.0913    8.193814    467.0193    499.1633    1.720289
   ------------------------------------------------------------------------------

   . /*------------------------------------------------
   > You should find that Stata has made many more
   > PSUs by dividing each PSU into subgroups according
   > to CTBAND.
   >
   > Of course this is wrong: council tax band is not a
   > true design stratum, so splitting the PSUs by it
   > misrepresents the clustering. To allow for the
   > post-stratification we need the svr package for
   > Stata, which uses replication methods.
   > ------------------------------------------------------------*/
   .
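The jump from 320 to 1572 PSUs can be reproduced with a one-line count. Assuming, as the output suggests, that PSUs are treated as nested within strata, each distinct (ctband, psu) pair counts as a separate PSU, so any PSU containing households from several bands is split. The function name and the toy data below are hypothetical.

```python
def count_psus(psu, strata):
    """Count PSUs when PSU ids are treated as nested within strata:
    every distinct (stratum, psu) pair is a separate PSU."""
    return len(set(zip(strata, psu)))

# Hypothetical toy data: 2 PSUs whose households fall in several bands.
psu    = [1, 1, 1, 2, 2, 2]
ctband = [1, 2, 2, 1, 3, 3]

print(count_psus(psu, [1] * len(psu)))  # no stratification: 2
print(count_psus(psu, ctband))          # stratified by band: 4
```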
   . /*---------------------------------------------------
   > First step is to install the svr package
   > ---------------------------------------------------------------------*/
   . ssc install svr
   checking svr consistency and verifying not already installed...
   all files already exist and are up-to-date.

   . clear

   . set matsize 800

   Current memory allocation

                       current                                 memory usage
       settable          value     description                 (1M = 1024k)
       --------------------------------------------------------------------
       set maxvar         5000     max. variables allowed           1.733M
       set memory           10M    max. data space                 10.000M
       set matsize         800     max. RHS vars in models          4.950M
                                                               -----------
                                                                   16.682M

   . set memory 500M

   Current memory allocation

                       current                                 memory usage
       settable          value     description                 (1M = 1024k)
       --------------------------------------------------------------------
       set maxvar         5000     max. variables allowed           1.733M
       set memory          500M    max. data space                500.000M
       set matsize         800     max. RHS vars in models          4.950M
                                                               -----------
                                                                  506.682M

   . /*-----------------------------------------------------------------------------------------------
   > The next bit of code adds the Scottish totals (as percentages) by council tax band and
   > by tenure type, so that they can be used to rake the sample to match both margins
   > --------------------------------------------------------------------------------------------------*/
   . replace cttot= 24.83 if ctband==1
   (1023 real changes made)

   .  replace cttot= 24.62  if ctband==2
   (1139 real changes made)

   . replace cttot= 15.45 if ctband==3
   (801 real changes made)

   .  replace cttot=11.91 if ctband==4
   (679 real changes made)

   . replace cttot=11.96 if ctband==5
   (528 real changes made)

   . replace cttot=5.95 if ctband==6
   (288 real changes made)

   .  replace  cttot=3.94 if ctband==7
   (179 real changes made)

   .  replace  cttot=0.45 if ctband==8
   (18 real changes made)

   . replace cttot=0.89 if ctband==9
   (40 real changes made)

   . gen tentot=0

   . replace tentot= 62.63 if tenure==1
   (3008 real changes made)

   . replace tentot=  21.59  if tenure==2
   (1105 real changes made)

   . replace tentot=  5.58  if tenure==3
   (284 real changes made)
   .

   . /*-------------------------------------------------------------------------------------
   > Now make a set of jackknife weights for this survey.
   > The next command creates 320 new variables, one per replicate; each replicate
   > drops one of the 320 PSUs. Look at the data to check this.
   > -------------------------------------------------------------------------------*/
   . survwgt create jk1, psu(psu) weight(gross2)
   Generating replicate weights.........................................................................
   > ...................................................................................................
   > ...................................................................................................
   > .................................................

   Created weights and set svr values:
      meth jk1
        pw gross2
        rw jk1_1 jk1_2 jk1_3 jk1_4 jk1_5 jk1_6 jk1_7 jk1_8 jk1_9 jk1_10 jk1_11 jk1_12 jk1_13 jk1_14
           jk1_15 jk1_16 jk1_17 jk1_18 jk1_19 jk1_20 jk1_21 jk1_22 jk1_23 jk1_24 jk1_25 jk1_26 jk1_27
           jk1_28 jk1_29 jk1_30 jk1_31 jk1_32 jk1_33 jk1_34 jk1_35 jk1_36 jk1_37 jk1_38 jk1_39 jk1_40
           jk1_41 jk1_42 jk1_43 jk1_44 jk1_45 jk1_46 jk1_47 jk1_48 jk1_49 jk1_50 jk1_51 jk1_52 jk1_53
           jk1_54 jk1_55 jk1_56 jk1_57 jk1_58 jk1_59 jk1_60 jk1_61 jk1_62 jk1_63 jk1_64 jk1_65 jk1_66

           (lines omitted: jk1_67 to jk1_280)
           jk1_281 jk1_282 jk1_283 jk1_284 jk1_285 jk1_286 jk1_287 jk1_288 jk1_289 jk1_290 jk1_291
           jk1_292 jk1_293 jk1_294 jk1_295 jk1_296 jk1_297 jk1_298 jk1_299 jk1_300 jk1_301 jk1_302
           jk1_303 jk1_304 jk1_305 jk1_306 jk1_307 jk1_308 jk1_309 jk1_310 jk1_311 jk1_312 jk1_313
           jk1_314 jk1_315 jk1_316 jk1_317 jk1_318 jk1_319 jk1_320
       dof 319
       fay 0
      psun 
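survwgt create jk1 builds delete-one-PSU jackknife (JK1) replicate weights. The sketch below shows the standard JK1 scheme as it is generally described, not the svr package's source: replicate r zeroes the weights of PSU r and rescales the others by n/(n-1), and the variance is (n-1)/n times the sum of squared deviations of the replicate estimates. This sketch takes deviations from the full-sample estimate; some software uses the mean of the replicate estimates instead. All function names are hypothetical.

```python
import math

def jk1_replicate_weights(psu, w):
    """Delete-one-PSU (JK1) replicate weights, one list per PSU.

    Replicate r sets the weight of every case in PSU r to 0 and multiplies
    all other weights by n / (n - 1), where n is the number of PSUs (n >= 2).
    """
    psus = sorted(set(psu))
    n = len(psus)
    return [[0.0 if p == r else wi * n / (n - 1) for p, wi in zip(psu, w)]
            for r in psus]

def weighted_mean(x, w):
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def jk1_se(x, w, psu, estimator=weighted_mean):
    """Full-sample estimate and its JK1 standard error.

    v = (n - 1) / n * sum_r (theta_r - theta)^2, with deviations taken
    from the full-sample estimate theta.
    """
    theta = estimator(x, w)
    reps = jk1_replicate_weights(psu, w)
    n = len(reps)
    v = (n - 1) / n * sum((estimator(x, wr) - theta) ** 2 for wr in reps)
    return theta, math.sqrt(v)
```

With the 320 PSUs in this survey, jk1_replicate_weights would return 320 weight columns, in line with jk1_1 to jk1_320 and the 319 degrees of freedom above. For an unweighted mean with one case per PSU, the JK1 standard error reduces to the familiar s/sqrt(n).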


   . /*---------------- now use the survey replication commands--------------------------------*/
   . survwgt rake [all] , by(ctband tenure) totvars( cttot tentot) replace

   SVR settings updated:
        pw gross2
        rw jk1_1 jk1_2 jk1_3 jk1_4 jk1_5 jk1_6 jk1_7 jk1_8 jk1_9 jk1_10 jk1_11 jk1_12 jk1_13 jk1_14
           jk1_15 jk1_16 jk1_17 jk1_18 jk1_19 jk1_20 jk1_21 jk1_22 jk1_23 jk1_24 jk1_25 jk1_26 jk1_27
           jk1_28 jk1_29 jk1_30 jk1_31 jk1_32 jk1_33 jk1_34 jk1_35 jk1_36 jk1_37 jk1_38 jk1_39 jk1_40
           jk1_41 jk1_42 jk1_43 jk1_44 jk1_45 jk1_46 jk1_47 jk1_48 jk1_49 jk1_50 jk1_51 jk1_52 jk1_53

           (lines omitted: jk1_54 to jk1_270)
           jk1_271 jk1_272 jk1_273 jk1_274 jk1_275 jk1_276 jk1_277 jk1_278 jk1_279 jk1_280
           jk1_281 jk1_282 jk1_283 jk1_284 jk1_285 jk1_286 jk1_287 jk1_288 jk1_289 jk1_290 jk1_291
           jk1_292 jk1_293 jk1_294 jk1_295 jk1_296 jk1_297 jk1_298 jk1_299 jk1_300 jk1_301 jk1_302
           jk1_303 jk1_304 jk1_305 jk1_306 jk1_307 jk1_308 jk1_309 jk1_310 jk1_311 jk1_312 jk1_313
           jk1_314 jk1_315 jk1_316 jk1_317 jk1_318 jk1_319 jk1_320
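Raking (iterative proportional fitting) rescales the weights one margin at a time, council tax band and then tenure, and repeats until the weighted shares match both sets of Scottish percentages. The function below is a generic sketch of that algorithm, not code from the svr package; its name and arguments are hypothetical, and holding the overall sum of weights fixed is an assumption of the sketch. Since the SVR settings above list the replicate weights as updated, it appears survwgt rake has repeated the raking on every jackknife replicate as well; raking each replicate is what lets the jackknife variance reflect the post-stratification.

```python
def rake(w, margins, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting (raking) of survey weights.

    w       : starting weights (e.g. gross2)
    margins : one list of category codes per raking variable
              (e.g. ctband, tenure), aligned with w
    targets : one dict per margin, {category: target share}, shares sum to 1
    The overall sum of weights is held fixed; every category must start
    with positive total weight.
    """
    w = list(w)
    total = sum(w)
    for _ in range(max_iter):
        max_change = 0.0
        for cats, target in zip(margins, targets):
            # current weighted total in each category of this margin
            current = {}
            for c, wi in zip(cats, w):
                current[c] = current.get(c, 0.0) + wi
            # scale each category so its share hits the target
            factors = {c: target[c] * total / current[c] for c in current}
            max_change = max(max_change, max(abs(f - 1) for f in factors.values()))
            w = [wi * factors[c] for wi, c in zip(w, cats)]
        if max_change < tol:  # every margin already matched: converged
            break
    return w

# Hypothetical two-margin example with four cases
band = [1, 1, 2, 2]
tenure = [1, 2, 1, 2]
new_w = rake([1.0] * 4, [band, tenure], [{1: 0.3, 2: 0.7}, {1: 0.6, 2: 0.4}])
```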


   .
   . svrmean hhinc

   Survey mean estimation, replication (jk1) variance method

   Analysis weight:      gross2                   Number of obs       =      4695
   Replicate weights:    jk1_1...                 Population size     =   2242012
   Number of replicates: 320                      Degrees of freedom  =       319

   ------------------------------------------------------------------------------
       Mean |   Estimate    Std. Err.   [95% Conf. Interval]        Deff
   ---------+--------------------------------------------------------------------
      hhinc |   475.9539    7.399009    461.3969    490.5109    1.422339
   ------------------------------------------------------------------------------

   .
   end of do-file