Stata commands are shown in white,
output in green and yellow,
and warnings in red.
Comments on the interpretation of the output are in blue.
For comments on running the analyses, see the commented code file.
. svyset [pwei=ind_wt],psu(psu) strata(stratum)
pweight is ind_wt strata is stratum psu is psu
. svyprop intuse
---------------------------------------------------------------------
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28642.333
---------------------------------------------------------------------
Here the weights sum approximately to the sample size, because they
are not grossing-up weights. The "Population size" reported above is
therefore misleading, but the estimates themselves are fine.
Survey proportions estimation
+-------------------------------------------+
| intuse Obs Est. Prop. Std. Err. |
|-------------------------------------------|
| 0 19823 0.658436 0.003405 |
| 1 8862 0.341564 0.003405 |
+-------------------------------------------+
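The point about the weights can be checked with a little arithmetic. The sketch below is in Python rather than Stata and is not part of the original session; it only reuses figures printed in the output above, and the variable names are illustrative.

```python
# Figures copied from the svyprop output above (variable names are illustrative).
n_nonusers = 19823            # unweighted Obs for intuse == 0
n_users = 8862                # unweighted Obs for intuse == 1
n_obs = 28685                 # Number of obs
sum_of_weights = 28642.333    # reported by Stata as "Population size"
weighted_prop_users = 0.341564

# The two Obs rows account for every observation.
assert n_nonusers + n_users == n_obs

# An average weight close to 1 means the weights are scaled to the sample,
# not grossed up to the population.
mean_weight = sum_of_weights / n_obs

# The unweighted share of users differs from the weighted estimate,
# so the weights do matter for the proportions.
unweighted_prop_users = n_users / n_obs

print(f"mean weight:                    {mean_weight:.4f}")
print(f"unweighted proportion of users: {unweighted_prop_users:.4f}")
print(f"weighted proportion of users:   {weighted_prop_users:.4f}")
```

The average weight comes out very close to 1, while the unweighted proportion of users is noticeably lower than the weighted estimate.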
. svymean intuse , ci deff
To get design effects, estimate the mean of the 0/1 variable instead.
Survey mean estimation
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28642.333
----------------------------------------------------------------------
Mean | Estimate Std. Err. [95% Conf. Interval] Deff
-------+------------------------------------------------------------
intuse | .3415642 .0034046 .3348905 .3482378 1.478395
----------------------------------------------------------------------
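As a rough check on how the confidence interval and design effect relate to the estimate and its standard error, the following Python sketch (not part of the Stata session) rebuilds both from the printed figures. It makes two simplifying assumptions: a normal quantile is used where Stata uses a t distribution, and the simple-random-sampling variance is taken as p(1-p)/n. Small differences in the last digits are therefore expected.

```python
from statistics import NormalDist

# Figures copied from the svymean output above.
estimate = 0.3415642
std_err = 0.0034046
n_obs = 28685

# Approximate 95% interval using the normal quantile (assumption: Stata's
# own interval is based on a t distribution with the design degrees of freedom).
z = NormalDist().inv_cdf(0.975)
ci_lower = estimate - z * std_err
ci_upper = estimate + z * std_err

# Design effect: design-based variance divided by the variance that a simple
# random sample of the same size would give (assumed here to be p(1-p)/n).
srs_variance = estimate * (1 - estimate) / n_obs
deff = std_err ** 2 / srs_variance

print(f"approx. 95% CI: {ci_lower:.7f} to {ci_upper:.7f}")
print(f"approx. deff:   {deff:.4f}")
```

Under these assumptions the results agree closely with the interval and Deff column that Stata prints.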
. svymean intuse, deff deft by(sex)
Survey mean estimation
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28642.333
-----------------------------------------------------------------
Mean Subpop. | Estimate Std. Err. Deff Deft
---------------+-------------------------------------------------
intuse |
male | .3851593 .0051239 1.405104 1.185371
female | .3070535 .0043296 1.410531 1.187658
-----------------------------------------------------------------
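In this output the Deft column agrees with the square root of the Deff column to the printed precision. A short Python check (not part of the Stata session; the numbers are copied from the table above):

```python
from math import sqrt

# (deff, deft) pairs copied from the by(sex) output above.
rows = {
    "male": (1.405104, 1.185371),
    "female": (1.410531, 1.187658),
}

for label, (deff, deft) in rows.items():
    print(f"{label:7s} sqrt(deff) = {sqrt(deff):.6f}   reported deft = {deft:.6f}")

# Largest discrepancy between sqrt(deff) and the reported deft.
max_gap = max(abs(sqrt(deff) - deft) for deff, deft in rows.values())
print(f"largest gap: {max_gap:.1e}")
```

So Deff is a ratio of variances, while Deft is the corresponding ratio of standard errors.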
. svymean rc5
stratum with only one PSU detected
This failure happens because the analysis is restricted to internet users
only, which leaves at least one stratum with a single (lonely) PSU.
The next command shows which strata are affected. It would
be tiresome to sort them all out.
. svydes if intuse==1
pweight: ind_wt
Strata: stratum
PSU: psu
#Obs per PSU
Strata ----------------------------
stratum #PSUs #Obs min mean max
-------- -------- -------- -------- -------- --------
100A 110 110 1 1.0 1
100B 60 60 1 1.0 1
(lines omitted here)
180E 10 10 1 1.0 1
180F 18 18 1 1.0 1
180G 17 17 1 1.0 1
180H 1* 1 1 1.0 1
180I 63 63 1 1.0 1
A * indicates a lonely PSU
(more lines omitted here; they contain a few more lonely PSUs)
. svyset [pweight=ind_wt], clear( strata psu pweight )
pweight is ind_wt
. svymean intuse , ci deff
Survey mean estimation
pweight: ind_wt Number of obs = 28685
Strata: <one> Number of strata = 1
PSU: <observations> Number of PSUs = 28685
Population size = 28642.333
------------------------------------------------------------------------
Mean | Estimate Std. Err. [95% Conf. Interval] Deff
---------+--------------------------------------------------------------
intuse | .3415642 .0032671 .3351606 .3479678 1.361344
------------------------------------------------------------------------
. svyset ,psu(psu)
pweight is ind_wt
psu is psu
. svymean intuse , ci deff
Survey mean estimation
pweight: ind_wt Number of obs = 28685
Strata: <one> Number of strata = 1
PSU: psu Number of PSUs = 11937
Population size = 28642.333
----------------------------------------------------------------------
Mean | Estimate Std. Err. [95% Conf. Interval] Deff
---------+------------------------------------------------------------
intuse | .3415642 .0037742 .334166 .3489623 1.816821
----------------------------------------------------------------------
. svyset, clear(psu) strata(stratum)
pweight is ind_wt
strata is stratum
. svymean intuse , ci deff
Survey mean estimation
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: <observations> Number of PSUs = 28685
Population size = 28642.333
---------------------------------------------------------------------
Mean | Estimate Std. Err. [95% Conf. Interval] Deff
---------+-----------------------------------------------------------
intuse | .3415642 .0031608 .3353689 .3477594 1.274203
----------------------------------------------------------------------
. svyset,psu(psu)
pweight is ind_wt
strata is stratum
psu is psu
This restores the correct design.
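The four runs above give the same estimate but different standard errors and design effects. The Python sketch below (not part of the Stata session; the labels are illustrative) collects the printed figures and divides each standard error by the square root of its Deff. If every design effect is measured against the same simple-random-sampling baseline, the four results should coincide.

```python
from math import sqrt

# (Std. Err., Deff) for intuse, copied from the four svymean runs above.
designs = {
    "weights, strata and PSUs": (0.0034046, 1.478395),
    "weights only":             (0.0032671, 1.361344),
    "weights and PSUs":         (0.0037742, 1.816821),
    "weights and strata":       (0.0031608, 1.274203),
}

full_se = designs["weights, strata and PSUs"][0]

# SE / sqrt(deff) is the standard error implied for a simple random sample.
implied_srs_se = {name: se / sqrt(deff) for name, (se, deff) in designs.items()}

for name, (se, deff) in designs.items():
    print(f"{name:26s} SE = {se:.7f}  "
          f"SE relative to full design = {se / full_se:.3f}  "
          f"implied SRS SE = {implied_srs_se[name]:.7f}")

spread = max(implied_srs_se.values()) - min(implied_srs_se.values())
print(f"spread of implied SRS SEs: {spread:.1e}")
```

The implied baseline is essentially identical in all four runs. Relative to the weights-only run, accounting for the PSUs raises the standard error and accounting for the strata lowers it, and the full design falls between the two.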
. svytab intuse sex, count row percent
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28642.333
-------------------------------------
Whether |
person |
uses the |
--interne | sex
t | male female Total
----------+--------------------------
0 | 7781 1.1e+04 1.9e+04
| 41.26 58.74 100
|
1 | 4874 4909 9783
| 49.82 50.18 100
|
Total | 1.3e+04 1.6e+04 2.9e+04
| 44.18 55.82 100
-------------------------------------
Key: weighted counts
row percentages
Pearson:
Uncorrected chi2(1) = 191.8936
Design-based F(1, 11656) = 143.8122 P = 0.0000
The counts in this table are weighted counts, i.e. the sum of the
weights in each cell. This is not very helpful, especially since we don't
have grossing-up weights here. The actual base numbers in each row would be
much more useful. The scientific-notation display of the counts is also
poor, but it is easily fixed with the format() option (see the next table).
. svytab intuse rc5,
stratum with only one PSU detected
. svytab sex groc, count row percent format(%10.2f)
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28642.333
----------------------------------------
| groc
sex | 0 1 Total
----------+-----------------------------
male | 12310.71 344.77 12655.48
| 97.28 2.72 100.00
|
female | 15513.40 473.45 15986.86
| 97.04 2.96 100.00
|
Total | 27824.11 818.22 28642.33
| 97.14 2.86 100.00
----------------------------------------
Key: weighted counts
row percentages
Pearson:
Uncorrected chi2(1) = 1.4349
Design-based F(1, 11656) = 1.0726 P = 0.3004
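To make the link between the weighted counts and the row percentages concrete, this Python sketch (not part of the Stata session) recomputes the percentages from the weighted cell counts printed above. The dictionary layout is illustrative.

```python
# Weighted cell counts (groc == 0, groc == 1) copied from the table above.
weighted_counts = {
    "male":   (12310.71, 344.77),
    "female": (15513.40, 473.45),
}

row_percent = {}
for sex, (groc0, groc1) in weighted_counts.items():
    row_total = groc0 + groc1          # sum of the weights in the row
    row_percent[sex] = (100 * groc0 / row_total, 100 * groc1 / row_total)
    print(f"{sex:7s} {row_percent[sex][0]:6.2f} {row_percent[sex][1]:6.2f}")

# The Total row is built the same way from the column sums.
total_groc1 = sum(cells[1] for cells in weighted_counts.values())
grand_total = sum(sum(cells) for cells in weighted_counts.values())
total_percent_groc1 = 100 * total_groc1 / grand_total
print(f"Total   {100 - total_percent_groc1:6.2f} {total_percent_groc1:6.2f}")
```

The percentages are ratios of sums of weights, which is why they remain meaningful even though the weighted counts themselves are hard to interpret.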
. svymean intuse, ci deff by(council)
Survey mean estimation
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28642.333
---------------------------------------------------------------------------
Mean Subpop. | Estimate Std. Err. [95% Conf. Interval] Deff
---------------+-----------------------------------------------------------
intuse |
Aberdeen | .4259058 .015494 .3955349 .4562766 1.285795
Clackman | .300511 .0301624 .2413875 .3596344 1.158457
Orkney_I | .3083178 .0301204 .2492768 .3673588 .4625155
PerthKin | .3762066 .0225993 .3319081 .420505 1.634168
Renfrews | .3161091 .0174258 .2819515 .3502666 1.417949
only a few rows shown
------------------------------------------------------------------------------
. svymean intuse, ci deff by(council) srssubpop
This gives design effects with respect to simple random sampling within each subgroup.
Survey mean estimation
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28642.333
------------------------------------------------------------------------------
Mean Subpop. | Estimate Std. Err. [95% Conf. Interval] Deff
---------------+--------------------------------------------------------------
intuse |
Aberdeen | .4259058 .015494 .3955349 .4562766 1.141851
Clackman | .300511 .0301624 .2413875 .3596344 2.246256
Orkney_I | .3083178 .0301204 .2492768 .3673588 2.612066
PerthKin | .3762066 .0225993 .3319081 .420505 1.466841
Renfrews | .3161091 .0174258 .2819515 .3502666 1.261359
only a few rows shown
------------------------------------------------------------------------------
. tabulate groupinc, generate(groupinc)
This creates the dummy variables needed for the regressions.
groupinc | Freq. Percent Cum.
------------+-----------------------------------
missing | 756 2.64 2.64
under 10K | 8,840 30.82 33.45
10-20K | 10,206 35.58 69.03
20-30k | 5,472 19.08 88.11
30-50k | 2,876 10.03 98.13
50K+ | 535 1.87 100.00
------------+-----------------------------------
Total | 28,685 100.00
. svylogit intuse groupinc3-groupinc6 if groupinc>0,prob deff deft or
Survey logistic regression
pweight: ind_wt Number of obs = 28685
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28642.333
F( 4, 11653) = 686.78
Prob > F = 0.0000
------------------------------------------------------------------------------
intuse | Odds Ratio Std. Err. t P>|t| Deff Deft
-------------+----------------------------------------------------------------
groupinc3 | 2.0628 .0916902 16.29 0.000 1.245006 1.115799
groupinc4 | 5.224951 .2463328 35.07 0.000 1.322352 1.149936
groupinc5 | 13.08222 .7537283 44.63 0.000 1.420008 1.191641
groupinc6 | 22.63186 2.864421 24.65 0.000 1.587204 1.259843
------------------------------------------------------------------------------
. logistic intuse groupinc3-groupinc6 if groupinc>0, or
This is a simple unweighted logistic regression.
Logistic regression Number of obs = 28685
LR chi2(4) = 4898.43
Prob > chi2 = 0.0000
Log likelihood = -15285.327 Pseudo R2 = 0.1381
------------------------------------------------------------------------------
intuse | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
groupinc3 | 2.549186 .0980338 24.33 0.000 2.364106 2.748756
groupinc4 | 6.627238 .2735814 45.81 0.000 6.112148 7.185737
groupinc5 | 15.67939 .7972235 54.13 0.000 14.1922 17.32243
groupinc6 | 26.5441 2.922164 29.78 0.000 21.39251 32.93626
------------------------------------------------------------------------------
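For readers who want to see how the columns of an odds-ratio table fit together, here is a Python sketch (not part of the Stata session) for the groupinc3 row of the unweighted fit. It assumes the usual construction: the standard error of the odds ratio is the odds ratio times the standard error of the log odds, and the interval is built on the log scale and then exponentiated.

```python
from math import exp, log
from statistics import NormalDist

# groupinc3 row copied from the logistic output above.
odds_ratio = 2.549186
se_odds_ratio = 0.0980338

# Standard error on the log-odds scale (delta method, run in reverse).
se_log_odds = se_odds_ratio / odds_ratio

# The test statistic is computed on the log-odds scale.
z_stat = log(odds_ratio) / se_log_odds

# 95% interval on the log scale, transformed back to the odds-ratio scale.
z_crit = NormalDist().inv_cdf(0.975)
ci_lower = exp(log(odds_ratio) - z_crit * se_log_odds)
ci_upper = exp(log(odds_ratio) + z_crit * se_log_odds)

print(f"z = {z_stat:.2f}")
print(f"95% CI: {ci_lower:.6f} to {ci_upper:.6f}")
```

This also explains why the interval is not symmetric about the odds ratio: it is symmetric only on the log scale.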
. tabulate shs_6cla, generate (rural)
Urban/rural classification | Freq. Percent Cum.
--------------------------------------+-----------------------------------
Urban settlements of over 125,000 pop | 10,298 35.96 35.96
Other urban | 8,352 29.16 65.12
Small access towns,3-10k | 2,937 10.26 75.38
Small remote towns, pop 3-10k | 1,290 4.50 79.88
Accessible rural, pop<3k, drive>0 | 3,264 11.40 91.28
Remote rural, pop<3k, drive>30 | 2,498 8.72 100.00
--------------------------------------+-----------------------------------
Total | 28,639 100.00
. svylogit intuse groupinc3-groupinc6 rural2-rural6 if groupinc>0,prob or
Survey logistic regression
pweight: ind_wt Number of obs = 28639
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11922
Population size = 28593.511
F( 9, 11633) = 306.37
Prob > F = 0.0000
------------------------------------------------------------------------------
intuse | Odds Ratio Std. Err. t P>|t|
-------------+----------------------------------------------------------------
groupinc3 | 2.08292 .0927013 16.49 0.000
groupinc4 | 5.308005 .2510165 35.30 0.000
groupinc5 | 13.30205 .7706116 44.67 0.000
groupinc6 | 22.71765 2.889008 24.56 0.000
rural2 | .8277699 .0339711 -4.61 0.000
rural3 | .9343237 .0535642 -1.18 0.236
rural4 | .9006713 .0927712 -1.02 0.310
rural5 | .8625718 .0452299 -2.82 0.005
rural6 | 1.017198 .0685998 0.25 0.800
------------------------------------------------------------------------------
. bspline,x(age) power(3) gen(bs)
unrecognized command: bspline
You will get this error if you have not installed the bspline package,
which can be obtained from the internet via Stata's help facility.
. bspline,x(age) power(3) gen(bs)
(1 missing value generated)
This approach could be extended to produce more complicated fits
like the one shown on the exemplar main page. We show only a simple
spline fit here.
. svymlogit intuse bs1-bs4,noconst deft deff
Survey multinomial logistic regression
pweight: ind_wt Number of obs = 28684
Strata: stratum Number of strata = 281
PSU: psu Number of PSUs = 11937
Population size = 28641.023
F( 4, 11653) = 748.87
Prob > F = 0.0000
------------------------------------------------------------------------------
intuse | Coef. Std. Err. Deff Deft
-------------+----------------------------------------------------------------
bs1 | 3.267213 1.556268 1.487014 1.219432
bs2 | -.5498629 .41734 1.361043 1.166638
bs3 | .6622839 .447941 1.317879 1.147989
bs4 | -26.89027 2.000127 1.239484 1.113321
------------------------------------------------------------------------------
(Outcome intuse==0 is the comparison group)
. predict pr0 pr1
(option p assumed; predicted probabilities)
(1 missing value generated)
. plot pr1 age
[Text-mode scatter plot of pr1, i.e. Pr(intuse==1), against age. Vertical axis: .0158 to .57159; horizontal axis: age, 16 to 80.]