Stata commands are in white,
output is in green and yellow,
and warnings are in red.
Comments on interpretation of output are in blue.
For comments on running the analyses go to the commented code file.
The first analysis is on a data file with just the 6 measures of prevalence. It uses the ice command, which implements multiple imputation by chained equations (MICE). Indicators of whether a value was missing are created by the genmiss option.
. ice dprev1 dprev2 dprev3 dprev4 dprev5 dprev6 using previmp , seed(812736) m(10) genmiss(miss) replace
Variable | Command | Prediction equation
-------------+-------------+--------------------------------------------------
dprev1 | logit | dprev2 dprev3 dprev4 dprev5 dprev6
dprev2 | logit | dprev1 dprev3 dprev4 dprev5 dprev6
dprev3 | logit | dprev1 dprev2 dprev4 dprev5 dprev6
dprev4 | logit | dprev1 dprev2 dprev3 dprev5 dprev6
dprev5 | logit | dprev1 dprev2 dprev3 dprev4 dprev6
dprev6 | logit | dprev1 dprev2 dprev3 dprev4 dprev5
Imputing 1..2..3..4..5..6..7..8..9..10..
file previmp.dta saved
Now use the imputed data set and make some tables to check.
use previmp
bysort _j: tab missdprev1 dprev1,row
_______________________________________________________________________________
-> _j = 1

+----------------+
| Key            |
|----------------|
|   frequency    |
| row percentage |
+----------------+

      1 if |
    dprev1 |
    imput. |
      (429 |
missing, 0 |
   values) |        dprev1
 otherwise |         0          1 |     Total
-----------+----------------------+----------
         0 |     1,050      2,849 |     3,899
           |     26.93      73.07 |    100.00
-----------+----------------------+----------
         1 |       112        317 |       429
           |     26.11      73.89 |    100.00
-----------+----------------------+----------
     Total |     1,162      3,166 |     4,328
           |     26.85      73.15 |    100.00

Only the first one is shown here, but it is important to look at them all to check for problems.
Now post-imputation procedures. To get means in subgroups, use a regression with the constant suppressed.
. tab gender,gen(gendum)
bysort _j:regress dprev6 gendum1 gendum2,noconstant
Individual regressions, only the first one shown
-> _j = 1

      Source |       SS       df       MS              Number of obs =    4328
-------------+------------------------------           F(  2,  4326) = 4042.53
       Model |  1832.50053     2  916.250264           Prob > F      =  0.0000
    Residual |  980.499472  4326  .226652675           R-squared     =  0.6514
-------------+------------------------------           Adj R-squared =  0.6513
       Total |        2813  4328  .649953789           Root MSE      =  .47608

------------------------------------------------------------------------------
      dprev6 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     gendum1 |   .6806761   .0101755    66.89   0.000     .6607268    .7006254
     gendum2 |   .6185133   .0102938    60.09   0.000     .5983322    .6386944
------------------------------------------------------------------------------
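Why a regression with the constant suppressed returns subgroup means can be checked on any data set. The sketch below is not part of the original analysis: it uses Stata's built-in auto data, and the dummy names fdum1 and fdum2 are illustrative. With a full set of group dummies and no constant, each coefficient is the mean of the outcome in that group, so for a 0/1 outcome such as dprev6 it is the subgroup proportion.

```stata
* Sketch on Stata's built-in auto data (not the study data):
* a no-constant regression on a full set of group dummies returns group means.
sysuse auto, clear

* foreign is coded 0/1; fdum1 and fdum2 are illustrative names for the dummies
tabulate foreign, generate(fdum)

* Each coefficient is the mean of price within the corresponding group
regress price fdum1 fdum2, noconstant

* Cross-check: the subgroup means computed directly
tabstat price, by(foreign) statistics(mean)
```

The coefficients from regress should match the means reported by tabstat. The regression form is convenient here because it is an estimation command whose results can be combined across imputations.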
micombine regress dprev6 gendum1 gendum2,noconstant
Multiple imputation parameter estimates (10 imputations)
------------------------------------------------------------------------------
      dprev6 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     gendum1 |   .6794427   .0122988    55.24   0.000     .6553308    .7035546
     gendum2 |   .6226741   .0108862    57.20   0.000     .6013316    .6440167
------------------------------------------------------------------------------
4328 observations.
The standard errors allow for differences between imputations. Note how they are increased compared to a single imputation.
Now a big model
#delimit;
ice sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80
    raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80 rfalcy80
    gender ethgp hzrespre szindep szabs04 hzevrefi yzleave sector
    dvar1 dvar2 dvar3 dvar4 dvar5 dvar6
    dvol1 dvol2 dvol3 dvol4 dvol5 dvol6
    drgcode1 drgcode2 drgcode3 drgcode4 drgcode5 drgcode6
    using ex6imp2, m(4) draw cycle(30) replace genmiss(miss)
    cmd(sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80
        raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80 rfalcy80
        dvar1 dvar2 dvar3 dvar4 dvar5 dvar6:ologit,
        dvol1 dvol2 dvol3 dvol4 dvol5 dvol6:regress,
        drgcode1 drgcode2 drgcode3 drgcode4 drgcode5 drgcode6
        gender ethgp hzrespre hzevrefi szabs04 yzleave:logit);
#delimit cr
Only a few of the equations are shown.
These are large equations but the ordered logit model gives a model without too many parameters and the scores are treated as X variables (rather than dummies) in other models.

    Variable | Command     | Prediction equation
-------------+-------------+--------------------------------------------------
    sasmky80 | ologit      | sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80
             |             | raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80
             |             | rfalcy80 gender ethgp hzrespre szindep szabs04
             |             | hzevrefi yzleave sector dvar1 dvar2 dvar3 dvar4
             |             | dvar5 dvar6 dvol1 dvol2 dvol3 dvol4 dvol5 dvol6
             |             | drgcode1 drgcode2 drgcode3 drgcode4 drgcode5
             |             | drgcode6
      gender | logit       | [No missing data in estimation sample]
    hzrespre | logit       | [No missing data in estimation sample]
     szindep |             | [No missing data in estimation sample]
       dvar1 | ologit      | sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80
             |             | sfsmky80 raalcy80 rbalcy80 rcalcy80 rdalcy80
             |             | realcy80 rfalcy80 gender ethgp hzrespre szindep
             |             | szabs04 hzevrefi yzleave sector dvar2 dvar3 dvar4
             |             | dvar5 dvar6 dvol1 dvol2 dvol3 dvol4 dvol5 dvol6
             |             | drgcode1 drgcode2 drgcode3 drgcode4 drgcode5
             |             | drgcode6
       dvol5 | regress     | sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80
             |             | sfsmky80 raalcy80 rbalcy80 rcalcy80 rdalcy80
             |             | realcy80 rfalcy80 gender ethgp hzrespre szindep
             |             | szabs04 hzevrefi yzleave sector dvar1 dvar2 dvar3
             |             | dvar4 dvar5 dvar6 dvol1 dvol2 dvol3 dvol4 dvol6
             |             | drgcode1 drgcode2 drgcode3 drgcode4 drgcode5
             |             | drgcode6
    drgcode1 | logit       | sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80
             |             | sfsmky80 raalcy80 rbalcy80 rcalcy80 rdalcy80
             |             | realcy80 rfalcy80 gender ethgp hzrespre szindep
             |             | szabs04 hzevrefi yzleave sector dvar1 dvar2 dvar3
             |             | dvar4 dvar5 dvar6 dvol1 dvol2 dvol3 dvol4 dvol5
             |             | dvol6 drgcode2 drgcode3 drgcode4 drgcode5
             |             | drgcode6
    drgcode2 | logit       | sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80
             |             | sfsmky80 raalcy80 rbalcy80 rcalcy80 rdalcy80
             |             | realcy80 rfalcy80 gender ethgp hzrespre szindep
             |             | szabs04 hzevrefi yzleave sector dvar1 dvar2 dvar3
             |             | dvar4 dvar5 dvar6 dvol1 dvol2 dvol3 dvol4 dvol5
             |             | dvol6 drgcode1 drgcode3 drgcode4 drgcode5
             |             | drgcode6
    drgcode3 | logit       | sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80
             |             | sfsmky80 raalcy80 rbalcy80 rcalcy80 rdalcy80
             |             | realcy80 rfalcy80 gender ethgp hzrespre szindep
             |             | szabs04 hzevrefi yzleave sector dvar1 dvar2 dvar3
             |             | dvar4 dvar5 dvar6 dvol1 dvol2 dvol3 dvol4 dvol5
             |             | dvol6 drgcode1 drgcode2 drgcode4 drgcode5
             |             | drgcode6

Imputing 1..2..3..4..5..6..7..8..9..10..
Some graphical data checking
. use ex6imp2
histogram dvar1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1)by(missdvol1, legend(off))
. use ex6imp2
histogram dvol1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1)by(missdvol1, legend(off))
Plots of observed and imputed values for volume before fixing it
Now some code to tidy up data - only a sample is shown. Full details are in the code file.
.......
replace dvol5=0 if dvol5<=0
replace dvol5=round(dvol5)
replace dvol6=0 if dvol6<=0
.....
histogram dvol1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1)by(missdvol1, legend(off))
This looks better.
Now post-imputation procedures. To get means in subgroups, use a regression with the constant suppressed.
. tab gender,gen(gendum)
Gender | Freq. Percent Cum.
------------+-----------------------------------
Male | 21,890 50.58 50.58
Female | 21,390 49.42 100.00
------------+-----------------------------------
Total | 43,280 100.00
. micombine regress dprev1 gendum1 gendum2,noconstant
Multiple imputation parameter estimates (10 imputations)
------------------------------------------------------------------------------
dprev6 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gendum1 | .6980356 .0144737 48.23 0.000 .6696597 .7264115
gendum2 | .6172043 .0142478 43.32 0.000 .5892713 .6451373
------------------------------------------------------------------------------
4328 observations.
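The pooled estimates and standard errors in the micombine output above follow the standard combination rules for multiple imputation (Rubin's rules), assuming micombine applies them in their usual form. The notation below is introduced here for illustration and does not appear in the original output: Q_j is the coefficient estimate and U_j its squared standard error from imputed data set j, and m is the number of imputations.

```latex
\begin{align*}
\bar{Q} &= \frac{1}{m}\sum_{j=1}^{m} \hat{Q}_j
  && \text{pooled coefficient} \\
\bar{U} &= \frac{1}{m}\sum_{j=1}^{m} U_j
  && \text{within-imputation variance} \\
B &= \frac{1}{m-1}\sum_{j=1}^{m} \bigl(\hat{Q}_j - \bar{Q}\bigr)^2
  && \text{between-imputation variance} \\
T &= \bar{U} + \Bigl(1 + \frac{1}{m}\Bigr) B
  && \text{total variance} \\
\mathrm{SE}(\bar{Q}) &= \sqrt{T}
\end{align*}
```

Because B is never negative, the pooled standard error is at least as large as the square root of the average within-imputation variance, which is why the combined standard errors exceed those from a single imputed data set.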
Now logistic regression on two other factors
. tabulate sector, gen( sec)

independent |
  , special |
         or |
behavioural |
       educ |      Freq.     Percent        Cum.
------------+-----------------------------------
          M |     37,250       86.07       86.07
          I |      5,140       11.88       97.94
          S |        380        0.88       98.82
          B |        510        1.18      100.00
------------+-----------------------------------
      Total |     43,280      100.00

tabulate szindep,gen(dep)

            SZ. Individual |
 deprivation/socio-economi |      Freq.     Percent        Cum.
---------------------------+-----------------------------------
   Manual/high deprivation |     18,990       43.88       43.88
Non-manual/low deprivation |     24,290       56.12      100.00
---------------------------+-----------------------------------
                     Total |     43,280      100.00

micombine logistic dprev6 gender dep1 sec2 sec3 sec4
Multiple imputation parameter estimates (10 imputations)
------------------------------------------------------------------------------
      dprev6 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      gender |  -.3662452   .0768526    -4.77   0.000    -.5168736   -.2156168
        dep1 |   .2610846   .0889303     2.94   0.003     .0867843    .4353848
        sec2 |   -.477864   .1031108    -4.63   0.000    -.6799575   -.2757705
        sec3 |  -1.358237   .4259697    -3.19   0.001    -2.193122   -.5233516
        sec4 |  -.0986321   .4243947    -0.23   0.816    -.9304305    .7331663
       _cons |   1.171121   .1313379     8.92   0.000     .9137035    1.428539
------------------------------------------------------------------------------
4328 observations.