/*---------------------------------------------------
HOW TO USE THIS FILE
This is an HTML version of the Stata do file ex6.do
It is intended to show you the code and to allow links, not to be used interactively.
EVERYTHING BETWEEN (SLASH STAR) AND (STAR SLASH) IS TAKEN AS A COMMENT
Comments are shown in black, commands are shown in red
-------------------------------------------*/
/*---------------------------------------------------
For results go to the results file.
First, analyses of just the 6 prevalence measures
    Analysis with MICE package
    Postimputation procedures for IVEWARE results
Then with all scores and other variables
    Analysis with MICE package
    Postimputation procedures for IVEWARE results
---------------------------------------------------*/
/*---------------------------------------------------------------------------------------------------
First read in the file, but before this increase the memory available to Stata, because this exemplar needs it.
The commands below, with your own file name, will do it, or use the menus.
You must increase the memory before reading the data.
Change the file name to where you have the data stored.
--------------------------------------------------------------*/
clear
set memory 900m
use "C:\Documents and Settings\gillian raab\My documents\aprojects\peaslaptop\ex6datafiles\data\ex6.dta", clear
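/*---------------------------------------------------------------------------------------------------
A minimal sketch, not part of the original do file: keeping the path to the data in one
macro means it only has to be changed in one place. The macro name datadir and the folder
C:/mydata are hypothetical, so replace them with your own. Forward slashes are used because
Stata accepts them on Windows and they avoid trouble with a backslash placed directly
before a macro. The check only reports whether the file can be found; it does not load it.
--------------------------------------------------------------*/
global datadir "C:/mydata"
capture confirm file "$datadir/ex6.dta"
if _rc != 0 display as error "ex6.dta not found in $datadir - edit the datadir macro"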
/************************************************************************
* first a small model with just DPREV1 to DPREV6 and gender *
* get a smaller file *
************************************************************************/
keep dprev1 dprev2 dprev3 dprev4 dprev5 dprev6 gender
save prevonly,replace
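/*---------------------------------------------------------------------------------------------------
A minimal sketch, not part of the original do file: the imputation step relies on the
add-on commands ice and micombine. The standard Stata command which reports where a
command's file lives, so it can be used to check that both are installed before the run
starts. How you install them depends on your Stata version, so the message only points
you to findit.
--------------------------------------------------------------*/
capture which ice
if _rc != 0 display as error "ice not found - try: findit ice"
capture which micombine
if _rc != 0 display as error "micombine not found - try: findit micombine"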
use prevonly
/*---------------------------------------------------------------------------------------------------------
The first imputation method we will use is the implementation of MICE for Stata by Patrick Royston.
This is also an add-on package. To install it, search the net resources for MICE (in capital letters)
and install the package. You can then use and get help for the commands
    ice       - to do the imputation
    micombine - for analysing multiple imputations
    mijoin
    misplit
The results will be saved in a data set called previmp.dta.
The m(10) option gets 10 sets of imputed data.
genmiss(miss) creates new variables to show which values have been imputed.
seed sets the seed for the random number generator so that we can get exactly the same results again.
IT IS IMPORTANT TO USE THE DRAW OPTION HERE SINCE THERE IS A PROBLEM WITH THE DEFAULT METHOD OF PREDICTIVE MEAN MATCHING.
THE NEW VERSION WILL HAVE DRAW AS THE DEFAULT - SO CHECK THE HELP
----------------------------------------------------------------------------------------------------------*/
ice dprev1 dprev2 dprev3 dprev4 dprev5 dprev6 using previmp , seed(812736) m(10) genmiss(miss) replace
/*------------------------------------------------------------------------------------------------------------
Now open previmp.dta and look at the results.
You will have ten sets of data, one following the other, with missing observations replaced
and new variables starting with miss to indicate missingness.
The programme has recognised that you have binary data and imputed them with logistic regressions.
At the end of the file you have two new variables (_i and _j by default).
_i gives the sequence number in the original file.
_j is 1 for the first imputation, 2 for the second and so on.
Now open the new file and make some tables to check.
----------------------------------------------------------------------*/
use previmp
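/*---------------------------------------------------------------------------------------------------
A minimal sketch, not part of the original do file: two quick checks on the stacked
imputed file. It assumes the imputation index is called _j, as described above; other
versions of ice may use a different name, so check the help. The first line shows how
many records belong to each imputation, the second counts any values of dprev1 that are
still missing within each set.
--------------------------------------------------------------*/
tabulate _j
bysort _j: count if missing(dprev1)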
bysort _j: tab missdprev1 dprev1,row
/*------------------------------------------------------------------
Now post imputation with micombine
Use regress with no constant to get means and se by gender of DPREV6
You need to set up dummy variables with tab first
-------------------------------------------------------------------------------------------------------------*/
tab gender,gen(gendum)
bysort _j:regress dprev6 gendum1 gendum2,noconstant
micombine regress dprev6 gendum1 gendum2,noconstant
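/*---------------------------------------------------------------------------------------------------
A minimal sketch, not part of the original do file: combining the per-imputation results
by hand with Rubin's rules, for the coefficient of gendum1 only. It assumes previmp is in
memory, that gendum1 and gendum2 have been created as above, and that the imputations are
numbered _j = 1 to 10. The pooled estimate is the mean of the 10 estimates; the pooled
variance is the mean within-imputation variance plus (1 + 1/10) times the
between-imputation variance. The result should be close to what micombine reports, but
this is an illustration and not a description of how micombine is coded.
--------------------------------------------------------------*/
tempname qsum q2sum usum qbar ubar bvar tvar
scalar `qsum' = 0
scalar `q2sum' = 0
scalar `usum' = 0
forvalues j = 1/10 {
    quietly regress dprev6 gendum1 gendum2 if _j == `j', noconstant
    scalar `qsum' = `qsum' + _b[gendum1]
    scalar `q2sum' = `q2sum' + _b[gendum1]^2
    scalar `usum' = `usum' + _se[gendum1]^2
}
scalar `qbar' = `qsum'/10
scalar `ubar' = `usum'/10
scalar `bvar' = (`q2sum' - 10*`qbar'^2)/9
scalar `tvar' = `ubar' + (1 + 1/10)*`bvar'
display "pooled estimate for gendum1 = " `qbar' "   pooled se = " sqrt(`tvar')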
/*********************************************************************
* Now back to original file for a bigger imputation model            *
*********************************************************************/
use "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\ex6.dta", clear
/*-----------------------------------------------------------
The delimit command sets ; as the end of a command so that commands can run over several lines.
The cmd option defines the type of regression that will be needed for each variable
at the updating stage - see the help files for ice.
Patience needed here: it will be slow, especially as we are running 30 cycles to be sure of convergence.
This was suggested from looking at the output from R.
Notice that we don't have dprev1-dprev6 in this model, since we have dvol1, dvar1 etc.,
which are zeros for zero values of the corresponding dprev variable.
----------------------------------------------------------------*/
#delimit;
ice sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80
    raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80 rfalcy80
    gender ethgp hzrespre szindep szabs04 hzevrefi yzleave sector
    dvar1 dvar2 dvar3 dvar4 dvar5 dvar6
    dvol1 dvol2 dvol3 dvol4 dvol5 dvol6
    drgcode1 drgcode2 drgcode3 drgcode4 drgcode5 drgcode6
    using ex6imp2, m(4) draw cycle(30) replace genmiss(miss)
    cmd(sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80
        raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80 rfalcy80
        dvar1 dvar2 dvar3 dvar4 dvar5 dvar6:ologit,
        dvol1 dvol2 dvol3 dvol4 dvol5 dvol6:regress,
        drgcode1 drgcode2 drgcode3 drgcode4 drgcode5 drgcode6
        gender ethgp hzrespre hzevrefi szabs04 yzleave:logit);
#delimit cr
/*-----------------------------------------------------------------------------------------------------------
now open ex6imp2 and look at the results to see if OK and close again
--------------------------------------------------------------------------------------------------------------*/
use ex6imp2,clear
/*------and save it--------don't want to have to repeat--------------------------------*/
save ex6imp2.dta,replace
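/*---------------------------------------------------------------------------------------------------
A minimal sketch, not part of the original do file: an alternative to #delimit ; for long
commands is to end a line with three slashes, which joins it to the next line. This works
in do files but not for commands typed interactively. It assumes ex6imp2 is in memory.
--------------------------------------------------------------*/
summarize dvol1 dvol2 dvol3 ///
    dvol4 dvol5 dvol6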
/*-------------------- examine contents---------------------------------------------------------*/
/*-------------------- some sample plots for observed and imputed data------*/
histogram dvar1, discrete percent gap(50) bfcolor(red) blcolor(red) xtitle(Variety sweep 1) by(missdvar1, legend(off))
histogram dvol1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1) by(missdvol1, legend(off))
/*-------------- and some tables by imputation-----------*/
tabulate missdvar6 dvar6, row
bysort _j: tabulate missdvar6 dprev6, row
/*--------------- recode volume to round numbers and set -ve to zero-------*/
replace dvol1=0 if dvol1<=0
replace dvol1=round(dvol1)
replace dvol2=0 if dvol2<=0
replace dvol2=round(dvol2)
replace dvol3=0 if dvol3<=0
replace dvol3=round(dvol3)
replace dvol4=0 if dvol4<=0
replace dvol4=round(dvol4)
replace dvol5=0 if dvol5<=0
replace dvol5=round(dvol5)
replace dvol6=0 if dvol6<=0
replace dvol6=round(dvol6)
/*---------------------------------tidy so volume and prevalence agree with variety----------------------------------------*/
replace dvol1=0 if dvar1==0
replace dvol2=0 if dvar2==0
replace dvol3=0 if dvar3==0
replace dvol4=0 if dvar4==0
replace dvol5=0 if dvar5==0
replace dvol6=0 if dvar6==0
replace dvol1=1 if dvar1>0 & dvol1==0
replace dvol2=1 if dvar2>0 & dvol2==0
replace dvol3=1 if dvar3>0 & dvol3==0
replace dvol4=1 if dvar4>0 & dvol4==0
replace dvol5=1 if dvar5>0 & dvol5==0
replace dvol6=1 if dvar6>0 & dvol6==0
replace dprev1=0 if dvar1==0
replace dprev2=0 if dvar2==0
replace dprev3=0 if dvar3==0
replace dprev4=0 if dvar4==0
replace dprev5=0 if dvar5==0
replace dprev6=0 if dvar6==0
replace dprev1=1 if dvar1>0
replace dprev2=1 if dvar2>0
replace dprev3=1 if dvar3>0
replace dprev4=1 if dvar4>0
replace dprev5=1 if dvar5>0
replace dprev6=1 if dvar6>0
/*----------------- check some histograms now-----------------------*/
histogram dvol1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1) by(missdvol1, legend(off))
/*---Now we can see how the imputed values vary by imputation-------*/
bysort _j: summarize dprev1 dprev2 dprev3 dprev4 dprev5 dprev6
/*--------------- and get some results with micombine to compare---------------*/
tab gender,gen(gendum)
micombine regress dprev1 gendum1 gendum2,noconstant
micombine regress dprev6 gendum1 gendum2,noconstant
micombine regress dvol1 gendum1 gendum2,noconstant
micombine regress dvol6 gendum1 gendum2,noconstant
micombine regress dvar1 gendum1 gendum2,noconstant
micombine regress dvar6 gendum1 gendum2,noconstant
bysort gender: summarize dprev1 dprev2 dprev3 dprev4 dprev5 dprev6 dvol1 dvol2 dvol3 dvol4 dvol5 dvol6 dvar1 dvar2 dvar3 dvar4 dvar5 dvar6
/*-------------now a logistic regression-----------*/
tabulate sector, gen(sec)
tabulate szindep,gen(dep)
tab dep1
micombine logistic dprev6 gender dep1 sec2 sec3 sec4
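/*---------------------------------------------------------------------------------------------------
A minimal sketch, not part of the original do file: the recoding and tidying of the six
sweeps can also be written as one loop over the sweep number. The loop index k is a name
chosen for illustration. Each sweep goes through the same steps in the same order as the
explicit statements, so running it after them changes nothing. Note that Stata treats a
missing value as larger than any number, so, as in the explicit statements, a missing
dvar counts as dvar>0.
--------------------------------------------------------------*/
forvalues k = 1/6 {
    replace dvol`k' = 0 if dvol`k' <= 0
    replace dvol`k' = round(dvol`k')
    replace dvol`k' = 0 if dvar`k' == 0
    replace dvol`k' = 1 if dvar`k' > 0 & dvol`k' == 0
    replace dprev`k' = 0 if dvar`k' == 0
    replace dprev`k' = 1 if dvar`k' > 0
}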