/*----------------------------------------------------------------------
analyse exemplar 6 data in Stata

How to use this file
  everything between (slash star) and (star slash) is taken as a comment
  open it in the Stata do-file editor
  run one line or group of lines at a time, either by highlighting them
  and choosing 'do selection' from the tools menu, or by pressing ctrl-D

first read in the file, but before this increase the memory
available to Stata because this exemplar needs it.
the commands below, with your own file name, will do it, or use the menus.
You must increase the memory before reading the data.
Change the file name to where you have the data stored.
----------------------------------------------------------------------*/
clear
set memory 900m
use "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\ex6.dta", clear

/************************************************************************
* first a small model with just DPREV1 to DPREV6 and gender             *
* get a smaller file                                                    *
************************************************************************/
keep dprev1 dprev2 dprev3 dprev4 dprev5 dprev6 gender
save prevonly, replace
use prevonly
bysort gender: summarize dprev1 dprev2 dprev3 dprev4 dprev5 dprev6

/*----------------------------------------------------------------------
the first imputation method we will use is the implementation of mice
for Stata by Patrick Royston. this is also an add-on package.
to install it, do a search of net resources for mice (in capital letters)
and install the package.
you can then use and get help for the commands
  mvis      - to do the imputation
  micombine - for analysing multiple imputations
  mijoin
  misplit
A NEW VERSION OF THIS WILL BE AVAILABLE SHORTLY WITH mvis RENAMED AS ice

the results will be saved in a data set called previmp.dta.
the m(10) option gets 10 sets of imputed data.
genmiss(miss) creates new variables to show which values have been imputed.
seed sets the seed for the random numbers so that we can get exactly
the same results again

IT IS IMPORTANT TO USE THE DRAW OPTION HERE SINCE THERE IS A PROBLEM WITH
THE DEFAULT METHOD OF PREDICTIVE MEAN MATCHING
THE NEW VERSION WILL HAVE DRAW AS THE DEFAULT - SO CHECK THE HELP
(the command below does not specify draw, so check that it is the
default in your version)
----------------------------------------------------------------------*/
ice2 dprev1 dprev2 dprev3 dprev4 dprev5 dprev6 using previmp, seed(812736) m(10) genmiss(miss) replace

/*----------------------------------------------------------------------
now open previmp.dta and look at the results.
you will have ten sets of data, one following the other, with missing
observations replaced and new variables starting with miss to
indicate missingness.
The programme has recognised that you have binary data and imputed
them with logistic regressions.
at the end of the file you have two new variables (_i and _j by default).
  _i gives the sequence number in the original file
  _j is 1 for the first imputation, 2 for the second and so on.
Now open the new file and get some tables to check.
----------------------------------------------------------------------*/
use previmp
bysort _j: tab missdprev1 dprev1, row

/*----------------------------------------------------------------------
Now post imputation with micombine
Use regress with no constant to get means and se by gender of DPREV6
You need to set up dummy variables with tab first
----------------------------------------------------------------------*/
tab gender, gen(gendum)
bysort _j: regress dprev6 gendum1 gendum2, noconstant
micombine regress dprev6 gendum1 gendum2, noconstant

/*********************************************************************
* Now back to original file for a bigger imputation model            *
*********************************************************************/
use "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\ex6.dta", clear

/*----------------------------------------------------------------------
the delimit command sets ; as the end of a command so that commands
can run over several lines
The cmd option defines the type of regression that will be needed for
each variable at the updating stage - see the help files for mvis
NOTE USE OF draw OPTION AGAIN and 10 imputations
Patience needed here: it will be slow, especially as we are running 30
cycles to be sure of convergence. This was suggested from looking at
the output from R.
Notice that we don't have dprev1-dprev6 in this model, since we have
dvol1, dvar1 etc., which are zero whenever the corresponding dprev
variable is zero
----------------------------------------------------------------------*/
#delimit ;
ice2 sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80
     raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80 rfalcy80
     gender ethgp hzrespre szindep szabs04 hzevrefi yzleave sector
     dvar1 dvar2 dvar3 dvar4 dvar5 dvar6
     dvol1 dvol2 dvol3 dvol4 dvol5 dvol6
     drgcode1 drgcode2 drgcode3 drgcode4 drgcode5 drgcode6
     using ex6imp2, m(10) cycle(30) replace genmiss(miss)
     cmd(sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80
         raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80 rfalcy80
         dvar1 dvar2 dvar3 dvar4 dvar5 dvar6:ologit,
         dvol1 dvol2 dvol3 dvol4 dvol5 dvol6:regress,
         drgcode1 drgcode2 drgcode3 drgcode4 drgcode5 drgcode6
         gender ethgp hzrespre hzevrefi szabs04 yzleave:logit);
#delimit cr
save "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\MICEsc.dta", replace

/*----------------------------------------------------------------------
now open ex6imp2 and look at the results to see if OK and close again
----------------------------------------------------------------------*/
use ex6imp2, clear

/*------and save it--------don't want to have to repeat-----------------*/
save ex6imp2.dta, replace

/*-------------------- examine contents---------------------------------*/
/*-------------------- some sample plots for observed and imputed data--*/
histogram dvar1, discrete percent gap(50) bfcolor(red) blcolor(red) xtitle(Variety sweep 1) by(missdvar1, legend(off))
histogram dvol1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1) by(missdvol1, legend(off))

/*-------------- and some tables by imputation-----------*/
tabulate missdvar6 dvar6, row
bysort _j: tabulate missdvar6 dprev6, row

/*--------------- recode volume to round numbers and set -ve to zero-------*/
replace dvol1=0 if dvol1<=0
replace dvol1=round(dvol1)
replace dvol2=0 if dvol2<=0
replace dvol2=round(dvol2)
replace dvol3=0 if dvol3<=0
replace dvol3=round(dvol3)
replace dvol4=0 if dvol4<=0
replace dvol4=round(dvol4)
replace dvol5=0 if dvol5<=0
replace dvol5=round(dvol5)
replace dvol6=0 if dvol6<=0
replace dvol6=round(dvol6)

/*--------- tidy so that volume and prevalence agree with variety ---------*/
/*--- note: Stata treats missing as larger than any number, so dvar>0 is
      also true where dvar is missing; the lines below assume dvar1-dvar6
      have no missing values left after imputation ---*/
replace dvol1=0 if dvar1==0
replace dvol2=0 if dvar2==0
replace dvol3=0 if dvar3==0
replace dvol4=0 if dvar4==0
replace dvol5=0 if dvar5==0
replace dvol6=0 if dvar6==0
replace dvol1=1 if dvar1>0 & dvol1==0
replace dvol2=1 if dvar2>0 & dvol2==0
replace dvol3=1 if dvar3>0 & dvol3==0
replace dvol4=1 if dvar4>0 & dvol4==0
replace dvol5=1 if dvar5>0 & dvol5==0
replace dvol6=1 if dvar6>0 & dvol6==0
replace dprev1=0 if dvar1==0
replace dprev2=0 if dvar2==0
replace dprev3=0 if dvar3==0
replace dprev4=0 if dvar4==0
replace dprev5=0 if dvar5==0
replace dprev6=0 if dvar6==0
replace dprev1=1 if dvar1>0
replace dprev2=1 if dvar2>0
replace dprev3=1 if dvar3>0
replace dprev4=1 if dvar4>0
replace dprev5=1 if dvar5>0
replace dprev6=1 if dvar6>0
save "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\MICEsc.dta", replace

/*----------------- check some histograms now-----------------------*/
histogram dvol1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1) by(missdvol1, legend(off))

/*---Now we can see how the imputed values vary by imputation-------*/
bysort _j: summarize dprev1 dprev2 dprev3 dprev4 dprev5 dprev6

/*--------------- and get some results to compare---------------*/
tab gender, gen(gendum)
micombine regress dprev1 gendum1 gendum2, noconstant
micombine regress dprev6 gendum1 gendum2, noconstant
micombine regress dvol1 gendum1 gendum2, noconstant
micombine regress dvol6 gendum1 gendum2, noconstant
micombine regress dvar1 gendum1 gendum2, noconstant
micombine regress dvar6 gendum1 gendum2, noconstant

/*-------------now a logistic regression-----------*/
tabulate sector, gen(sec)
tabulate szindep, gen(dep)
tab dep1
micombine logistic dprev6 gender dep1 sec2 sec3 sec4
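
/*----------------------------------------------------------------------
optional hand check on micombine (a sketch, not part of the original
analysis). it pools the mean of dprev6 for the first gender category
across imputations using Rubin's rules, so you can compare it with the
micombine regress dprev6 output above.
Assumptions: micombine pools with Rubin's rules (check its help);
_j numbers the imputations 1, 2, ... in the file in memory; gendum1 and
gendum2 exist from the tab command above; the names rr_est, rr_qbar,
rr_W, rr_B and rr_T are made up for this sketch.
Run the whole block in one go: the local macro m does not survive
between separate 'do selection' runs.
----------------------------------------------------------------------*/
quietly summarize _j
local m = r(max)
/* one row per imputation: column 1 = estimate, column 2 = squared se */
matrix rr_est = J(`m', 2, .)
forvalues k = 1/`m' {
    quietly regress dprev6 gendum1 gendum2 if _j == `k', noconstant
    matrix rr_est[`k', 1] = _b[gendum1]
    matrix rr_est[`k', 2] = _se[gendum1]^2
}
/* pooled estimate and average within-imputation variance */
scalar rr_qbar = 0
scalar rr_W = 0
forvalues k = 1/`m' {
    scalar rr_qbar = rr_qbar + rr_est[`k', 1]/`m'
    scalar rr_W    = rr_W    + rr_est[`k', 2]/`m'
}
/* between-imputation variance of the estimates */
scalar rr_B = 0
forvalues k = 1/`m' {
    scalar rr_B = rr_B + (rr_est[`k', 1] - rr_qbar)^2/(`m' - 1)
}
/* total variance = within + (1 + 1/m) * between */
scalar rr_T = rr_W + (1 + 1/`m')*rr_B
display "pooled estimate for gendum1: " rr_qbar
display "pooled standard error:       " sqrt(rr_T)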