/*---------------------------------------------------------------------------------------------------
Analyse exemplar 6 data in Stata
How to use this file:
everything between (slash star) and (star slash)
 is taken as a comment.
Open it in the Stata do-file editor.

Run one line or a group of lines at a time by highlighting
 them and either choosing 'do selection' from the tools menu
 or pressing ctrl-d.

First read in the file, but before doing so increase the memory
available to Stata, because this exemplar needs it.
The commands below, with your file name, will do it, or use the menus.
You must increase the memory before reading the data.

Change the file name to where you have the data stored.
-------------------------------------------------------------------------------------------------------*/
clear
set memory 900m
use "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\ex6.dta", clear

/************************************************************************
*   first a small model with just DPREV1 to DPREV6 and gender           *
*   get a smaller file                                                  *
************************************************************************/
keep dprev1 dprev2 dprev3 dprev4 dprev5 dprev6 gender
save prevonly,replace
use prevonly
bysort gender:  summarize dprev1 dprev2 dprev3  dprev4 dprev5  dprev6
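/*---------------------------------------------------------------------------------------------------------
An optional check, not part of the original exemplar: a minimal sketch
that counts the missing values in each dprev variable before imputing,
so that the counts can be compared with the miss indicators later.
It assumes the six variables are named dprev1 to dprev6, as above.
Run the whole loop as one selection, since it uses a local macro.
----------------------------------------------------------------------------------------------------------*/
forvalues i = 1/6 {
    quietly count if missing(dprev`i')
    display "dprev`i': " r(N) " missing out of " _N
}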
/*---------------------------------------------------------------------------------------------------------
The first imputation method we will use is the implementation
of MICE for Stata by Patrick Royston. This is also an add-on
package.
To install it, search the net resources
for MICE (in capital letters) and install the package.
You can then use and get help for the commands

mvis         - to do the imputation
micombine    - for analysing multiple imputations
mijoin
misplit
A NEW VERSION OF THIS WILL BE AVAILABLE SHORTLY 
WITH mvis RENAMED AS ice

The results will be saved in a data set called previmp.dta.
The m(10) option gets 10 sets of imputed data.
genmiss(miss) creates new variables to show which values have been imputed.
seed() sets the seed for the random number generator so that we can get exactly the same results again.

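IT IS IMPORTANT TO USE THE DRAW OPTION HERE SINCE THERE 
IS A PROBLEM WITH THE DEFAULT METHOD OF PREDICTIVE MEAN 
MATCHING.
THE NEW VERSION WILL HAVE DRAW AS THE DEFAULT. THE COMMAND BELOW
DOES NOT SPECIFY DRAW, SO CHECK THE HELP FOR YOUR VERSION AND ADD
THE OPTION IF IT IS NOT THE DEFAULT.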
 ----------------------------------------------------------------------------------------------------------*/
ice2 dprev1 dprev2 dprev3 dprev4 dprev5 dprev6  using previmp ,  seed(812736) m(10)  genmiss(miss) replace
/*------------------------------------------------------------------------------------------------------------
Now open previmp.dta and look at the results.
You will have 10 sets of data, one following the other,
with missing observations replaced and new variables 
starting with miss to indicate missingness.

The programme has recognised that you have binary data and
imputed them with logistic regressions.
As well as the 10 sets of data, you have two 
new variables at the end of the file (_i and _j by default).
_i gives the sequence number in the original file.
_j is 1 for the first imputation, 2 for the second, and so on.
Now open the new file and run some tables to check.
----------------------------------------------------------------------------------------------------------------*/

use previmp 
bysort _j: tab missdprev1 dprev1,row
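/*-----------------------------------------------------------------------------------------------------------
A quick optional check, given as a sketch: count the imputed data sets.
It assumes _j is the imputation number, as described above, so the
count should be 10 to match m(10). Run both lines together.
-------------------------------------------------------------------------------------------------------------*/
quietly tabulate _j
display "number of imputed data sets: " r(r)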
/*-----------------------------------------------------------------------------------------------------------
Now post imputation with micombine
Use regress with no constant to get means and se by gender of DPREV6
You need to set up dummy variables with tab first
-------------------------------------------------------------------------------------------------------------*/
tab gender,gen(gendum)
bysort _j:regress dprev6 gendum1 gendum2,noconstant
micombine regress dprev6 gendum1 gendum2,noconstant
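/*-----------------------------------------------------------------------------------------------------------
For illustration only: a sketch of how estimates from the imputed data
sets are usually combined (Rubin's rules), worked by hand for the
gendum1 coefficient. The combined estimate is the mean of the 10
estimates. Its variance is the mean within-imputation variance plus
(1 + 1/m) times the between-imputation variance.
Assumptions: m is 10 and _j runs from 1 to 10, as described above. The
scalar names are temporary and only for this sketch. The result should
be close to the micombine output, but this has not been checked against
the micombine source. Run the whole block as one selection.
-------------------------------------------------------------------------------------------------------------*/
tempname sumb sumb2 sumv qbar w b t
scalar `sumb'  = 0
scalar `sumb2' = 0
scalar `sumv'  = 0
forvalues j = 1/10 {
    quietly regress dprev6 gendum1 gendum2 if _j == `j', noconstant
    scalar `sumb'  = `sumb'  + _b[gendum1]
    scalar `sumb2' = `sumb2' + _b[gendum1]^2
    scalar `sumv'  = `sumv'  + _se[gendum1]^2
}
scalar `qbar' = `sumb'/10
scalar `w'    = `sumv'/10
scalar `b'    = (`sumb2' - 10*`qbar'^2)/(10 - 1)
scalar `t'    = `w' + (1 + 1/10)*`b'
display "gendum1: combined estimate = " `qbar' "  se = " sqrt(`t')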


/*********************************************************************
*  Now back to original file for a bigger imputation model             *
*********************************************************************/
use "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\ex6.dta", clear
/*-------- -------------------------------------------------------------------------------------------
The #delimit command sets ; as the end of a command
           so that commands can run over several lines.

The cmd option defines the type of regression that will be used 
for each variable at the updating stage - see the help files for mvis.
NOTE THE 10 IMPUTATIONS AGAIN. draw IS NOT SPECIFIED BELOW, SO CHECK
THAT IT IS THE DEFAULT IN YOUR VERSION.

Patience is needed here: it will be slow, especially as we are running
30 cycles to be sure of convergence. This was suggested by looking at 
the output from R.

Notice that we don't have dprev1-dprev6 in this model,
since we have dvol1-dvol6 and dvar1-dvar6, which are zero whenever 
the corresponding dprev variable is zero.
---------------------------------------------------------------------------------------------------------*/
#delimit;
ice2 sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80 
raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80 rfalcy80
 gender ethgp hzrespre szindep szabs04 
hzevrefi yzleave sector 
dvar1 dvar2 dvar3 dvar4 dvar5 dvar6
dvol1 dvol2 dvol3 dvol4 dvol5 dvol6
drgcode1 drgcode2 drgcode3 drgcode4 drgcode5 drgcode6 
 using ex6imp2,
m(10) cycle(30) replace genmiss(miss)
cmd(sasmky80 sbsmky80 scsmky80 sdsmky80 sesmky80 sfsmky80 
raalcy80 rbalcy80 rcalcy80 rdalcy80 realcy80 rfalcy80
dvar1 dvar2 dvar3 dvar4 dvar5 dvar6:ologit,
dvol1 dvol2 dvol3 dvol4 dvol5 dvol6:regress,
drgcode1 drgcode2 drgcode3 drgcode4 drgcode5 drgcode6
 gender ethgp hzrespre  hzevrefi szabs04 yzleave:logit);
#delimit cr
save "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\MICEsc.dta", replace
/*-----------------------------------------------------------------------------------------------------------
now open ex6imp2 and look at the results to see if OK and close again
--------------------------------------------------------------------------------------------------------------*/
use ex6imp2,clear
/*------and save it--------don't want to have to repeat--------------------------------*/
save ex6imp2.dta,replace
/*-------------------- examine contents---------------------------------------------------------*/
/*-------------------- some sample plots for observed and imputed data------*/
histogram dvar1, discrete percent gap(50) bfcolor(red) blcolor(red) xtitle(Variety sweep 1) by(missdvar1, legend(off))
histogram dvol1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1) by(missdvol1, legend(off))
/*-------------- and some tables by imputation-----------*/
tabulate   missdvar6 dvar6, row
bysort _j: tabulate   missdvar6 dprev6, row

/*--------------- round volume to whole numbers and set negative values to zero-------*/
 replace dvol1=0 if dvol1<=0
replace  dvol1=round(dvol1)
replace dvol2=0 if dvol2<=0
replace  dvol2=round(dvol2)
replace dvol3=0 if dvol3<=0
replace  dvol3=round(dvol3)
replace dvol4=0 if dvol4<=0
replace  dvol4=round(dvol4)
replace dvol5=0 if dvol5<=0
replace  dvol5=round(dvol5)
replace dvol6=0 if dvol6<=0
replace  dvol6=round(dvol6)

/*---------------------------------tidy so that volume and prevalence agree with variety----------------------------------------*/


replace dvol1=0 if dvar1==0
replace dvol2=0 if dvar2==0
replace dvol3=0 if dvar3==0
replace dvol4=0 if dvar4==0
replace dvol5=0 if dvar5==0
replace dvol6=0 if dvar6==0

replace dvol1=1 if dvar1>0 & dvol1==0
replace dvol2=1 if dvar2>0 & dvol2==0
replace dvol3=1 if dvar3>0 & dvol3==0
replace dvol4=1 if dvar4>0 & dvol4==0
replace dvol5=1 if dvar5>0 & dvol5==0
replace dvol6=1 if dvar6>0 & dvol6==0

replace  dprev1=0 if dvar1==0
replace dprev2=0 if dvar2==0
replace dprev3=0 if dvar3==0
replace dprev4=0 if dvar4==0
replace dprev5=0 if dvar5==0
replace dprev6=0 if dvar6==0

replace  dprev1=1 if dvar1>0
replace dprev2=1 if dvar2>0
replace dprev3=1 if dvar3>0
replace dprev4=1 if dvar4>0
replace dprev5=1 if dvar5>0
replace dprev6=1 if dvar6>0
save "C:\Documents and Settings\gillian raab\My Documents\aprojects\peaslaptop\ex6datafiles\data\MICEsc.dta", replace
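/*-----------------------------------------------------------------------------------------------------------
An optional check, given as a sketch: confirm that the tidying steps
above left volume and prevalence consistent with variety in all six
sweeps. assert stops with an error if any observation breaks a rule.
Remember that Stata treats a missing value as larger than any number,
so dvar > 0 is also true where dvar is missing.
Run the whole loop as one selection.
-------------------------------------------------------------------------------------------------------------*/
forvalues i = 1/6 {
    assert dvol`i'  == 0 if dvar`i' == 0
    assert dvol`i'  >  0 if dvar`i' >  0
    assert dprev`i' == 0 if dvar`i' == 0
    assert dprev`i' == 1 if dvar`i' >  0
}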

/*----------------- check some histograms now-----------------------*/
histogram dvol1, percent bfcolor(red) blcolor(red) xtitle(Volume sweep 1) by(missdvol1, legend(off))

/*---Now we can see how the imputed values vary by imputation-------*/
bysort _j: summarize dprev1 dprev2  dprev3 dprev4 dprev5 dprev6

/*--------------- and get some results to compare---------------*/
tab gender,gen(gendum)
micombine regress dprev1 gendum1 gendum2,noconstant
micombine regress dprev6 gendum1 gendum2,noconstant
micombine regress dvol1 gendum1 gendum2,noconstant
micombine regress dvol6 gendum1 gendum2,noconstant
micombine regress dvar1 gendum1 gendum2,noconstant
micombine regress dvar6 gendum1 gendum2,noconstant

/*-------------now a logistic regression-----------*/

tabulate sector, gen( sec)
tabulate szindep,gen(dep)
tab dep1
micombine logistic dprev6 gender dep1  sec2 sec3 sec4