
D. The Stata Package


Stata is a commercial package developed by the Stata corporation in the USA for statistics, graphics and data management. It is widely used by academics, especially in medical research and particularly in epidemiology. It is under continuing development, and new versions are released every one or two years. The exemplars were all developed using Stata Version 8.1, but are now being updated to Stata 9. Exemplar 1 is complete. Version 8 code is still available.

Version 9 was released in May 2005. Although it is announced as having many new survey features, some of these are available to version 8 users via contributed libraries. The major additions to the survey routines are analysis via replication and methods for post-stratification. Replication methods are available in the version 8 library contributed by Nick Winter (see D.6 below). This library also allows post-stratification to more than one set of population totals by raking. These routines are used for exemplar 1.
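To make the idea of raking concrete, here is a minimal sketch in Python rather than Stata. It is not the code used by the contributed library: the function names, the toy data and the population totals are all invented for illustration. It post-stratifies on each variable in turn and repeats until the weights stop changing.

```python
def rake(weights, margins, max_iter=200, tol=1e-10):
    """Rake weights to several sets of population totals.

    margins is a list of (categories, totals) pairs: categories gives each
    unit's category on one variable, totals maps category -> population total.
    This is an illustrative helper, not part of any Stata package.
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for categories, totals in margins:
            # Weighted total currently estimated for each category.
            current = {}
            for wi, c in zip(w, categories):
                current[c] = current.get(c, 0.0) + wi
            # Post-stratify on this variable alone.
            for i, c in enumerate(categories):
                factor = totals[c] / current[c]
                w[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:
            break
    return w


def weighted_totals(w, categories):
    """Sum the weights within each category."""
    out = {}
    for wi, c in zip(w, categories):
        out[c] = out.get(c, 0.0) + wi
    return out


# Toy sample of six respondents with made-up population totals.
sex = ["m", "m", "f", "f", "f", "m"]
age = ["y", "o", "y", "o", "y", "o"]
sex_totals = {"m": 480.0, "f": 520.0}
age_totals = {"y": 450.0, "o": 550.0}

raked = rake([1.0] * 6, [(sex, sex_totals), (age, age_totals)])
print(weighted_totals(raked, sex))
print(weighted_totals(raked, age))
```

After raking, the weighted totals agree with both sets of population totals at once, which a single post-stratification step cannot guarantee. The two sets of totals must add up to the same population size for the procedure to settle.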

More details of Version 8 to 9 changes are in section D.7.

To help you get started with Stata we have developed some guides to go with each exemplar.

Mini-guides are intended as reminders for people with some experience with Stata.

mini guide 1
mini guide 2
mini guide 3
mini guide 4
mini guide 5
mini guide 6

Novice guides are intended to give non-Stata users the basic help they need to analyse surveys: novice guide

D.2 Web links

http://www.Stata.com is the main site for the Stata corporation and has links to resources such as FAQs and other documentation.

The UCLA site has extensive resources to help you with Stata, as well as links to other sources of help.

A web page with helpful hints on using Stata can be found at http://statlab.stat.yale.edu/help/workshops/Stata/.

There is an active Stata mailing list, and its questions and answers can be searched from the archive page.

D.3 Costs and licensing

Stata is distributed in the UK by Timberlake Consultants. Licenses are sold as perpetual licenses, with different prices for academic and non-academic users and for larger and smaller versions, and with special deals for students. A single-user license for standard Stata with full documentation is currently (October 04) around £400 for academic users and £700 for non-academic users. Stata is not currently available via CHEST.

D.4 What kind of package is it?

Stata lies between SPSS and SAS on several dimensions. It is fairly easy to use although not quite so heavily based on a point-and-click interface as SPSS. Its data manipulation facilities are comparable to SPSS, but less powerful than SAS. Like SAS its statistical procedures are well supported and documented.

Plus points

  • It has the best range of survey software that is available in a standard commercial package. It has a very wide range of high quality and up-to-date statistical software.
  • Its output is clear and easy to read. Its statistical procedures are very well documented in manuals.
  • It can produce high quality graphics.
  • There is a wide range of user-written software available (see below).

Minus points

  • Its main interface, with a single command line, seems a bit clumsy, but running from Do files works well.

D.5 Methods for complex surveys

Stata has a range of commands for complex surveys that are documented in its manual ‘Survey Data’. These include methods for means, proportions, tables and regression (including logistic and Poisson regression), among others. The manual explains the background carefully.

Linearisation methods and replication methods (jackknife and BRR; see theory section 4.3) are available as options with the standard commands. Design effects are available via the post-estimation command estat effects. An excellent guide on how to use Stata survey methods can be found at the UCLA site.
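As a rough illustration of what a linearisation variance and a design effect measure, here is a minimal sketch in Python rather than Stata. It makes some simplifying assumptions: PSUs are treated as sampled with replacement within strata, there is no finite population correction, and the simple-random-sample variance is a plain weighted approximation that may differ in detail from the formula Stata uses. All function names and data are invented for the example.

```python
def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def linearised_variance(y, w, stratum, psu):
    """Taylor-linearised variance of the weighted mean.

    Assumes PSUs sampled with replacement and at least two PSUs per stratum.
    """
    mean = weighted_mean(y, w)
    total_w = sum(w)
    # Linearised value for each unit, summed to PSU level within strata.
    psu_totals = {}
    for yi, wi, h, j in zip(y, w, stratum, psu):
        by_psu = psu_totals.setdefault(h, {})
        by_psu[j] = by_psu.get(j, 0.0) + wi * (yi - mean) / total_w
    variance = 0.0
    for by_psu in psu_totals.values():
        n_h = len(by_psu)
        centre = sum(by_psu.values()) / n_h
        # Between-PSU variation within this stratum.
        variance += n_h / (n_h - 1) * sum((z - centre) ** 2 for z in by_psu.values())
    return variance


def srs_variance(y, w):
    """Approximate variance of the mean under simple random sampling."""
    n = len(y)
    mean = weighted_mean(y, w)
    s2 = n / (n - 1) * sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / sum(w)
    return s2 / n


def design_effect(y, w, stratum, psu):
    """Ratio of the design-based variance to the simple-random-sample variance."""
    return linearised_variance(y, w, stratum, psu) / srs_variance(y, w)


# Toy data: four PSUs of two people each, with identical answers inside a PSU.
y = [1, 1, 2, 2, 3, 3, 4, 4]
w = [1.0] * 8
stratum = [1] * 8
psu = [1, 1, 2, 2, 3, 3, 4, 4]

deff = design_effect(y, w, stratum, psu)
print(round(deff, 3))
```

In this toy data set the answers are identical within each PSU, so clustering inflates the variance and the design effect comes out at about 2.33. If every respondent were a PSU in their own right, the same calculation would give a design effect of 1.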

D.6 Contributed package for survey analysis by replication methods

The svr package for version 8 provides Stata routines to analyze complex survey data using replication methods. It was released in June 2004 and its author is Nick Winter of the University of Virginia.

Links to Stata programs from the author’s web site provide a summary.

The available methods are balanced repeated replication (BRR), and several versions of survey jackknife (JK1, JK2, and JKn).

The package allows both the calculation of replicate weights and the analysis of data sets that are provided with replicate weights.
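The two steps, calculating replicate weights and then analysing with them, can be sketched for the simplest case (JK1, the unstratified delete-one-PSU jackknife). The sketch is in Python rather than Stata, and it is not the svr code: the function names and data are invented for illustration.

```python
def jk1_replicate_weights(weights, psu):
    """Build one set of replicate weights per PSU (JK1).

    In each replicate one PSU is dropped (weight 0) and the remaining
    weights are scaled up by n / (n - 1), where n is the number of PSUs.
    """
    psus = sorted(set(psu))
    n = len(psus)
    replicates = []
    for dropped in psus:
        replicates.append(
            [0.0 if p == dropped else wi * n / (n - 1) for wi, p in zip(weights, psu)]
        )
    return replicates


def weighted_mean(y, w):
    """Weighted mean of y with weights w."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def jk1_variance(y, weights, replicates):
    """JK1 variance of the weighted mean from a set of replicate weights."""
    full_sample = weighted_mean(y, weights)
    n = len(replicates)
    # Spread of the replicate estimates around the full-sample estimate.
    spread = sum((weighted_mean(y, rw) - full_sample) ** 2 for rw in replicates)
    return (n - 1) / n * spread


# Toy data: four PSUs of two people each.
y = [1, 1, 2, 2, 3, 3, 4, 4]
weights = [1.0] * 8
psu = [1, 1, 2, 2, 3, 3, 4, 4]

replicates = jk1_replicate_weights(weights, psu)
variance = jk1_variance(y, weights, replicates)
print(round(variance, 4))
```

A data set supplied with replicate weights already contains the columns that jk1_replicate_weights builds here, so only the second step is needed. JK2, JKn and BRR follow the same pattern but form the replicates differently and use different multipliers in the variance formula.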

Methods of analysis that are covered include:

  • Descriptive statistics (svrmean, svrtotal, svrratio)
  • Correlations (svrcorr)
  • Regression models including logistic and Poisson models and more (svrmodel, svrest)
  • Poststratification and raking

Version 9 includes these methods as part of the main survey capability. The only exception is raking to match more than one total, which cannot be handled by the main package yet.

D.7 Changes from version 8 to version 9

There are quite major changes to the survey commands in Stata 9 compared to Stata 8. A detailed summary can be found at the relevant section of the Stata website.

The basic change is that the survey commands are now prefixes to the equivalent non-survey commands. Replication methods are now part of the main package, rather than available only as a contributed library.

Another feature of version 9 is that a new svyset command automatically clears all previous settings. This would have saved me a lot of mistakes when the original P|E|A|S code was being prepared.

If you have code that worked under version 8, you can still use it if your data set was created under version 8 (true for the P|E|A|S exemplars). For data sets created under version 9, prefix the svyset command with version 8: so that it reads version 8: svyset, followed by the original options.

You may also find the seminar "What's new in Stata 9?" on the UCLA statistical computing site helpful.

peas project 2004/2005/2006