A. The R Package

A.1 Introduction > A.2 Web Links > A.3 Costs and Licensing > A.4 What kind of package is it? > A.5 Methods for Complex Surveys > A.6 User Written Code > A.7 Software for imputation


A.1 Introduction

The R system for statistical computing is supported by an international collaboration of computer scientists and statisticians. It is sophisticated, highly professional software and bears a more than chance resemblance to the commercial package S-PLUS.

Unlike most shareware it is very well supported: I have usually had a response within a few hours of contacting the author of a part of the system for help. It is freely downloadable and can be installed on a PC, a Mac or a UNIX system in less than 5 minutes. All the details are on the web site.

New versions appear regularly. Most of the exemplar code was developed for Version 1.09 on a PC. The latest version (October 2004) is version 2.01, and some of the exemplar code has been tested on it without problems. The new version for PCs has some convenient features, such as a facility to open script files and run code directly from them.

Mini-guides are intended as reminders for people with some experience of R:

mini guide 1
mini guide 2
mini guide 3
mini guide 4
mini guide 5
mini guide 6

The R Guide for novices is intended to give non-R users the basic help they need to analyse surveys.


A.2 Web Links

http://www.r-project.org is the main link to the R project, which in turn links to mirror sites across the world.

There are extensive links to official and contributed documentation on this site.
R for Beginners, by Emmanuel Paradis, in the contributed documentation, is a good introduction for someone who really wants to get to grips with the package.

The textbook Modern Applied Statistics with S-PLUS (Venables, W. and Ripley, B., Springer, 1999) is a comprehensive guide to using S-PLUS/R but does not cover the survey procedures.

There are various R mailing lists, including R-help, and past questions and answers can be searched from the archive page.

The HTML R help can be accessed online at the R Search Site, though it may not always have the latest versions.

The home page for the R survey package gives an overview of the package as well as lots of worked examples and explanations (see below).


A.3 Costs and Licensing

R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License. The license allows you to do many things, including selling the software and selling products that include the software, as detailed in the license conditions.

The R package compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

A.4 What kind of package is it?

It is an interactive, object-oriented package that supports many kinds of data structures. It is an interpreted computing language in which you manipulate data objects in a workspace, e.g. data frames, matrices, multi-dimensional arrays and survey design objects. Many statistical procedures are available in the base package, and even more in the contributed libraries, which can be quickly and easily downloaded from the web as you work.
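
As a rough illustration of what working with objects in a workspace looks like, the sketch below creates a data frame, a matrix and an array using base R only. All object names and values are made up for illustration and do not come from the exemplars.

```r
# A data frame: one row per respondent, columns of mixed type
# (all values here are invented for illustration)
respondents <- data.frame(
  id     = 1:4,
  region = factor(c("north", "south", "south", "north")),
  income = c(21000, 35000, 28000, 40000)
)

# A matrix and a three-dimensional array
m <- matrix(1:6, nrow = 2, ncol = 3)
a <- array(1:24, dim = c(2, 3, 4))

# Objects are manipulated by name and results are stored as new objects
mean_by_region <- tapply(respondents$income, respondents$region, mean)

class(respondents)   # the kind of object
dim(a)               # the dimensions of the array
mean_by_region       # mean income within each region
ls()                 # lists the objects currently in the workspace
```

Survey design objects, described in section A.5, are handled in the same way: they are created once and then passed by name to the analysis functions.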

Plus points
  • Its graphics facilities are excellent and extremely flexible.
  • It has a vast array of statistical procedures
  • It has a very wide range of high quality and up-to-date statistical software.
  • Its statistical procedures are very well documented
  • One can carry out very complicated operations on the R data structures
  • Online help is convenient
  • Thomas Lumley (author of the survey package) fixes things that you suggest in no time at all
Minus points
  • To someone used to a traditional package like SPSS or SAS, the R package may seem very confusing and the learning curve will be steep
  • You can get confused by the sheer complexity of the procedures you can carry out
  • It is designed to be interactive, and this makes it a challenge to keep a good audit trail for practical survey work. It is not well suited to the kind of detailed documentation that is required for survey work
  • While you are working, the data is held in memory, and this may be a limitation, especially on older machines.
A.5 Methods for complex surveys

General and Authorship

A survey analysis package has recently been added to R. It has been written by Thomas Lumley of the University of Washington. He has been very helpful in adapting the package following our suggestions.

The survey package seems to be convenient from the point of view of an experienced R user. We have developed the exemplars with the version current in the first half of 2004. The current version is 3.0 (May 2005). Details of improvements from earlier versions can be found at the NEWS link from the survey package home page. Further updates are expected.

The new version 3.2 is further improved, easier to use, and includes generalised calibration methods.

What do they do?

The software covers a wide range of survey analysis methods. In particular:
- means, quantiles, variances, tables, ratios, totals
- generalised linear models (e.g. linear regression, logistic regression, Poisson models, etc.)
- proportional hazards models

There is also a new set of procedures for non-response weighting, but we have not had time to investigate them yet.
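
The estimation functions listed above can be sketched as follows. This is a minimal example, assuming a recent version of the contributed survey package is installed; it uses the `api` example data sets that ship with that package rather than any of the exemplar data, and the object names (`dstrat`, `fit_lin`, `fit_log`) are illustrative only.

```r
library(survey)   # contributed package; install.packages("survey") if needed
data(api)         # example data sets shipped with the survey package

# Stratified sample of schools: strata, weights and
# finite population correction are declared once in the design object
dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

svymean(~api00, dstrat)                # mean
svytotal(~enroll, dstrat)              # total
svyratio(~api.stu, ~enroll, dstrat)    # ratio
svytable(~sch.wide + stype, dstrat)    # weighted table

# Generalised linear models: linear and logistic regression
fit_lin <- svyglm(api00 ~ ell + meals, design = dstrat)
fit_log <- svyglm(sch.wide ~ ell + meals, design = dstrat,
                  family = quasibinomial())
summary(fit_lin)
```

Every analysis function takes the design object as an argument, so the sampling information does not have to be repeated for each estimate.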

These methods are implemented with standard errors from either

  • Taylor linearisation, in functions such as svydesign, svymean and svyglm
  • or replication methods, such as balanced repeated replication and the jackknife, through functions such as svrepdesign, svrepmean and svrepglm

In earlier versions these were separate functions. More recent versions (2.9 and later) use the same functions for both methods, and R recognises which one to use from the properties of the design you have declared.
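
The way the same function serves both methods can be sketched as below. This assumes a recent version of the survey package and again uses its bundled `api` example data; the choice of a jackknife (`type = "JK1"`) is only one of the available replication schemes and is picked here for illustration.

```r
library(survey)
data(api)

# Taylor linearisation: declare the design with svydesign()
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Replication: convert the same design to jackknife replicate weights
rclus1 <- as.svrepdesign(dclus1, type = "JK1")

# The same function is called on both designs;
# the variance method follows from the class of the design object
svymean(~api00, dclus1)
svymean(~api00, rclus1)
```

The point estimates agree because the sampling weights are the same; only the way the standard error is computed differs.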

(Click here for more info on standard errors when designing surveys.)

Documentation for the survey package

The main source is Thomas Lumley's home page.

Manual pages for the survey package are available while you work and can also be read online. A paper describing the package is Lumley, T. (2004), "Analysis of complex survey samples", Journal of Statistical Software 9(8).


A.6 User Written Code

The R language allows experienced users to write their own functions that can be used by others. Some have been written for this project. To use them:

Save the file (myRfunctions.R) somewhere on your machine. You can read it with a text editor.

When running R, go to the File menu > Source R code, locate the file ‘myRfunctions.R’ and click on it to read it into R.

The functions are then available for you to use, and you only need to do this once.
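
The menu steps above have a command-line equivalent, the base R function `source()`. The sketch below writes a small stand-in file to a temporary directory so that it can be run anywhere; the function it defines (`weighted_mean_demo`) is hypothetical and is not one of the project's functions. In practice you would pass `source()` the location where you saved myRfunctions.R.

```r
# Write a stand-in for myRfunctions.R to a temporary location.
# weighted_mean_demo is a hypothetical function used only for illustration.
path <- file.path(tempdir(), "myRfunctions.R")
writeLines(c(
  "weighted_mean_demo <- function(x, w) {",
  "  sum(x * w) / sum(w)",
  "}"
), path)

# Equivalent of the menu item: read the file into the current session
source(path)

# The function is now available in the workspace
weighted_mean_demo(c(10, 20, 30), w = c(1, 1, 2))
```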

They consist of modified versions of the files for the survey package. The syntax is the same as for the original R survey functions.

While we were writing the material for the P|E|A|S site we were in contact with Thomas Lumley, the author of the survey package, about various aspects of the R survey functions. He has now incorporated the ideas in our local functions into the newest version of the survey package, so they should no longer be required. This includes the method we describe for estimating percentiles, which is now included in the R function svyquantile.



Local function - Original function - Description

svymean.deff - svymean - Calculates the mean of a survey variable and also prints the design effect. Now obsolete, as the survey package has this as an option.

svrepmean.deff - svrepmean - As above, for a survey with replicate weights.

my.svyquantile - svyquantile - Uses a different, simpler method of quantile estimation that is more suited to large data sets (more about percentiles).

my.svrepquantile - svrepquantile - As above, for a survey with replicate weights.
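
Since the survey package now covers what these local functions did, the equivalent calls can be sketched as below. This assumes a recent version of the survey package and uses its bundled `api` example data; the quantile method used by current versions of `svyquantile` is not necessarily the one implemented in the local functions.

```r
library(survey)
data(api)
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Mean with design effect, in place of the local svymean.deff
m <- svymean(~api00, dclus1, deff = TRUE)
m
deff(m)   # extract the design effect on its own

# Quantiles, in place of the local my.svyquantile
svyquantile(~api00, dclus1, quantiles = c(0.25, 0.5, 0.75))
```

The same calls work on a design with replicate weights, so separate svrep versions are not needed.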


A.7 Software for imputation

Several user-supplied R packages support carrying out imputation and analysing imputed data. These are summarised in the pages on imputation (section 6.6).

P|E|A|S project 2004/2005/2006