Weighting


3. Weighting

3.0 Introduction
3.1 Scaling of weights
3.2 Weighted means
3.3 Weighted tables
3.4 Weighted percentiles
3.5 Why weight?
3.6 Calculating Weights
3.7 The effect of weighting on standard errors
3.8 Weighting in regression analyses
3.9 Different kinds of weights


3.0 Introduction

Many surveys come with weights that should be used in the analysis. The purpose of these weights is to allow the analysis to be adjusted to better represent the population. Special procedures are required for making tables of means, counts, percentages, or other survey statistics from data with weights.

For example, suppose we have data from a survey with two strata (rich areas and poor areas) and a sample of exactly 100 households from each stratum. But we might know that in the population there are exactly 2500 households in poor areas and 1200 in rich areas. Our sample would then over-represent the rich areas.

If we just calculated the mean household income of the sample it would be too high. To get an unbiased estimate of the population average household income we need to give the rich areas lower weight than the poor areas.

The way to get the weights in this example is to use the inverse of the sampling fraction in each stratum as a weight.

Quantity	Rich areas	Poor areas	Total
number in sample	100	100	200
number in population	1200	2500	3700
sampling fraction	100/1200	100/2500
weight	1200/100=12	2500/100=25
sum of weights	1200	2500	3700
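
The calculation in the table can be sketched in a few lines of Python. This is only an illustration using the stratum counts given above; the names `sample_counts`, `population_counts` and `design_weights` are made up for this example and do not come from any survey package.

```python
# Stratum counts taken from the table above.
sample_counts = {"rich": 100, "poor": 100}
population_counts = {"rich": 1200, "poor": 2500}


def design_weights(sample, population):
    """Return the inverse of the sampling fraction for each stratum."""
    return {stratum: population[stratum] / sample[stratum] for stratum in sample}


weights = design_weights(sample_counts, population_counts)

# Every sampled household carries its stratum weight, so the sum of the
# weights within a stratum reproduces the population count for that stratum.
sum_of_weights = {s: weights[s] * sample_counts[s] for s in sample_counts}

print(weights)
print(sum_of_weights)
```

Because the weights are inverse sampling fractions, their total over the whole sample equals the population total of 3700 households.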

Part of the ESDS government site has a useful guide to weighting in UK government surveys and a spreadsheet that gives information on what detail is available on weighting for several of the main UK government surveys.


3.1 Scaling of weights

The weights in the example above are calculated as the inverse of the sampling fraction. This means that the numbers in weighted tables will add to the population total. This kind of weight is known as a grossing-up weight.

The weights can be scaled by multiplying by any constant without affecting weighted estimates of means, proportions and percentages in weighted tables. This is known as rescaling the weights.

Common choices of scaling are:

Making weights sum to the population total (grossing-up weights)
Making weights sum to the achieved sample total. This is popular because it means that significance levels and confidence intervals calculated from a weighted analysis using the wrong kind of weight (e.g. WEIGHT in SPSS) will at least be within striking distance of the right answer, though they will NOT be correct and generally will be too narrow. See below for a discussion of different types of weighting.

If you need to calculate grossing-up weights for a survey where another kind of weight has been supplied, simply multiply each weight by the population size and divide it by the sum of the existing weights. An approximation to the population size is often acceptable.
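
As a rough sketch of this rescaling, the Python below multiplies every weight by (population size ÷ sum of existing weights) and checks that the weighted mean is unchanged. The data values, the weights and the population size of 600 are invented purely for illustration.

```python
def rescale(weights, target_total):
    """Multiply every weight by one constant so the weights sum to target_total."""
    factor = target_total / sum(weights)
    return [w * factor for w in weights]


def weighted_mean(values, weights):
    """(sum of observations times weight) divided by (sum of weights)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data: three units whose supplied weights sum to the sample size.
values = [10.0, 20.0, 40.0]
sample_scaled = [0.5, 1.0, 1.5]

# Assumed (approximate) population size of 600.
grossing_up = rescale(sample_scaled, 600)

print(grossing_up)
print(weighted_mean(values, sample_scaled), weighted_mean(values, grossing_up))
```

The two printed means agree, which is the sense in which rescaling does not affect weighted estimates of means.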

Does it matter how I scale the weights?

Usually it doesn't. It will never change your estimates of means, proportions or regression coefficients. If you are using survey software that knows you have probability weights it will not affect the standard errors either, unless you are using finite population correction.

If you use the wrong kind of weights, then scaling will make a difference (see Exemplar 1).

It may also make a difference to the answers you get for a design effect. In SPSS you will only get the correct design effects and design factors if you have grossing-up weights. This is also the default in R, but there is an option that allows you to override this.


3.2 Weighted means (averages): how they are calculated

- A simple mean is just (sum of observations) ÷ (total number).

- A weighted mean is (sum of observations times weight) divided by (sum of weights).

So we can see that an ordinary mean can be thought of as a weighted mean when all the weights are exactly 1.000.

In the example above, suppose the mean household income in the poor areas was £12,000 and that in the rich areas was £25,000. Then the weighted mean would be

[100 x £12,000 x (weight = 25) + 100 x £25,000 x (weight = 12)] ÷ (100 x 25 + 100 x 12) = £16,216.22.

An unweighted mean here would just be £18,500, so we can see that the weighting has corrected the fact that the sample had too many rich households.
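
The same arithmetic can be checked with a short Python sketch. It makes the simplifying assumption that every sampled household has exactly its stratum's mean income, which is enough to reproduce the two means; the function name `weighted_mean` is just a label chosen for this example.

```python
def weighted_mean(values, weights):
    """(sum of observations times weight) divided by (sum of weights)."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)


# Assumption for illustration: each household has its stratum's mean income.
incomes = [12_000] * 100 + [25_000] * 100   # 100 poor-area, 100 rich-area households
weights = [25] * 100 + [12] * 100           # grossing-up weights from section 3.0

unweighted = sum(incomes) / len(incomes)
weighted = weighted_mean(incomes, weights)

print(unweighted)
print(round(weighted, 2))
```

Setting every weight to 1 in `weighted_mean` gives back the simple mean, which is the point made above about ordinary means.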


3.3 Weighted tables

In calculating ordinary tables we put counts into the boxes for the table and then get percentages for the rows or the columns. For a weighted table we simply put the sum of the weights in the boxes instead and calculate the percentages of rows and columns from these.

The actual sums of the weights do not make much sense unless the weights are grossing-up weights, so that they represent the population.

In the example above, suppose that we found that 60 of the 100 households in the rich areas had central heating, while only 32 of those in the poor area did.

Then the weighted table would look like this:-

Heating status	Rich area	Poor area	Both areas
Central Heating	60 x 12 = 720	32 x 25 = 800	1520
No Central Heating	40 x 12 = 480	68 x 25 = 1700	2180
All	1200	2500	3700
Base	100	100	200

Note that we add a row at the bottom of the table to show the actual numbers (referred to as the base) from which the weighted table is calculated.

These are grossing-up weights, so the numbers in the table estimate the numbers in the population. We can calculate our estimate of the proportion of central heating in the whole area as 1520 ÷ 3700 = 41.1%, closer to the poor area percentage (32%) than that for the rich area (60%).

We can calculate row or column percentages from any such weighted tables. When we do this it is good practice to indicate how many cases (here households) have contributed to the analysis. This is known as the base for the row or column.

For example, suppose we are interested to know the proportion of central heating installations in different kinds of area, we would get this table:-

Heating status	Rich Area	Poor Area	Both Areas	Base
central heating	47%	53 %	100%	92
no central heating	22%	78%	100%	108

As in the weighted table above we have information on the base, here as a final column. This is good practice for presenting the results of a survey since it can point the reader to subsets of the data where the numbers are too small to give reliable results.
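
Both tables can be reproduced with a small Python sketch that sums weights per cell and keeps the unweighted counts as the base. The data are the counts and weights from the central heating example; the variable and function names are illustrative only.

```python
# (heating status, area, unweighted count, weight) from the example above.
cells = [
    ("central heating", "rich", 60, 12),
    ("central heating", "poor", 32, 25),
    ("no central heating", "rich", 40, 12),
    ("no central heating", "poor", 68, 25),
]

weighted = {}  # sum of weights in each cell of the table
base = {}      # unweighted number of households behind each row
for status, area, count, weight in cells:
    weighted[(status, area)] = count * weight
    base[status] = base.get(status, 0) + count


def row_percentages(status):
    """Percentage of a row's weighted total that falls in each area."""
    row = {a: v for (s, a), v in weighted.items() if s == status}
    row_total = sum(row.values())
    return {area: 100 * value / row_total for area, value in row.items()}


total = sum(weighted.values())
heated = sum(v for (s, _), v in weighted.items() if s == "central heating")
overall_pct = 100 * heated / total

print(round(overall_pct, 1))
for status in base:
    print(status, row_percentages(status), "base:", base[status])
```

Reporting `base` next to each row of percentages follows the practice recommended above.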

An adjustment for the survey design is also required for chi-squared tests. More details of this can be found in section 5.6 of the pages on standard errors for surveys.


3.4 Weighted Percentiles

In calculating percentiles of the distribution from a weighted sample, we need to allow for the fact that the units in the sample represent different proportions of the population. The procedure is to rank the members of the sample by increasing values of the variable for which the population percentiles are required, and then to calculate the cumulative sum of the weights along the ranked sample. The point at which the cumulative sum of the weights reaches half of the total for the sample is the median. The point at which it reaches 10% of the total is the 10th percentile, and so on.
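
A minimal Python sketch of this rank-and-accumulate procedure follows. It assumes the simplest convention, returning the first observed value at which the cumulative weight reaches the required share, with no interpolation between values; statistical packages may use other conventions. The data are invented for illustration.

```python
def weighted_percentile(values, weights, pct):
    """First value at which the cumulative weight reaches pct% of the total weight."""
    ordered = sorted(zip(values, weights))      # rank by increasing value
    threshold = sum(weights) * pct / 100
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight                    # cumulative sum of the weights
        if cumulative >= threshold:
            return value
    return ordered[-1][0]


# Hypothetical sample: the largest value represents most of the population.
values = [30, 10, 40, 20]
weights = [1, 1, 7, 1]

weighted_median = weighted_percentile(values, weights, 50)
unweighted_median = weighted_percentile(values, [1, 1, 1, 1], 50)
tenth_percentile = weighted_percentile(values, weights, 10)

print(weighted_median, unweighted_median, tenth_percentile)
```

In this invented sample the weighted median is pulled up to the heavily weighted value, whereas treating all units equally gives a much lower median.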

The easiest way to get a standard error or confidence interval for a weighted percentile is to calculate a standard error or confidence interval for the survey estimate of a percentage and to map this back to the values of the variable at the limits.



3.5 Why Do We Need to Weight?

Often weighting is needed in surveys because of stratification with disproportionate sampling. Another important reason for weighting is to adjust for survey non-response. For a comparison with other, non-survey, reasons for weighting see below.

But weighting may be required for other sample designs that use non-equal probabilities of selection. In many surveys some units are, usually for reasons of convenience, given a higher probability of selection than others. This is dealt with in the analysis by weighting the data. Examples include:

selection of individuals within households
selection of events

1. Selection of individuals within households
It is usual in surveys of the general population to select an equal probability sample of addresses from the Postcode Address File and then to:

enumerate the households at that address and select all, or a sub-sample of households, and
to enumerate the adults within each household and select just one for interview.

This results in adults from larger households being under-sampled relative to adults from smaller households. It can also lead to households from multi-household addresses being under-sampled.

2. Selection of events
In some surveys the unit of analysis is an event rather than an individual. For instance, in the Health Survey for England, some analysis is done on the accidents that individuals experience, where the unit of analysis is an accident rather than the individual having the accident.

What often happens in these cases is that, for each individual in the survey, the events of interest are enumerated and a random sub-sample of events per person selected. Details are then only collected on the sub-sample of events. This tends to mean that the probability of selection for each event is smaller for individuals with a larger number of events than for those with a smaller number of events.

Why non-equal probabilities of selection are used:
Survey sample designers would probably avoid non-equal probabilities of selection that are not part of the stratification if they could. The reason they can’t be avoided is usually because the units on the sampling frame do not match one-to-one with the units of analysis.

Instead the analysis units cluster within the frame units. For instance, individuals cluster within households, and households cluster within addresses, yet ‘address’ is often the sampling frame unit. Taking the example of selecting individuals from a sampling frame of addresses, there are only a limited number of options for selecting a strictly equal probability sample of individuals:

If you have a count of the number of individuals per address then you can over-sample ‘large’ addresses in proportion to their size and then select one individual per address. The under-sampling at the second stage then cancels out the over-sampling at the first stage. In practice this approach is not an option in the UK because the Postcode Address File (PAF) does not have a count of individuals.
Select addresses with equal probability and then select all individuals at that address for the survey. This approach is sometimes used in surveys, but it has some serious problems attached to it. Firstly, in large households the survey can become very burdensome. Secondly, depending upon the subject of the survey, the responses of individuals from the same household may be highly correlated, in which case within-household clustering effects will come into play.

So, rather than accept these problems, survey designers often opt for the alternative of selecting addresses with equal probability and then selecting just one individual per household and then weighting the data to deal with the non-equal probabilities of selection.


3.6 Calculating the Weights

If survey units are selected with non-equal probabilities of selection then the data needs to be weighted. The weights are calculated in ratio to the inverse of the probability of selection.

For example, if survey unit X has half the chance of selection that survey unit Y has, then survey unit X will be given a weight twice as large as that of Y.

As an example, suppose a survey is designed where households are selected with equal probability and one adult is selected per household. Then the survey may find that, say, 38% of households have just one adult, 51% have two adults, and so on (see the second column of the table below).

Looking at the distribution in terms of the adults rather than the households, this means that each 100 households covers about 175 adults (column 3).

Yet only 100 of these 175 will be selected for the survey (column 4). To weight these 100 so that they are representative of the 175, the individuals selected have to be weighted by the household size (column 5).

Household size	Households per 100	Adults	Adults selected	Weight
1	38	38	38	1
2	51	102	51	2
3	9	27	9	3
4	2	8	2	4
Total	100	175	100
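
The table can be rebuilt with a short Python sketch. It uses only the household distribution shown above and computes each weight as (adults represented) ÷ (adults selected), which is the household size when one adult is selected per household. The variable names are illustrative.

```python
# Households per 100 by number of adults, from the table above.
households_per_100 = {1: 38, 2: 51, 3: 9, 4: 2}

# Adults covered by those households (column 3).
adults = {size: size * n for size, n in households_per_100.items()}

# One adult is selected per household (column 4).
selected = dict(households_per_100)

# Weight = adults represented per adult selected (column 5).
weights = {size: adults[size] / selected[size] for size in selected}

weighted_total = sum(weights[size] * selected[size] for size in selected)

print(weights)
print(sum(adults.values()), sum(selected.values()), weighted_total)
```

After weighting, the 100 selected adults add up to the 175 adults they are meant to represent.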


3.7 The effect of weighting on the precision of estimates

In theory, weighting, by itself, can either improve the precision of survey estimates or make them worse. What happens to the precision of estimates is governed by how the outcome being measured relates to the weights.

If there is no relationship between the outcome being measured and the probability of selection into a survey then weighting will, on average, decrease precision.

We can get a rough guide to how large this effect will be by calculating the average of the squared weights, divided by the square of the average weight. For example, suppose a survey samples equal numbers of two different kinds of household, and one has a weight of 1, whereas a smaller fraction of the other type of household was taken so it requires a weight of 3.

The average squared weight is then (1+9) / 2 = 5 and the average weight is (1+3) / 2 = 2. So the design effect becomes 5 / (2 x 2) = 1.25 and its square root, the design factor, is 1.12. This means that the standard error of estimates might increase by 12% and the survey has the precision equivalent to a simple unweighted sample with a smaller sample size equal to around 80% of the sample.
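
This rough guide is easy to compute directly. The Python sketch below applies it to a hypothetical sample of 100 households, half with weight 1 and half with weight 3; the function name `weighting_design_effect` is made up for this example. The square root of 1.25 is about 1.12, and 1 ÷ 1.25 gives the 80% effective sample size.

```python
import math


def weighting_design_effect(weights):
    """Average of the squared weights divided by the square of the average weight."""
    n = len(weights)
    mean_of_squares = sum(w * w for w in weights) / n
    mean_weight = sum(weights) / n
    return mean_of_squares / mean_weight ** 2


# Hypothetical sample: equal numbers of households with weight 1 and weight 3.
weights = [1] * 50 + [3] * 50

deff = weighting_design_effect(weights)     # design effect
deft = math.sqrt(deff)                      # design factor
effective_n = len(weights) / deff           # equivalent unweighted sample size

print(deff, round(deft, 3), effective_n)
```

This approximation only applies when the outcome is unrelated to the weights; it ignores any gains or losses from stratification and clustering.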

But we can also get improvements from weighting. The commonest situation for this is when you are interested in estimating a proportion of something that is a rare event. Then the precision for the overall estimate of the numbers and proportions will be increased if the sampling fraction can be increased where the event is more common.

For example, if you wanted to estimate the number and percentage of high performance cars in a town, it would be best to take a sample that had a low sampling fraction in poorer areas, since very few people on low incomes would be expected to own such cars.

It is also beneficial to reduce the sampling fraction when there is an event that occurs for nearly every member of a stratum. See exemplar 4, section 4.5.1 for an example where a larger sampling fraction has been taken in this type of stratum with an adverse impact on the population estimates for one particular outcome.

The principle is that sampling fractions should be higher for more variable responses. Both very rare and very common events have low variability so it is enough to get a small sample from such strata.

Where weighting has been introduced to adjust for non-response (see section 5) there are additional issues for the effect of weighting on standard errors. The weights will not have been selected to improve precision. Also, the weights themselves will not be known exactly but will have to be estimated from the data. For both of these reasons it is best to avoid extreme non-response weights that may have an adverse effect on your results. See section 5.3 for some discussion of this.


3.8 Using weighted data in regression analyses

There is an ongoing debate about whether or not survey weights should be used for modelling of survey data. The arguments for and against can be summarised as:

Against using weights:

(i) In regression models weighting the data will only change the regression coefficients if the unweighted model error terms correlate with the weights. But if this is the case then the model must be mis-specified. Proper specification of the model will eliminate the difference between the unweighted and weighted regression coefficients. And since the unweighted estimates usually have the smallest standard errors these should be used in preference to the weighted estimates.

(ii) Some analyses (such as factor analysis in SPSS) do not allow you to specify a survey weight.

For using weights:
(i) Researchers use regression methods for many purposes and it is not always appropriate to use fully specified models. For instance a researcher may wish to develop a prediction model using just a small number of linear terms. In these instances the weighted and unweighted regression coefficients may be very different. In this case it is simpler and safer to assume that the weighted survey model is a better fit to a population model than the unweighted survey model is.

(ii) The arguments about fully specified models may not apply for non-linear models such as logistic regression.

Exemplar 4 illustrates where weighting makes a difference even though the model appears to be fully specified.
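
The point about model specification can be illustrated with a toy Python sketch. It fits a straight line by weighted least squares to two invented data sets: one where the true relationship really is linear, and one where it is curved so that a straight line is mis-specified. The data, the weights and the function `wls_line` are all hypothetical, and the sketch says nothing about standard errors.

```python
def wls_line(x, y, w):
    """Weighted least-squares intercept and slope for y = a + b * x."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y_linear = [1 + 2 * xi for xi in x]     # straight line: the linear model is correct
y_curved = [xi ** 2 for xi in x]        # curve: a straight line is mis-specified

equal = [1.0] * 5                       # unweighted analysis
survey = [1.0, 1.0, 1.0, 5.0, 5.0]      # hypothetical weights related to x

_, slope_linear_unweighted = wls_line(x, y_linear, equal)
_, slope_linear_weighted = wls_line(x, y_linear, survey)
_, slope_curved_unweighted = wls_line(x, y_curved, equal)
_, slope_curved_weighted = wls_line(x, y_curved, survey)

print(slope_linear_unweighted, slope_linear_weighted)
print(slope_curved_unweighted, slope_curved_weighted)
```

With the correctly specified line the weights leave the slope unchanged, while with the curved data the weighted and unweighted slopes differ, which is the situation both sides of the debate are arguing about.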


3.9 Different kinds of weights

There are several different reasons why a statistical analysis needs to adjust for weighting and this means that the data need to be analyzed in different ways. All weighted estimates will give the same answers for means, proportions and regression coefficients. But they give different answers, sometimes very different, for standard errors, intervals and statistical tests.

To tell a statistical package why weighting is being done we refer to different kinds of weight. There are several types of weight that you may come across. The most common ones are:-

Probability weights (either design weights or non-response weights)
Frequency weights
Analytical weights

All of the discussion above has assumed that the weights are probability weights. This is almost always the correct choice for surveys. Design weights are determined by the probability that a unit has been selected as part of the sampling scheme. Non-response weights are determined from the probability of response calculated from data about responding and non-responding units. This is discussed in section 5.3 of the theory section on non-response and an example of one way of doing it is given in Exemplar 5. For most large government surveys the weights provided with the survey are a combination of design weights and non-response weights. The design weights are often modified by a post-stratification procedure that matches the survey totals with population totals such as census data. For a discussion of how design weights differ from non-response weights see the section on non-response.

Frequency weights assume that the data consist of a common response from more than one individual. Thus a frequency weight of 8 would mean that the sample has 8 people in it with the same responses to all the questions in the survey. This can lead to very misleading results if frequency weights are specified in error (see Exemplar 1).

Analytical weights assume that the measurements for units in the survey are subject to different degrees of random error and that the larger the error for a unit the smaller the weight it should contribute to the estimate. But the sample size is taken as the number of units in the survey. They are appropriate for scientific experiments where some measurements are known to be more precise than others. Analytical weights are equivalent to rescaling frequency weights to make them add to the sample size. If what you really want is a probability weight you will not get the ridiculous answers that you can get with frequency weights, but your answer will not be right for a probability weight (see Exemplar 1).
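
To see why the scale of the weights matters when they are read as frequency weights, the Python sketch below computes the standard error of a mean the way a frequency-weighted analysis would, taking the sum of the weights as the number of cases. The data and weights are invented, and the formula is a textbook frequency-weight calculation used here as an assumption; it is not taken from any particular package, and neither result is the correct survey standard error.

```python
import math


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def se_as_frequency_weights(values, weights):
    """Standard error of the mean if each weight is read as a count of identical cases."""
    n = sum(weights)                    # the sum of the weights is treated as the sample size
    mean = weighted_mean(values, weights)
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / (n - 1)
    return math.sqrt(variance / n)


# Hypothetical survey of four units.
values = [12.0, 15.0, 20.0, 30.0]
sample_scaled = [0.5, 0.5, 1.5, 1.5]                # weights sum to the sample size (4)
grossing_up = [w * 250 for w in sample_scaled]      # weights sum to a population of 1000

se_sample_scaled = se_as_frequency_weights(values, sample_scaled)
se_grossing_up = se_as_frequency_weights(values, grossing_up)

print(weighted_mean(values, sample_scaled), weighted_mean(values, grossing_up))
print(round(se_sample_scaled, 3), round(se_grossing_up, 3))
```

The weighted mean is the same under both scalings, but the grossing-up weights make the analysis behave as if there were 1000 observations rather than 4, so the standard error becomes far too small.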

Software packages will often provide weighted analyses for any procedures, but they seldom explain what type of weight has been used. Stata is the exception. The choices are explained in the help section and there are options for probability, frequency and analytical weights. SAS and R assume analytical weights if a simple weight statement is given outside the survey procedures. An SPSS weight statement assumes that a weight is a frequency weight except in its complex surveys module.

peas project 2004/2005/2006