Confidence intervals for the Scottish Health Survey

It is good practice to present confidence intervals alongside the point estimate wherever an inference is being made from a sample to a population.

Published: 14 Jan 2020
When reporting results, it is good practice to present confidence intervals alongside the point estimate wherever an inference is being made from a sample to a population.

What is a confidence interval?

A confidence interval gives a range of values around a sample proportion or sample mean within which the true proportion or mean in the population is likely to lie. This enables us to judge the precision of the results obtained from our sample as estimates of the true population values.

Confidence intervals usually appear as: estimate +/- margin of error

For example, say that when analysing a sample we produce an estimate of 23%, and that we calculate a margin of error of +/- 4.2 percentage points for this estimate. Our confidence interval is then (23% - 4.2%) to (23% + 4.2%), ie 18.8% to 27.2%.

Note that the margin of error covers only random sampling error. It does not account for undercoverage or non-response. Outliers can also have a big impact on the confidence interval, because the interval is calculated from the mean and standard deviation, both of which are sensitive to extreme values.

Why do we need confidence intervals?

Say we wish to know the proportion (ie percentage) of adults who are overweight, or the mean BMI of adults in the Scottish population. To obtain these figures we analyse the relevant variables from our sample (eg the Scottish Health Survey 2008 dataset) of the population and produce a sample proportion and sample mean.

However, we know when we produce these proportions/means from our sample that they are probably not exactly the same as the true proportion/mean actually existing in the population (in this case the Scottish population).

Why is this?

Every time we estimate a population statistic from a sample we will produce a slightly different result. This is due to sampling variability: if we were to draw our sample at random from the population many times, the samples we obtained would all differ slightly from each other (since they would contain different respondents). The sample proportions and sample means produced from each of these samples would therefore also differ, although the variations may (or may not) be small.
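Sampling variability can be illustrated with a small simulation. The sketch below is only an illustration: the population proportion (65%) and sample size (1,000) are hypothetical values chosen for the example, not Scottish Health Survey figures, and it assumes simple random sampling.

```python
import random


def sample_proportion(true_p, n, rng):
    """Simulate n respondents and return the share who have the characteristic."""
    hits = sum(1 for _ in range(n) if rng.random() < true_p)
    return hits / n


rng = random.Random(2008)  # fixed seed so the run can be repeated
TRUE_P = 0.65              # hypothetical population proportion (assumption)
N = 1000                   # hypothetical sample size (assumption)

# Draw five independent samples from the same population.
estimates = [sample_proportion(TRUE_P, N, rng) for _ in range(5)]

for i, est in enumerate(estimates, start=1):
    print(f"sample {i}: {est:.1%}")
```

Each run of the loop draws a fresh sample from the same population, so the printed proportions cluster around 65% without matching it exactly.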

The true proportion or mean for the population does exist and is a fixed number, but we do not know exactly what it is, even though we may have a good estimate of it from our sample. We therefore need a measure of how certain our results are, which will also enable us to make inferences about the population we have sampled. Confidence intervals provide this: they give a plausible range of values for the true proportion or mean in the population.

What is a confidence level?

We need to know how likely it is that this range (confidence interval) about the sample proportion/mean will actually contain the true - but unknown - proportion/mean value in the population. This probability is called the confidence level. It is usual practice to create confidence intervals at the 95% level. This means that if we drew many samples and calculated a confidence interval from each, about 95% of those intervals would contain the true value found in the population.
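The meaning of the 95% level can be checked by simulation. The following is a minimal sketch, assuming simple random sampling, a hypothetical true proportion of 30%, a hypothetical sample size of 500, and the usual normal-approximation interval for a proportion. It counts how often the interval contains the true value.

```python
import math
import random


def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 gives 95%)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def coverage(true_p, n, repeats, rng):
    """Share of simulated samples whose interval contains true_p."""
    covered = 0
    for _ in range(repeats):
        hits = sum(1 for _ in range(n) if rng.random() < true_p)
        lower, upper = proportion_ci(hits / n, n)
        if lower <= true_p <= upper:
            covered += 1
    return covered / repeats


rng = random.Random(1)  # fixed seed so the run can be repeated
# Hypothetical inputs: true proportion 30%, sample size 500, 2,000 repeats.
share = coverage(0.3, 500, 2000, rng)
print(f"intervals containing the true value: {share:.1%}")
```

Under these assumptions the printed share should be close to 95%, although it will not be exactly 95% in any finite simulation.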

If we increase our confidence level (eg to 99%), then we increase the size of the range about our estimate. For example, say we calculate a confidence interval of an estimate and obtain +/- 2% at the 95% level. For the same estimate this range increases to +/- 3.7% at the 99% level. A smaller range about our estimate (ie a smaller margin of error) is more useful when making inferences about our population, so there is a trade-off between confidence and precision.

Can we reduce the width of our confidence interval to increase precision?

If our confidence interval is too wide (ie the margin of error is large), then there are a number of methods we can use to reduce it, thereby improving the precision of our results. For example, we could increase our sample size (if possible), use a lower level of confidence (eg change from 99% to 95%) or reduce the standard deviation. See below for more details on sample size and confidence intervals.
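The effect of each of these three options can be seen in the standard large-sample formula for the margin of error of a mean, margin = z x standard deviation / sqrt(n). The sketch below assumes that formula and simple random sampling; the standard deviations and sample sizes are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist


def margin_of_error(sd, n, confidence_level=0.95):
    """Large-sample margin of error for a mean: z * sd / sqrt(n)."""
    # Two-sided multiplier, eg about 1.96 for 95% and about 2.576 for 99%.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return z * sd / math.sqrt(n)


# Hypothetical baseline: sd of 5.0, sample of 400, 99% confidence level.
baseline = margin_of_error(sd=5.0, n=400, confidence_level=0.99)
larger_sample = margin_of_error(sd=5.0, n=1600, confidence_level=0.99)
lower_level = margin_of_error(sd=5.0, n=400, confidence_level=0.95)
smaller_sd = margin_of_error(sd=4.0, n=400, confidence_level=0.99)

print(f"baseline:            +/- {baseline:.2f}")
print(f"four times the n:    +/- {larger_sample:.2f}")
print(f"95% instead of 99%:  +/- {lower_level:.2f}")
print(f"smaller sd:          +/- {smaller_sd:.2f}")
```

Because the sample size enters through a square root, quadrupling the sample only halves the margin of error in this sketch.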

What impact does sample size have on confidence intervals?

It is important to recognise that it is our sample size that influences the margin of error (and therefore the width of the confidence interval). The true size of the population does not affect it, provided the sample is only a small fraction of the population. Confidence intervals from large samples tend to be quite narrow, resulting in more precise estimates, whereas confidence intervals from small samples tend to be wide, producing less precise results. However, beyond a certain sample size the gain in precision from increasing the sample further becomes small. For example, increasing the sample size from 100 to 500 reduces the margin of error from +/- 9.8 to +/- 4.3 percentage points, whereas increasing the sample size further to 1,000 only reduces it to +/- 3.1 percentage points. A more detailed discussion on the impact of sample size on confidence intervals is available.

Sample sizes for many Government surveys need to be large so that smaller sub-groups of interest, eg Health Boards, still have a sufficient sample size for analysis. The large sample collected is not necessarily required for the Scotland level estimates, but rather for the disaggregated estimates, so that these can be produced with acceptable levels of precision.
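The diminishing return from larger samples can be reproduced with a short calculation. This sketch assumes a 95% confidence level, simple random sampling and an estimated proportion of 50% (the value that gives the widest interval); these are assumptions made for the illustration, and small differences from the figures quoted above can arise from rounding or from the exact inputs used.

```python
import math


def margin_pct(n, p=0.5, z=1.96):
    """Margin of error, in percentage points, for a proportion p from n cases."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


for n in (100, 500, 1000):
    print(f"n = {n:>4}: +/- {margin_pct(n):.1f} percentage points")
```

The first increase in sample size (100 to 500) removes more than half of the margin of error, while the second (500 to 1,000) removes only about one further percentage point.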

When should confidence intervals be used?

Ideally confidence intervals should be presented alongside the sample mean or sample proportion produced, wherever an inference is being made from a sample to a population.

How to use confidence intervals to estimate change

Confidence intervals can be used to help make inferences about any changes in the population, for example, changes over a time period.

For example, say an estimate and its corresponding confidence interval are calculated in 2008, and the same estimate is produced again in 2010. We can use these two estimates to determine whether any significant change has occurred over time. If the confidence intervals of the two estimates do not overlap then there is a statistically significant difference between the two estimates. However, if they do overlap, it does not necessarily mean there is no significant difference. A more exact approach is to calculate the ratio of the two estimates, or the difference between them, and then produce a corresponding confidence interval for this ratio or difference. If the interval for the difference excludes zero (or the interval for the ratio excludes one), the change is statistically significant.
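The point about overlapping intervals can be illustrated with a worked sketch. The proportions (25% and 29%) and sample sizes (1,500 each) below are hypothetical, and the calculation assumes two independent simple random samples and the normal approximation; it ignores any design factor.

```python
import math


def proportion_ci(p, n, z=1.96):
    """95% normal-approximation interval for a single proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def difference_ci(p1, n1, p2, n2, z=1.96):
    """95% interval for p2 - p1, assuming two independent samples."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p2 - p1
    return diff - z * se, diff + z * se


# Hypothetical estimates for the two survey years.
ci_2008 = proportion_ci(0.25, 1500)
ci_2010 = proportion_ci(0.29, 1500)
diff_lower, diff_upper = difference_ci(0.25, 1500, 0.29, 1500)

intervals_overlap = ci_2008[1] > ci_2010[0]
change_significant = not (diff_lower <= 0 <= diff_upper)

print(f"2008: {ci_2008[0]:.1%} to {ci_2008[1]:.1%}")
print(f"2010: {ci_2010[0]:.1%} to {ci_2010[1]:.1%}")
print(f"difference: {diff_lower:.1%} to {diff_upper:.1%}")
print(f"intervals overlap: {intervals_overlap}")
print(f"change significant: {change_significant}")
```

With these inputs the two individual intervals overlap slightly, yet the interval for the difference excludes zero, which is why comparing the difference directly is the more exact approach.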

A more detailed explanation on using confidence intervals to estimate change is available.

How does a complex survey design affect confidence intervals?

The Scottish Health Survey 2008 used a clustered, stratified multi-stage sample design. In addition, weights were applied when obtaining survey estimates. One of the effects of using the complex design and weighting is that standard errors for survey estimates are generally higher than the standard errors that would be derived from an unweighted simple random sample of the same size. The true standard errors of the complex design are therefore calculated by multiplying the standard error (of an estimate from a simple random sample) by the design factor (deft).

The ratio of the standard error of the complex sample to that of a simple random sample of the same size is known as the design factor.

The 95% confidence interval of a complex survey design is equal to:

p +/- (1.96 x true standard error)

where

true standard error = design factor x standard error of a simple random sample; and

p = the point estimate, which is the percentage or proportion estimated from our sample (or sample mean)
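The formula above can be sketched in a few lines for a proportion. The estimate (23%), sample size (1,000) and design factor (1.2) used here are hypothetical values for illustration only; real design factors should be taken from the survey documentation or from software that accounts for the design.

```python
import math


def complex_design_ci(p, n, deft, z=1.96):
    """95% interval for a proportion p, inflating the SRS standard error by deft."""
    srs_se = math.sqrt(p * (1 - p) / n)  # standard error of a simple random sample
    true_se = deft * srs_se              # true standard error of the complex design
    return p - z * true_se, p + z * true_se


# Hypothetical inputs: estimate of 23%, 1,000 respondents, design factor 1.2.
srs_interval = complex_design_ci(0.23, 1000, deft=1.0)
complex_interval = complex_design_ci(0.23, 1000, deft=1.2)

print(f"simple random sample: {srs_interval[0]:.1%} to {srs_interval[1]:.1%}")
print(f"complex design:       {complex_interval[0]:.1%} to {complex_interval[1]:.1%}")
```

A design factor above 1 widens the interval in proportion: here the margin of error is 1.2 times what a simple random sample of the same size would give.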

Users of SAS, and users of SPSS who also have the Complex Samples Module add-on, can quickly produce the confidence intervals of an estimate from a complex survey design.

How should confidence intervals be reported?

When presenting the confidence interval of an estimate, the level of confidence should be clearly stated. The upper and lower limits of the interval should be clearly labelled, or expressed as a range (eg 18.8% to 27.2%). In graphs or charts, the confidence limits are usually presented as bars or whiskers (see link below).

How to present confidence intervals in Excel

Useful guidance is available on how to present confidence intervals of estimates in chart format within Excel (below).

Useful links:

'How good are our estimates' (Scottish Government)

'Calculation of Confidence intervals for point estimates and change' (Scottish Government)

'Commonly used public health statistics and their confidence intervals' (Guidance from the Association of Public Health Observatories (APHO) on the calculation of rates, proportions, means, confidence intervals and age-sex standardised rates for the most common statistics used in public health intelligence)

A spreadsheet tool designed to accompany the above 'Commonly used public health statistics and their confidence intervals' gives examples of calculating the statistics and confidence intervals in Excel.

'Representing confidence intervals in Excel' (London Health Observatory)