Errors in Statistical Data
In sample surveys there are two types of error that can occur:
- sampling error, which arises because only a part of the population is used to represent the whole population; and
- non-sampling error, which can occur at any stage of a sample survey.
It is important to be aware of these errors so that they can be minimized.
Sampling error
Sampling error is the error that arises because a sample is not perfectly representative of the population from which it is drawn. Since it is practically impossible for a smaller segment of a population to be exactly representative of the population, some degree of sampling error will be present whenever we select a sample. It is important to consider sampling error when publishing survey results, as it gives an indication of the accuracy of the estimate and therefore of the weight that can be placed on interpretations.
If sampling principles are carefully applied within the constraints of available resources, sampling error can be accurately measured and kept to a minimum. Sampling error is affected by:
- sample size
- variability within the population
- sampling scheme
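Generally, larger sample sizes decrease sampling error. As a rule of thumb for simple random sampling, halving the sampling error requires a fourfold increase in the sample size, because the error shrinks in proportion to the square root of the sample size. In fact, sampling error can be completely eliminated by increasing the sample size to include every element in the population.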
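The fourfold rule can be checked with a short calculation. The sketch below assumes simple random sampling and uses the textbook formula for the standard error of a sample mean, with an optional finite population correction. The function name `standard_error` and the figures used (a standard deviation of 10, a population of 5000) are illustrative only.

```python
import math

def standard_error(sd, n, population_size=None):
    """Standard error of a sample mean under simple random sampling."""
    se = sd / math.sqrt(n)
    if population_size is not None:
        # Finite population correction: equals 0 when the sample is the whole population.
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

se_100 = standard_error(sd=10.0, n=100)      # baseline sample
se_400 = standard_error(sd=10.0, n=400)      # four times the sample size
se_census = standard_error(sd=10.0, n=5000, population_size=5000)  # every element included

print(se_100, se_400, se_census)
```

With these illustrative numbers, the standard error for the sample of 400 is half that for the sample of 100, and it is zero when every element of the population is included.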
Population variability also affects the error. More variable populations give rise to larger errors, because estimates calculated from different samples are likely to vary more from one sample to the next. The effect of variability within the population can be reduced by increasing the sample size, which makes the sample more representative of the target population.
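A small simulation can illustrate this. The sketch below is an assumption-laden example, not a prescribed method: it builds two artificial populations with the same mean but different spread, draws repeated simple random samples from each, and measures how much the sample means vary. All names and figures are hypothetical.

```python
import random
import statistics

def spread_of_sample_means(population, n, draws=500, seed=0):
    """Standard deviation of the sample mean across repeated simple random samples."""
    rng = random.Random(seed)
    means = [sum(rng.sample(population, n)) / n for _ in range(draws)]
    return statistics.stdev(means)

# Two artificial populations: same mean, different variability (assumed values).
rng = random.Random(42)
low_var = [rng.gauss(50, 5) for _ in range(10_000)]
high_var = [rng.gauss(50, 20) for _ in range(10_000)]

spread_low = spread_of_sample_means(low_var, n=50)
spread_high = spread_of_sample_means(high_var, n=50)
spread_high_big = spread_of_sample_means(high_var, n=800)  # larger sample, same population

print(spread_low, spread_high, spread_high_big)
```

The sample means from the more variable population should scatter more widely at the same sample size, and the scatter should narrow again when the sample size is increased.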
Non-sampling error
Non-sampling error can be defined as any error in a survey that is not a sampling error, that is, any error not caused by the fact that we have selected only part of the population. Even if we were to undertake a complete enumeration of the population, non-sampling errors might remain. In fact, as the size of the sample increases, the non-sampling errors may get larger, because of such factors as a possible increase in non-response, interviewer errors, and data processing errors.
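One way to see why a larger sample does not remove non-sampling error is to combine the two kinds of error in a single figure. The sketch below assumes that the non-sampling error acts as a fixed bias of 0.5 units and uses the standard decomposition of mean squared error into variance plus squared bias. The function name `total_error` and all values are hypothetical, and holding the bias constant is a simplification, since the bias may in practice grow with the sample size.

```python
import math

def total_error(sd, n, bias):
    """Root mean squared error: sampling error and a fixed bias combined."""
    sampling_error = sd / math.sqrt(n)
    return math.sqrt(sampling_error ** 2 + bias ** 2)

# Assumed values: population standard deviation 10, constant bias 0.5.
errors = {n: total_error(sd=10.0, n=n, bias=0.5) for n in (100, 10_000, 1_000_000)}

for n, err in errors.items():
    print(n, round(err, 4))
```

Under these assumptions the total error falls as the sample grows but never drops below the bias of 0.5, however large the sample becomes.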
For the most part we cannot measure the effect that non-sampling errors will have on the results. Because of their nature, these errors may not be totally eliminated. Perhaps the biggest source of non-sampling error is a poorly designed questionnaire. The questionnaire can influence the response rate achieved in the survey, the quality of responses obtained and consequently the conclusions drawn from survey results.
Some common sources of non-sampling error are discussed in the following paragraphs.
Target Population
Error arises when there is a failure to identify clearly who is to be surveyed. This can result in an inadequate sampling frame, imprecise definitions of concepts, and poor coverage rules.
Non-response
A non-response error occurs when the respondents do not reflect the sampling frame. This can happen when the people who do not respond to the survey differ from the people who do respond, and it often occurs in voluntary response polls. For example, suppose that in an air bag study we asked respondents to call a 0018 number to be interviewed. Because a 0018 call costs $2 per minute, many drivers may not respond. Furthermore, those who do respond may be the people who have had bad experiences with air bags. Thus the final sample of respondents may not even represent the sampling frame.
For example:
- telephone polls miss those people without phones
- household surveys miss the homeless, prisoners, students in colleges, etc.
- train surveys only target public transport users and tend to include regular public transport users.
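The effect described in the air bag example can be sketched with a simulation. Every figure below is an assumption made for illustration: that 10% of drivers have had a bad experience, that 60% of those drivers call in, and that only 5% of the remaining drivers do.

```python
import random

rng = random.Random(1)
population_size = 20_000

# True means the driver has had a bad experience (assumed 10% of drivers).
population = [rng.random() < 0.10 for _ in range(population_size)]

def responds(bad_experience):
    """Assumed response behaviour: unhappy drivers are far more likely to call."""
    return rng.random() < (0.60 if bad_experience else 0.05)

respondents = [bad for bad in population if responds(bad)]

true_rate = sum(population) / len(population)
observed_rate = sum(respondents) / len(respondents)

print(round(true_rate, 3), round(observed_rate, 3))
```

Under these assumptions the share of bad experiences among respondents is several times the true share in the population, even though no individual answer is wrong. The distortion comes entirely from who chose to respond.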
The questionnaire
Poorly designed questionnaires with mistakes in wording, content or layout may make it difficult to record accurate answers. The most effective methods of designing a questionnaire are discussed in Section 2.4. If these principles are followed it will help reduce the non-sampling error associated with the questionnaire.
Interviewers
If an interviewer is used to administer the survey, their work has the potential to produce non-sampling error. This can be due to the personal characteristics of the interviewer. For example, an elderly person will often be more comfortable giving information to a female interviewer. Other factors which could cause error are the interviewer’s opinions and characteristics which may influence the respondent’s answers.
Respondents
Respondents can also be a source of non-sampling error. They may refuse to answer questions, or provide inaccurate information to protect themselves. They may have memory lapses and/or lack of motivation to answer the questionnaire, particularly if the questionnaire is lengthy, overly complicated or of a sensitive nature. Respondent fatigue is a very important factor.
Social desirability bias refers to the effect where respondents will provide answers which they think are more acceptable, or which they think the interviewer wants to hear. For example, respondents may state that they have a higher income than is actually the case if they feel this will increase their status.
Respondents may refuse to answer a question which they find embarrassing or choose a response which prevents them from continuing with the questions. For example, if asked the question: “Are you taking oral contraceptive pills for any reason?”, and knowing that if they respond “Yes” they will be asked for more details, respondents who are embarrassed by the question are likely to answer “No”, even if this is incorrect.
Fatigue can be a problem in surveys which require a high level of commitment from respondents. The level of accuracy and detail supplied may decrease as respondents become tired of recording all information. Sometimes interviewer fatigue can also be a problem, particularly when the interviewers have a large number of interviews to conduct.
Processing and collection
Processing and collection errors can be a source of non-sampling error. For example, the results from the survey may be entered incorrectly. The time of year at which the survey is conducted can also produce non-sampling error. For example, if the survey is conducted in the school holidays, potential respondents with school children could be away or hard to contact.