PAPP102 - S08: Internal evaluation of demographic data

Random error

From the typology introduced on the previous page, we can see that random error, which affects the precision of our estimates, can arise from both measurement and statistical error.

In general, demographers do not concern themselves overly with random measurement error. In a census, for example, much of the data collected are not validated internally for consistency: a respondent may declare whether their house has piped water or not, but this information is not verified in any way. For other variables collected in a census, a limited degree of verification is possible. For example, if data are collected on a subject's age and date of birth, these two pieces of information may be checked for consistency at the data processing stage. Likewise, a woman's reported numbers of surviving and dead children may be checked against her reported total number of children ever born.
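The two processing-stage checks described above can be sketched as follows. This is a minimal illustration, not any statistical office's actual edit rules: the record layout, the field names (`age`, `date_of_birth`, `children_surviving`, `children_dead`, `children_ever_born`) and the census reference date are all hypothetical, chosen only to show the logic of the comparisons.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CensusRecord:
    # Hypothetical field names, chosen for illustration only.
    age: int                 # reported age in completed years
    date_of_birth: date      # reported date of birth
    children_surviving: int  # reported number of children still alive
    children_dead: int       # reported number of children who have died
    children_ever_born: int  # reported total number of children ever born


def age_from_dob(dob: date, reference: date) -> int:
    """Age in completed years at the reference (census) date."""
    years = reference.year - dob.year
    # Subtract a year if the birthday has not yet been reached in the reference year.
    if (reference.month, reference.day) < (dob.month, dob.day):
        years -= 1
    return years


def consistency_flags(rec: CensusRecord, census_date: date) -> list[str]:
    """Return the inconsistencies found in one record (an empty list if none)."""
    flags = []
    if age_from_dob(rec.date_of_birth, census_date) != rec.age:
        flags.append("age inconsistent with date of birth")
    if rec.children_surviving + rec.children_dead != rec.children_ever_born:
        flags.append("surviving + dead does not equal children ever born")
    return flags


# Illustrative reference date and records, not real census data.
census_day = date(2020, 1, 1)
ok = CensusRecord(42, date(1977, 3, 15), 3, 1, 4)
bad = CensusRecord(30, date(1977, 3, 15), 3, 1, 5)
print(consistency_flags(ok, census_day))   # []
print(consistency_flags(bad, census_day))  # both checks fail
```

Note that such checks only detect inconsistency between two reports; they cannot tell which of the two reports is wrong.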

In a Demographic and Health Survey (or in other surveys where much more time is allocated to collecting information from each respondent than in a census), a greater degree of verification is possible. Where anthropometric data are collected, for example, subjects may be weighed or have their height measured twice, and the result accepted only if the two readings are deemed sufficiently close: for example, if the two height measurements agree to within 2 cm, or the two weight measurements to within 1 kg.
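The acceptance rule for repeated measurements can be sketched as below. The tolerances (2 cm, 1 kg) come from the example in the text; the function name `accept_pair` is hypothetical, and recording the mean of the two readings as the accepted value is an assumption of this sketch, since survey protocols differ on what value is kept and on what happens when a pair is rejected.

```python
from typing import Optional

# Tolerances taken from the example in the text.
HEIGHT_TOLERANCE_CM = 2.0
WEIGHT_TOLERANCE_KG = 1.0


def accept_pair(first: float, second: float, tolerance: float) -> Optional[float]:
    """Return an accepted value for two repeated measurements, or None.

    The pair is accepted only if the readings differ by no more than the
    tolerance. Returning the mean of the two readings is an assumption
    made for this sketch.
    """
    if abs(first - second) <= tolerance:
        return (first + second) / 2
    # Readings too far apart: rejected; how this is handled depends on the protocol.
    return None


height = accept_pair(102.4, 103.8, HEIGHT_TOLERANCE_CM)  # 1.4 cm apart: accepted
weight = accept_pair(15.2, 16.9, WEIGHT_TOLERANCE_KG)    # 1.7 kg apart: rejected
print(height, weight)
```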

Random sampling error (‘statistical error’), which bears directly on the external validity of the results, receives much more attention in demographic work. Errors of this kind can be reduced by using larger samples (which reduces the variation in the estimates, albeit at a higher cost, because more subjects need to be sampled), and they are explicitly accommodated in the standard practice of statistical inference and analysis. As noted in an earlier session, the cost of running a survey is roughly proportional to its size, but precision increases only with the square root of the sample size. Thus, doubling a survey from 1 000 to 2 000 respondents would roughly double its cost (an increase of 100 per cent), but would improve precision by only about 40 per cent: $\sqrt{2000/1000} = \sqrt{2} \approx 1.41$.
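The square-root relationship can be checked numerically. The sketch below assumes the quantity being estimated is a proportion under simple random sampling (so its standard error is $\sqrt{p(1-p)/n}$) and, as in the text, that cost is proportional to sample size; the value p = 0.3 is purely illustrative, and real survey designs with clustering would change the absolute standard errors but not the ratio between them.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of an estimated proportion p under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


p = 0.3  # illustrative proportion, not taken from any real survey
for n in (1000, 2000):
    # Relative cost assumes cost is proportional to sample size.
    print(f"n={n}: SE={standard_error(p, n):.4f}, relative cost={n / 1000:.0f}x")

# Precision taken as 1/SE, so the gain is the ratio of the two standard errors.
gain = standard_error(p, 1000) / standard_error(p, 2000)
print(f"precision gain from doubling n: {gain:.2f}")  # 1.41, i.e. sqrt(2)
```

Equivalently, doubling the sample multiplies the standard error by $1/\sqrt{2} \approx 0.71$, so it falls by only about 29 per cent while the cost rises by about 100 per cent.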