All data was subjected to automated consistency checks and identification of missing values. However, full editing and imputation were carried out only on the 100% coded data. This involved detailed checking of each ED separately. Where missing, invalid and inconsistent values exceeded an agreed level of tolerance (set according to type of area; inner urban areas were expected to have more problems), the forms were subject to further clerical checking.
Once prescribed tolerance levels had been met, the data was subject to a process of imputation in which new values were given in place of missing, invalid and inconsistent answers.
Imputation was carried out for individual data items and for absent households. This was based on a "hot-deck" system in which the imputed values were taken from cases that had similar values on other variables. The variables used in this process varied according to the variable for which an imputed value was required (e.g. tenure, building type, and number of residents proved good predictors of the number of cars available to a household).
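The hot-deck idea can be illustrated with a minimal sketch. It is not the procedure actually used in processing: the function name `hot_deck_impute`, the field names, the exact-match rule and the random choice among donors are all assumptions made for illustration. A missing value is filled from a "donor" record that has the same values on the chosen matching variables.

```python
import random

def hot_deck_impute(records, target, match_vars, seed=0):
    """Fill missing `target` values from donors that agree on `match_vars`.

    Hypothetical sketch: exact matching and a random donor are assumptions,
    not the documented census rules.
    """
    rng = random.Random(seed)

    # Build donor pools: observed target values grouped by matching variables.
    donors = {}
    for rec in records:
        if rec.get(target) is not None:
            key = tuple(rec[v] for v in match_vars)
            donors.setdefault(key, []).append(rec[target])

    result = []
    for rec in records:
        new = dict(rec)  # copy, so the input records are left unchanged
        if new.get(target) is None:
            pool = donors.get(tuple(rec[v] for v in match_vars))
            if pool:  # leave the value missing if no donor matches
                new[target] = rng.choice(pool)
                new["imputed"] = True
        result.append(new)
    return result

# Illustrative records only; these are not census data.
households = [
    {"tenure": "owner", "building": "detached", "residents": 4, "cars": 2},
    {"tenure": "owner", "building": "detached", "residents": 4, "cars": None},
    {"tenure": "rented", "building": "flat", "residents": 1, "cars": 0},
    {"tenure": "rented", "building": "flat", "residents": 1, "cars": None},
    {"tenure": "rented", "building": "terraced", "residents": 3, "cars": None},
]
result = hot_deck_impute(households, "cars", ["tenure", "building", "residents"])
```

In this sketch the last household has no matching donor and so stays missing; a production system would need a fallback, such as matching on fewer variables.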
For absent households that had failed to return forms, the type of area, combined with the information collected (estimated) by enumerators (number of residents, number of rooms, whether the building was self-contained), was used as a basis for matching to, and imputing records from, absent households that had returned forms.
To derive the 10% sample for the processing of "hard to code" questions, the country was first divided into strata of ten consecutive households (or ten persons in communal establishments). One household was then randomly selected from every stratum (with a similar process for communal establishments). Since none of the 10% "hard to code" questions were imputed for "wholly absent" households, these were excluded from the 10% sample. The sample included all individuals within the selected households and communal establishments. Link to table showing how each 10% question was coded.
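The selection of one household per stratum of ten can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the function name `one_in_ten_sample` is hypothetical, the input list is assumed to be in enumeration order with "wholly absent" households already removed, and the treatment of an incomplete final stratum is a guess.

```python
import random

def one_in_ten_sample(household_ids, stratum_size=10, seed=None):
    """Select one household at random from each stratum of consecutive households.

    Assumes `household_ids` is in enumeration order and already excludes
    wholly absent households (an assumption of this sketch).
    """
    rng = random.Random(seed)
    sample = []
    for start in range(0, len(household_ids), stratum_size):
        stratum = household_ids[start:start + stratum_size]
        # Draw a position within a full-sized stratum. For an incomplete final
        # stratum the drawn position may not exist, in which case nothing is
        # selected; this keeps every household's selection chance at 1 in 10.
        pick = rng.randrange(stratum_size)
        if pick < len(stratum):
            sample.append(stratum[pick])
    return sample

# Illustrative identifiers only.
ids = list(range(100))
sample = one_in_ten_sample(ids, seed=1)
```

Because exactly one household is drawn from every complete stratum, the sample is spread evenly across the enumeration order instead of clustering as a simple random sample might.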