Known data issues

Some of the known issues stem from problems in implementing the experiments.

Wave 1

In Waves 1 and 2, we asked participants for consent to link administrative records to survey data. We will not be linking the administrative records because some of the consent forms have been lost. In the Wave 15 data release we included Wave 1 consent variables. Wave 2 consent variables were restructured to improve clarity.

Wave 2

In Wave 2, a variable for w_ivtrans (translator used) was not collected. However, there is a related variable available in Waves 1-4, w_ivaffct22 (in what way was the respondent influenced: Other helped in translation, reading showcards, and other survey tasks).

Wave 3

In Wave 3, the Showcard experiment required at least some interviewers to use showcards for some participants and not for others. There are doubts about whether interviewers correctly followed the instruction about which sample members should have showcards. This could introduce errors, and there is no way to check whether or not a given respondent saw the showcards.

In Wave 3, some respondents were incorrectly asked the experimental IP2 satisfaction questions in addition to the IP3 questions in relation to the satisfaction experiment. This happened with respondents with values 7, 8, 9, or 10 on the IP2 treatment indicator b_ff_lifesatw2. The responses of these respondents to the IP3 questions are potentially affected by having answered similar questions earlier in the interview. The questions that should not have been asked are c_lfsat variables ending in _g to _j. The c_lfsat variables ending in _a to _f are correct.
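For analysts who want to set the affected cases aside, the following is a minimal sketch of one way to flag them. It assumes the data have been loaded as a list of dictionaries keyed by variable name. The helper names, the `pid` field and the sample records are hypothetical; only b_ff_lifesatw2, the values 7 to 10 and the c_lfsat suffixes come from the description above.

```python
# Values of the IP2 treatment indicator for which the IP2 questions
# were wrongly asked in Wave 3 (taken from the description above).
AFFECTED_VALUES = {7, 8, 9, 10}

# Suffixes of the c_lfsat items that should not have been asked.
WRONGLY_ASKED_SUFFIXES = ("_g", "_h", "_i", "_j")


def flag_affected(record):
    """Return True if the respondent was wrongly asked the IP2 questions."""
    return record.get("b_ff_lifesatw2") in AFFECTED_VALUES


def drop_wrongly_asked(record):
    """Return a copy of the record without the c_lfsat items ending _g to _j."""
    return {
        name: value
        for name, value in record.items()
        if not (name.startswith("c_lfsat") and name.endswith(WRONGLY_ASKED_SUFFIXES))
    }


# Hypothetical records, for illustration only.
records = [
    {"pid": 1, "b_ff_lifesatw2": 3, "c_lfsat_a": 5},
    {"pid": 2, "b_ff_lifesatw2": 8, "c_lfsat_a": 6, "c_lfsat_g": 4},
]

affected_ids = [r["pid"] for r in records if flag_affected(r)]
cleaned = [drop_wrongly_asked(r) for r in records]
```

Whether to drop the affected respondents, or keep them and control for the flag, is an analytic choice that this sketch does not settle.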

Variable c_conddateh, which is about strategies used to recall the date a health condition began, is documented as a “check all that apply” variable. However, it was implemented as “select one”. Similarly, the variable c_pldateh, which is about strategies used to recall the date of a move, is documented as a “check all that apply” variable but was implemented as “select one”.

Variables related to NSSEC in Wave 4 for current and last job (previously not included) are in the current release. These include the 3, 5 and 8 category classifications. 

A variable for highest qualification is not released because there has been a change in the response categories for educational and vocational qualifications.

In the Wave 3 Annual Events questions about employment, there are inconsistencies. For the first job, the question on the type of employment (nxtjbes) is less detailed than the one in the loop if they have additional jobs after this (nextjob). nxtjbes only asks if they were employed or self-employed, whereas nextjob asks if they were doing a different job for the same employer, working for a different employer or working as self-employed. This is only a problem for the first job reported in the annual events.

There are inconsistencies in variable names and variable labels in the employment histories between IP2 and IP3/IP4, because the way the histories are collected changed. From Wave 3 the loop through jobs starts at the second employment spell, whereas in IP2 the loop begins at the first spell. As a result the variable names are slightly inconsistent between IP2 and IP3/IP4. At IP3, the variable nxtst is supposedly equivalent to the variable nextstat1 at IP2, i.e. it is the first employment spell. However, it seems the variable nextstat1 has been incorrectly labelled as this first spell (it is in fact the second employment spell, after nxtst).

In all Waves, the benefit income data has not been edited for outliers.

Wave 5

Errors in the Wave 5 questionnaire

The grid, household questionnaire and individual questionnaires were all programmed as separate web instruments, while the CAPI was programmed as one combined instrument. In previous waves, the feed-forward data sat within the household grid, and any text fills or routing in the household or individual questionnaires were programmed via a reference to the household grid data. In IP5, because the web instruments were programmed separately, the feed-forward data needed to be copied into these instruments, so that it could be referenced within the household or individual instrument. Each feed-forward variable was copied individually (using code), and there were mistakes in the code copying feed-forward data into the household and individual questionnaires. For subsequent waves, the whole set of feed-forward data is copied as a block, to ensure that all feed-forward variables are copied correctly.
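The difference between copying variables one at a time and copying them as a block can be illustrated with a short sketch. This is not the instrument code, which is not reproduced in this documentation. The dictionary, the function names and the misspelt variable name are hypothetical and only show the general kind of failure that per-variable copying allows.

```python
# Hypothetical feed-forward data held in the household grid.
grid_feed_forward = {
    "ff_rentwc": 1,
    "ff_metersw5": 2,
    "ff_diw5": 3,
}


def copy_individually(source, names):
    """Copy only the named variables; a wrong or omitted name yields a blank."""
    return {name: source.get(name) for name in names}


def copy_as_block(source):
    """Copy every feed-forward variable in a single step."""
    return dict(source)


# One name is misspelt in the per-variable list (an assumed kind of mistake),
# so the real variable never reaches the target instrument.
per_variable = copy_individually(
    grid_feed_forward, ["ff_rentwc", "ff_meterw5", "ff_diw5"]
)
block = copy_as_block(grid_feed_forward)

# Variables present in the grid that did not arrive with a value.
arrived = {name for name, value in per_variable.items() if value is not None}
missing = sorted(set(grid_feed_forward) - arrived)
```

In the sketch the block copy cannot lose a variable, because it never lists variables by name.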

Feed-forward variables determine which experimental questions are asked in an interview, so the copying errors corrupted some of the experiments. This section describes their effects.

Household questionnaire. At the household level, three feed-forward variables (e_ff_rentwc, e_ff_metersw5 and e_ff_diw5) were improperly copied. The related variables about gas or electric meter reading were not asked and were not released in the data.

Additionally, the e_ff_diw5 variable did not have the correctly assigned experimental values. This meant that the dependent interviewing (DI) experimental variables in the household questionnaire were confounded, in that some DI questions were asked, but not the ones that should have been according to the experimental design. Four sets of questions were affected by this confounding: hsrooms/hsbeds (number of bedrooms and other rooms at the address); hsownd (tenure); xpmg (monthly mortgage payments); and rent/rentwc (amount and frequency of rent). Some variables were combined to facilitate analysis; others were not released (see summary below).

The affected variables in the household questionnaire were:

Summary of household variables affected by errors

Variable | Impact
E_FF_METERSW5 | Blank due to programming error
E_FF_DIW5 | Incorrect values due to programming error
E_HSROOMCHK | Combined version released
E_HSOWNDCHK | Combined version released
XPMG_A | Asked, but wrong experimental version, not released
XPMG_B | Asked, but wrong experimental version, not released
XPMG_C | Asked, but wrong experimental version, not released
XPMG_D | Asked, but wrong experimental version, not released
FF_RENTWC | Blank due to programming error
RENTCHK_A | Asked, but wrong experimental version, not released
RENTCHK_B | Asked, but wrong experimental version, not released
RENTCHK_C | Asked, but wrong experimental version, not released
RENTCHK_D | Asked, but wrong experimental version, not released
GASUSE | Not asked due to programming error in FF_MetersW5
GASUSE_CAWI | Not released
GASMETER | Asked, but wrong experimental version, not released
GASEST | Asked, but wrong experimental version, not released
ELECUSE | Asked, but wrong experimental version, not released
ELECMETER | Asked, but wrong experimental version, not released
ELECEST | Asked, but wrong experimental version, not released

Errors in Wave 5 individual questionnaire. There was an error in the code copying three feed-forward variables in the employment modules of the individual questionnaire which meant that they were blank: ff_jbmngr, ff_jbsize and ff_jbterm1. This affected multiple variables which were not released. See the summary below.

Due to an error in the code, none of the e_ff_bentype01 to e_ff_bentype37 variables were copied into the Individual questionnaire. This affected the nfh01 to nfh37 variables about benefit income. It only affected those people who did not mention a benefit that they said they were receiving the previous year. Those people will not have received the additional prompt question reminding them of last year’s answer. Our estimate is that around three-quarters of respondents were not eligible to be asked any additional prompt questions in the first place; of those who were eligible to be asked any, a large majority (around 70 per cent) only missed out on one such question, 20 per cent missed out on two, and ten per cent missed out on three or more.
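The approximate shares quoted above can be converted into shares of all respondents with a little arithmetic. The sketch below uses only the rounded figures given in the text, so the results are rough estimates; the variable names are illustrative.

```python
from fractions import Fraction

# Approximate shares quoted in the text above.
eligible = 1 - Fraction(3, 4)  # about one quarter were eligible for a prompt
missed_among_eligible = {
    "one": Fraction(70, 100),
    "two": Fraction(20, 100),
    "three_or_more": Fraction(10, 100),
}

# Share of all respondents who missed one, two, or three or more prompts.
missed_overall = {
    group: eligible * share for group, share in missed_among_eligible.items()
}
percent_overall = {group: float(share * 100) for group, share in missed_overall.items()}
```

On these figures, roughly 17.5 per cent of all respondents missed one prompt, 5 per cent missed two, and 2.5 per cent missed three or more.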

The e_ff_casiw5 variable was not copied into the individual questionnaire at the start of fieldwork. The variable controls the mode of the self-completion questionnaire. The problem was resolved part way through the fieldwork period (after June 11). We created a variable e_scflagip5 (on e_indresp_ip) to show the status of mode of completion for the self-completion questionnaire in Wave 5. The effect of the error is that around 50 per cent of those eligible to receive the questions in face-to-face CASI mode did not get asked the experimental questions (313 people, based on unedited data). It should be noted that this does not confound the experiment (i.e. no respondents were asked questions in the wrong mode), but the reduced numbers mean that it does reduce its power to detect mode differences.

The affected variables in the Wave 5 individual questionnaire were:

Summary of individual level variables affected by errors in feed forward variables
Variable | Impact
FF_JBMNGR | Blank due to programming error
JBMNGRCHK | Not asked because FF_JBMNGR was blank
FF_JBSIZE | Blank due to programming error
JBSIZECHK_A through JBSIZECHK_D | Not asked, not released
FF_JBTERM1 | Blank due to programming error
JBTERM1_A through JBTERM1_D | Not asked, not released
FF_BENTYPE01 to FF_BENTYPE37 | Blank due to programming error
NFH01 to NFH37 | Not asked because FF_BENTYPE01 to FF_BENTYPE37 were blank
FF_CAWIW5 | Not released
SF12 Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to programming error that meant that some respondents were not asked part of the self-completion questions
GHQ Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to programming error that meant that some respondents were not asked part of the self-completion questions
Parental Relationships Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to programming error that meant that some respondents were not asked part of the self-completion questions
Alcohol Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to programming error that meant that some respondents were not asked part of the self-completion questions
Personality Module | Not asked of some respondents (identified by variable E_CASIFLAGER) due to programming error that meant that some respondents were not asked part of the self-completion questions

Wave 6

In Wave 6, four households in the £10 incentive treatment group became aware of the £30 treatment. To compensate they were offered an extra £20. The households are identified by the variable f_incentcomp on the record f_hhsamp.

Wave 7

In Wave 7, a few households have missing values for the experimental treatment allocations in the hhsamp file. The initial IP7 sample used to generate the experimental allocation variables was based on the latest IP6 data delivery available at that time. Later data deliveries included some additional households. Most of these extra households were untraced, with no address at which they could be issued to the field. For the few that did have an address we generated randomisations for the experimental variables separately. For households with missing address information the experimental variables remained missing.

Wave 11

In Wave 11, the variables related to height and weight were removed from all waves in the w_youth_ip record, due to measurement problems with these variables. 

The Wave 11 individual interview question timings file (k_indint_timings.csv) contained two errors, which have been corrected in the version released with IP14. First, when the timings file was created, respondents with the same Serial ID (a household-level fieldwork identifier) were updated with values from another interview that had the same Serial ID, overwriting values (e.g. if they both answered ConsentQ3, both interviews would have the same value for ConsentQ3 when the values should have been different). If they were routed to different versions of questions, e.g. ConsentQ3 and ConsentQ4, one respondent would have timings for both questions. This has been corrected by ensuring that the Serial ID is used with other identifiers to uniquely identify individual cases. As a result, the updated timings data has changed across all the timings variables and the derived summary variables. Second, some observations for questions in modules that the respondent was not routed into contained the value “12/30/1899 0:00:00” instead of being blank. This has also been corrected.
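Both timings errors can be illustrated with a minimal sketch. It is not the code used to build or correct the file. The `serial` and `person` fields, the row layout and the function names are assumptions made for illustration, since the text says only that the Serial ID is now combined with other identifiers.

```python
# Placeholder value that appeared for questions the respondent was not routed into.
SENTINEL = "12/30/1899 0:00:00"


def merge_timings(rows, key_fields):
    """Merge timing rows into one record per key; rows sharing a key are combined."""
    merged = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        merged.setdefault(key, {}).update(row["timings"])
    return merged


def blank_sentinel(timings):
    """Replace the placeholder timestamp with None (i.e. blank)."""
    return {q: (None if value == SENTINEL else value) for q, value in timings.items()}


# Two hypothetical respondents in the same household, routed to different questions.
rows = [
    {"serial": 1, "person": 1, "timings": {"ConsentQ3": 12.0}},
    {"serial": 1, "person": 2, "timings": {"ConsentQ4": 7.5}},
]

# Keyed on the household-level Serial ID alone, the two interviews collapse
# into one record that has timings for both questions.
by_serial = merge_timings(rows, ["serial"])

# Keyed on Serial ID plus a person-level identifier, each interview stays separate.
by_person = merge_timings(rows, ["serial", "person"])

cleaned = blank_sentinel({"ConsentQ3": SENTINEL, "ConsentQ4": 7.5})
```

Analysts working with an older copy of the timings file may want to check for the placeholder timestamp before computing durations.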

Wave 13

In Wave 13, there was an error in the sample file. The variable ff_eventtrigw12 was erroneously set to missing for all sample members. As a result the question “eventdebrief” that should have been asked for all sample members invited to the event-triggered data collection during 2020/2021 (ff_eventtrigw12=1) was not asked of anyone. In addition, in the introduction to the Annual Events History (“calintro”), the text fill “Please tell us about all changes, even if you have already reported them in the monthly questions about life events that we have been trialling. The reason for asking you again is that in this interview we are interested in different aspects of any changes you have experienced.”, which should have been shown to all respondents invited to the event-triggered data collection, was not displayed to anyone.

Wave 14

In Wave 14, the variable father (“Fathered children since last interview”) is not populated for about 300 cases. This error occurred because an age filter was left active from a prior question, so respondents aged > 64 years were not asked.
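A rough way to identify cases that may have been affected is to look for respondents over the age threshold with no value for father. The sketch below assumes records held as dictionaries, an `age` field and `None` for missing values, all of which are hypothetical. It does not account for any other routing conditions that determine who should have been asked the question.

```python
AGE_LIMIT = 64  # respondents older than this were not asked, per the text above


def possibly_affected(record, age_limit=AGE_LIMIT):
    """Return True if the respondent is over the limit and father is missing."""
    return record["age"] > age_limit and record["father"] is None


# Hypothetical records, for illustration only.
records = [
    {"pid": 1, "age": 70, "father": None},
    {"pid": 2, "age": 40, "father": 0},
    {"pid": 3, "age": 30, "father": None},
]

flagged_ids = [r["pid"] for r in records if possibly_affected(r)]
```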

Wave 15

In Wave 15, one household participated in the wave but had no allocations for the experimental conditions (ff_ variables in the hhsamp file). This was a late re-joiner household that was lost at Wave 13 and re-joined the panel for Wave 15 after the allocations had been made.

Wave 16

In Wave 16, three households participated in the wave but had no allocations for the experimental conditions (ff_ variables in the hhsamp file). These were late re-joiner households that had been lost previously and re-joined the panel for Wave 16 after the allocations had been made.
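Households without experimental allocations can be located before analysis with a check along the following lines. This is a minimal sketch that assumes the hhsamp records are held as dictionaries, that missing allocations are represented as `None`, and that allocation variables can be recognised by an `ff_` prefix. The `hid` field, the variable names and the sample values are hypothetical, and the released files may use different missing-value codes and wave prefixes.

```python
def missing_allocations(households, prefix="ff_"):
    """Return ids of households whose allocation variables are all missing."""
    flagged = []
    for household in households:
        allocations = [
            value for name, value in household.items() if name.startswith(prefix)
        ]
        # Flag only households that have allocation variables, all of them missing.
        if allocations and all(value is None for value in allocations):
            flagged.append(household["hid"])
    return flagged


# Hypothetical hhsamp-style records, for illustration only.
households = [
    {"hid": 101, "ff_exp_a": 1, "ff_exp_b": 2},
    {"hid": 102, "ff_exp_a": None, "ff_exp_b": None},
]

no_allocation = missing_allocations(households)
```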

In Wave 16, no respondents were routed into the proxy questionnaire module that asked about respondents who had moved into a care home (module “carehomeproxy” in the IP16 questionnaire). The corresponding variables were therefore dropped from the file p_indresp_ip. Similarly, in the household grid, no respondents were routed into the questions about household members who were reported as having moved into a care home at the previous wave. The variables chomestillchmrespidp were therefore dropped from the file p_indall_ip.
