Certain respondents in a survey wave are assigned a weight of zero, effectively excluding them from cross-sectional (single-wave) analysis. This typically happens for respondents who do not meet specific criteria for representativeness in that wave.
Attrition occurs when cases are lost from a sample over time. There are a variety of reasons for this in longitudinal research, such as: unwillingness of subjects to continue to participate, difficulties in tracing original respondents for follow-up (due to change of address) and nonavailability (due to serious illness or death).
The inverse of the probability of selection for each unit in the sample. It accounts for the sampling design and ensures that each sampled unit represents the correct number of individuals in the population.
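As a minimal sketch of this definition, the Python snippet below computes a weight as the inverse of a selection probability. The function name `base_weight` and the figure used (a unit selected with probability 1 in 500) are illustrative assumptions, not values from any particular survey.

```python
def base_weight(selection_probability: float) -> float:
    """Return the inverse of the probability of selection."""
    if not 0 < selection_probability <= 1:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


# Hypothetical example: a unit selected with probability 1/500
# stands in for about 500 units in the population.
example_weight = base_weight(1 / 500)
```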
Broad sense: often used in social science to denote any measurement derived from the human body which might relate to health, including grip strength, waist circumference, lung function, etc. Narrow sense (as defined by the National Institutes of Health): “a characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenetic processes, or pharmacological responses to a therapeutic intervention”.
Computer-Assisted Personal Interviewing (CAPI) refers to survey data collection by an in-person interviewer (i.e. face-to-face interviewing) who uses a computer to administer the questionnaire to the respondent and captures the answers onto the computer.
Computer-Assisted Self-Interviewing (CASI) refers to the respondent using a computer to complete the survey questionnaire without an interviewer administering it.
Computer-Assisted Telephone Interviewing (CATI) refers to an interviewer using a computer to conduct the interview with the respondent by telephone.
Computer-Assisted Web Interviewing (CAWI) refers to a questionnaire that the respondent completes on a website, reached through a link provided to them.
A range of values so defined that one can specify the probability that the value of a population parameter lies within it. For example, the average yearly income of a sample of 1000 individuals is £30,000. The true population value can be higher or lower. Based on the sample size and variability across individuals, we compute a 95% confidence interval of £28,000 to £32,000. This means that if we repeated the sampling process many times and computed an interval each time, about 95% of those intervals would contain the true population average.
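The arithmetic behind such an interval can be sketched as follows, using the normal approximation (mean plus or minus 1.96 standard errors). The standard deviation of £32,268 is a hypothetical figure chosen so that the result lands close to the £28,000 to £32,000 range in the example. It is not taken from any dataset, and the function name is illustrative.

```python
import math


def mean_confidence_interval(mean, std_dev, n, z=1.96):
    """Normal-approximation confidence interval for a sample mean."""
    standard_error = std_dev / math.sqrt(n)  # standard error of the mean
    margin = z * standard_error              # half-width of the interval
    return mean - margin, mean + margin


# Hypothetical inputs: mean income 30,000, standard deviation 32,268, n = 1000.
lower, upper = mean_confidence_interval(30_000, 32_268, 1000)
```

With these assumed inputs the interval comes out at roughly £28,000 to £32,000. A larger sample or less variable incomes would give a narrower interval.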
Coverage error is a bias in a statistic that occurs when the statistic is based on a sample drawn from the population of interest with some population elements being excluded, i.e., when the sampling frame is different from the target population or the population of interest.
Cross-sectional weights are statistical adjustment factors applied to survey data to ensure that the sample accurately represents the target population at a specific point in time. They correct for unequal selection probabilities, nonresponse bias and other sampling imperfections, allowing for unbiased population estimates.
Explains the variable names and values in the datafile.
The inverse of the probability of selection of each unit in the sample (the base weight), accounting for factors like stratification, clustering, or oversampling to better reflect the intended sample design.
Some groups within a sample (e.g. certain age groups or levels of educational attainment) may be more likely than others to respond to the survey, which can lead to bias.
Deoxyribonucleic acid. The chemical in our cells which comprises our genome.
EUL refers to End User Licence, which is a type of licence for accessing the data. The data are available to users, once registered, via the UK Data Service.
Enumeration weights adjust for undercoverage or unequal probabilities of selection that arise during the household enumeration process. The household enumeration grid identifies all household members, which determines who is eligible for an interview. Enumeration weights account for differences in the likelihood of households and individuals being listed.
The study of mechanisms that affect gene expression by altering the DNA in a way that does not change its code. This term is often used to indicate epigenomics. Encompasses several mechanisms, one of which is methylation.
Fieldwork encompasses the tasks that are undertaken to collect data for a survey.
A section of a DNA molecule which the cell can read and translate to produce a protein.
The study of how the DNA code relates to traits (genetically determined characteristics) and health conditions. This term often indicates approaches that focus on a narrow set of genes or markers, but is often used more broadly to include ‘genomics’.
A study in which a phenotype is tested for association with a large number of genetic markers (especially single nucleotide variants, or SNVs) across the genome, using linear or logistic regression. See the biological data glossary for the full definition of terms.
Broadly speaking, the statistical assessment of empirically observed sample data against a theoretical model that asserts what would be found under particular specified conditions.
The inverse probability of sample members being selected through different samples and continuing to be enumerated up to wave X. Inclusion weights adjust for household response rate at the first wave of each sample (e.g. BHPS original, BHPS Scotland…GPS Wave 1, EMB) and attrition between the recruitment wave and the wave where the longitudinal inclusion weight is computed.
Key variables are a curated list of variables that provide an entry point into a specific research topic within Understanding Society. They highlight the most relevant measures that help researchers quickly grasp how the topic is represented in the study. Key variables are selected to give a clear overview of what can be explored, helping to identify where to begin with analysis without having to review all available variables at the same time. These variables serve as a starting point for understanding the topic and its potential applications in research.
Longitudinal weights account for sample attrition and aim to maintain the sample representativeness over multiple waves of data collection. These weights adjust for differences in response probabilities across waves, ensuring that the panel remains as representative as possible of the target population despite dropouts.
The chemical addition of a methyl group to a cytosine, which has an effect on the expression of a gene.
Survey researchers use the term mode to refer to the way in which data are collected in the survey (such as self-completion web interviews, face-to-face interviews or telephone interviews).
The influence that using different modes during data collection can have on survey responses. Methodology researchers look at the impact of mode on data obtained from surveys.
Mathematical or statistical representations of relationships between variables used to analyse data, make predictions, or test hypotheses.
In surveys, nonresponse occurs when selected individuals or households do not participate or fail to provide answers to some or all questions.
The suffix (sometimes used as a word) that denotes an approach wherein a broad, comprehensive set of molecules of a certain class are assayed (measured) or analysed simultaneously.
The sum of an individual's alleles which may contribute to a given phenotype, usually weighted by GWAS effect size. See the biological data glossary for the full definition of terms.
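A minimal sketch of this weighted sum, assuming allele counts coded as 0, 1 or 2 copies per variant and one effect size per variant. The variant IDs, effect sizes and genotype below are invented for illustration only.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Sum of allele counts, each weighted by its effect size."""
    return sum(
        count * effect_sizes[variant]
        for variant, count in allele_counts.items()
    )


# Hypothetical data: variant IDs, effect sizes and genotype are made up.
effect_sizes = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
allele_counts = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
score = polygenic_score(allele_counts, effect_sizes)
```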
In sample surveys, primary sampling unit (PSU) arises in samples in which population elements are grouped into aggregates and the aggregates become units in sample selection. The aggregates are, due to their intended usage, called “sampling units.” Primary sampling unit refers to sampling units that are selected in the first (primary) stage of a multi-stage sample ultimately aimed at selecting individual elements.
An extremely diverse class of biological molecule. Each protein is composed of amino acids and is encoded by a gene. Proteins carry out every process in our bodies.
The study of a broad range of proteins.
If someone is not able to participate due to illness or they are busy, interviewers can ask someone else in the household (e.g., spouse or adult child) to complete a proxy questionnaire on their behalf. This is a much shorter questionnaire including only factual questions.
A sample design is the framework used for the selection of a survey sample. For example, if researchers are interested in obtaining information through a survey for a population, or universe of interest, they must define a sampling frame that represents the population of interest, from which a sample is to be drawn.
Sampling error addresses how much, on average, the sample estimates of a study characteristic or variable, such as years of education, differ from sample to sample. Sampling error is essential in describing research results, how much they vary, and the statistical level of confidence that can be placed in them. Sampling error is also critical in tests of classic statistical significance.
The likelihood that a particular unit (individual or household) is selected as part of a sample in a study.
A measure of the accuracy of a sample estimate. The standard error is the standard deviation of the sampling distribution of a statistic.
Stratified sampling separates the population into subgroups that are called “strata” and then selects random samples from each subgroup. Dividing the sampling effort in this fashion creates some extra work and extra cost. However, under some conditions, the estimates drawn from stratified samples have much lower sampling errors than estimates from simple random samples of the same size. This allows sampling error goals to be met with smaller sample sizes than are needed in simple random sampling and consequently lowers the total cost of research.
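The procedure can be sketched in a few lines of Python: group the units by stratum, then draw a simple random sample of the same fraction within each group (proportionate allocation). The function name, the fixed seed and the urban/rural population are hypothetical choices made for illustration.

```python
import random


def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the same fraction within each stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 80 "urban" and 20 "rural" units.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(population, stratum_of=lambda u: u[0], fraction=0.1)
```

Sampling within each stratum guarantees that both groups appear in the sample in proportion to their size, which a simple random sample of the same size would achieve only on average.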
Design or method that does not reach the best possible statistical efficiency, representativeness or precision given the available information.
Not adjusted to account for differences in selection probability, nonresponse, or population representation.
Portion of the total variance in an estimate that can be attributed to a specific source of variability in the sampling process.
Each set of annual interviews conducted as part of the Understanding Society survey is referred to as a wave.
Adjustment factors applied to survey data to ensure that estimates derived from the sample are representative of the target population. They correct for unequal selection probabilities, nonresponse, and coverage issues.
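As a rough illustration of how such factors enter an estimate, the sketch below computes a weighted mean. The incomes and weights are made-up numbers, and the function name is an assumption. A respondent with a weight of zero contributes nothing to the estimate.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of weight * value divided by the sum of weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: the third respondent has a weight of zero
# and therefore does not affect the estimate.
incomes = [20_000, 30_000, 50_000]
weights = [2.0, 1.0, 0.0]
estimate = weighted_mean(incomes, weights)
```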