A

A priori analyses See planned analyses.
Absolute risk difference See risk difference.
Absolute risk reduction See risk difference.
Additive model A statistical model in which the combined effect of several factors is the sum of the effects produced by each of the factors in the absence of the others. For example, if one factor increases risk by a% and a second factor by b%, the additive combined effect of the two factors is (a + b)%. See also multiplicative model.
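As a rough numerical sketch of the distinction between the two models (the baseline risk and the two risk increases below are invented for illustration only):

```python
# Illustrative figures only (not from any study).
baseline_risk = 0.02   # 2% baseline risk of the outcome
rr_factor_1 = 1.10     # factor 1 multiplies risk by 1.10 (a 10% increase)
rr_factor_2 = 1.05     # factor 2 multiplies risk by 1.05 (a 5% increase)

# Additive model: the excess risks (here 10% and 5% of baseline) are summed.
additive_risk = baseline_risk * (1 + (rr_factor_1 - 1) + (rr_factor_2 - 1))

# Multiplicative model: the relative risks are multiplied.
multiplicative_risk = baseline_risk * rr_factor_1 * rr_factor_2

print(f"Additive combined risk:       {additive_risk:.4%}")
print(f"Multiplicative combined risk: {multiplicative_risk:.4%}")
```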
Adjusted analysis An analysis that controls (adjusts) for baseline imbalances in important patient characteristics. See also confounder, regression analysis.
Adverse event An adverse outcome that occurs during or after the use of a drug or other intervention but is not necessarily caused by it.
Adverse effect An adverse event for which the causal relation between the drug/intervention and the event is at least a reasonable possibility. The term ‘adverse effect’ applies to all interventions, while ‘adverse drug reaction’ (ADR) is used only with drugs. In the case of drugs an adverse effect tends to be seen from the point of view of the drug and an adverse reaction is seen from the point of view of the patient.
Adverse reaction See adverse effect.
Aggregate data Data summarised by groups, for example summary outcome data for treatment and control groups in a controlled trial.
Allocation concealment See concealment of allocation.
Alpha See Type I error.
Applicability See external validity.
Arithmetic mean See mean.
Arm [In a controlled trial:] Refers to a group of participants allocated a particular treatment. In a randomised controlled trial, allocation to different arms is determined by the randomisation procedure. Many controlled trials have two arms, a group of participants assigned to an experimental intervention (sometimes called the treatment arm) and a group of participants assigned to a control (the control arm). Trials may have more than two arms, with more than one experimental arm and/or more than one control arm.
Ascertainment bias See detection bias.
Association A relationship between two characteristics, such that as one changes, the other changes in a predictable way. For example, statistics demonstrate that there is an association between smoking and lung cancer. In a positive association, one quantity increases as the other one increases (as with smoking and lung cancer).  In a negative association, an increase in one quantity corresponds to a decrease in the other. Association does not necessarily imply a causal effect.  (Also called correlation.)
Attrition The loss of participants during the course of a study. (Also called loss to follow up.) Participants who are lost during the study are often called dropouts.
Attrition bias Systematic differences between comparison groups in withdrawals or exclusions of participants from the results of a study. For example, participants may drop out of a study because of side effects of an intervention, and excluding these participants from the analysis could result in an overestimate of the effectiveness of the intervention, especially when the proportion dropping out varies by treatment group.

B

Baseline characteristics Values of demographic, clinical and other variables collected for each participant at the beginning of a trial, before the intervention is administered.
Bayes’ theorem A probability theorem used to update the probability of an event in the light of a piece of new evidence. A common application is in diagnosis, where the prior probability of disease, obtained from population data, is updated to a posterior probability in the light of a positive or negative result from a diagnostic test.
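A minimal worked example of this updating for a diagnostic test, assuming hypothetical values for prevalence, sensitivity and specificity:

```python
# Bayes' theorem applied to a diagnostic test (all figures hypothetical).
prevalence = 0.01    # prior probability of disease in the population
sensitivity = 0.90   # P(test positive | disease)
specificity = 0.95   # P(test negative | no disease)

# P(positive test) by the law of total probability
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Posterior probability of disease given a positive test (Bayes' theorem)
posterior = sensitivity * prevalence / p_positive

print(f"Prior probability of disease:    {prevalence:.1%}")
print(f"Posterior after a positive test: {posterior:.1%}")
```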
Bayesian statistics An approach to statistics based on application of Bayes’ theorem that can be used in single studies or meta-analysis. A Bayesian analysis uses Bayes’ theorem to transform a priordistribution for an unknown quantity (e.g. an odds ratio) into a posterior distribution for the same quantity, in light of the results of a study or studies. The prior distribution may be based on external evidence, common sense or subjective opinion. Statistical inferences are made by extracting information from the posterior distribution, and may be presented as point estimates, and credible intervals (the Bayesian equivalent of confidence intervals).
Beta See Type II error.
Bias [In statistics.] A systematic error or deviation in results or inferences from the truth.  In studies of the effects of health care, the main types of bias arise from systematic differences in the groups that are compared (selection bias), the care that is provided, exposure to other factors apart from the intervention of interest (performance bias), withdrawals or exclusions of people entered into a study (attrition bias) or how outcomes are assessed (detection bias). Reviews of studies may also be particularly affected by reporting bias, where a biased subset of all the relevant data is available.
Bias prevention Aspects of the design or conduct of a study designed to prevent bias. For controlled trials, such aspects include randomisation, blinding and concealment of allocation.
Binary data See dichotomous data.
Binomial distribution A statistical distribution with known properties describing the number of occurrences of an event in a series of observations. Thus, the number of deaths in the control arm of a controlled trial follows a binomial distribution. The distribution forms the basis for analyses of dichotomous data.
Blinding [In a controlled trial:] The process of preventing those involved in a trial from knowing to which comparison group a particular participant belongs. The risk of bias is minimised when as few people as possible know who is receiving the experimental intervention and who the control intervention. Participants, caregivers, outcome assessors, and analysts are all candidates for being blinded. Blinding of certain groups is not always possible, for example surgeons in surgical trials. The terms single blind, double blind and triple blind are in common use, but are not used consistently and so are ambiguous unless the specific people who are blinded are listed. (Also called masking.)
Block randomisation See random permuted blocks.

C

Carry over [In a cross-over trial:] The persistence, into a later period of treatment, of some of the effects of a treatment applied in an earlier period.
Case series A study reporting observations on a series of individuals, usually all receiving the same intervention, with no control group.
Case study A study reporting observations on a single individual. (Also called anecdote, case history, or single case report.)
Case-control study A study that compares people with a specific disease or outcome of interest (cases) to people from the same population without that disease or outcome (controls), and which seeks to find associations between the outcome and prior exposure to particular risk factors. This design is particularly useful where the outcome is rare and past exposure can be reliably measured. Case-control studies are usually retrospective, but not always.
Categorical data Data that are classified into two or more non-overlapping categories. Race and type of drug (aspirin, paracetamol, etc.) are examples of categorical variables.  If there is a natural order to the categories, for example, non-smokers, ex-smokers, light smokers and heavy smokers, the data are known as ordinal data. If there are only two categories, the data are dichotomous data. See also continuous data.
Causal effect An association between two characteristics that can be demonstrated to be due to cause and effect, i.e. a change in one causes the change in the other. Causality can be demonstrated by experimental studies such as controlled trials (for example, that an experimental intervention causes a reduction in mortality). However, causality can often not be determined from an observational study.
Censored [In survival analysis:] A term used in studies where the outcome is the time to a particular event, to describe data from patients where the outcome is unknown. A patient might be known not to have had the event only up to a particular point in time, so ‘survival time’ is censored at this point.
Chi-squared test A statistical test based on comparison of a test statistic to a chi-squared distribution. Used in RevMan analyses to test the statistical significance of the heterogeneity statistic.
CI See confidence interval.
Clinical guideline A systematically developed statement for practitioners and participants about appropriate health care for specific clinical circumstances.
Clinical trial An experiment to compare the effects of two or more healthcare interventions. Clinical trial is an umbrella term for a variety of designs of healthcare trials, including uncontrolled trials, controlled trials, and randomised controlled trials. (Also called intervention study.)
Clinically significant A result (e.g. a treatment effect) that is large enough to be of practical importance to patients and healthcare providers. This is not the same thing as statistically significant. Assessing clinical significance takes into account factors such as the size of a treatment effect, the severity of the condition being treated, the side effects of the treatment, and the cost. For instance, if the estimated effect of a treatment for acne was small but statistically significant, but the treatment was very expensive, and caused many of the treated patients to feel nauseous, this would not be a clinically significant result. Showing that a drug lowered the heart rate by an average of 1 beat per minute would also not be clinically significant.
Cluster randomised trial A trial in which clusters of individuals (e.g. clinics, families, geographical areas), rather than individuals themselves, are randomised to different arms. In such studies, care should be taken to avoid unit of analysis errors.
Cohort study An observational study in which a defined group of people (the cohort) is followed over time. The outcomes of people in subsets of this cohort are compared, to examine people who were exposed or not exposed (or exposed at different levels) to a particular intervention or other factor of interest. A prospective cohort study assembles participants and follows them into the future. A retrospective (or historical) cohort study identifies subjects from past records and follows them from the time of those records to the present. Because subjects are not allocated by the investigator to different interventions or other exposures, adjusted analysis is usually required to minimise the influence of other factors (confounders).
Co-intervention The application of additional diagnostic or therapeutic procedures to people receiving a particular programme of treatment. In a controlled trial, members of either or both the experimental and the control groups might receive co-interventions.
Co-morbidity The presence of one or more diseases or conditions other than those of primary interest. In a study looking at treatment for one disease or condition, some of the individuals may have other diseases or conditions that could affect their outcomes.  (A co-morbidity may be a confounder.)
Concealment of allocation The process used to ensure that the person deciding to enter a participant into a randomised controlled trial does not know the comparison group into which that individual will be allocated. This is distinct from blinding, and is aimed at preventing selection bias. Some attempts at concealing allocation are more prone to manipulation than others, and the method of allocation concealment is used as an assessment of the quality of a trial. See also bias prevention. (Also called allocation concealment.)
Confidence interval A measure of the uncertainty around the main finding of a statistical analysis. Estimates of unknown quantities, such as the odds ratio comparing an experimental intervention with a control, are usually presented as a point estimate and a 95% confidence interval. This means that if someone were to keep repeating a study in other samples from the same population, 95% of the confidence intervals from those studies would contain the true value of the unknown quantity. Alternatives to 95%, such as 90% and 99% confidence intervals, are sometimes used. Wider intervals indicate lower precision; narrow intervals, greater precision. (Also called CI.)
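As an illustrative sketch (not a description of any particular software’s method), a 95% confidence interval for an odds ratio is commonly computed on the log odds scale using a normal approximation; the 2×2 counts below are invented:

```python
# 95% confidence interval for an odds ratio, computed on the log scale
# using the usual normal approximation (all counts are hypothetical).
import math

a, b = 15, 85   # events / non-events in the experimental group
c, d = 30, 70   # events / non-events in the control group

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)

z = 1.96  # multiplier for a 95% confidence interval
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```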
Confidence limits The upper and lower boundaries of a confidence interval.
Confounded comparison A comparison between two treatment groups that will give a biased estimate of the effect of treatment due to the study design. For a comparison to be unconfounded, the two treatment groups must be treated identically apart from the randomised treatment. For instance, to estimate the effect of heparin in acute stroke, a trial of heparin alone versus placebo would provide an unconfounded comparison. However, a trial of heparin alone versus aspirin alone provides a confounded comparison of the effect of heparin. (See also unconfounded comparison.)
Confounder A factor that is associated with both an intervention (or exposure) and the outcome of interest. For example, if people in the experimental group of a controlled trial are younger than those in the control group, it will be difficult to decide whether a lower risk of death in one group is due to the intervention or the difference in ages. Age is then said to be a confounder, or a confounding variable.  Randomisation is used to minimise imbalances in confounding variables between experimental and control groups. Confounding is a major concern in non-randomised studies. See also adjusted analyses.
Consumer (healthcare consumer) Someone who uses, is affected by, or who is entitled to use a health related service.
Consumer advocate or representative A consumer who is actively involved with other consumers and able to represent the perspectives and concerns of that broader group of people. Consumer representatives work in Cochrane entities to ensure that consumers’ views are taken into account when review questions are being decided and results presented.
Contamination [In a controlled trial:] The inadvertent application of the intervention being evaluated to people in the control group; or inadvertent failure to apply the intervention to people assigned to the intervention group. Fear of contamination is one motivation for performing a cluster randomised trial.
Context The conditions and circumstances that are relevant to the application of an intervention, for example the setting (in hospital, at home, in the air); the time (working day, holiday, night-time); type of practice (primary, secondary, tertiary care; private practice, insurance practice, charity); whether routine or emergency.
Contingency table A table of frequencies or counts. In a two-way contingency table, sub-categories of one characteristic are indicated horizontally (in rows) and subcategories of another characteristic are indicated vertically (in columns). Tests of association between the characteristics can be readily applied. The simplest two-way contingency table is the 2×2 table, which is used in clinical trials to compare dichotomous outcomes, such as death, for an experimental intervention and control group.
Continuous data Data with a potentially infinite number of possible values within a given range.  Height, weight and blood pressure are examples of continuous variables. See also categorical data.
Control 1. [In a controlled trial:] A participant in the arm that acts as a comparator for one or more experimental interventions. Controls may receive placebo, no treatment, standard treatment, or an active intervention, such as a standard drug. 2. [In a case-control study:] A person in the group without the disease or outcome of interest. 3. [In statistics:] To adjust for, or take into account, extraneous influences or observations.
Control event rate See risk.
Control group 1. [In a controlled trial:] The arm that acts as a comparator for one or more experimental interventions. See also control. (Also called comparison group.) 2. [In a case-control study:] The group without the disease or outcome of interest. (Also called comparison group.)
Control group risk See risk.
Control program [In communicable (infectious) diseases:] Programs aimed at reducing or eliminating the disease.
Controlled before and after study A non-randomised study design in which a control population with characteristics and performance similar to those of the intervention group is identified. Data are collected before and after the intervention in both the control and intervention groups.
Controlled (clinical) trial (CCT) See clinical trial. This is an indexing term used in MEDLINE and CENTRAL. Within CENTRAL it refers to trials using quasi-randomisation, or trials where double blinding was used but randomisation was not mentioned.
Controlled trial A clinical trial that has a control group. Such trials are not necessarily randomised.
Convenience sample A group of individuals being studied because they are conveniently accessible in some way. This could make them particularly unrepresentative, as they are not a random sample of the whole population. A convenience sample, for example, might be all the people at a certain hospital, or attending a particular support group. They could differ in important ways from the people who haven’t been brought together in that way: they could be more or less sick, for example.
Conventional treatment Whatever the standard or usual treatment is for a particular condition at that time.
Correlation 1. See association. (Positive correlation is the same as positive association, and negative correlation is the same as negative association.) 2. [In statistics:] Linear association between two variables, measured by a correlation coefficient. A correlation coefficient can range from -1 for perfect negative correlation, to +1 for perfect positive correlation (with perfect meaning that all the points lie on a straight line). A correlation coefficient of 0 means that there is no linear relationship between the variables.
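A small sketch of sense 2, computing a correlation coefficient for invented data (statistics.correlation requires Python 3.10 or later):

```python
# Pearson correlation coefficient for two variables (data are made up).
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r = statistics.correlation(x, y)  # ranges from -1 to +1
print(f"Correlation coefficient r = {r:.3f}")
```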
Cost-benefit analysis An economic analysis that converts effects into the same monetary terms as costs and compares them.
Cost-effectiveness analysis An economic analysis that views effects in terms of overall health specific to the problem, and describes the costs for some additional health gain (e.g. cost per additional stroke prevented).
Cost-utility analysis An economic analysis that expresses effects as overall health improvement and describes how much it costs for some additional utility gain (e.g. cost per additional quality-adjusted life-year).
Cox model See proportional hazards model.
Critical appraisal The process of assessing and interpreting evidence by systematically considering its validity, results, and relevance.
Cross-over trial A type of clinical trial comparing two or more interventions in which the participants, upon completion of the course of one treatment, are switched to another. For example, for a comparison of treatments A and B, the participants are randomly allocated to receive them in either the order A, B or the order B, A. Particularly appropriate for the study of treatment options for relatively stable health problems. The time during which the first intervention is taken is known as the first period, with the second intervention being taken during the second period. See also carry over and period effect.
Cross-sectional study A study measuring the distribution of some characteristic(s) in a population at a particular point in time. (Also called survey.)
Cumulative meta-analysis A meta-analysis in which studies are added one at a time in a specified order (e.g. according to date of publication or quality) and the results are summarised as each new study is added. In a graph of a cumulative meta-analysis, each horizontal line represents the summary of the results as each study is added, rather than the results of a single study.

D

Data derived analyses See unplanned analyses.
Data dredging Performing many analyses on the data from a study, for example looking for associations among many variables.  Particularly used to refer to unplanned analyses, where there is no apparent hypothesis, and only statistically significant results are reported.
Decision analysis A technique that formally identifies the options in a decision-making process, quantifies the probable outcomes (and costs) of each (and the uncertainty around them), determines the option that best meets the objectives of the decision-maker and assesses the robustness of this conclusion.
Degrees of freedom A concept that refers to the number of independent contributions to a sampling distribution (such as chi-squared distribution). In a contingency table, it is one less than the number of row categories multiplied by one less than the number of column categories; e.g. a 2 x 2 table comparing two groups for a dichotomous outcome, such as death, has one degree of freedom.
Dependent variable The outcome or response that results from changes to an independent variable.  In a clinical trial, the outcome (over which the investigator has no direct control) is the dependent variable, and the treatment arm is the independent variable. The dependent variable is traditionally plotted on the vertical axis on graphs. (Also called outcome variable.)
Descriptive study A study that describes characteristics of a sample of individuals. Unlike an experimental study, the investigators do not actively intervene to test a hypothesis, but merely describe the health status or characteristics of a sample from a defined population.
Design effect A number that describes how much larger a sample is needed in designs such as cluster randomised trials to achieve the same precision as a simple random sample. It is the ratio of the true variance of a statistic (taking the sampling design into account) to the variance of the statistic for a simple random sample with the same number of cases.
Detection bias Systematic difference between comparison groups in how outcomes are ascertained, diagnosed or verified.  (Also called ascertainment bias.)
Detection rate See sensitivity.
Dichotomous data Data that can take one of two possible values, such as dead/alive, smoker/non-smoker, present/not present. (Also called binary data.) Sometimes continuous data or ordinal data are simplified into dichotomous data (e.g. age in years could become <75 years or ≥75 years).
Distribution The collection of values of a variable in the population or the sample, sometimes called an empirical distribution. See also probability distribution.
Dose dependent A response to a drug that may be related to the amount received (i.e. the dose); this may be true for both benefits and harms. Sometimes trials are done to test the effect of different dosages of the same drug.
Dose response relationship The relationship between the quantity of treatment given and its effect on outcome.  In meta-analysis, dose-response relationships can be investigated using meta-regression.
Double blind See blinding.
Dropouts See attrition.

E

Economic analysis (economic evaluation) Comparison of the relationship between costs and outcomes of alternative healthcare interventions. See cost-benefit analysis, cost-effectiveness analysis, and cost-utility analysis.
Effect size 1. A generic term for the estimate of effect of treatment for a study. 2. A dimensionless measure of effect that is typically used for continuous data when different scales (e.g. for measuring pain) are used to measure an outcome and is usually defined as the difference in means between the intervention and control groups divided by the standard deviation of the control or both groups. See also standardised mean difference.
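A minimal sketch of sense 2, here dividing by the pooled standard deviation of both groups; all summary figures are invented:

```python
# Standardised mean difference (sense 2 above): difference in means divided
# by a standard deviation.  Here the pooled SD of both groups is used;
# all summary figures are hypothetical.
import math

mean_treat, sd_treat, n_treat = 42.0, 10.0, 50
mean_ctrl,  sd_ctrl,  n_ctrl  = 48.0, 12.0, 50

pooled_sd = math.sqrt(((n_treat - 1) * sd_treat**2 + (n_ctrl - 1) * sd_ctrl**2)
                      / (n_treat + n_ctrl - 2))

effect_size = (mean_treat - mean_ctrl) / pooled_sd
print(f"Standardised mean difference = {effect_size:.2f}")
```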
Effectiveness The extent to which a specific intervention, when used under ordinary circumstances, does what it is intended to do. Clinical trials that assess effectiveness are sometimes called pragmatic or management trials. See also intention-to-treat.
Efficacy The extent to which an intervention produces a beneficial result under ideal conditions. Clinical trials that assess efficacy are sometimes called explanatory trials and are restricted to participants who fully co-operate.
Empirical Empirical results are based on observation rather than on reasoning alone.
Endpoint See outcome.
Epidemiology The study of the health of populations and communities, not just particular individuals.
Equipoise A state of uncertainty where a person believes it is equally likely that either of two treatmentoptions is better.
Equivalence trial A trial designed to determine whether the response to two or more treatments differs by an amount that is clinically unimportant. This is usually demonstrated by showing that the true treatment difference is likely to lie between a lower and an upper equivalence level of clinically acceptable differences.  See also non-inferiority trial.
Estimate of effect The observed relationship between an intervention and an outcome expressed as, for example, a number needed to treat to benefit, odds ratio, risk difference, risk ratio, standardised mean difference, or weighted mean difference. (Also called treatment effect.)
Event rate See risk.
Experimental intervention An intervention under evaluation. In a controlled trial, an experimental intervention arm is compared with one or more control arms, and possibly with additional experimental intervention arms.
Experimental study A study in which the investigators actively intervene to test a hypothesis.  In a controlled trial, one type of experiment, the people receiving the treatment being tested are said to be in the experimental group or arm of the trial.
Explanatory trial A trial that aims to test a treatment policy in an ideal situation where patients receive the full course of therapy as prescribed, and use of other treatments may be controlled or restricted.  See also pragmatic trial.
External validity The extent to which results provide a correct basis for generalisations to other circumstances. For instance, a meta-analysis of trials of elderly patients may not be generalisable to children. (Also called generalisability or applicability.)

F

Factorial design A trial design used to assess the individual contribution of treatments given in combination, as well as any interactive effect they may have. Most trials only consider a single factor, where anintervention is compared with one or more alternatives, or a placebo. In a trial using a 2×2 factorial design, participants are allocated to one of four possible combinations. For example in a 2×2 factorial RCT of nicotine replacement and counselling, participants would be allocated to: nicotine replacement alone, counselling alone, both, or neither. In this way it is possible to test the independent effect of each intervention on smoking cessation and the combined effect of (interaction between) the two interventions. This type of study is usually carried out in circumstances where no interaction is likely.
False negative A falsely drawn negative conclusion. [In diagnostic tests:] A conclusion that a person does not have the disease or condition being tested for, when they actually do. [In clinical trials:] See Type II error.
False positive A falsely drawn positive conclusion. [In diagnostic tests:] A conclusion that a person does have the disease or condition being tested for, when they actually do not. [In clinical trials:] See Type I error.
Fixed-effect model [In meta-analysis:] A model that calculates a pooled effect estimate using the assumption that all observed variation between studies is caused by the play of chance. Studies are assumed to be measuring the same overall effect. An alternative model is the random-effects model.
Follow-up The observation over a period of time of study/trial participants to measure outcomes under investigation.
Forest plot A graphical representation of the individual results of each study included in a meta-analysistogether with the combined meta-analysis result. The plot also allows readers to see theheterogeneity among the results of the studies. The results of individual studies are shown as squares centred on each study’s point estimate. A horizontal line runs through each square to show each study’s confidence interval – usually, but not always, a 95% confidence interval. The overall estimate from the meta-analysis and its confidence interval are shown at the bottom, represented as a diamond. The centre of the diamond represents the pooled point estimate, and its horizontal tips represent the confidence interval.
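A minimal plotting sketch of the layout described above, using invented study results (a real forest plot would also scale the squares by study weight):

```python
# Minimal sketch of a forest plot layout (study results are invented).
import matplotlib.pyplot as plt

studies = ["Study A", "Study B", "Study C"]
estimates = [0.80, 0.65, 0.90]        # point estimates (e.g. risk ratios)
lower_ci = [0.60, 0.45, 0.70]
upper_ci = [1.05, 0.95, 1.15]
pooled, pooled_lo, pooled_hi = 0.78, 0.66, 0.92

fig, ax = plt.subplots()
y_positions = range(len(studies), 0, -1)   # studies listed top to bottom

for y, est, lo, hi in zip(y_positions, estimates, lower_ci, upper_ci):
    ax.plot([lo, hi], [y, y], color="black")       # confidence interval line
    ax.plot(est, y, marker="s", color="black")     # square at the point estimate

# Pooled estimate drawn as a diamond at the bottom
ax.plot(pooled, 0, marker="D", markersize=10, color="black")
ax.plot([pooled_lo, pooled_hi], [0, 0], color="black")

ax.axvline(1.0, linestyle="--", color="grey")      # line of no effect
ax.set_yticks(list(y_positions) + [0])
ax.set_yticklabels(studies + ["Pooled"])
ax.set_xlabel("Risk ratio")
plt.show()
```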
Funnel plot A graphical display of some measure of study precision plotted against effect size that can be used to investigate whether there is a link between study size and treatment effect.  One possible cause of an observed association is reporting bias.

G

Generalisability (also: applicability, external validity) See external validity.
Gold standard The method, procedure, or measurement that is widely accepted as being the best available, against which new developments should be compared.
Grey literature Material that is not published in easily accessible journals or databases. It includes conference proceedings containing the abstracts of research presented at conferences, unpublished theses, and so on.

H

Hazard rate The probability of an event occurring given that it hasn’t occurred up to the current point in time.
Hazard ratio A measure of effect produced by a survival analysis. It is the ratio of the hazard rates in two groups, representing the relative rate at which one group experiences the outcome of interest compared with the other. For example, if the hazard ratio for death for a treatment is 0.5, then we can say that treated patients are likely to die at half the rate of untreated patients.
Heterogeneity 1. Used in a general sense to describe the variation in, or diversity of, participants, interventions, and measurement of outcomes across a set of studies, or the variation in internal validity of those studies. 2. Used specifically, as statistical heterogeneity, to describe the degree of variation in the effect estimates from a set of studies. Also used to indicate the presence of variability among studies beyond the amount expected due solely to the play of chance. See also homogeneous, I2.
Heterogeneous Used to describe a set of studies or participants with sizeable heterogeneity. The opposite of homogeneous.
Historical control A control person or group for whom data were collected earlier than for the group being studied. There is a large risk of bias in studies that use historical controls because of systematic differences between the comparison groups arising from changes over time in risks, prognosis, health care, etc.
Homogeneous 1. Used in a general sense to mean that the participants, interventions, and measurement of outcomes are similar across a set of studies. 2. Used specifically to describe the effect estimates from a set of studies where they do not vary more than would be expected by chance. See also heterogeneous, heterogeneity.
Hypothesis An unproved theory that can be tested through research.  To properly test a hypothesis, it should be pre-specified and clearly articulated, and the study to test it should be designed appropriately.  See also null hypothesis.
Hypothesis test A statistical procedure to determine whether to reject a null hypothesis on the basis of the observed data.

I

I2 A measure used to quantify heterogeneity. It describes the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (chance). A value greater than 50% may be considered to represent substantial heterogeneity.
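The entry above describes I2 verbally; one common way to compute it (the Higgins–Thompson formulation) uses the heterogeneity statistic Q and its degrees of freedom. The effect estimates and standard errors below are invented:

```python
# I-squared computed from the heterogeneity statistic Q (Higgins-Thompson
# formulation).  The study effect estimates and standard errors are invented.
effects = [0.10, 0.60, -0.20, 0.45]            # e.g. log odds ratios
std_errors = [0.10, 0.12, 0.15, 0.11]

weights = [1 / se**2 for se in std_errors]     # inverse-variance weights
pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
df = len(effects) - 1                          # degrees of freedom

i_squared = max(0.0, (q - df) / q) * 100       # expressed as a percentage
print(f"Q = {q:.2f} on {df} df, I-squared = {i_squared:.0f}%")
```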
Incidence The number of new occurrences of something in a population over a particular period of time, e.g. the number of cases of a disease in a country over one year.
Independent A description of two events, where knowing the outcome or value of one does not inform us about the outcome or value of the other. Formally, two events ‘A and B’ are independent if the probability that A and B occur together is equal to the probability of A occurring multiplied by the probability of B occurring.
Independent variable An exposure, risk factor, or other characteristic that is hypothesized to influence the dependent variable. In a clinical trial, the outcome (over which the investigator has no direct control) is the dependent variable, and the treatment arm is the independent variable. In an adjusted analysis, patient characteristics are included as additional independent variables. (Also called explanatory variable.)
Individual patient data [In meta-analysis:] The availability of raw data for each study participant in each included study, as opposed to aggregate data (summary data for the comparison groups in each study). Reviews using individual patient data require collaboration of the investigators who conducted the original studies, who must provide the necessary data.
Intention to treat analysis A strategy for analysing data from a randomised controlled trial. All participants are included in the arm to which they were allocated, whether or not they received (or completed) theintervention given to that arm. Intention-to-treat analysis prevents bias caused by the loss of participants, which may disrupt the baseline equivalence established by randomisation and which may reflect non-adherence to the protocol. The term is often misused in trial publications when some participants were excluded.
Interaction The situation in which the effect of one independent variable on the outcome is affected by the value of a second independent variable. In a trial, a test of interaction examines whether the treatment effect varies across sub-groups of participants. See also factorial design, sub-group analysis.
Interim analysis Analysis comparing intervention groups at any time before the formal completion of a trial, usually before recruitment is complete. Often used with stopping rules so that a trial can be stopped if participants are being put at risk unnecessarily.  Timing and frequency of interim analyses should be specified in the protocol.
Intermediary outcomes See surrogate endpoints.
Internal validity The extent to which the design and conduct of a study are likely to have prevented bias. Variation in quality can explain variation in the results of studies included in a systematic review. More rigorously designed (better quality) trials are more likely to yield results that are closer to the truth. (Also called methodological quality but better thought of as relating to bias prevention.) See also external validity, validity, bias prevention.
Inter-rater reliability The degree of stability exhibited when a measurement is repeated under identical conditions by different raters. Reliability refers to the degree to which the results obtained by a measurement procedure can be replicated. Lack of inter-rater reliability may arise from divergences between observers or instability of the attribute being measured. See also intra-rater reliability.
Interrupted time series A research design that collects observations at multiple time points before and after an intervention (interruption). The design attempts to detect whether the intervention has had an effect significantly greater than the underlying trend.
Intervention The process of intervening on people, groups, entities or objects in an experimental study. In controlled trials, the word is sometimes used to describe the regimens in all comparison groups, including placebo and no-treatment arms. See also treatment, experimental intervention and control.
Intervention group A group of participants in a study receiving a particular health care intervention.  Parallel group trials include at least two intervention groups.
Intervention study See clinical trial.
Intra-rater reliability The degree of stability exhibited when a measurement is repeated under identical conditions by the same rater. Reliability refers to the degree to which the results obtained by a measurement procedure can be replicated. Lack of intra-rater reliability may arise from divergences between instruments of measurement, or instability of the attribute being measured.

K

Key words A string of words attached to an article to be used to index or code the article in a database. See also MeSH.

L

L’Abbé plot A scatter plot of the risk in the experimental group against the risk in the control group. Ideally the size of the plotting symbols should be proportional to the size of the trials. Trials in which the experimental treatment had a higher risk than the control will be in the upper left of the plot, between the y axis and the line of equality. If experimental is no better than control then the point will fall on the line of equality, and if the control treatment has a higher risk than the experimental treatment then the point will be in the lower right of the plot, between the x axis and the line of equality.
Linear scale A scale that increases in equal steps.  In a linear scale on a RevMan forest plot, the distance between 0 and 5 is the same as the distance between 5 and 10, or between 10 and 15. A linear scale may be used when the range of numbers being represented is not large, or to represent differences. See also logarithmic scale.
Logarithmic scale A scale in which the logarithm of a value is used instead of the value. In a logarithmic scale on a RevMan forest plot, the distance between 1 and 10 is the same as the distance between 10 and 100, or between 100 and 1000. A logarithmic scale may be used when the range of numbers being represented is large, or to represent ratios. See also linear scale.
Logistic regression A form of regression analysis that models an individual’s odds of disease or some other outcome as a function of a risk factor or intervention. It is widely used for dichotomous outcomes, in particular to carry out adjusted analysis.  See also meta-regression.
Log-odds ratio The (natural) log of the odds ratio. It is used in statistical calculations and in graphical displays of odds ratios in systematic reviews.
Loss to follow up See attrition.

M

Masking See blinding.
Matching [In a case-control study:] Choosing one or more controls with particular matching attributes for each case. Researchers match cases and controls according to particular variables that are thought to be important, such as age and sex.
Mean An average value, calculated by adding all the observations and dividing by the number of observations. (Also called arithmetic mean.)
Mean difference [In meta-analysis:] A method used to combine measures on continuous scales (such as weight), where the mean, standard deviation and sample size in each group are known. The weight given to the difference in means from each study (e.g. how much influence each study has on the overall results of the meta-analysis) is determined by the precision of its estimate of effect and, in the statistical software in RevMan and the Cochrane Database of Systematic Reviews, is equal to the inverse of the variance. This method assumes that all of the trials have measured the outcome on the same scale. See also standardised mean difference. (Also called WMD, weighted mean difference.)
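A minimal sketch of the inverse-variance weighting described above, pooling mean differences under a fixed-effect model; all summary data are invented:

```python
# Fixed-effect, inverse-variance pooling of mean differences: each study's
# weight is the inverse of the variance of its mean difference.
# All summary data are invented.
import math

# (mean_treat, sd_treat, n_treat, mean_control, sd_control, n_control)
studies = [
    (12.0, 4.0, 40, 14.5, 4.5, 40),
    (11.0, 5.0, 60, 12.0, 5.5, 55),
    (13.5, 3.5, 30, 16.0, 4.0, 35),
]

weighted_sum, total_weight = 0.0, 0.0
for m1, sd1, n1, m2, sd2, n2 in studies:
    md = m1 - m2
    variance = sd1**2 / n1 + sd2**2 / n2   # variance of the mean difference
    weight = 1 / variance
    weighted_sum += weight * md
    total_weight += weight

pooled_md = weighted_sum / total_weight
pooled_se = math.sqrt(1 / total_weight)
print(f"Pooled mean difference = {pooled_md:.2f} (SE {pooled_se:.2f})")
```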
Median The value of the observation that comes half way when the observations are ranked in order.
Meta-analysis The use of statistical techniques in a systematic review to integrate the results of included studies. Sometimes misused as a synonym for systematic reviews, where the review includes a meta-analysis.
Meta-regression [In meta-analysis:] A technique used to explore the relationship between study characteristics (e.g. concealment of allocation, baseline risk, timing of the intervention) and study results (the magnitude of effect observed in each study) in a systematic review. See also logistic regression.
Methodological quality See internal validity, bias prevention.
Minimisation A method of allocation used to provide comparison groups that are closely similar for several variables. The next participant is assessed with regard to several characteristics, and assigned to the treatment group that has so far had fewer such people assigned to it. It can be done with a component of randomisation, where the chance of allocation to the group with fewer similar participants is less than one. Minimisation is best performed centrally with the aid of a computer program to ensure concealment of allocation.
Morbidity Illness or harm. See also co-morbidity.
Mortality Death.
Multi-arm trial A trial with more than two arms.
Multicentre trial A trial conducted at several geographical sites. Trials are sometimes conducted among several collaborating institutions, rather than at a single institution – particularly when very large numbers of participants are needed.
Multiple comparisons The performance of multiple analyses on the same data. Multiple statistical comparisons increase the probability of making a Type I error, i.e. attributing a difference to an intervention when chance is a reasonable explanation.
Multiplicative model A statistical model in which the combined effect of several factors is the product of the effects produced by each in the absence of the others. For example, if one factor multiplies risk by a% and a second factor by b%, the combined effect of the two factors is a multiplication by (a x b)%. See also additive model.
Multivariate analysis Measuring the impact of more than one variable at a time while analysing a set of data, e.g. looking at the impact of age, sex, and occupation on a particular outcome. Performed using regression analysis.

N

N of 1 randomised trial A randomised trial in an individual to determine the optimum treatment for that individual. The individual is given repeated administrations of experimental and control interventions (or of two or more experimental treatments), with the order of the treatments being randomised.
Negative association See association.
Negative predictive value [In screening/diagnostic tests:] A measure of the usefulness of a screening/diagnostic test. It is the proportion of those with a negative test result who do not have the disease, and can be interpreted as the probability that a negative test result is correct. It is calculated as follows: NPV = Number with a negative test who do not have disease/Number with a negative test.
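A small worked example from an invented 2×2 diagnostic table, which also yields the related quantities (positive predictive value, sensitivity and specificity) defined elsewhere in this glossary:

```python
# Accuracy measures from a hypothetical 2x2 diagnostic test table.
tp = 90     # positive test, disease present (true positives)
fp = 50     # positive test, no disease      (false positives)
fn = 10     # negative test, disease present (false negatives)
tn = 850    # negative test, no disease      (true negatives)

npv = tn / (tn + fn)            # negative predictive value (this entry)
ppv = tp / (tp + fp)            # positive predictive value
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"NPV = {npv:.1%}, PPV = {ppv:.1%}, "
      f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
```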
Negative study A term often used to refer to a study with results that either do not indicate a beneficial effect of treatment or that have not reached statistical significance. The term can generate confusion because it can refer to either statistical significance or the direction of effect; studies often have multiple outcomes; the criteria for classifying studies as ‘negative’ are not always clear; and, in the case of studies of risk or undesirable effects, ‘negative’ studies are ones that do not show a harmful effect.
NNH See number needed to treat to harm.
NNT See number needed to treat to benefit.
NNTb See number needed to treat to benefit.
NNTh See number needed to treat to harm.
Non-experimental study See observational study.
Non-inferiority trial A trial designed to determine whether the effect of a new treatment is not worse than a standard treatment by more than a pre-specified amount. A one-sided version of an equivalence trial.
Non-randomised study Any quantitative study estimating the effectiveness of an intervention (harm or benefit) that does not use randomisation to allocate units to comparison groups (including studies where ‘allocation’ occurs in the course of usual treatment decisions or peoples’ choices, i.e. studies usually called ‘observational’). To avoid ambiguity, the term should be substantiated using a description of the type of question being addressed. For example, a ‘non-randomised intervention study’ is typically a comparative study of an experimental intervention against some control intervention (or no intervention) that is not a randomised controlled trial. There are many possible types of non-randomised intervention study, including cohort studies, case-control studies, controlled before-and-after studies, interrupted-time-series studies and controlled trials that do not use appropriate randomisation strategies (sometimes called quasi-randomised studies).
Normal distribution A statistical distribution with known properties commonly used as the basis of models to analyse continuous data. Key assumptions in such analyses are that the data are symmetrically distributed about a mean value, and the shape of the distribution can be described using the mean and standard deviation.
Null hypothesis The statistical hypothesis that one variable (e.g. which treatment a study participant was allocated to receive) has no association with another variable or set of variables (e.g. whether or not a study participant died), or that two or more population distributions do not differ from one another.  In simplest terms, the null hypothesis states that the factor of interest (e.g. treatment) has no impact on outcome (e.g. risk of death).
Number needed to harm See number needed to treat to harm.
Number needed to treat See number needed to treat to benefit.
Number needed to treat to benefit An estimate of how many people need to receive a treatment before one person would experience a beneficial outcome. For example, if you need to give a stroke prevention drug to 20 people before one stroke is prevented, then the number needed to treat to benefit for that stroke prevention drug is 20. The NNTb is estimated as the reciprocal of the absolute risk difference. (Also called NNT, NNTb, number needed to treat.)
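A minimal sketch of the calculation; the control and treated risks below are invented, chosen so that the result matches the example above (NNTb = 20):

```python
# NNTb as the reciprocal of the absolute risk difference.  The risks below
# are hypothetical, chosen so the answer matches the example above.
import math

risk_control = 0.10     # 10% of untreated people have a stroke
risk_treated = 0.05     # 5% of treated people have a stroke

absolute_risk_reduction = risk_control - risk_treated   # 0.05
nntb = math.ceil(1 / absolute_risk_reduction)           # round up to whole people

print(f"Absolute risk reduction = {absolute_risk_reduction:.0%}, NNTb = {nntb}")
```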
Number needed to treat to harm A number needed to treat to benefit associated with a harmful effect. It is an estimate of how many people need to receive a treatment before one more person would experience a harmful outcome or one fewer person would experience a beneficial outcome. (Also called NNH, NNTh, number needed to harm.) See also number needed to treat to benefit.

O

Observational study A study in which the investigators do not seek to intervene, and simply observe the course of events. Changes or differences in one characteristic (e.g. whether or not people received the intervention of interest) are studied in relation to changes or differences in other characteristic(s) (e.g. whether or not they died), without action by the investigator. There is a greater risk of selection bias than in experimental studies. See also randomised controlled trial. (Also called non-experimental study.)
Odds A way of expressing the chance of an event, calculated by dividing the number of individuals in a sample who experienced the event by the number for whom it did not occur. For example, if in a sample of 100, 20 people died and 80 people survived the odds of death are 20/80 = ¼, 0.25 or 1:4.
Odds ratio The ratio of the odds of an event in one group to the odds of an event in another group. In studies of treatment effect, the odds in the treatment group are usually divided by the odds in the control group. An odds ratio of one indicates no difference between comparison groups. For undesirable outcomes an OR that is less than one indicates that the intervention was effective in reducing the risk of that outcome. When the risk is small, odds ratios are very similar to risk ratios. (Also called OR.)
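A small worked example reusing the figures from the ‘Odds’ entry for the treatment group, with invented control-group figures:

```python
# Odds and odds ratio from a 2x2 table.  The treatment-group numbers reuse
# the example in the 'Odds' entry (20 deaths, 80 survivors); the
# control-group numbers are hypothetical.
deaths_treat, survivors_treat = 20, 80
deaths_ctrl,  survivors_ctrl  = 40, 60

odds_treat = deaths_treat / survivors_treat     # 20/80 = 0.25
odds_ctrl = deaths_ctrl / survivors_ctrl        # 40/60 = 0.67

odds_ratio = odds_treat / odds_ctrl             # < 1 favours the treatment here
print(f"Odds (treatment) = {odds_treat:.2f}, odds (control) = {odds_ctrl:.2f}, "
      f"OR = {odds_ratio:.2f}")
```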
One-sided test See one-tailed test.
One-tailed test A hypothesis test in which the values for which we can reject the null hypothesis are located entirely in one tail of the probability distribution. Testing whether one treatment is better than another (rather than testing whether one treatment is either better or worse than another) would be a one-tailed test. (Also called one-sided test.) See also two-tailed test.
Open clinical trial There are at least three possible meanings for this term: 1. A clinical trial in which the investigator and participant are aware which intervention is being used for which participant (i.e. not blinded). Random allocation may or may not be used in such trials. Sometimes called an ‘open label’ design. 2. A clinical trial in which the investigator decides which intervention is to be used (non-random allocation). This is sometimes called an open label design (but some trials which are said to be ‘open label’ are randomised). 3. A clinical trial that uses an open sequential design.
Open sequential design A sequential trial in which the decision to stop rests on the size of the effect observed so far, and there is no finite maximum number of participants in the study.
OR See odds ratio.
Ordinal data Data that are classified into more than two categories which have a natural order; for example, non-smokers, ex-smokers, light smokers and heavy smokers. Ordinal data are often reduced to two categories to simplify analysis and presentation, which may result in a considerable loss of information.
Original study See primary study.
Outcome A component of a participant’s clinical and functional status after an intervention has been applied, that is used to assess the effectiveness of an intervention. See also primary outcomesecondary outcome.
Outcome variable See dependent variable.

P

Paired design A study in which participants or groups of participants are matched (e.g. based on prognostic factors). One member of each pair is then allocated to the experimental (intervention) group and the other to the control group.
Parallel group trial A trial that compares two groups of people concurrently, one of which receives the intervention of interest and one of which is a control group. Some parallel trials have more than two comparison groups and some compare different interventions without including a non-intervention control group. (Also called independent group design.)
Parameter A quantity defining a theoretical model. Unlike variables, parameters do not relate to actual measurements or attributes of patients.
Participant An individual who is studied in a trial, often but not necessarily a patient.
Peer review The refereeing process for checking the quality and importance of reports of research. An article submitted for publication in a peer-reviewed journal is reviewed by other experts in the area. See also external peer reviewer (of a Cochrane Review).
Per protocol analysis An analysis of the subset of participants from a randomised controlled trial who complied with the protocol sufficiently to ensure that their data would be likely to exhibit the effect of treatment. This subset may be defined after considering exposure to treatment, availability of measurements and absence of major protocol violations. The per protocol analysis strategy may be subject to bias as the reasons for non-compliance may be related to treatment. See also intention to treat analysis.
Performance bias Systematic differences between intervention groups in care provided apart from the intervention being evaluated. For example, if participants know they are in the control group, they may be more likely to use other forms of care. If care providers are aware of the group a particular participant is in, they might act differently.  Blinding of study participants (both the recipients and providers of care) is used to protect against performance bias.
Period effect [In a cross-over trial:] A difference in the measured outcomes from one treatment period to another. This could be caused, for instance, by all patients in a trial naturally healing over time.
Person-years The average number of years that each participant is followed up for, multiplied by the number of participants.
Peto method A way of combining odds ratios that has become widely used in meta-analysis.  It is especially used to analyse trials with time to event outcomes. The calculations are straightforward and understandable, but this method produces biased results in some circumstances. It is a fixed-effect model.
Phase I, II, III, IV trials A series of levels of trials required of drugs before (and after) they are routinely used in clinical practice: Phase I trials assess toxic effects on humans (not many people participate in them, and usually without controls); Phase II trials assess therapeutic benefit (usually involving a few hundred people, usually with controls, but not always); Phase III trials compare the new treatment against standard (or placebo) treatment (usually a full randomised controlled trial). At this point, a drug can be approved for community use. Phase IV trials monitor a new treatment in the community, often to evaluate long-term safety and effectiveness.
Placebo An inactive substance or procedure administered to a participant, usually to compare its effects with those of a real drug or other intervention, but sometimes for the psychological benefit to the participant through a belief that s/he is receiving treatment. Placebos are used in clinical trials to blind people to their treatment allocation. Placebos should be indistinguishable from the active intervention to ensure adequate blinding.
Planned analyses Statistical analyses specified in the trial protocol; that is, planned in advance of data collection. In contrast to unplanned analyses. (Also called a priori analyses, pre-specified analyses.)
Point estimate The results (e.g. mean, weighted mean difference, odds ratio, risk ratio or risk difference) obtained in a sample (a study or a meta-analysis) which are used as the best estimate of what is true for the relevant population from which the sample is taken.
Poisson distribution A statistical distribution with known properties used as the basis of analysing the number of occurrences of relatively rare events occurring over time.
Population [In research:] The group of people being studied, usually by taking samples from that population. Populations may be defined by any characteristics e.g. geography, age group, certain diseases.
Positive association See association.
Positive predictive value [In screening/diagnostic tests:] A measure of the usefulness of a screening/diagnostic test. It is the proportion of those with a positive test result who have the disease, and can be interpreted as the probability that a positive test result is correct. It is calculated as follows: PPV = Number with a positive test who have disease/Number with a positive test. [In trial searching:] See precision.
Positive study A study with results indicating a beneficial effect of the intervention being studied.  The term can generate confusion because it can refer to both statistical significance and the direction of effect; studies often have multiple outcomes; the criteria for classifying studies as negative or positive are not always clear; and, in the case of studies of risk or undesirable effects, ‘positive’ studies are ones that show a harmful effect.
Post hoc analyses See unplanned analyses.
Posterior distribution The outcome of Bayesian statistical analysis. A probability distribution describing how likely different values of an outcome (e.g. treatment effect) are. It takes into account the belief before the study (the prior distribution) and the observed data from the study.
Power [In statistics:] The probability of rejecting the null hypothesis when a specific alternative hypothesis is true. The power of a hypothesis test is one minus the probability of Type II error. In clinical trials, power is the probability that a trial will detect, as statistically significant, an intervention effect of a specified size. If a clinical trial had a power of 0.80 (or 80%), and assuming that the pre-specified treatment effect truly existed, then if the trial was repeated 100 times, it would find a statistically significant treatment effect in 80 of them. Ideally we want a test to have high power, close to the maximum of one (or 100%). For a given size of effect, studies with more participants have greater power. Studies with a given number of participants have more power to detect large effects than small effects. (Also called statistical power.)
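Power can also be estimated by simulation; the sketch below repeatedly simulates a two-arm trial with assumed true risks and counts how often a simple two-proportion z-test is statistically significant (all design figures are invented):

```python
# Simulation-based estimate of statistical power for a two-arm trial with a
# dichotomous outcome (all design figures are hypothetical).
import math
import random

random.seed(1)
risk_control, risk_treated = 0.20, 0.12   # assumed true risks
n_per_arm = 300                           # participants in each arm
z_crit = 1.96                             # two-sided 5% significance level
n_simulations = 5000

significant = 0
for _ in range(n_simulations):
    events_c = sum(random.random() < risk_control for _ in range(n_per_arm))
    events_t = sum(random.random() < risk_treated for _ in range(n_per_arm))
    p_c, p_t = events_c / n_per_arm, events_t / n_per_arm
    p_pool = (events_c + events_t) / (2 * n_per_arm)
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n_per_arm)
    if se > 0 and abs(p_c - p_t) / se > z_crit:
        significant += 1

print(f"Estimated power: {significant / n_simulations:.0%}")
```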
Pragmatic trial A trial that aims to test a treatment policy in a ‘real life’ situation, when many people may not receive all of the treatment, and may use other treatments as well. This is as opposed to an explanatory trial, which is done under ideal conditions and is trying to determine whether a therapy has the ability to make a difference at all (i.e. testing its efficacy).
Precision 1. [In statistics:] A measure of the likelihood of random errors in the results of a study, meta-analysis or measurement. The greater the precision, the less random error. Confidence intervals around the estimate of effect from each study are one way of expressing precision, with a narrower confidence interval meaning more precision. 2. [In trial searching:] The proportion of relevant articles identified by a search strategy expressed as a percentage of all articles (relevant and irrelevant) identified by that strategy. Highly sensitive strategies tend to have low levels of precision. It is calculated as follows: Precision = Number of relevant articles/Number of articles identified. Also called positive predictive value. See also sensitivity.
Pre-specified analyses See planned analyses.
Prevalence The proportion of a population having a particular condition or characteristic: e.g. the percentage of people in a city with a particular disease, or who smoke.
Prevalence study A type of cross-sectional study that measures the prevalence of a characteristic.
Primary outcome The outcome of greatest importance.
Primary study ‘Original research’ in which data are collected. The term primary study is sometimes used to distinguish it from a secondary study (re-analysis of previously collected data), meta-analysis, and other ways of combining studies (such as economic analysis and decision analysis). (Also called original study.)
Probability The chance or risk of something happening.
Probability distribution The function that gives the probabilities that a variable equals each of a sequence of possible values. Examples include the binomial distribution, normal distribution and Poisson distribution. See also distribution.
Proportional hazards model [In survival analysis:] A statistical model that asserts that the effect of the study factors (e.g. the intervention of interest) on the hazard rate (the risk of occurrence of an event, such as death, at a point in time) in the study population is multiplicative and does not change over time. (Also called Cox model.)
Prospective study In evaluations of the effects of healthcare interventions, a study in which people are identified according to current risk status or exposure, and followed forwards through time to observe outcomes. Randomised controlled trials are always prospective studies. Cohort studies are commonly either prospective or retrospective, whereas case-control studies are usually retrospective. In epidemiology, ‘prospective study’ is sometimes misused as a synonym for cohort study. See also retrospective study.
Protocol The plan or set of steps to be followed in a study. A protocol for a systematic review should describe the rationale for the review, the objectives, and the methods that will be used to locate, select, and critically appraise studies, and to collect and analyse data from the included studies.
Publication bias See reporting bias.
P-value The probability (ranging from zero to one) that the results observed in a study (or results more extreme) could have occurred by chance if in reality the null hypothesis was true. In a meta-analysis, the P-value for the overall effect assesses the overall statistical significance of the difference between the intervention groups, whilst the P-value for the heterogeneity statistic assesses the statistical significance of differences between the effects observed in each study.
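As a small illustration (assuming, purely for demonstration, a test statistic that follows a standard normal distribution under the null hypothesis), a two-sided P-value can be computed from a z statistic as follows:

    from statistics import NormalDist

    def two_sided_p_from_z(z):
        """Two-sided P-value for a standard-normal test statistic z."""
        return 2 * (1 - NormalDist().cdf(abs(z)))

    print(round(two_sided_p_from_z(1.96), 3))  # about 0.05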

Q

Quality A vague notion of the methodological strength of a study, usually indicating the extent of bias prevention.
Quality score A value assigned to represent the validity of a study either for a specific criterion, such as concealment of allocation, or overall. Quality scores can use letters (A, B, C) or numbers.  See also bias prevention.
Quasi-random allocation Methods of allocating people to a trial that are not random, but are intended to produce similar groups when used to allocate participants. Quasi-random methods include: allocation by the person’s date of birth, by the day of the week or month of the year, by a person’s medical record number, or just allocating every alternate person. In practice, these methods of allocation are relatively easy to manipulate, introducing selection bias. See also random allocation, randomisation.

R

Random Governed by chance.  See also randomisation.
Random allocation A method that uses the play of chance to assign participants to comparison groups in a trial, e.g. by using a random numbers table or a computer-generated random sequence.  Random allocation implies that each individual or unit being entered into a trial has the same chance of receiving each of the possible interventions. It also implies that the probability that an individual will receive a particular intervention is independent of the probability that any other individual will receive the same intervention. See also quasi-random allocation, randomisation.
Random error Error due to the play of chance. Confidence intervals and P-values allow for the existence of random error, but not systematic errors (bias).
Random permuted blocks A method of randomisation that ensures that, at any point in a trial, roughly equal numbers of participants have been allocated to all the comparison groups.  Permuted blocks should be used in trials using stratified randomisation.  (Also called block randomisation.)
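A minimal sketch of how a permuted-block allocation sequence for two arms might be generated (the arm labels, block size and function name are illustrative):

    import random

    def permuted_blocks(n_participants, block_size=4, arms=("A", "B")):
        """Generate an allocation sequence using random permuted blocks of equal size."""
        assert block_size % len(arms) == 0
        sequence = []
        while len(sequence) < n_participants:
            block = list(arms) * (block_size // len(arms))
            random.shuffle(block)          # random order within each block
            sequence.extend(block)
        return sequence[:n_participants]

    print(permuted_blocks(10))  # e.g. ['B', 'A', 'A', 'B', 'A', 'B', 'B', 'A', 'A', 'B']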
Random sample A group of people selected for a study in such a way that everyone in the population of interest has an equal chance of being approached to participate; this process is meant to ensure that the sample is as representative of the population as possible. It has less bias than a convenience sample: that is, a group that the researchers have more convenient access to.  Randomised trials are rarely carried out on random samples.
Random-effects model [In meta-analysis:] A statistical model in which both within-study sampling error (variance) and between-studies variation are included in the assessment of the uncertainty (confidence interval) of the results of a meta-analysis. See also fixed-effect model. When there is heterogeneity among the results of the included studies beyond chance, random-effects models will give wider confidence intervals than fixed-effect models.
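A minimal sketch of one widely used random-effects approach, the DerSimonian–Laird method (the effect estimates and variances below are made up; real meta-analyses would normally use dedicated meta-analysis software):

    from math import sqrt

    def dersimonian_laird(effects, variances):
        """Random-effects pooled estimate and 95% CI from study effects and their variances."""
        w = [1 / v for v in variances]                       # fixed-effect (inverse-variance) weights
        fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
        q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
        c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
        tau2 = max(0.0, (q - (len(effects) - 1)) / c)        # estimated between-study variance
        w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
        pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
        se = sqrt(1 / sum(w_star))
        return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

    # Hypothetical log risk ratios and their variances from four studies:
    print(dersimonian_laird([-0.3, -0.1, 0.05, -0.4], [0.02, 0.05, 0.04, 0.03]))

Because the between-study variance tau2 is added to every study's variance, the random-effects weights are more similar to each other than the fixed-effect weights, and the confidence interval is wider when heterogeneity is present.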
Randomisation The process of randomly allocating participants into one of the arms of a controlled trial. There are two components to randomisation: the generation of a random sequence, and its implementation, ideally in a way so that those entering participants into a study are not aware of the sequence (concealment of allocation). (Also called random assignment.)
Randomisation blinding See concealment of allocation.
Randomised clinical trial See randomised controlled trial.
Randomised controlled trial An experiment in which two or more interventions, possibly including a control intervention or no intervention, are compared by being randomly allocated to participants. In most trials one intervention is assigned to each individual but sometimes assignment is to defined groups of individuals (for example, in a household) or interventions are assigned within individuals (for example, in different orders or to different parts of the body).
Rate The speed or frequency of occurrence of an event, usually expressed with respect to time. For instance, a mortality rate might be the number of deaths per year, per 100,000 people.
RCT See randomised controlled trial.
Recall bias Bias arising from mistakes in recollecting events, both because of failures of memory and because events are looked at ‘with hindsight’, with possibly changed views.  People’s reports of what is happening to them currently, therefore, can be more accurate than their recall of what happened two years ago and how they felt about it at the time. This bias is a threat to the validity of retrospective studies.
Reference population The population that the results of a study can be generalised to.  See also external validity.
Regression analysis A statistical modelling technique used to estimate or predict the influence of one or moreindependent variables on a dependent variable, e.g. the effect of age, sex, and educational level on the prevalence of a disease.  Logistic regression and meta-regression are types of regression analysis.
Relative risk See risk ratio.
Relative risk reduction The proportional reduction in risk in one treatment group compared to another.  It is one minus the risk ratio. If the risk ratio is 0.25, then the relative risk reduction is 1 − 0.25 = 0.75, or 75%.
Reliability The degree to which results obtained by a measurement procedure can be replicated. Lack of reliability can arise from divergences between observers or measurement instruments, measurement error, or instability in the attribute being measured.
Replicate/reproduce To repeat an intervention or study in other people with the aim of achieving the same outcomes that occurred in the original study; also, to repeat the circumstances of a study to test whether the results and outcomes are similar in another sample or population.
Reporting bias Bias caused by only a subset of all the relevant data being available. The publication of research can depend on the nature and direction of the study results.  Studies in which an intervention is not found to be effective are sometimes not published. Because of this, systematic reviews that fail to include unpublished studies may overestimate the true effect of an intervention. In addition, a published report might present a biased set of results (e.g. only outcomes or sub-groups where a statistically significant difference was found).  (Also called publication bias.)
Reproducible Able to be done the same way elsewhere. See replicate/ reproduce.
Retrospective study A study in which the outcomes have occurred to the participants before the study commenced. Case-control studies are usually retrospective, cohort studies sometimes are, randomised controlled trials never are.  See also prospective study.
Review 1. A systematic review. 2. A review article in the medical literature which summarises a number of different studies and may draw conclusions about a particular intervention. Review articles are often not systematic. Review articles are also sometimes called overviews. 3. To referee a paper. See referee, referee process, external peer reviewer.
Risk The proportion of participants experiencing the event of interest. Thus, if out of 100 participants the event (e.g. a stroke) is observed in 32, the risk is 0.32. The control group risk is the risk amongst the control group. The risk is sometimes referred to as the event rate, and the control group risk as the control event rate; however, these latter terms confuse risk with rate. Note that, particularly in statistical texts, ‘risk’ is used for beneficial outcomes as well as adverse events.
Risk difference The difference in size of risk between two groups. For example, if one group has a 15% risk of contracting a particular disease, and the other has a 10% risk of getting the disease, the risk difference is five percentage points. (Also called absolute risk difference, absolute risk reduction.)
Risk factor An aspect of a person’s condition, lifestyle or environment that affects the probability of occurrence of a disease. For example, cigarette smoking is a risk factor for lung cancer.
Risk ratio The ratio of risks in two groups. In intervention studies, it is the ratio of the risk in the intervention group to the risk in the control group. A risk ratio of one indicates no difference between comparison groups. For undesirable outcomes, a risk ratio that is less than one indicates that the intervention was effective in reducing the risk of that outcome. (Also called relative risk, RR.)
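A worked example tying together risk, risk difference, risk ratio and relative risk reduction, using hypothetical counts (the function name and numbers are illustrative):

    def two_group_summary(events_trt, n_trt, events_ctl, n_ctl):
        """Risk in each group, risk difference, risk ratio and relative risk reduction."""
        risk_trt = events_trt / n_trt
        risk_ctl = events_ctl / n_ctl
        rr = risk_trt / risk_ctl
        return {
            "risk_treatment": risk_trt,
            "risk_control": risk_ctl,
            "risk_difference": risk_trt - risk_ctl,
            "risk_ratio": rr,
            "relative_risk_reduction": 1 - rr,
        }

    # Hypothetical trial: 20/200 events in the intervention arm vs 32/200 in the control arm.
    print(two_group_summary(20, 200, 32, 200))
    # risks 0.10 and 0.16, risk difference -0.06, risk ratio 0.625, relative risk reduction 0.375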
RR See risk ratio.
Run-in period A period before randomisation when participants are monitored but receive no treatment (or they sometimes all receive one of the study treatments, possibly in a blind fashion). The data from this stage of a trial are only occasionally of value but can serve a valuable role in screening out ineligible or non-compliant participants, in ensuring that participants are in a stable condition, and in providing baseline observations. A run-in period is sometimes called a washout period if treatments that participants were using before entering the trial are discontinued.

S

Safety [of an intervention:] Refers to serious adverse effects, such as those that threaten life, require or prolong hospitalization, result in permanent disability, or cause birth defects. Indirect adverse effects, such as traffic accidents, violence, and damaging consequences of mood change, can also be serious.
SE See standard error.
Secondary outcome An outcome used to evaluate additional effects of the intervention deemed a priori as being less important than the primary outcomes.
Secondary study A study of studies: a review of individual studies (each of which is called a primary study). Asystematic review is a secondary study.
Selection bias 1. Systematic differences between comparison groups in prognosis or responsiveness to treatment. Random allocation with adequate concealment of allocation protects against selection bias. Other means of selecting who receives the intervention are more prone to bias because decisions may be related to prognosis or responsiveness to treatment. 2. A systematic error in reviews due to how studies are selected for inclusion. Reporting bias is an example of this. 3. A systematic difference in characteristics between those who are selected for study and those who are not. This affects external validity but not internal validity.
Sensitivity 1. [In screening/diagnostic tests:] A measure of a test’s ability to correctly detect people with the disease.  It is the proportion of diseased cases that are correctly identified by the test.  It is calculated as follows:  Sensitivity = Number with disease who have a positive test/Number with disease.  (Also called true positive rate, detection rate.) 2. [In trial searching:] A measure of a search’s ability to correctly identify relevant articles. It is the proportion of all relevant articles from all searches that were identified by the particular search of interest. It is calculated as follows:  Sensitivity = Number of relevant articles identified by the search/Total number of relevant articles from all searches.  (Also called recall.)
Sensitivity analysis An analysis used to determine how sensitive the results of a study or systematic review are to changes in how it was done.  Sensitivity analyses are used to assess how robust the results are to uncertain decisions or assumptions about the data and the methods that were used.
Sequential trial A randomised trial in which the data are analysed after each participant’s results become available, and the trial continues until a clear benefit is seen in favour of one of the comparison groups, or it is unlikely that any difference will emerge.  The main advantage of sequential trials is that they are usually shorter than fixed size trials when there is a large difference in the effectiveness of the interventions being compared.  Their use is restricted to conditions where the outcome of interest is known relatively quickly.  In a group sequential trial, a limited number of interim analyses of the data are carried out at pre-specified times during recruitment and follow up, say 3-6 times in all.
Side effect Any unintended effect of an intervention. Side effects are most commonly associated with pharmaceutical products, in which case they are related to the pharmacological properties of the drug at doses normally used for therapeutic purposes in humans. See also adverse effect
Single blind (Also called single masked.)  See blinding.
SMD See standardised mean difference.
Specificity 1. [In screening/diagnostic tests:] A measure of a test’s ability to correctly identify people who do not have the disease. It is the proportion of people without the target disease who are correctly identified by the test. It is the complement of the false positive rate (FPR = 1 − specificity). It is calculated as follows: Specificity = Number without disease who have a negative test/Number without disease. 2. [In trial searching:] There is no equivalent concept in trial searching, as we do not know the total number of irrelevant articles in existence. The concept of precision is usually used instead.
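A minimal worked example computing sensitivity and specificity from the counts in a hypothetical 2×2 table of test result against disease status (all numbers are made up):

    def sensitivity_specificity(tp, fn, tn, fp):
        """Sensitivity and specificity from true/false positives and negatives."""
        sensitivity = tp / (tp + fn)   # proportion of diseased people with a positive test
        specificity = tn / (tn + fp)   # proportion of non-diseased people with a negative test
        return sensitivity, specificity

    # Hypothetical counts: 80 true positives, 20 false negatives, 900 true negatives, 100 false positives.
    print(sensitivity_specificity(80, 20, 900, 100))  # (0.8, 0.9)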
Standard deviation A measure of the spread or dispersion of a set of observations, calculated as the square root of the variance; roughly, the average distance of the observations from the sample mean.
Standard error The standard deviation of the sampling distribution of a statistic. Measurements taken from a sample of the population will vary from sample to sample. The standard error is a measure of the variation in the sample statistic over all possible samples of the same size. The standard error decreases as the sample size increases. (Also called SE.)
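A small sketch showing the relationship between the sample standard deviation and the standard error of the mean (the data are made up):

    from math import sqrt
    from statistics import stdev

    values = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]   # hypothetical measurements
    sd = stdev(values)                         # sample standard deviation (n - 1 divisor)
    se = sd / sqrt(len(values))                # standard error of the sample mean
    print(round(sd, 3), round(se, 3))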
Standard treatment See conventional treatment.
Standardised mean difference The difference between two estimated means divided by an estimate of the standard deviation. It is used to combine results from studies using different ways of measuring the same concept, e.g. mental health. By expressing the effects as a standardised value, the results can be combined since they have no units.  Standardised mean differences are sometimes referred to as a d index.  (Also called SMD.)
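A minimal sketch using a pooled standard deviation, one common choice (exact formulas differ slightly between variants such as Cohen's d and Hedges' g; the function name and numbers are hypothetical):

    from math import sqrt

    def standardised_mean_difference(mean1, sd1, n1, mean2, sd2, n2):
        """Difference in means divided by the pooled standard deviation."""
        pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))
        return (mean1 - mean2) / pooled_sd

    # Hypothetical depression scores: treatment mean 12 (SD 5, n 50), control mean 15 (SD 6, n 50).
    print(round(standardised_mean_difference(12, 5, 50, 15, 6, 50), 2))  # about -0.54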
Statistical power See power.
Statistically significant A result that is unlikely to have happened by chance. The usual threshold for this judgement is that the results, or more extreme results, would occur by chance with a probability of less than 0.05 if the null hypothesis were true. Statistical tests produce a P-value used to assess this.
Stopping rule A procedure that allows interim analyses in clinical trials at predefined times, whilst preserving the Type I error at some pre-specified level. See also sequential trial.
Stratification The process by which groups are separated into mutually exclusive sub-groups of the population that share a characteristic: e.g. age group, sex, or socioeconomic status. It is possible to compare these different strata to try and see if the effects of a treatment differ between the sub-groups. See also sub-group analysis.
Stratified randomisation A method used to ensure that equal numbers of participants with a characteristic thought to affect prognosis or response to the intervention will be allocated to each comparison group. For example, in a trial of women with breast cancer, it may be important to have similar numbers of pre-menopausal and post-menopausal women in each comparison group. Stratified randomisation could be used to allocate equal numbers of pre- and post-menopausal women to each treatment group. Stratified randomisation is carried out by performing a separate randomisation (often using random permuted blocks) within each stratum. See also minimisation.
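A minimal sketch of stratified randomisation, running a separate permuted-block sequence within each stratum (the strata, block size, arm labels and function name are illustrative):

    import random

    def stratified_allocation(strata_sizes, block_size=4, arms=("A", "B")):
        """Separate permuted-block allocation sequences for each stratum."""
        allocations = {}
        for stratum, n in strata_sizes.items():
            sequence = []
            while len(sequence) < n:
                block = list(arms) * (block_size // len(arms))
                random.shuffle(block)        # random permuted block within this stratum
                sequence.extend(block)
            allocations[stratum] = sequence[:n]
        return allocations

    # Hypothetical trial with 6 pre-menopausal and 8 post-menopausal women:
    print(stratified_allocation({"pre-menopausal": 6, "post-menopausal": 8}))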
Student’s t-test See t test.
Sub-group analysis An analysis in which the intervention effect is evaluated in a defined subset of the participants in a trial, or in complementary subsets, such as by sex or in age categories. Trial sizes are generally too small for sub-group analyses to have adequate statistical power. Comparison of sub-groups should be by test of interaction rather than by comparison of p-values. Sub-group analyses are also subject to the multiple comparisons problem. See also multiple comparisons.
Surrogate endpoints Outcome measures that are not of direct practical importance but are believed to reflect outcomes that are important; for example, blood pressure is not directly important to patients but it is often used as an outcome in clinical trials because it is a risk factor for stroke and heart attacks. Surrogate endpoints are often physiological or biochemical markers that can be relatively quickly and easily measured, and that are taken as being predictive of important clinical outcomes.  They are often used when observation of clinical outcomes requires long follow-up.  (Also called intermediary outcomes, surrogate outcomes.)
Surrogate outcomes See surrogate endpoints.
Survival analysis The analysis of data that measure the time to an event e.g. death, next episode of disease.  See also time to event.
Systematic review (synonym: systematic overview) A review of a clearly formulated question that uses systematic and explicit methods to identify, select, and critically appraise relevant research, and to collect and analyse data from the studies that are included in the review. Statistical methods (meta-analysis) may or may not be used to analyse and summarise the results of the included studies. See also Cochrane Review.

T

t distribution A statistical distribution describing the distribution of the means of samples taken from a population with unknown variance.
t test A statistical hypothesis test derived from the t distribution. It is used to compare continuous data in two groups. (Also called Student’s t-test.)
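A minimal usage sketch, assuming the SciPy library is available (the data are made up):

    from scipy import stats

    group_a = [5.1, 4.8, 5.6, 5.0, 4.7, 5.3]   # hypothetical outcome scores, group A
    group_b = [4.2, 4.5, 4.9, 4.0, 4.6, 4.3]   # hypothetical outcome scores, group B

    t_statistic, p_value = stats.ttest_ind(group_a, group_b)
    print(round(t_statistic, 2), round(p_value, 4))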
Temporal sequence The sequence of events in time, used as one of the criteria in evaluating causation – the exposure or  intervention must have occurred before the outcome to be a plausible cause of the outcome.
Test of association A statistical test to assess whether the value of one variable is associated (i.e. varies with) the value of another variable, or whether the presence or absence of a factor is more likely when a particular outcome is present. See also correlation.
Time to event A description of the data in studies where the analysis relates not just to whether an event occurs but also when. Such data are analysed using survival analysis.  (Also called survival data.)
Tolerability [of an intervention:] usually refers to medically less important (that is, without serious or permanent sequelae), but unpleasant adverse effects of drugs. These include symptoms such as dry mouth, tiredness, etc, that can affect a person’s quality of life and willingness to continue the treatment. As these adverse effects usually develop early on and are relatively frequent, randomised controlled trials may yield reliable data on their incidence.
Toxicity The degree to which a medicine is poisonous; how much of a medicine can be taken before it has a toxic effect.
Treatment The process of intervening on people with the aim of enhancing health or life expectancy. Sometimes, and particularly in statistical texts, the word is used to cover all comparison groups, including placebo and no treatment arms of a controlled trial and even interventions designed to prevent bad outcomes in healthy people, rather than cure ill people. See also intervention, experimental intervention and control.
Treatment effect See estimate of effect.
Trend 1. A consistent movement across ordered categories, e.g. a change in the effect observed in studies grouped according to, for instance, intensity of treatment. 2. Used loosely to refer to an association or possible effect that is not statistically significant.  This usage should be avoided.
Trialist Used to refer to a person conducting or publishing a controlled trial.
Triple blind (Also called triple masked).  See blinding.
True positive rate See sensitivity.
2×2 table A contingency table with two rows and two columns. It arises in clinical trials that compare dichotomous outcomes, such as death, for an intervention and control group or two intervention groups.
Two-tailed test A hypothesis test in which the values for which we can reject the null hypothesis are located in both tails of the probability distribution. Testing whether one treatment is either better or worse than another (rather than testing only whether one treatment is better than another) would be a two-tailed test. (Also called two-sided test.) See also one-tailed test.
Type I error A conclusion that a treatment works, when it actually does not work. The risk of a Type I error is often called alpha. In a statistical test, it describes the chance of rejecting the null hypothesis when it is in fact true. (Also called false positive.)
Type II error A conclusion that there is no evidence that a treatment works, when it actually does work. The risk of a Type II error is often called beta. In a statistical test, it describes the chance of not rejecting the null hypothesis when it is in fact false.  The risk of a Type II error decreases as the number of participants in a study increases. (Also called false negative.)

U

Unconfounded comparison A comparison between two  treatment groups that will give an unbiased estimate of the effect of treatment due to the study design. For a comparison to be unconfounded, the two treatment groups must be treated identically, apart from the randomised treatment. For instance, to estimate the effect of heparin in acute stroke, a trial of heparin alone versus placebo would provide an unconfounded comparison. However, a trial of heparin alone versus aspirin alone provides a confounded comparison of the effect of heparin.
Uncontrolled trial A clinical trial that has no control group.
Unit of allocation The unit that is assigned to the alternative interventions being investigated in a trial. Most commonly, the unit will be an individual person but, in a cluster randomised trial, groups of people will be assigned together to one or the other of the interventions. In some other trials, different parts of a person (such as the left or right eye) might be assigned to receive different interventions. See also unit of analysis error.
Unit of analysis error An error made in statistical analysis when the analysis does not take account of the unit of allocation. In some studies, the unit of allocation is not a person, but is instead a group of people, or parts of a person, such as eyes or teeth.  Sometimes the data from these studies are analysed as if people had been allocated individually. Using individuals as the unit of analysis when groups of people are allocated can result in overly narrow confidence intervals. In meta-analysis, it can result in studies receiving more weight than is appropriate.
Unplanned analyses Statistical analyses that are not specified in the trial protocol, and are generally suggested by the data.  In contrast to planned analyses.  (Also called data derived analyses, post hoc analyses.)
Users of reviews People using a review to make practical decisions about health care, and researchers conducting or considering further research.
Utility In economic and decision analysis, the value given to an outcome, usually expressed as being between zero and one (e.g. death typically has a utility value of zero and a full healthy life has a value of one).

V

Validity The degree to which a result (of a measurement or study) is likely to be true and free of bias (systematic errors). Validity has several other meanings, usually accompanied by a qualifying word or phrase; for example, in the context of measurement, expressions such as ‘construct validity’, ‘content validity’ and ‘criterion validity’ are used. See also external validity, internal validity.
Variable A factor that differs among and between groups of people. Variables include patient characteristics such as age, sex, and smoking, or measurements such as blood pressure or depression score. There can also be treatment or condition variables, e.g. in a childbirth study, the length of time someone was in labour, and outcome variables. The set of values of a variable in a population or sample is known as a distribution.
Variance A measure of the variation shown by a set of observations, equal to the square of the standard deviation. It is defined as the sum of the squares of deviations from the mean, divided by the number of observations minus one.

W

Washout period/phase [In a cross-over trial:] The stage after the first treatment is withdrawn, but before the second treatment is started. The washout period aims to allow time for any active effects of the first treatment to wear off before the new one gets started.
Weighted least squares regression [In meta-analysis:] A meta-regression technique for estimating the parameters of a regression model, wherein each study’s contribution to the sum of products of the measured variables (study characteristics) is weighted by the precision of that study’s estimate of effect.
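A minimal sketch of a weighted least squares fit of study effect estimates on a single study-level covariate, weighting each study by the inverse variance of its estimate (all data and names are hypothetical; real meta-regression would normally be done in dedicated software):

    def weighted_least_squares(x, y, weights):
        """Slope and intercept of y on x with per-study weights (e.g. inverse variances)."""
        sw = sum(weights)
        x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw
        y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw
        slope = (sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
                 / sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x)))
        intercept = y_bar - slope * x_bar
        return slope, intercept

    # Hypothetical studies: dose (covariate), log risk ratio (effect estimate), inverse-variance weights.
    dose = [10, 20, 30, 40]
    log_rr = [-0.05, -0.15, -0.22, -0.35]
    inv_var = [50, 20, 25, 40]
    print(weighted_least_squares(dose, log_rr, inv_var))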
Weighted mean difference See mean difference.
WMD See mean difference.

Z

Z [On a forest plot in RevMan:] The value of the test for the overall effect of treatment, from which a P-value is derived.