Deviance is a measure of how well data are fitted by a statistical model. Data typically contain variation from two sources: randomness (the random component) and factors (or covariates) that affect the mean, or average, of the data (the systematic component). Factors can include age, gender, radiation exposure, diet, type of occupation, and so on. An analysis of data is based on a model that encompasses both components. The random component is captured through a statistical distribution. The systematic component is handled through a mathematical function of the factors and effect parameters. The parameters are estimated by finding the model that fits the data best, hence the use of deviance. After fitting a model, each observed data point has an associated “fitted value” or estimate of the mean derived from the fitted model. The deviance measures the discrepancy between observed data and fitted values in light of the random variation described by the statistical distribution. If the model explains the variation in the data well, the discrepancy between data and fitted values will be small and the model will be accepted. On the other hand, if the discrepancy is large, the model must be revised.
Most analyses of atomic-bomb survivor data at the Radiation Effects Research Foundation are based on models developed using deviance. Reports typically use “deviance” as shorthand for “deviance difference.” The difference in deviance between two models for the systematic component is particularly useful when one model is “nested” within the other. A nested model can be obtained by fixing as constant some parameters in the larger model. According to statistical theory, if the constrained model (the model with fixed parameters) is adequate, the difference in deviances for two such models has approximately a chi-square distribution with degrees of freedom equal to the number of fixed parameters. This is the basis of testing whether the constrained model fits the data as well as the larger model, or whether those parameters are needed in the model. For example, when testing whether there is a gender difference in sensitivity to radiation, one could include a parameter describing interaction between gender and dose response; if the difference in deviance between that model and the constrained model (where the interaction parameter is set equal to zero) is not statistically significant, then one would conclude that the data provide no evidence of an interaction.
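The nested-model test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts are made up, the random component is assumed to be Poisson (the text does not name a distribution), and the helper names `poisson_deviance` and `chi2_sf_1df` are hypothetical. The larger model gives each of two groups its own mean; the constrained model fixes the group difference at zero, so one parameter is fixed and the deviance difference is compared with a chi-square distribution on one degree of freedom.

```python
import math


def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum(y * log(y / mu) - (y - mu)).

    The term y * log(y / mu) is taken to be 0 when y == 0.
    """
    total = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


# Hypothetical event counts for two groups (illustrative only, not real data).
group_a = [2, 4, 3, 5, 1, 3]
group_b = [6, 8, 5, 9, 7, 7]
counts = group_a + group_b

# Constrained model: a single common mean (group difference fixed at zero).
common_mean = sum(counts) / len(counts)
fitted_constrained = [common_mean] * len(counts)

# Larger model: a separate mean for each group (one extra free parameter).
mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)
fitted_larger = [mean_a] * len(group_a) + [mean_b] * len(group_b)

dev_constrained = poisson_deviance(counts, fitted_constrained)
dev_larger = poisson_deviance(counts, fitted_larger)

# The "deviance difference" that reports often call simply "deviance".
deviance_difference = dev_constrained - dev_larger
p_value = chi2_sf_1df(deviance_difference)

print(f"deviance (constrained): {dev_constrained:.3f}")
print(f"deviance (larger):      {dev_larger:.3f}")
print(f"difference: {deviance_difference:.3f}, p-value: {p_value:.4f}")
```

A small p-value would suggest that the extra parameter is needed; a large one would suggest that the constrained model fits about as well. With more than one fixed parameter, the single-degree-of-freedom tail function used here would have to be replaced by a general chi-square tail probability.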
Mathematically, the deviance is minus twice the logarithm of the likelihood ratio comparing a particular fitted model to the so-called “full model.” The “full model” uses the observed data as fitted values and therefore ascribes all of the variation to the systematic component with none to the random component. At the other extreme is the so-called null model, where a common mean is fitted to all of the data. The null model does not incorporate any factors, so all of the variation is ascribed to the random component and none to the systematic component. The goal of a statistical analysis is to find a model that describes the data better than the null model (if such a model exists) without merely repeating the data in the model as the “full model” does. This is accomplished by assessing the difference in deviance among several nested models and choosing the model that describes the data best with the fewest possible parameters. Parsimony matters because the investigator wants to know whether, and how, the data depend on various factors.
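In symbols, the definition can be sketched as follows. The notation is an assumption introduced here, not taken from the text: y denotes the observed data, \hat{\mu} the fitted values, L the likelihood, and \ell = \log L. Under the usual convention the deviance is scaled as minus twice the log-likelihood ratio, so that it is never negative and equals zero for the full model. The second line gives the form it takes when the random component is assumed to be Poisson, with y_i \log(y_i/\hat{\mu}_i) taken as zero when y_i = 0; any dispersion parameter is ignored.

```latex
\begin{aligned}
D(y;\hat{\mu}) &= -2\,\log\frac{L(\hat{\mu};\,y)}{L(y;\,y)}
               = 2\,\bigl[\ell(y;\,y) - \ell(\hat{\mu};\,y)\bigr] \;\ge\; 0, \\[4pt]
D_{\text{Poisson}}(y;\hat{\mu}) &= 2\sum_{i=1}^{n}
    \Bigl[\, y_i \log\frac{y_i}{\hat{\mu}_i} - \bigl(y_i - \hat{\mu}_i\bigr) \Bigr].
\end{aligned}
```

Setting \hat{\mu}_i = y_i for every i gives the full model, with D = 0; setting \hat{\mu}_i = \bar{y} for every i gives the null model, whose deviance is the largest among models that include a common mean.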