Regression analysis

Simple regression analysis is a statistical procedure used to summarize a trend or relationship between two variables. The result is a line or curve on a graph, representing a model (a mathematical function), which describes the general relationship in the data.

The simplest model is a straight line (linear model), Y = a + bX. The variable Y is the response, or outcome variable, which is plotted on the ordinate (vertical axis) of a graph. The variable X is the predictor, or explanatory variable, which is plotted on the abscissa (horizontal axis) of a graph. The other values in the model are called parameters; a is called the intercept of the line at the Y axis and b is called the slope. See, for example, the plot of erythrocyte mutant fraction versus radiation dose produced by radiobiologists at RERF, where Y is mutant fraction and X is radiation dose.

In regression analysis, the values of the parameters, a and b, are estimated using methods that seek the best fit of the model to the data. The method depends on the type of data. Common methods include simple linear regression (least squares) for continuous data (such as height, weight, or blood pressure), Poisson regression for data that are counts (e.g., number of persons with leukemia in a population), logistic regression for binary data (a yes/no outcome, such as having a certain symptom or not), and Cox regression for event times (such as how long a patient treated for cancer remains free of disease before relapsing).
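As an illustration of the least-squares case, the following minimal Python sketch estimates a and b for the straight-line model Y = a + bX using the standard closed-form formulas (slope = sum of cross-deviations divided by sum of squared X deviations). The function name `fit_line` and the data values are made up for this example; they are not taken from any RERF data set.

```python
from statistics import mean


def fit_line(xs, ys):
    """Estimate intercept a and slope b of Y = a + bX by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of X from its mean
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X values must not all be equal")
    # Sum of cross-products of deviations of X and Y
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the fitted line passes through the means
    return a, b


# Hypothetical data lying exactly on the line Y = 1 + 2X
predictor = [0.0, 1.0, 2.0, 3.0, 4.0]
outcome = [1.0, 3.0, 5.0, 7.0, 9.0]
a, b = fit_line(predictor, outcome)
print(f"intercept a = {a}, slope b = {b}")
```

With real data the points scatter around the line, and the same formulas give the line that minimizes the sum of squared vertical distances between the points and the line. Poisson, logistic, and Cox regression have no such closed form and are fitted iteratively by statistical software.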

Many types of models are possible. Sometimes the model is used to describe (illustrate simply) the relationship between X and Y; such models are called descriptive models. The linear model is typically used this way. A linear-quadratic model, Y = a + bX + cX², can be used to describe data that display curvature (the parameter c is called the curvature).

Sometimes the model is based on biological or physiological assumptions about the mechanism by which the explanatory variable X affects, or causes, the outcome Y; such models are called mechanistic models. With mechanistic models the mathematical function can be quite complicated, but the parameters have meaning in terms of biological or physiological quantities.

It is also possible to include many explanatory variables, which is necessary when several related variables are associated with Y (confounding). Sometimes the joint effects of two or more explanatory variables include mechanistic interaction, where one explanatory variable modifies the effect of another (effect modification).
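The linear-quadratic model Y = a + bX + cX² mentioned above can also be fitted by least squares, because it is still linear in the parameters a, b, and c. The sketch below is one possible way to do this in plain Python: it builds the three normal equations from sums of powers of X and solves them by Gaussian elimination. The helper names (`solve_linear_system`, `fit_linear_quadratic`) and the data are hypothetical, chosen only for illustration; in practice a statistics package would normally be used.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are not modified
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need at least three distinct X values")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from all rows below the pivot
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_linear_quadratic(xs, ys):
    """Least-squares estimates (a, b, c) for Y = a + bX + cX^2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # power_sums[p] = sum of x^p, for p = 0..4
    power_sums = [sum(x ** p for x in xs) for p in range(5)]
    # moment_sums[p] = sum of y * x^p, for p = 0..2
    moment_sums = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    # Normal equations: entry (i, j) of the matrix is the sum of x^(i + j)
    normal_matrix = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    return solve_linear_system(normal_matrix, moment_sums)


# Hypothetical data lying exactly on Y = 1 + 0.5X + 0.25X^2
x_values = [0.0, 1.0, 2.0, 3.0, 4.0]
y_values = [1.0, 1.75, 3.0, 4.75, 7.0]
a_hat, b_hat, c_hat = fit_linear_quadratic(x_values, y_values)
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, c = {c_hat:.3f}")
```

An estimate of c close to zero suggests that the straight-line model may describe the data adequately; whether the curvature is statistically meaningful requires a formal test, which this sketch does not perform.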
