
Extensions
Numerous extensions of linear regression have been developed, which allow some or all of the assumptions underlying the basic model to be relaxed.
Simple and multiple linear regression
Example of simple linear regression, which has one independent variable
The simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital X) is known as multiple linear regression, also known as multivariable linear regression (not to be confused with multivariate linear regression).
Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is
\(Y_{i}=\beta _{0}+\beta _{1}X_{i1}+\beta _{2}X_{i2}+\ldots +\beta _{p}X_{ip}+\epsilon _{i}\)
for each observation \(i=1,\ldots ,n\).
In the formula above we consider n observations of one dependent variable and p independent variables. Thus, \(Y_{i}\) is the ith observation of the dependent variable, and \(X_{ij}\) is the ith observation of the jth independent variable, for j = 1, 2, ..., p. The values \(\beta _{j}\) represent parameters to be estimated, and \(\epsilon _{i}\) is the ith independent identically distributed normal error.
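The basic model above can be fit by ordinary least squares. As an illustrative sketch (the data here are made up for demonstration, generated from roughly y = 1 + 2x₁ + x₂ plus small errors), NumPy's `lstsq` recovers the parameters once a column of ones is prepended for the intercept:

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 2 predictors.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([5.1, 5.9, 11.2, 11.8, 17.1, 17.9])

# Prepend a column of ones so the intercept beta_0 is estimated too.
X1 = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: choose beta to minimize ||y - X1 @ beta||^2.
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
```

Here `beta_hat` holds the estimates of \(\beta _{0},\beta _{1},\beta _{2}\) in that order.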
In the more general multivariate linear regression, there is one equation of the above form for each of m > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously:
\(Y_{ij}=\beta _{0j}+\beta _{1j}X_{i1}+\beta _{2j}X_{i2}+\ldots +\beta _{pj}X_{ip}+\epsilon _{ij}\)
for all observations indexed as i = 1, ... , n and for all dependent variables indexed as j = 1, ... , m.
Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable y is still a scalar. Another term, multivariate linear regression, refers to cases where y is a vector, i.e., the same as general linear regression.
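Because the m equations share the same design matrix X, estimating the multivariate model by least squares amounts to solving m ordinary least-squares problems against that shared X. A sketch with simulated data (the coefficient matrix and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 50, 3, 2                      # 50 observations, 3 predictors, 2 responses

X = rng.normal(size=(n, p))
B_true = np.array([[1.0, -1.0],
                   [2.0,  0.5],
                   [0.0,  3.0]])        # p x m matrix of true coefficients
Y = X @ B_true + 0.1 * rng.normal(size=(n, m))

# With a matrix right-hand side, one lstsq call fits all m
# response equations against the shared design matrix X.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```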
General linear models
Heteroscedastic models
Generalized linear models
Generalized linear models (GLMs) are a framework for modeling response variables that are bounded or discrete. They are used, for example:
- when modeling positive quantities (e.g. prices or populations) that vary over a large scale - which are better described using a skewed distribution such as the log-normal distribution or Poisson distribution (although GLMs are not used for log-normal data, instead the response variable is simply transformed using the logarithm function);
- when modeling categorical data, such as the choice of a given candidate in an election (which is better described using a Bernoulli distribution/binomial distribution for binary choices, or a categorical distribution/multinomial distribution for multi-way choices), where there are a fixed number of choices that cannot be meaningfully ordered;
- when modeling ordinal data, e.g. ratings on a scale from 0 to 5, where the different outcomes can be ordered but where the quantity itself may not have any absolute meaning (e.g. a rating of 4 may not be "twice as good" in any objective sense as a rating of 2, but simply indicates that it is better than 2 or 3 but not as good as 5).
Generalized linear models allow for an arbitrary link function, g, that relates the mean of the response variable(s) to the predictors: \(E(Y)=g^{-1}(XB)\). The link function is often related to the distribution of the response, and in particular it typically has the effect of transforming between the \((-\infty ,\infty )\) range of the linear predictor and the range of the response variable.
Some common examples of GLMs are:
- Poisson regression for count data.
- Logistic regression and probit regression for binary data.
- Multinomial logistic regression and multinomial probit regression for categorical data.
- Ordered logit and ordered probit regression for ordinal data.
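As a sketch of how such a model is fitted, the following implements logistic regression, a GLM with the logit link, by iteratively reweighted least squares (Newton's method on the Bernoulli log-likelihood). The data are simulated, and the fixed iteration count is a simplification for illustration, not a production-grade stopping rule:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic-regression GLM by iteratively reweighted
    least squares (Newton's method on the log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)          # inverse link: E(y) = g^{-1}(X beta)
        w = mu * (1.0 - mu)             # GLM variance weights
        # Newton step: beta += (X' W X)^{-1} X' (y - mu)
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
    return beta

# Simulated binary data with a known coefficient vector.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 2.0])
y = (rng.uniform(size=n) < sigmoid(X @ beta_true)).astype(float)
beta_hat = fit_logistic(X, y)
```

The recovered `beta_hat` is close to `beta_true`; the same IRLS structure applies to other GLM families once the link and variance functions are swapped.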
Hierarchical linear models
Errors-in-variables
Group effects
In a multiple linear regression model
\(y=\beta _{0}+\beta _{1}x_{1}+\cdots +\beta _{p}x_{p}+\varepsilon\),
parameter \(\beta _{j}\) of predictor variable \(x_{j}\) represents the individual effect of \(x_{j}\). It has an interpretation as the expected change in the response variable \(y\) when \(x_{j}\) increases by one unit with other predictor variables held constant. When \(x_{j}\) is strongly correlated with other predictor variables, it is improbable that \(x_{j}\) can increase by one unit with other variables held constant. In this case, the interpretation of \(\beta _{j}\) becomes problematic as it is based on an improbable condition, and the effect of \(x_{j}\) cannot be evaluated in isolation.
For a group of predictor variables, say, \(\{x_{1},x_{2},\dots ,x_{q}\}\), a group effect \(\xi (\mathbf {w} )\) is defined as a linear combination of their parameters
\(\xi (\mathbf {w} )=w_{1}\beta _{1}+w_{2}\beta _{2}+\dots +w_{q}\beta _{q}\),
where \(\mathbf {w} =(w_{1},w_{2},\dots ,w_{q})^{\intercal }\) is a weight vector satisfying \(\sum _{j=1}^{q}|w_{j}|=1\). Because of this constraint on the \(w_{j}\), \(\xi (\mathbf {w} )\) is also referred to as a normalized group effect. A group effect \(\xi (\mathbf {w} )\) has an interpretation as the expected change in \(y\) when the variables in the group \(x_{1},x_{2},\dots ,x_{q}\) change by the amounts \(w_{1},w_{2},\dots ,w_{q}\), respectively, at the same time with variables not in the group held constant. It generalizes the individual effect of a variable to a group of variables in that \((i)\) if \(q=1\), then the group effect reduces to an individual effect, and \((ii)\) if \(w_{i}=1\) and \(w_{j}=0\) for \(j\neq i\), then the group effect also reduces to an individual effect. A group effect \(\xi (\mathbf {w} )\) is said to be meaningful if the underlying simultaneous changes of the \(q\) variables given by \((w_{1},w_{2},\dots ,w_{q})^{\intercal }\) are probable.
Group effects provide a means to study the collective impact of strongly correlated predictor variables in linear regression models. Individual effects of such variables are not well-defined as their parameters do not have good interpretations. Furthermore, when the sample size is not large, none of their parameters can be accurately estimated by the least squares regression due to the multicollinearity problem. Nevertheless, there are meaningful group effects that have good interpretations and can be accurately estimated by the least squares regression. A simple way to identify these meaningful group effects is to use an all positive correlations (APC) arrangement of the strongly correlated variables under which pairwise correlations among these variables are all positive, and standardize all \(p\) predictor variables in the model so that they all have mean zero and length one. To illustrate this, suppose that \(\{x_{1},x_{2},\dots ,x_{q}\}\) is a group of strongly correlated variables in an APC arrangement and that they are not strongly correlated with predictor variables outside the group. Let \(y'\) be the centred \(y\) and \(x_{j}'\) be the standardized \(x_{j}\). Then, the standardized linear regression model is
\(y'=\beta _{1}'x_{1}'+\cdots +\beta _{p}'x_{p}'+\varepsilon\).
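The centring and scaling just described is straightforward to carry out; a minimal sketch (with predictors simulated from a shared factor so that they are correlated, an arbitrary construction for illustration) also verifies that standardization to mean zero and length one leaves the pairwise correlations unchanged:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
# Simulated predictors built from a shared factor so they are correlated.
Z = rng.normal(size=(n, 3))
X = Z @ np.array([[1.0, 0.9, 0.9],
                  [0.0, 0.3, 0.0],
                  [0.0, 0.0, 0.3]])
y = X.sum(axis=1) + rng.normal(size=n)

# Centre y; centre each x_j and scale it to length one.
y_centred = y - y.mean()
X_std = X - X.mean(axis=0)
X_std /= np.linalg.norm(X_std, axis=0)

# Standardization changes neither the pairwise correlations among the
# predictors nor, therefore, which groups are strongly correlated.
print(np.corrcoef(X_std.T))
```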
Parameters \(\beta _{j}\) in the original model, including \(\beta _{0}\), are simple functions of \(\beta _{j}'\) in the standardized model. The standardization of variables does not change their correlations, so \(\{x_{1}',x_{2}',\dots ,x_{q}'\}\) is a group of strongly correlated variables in an APC arrangement and they are not strongly correlated with other predictor variables in the standardized model. A group effect of \(\{x_{1}',x_{2}',\dots ,x_{q}'\}\) is
\(\xi '(\mathbf {w} )=w_{1}\beta _{1}'+w_{2}\beta _{2}'+\dots +w_{q}\beta _{q}'\),
and its minimum-variance unbiased linear estimator is
\({\hat {\xi }}'(\mathbf {w} )=w_{1}{\hat {\beta }}_{1}'+w_{2}{\hat {\beta }}_{2}'+\dots +w_{q}{\hat {\beta }}_{q}'\),
where \({\hat {\beta }}_{j}'\) is the least squares estimator of \( \beta _{j}'\). In particular, the average group effect of the \(q\) standardized variables is
\(\xi _{A}={\frac {1}{q}}(\beta _{1}'+\beta _{2}'+\dots +\beta _{q}')\),
which has an interpretation as the expected change in \(y'\) when all \(x_{j}'\) in the strongly correlated group increase by \((1/q)\)th of a unit at the same time with variables outside the group held constant. With strong positive correlations and in standardized units, variables in the group are approximately equal, so they are likely to increase at the same time and by similar amounts. Thus, the average group effect \(\xi _{A}\) is a meaningful effect. It can be accurately estimated by its minimum-variance unbiased linear estimator \({\hat {\xi }}_{A}={\frac {1}{q}}({\hat {\beta }}_{1}'+{\hat {\beta }}_{2}'+\dots +{\hat {\beta }}_{q}')\), even when individually none of the \(\beta _{j}'\) can be accurately estimated by \({\hat {\beta }}_{j}'\).
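A small simulation (entirely made-up data, for illustration) makes this contrast concrete: across repeated noise draws, the individual least squares estimates of strongly correlated standardized variables fluctuate wildly, while the estimate of the average group effect stays stable:

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 25, 3
# One common factor plus small noise: an APC group with correlations near 1.
z = rng.normal(size=n)
X = z[:, None] + 0.05 * rng.normal(size=(n, q))
X_std = X - X.mean(axis=0)
X_std /= np.linalg.norm(X_std, axis=0)

beta = np.ones(q)                        # true standardized coefficients
b_draws, xi_draws = [], []
for _ in range(1000):
    y = X_std @ beta + 0.3 * rng.normal(size=n)
    b = np.linalg.lstsq(X_std, y - y.mean(), rcond=None)[0]
    b_draws.append(b)
    xi_draws.append(b.mean())            # estimator of the average group effect
b_draws = np.asarray(b_draws)

# Individual coefficient estimates have large standard deviation across
# draws; the average group effect estimator has a much smaller one.
print(b_draws.std(axis=0), np.std(xi_draws))
```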
Not all group effects are meaningful or can be accurately estimated. For example, \(\beta _{1}'\) is a special group effect with weights \(w_{1}=1\) and \(w_{j}=0\) for \(j\neq 1\), but it cannot be accurately estimated by \({\hat {\beta }}'_{1}\). It is also not a meaningful effect. In general, for a group of \(q\) strongly correlated predictor variables in an APC arrangement in the standardized model, group effects whose weight vectors \(\mathbf {w} \) are at or near the centre of the simplex \(\sum _{j=1}^{q}w_{j}=1\) (\(w_{j}\geq 0\)) are meaningful and can be accurately estimated by their minimum-variance unbiased linear estimators. Effects with weight vectors far away from the centre are not meaningful, as such weight vectors represent simultaneous changes of the variables that violate the strong positive correlations of the standardized variables in an APC arrangement. As such, they are not probable, and these effects also cannot be accurately estimated.
Applications of the group effects include (1) estimation and inference for meaningful group effects on the response variable, (2) testing for "group significance" of the \(q\) variables via testing \(H_{0}:\xi _{A}=0\) versus \(H_{1}:\xi _{A}\neq 0\), and (3) characterizing the region of the predictor variable space over which predictions by the least squares estimated model are accurate.
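As a sketch of application (2), the usual least squares machinery yields a t-type statistic for \(H_{0}:\xi _{A}=0\) versus \(H_{1}:\xi _{A}\neq 0\). The data below are simulated (an assumption for illustration) under an alternative where \(\xi _{A}=1\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 30, 3
z = rng.normal(size=n)
X = z[:, None] + 0.05 * rng.normal(size=(n, q))   # APC group, correlations near 1
X_std = X - X.mean(axis=0)
X_std /= np.linalg.norm(X_std, axis=0)
y = X_std @ np.ones(q) + 0.3 * rng.normal(size=n)  # true xi_A = 1
y_centred = y - y.mean()

b = np.linalg.lstsq(X_std, y_centred, rcond=None)[0]
resid = y_centred - X_std @ b
sigma2 = resid @ resid / (n - q - 1)     # error variance estimate
w = np.full(q, 1.0 / q)                  # centre-of-simplex weights
xi_hat = w @ b                           # estimated average group effect
se = np.sqrt(sigma2 * w @ np.linalg.inv(X_std.T @ X_std) @ w)
t_stat = xi_hat / se                     # compare to a t (or normal) reference
```

A large |t_stat| leads to rejecting \(H_{0}\), i.e., declaring the group of variables jointly significant even though no individual coefficient can be estimated precisely.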
A group effect of the original variables \(\{x_{1},x_{2},\dots ,x_{q}\}\) can be expressed as a constant times a group effect of the standardized variables \(\{x_{1}',x_{2}',\dots ,x_{q}'\}\). The former is meaningful when the latter is. Thus meaningful group effects of the original variables can be found through meaningful group effects of the standardized variables.