
Extensions
Numerous extensions of linear regression have been developed, which allow some or all of the assumptions underlying the basic model to be relaxed.
Simple and multiple linear regression
Example of simple linear regression, which has one independent variable
The simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital X) is known as multiple linear regression, also known as multivariable linear regression (not to be confused with multivariate linear regression).
Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is
\(Y_{i}=\beta _{0}+\beta _{1}X_{i1}+\beta _{2}X_{i2}+\ldots +\beta _{p}X_{ip}+\epsilon _{i}\)
for each observation \(i=1,\ldots ,n\).
In the formula above we consider n observations of one dependent variable and p independent variables. Thus, \(Y_{i}\) is the ith observation of the dependent variable, and \(X_{ij}\) is the ith observation of the jth independent variable, for j = 1, 2, ..., p. The values \(\beta _{j}\) represent parameters to be estimated, and \(\epsilon _{i}\) is the ith independent identically distributed normal error.
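The basic model above can be fit by ordinary least squares. As an illustrative sketch (the data here are made up for demonstration, generated from roughly y = 1 + 2x₁ + x₂ plus small errors), NumPy's `lstsq` recovers the parameters once a column of ones is prepended for the intercept:

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 2 predictors.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([5.1, 5.9, 11.2, 11.8, 17.1, 17.9])

# Prepend a column of ones so the intercept beta_0 is estimated too.
X1 = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: choose beta to minimize ||y - X1 @ beta||^2.
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
```

Here `beta_hat` holds the estimates of \(\beta _{0},\beta _{1},\beta _{2}\) in that order.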
In the more general multivariate linear regression, there is one equation of the above form for each of m > 1 dependent variables that share the same set of explanatory variables and hence are estimated simultaneously:
\(Y_{ij}=\beta _{0j}+\beta _{1j}X_{i1}+\beta _{2j}X_{i2}+\ldots +\beta _{pj}X_{ip}+\epsilon _{ij}\)
for all observations indexed as i = 1, ... , n and for all dependent variables indexed as j = 1, ... , m.
Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable y is still a scalar. Another term, multivariate linear regression, refers to cases where y is a vector, i.e., the same as general linear regression.
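Because the m equations share the same design matrix X, estimating the multivariate model by least squares amounts to solving m ordinary least-squares problems against that shared X. A sketch with simulated data (the coefficient matrix and noise level are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 50, 3, 2                      # 50 observations, 3 predictors, 2 responses

X = rng.normal(size=(n, p))
B_true = np.array([[1.0, -1.0],
                   [2.0,  0.5],
                   [0.0,  3.0]])        # p x m matrix of true coefficients
Y = X @ B_true + 0.1 * rng.normal(size=(n, m))

# With a matrix right-hand side, one lstsq call fits all m
# response equations against the shared design matrix X.
B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```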
General linear models
Heteroscedastic models
Generalized linear models
Generalized linear models (GLMs) are a framework for modeling response variables that are bounded or discrete. They are used, for example:
- when modeling positive quantities (e.g. prices or populations) that vary over a large scale - which are better described using a skewed distribution such as the log-normal distribution or Poisson distribution (although GLMs are not used for log-normal data, instead the response variable is simply transformed using the logarithm function);
- when modeling categorical data, such as the choice of a given candidate in an election (which is better described using a Bernoulli distribution/binomial distribution for binary choices, or a categorical distribution/multinomial distribution for multi-way choices), where there are a fixed number of choices that cannot be meaningfully ordered;
- when modeling ordinal data, e.g. ratings on a scale from 0 to 5, where the different outcomes can be ordered but where the quantity itself may not have any absolute meaning (e.g. a rating of 4 may not be "twice as good" in any objective sense as a rating of 2, but simply indicates that it is better than 2 or 3 but not as good as 5).
Generalized linear models allow for an arbitrary link function, g, that relates the mean of the response variable(s) to the predictors: \(E(Y)=g^{-1}(XB)\). The link function is often related to the distribution of the response, and in particular it typically has the effect of transforming between the \((-\infty ,\infty )\) range of the linear predictor and the range of the response variable.
Some common examples of GLMs are:
- Poisson regression for count data.
- Logistic regression and probit regression for binary data.
- Multinomial logistic regression and multinomial probit regression for categorical data.
- Ordered logit and ordered probit regression for ordinal data.
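As a sketch of how such a model is fitted, the following implements logistic regression, a GLM with the logit link, by iteratively reweighted least squares (Newton's method on the Bernoulli log-likelihood). The data are simulated, and the fixed iteration count is a simplification for illustration, not a production-grade stopping rule:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic-regression GLM by iteratively reweighted
    least squares (Newton's method on the log-likelihood)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)          # inverse link: E(y) = g^{-1}(X beta)
        w = mu * (1.0 - mu)             # GLM variance weights
        # Newton step: beta += (X' W X)^{-1} X' (y - mu)
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
    return beta

# Simulated binary data with a known coefficient vector.
rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 2.0])
y = (rng.uniform(size=n) < sigmoid(X @ beta_true)).astype(float)
beta_hat = fit_logistic(X, y)
```

The recovered `beta_hat` is close to `beta_true`; the same IRLS structure applies to other GLM families once the link and variance functions are swapped.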
Hierarchical linear models
Errors-in-variables
Group effects
In a multiple linear regression model
\(y=\beta _{0}+\beta _{1}x_{1}+\cdots +\beta _{p}x_{p}+\varepsilon\),
parameter \(\beta _{j}\) of predictor variable \(x_{j}\) represents the individual effect of \(x_{j}\). It has an interpretation as the expected change in the response variable \(y\) when \(x_{j}\) increases by one unit with other predictor variables held constant. When \(x_{j}\) is strongly correlated with other predictor variables, it is improbable that \(x_{j}\) can increase by one unit with other variables held constant. In this case, the interpretation of \(\beta _{j}\) becomes problematic as it is based on an improbable condition, and the effect of \(x_{j}\) cannot be evaluated in isolation.
For a group of predictor variables, say, \(\{x_{1},x_{2},\dots ,x_{q}\}\), a group effect \(\xi (\mathbf {w} )\) is defined as a linear combination of their parameters
\(\xi (\mathbf {w} )=w_{1}\beta _{1}+w_{2}\beta _{2}+\dots +w_{q}\beta _{q}\),
where \(\mathbf {w} =(w_{1},w_{2},\dots ,w_{q})^{\intercal }\) is a weight vector satisfying \(\sum _{j=1}^{q}|w_{j}|=1\). Because of this constraint on the \(w_{j}\), \(\xi (\mathbf {w} )\) is also referred to as a normalized group effect. A group effect \(\xi (\mathbf {w} )\) has an interpretation as the expected change in \(y\) when the variables in the group \(x_{1},x_{2},\dots ,x_{q}\) change by the amounts \(w_{1},w_{2},\dots ,w_{q}\), respectively, at the same time with variables not in the group held constant. It generalizes the individual effect of a variable to a group of variables in that \((i)\) if \(q=1\), then the group effect reduces to an individual effect, and \((ii)\) if \(w_{i}=1\) and \(w_{j}=0\) for \(j\neq i\), then the group effect also reduces to an individual effect. A group effect \(\xi (\mathbf {w} )\) is said to be meaningful if the underlying simultaneous changes of the \(q\) variables given by \((w_{1},w_{2},\dots ,w_{q})^{\intercal }\) are probable.
Group effects provide a means to study the collective impact of strongly correlated predictor variables in linear regression models. Individual effects of such variables are not well-defined as their parameters do not have good interpretations. Furthermore, when the sample size is not large, none of their parameters can be accurately estimated by the least squares regression due to the multicollinearity problem. Nevertheless, there are meaningful group effects that have good interpretations and can be accurately estimated by the least squares regression. A simple way to identify these meaningful group effects is to use an all positive correlations (APC) arrangement of the strongly correlated variables under which pairwise correlations among these variables are all positive, and standardize all \(p\) predictor variables in the model so that they all have mean zero and length one. To illustrate this, suppose that \(\{x_{1},x_{2},\dots ,x_{q}\}\) is a group of strongly correlated variables in an APC arrangement and that they are not strongly correlated with predictor variables outside the group. Let \(y'\) be the centred \(y\) and \(x_{j}'\) be the standardized \(x_{j}\). Then, the standardized linear regression model is
\(y'=\beta _{1}'x_{1}'+\cdots +\beta _{p}'x_{p}'+\varepsilon\).
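The centring and scaling just described is straightforward to carry out; a minimal sketch (with predictors simulated from a shared factor so that they are correlated, an arbitrary construction for illustration) also verifies that standardization to mean zero and length one leaves the pairwise correlations unchanged:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
# Simulated predictors built from a shared factor so they are correlated.
Z = rng.normal(size=(n, 3))
X = Z @ np.array([[1.0, 0.9, 0.9],
                  [0.0, 0.3, 0.0],
                  [0.0, 0.0, 0.3]])
y = X.sum(axis=1) + rng.normal(size=n)

# Centre y; centre each x_j and scale it to length one.
y_centred = y - y.mean()
X_std = X - X.mean(axis=0)
X_std /= np.linalg.norm(X_std, axis=0)

# Standardization changes neither the pairwise correlations among the
# predictors nor, therefore, which groups are strongly correlated.
print(np.corrcoef(X_std.T))
```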
Parameters \(\beta _{j}\) in the original model, including \(\beta _{0}\), are simple functions of \(\beta _{j}'\) in the standardized model. The standardization of variables does not change their correlations, so \(\{x_{1}',x_{2}',\dots ,x_{q}'\}\) is a group of strongly correlated variables in an APC arrangement and they are not strongly correlated with other predictor variables in the standardized model. A group effect of \(\{x_{1}',x_{2}',\dots ,x_{q}'\}\) is
\(\xi '(\mathbf {w} )=w_{1}\beta _{1}'+w_{2}\beta _{2}'+\dots +w_{q}\beta _{q}'\),
and its minimum-variance unbiased linear estimator is
\({\hat {\xi }}'(\mathbf {w} )=w_{1}{\hat {\beta }}_{1}'+w_{2}{\hat {\beta }}_{2}'+\dots +w_{q}{\hat {\beta }}_{q}'\),
where \({\hat {\beta }}_{j}'\) is the least squares estimator of \( \beta _{j}'\). In particular, the average group effect of the \(q\) standardized variables is
\(\xi _{A}={\frac {1}{q}}(\beta _{1}'+\beta _{2}'+\dots +\beta _{q}')\),
which has an interpretation as the expected change in \(y'\) when all \(x_{j}'\) in the strongly correlated group increase by \((1/q)\)th of a unit at the same time with variables outside the group held constant. With strong positive correlations and in standardized units, variables in the group are approximately equal, so they are likely to increase at the same time and by similar amounts. Thus, the average group effect \(\xi _{A}\) is a meaningful effect. It can be accurately estimated by its minimum-variance unbiased linear estimator \({\hat {\xi }}_{A}={\frac {1}{q}}({\hat {\beta }}_{1}'+{\hat {\beta }}_{2}'+\dots +{\hat {\beta }}_{q}')\), even when individually none of the \(\beta _{j}'\) can be accurately estimated by \({\hat {\beta }}_{j}'\).
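A small simulation (entirely made-up data, for illustration) makes this contrast concrete: across repeated noise draws, the individual least squares estimates of strongly correlated standardized variables fluctuate wildly, while the estimate of the average group effect stays stable:

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 25, 3
# One common factor plus small noise: an APC group with correlations near 1.
z = rng.normal(size=n)
X = z[:, None] + 0.05 * rng.normal(size=(n, q))
X_std = X - X.mean(axis=0)
X_std /= np.linalg.norm(X_std, axis=0)

beta = np.ones(q)                        # true standardized coefficients
b_draws, xi_draws = [], []
for _ in range(1000):
    y = X_std @ beta + 0.3 * rng.normal(size=n)
    b = np.linalg.lstsq(X_std, y - y.mean(), rcond=None)[0]
    b_draws.append(b)
    xi_draws.append(b.mean())            # estimator of the average group effect
b_draws = np.asarray(b_draws)

# Individual coefficient estimates have large standard deviation across
# draws; the average group effect estimator has a much smaller one.
print(b_draws.std(axis=0), np.std(xi_draws))
```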
Not all group effects are meaningful or can be accurately estimated. For example, \(\beta _{1}'\) is a special group effect with weights \(w_{1}=1\) and \(w_{j}=0\) for \(j\neq 1\), but it cannot be accurately estimated by \({\hat {\beta }}'_{1}\). It is also not a meaningful effect. In general, for a group of \(q\) strongly correlated predictor variables in an APC arrangement in the standardized model, group effects whose weight vectors \(\mathbf {w} \) are at or near the centre of the simplex \(\sum _{j=1}^{q}w_{j}=1\) (\(w_{j}\geq 0\)) are meaningful and can be accurately estimated by their minimum-variance unbiased linear estimators. Effects with weight vectors far away from the centre are not meaningful, as such weight vectors represent simultaneous changes of the variables that violate the strong positive correlations of the standardized variables in an APC arrangement. As such, they are not probable, and these effects also cannot be accurately estimated.
Applications of the group effects include (1) estimation and inference for meaningful group effects on the response variable, (2) testing for "group significance" of the \(q\) variables via testing \(H_{0}:\xi _{A}=0\) versus \(H_{1}:\xi _{A}\neq 0\), and (3) characterizing the region of the predictor variable space over which predictions by the least squares estimated model are accurate.
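As a sketch of application (2), the usual least squares machinery yields a t-type statistic for \(H_{0}:\xi _{A}=0\) versus \(H_{1}:\xi _{A}\neq 0\). The data below are simulated (an assumption for illustration) under an alternative where \(\xi _{A}=1\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, q = 30, 3
z = rng.normal(size=n)
X = z[:, None] + 0.05 * rng.normal(size=(n, q))   # APC group, correlations near 1
X_std = X - X.mean(axis=0)
X_std /= np.linalg.norm(X_std, axis=0)
y = X_std @ np.ones(q) + 0.3 * rng.normal(size=n)  # true xi_A = 1
y_centred = y - y.mean()

b = np.linalg.lstsq(X_std, y_centred, rcond=None)[0]
resid = y_centred - X_std @ b
sigma2 = resid @ resid / (n - q - 1)     # error variance estimate
w = np.full(q, 1.0 / q)                  # centre-of-simplex weights
xi_hat = w @ b                           # estimated average group effect
se = np.sqrt(sigma2 * w @ np.linalg.inv(X_std.T @ X_std) @ w)
t_stat = xi_hat / se                     # compare to a t (or normal) reference
```

A large |t_stat| leads to rejecting \(H_{0}\), i.e., declaring the group of variables jointly significant even though no individual coefficient can be estimated precisely.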
A group effect of the original variables \(\{x_{1},x_{2},\dots ,x_{q}\}\) can be expressed as a constant times a group effect of the standardized variables \(\{x_{1}',x_{2}',\dots ,x_{q}'\}\). The former is meaningful when the latter is. Thus meaningful group effects of the original variables can be found through meaningful group effects of the standardized variables.