A linear regression equation is a mathematical model used to represent the relationship between two variables in the form of a straight line. It is commonly used in statistics to predict the value of one variable based on the value of another variable, often with the aim of identifying trends or patterns in data.
What is a Linear Regression Equation?
A linear regression equation typically has the form:
y = mx + c
Where:
- y is the dependent variable (the value you're trying to predict),
- x is the independent variable (the value you know),
- m is the slope (or gradient) of the line, indicating how much y changes for each unit change in x,
- c is the y-intercept (the value of y when x = 0).
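The equation can be evaluated directly in code. Here is a minimal sketch; the slope and intercept values are made up for illustration, not taken from any dataset:

```python
# Predict y from x using a straight-line model y = mx + c.
def predict(x, m, c):
    return m * x + c

# Illustrative values: slope m = 2, intercept c = 5 (assumed).
print(predict(3, m=2, c=5))  # y = 2*3 + 5 = 11
```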
How is a Linear Regression Equation Used?
Linear regression is often used in a variety of fields, such as economics, biology, and social sciences, to model and predict relationships between two variables. For example, it can help you predict sales based on advertising spend or estimate the weight of a person based on their height.
Least Squares Method
The linear regression equation is typically found by using the "least squares" method. This method involves finding the line that minimizes the sum of the squares of the vertical distances (or residuals) from each data point to the line. In other words, the line is positioned in such a way that the total squared differences between the actual data points and the predicted values are as small as possible.
The process involves:
- Plotting the data points on a graph,
- Drawing the line that best fits the data,
- Minimising the sum of the squared distances between the points and the line.
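The steps above can be carried out with the standard closed-form least-squares solution. This sketch uses a small made-up dataset whose points lie exactly on a line, so the fit recovers the slope and intercept exactly:

```python
# Least-squares fit: choose m and c to minimise the sum of squared
# vertical distances (residuals) between the points and the line.
def least_squares_fit(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for simple linear regression:
    # m = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    # c = mean_y - m * mean_x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

# Points chosen for illustration; they lie exactly on y = 10x + 20.
xs = [1, 2, 3, 4, 5]
ys = [30, 40, 50, 60, 70]
m, c = least_squares_fit(xs, ys)
print(m, c)  # 10.0 20.0
```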
Estimating with Regression Equations
Once a linear regression equation has been determined from a set of data, it can be used to estimate values of the dependent variable, y, based on given values of the independent variable, x. This is useful for predicting outcomes or understanding relationships in the data, as long as the value of x is within the range of values used to create the regression model.
Interpolation
Interpolation refers to estimating the value of the dependent variable, y, for an x-value that lies within the range of known data points. Since the regression line is based on the data points, interpolation involves finding y for a value of x that falls between the minimum and maximum values of x in the dataset.
For example, if the regression equation is y = 10x + 20, and we know that a given x-value lies within the data range, we can substitute it into the equation to find the corresponding y-value.
Example of Interpolation: Given the regression equation y = 10x + 20, to estimate y when x = 3, substitute x = 3:
y = 10(3) + 20 = 30 + 20 = 50
Thus, the estimated value of y when x = 3 is 50.
Extrapolation
Extrapolation, on the other hand, involves estimating the value of the dependent variable, y, for an x-value that lies outside the range of known data points. While interpolation can give reliable estimates since it stays within the range of data already available, extrapolation is more uncertain because it extends the model beyond the observed data, potentially leading to less accurate predictions.
For example, if the regression equation is y = 10x + 20, and we want to estimate the value of y when x = 6, which lies outside the dataset, we would substitute x = 6 into the equation:
y = 10(6) + 20 = 60 + 20 = 80
Thus, the estimated value of y when x = 6 is 80. However, this estimate is an extrapolation and may not be as accurate as an interpolation, especially if the relationship between the variables changes outside the range of the data.
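Both worked examples can be reproduced with a small helper that also flags whether an estimate is an interpolation or an extrapolation. The observed x-range of 1 to 5 is an assumption made for illustration:

```python
# Estimate y from the regression equation y = 10x + 20, flagging whether
# the estimate is an interpolation (x inside the observed range) or an
# extrapolation (x outside it). The range [1, 5] is assumed.
def estimate(x, x_min=1, x_max=5):
    y = 10 * x + 20
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

print(estimate(3))  # (50, 'interpolation') - matches the first example
print(estimate(6))  # (80, 'extrapolation') - the less reliable estimate
```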
Not Predicting x with a y-on-x Regression Equation
It is important to note that a regression equation, such as y = 10x + 20, is used to predict y from a given value of x, not the other way around. The line is fitted by minimising the vertical distances from the data points, so it models y as dependent on x. Rearranging a y-on-x regression equation to solve for x from a given y-value reverses the intended relationship and may lead to misleading results; if predictions of x from y are needed, a separate x-on-y regression should be fitted instead.
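The point can be demonstrated numerically: fitting x on y is not the same as algebraically inverting the y-on-x line, unless the points are perfectly collinear. The small dataset below is made up for illustration and is only roughly linear, so the two fits disagree:

```python
# Compare the x-on-y fit with the algebraic inverse of the y-on-x fit.
def fit(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) \
        / sum((a - mx) ** 2 for a in xs)
    return m, my - m * mx

xs = [1, 2, 3, 4, 5]
ys = [32, 38, 52, 58, 70]  # roughly, but not exactly, linear

m_yx, c_yx = fit(xs, ys)  # y-on-x: built to predict y from x
m_xy, c_xy = fit(ys, xs)  # x-on-y: built to predict x from y

# Inverting the y-on-x line gives slope 1/m_yx, which differs from the
# x-on-y slope m_xy because the residuals are minimised in different
# directions (vertical vs horizontal).
print(1 / m_yx, m_xy)
```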