November 10, 2024
The term “Regression” refers to the process of determining the relationship between one or more factors and the output variable. The outcome variable is called the response variable, whereas the risk factors and confounders are known as predictors or independent variables. In regression analysis, the dependent variable is represented by “y”, while the independent variables are represented by “x.”
There are various types of regression analysis, including linear, multiple linear, and nonlinear. Simple linear and multiple linear models are the most common. Nonlinear regression analysis is used for more complex data sets in which the dependent and independent variables are related nonlinearly. Read on to learn more.
“Regression” comes from the word “regress,” derived from the Latin word “regressus,” which means “to go back” (to something). So, regression is the technique that helps you “to go back” from a jumbled, difficult-to-understand set of data to a simpler, more meaningful model.
Regression is a statistical technique used in economics, investing, and other fields to evaluate the strength and nature of a relationship between one dependent variable (usually denoted by \(Y\)) and a set of other variables (known as independent variables). Regression attempts to find a mathematical relationship between a set of random variables thought to predict \(Y\).
Simple linear regression and multiple linear regression are the two basic types of regression. Multiple linear regression uses two or more independent variables to predict the outcome of the dependent variable \(Y\). In contrast, simple linear regression uses one independent variable to describe or predict the outcome of the dependent variable \(Y\).
There are several types of regression, including linear, multiple linear, and nonlinear. Simple linear and multiple linear models are the most common. However, nonlinear regression analysis is widely used for more complex data sets with nonlinear relationships between the dependent and independent variables.
The general form of simple linear regression is:
\(y = a + bx + \varepsilon\)
Where:
\(y = \) Dependent variable
\(x = \) Independent variable
\(a = y\text{-intercept} = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\left(\sum x^2\right) - \left(\sum x\right)^2}\)
\(b = \text{Slope of the line} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}\)
\(x\) and \(y\) are the two variables on the regression line: \(x\) takes the values of the first data set and \(y\) the values of the second data set
\(\varepsilon =\) Residuals
The general form of multiple linear regression is:
\(Y = a + b_1X_1 + b_2X_2 + b_3X_3 + \ldots + b_sX_s + \varepsilon\)
Where:
\(Y =\) Dependent variable
\(X_1, X_2, X_3, \ldots, X_s =\) Independent variables
\(b_1, b_2, b_3, \ldots, b_s =\) Slopes
\(a =\) Intercept
\(\varepsilon =\) Residuals
A residual is the difference between the observed value of the dependent variable and the value predicted by the regression line. Each data point has one residual. For a least-squares regression line that includes an intercept, the mean of the residuals is always \(0\).
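The zero-mean property can be checked numerically. The sketch below is illustrative only: it takes the line \(y = 2.2 + 0.9x\) and the five data pairs from Q.2 further down as given, and the variable names are assumptions chosen for this example.

```python
# Data pairs and fitted coefficients taken from Q.2 (y = 2.2 + 0.9x).
xs = [0, 1, 2, 3, 4]
ys = [2, 3, 5, 4, 6]
a, b = 2.2, 0.9

# Residual = observed y minus predicted y, one per data point.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

print([round(r, 2) for r in residuals])
# Compare against a small tolerance because of floating-point rounding.
print(abs(mean_residual) < 1e-9)
```

Some residuals are positive and some negative; for a least-squares line with an intercept they cancel, which is why their mean is zero.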
A regression line can depict a positive, negative, or no linear relationship.
For a Simple Linear Regression: \(y=a+bx\)
Case 1: If \(b=\) slope of the line \(=0\), there is no linear relationship. In simple linear regression, the graphed line is flat (not sloped), so the predicted value of \(y\) does not change as \(x\) changes.
Case 2: If \(b=\) slope of the line is positive, the regression line slopes upward, rising from its lower end at the \(y\)-axis up into the graph field, away from the \(x\)-axis. The two variables have a positive linear relationship: as the value of one rises, the value of the other rises as well.
Case 3: If \(b=\) slope of the line is negative, the regression line slopes downward, falling from its upper end at the \(y\)-axis down into the graph field, toward the \(x\)-axis. The two variables have a negative linear relationship: as the value of one increases, the value of the other decreases.
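The three cases can be expressed as a small helper. This is only a sketch: the function name `describe_slope` and the tolerance `tol` (used to treat a tiny computed slope as zero) are assumptions introduced for illustration.

```python
def describe_slope(b, tol=1e-12):
    """Map the slope b of y = a + b*x to the three cases in the text."""
    if abs(b) <= tol:
        return "no linear relationship"
    if b > 0:
        return "positive linear relationship"
    return "negative linear relationship"


print(describe_slope(0.9))
print(describe_slope(0.0))
print(describe_slope(-1.5))
```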
Given \(n\) data pairs \(\left(x_1, y_1\right), \ldots, \left(x_n, y_n\right)\), the best fit for the straight-line regression model \(y=a+bx\) is found by the method of least squares.
Starting with the sum of the squares of the residuals, \(S\), we get
\(S = \sum\limits_{i = 1}^n \left(y_i - a - bx_i\right)^2\)
Setting
\(\frac{\partial S}{\partial a} = 0\) and \(\frac{\partial S}{\partial b} = 0\)
gives two simultaneous linear equations whose solution (which gives the minimum value of \(S\)) is
\(a = y\text{-intercept} = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\left(\sum x^2\right) - \left(\sum x\right)^2}\)
\(b = \text{Slope of the line} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}\)
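The intermediate step between the two conditions and the closed-form solution can be written out as follows, in the same notation (sums run over \(i = 1, \ldots, n\)). Differentiating \(S\) gives
\(\frac{\partial S}{\partial a} = -2\sum \left(y_i - a - bx_i\right) = 0 \;\Rightarrow\; \sum y = na + b\sum x\)
\(\frac{\partial S}{\partial b} = -2\sum x_i\left(y_i - a - bx_i\right) = 0 \;\Rightarrow\; \sum xy = a\sum x + b\sum x^2\)
These are the two simultaneous linear equations in \(a\) and \(b\). Eliminating \(a\) between them yields the slope \(b\), and substituting back yields the intercept \(a\); the common denominator \(n\left(\sum x^2\right) - \left(\sum x\right)^2\) is nonzero whenever the \(x\) values are not all equal.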
The differences between correlation and regression are as follows:
Following are some of the most popular applications of regression:
Q.1. Find the regression line for the following set of data:
\(\{ (-2,-1),(1,1),(3,2)\} \)
Ans: Let the regression line be
\(y=a+bx\)
\(x\) | \(y\) | \(xy\) | \({x^2}\) |
\(-2\) | \(-1\) | \(2\) | \(4\) |
\(1\) | \(1\) | \(1\) | \(1\) |
\(3\) | \(2\) | \(6\) | \(9\) |
\(\sum x = 2\) | \(\sum y = 2\) | \(\sum x y = 9\) | \(\sum {{x^2}} = 14\) |
\(a = y\text{-intercept} = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{2 \times 14 - 2 \times 9}{3 \times 14 - 4} = \frac{10}{38} = \frac{5}{19}\)
\(b = \text{Slope of the line} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{3 \times 9 - 2 \times 2}{3 \times 14 - 4} = \frac{23}{38}\)
Hence, \(y = \frac{5}{19} + \frac{23}{38}x\) is the required regression line.
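The arithmetic above can be reproduced exactly with rational numbers. The following is a minimal sketch using Python's standard `fractions` module on the three pairs tabulated in the solution; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# The three (x, y) pairs tabulated in the solution.
points = [(-2, -1), (1, 1), (3, 2)]
n = len(points)

# Column sums used by the least-squares formulas.
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)

denom = n * sum_x2 - sum_x ** 2
a = Fraction(sum_y * sum_x2 - sum_x * sum_xy, denom)  # intercept
b = Fraction(n * sum_xy - sum_x * sum_y, denom)       # slope

print(a, b)  # 5/19 23/38
```

Working with `Fraction` avoids rounding, so the result can be compared directly with the fractions in the worked answer.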
Q.2. The values of \(x\) and the corresponding values of \(y\) are shown in the table below:
\(x\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) |
\(y\) | \(2\) | \(3\) | \(5\) | \(4\) | \(6\) |
Find the regression line \(y=a+bx\) and estimate the value of \(y\) when \(x=10\).
Ans:
\(x\) | \(y\) | \(xy\) | \({x^2}\) |
\(0\) | \(2\) | \(0\) | \(0\) |
\(1\) | \(3\) | \(3\) | \(1\) |
\(2\) | \(5\) | \(10\) | \(4\) |
\(3\) | \(4\) | \(12\) | \(9\) |
\(4\) | \(6\) | \(24\) | \(16\) |
\(\sum x = 10\) | \(\sum y = 20\) | \(\sum x y = 49\) | \(\sum{{x^2}} = 30\) |
\(a = y\text{-intercept} = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{20 \times 30 - 10 \times 49}{5 \times 30 - 100} = \frac{110}{50} = 2.2\)
\(b = \text{Slope of the line} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{5 \times 49 - 10 \times 20}{5 \times 30 - 100} = \frac{45}{50} = 0.9\)
Hence, \(y=2.2+0.9x\) is the required regression line.
So, when \(x=10\), \(y=2.2+0.9 \times 10=11.2\).
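The tabular procedure used in this answer can be sketched in a few lines of Python: build the \(x\), \(y\), \(xy\), \(x^2\) columns, total them, and substitute into the formulas. The names `rows`, `totals`, and `y_at_10` are assumptions made for this illustration.

```python
xs = [0, 1, 2, 3, 4]
ys = [2, 3, 5, 4, 6]

# One row per data pair: (x, y, x*y, x^2), as in the working table.
rows = [(x, y, x * y, x * x) for x, y in zip(xs, ys)]

# Column totals: sum of x, sum of y, sum of xy, sum of x^2.
totals = tuple(sum(column) for column in zip(*rows))
sum_x, sum_y, sum_xy, sum_x2 = totals
n = len(rows)

denom = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # intercept
b = (n * sum_xy - sum_x * sum_y) / denom       # slope

# Estimate y at x = 10 from the fitted line.
y_at_10 = a + b * 10
print(totals)
print(round(a, 2), round(b, 2), round(y_at_10, 2))
```

Note that \(x=10\) lies outside the observed range \(0\) to \(4\), so the estimate is an extrapolation and assumes the linear trend continues.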
Q.3. The sales of a company (in millions of dollars) for each year are shown below:
\(x\)(year) | \(2005\) | \(2006\) | \(2007\) | \(2008\) | \(2009\) |
\(y\)(sales) | \(12\) | \(19\) | \(29\) | \(37\) | \(45\) |
Find the regression line \(y=a+bx\) and also estimate the sales of the company in \(2012\).
Ans: We rewrite the table above, measuring \(x\) in years after \(2005\), to simplify the calculation.
\(x\)(years after \(2005\)) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) |
\(y\)(sales) | \(12\) | \(19\) | \(29\) | \(37\) | \(45\) |
\(x\) | \(y\) | \(xy\) | \({x^2}\) |
\(0\) | \(12\) | \(0\) | \(0\) |
\(1\) | \(19\) | \(19\) | \(1\) |
\(2\) | \(29\) | \(58\) | \(4\) |
\(3\) | \(37\) | \(111\) | \(9\) |
\(4\) | \(45\) | \(180\) | \(16\) |
\(\sum x = 10\) | \(\sum y = 142\) | \(\sum x y = 368\) | \(\sum{{x^2}} = 30\) |
\(a = y\text{-intercept} = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{142 \times 30 - 10 \times 368}{5 \times 30 - 100} = \frac{580}{50} = 11.6\)
\(b = \text{Slope of the line} = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{5 \times 368 - 10 \times 142}{5 \times 30 - 100} = \frac{420}{50} = 8.4\)
Hence, \(y=11.6+8.4x\) is the required regression line.
Sales in \(2012\) correspond to \(x=7\), i.e. \(7\) years after \(2005\).
So, for \(x=7\), \(y=11.6+8.4 \times 7=70.4\) million dollars.
Regression is a statistical technique used in economics, investing, and other fields to evaluate the strength and nature of the relationship between one dependent variable and a set of independent variables. In this article, we have learnt about the regression formula and its application in real-life situations, where it is widely used for estimation and prediction.
Q.1. What is the concept of regression?
Ans: A collection of statistical methods for estimating relationships between a dependent variable and one or more independent variables is known as regression. It can be used to determine the strength of a relationship between variables and to predict how they will interact in the future.
Q.2. Why do we use regression?
Ans: A regression analysis can be used for either of two purposes: predicting the value of the dependent variable for individuals for whom knowledge about the explanatory variables is available or estimating the impact of any explanatory variable on the dependent variable.
Q.3. How does regression work?
Ans: Regression is a method of predicting the values of a dependent variable from an independent variable. In linear regression, a line of best fit is derived from the training dataset and can then be used to predict values for the testing dataset. The equation can be written as \(y=a+bx\), where \(y\) is the predicted value, \(b\) is the line’s gradient, and \(a\) is the line’s intersection with the \(y\)-axis.
Q.4. What are regression and its types?
Ans: Regression is a powerful statistical tool that helps us to examine the relationship between two or more variables of interest. There are several types of regression, including linear, multiple linear, and nonlinear. Simple linear and multiple linear models are the most common.
Q.5. What is an example of regression?
Ans: In industry, regression can be used to assess patterns and make predictions or forecasts. For example, suppose a company’s sales have been increasing steadily every month for the past few years. In that case, the company might predict sales in future months by running a linear regression on its monthly sales data.
We hope this article on regression has added to your knowledge. If you have any queries or suggestions, feel free to write them in the comment section below. We would love to hear from you. Embibe wishes you all the best of luck!