• Written By Keerthi Kulkarni
  • Last Modified 25-12-2024

Line of Regression: Definition, Formula, Equation

Line of Regression: Linear regression models the relationship between two variables by fitting a linear equation to observed data. One variable is treated as independent (the predictor), while the other is dependent (the response). For example, a person’s weight tends to increase with their height, so we can say that an approximately linear relationship exists between height and weight.

Regression coefficients are estimates of unknown parameters that describe the relationship between a predictor variable and its corresponding response. In this article, let us learn about the line of regression, including its definition, equation and coefficients.

Line of Regression: What is a Regression Line?

A line that describes how a set of data behaves is called a regression line. In other words, it shows the overall trend that best fits the available data.

Neither variable is required to depend on the other, nor must one cause changes in the other, but there should be some meaningful relationship between the two variables. A scatter plot helps indicate the strength of that relationship.

If there is no relationship between the variables, the scatter plot shows no increasing or decreasing pattern. In such cases, linear regression is not useful for the given data.

Equation of Line of Regression

The correlation coefficient measures the strength of the relationship between two variables. Its value ranges from -1 to +1 and indicates how closely the observed values of the two variables are associated.

A linear regression line equation is written as y = a + bx, where x is the independent variable and is plotted along the x-axis.

The dependent variable, y, is plotted along the y-axis. The line’s slope is b, and the y-intercept is a.

Linear Regression

Linear regression describes the relationship between two variables with a straight line. The linear regression equation has the same form as the slope-intercept equation of a line: $y = a + bx$.

Now, let us determine the value of the slope of the line, b, and the y-intercept, a.

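For $n$ pairs of values $(x, y)$:

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \quad \text{and} \quad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

Once $b$ is known, the intercept can equivalently be written as $a = \frac{\sum y - b\sum x}{n}$.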
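The formulas for a and b translate almost directly into code. The following is a minimal Python sketch, not a definitive implementation; the function name `fit_line` and the sample points are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the line y = a + b*x using the sum formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    # Both formulas share the same denominator.
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are equal, so the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denom      # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom  # y-intercept
    return a, b


# Hypothetical points that lie exactly on y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Because the sample points lie exactly on a line, the function recovers that line's intercept and slope.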

Simple Linear Regression

Simple linear regression is the simplest case, with a single scalar predictor variable x and a single scalar response variable y. Its regression equation is y=a+bx.

Multiple linear regression, also known as multivariable linear regression, extends this to multiple or vector-valued predictor variables.

Almost all real-world regression models contain multiple predictors, and explanations of linear regression are frequently expressed in terms of the multiple regression form. Even in these cases, the dependent variable y is still a scalar.
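As a sketch of the general form, a multiple linear regression with $k$ predictors can be written as shown below. The symbols $x_1, \dots, x_k$ and $b_1, \dots, b_k$ are notation assumed here for illustration and are not used elsewhere in this article.

$$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k$$

Setting $k = 1$ recovers the simple linear regression equation $y = a + bx$.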

Regression Analysis

Regression coefficients are estimates of unknown parameters that describe the relationship between a predictor variable and its corresponding response. They are used to forecast the value of an unknown variable from the value of a known variable.

Linear regression determines the straight-line equation that quantifies how much the dependent variable changes, on average, for a unit change in the independent variable. This process is referred to as regression analysis.

Correlation Coefficients

The correlation coefficient is a statistical measure of the strength and direction of the linear relationship between two variables, for example between the predicted and actual values obtained in an experiment. In that setting, the calculated value indicates how closely the predicted and actual values agree.

The value of the correlation coefficient lies between -1 and +1. If the value is positive, the two variables tend to move in the same direction.

If it is negative, they tend to move in opposite directions. Correlations are classified into three types:

Positive Correlation: The value of one variable increases as the value of the other variable increases. In this case, the correlation coefficient is positive, reaching +1 when the relationship is perfectly linear.
Negative Correlation: The value of one variable falls as the value of the other variable rises. In this case, the correlation coefficient is negative.
Zero Correlation: There is no specific linear relationship between the two variables, and the correlation coefficient is close to 0.

Pearson’s Correlation

Pearson’s correlation coefficient, also known as Pearson’s r, is the most common type of correlation coefficient and is frequently used in linear regression.

The linear correlation coefficient, denoted by r, measures the degree of linear relationship between two variables. It is also referred to as the cross-correlation coefficient because it measures the relationship between two variables.

If x and y are the two variables under consideration, the correlation coefficient can be computed using the formula:

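$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$

Here,

$n$ = number of values or elements in each list

$\sum x$ = sum of the 1st list of values

$\sum y$ = sum of the 2nd list of values

$\sum xy$ = sum of the products of the 1st and 2nd values

$\sum x^2$ = sum of squares of the 1st values

$\sum y^2$ = sum of squares of the 2nd values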
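The quantities listed above are enough to compute r directly. The sketch below is one possible Python rendering of the formula; the function name `pearson_r` and the two small data sets are assumptions chosen only to show the extreme values of r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums used in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either list is constant")
    return numerator / denominator


# Hypothetical data: y rises with x in the first case and falls in the second.
rising = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])
falling = pearson_r([1, 2, 3, 4], [40, 30, 20, 10])
print(rising, falling)  # 1.0 -1.0
```

A perfectly increasing linear relationship gives r = 1 and a perfectly decreasing one gives r = -1, which matches the range of the coefficient.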

Solved Examples – Line of Regression

Below are a few solved examples that can help in getting a better idea:

Q.1. Find the linear regression equation for the data given below:

| X | 2 | 3 | 5 | 8 |
|---|---|---|---|---|
| Y | 3 | 6 | 5 | 12 |

Ans:

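| $X$ | $Y$ | $X^2$ | $XY$ |
|---|---|---|---|
| 2 | 3 | 4 | 6 |
| 3 | 6 | 9 | 18 |
| 5 | 5 | 25 | 25 |
| 8 | 12 | 64 | 96 |
| $\sum X = 18$ | $\sum Y = 26$ | $\sum X^2 = 102$ | $\sum XY = 145$ |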

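The linear regression equation is $Y = a + bX$.
By using the formulas, we get the values of $a$ and $b$:
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{4 \times 145 - (18)(26)}{4 \times 102 - (18)^2} = \frac{112}{84}$$
$$b \approx 1.33$$
$$a = \frac{\sum y - b\sum x}{n}$$
$$a = \frac{26 - 1.33 \times 18}{4}$$
$$a \approx 0.515$$
Hence, the linear regression equation is $Y = 0.515 + 1.33X$.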
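Note that the intercept here is computed from the slope after rounding it to two decimals. As a rough check on how much that rounding matters, the sketch below redoes the arithmetic with exact fractions; the variable names are assumptions, and the sums are the totals from the table in this example.

```python
from fractions import Fraction

# Totals from the Q.1 table.
n, sum_x, sum_y, sum_x2, sum_xy = 4, 18, 26, 102, 145

# Exact slope and the intercept that follows from it.
b_exact = Fraction(n * sum_xy - sum_x * sum_y, n * sum_x2 - sum_x ** 2)
a_exact = (sum_y - b_exact * sum_x) / n

# Intercept obtained when the slope is first rounded to 1.33.
b_rounded = Fraction("1.33")
a_rounded = (sum_y - b_rounded * sum_x) / n

print(b_exact, a_exact)   # 4/3 1/2
print(float(a_rounded))   # 0.515
```

With the unrounded slope 4/3 the intercept is exactly 0.5, so the value 0.515 reflects rounding in the slope rather than a different line.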

Q.2. For the following two sets of data, find a linear regression equation

| x | 2 | 4 | 6 | 8 |
|---|---|---|---|---|
| y | 3 | 7 | 5 | 10 |

Ans:

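| $x$ | $y$ | $x^2$ | $xy$ |
|---|---|---|---|
| 2 | 3 | 4 | 6 |
| 4 | 7 | 16 | 28 |
| 6 | 5 | 36 | 30 |
| 8 | 10 | 64 | 80 |
| $\sum x = 20$ | $\sum y = 25$ | $\sum x^2 = 120$ | $\sum xy = 144$ |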

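The linear regression equation is $Y = a + bX$.
By using the formulas, we get the values of $a$ and $b$:
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{4 \times 144 - (20)(25)}{4 \times 120 - (20)^2} = \frac{76}{80} = 0.95$$
$$a = \frac{\sum y - b\sum x}{n}$$
$$a = \frac{25 - 0.95 \times 20}{4} = 1.5$$
Hence, the linear regression equation is $Y = 1.5 + 0.95X$.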

Q.3. Find the regression coefficients

| Age | Glucose Level |
|---|---|
| 43 | 99 |
| 21 | 65 |
| 25 | 79 |
| 42 | 75 |
| 57 | 87 |
| 59 | 81 |

Ans:

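| Age ($x$) | Glucose Level ($y$) | $xy$ | $x^2$ |
|---|---|---|---|
| 43 | 99 | 4257 | 1849 |
| 21 | 65 | 1365 | 441 |
| 25 | 79 | 1975 | 625 |
| 42 | 75 | 3150 | 1764 |
| 57 | 87 | 4959 | 3249 |
| 59 | 81 | 4779 | 3481 |
| Total = 247 | 486 | 20485 | 11409 |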

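The regression equation is $Y = a + bX$.
By using the formulas, we get the values of $a$ and $b$:
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{6 \times 20485 - (247)(486)}{6 \times 11409 - (247)^2} = \frac{2868}{7445} \approx 0.385$$
$$a = \frac{\sum y - b\sum x}{n}$$
$$a = \frac{486 - 0.385 \times 247}{6} \approx 65.15$$
Hence, the linear regression equation is $Y = 65.15 + 0.385X$.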
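Once the coefficients are known, the line can be used to estimate a glucose level for a given age. The sketch below simply evaluates the fitted equation; the function name `predict_glucose` and the input age of 55 are hypothetical, and the coefficients are the rounded values from this example.

```python
def predict_glucose(age, a=65.15, b=0.385):
    """Evaluate Y = a + b*X with the rounded coefficients from Q.3."""
    return a + b * age


# Hypothetical input inside the observed age range (21 to 59).
estimate = predict_glucose(55)
print(round(estimate, 1))
```

The estimate is about 86.3. Predictions like this are most trustworthy for ages within the range of the data used to fit the line.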

Q.4. Find the line of regression for the below data:

| A | B |
|---|---|
| 6.25 | 4.03 |
| 6.5 | 4.02 |
| 6.5 | 4.02 |
| 6 | 4.04 |
| 6.25 | 4.03 |
| 6.25 | 4.03 |

Ans:

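| $X$ | $Y$ | $XY$ | $X^2$ |
|---|---|---|---|
| 6.25 | 4.03 | 25.19 | 39.06 |
| 6.5 | 4.02 | 26.13 | 42.25 |
| 6.5 | 4.02 | 26.13 | 42.25 |
| 6 | 4.04 | 24.24 | 36 |
| 6.25 | 4.03 | 25.19 | 39.06 |
| 6.25 | 4.03 | 25.19 | 39.06 |
| Total = 37.75 | 24.17 | 152.06 | 237.69 |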

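The line of regression is $Y = a + bX$.
By using the formulas, we get the values of $a$ and $b$:
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
$$b = \frac{6 \times 152.06 - (37.75)(24.17)}{6 \times 237.69 - (37.75)^2}$$
Evaluating with the unrounded sums ($\sum xy = 152.0625$ and $\sum x^2 = 237.6875$) gives
$$b = \frac{-0.0425}{1.0625} = -0.04$$
$$a = \frac{\sum y - b\sum x}{n}$$
$$a = \frac{24.17 - (-0.04) \times 37.75}{6}$$
$$a = 4.28$$
Hence, the line of regression is $Y = -0.04X + 4.28$. The slope is negative because B decreases as A increases.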
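In this example the denominator of the slope formula is very small, so the result is sensitive to rounding in the table totals. The sketch below recomputes the coefficients from the raw data with exact fractions and then repeats the slope calculation with the two-decimal totals; the variable names are assumptions made for illustration.

```python
from fractions import Fraction

# Raw data from Q.4, parsed as exact decimals.
xs = [Fraction(v) for v in ("6.25", "6.5", "6.5", "6", "6.25", "6.25")]
ys = [Fraction(v) for v in ("4.03", "4.02", "4.02", "4.04", "4.03", "4.03")]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# Slope and intercept from the unrounded sums.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
print(float(b), float(a))  # -0.04 4.28

# Same slope formula, but using the table's rounded totals 152.06 and 237.69.
b_from_rounded = (n * Fraction("152.06") - sum_x * sum_y) / (
    n * Fraction("237.69") - sum_x ** 2
)
print(round(float(b_from_rounded), 3))  # -0.053
```

The exact calculation gives a slope of -0.04 and an intercept of 4.28, while the rounded totals shift the slope to roughly -0.053. In both cases the slope is negative, which agrees with the data: B falls as A rises.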

Q.5. Find the Pearson’s coefficient for the given data

| Age | Glucose Level |
|---|---|
| 43 | 99 |
| 21 | 65 |
| 25 | 79 |
| 42 | 75 |
| 57 | 87 |
| 59 | 81 |

Ans:

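| Age ($x$) | Glucose Level ($y$) | $xy$ | $x^2$ | $y^2$ |
|---|---|---|---|---|
| 43 | 99 | 4257 | 1849 | 9801 |
| 21 | 65 | 1365 | 441 | 4225 |
| 25 | 79 | 1975 | 625 | 6241 |
| 42 | 75 | 3150 | 1764 | 5625 |
| 57 | 87 | 4959 | 3249 | 7569 |
| 59 | 81 | 4779 | 3481 | 6561 |
| Total = 247 | 486 | 20485 | 11409 | 40022 |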

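Pearson’s correlation coefficient is given by
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$$
$$r = \frac{6 \times 20485 - (247 \times 486)}{\sqrt{\left[6 \times 11409 - (247)^2\right]\left[6 \times 40022 - (486)^2\right]}}$$
$$r \approx 0.5298$$
Hence, the correlation coefficient is 0.5298.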

Summary

Linear regression is one of the most fundamental and widely used types of predictive analysis in statistics. It addresses two questions. First, does a set of predictor variables accurately predict an outcome?

Second, which variables in particular are significant predictors of the outcome variable, and in what way?

Regression estimates help explain the relationship between one or more independent variables and one dependent variable. The simplest form is the linear equation y=a+bx. Correlation coefficients are used to assess the strength of the relationship between two variables.

Pearson’s correlation is the correlation coefficient most frequently used with linear regression.

FAQs on Line of Regression

Students may have many questions about the line of regression. Here are a few commonly asked questions and answers.

Q.1. What does the line of regression tell you?
Ans: The regression line depicts the relationship between the independent and dependent variables.

Q.2. How do you find a regression line?
Ans: The equation for a linear regression line is Y=a+bX, where X is the explanatory (independent) variable and Y is the dependent variable. The slope of the line is b, and the intercept (the value of Y when X=0) is a. Both values are calculated from the data using the sums of X, Y, XY and X².

Q.3. What is the regression line called?
Ans: The regression line is also known as the “line of best fit” because it is the line that fits the points best when drawn through them. It is the line that minimises the sum of squared differences between the actual and predicted values.
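One common way to make “best fit” concrete is least squares: the regression line has a smaller sum of squared differences than any other straight line through the same data. The sketch below illustrates this with the data from solved example Q.2 and its fitted line Y=1.5+0.95X; the function name and the two alternative lines are hypothetical.

```python
def sum_squared_errors(a, b, xs, ys):
    """Sum of squared gaps between each actual y and the predicted a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Data from solved example Q.2.
xs, ys = [2, 4, 6, 8], [3, 7, 5, 10]

fitted = sum_squared_errors(1.5, 0.95, xs, ys)

# Hypothetical alternatives: one with a steeper slope, one shifted upwards.
steeper = sum_squared_errors(1.5, 1.05, xs, ys)
shifted = sum_squared_errors(2.0, 0.95, xs, ys)

print(round(fitted, 2), round(steeper, 2), round(shifted, 2))
```

The fitted line gives a total of about 8.7, while the steeper and shifted lines give about 9.9 and 9.7, so both alternatives fit the points less closely.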

Q.4. What are examples of linear regression?
Ans: Linear regression is used, for example, to model the number of sales. Agricultural scientists use it to study the effect of fertiliser on total crop yield, and doctors use it to relate the dosage of a drug to its effect on blood pressure.

Q.5. What is the coefficient of correlation?
Ans: The correlation coefficient is a statistical measure of the strength and direction of the linear relationship between two variables, such as the predicted and actual values from a statistical experiment.
