December 11, 2024
Correlation: The measures of central tendency are the mean, median, and mode. Two or more data sets may share the same central value and still differ considerably in how their values are distributed, so studying this spread helps us understand the characteristics of a distribution.
Dispersion is the extent to which the values in a distribution differ from the centre. The measures of dispersion are range, quartiles, average deviation, and standard deviation. When we go a step further and study the relationship between two numeric variables, the study is known as correlation. In this article, we will learn about correlation in detail.
Correlation means connection. Correlation analysis studies the relationship or connection between two or more variables. Two variables are said to be correlated if they vary in such a way that changes in one variable accompany changes in the other.
Example: The relationship between a student’s height and weight in a class, as well as the relationship between a family’s earnings and the amount spent each month.
Correlation is a statistical tool used to measure the relationship between two or more variables.
Example: As summer approaches, the atmospheric temperature rises. So, people tend to travel to hill stations to enjoy the cold weather, and the hill stations get crowded. Similarly, we can observe that the sales of ice creams, cool drinks, and fruits like watermelon increase during this period.
Correlation is a means of systematically examining such relationships or associations.
Although correlation measures the direction and degree of the association, it does not say anything about the cause-and-effect relationship between two or more variables.
Example: We know that the demand for a commodity and its price are closely related. But we cannot say exactly whether changes in demand are causing the changes in price or whether changes in price are causing the changes in demand.
Even though the cause-and-effect relationship cannot be established, we can conclude that the two variables, demand and price, are correlated.
Thus, correlation does not establish causation, that is, which variable is the cause and which is the effect.
There are three types of correlation between two variables.
1. Positive Correlation
A positive correlation occurs when the values of two variables move in the same direction. In other words, an increase in one variable is accompanied by an increase in the other, and a decrease in one is accompanied by a decrease in the other.
Examples:
2. Negative Correlation
A negative correlation occurs when the values of two variables move in opposite directions. In other words, an increase in one variable is accompanied by a decrease in the other, and vice versa. When the variable \(x\) increases, the variable \(y\) decreases.
Examples:
3. No Correlation
When there is no linear dependence or relationship between two variables, there is said to be no correlation. Two variables with no correlation do not appear to be statistically related: the value of one variable does not change in any consistent way with the value of the other variable.
There are various methods for studying correlation. The most commonly used methods are:
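As a rough illustration of the three cases, the direction of a linear relationship can be read from the sign of the sum of products of deviations from the means, which is the numerator of Pearson's coefficient. The sketch below uses made-up data sets and a hypothetical helper named `direction`; it only shows the idea and is not a full correlation analysis.

```python
import math

def direction(xs, ys):
    """Classify the linear relationship by the sign of sum((x - mean_x) * (y - mean_y))."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if math.isclose(s, 0.0, abs_tol=1e-9):
        return "no correlation"
    return "positive" if s > 0 else "negative"

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 5]      # tends to increase with x
falling = [6, 4, 5, 3, 2]     # tends to decrease as x increases
unrelated = [3, 1, 2, 1, 3]   # no linear trend with x

print(direction(x, rising))     # positive
print(direction(x, falling))    # negative
print(direction(x, unrelated))  # no correlation
```

Note that a result of "no correlation" here only rules out a linear trend; the variables may still be related in a non-linear way.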
1. Scatter Plots
A scatter plot is a mathematical diagram that displays values for a two-variable data set using Cartesian coordinates. It is also known as a scatter graph, scatter chart, scattergram, or scatter diagram.
A scatter plot is a simple but helpful technique for visually examining the correlation of two variables without any numerical calculation. When scatter plots are used, the given data are plotted on a graph in the form of dots. For each pair of \(x\) and \(y\) values, we put a dot, and we get as many dots on the graph paper as the number of observations.
Various Parts of a Scatter Plot
In a scatter plot, the explanatory variable is on the \(x\)-axis, and the response variable is on the \(y\)-axis. An explanatory variable, also known as the independent variable or predictor variable, explains the variation in the response variable, also known as a dependent variable or an outcome variable.
The value of the response variable responds to changes in the explanatory variable.
Advantages of Scatter Plot
The scatter plot helps in describing the association by analysing the following factors.
From a scatter plot, we can understand whether the correlation is positive or negative, linear or not, whether the data is tightly clustered, and if there is the presence or absence of any outliers.
With the scatter of dots in the graph, we can form an idea of the nature of the relationship.
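To make the plotting step concrete, here is a minimal sketch that places one mark per \((x, y)\) pair on a small text grid. In practice a plotting library would normally be used; the function name `text_scatter`, the grid size, and the data are assumptions made for illustration only.

```python
def text_scatter(xs, ys, width=20, height=8):
    """Return a text grid with '*' at each (x, y) pair; y grows upward.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line).rstrip() for line in grid)

# Hypothetical observations: one dot per pair.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
plot = text_scatter(xs, ys)
print(plot)
```

The marks drift upward from left to right, which is the visual signature of a positive correlation.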
2. Karl Pearson’s Coefficient of Correlation
The product-moment correlation and simple correlation coefficient are other names for Karl Pearson’s coefficient of correlation. It calculates the degree of a linear relationship between two variables and provides a precise numerical value. It is denoted by the symbol \(r\).
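For reference, the deviation form of the coefficient, which matches the formula used in the solved examples, can be written as follows. Here \(\bar x\) and \(\bar y\) are assumed to denote the means of the two variables, and the sums run over all \(n\) observations.

\(r = \frac{\sum\nolimits_{i = 1}^n (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum\nolimits_{i = 1}^n (x_i - \bar x)^2 \sum\nolimits_{i = 1}^n (y_i - \bar y)^2}}\)

The value of \(r\) always lies between \(-1\) and \(+1\).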
Properties
Karl Pearson’s Coefficient of Correlation has the following observational properties:
Limitations
3. Spearman’s Rank Correlation
In some instances, the variables cannot be measured meaningfully. Such variables are called attributes.
Example: Honesty, beauty, and intelligence
We cannot assign definite values to such attributes. Ranking is considered a better alternative for quantifying these attributes. If we want to study the relationship between two attributes, rank correlation is better than simple correlation. Spearman’s rank correlation assesses the strength and direction of the relationship between two ranked variables. It essentially measures the monotonicity of a relationship between two variables. In other words, it tells how well the relationship between two variables can be represented using a monotonic function.
Q.1. Tom has started a new catering business, and he is first analysing the cost of making sandwiches and the price at which he should sell them. He has gathered the information below after talking to various other cooks.
Number of Sandwiches | Cost of Bread | Cost of Vegetables | Total Cost |
\(10\) | \(100\) | \(30\) | \(130\) |
\(20\) | \(200\) | \(60\) | \(260\) |
\(30\) | \(300\) | \(90\) | \(390\) |
\(40\) | \(400\) | \(120\) | \(520\) |
Tom was convinced that there is a positive linear relationship between the number of sandwiches and the total cost of making them. Analyse if this statement is true.
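Sol: Plot the number of sandwiches prepared against the total cost of making them.
The total cost rises by \(130\) for every \(10\) additional sandwiches, so the points lie on a straight line that rises from left to right. Hence, there is a positive linear relationship between them, and Tom's statement is true.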
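One way to check the claim without drawing the graph is to compare the slope between consecutive points: if every slope is the same positive number, the points lie on one rising straight line. The sketch below does this for the table above; the helper name `consecutive_slopes` is an assumption made for illustration.

```python
sandwiches = [10, 20, 30, 40]
total_cost = [130, 260, 390, 520]

def consecutive_slopes(xs, ys):
    """Slope of the segment joining each point to the next one."""
    points = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(points, points[1:])]

slopes = consecutive_slopes(sandwiches, total_cost)
is_positive_linear = all(s == slopes[0] for s in slopes) and slopes[0] > 0

print(slopes)              # [13.0, 13.0, 13.0]
print(is_positive_linear)  # True
```

Each additional sandwich adds the same amount to the total cost, so the relationship is exactly linear and positive.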
Q.2. Find the correlation coefficient between the data given below.
Roll No. | Marks in Subject A | Marks in Subject B |
\(1\) | \(48\) | \(45\) |
\(2\) | \(35\) | \(20\) |
\(3\) | \(17\) | \(40\) |
\(4\) | \(23\) | \(25\) |
\(5\) | \(47\) | \(45\) |
Sol: Let the marks in subject \(A\) be denoted by \(X\) and those in subject \(B\) by \(Y\).
Mean of \(X\) \( = \frac{{\sum X }}{5} = \frac{{170}}{5} = 34\) and mean of \(Y\) \( = \frac{{\sum Y }}{5} = \frac{{175}}{5} = 35\). Let \(x = X - 34\) and \(y = Y - 35\) be the deviations from the means.
\(X\) | \(\left( {X - 34} \right)\) \( = x\) | \({x^2}\) | \(Y\) | \(\left( {Y - 35} \right)\) \( = y\) | \({y^2}\) | \(xy\) |
\(48\) | \(14\) | \(196\) | \(45\) | \(10\) | \(100\) | \(140\) |
\(35\) | \(1\) | \(1\) | \(20\) | \(-15\) | \(225\) | \(-15\) |
\(17\) | \(-17\) | \(289\) | \(40\) | \(5\) | \(25\) | \(-85\) |
\(23\) | \(-11\) | \(121\) | \(25\) | \(-10\) | \(100\) | \(110\) |
\(47\) | \(13\) | \(169\) | \(45\) | \(10\) | \(100\) | \(130\) |
\(\sum X \) \(=170\) | \(\sum x \) \(=0\) | \(\sum {{x^2}} \) \(=776\) | \(\sum Y \) \(=175\) | \(\sum y \) \(=0\) | \(\sum {{y^2}} \) \(=550\) | \(\sum {xy} \) \(=280\) |
\(r = \frac{{\sum {xy} }}{{\sqrt {\sum {{x^2}} \sum {{y^2}} } }}\)
\( = \frac{{280}}{{\sqrt {776 \times 550} }}\)
\( = \frac{{280}}{{653.3}}\)
\(\therefore \,\,r = 0.429\)
Since \(r=0.429\), there is a moderate positive correlation between the marks in subjects \(A\) and \(B\).
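The arithmetic in this solution can be checked with a short script. This is a minimal sketch that recomputes the deviations, their squares and products, and then \(r\) from the marks given in the question; the variable names are chosen here for illustration.

```python
import math

marks_a = [48, 35, 17, 23, 47]
marks_b = [45, 20, 40, 25, 45]

mean_a = sum(marks_a) / len(marks_a)  # 34.0
mean_b = sum(marks_b) / len(marks_b)  # 35.0

# Deviations from the means (the x and y columns of the table).
x = [a - mean_a for a in marks_a]
y = [b - mean_b for b in marks_b]

sum_x2 = sum(d * d for d in x)              # 776.0
sum_y2 = sum(d * d for d in y)              # 550.0
sum_xy = sum(p * q for p, q in zip(x, y))   # 280.0

r = sum_xy / math.sqrt(sum_x2 * sum_y2)
print(round(r, 3))  # 0.429
```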
Q.3. Draw the scatter diagram showing a positive correlation.
Sol:
When the points in the graph are rising, moving from left to right, then the scatter plot shows a positive correlation.
Q.4. The following data give the heights (in inches) of fathers and their eldest sons. Compute the correlation coefficient between the heights of fathers and sons using Karl Pearson’s method.
Height of father | \(65\) | \(66\) | \(67\) | \(67\) | \(68\) | \(69\) | \(70\) | \(72\) |
Height of son | \(67\) | \(68\) | \(65\) | \(68\) | \(72\) | \(72\) | \(69\) | \(71\) |
Sol:
Let \(x\) denote the height of the father and \(y\) denote the height of the son. The data is on the ratio scale.
We use Karl Pearson’s method.
\(r = \frac{{n\sum\nolimits_{i = 1}^n {{x_i}{y_i}} - \sum\nolimits_{i = 1}^n {{x_i}} \sum\nolimits_{i = 1}^n {{y_i}} }}{{\sqrt {n\sum\nolimits_{i = 1}^n {x_i^2} - {{\left( {\sum\nolimits_{i = 1}^n {{x_i}} } \right)}^2}} \sqrt {n\sum\nolimits_{i = 1}^n {y_i^2} - {{\left( {\sum\nolimits_{i = 1}^n {{y_i}} } \right)}^2}} }}\)
\({x_i}\) | \({y_i}\) | \(x_i^2\) | \(y_i^2\) | \({x_i}{y_i}\) |
\(65\) | \(67\) | \(4225\) | \(4489\) | \(4355\) |
\(66\) | \(68\) | \(4356\) | \(4624\) | \(4488\) |
\(67\) | \(65\) | \(4489\) | \(4225\) | \(4355\) |
\(67\) | \(68\) | \(4489\) | \(4624\) | \(4556\) |
\(68\) | \(72\) | \(4624\) | \(5184\) | \(4896\) |
\(69\) | \(72\) | \(4761\) | \(5184\) | \(4968\) |
\(70\) | \(69\) | \(4900\) | \(4761\) | \(4830\) |
\(72\) | \(71\) | \(5184\) | \(5041\) | \(5112\) |
\(\sum {x_i} = 544\) | \(\sum {y_i} = 552\) | \(\sum x_i^2 = 37028\) | \(\sum y_i^2 = 38132\) | \(\sum {x_i}{y_i} = 37560\) |
\(r = \frac{{8 \times 37560 - 544 \times 552}}{{\sqrt {8 \times 37028 - {{\left( {544} \right)}^2}} \sqrt {8 \times 38132 - {{\left( {552} \right)}^2}} }} = \frac{{192}}{{\sqrt {288} \sqrt {352} }} = 0.603\)
The heights of fathers and sons are positively correlated. It means that, on average, if fathers are tall, their sons will probably be tall, and if fathers are short, their sons will probably be short.
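As a cross-check of the column totals and the final value, the raw-sum form of Karl Pearson's formula can be evaluated directly from the data. The sketch below assumes nothing beyond the heights given in the question; the variable names are illustrative.

```python
import math

father = [65, 66, 67, 67, 68, 69, 70, 72]
son = [67, 68, 65, 68, 72, 72, 69, 71]
n = len(father)

sum_x = sum(father)                              # 544
sum_y = sum(son)                                 # 552
sum_x2 = sum(x * x for x in father)              # 37028
sum_y2 = sum(y * y for y in son)                 # 38132
sum_xy = sum(x * y for x, y in zip(father, son)) # 37560

numerator = n * sum_xy - sum_x * sum_y           # 192
var_x_term = n * sum_x2 - sum_x ** 2             # 288
var_y_term = n * sum_y2 - sum_y ** 2             # 352

r = numerator / (math.sqrt(var_x_term) * math.sqrt(var_y_term))
print(round(r, 3))  # 0.603
```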
Q.5. Calculate the Spearman’s rank correlation coefficient for the following data.
Candidates | \(1\) | \(2\) | \(3\) | \(4\) | \(5\) |
Marks in Tamil | \(75\) | \(40\) | \(52\) | \(65\) | \(60\) |
Marks in English | \(25\) | \(42\) | \(35\) | \(29\) | \(33\) |
Sol:
Marks in Tamil | Rank \(\left( {{R_{1i}}} \right)\) | Marks in English | Rank \(\left( {{R_{2i}}} \right)\) | \({D_i} = {R_{1i}} - {R_{2i}}\) | \(D_i^2\) |
\(75\) | \(1\) | \(25\) | \(5\) | \(-4\) | \(16\) |
\(40\) | \(5\) | \(42\) | \(1\) | \(4\) | \(16\) |
\(52\) | \(4\) | \(35\) | \(2\) | \(2\) | \(4\) |
\(65\) | \(2\) | \(29\) | \(4\) | \(-2\) | \(4\) |
\(60\) | \(3\) | \(33\) | \(3\) | \(0\) | \(0\) |
\({\sum\limits_{i = 1}^n {D_i^2\,\,} }\) | \(40\) |
\({\sum\limits_{i = 1}^n {D_i^2 = 40} }\) and \(n=5\)
\(\rho = 1 - \frac{{6\sum\nolimits_{i = 1}^n {D_i^2} }}{{n\left( {{n^2} - 1} \right)}}\)
\( = 1 - \frac{{6 \times 40}}{{5\left( {{5^2} - 1} \right)}} = 1 - \frac{{240}}{{5\left( {24} \right)}} = - 1\)
Interpretation: This perfect negative rank correlation \(-1\) indicates that scores in the subjects totally disagree. A student who is best in Tamil is the weakest in English and vice-versa.
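The ranking and the final value can be reproduced with a few lines of code. This is a minimal sketch of the rank-difference formula used above; it assumes there are no tied marks (as in this data), and the helper names `ranks_descending` and `spearman_rho` are chosen here for illustration.

```python
def ranks_descending(values):
    """Rank 1 goes to the highest value. Assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rank correlation via 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(xs)
    r1 = ranks_descending(xs)
    r2 = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

tamil = [75, 40, 52, 65, 60]
english = [25, 42, 35, 29, 33]

print(ranks_descending(tamil))       # [1, 5, 4, 2, 3]
print(ranks_descending(english))     # [5, 1, 2, 4, 3]
print(spearman_rho(tamil, english))  # -1.0
```

When ties are present, this shortcut formula no longer applies as written, and tied values are usually given averaged ranks instead.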
Correlation describes the relationship between two variables. It expresses the direction and strength of the relationship, but it says nothing about a cause-and-effect relationship between two or more variables. Correlation can be positive or negative, or it can be non-existent. Scatter plots, Karl Pearson’s Coefficient of Correlation, and Spearman’s Rank Correlation are the most commonly used methods for studying correlation. The degree of correlation is expressed by the value of \(r\). If \(r=+1\), the variables have a perfect positive correlation. If \(r=-1\), the variables have a perfect negative correlation, and if \(r=0\), there is no linear correlation between the variables.
Q.1. What are the types of correlation?
Ans. Correlation can be of three types:
(i) Positive Correlation
(ii) Negative Correlation
(iii) No Correlation
Q.2. How is correlation calculated?
Ans: Correlation can be calculated using Karl Pearson’s coefficient of correlation or Spearman’s rank correlation coefficient. Spearman’s rank correlation coefficient is given by:
\(\rho = 1 - \frac{{6\sum {D_i^2} }}{{n\left( {{n^2} - 1} \right)}}\)
Q.3. What does a correlation of 1 mean?
Ans: A correlation of \(+1\) indicates that there is a perfect positive correlation. This means that both variables move in the same direction at the same time.
Q.4. What is correlation? Give examples.
Ans: Correlation means connection. Correlation analysis studies the relationship or connection between two or more variables. Two variables are said to be correlated if they vary in such a way that changes in one variable accompany changes in the other.
Example: The relationship between a student’s height and weight in a class, as well as the relationship between a family’s earnings and the amount spent each month.
Q.5. Is correlation positive or negative?
Ans: Correlation can be positive or negative. A positive correlation is when the value of one variable increases when the other increases. A negative correlation is when the value of one variable decreases when the other increases.
We hope this detailed article on Correlation has made you familiar with the topic. If you have any queries, feel free to post them in the comment box. Stay tuned to embibe.com for more information.