December 11, 2024
Introduction to Statistics: The word statistics seems to have been derived from the Latin word status, which means a political state. Originally, statistics was simply the collection of numerical data on some aspects of people’s lives. However, over time, its scope broadened.
Today, statistics means the collection of facts or information, in the form of numerical data, about almost every aspect of people’s lives for a definite purpose, together with the organisation, summarisation, and presentation of that data in tables and graphs (charts), and the analysis of the data and the drawing of inferences from it. In this article, we will cover the basic concepts used in statistics. Scroll down to learn more!
Statistics is a branch of mathematics that involves collecting, organising, interpreting, presenting, and analysing data. The \(5\) stages of statistics are problem, plan, data, analysis, and conclusion. Based on the study of the data obtained, people can draw conclusions, make decisions and plan wisely.
When the group we want to collect data from is large, we often randomly select a smaller group to represent the entire group. This method of data collection is called random sampling. Random sampling makes data collection faster, less costly, and easier to analyse because there is less data to handle.
In biology and engineering, we can set up controlled experiments using random samples of the particular objects of interest.
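As a rough illustration of random sampling, the sketch below draws a small sample from a larger group using Python's standard `random` module. The population of 1,000 ID numbers, the sample size of 50, and the fixed seed are all assumptions made only for this example.

```python
import random

# Hypothetical population: ID numbers for 1,000 people (illustrative only).
population = list(range(1, 1001))

# A fixed seed makes the sketch reproducible; a real survey would not fix it.
rng = random.Random(42)

# Draw 50 distinct members; every member is equally likely to be chosen.
sample = rng.sample(population, k=50)

print(len(sample))       # number of members selected
print(len(set(sample)))  # same number, because no member is chosen twice
```

Here `sample` selects without replacement, which matches the idea of picking a smaller group of distinct members to represent the whole group.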
1. The yield of cross-fertilisation of different types of rice can be measured using samples from a patch of rice fields.
2. The lifetime of a particular battery can be tested in a quality-control laboratory by using a small random sample of batteries.
Some data, such as air pollution levels and bird migration paths, are observed through real-life situations.
We usually conduct surveys to gather personal data and public opinion by requesting sample populations of target groups to complete a questionnaire. The survey may be performed through face-to-face interviews, telephone interviews, postal surveys, or Internet surveys.
Some data such as economic conditions and a country’s population are obtained by conducting large-scale surveys that individuals or small companies cannot do. We can receive these types of data from official statistics published by government organisations. The United Nations Statistical Yearbook contains a wide range of international economic, social, and environmental data.
Some of the important terms related to statistics are as follows:
1. Primary Data: The information collected by the investigator themselves with a definite purpose in mind is called primary data.
2. Secondary Data: The information gathered from a source already stored is called secondary data.
3. Raw Data: The numerical data recorded in its original form as collected by the investigator or received from some source is called raw data.
4. Variable: A quantity that is being measured in an experiment or survey is called a variable. Height, age and weight of people, income and expenditure of people, number of members in a family, number of workers in a factory, marks obtained by a student in a test, the number of runs scored in a cricket test, etc., are examples of variables.
Variables are of two types:
5. Range: The difference between the maximum and minimum values of a variable is called its range.
6. Variate: A particular value of a variable is called variate (observation).
7. Frequency: The number of times a variate (observation) occurs in a given data is called the frequency of that variate.
8. Frequency Distribution: A tabular arrangement of given numerical data showing the frequency of different variates is called frequency distribution, and the table itself is called frequency distribution table.
There are mainly two types of statistics: descriptive statistics and inferential statistics.
Inferential statistics is closely associated with hypothesis testing, in which the usual aim is to decide whether the data give enough evidence to reject the null hypothesis. Hypothesis testing is an inferential procedure that uses sample data to assess the credibility of a hypothesis about a population. Inferential statistics is also commonly used to determine how strong a relationship observed in the sample is.
We can broadly classify statistics as shown below:
Suppose there are \(32\) students in Class \({\rm{IX}}\) in a school and in an examination, out of \(50\) marks, the marks obtained by them are as follows:
\(39,44,25,11,21,25,44,25,7,40,43,44,49,14,11,14,25,28,28,39,44,37,21,40,\)
\(43,3,37,25,25,21,37,38\)
The data in this form is the raw (or ungrouped or unclassified) data. Here the number of marks obtained is the variable, and each entry in the above list is an observation or variate.
To make it easily understandable, we present the above data in a table called the frequency distribution table.
The frequency distribution table for the above raw data is given below:
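As one possible way to build such a table, the following Python sketch tallies the marks listed above with `collections.Counter` and also reports the range of the data. The variable names are illustrative choices, not part of the original article.

```python
from collections import Counter

# Raw (ungrouped) marks of the 32 students, as listed above.
marks = [39, 44, 25, 11, 21, 25, 44, 25, 7, 40, 43, 44, 49, 14, 11, 14,
         25, 28, 28, 39, 44, 37, 21, 40, 43, 3, 37, 25, 25, 21, 37, 38]

# Frequency of each variate: how many times each mark occurs.
frequency = Counter(marks)

print("Marks  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>5}  {frequency[mark]:>9}")

total = sum(frequency.values())          # total number of observations
data_range = max(marks) - min(marks)     # maximum value minus minimum value
print("Total:", total)       # 32
print("Range:", data_range)  # 46
```

For this data, the mark \(25\) has the highest frequency \((6),\) followed by \(44\) with frequency \(4,\) and the frequencies add up to \(32,\) the number of students.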
The mean or arithmetic average of a number of observations is the sum of the values of all the observations divided by the total number of observations.
Mean \( = \frac{{{x_1} + {x_2} + {x_3} + \ldots + {x_n}}}{n} = \frac{{\sum {{x_i}} }}{n},\) where \(\sum {{x_i}} = {x_1} + {x_2} + {x_3} + {x_4} + \ldots + {x_n}.\)
Thus, mean \( = \frac{{{\rm{ Sum}}\,{\rm{of}}\,{\rm{all}}\,{\rm{observations }}}}{{{\rm{ Total}}\,{\rm{number}}\,{\rm{of}}\,{\rm{observations }}}}\)
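The formula can be sketched in a few lines of Python. The function name `mean` and the choice to reuse the marks of the \(32\) students from the earlier example are assumptions made for illustration.

```python
def mean(observations):
    """Return the sum of all observations divided by their number."""
    if not observations:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(observations) / len(observations)

# Marks of the 32 students from the raw data given earlier.
marks = [39, 44, 25, 11, 21, 25, 44, 25, 7, 40, 43, 44, 49, 14, 11, 14,
         25, 28, 28, 39, 44, 37, 21, 40, 43, 3, 37, 25, 25, 21, 37, 38]

print(sum(marks))   # 947
print(mean(marks))  # 29.59375
```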
Median is the central value of statistical data when the data is arranged in ascending or descending order. Thus, if there are \(n\) observations \({x_1},{x_2},{x_3},{x_4}, \ldots ,{x_n}\) arranged in ascending or descending order, then
Median \( = {\left( {\frac{{n + 1}}{2}} \right)^{{\rm{th}}}}\) observation, if \(n\) is odd.
Median \( = \frac{{{{\left( {\frac{n}{2}} \right)}^{{\rm{th}}}}{\rm{ observation }} + {{\left( {\frac{n}{2} + 1} \right)}^{{\rm{th}}}}{\rm{ observation }}}}{2},\) if \(n\) is even.
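The two cases can be written as a short Python sketch. The function name `median` is an illustrative choice, and the data reused here is the list of marks from the earlier example. Note that the formulas count observations from \(1,\) while Python lists are indexed from \(0.\)

```python
def median(observations):
    """Return the median of the observations using the odd/even rule."""
    ordered = sorted(observations)  # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th observation, which is index n // 2.
        return ordered[middle]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th observations.
    return (ordered[middle - 1] + ordered[middle]) / 2

# Marks of the 32 students from the raw data given earlier (n is even).
marks = [39, 44, 25, 11, 21, 25, 44, 25, 7, 40, 43, 44, 49, 14, 11, 14,
         25, 28, 28, 39, 44, 37, 21, 40, 43, 3, 37, 25, 25, 21, 37, 38]

print(median(marks))  # 28.0, since the 16th and 17th ordered marks are both 28
```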
The mode of a set of data is the value that occurs most often.
Suppose the sizes of eight coats sold by a boutique manager are \(7, 8, 8, 8, 9, 10, 12\) and \(16.\)
In this case, the mean \(=9.75\) and the median \(=8.5\)
These two measures of central tendency are not very meaningful to the manufacturer of the coats or the boutique manager because neither value is an actual coat size. Instead, they will be more interested in knowing the most popular size so that they can cater to customers’ needs. In such cases, the mode is used as the measure of central tendency.
In this case, the modal size, or the mode of the distribution, is \(8\) as it is the most popular.
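The coat example can be checked with a small Python sketch. The helper `modes` is a hypothetical name introduced here; it returns every value that shares the highest frequency, since a data set can have more than one mode. The mean and median are computed with the standard `statistics` module for comparison.

```python
from collections import Counter
from statistics import mean, median

def modes(observations):
    """Return all values that occur most often, in ascending order."""
    if not observations:
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(observations)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

coat_sizes = [7, 8, 8, 8, 9, 10, 12, 16]

print(mean(coat_sizes))    # 9.75
print(median(coat_sizes))  # 8.5
print(modes(coat_sizes))   # [8]
```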
Consider the class intervals \(1-5, 6-10, 11-15, \ldots\) and another set of class intervals \(1-10, 10-20, 20-30, 30-40, \ldots\)
In the class interval \(1-5, 1\) is the lower limit, and \(5\) is the upper limit. If \(x\) is a member of this class, then \(1 \le x \le 5.\) Similarly, \(6\) is the lower limit, and \(10\) is the upper limit of class \(6-10.\) In this example, the classes do not overlap, and there is a gap between the upper limit of one class and the lower limit of the next, so the classes are discontinuous. Such a frequency distribution is called discrete (or inclusive) distribution.
In the class interval \(1-10, 1\) is the lower limit, and \(10\) is the upper limit. If \(x\) is a member of this class, then \(1 \le x < 10,\) because the value \(10\) is counted in the next class, \(10-20.\) Similarly, \(10\) is the lower limit, and \(20\) is the upper limit of class \(10-20.\) In this example, the classes are non-overlapping but continuous. Such a frequency distribution is called continuous (or exclusive) distribution.
If we measure height, weight and time, there may be fractions of a meter, kilogram and hour respectively, therefore we need continuous distribution.
To convert discrete classes into continuous classes, we require some adjustments.
Adjustment factor \( = \frac{{{\rm{lower}}\,{\rm{limit}}\,{\rm{of}}\,{\rm{one}}\,{\rm{class}} - {\rm{upper}}\,{\rm{limit}}\,{\rm{of}}\,{\rm{previous}}\,{\rm{class}}}}{2}\)
Subtract the adjustment factor from all the lower limits and add the adjustment factor to all the upper limits.
In a continuous distribution, the class limits are called true or actual class limits. The class limits obtained after adjustment in a discrete distribution are also the true (or actual) class limits. In a discrete distribution, the original class limits are called the stated class limits.
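The adjustment can be sketched in Python for the classes \(1-5, 6-10, 11-15.\) The function name `to_continuous` is hypothetical, and the sketch assumes the classes are listed in ascending order with the same gap between every pair of neighbouring classes.

```python
def to_continuous(classes):
    """Convert stated (inclusive) class limits into true class limits.

    `classes` is a list of (lower, upper) pairs in ascending order.
    Assumes the gap between neighbouring classes is the same throughout.
    """
    if len(classes) < 2:
        raise ValueError("at least two classes are needed to find the gap")
    # Lower limit of one class minus upper limit of the previous class, halved.
    adjustment = (classes[1][0] - classes[0][1]) / 2
    # Subtract from every lower limit and add to every upper limit.
    return [(lower - adjustment, upper + adjustment) for lower, upper in classes]

stated_limits = [(1, 5), (6, 10), (11, 15)]
true_limits = to_continuous(stated_limits)

print(true_limits)  # [(0.5, 5.5), (5.5, 10.5), (10.5, 15.5)]
```

Here the adjustment factor is \(\frac{{6 - 5}}{2} = 0.5,\) so the true class limits become \(0.5-5.5, 5.5-10.5\) and \(10.5-15.5,\) and the upper limit of each class now coincides with the lower limit of the next.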
Q.1. Find the mean of the first \(6\) multiples of \(5.\)
Ans: The first \(6\) multiples of \(5\) are \(5, 10, 15, 20, 25\) and \(30.\)
The sum of these multiples \(=5+10+15+20+25+30=105\)
Number of multiples \(=6\)
Average \( = \frac{{{\rm{ Sum}}\,{\rm{of}}\,6\,{\rm{multiples }}}}{{{\rm{ Number}}\,{\rm{of}}\,{\rm{multiples }}}}\)
Average \( = \frac{{5 + 10 + 15 + 20 + 25 + 30}}{6}\)
Average \( = \frac{{105}}{6} = 17.5\)
Hence, the arithmetic mean of the first \(6\) multiples of \(5\) is equal to \(17.5.\)
Q.2. Find the mean of the first seven prime numbers.
Ans: The first seven prime numbers are \(2, 3, 5, 7, 11, 13, 17.\)
The sum of these prime numbers \(=2+3+5+7+11+13+17=58\)
Therefore their mean \( = \frac{{58}}{7} = 8\frac{2}{7}.\)
Q.3. Find the median of the following data. \(5, 3, 12, 0, 7, 11, 4, 3, 8\)
Ans: Arranging the given data in ascending order, we get
\(0, 3, 3, 4, 5, 7, 8, 11, 12\)
The total number of observations \(=n=9,\) which is odd.
Median \( = {\left( {\frac{{n + 1}}{2}} \right)^{{\rm{th}}}}\) observation
\( = {\left( {\frac{{9 + 1}}{2}} \right)^{{\rm{th}}}}\) observation
\( = {5^{{\rm{th}}}}\) observation, which is \(5.\)
Hence, the median is \(5.\)
Q.4. The number of goals scored by a football team in a series of matches is:
\(3, 1, 0, 7, 5, 3, 3, 4, 1, 2, 0, 2.\) Find the median of the data.
Ans: Arranging the number of goals in ascending order, we get,
\(0, 0, 1, 1, 2, 2, 3, 3, 3, 4, 5, 7\)
Here, \(n=12,\) which is even.
Median \( = \frac{{{{\left( {\frac{n}{2}} \right)}^{{\rm{th}}}}{\rm{observation }} + {{\left( {\frac{n}{2} + 1} \right)}^{{\rm{th}}}}{\rm{observation }}}}{2}\)
\( = \frac{{{{(6)}^{{\rm{th}}}}{\rm{observation }} + {{(7)}^{{\rm{th}}}}{\rm{observation }}}}{2} = \frac{{2 + 3}}{2} = 2.5\)
Hence, the median of the data is \(2.5.\)
Q.5. If the mean of \(y+2, y+4, y+6, y+8\) and \(y+10\) is \(13,\) find the value of \(y.\)
Ans: Given
Mean of \(y+2, y+4, y+6, y+8\) and \(y+10\) is \(13.\)
Mean \( = \frac{{{\rm{ Sum}}\,{\rm{of}}\,{\rm{numbers }}}}{{{\rm{ Number}}\,{\rm{of}}\,{\rm{numbers }}}}\)
\(13 = \frac{{y + 2 + y + 4 + y + 6 + y + 8 + y + 10}}{5}\)
On further calculation, we get
\(13 \times 5 = 5y + 30\)
\( \Rightarrow 65 = 5y + 30\)
\( \Rightarrow 5y = 65 - 30\)
\( \Rightarrow 5y = 35\)
\( \Rightarrow y = 7\)
Hence, the value of \(y\) is equal to \(7.\)
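The answers to the solved examples above can be cross-checked with a short Python sketch that uses the standard `statistics` and `fractions` modules. The variable names are illustrative, and the sketch is only a check of the arithmetic, not a replacement for the working shown.

```python
from fractions import Fraction
from statistics import mean, median

# Q.1: mean of the first 6 multiples of 5.
q1 = mean([5 * k for k in range(1, 7)])

# Q.2: mean of the first seven prime numbers, kept as an exact fraction.
primes = [2, 3, 5, 7, 11, 13, 17]
q2 = Fraction(sum(primes), len(primes))

# Q.3 and Q.4: medians of an odd-sized and an even-sized data set.
q3 = median([5, 3, 12, 0, 7, 11, 4, 3, 8])
q4 = median([3, 1, 0, 7, 5, 3, 3, 4, 1, 2, 0, 2])

# Q.5: the mean of y+2, ..., y+10 equals y plus the mean of 2, 4, 6, 8, 10,
# so y is 13 minus that mean.
y = 13 - mean([2, 4, 6, 8, 10])

print(q1)  # 17.5
print(q2)  # 58/7
print(q3)  # 5
print(q4)  # 2.5
print(y)   # 7
```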
Statistics is a branch of mathematics. It involves collecting, organising, interpreting, presenting, and analysing data. Statistics is divided into two types namely, descriptive and inferential statistics. Descriptive statistics describe the population either through numerical calculation or table or graph. Inferential statistics make inferences and predictions regarding the population based on a population sample. Furthermore, the five stages in statistics are problem, plan, data, analysis, and conclusion.
Q.1. What is the introduction to statistics?
Ans: Statistics is a branch of mathematics that involves collecting, organising, interpreting, presenting, and analysing data. Based on the studies of data obtained, people can draw conclusions, make decisions and plan wisely.
Q.2. Write a difference between primary data and secondary data.
Ans: The information collected by the investigator themselves with a definite purpose is called primary data, whereas the information gathered from a source where it is already stored is called secondary data.
Q.3. What are the \(2\) types of statistics?
Ans: Statistics can be divided into two categories. They are:
1. Descriptive Statistics: Descriptive statistics describes the data either through numerical calculation or through a graph or table. It provides a numerical or graphical summary of the data.
2. Inferential Statistics: Inferential statistics makes inferences and predictions about the population based on a sample drawn from that population. It generalises from the sample to the larger data set and applies probability to draw conclusions. It is used for explaining the meaning of descriptive statistics. Inferential statistics is closely associated with hypothesis testing, in which the usual aim is to decide whether there is enough evidence to reject the null hypothesis.
Q.4. What are the five stages of statistics?
Ans: The \(5\) stages of statistics are: Problem, Plan, Data, Analysis, Conclusion.
Q.5. Define variables with an example.
Ans: A quantity that is being measured in an experiment or survey is called a variable. Height, age and weight of people, income and expenditure of people, number of members in a family, number of workers in a factory, marks obtained by a student in a test, the number of runs scored in a cricket test, etc., are examples of variables.
We hope this article on introduction to statistics has added significant value to your knowledge. If you have any queries or suggestions, feel free to write them down in the comment section below. We would love to hear from you. Embibe wishes you all the best of luck!