Effect of Student Behaviour on Learning Outcomes

Academic success is an established term in the domain of education and assessment. Over the years it has expanded from common measures, such as grades in a series of exams, to include many other student outcomes. There has been a rapid expansion of studies showing that academic success is not just the marks obtained in an exam but also the learning and holistic development of a student, including improvement in the student’s attitude towards exams and problems, academic or otherwise. At Embibe, we have already developed various parameters, such as the Embibe Score Quotient, to measure a student’s performance in tests. We have also been using standardized models such as Concept Mastery, which relies heavily on the Bayesian Knowledge Tracing algorithm.

Recently, we developed a new metric, the Sincerity Score, which measures a student’s test session across three parameters and assigns it a behavior or a combination of behaviors (a short computation sketch follows the list):

  1. Accuracy
    Percentage of questions answered correctly out of the total questions attempted by the student in the test.
  2. Attempt Percentage
    Percentage of questions attempted by the student out of the total questions in the test.
  3. Time Percentage
    Percentage of the time taken by the student out of the total time allocated to the test.
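
As a rough illustration, the three parameters can be computed from a test session as in the minimal Python sketch below; the function and field names are illustrative assumptions, not Embibe’s production code.

```python
# Minimal sketch (illustrative only): computing the three Sincerity Score
# parameters for one test session.

def sincerity_parameters(correct: int, attempted: int, total_questions: int,
                         time_taken_sec: float, time_allocated_sec: float) -> dict:
    """Return accuracy, attempt percentage, and time percentage for one session."""
    accuracy = 100.0 * correct / attempted if attempted else 0.0
    attempt_pct = 100.0 * attempted / total_questions if total_questions else 0.0
    time_pct = 100.0 * time_taken_sec / time_allocated_sec if time_allocated_sec else 0.0
    return {"accuracy": accuracy, "attempt_pct": attempt_pct, "time_pct": time_pct}

# Example: 18 correct out of 24 attempted, 30 questions in the test, 95 of 180 minutes used.
print(sincerity_parameters(18, 24, 30, 95 * 60, 180 * 60))
# -> accuracy 75.0, attempt_pct 80.0, time_pct ~52.8
```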

Each parameter is split into partitions whose combinations yield 10 unique behaviors: 4 positive, 5 negative, and 1 neutral. Each behavior is assigned a rank, from the best behavior (rank 1) to the worst possible behavior (rank 10). Over 2.5 million valid test sessions were analyzed to develop an algorithm that quantifies, for every behavior, the test-on-test score improvement relative to all less positively ranked behaviors. The results verify and quantify, in a data-driven manner, a fact long thought to be true: a student’s attitude helps determine their progress and can help even a below-average student achieve great results. Based on the thresholds and the classification of students’ test-session behavior, we can quantify how much faster a behavior improves on average compared to less positive behaviors, nudge students towards the appropriate behavior, and inform them of the progress rate they can achieve with the improved behavior, thus improving learning outcomes.

Table 1: Different Sincerity Score behaviors with their metadata

Sincerity Score | Meaning | Rank/Weight (1 → best, 10 → worst) | Attribute
--- | --- | --- | ---
In Control | The child is putting in the needed effort and succeeding often | 1 | Positive
Marathoner | The stamina of the average session duration is high | 2 | Positive
Trying Hard | The child is putting in much effort and still not able to succeed often | 3 | Positive
Getting There | The stamina of the average session duration is average | 4 | Positive
Slow | The child can succeed with much effort | 5 | Neutral
Train Harder | The child is not putting in enough effort to succeed very often | 6 | Negative
Overconfident | The child is dominantly overconfident, applying themself without putting in enough effort | 7 | Negative
Low Confidence | The child is not confident enough to apply themself | 8 | Negative
Jumping Around | The stamina of the average session duration is very low | 9 | Negative
Careless (lack of interest, focus, or concentration) | The child is dominantly under-applying themself to the material at hand and losing marks as a result | 10 | Negative
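
For the sketches in this post, Table 1 can be encoded as a simple lookup. The dictionary below is an illustrative assumption that mirrors Table 1, together with a helper that picks the worst (highest-ranked) behavior when a session maps to more than one, as used in step 4 of the procedure.

```python
# Illustrative encoding of Table 1; not Embibe's production schema.
BEHAVIORS = {
    "In Control": (1, "Positive"),
    "Marathoner": (2, "Positive"),
    "Trying Hard": (3, "Positive"),
    "Getting There": (4, "Positive"),
    "Slow": (5, "Neutral"),
    "Train Harder": (6, "Negative"),
    "Overconfident": (7, "Negative"),
    "Low Confidence": (8, "Negative"),
    "Jumping Around": (9, "Negative"),
    "Careless": (10, "Negative"),
}

def rank_of(behavior: str) -> int:
    """Rank from Table 1: 1 is the best behavior, 10 the worst."""
    return BEHAVIORS[behavior][0]

def worst_behavior(behaviors: list[str]) -> str:
    """A session may map to several behaviors; keep the worst (highest rank)."""
    return max(behaviors, key=rank_of)

print(worst_behavior(["Marathoner", "Slow"]))  # -> "Slow"
```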

Algorithm:

Inputs:

N: Total number of valid test sessions with positive score improvement

p: Number of Sincerity Score behaviors.

Outputs:

For every Sincerity Score behavior (ranks 1–9), how much faster, on average, test-on-test score improvement is observed compared to all less positive Sincerity Score behaviors.

Glossary:

  • Pre-Test: Between any two successive tests (sorted by timestamp) taken by a user for a specific exam, the first of the two is called the pre-test.
  • Post-Test: Between any two successive tests (sorted by timestamp) taken by a user for a specific exam, the second of the two is called the post-test.
  • Valid Test Session: A test session is considered valid if the student:
    • spends >= 10% of the time duration allocated to the test,
    • answers >= 10% of the total questions in the test, and
    • scores marks >= 0 in the test.
  • Less Positive Sincerity Score Behavior: A Sincerity Score behavior is considered less positive than a second behavior if its rank value (based on Table 1) is greater than that of the second. Similarly, a behavior is considered better, or more positive, than a second behavior if its rank value is lower than that of the second. (See the sketch after this glossary.)
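
The glossary checks translate directly into code. The sketch below is a minimal illustration under the definitions above (field names are assumptions) and reuses rank_of from the Table 1 sketch.

```python
# Minimal sketch of the glossary checks; field names are illustrative assumptions.

def is_valid_session(time_taken_sec: float, time_allocated_sec: float,
                     attempted: int, total_questions: int, score: float) -> bool:
    """Valid Test Session per the glossary: >=10% time, >=10% attempts, score >= 0."""
    return (time_taken_sec >= 0.10 * time_allocated_sec
            and attempted >= 0.10 * total_questions
            and score >= 0)

def is_less_positive(behavior_a: str, behavior_b: str) -> bool:
    """True if behavior_a is less positive than behavior_b (higher rank in Table 1)."""
    return rank_of(behavior_a) > rank_of(behavior_b)

print(is_valid_session(20 * 60, 180 * 60, 5, 30, 12))  # True
print(is_less_positive("Careless", "In Control"))      # True
```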

Procedure:

  1. From all the test sessions of Embibe, only valid test sessions are considered.
    Time Complexity for this step = O(N)
  2. For every user, each of their test sessions is classified into a Sincerity Score behavior based on the accuracy, attempt percentage, and time percentage of that test session.
    Time Complexity for this step = O(N)
  3. For every pair of successive tests (sorted by timestamp) taken by the student with the same goal name and exam name, the score difference between the pre-test and the post-test is calculated.
    Time Complexity for this step = O(NlogN)
  4. Since a student may be classified into two or more behaviors in any test session, the rank (based on Table 1) of the worst Sincerity Score behavior exhibited in the pre-test is used.
    Time Complexity for this step = O(N)
  5. For every behavior, all test entries wherein the student exhibited that behavior in the post-test are picked.
    Only those entries wherein the student’s behavior improved (as per the ranking given) between the pre-test and the post-test are retained.
    We then take a weighted mean of the behavior transition on the score improvement (see the sketch after this procedure):
    Weighted_Mean = (( Σ (weight of pre-test behavior – weight of post-test behavior) * score_improvement_achieved_by_user ) / size of the group ) * (Normalisation Factor)
    where Normalisation Factor = number of Sincerity Score behaviors / Σ(ranks allocated to the Sincerity Score behaviors) = 10/55.
    Since we give unequal weights to the ranks, a Normalisation Factor is needed.
    If every rank had been given an equal weight of 1, the sum of the rank weights would have been 10.
    In our case, however, the weights are unequal: “In Control” must carry a weight 10x lighter than “Careless.” This is achieved by giving “In Control” a weight of 10/55 and “Careless” a weight of 10 * (10/55); in general, every behavior is given a weight of Rank * (10/55).
    Observe that Σ(Rank * 10/55) for integer Rank ∈ [1, 10] equals 10, the same total that would have been obtained with no weight parameter.
    Time Complexity for this step = O(pN)
  6. Scale down the values by a factor of 1.63.
    Scale Down Value = ( maximum transition possible between ranks ) * (Normalisation Factor)
    = (10 – 1) * (10/55) = 90/55
    ≈ 1.63
    Time Complexity for this step = O(p)
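
The sketch below is a minimal, illustrative rendering of steps 5 and 6 under the formula above; the transition records are assumed inputs, and it reuses rank_of from the Table 1 sketch rather than Embibe’s production pipeline.

```python
# Minimal sketch of steps 5 and 6; data structures are illustrative assumptions.
NUM_BEHAVIORS = 10
WEIGHT_UNIT = NUM_BEHAVIORS / sum(range(1, NUM_BEHAVIORS + 1))  # 10/55, the Normalisation Factor
SCALE_DOWN = (NUM_BEHAVIORS - 1) * WEIGHT_UNIT                  # (10 - 1) * 10/55 ~ 1.63

def weight(behavior: str) -> float:
    """Each behavior is weighted Rank * (10/55), so Careless weighs 10x In Control."""
    return rank_of(behavior) * WEIGHT_UNIT

def improvement_ratio(transitions: list[tuple[str, str, float]]) -> float:
    """Weighted mean of score improvement for transitions into one post-test behavior.

    Each transition is (pre_behavior, post_behavior, score_improvement); only
    transitions where the behavior improved (pre rank > post rank) are kept.
    """
    improved = [(pre, post, gain) for pre, post, gain in transitions
                if rank_of(pre) > rank_of(post)]
    if not improved:
        return 0.0
    weighted_sum = sum((weight(pre) - weight(post)) * gain for pre, post, gain in improved)
    weighted_mean = weighted_sum / len(improved)  # step 5: weighted mean over the group
    return weighted_mean / SCALE_DOWN             # step 6: scale down by ~1.63

# Example: two students who reached "In Control" in their post-test.
sample = [("Slow", "In Control", 12.0), ("Careless", "In Control", 25.0)]
print(round(improvement_ratio(sample), 2))  # ~15.17 with these illustrative numbers
```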

Flow Chart

Observations:

Behavior | Average Improvement Ratio | SSS String
--- | --- | ---
In Control | 6.59 | In Control typically improves 6.6x faster than students exhibiting lower behaviors.
Marathoner | 7.14 | Marathoner typically improves 7.1x faster than students exhibiting lower behaviors.
Trying Hard | 8.49 | Trying Hard typically improves 8.5x faster than students exhibiting lower behaviors.
Getting There | 6.19 | Getting There typically improves 6.2x faster than students exhibiting lower behaviors.
Slow | 4.28 | Slow typically improves 4.3x faster than students exhibiting lower behaviors.
Train Harder | 3.85 | Train Harder typically improves 3.8x faster than students exhibiting lower behaviors.
Overconfident | 5.51 | Overconfident typically improves 5.5x faster than students exhibiting lower behaviors.
Low Confidence | 1.05 | Low Confidence typically improves 1.0x faster than students exhibiting lower behaviors.
Jumping Around | 0.69 | Jumping Around typically improves 0.7x faster than students exhibiting lower behaviors.

Amongst the 4 positive behaviors, it is observed that most students’ test sessions tend to show Marathoner (35.4%) and Getting There (30%) behavior. Amongst the negative behaviors, students tend towards Jumping Around (33%) and Train Harder (50%).

Future Work:

  1. Application of unsupervised learning to identify the behavior of the students. This will enable the system to eliminate bias regarding time spent, accuracy, and attempt percentage.
  2. Normalization of the score improvement using objective functions learned via neural networks and other deep-learning-based methods.
  3. Prediction of the future success story of the user on a particular test type, and analysis of our model’s performance alongside ESQ.
  4. A recommendation engine to serve the best content and tests based on personalized Success Stories.

References:

Embibe Score Quotient