Calculating weighted averages with numpy and Python!
Chris Greening
Posted on August 22, 2023
Introduction
Navigating the world of data often means working in scenarios where not all data points are equally important.
This is where the weighted average, a statistical tool that assigns importance to each value, helps us incorporate the context of a situation into our average calculations!
import numpy as np
With Python's versatile ecosystem, we're able to leverage tools such as numpy to quickly and efficiently calculate the weighted average in our analyses and data projects.
Table of contents
- Prerequisites and installation
- What is the weighted average?
- Examining a simple example
- Using np.average to calculate weighted average
- Conclusion
- Additional resources
Prerequisites and installation
numpy is the only package you need to install to follow along with this blog post!
To install it, open your preferred terminal/console and run:
pip3 install numpy
What is the weighted average?
The weighted average is an extension of the typical arithmetic mean that takes the importance (or weight) of each data point into account when calculating the average.
In scenarios where all data points have the same importance, the weighted average simplifies to the standard arithmetic mean. However, when the significance of each data point varies, the weighted average becomes a vital tool.
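As a minimal sketch of the definition above: the weighted average multiplies each value by its weight, sums the products, and divides by the sum of the weights. The helper name `weighted_mean` is hypothetical, used here only for illustration, and is not part of numpy.

```python
def weighted_mean(values, weights):
    """Weighted average: sum(value * weight) / sum(weights)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# With equal weights the result matches the ordinary arithmetic mean
values = [2, 4, 6]
equal = weighted_mean(values, [1, 1, 1])
plain = sum(values) / len(values)
print(equal, plain)  # 4.0 4.0

# Giving the last value three times the weight pulls the result toward it
skewed = weighted_mean(values, [1, 1, 3])
print(skewed)  # (2 + 4 + 18) / 5 = 4.8
```

The equal-weights case illustrates why the weighted average reduces to the standard mean when every data point matters the same amount.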
Examining a simple example
Let's consider an example where we are a data scientist employed by a university to calculate the average student grade across all classes in the school.
To preserve the privacy of individual students, we are only provided data aggregated at the class level and are thus given each individual class's:
- average grade
- number of students
Our initial instinct might be to just take the usual average across all classes, but what happens when comparing small classes to very large classes?
If a class has an average test score of 20/100 but only has 4 students, is it fair to give it the same importance as a class that has an average test score of 93/100 and 500 students? No!
If we did that, the small class would be given an outsized level of importance: the test grades of just 4 students should not impact the overall mean as much as those of 500 students.
So how do we incorporate the number of students into our university grade average?
With the weighted average!
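To make the difference concrete, here is a small sketch using only the two classes mentioned above (an average of 20 with 4 students and an average of 93 with 500 students). The variable names are chosen for illustration only.

```python
# Class averages and class sizes for the two classes described above
class_averages = [20, 93]
class_sizes = [4, 500]

# Plain mean: treats both classes as equally important
plain_mean = sum(class_averages) / len(class_averages)

# Weighted mean: each class counts in proportion to its number of students
weighted_mean = sum(
    avg * size for avg, size in zip(class_averages, class_sizes)
) / sum(class_sizes)

print(plain_mean)               # 56.5
print(round(weighted_mean, 2))  # 92.42
```

The plain mean is dragged down to 56.5 by just 4 students, while the weighted mean stays close to the grade that the vast majority of students actually earned.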
Using np.average to calculate weighted average
Continuing with the previous example, let's say these are the grades and their respective number_of_students per class:
grades = [20, 93, 56, 79, 100, 86]
number_of_students = [4, 500, 93, 274, 12, 30]
To get the weighted average across the entire university using numpy, all we have to do is pass the weights to np.average:
import numpy as np
university_average = np.average(grades, weights=number_of_students)
print(university_average)
>>> 84.57174151150055
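As a sanity check, the sketch below recomputes the same figure by hand and compares it with np.average. It also uses the returned=True option of np.average, which additionally returns the sum of the weights. The variable names manual_average, numpy_average and total_students are illustrative.

```python
import numpy as np

grades = np.array([20, 93, 56, 79, 100, 86])
number_of_students = np.array([4, 500, 93, 274, 12, 30])

# Manual calculation: sum of (grade * class size) divided by total students
manual_average = (grades * number_of_students).sum() / number_of_students.sum()

# returned=True makes np.average return (average, sum_of_weights)
numpy_average, total_students = np.average(
    grades, weights=number_of_students, returned=True
)

print(np.isclose(manual_average, numpy_average))  # True
print(total_students)                             # 913.0
```

Seeing the total number of students alongside the average can be a handy way to confirm that the weights were passed in as intended.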
Conclusion
And just like that, we're able to quickly incorporate the weighted average into our projects by leveraging np.average's weights argument.
Thanks so much for reading and if you liked my content, be sure to check out some of my other work or connect with me on social media or my personal website 😄
Cheers!