Mean, Median, Mode
Thales Bruno
Posted on July 14, 2020
In this article, we are talking about the common methods to measure the central tendency of the data, that is a way to explain our data in some manner. Although these measurements are quite simple, they are a foundation for a lot of other more complex measurements.
We will use this fake list of grades of a class below to demonstrate the concepts:
Grades |
---|
9.2 |
7.5 |
8 |
9 |
8.5 |
8 |
2.5 |
We create a Python variable to store the grades:
grades = [9.2, 7.5, 8, 9, 8.5, 8, 2.5]
Mean
The first one is the mean, which is the arithmetic average of the data. To calculate the mean of a set of numeric data, we take the sum of all observations divided by the number of observations.
In Python:
from typing import List
def mean(xs: List[float]) -> float:
return sum(xs)/len(xs)
So, applying in our grades data: mean(grades)
gives us ~7.52
.
Median
The second one is the median, which gives us something more positional. The median is the exact central value of a data, so to calculate it we need to:
- Sort data from smaller to largest, and
- If ODD length: pick up the middle value
- If EVEN length: calculate the mean of the two middle values
So, in Python could be like this:
def median(xs: List[float]) -> float:
s = sorted(xs)
# If odd lenght, just pick up the middle value
if len(xs) % 2:
p = len(xs)//2
return xs[p]
# If even lenght, take the mean of the left and right middle values
else:
l_p = (len(xs) // 2) - 1
r_p = len(xs) // 2
mid_values = [l_p, r_p]
return mean(mid_values)
This way, the grades sorted looks like this: [2.5, 7.5, 8, 8, 8.5, 9, 9.2]
and their median, median(grades)
, is 8
.
Mode
Finally, we have the mode, i.e., the observation that occurs the most in a dataset. Thus, a dataset may have one, multiple, or even none mode.
Python mode function:
from collections import Counter
def mode(xs: List[float]) -> List[float]:
counter = Counter(xs)
return [x_i for x_i, count in counter.items() if count == max(counter.values())]
The grade that occurs the most in our dataset is [8]
as we can see calling mode(grades)
Python mean, median, and mode
In this article, we made some Python code to illustrate our measurements, just as didactic purposes. By the way, we already have functions in Python libraries that make the job. Bellow, some of them:
- Mean: numpy.mean
- Median: numpy.median
- Mode: statistics.mode
Posted on July 14, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.