Numerical variables
Thales Bruno
Posted on July 8, 2020
Numerical Variables
Numerical variables, also known as quantitative variables, are data that represent something measurable or countable, such as a frequency or a measurement. They are always numbers, and those numbers can be placed in a meaningful order with consistent intervals.
Some examples of quantitative variables are:
- Weight
- Height
- Sales
- Production units
- Movie Ratings
Discrete and continuous
Numerical variables may be either discrete or continuous.
Discrete values are the result of counting, as when we count how many goals a football team has scored in a season. Here, the data can only take certain numerical values, like 60, 65, 72, and so on.
Continuous values, on the other hand, are the result of a measurement. For instance, we may measure the weights of a football team's players in kilograms, and the data can take any value inside a range, like 84.1 kg or 74.89483 kg.
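To make the distinction concrete, here is a minimal sketch using made-up numbers (the goal counts and weights below are invented for illustration and are not from any dataset). In pandas, counts usually end up with an integer dtype and measurements with a float dtype, although the dtype is only a rough hint and not a definition of discrete or continuous.

```python
import pandas as pd

# Hypothetical data, invented for illustration only
goals = pd.Series([60, 65, 72], name="goals")                      # counts -> discrete
weights_kg = pd.Series([84.1, 74.89483, 79.5], name="weight_kg")   # measurements -> continuous

print(goals.dtype)       # an integer dtype
print(weights_kg.dtype)  # a float dtype

# Counts only take whole-number values; measurements may have fractional parts
all_whole = bool((goals % 1 == 0).all())
has_fractions = bool((weights_kg % 1 != 0).any())
print(all_whole, has_fractions)
```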
Buckets and bins
Buckets and bins are a way to organize the numerical data we collect into ordered groups with consistent intervals, so that we can analyze the data and draw insights from it. For example, we might collect the number of movies produced in the 20th century and put them in buckets of 10 years. As a result, we could see how the movie industry evolved over the last century.
In this article, however, we will demonstrate numerical data using the Kaggle Google Play Store Apps dataset from Lavanya Gupta, as we did in the article about Categorical Variables.
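As a rough sketch of the 10-year buckets idea, the snippet below groups a handful of release years by decade with `pd.cut`. The years are invented for illustration, and the bucket labels such as `1990s` are just a naming choice made here.

```python
import pandas as pd

# Hypothetical release years, invented for illustration
years = pd.Series([1903, 1927, 1939, 1941, 1954, 1960, 1977, 1982, 1994, 1999])

# Bucket edges every 10 years: 1900, 1910, ..., 2000
edges = list(range(1900, 2001, 10))
labels = [f"{start}s" for start in edges[:-1]]

# right=False makes each bucket include its left edge: [1900, 1910), [1910, 1920), ...
decades = pd.cut(years, bins=edges, labels=labels, right=False)

# Count the movies per bucket, keeping the buckets in chronological order
counts = decades.value_counts(sort=False)
print(counts)
```

Because the buckets are categories with a fixed order, empty decades still show up with a count of zero, which keeps the intervals consistent.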
Using pandas, we will load only the Rating column of the dataset, which is a typical numerical variable. Users rated the apps from 1.0 to 5.0.
import pandas as pd
import plotly.express as px

# Load only the Rating column
df = pd.read_csv("./data/googleplaystore.csv", usecols=["Rating"])
# Drop missing values
ratings = df["Rating"].dropna()
# Drop an outlier rating of 19.0 at row 10472 (from some error)
ratings = ratings.drop(10472)
# Plot a histogram
fig = px.histogram(ratings, x="Rating", title="Google Play Store Apps Ratings", template="simple_white")
fig.show()
Histogram
The chart we see above is a histogram. It looks like the bar chart we plotted in the Categorical Variables post, but the two have some important differences. In a histogram there is no space between the bars, and the intervals are equally spaced, as expected for numerical values.
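The equally spaced intervals can be checked directly. The sketch below uses NumPy's `np.histogram` on a few made-up ratings (not the Play Store data) and splits the 1.0 to 5.0 scale into eight bins. The choice of eight bins is an assumption made for this example, not what Plotly picks by default.

```python
import numpy as np

# Hypothetical ratings, invented for illustration
values = np.array([4.5, 4.3, 3.9, 4.8, 2.1, 4.4, 3.2, 4.6, 4.1, 5.0])

# Eight equal-width bins covering the 1.0 to 5.0 rating scale
counts, edges = np.histogram(values, bins=8, range=(1.0, 5.0))

# Every bin has the same width, which is what makes this a histogram
widths = np.diff(edges)

print(edges)   # bin boundaries
print(counts)  # number of ratings that fall into each bin
print(widths)  # width of each bin
```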
The shape of the histogram already gives us useful information. The histogram above is left-skewed (it has a tail to the left): the tallest bars are on the right side, where the highest ratings are (between 4.0 and 5.0), so we may conclude that most apps were rated well.
Other shapes a histogram can have are right-skewed, symmetric, bimodal, and uniform. Perhaps we will see more examples of histogram shapes in the next posts!
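Skew can also be checked numerically instead of by eye. The sketch below uses a small made-up sample shaped like the ratings (mostly high values with a few low ones) and is not the actual dataset. A negative value from pandas' `Series.skew()` suggests a left-skewed distribution, and in that case the mean typically falls below the median.

```python
import pandas as pd

# Hypothetical ratings, invented to mimic a left-skewed shape
sample = pd.Series([4.5, 4.3, 4.4, 4.6, 4.2, 4.7, 4.1, 4.0, 3.5, 2.0, 4.4, 4.5])

skewness = sample.skew()   # negative -> tail to the left
mean = sample.mean()
median = sample.median()

print(f"skewness: {skewness:.2f}")
print(f"mean: {mean:.2f}, median: {median:.2f}")
```

The same calls could be applied to the `ratings` Series loaded earlier to confirm what the chart shows.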
References
courses.lumenlearning.com | 1.2 Data: Quantitative Data & Qualitative Data
online.stat.psu.edu | 1.1.1 - Categorical & Quantitative Variables
YouTube | Brandon Foltz | Statistics 101: Descriptive Statistics, Histograms