Feature Engineering - Min/Max Aggregate

mage_ai

Mage

Posted on March 2, 2022

TLDR

In this lesson, we’ll learn about the aggregate functions min() and max(), and see how they’re helpful in analyzing and understanding the data.

Outline

  • Data Aggregation
  • Why is it necessary
  • Definition
  • Example
  • How to code

Data Aggregation

Data aggregation is the summarization of data: a set of values is reduced to a single summary value. Some of the most common aggregate functions are min(), max(), mean(), count(), and sum().

Why is it necessary

Data aggregation is part of the data analysis process, which is the first and most critical step of model building. It lets us delve deeper into the data and understand it better.

Definition

In this lesson, we’ll explore min() and max() functions in detail.

  1. min(): This function helps us find the minimum or least value in a feature or column.

  2. max(): This function helps us find the maximum or highest value in a feature or column.

We can apply aggregate functions in 2 different ways:
Case-1: Apply aggregate functions on a single feature or column i.e., analyzing each column individually.
Case-2: Apply aggregate functions on groups i.e., we’ll group rows and analyze each group individually.

Example

Consider a dataset with 2 columns, “Product” and “Price”. Let’s apply the aggregate functions min() and max() to find the minimum and maximum values in the “Price” column.

Figure: Find minimum price

Figure: Find maximum price
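
Since the table itself is only shown as an image, here is a minimal sketch of Case-1 in plain Python. The rows below are made up for illustration and are not the article's actual data.

```python
# Made-up rows for illustration; the article's table appears only as an image.
rows = [
    {"Product": "Laptop", "Price": 1200},
    {"Product": "Desk", "Price": 300},
    {"Product": "Chair", "Price": 150},
    {"Product": "Laptop", "Price": 950},
    {"Product": "Desk", "Price": 450},
    {"Product": "Chair", "Price": 80},
]

# Case-1: aggregate the whole "Price" column at once.
prices = [row["Price"] for row in rows]
min_price = min(prices)
max_price = max(prices)

# Passing key= returns the whole row that holds the extreme value,
# so we can also see which product it belongs to.
cheapest_row = min(rows, key=lambda row: row["Price"])
priciest_row = max(rows, key=lambda row: row["Price"])

print(min_price, cheapest_row["Product"])  # 80 Chair
print(max_price, priciest_row["Product"])  # 1200 Laptop
```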

Grouping is a three-step process, as shown below:
Step-1: Split the rows into groups based on the “Product” column.

There are 3 unique products (Laptop, Desk, Chair) in the “Product” column, so the rows are split into 3 groups.

Step-2: Find the minimum price of each unique product.

Step-3: Display the output. For this, we’ll combine each group’s output to form a data frame and display the data frame.

Figure: Steps to find minimum value of each unique product

Figure: Steps to find maximum value of each unique product
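
The three steps can be sketched in plain Python as follows. This is only an illustration of the split, apply, and combine idea; the rows are made up, and the variable names are our own.

```python
from collections import defaultdict

# Made-up (product, price) rows for illustration.
rows = [
    ("Laptop", 1200), ("Desk", 300), ("Chair", 150),
    ("Laptop", 950), ("Desk", 450), ("Chair", 80),
]

# Step-1: split the rows into one group per unique product.
groups = defaultdict(list)
for product, price in rows:
    groups[product].append(price)

# Step-2: apply the aggregate function to each group.
min_by_product = {product: min(prices) for product, prices in groups.items()}
max_by_product = {product: max(prices) for product, prices in groups.items()}

# Step-3: combine each group's output into one table-like result.
summary = [
    (product, min_by_product[product], max_by_product[product])
    for product in groups
]

for product, low, high in summary:
    print(f"{product}: min={low}, max={high}")
```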

How to code

In recent years, the popularity of ridesharing has skyrocketed. The key benefits of ridesharing are that it’s inexpensive, it’s convenient, and it allows anyone to easily travel from one location to another.

Image by mohamed Hassan from Pixabay

Service providers frequently change prices based on time, traffic, the number of cabs available, and other factors. As costs fluctuate, it's beneficial to offer users a range of prices for a specific route. So, with the help of rides data, let’s find the minimum and maximum prices for each unique route.

Figure: Find the minimum and maximum price of each unique route.

Step-1:
First, let’s group rides by source, so that we can later compare rides to the same destination. To do this, we’ll iterate through the rows of the rides data and use each “source” as a dictionary key. The value for each key is the list of (destination, price) pairs of the rides that start there. The final result should be as shown below.

Output format: {'sourceA': [(destination1, price1), (destination1, price2), ...], 'sourceB': [(destination1, price1), (destination1, price2), ...], ...}
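
Here is a minimal sketch of this step. The rides below are hypothetical sample rows, not the real dataset, and the field names "destination" and "price" as well as the helper name group_by_source are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows; the real rides data is not reproduced here.
rides = [
    {"source": "Haymarket Square", "destination": "North Station", "price": 3.0},
    {"source": "Haymarket Square", "destination": "West End", "price": 27.5},
    {"source": "Haymarket Square", "destination": "North Station", "price": 32.5},
    {"source": "Haymarket Square", "destination": "West End", "price": 3.0},
    {"source": "North Station", "destination": "West End", "price": 9.5},
]

def group_by_source(rides):
    """Map each source to the list of (destination, price) pairs starting there."""
    grouped = defaultdict(list)
    for ride in rides:
        grouped[ride["source"]].append((ride["destination"], ride["price"]))
    return dict(grouped)

rides_by_source = group_by_source(rides)
for source, trips in rides_by_source.items():
    print(source, trips)
```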

Step-2:
Find minimum price

By comparing the prices of routes with the same starting location and destination, we'll find the minimum price of each route.
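
One way to do this comparison is sketched below. It starts from a dictionary in the Step-1 output format, filled with made-up values; the function name min_price_per_route is our own.

```python
# A dictionary in the Step-1 output format, filled with made-up values.
rides_by_source = {
    "Haymarket Square": [
        ("North Station", 3.0), ("West End", 27.5),
        ("North Station", 32.5), ("West End", 3.0),
    ],
    "North Station": [("West End", 9.5)],
}

def min_price_per_route(rides_by_source):
    """Return {(source, destination): lowest price seen on that route}."""
    lowest = {}
    for source, trips in rides_by_source.items():
        for destination, price in trips:
            route = (source, destination)
            # Keep the price if the route is new or this price is lower.
            if route not in lowest or price < lowest[route]:
                lowest[route] = price
    return lowest

lowest_prices = min_price_per_route(rides_by_source)
for route, price in lowest_prices.items():
    print(route, price)
```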

Figure: Lowest price of each unique route

Find maximum price
By comparing the prices of routes with the same starting point and destination, we'll find the highest price for each route.
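
Finding the maximum only differs in the aggregate function that is applied. The sketch below makes that explicit by collecting the prices of each route first and then passing in the aggregate as an argument. The helper aggregate_per_route and the sample values are assumptions for illustration.

```python
from collections import defaultdict

# A dictionary in the Step-1 output format, filled with made-up values.
rides_by_source = {
    "Haymarket Square": [
        ("North Station", 3.0), ("West End", 27.5),
        ("North Station", 32.5), ("West End", 3.0),
    ],
    "North Station": [("West End", 9.5)],
}

def aggregate_per_route(rides_by_source, aggregate):
    """Apply `aggregate` (e.g. min or max) to the prices of each route."""
    prices_by_route = defaultdict(list)
    for source, trips in rides_by_source.items():
        for destination, price in trips:
            prices_by_route[(source, destination)].append(price)
    return {route: aggregate(prices) for route, prices in prices_by_route.items()}

highest_prices = aggregate_per_route(rides_by_source, max)
for route, price in highest_prices.items():
    print(route, price)
```

Calling aggregate_per_route(rides_by_source, min) with the same data would give the lowest price of each route instead.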

Figure: Highest price of each unique route

From the output, we see that the price from “Haymarket Square” to “North Station” ranges between 3.0 and 32.5, “Haymarket Square” to “West End” ranges between 3.0 and 27.5, etc.

The same result can be obtained with pandas: group the rows of the same route, then find the minimum and maximum price of each individual route.

Pandas has a built-in groupby() method that groups the rows of a dataset. Chaining it with min() or max() returns the minimum or maximum value of each unique group.

Find minimum price
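
A minimal sketch with pandas, assuming the rides are loaded into a DataFrame with "source", "destination", and "price" columns. The rows here are hypothetical samples, not the real dataset.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions.
rides = pd.DataFrame({
    "source": ["Haymarket Square", "Haymarket Square", "Haymarket Square",
               "Haymarket Square", "North Station"],
    "destination": ["North Station", "West End", "North Station",
                    "West End", "West End"],
    "price": [3.0, 27.5, 32.5, 3.0, 9.5],
})

# Group rows that share the same (source, destination) pair,
# then take the smallest price within each group.
min_prices = rides.groupby(["source", "destination"])["price"].min()
print(min_prices)
```

The result is a Series indexed by (source, destination), so a single route can be looked up with min_prices.loc[("Haymarket Square", "North Station")].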

Find maximum price
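
The maximum works the same way with max(). As an optional extra, the sketch also uses agg() to get both values in one table. The rows are hypothetical samples, and the column names are assumptions.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions.
rides = pd.DataFrame({
    "source": ["Haymarket Square", "Haymarket Square", "Haymarket Square",
               "Haymarket Square", "North Station"],
    "destination": ["North Station", "West End", "North Station",
                    "West End", "West End"],
    "price": [3.0, 27.5, 32.5, 3.0, 9.5],
})

routes = rides.groupby(["source", "destination"])["price"]

# Largest price within each (source, destination) group.
max_prices = routes.max()
print(max_prices)

# Both aggregates side by side: one row per route, columns "min" and "max".
price_range = routes.agg(["min", "max"])
print(price_range)
```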

Magical no code solution

For quick analysis and results, try our product, Mage. Our service features an "Edit data" area with multiple aggregation options. Besides analyzing the data, you can store the aggregation results in a new column and use it in further analysis.

Want to learn more about machine learning (ML)? Visit Mage Academy! ✨🔮
