🚨 Introducing a Python package that helps with machine learning feature engineering for the pandemic period: lockdowndates 🔥
Sean O'Connor
Posted on March 17, 2022
If you were training machine learning models with a time component during the pandemic, you were probably scrambling to re-engineer them. I'm confident in saying the pandemic pushed most forecasts very far from the truth.
Consumer habits changed completely and became too erratic to forecast. Everyone was left in the dark as to whether we would be thrown into a complete or partial lockdown.
As a machine learning engineer I had to completely change my approach to training machine learning models during the pandemic. It took a lot of tweaking. The hardest part was training models for consumers in different countries around the world, where restrictions were completely different. This is still a pain for some countries at the moment with restrictions coming and going.
So I asked myself: instead of trying to limit the damage the pandemic did to our models, why not embrace the restrictions? Why not embed them as features for our models to learn from? That's why I created lockdowndates.
lockdowndates provides current and past restrictions imposed by governments in over 100 countries worldwide during the pandemic! To get started:
pip3 install lockdowndates
(Side note: currently only Python 3.8 and above is supported. This will change soon to support 3.6 and above.)
After installing, import the class:
from lockdowndates.core import LockdownDates
To get restrictions for a single country:
ld = LockdownDates("Aruba", "2022-01-01", "2022-01-08")
lockdown_dates = ld.dates()
lockdown_dates
timestamp | aruba_country_code | aruba_stay_at_home |
---|---|---|
2022-01-01 | ABW | 2.0 |
2022-01-02 | ABW | 2.0 |
2022-01-03 | ABW | 2.0 |
2022-01-04 | ABW | 2.0 |
2022-01-05 | ABW | 2.0 |
2022-01-06 | ABW | 2.0 |
2022-01-07 | ABW | 2.0 |
2022-01-08 | ABW | 2.0 |
To get restrictions for multiple countries, pass a list:
ld2 = LockdownDates(["Canada", "Denmark"], "2022-01-01", "2022-01-08")
lockdown_dates = ld2.dates()
lockdown_dates
timestamp | canada_country_code | denmark_country_code | canada_stay_at_home | denmark_stay_at_home |
---|---|---|---|---|
2022-01-01 | CAN | DNK | 1.0 | 0.0 |
2022-01-02 | CAN | DNK | 1.0 | 0.0 |
2022-01-03 | CAN | DNK | 1.0 | 0.0 |
2022-01-04 | CAN | DNK | 1.0 | 0.0 |
2022-01-05 | CAN | DNK | 1.0 | 0.0 |
2022-01-06 | CAN | DNK | 1.0 | 0.0 |
2022-01-07 | CAN | DNK | 1.0 | 0.0 |
2022-01-08 | CAN | DNK | 1.0 | 0.0 |
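Since the goal is to use these restrictions as model features, here is a minimal sketch of attaching them to your own data with pandas. It assumes the frame returned by dates() is indexed by a datetime-like timestamp, as the output above suggests. The frame is rebuilt by hand here so the example runs offline, and the sales data and the units_sold column are hypothetical.

```python
import pandas as pd

# Stand-in for the frame returned by ld2.dates() above, built by hand so the
# example runs without the package or network access. Assumption: the real
# index is datetime-like; if it holds strings, convert it with pd.to_datetime.
dates = pd.date_range("2022-01-01", "2022-01-08", freq="D", name="timestamp")
lockdown_dates = pd.DataFrame(
    {
        "canada_country_code": "CAN",
        "denmark_country_code": "DNK",
        "canada_stay_at_home": 1.0,
        "denmark_stay_at_home": 0.0,
    },
    index=dates,
)

# Hypothetical target data: daily sales with a few days missing.
sales = pd.DataFrame(
    {"units_sold": [120, 98, 143, 110, 131]},
    index=pd.to_datetime(
        ["2022-01-02", "2022-01-03", "2022-01-05", "2022-01-06", "2022-01-08"]
    ).rename("timestamp"),
)

# A left join keeps every sales row and attaches that day's restriction level.
features = sales.join(
    lockdown_dates[["canada_stay_at_home", "denmark_stay_at_home"]],
    how="left",
)
print(features)
```

A left join on the date index keeps your own rows as the source of truth. Dates with no restriction data would simply come through as NaN.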
The legend for stay_at_home is as follows:
- NaN - No data available for that date.
- 0.0 - No measures.
- 1.0 - Recommend not leaving house.
- 2.0 - Require not leaving house, with exceptions for daily exercise, grocery shopping, and 'essential' trips.
- 3.0 - Require not leaving house, with minimal exceptions (e.g. allowed to leave once a week, or only one person can leave at a time, etc.).
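To make these levels easier to work with, the following is a small sketch that turns a stay_at_home value into a readable label and a "mandatory" flag. The helper names describe_stay_at_home and is_mandatory are hypothetical and not part of lockdowndates. The value 0.0, which appears in the Denmark output, is taken here to mean that no measures are in place.

```python
import math

# Labels follow the legend above. 0.0 (seen in the Denmark output) is taken
# to mean no measures in place; treat that entry as an assumption.
STAY_AT_HOME_LABELS = {
    0.0: "no measures (assumed)",
    1.0: "recommended",
    2.0: "required, with exceptions",
    3.0: "required, minimal exceptions",
}


def describe_stay_at_home(level):
    """Return a readable label for a stay_at_home value."""
    if level is None or math.isnan(level):
        return "no data"
    return STAY_AT_HOME_LABELS.get(float(level), "unknown level")


def is_mandatory(level):
    """True when staying home is required (2.0 or 3.0), not just recommended."""
    if level is None or math.isnan(level):
        return False
    return level >= 2.0


for value in (float("nan"), 0.0, 1.0, 2.0, 3.0):
    print(value, "->", describe_stay_at_home(value), "| mandatory:", is_mandatory(value))
```

A binary flag like this can be a simpler model feature than the raw level when you only care whether a hard lockdown was in force.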
lockdowndates contains up-to-date data thanks to Oxford University and their open source data!
I will keep updating lockdowndates to contain more restrictions, including restrictions for vaccinated vs non-vaccinated, school restrictions, work from home restrictions and many more! Track our issues here.
So the next time you are doing feature engineering for a machine learning project and have tabular data from the pandemic period, consider lockdowndates to make your life easier and to improve your metrics!
Documentation: lockdowndates
GitHub Repo: lockdowndates