How to access Kaggle data from command line
Tomasz Wegrzanowski
Posted on November 20, 2022
The war isn't going anywhere for now, so every couple of days I have to do the following steps to update my Russian losses tracker:
- download the zip from Kaggle
- run the update_csv script
- optionally, verify that the data looks right with git diff, since occasionally a typo makes losses go backwards (this has happened only a few times, and it always gets corrected in the next update)
- delete archive.zip
The annoying part is that Kaggle requires me to be logged in through a browser to download data, so I can't just replace that step with a curl request.
So let's try to improve this flow a bit.
Get Kaggle API token
You need to create an account on Kaggle.
Then go to your account settings by clicking the icon in the top right corner and selecting Account (https://www.kaggle.com/<name>/account).
There's a "Create New API Token" button, which creates a new API token and downloads it as kaggle.json. Create the ~/.kaggle folder and save that file as ~/.kaggle/kaggle.json.
Kaggle will complain if you don't secure the file, so run this:
$ chmod 0600 ~/.kaggle/kaggle.json
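The token setup can be wrapped in a small shell function. This is a minimal sketch: install_kaggle_token is a hypothetical name, and it assumes the browser saved the token to ~/Downloads/kaggle.json unless you pass another path.

```shell
# Hypothetical helper: move a freshly downloaded kaggle.json into place
# and make it readable only by you (mode 0600).
install_kaggle_token() {
  # Default source path is an assumption; pass the real path as $1 if needed.
  src="${1:-$HOME/Downloads/kaggle.json}"
  mkdir -p "$HOME/.kaggle" || return 1
  mv "$src" "$HOME/.kaggle/kaggle.json" || return 1
  chmod 0600 "$HOME/.kaggle/kaggle.json"
}
```

The CLI can also read credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables instead of the file, which can be handier on a machine where you'd rather not keep kaggle.json on disk.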
Install Kaggle CLI tools
If you have Python 3 installed, you just need to run:
$ pip3 install kaggle
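Depending on how Python is set up, pip3 may put the kaggle command in a directory that isn't on your PATH (for user installs on Linux, that is typically ~/.local/bin). A quick check like this hypothetical have_kaggle function tells you whether the shell can find it:

```shell
# Hypothetical helper: succeed if the kaggle command can be found on PATH.
have_kaggle() {
  if command -v kaggle >/dev/null 2>&1; then
    echo "kaggle found at $(command -v kaggle)"
  else
    echo "kaggle not found on PATH; check where pip3 installed it" >&2
    return 1
  fi
}
```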
Download dataset
The user name and ID of the dataset are in the URL, so to download https://www.kaggle.com/datasets/piterfm/2022-ukraine-russian-war you need to run:
$ kaggle datasets download piterfm/2022-ukraine-russian-war
This saves it as 2022-ukraine-russian-war.zip. There are extra options, such as -p to choose the download directory or --unzip to extract the archive.
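Since the owner/dataset slug is just the part of the URL after /datasets/, it's easy to extract with sed. A small sketch, where kaggle_slug is a hypothetical helper name:

```shell
# Hypothetical helper: extract "owner/dataset" from a Kaggle dataset URL,
# ignoring anything after it (extra path segments, ?query, #fragment).
# Input that doesn't contain /datasets/ is printed unchanged.
kaggle_slug() {
  printf '%s\n' "$1" |
    sed -E 's|^.*/datasets/([^/?#]+/[^/?#]+).*$|\1|'
}
```

For example, combined with the -p and --unzip options:
$ kaggle datasets download "$(kaggle_slug https://www.kaggle.com/datasets/piterfm/2022-ukraine-russian-war)" -p data --unzip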
Full process
Now I can automate the whole process:
$ kaggle datasets download piterfm/2022-ukraine-russian-war
$ ./update_csv 2022-ukraine-russian-war.zip
$ trash 2022-ukraine-russian-war.zip
$ git add -u
$ git commit -m 'Data Update'
$ git push
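The same steps can be collected into one script. This is a sketch under a few assumptions: update_tracker is a hypothetical function name, ./update_csv is assumed to take the zip path as its only argument (as in the commands above), and trash is swapped for rm -f so the script doesn't depend on an extra tool. Chaining the steps with && means a failed download or a failing update_csv stops the run before anything is committed, and git diff --cached --quiet skips the commit on days when nothing changed.

```shell
# Hypothetical wrapper around the commands above. Run it from the tracker's
# git checkout; ./update_csv is assumed to take the zip path as its only
# argument, exactly as in the manual steps.
DATASET="piterfm/2022-ukraine-russian-war"
ZIP="2022-ukraine-russian-war.zip"   # name the CLI gives the download

update_tracker() {
  # Each step runs only if the previous one succeeded.
  kaggle datasets download "$DATASET" &&
    ./update_csv "$ZIP" &&
    rm -f "$ZIP" &&
    git add -u || return 1

  # Nothing new today: skip the commit and the push.
  if git diff --cached --quiet; then
    echo "No changes to commit"
    return 0
  fi

  git commit -m 'Data Update' && git push
}
```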
And since it's just a series of commands, I can even make it run automatically every day, without any intervention.
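For the daily run, a cron entry is one option. This is a sketch, not the author's setup: it assumes the commands above are saved as an executable script, here called update.sh (a hypothetical name), in a checkout at /path/to/tracker (a placeholder). Cron starts jobs with a minimal environment, so PATH has to include wherever pip3 put the kaggle command, and git push has to work without prompting for a password.

```shell
# crontab entry, added with: crontab -e
# PATH must list the directory holding the kaggle command; the user
# directory below is a placeholder (cron does not expand $HOME here).
PATH=/usr/local/bin:/usr/bin:/bin:/home/you/.local/bin
# Run every day at 09:00 and append all output to update.log.
0 9 * * * cd /path/to/tracker && ./update.sh >> update.log 2>&1
```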
I could also add some kind of data checks to the process, so that if there's anything weird, like numbers going backwards, it would stop the update and wait for the next day. But overall, I'm happy with how it all ended up.
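A check for numbers going backwards could be a single awk pass over the CSV. This sketch rests on assumptions the post doesn't state: a header row, plain comma-separated fields without quoted commas, rows in ascending date order, and cumulative totals in the numeric columns. check_monotonic is a hypothetical name.

```shell
# Hypothetical check: fail if any numeric column decreases between
# consecutive rows. Assumes a header row, no quoted commas, rows sorted
# oldest first, and cumulative counts in the numeric columns.
check_monotonic() {
  awk -F, '
    NR == 1 { next }                              # skip the header row
    NR > 2 {
      for (i = 1; i <= NF; i++) {
        # Compare only fields that look like numbers in both rows.
        if ($i ~ /^[0-9]+(\.[0-9]+)?$/ && prev[i] ~ /^[0-9]+(\.[0-9]+)?$/ &&
            $i + 0 < prev[i] + 0) {
          printf "line %d, column %d: %s -> %s\n", NR, i, prev[i], $i
          bad = 1
        }
      }
    }
    { for (i = 1; i <= NF; i++) prev[i] = $i }    # remember this row
    END { if (bad) exit 1 }
  ' "$1"
}
```

Called after ./update_csv and before git add -u, for example as check_monotonic losses.csv || exit 1 (with losses.csv standing in for whichever file the tracker actually keeps), it would stop the update before anything gets committed.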