[BTY] Day 3: Improve software engineering skills as a researcher
Dang Hoang Nhu Nguyen
Posted on February 6, 2022
All the information is from this article by Lj Miranda: https://ljvmiranda921.github.io/notebook/2020/11/15/data-science-swe/. Please read the original for more details and related resources.
Aside from developing Deep Learning models, you have to know how to create a machine learning application that receives HTTP requests, and then deploy it as a containerized app. This task, a.k.a. building a Machine Learning (ML) Service, calls for software engineering skills that many of us researchers (I assume you're one, like me) lack.
Why?
Improves engineering sensibilities.
Most applications treat ML models as software components.
Increases familiarity with the ML workflow.
We’re familiar with the ML experimentation workflow, but there is also a productization workflow where we deploy our models, perform A/B testing, handle concept drift, and more.
Another tool under your belt to create more cool stuff.
Even if you won’t be working as a full-fledged ML Engineer or Developer, the technologies you’ll learn while building an ML Service enable you to do more things!
How?
1. Be comfortable with UNIX commands and a version-control system like Git.
2. Structure your Python project in a modular fashion
my_project/
├── api
├── docs
├── experiments
├── README.md
├── requirements.txt
├── src
│   ├── entrypoint.py
│   └── my_module
│       └── module_file.py
└── tests
    ├── my_module
    │   └── test_my_module.py
    └── test_entrypoint.py
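To make the layout a bit more concrete, here is a minimal sketch of what `src/my_module/module_file.py` and its matching `tests/my_module/test_my_module.py` might contain. The function `normalize_scores` is a hypothetical example, not something from the original article, and both files are collapsed into one block so it runs on its own.

```python
from typing import List


# --- src/my_module/module_file.py (hypothetical contents) ---
def normalize_scores(scores: List[float]) -> List[float]:
    """Scale raw model scores so that they sum to 1.

    Pure logic like this lives in its own module so that the API layer,
    the experiments, and the tests can all import it.
    """
    total = sum(scores)
    if total == 0:
        raise ValueError("scores must not sum to zero")
    return [s / total for s in scores]


# --- tests/my_module/test_my_module.py (hypothetical contents) ---
# In the real layout this file would begin with
#     from my_module.module_file import normalize_scores
# which assumes `src` is on the import path (e.g., the package is installed).
def test_normalize_scores_sums_to_one():
    assert normalize_scores([1.0, 3.0]) == [0.25, 0.75]


def test_normalize_scores_rejects_zero_total():
    try:
        normalize_scores([0.0, 0.0])
    except ValueError:
        return
    raise AssertionError("expected a ValueError for an all-zero input")
```

With this split, running `pytest` from `my_project/` would discover the `test_*` functions under `tests/` automatically, while `entrypoint.py` and the API only import from `my_module`.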
3. Learn how to write an API on top of your model using Flask or FastAPI
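Flask and FastAPI both boil down to routing an HTTP request to a Python function and returning its result as JSON. As a dependency-free sketch of that idea, the block below swaps in Python's standard-library `http.server` instead of Flask or FastAPI; the `/predict` route, the `{"features": [...]}` payload, and the `DummyModel` class are all hypothetical stand-ins for your own model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class DummyModel:
    """Hypothetical stand-in for a trained model: predicts the mean of the features."""

    def predict(self, features):
        return sum(features) / len(features)


# Load the model once at startup rather than on every request.
MODEL = DummyModel()


def handle_predict(payload):
    """Framework-agnostic core: validate the JSON payload and run the model.

    Returns an (HTTP status code, JSON-serializable body) pair.
    """
    if not isinstance(payload, dict):
        return 400, {"error": "request body must be a JSON object"}
    features = payload.get("features")
    if (
        not isinstance(features, list)
        or not features
        or not all(isinstance(x, (int, float)) for x in features)
    ):
        return 400, {"error": "'features' must be a non-empty list of numbers"}
    return 200, {"prediction": MODEL.predict(features)}


class PredictHandler(BaseHTTPRequestHandler):
    """Routes POST /predict to handle_predict (the part Flask or FastAPI would do for you)."""

    def do_POST(self):
        if self.path != "/predict":
            self._send_json(404, {"error": "not found"})
            return
        try:
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # bad Content-Length, invalid JSON, or invalid UTF-8
            self._send_json(400, {"error": "invalid request body"})
            return
        status, body = handle_predict(payload)
        self._send_json(status, body)

    def _send_json(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run(host="127.0.0.1", port=8000):
    """Serve until interrupted (Ctrl+C). Defaults to localhost for local testing."""
    HTTPServer((host, port), PredictHandler).serve_forever()
```

For example, after starting the server with `run()`, `curl -X POST http://127.0.0.1:8000/predict -d '{"features": [1, 2, 3]}'` should return `{"prediction": 2.0}`. Keeping validation and prediction in `handle_predict` means that moving to a Flask or FastAPI route later only touches the thin HTTP layer.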
4. “Containerize” your application using Docker
You want to use Docker for two things: (1) reproducibility and (2) isolation.
5. Learn how to deploy to a Cloud Platform
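Container platforms commonly tell your app which port to listen on through an environment variable; Google Cloud Run, for example, sets `PORT` (8080 by default). A minimal sketch, assuming your platform follows that convention, is to read the port at startup instead of hard-coding it, and to bind to `0.0.0.0` so that requests from outside the container can reach the server. The helper names `get_port` and `bind_address` are hypothetical.

```python
import os
from typing import Tuple


def get_port(default: int = 8080) -> int:
    """Return the port the platform asks the app to listen on.

    Reads the PORT environment variable (set by platforms such as
    Google Cloud Run) and falls back to `default`, e.g., when running locally.
    """
    raw = os.environ.get("PORT", "").strip()
    if not raw:
        return default
    port = int(raw)  # raises ValueError if PORT is not an integer
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port


def bind_address(default_port: int = 8080) -> Tuple[str, int]:
    """Host/port pair to hand to your server.

    "0.0.0.0" listens on all interfaces, which is what you want inside a
    container; "127.0.0.1" would only accept connections from inside it.
    """
    return ("0.0.0.0", get_port(default_port))
```

With Python's `http.server`, for instance, this would be `HTTPServer(bind_address(), SomeHandler)`, where `SomeHandler` is your request handler class; the same host and port can be passed to whatever server runs a Flask or FastAPI app.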
What’s next?
From here on in, you can keep improving your app by:
- Minimizing the size of your Docker image using multi-stage builds.
- Cleaning up your repository. Model files shouldn’t be committed to Git; store them in a storage service instead (e.g., Google Cloud Storage or AWS S3).
- Adding a Continuous Integration / Continuous Deployment (CI/CD) pipeline so that any change on GitHub is automatically reflected in your deployed app. I often use GitHub Actions for this (e.g., any change to the master branch is deployed automatically).
- Improving security! Make use of environment variables (e.g., passed to Docker at run time) or a .env file to keep API tokens, passwords, and whatnot out of your code. Ideally, you shouldn’t commit any secrets to Git (they can still be recovered from the history even after you delete them!). Be careful!
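To illustrate the security point, here is a minimal sketch assuming a simple `KEY=value` `.env` format: secrets live in a `.env` file that is listed in `.gitignore` (so it is never committed) or are passed as environment variables at run time, for example with `docker run --env-file .env`, and the code only ever reads them from the environment. The helpers `load_env_file` and `require_secret` and the `API_TOKEN` variable are hypothetical; a library such as python-dotenv handles more edge cases of the format.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=value lines from a .env file into os.environ.

    Blank lines, '#' comments, and lines without a key or '=' are skipped.
    One pair of matching surrounding quotes is removed from each value.
    Variables that are already set (e.g., passed via `docker run --env-file`
    or `-e`) are left untouched, so the real environment wins.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return  # no .env file: rely on the real environment only
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if not key:
            continue
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        os.environ.setdefault(key, value)


def require_secret(name: str) -> str:
    """Fetch a secret from the environment, failing loudly if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value
```

At startup you would call `load_env_file()` once and then read secrets with, e.g., `token = require_secret("API_TOKEN")`, taking care never to print or log the returned value.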