This article was originally written by Jakub Czakon and posted on the Neptune blog.

Setting up a good tool stack for your Machine Learning team is important if you want to work efficiently and focus on delivering results. If you work at a startup, you know that setting up an environment that can grow with your team, the needs of your users and the rapidly evolving ML landscape is especially important.

To tackle this challenge, we wondered: “What are the best tools, libraries and frameworks that ML startups use?”

And to answer that question we asked 41 Machine Learning startups from all over the world.

The result?

A ton of great advice that we grouped into:

Methodology
Software development setup
Machine Learning frameworks
MLOps
Unexpected 🙂

Read on to figure out what will work for your machine learning team.

Good methodology is the key

Tools are only as strong as the methodology that employs them.

If you run around training models on some randomly acquired data and deploy whatever model you can get your hands on, sooner or later there will be trouble 🙂

Kai Mildenberger from psyML says that:

To us, the careful versioning of all the training and testing data is probably the most essential tool/methodology. We expect that to remain one of the most key elements in our toolbox, even as all of the techniques and mathematical models iterate forever. A second aspect might be to be extremely hypothesis driven. We use that as the single most important methodology to develop models.
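As a rough illustration of what versioning training and testing data can mean in practice, here is a minimal sketch that fingerprints a data directory with a content hash. It is not psyML's method, just one simple way to notice that a dataset changed. The function name `dataset_fingerprint` and the demo file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_fingerprint(data_dir: Path) -> str:
    """Return a SHA-256 digest covering every file under data_dir.

    Relative paths and file contents both feed the hash, so renaming,
    adding, removing, or editing a file changes the fingerprint.
    """
    digest = hashlib.sha256()
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        digest.update(path.relative_to(data_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


# Demo on a throwaway directory (file names are made up for illustration).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "train.csv").write_text("x,y\n1,2\n")
    (root / "test.csv").write_text("x,y\n3,4\n")
    before = dataset_fingerprint(root)
    unchanged = dataset_fingerprint(root)

    # Appending one row to the training set yields a new fingerprint.
    (root / "train.csv").write_text("x,y\n1,2\n5,6\n")
    after = dataset_fingerprint(root)
```

Storing the fingerprint next to each trained model makes it possible to tell later which exact data a result came from. Dedicated tools such as DVC, mentioned later in this article, do this far more thoroughly.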

I think having a strong understanding of what you want to use your tools for (and that you actually need them) is the very first step.

That said, it is important to know what is out there and what people in similar situations use successfully.

Let’s dive right into that!

Software development tooling is the backbone of ML teams

The development environment is the foundation of every team’s workflow, so it was very interesting to learn which tools companies around the world consider the best in this area.

ML teams use various tools as an IDE. Many teams like SimpleReport and Hypergiant use Jupyter Notebooks and Jupyter Lab with its ecosystem of NB Extensions.

“Jupyter Notebook is very useful for quick experiments and visualization, especially when exchanging ideas between multiple team members. Because we use Tensorflow, Google Colab is a natural extension to share our code more easily.” – says Wenxi Chen from Juji.

Other flavours of Jupyter were mentioned as well. Deepnote (a hosted Jupyter Notebook solution) is “loved for their ML stuff” by the team at Intersect Labs, while the Juji team, as quoted above, relies on Google Colab to share code.

Others choose more standard software development IDEs. Among those, PyCharm, touted by Or Izchak from Hotelmize as “the best Python IDE”, and Visual Studio Code, used by Scanta for its “ease of connectivity with Azure and many ML-based extensions provided”, were mentioned the most.

For teams that use the R language, like SimpleReport, RStudio was the clear winner as the IDE of choice. As Kenton White from Advanced Symbolics mentions:

We mostly use R + RStudio for analysis and model building. The workhorse for our AI modeling is VARX for time series forecasts.

When it comes to code versioning, GitHub is a clear favourite. As Daniel Hanchen from Umbra AI mentions:

Github (now free for all teams!!) with its super robust version control system and easy repository sharing functionality is super useful for most ML teams.

Among the most popular languages we have Python, R and, interestingly, Clojure, mentioned by Wenxi Chen from Juji.

As for the environment/infrastructure setup notable mentions from ML startups are:

“AWS as the platform for deployment” (SimpleReport)
“Anaconda serves as our goto tool for running ML experiments due to its live code feature wherein it can be used to combine software code, computational output, explanatory text, and multimedia resources in a single document.” (Scanta)
“Redis dominates as an in-memory data structure store due to its support for different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indexes.” (Scanta)
“Snowflake and Amazon S3 for data storage.” (Hypergiant)
“Spark-pyspark – very simple api for distributing job to work on big data.” (Hotelmize)

Sooo many Machine Learning Frameworks

An integrated development environment is crucial, but you need a good ML framework on top of it to turn a vision into a project. The range of tools pointed out by the startups is quite diverse here.

For playing with tabular data, Pandas was mentioned the most.

An additional benefit of using Pandas, mentioned by Nemo D’Qrill, the CEO of Sigma Polaris, is:

I'd say that Pandas is probably one of the most valuable tools, in particular when working in collaboration with external developers on various projects. Having all data files in the form of data frames, across teams and individual developers, makes for a much smoother collaboration and unnecessary hassle.

An interesting library mentioned by a software developer from Hotelmize was dovpanda, a Python extension library for pandas that gives you insights on your pandas code and data while you work.

When it comes to visualization, matplotlib is used the most, by the likes of Trustium, Hotelmize, Hypergiant and others.

Plotly was also a common choice. Developers from Wordnerds use it “for great visualisations to make data understandable and look good”. Dash, a tool for building interactive dashboards on top of Plotly charts, was recommended by Theodoros Giannakopoulos from Behavioral Signals for ML teams that need to present their analytical results in a nice, user-friendly manner.

For more standard machine learning problems, most teams, like Wordnerds, Sensitrust or Behavioral Signals, use Scikit-Learn. The ML team from iSchoolConnect explains why it is such a great tool:

It is one of the most popular toolkits used by machine learning researchers, engineers, and developers. The ease with which you can get what you want is amazing! From feature engineering to interpretability, scikit-learn provides you with every functionality.

Truth be told Pandas and Sklearn are really the workhorses of ML teams all over the world.

As Michael Phillips, Data Scientist from Numerai says:

Modern Python libraries like Pandas and Scikit-learn have 99% of the tools that an ML team needs to excel. Though simple, these tools have extraordinary power in the hands of an experienced data scientist

In my opinion, while in the general ML team population this may be true, in the case of ML Startups a lot of work goes into state of the art methods which usually means deep learning models.

When it comes to general deep learning frameworks we had many different opinions.

Many teams like Wordnerds and Behavioral Signals choose PyTorch.

The team of ML experts from iSchoolConnect tells us why so many ML practitioners and researchers choose PyTorch.

If you want to go deep into the waters, PyTorch is the right tool for you! Initially, it will take time to get accustomed to it but once you get comfortable with it there is nothing like it! The library is even optimized for quickly training and evaluating your ML-models.

But it is still TensorFlow and Keras that are leading in popularity.

Most teams, like Strayos and Repetere, choose them as their ML development framework. Cedar Milazzo from Trustium said:

Tensorflow, of course. Especially with 2.0! Eager execution was what TF really needed and now it’s here. I should note that when I say “tensorflow” I mean “tensorflow + keras” since keras is now built into TF.

It’s also important to mention that you don’t have to choose one framework and exclude others.

For example, Melodia’s Founder, Omid Aryan said that:

The tools that have been most beneficial to us are TensorFlow, PyTorch, and Python’s old scikit-learn tools.

There are some popular frameworks for more specialized applications.

In Natural Language Processing we’ve heard:

“Huggingface: it’s the most advanced and highest performance NLP library ever created. It’s the first of its kind in that researchers are directly contributing to a highly scalable NLP library. It separates itself from other similar tools by having production level tools available a few months after a newer model is published” says Ben Lamm, the CEO of Hypergiant.
“Spacy is a very cool natural language toolkit. NLTK is by far the most popular and I certainly use it, but spacy does lots of things NLTK can’t do so well, such as stemming and dependency parsing.” mentions Cedar Milazzo, the CEO of Trustium
“Gensim is good for word vectors and document vectors too, and I believe it isn’t so popular.” adds Cedar Milazzo.

In Computer Vision:

“OpenCV is indispensable for computer vision work” for Hypergiant. Their CEO says: “It’s a classic CV ensemble of methods from the 1960s until 2014 that are useful pre and post processing and can work well in scenarios where a neural network would be overkill.”

Also it’s worth noting that not every team is implementing deep learning models themselves.

As Iuliia Gribanova and Lance Seidman from Munchron say, there are now API services where you can outsource some (or all) of the work:

Google ML kit is currently one of the best easy-to-entry tools that lets mobile developers easily embed ML API services like face recognition, image labeling, and other items that Google offers into an Android or iOS App. But additionally, you can also bring in your own TF (TensorFlow) lite models to run experiments and then bring them into production using Google’s ML Kit.

I think it’s important to mention that you can’t always choose the latest and greatest libraries. Often the tool stack gets handed to you when you join the team.

As Naureen Mahmood from Meshcapade shared:

“In the past, some important autodiff libraries that have made it possible for us to run multiple joint optimizations, and in doing so helped us build some of the core tech we still use today, are Chumpy & OpenDR. Now there are fancier and faster ones out there, like Pytorch and TensorFlow.”

When it comes to model deployment Patricia Thaine from Private AI mentions “tflite, flask, tfjs and coreml” as their frameworks of choice. She also suggests that visualizing models is very important to them and they are using Netron for that.

But there are tools that go beyond frameworks that can help ML teams deliver real value quickly.

This is where MLOps comes in.

MLOps is becoming more important for machine learning startups

You may be wondering what MLOps is or why you should care.

The term alludes to DevOps and describes tools used for operationalization of machine learning activities.

Jean-Christophe Petkovich, CTO at Acerta, provided us with an extremely thorough explanation of how their ML team approaches MLOps. It was so good that I decided to share it (almost) in full:

I think most of the interesting tools that are going to see broader adoption in 2020 are centered around MLOps. There was a big push to build those tools last year, and this year we’re going to find out who the winners will be.

For me, MLflow seems to be in the lead for tracking experiments, artifacts, and outcomes. A lot of what we’ve built internally for this purpose are extensions to the functionality of MLflow to incorporate more data tracking similar to how DVC tracks data.

The other big names in MLOps are Kubeflow, Airflow and TFX with Apache Beam—all tools designed for capturing data science workflows and pipelines end-to-end.

There are several ingredients for a complete MLOps system:

You need to be able to build model artifacts that contain all the information needed to preprocess your data and generate a result.
Once you can build model artifacts, you have to be able to track the code that builds them, and the data they were trained and tested on.
You need to keep track of how all three of these things, the models, their code, and their data, are related.
Once you can track all these things, you can also mark them ready for staging, and production, and run them through a CI/CD process.
Finally, to actually deploy them at the end of that process, you need some way to spin up a service based on that model artifact.

When it comes to tracking, MLflow is our pick, it’s tried-and-true at Acerta, as several of our employees already used it as part of their personal workflows, and now it’s the de facto tracking tool for our data scientists.

For tracking data pipelines or workflows themselves, we are currently developing against Kubeflow since we’re already on Kubernetes making deployment a breeze, and our internal model pipelining infrastructure meshes well with the Kubeflow component concept.

On top of all of this MLOps development, there’s a shift toward building feature stores—basically specialized data lakes for storing preprocessed data in various forms—but I haven’t seen any serious contenders that really stand out yet.

These are all tools that need to be in place. I know a lot of places are doing their own home-baked solutions to this problem, but I think this year we’re going to see a lot more standardization around machine learning applications.
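To make the ingredients listed in the quote a bit more concrete, here is a minimal sketch of a registry that links a model artifact to the code and data that produced it, and marks versions as ready for staging or production. This is not Acerta's system and not the MLflow or Kubeflow API. The names `ModelVersion` and `Registry`, the stage names, and the example URIs and commit hashes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

STAGES = ("none", "staging", "production")


@dataclass(frozen=True)
class ModelVersion:
    """Links one model artifact to the code and data that produced it."""
    artifact_uri: str   # where the serialized model and preprocessing live
    code_commit: str    # e.g. a git commit hash
    data_version: str   # e.g. a dataset fingerprint or data revision


@dataclass
class Registry:
    versions: List[ModelVersion] = field(default_factory=list)
    stages: Dict[int, str] = field(default_factory=dict)

    def register(self, version: ModelVersion) -> int:
        """Store a version and return its integer id."""
        self.versions.append(version)
        version_id = len(self.versions) - 1
        self.stages[version_id] = "none"
        return version_id

    def promote(self, version_id: int, stage: str) -> None:
        """Move a version to a stage, demoting whoever held that stage."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        if version_id not in self.stages:
            raise KeyError(f"unknown version id: {version_id}")
        if stage != "none":
            for other_id, other_stage in self.stages.items():
                if other_stage == stage:
                    self.stages[other_id] = "none"
        self.stages[version_id] = stage

    def in_stage(self, stage: str) -> List[ModelVersion]:
        return [self.versions[i] for i, s in self.stages.items() if s == stage]


# Hypothetical artifacts, commits and data versions.
registry = Registry()
v0 = registry.register(ModelVersion("s3://example-bucket/model-0.pkl", "a1b2c3d", "data-v1"))
v1 = registry.register(ModelVersion("s3://example-bucket/model-1.pkl", "e4f5a6b", "data-v2"))
registry.promote(v0, "production")
registry.promote(v1, "production")  # v0 is demoted back to "none"
```

A CI/CD job could then ask the registry which version is in production and spin up a service from its `artifact_uri`. That last deployment step is left out of this sketch.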

Emily Kruger from Kaskada, which, incidentally, is a startup building a feature store solution 🙂, adds:

The most useful tools from our perspective are feature stores, automated deployment pipelines, and experimentation platforms. All these tools address challenges with MLOps, which is an important emerging space for data teams, especially those running ML models in production and at scale.

OK, so in light of this, what are other teams using to solve those problems?

Some teams prefer end-to-end platforms, others create everything in-house. Many teams are somewhere in between with a mix of some specific tools and home-grown solutions.

In terms of larger platforms, two names that were mentioned often were:

Amazon SageMaker, which according to the ML team from VCV “has a variety of tools for distributed collaboration” and which SimpleReport chooses as its platform for deployment.
Azure, which, as the Scanta team tells us, “serves as a way to build, train, and deploy our Machine Learning applications as well as it helps in adding intelligence in our applications via their Language, Vision, and Speech recognition support. Azure has been our choice of IaaS due to rapid deployments and low-cost Virtual Machines.”

For experiment tracking, ML startups use various options:

Strayos uses Comet ML “for model collaboration and results sharing”.
Hotelmize and others are going with TensorBoard, which “is the best tool to visualize your model behavior, specially for neural network models.”
“MLflow seems to be in the lead for tracking experiments, artifacts, and outcomes,” as Jean-Christophe Petkovich, CTO at Acerta, mentioned before.
Other teams like Repetere try to keep it simple and say that “Our tooling is very simple, we use tensorflow and s3 to version model artifacts for analysis”.
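If you want to see what the "keep it simple" end of experiment tracking can look like, here is a minimal sketch that appends each run's hyperparameters and metrics to a JSON Lines file. It is not how any of the tools above work internally. The function names `log_run` and `best_run`, the file name and the example values are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path


def log_run(log_path: Path, params: dict, metrics: dict) -> None:
    """Append one run (hyperparameters and metrics) as a JSON line."""
    record = {"timestamp": time.time(), "params": params, "metrics": metrics}
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def best_run(log_path: Path, metric: str) -> dict:
    """Return the logged run with the highest value of `metric`."""
    with log_path.open(encoding="utf-8") as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])


# Two made-up runs logged to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "runs.jsonl"
    log_run(log_file, {"lr": 0.1, "depth": 3}, {"accuracy": 0.81})
    log_run(log_file, {"lr": 0.01, "depth": 5}, {"accuracy": 0.86})
    winner = best_run(log_file, "accuracy")
```

A flat file like this stops scaling once several people run experiments in parallel, which is roughly the point where teams reach for a dedicated tracking tool.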

Typically, experiment tracking tools keep track of metrics and hyperparameters but as James Kaplan from MeetKai points out:

“The most useful types of ML tools for us are anything that helps with dealing with model regressions caused by everything except the model architecture. Most of these are tools we have built ourselves, but I assume there are many existing options out there. We like to look at confusion matrices that can be visually diff’d under scenarios such as:

new data added to the training set (and the provenance of said data)
quantization configurations
pruning/distillation

We have found that being able to track performance across new data additions is far more important than being able to just track performance across hyper parameters of the model itself. This is especially so when datasets grow/change far faster than model configurations.”
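The idea of diffing confusion matrices can be sketched in a few lines. The example below is not MeetKai's tooling. It assumes you evaluate two model versions (say, before and after adding new training data) on the same labelled evaluation set and want to see which cells of the confusion matrix moved. The function names and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (true label, predicted label)


def confusion_counts(y_true: List[str], y_pred: List[str]) -> Counter:
    """Count how often each (true, predicted) pair occurs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


def confusion_diff(before: Counter, after: Counter) -> Dict[Cell, int]:
    """Per-cell change in counts (after minus before), unchanged cells omitted."""
    cells = set(before) | set(after)
    return {c: after[c] - before[c] for c in sorted(cells) if after[c] != before[c]}


# Same evaluation set, predictions from two model versions (toy data).
y_true = ["cat", "cat", "dog", "dog", "dog"]
pred_old = ["cat", "dog", "dog", "dog", "cat"]
pred_new = ["cat", "cat", "dog", "cat", "cat"]

diff = confusion_diff(
    confusion_counts(y_true, pred_old),
    confusion_counts(y_true, pred_new),
)
```

In this toy case the new model fixes one cat that was labelled dog, but it also starts labelling one more dog as cat. That kind of regression is easy to miss when you only compare overall accuracy.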

Speaking of pruning/distillation, Malte Pietsch, Co-Founder of deepset, explains that:

We see an increasing need for tools that help us profile & optimize models in terms of speed and hardware utilization. With the growing size of NLP models, it becomes increasingly important to make training and inference more efficient.

While we are still looking for the ideal tooling here, we found pytest-benchmark, NVIDIA’s Nsight Systems and kernprof quite helpful.
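For a first rough look at inference speed, before reaching for the profilers named in the quote, the standard library's `timeit` module is often enough. The sketch below uses neither pytest-benchmark nor kernprof. The `fake_inference` function is a hypothetical stand-in for a model's forward pass, and the batch shape is arbitrary.

```python
import statistics
import timeit


def fake_inference(batch):
    """Stand-in for a model's forward pass (hypothetical workload)."""
    return [sum(x * x for x in row) for row in batch]


def latency_stats(fn, batch, repeats: int = 5, calls_per_repeat: int = 20) -> dict:
    """Time fn(batch) and report per-call latency in milliseconds."""
    totals = timeit.repeat(lambda: fn(batch), repeat=repeats, number=calls_per_repeat)
    per_call_ms = [1000.0 * total / calls_per_repeat for total in totals]
    return {
        "min_ms": min(per_call_ms),
        "median_ms": statistics.median(per_call_ms),
        "max_ms": max(per_call_ms),
    }


# A batch of 32 rows with 64 features each (arbitrary sizes).
batch = [[float(i + j) for j in range(64)] for i in range(32)]
stats = latency_stats(fake_inference, batch)
```

The minimum is usually the most stable number to compare between runs, since the slower repeats mostly reflect noise from the rest of the machine.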

Another interesting tool for benchmarking training/inference is MLPerf suggested by Anton Lokhmotov from Dividiti.

Experimenting with models is undoubtedly very important but putting models in front of end-users is where the magic happens (for most of us). On that front Rosa Lin from Tolstoy mentioned using streamlit.io which is a “great tool for building ML model web apps easily.”

A valuable word of warning about ML-focused solutions comes from Gianvito Pio, Co-Founder of Sensitrust:

“There are also tools like KNIME and Orange that allow you to design an entire pipeline in a drag-and-drop fashion, as well as AutoML tools (see AutoWEKA, auto-sklearn and JADBio) that will automatically select the most appropriate model for a specific task.

However, in my opinion, a strong expertise in the Machine Learning and AI areas are still necessary. Even the “best, automated” tool can be misused, without a good background in the field.”

Unexpected

Ok, when I started working on this, some answers like PyTorch, Pandas or Jupyter Lab were what I expected.

But one answer we received was really out-of-the-box.

It put all the other things in perspective and made me think that perhaps we should take a step back and take a look at the larger picture.

Christopher Penn from Trust Insights suggested that ML teams should use a rather interesting “tool”:

Wetware – the hardware and software combination that sits between your ears – is the most important, most useful, most powerful machine learning tool you have.

Far, FAR too many people are hoping AI is a magic wand that solves everything with little to no human input. The reverse is true; AI requires more management and scrutiny than ever, because we lack so much visibility into complex models.

Interpretability and explainability are the greatest challenges we face right now, in the wake of massive scandals about bias and discrimination. And AI vendors make this worse by focusing on post hoc explanations of models instead of building the expensive but worthwhile interpretations and checkpoints into models.

So, wetware – the human in the loop – is the most useful tool in 2020 and for the foreseeable future.

Our perspective:

Since we are building tools for ML teams and some of our customers are AI startups, I think it makes sense to give you our perspective.

So we see:

A lot of teams use the Jupyter ecosystem for exploration and PyCharm/VS Code for development.
For deep learning, people are using everything: TensorFlow, Keras and PyTorch. Notably, we see more and more people using high-level PyTorch training libraries like Lightning, Ignite, Catalyst, fastai and Skorch.
For visual exploration, people are using matplotlib, plotly, altair and hiplot (hyperparameter visualizations).
For running hyperparameter sweeps and general run orchestration some teams like YNAP choose AWS SageMaker.
For experiment tracking we see open-source packages like TensorBoard, MLflow and Sacred (Neptune integrates with all of them)

… and since those are our customers, they naturally use neptune-notebooks for tracking explorations in Jupyter notebooks and Neptune for experiment tracking and organization of their machine learning projects.

This article was originally written by Jakub Czakon and posted on the Neptune blog. You can find more in-depth articles for machine learning practitioners there.

The Best Tools, Libraries, Frameworks and Methodologies that Machine Learning Teams Use – Things We Learned from 41 ML Startups

Kamil A. Kaczmarek