Model2Vec: Making Sentence Transformers 500x faster on CPU, and 15x smaller

pringled

Thomas van Dongen

Posted on September 22, 2024


Hey everyone! I wanted to share a project we (Stephan and Thomas) have been working on for the past couple of months. Many researchers and companies default to (very) large embedding models, but in our day-to-day work as ML engineers we still see plenty of value in more traditional methods like static (word2vec) embeddings, which are also much more eco-friendly.

We've been experimenting with ways to use those same large embedding models to create much smaller ones, and found something quite interesting. By simply taking the output embeddings of a Sentence Transformer, reducing their dimensionality with PCA, and applying Zipf weighting, we created a very small static embedding model (30 MB on disk) that outperforms other static embedding models on all tasks in MTEB. No training data is needed, since all you need is a vocabulary, and "distilling" a model takes about a minute on a regular CPU because you only have to forward-pass the tokens in the vocabulary. The final model is (much) faster and smaller than (for example) GloVe while being more performant. This makes it great for use cases such as text classification, similarity search, clustering, or RAG.
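To make the recipe concrete, here is a rough sketch of the two post-processing steps in NumPy. This is an illustrative sketch, not the package's actual implementation: the function name, the exact Zipf weight (log(1 + rank)), and the assumption that the token embeddings arrive sorted by frequency rank are all assumptions of mine.

```python
import numpy as np

def distill_static_embeddings(token_embeddings, target_dim=256):
    """Sketch of the recipe: PCA-reduce per-token output embeddings of a
    Sentence Transformer, then apply Zipf weighting.

    `token_embeddings` is a (vocab_size, dim) array, assumed ordered by
    token frequency rank (most frequent token first).
    """
    # PCA via SVD: center the embeddings, then project onto the
    # top `target_dim` principal components.
    centered = token_embeddings - token_embeddings.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:target_dim].T

    # Zipf weighting (assumed form): give frequent tokens a smaller
    # weight, here log(1 + rank) so rank-1 tokens are down-weighted most.
    ranks = np.arange(1, len(reduced) + 1)
    return reduced * np.log1p(ranks)[:, None]
```

Since the only forward passes are over the vocabulary itself, this whole procedure stays cheap even on CPU.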

The following plot shows how the models we implemented (Model2Vec base output and Model2Vec glove vocab) compare to various popular embedding models.

[Plot: performance vs. model size for Model2Vec and popular embedding models]

Want to try it yourself? We've implemented a basic interface for using our models, for example, you can use the following to create embeddings (after installing the package with pip install model2vec):

from model2vec import StaticModel

# Load a model from the HuggingFace hub (in this case the M2V_base_output model)
model_name = "minishlab/M2V_base_output"
model = StaticModel.from_pretrained(model_name)

# Make embeddings
embeddings = model.encode(["It's dangerous to go alone!", "It's a secret to everybody."])
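The returned embeddings are plain vectors, so downstream use cases like similarity search come down to standard vector math. As a hedged illustration (the helper below is my own, not part of the package), cosine similarity between two embeddings looks like this:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in vectors; in practice these would come from model.encode(...)
emb_a = np.array([0.1, 0.3, 0.5])
emb_b = np.array([0.2, 0.1, 0.4])
print(cosine_similarity(emb_a, emb_b))
```

The same pattern scales to ranking a whole corpus: encode everything once, then compare query embeddings against the stored matrix.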

I'm curious to hear your thoughts on this! We've created a simple package called Model2Vec, where we documented all our experiments and results: https://github.com/MinishLab/model2vec.
