Detailed Introduction to Word Embedding

Day 3: Word Embedding

As part of my #75DaysOfLLM journey, today we’re diving into Word Embeddings.

What is Word Embedding?

Word embedding is a technique in natural language processing (NLP) that represents words as dense vectors of real numbers. Instead of treating words as discrete symbols, word embedding allows us to capture the meaning and relationships between words in a continuous vector space.

Key points:

Words are represented as vectors (lists of numbers)
These vectors typically have 100-300 dimensions
Similar words have similar vector representations
The vectors capture semantic and syntactic information about words
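
As a small illustration of "similar words have similar vectors", the sketch below compares a few made-up 4-dimensional vectors using cosine similarity. The numbers are invented for the example; real embeddings are learned from data and have many more dimensions.

```python
import math

# Toy 4-dimensional vectors, invented for illustration (not learned embeddings).
toy_vectors = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "dog": [0.8, 0.9, 0.2, 0.1],
    "car": [0.1, 0.0, 0.9, 0.8],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])
print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

With these toy values, "cat" scores much closer to "dog" than to "car", which is the behaviour we hope a trained embedding shows for related words.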

Why is Word Embedding Important?

Capturing word relationships: Word embeddings can represent complex relationships between words, such as analogies (e.g., king - man + woman ≈ queen).
Dimensionality reduction: Instead of using one-hot encoding (which would result in extremely large, sparse vectors), word embeddings provide a dense, low-dimensional representation.
Improved performance: Word embeddings have been shown to improve performance in various NLP tasks, including the following (each was covered in Day 1 of this series):
- Text classification
- Named entity recognition
- Machine translation
- Sentiment analysis
Transfer learning: Pre-trained word embeddings can be used as input features for other NLP models, allowing knowledge transfer between tasks.
Handling out-of-vocabulary words: Some embedding techniques can generate representations for words not seen during training.
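
To make the dimensionality point concrete, here is a minimal sketch contrasting a one-hot vector with a dense lookup. The five-word vocabulary and the values in `embedding_table` are hypothetical, chosen only to show the shapes involved.

```python
vocabulary = ["king", "queen", "man", "woman", "apple"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Vector with a single 1 at the word's index; length equals vocabulary size."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

# Hypothetical dense embedding table: one short row per word (values invented).
embedding_dim = 3
embedding_table = [
    [0.7, 0.1, 0.9],    # king
    [0.7, 0.9, 0.9],    # queen
    [0.2, 0.1, 0.1],    # man
    [0.2, 0.9, 0.1],    # woman
    [-0.8, 0.4, -0.3],  # apple
]

def embed(word):
    """Dense lookup: return the table row for this word."""
    return embedding_table[word_to_index[word]]

print(one_hot("queen"))  # length grows with the vocabulary
print(embed("queen"))    # length stays at embedding_dim
```

Any two distinct one-hot vectors have a dot product of 0, so they carry no notion of similarity, while dense rows can be compared directly.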

How Does Word Embedding Work?

Word embedding algorithms typically work by analyzing large corpora of text and learning vector representations based on the contexts in which words appear. The underlying principle is the distributional hypothesis, which states that words that occur in similar contexts tend to have similar meanings.

The process generally involves:

Defining a context window (e.g., 5 words before and after the target word)
Scanning through the text corpus
Applying a learning algorithm to adjust word vectors based on observed contexts
Iterating until convergence or a specified number of epochs
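
The first two steps above can be sketched in a few lines. The helper `context_windows` is a name made up for this example, and it only collects the raw contexts; the learning step that adjusts the vectors is not shown.

```python
def context_windows(tokens, window=2):
    """Yield (target, context_words) for every position in a tokenized sentence."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right

tokens = ["the", "quick", "brown", "fox", "jumps"]
windows = list(context_windows(tokens, window=2))
for target, context in windows:
    print(target, "->", context)
```

Words near the start or end of a sentence simply get a shorter context, since the window is clipped at the sentence boundary.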

The resulting vector space has interesting properties:

Words with similar meanings cluster together
Vector arithmetic can reveal semantic relationships (e.g., vec("king") - vec("man") + vec("woman") ≈ vec("queen"))
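
Here is a toy version of that arithmetic. The vectors are hand-made so that the analogy works out exactly (the three axes loosely stand for royalty, maleness, and femaleness); learned embeddings only approximate this behaviour. The function `analogy` is a hypothetical helper written for this example.

```python
import math

# Hand-made vectors with axes (royalty, maleness, femaleness); values invented.
vectors = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman")
print(result)
```

The input words are excluded from the candidates because the nearest neighbour of the result is otherwise often one of the inputs themselves.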

Popular Word Embedding Techniques

1. Word2Vec

Developed by researchers at Google, Word2Vec uses a shallow, two-layer neural network to learn word embeddings. It comes in two flavors:

a) Continuous Bag of Words (CBOW):

Predicts a target word given its context words
Faster to train and better for frequent words

b) Skip-gram:

Predicts context words given a target word
Works well with small datasets and rare words
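
The difference between the two flavors is easiest to see in the training examples each one builds from the same sentence. This is a simplified sketch with made-up helper names; it shows only how the examples are framed, not the neural network that is trained on them.

```python
def cbow_examples(tokens, window=1):
    """CBOW: each example is (context words, target word)."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.append((context, target))
    return examples

def skipgram_examples(tokens, window=1):
    """Skip-gram: each example is (target word, one context word)."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        examples.extend((target, c) for c in context)
    return examples

tokens = ["word", "embedding", "is", "fascinating"]
cbow = cbow_examples(tokens)
skipgram = skipgram_examples(tokens)
print(cbow[1])       # context -> target
print(skipgram[:3])  # target -> single context word
```

CBOW produces one example per position, while skip-gram produces one example per (target, context) pair, so the same sentence yields more skip-gram examples.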

Python Example using Gensim:



from gensim.models import Word2Vec

sentences = [['I', 'love', 'natural', 'language', 'processing'],
             ['Word', 'embedding', 'is', 'fascinating'],
             ['Machine', 'learning', 'is', 'the', 'future']]

# Train the model
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

# Get the vector for a word
vector = model.wv['language']

# Find similar words (with a corpus this small, the neighbors are not meaningful)
similar_words = model.wv.most_similar('language', topn=3)
print(similar_words)

Parameters explained:

vector_size: Dimensionality of word vectors (default is 100)
window: Maximum distance between current and predicted word (default is 5)
min_count: Ignores words with frequency below this threshold (default is 5)
sg: Training algorithm: 1 for skip-gram; 0 for CBOW (default)

2. GloVe (Global Vectors for Word Representation)

GloVe, developed at Stanford, learns word vectors by analyzing global word-word co-occurrence statistics from a corpus.

Key features:

Combines the advantages of local context window methods and global matrix factorization
Often performs well on word analogy tasks
Efficient to train on large corpora
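
To give a feel for what "global word-word co-occurrence statistics" means, the sketch below counts how often word pairs appear near each other across a tiny, invented corpus. It covers only the counting step; fitting GloVe vectors to these counts is not shown, and `cooccurrence_counts` is a helper name made up for this example.

```python
from collections import Counter

def cooccurrence_counts(corpus, window=2):
    """Count how often each ordered (word, context_word) pair occurs within the window."""
    counts = Counter()
    for tokens in corpus:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts

corpus = [
    ["ice", "is", "cold"],
    ["steam", "is", "hot"],
    ["ice", "is", "solid"],
]
counts = cooccurrence_counts(corpus, window=2)
print(counts[("ice", "is")])
print(counts[("ice", "cold")])
print(counts[("steam", "cold")])
```

Unlike a sliding-window method that updates vectors one window at a time, these counts are aggregated over the whole corpus before any vectors are learned.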

3. fastText

Created by Facebook's AI Research lab, fastText extends the Word2Vec model by treating each word as composed of character n-grams.

Advantages:

Can generate embeddings for out-of-vocabulary words
Performs well for morphologically rich languages
Captures subword information
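
The subword idea can be sketched as follows. This is a simplified illustration: it extracts character n-grams with `<` and `>` as word-boundary markers, but leaves out how fastText stores and combines the n-gram vectors. The function name `char_ngrams` is made up for this example.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    ngrams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            ngrams.append(wrapped[i:i + n])
    return ngrams

trigrams = char_ngrams("where", min_n=3, max_n=3)
print(trigrams)

# An unseen word shares many n-grams with a related seen word.
shared = set(char_ngrams("language")) & set(char_ngrams("languages"))
print(sorted(shared)[:5])
```

Because "languages" shares most of its n-grams with "language", a vector for the unseen plural can be assembled from pieces the model has already learned.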

Python Example using Gensim:



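from gensim.models import FastText

# Same toy corpus as in the Word2Vec example
sentences = [['I', 'love', 'natural', 'language', 'processing'],
             ['Word', 'embedding', 'is', 'fascinating'],
             ['Machine', 'learning', 'is', 'the', 'future']]

# Train the model
model = FastText(sentences, vector_size=100, window=5, min_count=1, sg=0)

# Get the vector for a word seen during training
vector = model.wv['language']

# Unseen words also get a vector, built from their character n-grams
oov_vector = model.wv['languages']

# Find similar words (the query word itself is not in the training data)
similar_words = model.wv.most_similar('unpredictable', topn=3)
print(similar_words)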

Differences Between These Techniques

Training approach:
- Word2Vec: Uses local context windows
- GloVe: Uses global co-occurrence statistics
- fastText: Uses character n-grams and local context windows
Handling of rare/unseen words:
- Word2Vec: Needs enough occurrences to learn a good vector for a rare word (skip-gram copes better than CBOW); can't handle unseen words
- GloVe: Global statistics can help with rare words, but it can't handle unseen words
- fastText: Handles both rare and unseen words well due to subword information
Training speed:
- Word2Vec: Fast
- GloVe: Generally slower than Word2Vec
- fastText: Similar to Word2Vec, but can be slower due to subword processing
Performance on different tasks:
- Word2Vec: Good all-around performance
- GloVe: Often excels at analogy tasks
- fastText: Performs well on morphologically rich languages and tasks requiring subword information

Conclusion

Word embeddings have improved many NLP tasks by providing dense representations of words that capture semantic and syntactic information. By transforming words into numerical vectors, we let models measure similarity and relatedness between words directly, rather than treating each word as an unrelated symbol.

Each embedding technique (Word2Vec, GloVe, and fastText) has its strengths and is suited for different types of tasks or languages. As you work on NLP projects, experimenting with different embedding techniques can often lead to significant improvements in model performance.

The field of word embedding continues to evolve. More recent developments include contextualized word embeddings (like ELMo and BERT), which generate a different vector for the same word depending on its surrounding context, so "bank" in "river bank" and in "bank account" no longer shares a single representation. These advances point toward more sophisticated language understanding in AI systems.


Naresh Nishad
