Unsupervised Learning: Unveiling the Hidden Secrets in Your Data

Imagine walking into a room full of people, none of whom are wearing name tags. Unsupervised learning is like being asked to organise them into groups based only on what you can observe. Unlike supervised learning, where we have labelled data, here we have to find the hidden patterns and structures on our own.

Table of Contents

Introduction to Unsupervised Learning
Types of Unsupervised Learning
Practical Example: Implementing Clustering with k-Means
Practical Example: Dimensionality Reduction with PCA
Applications and Challenges of Unsupervised Learning

Introduction to Unsupervised Learning

Unsupervised learning in artificial intelligence is a type of machine learning that learns from data without human supervision. Unlike supervised learning, unsupervised machine learning models are given unlabelled data and allowed to discover patterns and insights without any explicit guidance or instruction.

This type of learning is incredibly useful for tasks like:

Customer segmentation: Unsupervised learning can group customers based on their buying habits, allowing businesses to target specific demographics with personalised marketing campaigns.
Anomaly detection: Unsupervised learning can identify outliers, meaning data points that look unlike the rest, without being told in advance what "abnormal" means. This makes it well suited to flagging potentially fraudulent transactions or security threats for closer inspection.
Data compression: Images and videos can take up a lot of storage space. Unsupervised learning can compress data by reducing its dimensions while preserving key information.
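
To make the anomaly-detection use case concrete, here is a minimal sketch using scikit-learn's IsolationForest. It is only one of several possible outlier detectors and is not prescribed by anything above. The transaction amounts are made up for illustration, and the contamination value is an assumption that roughly one point in ten is anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical transaction amounts: nine ordinary values and one extreme one
amounts = np.array(
    [20, 22, 19, 21, 23, 20, 18, 22, 21, 500], dtype=float
).reshape(-1, 1)

# contamination=0.1 is an assumption: we expect about 10% of points to be outliers
detector = IsolationForest(contamination=0.1, random_state=0)
labels = detector.fit_predict(amounts)  # 1 = inlier, -1 = outlier

outliers = amounts[labels == -1].ravel()
print("Flagged as anomalous:", outliers)
```

No labels were supplied: the model flags the points that are easiest to isolate from the rest. In practice the contamination value usually has to be tuned or estimated from domain knowledge.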

Types of Unsupervised Learning

Two of the most widely used approaches to unsupervised learning are:

Clustering: This is like sorting those people in the room. We group data points together based on their similarities. Popular clustering algorithms include k-Means (think of it as creating k distinct groups) and Hierarchical clustering (building a hierarchy of clusters like a family tree).
Dimensionality Reduction: Sometimes, data has too many variables, making it hard to visualise or analyse. Dimensionality reduction techniques like PCA (Principal Component Analysis) help us reduce the number of variables while keeping the most important information.
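
k-Means and PCA each get a full walkthrough later, so here is a brief sketch of the other clustering method mentioned above, hierarchical clustering. It assumes scikit-learn's AgglomerativeClustering with Ward linkage, and the six points are invented for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical 2-D points forming two well-separated groups
points = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],   # group near (1, 1)
    [8.0, 8.0], [8.1, 7.9], [7.9, 8.2],   # group near (8, 8)
])

# Agglomerative clustering starts with every point as its own cluster and
# repeatedly merges the two closest clusters. Ward linkage picks the merge
# that increases within-cluster variance the least.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(points)

print(labels)  # the first three points share one label, the last three the other
```

Which group gets label 0 and which gets label 1 is arbitrary. Only the grouping itself is meaningful.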

Practical Example: Implementing Clustering with k-Means

Let's get hands-on! We can use k-Means clustering to group customers based on their spending habits:

1. Importing Libraries

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

2. Sample data: customer spending habits

data = {
    'CustomerID': range(1, 11),
    'Annual Income (k$)': [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
    'Spending Score (1-100)': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72]
}
df = pd.DataFrame(data)

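3. Selecting features

# CustomerID is only an identifier, so it is left out of the clustering
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]

4. Standardising the data

# k-Means relies on distances, so rescale each feature to mean 0 and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)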

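5. Applying k-Means clustering

# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X_scaled)
df['Cluster'] = kmeans.labels_  # cluster index (0, 1 or 2) for each customer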

6. Plotting the clusters

plt.figure(figsize=(10, 6))
# Customers, coloured by their assigned cluster
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=df['Cluster'], cmap='viridis')
# Cluster centroids, in the same scaled coordinates
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='red', marker='X')
plt.title('Customer Clusters')
plt.xlabel('Annual Income (scaled)')
plt.ylabel('Spending Score (scaled)')
plt.show()
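
The plot shows the clusters in scaled units, which are hard to read as incomes and scores. One possible follow-up, sketched here on the same sample data, is to map the centroids back to the original units with the scaler's inverse_transform. The names features and centers are introduced only for this illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ['Annual Income (k$)', 'Spending Score (1-100)']
df = pd.DataFrame({
    'Annual Income (k$)': [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
    'Spending Score (1-100)': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
})

scaler = StandardScaler()
X_scaled = scaler.fit_transform(df[features])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
df['Cluster'] = kmeans.labels_

# Undo the standardisation so each centroid reads in k$ and score points
centers = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=features
)
print(centers.round(1))

# Number of customers assigned to each cluster
print(df['Cluster'].value_counts().sort_index())
```

Each row of centers is the average customer in that cluster, which is usually the easiest starting point for describing what a segment looks like.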

Practical Example: Dimensionality Reduction with PCA

PCA can likewise project a dataset onto fewer dimensions so that it is easier to visualise. Here, three features are reduced to two principal components:

1. Importing Libraries

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

2. Sample data: customer spending habits

data = {
    'CustomerID': range(1, 11),
    'Annual Income (k$)': [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
    'Spending Score (1-100)': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
    'Age': [25, 34, 22, 35, 40, 30, 26, 32, 28, 45]
}
df = pd.DataFrame(data)

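3. Selecting features

# CustomerID is only an identifier, so it is left out of the analysis
X = df[['Annual Income (k$)', 'Spending Score (1-100)', 'Age']]

4. Standardising the data

# PCA is sensitive to scale, so rescale each feature to mean 0 and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)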

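5. Applying PCA

# Project the three standardised features onto the two directions of greatest variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

6. Explained variance

# Fraction of the total variance captured by each principal component
explained_variance = pca.explained_variance_ratio_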

7. Plotting the results

plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1])
# Label each point with its CustomerID
for i, txt in enumerate(df['CustomerID']):
    plt.annotate(txt, (X_pca[i, 0], X_pca[i, 1]))
plt.title('PCA of Customer Data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.grid(True)
plt.show()

print(f"Explained variance by component: {explained_variance}")
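
Two follow-up questions often come up after a PCA like this one: how much information the two components keep in total, and what each component is made of. The sketch below repeats the fit on the same sample data and inspects both. The names cumulative and loadings are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

features = ['Annual Income (k$)', 'Spending Score (1-100)', 'Age']
df = pd.DataFrame({
    'Annual Income (k$)': [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
    'Spending Score (1-100)': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
    'Age': [25, 34, 22, 35, 40, 30, 26, 32, 28, 45],
})

X_scaled = StandardScaler().fit_transform(df[features])
pca = PCA(n_components=2).fit(X_scaled)

# Running total of the variance retained as components are added
cumulative = np.cumsum(pca.explained_variance_ratio_)
print("Cumulative explained variance:", cumulative.round(3))

# Loadings: each row is a component, each column the weight of an original feature
loadings = pd.DataFrame(pca.components_, columns=features, index=['PC1', 'PC2'])
print(loadings.round(2))
```

A feature with a large absolute weight in a row contributes strongly to that component. The sign of a whole row is arbitrary, so only relative signs within a component carry meaning.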

Applications and Challenges of Unsupervised Learning

Unsupervised learning unlocks a treasure trove of possibilities. It helps us segment markets, detect anomalies, compress data, and uncover hidden patterns in complex datasets. But like any adventure, there are challenges:

Choosing the right number of clusters: How many groups should we create in our k-Means example? Techniques like the Elbow Method can help us decide.
High-dimensional data: With many variables, the data becomes hard to visualise, and distances between points become less informative, which makes distance-based methods such as k-Means less reliable.
Interpretation: Making sense of the clusters and reduced dimensions requires careful analysis.
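
The Elbow Method mentioned above can be sketched as follows, assuming the same two-feature customer data as the k-Means example. The idea is to fit k-Means for a range of k values and record the inertia (the within-cluster sum of squared distances) for each. The range 1 to 6 is an arbitrary choice for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    'Annual Income (k$)': [15, 16, 17, 18, 19, 20, 21, 22, 23, 24],
    'Spending Score (1-100)': [39, 81, 6, 77, 40, 76, 6, 94, 3, 72],
})
X_scaled = StandardScaler().fit_transform(df)

k_values = range(1, 7)
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances

for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.2f}")
```

Inertia generally falls as k grows, so the goal is not the minimum but the "elbow", the point after which adding another cluster yields little further improvement. On a dataset this small the elbow may not be sharp, so treat the result as a guide rather than a definitive answer.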

With careful planning and the right techniques, unsupervised learning can be a powerful part of your data science toolkit. So, next time you look at a crowd of unlabelled data, remember: there's a hidden story waiting to be discovered!

Happy Learning!
