Building a Movie Recommendation System with Streamlit and Python

Hey fellow developers! Today, I'm excited to share a cool project I've been working on: a movie recommendation system built with Python and Streamlit. This system suggests movies based on a user's favorite film, making it a fun way to discover new movies to watch. Let's dive into how it works!

The Tech Stack

For this project, we're using:

Python
Streamlit for the web interface
pandas for data handling
scikit-learn for text processing and similarity calculations
TMDb API for fetching movie posters

How It Works

Data Loading: We start by loading movie data from a CSV file using pandas.
Feature Engineering: We combine several movie features (genres, director, tagline, keywords, cast) into a single string for each movie.
Text Vectorization: Using TfidfVectorizer from scikit-learn, we convert our text data into numerical feature vectors.
Similarity Calculation: We use cosine similarity to calculate how similar movies are to each other based on their feature vectors.
User Input: Through the Streamlit interface, users can input their favorite movie and choose how many recommendations they want.
Recommendation Generation: We find the closest match to the user's input, then use our similarity matrix to find and display the most similar movies.
Movie Posters: To make our app more visually appealing, we fetch movie posters from the TMDb API.
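
To get a feel for the vectorization and similarity steps before touching the real dataset, here's a small sketch on toy data. The three feature strings below are invented for illustration (they are not rows from movies.csv), and the variable names are my own.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical combined-feature strings, made up for this example
toy_features = [
    "action space adventure alien pilot",  # movie 0
    "action space war alien soldier",      # movie 1
    "romance comedy wedding paris",        # movie 2
]

vectorizer = TfidfVectorizer()
toy_vectors = vectorizer.fit_transform(toy_features)  # one sparse row per movie

# Entry [i, j] is the cosine similarity between movie i and movie j
toy_similarity = cosine_similarity(toy_vectors)

print(toy_similarity.shape)  # (3, 3)
print(toy_similarity.round(2))
```

Movies 0 and 1 share several words, so they score well above zero against each other, while movie 2 shares no words with either and scores zero. Each movie has a similarity of 1 with itself, which is why the recommendation step needs to skip the input movie.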

The Code

Here's a breakdown of the main components:

import streamlit as st
import pandas as pd
import numpy as np
import difflib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import requests

# Function to fetch movie posters
def fetch_movie_poster(movie_title):
    # ... (implementation details)
    pass  # placeholder so the stub is valid Python

# Load and preprocess data
movies_data = pd.read_csv('movies.csv')
selected_features = ['genres', 'director', 'tagline', 'keywords', 'cast']

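# Replace missing values with empty strings; otherwise a single NaN
# turns the whole combined string into NaN and the vectorizer fails
for feature in selected_features:
    movies_data[feature] = movies_data[feature].fillna('')

# Combine features and vectorize
combined_features = movies_data['genres'] + ' ' + movies_data['director'] + ' ' + movies_data['tagline'] + ' ' + movies_data['cast'] + ' ' + movies_data['keywords']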
vectorizer = TfidfVectorizer()
feature_vector = vectorizer.fit_transform(combined_features)

# Calculate similarity
similarity = cosine_similarity(feature_vector)

# Streamlit UI
st.title('Movie Recommendation System')
movie_name = st.text_input('Enter the name of your favorite movie:')
num_recommendations = st.slider('How many recommendations would you like?', min_value=1, max_value=30, value=10)

# Generate and display recommendations
if movie_name:
    # ... (recommendation logic)
    pass  # placeholder so the snippet is valid Python
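
The recommendation logic is elided above, so here's a minimal sketch of one way it could work, based on the steps described earlier: fuzzy-match the input with difflib, then rank by similarity. The function name recommend, its parameters, and the toy titles and scores are all hypothetical. In the app, titles might come from something like movies_data['title'].tolist(), which assumes the CSV has a title column.

```python
import difflib

def recommend(movie_name, titles, similarity, num_recommendations=10):
    """Return up to num_recommendations titles most similar to movie_name."""
    # Fuzzy-match the user's text against the known titles
    matches = difflib.get_close_matches(movie_name, titles, n=1)
    if not matches:
        return []
    index = titles.index(matches[0])

    # Pair every movie index with its similarity to the matched movie
    scores = list(enumerate(similarity[index]))
    scores.sort(key=lambda pair: pair[1], reverse=True)

    # Drop the matched movie itself, then keep the top results
    return [titles[i] for i, _ in scores if i != index][:num_recommendations]

# Toy data, invented for illustration
titles = ["Alien", "Aliens", "Amelie"]
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
print(recommend("alien", titles, similarity, num_recommendations=2))
# ['Aliens', 'Amelie']
```

Returning an empty list when nothing matches lets the UI show a friendly "movie not found" message instead of crashing on an index error.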

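The poster function is also left as a stub, so treat the following as a rough sketch and not the actual implementation. It assumes the TMDb v3 movie search endpoint and the w500 image size. The api_key parameter and the injectable get argument (handy for testing without network access) are my additions and not part of the original signature.

```python
import requests

TMDB_SEARCH_URL = "https://api.themoviedb.org/3/search/movie"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"

def fetch_movie_poster(movie_title, api_key, get=requests.get):
    """Return a poster URL for movie_title, or None if no poster is found."""
    response = get(
        TMDB_SEARCH_URL,
        params={"api_key": api_key, "query": movie_title},
        timeout=10,
    )
    response.raise_for_status()
    results = response.json().get("results", [])

    # Use the first search hit that actually has a poster
    for movie in results:
        poster_path = movie.get("poster_path")
        if poster_path:
            return TMDB_IMAGE_BASE + poster_path
    return None
```

In the Streamlit app, a non-None result can be passed straight to st.image. You'll need your own TMDb API key for this to work.
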
Running the App

To run this app, install the required libraries (streamlit, pandas, scikit-learn, and requests) and make sure you have a movies.csv file containing the columns used above. Then, simply run:

streamlit run your_script_name.py

Future Improvements

There are several ways this system could be improved:

Implement user accounts to track viewing history and improve recommendations over time.
Add more data sources to get a broader range of movies and more detailed information.
Incorporate collaborative filtering to consider user ratings and preferences.
Optimize the similarity calculation for larger datasets.
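
On that last point, the full similarity matrix has one entry per pair of movies, so it grows quadratically with the dataset. One possible optimization, sketched here on made-up data, is to compute only the row we need for the selected movie. The helper name top_similar and the toy strings are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def top_similar(feature_vector, index, k):
    """Return indices of the k rows most similar to row `index`, excluding itself."""
    # Compare a single row against all rows: O(N) work instead of O(N^2)
    scores = cosine_similarity(feature_vector[index:index + 1], feature_vector).ravel()
    ranked = scores.argsort()[::-1]  # highest similarity first
    return [int(i) for i in ranked if i != index][:k]

# Toy feature strings, invented for illustration
docs = [
    "action space adventure alien pilot",
    "action space war alien soldier",
    "romance comedy wedding paris",
]
vectors = TfidfVectorizer().fit_transform(docs)
print(top_similar(vectors, 0, 1))  # [1]
```

The TF-IDF matrix stays sparse in memory and each request only costs one row of comparisons, which should scale better than precomputing every pair.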

Conclusion

Building this movie recommendation system was a fun way to combine data science concepts with web development. It's a great starting point for more complex recommendation systems and showcases the power of Python libraries like scikit-learn and Streamlit.

I hope you found this interesting! Feel free to try it out, modify the code, and let me know if you have any questions or suggestions for improvements.

Happy coding! 🎬🍿
