Building a Hollywood Movie Recommender System using Python and Flask

by Israr Ahmad - December 21, 2021

Movie recommendation systems have become an essential part of our entertainment experience. They help us discover new movies based on our preferences, making movie nights more enjoyable and efficient. In this blog post, we'll take you through the exciting journey of creating a Hollywood movie recommender system from scratch. By utilizing the IMDB 5000 Movie Dataset and employing Python, Flask, and data preprocessing techniques, we'll develop a personalized movie recommendation engine. So, grab your popcorn, and let's get started!

The Dataset

Our journey begins with the IMDB 5000 Movie Dataset, which serves as the foundation for our recommender system. This dataset contains valuable information about actors, directors, genres, and more, for a variety of Hollywood movies. We'll leverage this data to create a comprehensive recommendation engine that caters to individual preferences.

Step 1: Data Preprocessing

The first step is data preprocessing, where we clean and prepare the dataset for analysis. A dedicated notebook, preprocessing.ipynb, handles missing values, formats the genres column, and converts movie titles to lowercase for uniformity. We perform these cleaning tasks with the Pandas library and save the cleaned data to a CSV file named data.csv, which is used for further processing.

import numpy as np
import pandas as pd
data = pd.read_csv('movie_metadata.csv')

# Selecting relevant columns for recommendation
data = data.loc[:, ['actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name', 'genres', 'movie_title']]

# Handling missing values
data['actor_1_name'] = data['actor_1_name'].replace(np.nan, 'unknown')
data['actor_2_name'] = data['actor_2_name'].replace(np.nan, 'unknown')
data['actor_3_name'] = data['actor_3_name'].replace(np.nan, 'unknown')
data['director_name'] = data['director_name'].replace(np.nan, 'unknown')

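# Formatting genres: replace the '|' separator with a space
# (.str.replace works on substrings; plain .replace only matches whole values)
data['genres'] = data['genres'].str.replace('|', ' ', regex=False)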

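# Lowercasing movie titles
data['movie_title'] = data['movie_title'].str.lower()
# Dropping the last character of each title (assumed to be a stray trailing character in the raw CSV)
data['movie_title'] = data['movie_title'].str[:-1]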

# Saving cleaned data to CSV
data.to_csv('data.csv', index=False)
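
The genre formatting step depends on substring replacement, and in pandas that is a different method from whole-value replacement. The following is a minimal sketch on a made-up two-row frame (the sample values are illustrative, not taken from the dataset) showing how `Series.replace` and `Series.str.replace` behave differently:

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({'genres': ['Action|Adventure|Sci-Fi', 'Comedy|Romance']})

# Whole-value replacement: no cell is exactly '|', so nothing changes
whole_value = sample['genres'].replace('|', ' ')

# Substring replacement: every '|' inside each string becomes a space
substring = sample['genres'].str.replace('|', ' ', regex=False)

print(whole_value.tolist())
print(substring.tolist())
```

Passing `regex=False` keeps `|` from being read as the regular-expression alternation operator.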

Step 2: Creating the Similarity Matrix

To recommend movies based on their similarities, we need a similarity matrix. We've written a script called create.py that reads the preprocessed data, combines each movie's actors, director, and genres into a single text field, builds a count matrix from that field, and computes pairwise cosine similarity using scikit-learn. The resulting matrix holds a similarity score for every pair of movies, which lets us suggest movies with similar characteristics.

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

data = pd.read_csv('data.csv')

data['comb'] = data['actor_1_name'] + ' ' + data['actor_2_name'] + ' ' + data['actor_3_name'] + ' ' + data['director_name'] + ' ' + data['genres']

cv = CountVectorizer()
count_matrix = cv.fit_transform(data['comb'])
similarity_matrix = cosine_similarity(count_matrix)

np.save('similarity_matrix.npy', similarity_matrix)
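
To get a feel for what the similarity scores mean, here is a small sketch on four made-up combined strings (the names and genres are placeholders, not rows from the dataset). It uses the same `CountVectorizer` and `cosine_similarity` calls as the script above:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical 'comb' strings, for illustration only
toy = pd.Series([
    'alice bob action',     # movie 0
    'alice bob action',     # movie 1: same tokens as movie 0
    'carol dave comedy',    # movie 2: no tokens in common with movie 0
    'alice dave action',    # movie 3: shares two of three tokens with movie 0
])

counts = CountVectorizer().fit_transform(toy)  # one row per movie, one column per token
sim = cosine_similarity(counts)                # 4 x 4 matrix of pairwise scores

print(sim.round(2))
```

Identical token sets score 1, disjoint sets score 0, and partial overlap falls in between. Note that with its default settings `CountVectorizer` splits text into individual words, so a first name and a last name count as separate tokens; two different actors who share a first name will contribute a small amount of similarity.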

Step 3: Building the Recommender Engine with Flask

With the similarity matrix in place, we move on to building the actual recommendation engine. Our main script, main.py, loads the preprocessed data and the similarity matrix. We employ the Flask web framework to create a user-friendly interface. Users can input a movie title, and our system will process this input to generate a list of recommended movies. The recommendations are based on the movie's similarity score compared to other movies in the dataset.

import pandas as pd
import numpy as np
import pickle
from flask import Flask, render_template, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Load the cleaned data and the precomputed similarity matrix
df = pd.read_csv('data.csv')
similarity = np.load('similarity_matrix.npy')

# ... (code for data preprocessing)

app = Flask(__name__)

# ... (code for Flask routes and app setup)

# Recommender function
def rec(movie_title):
    # ... (code for generating recommendations)
    pass

# ... (code for Flask routes and recommendation page)

if __name__ == '__main__':
    app.run(debug=False, host='0.0.0.0')
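
The body of the recommender function is elided above. As a rough idea of how a lookup against the similarity matrix could work, here is a minimal sketch. The function name `recommend`, the `top_n` parameter, and the toy data are all assumptions made for illustration and are not taken from main.py:

```python
import numpy as np
import pandas as pd

def recommend(title, df, similarity, top_n=10):
    """Return up to top_n titles most similar to `title`, or [] if it is unknown."""
    title = title.strip().lower()
    # Positions of rows whose title matches the query
    matches = np.flatnonzero((df['movie_title'] == title).to_numpy())
    if len(matches) == 0:
        return []
    idx = matches[0]
    # Rank all movies by similarity to the query, highest first
    order = np.argsort(-similarity[idx], kind='stable')
    # Drop the query movie itself, then keep the top_n
    order = [i for i in order if i != idx][:top_n]
    return df['movie_title'].iloc[order].tolist()

# Toy data standing in for data.csv and similarity_matrix.npy
toy_df = pd.DataFrame({'movie_title': ['a', 'b', 'c', 'd']})
toy_sim = np.array([
    [1.0, 0.9, 0.1, 0.5],
    [0.9, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

print(recommend('A ', toy_df, toy_sim, top_n=2))
print(recommend('not in the data', toy_df, toy_sim))
```

Returning an empty list for an unknown title leaves it to the caller (for example, a Flask route) to decide what message to show.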

Step 4: Deployment and User Interaction

To make our recommender system accessible to a wider audience, we've deployed it as a web application using Flask. Users can access the web application via the link provided: Movie Recommender System. Upon landing on the homepage, users can input the title of a Hollywood movie they've enjoyed or are curious about. Our system will then generate a list of recommended movies based on the input, providing both similar and diverse options to cater to different preferences.
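
The interaction described above could be wired up roughly as follows. This is only a sketch under stated assumptions: the route, the `movie` form field, the inline template, and the placeholder `recommend` function with canned results are all hypothetical, and the actual app may be organized differently (for example, with separate template files):

```python
from flask import Flask, request, render_template_string

app = Flask(__name__)

# Hypothetical inline template, used here so the sketch is self-contained
PAGE = """
<form method="post">
  <input name="movie" placeholder="Movie title">
  <button type="submit">Recommend</button>
</form>
{% if message %}<p>{{ message }}</p>{% endif %}
<ul>{% for title in results %}<li>{{ title }}</li>{% endfor %}</ul>
"""

def recommend(title):
    """Placeholder for the real recommender; returns canned, made-up data."""
    catalog = {'some movie': ['first suggestion', 'second suggestion']}
    return catalog.get(title.strip().lower(), [])

@app.route('/', methods=['GET', 'POST'])
def home():
    results, message = [], None
    if request.method == 'POST':
        # Read the submitted title and look up recommendations
        title = request.form.get('movie', '')
        results = recommend(title)
        if not results:
            message = 'Sorry, that movie is not in the dataset.'
    return render_template_string(PAGE, results=results, message=message)
```

Handling the "not found" case explicitly matters here because titles are matched as text, so a typo or a movie outside the dataset would otherwise produce an empty page.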

Conclusion

Congratulations! You've successfully built a Hollywood movie recommender system using Python, Flask, and the IMDB 5000 Movie Dataset. From data preprocessing to deploying a user-friendly web application, you've explored every step of the process. Now, movie nights will be more enjoyable and hassle-free, thanks to your personalized movie recommendations. Feel free to explore the code snippets in the GitHub repository linked below, and get ready for a cinematic journey tailored to your preferences!

GitHub Repository: Movie Recommender System GitHub Repo

With your new recommender system in place, you're all set to embark on a cinematic adventure like never before. Enjoy discovering hidden gems and rewatching your favorites with the power of data-driven movie recommendations at your fingertips!
