Building a Hollywood Movie Recommender System using Python and Flask

 

Israr Ahmad Doswi









Movie recommendation systems have become an essential part of our entertainment experience. They help us discover new movies based on our preferences, making movie nights more enjoyable and efficient. In this blog post, we'll take you through the exciting journey of creating a Hollywood movie recommender system from scratch. By utilizing the IMDB 5000 Movie Dataset and employing Python, Flask, and data preprocessing techniques, we'll develop a personalized movie recommendation engine. So, grab your popcorn, and let's get started!

 

The Dataset

Our journey begins with the IMDB 5000 Movie Dataset, which serves as the foundation for our recommender system. This dataset contains valuable information about actors, directors, genres, and more, for a variety of Hollywood movies. We'll leverage this data to create a comprehensive recommendation engine that caters to individual preferences.

 

Step 1: Data Preprocessing

 

The first step is data preprocessing, where we clean and prepare the dataset for analysis. We have a dedicated notebook, preprocessing.ipynb, where we fill in missing values, format genres, and convert movie titles to lowercase for uniformity. We performed these cleaning tasks using the Pandas library and saved the cleaned data in a CSV file named data.csv, which is used for further processing.

 

import numpy as np

import pandas as pd

data = pd.read_csv('movie_metadata.csv')

 

# Selecting relevant columns for recommendation

data = data.loc[:, ['actor_1_name', 'actor_2_name', 'actor_3_name', 'director_name', 'genres', 'movie_title']]

 

# Handling missing values

data['actor_1_name'] = data['actor_1_name'].replace(np.nan, 'unknown')

data['actor_2_name'] = data['actor_2_name'].replace(np.nan, 'unknown')

data['actor_3_name'] = data['actor_3_name'].replace(np.nan, 'unknown')

data['director_name'] = data['director_name'].replace(np.nan, 'unknown')

 

# Formatting genres

data['genres'] = data['genres'].str.replace('|', ' ', regex=False)

 

# Lowercasing movie titles and dropping the trailing character from each title

data['movie_title'] = data['movie_title'].str.lower()

data['movie_title'] = data['movie_title'].str[:-1]

 

# Saving cleaned data to CSV

data.to_csv('data.csv', index=False)
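To make the cleaning steps concrete, here is a minimal sketch on a made-up two-row frame. The rows are illustrative and not taken from the dataset, and the trailing non-breaking space on each toy title is an assumption made for the example. It shows that `Series.replace` only swaps cells whose whole value equals the pattern, while `Series.str.replace` with `regex=False` replaces the literal `|` inside each string.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for movie_metadata.csv (values are illustrative only)
toy = pd.DataFrame({
    'director_name': ['James Cameron', np.nan],
    'genres': ['Action|Adventure|Sci-Fi', 'Comedy|Drama'],
    'movie_title': ['Avatar\xa0', 'Some Movie\xa0'],  # assumed trailing non-breaking space
})

# Missing names become the placeholder 'unknown'
toy['director_name'] = toy['director_name'].fillna('unknown')

# Whole-value replace: no cell is exactly '|', so nothing changes
whole_value = toy['genres'].replace('|', ' ')

# Substring replace: every literal '|' becomes a space
split_genres = toy['genres'].str.replace('|', ' ', regex=False)

# Lowercase and trim surrounding whitespace (including non-breaking spaces)
titles = toy['movie_title'].str.lower().str.strip()

print(whole_value.tolist())
print(split_genres.tolist())
print(titles.tolist())
```

Trimming with `str.strip()` is a swapped-in alternative to slicing off the last character. It only removes whitespace, so it is safe on titles that have no trailing character to drop.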

 

Step 2: Creating the Similarity Matrix

 

To recommend movies based on their similarities, we need a similarity matrix. We've written a script called create.py that reads the preprocessed data, combines each movie's actors, director, and genres into a single string, builds a count matrix from those strings, and computes pairwise cosine similarity with scikit-learn. The resulting matrix holds a similarity score for every pair of movies, enabling us to suggest movies with similar characteristics.

 

import pandas as pd

import numpy as np

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.metrics.pairwise import cosine_similarity

 

data = pd.read_csv('data.csv')

 

data['comb'] = data['actor_1_name'] + ' ' + data['actor_2_name'] + ' ' + data['actor_3_name'] + ' ' + data['director_name'] + ' ' + data['genres']

 

cv = CountVectorizer()

count_matrix = cv.fit_transform(data['comb'])

similarity_matrix = cosine_similarity(count_matrix)

 

np.save('similarity_matrix.npy', similarity_matrix)
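To get a feel for what the scores mean, here is a small sketch on three made-up combined strings. The names and genres are placeholders, not rows from the dataset. Two movies that share two of their four tokens end up with a cosine similarity of 0.5, and movies with no tokens in common score 0.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three made-up 'comb' strings (names and genres are placeholders)
comb = [
    'alice bob action thriller',
    'alice carol action drama',
    'dave erin romance comedy',
]

cv = CountVectorizer()
counts = cv.fit_transform(comb)   # sparse matrix, one row per movie
sim = cosine_similarity(counts)   # dense 3 x 3 matrix of pairwise scores

print(sim.round(2))
```

Note that CountVectorizer's default tokenizer lowercases the text and splits it into words, so a full name such as "James Cameron" is counted as two separate tokens.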

 

Step 3: Building the Recommender Engine with Flask

 

With the similarity matrix in place, we move on to building the actual recommendation engine. Our main script, main.py, loads the preprocessed data and the similarity matrix. We employ the Flask web framework to create a user-friendly interface. Users can input a movie title, and our system will process this input to generate a list of recommended movies. The recommendations are based on the movie's similarity score compared to other movies in the dataset.

 

import pandas as pd

import numpy as np

import pickle

from flask import Flask, render_template, request

from sklearn.feature_extraction.text import CountVectorizer

from sklearn.metrics.pairwise import cosine_similarity

 

# Load the preprocessed data and the precomputed similarity matrix

df = pd.read_csv('data.csv')

similarity = np.load('similarity_matrix.npy')

 

# ... (code for data preprocessing)

 

app = Flask(__name__)

 

# ... (code for Flask routes and app setup)

 

# Recommender function

def rec(movie_title):

    # ... (code for generating recommendations)

 

# ... (code for Flask routes and recommendation page)

 

if __name__ == '__main__':

    app.run(debug=False, host='0.0.0.0')
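The body of the recommender function is elided above. As a rough illustration of how such a function could work, here is a minimal sketch, not the original implementation. The helper name `recommend`, its parameters, and the toy data are all assumptions made for this example: it looks up the row for the requested title, sorts the other movies by their similarity score, and returns the top matches.

```python
import numpy as np
import pandas as pd

def recommend(title, df, similarity, top_n=10):
    """Return up to top_n titles most similar to `title` (hypothetical helper)."""
    title = title.strip().lower()
    # Positional indices of rows whose title matches the query
    positions = np.flatnonzero((df['movie_title'] == title).to_numpy())
    if len(positions) == 0:
        return []  # title not in the dataset
    idx = positions[0]
    # Sort all movies by descending similarity, then drop the movie itself
    order = np.argsort(-similarity[idx], kind='stable')
    order = [i for i in order if i != idx][:top_n]
    return df['movie_title'].iloc[order].tolist()

# Toy data standing in for data.csv and similarity_matrix.npy
toy_df = pd.DataFrame({'movie_title': ['movie a', 'movie b', 'movie c']})
toy_sim = np.array([
    [1.0, 0.5, 0.1],
    [0.5, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])

print(recommend('Movie A', toy_df, toy_sim, top_n=2))
```

Returning an empty list for an unknown title leaves it to the caller to decide what to show, for example a "movie not found" message on the results page.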

 

 

Step 4: Deployment and User Interaction

To make our recommender system accessible to a wider audience, we've deployed it as a web application using Flask. Users can access the web application via the link provided: Movie Recommender System. Upon landing on the homepage, users can input the title of a Hollywood movie they've enjoyed or are curious about. Our system then returns a list of recommended movies, ranked by how similar their actors, directors, and genres are to those of the input movie.
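The interaction described above can be sketched as a single Flask route. This is an illustration under stated assumptions rather than the deployed app: the `/recommend` URL, the `title` query parameter, the JSON response, and the canned `recommend` stub are all hypothetical, and a real app would more likely read a form field and render a template.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def recommend(title):
    """Placeholder for the real recommender; returns canned titles."""
    catalogue = {'movie a': ['movie b', 'movie c']}
    return catalogue.get(title.strip().lower(), [])

@app.route('/recommend')
def recommend_route():
    # Read the title from the query string, e.g. /recommend?title=Movie+A
    title = request.args.get('title', '')
    results = recommend(title)
    if not results:
        return jsonify(error='Movie not found in the dataset'), 404
    return jsonify(title=title, recommendations=results)

# Exercise the route without starting a server
with app.test_client() as client:
    response = client.get('/recommend', query_string={'title': 'Movie A'})
    print(response.status_code, response.get_json())
```

Flask's built-in test client makes it possible to check the route's behavior before deploying anything.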

 

Conclusion

 

Congratulations! You've successfully built a Hollywood movie recommender system using Python, Flask, and the IMDB 5000 Movie Dataset. From data preprocessing to deploying a user-friendly web application, you've explored every step of the process. Now, movie nights will be more enjoyable and hassle-free, thanks to your personalized movie recommendations. Feel free to explore the code snippets in the GitHub repository linked below, and get ready for a cinematic journey tailored to your preferences!

 

GitHub Repository: Movie Recommender System GitHub Repo

 

With your new recommender system in place, you're all set to embark on a cinematic adventure like never before. Enjoy discovering hidden gems and rewatching your favorites with the power of data-driven movie recommendations at your fingertips!
