
license: apache-2.0


Model Card: Profile-Based Movie Recommendation Model

Model Overview

This is a profile-based movie recommendation model that recommends movies based on user demographics and genre preferences. It was trained on the MovieLens 1M dataset. Users are first grouped into profiles by clustering their demographic attributes and preferred genres; the model then learns an embedding for each profile and each movie and scores movies for a profile by comparing these embeddings.

Model Architecture

The model is built with TensorFlow and Keras and uses an embedding-based architecture:

User Profiles and Clustering: Users are grouped into a fixed number of profiles (10 for this model) by KMeans clustering on their age, occupation, gender, and preferred movie genres. Each user's cluster label serves as their profile ID.
Embedding Layers:
- Each profile ID is mapped to a trainable 64-dimensional embedding vector.
- Each movie ID is mapped to a trainable embedding vector of the same dimension, so profile and movie vectors live in a shared space and can be compared directly.
Dot Product for Recommendation: The model scores a (profile, movie) pair as the dot product of the profile embedding and the movie embedding. A higher score means the movie is predicted to be more relevant for that profile.
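A minimal Keras sketch of the architecture described above, for illustration only; it is not the released model's code. The function name, input/layer names, and the default `num_movies=3953` (assuming raw MovieLens 1M movie IDs, which range from 1 to 3952, are used directly as embedding indices) are assumptions.

```python
import numpy as np
from tensorflow import keras


def build_model(num_profiles: int = 10, num_movies: int = 3953, embedding_dim: int = 64) -> keras.Model:
    """Hypothetical sketch: two embedding tables scored by a dot product."""
    # One integer ID per example, shape (batch, 1).
    profile_in = keras.Input(shape=(1,), dtype="int32", name="profile_id")
    movie_in = keras.Input(shape=(1,), dtype="int32", name="movie_id")

    # Trainable lookup tables with the same output dimension, flattened to (batch, dim).
    profile_vec = keras.layers.Flatten()(
        keras.layers.Embedding(num_profiles, embedding_dim, name="profile_embedding")(profile_in)
    )
    movie_vec = keras.layers.Flatten()(
        keras.layers.Embedding(num_movies, embedding_dim, name="movie_embedding")(movie_in)
    )

    # Dot product over the embedding axis gives one score per (profile, movie) pair.
    score = keras.layers.Dot(axes=1)([profile_vec, movie_vec])
    return keras.Model(inputs=[profile_in, movie_in], outputs=score)
```

Calling `build_model().summary()` shows the two embedding tables feeding a single `Dot` layer; there are no hidden layers, so each score is exactly the inner product of two learned vectors.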

Training Dataset

The model was trained on the MovieLens 1M dataset from GroupLens, which contains 1,000,209 ratings (whole stars on a 1-5 scale) from 6,040 users on approximately 3,900 movies.

Users: Contains demographic information such as age, gender, and occupation.
Ratings: Provides ratings from users for different movies.
Movies: Includes movie titles and genres (e.g., Action, Comedy, Romance).
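The dataset ships as three `::`-separated text files (`users.dat`, `movies.dat`, `ratings.dat`). A minimal pandas loader, assuming the standard `ml-1m` directory layout; the function name and column names are illustrative choices, not part of the dataset:

```python
import os
import pandas as pd


def _read_dat(path, columns, **kwargs):
    # Multi-character separators require the Python parsing engine;
    # movies.dat contains non-UTF-8 titles, so latin-1 is used.
    return pd.read_csv(path, sep="::", engine="python", names=columns,
                       encoding="latin-1", **kwargs)


def load_movielens_1m(data_dir: str):
    """Hypothetical loader returning (users, movies, ratings) DataFrames."""
    users = _read_dat(os.path.join(data_dir, "users.dat"),
                      ["user_id", "gender", "age", "occupation", "zip"],
                      dtype={"zip": str})  # keep leading zeros in zip codes
    movies = _read_dat(os.path.join(data_dir, "movies.dat"),
                       ["movie_id", "title", "genres"])
    ratings = _read_dat(os.path.join(data_dir, "ratings.dat"),
                        ["user_id", "movie_id", "rating", "timestamp"])
    return users, movies, ratings
```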

Dataset Preparation

Preprocessing:
- Age, occupation, and gender were one-hot encoded.
- Each user's genre preferences were derived from their top-rated genres. Because a movie can belong to several genres (stored as a pipe-separated string such as Action|Comedy), the genre field was split and exploded into one row per genre so each genre could be assigned individually.
Clustering: Users were clustered into 10 profiles with KMeans on the combined demographic and genre features.
Embedding Preparation: Profile IDs and Movie IDs were prepared for embedding layers.
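The preparation steps above can be sketched end to end with pandas and scikit-learn. This is a hypothetical reconstruction, not the author's pipeline: defining a "top-rated genre" as one of the user's `top_k=3` genres by mean rating, the unscaled 0/1 feature matrix, and the KMeans settings are all assumptions. It expects DataFrames with the column names `user_id`, `gender`, `age`, `occupation`, `movie_id`, `genres`, and `rating`.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def build_profiles(users, movies, ratings, n_profiles=10, top_k=3, seed=0):
    """Hypothetical sketch: cluster users into profiles and label every rating."""
    # One-hot encode demographics; columns are named like "gender_F", "age_25".
    demo = pd.get_dummies(
        users.set_index("user_id")[["gender", "age", "occupation"]].astype(str),
        dtype=float,
    )

    # Split pipe-separated genres and explode to one row per (movie, genre).
    genres = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")
    rated = ratings.merge(genres[["movie_id", "genre"]], on="movie_id")

    # ASSUMPTION: a user's top genres are the top_k genres by mean rating.
    mean_by_genre = rated.groupby(["user_id", "genre"])["rating"].mean().reset_index()
    top = (
        mean_by_genre.sort_values(["user_id", "rating"], ascending=[True, False])
        .groupby("user_id")
        .head(top_k)
    )
    genre_flags = pd.crosstab(top["user_id"], top["genre"]).astype(float)

    # Users without ratings get all-zero genre flags.
    features = demo.join(genre_flags, how="left").fillna(0.0)

    kmeans = KMeans(n_clusters=n_profiles, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(features.to_numpy())
    profile_of_user = pd.Series(labels, index=features.index, name="profile_id")

    # Attach each user's profile ID to their ratings -> (profile_id, movie_id, rating) rows.
    train = ratings.join(profile_of_user, on="user_id")
    return kmeans, features.columns, profile_of_user, train
```

Returning the fitted `kmeans` and the feature column order matters in practice: a new user has to be encoded with exactly the same columns before `kmeans.predict` can assign them a profile.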

Training Configuration

Optimizer: Adam
Loss Function: Mean Squared Error (MSE)
Metric: Mean Absolute Error (MAE)
Epochs: 10
Batch Size: 256
Embedding Dimension: 64
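A sketch of how the configuration above maps onto Keras `compile` and `fit`. It assumes the regression target is the 1-5 star rating (implied by the MSE loss but not stated explicitly in this card) and rebuilds a small dot-product model inline so the block is self-contained; the function name is hypothetical.

```python
import numpy as np
from tensorflow import keras


def train_model(profile_ids, movie_ids, ratings, num_profiles, num_movies,
                embedding_dim=64, epochs=10, batch_size=256):
    """Hypothetical sketch: fit a dot-product model with Adam, MSE loss, and MAE."""
    p_in = keras.Input(shape=(1,), dtype="int32")
    m_in = keras.Input(shape=(1,), dtype="int32")
    p_vec = keras.layers.Flatten()(keras.layers.Embedding(num_profiles, embedding_dim)(p_in))
    m_vec = keras.layers.Flatten()(keras.layers.Embedding(num_movies, embedding_dim)(m_in))
    model = keras.Model([p_in, m_in], keras.layers.Dot(axes=1)([p_vec, m_vec]))

    # Optimizer, loss, and metric as listed in the training configuration.
    model.compile(optimizer=keras.optimizers.Adam(), loss="mse", metrics=["mae"])
    history = model.fit(
        [profile_ids.reshape(-1, 1), movie_ids.reshape(-1, 1)],
        ratings.astype("float32"),  # ASSUMPTION: target is the star rating
        epochs=epochs,
        batch_size=batch_size,
        verbose=0,
    )
    return model, history
```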

Intended Use

This model is intended to recommend movies at the level of user profile clusters. Because profiles and movies are embedded in a shared space, the recommendations for a profile are the movies whose embeddings score highest against that profile's embedding.
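Ranking a whole catalog for one profile amounts to a matrix-vector product followed by a top-k selection. A NumPy sketch, assuming the profile and movie embedding matrices have already been extracted from the trained model (for example with `model.get_layer(name).get_weights()[0]`; the layer names in the released model are not documented here). The `exclude` argument, for skipping movies a user has already rated, is an illustrative addition.

```python
import numpy as np


def recommend(profile_emb, movie_emb, profile_id, k=10, exclude=()):
    """Hypothetical sketch: return the top-k movie indices and scores for one profile."""
    # One score per movie: dot product of every movie vector with the profile vector.
    scores = movie_emb @ profile_emb[profile_id]
    if len(exclude):
        scores[list(exclude)] = -np.inf  # never recommend excluded movies
    # argsort is ascending, so reverse it and keep the first k indices.
    top = np.argsort(scores)[::-1][:k]
    return top, scores[top]
```

For example, with `P = np.array([[1.0, 0.0], [0.0, 1.0]])` and `M = np.array([[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.0]])`, `recommend(P, M, profile_id=0, k=2)` returns indices `[3, 1]` with scores `[0.9, 0.8]`.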

Use Cases

Personalized Movie Recommendations: On streaming platforms, this model can serve as a recommendation engine that suggests movies based on a user's demographics and the genres they have rated highly.
User Segmentation: The user clusters, built from demographic and genre preferences, can also be used on their own for audience analysis and targeted advertising.

Limitations

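Cold Start Problem: New users with few or no ratings have no reliable genre preferences, so their profile assignment rests mostly on demographics. Movies with few or no ratings in the training data have poorly trained embeddings and may be scored unreliably.
Demographic Constraints: Recommendations depend heavily on demographic data. Because every user in a profile shares the same profile embedding, all users in a profile receive identical scores, so preferences that differ within a cluster are not captured.
Genre Limitation: Genre preferences are derived from past ratings and may not reflect a user's evolving interests.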

How to Use

To use this model, you'll need:

Profile ID: Identify or calculate the user’s profile ID based on demographics and genre preferences.
Movie ID: Specify the movie IDs you want to score for a particular profile.
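To compute a profile ID for a user, the user has to be encoded with the same feature columns the clustering was fitted on and passed to the fitted KMeans model. The sketch below assumes those clustering artifacts (the fitted `KMeans` object and the ordered feature column names, using `pd.get_dummies`-style names such as `gender_F`) were saved during training; this card does not say whether they are distributed with the model, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def assign_profile(kmeans, feature_columns, gender, age, occupation, top_genres):
    """Hypothetical sketch: encode one user like the training data and return a profile ID."""
    row = pd.Series(0.0, index=feature_columns)
    # Set the one-hot demographic flags; categories unseen in training are skipped.
    for name in (f"gender_{gender}", f"age_{age}", f"occupation_{occupation}"):
        if name in row.index:
            row[name] = 1.0
    # Flag preferred genres; a brand-new user may pass an empty list (cold start).
    for genre in top_genres:
        if genre in row.index:
            row[genre] = 1.0
    return int(kmeans.predict(row.to_numpy().reshape(1, -1))[0])
```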

from tensorflow import keras
import numpy as np

# Load the trained model
model = keras.models.load_model("profile_based_recommendation_model.keras")

# Example: score movies 10, 50, and 100 for a user whose profile ID is 3.
# The model scores one (profile ID, movie ID) pair per row, so the profile ID
# is repeated to match the number of movies.
movie_ids = np.array([10, 50, 100])
profile_ids = np.full_like(movie_ids, 3)

# Predict scores (one score per row)
predictions = model.predict([profile_ids, movie_ids])

# Display the predicted score for each movie
for movie_id, score in zip(movie_ids, predictions.flatten()):
    print(f"Movie ID: {movie_id}, Predicted Score: {score:.3f}")

Dataset Citation

If you use this model or the dataset, please cite the MovieLens dataset as follows:

@article{harper2015movielens,
  title={The MovieLens datasets: History and context},
  author={Harper, F Maxwell and Konstan, Joseph A},
  journal={ACM Transactions on Interactive Intelligent Systems (TIIS)},
  volume={5},
  number={4},
  pages={1--19},
  year={2015},
  publisher={ACM New York, NY, USA}
}

Acknowledgments

Thanks to GroupLens Research for providing the MovieLens dataset and the open-source tools that make it accessible for research purposes.
