This post describes our CS109A final project at Harvard: a song recommender built with Spotify’s Million Playlist Dataset (MPD). The project was done together with Alpha Sanneh, Bella Gu, and Denise Yoon. A longer write-up and results are on our project website.

Many music streaming services let users create playlists—curated sets of songs that define a listening experience. A song recommender extends that experience by suggesting tracks that fit well with what’s already in the playlist. Our goal was to build such a system: given the songs in a playlist, recommend new songs that would make good additions, using the MPD.


Introduction

The Million Playlist Dataset contains millions of user-created playlists and the tracks within them. We used it to learn how songs co-occur across playlists and to predict which songs belong together. After an exploratory phase (including building a local SQLite database to query the ~12GB dataset), we focused on matrix factorization: we treat the data as a sparse playlist–song matrix and learn latent factors so we can fill in “missing” songs for a given playlist.
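To make the playlist–song matrix concrete, here is a minimal sketch of how such a sparse matrix can be assembled; the `playlists` list is a hypothetical stand-in for the parsed MPD JSON, not the actual preprocessing code from the project.

```python
# Sketch: building a sparse playlist-by-song interaction matrix.
# `playlists` is hypothetical example data standing in for parsed MPD slices.
from scipy.sparse import csr_matrix

playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_b", "song_d"],
    ["song_a", "song_d", "song_e"],
]

song_index = {}          # track id -> column index
rows, cols = [], []
for p_idx, tracks in enumerate(playlists):
    for track in tracks:
        col = song_index.setdefault(track, len(song_index))
        rows.append(p_idx)
        cols.append(col)

# Binary interaction matrix: 1 if the song appears in the playlist.
X = csr_matrix(([1] * len(rows), (rows, cols)),
               shape=(len(playlists), len(song_index)))
print(X.shape)  # → (3, 5)
```

A matrix like this is extremely sparse at MPD scale, which is exactly why a compressed sparse representation (and latent factors) are needed rather than a dense array.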


The model

We built a weighted matrix factorization model on a sparse playlist-by-song matrix. The idea is that both playlists and songs live in a lower-dimensional space; once we learn those embeddings, we can score any playlist–song pair and rank candidate songs for a playlist. The pipeline ran from parsing the raw playlists into the sparse matrix, through fitting the factorization, to ranking unseen songs by their predicted scores.
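The factorization step can be sketched with a small alternating-least-squares routine for implicit feedback, where observed playlist–song pairs get a higher confidence weight than unobserved ones. This is a toy illustration of the general technique on a dense array, not the project's actual implementation; the hyperparameters (`k`, `alpha`, `reg`) are arbitrary example values.

```python
import numpy as np

def weighted_mf(X, k=8, alpha=10.0, reg=0.1, iters=10, seed=0):
    """Minimal weighted matrix factorization via alternating least squares.

    X: dense 0/1 playlist-by-song array. Observed entries get confidence
    weight 1 + alpha, unobserved entries weight 1. Returns playlist
    factors P and song factors Q such that X is approximated by P @ Q.T.
    """
    rng = np.random.default_rng(seed)
    n_play, n_song = X.shape
    P = rng.normal(scale=0.1, size=(n_play, k))
    Q = rng.normal(scale=0.1, size=(n_song, k))
    W = 1.0 + alpha * X          # per-entry confidence weights
    I = reg * np.eye(k)          # L2 regularization term
    for _ in range(iters):
        for u in range(n_play):  # weighted least-squares solve per playlist
            Wu = np.diag(W[u])
            P[u] = np.linalg.solve(Q.T @ Wu @ Q + I, Q.T @ Wu @ X[u])
        for i in range(n_song):  # and per song
            Wi = np.diag(W[:, i])
            Q[i] = np.linalg.solve(P.T @ Wi @ P + I, P.T @ Wi @ X[:, i])
    return P, Q
```

Once fitted, `P @ Q.T` gives a score for every playlist–song pair, and recommendations for a playlist are the highest-scoring songs not already in it.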

We also explored audio features from the Spotify API (e.g. for a cosine-similarity or item-based model) on a subset of the data; the main recommender is matrix factorization on the MPD.
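For the audio-feature side experiment, similarity between two tracks reduces to cosine similarity of their feature vectors. A minimal sketch, using made-up feature values (the real Spotify audio features include fields like danceability, energy, and valence, but the numbers below are illustrative only):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical audio-feature vectors, e.g. [danceability, energy, valence]
track_a = np.array([0.8, 0.7, 0.6])
track_b = np.array([0.75, 0.65, 0.55])
print(round(cosine_sim(track_a, track_b), 3))
```

In an item-based model, a candidate song would be scored by its average similarity to the songs already in the playlist.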

Schematic of matrix factorization for collaborative filtering.
Source: RecSys Series Part 4 — The 7 Variants of Matrix Factorization (Towards Data Science).

Results

We evaluated recommendations on held-out data: for each playlist, we withheld some songs and measured how well the model recovered them. Our metric was precision@K, the proportion of the K recommended songs that are relevant—in our case, the proportion that appear among the playlist's withheld songs. For all models we set K=10, recommending 10 songs per playlist.
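The metric itself is a few lines of code. A minimal sketch with hypothetical song identifiers:

```python
def precision_at_k(recommended, held_out, k=10):
    """Fraction of the top-k recommendations that appear in the held-out songs."""
    top_k = recommended[:k]
    held = set(held_out)
    hits = sum(1 for song in top_k if song in held)
    return hits / k

# Hypothetical example: 2 of the top 5 recommendations were held out.
recs = ["s1", "s2", "s3", "s4", "s5"]
held = {"s2", "s5", "s9"}
print(precision_at_k(recs, held, k=5))  # → 0.4
```

Averaging this value over all evaluation playlists gives a single summary number per model.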

The write-up on our project website goes into more detail on metrics and examples.


Final thoughts

The project was a good exercise in working with a large, sparse dataset and in implementing a classic recommendation approach. Possible next steps we had in mind: more efficient hyperparameter search (e.g. Bayesian optimization), studying the effect of matrix sparsity on performance, and extending the engine to handle novel playlists or cold-start settings.


Links: View on GitHub · Project website (CS109A Group 31)