Princeton University Library Catalog

Predicting Netflix Movie Ratings using a Topic Modeling Algorithm

Zhu, Michael
Senior thesis
Arora, Sanjeev
Singer, Amit
Princeton University. Department of Mathematics
Class year:
23 pages
Summary note:
Latent factor models and matrix factorization algorithms were some of the most successful stand-alone algorithms used for predicting movie ratings in the Netflix Prize. To address the sparsity in the movie rating training set, many matrix factorization algorithms train only on the observed ratings and use regularization to avoid overfitting. Topic modeling algorithms must also be able to handle high sparsity. Given a collection of documents, the purpose of topic modeling is to discover the high-level thematic structure that best explains the collection of documents as a whole. In the same way, we might hope that given a collection of movie ratings, we can uncover the high-level movie genres that best explain the collection of movie ratings as a whole. Mathematically, topic modeling can be interpreted as recovering the first factor in a matrix factorization, subject to some constraints. By this view, perhaps a topic modeling algorithm can be the first step in a matrix factorization algorithm that predicts Netflix movie ratings. In this thesis, we develop a three-step algorithm for predicting movie ratings using a matrix factorization of the form M = AW: first we obtain a collection of genres using a topic modeling algorithm, then we generate a suitable A matrix from the collection of genres, and finally we use the A matrix to get the W matrix.
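As an illustration of the last step only (using a known A matrix to get the W matrix), the following is a minimal sketch and not the thesis's actual method, which this summary does not spell out. It assumes M is a movies-by-users matrix, A is movies-by-genres, and each column of W is fit by regularized least squares over just the movies that user has rated. The function names, the toy A matrix, and the `lam` regularization parameter are all hypothetical.

```python
def solve_linear_system(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [G | b], copied so the inputs are not modified.
    aug = [list(G[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; try a larger lam")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_user_weights(A, observed, lam=0.1):
    """Fit one column of W from one user's observed ratings (assumed approach).

    A: list of rows, one per movie, each holding k genre loadings.
    observed: dict mapping movie index -> rating given by this user.
    lam: L2 regularization strength (hypothetical parameter).
    """
    k = len(A[0])
    # Normal equations: (A_obs^T A_obs + lam * I) w = A_obs^T m_obs,
    # where A_obs keeps only the rows of movies this user rated.
    G = [[lam if p == q else 0.0 for q in range(k)] for p in range(k)]
    b = [0.0] * k
    for i, rating in observed.items():
        for p in range(k):
            b[p] += A[i][p] * rating
            for q in range(k):
                G[p][q] += A[i][p] * A[i][q]
    return solve_linear_system(G, b)


def predict(A, w, movie):
    """Predicted rating: the entry of M = AW for this movie and user."""
    return sum(a * x for a, x in zip(A[movie], w))


# Toy data: 4 movies, 2 genres. The user rated movies 0-2; movie 3 is held out.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
observed = {0: 3.0, 1: 1.0, 2: 4.0}
w = fit_user_weights(A, observed, lam=0.0)
held_out_prediction = predict(A, w, 3)
```

Summing only over observed ratings mirrors the summary's point that sparse training sets are handled by training on the observed entries, with `lam` playing the role of the regularization that guards against overfitting.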