Another Visualization of the Netflix Prize Dataset
April 3rd, 2007
Here’s a recent visualization I did of the dataset used in the Netflix Prize Competition. The dataset is 17,700 movies and 31 gigs of user ratings. This viz shows similar movies close to one another, with the similarities determined by a formula based on ratings.
The most interesting thing I found is a cluster of movies (in blue) that I’d say are generally acclaimed. The cluster contains movies across all genres, such as Schindler’s List, Braveheart, and Super Size Me. Beyond that, there are a bunch of clusters that are mostly defined by a genre or category, such as music, sports, documentary, IMAX, children’s films, or bonus material. The big blob in the center is mostly what I’d call junk movies.
I’ve labeled some movies just to give some sense of what the clusters contain. There’s an interactive version of the viz as well, so you can explore the movies for yourself…
April 26th, 2007 at 8:24 pm
You said that “this viz shows similar movies close to one another, with the similarities determined by a formula based on ratings.”
Can you share the details of how you determined the (x,y) position of each movie in the plot? (Perhaps also in the Visualization section of the Netflix Prize forums?)
April 26th, 2007 at 9:30 pm
The similarities were computed using the measure found in Sarwar, et al:
http://www.ra.ethz.ch/CDstore/www10/papers/pdf/p519.pdf
The ordination was done using the VxOrd algorithm (best-in-show for cluster visualizations)…
http://www.cs.ubc.ca/~tmm/courses/cpsc533c-04-spr/readings/clusterstab.pdf
Cheers,
Todd
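For readers who want to try this, here is a minimal sketch of one item-to-item measure discussed in Sarwar et al., adjusted cosine similarity. The comment above does not say which of the paper's variants was used, so treat the choice as an assumption. The `ratings` layout (user -> movie -> rating) and the toy values are made up for illustration.

```python
from math import sqrt

def adjusted_cosine(ratings, item_a, item_b):
    """Adjusted cosine similarity between two items.

    ratings: dict mapping user -> dict mapping item -> rating (hypothetical layout).
    Only users who rated both items contribute; each rating is centered
    on that user's own mean rating before comparing.
    """
    num = den_a = den_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            da = user_ratings[item_a] - mean
            db = user_ratings[item_b] - mean
            num += da * db
            den_a += da * da
            den_b += db * db
    if den_a == 0.0 or den_b == 0.0:
        return 0.0  # no co-raters, or no variation: treat as "no evidence"
    return num / (sqrt(den_a) * sqrt(den_b))

# Toy data (invented): A and B are liked by the same users, C by different ones.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
}
sim_ab = adjusted_cosine(ratings, "A", "B")  # positive
sim_ac = adjusted_cosine(ratings, "A", "C")  # negative
```

Pairs with a high score become strongly weighted edges in the movie graph that the layout step then tries to place close together.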
May 1st, 2007 at 9:20 am
Hi Todd.
These visualizations are wonderful. I tried looking for Napoleon Dynamite in both the static and interactive visuals, but I couldn’t find it. This movie is apparently particularly polarizing. Any idea where in this data cloud it might reside?
Thanks.
Satindra.
May 8th, 2007 at 6:02 pm
Hey Todd,
If you don’t mind sharing, I’d like to know more about the function you used to map the graph to 2D. What’s your D-function (the density thing)? I’m asking because I was experimenting along the same lines, but my movies never seemed to cluster in any way (just one big bunch).
Pat
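As a rough illustration of how a similarity graph can be mapped to (x,y) positions, here is a minimal force-directed layout sketch. This is not VxOrd and has no density term; it is a plain Fruchterman-Reingold-style layout swapped in for illustration. The function name `force_layout`, its parameters, and the six-node toy graph are all assumptions made for this example.

```python
import math
import random

def force_layout(n, edges, iterations=200, seed=0):
    """Place n nodes in 2D. edges: dict mapping (i, j) -> similarity weight.

    All pairs repel; connected pairs attract in proportion to their weight.
    """
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    k = 1.0 / math.sqrt(n)  # preferred edge length
    for step in range(iterations):
        temp = 0.1 * (1.0 - step / iterations)  # max move per step, cools over time
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion between every pair of nodes.
        for i in range(n):
            for j in range(i + 1, n):
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[i][0] += dx / d * f
                disp[i][1] += dy / d * f
                disp[j][0] -= dx / d * f
                disp[j][1] -= dy / d * f
        # Attraction along weighted edges.
        for (i, j), w in edges.items():
            dx = pos[i][0] - pos[j][0]
            dy = pos[i][1] - pos[j][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = w * d * d / k
            disp[i][0] -= dx / d * f
            disp[i][1] -= dy / d * f
            disp[j][0] += dx / d * f
            disp[j][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for i in range(n):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            move = min(length, temp)
            pos[i][0] += disp[i][0] / length * move
            pos[i][1] += disp[i][1] / length * move
    return pos

# Toy graph (invented): two groups of three mutually similar movies.
edges = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
         (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0}
pos = force_layout(6, edges)
```

One design point that may bear on the "one big bunch" outcome: in a layout like this, separation comes from the absence of edges between groups. If every pair of movies is connected, even weakly, attraction acts everywhere and the groups tend to merge, so keeping only each movie's strongest similarities is a common first thing to try.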
November 20th, 2007 at 9:42 pm
Where can I get the 17,700 movie titles from IMDb? Please give me the links. Thank you.