Algorithmically generated data visualizations I’ve created or collaborated on.
Netflix Prize v2.0 (2009)
Datasets: movie titles, similarities, coordinates, network (yFiles format)
I was asked to contribute a chapter to the O’Reilly book Beautiful Visualization (download chapter: bv_ch09.pdf). As part of that effort I decided to update a visualization of the Netflix Prize that I had previously done in 2007.
This visualization was constructed using the following process:
- Compute the similarity between all pairs of movies, based on user ratings
- Create a graph where the nodes are the movies, and edges exist between highly similar movies (edges are not displayed)
- Use a hierarchical graph layout algorithm to position the nodes so that similar nodes are near one another
- Use a label layout algorithm to position the movie names so as to minimize overlap
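The first two steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the visualization: the similarity measure is not stated here, so cosine similarity over sparse rating vectors is an assumption, and the function names, the toy ratings, and the 0.9 threshold are all hypothetical. The layout and label-placement steps are not covered.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors (dicts of user -> rating)."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_graph(ratings, threshold):
    """Return (movie, movie, similarity) edges for pairs at or above the threshold."""
    edges = []
    for m1, m2 in combinations(sorted(ratings), 2):
        sim = cosine_similarity(ratings[m1], ratings[m2])
        if sim >= threshold:
            edges.append((m1, m2, sim))
    return edges


# Hypothetical toy data: movie -> {user: rating}
ratings = {
    "Movie A": {"u1": 5, "u2": 4, "u3": 1},
    "Movie B": {"u1": 5, "u2": 5, "u3": 1},
    "Movie C": {"u1": 1, "u3": 5},
}

edges = build_similarity_graph(ratings, threshold=0.9)
for m1, m2, sim in edges:
    print(f"{m1} -- {m2}: {sim:.2f}")
```

Comparing all pairs is quadratic in the number of movies, which is fine for a sketch but would likely need sparse matrix operations or pruning at the scale of the full Netflix Prize dataset.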
Yellowpages.com Query Logs (2009)
Datasets: similarities, coordinates, network (Pajek format)
For the second part of the Beautiful Visualization chapter, I applied the same visualization technique I used for the Netflix Prize to the YP.com query logs from December 2008. In this visualization the nodes are search queries, and the similarity between two queries is based on which businesses users clicked in the search results.
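One simple way to turn clicked businesses into a query similarity is set overlap. The sketch below uses Jaccard similarity, which is an assumption, as the exact measure is not given here; the query strings and business IDs are made up for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical click data: query -> set of business IDs clicked for that query
clicks = {
    "pizza": {"biz1", "biz2", "biz3"},
    "pizza delivery": {"biz2", "biz3", "biz4"},
    "plumber": {"biz9"},
}

sim_related = jaccard(clicks["pizza"], clicks["pizza delivery"])
sim_unrelated = jaccard(clicks["pizza"], clicks["plumber"])
print(sim_related, sim_unrelated)
```

Queries that lead users to the same businesses score high even when they share no words, which is the appeal of a click-based measure over a text-based one.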
Systems Biology Pathways v2.0 (2008)
Second generation of visualizations of protein interaction networks. The main improvement in the second version is a layout algorithm that uses circles (rather than squares) for the cell, nucleus, etc., and also places nodes on membranes where they biologically belong. The layout minimizes overlap, the distance between connected nodes, and overall size.
Systems Biology Pathways v1.0 (2008)
This was an effort to take protein interaction networks, which are typically either hand-drawn or laid out as a flat network, display them in a biologically significant hierarchy (nucleus within cell, etc.), and make that hierarchy interactively zoomable.
Systems Biology Researcher Communities (2008)
These are screen captures from a tool created to mine relationships among researchers and institutions in the systems biology field.
Science in Wikipedia (2007), With Bruce Herr and Katy Borner
Visual clustering of pages in Wikipedia. Pages were classified as Science, Math, Tech, or Other using a naive Bayes classifier trained on the Wikipedia categories and category graph. Images were automatically extracted from Wikipedia and placed underneath to help label the image.
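For readers unfamiliar with the technique, here is a minimal naive Bayes sketch. It is not the classifier used for this project: it assumes each page is represented only by the names of its Wikipedia categories, uses add-one (Laplace) smoothing, and ignores the category graph. The class name, training examples, and labels are hypothetical.

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over categorical features."""

    def __init__(self):
        self.class_counts = Counter()                 # label -> number of training pages
        self.feature_counts = defaultdict(Counter)    # label -> feature -> count
        self.vocabulary = set()                       # all features seen in training

    def train(self, features, label):
        self.class_counts[label] += 1
        for f in features:
            self.feature_counts[label][f] += 1
            self.vocabulary.add(f)

    def classify(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods
            score = log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for f in features:
                score += log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training pages: (category names, label)
training = [
    (["Physics", "Quantum mechanics"], "Science"),
    (["Chemistry", "Physics"], "Science"),
    (["Algebra", "Number theory"], "Math"),
    (["Geometry", "Algebra"], "Math"),
    (["Software", "Computer hardware"], "Tech"),
    (["Films", "Actors"], "Other"),
]

nb = NaiveBayes()
for features, label in training:
    nb.train(features, label)

print(nb.classify(["Physics", "Chemistry"]))
print(nb.classify(["Algebra"]))
```

Working in log space avoids numeric underflow when a page has many features, and the smoothing keeps an unseen category from driving a class probability to zero.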
Netflix Prize v1.0 (2007)
Datasets: movie titles, similarities, coordinates, edges, nodes
Wikipedia (2006), With Bruce Herr and Katy Borner
Datasets: similarities, coordinates, network (Pajek format)
Visual clustering of pages in Wikipedia. Images were automatically extracted from Wikipedia and placed underneath to help label the image. Pages with a great deal of edit activity are reflected with the color red.
Bruce with one printed version:
US Patent Hierarchy (2006), With Katy Borner, Elisha Hardy, Bruce Herr, and Bradford Paley