Algorithmically generated data visualizations I’ve created or collaborated on.

Netflix Prize v2.0 (2009)

Datasets: movie titles, similarities, coordinates, network (yFiles format)

I was asked to contribute a chapter to the O’Reilly book Beautiful Visualization (download chapter: bv_ch09.pdf). As part of that effort I decided to update a visualization of the Netflix Prize that I had previously done in 2007.

image1-8

This visualization was constructed using the following process:

  1. Compute the similarity between all pairs of movies, based on user ratings
  2. Create a graph where the nodes are the movies, and edges exist between highly similar movies (edges are not displayed)
  3. Use a hierarchical graph layout algorithm to position the nodes so that similar nodes are near one another
  4. Use a label layout algorithm to position the movie names so as to minimize overlap
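Steps 1 and 2 can be sketched in a few lines. This is only an illustration: the similarity measure is not specified above, so cosine similarity over per-user ratings is an assumption, and the names `cosine_similarity`, `build_similarity_graph`, the `threshold` value, and the toy ratings are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors (dict: user -> rating).

    Cosine similarity is an assumed measure, not necessarily the one used here.
    """
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b)


def build_similarity_graph(ratings, threshold=0.8):
    """Return edges (movie1, movie2, similarity) for pairs at or above threshold.

    The threshold is a hypothetical cutoff for "highly similar".
    """
    edges = []
    for m1, m2 in combinations(sorted(ratings), 2):
        sim = cosine_similarity(ratings[m1], ratings[m2])
        if sim >= threshold:
            edges.append((m1, m2, sim))
    return edges


# Toy data: movie -> {user: rating}. Invented for illustration only.
ratings = {
    "Movie A": {"u1": 5, "u2": 4, "u3": 1},
    "Movie B": {"u1": 5, "u2": 5, "u3": 1},
    "Movie C": {"u1": 1, "u2": 1, "u3": 5},
}
edges = build_similarity_graph(ratings, threshold=0.9)
```

With the toy ratings, only Movie A and Movie B end up connected. The resulting edge list is what a graph layout algorithm (step 3) would take as input.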

Close-up…

image1-11 image1-9

images3

images2


 

Yellowpages.com Query Logs (2009)

Datasets: similarities, coordinates, network (Pajek format)

For the second part of the Beautiful Visualization chapter, I applied the same technique I used for the Netflix Prize to the YP.com query logs from December 2008. In this visualization the nodes are search queries, and the similarity between two queries is based on which businesses users clicked in the search results.
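One simple way to turn click data into a query similarity is to compare the sets of businesses clicked for each query. The sketch below uses Jaccard similarity, which is an assumption (the exact measure is not stated here); the `clicks` data and business IDs are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical click log: query -> set of business IDs clicked for that query.
clicks = {
    "pizza": {"b1", "b2", "b3"},
    "pizza delivery": {"b2", "b3", "b4"},
    "plumber": {"b9"},
}

related = jaccard(clicks["pizza"], clicks["pizza delivery"])
unrelated = jaccard(clicks["pizza"], clicks["plumber"])
```

Queries that lead to many of the same businesses score close to 1, while queries with no clicked businesses in common score 0.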


Close-ups…
image1-4
image1-7

image1-5


 

Systems Biology Pathways v2.0 (2008)

Datasets: unavailable

Second generation of visualizations of protein interaction networks. The main improvement in this version is a layout algorithm that uses circles (rather than squares) for the cell, nucleus, etc., and places nodes on the membranes when that is where they belong biologically. The layout minimizes overlap, the distance between connected nodes, and overall size.

image1-5

 

image1-5


 

Systems Biology Pathways v1.0 (2008)

Datasets: unavailable

This was an effort to take protein interaction networks, which are typically either hand-drawn or laid out as a flat network, display them in a biologically significant hierarchy (nucleus within cell, etc.), and make that hierarchy interactively zoomable.

image1-5 image1-5

 

Systems Biology Researcher Communities (2008)

Datasets: unavailable

These are screen captures from a tool created to mine relationships among researchers and institutions in the systems biology field.
image1-5
image1-5


 

Science in Wikipedia (2007), With Bruce Herr and Katy Borner

Datasets: unavailable

Visual clustering of pages in Wikipedia. Pages were classified as Science, Math, Tech, or other using a naive Bayes classifier trained on the Wikipedia categories and category graph. Images were automatically extracted from Wikipedia and placed underneath to help label the image.
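For readers unfamiliar with naive Bayes, here is a minimal sketch of the idea. It assumes a multinomial model with add-one smoothing over category-name tokens; the feature representation, the function names `train` and `classify`, and the training examples are all hypothetical and not taken from the actual project.

```python
from collections import Counter, defaultdict
from math import log


def train(examples):
    """Count labels and features. examples: list of (label, [feature tokens])."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for label, features in examples:
        label_counts[label] += 1
        feature_counts[label].update(features)
        vocab.update(features)
    return label_counts, feature_counts, vocab


def classify(model, features):
    """Return the label with the highest log-posterior under naive Bayes."""
    label_counts, feature_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(label_counts):
        # Log prior for the label.
        score = log(label_counts[label] / total)
        # Add-one (Laplace) smoothing so unseen features do not zero the score.
        denom = sum(feature_counts[label].values()) + len(vocab)
        for f in features:
            score += log((feature_counts[label][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data: page label -> tokens from its category names.
examples = [
    ("Science", ["physics", "biology"]),
    ("Science", ["chemistry", "physics"]),
    ("Math", ["algebra", "geometry"]),
    ("Tech", ["software", "computing"]),
]
model = train(examples)
predicted = classify(model, ["physics", "chemistry"])
```

Here a page whose categories mention physics and chemistry is assigned to Science, because those tokens are far more frequent under that label in the toy training set.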

key

science

 


 

Netflix Prize v1.0 (2007)

Datasets: movie titles, similarities, coordinates, edges, nodes

science


 

Wikipedia (2006), With Bruce Herr and Katy Borner

Datasets: similarities, coordinates, network (Pajek format)

Visual clustering of pages in Wikipedia. Images were automatically extracted from Wikipedia and placed underneath to help label the image. Pages with a great deal of edit activity are shown in red.

 

wikivisEnlargeSection

wikivisEnlargeSection2

Bruce with one printed version:

IMG_1940


 

US Patent Hierarchy (2006), With Katy Borner, Elisha Hardy, Bruce Herr, and Bradford Paley

Datasets: unavailable

