A new visualization Bruce Herr and I recently completed is being featured in this week’s New Scientist magazine (the article is free online, minus the viz). They did a good job jazzing up the language used to describe the viz (‘power struggle’, ‘bubbling mass’, ‘blitzed articles’), but they also dumbed down the technical accomplishments. I guess not everyone gets as excited about algorithms as I do.
Before I talk any more about the viz, though, let me mention that it is appearing at the NetSci 2007 Conference this week, and hopefully a variant will appear at Wikimania later this summer as well. The viz is a huge 5 feet by 5 feet when printed, so I only include a smaller, low-res version here. At some point high-quality art prints of it will be for sale at SciMaps to fund further visualization research.
Now for the good stuff. Much like my visualization of the Netflix Prize competition data, we began this piece by representing the data as a network. In this case the nodes in the network are Wikipedia articles and the edges are the links between articles. We then (with some help from our friends at Sandia) used an algorithm to lay out all 650,000 nodes (the Wikipedia articles that had at least one link) in such a way that similar articles are near one another. These are the yellow dots, which when viewed at low res give a yellow tint to the whole picture.
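To make the network setup concrete, here is a minimal Python sketch of the first step only: turning article links into a graph and keeping the articles that have at least one link. It is an illustration under stated assumptions, not the actual pipeline. The function names, the toy data, and the choice to treat links as undirected are all hypothetical, and the layout algorithm itself is not reproduced here.

```python
from collections import defaultdict


def build_link_graph(links):
    """Build an adjacency map from (source, target) article links.

    Assumption: links are treated as undirected and self-links are ignored.
    """
    neighbors = defaultdict(set)
    for source, target in links:
        if source == target:
            continue  # a page linking to itself says nothing about similarity
        neighbors[source].add(target)
        neighbors[target].add(source)
    return dict(neighbors)


def linked_articles(all_articles, graph):
    """Keep only the articles that take part in at least one link."""
    return [title for title in all_articles if graph.get(title)]


# Hypothetical toy data, purely for illustration.
articles = ["Physics", "Albert Einstein", "Relativity", "Orphan page"]
links = [
    ("Albert Einstein", "Physics"),
    ("Albert Einstein", "Relativity"),
    ("Relativity", "Physics"),
]

graph = build_link_graph(links)
kept = linked_articles(articles, graph)
print(kept)
```

In this toy example "Orphan page" has no links, so it never becomes a node and would not be handed to the layout step.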
The sizes of the nodes (circles, dots, whatever you want to call them) are based on a model of revision activity. So large circles indicate that an article might be controversial, or the subject of lots of vandalism, or just a topic whose content frequently changes. We labeled only the largest nodes to keep it readable. There is an interactive version of this in the works, based on the Google Maps platform, which will change the labels and pictures used as the user ‘zooms’ in or out. Stay tuned for that.
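As a rough illustration of sizing and labeling, the sketch below assumes each article already has a single numeric revision-activity score. The scores, the square-root scaling (so that circle area tracks the score), and the names `node_radius` and `pick_labels` are all assumptions made for this example; the actual revision-activity model is not described here.

```python
import math


def node_radius(score, max_score, max_radius=20.0):
    """Scale a circle so its area is proportional to the activity score.

    Assumption: square-root scaling; the real sizing model may differ.
    """
    if max_score <= 0:
        return 0.0
    return max_radius * math.sqrt(score / max_score)


def pick_labels(scores, k):
    """Return the k articles with the highest scores, largest first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical scores, purely for illustration.
scores = {"Article A": 100, "Article B": 25, "Article C": 4}
largest = max(scores.values())

radii = {title: node_radius(s, largest) for title, s in scores.items()}
labeled = pick_labels(scores, k=2)
print(radii)
print(labeled)
```

Only the articles returned by `pick_labels` would get a text label; every other article would still be drawn as an unlabeled circle.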
The image used for each tile was selected automatically, simply by taking the first image in the most-linked-to article among all the articles in that tile. We were pleasantly surprised by the quality of the images that appeared.
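The selection rule is simple enough to sketch in a few lines of Python. The data structures here (a list of articles per tile, a map of incoming-link counts, and a map from article to its images in page order) are hypothetical, as is the fallback of returning `None` when the chosen article has no image.

```python
def pick_tile_image(tile_articles, inlink_count, article_images):
    """Pick the first image of the most-linked-to article in a tile.

    tile_articles:  article titles that fall inside the tile
    inlink_count:   title -> number of incoming links (missing means 0)
    article_images: title -> list of image names in page order
    Returns None if the tile is empty or the chosen article has no image
    (an assumption; that case is not covered by the description above).
    """
    if not tile_articles:
        return None
    top_article = max(tile_articles, key=lambda t: inlink_count.get(t, 0))
    images = article_images.get(top_article, [])
    return images[0] if images else None


# Hypothetical toy data, purely for illustration.
tile = ["Article A", "Article B", "Article C"]
inlinks = {"Article A": 12, "Article B": 340, "Article C": 7}
images = {
    "Article A": ["a_first.jpg"],
    "Article B": ["b_first.jpg", "b_second.jpg"],
}

chosen = pick_tile_image(tile, inlinks, images)
print(chosen)
```

Here "Article B" has the most incoming links in the tile, so its first image is the one used.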
Our hope for this visualization approach, which we continue to improve on, is that it could be updated in real time to give a macro sense of what is happening in Wikipedia. I personally hope that some variation of it will end up in high schools as a teaching tool and for generating discussions.
Top 20 Most Hotly Revised Articles
- Adolf Hitler
- October 2003
- Nintendo Revolution
- Hurricane Katrina
- Britney Spears
- PlayStation 3
- Saddam Hussein
- Albert Einstein
- 2004 Indian Ocean Earthquake
- New York City
- Pope Benedict XVI
- Ronald Reagan