Top 10 Google Tech Talks

All Google Tech Talks are here (Google EngEDU is the actual name of the talk series). Thought I’d compile a top ten list…

  1. Python and Python 3000. Two talks about the Python language given by its inventor Guido van Rossum. The first is about the language’s origins and the second is about its future.
  2. How Open Source Projects Survive Poisonous People (And You Can Too). Really liked this talk. Really liked it! Given by the lead developers of Subversion (and other large projects), this talk provides a guide to working as a team. Next time I lead a project, I am going to ask everyone to watch this before starting work.
  3. Winning the DARPA Grand Challenge. The story of the robot race through the Mojave Desert. Having been on a Grand Challenge team, I appreciate just how hard it was to win.
  4. Scrum, et al. An excellent talk about the Scrum agile software development methodology.
  5. Wikipedia and MediaWiki. A talk about the implementation of Wikipedia given by its original and, for a long time, only paid staff developer. Not a very dynamic talk, but the insider perspective is interesting.
  6. Computers versus Common Sense. Doug Lenat gives a talk about the famous AI project Cyc. I thought this project was put to rest a long time ago. Guess I was wrong. Anyways, if you like symbolic Artificial Intelligence, it's a really interesting talk.
  7. Scholarly Data, Network Science, and (Google) Maps. A very good information visualization talk.
  8. 15 Views of a Node Link Graph: An Information Visualization Portfolio. A bunch of visualization techniques. I think Tamara Munzner leads some of the most interesting visualization work anywhere.
  9. Human Computation. A talk about harnessing human knowledge for tasks such as spam filtering and image recognition.
  10. Scrum Tuning: Lessons learned from Scrum implementation at Google. A talk about the experience of using Scrum given by one of its inventors.

 

GapMinder Talk

Just read an article about Google buying a small company called GapMinder which does data visualization.  I checked out the talk on the GapMinder homepage, and would recommend watching the first 10 minutes of it.  The visualization tool that is used throughout the talk is something special…easy to see Google’s interest.

I think GapMinder was the tool used by Hans Rosling to make his really awesome presentation at TED Talks last year. You can find all the talks here. You can search by names. Some of those talks are simply outstanding.

http://www.ted.com/index.php

Please look at his earlier talk too.

Great blog, by the way. Visualization brings the data alive.

It's online now… quite interesting…
http://tools.google.com/gapminder/

I read the entire post and the comments and made the changes just as you all suggested and it seems to be working beautifully – the first time. This is far too good to be true!

I’m just now building my first WordPress site and it looks a lot easier than building sites with Dreamweaver or the other apps I’ve gotten used to (I’ve tried them all!)

I’ve got GoDaddy Deluxe Hosting where I can put multiple domains under a master domain for the same price. I just opened a new sub-folder named eucopyright where I’m going to build the new site and uploaded all the wordpress files to it.

5 minutes and it seems to work as advertised. Wow. Thanks!

I got to be honest, I have been to countless websites and read numerous forum posts and details about installing WordPress with my GoDaddy account. I got to say I am lost! It has taken me two weeks to accept my fate and let go of my pride to admit my failure.
I’ve followed the “Famous 5-Minute Install” multiple times, but when I try to access my web site I get the message “Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.” At this point I don’t even know what questions to start asking. If someone could please help a lost soul I would really appreciate it.

Visualizing Science & Tech Activity in Wikipedia

If you didn’t see our original Wikipedia Activity Visualization, check it out here (there’s a detailed explanation, as well).  Also, there is a Google maps style zoomable version here.

This new version uses the same layout and images (well, slightly improved) as the original, but this time we tried to highlight activity in regions of Wikipedia that are predominantly math or science or technology.

So we developed a program to classify Wikipedia articles as being one of these three categories (or none), based on the categories the article was assigned to and their positions in the Wikipedia category link network.
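One way such a classifier could work is sketched below. This is not the program we actually ran: the seed category names, the `max_depth` cutoff, and the rule of taking the nearest seed category are all assumptions made for illustration. The idea is to start from the categories assigned to an article and walk up the category link network until a known math, science, or technology category is reached.

```python
from collections import deque

# Hypothetical seed categories for each topic (assumed, not from the post).
SEEDS = {
    "math": {"Mathematics"},
    "science": {"Science"},
    "technology": {"Technology"},
}

def classify(article_categories, parents, max_depth=4):
    """Label an article by the nearest seed category reachable upward.

    article_categories: categories assigned directly to the article.
    parents: dict mapping a category to a list of its parent categories.
    Returns "math", "science", "technology", or None.
    """
    queue = deque((c, 0) for c in article_categories)
    seen = set(article_categories)
    while queue:
        category, depth = queue.popleft()
        for topic, seeds in SEEDS.items():
            if category in seeds:
                return topic
        # Stop climbing once the assumed depth limit is reached.
        if depth < max_depth:
            for parent in parents.get(category, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append((parent, depth + 1))
    return None

# A toy category network, invented for the example.
toy_parents = {
    "Algebra": ["Mathematics"],
    "Planets": ["Astronomy"],
    "Astronomy": ["Science"],
    "Pop music": ["Music"],
}
```

Because the search is breadth-first, an article sitting one step from a math category and three steps from a science category is labeled math. An article with no path to any seed within the depth limit gets no label.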

We were not surprised to see a tight cluster of math pages, in a region, I would add, which has little ‘hot’ activity.  In fact, the only article in that region with lots of activity is the article “Earth”.  It was also not surprising that technology articles are fairly spread out among the topics.

What’s striking is the science-related band (green-blue) that runs diagonally through the middle of the topic map.  I won’t share my interpretation, but rather let those interested come up with their own.  Hope you enjoy, please leave comments!


Above: The most actively edited science-related articles.

Left: Not much science here…a good indication the algorithms are working pretty well!

 

Flash vs. Processing

Over the past year and a half I’ve been hooked on the language Processing. I’ve even contributed an early version of a library for visualizing social network data.

For those unfamiliar with Processing, it’s a variant of Java.  It’s distinguished by its emphasis on interactive media. The fundamental unit in Processing is a sketch, which is entirely and continuously redrawn at some given rate.  Sketches may be compiled into either standalone apps or Java applets.

Many creative types work with Processing. Here are some cool examples: The Dumpster, Thinking Machines 4 (pictured left), and Relations Environment. More can be seen on exhibition.  It’s not hard to see why I was excited by the language.

I’m still excited by the Processing community and all the cool apps and exhibits they are turning out.  However, today I am much less excited by the technology.

  1. As an Internet content delivery mechanism, Java Applets are a poor choice.  They have a large memory footprint and so are slow to load, and they provide few options for communicating between client and server.
  2. Sketches are redrawn in their entirety with each tick of the clock.  There are no layers.  Why is this a problem?  This limits the number of graphical objects one can involve in a sketch without the sketch slowing down.
  3. There are no user interface components of any quality.  And Java Swing is not compatible with Processing.  While this is a real limitation, this is a somewhat hollow complaint, as I realize it is only a matter of time before quality UI widgets are created within Processing for Processing.
  4. Java 1.5 is not supported.  My concern here deals with the fact that one of Processing’s greatest strengths is that a developer may use Java.  I am wondering whether the Processing community will be able to maintain this integration over time.

Recently I’ve been looking into Flash (e.g., the chart in Google Finance is Flash) and have begun to believe that Flash is a better alternative.  It is great at content delivery, has convenient UI components and support, and the fundamental unit of Flash, the “movie”, can be redrawn using separate layers.

I am still learning ActionScript, the object-oriented language used for Flash.  My early impressions of it are that recent versions 2.0 and 3.0 are of a pretty high quality, but a far cry from Java.

I’ll post a follow-up to this in a few months after I have more experience with Flash and ActionScript under my belt.

Scheme Tutorial

I was asked to give a short (1 hr) tutorial on the Scheme language this week for students in the graduate and undergraduate AI courses at Indiana.  Thought I would post the slides in case anyone wants to adapt it for their own purposes…

1) Back up WordPress MySQL DATABASE!
2) Delete all WordPress files in directories, except “wp-content” folder & contents within that folder!!!
–Once all set, wait about 15 minutes for GoDaddy to refresh–
3) Set up a new MySQL database for the new WordPress you’re about to install.
4) Reinstall WordPress from scratch. Do not restore/import the MySQL Database yet!
5) Set up WordPress options (i.e. previous theme, update & activate plug-ins, PERMALINK SETTINGS)
–Once all set, wait about 15 minutes for GoDaddy to refresh–
6) Restore MySQL Database into the new database (step #3)

I know this is an old post, but since people are still commenting: has anyone had any luck setting up post via email? I tried that and Postie. When I try to run the PHP URL to get and post emails, I get an error that the connection timed out. Has anyone gotten this to work?

Getting WordPress to work on GoDaddy hosting is pretty easy, including using permalinks and mod_rewrite. I use GoDaddy economy hosting: http://zacvineyard.com/blog/2008/10/24/wordpress-godaddy-and-permalinks/

It works like a charm. For those wanting further info, I would suggest the excellent book “Building a WordPress Blog People Want to Read” by Scott McNulty. It takes the mystery out of blogging with WordPress and has helped me tremendously with the finer details of the program.

We had so many issues running multiple instances of WordPress on GoDaddy with the permalink feature. We were running WP on GoDaddy Windows hosting and had several blogs on one hosting account. We were never able to get it to work, so we just switched to Bluehost. Thanks though, this post is helpful for those trying to get started.

A Review of MemoryArchive.org

I recently came across a small site running on MediaWiki called MemoryArchive.org.  The concept is that each article is a memory written, unlike Wikipedia, by a single author.  Subjective content allowed.

There seems to be a legit place for a site with this concept to complement Wikipedia.  Wikipedia is derivative knowledge: it is intended that the content be cited, meaning it already had to have been published somewhere.  Many valuable (and not so valuable) facts don’t fit that bill.  Also, when sources disagree but are merged into a single Wikipedia article, history according to Wikipedia has a rather non-deterministic feel to it.

That said, MemoryArchive.org has a long way to go in terms of concept, technology, and adoption.  If anyone involved with MemoryArchive comes across this review…well, I have some ideas:

  1. The site needs to provide a data dump (similar to Wikipedia’s data dump) or API.  That way researchers can use the knowledge without scraping the content.  Incidentally, I have written a basic scraper in Perl for this site if anyone wants it.
  2. Use Semantic MediaWiki.  It’s the future.
  3. Allow any user to create links and categories on any page.  You’re already using MediaWiki, might as well take advantage of the technology.
  4. Allow usernames to be linked to social network accounts such as Facebook.  It will create many opportunities for applications to use the memories, and for memories to be related to one another.
  5. Link events to Wikipedia pages on those events…as I said, it’s complementary to Wikipedia.

 

Ensemble Machine Learning Tutorial

Here are the slides from a 2-part lecture I’m giving on ensemble learning at Indiana University.  It includes a discussion of the Netflix Prize competition, and the use of ensemble techniques in that competition.

Introduction to Ensemble Learning

Featuring Successes in the Netflix Prize Competition

Todd Holloway. Two Lecture Series for B551, November 20 & 27, 2007, Indiana University

  • Introduction: Bias and variance problems
  • The Netflix Prize: Success of ensemble methods in the Netflix Prize
  • Why Ensemble Methods Work
  • Algorithms: AdaBoost, BrownBoost, Random forests

Bias and Variance

Decision Trees: Small trees have high bias. Large trees have high variance. Why?

Ensemble Classification: Aggregation of predictions of multiple classifiers with the goal of improving accuracy.
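A minimal sketch of that aggregation, assuming the simplest possible rule (an unweighted majority vote) and classifiers that are plain callables. The function names are made up for the example. Real ensembles often weight the votes instead.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(classifiers, example):
    """Aggregate the predictions of several classifiers by majority vote."""
    return majority_vote([clf(example) for clf in classifiers])

# Three toy threshold classifiers that disagree near their boundaries.
toy_classifiers = [lambda x: x > 1, lambda x: x > 3, lambda x: x > 5]
```

For the input 4, two of the three toy classifiers answer True, so the ensemble answers True even though the third one disagrees.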

Supervised learning task: Training data is a set of users and the ratings (1, 2, 3, 4, or 5 stars) those users have given to movies. Construct a classifier that, given a user and an unrated movie, correctly classifies that movie as 1, 2, 3, 4, or 5 stars.

$1 million prize for a 10% improvement over Netflix’s current movie recommender/classifier (RMSE = 0.9514).
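To make the prize target concrete, here is a small sketch. It assumes the score quoted for Netflix's recommender is a root mean squared error over star ratings, which is how the competition scored entries. The function and constant names are illustrative only.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

BASELINE = 0.9514          # score quoted on the slide
TARGET = BASELINE * 0.90   # a 10% improvement, roughly 0.8563
```

An entry's predictions can be real-valued (say 3.7 stars) even though the true ratings are whole stars, which is part of why blending many models helps on this measure.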

Intuitions

  • Utility of combining diverse, independent opinions in human decision-making, as a protective mechanism (e.g. stock portfolio diversity)
  • Violation of Ockham’s Razor: identifying the best model requires identifying the proper “model complexity”
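The first intuition can be checked with a little arithmetic. The sketch below assumes an idealized setting that real classifiers rarely meet: a two-class decision, an odd number of voters, and voters whose errors are fully independent. The function name is invented for the example.

```python
from math import comb

def majority_accuracy(n, p):
    """Chance that a majority of n independent voters is right.

    Each voter is right with probability p; n is assumed odd so that
    a two-class vote cannot tie.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
```

With voters that are each right 60% of the time, three voters reach a majority accuracy of 0.648, and the figure keeps climbing as more voters are added. Correlated errors erode this gain, which is why diversity among the classifiers matters.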

Strategies

  • Boosting: Make examples currently misclassified more important (or less, in some cases)
  • Bagging: Use different samples or attributes of the examples to generate diverse classifiers
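The boosting strategy can be sketched as a single round of reweighting in the style of AdaBoost. This covers only the "more important" case. The function name is made up, and the sketch assumes the weights sum to 1 and the weak classifier's weighted error lies strictly between 0 and 1.

```python
import math

def adaboost_reweight(weights, misclassified):
    """One round of AdaBoost-style reweighting.

    weights: current example weights, summing to 1.
    misclassified: booleans, True where the weak classifier was wrong.
    Returns (alpha, new_weights), where alpha is the classifier's vote.
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    alpha = 0.5 * math.log((1 - error) / error)
    # Raise the weight of mistakes, lower the weight of correct examples.
    raw = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(raw)
    return alpha, [w / total for w in raw]

alpha, new_weights = adaboost_reweight([0.25] * 4, [True, False, False, False])
```

After the update the misclassified examples together carry half of the total weight, so the next weak classifier cannot do well by repeating the same mistakes.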

Random forests

Let the number of training cases be N, and the number of variables in the classifier be M.

For each tree,

  • Choose a training set by choosing N times with replacement from all N available training cases.
  • For each node, randomly choose m variables (with m much smaller than M) on which to base the decision at that node.
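The two sources of randomness in those steps can be sketched as follows. The code uses descriptive names rather than the slide's letters, the sizes (10 cases, 8 variables, 3 candidates per node) are arbitrary, and growing the tree itself is left out.

```python
import random

def bootstrap_sample(num_cases, rng):
    """Draw num_cases indices with replacement from range(num_cases)."""
    return [rng.randrange(num_cases) for _ in range(num_cases)]

def node_variables(num_variables, num_candidates, rng):
    """Pick distinct candidate variables for a single node's split."""
    return rng.sample(range(num_variables), num_candidates)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
indices = bootstrap_sample(10, rng)
candidates = node_variables(8, 3, rng)
```

Because sampling is done with replacement, some training cases usually appear more than once in a tree's training set and others not at all. A fresh set of candidate variables is drawn at every node, not once per tree.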

Some more evidence for ensembling. In this case all the competition entrants’ predictions were combined after the event closed to see what could have been…