An Interactive Visualization of the Netflix Prize Dataset

The visualization activated below (click the button) shows all 17,700 movies that are part of the Netflix Prize Competition. The movies are laid out such that similar movies are close to one another. Similarity between two movies is computed based on whether users who like one also like the other, or (and, really) whether those who dislike one also dislike the other. Alternatively, take a look at a colorful, static version.
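To make the like/dislike idea concrete, here is a minimal sketch of one way such a similarity could be computed. It is not the formula behind the visualization, which isn't given here. It assumes ratings on a 1 to 5 scale, treats anything above a midpoint of 3 as "like" and anything below as "dislike", and takes a cosine similarity over the users who rated both movies. The function name and the midpoint are my own choices for illustration.

```python
from math import sqrt


def movie_similarity(ratings_a, ratings_b, midpoint=3.0):
    """Cosine similarity between two movies over the users who rated both.

    ratings_a and ratings_b map user id -> star rating (1 to 5).
    Ratings are centered on `midpoint` (an assumption), so shared likes
    and shared dislikes both push the score toward +1, while opposite
    opinions push it toward -1.
    """
    shared_users = set(ratings_a) & set(ratings_b)
    if not shared_users:
        return 0.0  # no overlap, so no evidence either way

    dot = sum((ratings_a[u] - midpoint) * (ratings_b[u] - midpoint)
              for u in shared_users)
    norm_a = sqrt(sum((ratings_a[u] - midpoint) ** 2 for u in shared_users))
    norm_b = sqrt(sum((ratings_b[u] - midpoint) ** 2 for u in shared_users))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # every shared rating sits exactly on the midpoint
    return dot / (norm_a * norm_b)


# Hypothetical ratings: users u1 and u2 disagree completely on these two movies.
loved_by_u1 = {"u1": 5, "u2": 1}
loved_by_u2 = {"u1": 1, "u2": 5}
print(movie_similarity(loved_by_u1, loved_by_u2))
```

A layout algorithm would then place movies with high scores near one another; that step is not sketched here.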

Mouse over to get the movie titles…

PHP on Windows is completely possible. If you bought a domain name from GoDaddy and they say that you can’t host PHP on Windows, then they are lying. I suggest you set up your own blog on your own computer. I am writing a step-by-step guide on hosting WordPress on IIS & GoDaddy. You can check it out at http://www.aleemonline.com/index.php/hosting-wordppress-with-iis

I too had to install WordPress for a client with a site on GoDaddy (ug!). I called GoDaddy after logging into my client’s account, and the rep walked me through it step by step. Piece of cake! There is MUCH to be said for live customer service provided by a human being…after this experience I have a slightly better appreciation for GoDaddy, but their site still makes my eyes cross!!

I put together this guide for installation to make it as simple as possible from start to finish.

http://skepticalsinner.com/blog/2008/06/how-to-do-a-5-minute-install-of-wordpress-on-godaddy/

The good news is, if you’ve goofed up the first time, you can start from scratch using this guide.

I set up a custom php.ini file in my GoDaddy hosting account by copying one from a server I own. If you’re familiar with the php.ini settings, or would like to try it out using the settings I chose, go to my web site (www.christiaanconover.com) and contact me, and I’ll send it to you. You simply upload the file named php.ini to your hosting root, and in about 24 hours the server will switch over to using it. I have it configured for use with GoDaddy servers, which will allow you to use a mailer script.

WordPress on GoDaddy is as simple as can be, without changing any files whatsoever. I followed Ray’s instructions: from my Hosting Control Center, I clicked on the Content tab. From there I clicked on GoDaddy Hosting Connections, which brought me to the GoDaddy Hosting Connection Home. From here, you can either type wordpress in the search box or click on Tools/Scripts (located in the left panel). Under Most Popular (left panel) you’ll see WordPress at the top of the list. Click it, and it will bring you to an installation screen where you press Install Now. From here, it’s self-explanatory and walks you through the installation.

 

Another Visualization of the Netflix Prize Dataset

Here’s a recent visualization I did of the dataset used in the Netflix Prize Competition. The dataset is 17,700 movies and 31 gigs of user ratings. This viz shows similar movies close to one another, with the similarities determined by a formula based on ratings.

I found most interesting a cluster of movies (in blue) that I’d say are generally acclaimed. The cluster contains movies across all genres, such as Schindler’s List, Braveheart, and Super Size Me. Beyond that, there are a bunch of clusters which are mostly defined by a genre such as music, sports, documentary, IMAX, children’s films, or bonus material. The big blob in the center is mostly what I’d call junk movies.

I’ve labeled some movies just to give some sense of what the clusters contain. There’s an interactive version of the viz as well, so you can explore the movies for yourself…

In case you have a GoDaddy Windows ASP plan, and like me hit a very high frustration level until you remember this ‘minor’ detail, note that you can easily change your hosting plan to Linux.

In the main GoDaddy site, click on “My Hosting Account,” click on the account name (not “manage account”) and let the page reload. On the right-hand side, the gray box will give you an option to “Upgrade/Downgrade Hosting Account”. Click on the link, and when the page reloads, highlight your preferred Linux hosting package. Accept your choice, and wait a few minutes for the new account to activate (your site should not go down while this takes place; mine didn’t).

You will now have an easier time installing WordPress on your own, or alternatively following Ray’s advice (which isn’t available for ASP accounts).

I’ve been using WordPress on our Linux/PHP shared hosting account for nearly a year. It was daunting at first but I got the hang of it, and I’m very satisfied with performance and ease of use.

GoDaddy has been excellent value, and I would recommend it as a blog host.

I used the auto-install from GoDaddy’s applications, and upgraded later. The new version of WP is really good.

To answer Vet, you can do the URL change in a couple of ways that are easier than moving files.

1. GoDaddy allows an easy way of doing subdomains, and you can point them at your blog directory, giving you a new address.

example: blog.spatterblog.com -> http://www.spatterblog.com/wordpress

2. You can use a new domain name to point to the directory. Change your domain to something else: http://www.spatterblog-apps.com. Then do http://www.spatterblog.com -> http://www.spatterblog.com/wordpress

We do this with http://www.engaugement.com – try and you’ll see.

Scheme Tutorial

I was asked to give a short (1 hr) tutorial on the Scheme language this week for students in the graduate and undergraduate AI courses at Indiana.  Thought I would post the slides in case anyone wants to adapt it for their own purposes…

1) Back up WordPress MySQL DATABASE!
2) Delete all WordPress files in directories, except “wp-content” folder & contents within that folder!!!
–Once all set, wait about 15 minutes for GoDaddy to refresh–
3) Set up a new MySQL database for the new WordPress you’re about to install.
4) Reinstall WordPress from scratch. Do not restore/import the MySQL Database yet!
5) Set up WordPress options (i.e. previous theme, update & activate plug-ins, PERMALINK SETTINGS)
–Once all set, wait about 15 minutes for GoDaddy to refresh–
6) Restore MySQL Database into the new database (step #3)

I know this is an old post, but since people are still commenting: has anyone had any luck setting up post via email? I tried that and Postie. When I try to run the PHP URL to get and post emails, I get an error that the connection timed out. Has anyone gotten this to work?

Getting WordPress to work on GoDaddy hosting is pretty easy, including using permalinks and mod_rewrite. I use GoDaddy economy hosting: http://zacvineyard.com/blog/2008/10/24/wordpress-godaddy-and-permalinks/

It works like a charm. For those wanting further info, I would suggest the excellent book “Building a WordPress Blog People Want to Read” by Scott McNulty. It takes the mystery out of blogging with WordPress and has helped me tremendously with the finer details of the program.

We had so many issues running multiple instances of WordPress on GoDaddy with the permalink feature. We were running WP on GoDaddy Windows hosting and had several blogs on one hosting account. We were never able to get it to work, so we just switched to Bluehost. Thanks though, this post is helpful for those trying to get started.

A Review of MemoryArchive.org

I recently came across a small site running on MediaWiki called MemoryArchive.org. The concept is that each article is a memory written, unlike Wikipedia, by a single author. Subjective content is allowed.

There seems to be a legit place for a site with this concept to complement Wikipedia. Wikipedia is derivative knowledge: it is intended that the content be cited, meaning it must already have been published somewhere. Many valuable (and not so valuable) facts don’t fit that bill. Also, when sources disagree but are merged into a single Wikipedia article, history according to Wikipedia has a rather non-deterministic feel to it.

That said, MemoryArchive.org has a long way to go in terms of concept, technology, and adoption.  If anyone involved with MemoryArchive comes across this review…well, I have some ideas:

  1. The site needs to provide a data dump (similar to Wikipedia’s data dump) or API.  That way researchers can use the knowledge without scraping the content.  Incidentally, I have written a basic scraper in Perl for this site if anyone wants it.
  2. Use Semantic MediaWiki. It’s the future.
  3. Allow any user to create links and categories on any page. You’re already using MediaWiki, so you might as well take advantage of the technology.
  4. Allow usernames to be linked to social network accounts such as Facebook. It will create many opportunities for applications to use the memories, and for memories to be related to one another.
  5. Link events to Wikipedia pages on those events…as I said, it’s complementary to Wikipedia.

 

Ensemble Machine Learning Tutorial

Here are the slides from a 2-part lecture I’m giving on ensemble learning at Indiana University. It includes a discussion of the Netflix Prize competition, and the use of ensemble techniques in that competition.

Introduction to Ensemble Learning

Featuring Successes in the Netflix Prize Competition

Todd Holloway. Two-Lecture Series for B551, November 20 & 27, 2007, Indiana University

  • Introduction: Bias and variance problems
  • The Netflix Prize: Success of ensemble methods in the Netflix Prize
  • Why Ensemble Methods Work
  • Algorithms: AdaBoost, BrownBoost, Random forests

Bias and Variance

Decision Trees

Small trees have high bias.

Large trees have high variance. Why?

Ensemble Classification

Aggregation of the predictions of multiple classifiers with the goal of improving accuracy.
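As a rough illustration of that aggregation step, here is a minimal majority-vote sketch. The slides don't say which aggregation rule is meant, so plain voting is an assumption, and the three toy "classifiers" are hypothetical stand-ins that ignore their input.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often.

    Ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(classifiers, example):
    """Ask every classifier for a label, then aggregate by majority vote."""
    return majority_vote([classify(example) for classify in classifiers])


# Three hypothetical classifiers that each predict a star rating.
classifiers = [lambda example: 4, lambda example: 4, lambda example: 2]
print(ensemble_predict(classifiers, example=("some user", "some movie")))
```

The vote only helps when the individual classifiers make different mistakes, which is why the strategies later in the slides focus on generating diverse classifiers.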

Supervised learning task

Training data is a set of users and the ratings (1, 2, 3, 4, or 5 stars) those users have given to movies. Construct a classifier that, given a user and an unrated movie, correctly classifies that movie as either 1, 2, 3, 4, or 5 stars.

$1 million prize for a 10% improvement over Netflix’s current movie recommender/classifier (RMSE = 0.9514).
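The Netflix Prize scored entries by root mean squared error (RMSE), where lower is better. As a quick sketch of the arithmetic, the snippet below computes RMSE and derives the score implied by a 10% improvement on the 0.9514 baseline quoted above. The constant names are mine, and the derived figure is just that multiplication, not an official threshold.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root mean squared error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))


BASELINE_SCORE = 0.9514                      # baseline quoted in the slides
TARGET_SCORE = BASELINE_SCORE * (1 - 0.10)   # a 10% lower error

print(round(TARGET_SCORE, 5))
print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))
```

Because errors are squared before averaging, a prediction that is off by two stars costs four times as much as one that is off by one star.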

Intuitions

  • Utility of combining diverse, independent opinions in human decision-making: a protective mechanism (e.g. stock portfolio diversity)
  • Violation of Ockham’s Razor: identifying the best model requires identifying the proper “model complexity”

Strategies

  • Boosting: make examples that are currently misclassified more important (or less, in some cases)
  • Bagging: use different samples or attributes of the examples to generate diverse classifiers
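To show what "making misclassified examples more important" can look like, here is a sketch of the standard AdaBoost reweighting step for a two-class problem. This is my choice of illustration, since AdaBoost is one of the algorithms listed in the outline; the function name is hypothetical, and the example weights are assumed to sum to 1 on entry.

```python
from math import exp, log


def adaboost_reweight(weights, is_misclassified):
    """One AdaBoost reweighting round for a two-class problem.

    weights: current example weights, assumed to sum to 1.
    is_misclassified: one boolean per example, True where the current
        weak classifier got that example wrong.
    Returns (new_weights, alpha), where alpha is the weak classifier's
    weight in the final vote.
    """
    error = sum(w for w, wrong in zip(weights, is_misclassified) if wrong)
    if not 0 < error < 0.5:
        raise ValueError("weighted error must be strictly between 0 and 0.5")

    alpha = 0.5 * log((1 - error) / error)
    # Raise the weight of mistakes and lower the weight of correct examples.
    updated = [w * exp(alpha if wrong else -alpha)
               for w, wrong in zip(weights, is_misclassified)]
    total = sum(updated)
    return [w / total for w in updated], alpha


new_weights, alpha = adaboost_reweight([0.25] * 4, [True, False, False, False])
print(new_weights, alpha)
```

After the update, the misclassified examples together carry half of the total weight, so the next weak classifier cannot do well by repeating the same mistakes.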

Random forests

Let the number of training cases be M, and the number of variables in the classifier be N.

For each tree,

  • Choose a training set by choosing M times with replacement from all M available training cases.
  • For each node, randomly choose n variables (with n much smaller than N) on which to base the decision at that node.
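The two sampling steps above can be sketched in a few lines. This covers only the randomization, not tree construction or voting, and the function and parameter names are mine rather than anything from the slides.

```python
import random


def bootstrap_sample(training_cases, rng):
    """Draw as many cases as there are in the training set, with replacement."""
    return [rng.choice(training_cases) for _ in training_cases]


def choose_split_variables(num_variables, vars_per_node, rng):
    """Pick the distinct variable indices considered at a single tree node."""
    return rng.sample(range(num_variables), vars_per_node)


rng = random.Random(0)           # seeded only so that runs are repeatable
cases = list(range(10))          # stand-ins for ten training cases
print(bootstrap_sample(cases, rng))
print(choose_split_variables(num_variables=8, vars_per_node=3, rng=rng))
```

Because sampling is with replacement, each tree typically sees some cases more than once and misses others entirely, which is one source of the diversity between trees.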

Some more evidence for ensembling: in this case, all of the competition entrants’ predictions were combined after the event closed to see what could have been…