Someone at work recently asked how he should go about studying machine learning on his own, so I’m putting together a little guide. This post will be a living document. I’ll keep adding to it, so please suggest additions and leave comments.

Fortunately, there are a ton of great resources available for free on the web. The very best way to get started that I can think of is to read chapter one of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2009 edition). The PDF is available online. Or buy the book on Amazon here, if you prefer.

Once you’ve read the first chapter, download R. R is an open-source statistics package/language that’s quite popular. Never heard of it? Check out this post (How Google and Facebook are using R).

Once you’ve installed R and maybe played around a little, check out this page, which describes the major machine learning packages in R. If you’re already familiar with some of the techniques, then dive in and start playing around with them in R. On the other hand, if it looks really complicated, don’t worry about it yet.

Oh, by the way, if you want to start playing around with machine learning in R, you’ll need data. Check out the UCI Machine Learning Repository. They have both real and toy datasets. The *iris* dataset, for example, is famous for showing up in many research publications.
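
If you just want something to poke at right away, here is a minimal sketch of a first look at *iris* in R. It assumes you use the copy of the dataset that ships with R itself (in the built-in `datasets` package) rather than downloading the file from the UCI repository. The variable names `species_counts` and `species_means` are my own, picked for illustration.

```r
# iris ships with R in the built-in 'datasets' package,
# so nothing needs to be downloaded for this example.
data(iris)

# Look at the structure: four numeric measurements plus a Species factor.
str(iris)

# Basic summary statistics for every column.
summary(iris)

# How many flowers of each species are in the data?
species_counts <- table(iris$Species)
print(species_counts)

# Mean of each measurement, grouped by species.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

Nothing here is machine learning yet. It is just the kind of poking around that is worth doing before you try any of the packages on a dataset.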

Next, I’d suggest reading more of The Elements of Statistical Learning. It’s an excellent book. Try doing some of the programming exercises using R. If you don’t like this book, there are plenty of others. Bishop’s Pattern Recognition and Machine Learning is a famous one, though it can be a little difficult depending on your math background. Tom Mitchell’s Machine Learning is another that’s often used to teach the topic.