Saturday, June 19, 2010

Resources for Learning about Machine Learning

Posted by Danny Tarlow
I've been using Quora a bit lately, somewhat to the detriment of this blog (though that's not the full explanation for my slow posting schedule). Anyhow, Quora is a nice question-and-answer service that has been getting some press in the startup world recently. A while back, Quora released a Terms of Service that grants pretty liberal reuse rights for the content on the website. Founder Adam D'Angelo summarizes:
You can reuse all new content on Quora by publishing it anywhere on the web, as long as you link back to the original content on Quora.
One question that has received some interest (49 followers and 11 answers) and might be relevant to readers here is this one:
What are some good resources for learning about machine learning?

I've read Programming Collective Intelligence, and am looking for any recommendations on follow-up books/resources.
There were some good answers, including some I didn't already know about. Here's a sampling:

My answer was Andrew Ng's YouTube videos:
http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599

Some other good ones:
Jie Tang says...
Mike Jordan and his grad students teach a course at Berkeley called Practical Machine Learning, which presents a broad overview of modern statistical machine learning from a practitioner's perspective. Lecture notes and homework assignments from last year are available at
http://www.cs.berkeley.edu/~jordan/courses/294-fall09/

A Google search will also turn up material from past years.
Ben Newhouse says...
The textbook "Elements of Statistical Learning" has an obscene amount of material in it and is freely available in PDF form via http://www-stat.stanford.edu/~tibs/ElemStatLearn/

While more niche than general machine learning, I recently ripped through "Natural Image Statistics" (also downloadable at http://www.naturalimagestatistics.net/ ). It's a great read, both for its explanations of the standard ML algorithms (PCA, ICA, mixtures of Gaussians, etc.) and for its real-world applications/examples in trying to understand the models our visual system uses for analysis.
Jeremy Leibs gives the staple recommendation of David MacKay's book (I believe MacKay would say that machine learning is just information theory):
"Information Theory, Inference, and Learning Algorithms" by David MacKay has some decent introductory material if I remember. Available online:
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
Incidentally, I haven't read Programming Collective Intelligence, but it seems popular amongst non-researchers. Do any of you know more about it?

Also, I have a few more Quora invites left, so if anybody wants in, let me know, and we'll see what we can do.

6 comments:

David Warde-Farley said...

I had a look at Programming Collective Intelligence and, while I can see the appeal for a novice, I was kind of unimpressed. It's a fairly standard survey of methods for prediction and clustering, but I found its explanations to range from 'okay' to somewhat misleading.

Also, while they provide real code instead of pseudocode in the form of Python implementations, they do everything with built-in Python data types, which will be intolerably slow for all but the most contrived small examples. I understand *why* they do this, i.e., to make everything (including all the loops) explicit. But I feel like people running this code without an understanding of asymptotic complexity or the mechanics of interpreted languages might get the wrong idea about why things are so slow, and write off the methods themselves rather than the naive implementation.
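To make that concrete, here is a quick sketch of my own (not code from the book) timing a dot product written with built-in lists and an explicit loop against the equivalent NumPy call; the vector length is an arbitrary choice:

import time
import numpy as np

n = 10 ** 6  # arbitrary size, large enough to show the gap

# Built-in lists with an explicit loop, in the spirit of the book's examples.
a = [float(i) for i in range(n)]
b = [float(i) for i in range(n)]
start = time.time()
total = 0.0
for x, y in zip(a, b):
    total += x * y
print("pure Python: %.3fs" % (time.time() - start))

# The same computation on NumPy arrays: the loop runs in compiled code.
a_arr = np.arange(n, dtype=float)
b_arr = np.arange(n, dtype=float)
start = time.time()
total_np = a_arr.dot(b_arr)
print("NumPy:       %.3fs" % (time.time() - start))

Both versions are the same O(n) algorithm; the large constant-factor gap comes from the interpreter, not the method, which is exactly the distinction I worry readers will miss.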

Danny Tarlow said...

Great review. Thanks!

Mathieu Blondel said...

I've been reading "Machine Learning: An Algorithmic Perspective" by Stephen Marsland recently. At first I was drawn to it because it covers quite a few standard ML algorithms and provides implementations in Python+NumPy.

The explanations are a bit laconic, so the code snippets sometimes feel like they have been dropped in as-is. As always, there's no universal book on ML, so it must be complemented with other resources to get a good understanding.

Another major complaint I have is that the code snippets in the book usually don't include a function signature, so you don't know the inputs and outputs of the algorithm! The snippets are also available online, though, so you can always check them. The code often doesn't feel NumPy-ish: loops are used where a nice NumPy one-liner would do the job.
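As a concrete illustration (a sketch of my own, not a snippet from the book), here are both styles for computing the distances from a query point to a set of points, with made-up data:

import numpy as np

X = np.random.rand(1000, 3)  # made-up data: 1000 points in 3-D
q = np.random.rand(3)        # a query point

# Loop version, in the style the book often uses.
dists_loop = np.empty(len(X))
for i in range(len(X)):
    dists_loop[i] = np.sqrt(np.sum((X[i] - q) ** 2))

# Equivalent NumPy one-liner: broadcasting subtracts q from every row,
# and the sum of squares is taken along axis 1.
dists = np.sqrt(((X - q) ** 2).sum(axis=1))

assert np.allclose(dists_loop, dists)

The one-liner pushes the loop into compiled code and, to my eye, reads closer to the math.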

The writing tone is very informal, so you'll either like it or not. That, together with the code snippets, probably makes it more suitable for the occasional practitioner.

pierre.rosado said...

The following videos are a good introduction to the basic ML techniques: http://bit.ly/2154Eb

By the way, everyone interested in ML will appreciate the following post of Bradford Cross: http://bit.ly/b1Hmrk

doug-ybarbo said...

Hi -- I just wanted to drop you a note to let you know that I provided a link to your blog's home page in an answer on StackOverflow. Here's the SO link:

http://stackoverflow.com/questions/3491112/entry-level-posts-for-machine-learning/3491265#3491265

If you have any objection, please let me know and I'll remove or edit it ASAP per your request.

Danny Tarlow said...

@doug - I have no objections, and thanks for letting me know!