Saturday, July 10, 2010

Really Big Numbers

Posted by Danny Tarlow
This is over 10 years old--and it's long and it wanders--but what a great article by Scott Aaronson:

Some excerpts:
A biggest number contest is clearly pointless when the contestants take turns. But what if the contestants write down their numbers simultaneously, neither aware of the other’s? To introduce a talk on "Big Numbers," I invite two audience volunteers to try exactly this. I tell them the rules:

You have fifteen seconds. Using standard math notation, English words, or both, name a single whole number—not an infinity—on a blank index card. Be precise enough for any reasonable modern mathematician to determine exactly what number you’ve named, by consulting only your card and, if necessary, the published literature.
To prognosticators of artificial intelligence, Moore’s Law is a glorious herald of exponential growth. But exponentials have a drearier side as well. The human population recently passed six billion and is doubling about once every forty years. At this exponential rate, if an average person weighs seventy kilograms, then by the year 3750 the entire Earth will be composed of human flesh. But before you invest in deodorant, realize that the population will stop increasing long before this—either because of famine, epidemic disease, global warming, mass species extinctions, unbreathable air, or, entering the speculative realm, birth control.
Imagine trying to explain the Turing machine to Archimedes. The genius of Syracuse listens patiently as you discuss the papyrus tape extending infinitely in both directions, the time steps, states, input and output sequences. At last he explodes.

"Foolishness!" he declares (or the ancient Greek equivalent). "All you’ve given me is an elaborate definition, with no value outside of itself."

How do you respond? Archimedes has never heard of computers, those cantankerous devices that, twenty-three centuries from his time, will transact the world’s affairs. So you can’t claim practical application. Nor can you appeal to Hilbert and the formalist program, since Archimedes hasn’t heard of those either. But then it hits you: the Busy Beaver sequence. You define the sequence for Archimedes, convince him that BB(1000) is more than his 10^63 grains of sand filling the universe, more even than 10^63 raised to its own power 10^63 times. You defy him to name a bigger number without invoking Turing machines or some equivalent. And as he ponders this challenge, the power of the Turing machine concept dawns on him. Though his intuition may never apprehend the Busy Beaver numbers, his reason compels him to acknowledge their immensity. Big numbers have a way of imbuing abstract notions with reality.


Sunday, July 4, 2010

Choosing a First Machine Learning Project: Start by Reading or by Doing?

Posted by Danny Tarlow
Sarath writes about doing a project during his final year of university related to machine learning:
I am writing this email to ask for some advice. well the thing is i haven't decided on my project yet, as i decided it will be better if i took some time to just strengthen my fundamentals and may be work on something small. well i came across this great blog called measuring measures where they had put up a reading list for machine learning and it was may i say a bit overwhelming.

My present goal is doing a graduate course in some good university with some good machine learning research and one of the reason i wanted to do a great project as i have heard that would be a great way to getting into a good university.

So my question is should my first priority be getting a really good and deep understanding of the subject or should i be more concerned with doing some good project with respect to admissions?
There are others who are likely more qualified than I am to answer this one, but here are my two cents:

That post certainly has things that would be nice to learn, but you don't need to know all of that in order to be a successful researcher. Depending on what area you go into, you might need different subsets of those references, or you might need something different all together. (For example, a reference I go back to time and time again is Schrijver's Combinatorial Optimization, but it's not on that list).

I think you should pick a project in an area that you find interesting, then just dive in. At first, I'd be less concerned with doing something new. First, focus on understanding a couple different existing approaches to the specific problem you've chosen, and pick up the necessary background as you go by trying to implement the algorithms and replicate published results, following references when you get confused, looking up terms, etc. Perhaps most importantly, work on your research skills. Important things:
  • Clearly write up exactly what you are doing and why you are doing it. Keep it as short as possible while still having all the important information.
  • Set up a framework so you are organized when running experiments
  • Even if the results are not state of the art or terribly surprising, keep track of all the outputs of all your different executions with different data sets as inputs, different parameter settings, etc.
  • Visualize everything interesting about the data you are using, the execution of your algorithms, and your results. Look for patterns, and try to understand why you are getting the results that you are.
All the while, be on the lookout for specific cases where an algorithm doesn't work very well, assumptions that seem strange, or connections between the approach you're working on to other algorithms or problems that you've run across before. Any of these can be the seed of a good research project.

In my estimation, I'd think graduate schools would be more impressed by a relevant, carefully done project, even if it's not terribly novel, than they would be with you saying on your application that you have read a lot of books.

If you're looking for project ideas, check out recent projects that have been done by students of Andrew Ng's machine learning course at Stanford:

Perhaps some readers who have experience on graduate committees can correct or add to anything that I said that was wrong or incomplete.