Monday, June 22, 2009

Hadoop in Python

Posted by Danny Tarlow
TO DO: Play around with Dumbo.
Dumbo is a Python module that allows you to easily write and run Hadoop streaming programs (it's named after Disney's flying circus elephant, since the logo for Hadoop is an elephant and Python was named after the BBC series "Monty Python's Flying Circus").
It apparently makes running Hadoop via Python a breeze.
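
To give a feel for what such a streaming program looks like, here is a minimal word-count sketch. It is an illustration only, not code from Dumbo: the `mapper` and `reducer` generator functions follow the usual key/value streaming shape, and `run_locally` is a hypothetical helper that imitates the map, shuffle, and reduce steps in a single process so the example runs without Hadoop or Dumbo installed.

```python
from collections import defaultdict


def mapper(key, value):
    # key is ignored here; value is assumed to be one line of input text.
    for word in value.split():
        yield word, 1


def reducer(key, values):
    # values iterates over every count emitted for this word.
    yield key, sum(values)


def run_locally(lines):
    """Hypothetical helper: imitate map, shuffle, and reduce in one process."""
    grouped = defaultdict(list)
    # Map phase: feed each line to the mapper and group its output by key.
    for offset, line in enumerate(lines):
        for word, count in mapper(offset, line):
            grouped[word].append(count)
    # Reduce phase: call the reducer once per key, in sorted key order.
    results = {}
    for word in sorted(grouped):
        for out_key, total in reducer(word, grouped[word]):
            results[out_key] = total
    return results


if __name__ == "__main__":
    print(run_locally(["the quick brown fox", "the lazy dog"]))
```

Dumbo's documentation describes handing a mapper/reducer pair like this to `dumbo.run(mapper, reducer)`; treat that call as an assumption to check against whichever version you install.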

