Tuesday, December 21, 2010

Machine Learning for Human Memorization

Posted by Danny Tarlow
Here's a machine learning problem I find interesting but haven't been able to satisfactorily solve.

Background: As some of you know, I used to play Scrabble somewhat seriously. Most Tuesdays in middle school, I would go to the local Scrabble club meetings and play 4 games against the best Scrabble players in the area (actually, it was usually 3 games, because the 4th game started past my bedtime). It's not your family game of Scrabble: to begin to be competitive, you need to know all of the two letter words, most of the threes, and you need to have some familiarity with a few of the other high-priority lists (e.g., vowel dumps; short q, z, j, and x words; at least a few of the bingo stems). See here for a good starting point.

Anyway, I recently went to the Toronto Scrabble Club meeting and had a great time. I think I'll start going with more regularity. As a busy machine learning researcher, though, I don't have the time or the mental capacity to memorize long lists of words anymore: for example, there are 972 legal three letter words and 3902 legal four letter words.

So I'm looking for an alternative to memorization. Typically during play, there will be a board position that could yield a high-scoring word, but it requires that XXX or XXXX be a word. It would be very helpful if I could spend a minute or so of pen and paper computation time, then arrive at an answer like, "this is a word with 90% probability". So what I really need is just a binary classifier that maps a word to probability of label "legal".

Problem description: In machine learning terms, it's an unusual problem (from what I can tell). We're not trying to build a classifier that generalizes well, because the set of 3 (or 4) letter words is fixed: we have all inputs, and they're all labeled. At first glance, you might think this is an easy problem: just choose a model with high capacity, overfit the training data, and be done. There's no need for regularization if we don't care about overfitting, right? Well, not exactly. By this logic, we should just use a nearest neighbors classifier; but in order to run a nearest neighbors algorithm in my head, I'd need to memorize the entire training set!

There is some related work. Model selection criteria like the Bayesian information criterion (BIC) penalize a model for having many parameters. This is somewhat close, but not all sets of parameters are equally difficult for me to memorize. For example, if somewhere in the algorithm we used the first 26 prime numbers, then I could sit down at the beginning of each game and write out the numbers pretty quickly, with no memorization needed ahead of time. Similarly, I know all of the two letter words, so it would be cheap to include a feature indicating whether a substring is a valid two letter word. The right measure of complexity is something like Kolmogorov complexity, but modified, because certain sets of letters or numbers might have high Kolmogorov complexity yet already be things I know. Let's call this me complexity. In the case of me, Danny complexity.

So here's my admittedly still vague description of the problem: find the feature set and binary classifier that best balances high training set accuracy with low Danny complexity. In particular, the Danny complexity should be much smaller than that of the naive memorization strategy.
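To make the target concrete, here's the kind of scoring harness I have in mind (a minimal sketch: it assumes the word list below is saved as three_letter_words.txt, and the contains-a-vowel rule is only a placeholder, not a serious proposal):

LETTERS = [chr(65 + i) for i in range(26)]
ALLWORDS = [a + b + c for a in LETTERS for b in LETTERS for c in LETTERS]
GOODWORDS = set(open("three_letter_words.txt").read().split())

def rule(word):
    # Placeholder rule with low Danny complexity: predict "legal" iff
    # the word contains at least one vowel.
    return any(ch in "AEIOU" for ch in word)

predicted = [w for w in ALLWORDS if rule(w)]
hits = sum(1 for w in predicted if w in GOODWORDS)
print("P(legal | rule says legal) =", hits / float(len(predicted)))
print("P(rule says legal | legal) =",
      sum(1 for w in GOODWORDS if rule(w)) / float(len(GOODWORDS)))

Any candidate rule can be dropped into rule() and scored the same way; the game is finding one whose first number is high while the rule stays easy to run in my head.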

My first attempt at this for three letter words was to define a feature mapping from each letter x to integer value f(x), then to search over integer values of c_1, c_2, c_3, and thresh for a linear classifier c_1 * f(x_1) + c_2 * f(x_2) + c_3 * f(x_3) > thresh. The Danny complexity isn't bad, depending on the form of f(x), but it hasn't been very successful in the accuracy department.

Any ideas? Can you do better? I'm attaching my code, along with the list of three letter words. letter_val(letter) defines f(x), and hash_fn maps a full word into some discrete bin (0 or 1 for binary classification, but you could imagine having more bins). I tried a couple different ways of defining f(x), which you can see in the code.
LETTER_EMBEDDING_1D = [
    'A', 'O', 'E', 'I', 'U',
    'Y', 'R', 'H', 'L', 'M',
    'N', 'C', 'K', 'T', 'P',
    'Q', 'W', 'S', 'B', 'D',
    'F', 'G', 'X', 'V', 'J', 'Z'
    ]

LETTERMAP = dict(zip(LETTER_EMBEDDING_1D, range(26)))

"""
LETTERMAP = {
    'A' : 2,
    'B' : 3,
    'C' : 5,
    'D' : 7,
    'E' : 11,
    'F' : 13,
    'G' : 17,
    'H' : 19,
    'I' : 23,
    'J' : 29,
    'K' : 31,
    'L' : 37,
    'M' : 41,
    'N' : 43,
    'O' : 47,
    'P' : 53,
    'Q' : 59,
    'R' : 61,
    'S' : 67,
    'T' : 71,
    'U' : 73,
    'V' : 79,
    'W' : 83,
    'X' : 89,
    'Y' : 97,
    'Z' : 101
    }
"""


def letter_val(letter):
    return LETTERMAP[letter]
    #return ord(letter) - ord('A') + 1

def hash_fn(word, c1, c2, c3, m):
    # Linear classifier: compare a weighted sum of letter values to a
    # threshold (m plays the role of the threshold here).
    if (c1*letter_val(word[0]) + c2*letter_val(word[1]) + c3*letter_val(word[2])) > m:
        return 1
    else:
        return 0

    # Alternative: hash each word into one of m discrete bins instead.
    #return (c1*letter_val(word[0]) + c2*letter_val(word[1]) + c3*letter_val(word[2])) % m


GOODWORDS = []
with open("three_letter_words.txt") as f:
    for line in f:
        GOODWORDS.extend(line.split())
print(GOODWORDS)

LETTERS = [chr(65+i) for i in range(26)]
ALLWORDS = [a + b + c for a in LETTERS for b in LETTERS for c in LETTERS]

# Exhaustive search over small integer weights and thresholds.
m = 2  # number of bins (binary classification)
for thresh in range(-1, 2):
    for c1 in range(-2,3):
        for c2 in range(-2,3):
            for c3 in range(-2,3):
                
                good_hashes = []
                all_hashes = []

                for w in GOODWORDS:
                    good_hashes.append(hash_fn(w, c1, c2, c3, thresh))

                for w in ALLWORDS:
                    all_hashes.append(hash_fn(w, c1, c2, c3, thresh))

                num_good = m*[0]
                num_all = m*[0]

                for h in good_hashes:
                    num_good[h] += 1
                
                for h in all_hashes:
                    num_all[h] += 1

                scores = m*[0]

                for i in range(m):
                    if num_all[i] > 0:
                        scores[i] = float(num_good[i]) / num_all[i]

                # Print parameter settings where some bin is strongly
                # enriched with legal words.
                if max(scores) > .3:
                    print(thresh, "\t", c1, c2, c3, ":\t", scores)
                    print("")
Save this list as three_letter_words.txt. If you're feeling ambitious, you can get the four letter words here.
AAH AAL AAS ABA ABO ABS ABY ACE ACT ADD ADO ADS ADZ AFF AFT AGA AGE AGO AHA AID
AIL AIM AIN AIR AIS AIT ALA ALB ALE ALL ALP ALS ALT AMA AMI AMP AMU ANA AND ANE 
ANI ANT ANY APE APT ARB ARC ARE ARF ARK ARM ARS ART ASH ASK ASP ASS ATE ATT AUK 
AVA AVE AVO AWA AWE AWL AWN AXE AYE AYS AZO 
BAA BAD BAG BAH BAL BAM BAN BAP BAR BAS BAT BAY BED BEE BEG BEL BEN BET BEY BIB 
BID BIG BIN BIO BIS BIT BIZ BOA BOB BOD BOG BOO BOP BOS BOT BOW BOX BOY BRA BRO 
BRR BUB BUD BUG BUM BUN BUR BUS BUT BUY BYE BYS 
CAB CAD CAM CAN CAP CAR CAT CAW CAY CEE CEL CEP CHI CIS COB COD COG COL CON COO 
COP COR COS COT COW COX COY COZ CRY CUB CUD CUE CUM CUP CUR CUT CWM 
DAB DAD DAG DAH DAK DAL DAM DAP DAW DAY DEB DEE DEL DEN DEV DEW DEX DEY DIB DID 
DIE DIG DIM DIN DIP DIS DIT DOC DOE DOG DOL DOM DON DOR DOS DOT DOW DRY DUB DUD 
DUE DUG DUI DUN DUO DUP DYE 
EAR EAT EAU EBB ECU EDH EEL EFF EFS EFT EGG EGO EKE ELD ELF ELK ELL ELM ELS EME 
EMF EMS EMU END ENG ENS EON ERA ERE ERG ERN ERR ERS ESS ETA ETH EVE EWE EYE 
FAD FAG FAN FAR FAS FAT FAX FAY FED FEE FEH FEM FEN FER FET FEU FEW FEY FEZ FIB 
FID FIE FIG FIL FIN FIR FIT FIX FIZ FLU FLY FOB FOE FOG FOH FON FOP FOR FOU FOX 
FOY FRO FRY FUB FUD FUG FUN FUR 
GAB GAD GAE GAG GAL GAM GAN GAP GAR GAS GAT GAY GED GEE GEL GEM GEN GET GEY GHI 
GIB GID GIE GIG GIN GIP GIT GNU GOA GOB GOD GOO GOR GOT GOX GOY GUL GUM GUN GUT 
GUV GUY GYM GYP 
HAD HAE HAG HAH HAJ HAM HAO HAP HAS HAT HAW HAY HEH HEM HEN HEP HER HES HET HEW 
HEX HEY HIC HID HIE HIM HIN HIP HIS HIT HMM HOB HOD HOE HOG HON HOP HOT HOW HOY 
HUB HUE HUG HUH HUM HUN HUP HUT HYP 
ICE ICH ICK ICY IDS IFF IFS ILK ILL IMP INK INN INS ION IRE IRK ISM ITS IVY 
JAB JAG JAM JAR JAW JAY JEE JET JEU JEW JIB JIG JIN JOB JOE JOG JOT JOW JOY JUG 
JUN JUS JUT 
KAB KAE KAF KAS KAT KAY KEA KEF KEG KEN KEP KEX KEY KHI KID KIF KIN KIP KIR KIT 
KOA KOB KOI KOP KOR KOS KUE 
LAB LAC LAD LAG LAM LAP LAR LAS LAT LAV LAW LAX LAY LEA LED LEE LEG LEI LEK LET 
LEU LEV LEX LEY LEZ LIB LID LIE LIN LIP LIS LIT LOB LOG LOO LOP LOT LOW LOX LUG 
LUM LUV LUX LYE 
MAC MAD MAE MAG MAN MAP MAR MAS MAT MAW MAX MAY MED MEL MEM MEN MET MEW MHO MIB 
MID MIG MIL MIM MIR MIS MIX MOA MOB MOC MOD MOG MOL MOM MON MOO MOP MOR MOS MOT 
MOW MUD MUG MUM MUN MUS MUT 
NAB NAE NAG NAH NAM NAN NAP NAW NAY NEB NEE NET NEW NIB NIL NIM NIP NIT NIX NOB 
NOD NOG NOH NOM NOO NOR NOS NOT NOW NTH NUB NUN NUS NUT 
OAF OAK OAR OAT OBE OBI OCA ODD ODE ODS OES OFF OFT OHM OHO OHS OIL OKA OKE OLD
OLE OMS ONE ONS OOH OOT OPE OPS OPT ORA ORB ORC ORE ORS ORT OSE OUD OUR OUT OVA 
OWE OWL OWN OXO OXY 
PAC PAD PAH PAL PAM PAN PAP PAR PAS PAT PAW PAX PAY PEA PEC PED PEE PEG PEH PEN 
PEP PER PES PET PEW PHI PHT PIA PIC PIE PIG PIN PIP PIS PIT PIU PIX PLY POD POH 
POI POL POM POP POT POW POX PRO PRY PSI PUB PUD PUG PUL PUN PUP PUR PUS PUT PYA
PYE PYX 
QAT QUA 
RAD RAG RAH RAJ RAM RAN RAP RAS RAT RAW RAX RAY REB REC RED REE REF REG REI REM 
REP RES RET REV REX RHO RIA RIB RID RIF RIG RIM RIN RIP ROB ROC ROD ROE ROM ROT 
ROW RUB RUE RUG RUM RUN RUT RYA RYE 
SAB SAC SAD SAE SAG SAL SAP SAT SAU SAW SAX SAY SEA SEC SEE SEG SEI SEL SEN SER 
SET SEW SEX SHA SHE SHH SHY SIB SIC SIM SIN SIP SIR SIS SIT SIX SKA SKI SKY SLY 
SOB SOD SOL SON SOP SOS SOT SOU SOW SOX SOY SPA SPY SRI STY SUB SUE SUM SUN SUP 
SUQ SYN 
TAB TAD TAE TAG TAJ TAM TAN TAO TAP TAR TAS TAT TAU TAV TAW TAX TEA TED TEE TEG
TEL TEN TET TEW THE THO THY TIC TIE TIL TIN TIP TIS TIT TOD TOE TOG TOM TON TOO 
TOP TOR TOT TOW TOY TRY TSK TUB TUG TUI TUN TUP TUT TUX TWA TWO TYE 
UDO UGH UKE ULU UMM UMP UNS UPO UPS URB URD URN USE UTA UTS 
VAC VAN VAR VAS VAT VAU VAV VAW VEE VEG VET VEX VIA VIE VIG VIM VIS VOE VOW VOX 
VUG 
WAB WAD WAE WAG WAN WAP WAR WAS WAT WAW WAX WAY WEB WED WEE WEN WET WHA WHO WHY 
WIG WIN WIS WIT WIZ WOE WOG WOK WON WOO WOP WOS WOT WOW WRY WUD WYE WYN 
XIS 
YAH YAK YAM YAP YAR YAW YAY YEA YEH YEN YEP YES YET YEW YID YIN YIP YOB YOD YOK 
YOM YON YOU YOW YUK YUM YUP 
ZAG ZAP ZAX ZED ZEE ZEK ZIG ZIN ZIP ZIT ZOA ZOO

Monday, November 1, 2010

Optimizing Loading Images in C++

Posted by Danny Tarlow
I'm working on a project where I need to load a lot of images into a C++ program, and it's taking up an annoyingly long amount of time relative to the rest of the program (the loading of data takes more time than running my algorithms), so I put in a couple hours to see if I could optimize it.

The basic setup is that for each example and iteration, I need to load around 100 images and iterate over all pixels. The images are each of size around 150x200 pixels. A typical full run of the algorithm does around twenty iterations on a couple hundred examples (say 200). To produce the results I need, it will take 20 or 30 full runs. I can parallelize a lot, but I figured it was worth taking a pass at optimizing my input/output code first.

My initial implementation used the CImg library to load the images. It is a simple loop, iterating over filenames, loading the images, then iterating over the pixels to construct the model. For a single example, it takes about 6.7 seconds: 2.2 seconds to load the 100 images, and 4.5 seconds to iterate over them and construct the model. For a full run, that amounts to roughly 20 * 200 * 6.7 = 26800 seconds, or 7.4 hours. In reality, I usually split the work over 4 cores, so I can get results in ~2 hours.

The new version I am playing around with uses Google Protocol Buffers instead. Instead of loading each of the images separately, I write all of the pixel values of the 100 images into a protocol buffer, then load the single file instead of the 100 separate ones. For a single example, this cuts the time down to about 2.3 seconds: .3 seconds to load the data, and 2.0 seconds to iterate over the values and construct the model. It's not earth shattering, but it's still a nice speedup, cutting the input/output component of time for a full run down to about 9200 seconds, or 2.6 hours (2.6/7.4 = 35%).
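For what it's worth, the same trick is easy to sketch in Python (my illustration, not the project code: numpy files stand in for the protocol buffer, and the filenames are made up). The point is just that all of an example's pixels get packed into one file up front, so at run time each example costs one read instead of 100:

import numpy as np
from PIL import Image

def pack_example(image_paths, out_path):
    # One-time preprocessing: stack an example's ~100 images into a
    # single (num_images, height, width, channels) array on disk.
    stack = np.stack([np.asarray(Image.open(p)) for p in image_paths])
    np.save(out_path, stack)

def load_example(packed_path):
    # At run time, one read replaces ~100 separate image loads.
    return np.load(packed_path)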

Wednesday, October 20, 2010

An Algorithm to Generate Impossible Art?

Posted by Danny Tarlow
I posted last time about constraint satisfaction problems that are locally consistent (arc-consistent) everywhere but inconsistent globally. Part of the interesting bit was the connection to "impossible" Escher-style art, such as the impossible triangle.



This is really only the beginning, though. There should be many more examples where we have local consistency but global inconsistency. For example, it's straightforward to extend the 3-cycle that produces the impossible triangle to any odd cycle. This produces impossible pentagons, impossible heptagons, etc. I've drawn the pentagon and heptagon examples below.

Naturally, I want to push this further, finding more interesting figures to draw. Here's a proposed algorithm:
  • Generate a random constraint satisfaction problem that is arc-consistent but unsatisfiable.
  • Translate the kernel--the locally consistent problem that remains after running arc-consistency--into a drawing.
Both steps present some challenges. Unfortunately I don't (yet) have the answers. Maybe somebody out there can help:

For the first, how do we do it efficiently? I don't know, but we can think of doing guess and check for now. If somebody knows more about this, I'd be interested to hear about it. It seems related to generating hard instances to evaluate constraint satisfaction solvers.
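To make guess and check concrete, here's a brute-force sketch for tiny binary CSPs over two-state variables (my own illustration; efficiency is exactly the unsolved part, so it may take many samples to stumble on an instance):

import itertools
import random

def arc_consistency_kernel(n, allowed):
    # Prune values with no supporting partner until nothing changes;
    # whatever survives is the kernel.
    domains = [set([0, 1]) for _ in range(n)]
    changed = True
    while changed:
        changed = False
        for (x, y), ok in allowed.items():
            for a in list(domains[x]):
                if not any((a, b) in ok for b in domains[y]):
                    domains[x].discard(a)
                    changed = True
            for b in list(domains[y]):
                if not any((a, b) in ok for a in domains[x]):
                    domains[y].discard(b)
                    changed = True
    return domains

def unsatisfiable(n, allowed):
    # Brute force: no joint assignment satisfies every constraint.
    return not any(all((s[x], s[y]) in ok for (x, y), ok in allowed.items())
                   for s in itertools.product([0, 1], repeat=n))

def random_instance(n, n_constraints):
    # Each constraint allows a random proper, nonempty subset of pairs.
    allowed = {}
    while len(allowed) < n_constraints:
        x, y = random.sample(range(n), 2)
        pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
        allowed[(x, y)] = set(random.sample(pairs, random.randint(1, 3)))
    return allowed

random.seed(0)
while True:
    instance = random_instance(n=4, n_constraints=5)
    kernel = arc_consistency_kernel(4, instance)
    if all(kernel) and unsatisfiable(4, instance):  # nonempty kernel, no solution
        print(instance)
        break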

For the second, perhaps somebody with some art background can help. How do we do this generally? I can extend the triangle example to odd cycles of this special form where each variable (corner) has two possible interpretations locally (UPDATE: I'm not sure this is the right interpretation): one where the figure is coming out of the page and one where it is going into the page. But how do we draw a visual analog of a variable with only one legal interpretation? How do we draw variables with more than 2 possible states? How do we draw examples with higher connectivity?

Does anybody know of references to any prior work on this topic?

Saturday, October 16, 2010

Limitations of Locality

Posted by Danny Tarlow
In this day and age, we're used to tackling problems where one processor just isn't enough.

The standard solution, of course, is to use more processors. Frameworks like MapReduce (and Hadoop) give us an abstraction where a master process spawns off a bunch of worker processes, each processor does some work on its own, then they all report their piece of results, which are combined together by the master process at the end to give the answer.

This works great when the pieces of the problem are independent -- that is, the workers don't need to communicate with each other in order to do their jobs. If your goal is to compute the average of a huge list of numbers, this is easy: each processor reports the sum of the numbers in its part of the list along with the number of elements. If your goal is to compute the median, it doesn't work so easily. (Think about it.)
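The averaging case, in miniature (a toy sketch with list slices standing in for workers):

data = list(range(1, 101))
shards = [data[i::4] for i in range(4)]        # pretend these live on 4 machines
partials = [(sum(s), len(s)) for s in shards]  # map: purely local work
total, count = map(sum, zip(*partials))        # reduce: combine tiny summaries
print(total / float(count))                    # 50.5

No analogous small per-shard summary determines the global median, which is why it doesn't decompose so easily.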

So what do we do when the work the processors are doing isn't independent? Imagine that there are a bunch of interacting pieces in your model. For example, in computer vision, a common problem is to label each pixel in an image with the class of object at that pixel. We shouldn't expect to be able to label each pixel independently, because we know that objects in the world are "blobby", so knowing the value of one pixel will tell you something about the likely values of nearby pixels.

As another example, consider the problem of trying to label a webpage with a category representing what it's about. You could try to do it independently--just looking at the words and images on the page. However, a better approach would probably be to take into account dependencies between pages: if page A links to page B, that probably tells us something about the relationship between the category of A and the category of B. This could be especially useful for the cases where a page itself has few words, but it's linked to by many other pages.

In machine learning, perhaps our most powerful formalism to represent these types of interaction is the probabilistic graphical model. The power comes in the ability to precisely specify the structure of interactions in our problem via a graph (or hypergraph). We can then design message passing algorithms that take advantage of this structure: each region does some local computation, and then parts of the model that directly interact with each other exchange messages. In this way, information is propagated amongst the workers, but computation is still mostly done locally. If you've read about Google's Pregel infrastructure, it probably sounds pretty similar.

Unfortunately, there are certain fundamental limitations to doing things completely locally. The interesting thing is that the same fundamental issue comes up in a variety of settings.

In message passing algorithms (e.g., http://en.wikipedia.org/wiki/Belief_propagation), neighboring regions send messages to each other to try to enforce local consistency -- that is, when two regions have beliefs about overlapping variables, they pass messages to try to ensure that they agree on the values of the overlapping variables. You would hope that finding a solution that is locally consistent everywhere would imply that there is a globally consistent solution.

Interestingly, this isn't the case. In MAP inference, this crops up in the looseness of the standard linear programming relaxation, which has constraints that enforce local but not global consistency. In constraint satisfaction problems, this comes up in the incompleteness of arc-consistency algorithms: there are certain unsatisfiable constraint satisfaction problems that arc-consistency, a local consistency checking algorithm, cannot recognize as inconsistent.
http://en.wikipedia.org/wiki/Local_consistency#Arc_consistency

I was discussing these connections at my tea talk the other day, and Geoff Hinton pointed out that this same phenomenon also comes up in the impossible triangle optical illusion. I think that's kind of cool -- people also get confused when things look sufficiently consistent locally but actually are inconsistent globally. Here's a diagram showing my interpretation of his comment along with the constraint satisfaction example.

Note that when you look at each corner independently (top row), everything looks consistent in isolation. However, when you look at the figure as a whole (bottom left), there is no globally consistent interpretation. In the constraint satisfaction example (bottom right), bold black edges represent legal combinations of variables, while missing edges represent illegal combinations. In this case, the arc-consistency algorithm terminates without finding an inconsistency, but if you work through the possibilities, you'll see that there is no legal assignment.
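For the curious, here's the bottom-right CSP in runnable form (my encoding of the figure: three two-state corners, with each adjacent pair constrained to disagree):

from itertools import product

variables = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("A", "C")]
ok = lambda a, b: a != b  # on every edge, the two corners must disagree

# Local check: every value of every corner has a supporting partner on
# every edge (the constraint is symmetric, so one direction suffices).
locally_consistent = all(any(ok(a, b) for b in (0, 1))
                         for (x, y) in edges for a in (0, 1))

# Global check: brute force over all 2^3 joint assignments.
globally_satisfiable = any(
    all(ok(s[variables.index(x)], s[variables.index(y)]) for (x, y) in edges)
    for s in product((0, 1), repeat=3))

print(locally_consistent)    # True: arc-consistency has nothing to prune
print(globally_satisfiable)  # False: no consistent global interpretation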

Now this isn't to say things are hopeless. Even though our locally consistent solutions can be very bad in the worst case, they often are good enough to be useful. If we want globally consistent exact solutions, we have lift and project methods from theoretical computer science, which show how to gradually move from local to global consistency. The big three frameworks are covered here:
A Comparison of the Sherali-Adams, Lovász-Schrijver and Lasserre Relaxations for 0-1 Programming
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.893

In the worst case, these methods become exponentially expensive as you move to become more global (which is not surprising, since we can represent many NP-hard problems with graphical models). However, it turns out that in practice, the real-world problems that we are interested in are not the hardest computational problems. In some recent work, David Sontag and company have shown that just making things a little more global can be enough to exactly solve hard protein folding and protein design problems from computational biology:
D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, T. Jaakkola. Tightening LP Relaxations for MAP using Message Passing. Uncertainty in Artificial Intelligence (UAI) 24, July 2008.
[pdf] [code]

I should also (shamelessly) point out that global interactions don't always have to amount to intractable models. Some of my own recent work focuses on how we can get the best of both worlds when we have parts of the model that interact locally along with some additional global interactions. It turns out that you can develop very efficient message passing algorithms in some of these cases as well:
D. Tarlow, I. Givoni, and R. Zemel. HOP-MAP: Efficient Message Passing with High Order Potentials. The 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010)
[pdf] [code]

Wednesday, October 13, 2010

Max-Product Belief Propagation and Constraint Satisfaction Problems

Posted by Danny Tarlow
I'm giving the machine learning tea talk today on the Augmenting DAG (directed acyclic graph) algorithm, an algorithm with some interesting history. It seems that there was a line of work starting in the mid-seventies in the USSR relating the discrete maximum a posteriori (MAP) inference problem (in English: find the most likely explanation of variables under some probabilistic model) in pairwise graphical models to: (a) constraint satisfaction problems, and (b) linear program relaxations. The work went unnoticed for 30 years (it was published in Russian), but it was recently rediscovered and reviewed by Tomas Werner in the 2007 journal paper available here:
T. Werner. A Linear Programming Approach to Max-sum Problem: A Review. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI) 29(7), July 2007.
ftp://cmp.felk.cvut.cz/pub/cmp/articles/werner/Werner-PAMI-2007.pdf

It's interesting, because even today, the best known general algorithms for the MAP inference problem are based on linear program relaxations (though we solve the relaxation using algorithms that are more efficient than general purpose solvers). The most notable modern versions are variants on max-product belief propagation: the tree-reweighted belief propagation (TRW) of Martin Wainwright et al. and the sequential TRW (TRW-S) algorithm of Vladimir Kolmogorov (code here: http://vision.middlebury.edu/MRF/code/).

The other interesting bit is the connection between MAP and constraint satisfaction. It turns out that the inner loop of the Augmenting DAG algorithm is to run an arc-consistency algorithm. If you find an arc-consistency violation in an appropriately constructed graph, then you have also discovered a step that you can take in the dual of the MAP linear program relaxation that will improve your upper bound.

In practice, the prevailing wisdom is that you're better off using max-product or TRW-S to solve a general problem, but in my opinion, the Augmenting DAG algorithm is a very nice algorithm, and the paper makes a lot of interesting connections to work in other fields. It's definitely worth a read if you're interested in this sort of thing.

Saturday, July 10, 2010

Really Big Numbers

Posted by Danny Tarlow
This is over 10 years old--and it's long and it wanders--but what a great article by Scott Aaronson:
http://www.scottaaronson.com/writings/bignumbers.html

Some excerpts:
A biggest number contest is clearly pointless when the contestants take turns. But what if the contestants write down their numbers simultaneously, neither aware of the other’s? To introduce a talk on "Big Numbers," I invite two audience volunteers to try exactly this. I tell them the rules:

You have fifteen seconds. Using standard math notation, English words, or both, name a single whole number—not an infinity—on a blank index card. Be precise enough for any reasonable modern mathematician to determine exactly what number you’ve named, by consulting only your card and, if necessary, the published literature.
...
To prognosticators of artificial intelligence, Moore’s Law is a glorious herald of exponential growth. But exponentials have a drearier side as well. The human population recently passed six billion and is doubling about once every forty years. At this exponential rate, if an average person weighs seventy kilograms, then by the year 3750 the entire Earth will be composed of human flesh. But before you invest in deodorant, realize that the population will stop increasing long before this—either because of famine, epidemic disease, global warming, mass species extinctions, unbreathable air, or, entering the speculative realm, birth control.
...
Imagine trying to explain the Turing machine to Archimedes. The genius of Syracuse listens patiently as you discuss the papyrus tape extending infinitely in both directions, the time steps, states, input and output sequences. At last he explodes.

"Foolishness!" he declares (or the ancient Greek equivalent). "All you’ve given me is an elaborate definition, with no value outside of itself."

How do you respond? Archimedes has never heard of computers, those cantankerous devices that, twenty-three centuries from his time, will transact the world’s affairs. So you can’t claim practical application. Nor can you appeal to Hilbert and the formalist program, since Archimedes hasn’t heard of those either. But then it hits you: the Busy Beaver sequence. You define the sequence for Archimedes, convince him that BB(1000) is more than his 10^63 grains of sand filling the universe, more even than 10^63 raised to its own power 10^63 times. You defy him to name a bigger number without invoking Turing machines or some equivalent. And as he ponders this challenge, the power of the Turing machine concept dawns on him. Though his intuition may never apprehend the Busy Beaver numbers, his reason compels him to acknowledge their immensity. Big numbers have a way of imbuing abstract notions with reality.
http://www.scottaaronson.com/writings/bignumbers.html


Sunday, July 4, 2010

Choosing a First Machine Learning Project: Start by Reading or by Doing?

Posted by Danny Tarlow
Sarath writes about doing a project during his final year of university related to machine learning:
I am writing this email to ask for some advice. well the thing is i haven't decided on my project yet, as i decided it will be better if i took some time to just strengthen my fundamentals and may be work on something small. well i came across this great blog called measuring measures where they had put up a reading list for machine learning and it was may i say a bit overwhelming. http://measuringmeasures.com/blog/2010/3/12/learning-about-machine-learning-2nd-ed.html

My present goal is doing a graduate course in some good university with some good machine learning research and one of the reason i wanted to do a great project as i have heard that would be a great way to getting into a good university.

So my question is should my first priority be getting a really good and deep understanding of the subject or should i be more concerned with doing some good project with respect to admissions?
There are others who are likely more qualified than I am to answer this one, but here are my two cents:

That post certainly has things that would be nice to learn, but you don't need to know all of that in order to be a successful researcher. Depending on what area you go into, you might need different subsets of those references, or you might need something different all together. (For example, a reference I go back to time and time again is Schrijver's Combinatorial Optimization, but it's not on that list).

I think you should pick a project in an area that you find interesting, then just dive in. At first, I'd be less concerned with doing something new. First, focus on understanding a couple different existing approaches to the specific problem you've chosen, and pick up the necessary background as you go by trying to implement the algorithms and replicate published results, following references when you get confused, looking up terms, etc. Perhaps most importantly, work on your research skills. Important things:
  • Clearly write up exactly what you are doing and why you are doing it. Keep it as short as possible while still having all the important information.
  • Set up a framework so you are organized when running experiments.
  • Even if the results are not state of the art or terribly surprising, keep track of all the outputs of all your different executions with different data sets as inputs, different parameter settings, etc.
  • Visualize everything interesting about the data you are using, the execution of your algorithms, and your results. Look for patterns, and try to understand why you are getting the results that you are.
All the while, be on the lookout for specific cases where an algorithm doesn't work very well, assumptions that seem strange, or connections between the approach you're working on to other algorithms or problems that you've run across before. Any of these can be the seed of a good research project.

In my estimation, I'd think graduate schools would be more impressed by a relevant, carefully done project, even if it's not terribly novel, than they would be with you saying on your application that you have read a lot of books.

If you're looking for project ideas, check out recent projects that have been done by students of Andrew Ng's machine learning course at Stanford:
http://www.stanford.edu/class/cs229/projects2008.html
http://www.stanford.edu/class/cs229/projects2009.html

Perhaps some readers who have experience on graduate committees can correct or add to anything that I said that was wrong or incomplete.

Monday, June 28, 2010

Embracing Uncertainty

Posted by Danny Tarlow
If you're around London this week, try heading down to the Southbank Centre for the Royal Society's 350th anniversary event. The Machine Learning and Perception group at Microsoft Research Cambridge has a booth with a bunch of demos set up, all revolving around "Embracing Uncertainty: The New Machine Intelligence". I particularly like the Galton machine (or Quincunx).

I'll be down there on Tuesday and Friday.

Tuesday, June 22, 2010

Machine Learning (ICML) Discussion Site

Posted by Danny Tarlow
The International Conference on Machine Learning (ICML) is happening now in Haifa, Israel. There have been attempts at sites to discuss papers in previous years as well, but this year, there is a revamped version. Most notably, you can subscribe to an RSS feed of the discussion, which should make it easier for people to keep tabs on discussions.

http://mldiscuss.appspot.com/

Like many others, I think discussion sites are a nice idea. It's a step towards making research more interactive beyond just the conference poster session, it can help clarify ambiguities the readers might have of the paper, and I think it has the potential to make authors more accountable for their work. Hopefully it will catch on.

Oh, and I haven't had a chance to read any of them in detail yet, but here are the papers that caught my eye and will probably make it to my reading list:

Continuous-Time Belief Propagation
Tal El-Hay (Hebrew University); Ido Cohn; Nir Friedman (Hebrew University); Raz Kupferman (Hebrew University)

Interactive Submodular Set Cover
Andrew Guillory (University of Washington); Jeff Bilmes (University of Washington, Seattle)

A fast natural Newton method
Nicolas Le Roux (Microsoft Cambridge); Andrew Fitzgibbon (Microsoft Research)

Non-Local Contrastive Objectives
David Vickrey (Stanford University); Cliff Chiung-Yu Lin (Stanford University); Daphne Koller (Stanford University)

Accelerated dual decomposition for MAP inference
Vladimir Jojic (Stanford University); Stephen Gould (Stanford University); Daphne Koller (Stanford University)

Monday, June 21, 2010

Bayesian NBA Basketball Predictions

Posted by Danny Tarlow
Those of you who read this blog regularly know that I like to play around each year predicting March Madness basketball scores. This last year, Lee got involved, and we ran the first annual March Madness Predictive Analytics Challenge, which by all measures was a great success.

Well, it's fun to run my model, but it's a pretty basic model. It doesn't use any information other than the scores of each game, so important things like when the game was played, whether it was a home or away game for each team, and various other pieces of side information are ignored. It's not that I don't think there's useful additional information, it's just tricky to figure out a good way to get it into the model.

So I'm quite pleased to report that some of my buddies from the Toronto machine learning group -- Ryan, George, and Iain -- had some great ideas. They wrote a paper about the ideas, which will be appearing at the upcoming conference, Uncertainty in Artificial Intelligence (UAI 2010). They're also releasing their data and code. The rough idea of the model is to train a different model for each context (which is given by the approximate date, who is home/away, and other side information), but to constrain the models with similar contexts to have similar parameters using Gaussian Process priors. As they say in the abstract:
We propose a framework for incorporating side information by coupling together multiple PMF problems via Gaussian process priors. We replace scalar latent features with functions that vary over the covariate space. The GP priors on these functions require them to vary smoothly and share information. We apply this new method to predict the scores of professional basketball games, where side information about the venue and date of the game are relevant for the outcome.
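To get a feel for the coupling, here's a toy prior sample in numpy (my own illustration, not their code or data; the multiplicative offense-times-defense score is borrowed from my simpler model). Each team's latent offense and defense become smooth functions of the date instead of scalars:

import numpy as np

np.random.seed(0)
n_teams, n_dates = 5, 50
dates = np.linspace(0, 1, n_dates)

def rbf_kernel(x, lengthscale=0.2):
    # Squared-exponential kernel: nearby dates get similar latent values.
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2) + 1e-8 * np.eye(len(x))

# One GP draw per team: each column is a smooth latent curve over the season.
L = np.linalg.cholesky(rbf_kernel(dates))
offense = 10.0 + L.dot(np.random.randn(n_dates, n_teams))
defense = 8.0 + L.dot(np.random.randn(n_dates, n_teams))

# Predictions now depend on when the game happens:
i, j, t = 0, 1, 25
print(offense[t, i] * defense[t, j])  # team i's predicted score against team j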
It's a cool model and a really nice idea. If you followed the previous action related to March Madness, I encourage you to take a look. And of course, it's never too early to be thinking about your entry for the 2011 March Madness Predictive Analytics Challenge!

Saturday, June 19, 2010

Resources for Learning about Machine Learning

Posted by Danny Tarlow
I've been using Quora a bit lately, somewhat to the detriment of this blog (though that's not the full explanation for my slow posting schedule). Anyhow, Quora is a nice question and answer service that has been getting some press in the startup world recently. A while back, Quora released a Terms of Service that gives pretty liberal terms of use for the content on the website. Founder Adam D'Angelo summarizes:
You can reuse all new content on Quora by publishing it anywhere on the web, as long as you link back to the original content on Quora.
One question that has received some interest (49 followers of the question, and 11 answers) and might be relevant to the readers here is this one:
What are some good resources for learning about machine learning?

I've read Programming Collective Intelligence, and am looking any recommendations on follow up books/resource.
There were some good answers, even some of which I didn't know about. Here's a sampling of the answers:

My answer was Andrew Ng's YouTube videos:
http://www.youtube.com/view_play_list?p=A89DCFA6ADACE599

Some other good ones:
Jie Tang says...
Mike Jordan and his grad students teach a course at Berkeley called Practical Machine Learning which presents a broad overview of modern statistical machine learning from a practitioner's perspective. Lecture notes and homework assignments from last year are available at
http://www.cs.berkeley.edu/~jordan/courses/294-fall09/

A Google search will also turn up material from past years
Ben Newhouse says...
The textbook "Elements of Statistical Learning" has an obscene amount of material in it and is freely available in PDF form via http://www-stat.stanford.edu/~tibs/ElemStatLearn/

While more niche than general Machine Learning, I recently ripped through "Natural Image Statistics" (also downloadable at http://www.naturalimagestatistics.net/ ). It's a great read both for its explanations of your standard ML algo's (PCA, ICA, mixed gaussians etc) and for its real-world applications/examples in trying to understand the models used for analysis in our neural vision system
Jeremy Leibs gives the staple of David MacKay's book (I believe David MacKay would say that machine learning is just information theory, right?):
"Information Theory, Inference, and Learning Algorithms" by David MacKay has some decent introductory material if I remember. Available online:
http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
Incidentally, I haven't read Programming Collective Intelligence, but it seems popular amongst non researchers. Do any of you know more about it?

Also, I have a few more Quora invites left, so if anybody wants in, let me know, and we'll see what we can do.

Thursday, June 17, 2010

Uncertainty: Probability and Quantum Mechanics

Posted by Danny Tarlow
In machine learning, we often take probability for granted. We desire a system for representing uncertainty in the world, and Cox's theorem tells us that if we accept some basic postulates regarding what is desirable in a system of uncertainty, we will end up with probability.

So that should be the end of the story... right? Well, maybe not. The first Cox postulate is
Divisibility and comparability - The plausibility of a statement is a real number and is dependent on information we have related to the statement,
which seems quite innocent. However, who's to say that there is anything fundamental about real numbers? Real numbers include strange things like irrational and negative numbers (crazy, I know), yet they also lack structure that complex numbers have: there is no real number you can multiply by that first returns you to your starting value after exactly four steps, while multiplying by i does exactly that. It seems kind of arbitrary to choose real numbers. For a fun and interesting read, see the following link. It makes the point better than I can:
Negative numbers aren’t easy. Imagine you’re a European mathematician in the 1700s. You have 3 and 4, and know you can write 4 – 3 = 1. Simple.

But what about 3-4? What, exactly, does that mean? How can you take 4 cows from 3? How could you have less than nothing?

Negatives were considered absurd, something that “darkened the very whole doctrines of the equations” (Francis Maseres, 1759). Yet today, it’d be absurd to think negatives aren’t logical or useful. Try asking your teacher whether negatives corrupt the very foundations of math.

http://betterexplained.com/articles/a-visual-intuitive-guide-to-imaginary-numbers/
Imaginary numbers come up in the context of systems of uncertainty when we deal with quantum mechanics. The basic idea is that interactions operate over amplitudes (expressed as complex numbers); then, to determine the likelihood of a final configuration, you look at squared magnitudes of amplitudes. For a relatively straightforward explanation, see here: http://lesswrong.com/lw/pd/configurations_and_amplitude/
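A toy numpy illustration of the difference (my own, not from the linked post):

import numpy as np

# Two paths into the same final configuration, each carrying a complex
# amplitude. Amplitudes are summed first and the magnitude squared at
# the end, so paths can cancel -- nonnegative probabilities that are
# added path-by-path never can.
amp_a = 1 / np.sqrt(2)
amp_b = np.exp(1j * np.pi) / np.sqrt(2)   # same magnitude, opposite phase

print(abs(amp_a + amp_b) ** 2)            # ~0.0: destructive interference
print(abs(amp_a) ** 2 + abs(amp_b) ** 2)  # 1.0: what a mixture would give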

So I don't necessarily have any well-formed thoughts on the matter (yet?), but it's fun to think about other principled ways of representing uncertainty. I'm curious to know if there are types of interactions useful for machine learning that would be hard to represent with standard probability models but that would be aided by these types of quantum models.

Finally, I leave you with this blog comment from The Blog of Scott Aaronson:
“graphical models with amplitudes instead of probabilities” is a fair definition of a quantum circuit (and therefore a quantum computer).

http://scottaaronson.com/blog/?p=74#comment-1702
That seems, to me, worth understanding more deeply.

Wednesday, May 19, 2010

Notes on Approximation Algorithms

Posted by Danny Tarlow
I'm enjoying reading notes from Shuchi Chawla's course at the University of Wisconsin, Madison on approximation algorithms for NP-hard optimization problems. There is a pdf available with scribe notes from all the lectures, which makes for a very nice 170 page handbook:
http://pages.cs.wisc.edu/~shuchi/courses/880-S07/scribe-notes/all-in-one.pdf

Wednesday, May 5, 2010

Looking for Testers: Max-Product Belief Propagation Code

Posted by Danny Tarlow
I'm preparing to release some code associated with my upcoming paper on max-product belief propagation with high order potentials:
Tarlow, D., Givoni, I., & Zemel, R. (2010). HOP-MAP: Efficient message passing with high order potentials. In Artificial Intelligence and Statistics (AISTATS).
http://www.cs.toronto.edu/~dtarlow/TarlowGivoniZemel_AISTATS2010.pdf

If you're interested in trying to compile and play with a small example that I've put together, please let me know in the comments or by email: dtarlow cs toronto edu (but make sure to leave a way for me to contact you!)

The code is in C++. I have Makefiles that I can confirm work on my Apple machines (OS 10.5 with gcc 4.0.1) and the Toronto Ubuntu-based cluster computers (gcc 4.2.4). It'd be great to get people with different setups as well, though I won't be a ton of help with compilation specifics. I only ask that if you want to be a tester, you have some experience compiling C++ and a few hours sometime in the next few days to devote to this.

The code is written to only do max-product (max-sum) belief propagation, so it should be a bit simpler than existing alternatives. It should also have more functionality related to high order factors than any other existing library (this is the point of the paper). I should say that the library is similar in flavor to several other excellent libraries that are more mature. The first part I am releasing is a stripped down version of the code I use for my research. As I learn more about maintaining a reasonably-sized public code base, I will add in more pieces. Unless you are looking just to learn, or you specifically want simple high-order max-product code, you might be happier with the alternatives. At some point, I would like to add some of my factors to the other projects.

Having said that, by using my code, you'll be the first to get my implementations of my research, and I will provide a limited amount of support, especially to the earliest testers.

Thursday, April 29, 2010

NFL Play-by-Play Data

Posted by Danny Tarlow
Brian over at Advanced NFL Stats, who runs an excellent site, has compiled seven years worth of play-by-play data for the NFL. It looks like an amazing data set:
http://www.advancednflstats.com/2010/04/play-by-play-data.html

Thanks, Brian!

Thursday, April 22, 2010

Reddit Data Release

Posted by Danny Tarlow
Reddit has released some data related to how users vote on stories (links):
The format is username,link_id,vote where vote is -1 or 1 (downvote or upvote). The dump is 29MB gzip compressed and contains 7,405,561 votes from 31,927 users over 2,046,401 links. It contains votes only from users with the preference "make my votes public" turned on (which is not the default). This doesn't have the subreddit ID or anything in there, but I'd be willing to make another dump with more data if anything comes of this one
It looks very interesting and could be a good data set for recommendation system-like algorithms.
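Loading the dump into per-user dictionaries takes a few lines of Python (a sketch; the filename here is a guess, so substitute whatever the uncompressed dump is actually called):

import csv
from collections import defaultdict

votes = defaultdict(dict)  # votes[username][link_id] = -1 or 1
with open("publicvotes.csv") as f:
    for username, link_id, vote in csv.reader(f):
        votes[username][link_id] = int(vote)

print(len(votes), "users loaded")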

Tuesday, April 6, 2010

Duke is not the only winner from last night

Posted by Lee
Duke isn't the only winner after last night's incredible matchup. Scott Turner (entry: The Pain Machine) finished tied with Danny for first in our Sweet 16 bracket (there is no tiebreaker, as we did not ask contestants to predict the scoring outcome). Also, congratulations to Danny for having the top entry in both the full and the Sweet 16 brackets!

Thank you to all contestants for your participation and making this a really fun event!

I apologize for the lack of coverage of the results (swamped at work), but I'll be sure to post some results comparisons and more coverage of our inaugural March Madness Predictive Analytics Challenge. In the meantime, you can view the final standings here:
Complete Results for the Full Predictive Bracket
Complete Results for the Sweet 16 Predictive Bracket

Thursday, March 25, 2010

March Madness Contest Prize Teaser

Posted by Danny Tarlow
Doug has been brainstorming logo ideas for the eventual official Smell the Data sticker that will go to the winner of the March Madness Predictive Analytics Challenge. I really like some of the directions he's going in. Here's one of my favorites:

Sunday, March 21, 2010

Sweet Sixteen starts Thursday

Posted by Lee
What a crazy week of basketball. Who would have expected Kansas to be knocked out so early? Turns out none of the prediction algorithms had Kansas leaving this early, but a few of them don't have Kansas going to the Final Four -- and that's good news for those competitors!

If your bracket is busted, or if you haven't participated yet, it's not too late! You can still enter our second-chance Sweet Sixteen bracket by the beginning of play Thursday for a chance at a $25 Amazon.com Gift Certificate.

Full contest info: http://blog.smellthedata.com/2010/03/official-march-madness-predictive.html

Saturday, March 20, 2010

Current Standings and Introductions

Posted by Lee
We have eight entries in our inaugural March Madness Predictive Analytics Challenge. The standings after the first two rounds of play look like this:
  1. My Robots Wicked Smaht
  2. ebv
  3. Danny's Dangerous Picks
  4. Hugues
  5. The Pain Machine
  6. FTW
  7. Simple PageRank
  8. BrentsBracket
With the first week of basketball in the tournament over, let's introduce our competitors!

Entry Name: My Robots Wicked Smaht
Team Members: Rolf and Andrew
In their own words: Our backgrounds are more on the human learning side of things, so we took a fairly simple approach to creating a bracket picking robot. Our robot uses a simple regression to identify key variables in order to enhance the RPI rankings.

Rolf has blogged about their entry.

Entry Name: ebv
Team Members: Eric Venner (venner at bcm dot edu)
In his own words: I'm using a very simple model based on PageRank. A loss is treated like a link from the losing team to the winning team, and weighted based on the point in the season at which it was played - later games are weighted higher.

Entry Name: Danny's Dangerous Picks
Team Members: Danny Tarlow (you're reading his blog)
In his own words: I generated the predicted score using my probabilistic matrix factorization model's offensive and defensive rankings to determine each game's winner, as described here: http://blog.smellthedata.com/2010/03/march-madness-2010-offense-and-defense.html

Entry Name: Hugues
Team Members: Hugues Salamin
In his own words: I am currently doing a PhD in Glasgow, Scotland. For the prediction, I use a CRF with one variable (winner), and the features are sums of the stats of the team members. I got an accuracy of 0.75 when training on 2006 to 2009 and testing on 2010. I was planning to extend the model (predict overtime and score delta) but did not have enough time. Maybe for the sweet 16 part. The code is in Python and training uses the SciPy LBFGS implementation for the gradient descent.

Name: The Pain Machine
Team Members: Dr. Scott Turner (srt19170 at gmail)
In his own words: My Ph.D. is in Artificial Intelligence from UCLA, where I wrote a program (MINSTREL) to tell stories about King Arthur and his knights as a way to look at creativity and storytelling. For the past twenty years I've worked for the Aerospace Corporation as a software architect and ground system expert for satellite programs.

My approach wasn't particularly sophisticated; it processed the first part of the season to develop a ranking for all the teams, and then did a simple genetic algorithm to evolve an equation to predict outcomes based upon the ranking, RPI, and a few other stats. It was able to correctly predict the outcomes of my test set of games at about 80% (not particularly good, IMO).

Entry Name: FTW
Team Members: Matt Curry (matt at pseudocoder dot com); @mcurry - http://pseudocoder.com
In his own words: A bunch of years ago I wasted hours per day writing programs to predict the outcomes of sporting events, mostly for pretend gambling purposes. My best was a program that could pick select NBA games with a tremendous success rate (focused on home teams that were huge underdogs). I didn't have the testicular fortitude to trust it with more than a few small bets. My program for this contest is awful. I fully expect to get destroyed.

Entry Name: Simple Page Rank Bracket
Team Members: Daniel Mack (dmack at isis dot vanderbilt dot edu) and @manieldack
In his own words: As a first go around, I decided to step back and look at the problem as a network, with teams as nodes and wins represented as edges from the team that was beaten to the team that won. This structure actually has some interesting properties; one of the most fascinating is that it resembles a web infrastructure: good teams are linked to frequently by other teams that are also linked to frequently. Using the barest of PageRank algorithms, I calculated the teams' ranks and propagated the winners through the brackets. This meant that when I predict a team to win, the PageRank is calculated to take that win into account, and thus teams in the Final Four have been impacted.
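A minimal sketch of Daniel's idea as I understand it (my own illustration with made-up games, not his code):

import numpy as np

teams = ["Duke", "Kansas", "Lehigh", "Butler"]        # made-up example
games = [("Lehigh", "Kansas"), ("Kansas", "Duke"),    # (loser, winner)
         ("Lehigh", "Duke"), ("Butler", "Duke"), ("Butler", "Kansas")]
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

# Column-stochastic link matrix: each loss is a "link" to the winner.
M = np.zeros((n, n))
for loser, winner in games:
    M[idx[winner], idx[loser]] += 1.0
col_sums = M.sum(axis=0)
M[:, col_sums == 0] = 1.0 / n           # unbeaten teams vote uniformly
M[:, col_sums > 0] /= col_sums[col_sums > 0]

# Standard damped PageRank iteration.
d, rank = 0.85, np.ones(n) / n
for _ in range(100):
    rank = (1 - d) / n + d * M.dot(rank)

for t in sorted(teams, key=lambda t: -rank[idx[t]]):
    print(t, round(float(rank[idx[t]]), 3))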

Entry Name: BrentsBracket
Team Members: Brent Castle
In his own words: Matrix Completion for Power Rating Differences

Thursday, March 18, 2010

Team "My Robots Wiked Smaht"

Posted by Danny Tarlow
This is a guest post by Rolf, the driving force behind our early co-leader, "My Robots Wicked Smaht". After three games, they are one of only two teams to have a perfect 3 of 3 bracket.

Entry Name: My Robots Wicked Smaht
Team Member Names: Rolf and Andrew
Description: Our backgrounds are more on the human learning side of things, so we took a fairly simple approach to creating a bracket picking robot. Our robot uses a simple regression to identify key variables in order to enhance our variant on RPI rankings.


As an early leader in the March Madness Predictive Analytics Challenge, Danny suggested that I write a guest blog post detailing some of the finer points of my prediction algorithm - "My Robots Wicked Smaht." First off, I must mention that although I am a regular reader of "This Number Crunching Life," my formal mathematics and computer science background is pretty limited. I'm currently doing a PhD in Education, so I've been more focused on human learning than machine learning. But enough chit chat.

My first task in creating this robot was to learn how to use Excel. That was the trickiest part, actually. Once I had organized the 2009 data into some statistics that I thought might be relevant, I consulted with my colleague Andrew. He then ran a regression and determined that wins in January and February were more highly correlated with wins in March (our goal) than those from November or December. Taking this information, we created a kind of modified RPI ranking (I had to look this one up as well), with extra weight placed on late season wins in addition to the standard factors of winning percentage, opponents' winning percentage, and opponents' opponents' winning percentage.

I was hoping to also factor in the average team height, but that seemed like a whole lot to do. Next Year!

The Tournament Is Underway!

Posted by Lee
The tournament is in full swing. A few games are done and others have just tipped off! I've made the Yahoo bracket public so that anyone can view it.

Also, if you're still interested in participating, it's not too late! We have the Sweet Sixteen bracket coming up in a week and we will be accepting entries for that bracket through the morning of 3/25.

I'll be posting soon about all our wonderful contestants. And, get ready, because we'll also begin featuring guest posts from the contestants themselves!

Wednesday, March 17, 2010

Tournament Bracket Closing Soon! Deadline Extended

Posted by Lee
I hope you have those predictions ready! We already have 8, count 'em, EIGHT teams participating in the "Tournament" bracket of our inaugural March Madness Predictive Analytics Challenge. I'm really excited to see how everyone does!

Since it's hard to check that the 1am deadline is enforced via the Yahoo bracket, I've extended the deadline to noon ET on March 18th, which is when the Yahoo brackets should close (right before the tipoff of the first game for Florida @ BYU).

I'll be making a post once the competition is under way introducing our competitors, with a little bit of information about each.

If you have a last-minute submission, please make sure to request your password from me tonight so that I have time to respond in order for you to join the competition.

Tuesday, March 16, 2010

More Data Updates

Posted by Lee
Sorry the player data has been a bit of a mess. Don't worry, all the data you have is technically correct. However, the schema for the 2010 data is a bit different. I have updated the previous blog posts to reflect what the schema actually is.

You can continue to use the existing data you have, assuming you use the schema changes mentioned. Alternatively, I've posted the most up-to-date version of the data here: http://cs.stanford.edu/~lzen/All_Player_Data.zip

This merged data set of the 2006-2010 seasons uses the original schema (a small loading sketch follows the list):
  • ID (GUID)
  • Name
  • Height
  • Position
  • Team
  • Year
  • Class (Freshman, Sophomore, Junior, Senior)
  • Games - the number of games the player participated in
  • Field goals (shots) made, excluding three point shots
  • Field goal attempts, excluding three point shots
  • Three point shots made
  • Three point shots attempted
  • Free throws made
  • Free throw attempts
  • Assists
  • Blocks
  • Rebounds
  • Steals
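If it's useful, here's a sketch of reading one of these files in Python, assuming a comma-separated layout with the columns in exactly the order above (both the filename and the delimiter are guesses -- check your download):

import csv

COLUMNS = ["id", "name", "height", "position", "team", "year", "class",
           "games", "fg_made", "fg_att", "three_made", "three_att",
           "ft_made", "ft_att", "assists", "blocks", "rebounds", "steals"]

with open("players_2006.csv") as f:  # hypothetical filename
    players = [dict(zip(COLUMNS, row)) for row in csv.reader(f)]
print(players[0]["name"], players[0]["team"])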

Starter Code

Posted by Danny Tarlow
There's been some demand for this. I leave it to you to decipher the mystery (which means I don't have time to explain it right now):
http://www.cs.toronto.edu/~dtarlow/march_madness_public.tgz

If you improve it in any great ways, though, please let me know.

PS. It's Python code for my March Madness model.

Monday, March 15, 2010

Updated Player Data

Posted by Lee
In the original announcement post, I did not have 2010 player data included. I have now included that data in the player data.

Though the data has the same format as the other player data and I tried as hard as possible to match players to players, it's possible that some of the players are matched inaccurately. This might be due to two players with the same name on a team, or to players who I assumed transferred schools but may actually be different players with the same name appearing on different teams.

In the process of doing some checks on the 2010 data, I realized that I had made a mistake in the hashing of the original data set. Players with the same name, instead of the same hash, were being mapped to each other. I've fixed that and re-uploaded all the player data. Sorry about this and I hope this doesn't severely inconvenience anyone.

The most up-to-date version of the data is available at

The 2010 player data has a slightly different schema (sorry!) It includes three sets of field goal figures -- field goals made and attempted without 3 pointers, field goals made and attempted including 3 pointers, and 3 pointers made and attempted. Also note that the last four columns are in slightly different order.
  • ID (GUID)
  • Name
  • Height
  • Position
  • Team
  • Year
  • Class (Freshman, Sophomore, Junior, Senior)
  • Games - the number of games the player participated in
  • Field goals (shots) made, excluding three point shots
  • Field goal attempts, excluding three point shots
  • Field goals (shots) made, including three point shots
  • Field goal attempts, including three point shots
  • Three point shots made
  • Three point shots attempted
  • Free throws made
  • Free throw attempts
  • Rebounds
  • Assists
  • Steals
  • Blocks

March Madness 2010: Offense and Defense Ratings

Posted by Danny Tarlow
I quickly ran my algorithm from last year on this year's score data (see the old post for the details). It's the exact same 1D model, which gives offensive and defensive ratings for each team.

How it works:
If you want the details, read the old post. If you just want the quick story, the output here can be viewed as a rating of each team's offense (higher is better) and defense (lower is better). It's actually more than that, though: the numbers are calibrated so that you can predict the outcome of a game between team A and team B as follows:
Team A's predicted score: Team A offense rating * Team B defense rating
Team B's predicted score: Team B offense rating * Team A defense rating

For example, if Kansas played Lehigh, the predicted score would be:
Kansas: 10.27 * 9.17 = 94
Lehigh: 8.70 * 7.04 = 61

A real blowout!

I've also included a combined measure (higher is better) at the bottom, which is just the offensive rating divided by the defensive rating for each team.
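
As a sanity check, here's a minimal sketch of the prediction rule and the combined measure in Python, with the ratings hard-coded from the Kansas-Lehigh example above (full-precision values are in the lists below):

    # Rounded ratings from the example above.
    ratings = {
        "Kansas": {"off": 10.27, "def": 7.04},
        "Lehigh": {"off": 8.70, "def": 9.17},
    }

    def predict(team_a, team_b):
        """Each team's predicted score: its offense times the opponent's defense."""
        score_a = ratings[team_a]["off"] * ratings[team_b]["def"]
        score_b = ratings[team_b]["off"] * ratings[team_a]["def"]
        return round(score_a), round(score_b)

    def combined(team):
        """Combined measure: offensive rating divided by defensive rating."""
        return ratings[team]["off"] / ratings[team]["def"]

    print(predict("Kansas", "Lehigh"))   # (94, 61), matching the example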

So have fun with it. Look up a few games from the season and see how well these predictions match the true outcomes. For my real predictions, though, I will most likely be using a higher-dimensional version of the model, so don't hold me to these numbers.

Update: I lied. I actually did just use the 1D version, though there's no good justification for it over other parameter settings; I didn't get a chance to do proper validation to set the latent dimension or regularization strength.

Offenses
1 Villanova (10.6096071747)
2 Providence (10.4961023481)
3 BYU (10.3145090472)
4 Syracuse (10.2897143393)
5 Texas (10.2737615755)
6 Kansas (10.2705742443)
7 Maryland (10.2565209896)
8 Washington (10.2331268368)
9 California (10.2154362322)
10 Duke (10.2054254119)
11 Kansas St. (10.1591288426)
12 Xavier (10.121168382)
13 Virginia Military (10.0635213108)
14 Seton Hall (10.02517276)
15 Kentucky (10.0157560113)
16 Houston (9.98686919786)
17 Vanderbilt (9.90921563852)
18 Mississippi (9.79158837692)
19 Missouri (9.78065560781)
20 Louisville (9.70612426736)
21 Notre Dame (9.68648004228)
22 St. Mary's (9.68403578665)
23 Texas Tech (9.68259249492)
24 Ohio St. (9.66670247978)
25 New Mexico (9.65916226491)
26 Nevada (9.60499808476)
27 North Carolina (9.58002470987)
28 Georgetown (9.54401370366)
29 VCU (9.5113590692)
30 Baylor (9.50523410129)
31 Marshall (9.50445111215)
32 Minnesota (9.50421932536)
33 Michigan St. (9.50198567669)
34 Gonzaga (9.49881517781)
35 New Mexico St. (9.48362123051)
36 Colorado (9.47117793387)
37 Rhode Island (9.4689437856)
38 Arkansas (9.426477063)
39 Wake Forest (9.41800936615)
40 Seattle (9.40062565121)
41 Tennessee (9.39149332818)
42 Auburn (9.39083686601)
43 West Virginia (9.38665180217)
44 Clemson (9.38072138583)
45 Sam Houston St. (9.37071466958)
46 Siena (9.3263895148)
47 Georgia Tech (9.32282841693)
48 Arizona (9.31339708463)
49 Illinois (9.28819011497)
50 Oklahoma St. (9.26270937933)
51 UTEP (9.23343702695)
52 Memphis (9.22475741799)
53 Purdue (9.21034444121)
54 Marquette (9.20800477412)
55 Connecticut (9.18296576043)
56 Lipscomb (9.18236722172)
57 Loyola Marymount (9.16892800575)
58 Iowa St. (9.1571824487)
59 UNLV (9.14354681667)
60 Washington St. (9.13168384574)
61 South Carolina (9.12252944278)
62 Portland St. (9.09746808172)
63 Mississippi St. (9.08765594201)
64 Weber St. (9.0831999882)
65 Oklahoma (9.05438021562)
66 Miami (FL) (9.04895695302)
67 Texas A&M (9.04386975172)
68 Cornell (9.03644880214)
69 Texas St. (9.03095128356)
70 Ohio (9.02215074799)
71 Charlotte (8.99381210537)
72 Virginia Tech (8.99308457727)
73 Charleston (8.956239944)
74 Florida (8.95097964854)
75 Valparaiso (8.92759560363)
76 Oakland (8.9186263927)
77 Oregon (8.91203196461)
78 Duquesne (8.90419433875)
79 Boston Coll. (8.90010038244)
80 Morgan St. (8.8952970081)
81 Northwestern (8.88963818909)
82 Harvard (8.8895080359)
83 Mercer (8.88181308695)
84 San Diego St. (8.85067344293)
85 Stanford (8.84532617832)
86 Louisiana Tech (8.84197368222)
87 San Jose St. (8.82173364911)
88 Tulsa (8.82170452554)
89 Missouri St. (8.81315886007)
90 Cincinnati (8.79493855176)
91 Niagara (8.76603702345)
92 Florida St. (8.76475396565)
93 Boise St. (8.76355302968)
94 Hofstra (8.75730714315)
95 Buffalo (8.74189303958)
96 Murray St. (8.73611793116)
97 Southern Ill. (8.73280668662)
98 Utah St. (8.72402191384)
99 Georgia (8.72398768514)
100 CSU Fullerton (8.72154974619)
101 Wisconsin (8.70891624191)
102 Lehigh (8.70264803831)
103 Creighton (8.6997766943)
104 Appalachian St. (8.68598851192)
105 Butler (8.67851699281)
106 South Florida (8.65946082027)
107 Kent St. (8.65787713043)
108 La Salle (8.65580569834)
109 Indiana (8.65247255758)
110 IUPUI (8.65188290013)
111 Northwestern St. (8.650235705)
112 Alabama (8.64614610947)
113 Tennessee Tech (8.64465023399)
114 Portland (8.63953820513)
115 South Dakota (8.6393257918)
116 Dayton (8.62156605483)
117 Massachusetts (8.62118474604)
118 Pittsburgh (8.61917708428)
119 Florida Atlantic (8.61684915412)
120 Rutgers (8.60939294212)
121 Illinois St. (8.60220828937)
122 South Dakota St. (8.58818128717)
123 St. Joseph's (8.58635760433)
124 Belmont (8.57947885872)
125 St. Bonaventure (8.57627623419)
126 UCLA (8.56735122638)
127 N.C. State (8.56484924061)
128 Arizona St. (8.56228116852)
129 Western Carolina (8.55781508298)
130 Northern Colorado (8.55566403218)
131 Akron (8.55509217116)
132 Boston U. (8.54478891175)
133 Wichita St. (8.54473529915)
134 Geo. Washington (8.52543739356)
135 Long Beach St. (8.52103754504)
136 St. John's (8.51878683872)
137 West. Kentucky (8.50587746016)
138 Penn St. (8.50373836683)
139 UAB (8.4925456171)
140 North Texas (8.49212894777)
141 Bradley (8.49115606127)
142 Wyoming (8.46965791013)
143 CSU Northridge (8.46053573273)
144 Wis. Milwaukee (8.46041313033)
145 Austin Peay (8.44759396845)
146 Towson (8.43043210642)
147 Fairfield (8.42743029488)
148 Davidson (8.42734534402)
149 Rider (8.42711072139)
150 Troy (8.42688530197)
151 Nebraska (8.42014334254)
152 Longwood (8.40286077115)
153 Richmond (8.39797612089)
154 Quinnipiac (8.39744540759)
155 W. Michigan (8.38682961886)
156 Virginia (8.37602397935)
157 TCU (8.37093507936)
158 Old Dominion (8.36246957369)
159 East Carolina (8.35925965558)
160 Idaho (8.35268923895)
161 Michigan (8.3365433049)
162 Sacred Heart (8.32808333199)
163 Lafayette (8.32625599744)
164 Detroit (8.32542281924)
165 Drake (8.31709561953)
166 Indiana St. (8.31341329676)
167 Jacksonville (8.30868848135)
168 Utah (8.30390692076)
169 Colorado St. (8.29295481735)
170 Norfolk St. (8.29219191754)
171 Cal Poly (8.28455631283)
172 Vermont (8.27729930135)
173 Georgia Southern (8.27678648167)
174 Northern Illinois (8.27057671426)
175 San Francisco (8.25424694999)
176 William & Mary (8.25208642378)
177 Cleveland St. (8.23586167401)
178 Wis. Green Bay (8.23082932485)
179 N.C. Asheville (8.22769377309)
180 Wright St. (8.19144625554)
181 UCF (8.18654496997)
182 SE Louisiana (8.18149519102)
183 Navy (8.16042522414)
184 Oral Roberts (8.15086234093)
185 Houston Baptist (8.14243450471)
186 N.C. Wilmington (8.1400577915)
187 Montana (8.13988183176)
188 TX Arlington (8.12712309124)
189 Morehead St. (8.11426184637)
190 Long Island (8.10772355358)
191 Eastern Kentucky (8.106981612)
192 E. Tennessee St. (8.09684175744)
193 James Madison (8.08667648927)
194 UCSB (8.07427827314)
195 UC Davis (8.07190032751)
196 SMU (8.06687909503)
197 North Dakota St. (8.06186227025)
198 Canisius (8.06106458325)
199 Wofford (8.05958276878)
200 Hawaii (8.05942849129)
201 Northern Arizona (8.05878912917)
202 Centenary (8.04635244693)
203 Brown (8.03501954313)
204 Temple (8.03261201059)
205 Florida Intl. (8.02774849995)
206 Fresno St. (8.02272203563)
207 Holy Cross (8.01169361973)
208 Arkansas St. (8.00963777856)
209 Chattanooga (8.00745524164)
210 Pepperdine (8.00657585665)
211 George Mason (8.00084105845)
212 Tennessee St. (7.98946385038)
213 Furman (7.98087495336)
214 Northeastern (7.97733855598)
215 Radford (7.96513211378)
216 AR Little Rock (7.96289967102)
217 East. Michigan (7.95995718537)
218 LA Lafayette (7.95789934659)
219 Iona (7.95702027986)
220 Jacksonville St. (7.95636634842)
221 S. Carolina St. (7.95398725818)
222 UTSA (7.95272384668)
223 Robert Morris (7.94917766814)
224 Youngstown St. (7.94838383395)
225 Stony Brook (7.9417013915)
226 Idaho St. (7.93729628843)
227 Delaware (7.91199423416)
228 Iowa (7.9088096721)
229 Middle Tenn. St. (7.8810085041)
230 Fordham (7.87892005172)
231 Loyola (MD) (7.87272847122)
232 Texas A&M C.C. (7.86758877119)
233 Manhattan (7.86516248955)
234 Lamar (7.86410573215)
235 Evansville (7.86311218223)
236 Coastal Carolina (7.85161633959)
237 Colgate (7.84802475514)
238 Northern Iowa (7.84218230398)
239 Montana St. (7.83560428312)
240 LA Monroe (7.81963610353)
241 Loyola Chicago (7.81714457956)
242 Eastern Wash. (7.81449468244)
243 CSU Bakersfield (7.80991967038)
244 Santa Clara (7.79518235909)
245 Southern Miss (7.78346772869)
246 Elon (7.77743909896)
247 Eastern Illinois (7.7760680638)
248 McNeese St. (7.77366837427)
249 IPFW (7.76983760159)
250 Campbell (7.76633283401)
251 UC Irvine (7.764649057)
252 Drexel (7.75562001336)
253 N.C. Greensboro (7.74896701072)
254 Tulane (7.74292641597)
255 Ill. Chicago (7.73972600341)
256 Saint Louis (7.73412222778)
257 Yale (7.72699256894)
258 N.C. A&T (7.72138799967)
259 DePaul (7.71328141092)
260 Rice (7.70807536381)
261 South Alabama (7.70764782786)
262 LSU (7.69950003223)
263 Cent. Michigan (7.69586780381)
264 Pacific (7.68724054707)
265 Mo. Kansas City (7.67736402677)
266 High Point (7.66522423916)
267 Bucknell (7.64268499918)
268 Miami (OH) (7.63081561921)
269 Mount St. Mary's (7.63048414746)
270 Denver (7.62180749134)
271 Bowling Green (7.61975926885)
272 Fla Gulf Coast (7.6131476021)
273 Kennesaw St. (7.59963686767)
274 Gardner-Webb (7.58850969258)
275 San Diego (7.56885211032)
276 USC (7.54594795722)
277 Texas Southern (7.52570746444)
278 Georgia St. (7.51597035452)
279 Hampton (7.514234785)
280 Pennsylvania (7.50925883307)
281 Tenn-Martin (7.50182722172)
282 UMBC (7.4873574332)
283 Liberty (7.47148187962)
284 Florida A&M (7.4349682354)
285 Sacramento St. (7.42424021272)
286 SE Missouri St. (7.42136521258)
287 Nicholls St. (7.41215783519)
288 Albany (7.40098997539)
289 AR Pine Bluff (7.39173225453)
290 Oregon St. (7.38679861661)
291 Monmouth (7.37489634877)
292 Fair. Dickinson (7.3356201816)
293 Binghamton (7.33260132598)
294 St. Peter's (7.31955349901)
295 UC Riverside (7.29725090775)
296 St. Francis (PA) (7.28129141613)
297 Southern Utah (7.27886711148)
298 Maine (7.26674029118)
299 Ball St. (7.25577667672)
300 Charleston S. (7.24025306611)
301 Hartford (7.23842840017)
302 South Carolina Upstate (7.23537470181)
303 Stephen F. Austin (7.22833231784)
304 Miss. Valley St. (7.22022084486)
305 Alabama St. (7.21965990029)
306 American (7.1989899691)
307 Citadel (7.19776887055)
308 Alabama A&M (7.15863976235)
309 Central Arkansas (7.14821031528)
310 SIU Edwardsville (7.14017516397)
311 MD Eastern Shore (7.13988606988)
312 Texas Pan Amer. (7.1395551766)
313 Grambling St. (7.10377729696)
314 New Hampshire (7.10016962468)
315 North Dakota (7.08799906924)
316 Central Conn. St. (7.08791824285)
317 Columbia (7.06121962841)
318 Winthrop (7.05229738658)
319 Stetson (7.05058324169)
320 Jackson St. (7.03502114953)
321 Alcorn St. (7.03412739453)
322 North Florida (7.02615483067)
323 Wagner (7.01714656847)
324 Coppin St. (6.982664216)
325 St. Francis (NY) (6.93553118848)
326 New Orleans (6.90034577172)
327 Marist (6.88547344299)
328 Presbyterian (6.88275297017)
329 Princeton (6.87238530992)
330 Chicago St. (6.86467540963)
331 Air Force (6.86156157659)
332 Prairie View A&M (6.85885459159)
333 Howard (6.84415147973)
334 Samford (6.83629259329)
335 Winston-Salem (6.83109766563)
336 N.C. Central (6.80707521442)
337 Delaware St. (6.76011692275)
338 Army (6.75601767785)
339 Southern (6.75205622186)
340 Utah Valley (6.74731423289)
341 Bethune-Cookman (6.74090336909)
342 Savannah St. (6.70615731815)
343 Western Ill. (6.67671544068)
344 N.J.I.T. (6.64872787082)
345 Toledo (6.53500233042)
346 Dartmouth (6.29772838065)
347 Bryant (5.95480032938)


Defenses
1 Wisconsin (6.32057065963)
2 Temple (6.44605516553)
3 USC (6.44692532578)
4 Northern Iowa (6.45024371697)
5 Princeton (6.71832788491)
6 Arizona St. (6.75973258441)
7 Old Dominion (6.81807850841)
8 Florida St. (6.8242040874)
9 Pittsburgh (6.87076188071)
10 Duke (6.89518504414)
11 Oregon St. (6.91918358722)
12 Purdue (6.92284973952)
13 Utah St. (6.9599625176)
14 West Virginia (7.00124414689)
15 Michigan (7.02679704679)
16 Saint Louis (7.03835353947)
17 Kansas (7.04030322554)
18 Southern Miss (7.07228715031)
19 Ohio St. (7.07884259933)
20 Butler (7.08964174039)
21 Northeastern (7.09629072959)
22 Texas A&M (7.12530348174)
23 Marquette (7.1480705149)
24 Richmond (7.15339157418)
25 Dayton (7.15407151931)
26 Virginia (7.18353766063)
27 Pacific (7.19891611498)
28 Georgetown (7.21675750724)
29 Clemson (7.23483082807)
30 UAB (7.23922253822)
31 Tennessee (7.27184878438)
32 Stephen F. Austin (7.28604960942)
33 San Diego St. (7.29372227606)
34 Michigan St. (7.30339210826)
35 Alabama (7.30903904967)
36 Air Force (7.31649872138)
37 Cincinnati (7.31906298288)
38 N.C. State (7.32095956951)
39 Baylor (7.32165324762)
40 Wright St. (7.33416688762)
41 Minnesota (7.3365113623)
42 Kentucky (7.34945752954)
43 St. John's (7.36255627257)
44 Connecticut (7.36459502867)
45 Mississippi St. (7.39821704997)
46 Florida (7.40017780545)
47 Montana (7.40285526754)
48 UNLV (7.40595866694)
49 St. Peter's (7.41269436385)
50 Georgia Tech (7.41589079417)
51 Syracuse (7.44868324078)
52 South Florida (7.45089659866)
53 Wichita St. (7.46348451676)
54 Missouri (7.4729284621)
55 LSU (7.47573255956)
56 Miami (OH) (7.47900971338)
57 Wofford (7.48742664695)
58 Iona (7.48867594252)
59 Penn St. (7.49814888772)
60 Citadel (7.49830311569)
61 Nebraska (7.49904269456)
62 Miami (FL) (7.5006777214)
63 UTEP (7.50283671305)
64 DePaul (7.5052997362)
65 Memphis (7.51440983833)
66 Virginia Tech (7.51613488033)
67 Iowa (7.56420047514)
68 Kansas St. (7.56509818048)
69 Murray St. (7.58529581953)
70 Delaware St. (7.58895799063)
71 Boston Coll. (7.59333620215)
72 SMU (7.59480253644)
73 Oklahoma St. (7.60612280358)
74 Samford (7.62059057461)
75 Drexel (7.62293362159)
76 Tulsa (7.62459657347)
77 Illinois (7.62532578585)
78 Portland (7.63482056947)
79 Morehead St. (7.66066378862)
80 Northwestern (7.67233635766)
81 Utah (7.69836822656)
82 Coastal Carolina (7.7033515457)
83 San Diego (7.70396346086)
84 William & Mary (7.72423302282)
85 Winthrop (7.72487329491)
86 UCLA (7.72648162538)
87 Army (7.72722115146)
88 Louisville (7.73382821831)
89 Georgia (7.74376644246)
90 Western Ill. (7.74385661906)
91 Illinois St. (7.76426797661)
92 Maryland (7.77528026697)
93 Wake Forest (7.79036526764)
94 Denver (7.79401961232)
95 Ball St. (7.80074237613)
96 Notre Dame (7.80089366125)
97 Georgia St. (7.80197909369)
98 Gonzaga (7.80662609984)
99 BYU (7.81416990538)
100 California (7.82606819299)
101 Vanderbilt (7.83014433876)
102 Iowa St. (7.83028613886)
103 Texas (7.8335621828)
104 Fresno St. (7.84611580197)
105 St. Mary's (7.84645378187)
106 New Mexico (7.8479444928)
107 Middle Tenn. St. (7.84942637318)
108 UC Riverside (7.85475895144)
109 Bethune-Cookman (7.87396971794)
110 Kent St. (7.87629754562)
111 George Mason (7.90025735892)
112 Campbell (7.91598773648)
113 Colorado St. (7.93210203831)
114 Missouri St. (7.94094032558)
115 Tulane (7.95101601543)
116 Mississippi (7.95367851061)
117 Washington (7.95432647712)
118 Stanford (7.95720907604)
119 South Carolina (7.96277587671)
120 UCSB (7.966640255)
121 VCU (7.96746197122)
122 Maine (7.98920434411)
123 Bradley (7.98920860735)
124 North Carolina (7.99006864652)
125 Oral Roberts (7.99080464416)
126 UCF (8.00605473831)
127 Mount St. Mary's (8.01404551663)
128 Xavier (8.02351076537)
129 Texas A&M C.C. (8.02446615457)
130 Savannah St. (8.04569467886)
131 Cent. Michigan (8.05402586387)
132 West. Kentucky (8.05887288397)
133 Geo. Washington (8.06038065862)
134 Indiana St. (8.07246235979)
135 New Hampshire (8.07774962112)
136 Akron (8.09083851123)
137 Oklahoma (8.09402823024)
138 North Florida (8.10379974475)
139 Cleveland St. (8.10389606057)
140 E. Tennessee St. (8.10471837755)
141 Siena (8.10595282816)
142 Detroit (8.11149093745)
143 Arizona (8.11247555657)
144 Cornell (8.12109349834)
145 Oregon (8.12499261218)
146 Bowling Green (8.13087582324)
147 Montana St. (8.14627271973)
148 Fairfield (8.15012446845)
149 Villanova (8.17587846299)
150 Idaho (8.18000737199)
151 Creighton (8.18360558613)
152 Wis. Green Bay (8.19445627815)
153 Southern Ill. (8.1950580689)
154 Hampton (8.20326126486)
155 Toledo (8.21302925358)
156 IUPUI (8.21576028368)
157 Manhattan (8.22130057055)
158 Rice (8.22253013708)
159 Washington St. (8.22288790194)
160 UTSA (8.22574472187)
161 Eastern Kentucky (8.23109725059)
162 Drake (8.23649402409)
163 Rhode Island (8.23982869893)
164 TCU (8.24110941024)
165 Indiana (8.24621895551)
166 Marshall (8.24996953102)
167 Rutgers (8.2538701799)
168 Eastern Illinois (8.25416720774)
169 Northern Colorado (8.2630252777)
170 Loyola (MD) (8.27909029889)
171 New Orleans (8.2793863026)
172 Louisiana Tech (8.28055379758)
173 Wis. Milwaukee (8.28255600529)
174 Hofstra (8.28449899266)
175 Stony Brook (8.28483270871)
176 Charlotte (8.3041660791)
177 Loyola Chicago (8.31062184407)
178 East. Michigan (8.31182296875)
179 St. Bonaventure (8.31310274207)
180 AR Pine Bluff (8.32110536489)
181 Columbia (8.3272917188)
182 Vermont (8.33870032527)
183 Evansville (8.34641283057)
184 LA Lafayette (8.36296635221)
185 Robert Morris (8.36585424925)
186 Santa Clara (8.37058557082)
187 Colorado (8.38106259999)
188 Arkansas St. (8.38127263458)
189 Ohio (8.39718772894)
190 South Carolina Upstate (8.4007082262)
191 Dartmouth (8.40335946266)
192 Utah Valley (8.40337517228)
193 W. Michigan (8.40440404052)
194 Jacksonville (8.40782902276)
195 Hawaii (8.41135487474)
196 American (8.42065268017)
197 San Francisco (8.42601745261)
198 Harvard (8.42749126568)
199 St. Francis (NY) (8.44054055968)
200 Seton Hall (8.44771710686)
201 Texas Tech (8.46818959437)
202 Appalachian St. (8.47215601275)
203 Long Beach St. (8.47595388304)
204 Boston U. (8.48855576781)
205 Central Arkansas (8.48921146639)
206 South Alabama (8.49126224194)
207 Belmont (8.49141453145)
208 Arkansas (8.49621673644)
209 Binghamton (8.50035868758)
210 UC Irvine (8.5006142522)
211 Canisius (8.50775907628)
212 Prairie View A&M (8.51091532265)
213 IPFW (8.51524783412)
214 N.C. Wilmington (8.5152581359)
215 James Madison (8.52000467171)
216 Auburn (8.52461452915)
217 Jackson St. (8.52910457893)
218 Weber St. (8.53362984696)
219 Niagara (8.53659694399)
220 Radford (8.54006564549)
221 La Salle (8.54693640652)
222 Nicholls St. (8.54913856799)
223 SE Louisiana (8.56077726702)
224 Duquesne (8.59773053405)
225 Delaware (8.60280698837)
226 Ill. Chicago (8.60808215306)
227 Troy (8.60982491205)
228 Bucknell (8.61386897796)
229 Monmouth (8.62278504593)
230 Charleston S. (8.62680629702)
231 North Texas (8.63825319561)
232 Boise St. (8.64434713111)
233 Western Carolina (8.64809265887)
234 Marist (8.65338106147)
235 Nevada (8.65608723751)
236 Oakland (8.65642781706)
237 Kennesaw St. (8.67869116985)
238 N.C. Greensboro (8.67981044751)
239 Massachusetts (8.67988477401)
240 Albany (8.69120190998)
241 Rider (8.69757385785)
242 Youngstown St. (8.70061407371)
243 Central Conn. St. (8.70393489342)
244 Quinnipiac (8.70612999176)
245 Texas Southern (8.72030476475)
246 Davidson (8.73101338061)
247 Wyoming (8.7360652008)
248 St. Joseph's (8.74362472809)
249 Coppin St. (8.75493806185)
250 Furman (8.75980614742)
251 Sacramento St. (8.77326310285)
252 Bryant (8.78048912511)
253 Austin Peay (8.78487613032)
254 Alabama St. (8.79203844695)
255 Liberty (8.79750178724)
256 UC Davis (8.79994976191)
257 Pepperdine (8.80675284012)
258 Yale (8.81463822777)
259 Northern Arizona (8.8327407083)
260 North Dakota St. (8.83329331039)
261 Houston (8.86681547748)
262 N.J.I.T. (8.89190042364)
263 Elon (8.89508768935)
264 San Jose St. (8.90228645909)
265 Holy Cross (8.9102852514)
266 Buffalo (8.91496425265)
267 Jacksonville St. (8.91622612699)
268 East Carolina (8.92111643715)
269 Sam Houston St. (8.92669268469)
270 SE Missouri St. (8.92849481542)
271 Winston-Salem (8.93934219245)
272 Hartford (8.94077611862)
273 Presbyterian (8.95632457717)
274 Florida Atlantic (8.96338451873)
275 LA Monroe (8.97012356545)
276 Chattanooga (8.98154423833)
277 McNeese St. (8.98263439309)
278 Eastern Wash. (8.99090298468)
279 Charleston (8.99428844871)
280 Stetson (8.99529851335)
281 Mo. Kansas City (9.0006530765)
282 Morgan St. (9.00163300792)
283 AR Little Rock (9.00213672152)
284 TX Arlington (9.00256935542)
285 Tennessee St. (9.01003374672)
286 Lamar (9.01530528868)
287 Pennsylvania (9.01734872767)
288 Loyola Marymount (9.01919725003)
289 Fla Gulf Coast (9.02014638581)
290 CSU Bakersfield (9.02103555006)
291 MD Eastern Shore (9.04143668998)
292 New Mexico St. (9.05158842458)
293 N.C. Central (9.05196304571)
294 Howard (9.05294825394)
295 S. Carolina St. (9.05605147407)
296 Texas Pan Amer. (9.05996521795)
297 CSU Northridge (9.06285169277)
298 Wagner (9.06589569738)
299 St. Francis (PA) (9.06639894959)
300 Colgate (9.07605299997)
301 Northern Illinois (9.08285569398)
302 South Dakota St. (9.09169089921)
303 Fair. Dickinson (9.09364072219)
304 Idaho St. (9.10398788438)
305 Cal Poly (9.11776657342)
306 CSU Fullerton (9.12936899069)
307 Southern Utah (9.1310335987)
308 N.C. A&T (9.15651666505)
309 Lehigh (9.17181198377)
310 Valparaiso (9.19006331902)
311 Chicago St. (9.22816567117)
312 High Point (9.2314108556)
313 Lafayette (9.25065942627)
314 North Dakota (9.27399334025)
315 Providence (9.27454860001)
316 Brown (9.2806385682)
317 SIU Edwardsville (9.33791078455)
318 Towson (9.34944644617)
319 Fordham (9.35448135911)
320 Lipscomb (9.38079698141)
321 Mercer (9.41301764102)
322 Portland St. (9.43760160698)
323 Long Island (9.4473457865)
324 UMBC (9.45742098012)
325 Alabama A&M (9.48291716302)
326 Tenn-Martin (9.48881486106)
327 Florida Intl. (9.48960691055)
328 Navy (9.52218904341)
329 South Dakota (9.52469625509)
330 Miss. Valley St. (9.5409902926)
331 Southern (9.57825041251)
332 Florida A&M (9.59913215592)
333 Sacred Heart (9.62362981881)
334 Georgia Southern (9.69349029777)
335 Tennessee Tech (9.70248177492)
336 Norfolk St. (9.7145731196)
337 Grambling St. (9.7645185558)
338 N.C. Asheville (9.78288772817)
339 Seattle (9.78365887659)
340 Gardner-Webb (9.80860538609)
341 Texas St. (9.85033995182)
342 Centenary (9.85333899769)
343 Longwood (10.0855988691)
344 Northwestern St. (10.1669651218)
345 Houston Baptist (10.2985052311)
346 Alcorn St. (10.6680173801)
347 Virginia Military (12.5710938304)

Combined (Offensive Rating divided by Defensive Rating)
1 Duke (1.48007999011)
2 Kansas (1.45882555272)
3 Syracuse (1.38141386962)
4 Wisconsin (1.37786866264)
5 Ohio St. (1.3655766948)
6 Kentucky (1.36278847398)
7 Kansas St. (1.34289451376)
8 West Virginia (1.34071196565)
9 Purdue (1.33042674444)
10 Georgetown (1.32247947836)
11 BYU (1.31997501617)
12 Maryland (1.31911913621)
13 Texas (1.3115057155)
14 Missouri (1.30881162016)
15 California (1.30530887034)
16 Michigan St. (1.30103731743)
17 Baylor (1.29823603766)
18 Villanova (1.29767183095)
19 Clemson (1.29660549206)
20 Minnesota (1.29546849395)
21 Tennessee (1.29148633403)
22 Marquette (1.28818046142)
23 Washington (1.28648564604)
24 Florida St. (1.28436281409)
25 Texas A&M (1.26926099006)
26 Arizona St. (1.2666597475)
27 Vanderbilt (1.26552145271)
28 Xavier (1.26143887358)
29 Georgia Tech (1.25714208524)
30 Louisville (1.25502196239)
31 Pittsburgh (1.25447180879)
32 Utah St. (1.25345817478)
33 Connecticut (1.24690709057)
34 Temple (1.24612833808)
35 Notre Dame (1.24171415006)
36 UNLV (1.23462028724)
37 St. Mary's (1.23419267555)
38 Mississippi (1.23107671046)
39 New Mexico (1.23078881021)
40 UTEP (1.23065946656)
41 Mississippi St. (1.22835757327)
42 Memphis (1.22760903603)
43 Old Dominion (1.22651412174)
44 Butler (1.22411220631)
45 Illinois (1.21807125044)
46 Oklahoma St. (1.21779645406)
47 Gonzaga (1.21676317737)
48 Northern Iowa (1.21579627811)
49 San Diego St. (1.21346455321)
50 Florida (1.20956278131)
51 Wake Forest (1.20893039576)
52 Miami (FL) (1.20641857831)
53 Dayton (1.20512718269)
54 Cincinnati (1.20164815801)
55 North Carolina (1.19899153983)
56 Virginia Tech (1.19650388404)
57 VCU (1.19377527041)
58 Seton Hall (1.1867315907)
59 Michigan (1.18639306776)
60 Alabama (1.18293883104)
61 Richmond (1.17398523956)
62 UAB (1.17312951388)
63 Boston Coll. (1.17209354959)
64 USC (1.17047236875)
65 N.C. State (1.16990800991)
66 Iowa St. (1.16945693763)
67 Virginia (1.16600265427)
68 South Florida (1.16220386441)
69 Northwestern (1.15866116587)
70 St. John's (1.15704200054)
71 Tulsa (1.15700607115)
72 Marshall (1.15205893506)
73 Murray St. (1.1517174991)
74 Siena (1.15056054637)
75 Rhode Island (1.1491675533)
76 Arizona (1.14803391637)
77 South Carolina (1.145646893)
78 Wichita St. (1.14487211435)
79 Texas Tech (1.14340761824)
80 Penn St. (1.13411169799)
81 Providence (1.13171031829)
82 Portland (1.13159675811)
83 Colorado (1.13006886906)
84 Georgia (1.12658197403)
85 Houston (1.12631972812)
86 Northeastern (1.12415610633)
87 Nebraska (1.12282909773)
88 Oklahoma (1.11864944847)
89 Wright St. (1.11688844569)
90 Cornell (1.11271330689)
91 Stanford (1.1116116334)
92 Washington St. (1.11052028857)
93 Missouri St. (1.1098381928)
94 Nevada (1.10962353096)
95 Arkansas (1.10949112475)
96 UCLA (1.10882956069)
97 Illinois St. (1.10792264194)
98 Auburn (1.10161425293)
99 Southern Miss (1.10055878152)
100 Montana (1.09955976952)
101 Kent St. (1.09923185104)
102 Saint Louis (1.09885389877)
103 Oregon (1.09686646992)
104 Charlotte (1.0830481977)
105 Utah (1.07865805796)
106 Wofford (1.0764155896)
107 Ohio (1.07442527656)
108 William & Mary (1.06833732222)
109 Pacific (1.06783304935)
110 Louisiana Tech (1.06779979919)
111 Oregon St. (1.0675823995)
112 Southern Ill. (1.06561864641)
113 Weber St. (1.06440051316)
114 Creighton (1.06307380075)
115 Bradley (1.06282818219)
116 Iona (1.0625403397)
117 SMU (1.06215784496)
118 Morehead St. (1.05921132558)
119 Geo. Washington (1.05769662186)
120 Akron (1.05738016638)
121 Hofstra (1.0570714235)
122 West. Kentucky (1.05546738144)
123 Harvard (1.05482257479)
124 IUPUI (1.05308365889)
125 Sam Houston St. (1.04974092876)
126 Indiana (1.04926543962)
127 New Mexico St. (1.04773005418)
128 Iowa (1.0455579143)
129 Colorado St. (1.04549270512)
130 Rutgers (1.04307346184)
131 Duquesne (1.03564473247)
132 Northern Colorado (1.0354154495)
133 Fairfield (1.03402473514)
134 St. Bonaventure (1.03165767347)
135 Oakland (1.03028946595)
136 LSU (1.02993251442)
137 Indiana St. (1.02984850548)
138 DePaul (1.02771130828)
139 Niagara (1.02687723)
140 Detroit (1.02637392847)
141 Appalachian St. (1.02523944305)
142 Princeton (1.02293091788)
143 UCF (1.02254421654)
144 Fresno St. (1.02250874676)
145 Wis. Milwaukee (1.02147370026)
146 Idaho (1.02111023366)
147 Miami (OH) (1.02029759442)
148 Oral Roberts (1.02003023524)
149 Coastal Carolina (1.01924679057)
150 Drexel (1.01740621109)
151 Loyola Marymount (1.01660133952)
152 Cleveland St. (1.01628421841)
153 TCU (1.01575342137)
154 Boise St. (1.01379004068)
155 UCSB (1.01351109309)
156 La Salle (1.01273781466)
157 George Mason (1.0127316991)
158 Belmont (1.01037098436)
159 Drake (1.00978591075)
160 Boston U. (1.0066245832)
161 Long Beach St. (1.00531900747)
162 Wis. Green Bay (1.00443873827)
163 Middle Tenn. St. (1.0040234954)
164 E. Tennessee St. (0.999028143885)
165 W. Michigan (0.997908903288)
166 Charleston (0.995769703748)
167 Massachusetts (0.993237234191)
168 Vermont (0.992636619434)
169 Stephen F. Austin (0.992078383393)
170 San Jose St. (0.990951447097)
171 Western Carolina (0.989560984201)
172 Jacksonville (0.988208544543)
173 Morgan St. (0.988187032316)
174 St. Peter's (0.987434951414)
175 Eastern Kentucky (0.984921130827)
176 North Texas (0.98308405131)
177 San Diego (0.982462098732)
178 St. Joseph's (0.982013509425)
179 Campbell (0.981094601526)
180 Buffalo (0.980586437795)
181 Texas A&M C.C. (0.980450115889)
182 San Francisco (0.979614271679)
183 Lipscomb (0.978847238664)
184 Troy (0.978752226445)
185 Denver (0.977904582032)
186 Tulane (0.973828552343)
187 Valparaiso (0.971440053644)
188 Wyoming (0.969504887549)
189 Rider (0.968903611411)
190 UTSA (0.966808977859)
191 Davidson (0.965219611591)
192 Quinnipiac (0.9645439955)
193 Portland St. (0.963959749582)
194 Georgia St. (0.963341514284)
195 Montana St. (0.961863732373)
196 Austin Peay (0.961606497705)
197 Florida Atlantic (0.961338781808)
198 Seattle (0.960849695374)
199 Citadel (0.959919699097)
200 Stony Brook (0.958583193014)
201 Hawaii (0.958160559305)
202 East. Michigan (0.957666833774)
203 Manhattan (0.956681053327)
204 N.C. Wilmington (0.955937877818)
205 SE Louisiana (0.955695369221)
206 Arkansas St. (0.955658899045)
207 Cent. Michigan (0.955530555015)
208 CSU Fullerton (0.955328868302)
209 Mount St. Mary's (0.952138858162)
210 LA Lafayette (0.951564195219)
211 Loyola (MD) (0.950917091975)
212 Robert Morris (0.950193181867)
213 James Madison (0.949139912578)
214 Lehigh (0.948847191123)
215 Canisius (0.947495634393)
216 South Dakota St. (0.944618705407)
217 Mercer (0.943567028733)
218 Evansville (0.942094806697)
219 Eastern Illinois (0.942077846026)
220 Loyola Chicago (0.940620897717)
221 Air Force (0.937820375277)
222 Rice (0.937433519282)
223 Bowling Green (0.937138806999)
224 East Carolina (0.937019454289)
225 CSU Northridge (0.933540128377)
226 Radford (0.93267808989)
227 Santa Clara (0.93125890574)
228 Ball St. (0.930139251736)
229 UC Riverside (0.929022895911)
230 Delaware (0.919699145274)
231 UC Davis (0.917266637412)
232 Texas St. (0.916816204084)
233 Hampton (0.916005786284)
234 Youngstown St. (0.913542856471)
235 UC Irvine (0.913422115937)
236 Winthrop (0.912933729441)
237 North Dakota St. (0.912667788442)
238 IPFW (0.912461710211)
239 Northern Arizona (0.912376961501)
240 Furman (0.91107894616)
241 Northern Illinois (0.910570088628)
242 Maine (0.909569961938)
243 Pepperdine (0.909140520009)
244 Cal Poly (0.908616846694)
245 South Alabama (0.907715202787)
246 South Dakota (0.907044756119)
247 TX Arlington (0.902755954482)
248 Towson (0.901703876797)
249 Lafayette (0.900071618008)
250 Holy Cross (0.899151193669)
251 Ill. Chicago (0.899123157259)
252 Samford (0.89708173223)
253 N.C. Greensboro (0.892757630778)
254 Jacksonville St. (0.89234685562)
255 Chattanooga (0.891545488077)
256 Tennessee Tech (0.890973096836)
257 Delaware St. (0.890783284227)
258 AR Pine Bluff (0.888311339707)
259 Bucknell (0.887253453558)
260 Tennessee St. (0.886729625546)
261 AR Little Rock (0.884556624427)
262 New Hampshire (0.878978670758)
263 S. Carolina St. (0.878306321575)
264 Yale (0.876609155053)
265 Kennesaw St. (0.875666240328)
266 Elon (0.874352155997)
267 Army (0.874314005699)
268 Lamar (0.872306092843)
269 Idaho St. (0.871848292114)
270 LA Monroe (0.871742295017)
271 Eastern Wash. (0.869155711697)
272 North Florida (0.867019799597)
273 Nicholls St. (0.867006397925)
274 Brown (0.865783047587)
275 CSU Bakersfield (0.865745360057)
276 McNeese St. (0.865410750797)
277 Sacred Heart (0.865378603374)
278 Colgate (0.864695782975)
279 Texas Southern (0.863009684577)
280 Binghamton (0.862622578114)
281 Western Ill. (0.86219512694)
282 South Carolina Upstate (0.861281514247)
283 Long Island (0.858201206647)
284 Navy (0.856990465842)
285 Bethune-Cookman (0.856099732481)
286 Monmouth (0.855280087523)
287 American (0.854920662629)
288 Georgia Southern (0.853849978432)
289 Norfolk St. (0.853582737548)
290 Mo. Kansas City (0.852978551836)
291 Albany (0.851549653552)
292 Northwestern St. (0.850817879413)
293 Liberty (0.849273130066)
294 Columbia (0.847961121918)
295 Sacramento St. (0.846234761877)
296 Florida Intl. (0.845951636946)
297 Fla Gulf Coast (0.844015970082)
298 N.C. A&T (0.843266962986)
299 Fordham (0.842261558846)
300 Central Arkansas (0.842034662887)
301 N.C. Asheville (0.841029152302)
302 Charleston S. (0.839273865302)
303 Savannah St. (0.83350880015)
304 New Orleans (0.833436865913)
305 Longwood (0.833154369927)
306 Pennsylvania (0.832756840159)
307 SE Missouri St. (0.831200036065)
308 High Point (0.830341576067)
309 Jackson St. (0.824825289035)
310 St. Francis (NY) (0.82169277423)
311 Alabama St. (0.821158818157)
312 Centenary (0.816611754535)
313 Central Conn. St. (0.814334933526)
314 Hartford (0.809597321769)
315 Fair. Dickinson (0.806675830473)
316 Prairie View A&M (0.805889182488)
317 St. Francis (PA) (0.80310732592)
318 Utah Valley (0.802929072494)
319 Virginia Military (0.800528692777)
320 Coppin St. (0.797568659729)
321 Southern Utah (0.79715697383)
322 Marist (0.79569747294)
323 Toledo (0.795687209756)
324 UMBC (0.791691249542)
325 Houston Baptist (0.790642362361)
326 Tenn-Martin (0.790596858677)
327 MD Eastern Shore (0.78968490459)
328 Texas Pan Amer. (0.788033398015)
329 Stetson (0.783807589179)
330 Florida A&M (0.774545877131)
331 Wagner (0.774015806347)
332 Gardner-Webb (0.773658373834)
333 Presbyterian (0.768479626979)
334 SIU Edwardsville (0.764643754766)
335 North Dakota (0.76428770317)
336 Winston-Salem (0.764161111474)
337 Miss. Valley St. (0.756758011846)
338 Howard (0.756013542522)
339 Alabama A&M (0.754898481056)
340 N.C. Central (0.751999889974)
341 Dartmouth (0.749429845128)
342 N.J.I.T. (0.747728556783)
343 Chicago St. (0.743882983276)
344 Grambling St. (0.727509221921)
345 Southern (0.704936280747)
346 Bryant (0.678185491097)
347 Alcorn St. (0.65936594813)