Sunday, November 30, 2008

Cocktail party conversation starter (aka how to mess with statistics 101 students)

Posted by Danny Tarlow
I don't want to sound too pedantic here, so I came up with a witty title. Please don't randomly bring this up in any sort of normal conversation (ahah, get it?). The (normalized) product of two Gaussian distributions is itself a Gaussian distribution. If you're not afraid of a little algebra, you can prove it yourself: write down the expression for the probability density function of a Gaussian random variable twice, do the multiplication, combine the exponent terms, rearrange, then complete the square (yes, I know that's too fast if you don't know what I'm talking about). If you take the (normalized) sum of two Gaussian distributions, you get a mixture of Gaussians that can have two modes, so in general it's not Gaussian. Now here's the tricky part. If you have two independent Gaussian random variables
A ~ N(mu_A, sigma_A^2)
B ~ N(mu_B, sigma_B^2)
and you define random variable C to take on the value of the sum A + B, then C will be distributed according to a Gaussian distribution:
C ~ N(mu_A + mu_B, sigma_A^2 + sigma_B^2)
If instead you define random variable D to take on the value of the product A * B, then D will not be distributed normally. As an example, if A = B and mu_A = mu_B = 0 and sigma_A = sigma_B = 1, then D is distributed according to a chi-square distribution with 1 degree of freedom. The "trick" (if you want to call it that) comes from the loose wording people use when they say things like "the product of two Gaussians." In the first case, you are actually multiplying probability distributions. In the second case, you are multiplying the values of draws from probability distributions -- it's kind of subtle. Unfortunately, both interpretations are reasonable and used in practice. The first one comes up most for me, because if you have two independent beliefs about the value of a variable, then the right thing to do to combine the evidence is to multiply the distributions and renormalize. The second comes up in places like multiplicative models.
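To make the two readings concrete, here is a minimal Python sketch using only the standard library. The helper names (gaussian_density_product, moments), the sample size, and the seed are my own choices for illustration, not anything canonical. It computes the parameters of the normalized product of two Gaussian densities in closed form, then simulates draws to compare the sum A + B against the product A * B in the A = B case.

```python
import random


def gaussian_density_product(mu_a, var_a, mu_b, var_b):
    """Mean and variance of the normalized product of two Gaussian densities.

    Precisions (inverse variances) add, and the new mean is the
    precision-weighted average of the two means.
    """
    precision = 1.0 / var_a + 1.0 / var_b
    var = 1.0 / precision
    mu = var * (mu_a / var_a + mu_b / var_b)
    return mu, var


def moments(xs):
    """Sample mean, variance, and skewness (skewness is ~0 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    skew = sum((x - mean) ** 3 for x in xs) / (n * var ** 1.5)
    return mean, var, skew


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
a = [rng.gauss(0.0, 1.0) for _ in range(n)]
b = [rng.gauss(0.0, 1.0) for _ in range(n)]

sums = [x + y for x, y in zip(a, b)]  # C = A + B, with A and B independent
squares = [x * x for x in a]          # D = A * B in the special case A = B

print("density product (mean, var):", gaussian_density_product(0.0, 1.0, 2.0, 1.0))
print("C (mean, var, skew):", moments(sums))
print("D (mean, var, skew):", moments(squares))
```

The sum's skewness should come out near zero, as you'd expect for a Gaussian, while the squared draws are nonnegative and strongly right-skewed, which is consistent with a chi-square distribution with 1 degree of freedom.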

Friday, November 28, 2008

Temporal social networks

Posted by Danny Tarlow
This sounds like a really interesting data set. It shows how the social (Facebook) connections between a class of students at Harvard evolved over a four-year period. I'll add checking it out to the queue with the 17 other mini projects on my todo list.

Wednesday, November 26, 2008

Even more football

Posted by Danny Tarlow
I'll add this to my to-read list, but some subtle aspects of the abstract wording bother me. For example, play-calling is often done by the head coach or offensive coordinator. The quarterback usually only has the option to make small changes to a given play (e.g. choose whether to run it right or left), or the ability to call an audible. I guess I need to read it to fully understand the play-calling scenario they're addressing. S. D. Patek and D. P. Bertsekas, "Play Selection in American Football: a Case Study in Neuro-Dynamic Programming", Chapter 7 in Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search: Interfaces in Computer Science and Operations Research, David L. Woodruff, editor. Kluwer Academic Publishers, Boston, 1997.
Abstract: We present a computational case study of neuro-dynamic programming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration. The algorithms we consider evolve as a sequence of approximate policy evaluations and policy updates. An (exact) evaluation amounts to the computation of the reward-to-go function associated with the policy in question. Approximations of reward-to-go are obtained either as the solution or as a step toward the solution of a training problem involving simulated state/reward data pairs. Within this methodological framework there is a great deal of flexibility. In specifying a particular algorithm, one must select a parametric form for estimating the reward-to-go function as well as a training algorithm for tuning the approximation. One example we consider, among many others, is the use of a multilayer perceptron (i.e. neural network) which is trained by backpropagation. The objective of this paper is to illustrate the application of neuro-dynamic programming methods in solving a well-defined optimization problem. We will contrast and compare various algorithms mainly in terms of performance, although we will also consider complexity of implementation. Because our version of football leads to a medium-scale Markov decision problem, it is possible to compute the optimal solution numerically, providing a yardstick for meaningful comparison of the approximate methods.

Wednesday, November 19, 2008

Crunch power

Posted by Danny Tarlow
Yikes! And for those of us who dream of having supercomputers in our home office:

Multiplying for success

Posted by Danny Tarlow
From the article:
Now how does this elucidate the elusive X-Factor? My esteemed colleague Dean Keith Simonton [1] offers a nuanced genetic model of talent that I think is relevant. Simonton has argued that additive models of talent are too simplistic (see last post for an additive model of music talent). It's too simple to say that practice + music ability + high IQ equals musical ability. No, Simonton says that talent, especially in complex domains, is better represented by a multidimensional and multiplicative model. When you hear the term multiplicative model, you should think of AND, and when you hear the term additive model, you should think OR. Essentially, ability in any given task is better modeled by saying that you have to be strong in every relevant trait than by saying that you can make up for a lack of strength in one trait with more strength in another. It is very hard to make up for a lack of musical ability with a high IQ and lots of practice (if your goal is overall musical achievement), for example.
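Here is a toy Python sketch of the AND-versus-OR intuition. Everything in it is hypothetical: the trait scores (scaled to lie between 0 and 1), the two profiles, and the choice of a plain average and a plain product as stand-ins for "additive" and "multiplicative." It is not Simonton's actual model, just the simplest version of the idea.

```python
import math


def additive_score(traits):
    """Additive (OR-like) model: strength in one trait can offset weakness in another."""
    return sum(traits) / len(traits)


def multiplicative_score(traits):
    """Multiplicative (AND-like) model: one weak trait drags the whole score down."""
    return math.prod(traits)


# Hypothetical trait scores in [0, 1], e.g. (practice, IQ, musical ability).
balanced = [0.7, 0.7, 0.7]
lopsided = [1.0, 1.0, 0.1]

for name, traits in (("balanced", balanced), ("lopsided", lopsided)):
    print(name, additive_score(traits), multiplicative_score(traits))
```

Under the additive score the two profiles are essentially tied, while under the multiplicative score the lopsided profile falls well behind, because nothing makes up for the one weak trait.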

Tuesday, November 18, 2008


Posted by Danny Tarlow
I was looking at Intrade's market on potential Secretary of State nominees this morning, and I felt that Hillary's odds were somewhat overstated at $84 for a $100 payoff if she wins (see here and here). Across the board, I felt that was contributing to a low estimate for Bill Richardson ($9 for a $100 contract), so I was thinking about spending $9 to put my money where my mouth is (if Richardson were chosen, I would get a payoff of $100 for that bet). EDIT: I'm glad I didn't make that bet. My credit card company doesn't let me make payments to Intrade, so I gave up shortly after, but I did notice that the spread between Bid and Ask prices was quite large in some of these low-volume markets. I did a bit of Googling, and it led me to some tangentially related, interesting articles about Intrade:
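For the arithmetic behind a bet like this, here is a small Python sketch. The function names are mine, the 15% belief is a made-up number for illustration, and the calculation ignores fees and the bid/ask spread, which (as noted above) can be large in thin markets.

```python
def implied_probability(price, payoff=100.0):
    """Read a contract price as the market's probability of the event."""
    return price / payoff


def expected_profit(price, believed_prob, payoff=100.0):
    """Expected profit per contract if your own probability is believed_prob."""
    return believed_prob * payoff - price


clinton_price = 84.0     # dollars for a $100 payoff
richardson_price = 9.0   # dollars for a $100 payoff
my_belief = 0.15         # hypothetical personal probability for Richardson

print("market-implied P(Clinton):", implied_probability(clinton_price))
print("market-implied P(Richardson):", implied_probability(richardson_price))
print("expected profit on Richardson:", expected_profit(richardson_price, my_belief))
```

The bet only has positive expected value if your own probability exceeds the market-implied one; at exactly the implied probability the expected profit is zero.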

Summers time?

Posted by Danny Tarlow
The excerpt from Super Crunchers in this article is an interesting look into the deeper details of the comments Larry Summers made about the differences between men and women in science and mathematics. I think it's relevant to keep in mind that Summers made this claim at a conference on "Diversifying the Science & Engineering Workforce." You can also see the caveats he lays out by looking at the full text of his speech:
I asked Richard, when he invited me to come here and speak, whether he wanted an institutional talk about Harvard's policies toward diversity or whether he wanted some questions asked and some attempts at provocation, because I was willing to do the second and didn't feel like doing the first. And so we have agreed that I am speaking unofficially and not using this as an occasion to lay out the many things we're doing at Harvard to promote the crucial objective of diversity. There are many aspects of the problems you're discussing and it seems to me they're all very important from a national point of view. I'm going to confine myself to addressing one portion of the problem, or of the challenge we're discussing, which is the issue of women's representation in tenured positions in science and engineering at top universities and research institutions, not because that's necessarily the most important problem or the most interesting problem, but because it's the only one of these problems that I've made an effort to think in a very serious way about. The other prefatory comment that I would make is that I am going to, until most of the way through, attempt to adopt an entirely positive, rather than normative approach, and just try to think about and offer some hypotheses as to why we observe what we observe without seeing this through the kind of judgmental tendency that inevitably is connected with all our common goals of equality. It is after all not the case that the role of women in science is the only example of a group that is significantly underrepresented in an important activity and whose underrepresentation contributes to a shortage of role models for others who are considering being in that group. 
To take a set of diverse examples, the data will, I am confident, reveal that Catholics are substantially underrepresented in investment banking, which is an enormously high-paying profession in our society; that white men are very substantially underrepresented in the National Basketball Association; and that Jews are very substantially underrepresented in farming and in agriculture. These are all phenomena in which one observes underrepresentation, and I think it's important to try to think systematically and clinically about the reasons for underrepresentation.
There is a pretty overwhelming consensus that the arguments he goes on to make are flawed in "twenty different ways," but I think the rough idea of using an order-statistics-type approach is interesting -- rather than explaining differences we see in ultra-competitive positions as evidence of different means, we can equally explain them as evidence of different standard deviations. Now, we can debate whether nurture or nature better explains different standard deviations in the characteristics that lead one to high-powered science and engineering jobs, and Summers goes on to present some arguments that it may not be all nature, which is probably the source of most of his troubles with the media. Regardless, Summers's speech is interesting and intellectually provocative. One of the things that I admire about Barack Obama is that he generally speaks in an intelligent, more nuanced manner than most politicians I've seen. I felt that same sort of appreciation reading Summers's speech. Now, I won't go so far as to argue whether this speech should have gotten Summers dismissed as Harvard's president, but I agree that if his dismissal can be justified, it is because of the drastically oversimplified versions of his arguments that were presented to the broader public. At least in my reading, I saw a person looking at a complex piece of data and trying to come up with some plausible hypotheses that explain it. Nowhere did I see any evidence of latent sexist beliefs held by Summers. I think Summers is an excellent fit for an Obama presidency that is serious about taking an open-minded, pragmatic approach to tackling the problems our country is facing without worrying about the media response and political implication of every decision.
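The purely statistical point about means versus standard deviations can be sketched in a few lines of Python. The two populations here are entirely hypothetical (same mean, standard deviations of 1.0 and 1.1, numbers I picked for illustration), and the function names are mine. The sketch just evaluates Gaussian upper-tail probabilities at increasingly extreme cutoffs.

```python
import math


def upper_tail(threshold, mean, sd):
    """P(X > threshold) for X ~ N(mean, sd^2), via the complementary error function."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def tail_ratio(threshold, sd_wide=1.1, sd_narrow=1.0, mean=0.0):
    """How over-represented the wider population is above the cutoff."""
    return upper_tail(threshold, mean, sd_wide) / upper_tail(threshold, mean, sd_narrow)


# Cutoffs measured in standard deviations of the narrower population.
for cutoff in (0.0, 2.0, 4.0):
    print(cutoff, tail_ratio(cutoff))
```

At the shared mean the two populations are equally represented, and the ratio grows as the cutoff moves further into the tail, even though the means are identical. That is all the order-statistics argument needs; it says nothing about where a difference in spread would come from.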

Monday, November 17, 2008

More football

Posted by Danny Tarlow
Here's an example of a data-driven approach to decision-making in football strategy. This is from 2006. I wonder why we haven't heard more about it. The article gives an attempt at an explanation:
The NFL hasn't embraced the technology just yet. The league is known for its conservative decisions and its trust in the highly paid coaches -- and not necessarily computers. Zeus's makers say the program could also be tailored to college football.
I don't completely buy it. If it works well and can help a coach, there should be enough incentive to win games to break the inertia. Somehow this product isn't quite the right way to do it. Or maybe the developers need to talk to the Florida coach from my other football statistics post. More at NY Times:

Information warfare

Posted by Danny Tarlow
Neat video about how the US is using unmanned aircraft to collect data from over enemy territory. Side note: The website should have more descriptive URLs. Something like should drive more organic search engine traffic.

Sunday, November 16, 2008

Looking in brains

Posted by Danny Tarlow
I'll be the first to admit that I don't understand why people make the decisions that they do about money, but I'm not sure I'd ever turn to fMRI to help. These guys are, though. I like the high-level notion of mixing psychology and neuroscience with economics:

Saturday, November 15, 2008

Statistics really are everywhere

Posted by Danny Tarlow
This just seems like the right way to run a football team. Has anybody worked on fancy ways of using statistics for choosing football strategies? I'd be interested in hearing about it.

When all else fails

Posted by Danny Tarlow
I'm glad this isn't how ties are resolved in the case of a presidential election. Can you imagine John McCain and Barack Obama standing in the middle of a football stadium, waiting for the result of a coin toss to decide who becomes the next president?