Posted by Danny Tarlow

I'll add this to my to-read list, but some subtle aspects of the abstract wording bother me. For example, play-calling is often done by the head coach or offensive coordinator. The quarterback usually only has the option to make small changes to a given play (e.g., choose whether to run it right or left), or the ability to call an audible. I guess I need to read it to fully understand the play-calling scenario they're addressing.

S. D. Patek and D. P. Bertsekas, "Play Selection in American Football: a Case Study in Neuro-Dynamic Programming", Chapter 7 in Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search: Interfaces in Computer Science and Operations Research, David L. Woodruff, editor. Kluwer Academic Publishers, Boston, 1997.
Abstract: We present a computational case study of neuro-dynamic programming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration. The algorithms we consider evolve as a sequence of approximate policy evaluations and policy updates. An (exact) evaluation amounts to the computation of the reward-to-go function associated with the policy in question. Approximations of reward-to-go are obtained either as the solution or as a step toward the solution of a training problem involving simulated state/reward data pairs. Within this methodological framework there is a great deal of flexibility. In specifying a particular algorithm, one must select a parametric form for estimating the reward-to-go function as well as a training algorithm for tuning the approximation. One example we consider, among many others, is the use of a multilayer perceptron (i.e. neural network) which is trained by backpropagation. The objective of this paper is to illustrate the application of neuro-dynamic programming methods in solving a well-defined optimization problem. We will contrast and compare various algorithms mainly in terms of performance, although we will also consider complexity of implementation. Because our version of football leads to a medium-scale Markov decision problem, it is possible to compute the optimal solution numerically, providing a yardstick for meaningful comparison of the approximate methods.
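To make the abstract's loop of "approximate policy evaluation, then policy update" concrete, here is a minimal sketch of approximate policy iteration on a toy stochastic shortest path problem. This is not the authors' football model or implementation; the MDP (a simple chain with a conservative and an aggressive action, loosely echoing run/pass trade-offs), the costs, and the sample sizes are all invented for illustration. Reward-to-go is approximated from simulated state/reward trajectories, as the abstract describes, rather than computed exactly.

```python
import random

# Toy stochastic shortest path MDP: states 0..N-1, with N the terminal "goal" state.
# Every step costs 1, so the reward-to-go of a state is minus its expected steps to goal.
N = 10

def step(state, action, rng):
    """One simulated transition; returns (next_state, reward).
    action 0 = conservative: advance 1 with prob 0.9, else stay put.
    action 1 = aggressive: advance 3 with prob 0.4, else fall back to state 0."""
    if action == 0:
        nxt = state + 1 if rng.random() < 0.9 else state
    else:
        nxt = state + 3 if rng.random() < 0.4 else 0
    return min(nxt, N), -1.0

def evaluate(policy, n_rollouts=2000, seed=0):
    """Approximate policy evaluation: Monte Carlo estimate of the
    reward-to-go J(s) of `policy` from simulated state/reward pairs."""
    rng = random.Random(seed)
    J = [0.0] * (N + 1)          # J[N] = 0 at the terminal state
    for s0 in range(N):
        total = 0.0
        for _ in range(n_rollouts):
            s, ret = s0, 0.0
            while s < N:
                s, r = step(s, policy[s], rng)
                ret += r
            total += ret
        J[s0] = total / n_rollouts
    return J

def improve(policy, J, n_samples=2000, seed=1):
    """Policy update: greedy one-step lookahead using sampled Q-value estimates."""
    rng = random.Random(seed)
    new_policy = []
    for s in range(N):
        q = []
        for a in (0, 1):
            est = 0.0
            for _ in range(n_samples):
                nxt, r = step(s, a, rng)
                est += r + J[nxt]
            q.append(est / n_samples)
        new_policy.append(0 if q[0] >= q[1] else 1)
    return new_policy

# Approximate policy iteration: alternate simulated evaluation and greedy update.
policy = [0] * N
for _ in range(3):
    J = evaluate(policy)
    policy = improve(policy, J)
print("policy:", policy)
print("reward-to-go:", [round(v, 1) for v in J])
```

In the paper the tabular Monte Carlo estimate above is replaced by a parametric approximator (e.g. a multilayer perceptron trained by backpropagation on the same kind of simulated data), which is what makes the approach scale beyond problems where every state can be enumerated.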