NFL Play By Play Analysis

Advanced NFL Stats just released play-by-play data from the 2002 season on.

I did some quick analysis of the data using Hive and MapReduce and decided to look at incomplete passes. The code is here on my GitHub account.



Most incomplete passes from a QB to a receiver.


Most incomplete passes from a QB to a receiver averaged out over the number of seasons they played together and ordered by the highest average.
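To make the two results above concrete, here is a minimal plain-Java sketch of the same idea: pull the QB and receiver out of each play description, count incomplete passes per pair and season, then average over the seasons the pair appears in. It is not the Hive/MapReduce code from the GitHub project. The description format (for example "(12:04) P.Manning pass incomplete short right to M.Harrison."), the class name IncompletePasses, and the method names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IncompletePasses {
    // Assumed player-name shape: initial(s), a dot, then a surname, e.g. "P.Manning".
    private static final String NAME = "[A-Z][a-z]*\\.[A-Za-z'\\-]+";

    // Assumed play shape: "<QB> pass incomplete [direction words] to <receiver>".
    private static final Pattern INCOMPLETE = Pattern.compile(
            "(" + NAME + ") pass incomplete(?: [a-z]+)* to (" + NAME + ")");

    /** Returns "QB -> receiver" for an incomplete pass, or null for any other play. */
    static String pairOf(String description) {
        Matcher m = INCOMPLETE.matcher(description);
        return m.find() ? m.group(1) + " -> " + m.group(2) : null;
    }

    /** Each play is {season, description}. Returns pair -> (season -> incomplete count). */
    static Map<String, Map<Integer, Integer>> countBySeason(List<String[]> plays) {
        Map<String, Map<Integer, Integer>> counts = new TreeMap<>();
        for (String[] play : plays) {
            String pair = pairOf(play[1]);
            if (pair == null) {
                continue; // not an incomplete pass to a named receiver
            }
            int season = Integer.parseInt(play[0]);
            counts.computeIfAbsent(pair, k -> new TreeMap<>())
                  .merge(season, 1, Integer::sum);
        }
        return counts;
    }

    /** Total incomplete passes divided by the number of seasons the pair appears in. */
    static double averagePerSeason(Map<Integer, Integer> bySeason) {
        int total = 0;
        for (int count : bySeason.values()) {
            total += count;
        }
        return bySeason.isEmpty() ? 0.0 : (double) total / bySeason.size();
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the call sequence.
        List<String[]> plays = new ArrayList<>();
        plays.add(new String[] {"2002", "(12:04) P.Manning pass incomplete short right to M.Harrison."});
        plays.add(new String[] {"2002", "(9:30) P.Manning pass incomplete to M.Harrison."});
        plays.add(new String[] {"2003", "(3:15) P.Manning pass incomplete deep left to M.Harrison."});

        countBySeason(plays).forEach((pair, bySeason) ->
                System.out.println(pair + ": " + averagePerSeason(bySeason) + " per season"));
    }
}
```

Note that "seasons played together" is approximated here by seasons with at least one incomplete pass between the pair, which may differ from how the original queries define it.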


 Update: Added in 2010 data.


  1. I think it’d be fun to try to query this data in Impala, while a game is on, to help you quickly predict what’s going to happen next.

  2. You’d have to preprocess the data more to make it easier to query. All of this data is parsed from the description. You’d have to take the description and break it down by type of plays and who participated.

  3. Would it be possible for you to add the dependencies file to the upload? I’m trying to run the sample and I’m having a ton of difficulty resolving all the references. Sorry, new to Java.

  4. @eric – You’re going to have to install CDH (Hadoop) here. Probably the easiest way to do this is by using a virtual machine with CDH already installed. Go to the download page and choose a virtual machine download option. Install Eclipse on the VM and the project’s dependencies should resolve.


  1. Augmenting Unstructured Data - Programming - O'Reilly Media - [...] produce? You’ll have to come to my OSCON 2013 talk to find out. For now, you can check out …
