Office Pool Statistics

I just finished watching an O’Reilly webcast on statistics for NFL office pools. I don’t care much about football, unless it’s the other kind of football, but I was interested to see which pieces of Python the presenter, [Tanya Schlusser][], was going to use: [pandas][] and [scikit-learn][]. Her presentation was pretty dense, but luckily she made the code, including a Jupyter notebook, available on [GitHub][]. *Thank you, Tanya!*

A couple of other things came up in the group chat that accompanied the presentation or in the presentation itself:

* [seaborn][] is a statistical data visualization library for Python.
* [statsmodels][] “provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.”
* You can store `scikit-learn` models in [pickles][].
* I shouldn’t forget about [OpenRefine][].
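
To remind myself how the pickle point works, here is a minimal sketch. It is not taken from Tanya’s notebook: the toy data and the choice of `LinearRegression` are my own assumptions, picked only because they make the round trip easy to check.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data, made up for illustration: y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Fit a small model so there is something to store
model = LinearRegression().fit(X, y)

# Serialize the fitted model to bytes;
# pickle.dump(model, file_object) would write it to a file instead
blob = pickle.dumps(model)

# Later (for example in another session): restore the model and predict
restored = pickle.loads(blob)
prediction = restored.predict(np.array([[4.0]]))
```

Two things to keep in mind: only unpickle data you trust, and load a pickled model with the same `scikit-learn` version that saved it.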

## Addendum ##

As regular readers of these notes know, installation of `scikit-learn` is as easy as:

    % sudo port install py34-scikit-learn

What I didn’t know is that the installation of `seaborn` in MacPorts includes `statsmodels`:

    % sudo port install py34-seaborn
    --->  Computing dependencies for py34-seaborn
    --->  Dependencies to be installed: py34-patsy py34-statsmodels

I didn’t know about `patsy`; here’s what its README on GitHub says:

> Patsy is a Python library for describing statistical models (especially linear models, or models that have a linear component) and building design matrices. Patsy brings the convenience of R “formulas” to Python.
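
To see what an R-style formula looks like in Python, here is a small sketch using `patsy.dmatrices`. The variable names (`y`, `x1`, `x2`) and the numbers are hypothetical, chosen only for illustration.

```python
import numpy as np
from patsy import dmatrices

# Hypothetical data: one response and two predictors
data = {
    "y": np.array([1.0, 2.0, 3.0, 4.0]),
    "x1": np.array([0.5, 1.5, 2.5, 3.5]),
    "x2": np.array([1.0, 0.0, 1.0, 0.0]),
}

# "y ~ x1 + x2" reads: model y as a linear function of x1 and x2.
# An intercept column is added automatically, as in R.
response, design = dmatrices("y ~ x1 + x2", data)

# Names of the design matrix columns, in order
columns = design.design_info.column_names
```

The design matrix gets one row per observation and one column each for the intercept, `x1`, and `x2`.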

[Tanya Schlusser]:
