Machine Learning for Human Memorization

Danny Tarlow, a machine learning researcher, has found a way to frame his competitive Scrabble problem in machine learning terms. [Here’s a link to the post][post], and here’s his rough description of the problem:

> As some of you know, I used to play Scrabble somewhat seriously. Most Tuesdays in middle school, I would go to the local Scrabble club meetings and play 4 games against the best Scrabble players in the area (actually, it was usually 3 games, because the 4th game started past my bedtime). It’s not your family game of Scrabble: to begin to be competitive, you need to know all of the two letter words, most of the threes, and you need to have some familiarity with a few of the other high-priority lists (e.g., vowel dumps; short q, z, j, and x words; at least a few of the bingo stems). See here for a good starting point.

> Anyway, I recently went to the Toronto Scrabble Club meeting and had a great time. I think I’ll start going with more regularity. As a busy machine learning researcher, though, I don’t have the time or the mental capacity to memorize long lists of words anymore: for example, there are 972 legal three letter words and 3902 legal four letter words.

> So I’m looking for an alternative to memorization. Typically during play, there will be a board position that could yield a high-scoring word, but it requires that XXX or XXXX be a word. It would be very helpful if I could spend a minute or so of pen and paper computation time, then arrive at an answer like, “this is a word with 90% probability”. So what I really need is just a binary classifier that maps a word to probability of label “legal”.

> Problem description: In machine learning terms, it’s a somewhat unique problem (from what I can tell). We’re not trying to build a classifier that generalizes well, because the set of 3 (or 4) letter words is fixed: we have all inputs, and they’re all labeled. At first glance, you might think this is an easy problem, because we can just choose a model with high model capacity, overfit the training data, and be done. There’s no need for regularization if we don’t care about overfitting, right? Well, not exactly. By this logic, we should just use a nearest neighbors classifier; but in order for me to run a nearest neighbors algorithm in my head, I’d need to memorize the entire training set!
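One direction the problem description suggests (a sketch, not Danny’s actual solution) is to fit a model whose parameters are small enough for a human to memorize — say, per-position letter frequencies over the legal list — and score candidate words with it at the table. The word list below is a made-up sample for illustration, not the official Scrabble dictionary:

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy sample of legal two-letter words -- NOT the official list.
LEGAL = {"aa", "ab", "ad", "ae", "ag", "ah", "ai", "al", "am", "an"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

# Count how often each letter appears in each position among legal words.
counts = [defaultdict(int), defaultdict(int)]
for w in LEGAL:
    for i, ch in enumerate(w):
        counts[i][ch] += 1

def legality_score(word):
    """Naive score: product of position-wise letter frequencies."""
    score = 1.0
    for i, ch in enumerate(word):
        score *= counts[i][ch] / len(LEGAL)
    return score

# To use this model, a human memorizes at most 2 * 26 frequencies
# instead of the entire training set -- the trade-off is that the
# score is only a (rough) probability of legality, not a lookup.
ranked = sorted(("".join(p) for p in product(ALPHABET, repeat=2)),
                key=legality_score, reverse=True)
```

This is exactly the tension in the quoted paragraph: a nearest-neighbors classifier would be perfectly accurate but costs the whole word list in memorization, while a low-capacity model like the one above is cheap to carry in your head but only gets the probability approximately right.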