Getting NLTK Up and Running on Mac OS X

*Please note that there is a more direct version of these instructions that walks you through setting up everything around a Python 2.7 installation: read it [here](http://johnlaudun.org/20131229-the-complete-python-for-text-analysis/).*

So somehow, somewhere, you got interested in natural language processing, and the Natural Language Toolkit available for Python strikes you as one reasonable place to start. First of all, congratulations on wanting to go the building-block route as opposed to the already assembled route. The first few steps are a bit more complex, but I think you will be gratified pretty quickly with how much you can do and how quickly you are doing it. More importantly, you are doing it for yourself.

Here are the steps that we are going to take to get NLTK up and running on Mac OS X:

1. Install Xcode (and then the Xcode development tools).
2. Install [MacPorts][].
3. Use MacPorts to install all the Python libraries you need, including NLTK.
4. There is no step four.

### First, Xcode

Before you do anything else, you will need to open the [App Store][] and download Xcode, Apple’s developer suite for creating OS X applications.

I’m not entirely clear on what Xcode contains, or installs, that package managers like MacPorts need, but MacPorts requires it, so go do it. By the way, I think the new way this works is that the App Store installs an application in your Applications folder that you then have to click on to install Xcode. (Again, I don’t know why that is.)

As I note in an updated version of this [list of instructions][list], there is now supposed to be a shortcut from the command line to install what you need, but I have had mixed results, and others have reported the same mixed results to me. Until this gets cleared up, my best recommendation is to install Xcode the usual way and then proceed as MacPorts directs you to.

### Install MacPorts

Download the [Mac OS X Package `pkg` Installer](http://www.macports.org/install.php) and step through the GUI install.

MacPorts should, as part of the install process, run `sudo port selfupdate -v` but you can always run it again. You know, just to make yourself feel better.

### Use MacPorts to Install Python and the Libraries You Need for NLTK

Now you’ll need to install a version of Python. In my case, I am building a setup around Python 2.7, and so I entered `sudo port -v install python27`. The `-v` option gives you a verbose description of what’s happening. Be prepared to watch a lot of stuff scroll by. (If you’d rather not see all that and would rather have the machine quietly do its thing, you can leave the `-v` off. Good for you for having quiet confidence in your Mac.)

How did I know to type in `python27` and not just `python`? Good question. MacPorts gives you some nice functionality with its `search` feature, which you can use to find MacPorts portfiles. If you type in:

    port search python

A whole lot of stuff is going to fly by, but you can scroll to the middle of the list to see all the versions of Python that are available for you to install. As of this writing, you can install everything from Python 2.4 to Python 3.1.

By the way, perhaps you are savvy enough to know that your Mac already comes with Python (and Perl and Ruby and PHP and goodness knows what else) installed. Yes, it does, but the consensus seems to be that you should leave those system installations well enough alone. If you screw up Python, you want it to be an instance the system doesn’t need. Don’t worry. I’m terrible at this coding business and I’ve yet to screw up Python. Python is very good at telling you that you’ve screwed up and, thank you very much, it won’t follow your orders down the path to electronic perdition.

We are going to install the most recent version of Python 2, which is Python 2.7. (As of this writing, much of the Python world has not yet moved to Python 3; it’s complicated beyond my ability to explain.) It’s just as well, because, as we will see in a minute, NLTK support only runs up to Python 2.7.

    sudo port install python27

Once that’s done (and I should remind you that, depending upon your connection speed and the size of any particular portfile, this could take a while), you will want to make sure your computer turns to your nice custom install of Python and not the one that came with the system. I usually accomplish this by editing my `.bash_profile`, but this did not work for me. Luckily, MacPorts has the solution:

    sudo port install python_select

Once you’ve done this, enter `sudo port select --set python python27` and you’re done with your base installation of Python. Now it’s time to process some natural languages, or process languages naturally, or … you know, whatever it is we are going to do with the NLTK.
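
To confirm which interpreter you actually land in after `port select`, a quick check from inside Python helps. This is a minimal sketch, assuming MacPorts was installed with its default `/opt/local` prefix (pass a different prefix if you changed it); the helper name `is_macports_python` is hypothetical, not part of MacPorts or Python.

```python
import sys

# Assumption: MacPorts lives under its default prefix.
MACPORTS_PREFIX = "/opt/local"


def is_macports_python(executable=None, prefix=MACPORTS_PREFIX):
    """Return True if the interpreter path (default: the running one) is under prefix."""
    exe = sys.executable if executable is None else executable
    # Compare against "prefix/" so a path like "/opt/localfoo" does not count.
    return exe.startswith(prefix.rstrip("/") + "/")


if __name__ == "__main__":
    print("Interpreter: %s" % sys.executable)
    print("Version: %s" % sys.version.split()[0])
    print("MacPorts Python? %s" % is_macports_python())
```

Run it with plain `python`; if the last line says `False`, the system Python is still first in line.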

### Install NLTK, But First Some Dependencies

In all honesty, here is where the magic really happens. I know that sounds weird, especially when we are talking about something that happens at the command line, but MacPorts makes the rest of this so easy that you might just wonder, I tell you, why everyone prefers clanging around with GUI installer packages that you have to go find, download, open, and click on.

Anytime you want to install anything using MacPorts, the best place to start is to see if a *portfile* is available. (If there isn’t one, MacPorts can’t install it.) Using the new-found power of the `port` command, this is quite simple:

    port search nltk

Right? We want to see if the NLTK is available as a `portfile` and we want to see what, if any, versions are available. Here is what the search turns up:

    py-nltk @2.0.1rc1 (python, textproc)
    Natural Language Toolkit

    py24-nltk @2.0.1rc1 (python, textproc)
    Natural Language Toolkit

    py25-nltk @2.0.1rc1 (python, textproc)
    Natural Language Toolkit

    py26-nltk @2.0.1rc1 (python, textproc)
    Natural Language Toolkit

    py27-nltk @2.0.1rc1 (python, textproc)
    Natural Language Toolkit

    Found 5 ports.
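
If you would rather have the machine pick out the port that matches your Python, a few lines of Python can parse search output like the listing here. This is a sketch, assuming the `name @version (categories)` layout shown in this post; newer MacPorts releases may format search results differently, and the function names are hypothetical.

```python
import re

# Matches lines like "py27-nltk @2.0.1rc1 (python, textproc)".
PORT_LINE = re.compile(r"^(?P<name>\S+) @(?P<version>\S+) \((?P<categories>[^)]*)\)")


def parse_port_search(output):
    """Return (name, version) pairs from `port search` output, skipping description lines."""
    ports = []
    for line in output.splitlines():
        match = PORT_LINE.match(line.strip())
        if match:
            ports.append((match.group("name"), match.group("version")))
    return ports


def ports_for_python(ports, pyver="27"):
    """Keep only ports built for one Python version, e.g. "27" keeps names starting "py27-"."""
    prefix = "py%s-" % pyver
    return [(name, version) for name, version in ports if name.startswith(prefix)]


SAMPLE = """py-nltk @2.0.1rc1 (python, textproc)
Natural Language Toolkit

py27-nltk @2.0.1rc1 (python, textproc)
Natural Language Toolkit
"""

if __name__ == "__main__":
    print(ports_for_python(parse_port_search(SAMPLE)))
```

To use it on real output, save the results of `port search nltk` to a file and pass its contents to `parse_port_search`.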

Great! There’s a version of the NLTK available that matches the version of Python we just installed. But before you install the NLTK, you will want to know what other Python modules it requires. (Okay, sidebar here, this is really something MacPorts does for you, but I can’t help being a bit of a control freak and wanting to take care of this myself.) Again, MacPorts has your back. Enter:

    port deps py27-nltk

And MacPorts reports back:

    Full Name: py27-nltk @2.0.1rc1_0
    Library Dependencies: py27-numpy, py27-yaml, py27-matplotlib

Now all you have to do is run `sudo port install` with `py27-nltk` and each of those dependencies. While you’re at it, you might as well add `py27-scipy` to your list. Don’t ask; just do it.
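
Once the ports are installed, you can check from Python that each dependency can actually be found. This is a minimal sketch, assuming Python 3.4 or later (`importlib.util.find_spec` does not exist in 2.7) and assuming that a port name minus its `py27-` prefix is the module name, which holds for these four but not for every port. The function names are hypothetical.

```python
import importlib.util
import re


def parse_deps(output):
    """Pull port names out of the "Library Dependencies:" line of `port deps` output."""
    for line in output.splitlines():
        if line.startswith("Library Dependencies:"):
            names = line.split(":", 1)[1]
            return [name.strip() for name in names.split(",") if name.strip()]
    return []


def module_name(port):
    """Assumption: "py27-numpy" -> "numpy" (strip the pyNN- prefix)."""
    return re.sub(r"^py\d*-", "", port)


def missing_modules(modules):
    """Return the top-level modules Python cannot find, without importing them."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


DEPS_OUTPUT = """Full Name: py27-nltk @2.0.1rc1_0
Library Dependencies: py27-numpy, py27-yaml, py27-matplotlib"""

if __name__ == "__main__":
    wanted = [module_name(p) for p in parse_deps(DEPS_OUTPUT)] + ["scipy", "nltk"]
    print("Missing:", missing_modules(wanted) or "nothing")
```

Checking with `find_spec` rather than `import` keeps the check quick and avoids running each package's start-up code.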

### Conclusion

That’s it. You’re done. Fire up IDLE, `import nltk`, and you can do an amazing assortment of things.
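
For a first thing to try after `import nltk`, counting words is a gentle start. This is a minimal sketch, assuming NLTK 3, where `FreqDist` is a subclass of `collections.Counter` (the 2.0.1rc1 release in the port listing predates that). If NLTK will not import, it falls back to a plain `Counter` so the sketch still runs; `top_words` is a hypothetical helper name.

```python
from collections import Counter

try:
    # In NLTK 3, FreqDist subclasses collections.Counter.
    from nltk import FreqDist
    NLTK_AVAILABLE = True
except ImportError:
    # Fallback so this sketch still runs before NLTK is installed.
    FreqDist = Counter
    NLTK_AVAILABLE = False


def top_words(text, n=5):
    """Return the n most common lowercased, whitespace-separated words with their counts."""
    words = [word.lower() for word in text.split()]
    return FreqDist(words).most_common(n)


if __name__ == "__main__":
    print("Using NLTK: %s" % NLTK_AVAILABLE)
    print(top_words("The cat sat on the mat with the hat", 2))
```

Splitting on whitespace keeps the example free of NLTK's downloadable data; NLTK's own tokenizers do a better job once you have that data installed.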

If you like, you can show your love for the makers of the NLTK and buy their book. It’s available both at [O’Reilly][] in a variety of formats and at [Amazon][]. Personally, I find the [NLTK Cookbook][] a little less helpful, but this could be purely a matter of a cognitive style mismatch. It happens. It’s not the author’s fault. It’s me. Really.

### Updates

The first version of this post detailed how to “get up and running” using the Homebrew package manager, but after running into a number of difficulties, I discovered that something about the Homebrew setup just doesn’t work. I updated this post to point to the now-preferred use of MacPorts, but this post continues to be the most popular. Since I believe it’s turning up in search engines because of the obviousness of the post title, and because I don’t want to send people down the path of pain, I have removed the Homebrew directions and replaced them with a much more detailed version of the MacPorts installation path. If you would like to try Homebrew, please drop me a note (my contact info is on the [About][] page) and I’ll be glad to send you directions.

[list]: http://johnlaudun.org/20131229-the-complete-python-for-text-analysis/
[MacPorts]: http://macports.org/
[App Store]: https://itunes.apple.com/us/app/xcode/id497799835
[O’Reilly]: http://shop.oreilly.com/product/9780596516499.do
[Amazon]: http://www.amazon.com/gp/product/0596516495/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=0596516495&linkCode=as2&tag=johnlaudun-20
[NLTK Cookbook]: http://www.amazon.com/gp/product/1849513600/ref=as_li_ss_tl?ie=UTF8&camp=1789&creative=390957&creativeASIN=1849513600&linkCode=as2&tag=johnlaudun-20
[About]: http://johnlaudun.org/about/

Change Order of PATH Entries in Mac OS X

Homebrew prefers that `/usr/local/bin` be seen first so that `brew`-installed programs get used before native programs. You can change this in `.bash_profile`. The easiest way to edit `.bash_profile` is to open it in a command-line editor; the dot before the file name means that it is normally hidden from view. You can “see” it if you enter `ls -a` at the command line, or if you do some freaky things with the Finder. (I figure that when I’m in the Finder, I’m in one mode and shouldn’t play with things like hidden files. When I’m at the command line, then I’m in maker mode and nothing is invisible!) I use `vi` (because `emacs` scares me):

    % vi .bash_profile

Depending upon anything you’ve done before or what’s been done before you, this file may or may not exist or it may or may not have anything in it. If it doesn’t exist, congratulations, you are in the process of creating it. If it does exist, you are now editing it.

My profile looks like this:

    # Bash Profile

    # PROMPT
    PS1="[\W]% "

    # Homebrew
    export PATH=/usr/local/bin:$PATH
    export PATH="$PATH:/usr/local/Cellar"
    export PATH=/usr/local/share/python:$PATH

    # Blogpost
    export PATH="$PATH:/Users/john/local/blogpost"
    export PATH="$PATH:/Users/john/local/asciidoc"

    # Emacs
    alias emacs='/usr/local/Cellar/emacs/23.3b/bin/emacs'

    # ALIASES
    alias Learn='cd Dropbox/personal/programming/learn'

Being a humanist type, I have to name my file, and that’s what you see on the first line. That’s followed by my preference for how I want my prompt to look. Next up are my adjustments to my PATH in order to take advantage of my brewed installations. Everything else should be equally self-explanatory.

You need to log out of your terminal session and then back in (or run `source ~/.bash_profile`) in order to enjoy the fruits of your PATH labors.
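
To see why the order of those `export` lines matters, here is a small Python model of what they do to PATH: prepending puts a directory in front of the search, appending puts it at the back. It is a sketch; the starting value `/usr/bin:/bin` is hypothetical, and `prepend`, `append`, and `precedes` are illustrative helpers, not anything the shell provides.

```python
def prepend(path, entry):
    """Model `export PATH=entry:$PATH`: entry is searched first."""
    return entry + ":" + path


def append(path, entry):
    """Model `export PATH="$PATH:entry"`: entry is searched last."""
    return path + ":" + entry


def precedes(path, first, second):
    """True if directory `first` appears before `second` in a colon-separated PATH."""
    dirs = path.split(":")
    if first not in dirs:
        return False
    if second not in dirs:
        return True
    return dirs.index(first) < dirs.index(second)


if __name__ == "__main__":
    path = "/usr/bin:/bin"  # hypothetical starting PATH
    # Replay the three Homebrew lines from the profile, in order.
    path = prepend(path, "/usr/local/bin")
    path = append(path, "/usr/local/Cellar")
    path = prepend(path, "/usr/local/share/python")
    print(path)
    print(precedes(path, "/usr/local/bin", "/usr/bin"))
```

To check your real setup, pass `os.environ["PATH"]` to `precedes` from inside a fresh terminal session.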

I did see a note from someone about how changing `/etc/paths` might be better, but judging from some comments on StackOverflow, that might be a bad idea. A very bad idea.

Make Your Own Maps

For my own research, I make my own maps, usually creating them in Adobe Illustrator, where I create hand-drawn lines on top of commercially available maps that I have downloaded (e.g., Google Maps), created through software applications (e.g., Topos), or scanned from paper. This is the only way I have known to create a map that displayed the information I felt my reader needed and not too much more.

Turns out, there is another way to do this, perhaps, and it is to use [OpenStreetMap][osm]. I am not entirely sure what all is involved, but I look forward to checking it out as soon as the revision of the aesthetics paper is done.

[osm]: http://switch2osm.org/

Zotero Takes Over

My colleagues Clai Rice and Jonathon Goodwin swear by Zotero. I hated it as a Firefox extension, but it is now available as a standalone app. I tried it at 1.0 and it was not ready yet, but at RC 3, it is beginning to get interesting. It even began to sync some data I had synced a very long time ago, which was good, but then it took over my CPU cycles, and that was not so good:

*[Screenshot, January 28, 2012, 8:50 AM]*

Deathmarch on Mars

I don’t know Warren Ellis’ work, but I like his verve. This _Motherboard_ interview is quite good. His assessment of the political pandering to the current space industry strikes me as the product of a long-time observer who deserves a listen.

Mathematicians are getting pissed about the journal publishing regime.

[Jason Jackson](http://jasonbairdjackson.com/) is far more expert here, but I keep track of these issues as best I can. The switch to open access is being led by the sciences (it is strange to see mathematics so encumbered by the old paradigm), and I hope the humanities can follow soon. It’s hard work, but not only does such an effort align with the overall ethical schemes of many of our disciplines, it is also worth it for making our work more accessible, not only to each other but also to a larger pool of potentially interested readers.

That said, the next step is a coordinated infrastructure that allows for easy access across the entire landscape. That’s the promise of [Project Bamboo](http://projectbamboo.org/).

Home Again

My mother used to sing out “Home again, home again” sometimes when we arrived home after being away for what felt like “a while.” It could have been a week-long vacation, an overnight trip to my grandmother’s, or simply an extended series of errands. She sang it to the tune of “To Market,” and the phrase itself now signifies something “homey” for me.

If I didn’t say it myself this morning when we got home, then it rang in my head as I wobbled into the house with bags of stuff, bunches of balloons, and a daughter with one arm fixed in a splinted ell. We were home, and it felt good.

Our stay in the hospital was incredibly pleasant by any standard I can imagine. The nursing and support staff at Women’s and Children’s have an amazing ability to appear when you need them. Our night nurses never disturbed us: they whisked in, turned on a bathroom light and cracked a door to be able to see, did what they needed, and then departed again. The daytime nurses regularly checked on us, checking in with our daughter to make sure she was doing okay. The child specialist who was also reading the Warriors series, and who revealed to us that Erin Hunter is not a person but a collective, was so personable that Lily missed her when she was gone. The young woman who brought us our dishes was the very face of conscientiousness.

And this says nothing about our daughter’s surgeon and his physician’s assistant, who were her primary caregivers in this moment. They both took their time to speak with us and answer any and all questions. (And, too, I have to thank the doctor for catching me when I almost fainted after he told us the good news about our daughter’s operation.)

On top of all that, we also had the most amazing experience of friends turning out to show their support. Two of my colleagues showed up, bearing gifts for our daughter, as did several of her classmates: one brought his iPod so that she could listen to music; another brought her Barbies so that they could play together (Barbies spilled across the floor of the hospital room!); and yet another brought nail polish so that they could do each other’s nails. Our daughter’s teacher also came by with a poster-board-sized get well card. And it didn’t stop there: three teammates from gymnastics showed up, as well as two coaches and the owner of the gym. The balloons and posters and cookies and pet pillows really began to pile up in the room.

What we would like to say to all of you is *thank you, thank you, thank you*. You simply cannot know how much you brightened not only Lily’s day but ours as well. At times, the visits came back to back to back and they were exhausting, but it was the kind of exhaustion one lives for, the exhaustion of being surrounded by such amazingly good people, by such thoughtful, kind, generous people that the world itself is almost too much and we realize that we really do enjoy an embarrassment of riches.

Thank you.

Towards My Own Markup

*Please note: this entry was written while sitting in my daughter’s hospital room. It was written for want of something to do on this the second day of her hospitalization as the timelessness and placelessness of hospitals continued to fray the edges of my consciousness. It was written in an attempt to begin to smooth some of the fraying.*

For general purposes, Markdown, as well as the other “plain text markup languages,” serves very well. I do not, however, find Markdown very conducive to writing for myself or to writing to think. For one, I generally prefer an indented line at the beginning of a paragraph, with no blank line above or below. It’s especially useful when you are writing through a series of short paragraphs or bits of dialogue, where Markdown could very well leave half your screen filled with white space.

I also prefer the `Creole` language’s use of equal signs for headers to Markdown’s hash signs; Creole reserves the hash sign for numbered lists. Using the hash sign for list items also solves the problem of numbers getting out of order as you write a list. Markdown of course fixes this as it converts to HTML, but the plain-text original can still be confusing.

Now, one route to my own markup language would be to fork a version of Markdown in whatever programming language I would prefer to work in: there are versions of Markdown in Perl, PHP, Python, and Ruby (and I am sure there are more versions in other programming languages). My problem is that I have a pretty extensive back catalog of entries in my WordPress database, 1034 posts as of today, and most of them are in Markdown. I also have over 200 notes in MacJournal, most of which are by default marked up similarly.

It would be easy, I think, to write a script, using something like `awk`, to go through those posts and replace `\n\n` with `\n\t`. The same would be true for numbered lists (replace `^\d+\.` with `#.`), and most uses of the hash sign for headings could be handled similarly.
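
Here is roughly what that conversion could look like, using Python’s `re` module in place of `awk`. It is a sketch under assumed rules for the new markup: a paragraph starts with a newline plus a tab, numbered items start with `#.`, and headings use Creole-style equal signs. The function name is hypothetical, and a real back catalog (code blocks, nested lists) would need more care.

```python
import re


def markdown_to_mine(text):
    """Convert a simple Markdown note into tab-indented, Creole-flavored markup."""
    # Headings first: "## Title" -> "== Title" (same number of markers).
    text = re.sub(r"^(#{1,6})[ \t]+(.+)$",
                  lambda m: "=" * len(m.group(1)) + " " + m.group(2),
                  text, flags=re.MULTILINE)
    # Numbered items: "1. one" -> "#. one", so numbers can never drift out of order.
    text = re.sub(r"^\d+\.[ \t]", "#. ", text, flags=re.MULTILINE)
    # Paragraph breaks: a blank line becomes a newline plus tab, except before
    # headings and list items, which keep their blank line.
    text = re.sub(r"\n\n(?![#=])", "\n\t", text)
    return text


if __name__ == "__main__":
    note = "# Notes\n\nFirst para.\n\nSecond para.\n\n1. one\n2. two"
    print(markdown_to_mine(note))
```

Headings are converted before list items so that the new `#.` markers are never mistaken for Markdown headings.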

But instead of working with over one thousand bits of text, and with no real interest in double-checking if everything came out correctly, the better solution might be to proceed with my new markup language and then simply write a quick script to change it to Markdown *when I decide to make a text public*.
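
Going the other way at publication time is the easier direction. Again a sketch, assuming the same hypothetical markup rules (tab-indented paragraphs, `#.` list items, equal-sign headings); because Markdown renumbers an ordered list when it converts to HTML, every item can simply be written as `1.`. The function name is hypothetical.

```python
import re


def mine_to_markdown(text):
    """Convert tab-indented, Creole-flavored notes back into Markdown for publishing."""
    # A newline plus tab marks a new paragraph; Markdown wants a blank line instead.
    text = text.replace("\n\t", "\n\n")
    # A tab at the very start of the file would otherwise read as a Markdown code block.
    if text.startswith("\t"):
        text = text[1:]
    # "#. item" -> "1. item"; Markdown numbers ordered lists itself.
    text = re.sub(r"^#\.[ \t]", "1. ", text, flags=re.MULTILINE)
    # "== Title" or "== Title ==" -> "## Title".
    text = re.sub(r"^(={1,6})[ \t]+(.+?)[ \t]*=*[ \t]*$",
                  lambda m: "#" * len(m.group(1)) + " " + m.group(2),
                  text, flags=re.MULTILINE)
    return text


if __name__ == "__main__":
    note = "= Notes\n\tFirst para.\n\tSecond para.\n\n#. one\n#. two"
    print(mine_to_markdown(note))
```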

Mind, only some of this is brought about by my current return to command-line geekery. It’s also the case that my favorite note-taking application, MacJournal, cannot sync all of my devices easily. Two (or more) computers via Dropbox? No problem. iPhone and iPad … well, you can sync, but only through the abomination of getting both machines on the same network, setting them up to sync, etc. This is silly. I already have my MacJournal data sitting in a Dropbox account. My iPhone can connect to my Dropbox account. Sync to that.

MacJournal can’t do that. The cool new journaling application Day One *can* sync many devices through Dropbox, but it currently cannot hold images and it does not feature tags. (I suppose one could make tags work the same way I make them work in my textCMS, as hash tags, e.g. `#tag`, but that only offers me searchability, not my preferred way of working with tags, via browsing.) And Day One stays away from plain text files for storage, preferring a variant of the Mac OS `plist` for formatting entries in the file system. And, too, I have to abide by its preferred markup language, which is Markdown, and not one of my own choosing. But its UI is quite nice.