Stay Clear

I’m giving a talk on Wednesday, December 21, to the Rotary Club of Eunice (Louisiana). I wanted to end with something a bit humorous that reflected the nature of the talk, which is to take things with which they are familiar, agricultural matters, and to frame them within a folkloristic perspective. I found this image, which originally warned to “stay clear while machine is running”, in my collection of pictograms. A little bit of editing in Pixelmator, which would not let me skew the new text, even the little bit I wanted to, and now I have an image that warns them to “stay clear while the professor is thinking”:

Warning: Stay clear while professor is thinking.

Open Notebook

Despite having had the experience of having some of my work copied from publications of initial findings, I remain committed to the idea that science and scholarship should be affairs conducted, insofar as it is practicable and ethical to do so, in the open, available to others for inspection and consideration. To some extent, this website cum blog, which I originally called a logbook, was/is an effort to do some of that. But blogs are now best understood as more oriented toward chronological arrangements, which runs counter to what is often the slow accumulation of notes and ideas that are fundamental to science and scholarship.

While software like WordPress includes options for more static pages, it is not necessarily easy to set up, nor is it necessarily easily integrated into other workflows, such as using an always-available application that can capture notes and scribblings wherever you happen to be reading or writing. (I suspect Automattic’s purchase of SimpleNote is one step to solving this problem, and, like many others, I will be watching to see how things develop. For now, publishing from within SimpleNote simply creates a URL which one can send to others, like Dropbox or Google Drive.)

I have been following Caleb McDaniel’s experiments in open notebook scholarship for a while now, waiting for a moment where I might be in a position to follow suit. McDaniel’s system is based on Gitit, a Pandoc-based wiki that, obviously, uses Git for publishing results to a web server — it’s not quite clear to me, at this stage of my reading, how much of the Gitit infrastructure is copied to the web.

This past semester, I tried out a Python-based system, MkDocs, which I chose principally for the simplicity of its folder structure and the fact that it seemed least “fussy.” (Obviously such an evaluation comes with a rather large block of salt.) The advantage of Gitit, as I understand it, is that it uses Pandoc to create HTML, but Pandoc can be used within any setup — so long as you have it installed — as can Git. McDaniel’s system is, like so many others, including my own, founded on markdown.

The really exciting part of his note system, I think, is that instead of building one giant BibTeX file, each BibTeX entry is at the top of the document that also contains his reading notes and quotations. I have been using Papers for reference management, and my experience of it has been mixed: the automatic determination of the likely citation information is quite good, but its ability to keep track of associated files, sometimes hard-won PDFs, has been lackluster. The experience of taking notes on electronic documents is okay, and, obviously, the auto-generation of quotations from highlighted material is exceptional, but also dependent on the quality of the PDF. (Why, oh why, do so many humanities journals and materials generate such poor PDFs? Is this a workflow issue? Would a LaTeX-based workflow generate better PDFs? What’s going on here?)

It’s this generation of BibTeX on-the-fly that I find truly exciting, and McDaniel has released the Pandoc filter which does that work. As I move forward with my own system, probably based on MkDocs, I’ll be looking to see if I can either re-use his filter or re-create it within Python. I am not alone in exploring this territory: Johannes Grassler has released [mkdocs-pandoc][], a Python module “contain[ing] a set of filters for converting mkdocs style markdown documentation into a single pandoc-flavored markdown document.”
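
To make the idea concrete for myself, here is a minimal sketch of what a Python version might look like. It is not McDaniel’s filter, and it does not use Pandoc at all. It simply assumes that each note is a markdown file whose first non-blank content is a single BibTeX entry, and it pulls those entries out so they can be joined into one bibliography. The function names and the sample entry are hypothetical, made up for illustration.

```python
def extract_leading_bibtex(note_text):
    """Split a note into (bibtex_entry, remaining_notes).

    Assumes the entry, if present, is the first non-blank content of the
    note and that its braces are balanced. Returns (None, note_text) when
    no such entry is found.
    """
    stripped = note_text.lstrip()
    if not stripped.startswith("@"):
        return None, note_text
    start = stripped.find("{")
    if start == -1:
        return None, note_text
    depth = 0
    for index in range(start, len(stripped)):
        char = stripped[index]
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                # The brace that closes the entry has been reached.
                entry = stripped[: index + 1]
                return entry, stripped[index + 1:].lstrip()
    # Unbalanced braces: treat the note as having no usable entry.
    return None, note_text


def build_bibliography(notes):
    """Join the leading entries of many notes into one BibTeX string."""
    entries = []
    for text in notes:
        entry, _ = extract_leading_bibtex(text)
        if entry is not None:
            entries.append(entry)
    return "\n\n".join(entries)


# A made-up note, used only to show the assumed layout.
example_note = """@book{doe2016,
  author = {Doe, Jane},
  title = {An {Example} Title},
  year = {2016}
}

## Reading notes

A quotation or comment would go here.
"""

entry, body = extract_leading_bibtex(example_note)
print(entry)
```

In practice the notes would be read from the files in a notes folder and the joined string written to a `.bib` file. A real BibTeX parser would be more robust than counting braces, but the brace counter keeps the sketch free of dependencies.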

Why do this? Open science/scholarship is hard, and, as I have experienced, possibly opens you up to some unpleasant experiences. First, there is the principle of openness itself, which is something to which one aspires, knowing that it’s probably never fully achievable because, first, an individual researcher from a public university simply doesn’t have the resources to make it happen, and, second, because human. Second, there is my own reticence to commit everything into a file system that remains outside my control, and, especially, open to corruption (and here I simply mean the dangers of having a system break ungracefully). In the early aughts I had a Windows 2000 machine crash: when the power supply fried, it took out part of the hard disk. The part that remained could not be logged into, leaving several years of work, which, yes, should have been backed up, inaccessible. Since then, I have been more active in terms of having multiple copies and having those copies in a format that degrades with some grace. Plain text can’t be beat in this regard.

For the record, I wrote The Amazing Crawfish Boat in Scrivener, and I will probably do the same with the next book, especially now that Scrivener has an iOS companion. But Scrivener is a place where I focus on writing, not necessarily on compilation of notes and ideas; indeed, when I have used it as such, the writing suffers. (To be clear, the app itself can handle pretty much anything you throw at it. That dump-everything-and-sort-it-out-in-the-writing approach simply doesn’t work for me.)

This business of compiling notes has remained one of those things I wish I could sort out. At present, I have a few Devonthink databases as well as a number of Evernote notebooks, and that’s not counting the collection of plain text notes I have sitting in at least one directory in my Dropbox account, and some sitting in a Ulysses notebook on my iCloud account. I don’t think all these notes necessarily need to come together in one place, but reducing the number of places isn’t a bad idea, and, just as importantly, finding a place for scientific and scholarly notes that facilitates productivity is one piece of the puzzle that I think can be solved sooner rather than later.

Winter Break 2016-2017 just got a little more interesting.


I can’t help it. I’m just mesmerized by this: a woman falling onto a catch-net filled with leaves. Thanks to inertia the leaves stay in place for a moment, leaving them suspended in mid-air as the net drops down with the woman’s weight in it.

One Meaning of “Statistical Analysis”

One of the things that interests me is all the ways that “statistical analysis” can be defined, even within the confines of a relatively nascent domain like text analytics. Of course, being nascent also means that things are not yet defined. Moreover, as a domain, text analytics is emerging at the intersection of a number of fields. Some of the differences in assumptions about which dimensions of statistics, let alone mathematics, were applicable were quite striking at this year’s Culture Analytics program at UCLA’s Institute for Pure and Applied Mathematics.

Below is a recent request posted on The Humanist that I am capturing here as another entry in this area:

The work will involve investigating the temporal relationships between
spoken and gesture events, so experience with methods for conducting
statistical analysis (correlation, t-test, anova, hypothesis testing) are expected.

In addition, the preferred workflow is as follows:

Ideally, the work will be done in Python (ideally using pandas), but if people prefer using R, I’d be happy to hear from them.

Listing Python Modules

Sometimes you need to know which Python modules you have already installed. The easiest way to get a list is:

[code lang=python]
help("modules")
[/code]

This will give you a list of installed modules typically as a series of columns. All you have are names, not version numbers. If you need to know version numbers, then try:

[code lang=python]
import matplotlib
print(matplotlib.__version__)  # version string of the imported module
[/code]
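
Checking one module at a time, as with matplotlib, gets tedious. As a rough sketch, and assuming Python 3.8 or later, the standard library’s `importlib.metadata` can report every installed distribution along with its version. Note that this lists installed packages (what `pip` installed), which is not always the same as the names you import. The helper name `installed_versions` is my own.

```python
from importlib.metadata import distributions


def installed_versions():
    """Return a dict mapping installed distribution names to versions."""
    versions = {}
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries with broken or missing metadata
            versions[name] = dist.version
    return versions


# Print one "name==version" line per installed distribution, sorted by name.
for name, version in sorted(installed_versions().items(),
                            key=lambda item: item[0].lower()):
    print(f"{name}=={version}")
```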

On the Rise of the Machines

In a thoughtful essay in The Guardian, Stephen Hawking argues that scientists, as much as any other individual operating within the sphere of “the elites” as variously understood, need to attend to the rise of populism in the recent elections in the United Kingdom and the United States.

The concerns underlying these votes about the economic consequences of globalisation and accelerating technological change are absolutely understandable. The automation of factories has already decimated jobs in traditional manufacturing, and the rise of artificial intelligence is likely to extend this job destruction deep into the middle classes, with only the most caring, creative or supervisory roles remaining.

This in turn will accelerate the already widening economic inequality around the world. The internet and the platforms that it makes possible allow very small groups of individuals to make enormous profits while employing very few people. This is inevitable, it is progress, but it is also socially destructive.

His answer: the elites need to be more humble. Really? I don’t know if I’m one of those elites or not. I suspect many would put me in there because I’m an academic, but I look at my paycheck and the declining possibility of retirement, and I don’t feel very elite. One thing I do feel is that it is not, not, the responsibility of scholars and scientists that their expertise has been undermined. I think that moment has to be laid at the doorstep of industry, which is always happy to have science when it makes them money but, when it suggests that paradigm shifts are required, prefers the status quo.