Some Notes on a Plain Text (CM) System

The idea of a “trusted system” probably can be attributed to David Allen as much as to anyone else. Certainly the idea is his within the current zeitgeist. Even if you have not heard of him you probably have heard the ubiquitous three letters associated with him, GTD. Allen’s focus is on projects and tasks, but the idea of a trusted system applies just as well to any undertaking. For folks who type for a living, be it words in sentence or functions in a line of code, ideas are just as important as tasks when it comes to accomplishing projects. Allen’s GTD system has a response to ideas, but it largely comes down to putting things in folders.

But as anyone who works with ideas knows, sometimes you don’t know where to put them. And, just as importantly, why should you have to put them in any particular place? In the era of computation — that is, in the era of `grep` and `#tag` — having to file things, at least right away, would seem an anachronism that forces us to return to a paper era that often forced us to ignore the way the human mind words. That is, when operating in rich mode the mind is capable of grasping diffuse patterns across a range of items in a given corpus, but finding those items when they are filed across a number of separate folders, or their digital equivalent of directories is tedious work. `grep` solves some of that problem, of course.

I have largely committed, in the last few weeks, to using DevonThink as the basis for my workflow, because I like its UI and its various features and because it makes casual use so easy — and when I am sitting in my campus office, I need things to be casually easy.

But the more I learn about DevonThink’s artificial intelligence, the more I want to be able to tweak it, add my own dimensions to it. For example, DevonThink readily gives you a word frequency list, but what I want to exclude common words from that list? I know a variety of command line programs that allow me to feed them a “stop list”, a list of words to drop from consideration (and indeed these lists are sometimes known as “drop lists”) when presenting me a table of words and the number of times they appear in a given corpus. I am also guessing that when DT offers to “auto group” or “auto classify” a collection of texts, it is using some form of semantic, or keyword, mapping to do so. What if I would like to tweak those results? Not possible. This is, of course, the problem with closed applications.

The other problem with applications like DevonThink and MacJournal, as much as I like both of them, is that you can do a lot within them, but not so much without. While neither application holds your data captive — both offer a variety of export options — a lot of their functionality exists within the application itself. Titles, tags, etc.

Having seen what these applications can do and how I use them, would it be possible to replicate much of the functionality I prefer in a plain text system that would also have the advantage of, well, being plain text? As the Linux Information Project notes:

> Plain text is supported by nearly every application program on every operating system and on every type of CPU and allows information to be manipulated (including, searching, sorting and updating) both manually and programmatically using virtually every text processing tool in existence. … This flexibility and portability make plain text the best format for storing data persistently (i.e., for years, decades, or even millennia). That is, plain text provides insurance against the obsolescence of any application programs that are needed to create, read, modify and extend data. Human-readable forms of data (including data in self-describing formats such as HTML and XML) will most likely survive longer than all other forms of data and the application programs that created them. In other words, as long as the data itself survives, it will be possible to use it even if the original application programs have long since vanished.

Who doesn’t want their data to be around several millennia from now? On a smaller horizon, I once lost some data to a Windows NT crash that could not be recovered even with three IT specialists hovering over the machine. (To be fair to Windows NT, I think I remember the power supply was just about to go bad and that it was going to take the hard drive with it.) Ever since that moment, I have had a tendency to want to keep several copies of my data in several places at the same time. Both DropBox and our NAS satisfy that lingering anxiety, but both of them are largely opaque in their process and they largely sync my data as it exists in various closed formats.

And as the existence of this logbook itself proves, I have problems with focus, and there is something deeply appealing in working inside an environment as singularly focused as a terminal shell. That is, I really do daydream about having a laptop which has no GUI installed. All command line, all the time. Data would be synced via `rsync` or something like it, and I would da various kinds of data manipulation via a set number of scripts, that I also maintained via Git or something like it.

Now, the chief problem plain text systems have, compared to other forms of content management, is a lack of an ability to hold metadata, and so the system I have sketched out defaults to two conventions about which I am ambivalent but which I feel offer reasonable working solutions.

The first of these conventions is the filename. Whether I am writing in MacJournal or making a note in my notebook, I tend to label most private entries with their date and time. In MacJournal this looks like this: `2012-01-04-1357`. In my Moleskine notebook, every page has a day header and each entry has its own title. Diary entries are titled with the time they were begun. So a ‘date-time` file naming convention will work for those notes.

When I am reading, I write down two kinds of things: quotes and notes. Quotes are obvious, but notes can range from short questions to extended responses and brainstorming. Quotes are easily named using the Turabian author-date system which would produce a file name that looks like this: `Author-date-pagenumber(s)`. Such a scheme requires that a key be kept somewhere that decodes `author-date`s into bibliographic entries. What about notes? I think the easiest way to handle this is using `author-date-page-note`. In my own hand-written notes, I tend to handle page numbers to citations within parentheses and pages to notes with square brackets, but I don’t know that regex on filenames is how I want to handle this.

Filenames handle the basics of metadata, in some fashion, but obviously not a lot, and I am being a bit purposeful here in trying to avoid overly long filenames. For additional metadata, I think the best way to go is with Twitter-style “hashtags”. E.g., `#keyword`.

Where to put the tags, at the beginning like MultiMarkdown or AsciiDoc, or at the end where they don’t interfere with reading? I haven’t decided yet? I use MultiMarkdown, and PHPMarkdown, almost by default when writing in plain text. The current exception to this is that I am not separating paragraphs by an additional line feed, which is the basis for most Markdown variants. This is just something I am trying, because when I am writing prose with dialogue or prose with short paragraphs, the additional white space looks a bit nonsensical. The fact is, after years of being habituated to books, I am used to seeing paragraphs begin with an indent and no extra line spacing. It’s very tidy looking, and so I am playing with a script through which I pass my indented prose notes and which replaces the tab characters, `\t`, with a newline character, `\n`, before passing the text onto Markdown.

Now, this system is extremely limited: it doesn’t handle media. It doesn’t handle PDFs. It doesn’t handle a whole host of things, but that is also its essence. It’s a work in progress. I will let you know how it goes. Look for the collection of scripts to appear on GitHub on some point in the near future.