TextCMS | John Laudun

The idea of a “trusted system” probably can be attributed to David Allen as much as to anyone else. Certainly the idea is his within the current zeitgeist. Even if you have not heard of him, you probably have heard the ubiquitous three letters associated with him, GTD. Allen’s focus is on projects and tasks, but the idea of a trusted system applies just as well to any undertaking. For folks who type for a living, be it words in a sentence or functions in a line of code, ideas are just as important as tasks when it comes to accomplishing projects. Allen’s GTD system has a response to ideas, but it largely comes down to putting things in folders.

But as anyone who works with ideas knows, sometimes you don’t know where to put them. And, just as importantly, why should you have to put them in any particular place? In the era of computation – that is, in the era of grep and #tag – having to file things, at least right away, seems an anachronism, a return to a paper era that often forced us to ignore the way the human mind works. When operating in rich mode, the mind is capable of grasping diffuse patterns across a range of items in a given corpus, but finding those items when they are filed across a number of separate folders, or their digital equivalent, directories, is tedious work. grep solves some of that problem, of course.

I have largely committed, in the last few weeks, to using DevonThink as the basis for my workflow, because I like its UI and its various features and because it makes casual use so easy – and when I am sitting in my campus office, I need things to be casually easy.

But the more I learn about DevonThink’s artificial intelligence, the more I want to be able to tweak it, add my own dimensions to it. For example, DevonThink readily gives you a word frequency list, but what if I want to exclude common words from that list? I know a variety of command line programs that let me feed them a “stop list”, a list of words to drop from consideration (indeed these lists are sometimes known as “drop lists”), when presenting me a table of words and the number of times they appear in a given corpus. I am also guessing that when DT offers to “auto group” or “auto classify” a collection of texts, it is using some form of semantic, or keyword, mapping to do so. What if I would like to tweak those results? Not possible. This is, of course, the problem with closed applications.
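For comparison, a stop-list word count of the kind those command line programs produce takes only a few lines of Python. This is a minimal sketch, not a description of how DevonThink works: the tokenizing rule (lowercase letters with internal apostrophes) and the one-word-per-line stop list file are assumptions.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed token rule: runs of letters, allowing internal apostrophes ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")

def load_stop_list(path):
    """Read a stop ("drop") list, assumed to be one word per line; blank lines are skipped."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}

def word_frequencies(text, stop_words, top=None):
    """Return (word, count) pairs, most frequent first, leaving out the stop words."""
    words = WORD_RE.findall(text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(top)
```

Something like word_frequencies(corpus_text, load_stop_list("stoplist.txt"), top=25) would then give the twenty-five most frequent words that survive the list; the file name is only an example.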

The other problem with applications like DevonThink and MacJournal, as much as I like both of them, is that you can do a lot within them, but not so much outside them. While neither application holds your data captive (both offer a variety of export options), a lot of their functionality exists only within the application itself: titles, tags, and the like.

Having seen what these applications can do and how I use them, would it be possible to replicate much of the functionality I prefer in a plain text system that would also have the advantage of, well, being plain text? As the Linux Information Project notes:

Plain text is supported by nearly every application program on every operating system and on every type of CPU and allows information to be manipulated (including, searching, sorting and updating) both manually and programmatically using virtually every text processing tool in existence. … This flexibility and portability make plain text the best format for storing data persistently (i.e., for years, decades, or even millennia). That is, plain text provides insurance against the obsolescence of any application programs that are needed to create, read, modify and extend data. Human-readable forms of data (including data in self-describing formats such as HTML and XML) will most likely survive longer than all other forms of data and the application programs that created them. In other words, as long as the data itself survives, it will be possible to use it even if the original application programs have long since vanished.

Who doesn’t want their data to be around several millennia from now? On a shorter horizon, I once lost some data to a Windows NT crash that could not be recovered even with three IT specialists hovering over the machine. (To be fair to Windows NT, I think I remember the power supply was just about to go bad and that it was going to take the hard drive with it.) Ever since that moment, I have had a tendency to want to keep several copies of my data in several places at the same time. Both Dropbox and our NAS soothe that lingering anxiety, but both are largely opaque in how they work, and they sync my data as it exists in various closed formats.

And as the existence of this logbook itself proves, I have problems with focus, and there is something deeply appealing about working inside an environment as singularly focused as a terminal shell. I really do daydream about having a laptop with no GUI installed. All command line, all the time. Data would be synced via rsync or something like it, and I would do various kinds of data manipulation via a set of scripts that I would also maintain via Git or something like it.

Now, the chief problem plain text systems have, compared to other forms of content management, is an inability to hold metadata, and so the system I have sketched out relies on two conventions about which I am ambivalent but which I feel offer reasonable working solutions.

The first of these conventions is the filename. Whether I am writing in MacJournal or making a note in my notebook, I tend to label most private entries with their date and time. In MacJournal, such a label looks like this: 2012-01-04-1357. In my Moleskine notebook, every page has a day header and each entry has its own title. Diary entries are titled with the time they were begun. So a date-time file naming convention will work for those notes.
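A date-time name of that shape can be generated, and read back, with Python's standard datetime module. This is a sketch only: the optional slug (a short word or phrase tacked onto the stamp) and the .txt extension are assumptions, not part of the convention described above.

```python
from datetime import datetime

STAMP = "%Y-%m-%d-%H%M"  # produces e.g. 2012-01-04-1357

def timestamp_name(when=None, slug=None, ext=".txt"):
    """Build a date-time filename; the slug and extension are illustrative assumptions."""
    when = when or datetime.now()
    stem = when.strftime(STAMP)
    if slug:
        stem += "-" + slug
    return stem + ext

def parse_timestamp(name):
    """Recover the date and time from the first 15 characters of such a filename."""
    return datetime.strptime(name[:15], STAMP)
```

Because the stamp runs from year down to minute, an ordinary alphabetical listing of the directory (ls, for instance) is also a chronological one.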

When I am reading, I write down two kinds of things: quotes and notes. Quotes are obvious, but notes can range from short questions to extended responses and brainstorming. Quotes are easily named using the Turabian author-date system, which would produce a file name like this: Author-date-pagenumber(s). Such a scheme requires that a key be kept somewhere that decodes author-dates into bibliographic entries. What about notes? I think the easiest way to handle them is author-date-page-note. In my own handwritten notes, I tend to put page numbers for quotations within parentheses and page numbers for notes within square brackets, but I don’t know that regex on filenames is how I want to handle that distinction.
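If the author-date names do end up being read by a script, one regular expression can pull them apart. The sketch below assumes names like Allen-2001-42 (a quote), Allen-2001-42-45 (a quote spanning a page range) and Allen-2001-42-note (a note); the author and numbers are made up, and surnames containing hyphens would need a different rule.

```python
import re
from pathlib import Path

# Assumed shape: Author-Year-Pages[-note], where Pages is "42" or a range like "42-45".
# Author must be a single run of letters, so hyphenated surnames are not handled.
READING_RE = re.compile(
    r"^(?P<author>[^\W\d_]+)-(?P<year>\d{4})-(?P<pages>\d+(?:-\d+)?)(?P<note>-note)?$"
)

def parse_reading_filename(filename):
    """Split a quote or note filename into its parts; return None if it does not fit."""
    match = READING_RE.match(Path(filename).stem)  # .stem drops the extension
    if match is None:
        return None
    parts = match.groupdict()
    parts["kind"] = "note" if parts.pop("note") else "quote"
    return parts
```

The author and year fields together form the lookup key into whatever file decodes author-dates into bibliographic entries.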

Filenames handle the basics of metadata, in some fashion, but obviously not a lot, and I am being a bit purposeful here in trying to avoid overly long filenames. For additional metadata, I think the best way to go is with Twitter-style “hashtags”. E.g., #keyword.
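Hashtags in plain text give you searchability for free, and with a little more work they can also give you an index to browse. The sketch below assumes a tag is a # followed directly by a word character, with no letter or # immediately before it. That rule skips Markdown-style headings (“# Title”) and strings like “C#”. It also assumes notes live as .txt files in one directory.

```python
import re
from collections import defaultdict
from pathlib import Path

# "#" + word character, not preceded by a word character or another "#".
TAG_RE = re.compile(r"(?<![\w#])#(\w[\w-]*)")

def extract_tags(text):
    """Return the sorted, de-duplicated hashtags found in one note."""
    return sorted(set(TAG_RE.findall(text)))

def build_tag_index(notes):
    """Map each tag to the names of the notes carrying it (notes maps name -> text)."""
    index = defaultdict(list)
    for name, text in sorted(notes.items()):
        for tag in extract_tags(text):
            index[tag].append(name)
    return dict(index)

def load_notes(directory, pattern="*.txt"):
    """Read every matching file in a notes directory (the .txt pattern is an assumption)."""
    return {p.name: p.read_text(encoding="utf-8") for p in Path(directory).glob(pattern)}
```

Printing build_tag_index(load_notes("notes")) tag by tag gives a crude table of contents for a folder of notes; the directory name is only an example.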

Where to put the tags: at the beginning, as MultiMarkdown and AsciiDoc do with their metadata, or at the end, where they don’t interfere with reading? I haven’t decided yet. I use MultiMarkdown and PHP Markdown almost by default when writing in plain text. The current exception is that I am not separating paragraphs with an additional line feed, which is how most Markdown variants mark a new paragraph. This is just something I am trying, because when I am writing prose with dialogue or prose with short paragraphs, the additional white space looks a bit nonsensical. After years of being habituated to books, I am used to seeing paragraphs begin with an indent and no extra line spacing. It’s very tidy looking, and so I am playing with a script through which I pass my indented prose notes and which replaces the tab characters, \t, with a newline character, \n, before passing the text on to Markdown.
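The tab-for-newline swap works because the newline that already ends the previous line, plus the one that replaces the tab, adds up to the blank line Markdown wants. Here is a minimal sketch in Python. It assumes only a tab at the very start of a line marks a paragraph. That leaves tabs elsewhere alone, but it would still wreck Markdown’s tab-indented code blocks, which these prose notes presumably don’t use.

```python
import re

def tabs_to_paragraphs(text):
    """Turn tab-indented paragraphs into the blank-line paragraphs Markdown expects."""
    # "\n\tNext paragraph" becomes "\n\nNext paragraph": the newline that replaces
    # the tab supplies the blank line. Only line-initial tabs are touched.
    converted = re.sub(r"(?m)^\t", "\n", text)
    # The first paragraph would otherwise start with a stray blank line.
    return converted.lstrip("\n")
```

Wrapped in a few lines that read standard input and write standard output, a function like this can sit in a pipe in front of whatever Markdown processor is at hand.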

Now, this system is extremely limited: it doesn’t handle media. It doesn’t handle PDFs. It doesn’t handle a whole host of things, but that is also its essence. It’s a work in progress. I will let you know how it goes. Look for the collection of scripts to appear on GitHub at some point in the near future.

Some Further Notes on a Plain Text System (2012-01-04-1657)

If you are working in plain text, you are probably still going to want some way of structuring your text, that is, marking it up just a little so that you can do a variety of things with it. As I have already noted, the way that I know best is a variant of Markdown known as MultiMarkdown. But there are other systems out there: I have always been intrigued by the amazing scope of reStructuredText, and I am somewhat impressed by AsciiDoc. (By way of contrast, I have always hated MediaWiki markup: it is almost incomprehensible to me.) The beauty of reStructuredText is that you can convert it to HTML or a lot of other formats with docutils. Even better is Pandoc, which converts back and forth between Markdown, HTML, and reStructuredText, and can also write MediaWiki markup and man pages. Oh my!

You can get Pandoc through a standalone installer or you can get it through MacPorts. To get MacPorts, however, you need the latest version of Xcode, which brings me to the topic of the moment: a plain text system is really founded on the Unix way of doing things, which means that your data is in the clear but you as an operator must be more sophisticated. Standalone applications like MacJournal and DevonThink, which I keep mentioning not because they are inadequate but because they are so good, and because I use them when I am in more of an “Apple” mode of doing things, are wonderful because you download them and all this functionality is built in. At the command line, not only do you assemble the functionality you want out of a variety of small applications, but in order to install or maintain those applications you need a better grasp of what requires what, also known as dependencies.

The useful Python script Blogpost, a command line tool for uploading posts directly to a WordPress site, is available through a Google Code project, which requires that you get a local copy through Mercurial, a distributed version control system, which is easily available … through MacPorts. There are other ways to get it, but allowing MacPorts to keep track of it means that you have an easier time keeping it updated. This works much like Mac’s Software Update functionality, or the badges in the new Mac App Store that tell you updates are available. No badges at the command line, but if you allow MacPorts, also known as a package manager, to, well, manage your packages, then all you need to remember is to run port selfupdate and port upgrade outdated once a week or so, and all of that stuff is taken care of for you.

And so to summarize the dependencies:

Blogpost -> Mercurial -> MacPorts -> Xcode
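Working out “what requires what” is a topological sort, which is the kind of ordering a package manager has to compute. As an illustration only, not a description of MacPorts’ internals, Python’s standard graphlib module (Python 3.9 and later) can turn the chain above into an install order. The dependency table is written out by hand here rather than read from MacPorts.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on the items in its set: the chain above, entered by hand.
DEPENDS_ON = {
    "Blogpost": {"Mercurial"},
    "Mercurial": {"MacPorts"},
    "MacPorts": {"Xcode"},
}

def install_order(graph):
    """Return an order in which every item comes after everything it depends on."""
    # static_order() raises graphlib.CycleError if the dependencies loop back on themselves.
    return list(TopologicalSorter(graph).static_order())
```

For this chain the result is Xcode, MacPorts, Mercurial, Blogpost, which is simply the arrows read backwards. The sort earns its keep once several tools share dependencies.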

Package managers, like MacPorts, only keep track of things locally, that is on the one machine on which they are installed, and not across several machines. It’s a bit of a pain to replicate all these steps across various machines, and so I now understand the appeal of debconf for Ubuntu users. I don’t quite know how to make that happen for myself, but I am open to suggestions.

Markdown vs AsciiDoc vs reStructuredText (2012-01-07-1345)

I have been using Markdown for five or more years now. It’s very easy to use, and it does what it does well: provide an easy means for writing documents that can be transformed into HTML. Because of its close ties to HTML, it has all of that markup language’s limitations, which is to say it is focused on presentation and not meaning. I have tried, in my own use, to reserve underscores for the titles of works, even though asterisks typically achieve the same effect. Markdown in its original state supports neither footnotes nor tables; MultiMarkdown and PHP Markdown Extra do, but they share in Markdown’s limitations.

I am mostly happy to live within those limits, but there are two other lightweight markup schemes out there that are worth considering: AsciiDoc and reStructuredText.

On Markups (2012-01-26-1432)

For general purposes, Markdown, like the other “plain text markup languages”, serves very well. I do not, however, find Markdown very conducive to writing for myself or writing to think. For one, I generally prefer an indented line at the beginning of a paragraph, with no blank line above or below. That is especially useful when writing a series of short paragraphs or bits of dialogue, where Markdown’s blank lines can leave half your screen filled with white space.

I also find Creole’s use of equal signs for headers a better option than Markdown’s hash signs; Creole reserves the hash sign for numbered lists. Using the hash sign for list items also resolves the problem of having numbers get out of order as you write a list. Markdown of course fixes the numbering as it converts to HTML, but the plain text original remains confusing.

Now, one route to my own markup language would be to fork a version of Markdown, in whatever programming language I would prefer to work in – there are versions of Markdown in Perl, PHP, Python, and Ruby (and I am sure there are more versions in other programming languages). My problem is that I have a pretty extensive back catalog of entries in my WordPress database, 1034 posts as of today, and most of them are in Markdown. I also have over 200 notes in MacJournal, most of which are by default marked up similarly.

It would be easy, I think, to write a script, using something like awk, to go through those posts and replace \n\n with \n\t. The same would be true for numbered lists (replace ^\d+\. with #), and most uses of the hash sign for headings could be converted in a similar way.
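A blind \n\n to \n\t substitution would also indent headings and list items, so a slightly safer version works block by block. This is a sketch in Python rather than awk. The target markup it produces (= for headings, # for numbered items, a leading tab for each prose paragraph) is my reading of the scheme described here, not a finished specification. It also assumes headings and lists are set off from prose by blank lines, as they usually are in Markdown.

```python
import re

MD_HEADING_RE = re.compile(r"^(#{1,6})\s+(.*)$")  # Markdown ATX heading, e.g. "## Title"
MD_ITEM_RE = re.compile(r"(?m)^\d+\.\s+")         # numbered item, e.g. "3. Third"

def markdown_to_indented(text):
    """Convert Markdown to the hypothetical indented markup, one block at a time."""
    out = []
    for block in re.split(r"\n\s*\n", text.strip()):
        heading = MD_HEADING_RE.match(block)
        if heading and "\n" not in block:
            # "## Title" -> "== Title"
            out.append("=" * len(heading.group(1)) + " " + heading.group(2))
        elif MD_ITEM_RE.match(block):
            # Every "N. " at a line start becomes "# ", so numbering can no longer drift.
            out.append(MD_ITEM_RE.sub("# ", block))
        else:
            # Ordinary prose: a leading tab replaces the blank line above it.
            out.append("\t" + block)
    return "\n".join(out) + "\n"
```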

But instead of working with over one thousand bits of text, and with no real interest in double-checking if everything came out correctly, the better solution might be to proceed with my new markup language and then simply write a quick script to change it to Markdown when I decide to make a text public.
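The publishing direction, from the indented markup to Markdown, might look something like the following. It is a sketch under the same assumptions about the markup: headings are lines of equal signs, optionally closed Creole-style as in “== Title ==”. Numbered items are single-level lines beginning “# ” (nesting isn’t handled). A line starting with a tab opens a new paragraph. Every list item is emitted as “1.”, leaving the numbering to Markdown.

```python
import re

IND_HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*=*\s*$")  # "== Title" or "== Title =="
IND_ITEM_RE = re.compile(r"^#\s+(.*)$")                  # "# item" (numbered list)

def indented_to_markdown(text):
    """Convert the hypothetical indented markup back to Markdown for publishing."""
    blocks, kind = [], None  # kind of the previous line: "heading", "item" or "para"
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no meaning in the indented markup
        heading, item = IND_HEADING_RE.match(line), IND_ITEM_RE.match(line)
        if heading:
            level = min(len(heading.group(1)), 6)  # Markdown headings stop at six levels
            blocks.append("#" * level + " " + heading.group(2))
            kind = "heading"
        elif item:
            entry = "1. " + item.group(1)  # Markdown renumbers ordered lists itself
            if kind == "item":
                blocks[-1] += "\n" + entry
            else:
                blocks.append(entry)
            kind = "item"
        elif line.startswith("\t") or kind != "para":
            # A tab, or a heading or list just above, starts a new paragraph.
            blocks.append(line.lstrip("\t"))
            kind = "para"
        else:
            blocks[-1] += "\n" + line  # continuation of the current paragraph
    return "\n\n".join(blocks) + "\n"
```

Keeping the conversion one-way, and running it only at publication time, means the back catalog can stay in Markdown untouched.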

Mind, only some of this is brought about by my current return to command line geekery. It’s also the case that my favorite note-taking application, MacJournal, cannot easily sync all of my devices. Two (or more) computers via Dropbox? No problem. iPhone and iPad … well, you can sync, but only through the abomination of getting both machines on the same network, setting them up to sync, etc. This is silly. I already have my MacJournal data sitting in a Dropbox account. My iPhone can connect to my Dropbox account. Sync to that.

MacJournal can’t do that. The cool new journaling application Day One can sync many devices through Dropbox, but it currently cannot hold images and it does not feature tags. (I suppose one could make tags work the same way I make them work in my textCMS, as hashtags, e.g. #tag, but that offers me only searchability, not my preferred way of working with tags, which is browsing.) And Day One stays away from plain text files for storage, preferring a variant of the Mac OS plist for formatting entries in the file system. And, too, I have to abide by its preferred markup language, which is Markdown, and not one of my own choosing. But its UI is quite nice.

@2012-01-26-1432-my_markup