Automating Text Cleaning

I am fundamentally ambivalent about the automation of text-cleaning: spending time with the data, and getting unexpected results from your attempts at normalization, strikes me as one way to get to know the data and to be in a position to do better analysis. That noted, there have been a number of interesting text-cleaning libraries, or text-cleaning functionality built into analytic libraries, that have caught my attention over the past year or so. The most recent of these is clean-text. Installation is simple:

pip install clean-text

And then:

from cleantext import clean

The clean(the_string, *parameters*) function takes a number of interesting parameters that focus on a particular array of difficulties.
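To give a sense of the kind of normalization a function like this performs, here is a minimal sketch using only the Python standard library. It is not clean-text's implementation or its API: the function name simple_clean, its flags, the regular expressions, and the placeholder strings are all assumptions made up for illustration.

```python
import re
import unicodedata

# Deliberately rough patterns; real-world URL and email matching is messier.
URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
WHITESPACE_RE = re.compile(r"\s+")


def simple_clean(text, lower=True, to_ascii=True, no_urls=True, no_emails=True):
    """Hypothetical cleaner: each flag switches one normalization step on or off."""
    if to_ascii:
        # Decompose accented characters, then drop anything that is not ASCII.
        text = unicodedata.normalize("NFKD", text)
        text = text.encode("ascii", "ignore").decode("ascii")
    if lower:
        text = text.lower()
    if no_urls:
        text = URL_RE.sub("<URL>", text)
    if no_emails:
        text = EMAIL_RE.sub("<EMAIL>", text)
    # Collapse runs of whitespace (including line breaks) into single spaces.
    return WHITESPACE_RE.sub(" ", text).strip()


raw = "Café  at HTTPS://Example.com/a?b=1\nmail Me@Example.org now"
print(simple_clean(raw))
```

Even a toy like this makes the ambivalence concrete: the to_ascii step silently throws away any character it cannot decompose, which is exactly the sort of surprise you only notice by looking at the results.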

Where I Want to Live

Ostensibly an essay on urban planning, and how Houston doesn’t plan for walking, Not Just Bikes’ “How I Got Into Urban Planning (and Why I Hate Houston)” is really an exploration of how traveling to, and spending time in, different places can open your eyes to what you think of as given and how it can lead to you choosing something different. It could be argued that every society gets things right and gets things wrong, and if you are lucky, you will be, or will make yourself, mobile enough to view a selection of societies and then choose the one that fits you best. But what if you are not lucky? What if you aren’t mobile? What if you are stuck? I think too often, and this is one of the subtexts of the essay (I think), we do not recognize that poverty isn’t a choice, but a lack of choices. How a society strives, or whether it strives at all, to give everyone within its bounds the ability to make fundamental choices is probably a better measure of its health than many other metrics we use.

Of Mollusks, Matrices, Modeling

In the Science Museum of Minnesota this past weekend, I found myself poring rather closely over the mollusk exhibit, if only because I could not fathom (pardon the pun) why there was such a grand display of them. I didn’t have to look far: mollusks are often the leading indicators that an environment is in danger. The museum is home to a large collection of mollusk shells, many of which reveal where mollusks once thrived but are now scarce or non-existent thanks to changes in the landscape brought about by agricultural or industrial contamination.

Leading indicators are, of course, quite popular right now, because a lot of people are interested in being able to predict the future based on the kinds of early successes we have had with machine learning, whereby algorithms trained on a reasonably large dataset can discover the same patterns in new data. I am reminded of work reported by Peter Brooking and Singer on XXX as well as the “calculus of culture” being developed by Jianbao Gao.

What you would need, of course, is a reasonable definition of the parameters of your “socio-cultural matrix.” The social dimensions would be all those non-text data points that might be of interest and associated with humans either individually or collectively. The cultural dimensions would be texts and other discernible behaviors, again either described individually or collectively. We know this is possible because, to some degree, Cambridge Analytica has already done it, and we can be sure that other organizations are doing the same and just not talking about it. (In a moment when all this data is available, whether by cash, hack, or crook, you would be a fool not to collect it all and compile it as many ways as you can possibly imagine, and then some.)

Breaking off some piece of this larger matrix, or set of matrices, is something we all need to be better about doing: modeling complex environments is something that needs to get taught and practiced more widely—all the while reminding people never to mistake the map for the territory. To some degree the social sciences do some of this at the computational end, but the kind of statistical, and sometimes speculative, modeling suggested here is not as pervasive in public discourse as it should be.

Connective Environments

I have wondered for a little bit now how social information systems fit within the Army’s MDO conceptual framework. I understand that they are part of what the Army considers “information operations,” which has a variety of dimensions to it, from OSINT (open source intelligence — basically scraping the web and other public sources) to information warfare. What I have not understood is where information fits within the conventional MDO framework, which seems very tied still to physical environments, with cyberspace representing a non-physical, thus “virtual”, environment.

It would appear, from a recent briefing, which is entirely in the public domain, that information is considered a connective environment, like the electromagnetic spectrum. You would think that cyber might be considered a connective environment, just as land, sea, air, and space are really connective environments, within which certain kinds of operations take place. But the reverse is true?

I’m scratching my head on this one.

The Internecine Project

Any number of films have left lasting impressions on me. Certainly some have gone on to shape my imagination in various ways. A lot of them are films that I watched with my father, like The Man with No Name trilogy that featured Clint Eastwood, or any number of WW2 films (Kelly’s Heroes or The Dirty Dozen or The Longest Day, to name just a few) that shaped the fiction I read and the movies I watched as I grew older.

Then there are the films I found and watched myself, some of which were either “made for television” or found their only real home there or on the nascent cable television channels. The Blonde with One Black Shoe certainly falls under the latter category, as does a host of British, or near-British, spy thrillers, like the Harry Palmer trilogy, the original Italian Job, or The Internecine Project. That last film has stayed in my head for as long as it has because I loved its hammered dulcimer soundtrack so much that I recorded it, as it played on air, onto a cassette tape which contained a host of other favorite soundtracks, most of which I can no longer remember.

It turns out that The Internecine Project is not only watchable again on YouTube, but its soundtrack is available on Apple Music.

Terry Talks Movies

I don’t know how the YouTube channel Terry Talks Movies made it into my stream, but it did, and it has turned up a number of older movies to watch. In a review of “1960s Science Fiction Movies You Should See,” Terry observes the following about the protagonists of Privilege, a film he describes as a didactic docudrama (worth watching):

They are innocents in a world where predatory men are turning the passions of young people into social cages in which to enslave them.

Hello world (again)!

New possibilities and new horizons do not necessarily demand new infrastructure, but in this case my first shared hosting provider, A Small Orange, was long ago sold to a mega-provider and the prices have gone up while the services have somewhat declined. When I asked about alternatives, one hosting provider, CynderHost, stepped forward and invited me to try them out. So far, I have liked what I have seen, and it was time that I moved, if only to re-learn some of the basics of hosting and also to have a sense of what it is to move and what all needs to be moved. There are bound to be some hiccoughs along the way, but, in the end, you might as well embrace the change. It’s coming anyway.