DH/Networking Explorations 1

## Following My Own Advice

For years now I have been encouraging students, both beginning and advanced, to keep a journal of their activities as one way of breaking down the barrier to getting writing done. I have especially encouraged graduate students working on their dissertations to try it. And I have done this while being only an intermittent practitioner myself. (I confess that this is partly a consequence of one of the great advantages of having a spouse who practices the same profession: one is free to do much of the daily review over the dinner table. The prêt-à-écouter audience is great, but it leaves out one important dimension of the process: writing.)

And so, John Anderson, if you are reading this post, here is me doing what I said, an account of trying my hand at textual analysis.

## The Onus ##

At the end of last year I was invited to participate in an NEH seminar on “Networks and Networking in the Humanities” which will be hosted by UCLA’s Institute for Pure and Applied Mathematics later this summer. Earlier this year the participants received a list of homework assignments: two books to read, a technical paper or two, and the production of an edge list.

The books have been interesting. (More on each one in separate posts.) The technical paper was at the border of my ken, but I followed chunks of it. The production of the edge list, a list of links in a network, has been the hardest task. Part of it, of course, was nomenclature. “Edge list” threw me for a loop, new as I am to networkese, but I grokked it with the help of the assigned reading and a variety of web reading. (Thank you, intarwebs.)
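For anyone as new to networkese as I am: an edge list is simply one line per link, naming the two nodes that link connects. A minimal Python sketch follows. The node names are hypothetical placeholders, not data, and the two-column CSV layout with a `source,target` header is an assumption, since different network tools expect slightly different input formats.

```python
import csv
import io

# Hypothetical (source, target) pairs: each tuple is one link (edge).
edges = [
    ("teller_A", "tale_1"),
    ("teller_A", "tale_2"),
    ("teller_B", "tale_2"),
]

def to_edge_list_csv(pairs):
    """Serialize (source, target) pairs as a two-column CSV edge list."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source", "target"])  # header row (assumed layout)
    writer.writerows(pairs)                # one row per edge
    return buf.getvalue()

print(to_edge_list_csv(edges), end="")
```

Each node can appear in as many rows as it has links; `teller_A` above appears twice because it is connected to two tales.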

But there was another dimension to the edge list assignment that was stymying me: the data. Yes, I have the emergent data from the boat book, but I don’t feel entirely comfortable producing more data for the sake of the seminar if it means rushing certain dimensions of the research. Nor do I yet have a firm enough grip on the data I already have to feel comfortable pouring it into a new paradigm of analysis and modeling. (Like some mental version of Twister.)

And so I needed a data set with which I could work that would allow me to do the kind of analysis that I hoped network theories and models would make possible. In particular I am interested in applying these paradigms to ethnographic contexts where we need to understand how individuals make their way through the world using the ready-made mentifacts that we sometimes call folklore as “equipment for living.”

What I think that means is that I want to understand how individuals within a given group (a social graph, if you will) draw from a repertoire (network) of forms (stories, legends, anecdotes, jokes, etc.) which themselves variously reflect and refract a network of ideas (ideology) dispersed (variably) throughout the group.

Networks of People, Stories, and Ideas

Or, as folklorist Henry Glassie once put it: “Culture is made up of ideas, society of people.” But ideas don’t just bounce around people’s heads, and they don’t often exist out in the world unencapsulated. Ideas and values are usually embedded in the things we say and do.[^1] We keep these things around, these stories and explanations, because they resonate with our values and beliefs. At the same time, the forms not only give the ideas form but also shape them.

This dynamic interaction has been the focus of folklore studies for the past century. For the last forty years, studies of culture and language have taken an ethnographic turn, sometimes called “performance” and sometimes called “ethnomethodology,” which has focused on the important role that individuals play in the intertextual network of forms (and thus the ideological network embedded within them).

I am one of those performance-oriented scholars. Performance studies has produced a wide range of profound micro-level studies of folklore in action. In the last decade or so, there has begun to be an attempt to build back toward the philological framework from which the performance orientation sprang and against which it initially pushed back. It’s time to fold these things together, and I think network theories offer one possibility for doing so.
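To make the picture of people drawing on a repertoire of forms a little more concrete: one common way to model it (offered as an illustration, not as the approach the seminar prescribes) is to treat people and forms as two kinds of nodes in a bipartite network, then project that onto a people-only network in which two people are linked when they share a form, weighted by how many forms they share. The names are hypothetical placeholders, and `project_people` is an illustrative helper, not part of any network package.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (person, form) pairs: placeholders, not data from any corpus.
repertoire = [
    ("teller_A", "tale_1"),
    ("teller_A", "tale_2"),
    ("teller_B", "tale_2"),
    ("teller_B", "tale_3"),
    ("teller_C", "tale_2"),
    ("teller_C", "tale_3"),
]

def project_people(pairs):
    """Map each pair of people who share at least one form to the number of
    forms they share. Keys are alphabetically sorted (person, person) tuples."""
    tellers_by_form = defaultdict(set)
    for person, form in pairs:
        tellers_by_form[form].add(person)

    shared = defaultdict(int)
    for people in tellers_by_form.values():
        # Every pair of people who know this form gets one more shared form.
        for a, b in combinations(sorted(people), 2):
            shared[(a, b)] += 1
    return dict(shared)

for (a, b), weight in sorted(project_people(repertoire).items()):
    print(a, b, weight)
```

Here `teller_B` and `teller_C` end up with the heaviest link (2) because they share two tales. The same projection run the other way, linking forms that share tellers, would be one rough way to start looking at the repertoire itself.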

## The Data ##

If not my own data, then what other corpus? I wanted to work with materials that I knew fairly well. I began to build a database of Louisiana folklore in print, focusing especially on tales and legends, but the time needed to get a large enough corpus digitized and into the database, even using OCR software, quickly loomed too large. It is a great project, but one that could easily take up an entire summer, not the limited time I had to get something up and usable for the seminar assignment, which I was already late in completing.

I did, however, initiate some conversations that may yet produce a foundation for such a database, contacting authors of several texts for electronic copies of their manuscripts to facilitate data entry. (The metadata is entirely a separate matter for now.)

The answer to my question didn’t come to me until I was in Providence, Rhode Island, for the sixth, and final, Project Bamboo planning workshop. I don’t know if somebody said or suggested something, but I hit upon the idea of using Zora Neale Hurston’s _Mules and Men_ as the basis for the seminar assignment and for my own initial explorations into the various software tools that are available. I was reasonably hopeful that somewhere, someone would have digitized the text, and I was right. The text is in neither Project Gutenberg nor the Oxford Text Archive, but it is in the University of Virginia’s American Studies [hypertext collection][xroads]. There I found a [hypertext version of _Mules and Men_ put together by Laura Grand-Jean in 2001][lgj].

I am not yet at a point where I could write a `bash` script to fetch the pages I needed with `wget` or `curl`, but since I decided to focus only on the folktales section of the book, its first half, it wasn’t too much of a task to click on each page, copy the text, and paste it into a plain text document in my text editor, TextMate. For reference, I also copied and pasted the HTML in hopes that it might prove useful for getting certain kinds of texts out. That is, I had hopes of figuring out how to tell a piece of software to pull out everything between a particular pair of tags. Unfortunately, Grand-Jean had used some non-standard markup to handle the long blockquotes. I thought about doing some fancy find-and-replace work with regular expressions, but in the end I decided I would rather work with the plain text, which would also encourage (force) me to re-read the text. The latter proved useful, as I came across some long texts embedded in dialogue that were worth including in the extracted corpus.
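If the pages had used standard `<blockquote>` elements, pulling out everything inside them would look roughly like this Python sketch, which relies only on the standard library’s `html.parser`. It assumes well-formed, standard markup, so it would not work on the pages as they stand; `extract_blockquotes` is a hypothetical helper name.

```python
from html.parser import HTMLParser

class BlockquoteText(HTMLParser):
    """Gather the text found inside each <blockquote>...</blockquote>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a blockquote (allows nesting)
        self.current = []   # text fragments of the blockquote being read
        self.quotes = []    # finished blockquote texts

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Join fragments with spaces, then collapse runs of whitespace.
                self.quotes.append(" ".join(" ".join(self.current).split()))
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)

def extract_blockquotes(html):
    """Return a list with the whitespace-normalized text of each blockquote."""
    parser = BlockquoteText()
    parser.feed(html)
    parser.close()
    return parser.quotes
```

Joining fragments with spaces keeps adjacent paragraphs inside a quote from running together, at the cost of splitting a word that is broken by inline markup such as `<i>`.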

(The plain text version of Part One of _Mules and Men_ can be found both on [Scribd][] and on [GitHub][]. Forked critical editions of texts are an interesting idea, no? It weighs in at 55,798 words in 2,127 lines. Somewhere along the way I’ll put up some stats on word counts for block quoted text, quoted text, narrative text, etc.)
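Counts like these are easy to reproduce. A minimal sketch, assuming a “word” is any whitespace-separated token; a different tokenizer, or a different tool, could report slightly different totals than the ones above. The helper names are hypothetical.

```python
from pathlib import Path

def text_stats(text):
    """Return (word_count, line_count) for a string of plain text.
    Words are whitespace-separated tokens (an assumption)."""
    words = len(text.split())
    lines = len(text.splitlines())
    return words, lines

def file_stats(path):
    """Read a UTF-8 plain-text file and return its (word_count, line_count)."""
    return text_stats(Path(path).read_text(encoding="utf-8"))
```

To run it on the file, pass its path to `file_stats`, for example `file_stats("mules_and_men_part1.txt")`, where the filename is a placeholder.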

## And Now for Some Software ##

So I’ve got a digitized text. An ethnographic text.[^2] That will give me people and forms, and I’m familiar enough with the kinds of speech communities involved that I can take a crack at ideas. Now I hope to use software to begin to discern those patterns more clearly. (And to produce that edge list.)

The first thing I try is SEASR’s [Meandre][]. Meandre is really something like a software suite, consisting of server and client software, both of which you install and run locally. The server software syncs with the component and workflow repositories at SEASR HQ, which are then made available to you through the workbench.

Meandre Workbench

As a quick glance at the UI reveals, it’s not exactly user friendly. Then again, none of this software really is. The good folks running the seminar have provided us with links to useful software: Network Workbench, Wordij, and Pajek (which is, sigh, Windows-only). I am still working my way through these various packages, but I have to say that so far my best results have come from [IBM’s Many Eyes][ibm].

[xroads]: http://xroads.virginia.edu/~HYPER/hypertex.html
[lgj]: http://xroads.virginia.edu/~MA01/Grand-Jean/Hurston/Chapters/siteintroduction.html
[Scribd]: http://www.scribd.com/doc/33800238/Zora-Neale-Hurston-s-Mules-and-Men-in-plain-text
[GitHub]: http://github.com/johnlaudun/Mules-and-Men
[Meandre]: http://seasr.org/meandre/download/
[ibm]: http://manyeyes.alphaworks.ibm.com/manyeyes/users/johnlaudun
[^1]: The poet William Carlos Williams once advised, in “A Sort of Song”: “Let the snake wait under / his weed / and the writing / be of words, slow and quick, sharp / to strike, quiet to wait, / sleepless. / — through metaphor to reconcile / the people and the stones. / Compose. (No ideas / but in things) / Invent! / Saxifrage is my flower that splits / the rocks.” His famous urging to himself and other poets to find the ideas that already surrounded them in the world echoes the anthropological project of the twentieth century: to find the intelligence and beauty in the always already peopled world of the everyday. (My apologies to Williams for eliminating his line breaks, but my software, `PHP Markdown Extra`, wasn’t handling a poem within a footnote at all well.)
[^2]: To be sure, I’m fully aware of the potential problems of Hurston’s text. For a fuller discussion, see my essay in _African American Review_ ([JSTOR](http://www.jstor.org/stable/1512231)).

Pro Photographer’s Workflow

Chase Jarvis is the author of the popular Best Camera blog and book. (His argument is/was that the best camera is the one you have with you, and so the book is a collection of photographs taken with his iPhone camera. The subtext is that one should focus on such abilities as composition, lighting, and framing rather than worry about the gear/gadgets in one’s hand.)

Also on his website is a nice video that details his workflow. Jarvis is a professional photographer with not only a serious staff who accompany him everywhere but also a pretty serious collection of gear. Essentially, he runs all his images and video through Aperture and onto hard drives. (Adobe, are you paying attention? Video!) The hard drives escalate from portable drives in the field, to small RAID drives in hotel rooms, to a serious Xserve setup back at his office/studio.

The takeaway here? Backup, backup, backup. And an important corollary is many, many copies in diverse locations. (Offsite, offsite, offsite.)

A tidbit within all this is the file naming convention they use:

`year/project/day/camera/shot`

Example:

`20100630_ProjectHere_1_S900123.Cr2`
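Worth noting: in the example, the first element is a full date (YYYYMMDD), not just a year. Here is a sketch of a helper that assembles names in this pattern. How “S900123” divides into camera and shot is my guess (camera `S9` plus a five-digit shot number `00123`), and `shot_filename` is a hypothetical name, not something taken from Jarvis’s workflow.

```python
from datetime import date

def shot_filename(shoot_date, project, day_number, camera, shot, ext):
    """Build a name like 20100630_ProjectHere_1_S900123.Cr2.
    Assumes the camera code is followed directly by a zero-padded
    five-digit shot number (a guess based on the single example)."""
    return f"{shoot_date:%Y%m%d}_{project}_{day_number}_{camera}{shot:05d}.{ext}"

print(shot_filename(date(2010, 6, 30), "ProjectHere", 1, "S9", 123, "Cr2"))
```

Putting the date first means a plain alphabetical sort of the files is also a chronological one.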

Console Message of the Day

Apparently sometimes the left hand doesn’t know what the right hand is doing, or at least what it is deprecating:

6/25/10 12:26   
AppleScript Editor[3795]    
*** WARNING: Method selectedRowEnumerator in class NSOutlineView is deprecated. 
It will be removed in a future release and should no longer be used.

I came across this while trying to debug my Meandre installation. (More on that later.)

Knowledge for All

The University of Prince Edward Island cancelled their subscription to Web of Science:

This is to inform the UPEI campus community that we have not renewed our subscription to ISI’s Web of Science database (WoS). We realize this is a key research database for many of you and we have taken steps to ensure access to appropriate alternative resources, as well as the WoS back-files. Late last year we received notification that our subscription price was going to increase by 120%. A number of factors went into the decision not to renew:

- a challenging fiscal climate means that we are unlikely to see an increase to Library budgets;
- any subscription increase in these challenging times is difficult, but an increase of 120% is simply not acceptable;
- we would have been forced to sign a 3-year agreement, with additional increases in each of the 3 years;
- a weaker Canadian dollar would have a significant impact on our subscription costs;
- accommodating this level of increase lends credence to the vendors’ business practices and we felt it important to make a statement against these practices (see http://chronicle.com/article/U-of-California-Tries-Just/65823/ for a recent decision at UC).

UPEI is also leading an effort to create a free and open index to the world’s scholarly literature called “Knowledge For All”. This proposal is currently being sent to various Canadian and international library consortia in an effort to gain support for the project. One goal of Knowledge For All is to ensure that scholars and members of the broader public are no longer disenfranchised by a broken system of scholarly communication. We will provide the campus community with updates on this effort.

It’s interesting to note that it may very well be the smaller universities that make some of these shifts first, perhaps clumsily, because they are usually closer to the economic trends than the major universities are. I think such is also the case with my own university.

Regex Part Something

I am working on cleaning up a bunch of HTML files that have needless links in them. This is the regex that worked:

`<a(/?[^\>]+\>)`

The idea was to select everything from the opening of the link tag through its closing bracket and delete it, then to remove the closing `</a>` tags in a second pass through the documents. (I used a text editor, [TextMate](http://macromates.com), which allows me to search and replace throughout an entire directory.)
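For anyone who would rather script this than click through an editor, here is a Python sketch that removes both opening and closing link tags in a single pass while keeping the link text. The pattern is a different one from the regex above, swapped in for Python’s `re` module. It assumes no attribute value contains a `>` (regular expressions are a fragile way to handle HTML in general), and `strip_links_in_dir` is a hypothetical helper that overwrites files in place, so run it on copies.

```python
import re
from pathlib import Path

# "<a" or "</a", then a word boundary (so <abbr> and <area> are left alone),
# then anything up to the tag's closing ">".
LINK_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)

def strip_links(html):
    """Remove <a ...> and </a> tags, keeping the text between them."""
    return LINK_TAG.sub("", html)

def strip_links_in_dir(folder, pattern="*.html"):
    """Rewrite every matching file under folder in place (hypothetical helper)."""
    for path in Path(folder).rglob(pattern):
        text = path.read_text(encoding="utf-8")
        path.write_text(strip_links(text), encoding="utf-8")
```

For example, `strip_links('<p>See <a href="x.html">this</a> page</p>')` returns `'<p>See this page</p>'`.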

PB6 Day 2

*Advisory (and Apology): This post is finally going up early Wednesday afternoon. A tiring flight back — we sat in an un-air-conditioned aircraft on the ground in Atlanta for half an hour — combined with a delightful series of Father’s Day activities delayed my finishing the post. (Oh, and a cold I caught at some point still tugs at the corners of my ability to focus.)*

I’m writing this summary in Boston’s Logan Airport. It’s the Saturday morning after the second and final day of Project Bamboo’s sixth, and final, planning workshop. The Bamboo Technology Program is well under way, and, I believe, the proposal will be submitted to Mellon some time soon.

The piece that remains is the social component — perhaps ironic given our current era’s focus on social graphs — and it seems the hardest one to get right. Bamboo’s innovation is not to build tech nor is it to build a community: there have been plenty of efforts to do both. But never has anyone aspired to build both together.

And so the second day was focused on building the consortium that will, at first, seek to support the technology program and be the dialogic partner that ensures that technologies support theories and methodologies and, in turn, reveal and make possible new theories and methodologies.

Our task for the day was a set of interconnected steps: determine the scope of the three working groups that will be the consortium’s first working social units, enumerate three deliverables for each WG, and describe how the WG will form.

The three working groups are:

* the Consortium WG, which is to establish the organizational and leadership structure as well as define membership dimensions and benefits,
* the Community Outreach WG, which is to develop and initiate a variety of communications efforts to reach out to interested parties and orgs both within and across campuses, and
* the Bamboo Labs WG, which is to outline the nature of how individual labs and centers can be involved in BTP initiatives as well as what seed grants and fellowships might look like.

Because my mind is not naturally drawn to abstract organizational matters, I decided to join the consortium group. (Sometimes you have to work against your own grain. I can only hope I didn’t impair the group’s functioning in doing so.)

The first thing we decided was that social openness had to be a working principle, working in tandem with technological openness to make it as easy and as welcoming as possible for individuals and organizations to explore Bamboo’s communities and technologies. To that end, we engaged in some semantic re-jiggering, if you’ll allow me to use that term here, in order to open up *membership*: we converted the proposed *Partner* tier into *Executive Partner* and the proposed *Member* tier into *Contributing Partner*. The resulting matrix looks like this:

| Tier | Commitment | Benefits |
|------|------------|----------|
| Executive Partner | $20,000+ cash; $100,000+ in-kind | Strong presence on governing board; ability to influence technologies and standards that will determine course of digital humanities |
| Contributing Partner | $4,000+ cash; $19,000+ in-kind | Presence on governing board; ability to vote on board members and other important decisions; ability to be first to adopt new technologies and repositories |
| Member | $250 – $500 cash | Access to technologies and repositories |
| World/User | | |

That’s a very quick sketch done as an HTML table, and so forgive me if it doesn’t reveal the fact that there are gradations within the tiers as well as the host of benefits and other matters we discussed. I think the point we were trying to make is that what Bamboo is looking for is people’s time: we want partners to invest time and we want members and potential members and users to invest time as well.

The working groups went through several iterations, and it became clear that, well, clarity is key. Clarity achieved through communication, both internally and externally. But by this point in the day, we needed to begin to wrap up and to have some concrete tasks to achieve. My sense is that the BTP has such tasks and deadlines: I fear that the BOP, or the Bamboo Organizational Program — the social side of Bamboo, er, the consortium — did not quite get there. My hope is that there will be a lot of post-workshop communication and activity.

I volunteered to co-chair the website development working group, which was broken out on its own because the website needs to get done, and getting it done requires working outside the scope of the Community Outreach WG.

Project Bamboo Workshop 6, Day 1

It’s the end of the first day of Project Bamboo’s Workshop 6, which represents an opportunity for the larger (arguably still emergent) community to shape a response to the new context, which is, as I understand it, a function of the Mellon Foundation’s merging of its Research in Information Technology program with its Scholarly Communications program.

In the interval between this change in context and the workshop itself, the core PB team has worked with a group of universities that had early on identified themselves as likely partner-level contributors to whatever it is we’re building. That work has resulted in the Bamboo Technology Project.

The goal of the BTP is to identify “strategic areas of work” within which they can plan and, in the case of Phase I projects, build something — because across the board any number of us agree that it’s time for Bamboo to make something, to have an identifiable product that we can show to colleagues and administrators and others that reveals the potential profit in universities and other organizations collaborating in an open way to build services, software, and standards for knowledge creation and distribution. The list of partners is impressive. (I will list them in an update to this post.) The four major areas of work to be completed in Phase I are: work spaces, scholarly web services, collections interoperability, and corpora space. (Phase I is to last eighteen months, as is Phase II to follow.) The first three areas already have some pieces in place that the BTP hopes to build upon and, at the same time, begin to draw together into the kind of whole that is the promise of Bamboo.

For work spaces, HUBzero and an ECM (Enterprise Content Management) system will be the foundations for further work.

For scholarly web services, the partner institutions will be able to draw upon a number of projects, including, but not limited to, PhiloLogic, Perseus, CLARIN, SEASR, and Prosopography. (Links to follow.) Most of these services offer some or all of what are becoming the usual analytical tools for textual scholars: document mapping, concordance, collocation, frequency, etc. Collection interoperability will focus on metadata interchange.
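For readers unfamiliar with these tools, here is a toy Python sketch of what two of them, frequency and concordance, compute. It is not drawn from any of the projects named above; lowercasing everything and treating runs of ASCII letters and apostrophes as words are simplifying assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequencies(text, n=5):
    """The n most common tokens with their counts."""
    return Counter(tokenize(text)).most_common(n)

def concordance(text, keyword, width=3):
    """Keyword-in-context lines: each occurrence of keyword (in brackets)
    with up to `width` tokens on either side."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

sample = "The mule ran. The man saw the mule."
print(frequencies(sample, 2))
print(concordance(sample, "mule", width=2))
```

Collocation builds on the same idea, counting which words turn up inside those windows more often than chance would suggest.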

The one area of work that will not be built but will be subject to planning in Phase I is corpora space, which is going to focus on the production of five or so white papers as well as identifying some high priority/profile corpora that can be targeted for a project. (I would like this to be a folklore corpus, of course.)

There are other projects and plans within the BTP, but much of the morning was focused on determining the kind of consortium that would, during this transitional period, support the BTP projects. This is, of course, the reverse of Bamboo’s ultimate goal, but I think it rightly puts resources and imaginations in motion. A number of organizations have stuck with the planning process now for two years, and we will, I think, continue to stick with it because we believe in the greater good that Bamboo seeks to serve. What we need are tangibles to show to others to concretize our participation and to act as an incentive for others to join.

Once more firmly established, Bamboo can do a lot of good, if it can negotiate the somewhat crowded waters of already existing as well as emerging organizations, coalitions, and other consortia with similar goals and/or visions. E.g., CHCI, CenterNet, and now CHAIN. Part of what I think Chad Kainz was struggling to articulate in trying to develop an organizational structure for Bamboo was to make as many people and institutions feel included as is humanly possible. (In all honesty, humanists and their organizations can be a fairly territorial lot, as contradictory as that seems to the rhetoric that we so often deploy.)

One of the things it could do, and the focus of our table’s conversation not once but twice during the day, is develop a federated researcher/user identification system for the humanities. Think Thomson Reuters’ ResearcherID, but open source and run by a collaboration of member organizations, and even non-member organizations. Throw in DOIs for publications, projects, datasets, tools, and workflows and you have not only a very powerful, and searchable, data stream but one that fits within every organization’s existing workflows of annual reports and assessments and every individual scholar’s workflow of vita maintenance. And it would be a natural component of, and connection to, institutional repositories. (I will link to the small presentation I pulled together for my colleagues at UL-Lafayette in an update.)

*UPDATE*: [The document is here.](http://www.scribd.com/doc/33595752/An-University-Institutional-Repository)

There was a lot more that happened today. Some of it can be gleaned from Chad and David’s slide decks, which I hope they make available later, and some of it can be found in the planning documents, which may be available on the Bamboo website. For now, I will leave off my summary of the day here.

Strandbeests

This one-minute video produced by BMW is a nice introduction to the work of Theo Jansen, who is both an engineer and an artist. He builds kinetic sculptures that “live” on Dutch beaches (*strand* is Dutch for “beach”). They capture their energy from the wind and use it to live on the dynamic strip of land between the sea itself and the dry sand of the beach. At the end of the video, the YouTube UI presents you with a host of other possible selections. Click around, or visit Jansen’s website, Strandbeest.

Images of the Week

A collection of images from this week’s news and events:


The Japanese Spacecraft Ikaros Successfully Deployed Its Solar Sail


The Last NASA Space Shuttle Lifted Off


John Howe’s Imagining of the Opening Events in “The Lord of the Rings” — for some reason this image just captured my imagination. I think it’s the combination of the gathering storm in the far background, the sunlit valley in the middle, and the wizard’s urgent strides in the foreground. Few painters these days work all three grounds like the Dutch Masters once did. And I have long loved the work of Brueghel et al.

Ruskin on Privileging Certain Forms of Imagination

In his 1857 lecture on “Influence of Imagination in Architecture” to members of the Architectural Association, John Ruskin noted:

If we see an old woman spinning at the fireside, and distributing her thread dexterously from the distaff, we respect her for her manipulation — if we ask her how much she expects to make in a year, and she answers quickly, we respect her for her calculation — if she is watching at the same time that none of her grandchildren fall into the fire, we respect her for her observation — yet for all this she may still be a commonplace old woman enough. But if she is all the time telling a fairy tale out of her head, we praise her for her imagination, and say, she must be a rather remarkable old woman.

From The Two Paths (George Allen edition of 1906), page 136.