Speaking of Legend Corpora

Working with these texts for my paper at this year’s meeting of [ISCLR](http://contemporarylegend.org) (International Society for Contemporary Legend Research), I remembered that I have an entire mailbox dedicated to emails sent to me by friends and family that struck me as “net lore” (which is the name of the mailbox, by the way). I just checked and the archive reaches back to 2003. (And I think I have an older archive somewhere on disk.) My goal in the months to come is to find a way to slice the 56 MB text file into individual text files that are appropriately named, perhaps by subject line and date. My guess, and it’s only a guess right now, is that making these files available in plain text, with something like the following filename as a primitive form of `metadata`, is going to be the most efficient form of sharing:


I think I can figure out how to write a Python script to do that. While I know that a better set of metadata might include who the texts were from and the trace route for them, I am unwilling to imperil the privacy of my correspondents. Plus, I think most folklorists are going to be chiefly interested in the texts. (We’re still playing catch-up to the notion of social graphs. *Sigh*.)
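A rough sketch of what such a script might look like, offered as a starting point and not a finished tool. It assumes the mail client exported the mailbox in the common mbox format (I have not verified that for this archive), and the filename pattern of date, then subject, then a running number is only an illustration. The function names `slugify`, `decoded_subject`, `plain_text_body`, and `split_mbox` are made up for this example. Only the message body is written out; the sender and routing headers are deliberately left behind.

```python
import mailbox
import re
from email.header import decode_header, make_header
from email.utils import parsedate_to_datetime
from pathlib import Path


def slugify(text, max_length=60):
    """Reduce a subject line to a short, filesystem-safe slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return slug[:max_length].rstrip("-") or "no-subject"


def decoded_subject(message):
    """Return the Subject header as readable text (empty if missing)."""
    raw = message.get("Subject", "")
    try:
        return str(make_header(decode_header(raw)))
    except (LookupError, UnicodeError):
        return str(raw)


def plain_text_body(message):
    """Return the first text/plain part of a message, or an empty string."""
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            return payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset label: fall back to UTF-8
            return payload.decode("utf-8", errors="replace")
    return ""


def split_mbox(mbox_path, out_dir):
    """Write one .txt file per message; return the filenames written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, message in enumerate(box):
            try:
                sent = parsedate_to_datetime(str(message.get("Date", "")))
                date = sent.strftime("%Y-%m-%d")
            except (TypeError, ValueError):
                date = "undated"
            # Assumed naming scheme: date_subject_number.txt
            name = f"{date}_{slugify(decoded_subject(message))}_{index:05d}.txt"
            (out / name).write_text(plain_text_body(message), encoding="utf-8")
            written.append(name)
    finally:
        box.close()
    return written
```

Calling `split_mbox("netlore.mbox", "netlore-texts")` (both paths are placeholders) would fill the output folder with one file per message. One caveat: forwarded emails often carry earlier senders’ names and addresses inside the body itself, so the bodies would still need a pass to scrub those before anything is shared.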

Once I’ve got the collection put together, my best guess is that I will make it available through something like GitHub or BitBucket. Neither is really designed to support this kind of thing, but both are oriented towards public repositories and both make forking projects very simple. It would be interesting if researchers interested in this material, folklorists among them, could find some way to have projects remain connected in some fashion. Both GitHub and BitBucket make it possible to follow the chain of forked projects and also for users to “follow” those projects and make comments or even fold those advances back into their own projects. (How cool would that be?)

In case you are wondering about the actual texts involved: they are an admixture of jokes and legendry. Some of the materials are quite topical (and racist):

> It seems that once again,
> all us white folks have missed
> a great opportunity.
> While all the black people attended
> Obama’s inauguration and parades,
> we should have broken into their homes
> and gotten all our shit back.

And some of the materials, like the joke referenced in the file name above, have been around for quite some time on the internet and probably in oral circulation before that:

> A man was riding his Harley along a California beach when suddenly the sky clouded above his head. In a booming voice, the Lord said, “Because you have tried to be faithful to me in all ways, I will grant you one wish.” The biker pulled over and said, “Build a bridge to Hawaii so I can ride over anytime I want.” The Lord said, “Your request is materialistic. Think of the enormous challenges for that kind of undertaking; the supports required to reach the bottom of the Pacific and the concrete and steel it would take! I can do it, but it is hard for me to justify your desire for worldly things. Take a little more time and think of something that could possibly help mankind The biker thought about it for a long time.
> Finally, he said, “Lord, I wish that I, and all men, could understand our wives. I want to know how she feels inside, what she’s thinking, why she cries, what she means when she says nothing’s wrong, and how I can make a woman truly happy.”
> The Lord replied, “You want 2 lanes or 4 on that bridge

(Please note that the period and the closing quotation mark are missing in the original.)

Any feedback on how to proceed is quite welcome.

Big Data

My friend and colleague Gwydion provides a nice overview of [big data for humanists](http://babeltech.wordpress.com/2012/12/07/tech-wednesday-talk/). It won’t be useful for people already doing the work, but it is useful for how he pitches it to those who have not thought, or don’t want to think, about it.

Open Data Commons

[Open Data Commons](http://opendatacommons.org/) “is the home of a set of legal tools to help you provide and use Open Data.” They have a lovely write-up of why open data matters:

> Why bother about openness and licensing for data? After all they don’t matter in themselves: what we really care about are things like the progress of human knowledge or the freedom to understand and share.

> However, open data is crucial to progress on these more fundamental items. It’s crucial because open data is so much easier to break-up and recombine, to use and reuse. We therefore want people to have incentives to make their data open and for open data to be easily usable and reusable — i.e. for open data to form a ‘commons’.

Interactive Map of U.S. Migration

Forbes has a fantastic dynamic map of migration statistics drawn from IRS data. The migration is internal to the U.S., but clicking on cities or areas reveals patterns that make you ask questions. Here’s a screen shot for the map with Lafayette as the focus:

What’s the inbound migration from southern California and Nevada? Are those migrant workers?


Google, Microsoft, and Yahoo have gotten together to adapt a collection of microformats that will make it possible for folks who produce and publish content to the web to make searching that content more meaningful:

> Most webmasters are familiar with HTML tags on their pages. Usually, HTML tags tell the browser how to display the information included in the tag. For example, `<h1>Avatar</h1>` tells the browser to display the text string “Avatar” in a heading 1 format. However, the HTML tag doesn’t give any information about what that text string means — “Avatar” could refer to the hugely successful 3D movie, or it could refer to a type of profile picture—and this can make it more difficult for search engines to intelligently display relevant content to a user.

> Schema.org provides a collection of shared vocabularies webmasters can use to mark up their pages in ways that can be understood by the major search engines: Google, Microsoft, and Yahoo!

> You use the schema.org vocabulary, along with the microdata format, to add information to your HTML content. While the long term goal is to support a wider range of formats, the initial focus is on Microdata. This guide will help get you up to speed with microdata and schema.org, so that you can start adding markup to your web pages.
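To make the idea in that passage concrete, here is a small sketch of what microdata markup looks like and how a program might read it. The sample markup is my own simplified illustration built around the “Avatar” heading from the quote: the `itemscope`, `itemtype`, and `itemprop` attributes are part of the microdata format, but the `MARKUP` string, the `MicrodataReader` class, and the `read_microdata` function are invented for this example. It uses only Python’s standard library and makes no claim about how any search engine actually parses pages.

```python
from html.parser import HTMLParser

# Illustrative markup: the same <h1> as in the quote, now labeled as
# the "name" of a Movie item.
MARKUP = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <span itemprop="genre">Science fiction</span>
</div>
"""


class MicrodataReader(HTMLParser):
    """Collect the item type and the text of each itemprop element.

    Deliberately simple: it handles one flat item and assumes that
    itemprop elements contain plain text with no nested tags.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._open_prop = None  # name of the itemprop we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        if "itemprop" in attrs:
            self._open_prop = attrs["itemprop"]
            self.properties[self._open_prop] = ""

    def handle_data(self, data):
        if self._open_prop is not None:
            self.properties[self._open_prop] += data

    def handle_endtag(self, tag):
        # Any closing tag ends the current property (flat markup only).
        self._open_prop = None


def read_microdata(markup):
    """Return (item_type, properties) found in a snippet of HTML."""
    reader = MicrodataReader()
    reader.feed(markup)
    reader.close()
    return reader.item_type, reader.properties
```

Run on the sample, `read_microdata(MARKUP)` reports that the page describes a `http://schema.org/Movie` whose name is “Avatar”, which is exactly the distinction the bare `<h1>` tag cannot make.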

Using Lightroom

Photography is part of my research, and I also enjoy photographing my family and just generally documenting my world — more on that as my next potential project later. Between those various interests and commitments, I have about 15,000 images, all of which are safely cataloged by Adobe’s Lightroom. (I tried Aperture when it premiered at an unbelievable price point on the Mac App store, but either I have worked with Lightroom too long and couldn’t figure out how to access Aperture’s features or it doesn’t have the functionality on which I now depend that exists in Lightroom.)

I get a lot of questions about using Lightroom from students and colleagues. From now on, I am telling everyone to [start here](http://www.mulita.com/lightroom/tutorialpodcast45/). That link takes you to [George Jardine’s website](http://mulita.com/blog/) and the half-hour tutorial he recorded on the basics of image management with Lightroom.

If the tutorial convinces you to try Lightroom, then you should also read [Rob Sylvan’s “10 Things I Wish I Could Tell Every Lightroom User.”](http://photofocus.com/2009/10/16/10-things-i-wish-i-could-tell-every-new-lightroom-user/)

iPhone Tracker on GitHub

Apple’s latest update to iOS fixes the problem of making the location services cache easily available on your computer, but before you update, you might still enjoy seeing how much information about you is available. How widely available it is is a matter for a separate discussion.

I tried out the app on myself, just before I updated, to see what the results look like:

It’s pretty much what you expect: it shows that I live most of my life within Lafayette, where I live and work, and the city’s environs, where I do research. What I found interesting, since the app offers this data as an animated timeline, are the brief flowerings that occurred thanks to travel I have done over the past year.

Viewed within a historical perspective, and internally, this information raises no great concerns for me. Viewed as a chance to market to me, it gives me some concerns. Viewed as a particularized and dynamic tracking of my movements … I don’t like it at all.