Course Stakes (for a Course on Digital Folklore)

A prolonged quarantine thanks to a worldwide pandemic provides a welter of opportunities to think about why I do the things I do. To a lesser degree, or at least within a smaller scope, designing a new course also asks me to think about why I do the things I do. Designing a course in the middle of a pandemic, and designing it to be remote from start to finish, something I have never done before, is not so much an opportunity to ask myself that question as a prerequisite to knowing what to do at all.

In the Fall of 2020, I will offer for the first time a Digital Folklore and Culture class. I already offer a course on online legendry, but I imagine this course as more foundational, a course that might eventually be a prerequisite for the legend course such that students taking the latter course could be counted on to know a select set of theories and approaches to online culture(s) and possess a select set of skills to which they have already been introduced and have even practiced a time or two.

But what are those approaches and what are those skills?

First, I think students need to know how the internet works (and they don’t). To be honest, I think every university student should know how the internet works. It’s too important. Understanding how pages are made up of pieces of text, code, and images delivered from other computers and that those pieces can be tracked (for whatever reason) is something every single person using the internet should know.

That is, everyone should know HTML. Everyone. It’s not like knowing how your car works. You have a mechanic for that, and you develop a relationship with your mechanic, so you can trust him or her to translate things. The same applies for other trades and even other systems. (Dealing with a legal matter? Then you should probably have a lawyer help you.)

The assumption I am making is that for most complex systems, and a car is a complex system, we have curators/intermediaries who, if we are lucky and/or careful, we can trust. Once upon a time if you wanted information, you headed to your local library, and you had not only the librarian behind the counter to act as a curator, but the library itself and the librarians who staffed the acquisitions and cataloging offices to curate and intermediate for you.

The internet is the sine qua non of disintermediation, and what is dis-intermediated is information itself.

That could be, so we hoped, an enormously powerful thing, but that was before the internet was, first, commercialized and, second, weaponized. The latter phenomenon builds off the former.

The commercialization of the internet that I am focused on here is not Amazon or anyone else: it is the same commercialization of media that has been the foundation of American information systems since the nation’s founding, and it’s commercials. The simplest version of this is the time-worn adage that probably dates to Richard Serra and Carlota Fay Schoolman’s “Television Delivers People” (1973) but is probably best known in the phrasing it eventually took: “If you are not paying for it, you’re not the customer; you’re the product.” (See Quote Investigator for a fuller history.)

It wasn’t that long ago that the commercialization process was mediated by relatively few outlets: the handful of television and radio stations, newspapers, and magazines to which we had access. Say what you want about those old media outlets, but they drew lines between what was information and what was advertising. I’m not saying they weren’t blurred, but there was a discussion, and there were at least some individuals acting on the user’s behalf without the user being directly involved.

Now we are all directly involved. The outlets are still there, but, thanks to a publishing platform like Facebook and a search outlet like Google, they are broken into tiny pieces of information and the platforms decide who gets what pieces. The result is the kind of information bubbles that conservatives once lamented, but it’s capitalism itself, enabled by the power of information technologies, that has gotten us here.

Organizations can now reach us one-by-one and, just as importantly, know who we are and know how to approach us, and this is all a result of the way the web appears before us either in a browser or in an app on our phone — many social media apps are little more than custom browsers.

And we all need to know that. To my mind, whether universities are teaching students skills they can use in the future or simply making them better citizens, they had better be teaching them how the internet works.

Compare Lists in Python

If you search for how to compare two lists in Python, you will find a lot of helpful pages in a lot of places, many of which assume you are working with numbers or you want exact matches. But what if you want to compare all the items in one list with all the items in another list and you want to be able to set some arbitrary measure of similarity or difference?

The problem arose for me recently when I was trying to compare two lists of different lengths. The two lists represented keyword sets derived from a corpus using NMF, which I had run with two different component values. As part of wanting to discover a probable “best fit” I wanted to compare which strings had remained the same and which had changed to some degree.

My first impulse was to try the Jaccard coefficient, and I used some simple code to make that work:

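def jaccard_similarity(query, document):
    # Convert each collection to a set once so duplicates are ignored
    query_set, document_set = set(query), set(document)
    intersection = query_set.intersection(document_set)
    union = query_set.union(document_set)
    # Two empty collections share nothing; avoid dividing by zero
    if not union:
        return 0.0
    return len(intersection) / len(union)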
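
To make the arithmetic concrete, here is a small worked example. The two keyword strings are made up for illustration (they are not from my corpus), and the function is repeated only so that the snippet runs on its own:

```python
def jaccard_similarity(query, document):
    """Share of distinct items that two collections have in common."""
    query_set, document_set = set(query), set(document)
    union = query_set | document_set
    if not union:
        return 0.0
    return len(query_set & document_set) / len(union)

# Hypothetical keyword strings, invented for this example
topic_a = "people world change future technology"
topic_b = "people world change life story"

# Split each string into words so that words, not characters, are compared
score = jaccard_similarity(topic_a.split(), topic_b.split())
print(f"{score:.2f}")  # 3 shared words out of 7 distinct words: 0.43
```

Note the split: if you pass the raw strings instead, set() breaks them into individual characters and you end up measuring shared letters rather than shared words.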

I then embedded that bit of code, but it could be any code you wanted, in the following:

for jk, jv in enumerate(second_list):
    for ik, iv in enumerate(first_list):
        # any comparison you like can go here
        score = jaccard_similarity(iv, jv)

The logic is pretty simple, but it is a leap, at least for me, in terms of how I think about things. When I started work on this, I kept trying to pack everything in one for loop: after all, I wanted to compare one list to another. But I wanted to compare all of one list with all of another list, which means I needed to iterate through both lists. A simpler version of this would be:

for j in second_list:
    for i in first_list:
        ...  # compare i with j here

The addition of enumerate above was so that I could keep track of which string in each list was matching without necessarily having to see the string itself; I could use the index values that enumerate produces to call those, if I needed. enumerate is one of those functions I regularly forget, and it is very convenient: essentially it takes a list of items and yields a sequence of tuples in which the first value is the item’s index and the second value is the item itself, so list(enumerate(['a'])) gives [(0, 'a')]. You can call the parts of the tuple by any variable name you like, but I tend to stick with k and v, for key and value, because … well, because. (It could easily be anything else, and I’ve even written code that called three-item tuples with rather bland, and thus also not advisable, t, u, v. Do not do this.)
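
If, like me, you regularly forget what enumerate hands back, a tiny demonstration helps. The list below is a placeholder I made up for the purpose:

```python
# Hypothetical list, just to show what enumerate produces
keywords = ["alpha", "beta", "gamma"]

# enumerate is lazy, so wrap it in list() to see the tuples all at once
pairs = list(enumerate(keywords))
print(pairs)  # [(0, 'alpha'), (1, 'beta'), (2, 'gamma')]

# In a for loop, each tuple unpacks into an index and the item itself
for k, v in enumerate(keywords):
    print(k, v)
```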

So essentially both the for loops above are transforming each of the lists involved into a list of tuples and then walking through the list, comparing the items themselves but reporting only their indices.

It doesn’t really matter which list is which, so far as I can tell, so long as you keep the variables correctly aligned. My final code block looked like this:

print("Jc = Jaccard coefficient")
for jk, jv in enumerate(topics_45):
    for ik, iv in enumerate(topics_35):
        # Score the word lists once and reuse the value in the report
        score = jaccard_similarity(iv.split(" "), jv.split(" "))
        if score > 0.5:
            print(f"35-{ik} and 45-{jk} have a Jc of {score:.2f}.")
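
Since the comparison “could be any code you wanted,” one way to keep that flexibility is to wrap the nested loops in a function that takes the similarity measure and the threshold as arguments and returns the matches instead of printing them. This is only a sketch: similar_pairs is a name I am making up here, and the two topic lists are invented stand-ins for my real keyword sets.

```python
def jaccard_similarity(query, document):
    """Share of distinct items that two collections have in common."""
    query_set, document_set = set(query), set(document)
    union = query_set | document_set
    if not union:
        return 0.0
    return len(query_set & document_set) / len(union)

def similar_pairs(first_list, second_list, similarity, threshold=0.5):
    """Compare every string in one list with every string in the other.

    Returns (first_index, second_index, score) for each pair whose
    score is above the threshold. (Hypothetical helper, not a library call.)
    """
    matches = []
    for jk, jv in enumerate(second_list):
        for ik, iv in enumerate(first_list):
            # Compare words, not characters, and score each pair only once
            score = similarity(iv.split(), jv.split())
            if score > threshold:
                matches.append((ik, jk, score))
    return matches

# Invented keyword strings standing in for two NMF runs
topics_small = ["people world change future", "brain mind science research"]
topics_large = [
    "people world change life",
    "brain mind science research",
    "music sound play song",
]

matches = similar_pairs(topics_small, topics_large, jaccard_similarity)
for ik, jk, score in matches:
    print(f"small-{ik} and large-{jk} have a Jc of {score:.2f}.")
```

Swapping in a different measure then means passing a different function, and the list of matches is there to be reused later.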

My next step is to determine how to transform this into a network or tree so that I can see which keyword clusters continue (relatively) unchanged, where I set the threshold for “relatively” (perhaps ending up with something other than the Jaccard coefficient, which doesn’t seem terribly discriminating), and also where clusters split or, in a few cases, disappear/die.
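
I have not built that yet, but a first rough pass might not need a network library at all. Assuming the matches have been collected as (first index, second index, score) tuples, a plain dictionary is enough to label each cluster in the smaller run. Everything here is hypothetical: the classify_clusters name, the labels, and the sample matches are placeholders rather than results.

```python
from collections import defaultdict

def classify_clusters(n_first, matches):
    """Label each cluster in the first list by how many matches it has.

    n_first is the number of clusters in the first list; matches is a
    list of (first_index, second_index, score) tuples.
    """
    links = defaultdict(list)
    for ik, jk, _score in matches:
        links[ik].append(jk)

    fates = {}
    for ik in range(n_first):
        targets = links.get(ik, [])
        if not targets:
            fates[ik] = "disappears"  # nothing above the threshold
        elif len(targets) == 1:
            fates[ik] = "continues"   # exactly one close match
        else:
            fates[ik] = "splits"      # several close matches
    return fates

# Made-up matches for three clusters in the first list
matches = [(0, 0, 0.60), (1, 1, 1.00), (1, 2, 0.55)]
fates = classify_clusters(3, matches)
print(fates)  # {0: 'continues', 1: 'splits', 2: 'disappears'}
```

The links dictionary is already an adjacency list, so it could be handed to a graph tool later if a proper network or tree turns out to be worth drawing.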

These Books

At least two newsletters arrived in my inbox this week using this stock photo of books. I’ve seen the image used elsewhere, but seeing it twice on the same day made me wonder “Whose books are these?” @ me on Twitter if you know.