Thank you, NASA, for keeping some of our dreams of space alive.
I’m continuing to have difficulties within R: suddenly
rJava doesn’t work and a version is not available for R 3.3.0. Time to re-activate 3.2.4:
sudo port activate R @3.2.4
3.3.0 is still around. Whenever the various R packages have caught up with “Supposedly Educational”, I can re-activate it and then uninstall 3.2.4, whose nickname I have forgotten:
sudo port uninstall R @3.2.4
Suddenly, when starting up ipython/Jupyter notebook, the R kernel won’t load. I’m getting the following error:
[W 10:21:13.833 NotebookApp] Notebook Code/notebooks/Syuzhet_of_Small_Stories.ipynb is not trusted
[W 10:21:13.887 NotebookApp] 404 GET /nbextensions/widgets/notebook/js/extension.js?v=20160521102047 (::1) 11.86ms referer=http://localhost:8888/notebooks/Code/notebooks/Syuzhet_of_Small_Stories.ipynb
[I 10:21:14.266 NotebookApp] 302 GET /notebooks/Code/images/R-dendrogram.png (::1) 1.95ms
[I 10:21:14.476 NotebookApp] Kernel started: f2fee2d4-fdc2-4bd7-8cc5-8cd628894807
Error in loadNamespace(name) : there is no package called ‘IRkernel’
Calls: :: ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted
Let’s work through these difficulties one line at a time:
Notebook Code/notebooks/Syuzhet_of_Small_Stories.ipynb is not trusted
The documentation for iPython notebook states:
Sometimes re-executing a notebook to generate trusted output is not an option, either because dependencies are unavailable, or it would take a long time. Users can explicitly trust a notebook in two ways:
At the command-line, with:
ipython trust /path/to/notebook.ipynb
After loading the untrusted notebook, with File / Trust Notebook
After running that, I still get the following:
The next failure point appears to be:
Error in loadNamespace(name) : there is no package called ‘IRkernel’
What the what the?
I’ve been running R in iPython notebook for the past week. What happened to the IRkernel package? When I run R and simply try to re-install it:
Warning message: package ‘IRkernel’ is not available (for R version 3.3.0)
Ack. At some point MacPorts updated R when I wasn’t paying attention, which is the problem with package managers and it’s really my responsibility. So let’s see if I can un-install 3.3.0 and if that clears things up … too complicated.
BUT there is a solution and it can be done from within R (3.3.0):
install.packages(c('pbdZMQ', 'repr', 'devtools')) # repr is already on CRAN
devtools::install_github('IRkernel/IRdisplay')
devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec() # to register the kernel in the current R installation
4 simple steps and everything is hunky-dory (why one would want a muscular skiff, I don’t know).
With Matt Jockers coming next week, I thought I would try out some of his recent work on the shapes of stories and share the results with everyone. Those familiar with Jockers’ work will know that these shapes are based on sentiment analysis on a large corpus — a frighteningly large corpus — of novels. (To be clear, I’m not entirely clear on what’s in and what’s not.) While the size of the corpus is awe-inspiring, for humanists anyway, it is made up of very complex texts to which, for this folklorist anyway, the term story applies rather loosely.
It’s all a matter of scale, and before pursuing this matter of scale in theory, it might help to have a point of comparison, something at the other end of the scale, as it were. So, with a typical novel in your mind, let me give you an example of the kinds of texts with which I work, the kind of text I imagine when I use the word story:
I went to meet an old man in Marrero, and he told me a story. He went to look for a treasure with some other men. And there was a controller who had brought a Bible to control the spirits. And when they arrived at the site, they saw a big horse coming through the woods with a man riding it, and when he dismounted, it was no longer a man on the horse. It was a dog. And he said the dog came and rubbed itself against his legs. He said it was growling. He said he knew the dog was touching him, but he didn’t feel anything. It was like there was just a wind. And he said they all took off running. He lost his hat and his glasses and he tore all his clothes. And even the controller ran off and he never saw his Bible again after that.
The story is a single-episode affair of a mere 153 words, with only 83 unique words being used to bring a fairly robust storyworld to life. Let’s postpone the usual discussions about dimensionality and turn to an application of the
syuzhet library to this small story:
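library(readr)    # provides read_file()
library(syuzhet)  # provides get_sentences() and get_sentiment()
anc089 <- read_file("../texts/legends/anc-089.txt")
anc089_v <- get_sentences(anc089)
anc089_sentiment <- get_sentiment(anc089_v, method="syuzhet")
anc089_sentiment  # print the per-sentence values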
 0.00 0.75 0.00 0.25 0.00 0.00 -0.60 0.00 0.50 0.00 -0.75 -0.25
A couple things to note about this first application of the library: first, for me,
syuzhet’s loading of files was a bit wonky, so I am using the
readr library instead. Second, as you can see from the listing of values,
syuzhet evaluates sentiment by sentence and there are only 12 sentences here. The standard plot for these sentences looks like this:
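plot(anc089_sentiment, type = "l",  # line plot of the sentence values (opening of the call reconstructed)
     main = "Example Plot Trajectory",
     xlab = "Narrative Time",
     ylab = "Emotional Valence")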
Longer texts will obviously produce more lines, and more discernible shapes, but, for this discussion, we are starting here, so we need to find some way to represent this as more of a shape, and, for the sake of simplicity, I am going to jump a little ahead in the various plot shapes Jockers discusses and use the one based on percentage values.
get_percentage_values puts things into 10 bins by default, but that number is too big for stories this small:
Error in get_percentage_values(anc089_sentiment, bins = 10): Input vector needs to be twice as long as value number to make percentage based segmentation viable
As a quick workaround, I divided the length in half, and, as we move forward, I will need to remember to create some sort of if statement:
if length(_v)/2 < 10... to accommodate the smaller texts with which I work. (Forgive the use of a goofy variable name, it was the first thing that came to mind as I was working.)
abs <- floor(length(anc089_v)/2) # abs : arbitrary bin size; floor() keeps it a whole number for odd sentence counts
anc089_percent_vals <- get_percentage_values(anc089_sentiment, bins = abs)
plot(anc089_percent_vals, type = "l",  # opening of the call reconstructed
     xlab = "Narrative Time",
     ylab = "Emotional Valence",
     col = "red")
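The if statement mentioned above can be sketched as follows. This is a minimal illustration in Python rather than R, and the helper names `choose_bins` and `bin_means` are hypothetical. The chunking in `bin_means` is a simple stand-in for the idea of percentage-based means and is not claimed to match how `get_percentage_values` splits its input internally.

```python
def choose_bins(n_values, default_bins=10):
    """Pick a bin count that leaves at least two values in every bin."""
    max_bins = n_values // 2  # largest count with >= 2 values per bin
    if max_bins < 1:
        raise ValueError("need at least two values to form a bin")
    return min(default_bins, max_bins)


def bin_means(values, bins):
    """Average `values` over `bins` contiguous, near-equal chunks."""
    n = len(values)
    means = []
    for i in range(bins):
        chunk = values[i * n // bins:(i + 1) * n // bins]
        means.append(sum(chunk) / len(chunk))
    return means


# The twelve sentence-level values printed for ANC-089 above.
anc089_sentiment = [0.00, 0.75, 0.00, 0.25, 0.00, 0.00,
                    -0.60, 0.00, 0.50, 0.00, -0.75, -0.25]
bins = choose_bins(len(anc089_sentiment))
print(bins, bin_means(anc089_sentiment, bins))
```

For a text long enough to support them, `choose_bins` simply returns the default of 10, so the same call would work for both short legends and longer texts.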
To hasten a discussion about stories and shapes, I would like to compare another version of the legend above with one I collected myself (see Laudun 2013 for details on collection and context):
One day — my family was kind of weird. Because they would always try to dig for money. So one day — I was young, about twelve I guess — so my mother, couldn’t leave me home, had to take me out there. So we went out there, to a place called the country, some property we had out there, about an acre of land. So they said form a circle. And this is — my eyes seen this myself. We formed this circle, man, my brother, my brother was preaching. He was digging in the middle. We were all around him and he was digging in the middle. Man, he took that shovel. I guess from the way it looked it must have been a shovel deep, about like this. Something went yanga yanga yanga yang. And then went boom. And when you looked again, they had a fucking coffin, man. Solid gold. Open it up, nothing but coins in there. And then a bull appeared, just appeared out of nowhere The bull had fire coming out his nose and his eye was red red red And you hadn’t supposed to talk, because it would break the chain of everything, that’s just how it was. That bull started charging I was trying to get out of there. I’m young. I don’t know what’s going on. My mother telling me just don’t move. There ain’t nothing, ain’t nothing. You just seeing things And that sucker come up from me to like where I’m sitting to you and disappeared. Now you know that scared the shit out of me I was damn near shitting in my clothes. My uncle come up in the car. And when he drove in the yard that shit exploded. And when I looked again it didn’t look like anyone had dug in the ground at all. Everything disappeared.
It’s twice as long as the previous story at 312 words and it has three times as many sentences, thanks to the somewhat staccato, disjointed nature of the teller’s delivery, which helps to replicate the confused nature of the reality being represented. As to the nature of those sentences: we will have to leave aside a conversation about the nature of transcription. Of course there is no punctuation in oral discourse, only pauses, which, absent the application of scribal syntax rules, the transcriber must represent through his or her own devices or conventions.
For the sake of comparison, we are going to use some of Jockers’ built-in techniques, which return 12 points of granularity in our representation of ANC-089 (in red again) and 36 for LAU-013 (in blue).
lau013_v <- get_sentences(read_file("../texts/legends/lau-013.txt"))
lau013_sentiment <- get_sentiment(lau013_v, method="syuzhet")
anc089_rounded <- round(length(anc089_sentiment)*.1)
anc089_rolled <- zoo::rollmean(anc089_sentiment, k=anc089_rounded)
lau013_rounded <- round(length(lau013_sentiment)*.1)
lau013_rolled <- zoo::rollmean(lau013_sentiment, k=lau013_rounded)
anc089_list <- rescale_x_2(anc089_rolled)
lau013_list <- rescale_x_2(lau013_rolled)
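plot(anc089_list$x, anc089_list$z, type="l", col="red", lwd=2, xlab="Narrative Time", ylab="Emotional Valence") # open the plot with ANC-089
lines(lau013_list$x, lau013_list$z, col="blue", lwd=2)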
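To make the norming above easier to follow, here is a rough Python sketch of the two steps: a rolling mean over a window of about ten percent of the sentences, then a rescaling of both axes so that texts of different lengths share one frame. The function names are hypothetical. The rescaling formula reflects my understanding of what `rescale_x_2` returns as `x` and `z`, and the `max(1, ...)` guard on the window is an addition of mine, not something in the R code.

```python
def window_size(n, fraction=0.1):
    """Window of roughly `fraction` of the values, never below 1 (added guard)."""
    return max(1, round(n * fraction))


def rolling_mean(values, k):
    """Mean of every run of k consecutive values (n - k + 1 results)."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def rescale(values):
    """Map positions onto (0, 1] and values onto [-1, 1]."""
    n = len(values)
    lo, hi = min(values), max(values)
    x = [(i + 1) / n for i in range(n)]
    if hi == lo:
        z = [0.0] * n  # a flat series has no range to stretch
    else:
        z = [2 * (v - lo) / (hi - lo) - 1 for v in values]
    return x, z


# With 12 sentences the window is 1, so the rolling mean changes nothing;
# with 36 sentences the window is 4.
print(window_size(12), window_size(36))
print(rescale(rolling_mean([1, 2, 3, 4], 2)))
```

One consequence worth noticing: for a story as short as ANC-089 the smoothing step is a no-op, so its line is just the raw sentence values stretched to the common frame.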
Two graphs are not much to go on, so, to see what might develop, I ran
syuzhet across two more texts, which were very similar in discursive sequence as well as being of the same “type”. (Type is in quotes there because folklorists mean something very particular by type — more on this later, I think.) If you are curious about the stories, I will include them in an appendix at the end of this post.
# Two new stories
lau014_v <- get_sentences(read_file("../texts/legends/lau-014.txt"))
lau014_sentiment <- get_sentiment(lau014_v, method="syuzhet")
loh164_v <- get_sentences(read_file("../texts/legends/loh-164.txt"))
loh164_sentiment <- get_sentiment(loh164_v, method="syuzhet")
# All the norming rolled into a line per item
lau014_list <- rescale_x_2(zoo::rollmean(lau014_sentiment, k=round(length(lau014_sentiment)*.1)))
loh164_list <- rescale_x_2(zoo::rollmean(loh164_sentiment, k=round(length(loh164_sentiment)*.1)))
# Graph of all 4 stories
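plot(anc089_list$x, anc089_list$z, type="l", col="red", lwd=2, xlab="Narrative Time", ylab="Emotional Valence") # open the plot with ANC-089
lines(lau013_list$x, lau013_list$z, col="blue", lwd=2)
lines(lau014_list$x, lau014_list$z, col="light blue", lwd=2)
lines(loh164_list$x, loh164_list$z, col="blue", lwd=2)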
Okay, some convergences, sort of, maybe?, at the 45% mark of the stories? On the one hand, these are hand-curated close matches. On the other, these are short texts, and arguably from the perspective of
syuzhet probably not long enough for particularities of narration to “even out” enough for the narrative to be fairly represented. That might mean, then, that one response to Jockers is: don’t call this the shape of stories. It’s the shape of novels.
Or, perhaps better, the shapes that
syuzhet probably draws best are those of complex secondary genres, of which the novel is the most advanced written form. The notion of primary genres and secondary genres is one introduced by the Russian literary theorist Mikhail Bakhtin. (It is most clearly delineated in his essay “The Problem of Speech Genres” (1986).) The only thing to note here is that Bakhtin argued that the larger, secondary, genres were built up of combinations of smaller, primary, genres. In this model, legends consisting of only one episode or two, rarely three in this collection, are towards one end of a continuum which passes through long forms of oral narration like the epic, multi-episodic in nature, and ends with the novel, which seems always able to absorb almost any other genre, primary or secondary. (E.g., Robinson Crusoe.)
The combinatorics of genres, as Bakhtin imagined it, was also somewhat evolutionary in scheme, with simple forms leading to more complex forms, but never with the simple forms themselves giving way. Rather, forms are free to combine in any way that the culture in which they are found finds useful or interesting. Cultural practice was the constraint for Bakhtin.
The two stories above, and the two below (as promised), are taken from a small collection of narratives that I and other folklorists have collected in Louisiana. They are the basis for a larger project on which I am working which is to create a “born digital” archive of folk narrative — with Louisiana as but one starting point but, with any luck, spreading to other parts of the U.S. (And beyond would be nice: the digital resources for culture analytics of folk culture materials are, at present, limited.)
This narrative, as well as LAU-013 above, was told by Oscar Babineaux to me while sitting in his home in Rayne, Louisiana. His stories, along with a number of narratives collected by my students, reveal a very particular African American tradition that has, as yet, been under-appreciated.
Like I said my family was weird, they liked to dig for money and stuff. Said my grandfather had left us some money, and they was digging for it So one day we went, and I was at work, so I can see, we at a country spot, like our property. So I can see a lot of people dressed in white. So I’m curious me. I said well shit what the hell is everybody doing out there dressed in white? I wanna see. So I goes out there. So they tell me you’re working right now, just go home come back. You know, come back after work. So I goes back, man, after work. So, they all in the house. We all praying man, everyone’s on their knees praying. They got an excavator in the back yard, digging. You understand? Find this money, I guess. We’re on our knees, man, we’re praying. It’s like in the pit of the summer like here. No wind nothing. They had a wind come through the house. That wind was so strong: my aunt was holding onto the door like that and both her legs was in the air. That’s how strong the wind was. In the house.
So they said — they picked me, my nephew, the one I was telling you that talk all that shit, and my little niece to go bring some water to the workers in back, the one that was doing the work. So we got to walking. We passed on the side of the house to bring them. So my nephew said, say man you see that guy in the tree? I said man fuck I don’t see nobody in no tree. He said yeah man he be right there sitting on that limb. I said I don’t see nobody man. I’m getting scared now. Man I don’t see nobody. But he’s seeing this, you know. So he said — I said how he look? It’s a guy, he said, it’s a guy dressed in a pirate suit, man. He said he got a pirate hat on. He got a pirate jacket. And he started talking to him. The guy in the tree started talking to him while he’s telling me this. But the guy in the tree is tell him shut up don’t tell me that. So he telling me, man, look he right there. You can’t see him? Look he right there on that branch. He say he want something more to drink You know, because what they had did: they’d put a bowl in the back yard, under this tree, with some alcohol in it. You understand? And I don’t know if it was the sun that would dissolve it, but it would be gone. Okay, so he say he say man he want another drink. So I said fuck man don’t tell me that. I wanna get back in the house. I said I don’t see nobody up there.
So we kept on walking. We went out there. We brung them some water. So on our way back. Look at him. He say, see you, you son of a bitch. He say you don’t wanna give me another drink, huh? He say you gonna be just like me. He say you see this here peg leg? He say you going to be just like me. He say for this out here y’all are going to have to lose something. So, man, it got kind of scared. We started walking fast. By the time we got to the house, I broke out a run. A shovel, man, come from the back of the house. I mean full force. That shovel stuck in that tree so deep we had to dig it out with an axe. It stuck — you know with a shovel, it’s hard to stick a shovel into anything. That shovel went inside the tree halfway.
This story can be found in Swapping Stories.
Have you all heard anything about hunting money? Well, that’s the thing I’m gonna get involved with here. I went on a hunting expedition. And people don’t know what actually goes on. I was invited to go on this trip, and they explained to me what it was. I’d never heard of it before. That was news to me. Okay, people a long time ago (pause) they claimed they buried their money. That was back when they had the slaves and all that there. And they, this old slave owner, he’d have a lot of money to bury. Well, he’d take his most-trusted slave he had. His old slave. And he’d take him with him. Well, he’d go out and he’d pick him out a spot where he wanted it. And he’d tell the old slave, “Now, I want you to dig right here.” And he’d put it about four or five foot deep.
Well, the old slave’d be down there digging. When he thought it was deep enough, the old slave owner’d tell him, “Now, look. I want you to promise me something. That you’ll guard this money as long as you can.” And the old slave’d say, “Yeah, I’ll do that, boss.” Well, then he would, he’d shoot him. Kill him. Well, then the owner’d cover the hole up. He was the only man that knew where it was. That slave down in there, the belief was, that the slave, his spirit, would continue to guard that money.
Well, these people decided how to get around that. Well, I went off on this trip with ’em. I saw things I thought I’d never see again. They told me when they went, they said, “You gotta be pure.” Now, these are grown men. These ain’t little boys. They said, “You can’t have any dealings with your wife for a week, at least seven days before this hunt.” Well, these guys hired a man from the other side of Houma to come over here. He was supposed to be a professional finder. Well, he done it for a fee. They had to pay him, his expenses, to come over here. He brought a guy with him, and the two of ’em drove from Houma over here. He had a forked rod. And that’s the thing he used to hunt the money with there. They could find it.
Well, we went way out in the country to an old plantation deal there. They told us, they said, “Now, y’all stay here. And if that money’s within a mile of here, we can find it.” And he said, “We’ll come back and get y’all.” So he told ’em all that. Well, we sat there in the dark, and he told us, “Now don’t talk loud. Y’all can just whisper.” So it was weird. I mean I was sitting there, and I didn’t know what was going on. Well, in a little while, they come back. And the guy said, “We’ve located it.” He said, “Now, this is what I want y’all to do. When we start out to get it, they can’t nobody say another word.” So, we trailed along behind ’em back through the woods.
And it was dark. We was stumbling over roots and everything else. And we finally got to the point, the place there, and they had an old lantern. It was the only light we had, and they had it up in front. We was all in the back. And they got up there and sure enough, there was that forked piece of metal sticking in the ground where they’d located it. Well, they told us, and everybody, wasn’t nobody saying a word. They was motioning, stand back and all that there. Well, all at once, this guy come out with a little old box.
This was the part that was weird to me. He got that box and he went in there and he had some white powder. I know it was flour. That’s what it looked like. He went and made a big circle around the whole thing. That was to keep that spirit in there. In other words, to keep that spirit from getting that money and running with it. It was their belief that once you got that powder around it, he couldn’t cross that powder.
Well, they got there and they got that old powder’d up. Then they started digging. And I mean they was going at it fast. Nobody wasn’t saying a word, it was just strictly just everybody was working. I didn’t know nothing about it. I did get in there and dig just a little bit, but I was [pause] more or less [pause] wanting to watch than anything else. Well, they dug a hole , I bet you, it was seven-foot deep. You could’ve buried a car in it. There was some of them fellers there, they wouldn’t work in a pie factory. But boy they was at it with that digging. Well, they was going at it and all at once, one of them fellers got a coughing spell.
Well when he did, this guy that was supposed to be the professional, he just got up and said, “Boys, it’s all over with.” He said, “It’s gone now.” He said, “I told you, you can’t make no noise of no kind!” He said, “Now you can dig all day and they ain’t no money there now.” And that wound the money hunting up. Since then, I’ve heard other people talk about going on trips, and they always blame it on something! Even one time, they got so mad with one of the guys, they thought he’d lied [inaudible]. They thought he was lying. I never heard of nobody finding the money. But they’re right close to it. Now, there’s people right today that’s still doing it. Here a while back, I heard a man down below his house there, he went to work one morning and there was a hole there the side of the field. He said he don’t know when they dug that! It was some time during the night!
Bakhtin, Mikhail. 1986. Speech Genres and Other Essays. Tr. Vern W. McGee. University of Texas Press. (“The Problem of Speech Genres” was written circa 1952 but not published in Russia until 1975 as part of a volume entitled Aesthetics of Verbal Creativity. The 1986 English translation contains only selected essays from the Russian volume.)
I can’t, frustratingly, find the tweet now that brought this to my attention, but the Sherman Centre at McMaster University has a nice collection of workflows that look really useful.
As I work through Matt Jockers’ material on sentiment analysis of stories — I’m not quite prepared to call it the shape of stories — I decided it would be interesting to try
syuzhet out on some non-narrative materials to see what shapes turn up. A variety of possibilities ran through my head, but the one that stood out was, believe it or not, TED talks! Think about it. TED talks are a well-established genre with a stable structure/format. Text-to-text comparison shouldn’t really invite too many possible errors on my part — this is always important for me. Moreover, in 2010 Sebastian Wernicke assessed the corpus as it stood at that time, and so perhaps a revision of that early assessment might be due.
The next step was how to download all the transcripts. The URLs all looked like this:
I would love it if this worked:
wget -r -l 1 -w 2 --limit-rate=20k https://www.ted.com/talks/*/transcript?language=en
It does not, because wget does not expand wildcards in HTTP URLs. wget is flexible, however, and if you feed it a list of URLs, it will work its way through that list. Fortunately, a search of the web turned up a post on Open Culture describing a list of the 1756 TED Talks available in 2014. As luck would have it, the Google Spreadsheet is still being maintained.
I downloaded the spreadsheet as a CSV file and then simply grabbed the column of URLs using Numbers. (This could have been done with
pandas but it would have taken more time, and I didn’t need to automate this part of the process.) The URLs were to the main page for each talk, and not the transcript, but all I needed to do was to add the following to the end of each line:
Which I did with some of the laziest regex ever. I could then
cd into the directory I created for the files and run this:
wget -w 2 -i ~/Desktop/talk_list.txt
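For anyone who would rather script the spreadsheet step than do it by hand, a minimal Python sketch using only the standard library might look like the following. The column name `URL` is a guess at the spreadsheet's header and should be changed to match the real file. The suffix is taken from the URL pattern in the wget command earlier in this post.

```python
import csv
import io

# Suffix taken from the transcript URL pattern used with wget above.
TRANSCRIPT_SUFFIX = "/transcript?language=en"


def transcript_urls(csv_text, url_column="URL"):
    """Return one transcript URL per row that has a talk URL.

    `url_column` is a hypothetical header name; adjust it to the
    spreadsheet actually downloaded.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        url = (row.get(url_column) or "").strip()
        if url:
            # Drop any trailing slash so the suffix is not doubled up.
            urls.append(url.rstrip("/") + TRANSCRIPT_SUFFIX)
    return urls


sample = "Title,URL\nA talk,https://www.ted.com/talks/a_talk\n"
print("\n".join(transcript_urls(sample)))
```

Writing the joined result to a text file gives the kind of list that `wget -i` expects, one URL per line.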
What remains now is to use Beautiful Soup to rename the files using the HTML title tag and to get rid of everything but the actual transcript. Final report from wget:
FINISHED --2016-05-18 16:16:52--
Total wall clock time: 2h 14m 51s
Downloaded: 2114 files, 153M in 3m 33s (735 KB/s)
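The renaming step could be sketched along these lines. To keep the example free of dependencies it uses Python's built-in `html.parser` in place of Beautiful Soup, and the slug rule and the `filename_from_html` helper are my own placeholders. Stripping the page down to the transcript is left out, since that depends on markup I have not yet examined.

```python
import re
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def filename_from_html(html, extension=".html"):
    """Build a filesystem-friendly name from a page's title."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", parser.title).strip("-").lower()
    return (slug or "untitled") + extension


page = "<html><head><title>Jane Doe: A talk | TED Talk</title></head></html>"
print(filename_from_html(page))
```

Each downloaded file could then be read, passed through `filename_from_html`, and renamed with `os.rename`, ideally after checking that the new name is not already taken.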
From a longer quotation by Willard McCarty in The Humanist:
When the number of scholars grows to a certain point, they can produce their own conferences, become one anothers’ reviewers and critics, and even finance their own journals. They establish a private language and tradition; allusions, jokes and friendships spring up, and the corporate life grows through students and in extreme cases through intermarriage. They form an example of the social equivalent of what in atomic physics is known as a critical–or in this case some would say, uncritical–mass.
William N. Parker. 1962. Work in Progress: A Report to Ernst Söderlund. Scandinavian Economic History Review 10.2: 233-44.
Every time I have to handle something that somehow just brushes past Java, I have a bad time. This time it was getting Matthew Jockers’
syuzhet R package up and running on my machine with a clean install of Mac OS 10.11 on it. The craziness is still with me, witness this:
But after many, many attempts to
install.packages("syuzhet") being met with “installation of package ‘syuzhet’ had non-zero exit status”
and the same for rJava, it appears the following may have worked:
Unfortunately, ETE, a Python framework for the analysis and visualization of (phylogenetic) trees, is not currently available to install through MacPorts. The recommended way to install ETE on a Mac is through Anaconda or a Miniconda setup. I confess I was not familiar with the
conda open source package management system, and I had not heard of Miniconda. I like Anaconda quite a lot, and I like it even more now that I know it’s part of a larger open source ecosystem, and it may be that one day I will switch over to it, but, right now, I am fairly happy with my MacPorts setup and I would rather not break what’s working.
To get a better sense of what I need to do, I clicked on the Linux native installation directions, which skip
Install dependencies: python-qt4, python-lxml, python-six and python-numpy
Those don’t look too bad. Python Six, a compatibility library for Python 2 and 3, is available as
py34-six. Done. As for the rest:
py34-lxml, and, of course,
py34-numpy are already installed. Done, again.
It looks like
pip will work here, so after making sure I have
py34-pip installed and making sure I run
sudo port select --set pip pip34 to set it, I can run:
sudo pip install ete3
If I run an IDLE session and
import ete3, I get no error prompts. Yay! Time to make some spam.