The process I am going to describe here is drawn from my experience with Bookends, but I am sure the functionality is available in other reference management apps as well. I chose Bookends because it’s focused on Mac users and thus its GUI is native to the platform. I am fairly certain that Zotero has similar functionality, and I may end up using it when I am on Windows (and also because on Windows I am part of a team). The process I have in mind is adding a new reference and then adding its concomitant PDF.
First, an establishing shot drawn from work I am doing now for an essay about COVIDlore. This collection is built on top of some previous work on the flu. (Somewhere I also have Zika and Ebola bibliographies, and one day I will migrate them here as well. For those curious about the library just above, entitled Legends/Virality: it is in fact related, but more focused on the notion of informational “virality.”)
To add a new reference, I usually use the Quick Add function, which is handily called with
CMD + CTRL + N:
I can paste the DOI from the website where I found the reference, which may or may not be the originating site — it could be a reference from another paper, for example, and Bookends does all the lifting. (This works 80-90% of the time, and so it is frustrating when it doesn’t, but there is a built-in browser that allows you to collect metadata for a reference quickly.)
Once the reference is in the collection, I then
CMD + OPT + R to fetch the PDF for the reference. (If you have already downloaded the PDF, you can use
CMD + OPT + P to attach it from a local source.)
That’s it. The PDF is now in that particular collection as well as in the main library. Since the PDF is sitting in a particular folder which I also have indexed by DevonThink, I can take notes in that app, which will create an annotation file just for that purpose.
Marble Science does not have a lot of videos produced just yet, but his/their explanation/demonstration of Monte Carlo simulations is top-notch and well worth the 8 minutes.
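The core trick the video demonstrates fits in a few lines of Python. The classic toy case, sketched below on my own (it is not drawn from the video), estimates pi by sampling random points in the unit square and counting how many land inside the quarter circle:

```python
import random

def estimate_pi(n_samples, seed=0):
    """Monte Carlo estimate of pi: the fraction of uniform random points
    in the unit square that fall inside the quarter circle tends to pi/4."""
    rng = random.Random(seed)  # seeded for reproducibility
    inside = sum(
        1 for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4 * inside / n_samples

print(estimate_pi(100_000))  # close to 3.14159; more samples, less error
```

The same sample-count-estimate loop scales up to problems with no closed-form answer, which is the video’s real point.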
I am fundamentally ambivalent about the automation of text-cleaning: spending time with the data, getting unexpected results from your attempts at normalization, strikes me as one way to get to know the data and to be in a position to do better analysis. That noted, a number of interesting text-cleaning libraries, or text-cleaning functionality built into analytic libraries, have caught my attention over the past year or so. The most recent of these is clean-text. Installation is simple:
pip install clean-text
from cleantext import clean
(Note that while the package installs as clean-text, it imports as cleantext.) clean(the_string, *parameters*) takes a number of interesting parameters that focus on a particular array of difficulties:
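To make those difficulties concrete, here is a rough, hand-rolled sketch of the kind of normalization clean() parameterizes. The parameter names below echo clean-text’s (no_urls, no_emails, lower), but the regexes and logic are my own approximation, not the library’s implementation:

```python
import re
import unicodedata

def rough_clean(text, no_urls=True, no_emails=True, lower=True):
    # A hand-rolled approximation of the work clean-text automates;
    # the parameter names echo the library's, the logic is only a sketch.
    text = unicodedata.normalize("NFKC", text)  # tame odd Unicode forms
    if no_urls:
        text = re.sub(r"https?://\S+", "<url>", text)
    if no_emails:
        text = re.sub(r"\S+@\S+\.\S+", "<email>", text)
    if lower:
        text = text.lower()
    return re.sub(r"\s+", " ", text).strip()  # collapse stray whitespace

print(rough_clean("See https://example.com or MAIL me@example.com   NOW"))
# see <url> or mail <email> now
```

Even this toy version shows why I am ambivalent: every substitution is a decision about what counts as noise, and the library makes those decisions for you.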
In the Science Museum of Minnesota this past weekend, I found myself poring rather closely over the mollusk exhibit, if only because I could not fathom—pardon the pun—why there was such a grand display of them. I didn’t have to look far: Mollusks are often the leading indicators that an environment is in danger. The museum is home to a large collection of mollusk shells, many of which reveal where mollusks once thrived but are now scarce or non-existent thanks to changes in the landscape brought about by agricultural or industrial contamination.
Leading indicators are, of course, quite popular right now, because a lot of people are interested in predicting the future based on the kinds of early successes we have had with machine learning, whereby algorithms trained on a reasonably large dataset can discover the same patterns in new data. I am reminded of work reported by Peter Brooking and Singer on XXX as well as the “calculus of culture” being developed by Jianbao Gao.
What you would need, of course, is a reasonable definition of the parameters of your “socio-cultural matrix.” The social dimensions would be all those non-text data points that might be of interest and associated with humans either individually or collectively. The cultural dimensions would be texts and other discernible behaviors, again described either individually or collectively. We know this is possible because, to some degree, Cambridge Analytica has already done it, and we can be sure that other organizations are doing the same and just not talking about it. (In a moment when all this data is available, whether by cash, hack, or crook, you would be a fool not to collect it all and compile it as many ways as you can possibly imagine, and then some.)
Breaking off some piece of this larger matrix, or set of matrices, is something we all need to be better about doing: modeling complex environments is something that needs to get taught and practiced more widely—all the while reminding people never to mistake the map for the territory. To some degree the social sciences do some of this at the computational end, but the kind of statistical, and sometimes speculative, modeling suggested here is not as pervasive in public discourse as it should be.
A useful map of Africa that features population densities: good to know where the people are.
There has to be a more elegant, and pythonic, way to do this, but none of my experiments with nested list comprehensions or with itertools’ chain function worked.
What I started with is a function that creates a list of sentences, each of which is a list of words from a text (string):
import nltk  # assumes the NLTK tokenizer data (punkt) is installed

def sentience(the_string):
    sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence)]
        for sentence in nltk.sent_tokenize(the_string)
    ]
    return sentences
But in the current moment, I didn’t need all of a text, but only two sentences to examine with the NLTK’s part-of-speech tagger.
nltk.pos_tag(text), however, only accepts a flat list of words. So I needed to flatten my lists of lists into one list, and I only needed, in this case, the first two sentences:
test = []
for i in range(len(text2[0:2])): # the main list: just the first two sentences
    for j in range(len(text2[i])): # the sublists of words
        test.append(text2[i][j])
I’d still like to make this a single line of code, a nested list comprehension, but, for now, this works.
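For the record, a single-expression version does exist: the nested comprehension reads in the same order as the stacked for-loops, outer loop first, inner loop second. (text2 below is a stand-in list of tokenized sentences, since the original text isn’t in the post.)

```python
# Stand-in for the tokenized text: a list of sentences, each a list of words.
text2 = [["this", "is", "one", "."], ["here", "is", "two", "."], ["ignored", "."]]

# Flatten the first two sentences into one list of words,
# in the same order as the nested for-loops.
flat = [word for sentence in text2[:2] for word in sentence]
print(flat)
```

itertools’ chain gets there too, as list(chain.from_iterable(text2[:2])), which skips the comprehension entirely.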