Texts as Data

The amount of data you need for textual analysis, or text analytics, varies from domain to domain and, even within domains, from project to project. Within the realm of the computational humanities, a single text may be the focus of anything from a research note to a book-length exploration. As you move across the disciplinary spectrum, from the humanities through corpus stylistics to informatics, collections, corpora, and data sets tend to contain a larger number of texts.

But it all starts with data, er, texts.

And you want the kind of data that humanities scholars (textual scientists, as Katherine Kinnaird sometimes calls us) find compelling. Here is a lightly curated list that might help you find the kinds of texts you seek.

Cultural Data of All Kinds


45 places you can download tens of thousands of books, plays and other literary texts completely and legally for free, compiled by Professor Wu, “a four-foot Chinese Salamander dubbed ‘critically endangered’ by the International Union for Conservation of Nature,” for Nothing in the Rule Book, a site that seeks to make imaginative work free in a world where too many people are getting priced out of reading.

In 2021 Hazel Clementine published an omnibus post on Medium of “28 places to find free books.” Her post is gone, but the mega-list remains:

100 Folklore Texts on Gutenberg