Almost everything I do in text analytics starts with a directory of texts that needs to be turned into a list of strings, with each text its own item in the list. Here’s my Python boilerplate:
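import glob

# Collect the path of every .txt file; sorted() keeps the order the same on every run
file_list = sorted(glob.glob('../texts' + '/*.txt'))

# Read each file into one string, replacing line breaks with spaces
mytexts = []
for filename in file_list:
    with open(filename, 'r', encoding='utf-8') as f:
        mytexts.append(f.read().replace('\n', ' '))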
You can double-check your work by calling up any given text with mytexts[1], where the “1” can be any index you want. Remember that Python starts counting at 0, not 1, so a list of 12 texts, for example, has indices 0 through 11.
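If you run this check often, it may be handy to wrap the loading step in a function and look at only the start of a text rather than the whole thing. What follows is a minimal sketch, not part of my original boilerplate: the names load_texts and preview, and the width parameter, are hypothetical helpers made up for illustration.

```python
import glob
import os


def load_texts(directory):
    """Read every .txt file in `directory` into a list of strings."""
    texts = []
    # sorted() gives a stable alphabetical order, so indices mean the same thing each run
    for filename in sorted(glob.glob(os.path.join(directory, '*.txt'))):
        with open(filename, 'r', encoding='utf-8') as f:
            texts.append(f.read().replace('\n', ' '))
    return texts


def preview(texts, index, width=80):
    """Return the first `width` characters of the text at `index`."""
    return texts[index][:width]
```

With these in place, mytexts = load_texts('../texts') builds the list, len(mytexts) tells you how many texts were read, and preview(mytexts, 0) shows the opening of the first one.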
And if you need to mush all those texts back into a single string:
alltexts = ' '.join(mytexts)
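The string in front of .join() is whatever gets placed between the texts, and that choice matters once you start counting words. Here is a small sketch with two made-up stand-in texts (not real data) showing the difference between joining on an empty string and joining on a space:

```python
# Stand-in data for illustration only
mytexts = ['the first text', 'the second text']

# Empty separator: the last word of one text fuses with the first word of the next
fused = ''.join(mytexts)    # 'the first textthe second text'

# Space separator: word boundaries between texts are preserved
spaced = ' '.join(mytexts)  # 'the first text the second text'

print(len(fused.split()), len(spaced.split()))  # 5 6
```

If the word counts need to add up across texts, the space separator is probably the one you want.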