I regularly need to split larger text files into smaller text files, or chunks, in order to do some kind of text analysis/mining. I know I could write a Python script that would do this, but that often involves a lot more scripting than I want, and I’m lazy. There’s also this thing called csplit, which should do the trick. I’ve just never mastered it. Until now.
Okay, so I want to split a text file I’ll call excession.txt (because I like me some Banks). Let’s start building the csplit line:
csplit -f excession excession.txt 'Culture 5' '{*}'
… Apparently I still haven’t mastered it. But this bit of awk worked right away:
awk '/Culture 5 - Excession/{filename=NR"excession"}; {print >filename}' excession.txt
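For what it's worth, the likely reason the csplit attempt fails is that csplit only treats an argument as a regular expression when it is wrapped in slashes; a bare 'Culture 5' is rejected as an invalid pattern. Here is a minimal sketch of both approaches side by side, assuming GNU csplit (the -z option and the '{*}' repeat count are GNU extensions) and a tiny stand-in excession.txt whose contents are made up for illustration. The awk variant also adds a default output name (0excession, my own choice) so that any lines before the first match have somewhere to go, and it closes each chunk before opening the next.

```shell
#!/bin/sh
# Work in a scratch directory with a tiny stand-in file (hypothetical content).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Front matter' \
              'Culture 5 - Excession' 'Part one' \
              'Culture 5 - Excession' 'Part two' > excession.txt

# csplit: wrap the pattern in slashes so it is read as a regex.
# -s silences the byte counts, -z drops empty pieces,
# '{*}' repeats the split for every further match (GNU extension).
csplit -s -z -f excession excession.txt '/Culture 5/' '{*}'

# awk: same idea as the one-liner above, with two small guards:
# a default filename for lines before the first match, and close()
# so that a long file with many chunks does not exhaust open files.
awk 'BEGIN { filename = "0excession" }
     /Culture 5 - Excession/ { close(filename); filename = NR "excession" }
     { print > filename }' excession.txt

# Show what was produced.
ls
```

csplit numbers its pieces with a two-digit suffix (excession00, excession01, and so on), while the awk version names each chunk after the line number where the match occurred, so the two sets of output files do not collide.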
For the record, I’m interested in working with the Culture novels of Iain M. Banks. I am converting MOBI files into EPUBs using Calibre, and then into plain text files. No, I cannot make these available to anyone, so please don’t ask.
The Culture series:
- Consider Phlebas (1987)
- The Player of Games (1988)
- Use of Weapons (1990)
- The State of the Art (1991)
- Excession (1996)
- Inversions (1998)
- Look to Windward (2000)
- Matter (2008)
- Surface Detail (2010)
- The Hydrogen Sonata (2012)