There seems no end to the number of posts and videos, as well as books and other materials, that seek to explain data science at various levels and to various audiences. I try to read them as I come upon them in order, first, to discover what new things I can learn and, second, to determine if they might be useful as a basis for explaining things to others.

Data science is a multitude of things: applied mathematics (statistics and probability, yes, but also linear algebra and calculus) and programming. I did not get a good foundation in any of these in my formal education, so I am having to make up for that. Mathematics and programming are best learned by practicing. I don’t get the chance to do that as much as I would like: it’s hard to find time to do the essential re-working of a professional career that tracked along one path for such a long time.

It’s hard to know where to start. The list below is a collection of things that I am currently working through in an effort to learn as I go. I keep it here on the blog to make it easier to access when I am traveling. (If I have access to a web browser, I am set to learn.) As of Spring 2022 this list is very much under construction: use at your own risk.


“One popular misconception [about machine learning] is that people think they have enough data when they don’t. When people say machine learning, a very large segment of predictions are based on existing data. And in order for that to work, you generally have to have a big labeled set of data,” says Hillary Green-Lerman of Codecademy.

Emphasis on labeled.


“People often don’t realize how much of machine learning is getting data into a format so that you can feed it into an algorithm. The algorithms are actually usually available pre-baked,” Hillary said. “In a lot of ways, you need to know how to pick the best linear regression for your data, but you don’t really need to know the intricacies of how it’s programmed. You do need to work the data into a format where each row is a data point, the kind of thing you’d want to pick.”

Cookie Components

This Reddit post uses data analysis techniques to distinguish between cookies, pastries, and pizzas in order to win an office party argument. And there’s data too — “1931 recipes from the Food Network that contain the keywords cookies (my group of interest), pastry, or pizza (two control groups).”


Me a Data Scientist?

As things continue to deteriorate here in Louisiana, and it becomes increasingly obvious that what our administration wants from faculty, especially humanities faculty, is for us to become teaching bots, I find myself more and more interested in non-academic alternatives. And the fact is that I really enjoy my current work on the small end of the big data revolution, or whatever it’s termed these days.

Mostly, it seems increasingly to be termed *data science*, but what people mean by that can vary. As I try to understand this emergent field, both from the removed position of a humanist just trying to track how ideas and practices play out in history and as a humanist who maybe wants to play on those fields himself, I find myself looking at various data science programs. UC Berkeley’s School of Information offers a more traditional program, but there is also [Zipfian Academy](http://www.zipfianacademy.com/), which offers 12-week intensive programs and the possibility of some tuition relief. (And that sounds pretty good to a poor Southern humanist, or is a Southern humanist poor by definition?)