[OpenRefine][Open Refine] is a “tool for working with messy data: cleaning it; transforming it from one format into another; extending it with web services; and linking it to databases.” The link takes you to a page with lots of video tutorials. There is also Thomas Padilla’s [Getting Started with OpenRefine].
[Open Refine]: http://openrefine.org
[Getting Started with OpenRefine]: http://thomaspadilla.org/dataprep/
[Why the explosion in machine learning?][q] As always, there are major and minor reasons. The major reason? Data. Lots and lots of data, both because we human beings have put so much up ourselves and because businesses and other organizations (Hello, NSA! Call when you’re ready to talk about how I can help!) have collected so much. And that’s the minor reason right there, if one can consider it minor: organizations want to “learn” things from all this data.
One day I would like to be considered a [data scientist](http://jobsearch.money.cnn.com/data-scientist-jobs-Q0,13.html). There are jobs aplenty.
I was working on a post that outlines my own version of “Text Analytics 101,” which I have been using in freshman writing classes for the past three years, and I found myself considering, momentarily, the uses of “text mining” versus “text analytics” and “data mining” versus “big data.” I’m sure there are distinctions to be made between the terms in each pair, but it’s also the case that terms map onto various disciplines/domains and/or historical moments. A quick ngram search in Google, which is based on Google Books, produced the following graph:
Data Mining vs Big Data
A similar search for the first pair produced the following:
Text Mining vs Text Analytics
The only thing the two graphs suggest to me is that, possibly, the latter term in each pair appeared later and thus hasn’t yet made it into print. I would like to do a similar search of ngrams on the web, but I haven’t found the same simple interface for doing this kind of quick survey.
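Lacking such an interface, one could run the same kind of survey over a collection of dated texts one has gathered oneself. What follows is only a rough sketch in Python, not how the Google viewer works: the function name `count_terms_by_year`, the `(year, text)` input layout, and the two sample sentences are all made up for illustration.

```python
import re
from collections import Counter

# The two term pairs compared in the graphs above.
TERMS = ["data mining", "big data", "text mining", "text analytics"]

def count_terms_by_year(documents, terms=TERMS):
    """Count case-insensitive phrase occurrences per year.

    `documents` is an iterable of (year, text) pairs. That layout is an
    assumption for this sketch, not a format any service provides.
    """
    counts = {}
    for year, text in documents:
        # Collapse whitespace so a phrase split across a line break still matches.
        normalized = re.sub(r"\s+", " ", text.lower())
        per_year = counts.setdefault(year, Counter())
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            per_year[term] += len(re.findall(pattern, normalized))
    return counts

# Invented sample texts, only to show the shape of the input and output.
sample = [
    (2005, "Data mining and text mining were the terms of the day."),
    (2013, "Big data is everywhere; text analytics and big data go together."),
]

for year, per_year in sorted(count_terms_by_year(sample).items()):
    print(year, dict(per_year))
```

Raw counts like these would still need to be divided by the total number of words per year before they could be compared with the relative frequencies that the ngram graphs plot.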
It’s all fun and games until daddy learns his daughter is pregnant.