import re

def clean_text(rgx_list, text):
    # Replace every match of each pattern with a single space, ignoring case.
    new_text = text
    for rgx_match in rgx_list:
        new_text = re.sub(rgx_match, ' ', new_text, flags=re.IGNORECASE)
    return new_text
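A quick usage sketch of the function above, assuming the patterns are ordinary regular-expression strings. The sample patterns and text are made up for illustration; they are not from the original notebook.

```python
import re

def clean_text(rgx_list, text):
    """Replace every match of each pattern in rgx_list with a single space."""
    new_text = text
    for rgx_match in rgx_list:
        new_text = re.sub(rgx_match, ' ', new_text, flags=re.IGNORECASE)
    return new_text

# Hypothetical patterns: bracketed stage directions and URLs.
patterns = [r'\[.*?\]', r'https?://\S+']
sample = "Hello [APPLAUSE] world, see HTTP://example.com now"

cleaned = clean_text(patterns, sample)

# Each match becomes a space, so runs of spaces can pile up;
# split/join collapses them if that matters downstream.
collapsed = " ".join(cleaned.split())
```

Because each match is replaced with a space rather than an empty string, neighboring words never get glued together, at the cost of some extra whitespace.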
Monthly Archives: February 2022
Python List Indices
I needed to delete this code from a notebook, because it wasn’t doing anything, but the code was useful enough to keep:
# Indices of texts shorter than 500 characters; enumerate keeps duplicates distinct.
shorts = [i for i, text in enumerate(texts) if len(text) < 500]
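A small sketch of why enumerate is the safer way to collect positions than list.index: index always returns the first match, so duplicate texts all map to the same position, and each lookup rescans the list. The sample data here is invented.

```python
# Hypothetical sample: two identical short texts and one long one.
texts = ["short", "x" * 600, "short", "tiny"]

# list.index returns the first match, so the second "short" also maps to 0.
by_index = [texts.index(text) for text in texts if len(text) < 500]

# enumerate yields each position exactly once, in a single pass.
by_enumerate = [i for i, text in enumerate(texts) if len(text) < 500]
```

With this data, by_index is [0, 0, 3] while by_enumerate is [0, 2, 3].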
Lessons
There seems no end to the number of posts and videos, as well as books and other materials, that seek to explain data science at various levels and to various audiences. I try to read them as I come upon them in order, first, to discover what new things I can learn and, second, to determine if they might be useful as a basis for explaining things to others.
Data science is such a multitude of things: applied mathematics (statistics and probability, yes, but also linear algebra and calculus) and programming, to name two. I did not get a good foundation in any of these in my formal education, so I am having to make up for that. Mathematics and programming are best learned by practicing. I don’t get the chance to do that as much as I would like: it’s hard to find time to do the essential re-working of a professional career that tracked along one path for such a long time.
It’s hard to know where to start. The list below is a collection of things that I am currently working through in an effort to learn as I go. I keep it here on the blog to make it easier to access when I am traveling. (If I have access to a web browser, I am set to learn.) As of Spring 2022 this list is very much under construction: use at your own risk.
- Rinu Gour’s We are Living in “The Era of Python” is a brief intro to Python with a link to a Python course at its end.
- Data Flair has a list of lessons, including a Python Tutorial for Beginners.
- Dario Radečić’s The Ultimate Data Science Prerequisite Learning List.
- Dhruva Krishna’s Your historical, theoretical and slightly mathematical introduction to the world of Machine Learning stumbles a bit in its explanations, but his hand-drawn illustrations are quite compelling.
Flattening a List of Lists in Python
Sometimes you have a list of lists and you just need a list. In my case, I have a list of texts, each of which is a list of sentences, but all I really need is the list of sentences. To peel off the additional layer of listiness, use the following list comprehension, which puts the outer loop first and skips any item containing None:
flattenedList = [t for l in test for t in l if None not in t]
And if that doesn’t work, try itertools:
import itertools
flat_list = list(itertools.chain(*regular_list))
UPDATE: some better code using itertools:
from itertools import chain
# chain.from_iterable returns a lazy iterator; list() materializes it.
flattened = list(chain.from_iterable(iterable))
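To see the approaches side by side, here is a minimal sketch on invented data shaped like the case described above (a list of texts, each a list of sentences). The variable names are illustrative only.

```python
from itertools import chain

# Hypothetical sample: each text is a list of sentences; one text is empty.
texts = [["One.", "Two."], ["Three."], []]

# Nested comprehension: the outer loop comes first, then the inner loop.
flat_comprehension = [sentence for text in texts for sentence in text]

# chain.from_iterable is lazy, so wrap it in list() to get an actual list.
flat_chain = list(chain.from_iterable(texts))
```

Both give the same flat list of sentences. chain.from_iterable also avoids unpacking the outer list into arguments, which chain(*regular_list) has to do.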
Syncing Files on macOS
When you don’t have access to Carbon Copy Cloner, which apparently uses rsync, there is rsync itself:
rsync -avrP source dest
with -E thrown in when files are “particularly Mac-ish,” according to Ars Technica community member Jonathon.
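If the sync is going to be run from a notebook or script, the command above can be assembled as an argument list. This is only a sketch: build_rsync_command and the volume paths are hypothetical, and nothing is executed here.

```python
import shlex

def build_rsync_command(source, dest, mac_metadata=False):
    """Assemble the rsync call from the post (hypothetical helper)."""
    args = ["rsync", "-avrP"]
    if mac_metadata:
        # The extra flag the post suggests for "particularly Mac-ish" files.
        args.append("-E")
    args += [source, dest]
    return args

# Hypothetical paths; a trailing slash on the source copies its contents
# rather than the directory itself.
cmd = build_rsync_command("/Volumes/Work/", "/Volumes/Backup/Work/", mac_metadata=True)

# Shell-quoted string for logging or for pasting into a terminal.
printable = shlex.join(cmd)
```

Passing cmd to subprocess.run(cmd, check=True) would perform the actual sync; keeping it as a list avoids shell-quoting problems with paths that contain spaces.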