Graph Visualization Libraries

Network visualization is something I thoroughly enjoy, and I am fooled by the ease with which one can manipulate graphs in an application like Gephi. I would like to learn how to take more control of the process for myself — and, with luck, learn more about network theory in the process. For that, I need some graph visualization libraries, and over on Medium Elise Devaux has that covered: List of Graph Visualization Libraries.

Python List Indices

I needed to delete this code from a notebook, because it wasn’t doing anything, but the code was useful enough to keep:

# indices of every text shorter than 500 characters
shorts = [i for i, text in enumerate(texts) if len(text) < 500]
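One caveat worth sketching: list.index() returns the position of the first match, so if two texts happen to be identical, a comprehension built on it reports the same index twice, while enumerate gives each item its own position. The toy list and the names by_index and by_enumerate below are made up for illustration:

```python
# Toy data: two identical short texts, one long text, one more short text.
texts = ["short", "x" * 600, "short", "tiny"]

# list.index() finds the FIRST occurrence, so the duplicate maps to 0 twice.
by_index = [texts.index(text) for text in texts if len(text) < 500]

# enumerate pairs every item with its own position.
by_enumerate = [i for i, text in enumerate(texts) if len(text) < 500]

print(by_index)      # [0, 0, 3]
print(by_enumerate)  # [0, 2, 3]
```

The enumerate version also avoids rescanning the list for every item, which matters once the list of texts gets long.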

Lessons

There seems no end to the number of posts and videos, as well as books and other materials, that seek to explain data science at various levels and to various audiences. I try to read them as I come upon them in order, first, to discover what new things I can learn and, second, to determine if they might be useful as a basis for explaining things to others.

Data science is such a multitude of things: applied mathematics (statistics and probability, yes, but also linear algebra and calculus) and programming. I did not get a good foundation in any of these in my formal education, so I am having to make up for that. Mathematics and programming are best learned by practicing, and I don’t get the chance to do that as much as I would like: it’s hard to find time for the essential re-working of a professional career that tracked along one path for such a long time.

It’s hard to know where to start. The list below is a collection of things that I am currently working through in an effort to learn as I go. I keep it here on the blog to make it easier to access when I am traveling. (If I have access to a web browser, I am set to learn.) As of Spring 2022 this list is very much under construction: use at your own risk.

Flattening a List of Lists in Python

Sometimes you have a list of lists and you just need a list. In my case, I have a list of texts within which is a list of sentences. But all I really need is the list of sentences. To peel off the additional layer of listiness, use the following list comprehension.

# outer loop over the texts first, then over the sentences in each text
flattened_list = [sentence for text in texts for sentence in text]

And if that doesn’t work, try the chain function from itertools:

import itertools
flat_list = list(itertools.chain(*regular_list))

UPDATE: some better code using itertools:

from itertools import chain

# chain.from_iterable returns a lazy iterator; list() turns it into a list
flattened = list(chain.from_iterable(iterable))
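To see the two approaches side by side, here is a small sketch on toy data (the texts list is invented for illustration). It assumes exactly one extra level of nesting; neither approach flattens anything deeper than that.

```python
from itertools import chain

# Toy data: two "texts", each a list of sentences (strings).
texts = [
    ["First sentence.", "Second sentence."],
    ["Third sentence."],
]

# Nested comprehension: the outer loop comes first, then the inner loop.
flat_comprehension = [sentence for text in texts for sentence in text]

# chain.from_iterable is lazy, so wrap it in list() to get an actual list.
flat_chain = list(chain.from_iterable(texts))

print(flat_comprehension)
print(flat_chain == flat_comprehension)  # True
```

The trick to reading the comprehension is that the for clauses appear in the same order they would in the equivalent nested for loops.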

Automating Text Cleaning

I am fundamentally ambivalent about the automation of text-cleaning: spending time with the data by getting unexpected results from your attempts at normalization strikes me as one way to get to know the data and to be in a position to do better analysis. That noted, a number of interesting text-cleaning libraries, or text-cleaning functionality built into analytic libraries, have caught my attention over the past year or so. The most recent of these is clean-text. Installation is simple:

pip install clean-text

And then:

from cleantext import clean

The clean(the_string, *parameters*) function takes a number of interesting parameters that focus on a particular array of difficulties.
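For a sense of the kind of normalization such a library automates, here is a hand-rolled sketch using only the standard library. It is not clean-text’s implementation or API: the function name basic_clean, its parameters, and the URL pattern are all hypothetical, and a real cleaner handles far more cases.

```python
import re
import unicodedata

# Deliberately simple, hypothetical pattern: anything starting with http(s)://
URL_PATTERN = re.compile(r"https?://\S+")


def basic_clean(text, lower=True, replace_urls_with="<URL>"):
    """Hypothetical helper: normalize unicode, mask URLs, tidy whitespace."""
    # NFKC folds compatibility characters (such as the "fi" ligature) into plain ones.
    text = unicodedata.normalize("NFKC", text)
    # Swap every URL for a placeholder token.
    text = URL_PATTERN.sub(replace_urls_with, text)
    # Collapse runs of whitespace, including line breaks, into single spaces.
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower() if lower else text


sample = "Visit  https://example.com\nfor the \ufb01nal   Report"
print(basic_clean(sample))               # visit <url> for the final report
print(basic_clean(sample, lower=False))  # Visit <URL> for the final Report
```

Note that lowercasing happens last, so it also lowercases the placeholder; that is exactly the kind of unexpected result that teaches you something about your pipeline.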

Quick Labels with Python’s f-string

Sometimes I need a list of titles or labels for a project on which I am working. E.g., I am working with a toy dataset and I’ve created a 10 x 10 array and I want to give the rows and columns headers so I can try slicing and dicing. I prefer human-readable/thinkable names for headers, loc over iloc in pandas-speak. And this one-liner works a treat, as they say:

labels = [f'label{item}' for item in range(1, 11)]

Done. Place it into your dataframe creation (as below) and you are good to go.

df = pd.DataFrame(data=scores, index=names, columns=labels)
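One small refinement, if the labels ever need to sort correctly as strings: a format specifier inside the f-string can zero-pad the number. A quick sketch (the names plain and padded are only for illustration):

```python
# Unpadded labels sort as strings, so "label10" lands before "label2".
plain = [f"label{item}" for item in range(1, 11)]

# :02d pads each number to two digits, which keeps string order and numeric order aligned.
padded = [f"label{item:02d}" for item in range(1, 11)]

print(sorted(plain)[:3])   # ['label1', 'label10', 'label2']
print(sorted(padded)[:3])  # ['label01', 'label02', 'label03']
```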

Flattening a List in Python

There has to be a more elegant, and pythonic, way to do this, but none of my experiments with nested list comprehensions or with the chain function from itertools worked.

What I started with is a function that creates a list of sentences, each of which is a list of words from a text (string):

import nltk

def sentience(the_string):
    # one list of lowercased word tokens per sentence
    sentences = [
        [word.lower() for word in nltk.word_tokenize(sentence)]
        for sentence in nltk.sent_tokenize(the_string)
    ]
    return sentences

At the moment, I don’t need all of a text, only two sentences to examine with NLTK’s part-of-speech tagger. nltk.pos_tag(text), however, accepts only a flat list of words. So I needed to flatten my list of lists into one list and, in this case, I needed only the first two sentences:

test = []
for sentence in text2[:2]:  # the first two sentences
    for word in sentence:  # the words in each sentence
        test.append(word)

I’d still like to make this a single line of code, a nested list comprehension, but, for now, this works.
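For what it is worth, here is a sketch of what that single line might look like. The text2 below is a hand-made stand-in for the tokenizer’s output (lists of lowercased words, one per sentence), so the example runs without NLTK:

```python
# Stand-in for the output of the sentence/word tokenizer: one list of words per sentence.
text2 = [
    ["the", "cat", "sat", "."],
    ["it", "purred", "."],
    ["then", "it", "left", "."],
]

# Outer loop over the first two sentences, inner loop over the words in each.
test = [word for sentence in text2[:2] for word in sentence]

print(test)  # ['the', 'cat', 'sat', '.', 'it', 'purred', '.']
```

The for clauses read left to right in the same order as the nested loops, which is the part that is easy to get backwards.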