TikTok and ChatGPT in the News
The lead article in this news roundup isn’t about ChatGPT at all, but rather about the current trend among state governments to ban TikTok on state-issued devices, and among public universities, usually in the same states, to ban TikTok on their wifi networks. The ostensible, and perhaps actual, reason is the data that TikTok can, or does, collect on its users, compounded by TikTok’s unclear relationship with the CCP-run Chinese government.
To be clear, governments, and their publics, should be concerned about data collection by social media platforms, as well as by all other businesses and organizations, including governments themselves.1 Given the amount of data already available, whatever more the Chinese government, or any other entity, needs to know about each individual American citizen is really a matter of finer strokes of the brush.
Here’s a partial account of the data already out there:
| Year | Platform | Records | Data exposed |
| --- | --- | --- | --- |
| 2018 | Instagram, TikTok, YouTube | 235 million | profile name, real name, profile photo, likes, age, gender, + |
| 2018 | | 419 million | IDs, names, phone numbers |
| 2019 | | 540 million | IDs, comments, likes, “reaction data” |
| 2019 | | 533 million in 106 countries | IDs, phone numbers, “other info” |
| 2021 | | 500 million | full names, email, phone numbers, workplace information, + |
| 2021 | Clubhouse | 1.3 million | User ID, Name, Photo URL, Username, Twitter handle, Instagram handle, + * |
| 2021 | Parler | 60TB user data | all network activity |
| 2021 | Gab | 60TB data | all posts (public and private) |
Given this data, and an entity with the will and means to use it (the means amount to sufficient computational power and data storage, each of which still gets cheaper every year), generating custom material that addresses a user with the right form and content to get inside their information bubble is now not only imaginable but entirely feasible.
Add in the ability to run A/B tests to see what works, how well it works (and to whom the user passes the package on), and what does not work, functionality that already exists on almost all social media platforms, and you can deliver with remarkable precision exactly the package you want delivered.
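The mechanics of such A/B testing are simple enough to sketch. A minimal version, assuming invented click counts for two message variants, is a two-proportion z-test: show each variant to a sample of users, count engagements, and check whether the difference is larger than chance would explain.

```python
import math

def ab_test(clicks_a, n_a, clicks_b, n_b):
    """Two-proportion z-test: did variant B outperform variant A?

    Returns the two engagement rates and the z-score; |z| > 1.96
    is conventionally 'significant' at the 95% level.
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    # Pooled rate under the null hypothesis that A and B perform alike
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return p_a, p_b, z

# Hypothetical numbers: variant B's wording lifts engagement
p_a, p_b, z = ab_test(120, 2000, 165, 2000)
print(f"A: {p_a:.1%}, B: {p_b:.1%}, z = {z:.2f}")  # z ≈ 2.77, so keep B
```

At platform scale this loop runs continuously and automatically, which is what makes the precision described above achievable rather than hypothetical.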
This is something I explored with the Army over the last two years, but with the rise of ChatGPT and other generative AIs, the sense that we are facing a new landscape has begun to creep into public discourse, as glimpsed in a recent report for Yahoo Finance, which notes that “90% of online content could be generated by AI by 2025.”
Elsewhere, the NYT covers how concerns over ChatGPT, and how they might be addressed, are working their way through universities.
For the record, I think Kevin Roose, also writing for the NYT, has the right approach. It makes me feel a little sorry for younger people that so much of the world as they will encounter it will be generated for them, but not necessarily of their choosing.
The mantra, which should be a policy (or even a law?), for any organization: do not collect any data you are not prepared either to spend inordinate time and money protecting, or to lose. ↩