Spoken language becomes written language becomes internet language – let’s pass on AI language

Does this sound familiar?

The use of generative AI is on the rise–and you can hear it in the way we write. Its impact on language falls into three broad categories:

  • A uniform style: GenAI often produces language that is average, polished, and predictable.
  • One-sided perspectives: models recycle dominant patterns from their training data and tend to favor majority views.
  • A standard reasoning style: answers often follow similar lines of logic and argument.

Why this matters: language is not a neutral tool, but it helps shape how we think, make decisions, and live together.

You can tell right away that this text feels a bit off. The em dashes (the horizontal strokes that are longer than an ordinary hyphen), the bullet list with bolded keywords, the stock phrases like “not this, but that.” Yes, this is AI language. We know it from those suspiciously polished cover letters, the slickly optimistic LinkedIn posts, the emails that suddenly seem much better written than usual. Now compare the same content as someone might say it out loud:

Uh, right, so… what we’re seeing is that the use of generative AI is increasing, you know. And if you look at what that does to the way we use language, well… first of all, you get this kind of samey style. It all sounds so… uh… average, neat, predictable. Then, second point, the perspectives are often pretty one-sided. Those models mostly pick up on what is already dominant, what most people say or think. So not exactly a lot of room for dissenting voices.

And third, the reasoning often follows the same track. You keep getting the same logic, the same steps, the same arguments. And yeah, that matters, because language is not just a tool for passing on information, it is also how we think, and uh, how we make choices, and all that.

That version is not exactly more pleasant to read. Of course, it was never meant to be read. Spoken language is clearly not the same thing as written language.

Written language is actually pretty strange

In an earlier blog, I already quoted the rant in Plato’s Phaedrus in which Socrates sees the rise of reading and writing, the technology of the written word, as a major danger. He had a point: once people learned to read and write, a separate kind of language emerged. Sometimes that leads to amusing mix-ups: some words really belong to the written register, and people start pronouncing them without ever having heard them spoken. (For the Dutch version of this blog, I found some nice examples, but English pronunciation is so illogical that it’s almost too easy.)

Notorious examples are epitome, hyperbole and category. And English place names!

Epitome is often pronounced as ‘EPI-toam’, but the correct version is ‘eh-PIT-uh-mee’. Another one is hyperbole, which is understandably often pronounced as ‘hyper-bowl’, but which should be ‘hy-PUR-buh-lee’. Category is often heard as ‘kah-TEH-go-ree’, but it is ‘KAH-teh-go-ree’. These are Greek loanwords, you might say, but ‘proper’ English is just as bad. For native speakers, Worcestershire (roughly ‘WUSS-ter-sher’) might be logical, but for any second-language learner, it’s horrible.

The technology of the book, and the skills that come with it (reading and writing), were designed to record, store, and reproduce language. That changed the way we use language, so it was no surprise that computers would have an impact too. Built-in spellcheck has made some spelling mistakes almost disappear in the wild, while others stubbornly survive. Everyone knows the frustration of “damn you, autocorrect.” Spellcheck has also given an extra push to another familiar habit: writing compounds as separate words, because spellcheck recognizes the individual parts more readily than the long compound. (This effect is prominent in languages like Dutch and German, where compounds are normally written as one word; in Dutch it is known as ‘the English disease’. But it can be observed even in English itself.)

Internet language

When computers were connected to the internet, our language changed again. Abbreviations such as WTF and LOL are widely understood and sometimes even slip into spoken language. And of course we owe the use of emojis entirely to the internet. We also see words being stretched out for emphasis: nniiiccceeee! That is something that grew out of online language and really only belongs to that domain. If you enjoy this kind of thing, the last two examples, along with many others, are discussed in the delightful book Because Internet by Gretchen McCulloch.

Another internet language trend I see a lot, especially on LinkedIn, is the style in which an argument is built out of paragraph after paragraph containing just a single sentence. It is a way of giving the writer’s statements extra weight. A point is really being made. And it also caters to the online reader who is scrolling quickly and apparently finds it easier to latch onto a sentence standing on its own.

AI language

Chances are that AI will change the way we write. It is just so easy to produce a piece of text with AI that the total amount of written text is exploding. By some estimates, more than half of all new text on the internet is now generated by AI.

In a previous blog, I compared data to oil, AI to plastic, and AI slop to microplastics. Just as plastics gave rise to disposable packaging, AI is now giving us disposable text. Is that really what we want? People are already saying: “It’s rude to show AI output to people.” It is a bit like a prepackaged meal: fine if you need something quick for yourself, but when guests come over, you do not serve them something straight out of a packet.

Language change may be worrying, but people have always worried about it: “The decline of language among our youth!” Now, however, there are also concerns that the same thing may happen to the way we reason. Do we end up with a monoculture if everyone turns to ChatGPT or another LLM for their facts and arguments?

A recent article presents indications that the use of large language models leads to “homogenization.” That applies to writing style, to the perspectives people adopt, and to the way they reason. The text at the start of this blog is a summary of that article. The examples its authors give are still somewhat anecdotal, and you could probably make similar arguments about the impact of books on spoken language. But recent research does suggest that two effects are already measurable: texts on the internet now show a demonstrably narrower range of ideas and viewpoints, and they are also much more likely to adopt a forced optimistic tone.

Other effects were not yet measurable.

There was no increase yet in hallucinated “facts,” nor any decline in the number of source references. Nor was there a visible rise in genuine “slop,” in other words lots of words and little substance. And surprisingly, variation in writing style was not really declining either, at least not yet.

The role of mass media has something in common with AI

The question is whether these trends will continue. The rise of mass media in the middle of the twentieth century did not really produce a monoculture either. Three quarters of a century ago, we had two television channels (at least, in the Netherlands) and a handful of national newspapers. The pub, the church, and the sports club canteen played the role that social media now play.

The mass media of that time are somewhat comparable to today’s AI language models. Journalists have access to all kinds of public sources in society, plus a number of non-public ones. They observe events and process facts into a story. But it is the editorial team that determines how that story reaches the public, by making substantive choices and setting the tone. In the end, the editors determine the overall character of a medium.

AI language models are trained on as much public text as their developers can find on the internet, plus some additional sources whose use is not always entirely beyond dispute. Through pretraining, large language models learn language and, along the way, absorb quite a lot of factual material into the model itself. But it is fine-tuning that determines how a model presents itself to the user. Fine-tuning is what makes a chatbot follow instructions, adopt a certain tone, and learn which kinds of answers people experience as helpful. Fine-tuning determines the “culture” of a model.

You can take the analogy a bit further. Journalists have unconscious (and sometimes conscious) preferences, so the facts they select may not give the full picture. An editorial team can correct that only to a limited extent. Likewise, LLM training material is not always representative or of sufficient quality, and fine-tuning can compensate for that only to a limited degree. Finally, consumers have subscriptions to newspapers and streaming services, but they make their own choices about what to read or watch, just as it is the end user who chooses which prompt to submit to an AI chatbot.

So… a monoculture?

Democratic governments try to prevent a monoculture in mass media. That is why they support press freedom and aim for pluralism in public broadcasting. That is actually quite comparable to wanting pluralism among AI chatbots. Right now there are still dozens of language models and a fair number of chatbots, and you can see why it matters to preserve that variety.

A little bit of a monoculture may not be all that terrible: communication becomes easier if there is some sort of standard. To return to an earlier parallel, the printing press helped standardize spelling to some extent. As long as it is not pushed too far, spelling standardization is genuinely useful for readers. A standard only becomes a problem when it suppresses divergent voices and unexpected ideas.

How will this develop?

The media landscape has broadened enormously: the number of television channels has exploded, and alongside linear TV we now have a huge number of streaming services. Newspapers have become less central, but blogs and social media have joined them. So there is more variation, not less, and there are no real signs of a monoculture. If anything, the greater range of choices has produced a new kind of segregation of subcultures: everyone in their own bubble.

The parallel between AI and mass media is surely there, but it breaks down at an uncomfortable point. The media are governed by laws that guarantee press freedom, public broadcasting must meet requirements of pluralism, and the editorial independence of news organizations is enshrined in statutes, foundations, and other legal constructs. AI has none of that. Yes, in Europe we have the AI Act, but will it be sufficient?

The use of AI may well have the effect of making mediocrity the norm. That, by the way, is a common criticism of mass media too. But there will always be different AI systems, because the market for language models does not really lend itself to a winner-takes-all dynamic, though that is a topic for another blog. Each of those systems will have its own group of users. Within those bubbles, mediocrity risks becoming the norm, while other groups will still diverge from it.

That is why it is good that the US tech giants keep competing, and even better that Europe, however modest its efforts may seem, is working on its own technology with determination. That will help keep the monoculture at bay.
