As evidence continues to mount that we are living in a cyberpunk dystopia, I’ve decided to do a series of posts on artificial intelligence. This one is about text generation.
I’m something of an AI skeptic. While they’ve proven to be very effective in some areas, these are mostly tasks with a narrow scope and well-defined set of rules, such as playing chess and go, or determining whether something is a picture of a cat. More complex tasks like driving have proven to be a lot harder. Driving in particular has very ambiguous inputs that are highly context-dependent, two things that AIs have trouble with.
(Let’s please continue reading and not get hung up on the political and ethical roadblocks to self-driving cars.)
AIs have also historically had trouble generating text and images. Not because these are magical tasks that only humans can perform; it’s simply been difficult to make computers good at them. Well, much to my surprise, this may be coming to an end in the next few years. And I don’t mean that in the “self-driving cars are always five years away” sense.
In the last couple of years, computerized content generation has made some remarkable advances. Deepfakes (videos that believably replace one person with somebody else) are proliferating, and they're fairly easy to make. AIs are frighteningly good at generating faces now, too.
But what about text? Google Translate has gotten much more sophisticated lately as Alphabet has refined the neural networks that power it. But is that more like writing, or more like playing go? I’d say it’s in between.
Which brings us to writing. Earlier this year, the nonprofit group OpenAI built a text-generation neural network that was, in their opinion, too frightening to release. I called malarkey, but now that they’ve been gradually releasing more sophisticated versions of their model, I call… less malarkey. What this means for you, and some examples, below the fold.
This neural network, called GPT-2, was trained on 40GB of internet text (which is a lot). I downloaded and set up the newest version of their model to play with it, and the results were impressive. When given a prompt, it will output its best guess of what the next word should be, over and over, until you have something approaching prose. Here are some that I generated, with the seed text in italics.
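To make the "best guess of the next word, over and over" idea concrete, here is a minimal Python sketch. It is emphatically not GPT-2. GPT-2 is a large neural network trained on those 40GB of text, while this toy only counts which word followed which in a tiny made-up corpus. The function names train_bigrams and generate_greedy and the sample corpus are hypothetical and exist only for illustration. What it shares with the description above is the loop: predict the most likely next word, append it, and repeat.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Toy 'model' (hypothetical): count which words follow each word in the text."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def generate_greedy(followers, prompt, max_words=10):
    """Repeatedly append the single most likely next word (greedy decoding)."""
    output = prompt.split()
    for _ in range(max_words):
        candidates = followers.get(output[-1])
        if not candidates:
            break  # the model has never seen anything follow this word
        # most_common(1) returns [(word, count)]; ties go to the word seen first
        next_word, _count = candidates.most_common(1)[0]
        output.append(next_word)
    return " ".join(output)


corpus = "the cat sat on the mat and the cat slept"
model = train_bigrams(corpus)
print(generate_greedy(model, "the", max_words=4))  # prints: the cat sat on the
```

Let it run longer and it loops ("the cat sat on the cat sat on the"). That is one reason real text generators usually sample among likely words rather than always taking the top one.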
Just moments ago, a ballistic missile was launched from the Tonghae Satellite Launching Ground in North Korea.
It was tracked as it made its way to its target in Japan.
Japan’s Defense Minister announced this morning that North Korea launched a ballistic missile that flew over Japan into the Sea of Japan.
The missile’s trajectory is the same as last time North Koreans fired a ballistic missile. The North Korean missile launch happened just hours after the UN Security Council voted unanimously to impose additional sanctions on the reclusive nation because of their nuclear and ballistic missile programs.
China, North Korea’s main ally, condemned the launch and called for “urgent efforts by the international community to properly implement the resolution,” according to a statement issued from the Foreign Ministry.
The missile launch followed the US strike on Syria, which came after four of its air strikes were launched in retaliation to an alleged chemical weapons attack in the Syrian city of Khan Sheikhoun last week.
Supreme Court Justice Ruth Bader Ginsburg has died, her family reports. She was 88.
Ginsburg suffered her latest bout of complications in early June, her brother-in-law Ron Sachdeva said Tuesday morning. She was in intensive care following her surgery to relieve the pain from her latest bout of pneumonia, Sachdeva said.
“It is with a very heavy heart and with great sorrow that we say goodbye to our beloved Ruth,” Sachdeva told CNN.
She was brought back to the hospital to begin physical therapy in late June after being “sore for quite a while.”
“The treatment included a series of lung functions over the course of several weeks,” Sachdeva said.
A physician who was not authorized to speak publicly and requested anonymity told CNN last month that she was in critical condition.
Are they perfect? No. Do they always come out this well? No. But, when they do: Are they good enough to trick people who only skim them? Good enough to rile up the gullible or already-convinced? Good enough to generate natural-sounding tweets and website comments? Good enough to run an ongoing confusion & disinformation campaign?
Unfortunately, yes. Add the fact that it’s trivial to generate an endless supply of these, and we can see the problem. It’s already creating content that’s better than some of what comes out of the Russian propaganda mills.
These models are only going to get more sophisticated as time goes on. (Indeed, this model already is more sophisticated; they just haven’t released the whole thing.) Neural networks leave fingerprints all over the stuff they create, and it’s easy to detect GPT-2 text, but who’s really going to go out of their way to do so?
We aren’t in a world of endlessly-scalable automated propaganda yet… but we’re getting to the point where I can see it on the horizon.
(If you’d like to play with a smaller version of this model yourself, you can try it at this website.)
BillinGlendaleCA
AI is the new buzzword in photo processing; both Luminar and Photoshop have stuff they say relies on AI.
Major Major Major Major
@BillinGlendaleCA: what sort of features?
smintheus
My b-i-l’s academic field is AI. I never understood why he would willingly devote his life to the destruction of humanity, but I decided it was better not to ask.
Mnemosyne
To paraphrase a Michael Crichton character, a lot of IT people seem to spend their time seeing if they can do something without stopping to wonder if they should do it.
Mnemosyne
And, yes, this is already a suspected problem for self-published authors who are being pushed off of Kindle Unlimited by what seem to be AI-generated books that get a quick QC by humans for obvious mistakes before they get pushed out onto KU at a far faster rate than any human author could match.
Major Major Major Major
@Mnemosyne: I put in the first line of a few of my stories and it took them in some weird directions, but matched the voice and subject matter. The latter being particularly impressive since I write sci-fi & fantasy.
BillinGlendaleCA
@Major Major Major Major: Luminar just updated and sky replacement was the big one, it works pretty well with some exceptions. Photoshop has a new “Object Select” tool that they say uses AI.
Mnemosyne
@Major Major Major Major:
Among other things, it’s a whole new way to plagiarize other authors in a way that makes it very hard for you to get caught because of the sheer volume produced.
BillinGlendaleCA
@BillinGlendaleCA: I’ll give a couple of examples:
Here’s a case where sky replacement worked well.
Here’s one where it worked badly (the sky was blown out on the original Kodachrome slide).
In the second case, the replacement sky blended over onto the faces on Mt. Rushmore (they ended up with blue parts), since there wasn’t sufficient contrast between the blown-out sky and the faces in the slide.
Jeffro
Nadler with a stellar opening statement
Jeffro
Collins declaring for all time that he is scum
MattF
The linguistics blog ‘Language Log’ has a current post entitled ‘AI Is Brittle’ showing, as you note, that AI can be trained for narrow tasks, but fails dramatically for broader ones. Also, the computational burden required to do even narrow tasks is enormous.
Cheryl Rofer
Machine translation of French and German is pretty good, Russian less so, and highly inflected languages like Finnish and Estonian, bad to incomprehensible. I can sympathize, because when I started with Estonian, I found that I couldn’t even look up maybe 75% of the words. But you get a sense for how the genitive is formed, and it gets easier from there.
I find it useful to speed up translations when I’ve forgotten a lot of the vocabulary in French and German. I don’t fully trust machine translations from Russian, and I use the machine translation to pick out words I don’t know in Estonian and do my own translating.
Major Major Major Major
@MattF: the computational requirements will be a blocker for extremely widespread adoption, but a few grand a month will get you a pretty good server, and a nation or corporation could easily afford more.
Amir Khalid
Your faked RBG death report gave me quite a scare. Its style is convincing. News reporting is by its nature very formulaic, and is probably easier for a machine to fake than many other kinds of human-generated writing.
Major Major Major Major
@Cheryl Rofer: I was shocked by how good the Japanese translation was in March. It’ll do pictures.
Brachiator
@Major Major Major Major:
So an author with writer’s block, or who is stuck on a particular plot point, might enter some text to see where the story might go.
Would this be a bad thing?
Could an AI get co-author credit?
NotMax
“Ai yi yi yi yi.”
– Carmen Miranda
Doug R
Now that we’re safely in the main body of comments, I’ll add this:
Driving is one of those tasks with a lot of rigid logical rules surrounded by countless seemingly random events.
Every decision while driving is an educated guess: Is that bicycle going to wobble into my path? Is that jogger heading to the corner going to run into the road? Is that dual-wheel pickup that keeps wandering across the lane marker going to do that when I pass him? Is there a patch of compact snow or black ice at the intersection where the light may turn yellow any second?
Major Major Major Major
@Brachiator: I don’t see anything wrong with that; a few writers have done it and written about it.
Major Major Major Major
@Amir Khalid: it also generates good anti-vax polemics and, surprisingly, Hemingway.
Goku (aka Amerikan Baka)
@Mnemosyne:
This is my big beef with humanity. There has never been any true coordinated planning for the future. We’ve known for over a century that CO2 is a greenhouse gas. Several industrial processes emit greenhouse gases and yet very little has been done to correct this or move to cleaner alternatives.
I refuse to believe that we must live in the Stone Age in order to live in balance with nature. Fossil fuels should have been only a stepping stone toward more advanced technologies, not an end in itself. But of course powerful vested economic and ideological interests have conspired to make it so, with most people going along with it because they’re ignorant and stupid.
The issue, as I see it, is human nature. Human nature must change if humanity, as the only known intelligent life in the universe, is to survive. I believe humanity is worth preserving.
Nation states and national sovereignty are also outdated concepts that should be abolished.
Leaders such as Jinping, Putin, Trump, and Muhammad Bone Saw are evidence of this failure. They are madmen who will destroy the world and doom human civilization and must be removed from power.
schrodingers_cat
During the recent coming-together of three parties to keep the BJP out of power in Maharashtra, one way to avoid the BJP troll operation was to switch to Marathi. The operation must be highly centralized, because at least for the first few days their Marathi tweets were painful to read and sounded like gibberish. Or they would just switch to Hindi, which was a signal to ignore those tweets.
Roger Moore
@BillinGlendaleCA:
The whole thing about this is that AI is a very broad term that means different things to different people in different contexts. What we’ve gotten much better at recently are expert systems, i.e. systems that have been trained on an absurd amount of data so that they can do a task really well. As long as you have enough data to feed them, you can get them to the point they’re really good at doing that thing. At the same time, we aren’t particularly close to general purpose AI, which would be capable of picking up new skills just by observing and searching out stuff on its own.
NotMax
@Roger Moore
Indeed. GIGO lives.
AI can provide content (as opposed to context) following the form of grammatically satisfactorily strung together words but nuance and a binding and/or elaborate skein of meaning – the similar yet still unique branches necessary as structure for the life-giving greenery – are AWOL.
Major Major Major Major
@Goku (aka Amerikan Baka):
Chairman Xi’s family name is Xi.
Butter Emails
Not sure whether the AI just sucks at geography or is accurately reflecting how bad our presstitutes are at geography.
Amir Khalid
@Major Major Major Major:
AHA! I knew it!
NotMax
C’mon, M⁴, do one which begins “It was a dark and stormy night.”
;)
Major Major Major Major
@NotMax: Give me 1-2 more sentences and I will!
BillinGlendaleCA
@Roger Moore: I think the AI in Luminar and Photoshop, to the extent it’s more than a buzzword, is more of a static AI. I’m not sure that it continues to learn as it sees new photos.
Roger Moore
@NotMax:
What I’m saying more specifically is that if you want a program that generates plausible sounding text, you have to get a program and feed it a whole bunch of text. You can’t then take the same program and feed it a whole bunch of pictures, chess games, and medical charts and expect it to become a simultaneous expert in text generation, picture classification, chess playing, and pathology. You may be able to make programs that are good at each of those things, and they may even have some features in common, but you can’t just switch from one to the other.
It’s also notable, and probably related, that these AIs require far more material to get good at what they’re doing than a human would. As Major^4 points out above, this AI has been trained on 40 GB of text, which is a lot. It’s the equivalent of thousands of novels’ worth of text, far more than a person would need to read before they were able to write convincing text. Similarly, a computer probably needs to see literally millions of cat pictures before it can reliably identify a cat in a photograph.
Liam Yore
@Major Major Major Major:
jesus tapdancing christ! I was skimming and not reading carefully and I saw this in bold:
“Supreme Court Justice Ruth Bader Ginsburg has died, her family reports.”
my blood ran cold, I soiled my britches, and keeled over dead from a heart attack and I am now suing you for eleventy million dollars in damages.
Don’t DO that to me!
Mnemosyne
@Major Major Major Major:
The canonical sentence that follows is, “Suddenly, a shot rang out.”
I don’t think Snoopy ever got to a third sentence.
Roger Moore
@Major Major Major Major:
How about one long sentence:
It’s the opening sentence of Paul Clifford by Edward Bulwer-Lytton, the namesake of the bad opening sentence writing contest.
NotMax
@Major Major Major Major
How about Bulwer-Lytton’s complete opening sentence, then.
Roger Moore
@Major Major Major Major:
I have a great, or perhaps terrible, idea for how to misuse this tool. You should feed the program the “winners” of the Bulwer-Lytton bad writing contest and see where they go.
NotMax
@Roger Moore
Your comment was not there when I mashed the Post Comment button.
different-church-lady
Someday, soon, machines will be just as fuckin’ stupid as people.
Roger Moore
@NotMax:
Jinx!
different-church-lady
@Mnemosyne: He did, quite a few times in fact.
Major Major Major Major
Mnemosyne
@different-church-lady:
Already happened. Remember the Microsoft twitterbot that had to be taken down because it became virulently racist within 24 hours of interacting with humans in the wild?
Mnemosyne
@different-church-lady:
Now I want Majorx4 to feed that into a computer and see what comes out.
Jay
@Major Major Major Major:
It was a dark and stormy night. The windows rattled in their frames from the thunderclaps and the shingles shook from the icy gusts of wind. In the light of the candles, Carol stood there, looking down at her bloody hand and the knife, thinking to herself, over and over like a litany or a prayer, “Bob should have taken the garbage out when I asked.”
enough sentences?
Uncle Cosmo
A close election with voting a week away. A webserver pops up on the net, blasts out video to a few hundred thousand recipients. The grainy but rather graphic video appears to show one of the candidates engaged in sex with children. The server has vanished & cannot be traced, but the videos go viral.
How do you stop that? How do you defend against it once it’s out? How do you even convince worthwhile candidates to run for office if they understand they risk this sort of treatment?
I’ve been telling folks for years that this is coming. Apparently it’s nearly here.
Major Major Major Major
@Liam Yore: Sorry about that, I figured putting them below the fold would be enough context.
different-church-lady
@Uncle Cosmo:
IT’S IN UKRANE, NUMBNUTS!!1!
2liberal
I’ve got those Google Waymos all over the place here in the PHX burbs. One woman got killed by an AI-driven car (not Google’s) where the rider was texting while supposedly safeguarding the pedestrian population.
Roger Moore
@Major Major Major Major:
You’re certainly right about it doing a good job of matching the tone of the original.
different-church-lady
Did anyone else notice the media was all “SELF DRIVING CARS! SELF DRIVING CARS! SELF DRIVING CARS!!!” until the day one of them killed a pedestrian on a public street and then all of a sudden nobody wanted to talk about self driving cars anymore?
Mnemosyne
@2liberal:
They do those tests in AZ because mean ol’ California law says they have to take safety precautions.
Major Major Major Major
@Roger Moore: It’s really something.
NotMax
@Major Major Major Major
Now the obvious question to ask is whether or not the result is identical if you feed the same data in over multiple tries.
And another experiment (although you might want to opt for less slang and use “the Exchange” in place of ‘Change):
Major Major Major Major
@NotMax:
It generates a random seed each time. And I’m afraid I’ve put the computer running it away, but you can try the site I link to at the end of the post.
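On NotMax’s question of whether the same prompt gives the same result every time: when a generator samples its next word at random, weighted by how likely each word is, the output differs from run to run unless the random seed is fixed. The sketch below assumes a simple weighted-sampling scheme over toy bigram counts. The names train_bigrams and generate_sampled and the corpus are hypothetical, and this is not the actual GPT-2 sampling code or its options.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Toy 'model' (hypothetical): count which words follow each word in the text."""
    words = text.split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def generate_sampled(followers, prompt, max_words=10, seed=None):
    """Draw each next word at random, weighted by how often it followed the previous word."""
    rng = random.Random(seed)  # seed=None seeds from the OS, so each run can differ
    output = prompt.split()
    for _ in range(max_words):
        candidates = followers.get(output[-1])
        if not candidates:
            break  # nothing ever followed this word in the training text
        words = list(candidates)
        weights = [candidates[w] for w in words]
        output.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(output)


corpus = "it was a dark and stormy night and the rain fell and the wind howled"
model = train_bigrams(corpus)
# The same seed reproduces the same text; a different (or no) seed usually won't.
print(generate_sampled(model, "it", max_words=8, seed=42))
print(generate_sampled(model, "it", max_words=8, seed=42))
```

The first few words here are always "it was a dark and", because each of those words has only one possible successor in this corpus. After "and", the draw can go to "stormy" or "the", and that choice depends on the seed.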
Major Major Major Major
@Uncle Cosmo: Does seem that way.
Elizabelle
@Uncle Cosmo: I know. The potential is horrifying.
joel hanes
Relevant, and short
“The Great Automatic Grammatizator” by Roald Dahl, 1953
https://roalddahl.fandom.com/wiki/The_Great_Automatic_Grammatizator_(short_story)
joel hanes
@Roger Moore:
Bulwer-Lytton contest entry I remember:
“Her eyes were like two brown circles with black dots”
Steeplejack
Redacted for extreme tardiness.
Martin
@Uncle Cosmo: Technologically it’s here.
If you ran the operation out of the WH, nobody would ever be able to investigate and debunk it, even if you had a whistleblower.
Honestly, I’d rather Congressional Dems do the experiment than wait until one is in the wild, when it’s too late to do anything about it.
J R in WV
So thankful for the interesting yet fatal stories Balloon Juice is providing today.
Major Major Major Major
@J R in WV: we live to serve.
Bill Arnold
@Roger Moore:
How about this (Lauri Anderson):
It was one of those black cat nights. The moon had gone out and the air was thin. It was the kind of night that the cat would drag in.
The small model linked at the end of the top post sucks at continuing it in at least 10 tries.
For that matter, nearly all of the generated sentences in the top post feel like they did not emerge from a sentient mind. One or two were marginal. But one has to actually read them into meaning to notice this.
EAlbert
@NotMax:
The phrase is “dead as a doornail” because on old handmade doors, you had a number of vertical boards held together with horizontal boards at the top, middle, and bottom. These boards were nailed together with nails that were longer than the two boards were thick. The ends of the nails were bent over so that they couldn’t be pulled out or catch on anything. Since they couldn’t be reused, they were “dead”.
Mike J
What’s the joke? ML is usually written in R. AI is written in PowerPoint.
Major Major Major Major
@Bill Arnold: you don’t find passages like this… indistinguishable from a newspaper article?
Bill Arnold
@Major Major Major Major:
The first is a bit weird because the North Korean weapons programs have existed for a long while and usually a UN resolution is in response to some change. But it’s at least semantically marginal.
The second mentions a “resolution” without giving a name/number (there have been a bunch), and weirdly separates “China” and “Foreign Ministry” rather than simply saying “The Chinese Foreign Ministry issued X Y Z.” A bad and ignorant human writer could have written it, true.
Jinchi
We know that major news organizations write obituaries of major actors well in advance, and I wouldn’t be surprised if something very similar is literally ready to go, after swapping out key details like dates, names and places.
Major Major Major Major
Which is 1) actually a big AI achievement! And 2) all it takes for some influence ops; you aren’t submitting it to the New Yorker.
NotMax
@EAlbert
Don’t tell me, I didn’t express ignorance of it. That was Dickens; tell him.
Kayla Rudbek
@Cheryl Rofer: and machine translation from Japanese, Chinese, or Korean also has a lot of problems IMAO
lol chikinburd
oh my god
lol chikinburd
Hmm. Editing’s being weird.
lol chikinburd
trying to be less wall-of-text-y, but this editor’s not cooperating; it previews smaller, and then it publishes at full size
Ruthless
“…determining if something is a picture of a cat…” As an ex-software engineer… it’s not trivial for a human to make a computer able to recognize a picture of a cat. I agree, self-driving cars are not quite there yet. They will get there, but… where’s my flying car, eh?