There have been some big changes on the AI-generated-images front lately. Most of them have been spurred by a model called Stable Diffusion, which has two novel features: its architecture, and the fact that the creators released it for free use at home.
Architecture-wise, it’s a latent diffusion model, which (roughly speaking) combines the best parts of several different architectures into something flexible, powerful, and, most importantly, small. OpenAI’s DALL-E 2, which I’ve written about before, could theoretically be run on a $1000 consumer graphics card, but more realistically you’re looking at one that costs at least ten times that much–and you’d need to get your hands on the model itself, which you cannot. Stable Diffusion has neither of these problems. You can make do with a $150 card, and it’s really easy to download and set up. (If you have an NVIDIA card and want to give it a shot, I recommend this repo.) This has resulted in a dramatic shift in the way people use and think about AI-generated art.
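The “small” part comes mostly from doing the diffusion in a compressed latent space instead of on raw pixels. A back-of-the-envelope comparison (the 512×512 output size and 64×64×4 latent shape are the ones Stable Diffusion v1 uses; the arithmetic here is just illustration):

```python
# Number of values the denoiser has to process per step,
# pixel space vs. Stable Diffusion's compressed latent space.
pixel_values = 512 * 512 * 3   # full-resolution RGB image
latent_values = 64 * 64 * 4    # Stable Diffusion v1's latent tensor

print(f"pixel space:  {pixel_values:,} values")
print(f"latent space: {latent_values:,} values")
print(f"reduction:    {pixel_values // latent_values}x")
```

That ~48× reduction in what the expensive part of the model has to chew on is a big piece of why it fits on cheap consumer cards.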
The art world is now freaking out about it to a degree that they were not freaking out about DALL-E 2 or other models. Many artists would like their images removed from the training set, since people are making images mimicking their styles. Particularly outspoken on this front is fantasy artist Greg Rutkowski, whose name appears in so many image prompts that you’d be forgiven for thinking Stable Diffusion was a Greg Rutkowski painting generator. There’s something to this–it’s really easy to generate a decent composition of a recognizable subject in something approximating an existing artist’s style. Certain art gigs may be in peril here, especially things like “indie sci-fi/fantasy board game illustrator”, “corporate noodly-arm clip-artist”, “lovecraftian horror concept artist”… you can peruse the images (sometimes NSFW) that were made during the Stable Diffusion beta to see what I’m talking about. (Notice how it’s all a little same-y? That’s because like two-thirds of those include Rutkowski in the prompt. Poor guy.)
But there are clear limitations to these things. (Read on for a discussion of that, and porn, and another Samwise drawing.)
These AIs aren’t good at producing more than one subject in a picture. Things are usually a little incoherent. The resolution is quite low unless you have specialized equipment–with a few exceptions, my gaming laptop maxes out at about 1024 pixels on a side, which at the 300 dpi typical for a high-quality print gets you about 3.4 inches. Perhaps most damning is that the best you can usually do is merely good enough. At the risk of sounding like an essentialist, these pieces lack spark. There are no visual puns, no relationships to other generated images, no symbolism, none of the things I associate with good art. You end up with subpar low-end commercial pieces–the sorts of things sci-fi and fantasy publishers would bulk-purchase in the 1980s to stockpile for book covers–but worse. Will this be good enough for many people who would otherwise hire artists? If not now, then soon, I suppose.
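If you want to check the print-size arithmetic yourself (pixels and DPI are from the paragraph above; the helper name is mine):

```python
# Physical edge length, in inches, of an image printed at a given DPI.
def print_size_inches(pixels: int, dpi: int) -> float:
    return pixels / dpi

# 1024 px on a side at a few common print resolutions.
for dpi in (300, 150, 72):
    print(f"1024 px at {dpi} dpi -> {print_size_inches(1024, dpi):.1f} in per side")
```

At 300 dpi that’s about 3.4 inches–postcard territory, not poster territory.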
I’d argue, though, that when the dust settles, you’ll see these models as a productivity multiplier for existing artists. They are also useful as a major shortcut for visual-art hobbyists like me, who could puzzle our way to an image from scratch, or get 75% of the way there in a browser window and then paint over it. This tweet has a good example of what the model produces vs. what an artist might do with it:
“AI is the future of art” pic.twitter.com/BH3qnhN1Kp
— Jon Neimeister (@Andantonius) September 30, 2022
This is petty and kind of a joke, but also a good example of what I mean when I say that AI is not taking our jobs any time soon. Even if it can get you 75% of the way there, 75% is not enough and you’ll still have to be able to draw and paint in a professional context. 🤷‍♀️
— Jon Neimeister (@Andantonius) September 30, 2022
One thing that Stable Diffusion has proven to excel at is porn. There’s a very popular extension of the current model trained on fifty thousand NSFW anime-style drawings called Waifu Diffusion. This makes sense; porn is one area where good enough is usually good enough. A big boob generator with sufficient verisimilitude is going to be a big hit no matter how it works under the hood.
Which brings us to the government. They’re getting upset, too–Rep. Eshoo (D-CA), representing Silicon Valley, is urging the government to Do Something about such “unsafe AI practices”. As a result, StabilityAI, stewards of Stable Diffusion, will be making a number of changes to the next version of their model, including making it unable to generate porn. (Of course, since the model will presumably be public, and can be retrained, that particular feature will last all of ten minutes…)
So, basically, the future of AI image generation is here, much sooner than most people would have predicted. You no longer need an invitation to participate. You no longer need to pay for expensive cloud processing power. You barely even need a gaming laptop. Even OpenAI has opened up DALL-E 2 to everybody, now that StabilityAI has drunk their milkshake, and those guys have always been super conservative regarding public use of their models.
What’s this have to do with Samwise, you ask? Well, I finetuned my local Stable Diffusion model to be able to generate pictures of Samwise using the custom token samwisethecat. It took some know-how and an hour of processing time. Now I can make images like the one up top (and do click to embiggen, WordPress has pretty lossy image compression), and this one, which you may also click to embiggen:
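The trick behind a custom token like samwisethecat is (roughly) that the model learns a new vocabulary entry whose embedding gets optimized until prompts containing it reproduce the subject. Here’s a toy sketch of that idea–made-up names, a tiny embedding size, and no actual training, just the bookkeeping:

```python
import random

EMBED_DIM = 4  # real text encoders use hundreds of dimensions; tiny here

# A toy "text encoder" vocabulary: token -> embedding vector.
vocabulary = {
    "cat":      [0.9, 0.1, 0.0, 0.3],
    "painting": [0.2, 0.8, 0.1, 0.0],
}

def add_custom_token(vocab, token):
    """Add a new token with a fresh, trainable embedding. Fine-tuning would
    then optimize this vector so prompts using the token render the subject;
    here we only initialize it randomly."""
    vocab[token] = [random.uniform(-1, 1) for _ in range(EMBED_DIM)]
    return vocab[token]

def encode_prompt(vocab, prompt):
    """Look up the embedding of each known token in the prompt."""
    return [vocab[word] for word in prompt.split() if word in vocab]

add_custom_token(vocabulary, "samwisethecat")
embeddings = encode_prompt(vocabulary, "a painting of samwisethecat")
print(f"{len(embeddings)} known tokens embedded")
```

The hour of processing time goes into optimizing that one vector (and, depending on the method, some of the model’s weights)–the prompt syntax afterward is just a dictionary lookup.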
It doesn’t exactly know what a playing card is, but that’s just the sort of thing you can go touch up after the fact.
Exciting! Scary? Cool. I think so, at least.