There have been some big changes on the AI-generated-images front lately. Most of them have been spurred by a model called Stable Diffusion, which has two novel features: its architecture, and the fact that the creators released it for free use at home.
Architecture-wise, it’s a latent diffusion model, which (roughly speaking) combines the best parts of several different architectures into something flexible, powerful, and, most importantly, small. OpenAI’s DALL-E 2, which I’ve written about before, could theoretically be run on a $1000 consumer graphics card, but more realistically you’re looking at one that costs at least ten times that much–and you’d need to get your hands on the model itself, which you cannot. Stable Diffusion has neither of these problems. You can make do with a $150 card, and it’s really easy to download and set up. (If you have an NVIDIA card and want to give it a shot, I recommend this repo.) This has resulted in a dramatic shift in the way people use and think about AI-generated art.
The art world is now freaking out about it to a degree that they were not freaking out about DALL-E 2 or other models. Many artists would like their images removed from the training set, since people are making images mimicking their styles. Particularly outspoken on this front is fantasy artist Greg Rutkowski, whose name appears in so many image prompts that you’d be forgiven for thinking Stable Diffusion was a Greg Rutkowski painting generator. There’s something to this–it’s really easy to generate a decent composition of a recognizable subject in something approximating an existing artist’s style. Certain art gigs may be in peril here, especially things like “indie sci-fi/fantasy board game illustrator”, “corporate noodly-arm clip-artist”, “lovecraftian horror concept artist”… you can peruse the images (sometimes NSFW) that were made during the Stable Diffusion beta to see what I’m talking about. (Notice how it’s all a little same-y? That’s because like two-thirds of those include Rutkowski in the prompt. Poor guy.)
But there are clear limitations to these things. (Read on for a discussion of that, and porn, and another Samwise drawing.)
These AIs aren’t good at producing more than one subject in a picture. Things are usually a little incoherent. The resolution is quite low unless you have specialized equipment–with a few exceptions, my gaming laptop maxes out at about 1024 pixels on a side, which at the 300dpi typical for a high-quality print gets you about 3.5 inches. Perhaps most damning is that the best you can usually do is good enough. At the risk of sounding like an essentialist, these pieces lack spark. There are no visual puns, no relationships to other generated images, no symbolism, none of the things I associate with good art. You end up with subpar low-end commercial pieces–the sorts of things sci-fi and fantasy publishers would bulk purchase in the 1980s to stockpile for book covers–but worse. Will this be good enough for many people who would otherwise hire artists? If not now, then soon, I suppose.
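The print-size figure above is just pixels divided by dots per inch. Here is a minimal Python sketch of that arithmetic; the function name `print_size_inches` is made up for illustration, and the 300dpi default is simply the figure I quoted, not a universal standard.

```python
def print_size_inches(pixels: int, dpi: int = 300) -> float:
    """Return the printed length, in inches, of `pixels` rendered at `dpi` dots per inch."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels / dpi


if __name__ == "__main__":
    # A 1024-pixel side at 300dpi comes out a little under 3.5 inches.
    print(f"{print_size_inches(1024):.2f} inches")  # prints "3.41 inches"
```

Run it the other way around and a 10-inch print at the same density needs 3000 pixels on a side, which is why the resolution ceiling matters for anything you want to hang on a wall.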
I’d argue, though, that when the dust settles, you’ll see these models as a productivity multiplier for existing artists. They are also useful as a major shortcut for visual-art hobbyists like me, who could puzzle our way to an image from scratch, or get 75% of the way there in a browser window and then paint over it. This tweet has a good example of what the model produces vs. what an artist might do with it:
“AI is the future of art” pic.twitter.com/BH3qnhN1Kp
— Jon Neimeister (@Andantonius) September 30, 2022
This is petty and kind of a joke, but also a good example of what I mean when I say that AI is not taking our jobs any time soon. Even if it can get you 75% of the way there, 75% is not enough and you’ll still have to be able to draw and paint in a professional context.🤷♀️
— Jon Neimeister (@Andantonius) September 30, 2022
One thing that Stable Diffusion has proven to excel at is porn. There’s a very popular extension of the current model trained on fifty thousand NSFW anime-style drawings called Waifu Diffusion. This makes sense; porn is one area where good enough is usually good enough. A big boob generator with sufficient verisimilitude is going to be a big hit no matter how it works under the hood.
Which brings us to the government. They’re getting upset, too–Rep. Eshoo (D-CA), representing Silicon Valley, is urging the government to Do Something about such “unsafe AI practices”. As a result, StabilityAI, stewards of Stable Diffusion, will be making a number of changes to the next version of their model, including making it unable to generate porn. (Of course, since the model will presumably be public, and can be retrained, that particular feature will last all of ten minutes…)
So, basically, the future of AI image generation is here, much sooner than most people would have predicted. You no longer need an invitation to participate. You no longer need to pay for expensive cloud processing power. You barely even need a gaming laptop. Even OpenAI has opened up DALL-E 2 to everybody, now that StabilityAI has drunk their milkshake, and those guys have always been super conservative regarding public use of their models.
What’s this have to do with Samwise, you ask? Well, I fine-tuned my local Stable Diffusion model to be able to generate pictures of Samwise using the custom token samwisethecat. It took some know-how and an hour of processing time. Now I can make images like the one up top (and do click to embiggen, WordPress has pretty lossy image compression), and this one, which you may also click to embiggen:
It doesn’t exactly know what a playing card is, but that’s just the sort of thing you can go touch up after the fact.
Exciting! Scary? Cool. I think so, at least.
Open thread!
MazeDancer
Samwise looks outstanding.
But here’s hoping AI art never gets “spark”.
Mike in NC
Today we attended a local “Get Acquainted” open house for a recently retired Marine colonel who is hoping to replace our current reactionary Republican in the NC House. Got a free t-shirt, yard sign, and bumper sticker.
NotMax
Am it or amn’t it?
:)
Major Major Major Major
@MazeDancer: I’m not sure how the current architecture could even get there… fixing most of the deficiencies requires knowing what a thing is, which is simply not a thing these models do.
Of course, we’ve gone from productionized GANs (most everything before last year) to transformer models (DALL-E 2) to latent diffusion models fast enough to make your head spin, so who knows what the near future might bring.
Another Scott
Neato.
Imagine all the good we could do if we harnessed all that computer power for Good rather than, eh, what’s that??
(via nycsouthpaw)
Cheers,
Scott.
Major Major Major Major
@Another Scott: I saw that! Too funny. I tried it out and of course it’s perfectly capable of making actual fish though.
I mean the other primary use of my GPU is playing JRPGs so.
Walker
It will be really interesting to see how the law shakes out on AI generation. There is a strong argument that an AI image is a derived work of the images in its training set.
This is important because companies like Microsoft are clearly trying to spin up these tools for commercial use. The result is going to be a copyright mess.
Major Major Major Major
One of my favorite things is how bad they are at text lol
Major Major Major Major
@Walker:
IMO it’s not an especially technically literate argument (are paragraphs made by GPT derivative works of its training set, the internet?). I think the most restrictive analogy would be collage, which also isn’t especially accurate, and at any rate most of those are transformative, not derivative. And I really really don’t trust legislators on this one, they’ll just do what they’re paid to, and it will somehow mostly enrich Disney.
moops
Is there a similar technology for music? I would assume yes.
Is it just static images, or is this creating movies?
MazeDancer
@Major Major Major Major: Eventually, the lowering of taste and the raising of AI skills will make art a commodity.
With luck, that will be long after I am here.
Major Major Major Major
@moops: I’ve spoken to some people about algorithmic music generators but haven’t actually tried any.
People are making videos with SD; you can search the subreddit for ‘video’ and see some. But it’s done using the “image to image” feature on a frame-by-frame basis, so it’s pretty weird.
https://www.reddit.com/r/StableDiffusion/
@MazeDancer: oh, it’s already a commodity. Wasn’t there a Koons post here earlier today? And as I mention in the post I think the commoditized sectors are exactly what this is going to eat into, through a combination of making some artists so efficient that there are even fewer gigs to go around, and removing some gigs from the artist market entirely.
Walker
@Major Major Major Major: There are noted examples of them clearly reproducing an artist’s style. There was an AI “tribute” to a recently dead artist (forgotten the name) that drew a firestorm for exactly this reason. That is very different from collage.
My reading of artists is that even if the law does not agree with this now, there will be a push to adapt copyright law to make this the case.
Major Major Major Major
@Walker:
I talk about an egregious case of this in my post. But since you can’t copyright that, it’s not copyright infringement.
NotMax
Bit of a detour because it’s humorous.
Recently watched one of those ubiquitous unboxing videos for a product made in China. Prominently plastered on the box was the notice PATIENT PENDING.
Old School
So Samwise porn is just around the corner?
Major Major Major Major
Here’s another funny use of SD
oatler
I got acquainted with AI through this video. extremely trippy
https://www.youtube.com/watch?v=WyuvS_CbO5A
Steeplejack
@Another Scott:
Congratulations, you broke the margin on mobile devices again! Just because it’s in a tweet doesn’t make your comment immune.
dm
Just the thing to produce illustrations for Medium posts.
Major Major Major Major
@Steeplejack: oh is that what happened. Fixed it for you all.
dm
I guess I’m just a nerd. This is what excites me: By turning matrix multiplication into a game, AlphaGo’s daughter AlphaTensor beats a fifty-year-old algorithmic record
It’s a sweet result with real mathematics behind it.
It’s also a little self-referential, since matrix operations are how these things work.
(ETA: if you get paywalled, just google “AlphaTensor”)
Steeplejack
@Major Major Major Major:
Thanks.
Major Major Major Major
@dm: nice. Good friend of mine works at DeepMind.
Urza
@Major Major Major Major: Been a couple years since I looked at AI coding. Has it progressed beyond feed it the data in a different order if you don’t like how the training is going?
MobiusKlein
My old college gaming group has a discord server, and algorithmic image generation has been a hot topic.
It’s been fun to use to make avatars for online games – want a Trek science officer that looks like Nikola Tesla – boom you got it. He’s also wrestling an 8-limbed cat-spider hybrid.
But then the folks trying to make a living on art in the group are not sanguine about it, as it feels like it devalues their work. It’s been a lot of fun to me, but it’s not my profession on the line.
Gravenstone
I browse a lot of YouTube. Something I’m starting to see is various songs turned into videos where the imagery is reputed to be AI generated. I’ve checked out a couple and it’s interesting but nothing I’d consider earthshaking. But everything starts somewhere.
Eolirin
@Major Major Major Major: I suspect multi model approaches are going to be necessary to resolve some of these issues.
I think we’re probably going to need new types of hardware architecture to make this stuff energy and cost efficient in the long run too.
Eolirin
@Walker: Pretty sure human artists copy each other’s styles all the time, or we wouldn’t have things like impressionism or anime. And artists learn by studying each other’s work.
Certainly nothing stops someone from commissioning someone to create art in someone else’s style. Almost all rule 34 porn commissions are duplicating styles of other artists, just as an easy example. Hell, there are probably more copyright concerns there, as they’re usually copying copyrighted characters. So these aren’t exactly new issues brought by AI.
Major Major Major Major
@Eolirin: the energy efficiency gain from SD is insane. You could heat a room with a TPU for DALL-E but SD runs better than fine on a laptop that doesn’t melt even a little bit. My machine gets hotter playing Fallout 76.
Eolirin
@Major Major Major Major: Yes, but SD has limitations. If solving those requires more expensive techniques, we’re back to where we were on energy costs.
If multiple passes through different models become necessary that becomes even more of a challenge.
But there’s been some interesting research into novel memory architectures that do processing as part of their storage structure (kind of like our brains), and some other compute methods that are more energy efficient, either because they’re using novel materials or because they’re better aligned with how AI functions need to be structured. And quantum computing seems promising for speeding up some functions important to AI techniques, if we can ever get that working right.
Major Major Major Major
@Eolirin: yeah. Was just responding on the one point. As impressive as all this stuff is, it’s still early days
Another Scott
@Major Major Major Major: Thanks.
Maybe someone can fix FYWP to do that wrapping automagically? (URLs seem to get wrapped; it’s not clear to me why text doesn’t.)
Sorry for the bonkering. :-/
Cheers,
Scott.
VFX Lurker
@Walker: The artist, Kim Jung Gi, died recently. The art community grieved the loss of his tremendous skill and his great kindness.
Two days later, somebody used an AI engine and a library of Kim Jung Gi art to generate soulless collages of his work. The same person demanded artistic credit for any collage shared online.
I still don’t have words for this.
Steeplejack
@Another Scott:
URLs contain nonalphabetic characters, which FYWP uses as break points. Alphabetic-only strings are what causes the problem.
Another Scott
@Steeplejack: Ok, but it’s a problem that I think was solved about 60 years ago when CRTs were first used for text display. IOW, “if next character position exceeds right margin, increment line by one and reset column to 1”…
(sigh)
Thanks.
Cheers,
Scott.
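For anyone who wants to see the rule quoted above spelled out, here is a minimal Python sketch of that kind of character-level hard wrap. The name `hard_wrap` and its `width` parameter are hypothetical, chosen for illustration; nothing here reflects how WordPress or any browser actually lays out text.

```python
def hard_wrap(text: str, width: int) -> list[str]:
    """Break `text` into rows of at most `width` characters, ignoring word boundaries."""
    if width < 1:
        raise ValueError("width must be at least 1")
    rows = [""]
    for ch in text:
        if ch == "\n":
            # An explicit newline always starts a fresh row.
            rows.append("")
            continue
        if len(rows[-1]) == width:
            # The next character would cross the right margin:
            # move down one row and start again at the first column.
            rows.append("")
        rows[-1] += ch
    return rows


if __name__ == "__main__":
    for row in hard_wrap("HAHAHAHAHA", 4):
        print(row)
```

With a width of 4, the ten-character string comes back as the rows HAHA, HAHA, and HA. Because the rule never looks for a break point, it does not care whether the string is alphabetic-only or full of punctuation.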
Steeplejack
@Another Scott:
Maybe you can resign yourself to the fact that HA⁵ is probably about as funny as HA²⁴.
Brachiator
This is a great topic! I am happy to see a post about it.
I see three problems. There are more, but these quickly come to mind.
The Internet will soon become overwhelmed with a blizzard of mediocre AI generated art. This is the nature of the beast. Most people love crap. And the more crap they enjoy, the happier they are. I recently watched a video podcast about AI generated images. Someone posted an example of “Mickey Mouse preparing his tax return.” Two iterations shown were ghastly, but the hosts and guests all gushed how “cool” it was that the AI engine could come up with something recognizable with apparently just a few descriptors. Later, someone suggested that it would be cool to provide AI generated images to accompany all fiction currently in the public domain. Of course, many novels already have artwork associated with them. But resorting to AI as a mainstay is a bad idea.
Even though it might be crap, a lot of AI generated imagery is probably functional. There are reasonable fears that this might eliminate commercial illustration as an occupation, putting tons of people out of work. The fears may be exaggerated for now, but I can see this having a huge impact in the very near future.
AI-generated imagery will probably maintain or intensify societal racism and sexism. Even though there are some non-white executives running tech companies, the gatekeepers and influencers tend to be white males with a similar world view concerning innovation and a blind spot when it comes to issues of diversity. AI images will likely reflect a narrow vision.
Another Scott
@Steeplejack: I didn’t write the tweet.
Tweets with strings of characters wider than the margin work fine on browsers on phones – obviously.
The problem is WordPress and/or the various plugins.
I’m sorry that the embed tweet implementation here occasionally breaks margins on phones when we use the provided functionality, I truly am.
I’ll do my best to remember to check for that quirk / annoyance / breakery when I use the Embed Tweet functionality. But nothing is guaranteed.
Fixing the implementation here would be more likely to solve the problem, since I’m not the only one that it occasionally affects.
My $0.02, FWIW.
Cheers,
Scott.
hilts
OT
Fuck the goddamn motherfucking NY Yankees.
Steeplejack
@Major Major Major Major:
For the “cozy mysteries” file: Magpie Murders.
Major Major Major Major
@Steeplejack: hmm if I’ve read the book will I be irreparably spoiled?
Major Major Major Major
@Brachiator: art subreddits are already drowning in AI crap. And it is crap, not because of the limitations of the tooling really, but because you still need an artist’s eye and patience to make something worthwhile, and most likely some chops around digital painting etc. to make the pieces really shine. Everything on that website I link to is same-y because the prompts are all written by nerds who can only think of things they’ve already seen. There’s nothing wrong with that, really… but…
People are making some truly amazing stuff assisted by these tools, but it’s not what’s getting promoted by the nerds with no taste.
An artisan product is always going to be better than a mass produced one, but as you say, this will eat into lots of opportunities, mostly in the “forgettable crap” space. Which is how a lot of artists capable of making memorable non-crap make a living.
I’m really happy I’m able to make these cat pictures. But would they be that interesting if it weren’t Samwise? Nope. Will they keep me from some day hiring an artist to paint him? I mean maybe, on the off chance I was going to do that.
You used the word ‘functional’—that’s the word I was looking for when I was writing this! Especially relevant for corporate and adult material where ‘functional’ is king.
Steeplejack
@Major Major Major Major:
No idea. Haven’t seen it yet. You’ll have to suspend your pre-knowledge of the smoke-and-mirror bits, I assume. But Anthony Horowitz did the adaptation of his own novel and has a pretty good track record, e.g., Foyle’s War. I’m somewhat optimistic.
ETA: I guess it depends on how well you can enjoy things where you already know the spoiler-y stuff.
kalakal
@moops:
There are quite a few programs that claim to be AI in music these days. Some are ‘smart’ plugins such as compressors, EQs etc – actually some are really good, e.g. Gullfoss, Sonible’s ‘Smart’ series etc. There are also ones that claim to be AI for mastering & mixing and they do a decent(ish) to good job, e.g. Exonic, Ozone. If you want to concentrate on the music but have a decent sounding product without having to learn mixing/mastering they’re great (or you can use them as suggestions).
There are a lot of algorithmic rhythm, melody, chord, harmony generators and there are programs that will produce complete bits of music. The former can be very useful for getting ideas, the latter tend to be really samey. It depends what you mean by AI but I think it’s a bit of a stretch for a lot of this stuff.
Kirk Spencer
Accountants who experienced VisiCalc/Peachtree and librarians with Google among many others would like to welcome static visual artists to the consequences of modern technology.
Major Major Major Major
@Kirk Spencer: librarian employment figures haven’t really changed and accountants are keeping pace with population. Travel agents on the other hand…
Starfish
Thank you for the perfect cover for my soon-to-be self-published book novella short story haiku “Samwise Seeks His Waifu.” From the picture, it is clear that Samwise enjoys long walks on the beach.
Kirk Spencer
@Major Major Major Major: I checked. I sit corrected, led by my impressions instead of data.