Saturday, January 25, 2025

An Engineer does (AI) Art - early local gens

Two years ago today, after a period when the HuggingFace Stable Diffusion t2i demo wasn't functional, I admitted to myself that I liked doing this thing, bit the bullet, and made a local, CPU-based install of the Automatic-1111 program on the high-powered workstation I'd bought a few years previously. I found that I could generate a 512x512 image in "only" 12 or so minutes. The world was now my oyster.
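
For a sense of what local CPU generation looks like in code, here's a minimal sketch using the Hugging Face diffusers library - not the Automatic-1111 web UI I actually used, and the model id, prompt and settings are illustrative assumptions rather than a record of my setup:

    # Minimal CPU text-to-image sketch with Hugging Face diffusers
    # (illustrative only; model id, prompt and settings are assumptions)
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # any SD1.x checkpoint
        torch_dtype=torch.float32,         # CPU inference uses float32
    )
    pipe = pipe.to("cpu")

    image = pipe(
        "1girl, green dress with white belt, castle garden",
        height=512,
        width=512,
        num_inference_steps=20,
    ).images[0]
    image.save("out.png")

On a CPU-only box each denoising step can take tens of seconds, which adds up to the sort of multi-minute generation times I was seeing.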

So I retried the sort of prompts that had worked previously, only to find that the NAI model used in the base install instructions responded rather differently from the SD1.x models I'd been used to. But no problem, now I could try other models! And by the end of the day, the best result had come from Waifu Diffusion 1.3 -

which is probably the only tolerable result I ever got from that model.

The next few days were a learning curve - discovering that "SciFi" in a model name did not mean it could handle green Treens or other similar staples, that my purposes were better served by anime models, and learning the care and feeding of your VAE and the use of LoRAs and embeds, particularly negative-prompt embeds. And so I embarked upon the project to illustrate the stories I'd written long ago.

And that was where the limitations of the technology became apparent. While generic 1girl pictures were simple enough to achieve, trying to translate from mind's eye to image via prompt was less so. Even simple descriptions of clothing like "black top and green skirt" or "green dress with white belt" were enough to confound matters; and while simple scenery was possible, the system often liked to insert a 1girl unprompted, as here -

but that could be treated as serendipitous in the right contexts.

With plenty of scenery or single-character scenes in the various stories I was trying to illustrate, I kept on trying different ones, building up a repertoire of test prompts for comparing new models/LoRA/whatever (𝕏/twitter thread). I might not be directly achieving what I wanted, but the results were generally pretty (even if faces often needed inpainting to fix).

But then, some time around mid-April, I reached a point where the test-card activities (see a new model, ideally with an idiosyncratic style, and run the scripted set of prompts over it) and messing around for fun took over from the original intent to illustrate, and I started to dabble in AI art twitter as a brash and shameless n00b, generating things that had no illustrative intent but scratched the making-pictures itch.

Also at this point I had a catastrophic motherboard failure on my old workstation, and faced a decision point...
