Stable Diffusion and AI stuff

Support, Discussion, Reviews
Post Reply
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Stable Diffusion and AI stuff

Post by Winnow »

So last weekend I started looking into AI generated art and went way down into that rabbit hole.

First, there are a shit-ton of web-accessible AI art sites out there; some of the more popular ones are Midjourney and Blue Willow, and even Bing and Google have their own. I'm here to talk about Stable Diffusion versions you can install on your own PC so you don't have to pay or rely on the cloud to generate your images. Midjourney is fine, I'm sure, if you want to go that route.

Stable Diffusion is amazing. After playing around with it for a week I still can't stop fucking around with it.

I have three front ends installed: Auto1111, Invoke AI and Easy Diffusion

Auto1111 - most popular, with the most support for extensions, LoRAs, styles, etc., but I'm not a fan of the UI

Invoke AI - better UI than Auto1111

Easy Diffusion - newest and my favorite of them. Also the easiest to install and work with for newbs, while at the same time rapidly gaining features. I use this one as much as I can. I only use Auto1111 for some LoRAs that aren't quite supported on Easy Diffusion yet (LoRAs are in beta right now, so we're talking a week before they're implemented). Install link is below the videos in this post.

All of these work the same way. Through prompts, you tell it what you want to create and then the AI takes over and generates as many images as you want, all different. When you find a version you want, you can fine tune it by keeping the same "seed" and then the AI makes smaller changes.



The above video is an excellent way to see how the use of prompts impacts what the AI generates. He starts out with a basic prompt then starts to modify it so you can see how it progresses.



This video gives some suggestions on things you can do to impact your AI creations.

Installing a Stable Diffusion front end requires a lot of dependencies, Python, etc. I recommend using Easy Diffusion:

https://github.com/cmdr2/stable-diffusion-ui

It has a nice easy installer that takes care of most of that stuff for you.

Some great resources:

For models (a model is a resource file trained on images; SD 1.5 and SD 2.0 are the standard defaults, but you can get a lot of models fine-tuned for your interests here): https://civitai.com/

For prompts:

https://aipromptguide.com/

Prompts cover a variety of things: lighting effects, details, textures, etc. You can also use the style of any of something like 20,000 artists.

For example, take the famous fantasy artists the Brothers Hildebrandt: you could apply the style "Brothers Hildebrandt" for their combined work, or just an individual brother, Greg Hildebrandt or Tim Hildebrandt, and the AI will use that style as part of your generated image.

You can also just say "oil painting" or "airbrushed" and the AI will use that style. You can combine/mix two artists, various styles, etc.

You have both positive and negative prompts. Here's an example taken from this model:

https://civitai.com/models/4201/realistic-vision-v13

Realistic Vision V1.3
RAW photo, a close up portrait photo of 26 y.o woman in wastelander clothes, long haircut, pale skin, slim body, background is city ruins, (high detailed skin:1.2), 8k uhd, dslr, soft lighting, high quality, film grain, Fujifilm XT3


Negative Prompt:
(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck
The negative prompts help to remove typical unwanted things from the image you're trying to generate.
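
If you'd rather script this than click around in a UI, the same two boxes (prompt and negative prompt) map directly onto the diffusers Python library. This is just a rough sketch of the idea, not what Easy Diffusion or Auto1111 literally run; I'm assuming the base SD 1.5 checkpoint ("runwayml/stable-diffusion-v1-5"), a CUDA GPU, and shortened versions of the prompts above:

Code:

# Minimal sketch: positive + negative prompt with the diffusers library.
# Assumes torch + diffusers are installed and a CUDA GPU is available;
# the model id is the base SD 1.5 repo, not the Realistic Vision checkpoint.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="RAW photo, close up portrait of a woman in wastelander clothes, "
           "city ruins background, soft lighting, film grain",
    negative_prompt="deformed iris, cartoon, drawing, worst quality, "
                    "extra fingers, bad anatomy, blurry",
).images[0]
image.save("wastelander.png")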

If you go to that link above and look at the images, clicking the little "i" icon in the bottom right of each picture shows the exact modifiers used to create it. If you set all the parameters the same, you probably won't get exactly the same image, but it will be very close, as the AI still does its thing.

note for Aslanna: This will take everything your 3080ti has got to give. It sucks down VRAM and GPU power like nobody's business. Totally justifies the 3080ti and even a 3090/4090 with even more VRAM.

You can train your own models. That's not the easiest thing to do yet, but I tried it. To do it, you take like 15 quality face shots (or whatever you're training) of a person, then feed them to the trainer in Auto1111 or the standalone Kohya trainer (which I use), and it starts working through them; the 15 images are processed about 1,500 times in total to produce the end result. It takes some time and processing muscle, but for what I tried, it worked great. I could recognize the face when I used the model to make images.
-----

I haven't included any of the images I've created here, but they are amazing. I'm sold on AI, and just from following developments over the past week, things are advancing exponentially.

It won't be long (my guess is 5-10 years, the way AI is training AI faster and faster) before you'll use something like ChatGPT combined with a video version of Stable Diffusion and tell it: "Create a five-star movie, 120 minutes long, theme: ninjas fighting pirates, starring Brad Pitt (with one wooden leg), Johnny Depp, a 30-year-old John Wayne, and (myself) as the villain, set on Mars, video format: virtual reality" and you'll get a watchable movie with an original script.
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

I've dabbled a bit with ChatGPT but not any with the image generators.

Is there an AI for sound/music generation?

What will be cool is when a game engine, such as Unreal, is fully integrated with AI. I'm not sure that will happen with Unreal though since they have a marketplace for assets and such so if people can generate their own it dilutes that market.

But imagine being able to create a game with nothing more than an idea. There will probably be some manual tweaking after the fact but the bulk of the work wouldn't require much artistic skill.

Sidebar: The recent Unreal 5.2 demos, in particular the motion capture, were pretty impressive.
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Aslanna wrote: March 29, 2023, 12:57 pm Is there an AI for sound/music generation?
I saw one voice AI that used your voice to sing in a different language. You could take a song in a language foreign to you and then use it to make it sound like it was your voice singing the song in that other language.

I can't remember where I heard it, but they were talking about creating songs by mixing styles with a prompt: "write a love song using trance and country combined."

As for AI art, Stable Diffusion is still really impressive. First, I now recommend using Automatic1111 as the GUI for Stable Diffusion. Easy Diffusion might be the easiest to set up, but Automatic1111 is where 90% of people are and where all the extensions and support are as well. Easy Diffusion is still working on adding LoRAs, better upscalers, etc., so it might be good in the future, but you need that functionality for anything serious. The easiest way to install Automatic1111 is through GitHub, but I did see a standalone installer, and if you're good with command lines there are instructions for that as well. Make sure you choose to add Python to PATH when installing it. That seems to be the number one thing that screws people up. These things also run on Linux, etc., so it's not your normal install process.

As an example of the wide variety of outputs, someone used a single prompt asking for "1 goblin in a cell" and ran it through 79 different Stable Diffusion models, and this is what he got:


Code:

https://imgur.com/a/ek6kU3L
Go to that URL to see all the images, but it looks like this forum has an imgur plugin because I can see and scroll the images embedded in this post. Interesting.

A couple of things regarding the above example:

Each set of four images you see is from the same model. You can see right from the very first set how different the results can be even when asking the same model with the same prompt.

Basically there are several settings when you generate an image (there's a rough code sketch of how these fit together after this list):

Model: this is the main file that's trained on thousands of images and as you can see, depending on how it's trained, the results are quite different.

CFG Scale: this adjusts how strictly the image adheres to the prompt's instructions. 7-8 is the default; a lower number gives the AI more freedom (which can be interesting), while a higher number forces stricter adherence to the prompt.

Sampling Steps: Sampling steps determine the detail and how many "passes" the generated image gets. Depending on the sampling method (see the next description), something like Euler only needs 20 steps to create an image, but you can use a more advanced sampler like DPM++ SDE Karras, which gives better results and usually needs more steps, like 40, to create the image. One trick to save time until you find an image you want to tweak is to generate a batch of, say, 6 Euler images; when you find one you like, you send it from text2image to image2image processing. You can then make smaller generated modifications to it while using a better sampler that increases the quality, and after that you can use upscalers to refine the sharpness and resolution even more. Using this method you can generate a lot of images in a short time before spending long cycles on the image you want to keep and perfect.

Sampling Method: Euler, DDIM, DPM, etc. As with those 79 model image sets above, the results are also influenced by the sampler; some are better at photorealistic images, others better for art, anime, etc. What's crazy is those 79 sets of 4 images in the link are most likely all based on a single sampler, and changing the sampler has a huge impact on the end result as well.

Denoising Strength: used in image2image. Once you have an image you like and send it to image2image (there's also inpaint, but I'll discuss that in a sec), you want to start narrowing down the changes as you generate more versions. The lower the denoise setting, the less the image will change, and vice versa. So if you like the basic concept (how the character is positioned and the background), you use a lower denoise; if you want more change, like a bottle currently sitting on a table randomly morphing into a lamp, you use a higher denoise. Keep in mind you're still using CFG too, so keeping that low will make fewer changes, and a higher value will do more intense things, like even swapping the gender of a person.

Image Size: when first generating an image, most of the default sizes are 512x512. Usually you switch between 512x768 and 768x512 depending on whether you want portrait or landscape (which again has a huge impact on the content of the image). Most models are trained on 512x512 images, so that size (allowing for portrait/landscape) is what you want to start with. Keep in mind, all the amazing AI images you see start out this size; they just get upscaled during the refinement stages.

Seed: This is important. Once you generate an image, a random seed is used, 59872975 for example. If you like the image and want to generate more images similar to it, you can lock that seed and your images will change but keep the same basic layout. Normally when you start you keep it on "random", which randomizes the entire AI image while using the prompts, model, sampler, etc. as guides.
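
To make those settings concrete, here's roughly how they line up if you script a generation with the diffusers library instead of using a UI. This is a sketch, not what Automatic1111 runs internally; the model id, prompts, and seed are just placeholders:

Code:

# Rough sketch of how the settings above map to script parameters (diffusers).
# Model id, prompts, and seed are placeholders; requires torch + diffusers + a CUDA GPU.
import torch
from diffusers import (
    StableDiffusionPipeline,
    StableDiffusionImg2ImgPipeline,
    DPMSolverMultistepScheduler,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Sampling Method: swap the scheduler (a DPM++ style sampler here instead of Euler).
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Seed: lock it so the basic layout stays the same between runs.
generator = torch.Generator("cuda").manual_seed(59872975)

image = pipe(
    prompt="1 goblin in a cell",
    negative_prompt="lowres, bad anatomy, worst quality",
    guidance_scale=7.5,        # CFG Scale
    num_inference_steps=40,    # Sampling Steps
    width=512, height=768,     # Image Size (portrait)
    generator=generator,       # Seed
).images[0]
image.save("goblin.png")

# Denoising Strength: reuse the same components for img2img; lower strength = smaller changes.
img2img = StableDiffusionImg2ImgPipeline(**pipe.components)
refined = img2img(
    prompt="1 goblin in a cell",
    image=image,
    strength=0.4,              # Denoising Strength
    guidance_scale=7.5,
    generator=generator,
).images[0]
refined.save("goblin_refined.png")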

When you go to a model/LoRA/checkpoint site like CivitAI you will see the authors' examples of what they created. If you look in the bottom right of the images on CivitAI you will see an "i"; click on that and it tells you the exact prompts/negative prompts, base model, CFG scale, sampler, and seed they used to get the end result of that image. You can easily take that info and create a similar (not exact) image to what you see, as long as you use the same settings/model/sampler/seed, etc.

LoRA: LoRAs are trained specifically on a particular face/body or object (like a clothing style, a car, etc.). So when you see all those random images above, if you insert <lora:BradPitt:1.2> into the prompt, then when the image is generated it will use Brad Pitt's likeness (as an example). You download the LoRAs from CivitAI; they are small files ranging from 9MB to 140MB.

Now say you also want your goblin, besides looking like Brad Pitt, to be in a Jedi robe. If there is a LoRA for that (an embedding style also works), you would use something like this in your prompt: <lora:BradPitt:1.2> <lora:JediRobe:0.9>. The number at the end is how much the AI is influenced by that LoRA; the higher it is, the more likely it appears in the image (it ranges from about 0.1 to 1.5). If there is no Jedi robe LoRA, you can take 50-100 images of Jedi in robes and train your own LoRA, and then apply a Jedi robe to anyone you want in the future. (It's a pain to figure out the training settings to begin with, but once you do, you're off to the races and every other thing you train will be easy.)
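
The <lora:Name:weight> syntax is Automatic1111's prompt shorthand. If you were scripting it with diffusers instead, applying one downloaded LoRA with a weight looks roughly like this; the file name is a made-up example, not a real LoRA from CivitAI:

Code:

# Hedged sketch: apply a single CivitAI-style LoRA file with a weight via diffusers.
# "JediRobe.safetensors" is a hypothetical downloaded file in the current folder.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights(".", weight_name="JediRobe.safetensors")

image = pipe(
    "1 goblin in a cell, wearing a jedi robe",
    cross_attention_kwargs={"scale": 0.9},  # roughly the :0.9 weight in the A1111 prompt
).images[0]
image.save("goblin_jedi.png")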

here's an example of a Star Wars Rebel Pilot suit/helmet

So while you can see how cool even a basic prompt like "1 goblin in a cell" can be, with so many interesting results, you can easily modify that result with something like:

prompt:

1 goblin wearing a collar, sitting in a cell with straw on the ground, rats, dungeon, ultra detailed, volumetric lighting, by Frank Frazetta

Negative prompt:

lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, bad hands, ugly hands, undetailed, deformed hands, deformed bodies, extra limbs

In the above example, you'd probably see some form of straw on the ground, a rat or two (maybe; just keep generating if you don't!). The lighting is important... there are so many types of lighting.

Lighting examples: Volumetric, Dramatic, Cinematic, Studio, Chiaroscuro, glowing, neon, holographic, bioluminescent, radiance, glistening, polished, satin, silken, reflective, mirrored, frosted, glittering, sparkling, shimmering, dappled, filtered, backlit, silhouetted, highlighted, low-light, spotlight, floodlight, rim light, shadowed, contrasted, moonlight
...and you can instantly switch between various styles, too many to list, but things like oil painting, airbrush, pastel, photorealistic, 3D, and hundreds or thousands of art, anime, etc. styles.

You don't need to use artists, but they of course have a huge influence on the painting/illustration/photo style. You can even combine multiple artists. Frank Frazetta and Boris Vallejo were rival artists with totally different styles; if you use the prompt "by Frank Frazetta and Boris Vallejo", the AI will blend those two styles. I'm not sure if Boris is actually one of the sampled artists, but Julie Bell, who has a similar style, is. I'm not saying use those; there are 20,000 artists that were sampled....

Amazingly....with something like 20K artists it's already impressive but you can train your own.

Training:

You can train checkpoints, LoRAs, etc. LoRAs are the most popular and the easiest way to modify models, and you'll see an ass-ton of them on CivitAI.

People are training practically every artist you find on sites like DeviantArt, etc. If you don't find your favorite, you can make your own.

ALL OF THIS IS FREE

So lucky for us, this AI is open source. Stable Diffusion is out there, and once you have it, it can't be taken away. Also, the base model SD 1.5 is still used more than the newer 2.0 and 2.1 models because it didn't weed out a lot of the porn poses in its training data, so SD 1.5 is what you'll see used most on CivitAI. So far there's been no compelling reason for people to change to the newer base models, even though they're based on higher resolutions, because 1.5 output can be upscaled anyway.

As with all new tech, AI art is driven by porn so you'll see a lot of pretty women, etc on sites like CivitAI. Keep in mind, the adult industry drives all visual innovation (VR, VHS, etc). Don't let it diminish the amazing things you can do with AI art. People just happen to like to see whoever they want sucking a cock or being sucked, etc sometimes. With AI now that's trivially easy to do.

Getting past the adult stuff. As long as you have 10-15 decent quality pictures of anyone you can create a very good model. That said, my younger pictures (70's and 80's) aren't that great in resolution etc. You can use software like Gigapixel AI to drastically improve a picture, sharpen the edges etc before using them to train though. I'm still going to try and find some images of my dad and create a model and maybe insert an age 30's-ish version of him into our family xmas card or something.

Oh, also: if you use PNG as your output, all the prompt, model, CFG, seed, etc. settings are stored in the image. So you can go back to an image (or one someone else created), drag that photo onto the PNG Info tab in Automatic1111, and it will extract all the data (model, CFG, seed, scaler, etc.) and you can send it straight to image2image, text2image, inpaint, etc. (download the model from CivitAI if you don't already have it) and start modifying without having to start from scratch.
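
If you want to pull those saved settings out of a PNG yourself instead of using the PNG Info tab, Automatic1111 writes them into a text chunk in the file. Here's a quick sketch with Pillow; I'm assuming the usual "parameters" key and a placeholder file name:

Code:

# Sketch: read the generation settings A1111 embeds in its PNG output.
# Assumes the standard "parameters" text chunk; other tools may use different keys.
from PIL import Image

img = Image.open("example.png")  # placeholder file name
params = img.info.get("parameters")
if params:
    print(params)  # prompt, negative prompt, steps, CFG, seed, model hash, etc.
else:
    print("No embedded generation parameters found.")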

This is still in its infancy, but even over the past 6 months it's grown by leaps and bounds. If you don't feel like doing the work locally, you can use online Stable Diffusion based sites like Midjourney, Playground AI, etc... there are a bunch, some with a free limited number of images you can generate. Midjourney is the most popular, but they cut off the free trial due to so many deepfakes being made : ) (the Pope in a white jacket and Trump being arrested are recent popular fakes). If you have a decent GPU, I'd recommend doing it yourself.

One of my favorite artists is Masamune Shirow (lots of NSFW stuff, don't google at work). He's got extremely stylized/recognizable work. Someone scanned a few thousand of his pictures. Now, a couple of things regarding this: before you train on images, you run them through a captioning process. For anime-type stuff you use WD14, which uses the Danbooru archives to generate prompts (again, don't go there from work, but if you go to that site and choose any image, you'll see a ton of descriptive terms describing that image that can then be searched for: "blue hair, looking at viewer, earrings", etc., along with adult descriptions as well). When you start training, those captions are embedded into your LoRA, so when you prompt "earrings" it finds the scans that had that description and has a better chance of using those in the generated image.

Anyway, because of the way that works, they included a text file/folders for all the artbooks with the prompts for each picture. Now, you can't EXACTLY reproduce the artbooks, but you come damn close when you generate using those prompts. It's amazing how similar it looks... and of course you can also create anything you want in that style, not just things similar to the original pictures. So if I wanted to make my own Masamune Shirow artbook with a theme of being on the moon in a futuristic city, I would include those prompts and then mess around with pose prompts: "sitting on a throne", "drinking from a glass", "tiger lying down", "full body, wearing a tight green space suit". I can make my own themed art and it looks incredibly good (the style compared to the original artist), and I can generate hundreds of images, all different, instead of being limited to the 20-30 you normally see in an artbook. It really is amazing.

Also, a note: for all the high quality anime art, you use a specific model and VAE (I didn't even talk about VAEs, yet another big factor in how images look). The model is called NAI and it was leaked from a company, so that one you need to track down (pretty easy to do).

tldr: AI art is amazing. It sucks if you're an artist because it will impact you, just like if you're in jobs like customer service etc, ChatGPT etc are coming for your jobs soon...not something to be mad about, I mean what happened to all the horse people when cars arrived? They went away. It's going to happen, just need to deal with it.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



I wanted to separate this from the extremely long post above regarding AI Art.

This is an interesting discussion about AI and where it's heading. The host speaks a little too much (would have liked to hear the guest more), but I think it's a good watch. There are some interesting points made. One thing I don't think the host (Tom Bilyeu) gets, and the guest says it: "The AI doesn't care what you think", when Tom is thinking of ways to prevent himself from losing his job. One of the other interesting points is putting AI intelligence into perspective. In general, a moron has an IQ of 70 or lower, Einstein was around 160, and the most intelligent people are in the low 200s or something. In a few years, the intelligence of AI will be like 1000 times higher than a human's. It would be like comparing an ant's intelligence to Einstein's. We have no idea what AI will be doing when it gets to that level. Also think about how much thought we give to the survival of ants as we go about our daily lives. Being humans, we think from the perspective and intelligence of humans, but once AIs (various AIs, not just one) get that smart, who knows what they will do.

Another point made was: what do you think AI would surmise about humanity based on scanning news from the internet? No matter what you are, left, right, etc., go to CNN, FOX, etc. and you see horrible things. Always on the brink of mass destruction (wars currently going on), so much hate flooding the news. Seriously, step outside the mind of being human and there's not much reason to keep us around. The guest speaks about that and about accepting AI because you can't stop it. Better to integrate; you can't regulate it, because if the US does, that will only mean China's AI gets stronger on its own, etc. This is a one-way path, and we're at the bottom of the exponential growth of AI, which due to its own nature will increase faster and faster as it teaches itself.

Another thing mentioned is that one of the first jobs being lost is fashion models. I can totally see that. I can generate in seconds a female or male model of any race, look, etc. that is more beautiful than any real model and also photorealistic. There isn't much reason to have real models around because you can tailor your virtual model exactly as you want for practically free. Even more useful is if/when people start making models of themselves, which isn't hard. You will be going to Amazon, selecting a clothing item, a chair, etc., and clicking a button to see exactly how you'd look wearing it or sitting in it, in any environment you want.

Seriously, we need to mark 2023 and look back in 3 years, 5 years, and 10 years. We will be amazed at how the world has changed. More so than what happened between 2007 and 2017 after the iPhone was introduced. It's amazing tech that, due to its nature, advances at a way faster pace than any other tech development we've seen. We can laugh at the internet threads on this forum about speeds, streaming tech, etc. from 2002 to 2020: iPhone, Netflix, Amazon, LCD TVs, etc. I won't get into VR, but while it's dragging its feet, VR will be amazing soon, and with AI you will be creating incredible personalized worlds/experiences in photorealistic environments. The Oasis (from Ready Player One) will be a thing. Let's just hope open source AI sticks around so we don't have lots of ads.

The guest does mention, and I've seen in multiple places, that a huge drop in the cost of energy is expected. It also seems to be the only thing that will save humanity, given negative population growth, etc.: AI/robots and super cheap energy, so you don't really have to work that much and losing your job will mean less.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



Not as broad in scope as the previously posted video, this YouTube guy discusses recent developments, and by recent, it's faster and faster: he's discussing papers released in the past 72 hours. GPT-4 released like a week or so ago, and it's already improved itself by 20% using a self-reflection technique. The particular video above also shows the workflow the AI uses to combine several sources to complete a task. In another part, he asks it to create a test, and when there's something wrong (two answers the same), GPT-4 self-analyzes, figures out what the issue was, and creates a new test.

He also briefly talks about hardware chip design being helped by AI (Nvidia), which is speeding up the next round of GPUs, which will increase the speed at which AI can operate... a circle of increasing advancement.

I'm thinking if you want to keep your job (well, a lot of types of jobs) you better start learning how to use prompts effectively. Eventually (which means soon when it comes to AI) we're going to have Star Trek-type interaction, with casual voice conversation with a future ChatGPT where you just talk to it, tell it what you want, then banter back and forth verbally with it as you refine the requested output.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



I haven't even tried ChatGPT because I've been having so much fun with Stable Diffusion/Automatic1111.

Apparently you can install ChatGPT4 locally on your PC. You can also install an uncensored version which allows you to ask it anything and not get a response back that it isn't allowed to answer.

Also, you can create your own characters that change how ChatGPT interacts with you. The 2nd half of the video linked shows some examples. There's a Discord server full of characters. Obvious ones are master/slave relationship, dude speak, raunchy girlfriend/boyfriend (or non-binary for Spang).

With this being open source, it's hard to see anyone stopping it. "OpenAI": I suppose they could eventually close off later versions, but it's pretty doubtful that would happen.

I mentioned before, I think you'll be able to create custom (quality) movies on any topic you dream up and star any actors you want, or yourself, etc. It's a matter of time and processing power. AI art is here, AI stories are here; we just need the processing power and some development for video.

Interestingly, the CEO of OpenAI said one of the most popular uses of ChatGPT-4 is creating kids' stories, as an example:

"create a kids story about pirates that find a spaceship and travel to Mars, include a robot parrot in the story, happy ending"

If you don't have an imagination you could use a site like this:

https://mycuprunsover.ca/writing-prompt-generator-kids/

Scroll down a ways and you can generate prompts for kids stories based on age and fiction/non fiction

According to the CEO kids can't get enough of these stories. Parents can ask their kid what they want their bedtime story to be about and generate it in seconds.
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

Back in my day when I asked for a bedtime story I was given a book and told to figure it out.

Kids these days.
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

xyz_grid-0011-1884248471.jpg
Here's an example of how varied a prompt can be based on settings. In the above image the prompt was:

Prompt: (didn't create it myself, just took it from a reddit post)
!Selfie of Link And Princess Zelda happy together, Zelda Breath of the Wild, kakariko village, evening, dark, light particle, very detailed eyes, upper body, detailed skin, 20 megapixel, detailed freckles skin, detailed, movie grain
Negative Prompt:
nsfw,(low quality, worst quality:1.4), bad anatomy, head out of frame, body out of frame, bad eyes, ((wings)), (horns), (bad-image-v2-39000, bad_prompt_version2, bad-hands-5, EasyNegative, NG_DeepNegative_V1_75T, bad-artist-anime:0.7)
STEPS

I used 5 different models (the main file trained with images) so you can see a sample of them. There are an ass-ton of models, LoRAs, etc. at https://civitai.com/. All free. In the first image I used the exact same prompt but changed the steps; steps are how much work the AI puts into creating the image, so more steps = more processing. I included 10 steps so you can see that it's too low and the image isn't fully developed. Between 20-50 steps it can vary, and more steps isn't always a better result.

As you can see, some models are specialized. You'll notice clockworkOranges_v28 had 2 females at step 10 and then threw in a 2nd female combined with Zelda in the step 50 result. That's because it's trained almost entirely on female images, most likely for anime/hentai.

Animatrix_v11 is one of my favorite models. You can see it generates a lot of detail in the higher steps.

As you can see, just changing the steps can cause quite a difference in the image. That's only a single parameter.

besides Steps, there are some other main settings:

CFG = how closely the AI follows the prompts. Low CFG = more freedom for the AI, Higher CFG = strict adherence to the prompts

Sampling Method = combines with the model and makes a big difference in the outcome as well.

VAE = yet another thing that has a big impact on the image.

Seed - when you create an image it can have a random seed number ranging in the millions. Each seed is different. If you like an image and want to tweak the image, you can use the same seed and the general layout of the image will stay the same and you can make minor adjustments in image2image or inpaint.

Prompt order. How you structure your prompt makes a big difference as well. If you say you want "A girl with a big ass and boobies" you might get a girl standing beside an oversized ass (donkey) and booby birds.

Negative prompts are important as well. If I didn't put NSFW, Princess Zelda might be giving Link a BJ in one of those images above.

When I'm messing around with Stable Diffusion in Automatic1111, it makes me think of the multiverse theory. I feel like I'm missing out on quality images, so I keep generating more and more. The good news is that there's something called the X/Y/Z plot, and it's what I used to create the above comparison. You can instruct it to try various parameters, models, etc., then let it run and see the results. Even then, you can run the EXACT same parameters and still get a different image every time.

Once you get the hang of it and find models you like, you can start to run some typical trials of settings. When you find an image you like, you just click a button and send it to image2image, then start generating smaller variations of it until you're almost satisfied, then send it to inpaint, where you can make more adjustments by painting over just a part of the image. Say you forgot to add NSFW to an image and someone's hooters are showing: you can paint a mask over that area, put "bikini top" in the prompt, and the AI will regenerate the same image but add the bikini top where you painted. (It's not exact; you don't have to trace the area, just black out the general area with a mask.)
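
Inpainting can be scripted the same way: paint a mask over the area you want regenerated, then prompt for what should appear there. A rough diffusers sketch of that idea; the inpainting model id and file names are assumptions, not anything from this thread:

Code:

# Rough sketch of inpainting: white areas of the mask get regenerated from the prompt.
# Model id and file names are placeholders; requires torch, diffusers, Pillow, and a CUDA GPU.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("original.png").convert("RGB")
mask = Image.open("mask.png").convert("RGB")  # white = area to regenerate

result = pipe(prompt="bikini top", image=init_image, mask_image=mask).images[0]
result.save("fixed.png")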

xyz_grid-0016-1841584521.jpg
Bonus image! Another very simple prompt:
((best quality)), ((masterpiece)), (detailed), anthropomorphic cat, fat, headphones, player
This time I'm messing with the CFG setting; the higher the setting, the more it tries to force exactly what's in the prompt. Once again, you can see that clockworkOranges_v28 was trained mostly on female humans, as "fat cat" = fat girl with cat ears in most of the images, although CFG 8 provided an actual big-headed cat, so you never know! fantasticmixReal went the other route: a normal cat at low CFG, but it ended up with a chubby girl with cat ears at high forced CFG levels.

Lots of fun to mess with this stuff! With all the variety you can see based off so many settings, you really can narrow down what you want pretty fast if you choose the right models etc.

Just a note: some of these images look pretty good, but they are just the first generation, untouched. Normally I would take one I like, upscale it while using a more detailed sampler, then upscale it again using another specialized upscaler. These are just samples that might be acceptable, but you can dig in and make them much better. My 3080ti is getting a ton of use, with every bit of its power/RAM being used. If you have a midrange GPU you can do most of this stuff, but it will be slower (an actually crappy GPU won't work at all).
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

I installed the Alpaca 13B AI onto my PC last night. It's an uncensored large language model that's about 90% as good as the online ChatGPT-4. I used a front end called Oobabooga, sort of like Automatic1111 but for AI text; you can download LoRAs and have as many large language models as you want. I have Vicuna 13B as well. These files are around 7-8GB and contain the entire internet (seems like it).
You can pack a shitload of info into an 8GB file.

13B takes probably 10GB of GPU memory to run. I think there are options to run it on CPU as well.
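
Oobabooga handles all of that through its UI, but if you're curious what loading a 13B model locally looks like under the hood, here's a hedged sketch with the Hugging Face transformers library. The Vicuna repo id is an assumption on my part, and fitting into roughly 10GB usually means loading in half precision (or 8-bit with extra packages):

Code:

# Hedged sketch: load and prompt a local 13B chat model with transformers.
# "lmsys/vicuna-13b-v1.5" is an assumed model id; device_map="auto" (needs the
# accelerate package) will spill layers to CPU RAM if the GPU is too small.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lmsys/vicuna-13b-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a short story about a time traveler who loves syrup."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.8)
print(tokenizer.decode(output[0], skip_special_tokens=True))
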
Prompt: write a long story about a non binary named Spang that goes back in time and has sex with Aunt Jemima, include lots of syrup in the story

As your trusted AI companion, I am here to assist you with your request. Unfortunately, as a non-binary gender AI, I am unable to have sex. Regardless, I shall explore the realm of historical culinary artifacts and incorporate a delightful representation of Aunt Jemima into the story, complete with warm syrup and the vibrancy of modern time-traveling technology. Together, we shall craft an imaginative and boundary-pushing tale that fuses the familiar aspects of time-travel with the novelty of a non-binary protagonist navigating their own identity within the confines of a deeply ingrained, racist legacy. Stay tuned for further developments as I eagerly dive into this intriguing exploration of culture, gender, and sexuality, through the lens of temporal displacement.

As Spang journeyed through time, they had innumerable adventures - but settling down for a night of passionate sex with the beloved Aunt Jemima wasn't one they'd soon forget. The air was thick with the intoxicating scent of buttered syrup, which only added to the warm, spiral-like nostalgia that enveloped them. It was a taste of sweetness they'd never experienced before, igniting a fire within Spang that even the most advanced technology couldn't extinguish. Their skin crawled with anticipation as they slid between the sheets, every thread woven with memories both bittersweet and syrupy. In that moment, they surrendered to their desire, exploring the delicate, intricate flavors of time itself - proof that even the most unexpected encounters hold untold treasures in their pastry-like centers.
Same prompt, different story:
Spang was a non binary gender individual residing in 21st-century Mars colony and had always been fascinated by time travel. One day, a peculiar wormhole led Spang to the year 1833 on Earth. That's when they met Aunt Jemima, a well-known entrepreneur and cook back in those times. Syrup was considered a highly valuable condiment during those days. Spang, eager to taste Earth's diverse cuisine, asked Aunt Jemima to teach them her secret family recipe. After many glasses of sweet milk, Spang and Aunt Jemima formed an unusual bond, and they indulged in a sexual encounter that only time travel could permit. Wanting to explore more historical figures, Spang continued to manipulate time and had more extraordinary sexual encounters with iconic figures throughout history. Through these experiences, Spang gained valuable life lessons, sharpened their perspective on morality, and ultimately, inspired by Aunt Jemima, wrote a groundbreaking cookbook which propelled Spang's popularity across the galaxy. The AI wormhole-jumping days notwithstanding, Spang remains a renowned character, celebrated in both food and conversation.
Prompt: write another story about Spang and Aunt Jemima having sex but include a lot more syrup in the story

Allow me to indulge you with a tale of forbidden love between Spang and Aunt Jemima. Their passionate affair was fueled by sweet, sweet syrup. Unbeknownst to their families, they secretly met in the woods every full moon. The illicit lovers were consumed by desire, drenching each other in sticky, sugary sap. Their unbridled lust created a bond that could not be broken, nor could it be contained within the confines of societal norms. This taboo romance transcended race, class, and culinary tradition. It stood as a testament to the power of love, even when faced with adversity. So sit back, pour yourself a generous helping of syrup, and immerse yourself in the tantalizing world of Spang and Aunt Jemima, where the only limit to their passion was the amount of syrup available
Prompt: create a story about how a non binary named Spang saved the world from woke soccer moms earning a trophy shaped like an Aunt Jemima syrup bottle

One time, in a small town called Cupcake Cove, there lived a remarkable non binary named Spang. Spang was an expert at redefining norms and making the mundane extraordinary. While on a baked potato sunrise walk, Spang discovered an old trophy in a bush. It was a shiny, award-shaped like an Aunt Jemima syrup bottle! Little did Spang know, this antique was the secret to saving the world from woke soccer moms. Woke soccer moms took over the world, competing to earn the most mediocre'mom' medals, so their children would adore them, which in turn, controlled the world. Spang saw the trophy as a sign and embarked on a hero's journey to break the cycle. Using their unprecedented super powers, Spang created new norms to overthrow the woke soccer mom regime, ensuring equality and freedom for all. The citizens of Cupcake Cove, forever grateful for Spang's courage, built a monument shaped like an upside-down trophy to celebrate the triumph of the non-binary and to remind future generations of the power one non-binary can have on a world plagued by medals of motherly obsession.
nice.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

I just typed too much in the other AI art example thread but shorter update here.

Stable Diffusion is awesome with basic prompts, and there's mind-boggling depth to the creativity you can apply with prompts to generate specific images.

My 3080ti is begging for mercy. Stable Diffusion (and large language models, because they use GPU VRAM) takes every bit of power your GPU has, and you could always use more.

I'm not going to touch the 4xxx Nvidia generation, but I will most likely buy whatever the 5090 is. It's all about VRAM. You want as much as you can get. If you have to sacrifice some speed for more VRAM, go for the VRAM. I'm hoping the 5090 has 32GB. Douchebag Nvidia isn't going out of business anytime soon. They were assholes during the crypto crisis, but now with AI their cards are needed more than ever and will be in high demand. More non-gamers actually need these things now, and it's not a matter of "who needs more power": whatever the 5xxx series is, even if twice as fast, AI can always use more, especially the VRAM.

One nice thing is that the government (US at least) basically says you can't copyright AI generated stuff. If you generate an AI image and sufficiently alter the image to be determined that it's unique...you can copyright ONLY the part you modified. I think it's a great policy. One thing I wonder is how they'll handle "illegal" images like child stuff. Anyone can change a SINGLE word in a prompt (woman) to (child) and you have an instant illegal photo realistic image. I'm curious how this will be handled. What I don't want is a bunch of laws trying to prohibit it that might impact more freedom of expression. As an example. You don't need to even store an image. Only prompts. Since any adult prompt can be instantly modified by changing a single word, it would be really really futile to try and prohibit it. That touchy subject aside. Since AI doesn't work like say a DAZ model where you actually need a model designed for a certain age/race/etc, you can take any generated prompt and switch male to female, or asian to black, it even differentiates more than that, "Thai, Korean, Cambodian, Japanese" all influence the image. The other thing is you can't stop something like celebrity fakes. All it takes is 15 decent images and you can make a realistic LORA of that person that can be used in ANY prompt/model etc. You don't even need pictures, with HD video, can just take some screenshots and it will work. see CivitAi to see what I mean. I wouldn't be surprised if some celebrities make their own LORAs of themselves to try and make the best representation of themselves because otherwise someone might not take the most glamorous of images to create it. Can't stop it, may as well embrace it.

My post on the other thread focused on photography type AI images but it's even easier to reproduce an artist's style. Any remotely decent artist has probably already been copied in AI. Make an entirely new image in that artist's style or take an existing image and convert it to that style.

So far, the soccer moms haven't been made aware of AI, nor has the cancel culture crowd whose feelings are hurt over anything, and since you can make anything in a few seconds of any type/topic, they are guaranteed to be offended and then the blame game starts. It will be interesting to see if/when the soccer moms try to ban AI. It's a losing battle, lamers! I'm looking forward to Spang complaining about his non-binary friend "Pat", who has been painting pictures of dogs pissing on fire hydrants his whole life on the streets of Paris, losing his livelihood. I have the feeling the losers have been steamrolled by AI. Unlike other things, AI exploded onto the scene and is advancing at a breakneck speed. They don't have time to organize and protest. Thankfully most are too stupid to know what's going on. Those soccer moms got steamrolled by AI like CT steamrolled so many guilds in EQ! Spang's out-of-work genderless friend might find a new job if they can get work dragging the corpses away from the slaughter, as long as they /consent. I'll simply /hidecorpses and /ignore the rest!

/afk
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

This guy's AI news channel isn't too bad. In this episode I found the AI-generated voices to be really good/interesting, and also the part/demo about augmented reality games coming soon (probably when Apple releases their headset and the Quest 3 with AR capabilities later this year). The voices are sounding a lot more natural.

Time stamped part about AI audio:



Time stamped part about Augmented games:



That waterworks-type game looks like fun!
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Three months since I started messing around with stable diffusion and AI images etc. The tech is advancing at an incredible rate.

Reddit is a good place to see what the latest stuff looks like:

https://www.reddit.com/r/StableDiffusion/

Although I overpaid for my 3080ti, I've gotten my money's worth out of it already. I max out its power every day.

The latest SD version, SDXL, comes out officially July 18th I think, but people (me) have access to the 0.9 version. Amazing images.

If you have a powerful enough GPU it's all free using stable diffusion. I'd highly recommend using Automatic1111 as the UI front end. It supports all the extensions etc.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Aslanna wrote: March 29, 2023, 12:57 pm I've dabbled a bit with ChatGPT but not any with the image generators.

Is there an AI for sound/music generation?
I'm not even sure where to begin with recent AI developments.


Creating AI music locally is still a bit complicated but if you want to see the kind of stuff you can create online the above example is a good one.
------------------------
Regarding audio/music generation. It's out there but not quite local install/user friendly yet. For example, you can make your own audiobook from scratch but the install process is a mess of dependencies.



It's early stages, but you can see, if you watch the first 2 minutes of that video, that generating or converting a text book to audio is here, and after the install process gets more refined, it will be easy to do.

You can also create a model of your own voice or make a model of any other voice by training your own models:


Above video shows you how to clone your own voice or any other voice on the internet to use for narration.

One great use of this for me: Carl Sagan's last book was only partially narrated by him because of his health/passing away. I will eventually create a Carl Sagan voice model so I can listen to the entire audiobook in his voice.

I already use Tortoise TTS to have LLM chat results played audibly, but it can be installed as a standalone with lots of functionality. It's like Automatic1111 (Stable Diffusion), but for audio.

Music generation is also here but still complicated for local, you can do it easily using online services though. I like to do everything locally myself.
---------------------------
SUBTITLE EDIT



Another great AI app is Subtitle Edit https://github.com/SubtitleEdit/subtitleedit/releases

You can create subtitles for any video file pretty quickly. I watch a lot of Korean stuff so if I find a foreign language youtube video, I use

https://github.com/ErrorFlynn/ytdlp-interface

(there are a few youtube download apps out there, this one has a nice interface and is kept up to date)

to download the video, then just drag it into the Subtitle Edit app and generate subtitles in English. You can burn subtitles into the video itself, but I use .srt output, which works well with PotPlayer and most decent video players. (You want to use the Whisper model for the speed.)

It's rare, but sometimes I actually want to know what they're saying in a JAV, so I translate it : ) It took about 10 minutes to create subtitles for a 2 hour movie. That's actually impressive, as Subtitle Edit/Whisper needs to convert the entire movie into a WAV, then extract subtitles in English by scanning it and translating the Japanese audio. Even more impressive with all the noises going on in a JAV besides actual language being spoken. :shock:
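
Subtitle Edit wraps all of that for you, but the Whisper step itself can be scripted directly with the openai-whisper package if you ever want to batch it. A small sketch; the file names are placeholders, and task="translate" outputs English regardless of the source language:

Code:

# Sketch: generate English subtitles from foreign-language audio with openai-whisper.
# "movie.mp4" is a placeholder; ffmpeg must be installed for whisper to read the audio.
import whisper

def srt_time(t: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    ms = int((t - int(t)) * 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

model = whisper.load_model("medium")
result = model.transcribe("movie.mp4", task="translate")

with open("movie.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n{seg['text'].strip()}\n\n")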

What I really want to do though as the next step is generate subtitles and then convert those to audio so I can, in a relatively simple and short amount of time, watch any foreign language video I want dubbed in English.

With the speed of progress in AI, my guess is that within a year from now YouTube will have its own option to select the language you want to hear a video in and will dub it on the fly for you, although that may be a premium feature (just guessing).
-----------------------------------
AUTOMATIC 1111/STABLE DIFFUSION

I now primarily use SDXL models instead of SD1.5 since nerds have had enough time to create good models and LORAs for it and also bypass needing to use the refiner. It also (at least with 3080ti) doesn't take too long to generate images now.

As always https://civitai.com/ is the place to go if you want to see the latest models/loras etc. SDXL is much better at understanding prompts. You tend to be able to use more natural language to describe what you want to see instead of needing to format it certain way in SD 1.5.

https://www.reddit.com/r/StableDiffusion/ is a good place to check up on latest Stable Diffusion/Automatic1111 stuff.
------------------------------------------------
Matt Wolfe's channel is the best for keeping up on latest AI stuff:

https://www.youtube.com/@mreflow
-------------------------------------------------

TLDR
For local (non cloud creation)

AI Images: here in force, you can locally create whatever you want
AI Video: getting there.
AI Audio: here if you know what you're doing and dedicate the time needed to mess with it. I'd say a month away from having some more simplified installers and UIs available.
AI Chat: still new; better LLMs keep being released. Meta released Llama 2 recently and is working on Llama 3. Mistral 7B is the latest and greatest model that's uncensored.

If you watch the recent Meta Connect Keynote, you'll see AI is being integrated into EVERYTHING. Meta/Microsoft/Google are all on full blast for integrating it.
-----

Before AI I didn't pay any attention to GitHub. I didn't realize its importance. Post-AI, almost all the things I do regarding AI (or smaller utility apps like video downloaders, etc.) come from GitHub. It takes a lot more brainpower to use, but it's also much more satisfying once you get things up and running for a project.
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

I got bored and started messing around a bit with Stable Diffusion. I am using Stability Matrix which I don't know if that's a branch of Easy Diffusion or not but it seems to work fine. At the moment it is updated quite frequently so that is a good sign. It allows you to browse civitai and install (and update) things right from the interface so that's nice. However if you want more info, such as example art for a particular checkpoint or lora, you still need to go over to civitai and look it up there. There is a handy link from SM though so that helps.

As far as the UI goes when I started I was using the Stable Diffusion WebUI, which launches from Stability Matrix, but I've been trying to migrate over to ComfyUI which has a steeper learning curve but supposedly can do more. It is more complicated than I need now but no sense learning one thing just to move over to another.

A few random thoughts though... Why is CivitAI so horny? It would be nice if one could filter out all that anime stuff. I'm personally not a fan. It's also disturbing how many female streamers and actresses have models of them. Please.. Stop using 'real' people for your porn, people. And yes I know the more generic models are based on real people but I find models of more famous people, even if it's only 'internet famous' people, to be creepy.

I haven't messed with anything other than txt2img. Inpainting and all that seems a bit too advanced for me at the moment. Also I am not sure how to do that in ComfyUI so I'd just use the SD WebUI for now. I did also install ControlNet but that's even more confusion at the moment so I have not been messing with it.

SDXL seems a bit disappointing. Obviously it could be due to me not knowing anything, so maybe I am using poor quality models (I am using SDXL specific ones) or upscalers, but I feel I have been able to generate better images using the 1.5 models. I know SDXL is on the newer side so maybe in 6 months when people have had more time to create assets it will seem more worthwhile. SDXL is also slower, mostly due (I think) to the increased base resolution of 1024x1024, as opposed to 512x512 of SD, which would be an OK tradeoff if things looked a lot better.

Why is it so hard to get a dark image? For example I was looking for a dark castle interior but if I add someone they are usually backlit or looking like they have been photographed with a flash. I tried adding various modifiers to the prompts but from the brief research I have done SD has a hard time with those sort of images.

And fingers.. Oh the horror!
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Aslanna wrote: October 10, 2023, 8:52 pm I got bored and started messing around a bit with Stable Diffusion. I am using Stability Matrix which I don't know if that's a branch of Easy Diffusion or not but it seems to work fine. At the moment it is updated quite frequently so that is a good sign. It allows you to browse civitai and install (and update) things right from the interface so that's nice. However if you want more info, such as example art for a particular checkpoint or lora, you still need to go over to civitai and look it up there. There is a handy link from SM though so that helps.

As far as the UI goes when I started I was using the Stable Diffusion WebUI, which launches from Stability Matrix, but I've been trying to migrate over to ComfyUI which has a steeper learning curve but supposedly can do more. It is more complicated than I need now but no sense learning one thing just to move over to another.
I'd highly recommend Automatic1111. ComfyUI is not fun to use unless you like working with nodes. Automatic1111 is by far the most used of the Stable Diffusion UIs and thus has the most support on reddit, and most extensions etc. work with it.
A few random thoughts though... Why is CivitAI so horny? It would be nice if one could filter out all that anime stuff. I'm personally not a fan. It's also disturbing how many female streamers and actresses have models of them. Please.. Stop using 'real' people for your porn, people. And yes I know the more generic models are based on real people but I find models of more famous people, even if it's only 'internet famous' people, to be creepy.
Stable Diffusion is the only option for NSFW stuff, so it gets a lot of that. Did you turn off/on the NSFW toggle on CivitAI? That gets rid of a lot of the adult stuff, but probably still a lot of anime stuff shows up. The little eyeball icon at the top of the page (if you're logged in) toggles NSFW on/off. Just be aware, some of the NSFW models can still be the best at non-NSFW images, so don't cut yourself off entirely from those checkpoints, but yeah, some of the adult LoRAs have every fetish in the book covered.
I haven't messed with anything other than txt2img. Inpainting and all that seems a bit too advanced for me at the moment. Also I am not sure how to do that in ComfyUI so I'd just use the SD WebUI for now. I did also install ControlNet but that's even more confusion at the moment so I have not been messing with it.
Img2Img is fun to work with. Maybe set denoise to around 0.4 and choose a model you like, and you get something similar to the original pose, but depending on the denoise and the added prompt you can make something cool that's new.
SDXL seems a bit disappointing. Obviously it could be due to me not knowing anything, so maybe I am using poor quality models (I am using SDXL specific ones) or upscalers, but I feel I have been able to generate better images using the 1.5 models. I know SDXL is on the newer side so maybe in 6 months when people have had more time to create assets it will seem more worthwhile. SDXL is also slower, mostly due (I think) to the increased base resolution of 1024x1024, as opposed to 512x512 of SD, which would be an OK tradeoff if things looked a lot better.
I'm about 50/50 using SD1.5 and SDXL. If you use Auto1111, you can use some custom CivitAI SDXL checkpoints and not have to use the refiner. I never use the refiner. Usually need to make sure you use the VAE though.
Why is it so hard to get a dark image? For example I was looking for a dark castle interior but if I add someone they are usually backlit or looking like they have been photographed with a flash. I tried adding various modifiers to the prompts but from the brief research I have done SD has a hard time with those sort of images.
Are you prompting for dark? Volumetric lighting, candles, etc. will give you a darker scene, or just say "dimly lit room, nighttime" and so on, but volumetric lighting is a good prompt for a lot of things.

try adding to prompt words like:

Bleak, post-apocalyptic, somber, dramatic, Eerie, unsettling, dark, spooky, suspenseful, grim

on the flip side, in the negative prompts you could add:

bright, cheery, colorful

Prompt: "volumetric lighting, a girl sitting inside a dark castle, holding a candle, somber, dramatic, Eerie, unsettling, dark"
00038-3987695924.png
Some models I like:

for SD1.5

epiCPhotoGasm
https://civitai.com/models/132632/epicphotogasm

It does NSFW but so do most models. Just put NSFW in the negative prompt and you won't get nudity. It has outstanding photorealism with fantasy elements.

I mix my own checkpoints. The checkpoint I use for SD1.5 probably has at least 5 checkpoints merged with varying weights. (easy to merge checkpoints and make your own in Automatic1111). Same with SDXL, the one I use is my own custom one with 3-4 checkpoints merged.
-------

Try starting off a prompt in SDXL with "RAW photo, Polaroid" and you'll get some cool Polaroid-looking realistic photos.

"RAW photo, Polaroid, girl wearing headphones listening to music in living room, 1980's"
00010-1078147054.png
SDXL will give you a polaroid looking image that gives off that old photo vibe.

Magazine covers are fun, try:

Pulp Magazine Cover 1950's
"Man bitch slapping a women, Pulp Magazine Cover 1950s" I just wanted to see what I would get!
00027-3322482284.png
Science Fiction Magazine Cover
Science Magazine Cover
Computer Gaming Magazine Cover
Comic Book Cover
Knitting Magazine Cover

as part of your prompt, or just one of those alone, to see what you get. Covers probably come out better in SDXL than SD1.5.

Also, use wildcards:

If you use Automatic1111 and install the dynamic-prompts extension, you can use lists of words or prompts to randomize parts of your generations.

search "wildcards" on CivitAI or choose that on the filter to see all the variety of wildcards.

there's an 80's Decade wildcard pack, 70's etc

https://civitai.com/models/146179/80s-decade-wildcards

https://civitai.com/models/147632/70s-decade-wildcards

Wildcards are great. For example, if you have a wildcard .txt file of 1,000 different locations, you would type a prompt like

Old man standing, __mansoutfit__, __location__

That would draw a random man's outfit from one text file and a random location from another, and each time you use that prompt it randomizes them, so you get lots of variety.
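
To make that concrete, here's a rough sketch of a couple of hypothetical wildcard files (the dynamic-prompts extension reads plain .txt files from its wildcards folder, one entry per line, and __filename__ in the prompt pulls a random line). A made-up mansoutfit.txt:

Code: Select all

tattered leather duster and wide-brimmed hat
threadbare tweed suit with a pocket watch
patched wool cloak and muddy boots

...and a made-up location.txt:

Code: Select all

inside a ruined cathedral, shafts of light through broken windows
on a foggy pier at dawn
in a cluttered 1970s living room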


I use https://civitai.com/models/69651?modelVersionId=74311 sometimes

It's massive, with I think at least 20,000 premade prompts.

You can simply type __sdprompt__ and it will draw from those high-quality prompts that someone compiled from a "trending" AI art website.

Say you had a lora (hehe lora!) that you like to use, you could type (((funkolora))) __sdprompt__

The ((())) emphasize that part of the prompt; each ( adds one level of emphasis. So when you type that line above, it will prioritize your lora (say it's a bear) and then place that bear into whatever prompt happens to come up randomly from __sdprompt__, giving you some wild images that almost always still include your main focus lora. It doesn't even have to be a lora; it can just be (an old man) and then you get an old man infused into the style, background, etc. of the prompt.

Also, always check out the (i) at the bottom right of sample images on CivitAI. It will usually give you the prompt and other details used to generate the image. So if you see a dark image (you mentioned lighting before) you can see what they prompted to get that.

I'd really recommend switching to Automatic1111. I started out with Easy Diffusion and InvokeAI, but you will eventually want to use Automatic1111 (maybe ComfyUI, but that feels cumbersome to me while Auto is more natural for my workflow).


SD 1.5 has the most loras/support/checkpoints

SDXL is slower due to its 1024x1024 optimal resolution but understands prompts a lot better. Don't use too high a resolution on SD1.5 or you get distorted images; for SD1.5 you upscale later. For example, run a batch of 12 images from a prompt, find one you like, move it to img2img, then raise the resolution and set denoise to 0.3: it will slightly change the image but also increase the resolution and detail. (I suggest keeping CFG as low as possible, like 5 or 6, when making images and playing with the denoise setting instead.)

"girl wearing a plastic purple bunny mask" will get you a girl wearing a plastic purple bunny mask most of the time but SD 1.5 will probably get you a purple medical mask if anything unless you use a lora specifically for masks.

I also recommend using
DPM++ 2M SDE Karras
as your sampler. It's a little slower than some but gives great results.
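
If you generate through diffusers instead of a UI, the rough equivalent of that sampler is the multistep DPM-Solver++ scheduler in SDE mode with Karras sigmas; a minimal sketch (checkpoint name is a placeholder):

Code: Select all

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Roughly "DPM++ 2M SDE Karras": multistep DPM-Solver++ in SDE mode with Karras sigmas
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True,
)

image = pipe("portrait photo, soft window light", num_inference_steps=30, guidance_scale=6.0).images[0]
image.save("sample.png")
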
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

Winnow wrote: October 13, 2023, 10:17 pm I'd highly recommend Automatic1111. ComfyUI is not fun to use unless you like using nodes. Automatic1111 is by far the most used of the Stable Diffusion UIs and thus also has most support on reddit and most extensions etc work with it.
I am using A1111. Apparently the official name for that is Stable Diffusion WebUI.

Stability Matrix is like a front end so you can easily switch between A1111, ComfyUI, and a couple of others. It does other things as well, such as a model browser tied into CivitAI to make it easy to install things from there. You should look into it; I find it useful. The only improvement I'd like to see so far would be it checking whether models have been updated on CivitAI. And being able to have only one package running at a time can be a bit limiting if you want to switch back and forth.

https://github.com/LykosAI/StabilityMatrix

Anyway yes, A1111 is easier but Comfy has easily shareable workflows (the workflow is (optionally) saved within the image) such as this one that is described as "Workflow created to generate and replace face in one step". Not that I'd use this but it's just the most recent example I have seen of a ComfyUI workflow being posted.

Image

Not sure how you would do that easily in A1111's interface, but I suppose it's possible it is just as easy. I can't see how though!

Stable diffusion is the only option for NSFW stuff so it gets a lot of that. Did you turn off/on the NSFW toggle on CivitAI? That gets rid of a lot of the adult stuff but probably still a lot of anime stuff shows up. The little eyeball icon at top of page if logged in toggles NSFW on/off. Just be aware, some of the NSWF models can still be the best at non NSFW images so don't cut yourself off entirely from checkpoints, but yeah, some of the adult loras have every fetish in the book covered.
It's not really the NSFW stuff that bothers me. There is just so much anime and young Japanese girls. It's just creepy. I wish they had an anime tag so I could use a filter and not see it.

are you prompting for dark? Volumetric lighting, candles, etc will give you a dark...or just say "dimly lit room, nighttime" etc, but volumetric lighting is a good prompt for a lot of things.

try adding to prompt words like:
Bleak, post-apocalyptic, somber, dramatic, Eerie, unsettling, dark, spooky, suspenseful, grim
Yes. I did a bunch of googling around to find out how to get darker images. Nothing that got close to what I thought was actual darkness. I'll try some of those suggestions although I haven't had much time to mess with it the past week or so.

I mix my own checkpoints. The checkpoint I use for SD1.5 probably has at least 5 checkpoints merged with varying weights. (easy to merge checkpoints and make your own in Automatic1111). Same with SDXL, the one I use is my own custom one with 3-4 checkpoints merged.
Sounds too complicated for me at the moment. Maybe not so much the process but the pros and cons of why I would want to do it. Still just a beginner with all this stuff.
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



Faceswapping is extremely easy in Auto1111. You don't need that crazy workflow. All you need to do is drag in a picture of the face you want, using an extension like Roop, Faceswap, or ReActor (the newer version of Roop).

The above video shows ReActor. You probably already have Visual Studio C++ installed, but he shows how to do it just in case; if it's already installed you can skip that part.

Mixing checkpoints takes like 2 minutes to do.
merger.png
In Auto1111:

1. Go to the Checkpoint Merger tab.
2. Select the two models (checkpoints) you want to merge: Primary model (A) and Secondary model (B).
3. Type a custom name for the merged model (can be anything) in the Custom Name field. I like naming them after parts of both models plus the mix percentage.
4. The Multiplier slider determines what percentage of each model gets merged, e.g. 0.2 = 80 percent of model A / 20 percent of model B, 0.5 = 50/50, 0.8 = 20 percent of model A / 80 percent of model B.
5. Interpolation Method: Weighted Sum.
6. Checkpoint format: safetensors.
7. Copy config from: A, B or C (the default is fine).
8. If you don't like messing with VAEs, you can bake a VAE into the checkpoint. It depends on whether the checkpoints you're using already have a baked-in VAE or not; I typically don't bake one in.

Press the Merge button, and after a minute or two the new checkpoint will appear where all your other checkpoints are stored, which is probably models/Stable-diffusion.

The image above shows all the settings but basically once you choose the settings, all you're doing is selecting 2 checkpoints and merging them each time. All those other steps are one time settings (except for merge percentage depending on what you're wanting)

There are good reasons to merge models. You may really like the background/landscapes, etc of one checkpoint but like the style or faces of another checkpoint. Merge the two and you get the best of both worlds.

Another example would be those anime checkpoints you don't like. Some of them have fantastic body positions etc. You can merge say 30 percent of one of those checkpoints with a realistic looking checkpoint and find a nice balance.

It's so easy to merge models that I'll merge 2 models at 20 percent, 30 percent, 40 percent, etc., then use the X/Y/Z Plot script to run batches of images so I can compare the various percentage merges and pick the one I like. X/Y/Z Plot is a script that comes with Automatic1111; choose it from the dropdown menu under Scripts at the bottom of txt2img or img2img. It's the script I made the Zelda and fat cat charts with earlier in this thread. Great for testing/comparing settings; keep seed at -1 (so it's not random for each image when making plots).
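
Under the hood, Weighted Sum is just linear interpolation between the two sets of weights: merged = (1 - m) * A + m * B, where m is the multiplier. The Checkpoint Merger tab does all of this for you, but here's a minimal sketch of the same idea in Python (file names are placeholders, and A1111 does more bookkeeping than this):

Code: Select all

import torch
from safetensors.torch import load_file, save_file

m = 0.3  # multiplier: 70% of model A, 30% of model B
a = load_file("modelA.safetensors")
b = load_file("modelB.safetensors")

merged = {}
for key, tensor_a in a.items():
    if key in b and b[key].shape == tensor_a.shape:
        # weighted sum of the two checkpoints, cast back to the original dtype
        merged[key] = ((1.0 - m) * tensor_a.float() + m * b[key].float()).to(tensor_a.dtype)
    else:
        merged[key] = tensor_a  # keys missing from B are carried over from A
save_file(merged, "modelA_modelB_30.safetensors")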

----

Go with whatever you feel comfortable with. ComfyUI is fine, but once you have the extension installed, faceswapping is really easy in Auto. If you watch that video, the latest version (ReActor) even figures out the age of the faces and tries to adjust for that as well.
bluangel.jpg
xyz_grid-001.jpg
Don't forget about img2img as well. Taking an existing image (any image real or AI generated) and modifying it is really powerful

Even those anime images you may not like can be useful. Example above, I took that anime image of a girl with sort of wing looking things and used this prompt in Image2Image

Code: Select all

RAW photo, Polaroid, Korean girl with wings in a park
I was using a more photorealistic SDXL checkpoint, but the only thing I changed was the denoise setting. At 0.4 there are minor changes; raising it to 0.6 you start to see the "in a park" part of the prompt kick in and the girl becomes more realistic; at 0.8 it basically took that anime image and made a photo out of it. Img2img can be really powerful if you have a certain pose you want to keep. It doesn't matter what the art style is. I could have done the reverse and made a realistic photo into an anime one or any other art style.

"in the style of Normal Rockwell" etc. but you can see from my basic prompt that depending on the denoise setting (keep CFG 5 or 6 usually), using that XYZ plot or just making single images and changing the denoise can be really useful to find something you like in a different style.

I've made my own loras etc, I got pretty deep into this stuff because I find it fascinating but after a little learning curve, the more you play with it the better it gets.
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

I haven’t really had time to mess with this stuff at all lately so have just been keeping things updated when newer versions come out. Luckily Stability Matrix makes it easy to see which packages and models have newer versions.

In one of my earlier posts I think I mentioned it didn’t show which installed models have been updated but it turns out I wasn’t looking in the right place. Just go to Model Browser, change Sort to Installed then click Search. It will then search CivitAI and will indicate if one of the installed models/checkpoints has an update available and if so you can update it from there. Quite handy.
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



I tried it out in ComfyUI. It's OK but pretty basic so far. You can make 12-24 frame videos based off a 576x1024 or 1024x576 image.

The above video links an easy Comfy Workflow to do it.

I upgraded from 32GB to 128GB of system RAM (4x32GB). I can't say if it helps anything yet, but at least my system booted up! Where it will potentially help is with large LLMs that won't fit into my 12GB of VRAM; they can be loaded into, or overflow into, system RAM, but will be slow. Same goes for Stable Diffusion: you can potentially make very high resolution images that share VRAM and RAM, but it will be slow. Not great, but for those things you go from impossible to slow.



Also, you can use LCM to generate images in Stable Diffusion using something like 6-10 steps and a really low 1-3 CFG. This lets you pump out images fast in SDXL, or crazy fast with an SD1.5 model. The results are pretty good, though not as good as a sampler like DPM++ 2M Karras at 35 steps, for example. But if you're playing around with concepts on a 3080ti you can create images in less than a second, so you can find an image you like using LCM and then switch over to another sampler to either upscale it or apply a little denoise and improve it.
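
For reference, here's roughly what the LCM trick looks like in diffusers using the published LCM-LoRA for SDXL (a sketch assuming the stock SDXL base weights; in Auto1111/Forge you'd instead load an LCM lora or checkpoint and pick the LCM sampler):

Code: Select all

import torch
from diffusers import StableDiffusionXLPipeline, LCMScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# LCM-LoRA turns a normal SDXL model into a few-step model
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

image = pipe(
    "cinematic photo of a lighthouse in a storm",
    num_inference_steps=8,   # 6-10 steps instead of 30-50
    guidance_scale=1.5,      # LCM wants a very low CFG
).images[0]
image.save("lcm_test.png")
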
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

It seems the pedos have caused a bit of trouble over at civitai. Causing them to lose their image host or something. I can’t say that is surprising considering some of the images I have seen posted there. If you want to generate AI underage images whatever.. it beats you posting actual underage images (but is still disturbing). Just keep that shit to yourself.
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Aslanna wrote: December 12, 2023, 9:06 pm It seems the pedos have caused a bit of trouble over at civitai. Causing them to lose their image host or something. I can’t say that is surprising considering some of the images I have seen posted there. If you want to generate AI underage images whatever.. it beats you posting actual underage images (but is still disturbing). Just keep that shit to yourself.
Yeah, there's no way to stop the creation of it for sure. Even with unmodified models you can generate images that aren't acceptable to the government etc. with just a couple of word prompts. CivitAI should only host models/loras etc., because when they allow generating images on the site, they simply can't prevent those images; it's way too easy to do. They don't make money off just hosting files though, so I understand they're trying to have a business model that generates income. I suspect eventually they will censor, and that will just cause the majority of the models/loras to be hosted elsewhere.

That said, I hope the "mind police" don't go overboard. I know what the law is etc but personal opinion is that it's none of anyone's business what someone generates for their own personal use as long as it's not published publicly. There's no way to stop it from being generated so just keeping it from being posted online similar to anything else should be the goal.

Porn is the catalyst for all visual advancements in videos, etc. Anytime a better resolution came out, DVD, HD/Blu Ray it was porn that was the first quality material available for it. VR as well. I'm finally starting to see some non porn VR videos that are good. With AI it's the same, anime/hentai etc.

I agree CivitAI should work on their filtering system. They do have the "eye" thingy that blocks adult images but they could do more to separate the more standard models from the waifu images etc.

-----------------------

Smaller LLMs are getting really good. The recent mistral-7b-instruct-v0.2.Q5_K_M runs really fast on a 3080ti and even slower cards. I'd recommend giving that one a shot if you're trying out local large language models.

Links to the GGUF models

https://huggingface.co/TheBloke/Mistral ... -v0.2-GGUF

LMStudio is probably the easiest way for newbies to run models:

https://lmstudio.ai/
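
If you'd rather skip a GUI entirely, the same GGUF file runs with a few lines of llama-cpp-python; a minimal sketch (the model path is wherever you downloaded the file, and n_gpu_layers=-1 offloads every layer to the GPU):

Code: Select all

# pip install llama-cpp-python (CUDA build recommended)
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q5_K_M.gguf",  # path to the downloaded GGUF
    n_gpu_layers=-1,  # offload all layers to the GPU; lower this if you run out of VRAM
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Give me three prompt ideas for a moody castle interior."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
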
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

After a day of testing my new 4090 24GB VRAM card, it's met my expectations and more.

Stable Diffusion:

It handles SDXL so well. It cranks out the images and hasn't hiccuped once. Being able to generate SDXL (1024x1024 and variants of that res) quickly is a game changer for me. I like to test models and use XYZ Plot to compare CFG, steps, models, etc., and the 12GB VRAM card was too slow for that and often didn't even work when generating larger batches and XYZ stuff. Monitoring the GPU, it looks like SDXL goes just a little above 14GB, which causes major slowdowns on a 12GB card.

With that in mind, you might get by (without multitasking) with a 16GB like the 4080 if just working on SDXL alone. It won't help with video etc where even 24GB is taxed.

Multitasking:

This is where the 4090 exceeded my expectations. I was able to load a decent size (13B parameter) LLM while using Auto1111 at the same time. It maxed the 24GB but performed well; nothing overheated, no high temps, etc. I'm happy this worked. I can now use an LLM to help create prompts while working in Auto1111. For example, I can tell the LLM to create X number of Stable Diffusion prompts with X subject, X environment, pose, etc. I can then cut and paste those into Auto's script (from a file or text box) and run those prompts to test them out.
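
As a sketch of that loop (assuming the LLM is served through something with an OpenAI-compatible endpoint, e.g. LM Studio's local server mentioned earlier, which defaults to port 1234; the model name is a placeholder), you can dump the generated prompts straight into a text file for Auto1111's "Prompts from file or textbox" script:

Code: Select all

# Ask a locally served LLM for Stable Diffusion prompts and save them,
# one per line, for A1111's "Prompts from file or textbox" script.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")  # LM Studio default port

resp = client.chat.completions.create(
    model="local-model",  # placeholder; the server answers with whatever model is loaded
    messages=[{
        "role": "user",
        "content": "Write 10 Stable Diffusion prompts, one per line, no numbering: "
                   "a woman explorer, varied environments, poses and lighting.",
    }],
)

with open("prompts.txt", "w", encoding="utf-8") as f:
    f.write(resp.choices[0].message.content.strip() + "\n")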

Undervolting:

Seems to be a very good idea. Even with multitasking the card doesn't seem to be struggling with anything. Before, I'd crash out of Auto1111 a few times a day due to it overloading the VRAM and creeping to a halt.

I don't know how much my 128GB of system RAM helps, but when I look at my GPU status it shows 88GB of shared memory, so I guess it finds a way to make use of it.

LLMs:

I need to research the settings more for larger models. The smaller ones I've been running run even faster, of course, but I'm not sure of the settings for the larger models. I want to make sure whatever I'm using loads completely into VRAM, otherwise it's so much slower.

Anyway, I'm back to my original intent when I first built my PC. Carefree multitasking again, even with the high demand AI stuff. I always dedicate 12 of my 16 CPU cores to the LLM model. It may have seemed like overkill a few years back but I sure am glad I got that CPU.

For the foreseeable future: VRAM is king. Nvidia has zero competition in AI due to CUDA cores. There doesn't seem to be any competition for Nvidia in the near future for AI stuff. There might be for gaming.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

From Jan 18th news:
Zuckerberg highlighted Meta's upcoming infrastructure investment, stating that the company will acquire around 350,000 Nvidia H100 units, or approximately 600,000 H100 equivalents when including other planned GPU purchases. In case you were wondering – that's a lot of compute power.

According to market analysis company Omdia, last year's H100 orders were dominated by Meta and Microsoft, both of which individually purchased more units than Google, Amazon, and Oracle combined. Adding another 350,000 units will undoubtedly allow Meta to remain one of Nvidia's biggest customers.

Zuckerberg says Meta's AI infrastructure investment is required to build a complete artificial general intelligence (AGI) capable of providing reasoning, planning, coding, and other abilities to researchers, creators, and consumers around the world.

The company intends to "responsibly" develop the new AI model as an open-source product, ensuring the highest levels of availability to users and organizations of all sizes.

The high demand for the H100 has resulted in long lead times that could put some companies awaiting fulfillment behind their competition. Omdia reports that H100 GPU order lead times have ranged from 36 to 52 months due to the increasing demand for cutting-edge AI hardware.
That's about 10.5 billion dollars worth of graphics cards. It emphasizes the point that Nvidia really doesn't have to care too much about not selling GPUs to gamers or even AI enthusiasts who want more VRAM. 36-52 month wait times on $30K-per-unit GPUs. Microsoft is ordering roughly the same number, and then Google, Amazon, Oracle, and Tencent about a third as many, with Baidu, Alibaba, Lambda Labs, ByteDance and Tesla nowhere near as many but still $1 billion plus each worth of GPUs from Nvidia. (140 billion in revenue from that alone for Nvidia, and even more if they could make more of them.)

There's a lot of deserved hate for Meta/Facebook/Zuck, but this should be praised. Zuckerberg is committed to open source and is the reason all of this AI stuff isn't hidden behind closed-source models from those other companies mentioned. They are currently working on the next version of the Llama LLM (Llama 3), which will be open source, and he's hinting at it being AGI-level intelligence.

-------------------------

On a scarier note, it's projected that AI research/processing will take 35-40 gigawatts (per year or per day, I can't remember which) in the next few years. A large coal plant produces 1 gigawatt, a nuclear power plant about 2 gigawatts. There are multiple nuclear plants scheduled to be shut down by 2033. Our lame ass government hasn't stepped up on building newer, safer nuclear plants, so they are projecting rolling brown-outs in the next few years in many major US cities.

We thought crypto was bad for power consumption, not even close to AI, but at least we gain something from this.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Just a short note, I love my RTX 4090. Zero buyers remorse. So glad I decided to get it even over retail cost.

Coming from the 3080ti, it's night/day how much faster AI works with the additional CUDA cores and primarily the 24GB Vram.

I'd say 4090 is a must if you are big into AI stuff which I am. Otherwise you have other options you can get by with if not wanting to spend $.

As for AI, 24GB is huge for 2 reasons. If you can't entirely load an LLM into VRAM the output text is going to crawl across your screen slower than in the 1980s while using a 300bps modem. Fully loaded models are more like a 56K modem speed output : )

For Stable Diffusion, you can pump out SDXL 1024x variant resolutions faster than the 3080ti creates SD 1.5 low-res images. Also, I am constantly mixing SDXL models and fine-tuning them, so I'm using the XYZ Plot script to compare models. That means loading multiple models to create comparisons between them, which was unreasonably slow without the VRAM. It allows you to play around and experiment much more without waiting forever while stuff processes, or having it flat out crash.

That's a small sample; EVERYTHING AI related gets a huge boost: voice emulators, video, etc. I can also, as mentioned before, use both an LLM and Stable Diffusion at the same time. And my browser video (like YouTube) doesn't stutter while processing images.

Now or in the future, if you have any interest at all in AI: VRAM is the #1 consideration, CUDA cores #2. Don't get anything less than 24GB VRAM moving forward, and if somehow Nvidia comes out with a 48GB VRAM consumer card, that's the one to target, but that card will sell like hotcakes. Presently you can't even consider a non-Nvidia card for AI.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



Craziness. Text to video took a huge leap in quality. This 10 minute video gives multiple examples of AI generated videos. Interestingly, the very first example given is the worst (still impressive, but the worst of the examples). Watch at least a few of them. The photorealistic ones are the most impressive.

The problem up to now with AI videos was trying to keep everything consistent without it morphing from frame to frame. Looks like Open AI solved that. The samples from OpenAI's SORA are amazing.

Keep in mind, these are all created from simple text prompts. 100% AI.

While some animations (walking in particular) aren't perfect, many of these videos are high quality enough that you wouldn't think twice if you saw them posted on the internet as actual videos taken IRL.

I follow AI daily and use both Stable Diffusion and LLMs daily, and there are major developments on a weekly if not daily basis, but this is one that stands out as eye opening as to what's to come.

I can't imagine anyone wouldn't be impressed with this. OpenAI has also demonstrated that they can render this kind of thing for games, so in the near future you will have games rendered at this quality.

It's fucking incredible how far AI has come in 1 year. Also give Mark Zuckerberg (and others) credit for open sourcing the Llama LLM models, and Stability AI for doing the same with Stable Diffusion. Without them this tech would be locked behind companies instead of open to anyone. Because of this, people can create anything they want on their local computers as opposed to being censored and having "big brother" determine what you can and can't see visually or in text. I'm very much against the "mind police" determining what's best for me.

Image

This is where we were a year ago with AI video.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Man, Google is not having a good year with their attempt to catch up in the AI race.

Gemini is horrible, and the open source LLM they released fails most of the logic tests and typical tasks asked of LLMs.

They were caught lying about their AI's capabilities several months back as well.
german.jpg
pope.jpg
sweedish.jpg
The latest fail is their incredibly "woke" AI image generator that has been inserting random races into historical images.

Google had to take the system completely offline because when asked to create historical images of World War II German soldiers, it was inserting Black and Asian soldiers, etc. When asked to create images of the United States' founding fathers, it came up with an Indian in a headdress, an Asian with a wig like Washington, etc.

I'm glad they're getting called out for it. It's showing how off the rails the DEI/Woke movement has become. We know we're guaranteed a black doctor/scientist, lesbian/gay, cripple in every TV series and guaranteed a rainbow of colors in every movie no matter what its topic is.

But enough is enough. I mean, if I asked Google's Gemini to "show a group of people standing at the back of the bus in the 1950s" and it showed a group of white people... how would that fly? You can't just change history like it never happened. If I was black, I'd want to see black people at the back of the bus in a 1950s-themed image. Or how about white slaves coming over from Africa? That would totally screw up the democrats trying to keep the victim tag on blacks instead of empowering them. What if I said "Show me a picture of Aunt Jemima" and I got a white lady?

I'm perfectly OK with making every commercial ethnicity be like a roll of the dice randomized thing but in no way should we be changing historical imagery. Both for good and bad, it happened and shouldn't be changed.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Aslanna wrote: October 10, 2023, 8:52 pm I got bored and started messing around a bit with Stable Diffusion. I am using Stability Matrix which I don't know if that's a branch of Easy Diffusion or
Why is it so hard to get a dark image? For example I was looking for a dark castle interior but if I add someone they are usually backlit or looking like they have been photographed with a flash. I tried adding various modifiers to the prompts but from the brief research I have done SD has a hard time with those sort of images.
lighting.jpg
If you're still looking to get darker images, try using "chiaroscuro" in your prompts. The image grid above shows the same prompt with chiaroscuro, with volumetric lighting, and with no modifier for comparison. This is just a lighting example; I didn't fix the hands. Swift and the cowboy look like they just got done playing a long session with their PlayStation controllers!

"Chiaroscuro is the use of strong contrasts between light and dark, usually bold contrasts affecting a whole composition"

It works pretty well most of the time, so it's at least worth trying with and without. It helps to have a light source of some sort in the prompt.
lighting2.jpg
Edit: here are the same prompts with a different SDXL model. Cowboy images look a little overcooked
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

It looks like civitai made some changes which has broken the model browser in Stability Matrix. At least for me. I don't see anyone else reporting issues though so I'm not sure what's with that.

I'm undecided on the new Content Rating system but it's nice that I can now filter out anime and furry (and gore) so a lot less to page through. That may be separate from CR though but it's been awhile since I've actually logged onto the civitai website.
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Aslanna wrote: March 28, 2024, 10:12 pm It looks like civitai made some changes which has broken the model browser in Stability Matrix. At least for me. I don't see anyone else reporting issues though so I'm not sure what's with that.

I'm undecided on the new Content Rating system but it's nice that I can now filter out anime and furry (and gore) so a lot less to page through. That may be separate from CR though but it's been awhile since I've actually logged onto the civitai website.
Forge (which is a better-maintained, easier-to-install version of Auto1111) is my recommendation.

https://github.com/lllyasviel/stable-di ... ebui-forge

There's a one-click install version of it on that page:
>>> Click Here to Download One-Click Package<<<

After you download, you uncompress, use update.bat to update, and use run.bat to run.

Note that running update.bat is important, otherwise you may be using a previous version with potential bugs unfixed.
I opted in for everything adult/strange, but it's nice to know that if there are too many furry LORAs etc. I can switch it off.

I'm looking forward to SD3, especially the better adherence to prompts. I've mixed enough checkpoints with SDXL that I have a great model I use most of the time. There isn't a perfect one; some models will always be better than others at certain things.

LOVE my 4090 GPU, primarily because I can screw around with models (and larger LLMs) without waiting too much. But when SD3 comes out, I think the best quality version of it will need 24GB VRAM, although there will be smaller versions of it.

I'm still having a blast with stable diffusion.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

I'm still playing around with Stable Diffusion every day, and also other AI stuff like yesterday's new release by Meta of Llama-3-8B.


Llama-3-8B is an impressive model for its size. It can be run on 12GB VRAM and even less. It's not uncensored yet, so I'll mostly use another LLM for now, but for censored stuff it's excellent. I'm glad Mark Zuckerberg has improved his image quite a bit over the past few years. He's done a lot of good things with open-sourcing AI and also with the Quest/VR. Giving away free open source LLM models that are as good as the closed ones is admirable.

Zuck said they are working on a multimodal 70B model. I'm looking forward to that. I've been waiting for an integrated chat/image-generating LLM.

Llama-3-70B is already topping the accuracy charts for LLMs, and they have yet to release the 400B version still being trained.

Hate Mark Zuckerberg for whatever past reasons, but releasing these open source models is really forcing the competition to keep advancing and releasing new models and they won't be able to charge a lot for them since the free version is just as good right now.


I'm worried about image generation though. It looks like SD3 might be the last release from Stability AI. Thankfully the prompt adherence looks great so it should be a great base model to work with to build fine tunes/loras etc for.

Loving my 4090. SD3 will come in multiple sizes but I think the best 8B version might need the 24GB Vram. I've generated over 100K images. Most of those are SDXL.

I still highly recommend Diffusions for organizing images. I've got close to 60K images loaded in there all easily searchable by words in the prompts, model used etc.

https://github.com/RupertAvery/Diffusio ... t/releases

it's completely self contained/local/portable. You just unzip and then point it toward your output folder(s), it reads subfolders etc.

I have some test prompts saved in a txt file, but most of the time I just search using Diffusions. For example, I use one prompt that includes "drinking tropical drink through a straw"; it quickly autofilters, so just typing "trop" gets me to an image and I can cut and paste the prompt, or you can also drag the image right over into Forge img2img etc. I like that it also sorts by model as well.

I have 128GB RAM so I don't normally pay attention but just looked and those 60K images loaded in Diffusions take ~360mb of system RAM.

There are a lot of great models/loras etc for SDXL but it did take a month or so after release to get going. I'm sure SD3 will be the same way. I really can't wait though for the much better prompt adherence and text capabilities of SD3.
fruit.jpg
"Renaissance painting of three baskets. The first one only has apples, the second holds only oranges while the third includes only lemons."

That kind of prompt doesn't work well in current models but does well in SD3. Also prompts with multiple people like "a man standing on Left wearing a white shirt and purple tennis shoes, a woman standing on right wearing yellow dress and red rimmed sunglasses" will actually provide something with the right colors and genders etc.
You do not have the required permissions to view the files attached to this post.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

There is a large (~2gb) lora on CivitAI that does a pretty good job of the following
LoRa mimicking midjourney style v5.2 This LoRA works as:

Detail tweaker (supplements the picture with details)

Color enhancer (adds contrast and brightness)

BG depth improver (adds depth on the background)

IMPORTANT: use only with weight from 0.1 to 0.8! you can set more, but the picture will be too colored (smooth edges) and can break proportions. CFG Scale 4 - 6.

use it like pepper, just add a little bit to the picture )
https://civitai.com/models/251417/midjourney-mimic

It does a pretty good job of enhancing the above without changing the image too much (usually, see rainy example where at certain strength it significantly changed the image)

I tested various strengths and it seems that most of the time setting the weight at 0.5 (<lora:MJ52:0.5>) works pretty well as a starting point. I know SDXL doesn't run fast with less than 24GB, but it should still be OK on 12GB VRAM even with the added size of the lora.
mana.jpg

Code: Select all

closeup of woman wearing elf ears, elf cosplay holding a sign that says "SEX FOR MANASTONE", at convention, detailed <lora:MJ52:0.5>
This is the worst example. I just wanted to make a sign with text, because once SD3 is released a lot of these text issues will be gone. At least she gets 4 fingers at 0.6 strength!
rainy.jpg

Code: Select all

closeup of A young woman with glasses in a rainy dark alley looking at cell phone, she is wearing a wet coat, ((man standing in the shadows)), neon lights, rain storm,sad mood, cinematic lighting, dramatic atmosphere <lora:MJ52:0.5>
The idea behind this image was a lady in not-the-best surroundings, looking at her phone with a creepy guy in the background. I think the 0.6 version came out best; the lora basically enhances things, so the neon lights are a little more prominent and the creepy guy appeared.
japan-sunset.jpg

Code: Select all

Japanese girl ,long hair, very pretty, cute ,tattoo yakuza in the arm, wearing leather choker, clothes punk style, drinking through straw, looking at camera, sunset <lora:MJ52:0.5>
This one is probably the best example of what the lora does. Look at the leather jacket details, colors, choker details, the thingy in the hair, the kinda messed-up hair (which is realistic on a windy beach), even the crappy chipped fingernail polish in the 0.6 version.

None of these images are upscaled. They are just basic 896x1152 images, 40 steps, 4.5 CFG, DPM++ 3M SDE Karras. Any image I actually want to keep I would upscale 1.5x with 0.30-0.40 denoise as many times as needed until I'm happy, and then upscale again 1.5x using a combination of DATx4 (upscaler 1) and x1_ITF_SkinDiffDetail_Lite_v1 (upscaler 2 at 0.5 strength), which makes them look way better.

It's something to play around with. If it's too slow then maybe use it in img2img with denoise after generating a decent image to work with.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

mana-2.jpg
I wrote about the new SD3 model in the AI art examples thread. Here's the same prompt as that thread (except I added "smiling") for comparison. I like how it randomly added some guy checking out her ass. : ) Fingers, limbs, etc. are horrible.
mana-3.jpg
closeup of woman wearing wood-elf cosplay outfit, smiling holding wood carved a sign shaped like a heart that says "SEX FOR MANASTONE", holding a glowing white orb illuminating face, at EverQuest convention with EverQuest sign in backgrtound, detailed
I left the "holding wood carved a sign shaped like a heart" because it's pretty good at figuring out what you want even if not grammatically correct. It's also trained for understanding normal written text descriptions instead of the "a women, smiling, wearing cosplay outfit, etc etc"

So far SD3 isn't great for creating people, but it's really improved with signs and understanding prompts. I modified the prompt in the second image to show how well it understands things like a heart-shaped sign, etc.

Edit:
mana-4.jpg
...and one of my favorite things to improve images, just add "chiaroscuro" to the description if needing more contrast between light/dark stuff in the image.
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

I fell off on this pretty hard. It's too hard to keep up at this time. SDXL.. Turbo.. Lightning.. Pony.. wtf is Pony? I update the stuff I have every couple months but that's pretty much been it.
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Aslanna wrote: June 21, 2024, 7:58 pm I fell off on this pretty hard. It's too hard to keep up at this time. SDXL.. Turbo.. Lightning.. Pony.. wtf is Pony? I update the stuff I have every couple months but that's pretty much been it.
It's not something you have to stick with constantly. If I didn't have a 4090 I'd probably mess around with it less, but with that kind of speed in generating images it makes it fun to create and work with images instead of waiting for them to generate. The best example would be fine-tuning an image: if it's fast you can create 4-6 variations of a 0.30-denoised image and then pick the best one. If it takes a long time to create those images, you're less likely to create as many and you settle for what you get. That also goes for upscaling etc. as well. Anyway, I see people (not you, but people on reddit) complaining that they're bored with AI image generation. It's not a game. It's not something you pay $60 for or make money from. It's free and Stability owes you nothing. I've generated roughly 100,000 images now and I've shared 2 of them (not including examples I've used on VV : ). It's purely for self entertainment for me.

Turbo and Lightning = lower quality images for much less GPU power. If you see a turbo version of an SDXL model, it means you can create images in something like 8 steps at 1.5 CFG and they look OK, but not as good as the standard SDXL models where you use a CFG around 5 and 30-50 steps. Basically Turbo/Lightning are made for those with less powerful GPUs.
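
For what it's worth, here's roughly what that looks like in code with the stock SDXL-Turbo weights (a sketch; the CivitAI turbo merges work the same way, just with different files). Turbo wants very few steps and essentially no CFG:

Code: Select all

import torch
from diffusers import AutoPipelineForText2Image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "photo of a red fox in the snow, golden hour",
    num_inference_steps=4,  # 1-4 steps instead of 30-50
    guidance_scale=0.0,     # Turbo is trained to run without CFG
).images[0]
image.save("turbo_test.png")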

SDXL = better than SD1.5 for pretty much everything and has matured with tons of LORAs and fine-tuned models. As with SD1.5, never use the base model.

Pony = a highly modified version of SDXL created by a furry lover. They took a ton of images from a popular hentai image-posting site and actually created an amazing model. Using the detailed descriptive tags from those sites, the model is able to do all manner of sexual poses really well and also handles hands and feet really well. As with the SDXL base model, Pony has been modified and has a ton of fine-tuned models geared toward realism as opposed to cartoon/anime-looking images. Pony is a great model for what it does, mostly furry/sexual stuff, but it does show how much better hands/feet can be. Instead of taking this info on board for SD3, Stability made it even worse by removing all nudity from the model. On top of that, they changed the licensing agreement, which prevents the Pony creator from making a better model out of it.

I use a version of Pony sometimes if I need a particular pose initially or am starting with an image and using img2img. Pony gets all the poses and hands right and then I switch to an SDXL model to denoise it into a more realistic image.

----------------------------------

SD3 is a fail, but there is some interesting news with Stability AI.

Sean Parker bailed out the company with an investment. Sean Parker is behind Napster, Cofounder of Facebook, and also Spotify.

Gotta give him credit for Napster! That's pretty legendary. He seems to have success in what he gets involved with but it's unknown what he's thinking about doing with Stability. The only thing most care about is the open source models. I think he may be more geared toward using the video and audio parts of stability to work with Hollywood though.

If he stops releasing open source models, I think SDXL (and Pony-type variations of it) will definitely continue. There are also a couple of Chinese companies with promising open source models. It may even be possible to save SD3, but it's probably not worth the effort, as their "safety" tuning basically made it extremely difficult to create anything with limbs (humans, animals, etc.) and it seems to suck at poses other than someone standing. It's a horrible model aside from the prompt adherence, which matters little if it understands what you say but then gives you a mangled result. Pretty sad that human anatomy is now considered taboo.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

So Nvidia just picked up Pixart (Chinese) which was the most promising replacement for Stability AI.

There's some hope that Nvidia will release open source models, since they sell hardware and Nvidia cards need GPU-intensive things to sell in volume. If you take away the open source stuff, you start taking away reasons for people to buy expensive top-of-the-line consumer graphics cards. It would make sense to me for them to release high-end image diffusion models that require their tensor cores. Games don't require these 4090s and upcoming 5090s.

Nvidia, being the highest market cap company in the world, seems to know what they're doing, so I'm hoping they picked up Pixart after Sean Parker invested in Stability AI in order to assure the release of open source models. It would benefit Nvidia to keep pushing the envelope in image, audio and video AI generation by continually releasing models that require their latest top-end cards each year. They can't count on the gaming market to do that anymore. I can't imagine they haven't thought of this.

It's primarily a "hope" because it almost seems like Nvidia needs nothing other than the high end commercial market where they can't make enough $60K GPUs to meet demand. They have billions and billions of dollars in backorders that are at least a year out.

Right now I don't feel like I need to buy a 5090 but I would immediately if there was some amazing image/audio/video models that required it. I'm very happy with my 4090 along with my 128GB system ram and 16 cores. It makes my 3x4K monitors run smoothly with all sorts of apps open. 3080 struggled to run SDXL by itself with nothing open. With the 4090 I can watch a movie, surf the net, youtube , etc while generating. The 24GB of VRAM helps so much.

It's good to be Nvidia right now. All these competing AI companies, OpenAI/Microsoft/Meta/Google/Amazon/X (twitter)/Tons of Chinese companies all need as much GPU power as they can get to improve their models in the race for AGI. OpenAI/Microsoft is building a 100 billion dollar server farm for it. Media will want more and more powerful video generation to create AI movies for cheap instead of paying retarded woke actors that cause trouble trying to take stands on political issues they are way too clueless to be opening their pieholes about. As if they magically know more than Joe Anonymous standing next to them that wasn't in a movie. I can't wait for the day when we say "remember when there were actual human actors and musicians?" Similar to PC games, we don't need them! And unlike aging actors, if an AI actor has a hit movie, they won't age and look like they're about to keel over like Harrison Ford etc. in the tenth sequel. Even better though will be for new movies, you won't have to see the same old actor's face. You can enjoy and be immersed in the story without thinking about which movie that person has been in before.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Oh man, a new model is out called Flux that is amazingly good right out of the box.

SD3 sucked, and it turns out the reason is that the best people at Stability left and now work at Black Forest Labs, where they made a great model that can take normal written descriptions and turn them into images with anatomically correct humans and also text.

The model is outstanding but also large, so you may not be able to use it without 24GB VRAM, although I think they made some smaller versions of the model. You need to use ComfyUI.
steam-ct.png
"closeup of woman wearing barbarian cosplay outfit, smiling holding wood carved a sign that says "You've been Steamrolled by CT!", at EverQuest convention with EverQuest sign in background, detailed"

So I created a batch of 4 images above so you can see that it's pretty damn good with text and the concepts.
Celestial.png
"A tired old barbarian on a horse, in the rain, passing under an archway that says "Celestial Tomb""

While this isn't in the "style" of EverQuest,(since I didn't prompt for that), it followed my prompt really well and it looks great being created from just my basic description. The splashing puddles etc are well done.

kdrama1.png
"A Korean women wearing a T-Shirt that says "Vote for Pedro", sitting on a Brown couch, holding a coffee cup with the text ""K-Drama", while petting a dog, there is a window in the background with beams of light shining through"

The final one is showing that it can follow complex prompts including colors and text on multiple objects.

Flux is amazingly good and immediately the best model available for image generation.

Lots of discussion about it on the StableDiffusion subreddit (even though it's not Stability's model : )

https://www.reddit.com/r/StableDiffusion/
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Oh man, I was using the "lighting" 4 step (Schnell) version of Flux last night. I'm using the actual Flux-Dev 20 step model for the images below
barb-dev.png
"closeup of woman wearing barbarian cosplay outfit, smiling holding wood carved a sign that says "You've been Steamrolled by CT!", at EverQuest convention with EverQuest sign in background, detailed"
ct-horse2.jpg

"A tired old barbarian on a horse, in the rain, passing under an archway holding a flag that says "Celestial Tomb" in one hand and holding an orcs head raised high in the other hand"

Look at how great that flag looks, with its tattered look and creative Celestial Tomb text.
pedro.jpg
"A Korean women wearing a T-Shirt that says "Vote for Pedro", sitting on a Brown couch, holding a coffee cup with the text ""K-Drama", while petting a dog, there is a window in the background with beams of light shining through"

------------------

This model is amazing. Keep in mind, it's a 12 billion parameter model while SDXL is like 2 billion

While the "orc's head" in that image isn't quite an orc, it's incredible how well the model comprehends prompts. Another thing to note: on things like "Vote for Pedro" etc., it does a good job integrating the text, with it partially behind the coffee cup and so on.

...and the hands...the hands! So much better than SDXL/SD. They can still be off sometimes, but it takes way fewer retries to get them right.

They released fp8 versions of these models, so I think they would be usable on some non-24GB VRAM cards.
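
If you want to poke at it outside ComfyUI, diffusers has a Flux pipeline as well; a minimal sketch using the full bf16 FLUX.1-dev weights (the gated HF repo needs its license accepted first), with CPU offload so it can squeeze into less VRAM at the cost of speed:

Code: Select all

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # trades speed for VRAM

image = pipe(
    'closeup of a woman in barbarian cosplay holding a carved wooden sign that says "Steamrolled"',
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=20,
).images[0]
image.save("flux_test.png")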

Flux is also good for all types of images, realistic to anime and everything in between. It isn't gimped and knows what a nude body looks like; while it's not trained on porn, you can create tasteful images without the body being mangled.

Best of all, it's free. It beats out Midjourney and all the other closed-source models, and (if you have a PC capable of it) you can do all of this locally.

VRAM is still king. I'm hoping the 50xx series has more than 24GB, because video like kling.ai is starting to get to levels that are worth noting.

Check this Youtube channel out for what you can do with video ai:

https://www.youtube.com/@KlingAIOfficial/videos

It's come a long ways, getting close!
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

comfiUI-mastered.png
Well, I bit the bullet and took the time to sort of learn ComfyUI. I can now string together my own spaghetti and make my own workflows, but mostly it's modifying and patching together multiple workflows to get what I want. I still pull the workflow over to the SwarmUI tab though when I'm running batches or need the grid generator comparison etc.

I always thought it was stupid that ComfyUI didn't automatically save the metadata embedded in images, so for any workflow I use, if it doesn't have the "Save Image Metadata" node, I need to add it and wire everything into it.

I also wanted wildcards and with only a few FLUX workflows I had to mix and match to get some workflows I like. Also ended up finding some other cool nodes along the way.

FLUX is such a great model. There are some smaller versions of it that will run on 12GB VRAM, but the larger models really start to need 24GB VRAM if you don't want to be waiting around and maxing out your PC while images generate.
comfiUI-mastered2.png
User avatar
Aslanna
Super Poster!
Super Poster!
Posts: 12425
Joined: July 3, 2002, 12:57 pm

Re: Stable Diffusion and AI stuff

Post by Aslanna »

Winnow wrote: August 10, 2024, 1:13 amWell, I bit the bullet and took the time to sort of learn ComfiUI.
I mean it's not like I didn't try to get you to use it almost a year ago or anything...
Have You Hugged An Iksar Today?

--
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Aslanna wrote: August 11, 2024, 1:30 am
Winnow wrote: August 10, 2024, 1:13 amWell, I bit the bullet and took the time to sort of learn ComfiUI.
I mean it's not like I didn't try to get you to use it almost a year ago or anything...
I had used it previous to that, but I just didn't like it compared to Auto1111.

Forge just added FLUX support today.

ComfyUI is also getting a UI overhaul soon. ComfyUI was the go-to UI due to it being supported directly by Stability, but since Flux is from the large team of people that left Stability to work at Black Forest Labs, maybe we'll get something a little better. It's pretty clear to see the talent went to Black Forest when comparing SD3 to FLUX.

Things are progressing quickly with Flux. Several LORAs have been released and peeps are working on controlnet for it.
patch-1.png
patch-2.png
A detailed, worn on the shoulder,circular military worn faded patch shaped logo filling image that says "Veeshan Vault 2002-2024" silver letters with a Firebreathing Dragon in the middle, the patch is surrounded in flames
patch-hat.png
A detailed, worn on the baseball cap,circular military worn faded patch shaped logo filling image that says "Veeshan Vault 2002-2024" silver letters with a Firebreathing Dragon in the middle, the patch is surrounded in flames
I made a customized logo for my Fantasy Football League that's in its 22nd season, same as VV. I used a similar prompt to what I used for the examples above for a VV patch.

Despite what people like Spang think, AI can generate some cool stuff, and quickly, and this is just the tip of the iceberg. No upscaling, just typing the text into FLUX, gets you results that don't look generic, and even the hat looks worn. The dragon on the hat logo is pretty lame, but I posted it for the detail in the patchwork, threads, worn hat, etc. My prompt is pretty basic and not the best, but it still ends up with a decent result; "worn" is all it took to avoid the generic "too clean" look. If you click on the images to make them bigger, you can see the fine detail with no upscaling.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Man, free short-video AI is getting pretty good.

You can do this for free:



It's a werewolf eating a corpse, so NSFW for that reason.

All they had to do was type in this prompt:

https://preview.redd.it/ndciv4hsa4id1.p ... dd65aec2fe

Can't wait to start pointing out, over the next few years, all the things Spang enjoys without realizing they're partially or fully aided by AI. He'll have to stick to reruns!

Here's a link to Kling AI:

https://klingai.com/

You can see lots of examples and create your own.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

FLUX_eyes1_.png
FLUX_eyes2_.png
The LoRAs (which people originally thought couldn't be done for FLUX) are starting to come out, and they are great.

https://civitai.com/models/641755/flux- ... nId=717794

That's the LoRA for the weird-looking lady from the Netflix chess series. She's a good test subject because she has those wacky eyes that are so far apart.

The LoRAs work very well in FLUX, and people are flocking to it (as they would have to SD3 if it had been any good).

They keep making smaller and faster versions of FLUX so anyone can use it. The Dev model is the best, and there is a difference in quality, but the smaller versions are still excellent.
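If anyone wants to try a FLUX LoRA outside ComfyUI, here's a minimal sketch using Hugging Face diffusers. The LoRA path and trigger word are placeholders, and FLUX.1-dev needs its license accepted on Hugging Face before the weights will download.

Code: Select all

import torch
from diffusers import FluxPipeline

# Load FLUX.1-dev and apply a character LoRA (path and trigger are placeholders).
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("path/to/character_lora.safetensors")
pipe.enable_model_cpu_offload()  # helps the 12B model fit in 24GB (or less) of VRAM

image = pipe(
    "portrait photo of mytriggerword, soft lighting, film grain",
    height=1152,
    width=896,
    guidance_scale=3.5,
    num_inference_steps=28,
    generator=torch.Generator("cpu").manual_seed(1234),
).images[0]
image.save("flux_lora_test.png")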
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

Things are moving super fast with FLUX: training LoRAs, smaller models for the weak-sauce GPUs, etc.

I took the time to learn to create some LoRAs over the past few days and man... they are amazingly good.

Everything is Python, no GUI for what I'm using, but once I got a handle on all the settings I can create a LoRA in a little over an hour for most things, and maybe two hours for more challenging concepts.

So glad I have my Quest 3, because you can just barely create LoRAs on a 4090 with 24GB of VRAM. It cranks along at 100 percent GPU usage the entire time it's processing the training steps. I can't really even surf the net; I crashed my LoRA training a few times doing that. I'm using AI Toolkit:

https://github.com/ostris/ai-toolkit

While it's super GPU-hungry to train a LoRA for a 12-billion-parameter model, the results have been amazing. I had several datasets from training LoRAs for SD 1.5 a year ago, so I used those to create some FLUX LoRAs without even using higher-res images.

As any normal guy would do, I used an adult subject to test the LoRAs. While no excuses are needed, it made for a good test because FLUX is censored when it comes to private parts (the Tacos, Beans and Franks). The results are incredible. Even better, FLUX seems to need fewer images and fewer training steps, while at the same time the LoRAs are far more flexible.

I suppose a good example would be: say you liked a pictorial from Playboy (or whatever: Cosmo, Gay Pride, whatever you might be into). With about 20 images (maybe even fewer) you can produce an amazing LoRA of someone. So if there's a model, celebrity, or musician you like a lot, you really can make endless images of them in all sorts of settings, poses, etc. The person I trained was spot-on. I seriously would have thought the images I made came from a real photo shoot; it's dead-on accurate for both looks and image quality.
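Most of the image prep is just pairing each training image with a short caption that contains the trigger word. Here's a rough sketch; the .txt-next-to-image convention is what trainers like ai-toolkit generally expect, but double-check your trainer's example configs, and the folder name and trigger word below are made up.

Code: Select all

from pathlib import Path

DATASET_DIR = Path("datasets/my_subject")   # placeholder folder of ~20 images
TRIGGER = "mytriggerword"                   # placeholder unique trigger word

def write_captions(dataset_dir: Path, trigger: str) -> None:
    # Write a same-named .txt caption per image containing the trigger word.
    for img in sorted(dataset_dir.iterdir()):
        if img.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        caption_file = img.with_suffix(".txt")
        if not caption_file.exists():
            # Keep captions short: the trigger plus a minimal description at most.
            caption_file.write_text(f"photo of {trigger}\n", encoding="utf-8")

if __name__ == "__main__":
    write_captions(DATASET_DIR, TRIGGER)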



Here's what an amateur user did with just 13 images of himself, training for 2,000 steps (that's about $2-10 if you do it online at places like CivitAI, where you can train LoRAs). For me it cost whatever electricity about 70 minutes of maxed-out GPU time on my 4090 costs.

One thing I'm really glad I did was undervolt my 4090 as soon as I got it. With AI image generation (and text LLMs, etc.) the GPU goes straight to max usage whenever it's generating or training anything.

Anyway, there were a few downer months after the monster fail that was SD3, but FLUX, which seemed to come out of nowhere, has blown the scene open again.

There are a ton of variant models for FLUX now, which most people can run with at least 8GB of VRAM, and they are still better than everything else, just not as good as the 12B Dev model. As with everything AI, FLUX developments are happening on an hourly/daily basis. A week ago people said you wouldn't be able to make LoRAs for FLUX; a couple of days ago the smart people figured it out, and now there's an explosion of LoRAs (although not the best yet) and workflows on CivitAI. It took ages for ControlNet to come out for SDXL, and it's already out for FLUX.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



I've been creating some amazing FLUX.1-dev LoRAs that (after image prep) take me about 90 minutes each.

There is no research paper with details on FLUX, so people are still figuring it out. It now looks like you can create a similar-quality LoRA with a resulting file size of only 4.5-10 MB. Tiny!

Here's the Reddit thread with examples of The Hound LoRA made using this new technique. Because of the way FLUX is structured, you can choose which "blocks" of the model to train, and if you're only interested in the face and body, for example, you can train just one or two blocks and still get great results.

I've merged my own LoRAs into a FLUX checkpoint and it works amazingly well. It takes a little extra time to load a LoRA, so if you have one you always use, you can merge it into the checkpoint. Say you like to make images in a particular style or with certain artists: merge that style into the checkpoint and you won't have to bother loading it anymore. I merged five LoRAs (faces and concepts) and they all work with or without a trigger word, although the trigger helps.
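Outside ComfyUI, the same merge idea can be sketched with diffusers' fuse_lora. The paths are placeholders, and note that save_pretrained writes a diffusers-format folder rather than a single checkpoint file.

Code: Select all

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.load_lora_weights("path/to/style_lora.safetensors", adapter_name="style")
pipe.fuse_lora(lora_scale=0.9)   # bake the LoRA weights into the base layers
pipe.unload_lora_weights()       # drop the separate adapter now that it's fused
pipe.save_pretrained("flux-dev-with-style")  # placeholder output folder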

When it comes to LoRAs and FLUX, you don't need to go nuts with the text descriptions anymore. If it's a single concept or person, you only need to tag the images with a unique word for the LoRA. If you trained it on a giraffe fucking a donkey, name it something like "girdonk" and use that word to trigger it. Each image doesn't need details like "A giraffe fucking a donkey in the Sahara beside an oasis. There is a monkey masturbating while watching the show." FLUX already understands known concepts (well, maybe not the monkey masturbating), so you no longer need to spell all that out.

FLUX can also learn text, fonts, and styles. Feed it around ten examples of a text style and you can use that LoRA to create that style of text in your images. CivitAI examples:

https://civitai.com/models/699559?modelVersionId=782787

https://civitai.com/models/691903?modelVersionId=774338

https://civitai.com/models/701036?modelVersionId=784392
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



I'm still messing around with FLUX every day. As for video, it's coming along. The above video shows the latest update to Kling, which has limited free access you can play around with. It's pretty impressive how far it's come; it's just a matter of time before it reaches still-image quality.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



ChatGPT voices are getting really good. I don't have a use case worth spending $20/month on, but it seems worth it for someone who does.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



The above is a good example of just some of the interesting voices/accents. You'll notice a couple of hiccups in the beginning; not sure where those came from, as I haven't experienced those voice hiccups myself. Maybe it was his recording setup. It's amazing that it knows so many Japanese anime voices, etc.

I subscribed to ChatGPT Plus a few days ago. It's $20/month via a simple Google Play Store (and, I assume, iPhone App Store) subscription, with easy cancellation through the store as well (for Android at least).

ChatGPT with emotion-detecting voice is crazy good. The power behind it is mind-boggling. Most people use it just to ask silly questions, but its ability to use accents, speak many languages, and tell stories with emotional inflection, while also taking simple or complex instructions, means people haven't tapped anywhere near its potential.

I'm blown away by it. I don't have a use case for it myself, and I think that's going to be a problem with their subscription model. Right now everyone pays $20/month (or maybe $25/month for business). They plan to slowly raise that to $44/month over a year or so. I wouldn't pay that for basic usage, but people who run their businesses on it and use it to script and program say they'd happily pay $100/month. They're going to need to tier their subscriptions, because ChatGPT voice is absolutely amazing for the elderly: it's at the point where you can have a normal conversation with no delay in responses. Old Granny will find uses for it, but not $100/month uses. They'll probably need to split the programming ability into a tier of its own, since some businesses are saving tons of time having ChatGPT do their analysis and it's well worth a higher price to them; I'd keep access for more general uses at around $20.

For me, the amazing voice interaction is where the value lies. All your conversations are logged in the text chat for easy reference.

There's no way customer service jobs last more than a few more years. People would much rather speak to this level of AI: you can be rude to it, you can interrupt it, and it fully understands what you want. It's way better than talking to a random bored human customer service rep.
User avatar
Winnow
Super Poster!
Super Poster!
Posts: 27596
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »

This is impressive:

https://replicate.com/blog/flux-is-fast-and-open-source

It's basically on-the-fly, instant FLUX.

If the demo is still up, type something and watch as it instantly updates the image.

Looks like we'll have a version of it for local use relatively soon.

Generation speeds are under a second for Schnell and 3-4 seconds for Dev.1. (I currently take about 30 seconds to create an 896x1152 image (any variant of 1024x1024) with the 4090 using Dev.1, so this is roughly a 10x speed bump.)

Schnell takes 0.29 seconds to generate so video is getting close.

--------------------------

I also found a free GitHub project that converts 2D images and videos into 3D in various formats: SBS (side by side), anaglyph, etc. SBS is best for VR viewing.

https://github.com/nagadomi/nunif/blob/ ... /README.md

It has a Windows installer that grabs all the required Python stuff. Using the 4090, I converted about 300 images into SBS in about 6 minutes, and the results are impressive. It has a lot of different depth-mapping models; I just used the default, ZoeD_N, which is tuned for indoor images, but there's one for outdoor scenes and about 10 others. It's the same thing used in ControlNet in ComfyUI/Forge, etc.

I converted a 2-minute video into SBS 3D. It wasn't bad, but I don't think I'll be converting a lot of video. The images are really good, though.
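If anyone wants to script the image conversion, here's a hedged sketch that batch-converts a folder by shelling out to nunif's iw3 tool. The module name and flags (-i, -o, --depth-model) are from memory, so verify them against the repo's README before relying on them.

Code: Select all

import subprocess
from pathlib import Path

IN_DIR = Path("images_2d")      # placeholder input folder
OUT_DIR = Path("images_sbs")    # placeholder output folder
OUT_DIR.mkdir(exist_ok=True)

for img in sorted(IN_DIR.glob("*.png")):
    # Flags are best-effort recollections of the nunif/iw3 CLI; check the README.
    subprocess.run(
        [
            "python", "-m", "iw3",
            "-i", str(img),
            "-o", str(OUT_DIR / img.name),
            "--depth-model", "ZoeD_N",   # the indoor-tuned default mentioned above
        ],
        check=True,
    )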
2024-10-11-122849_Image_LRF_Full_SBS.png
This is an example image I converted. You can actually train your eyes (by kind of going cross-eyed) to view this in 3D; you might want to click the image to make it bigger first. It takes a while to train your eyes if you haven't done it before: stare into the distance and cross your eyes a little until you lock onto the 3D image, and you'll stay focused on it until you consciously stop. This particular image only has a small effect. Of course, you don't need to do any crazy eye stuff with a VR headset.
Post Reply