Stable Diffusion and AI stuff

Winnow
Super Poster!
Posts: 27712
Joined: July 5, 2002, 1:56 pm
Location: A Special Place in Hell

Re: Stable Diffusion and AI stuff

Post by Winnow »



AI is still progressing at breakneck speed. At least 5 new models came out, ranging from video and image to audio and text models. All nice advancements. Huge progress in AI voice/emotion for local AI.

Just a little while ago I was gushing over OpenAI's image generator that allowed you to edit images using text commands and keep the image mostly the same.

Fast forward to yesterday, and Black Forest Labs released FLUX Kontext.

Basically it allows you to do all sorts of things with an image while keeping the context of the image. The video above is worth a watch. Right now it's available only in the Pro/Ultimate models, but a local FLUX-Kontext DEV model (like FLUX DEV before it) will be released, allowing this to be done locally.

It can modify a scene, extract things from an image, or change the text in an image, keeping the same style but with new text. If you check out the video, it can do things like "Take the pattern off the store window and make it a tattoo on a man's back".

It will also allow you to expand and fill (inpainting/outpainting) an image.

According to the charts near the end of the video, even the local FLUX Kontext DEV version is better than commercial competition like ChatGPT in some areas. It's also fast. Looks like around 10 seconds for image2image using the Kontext DEV version and less than 3 seconds for the Pro version (vs. almost a minute for gpt-image).

The local version won't be the "best" version but what matters is that it can run locally.

Re: Stable Diffusion and AI stuff

Post by Winnow »

Quick AI update before going traveling.

AI is amazing and hasn't stopped progressing at all. Capabilities keep increasing at the same pace or faster.

As for video, the developments have come in the way of making the same videos faster with less VRAM. I can now make quality 7-second videos in about 90 seconds, which means those with less VRAM can make them too, just with slower processing times. Image to video is almost like magic. It still amazes me how the models figure out the movement of everything: hair, cloth, wind-blown stuff, etc. The key here is the ability of the model to retain the facial features from a single still image and create a consistent video from it...and it's getting really good at that.

I won't post any guides. It's still takes some effort and not worth trying to show how do it here for now.

I've been focusing on audio AI. Chatterbox is amazing. This extended version in particular:

https://github.com/petermg/Chatterbox-TTS-Extended

You can use any 8-20 second sample of any voice and it will exactly reproduce that voice, with a setting for how much emotion you want.

It's really fucking good. This is "zero-shot" meaning you don't need to train it. Just give it the 8-20 second audio sample and you're off to the races.

Here are some examples of the same voice and then the zero-shot cloned voices with various levels of exaggeration:

https://resemble-ai.github.io/chatterbox_demopage/
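For anyone who wants to script the zero-shot part, here is a rough sketch. The two helpers just check that a WAV reference clip falls inside the 8-20 second window using Python's standard library. The `clone_voice` function is my assumption of how the base `chatterbox-tts` Python package is called (`ChatterboxTTS.from_pretrained`, `generate` with `audio_prompt_path` and `exaggeration`), based on my reading of its README, and is not how the Extended UI works internally. The function names, file paths, and the default of 0.5 are made up for illustration.

```python
import wave

MIN_SECONDS = 8.0   # lower bound of the sample length mentioned above
MAX_SECONDS = 20.0  # upper bound of the sample length mentioned above


def clip_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_reference(path):
    """True if the clip length falls inside the 8-20 second window."""
    return MIN_SECONDS <= clip_seconds(path) <= MAX_SECONDS


def clone_voice(text, reference_path, out_path, exaggeration=0.5):
    """Assumed usage of the base chatterbox-tts package (needs a CUDA GPU)."""
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torchaudio
    from chatterbox.tts import ChatterboxTTS

    if not is_usable_reference(reference_path):
        raise ValueError("reference clip should be 8-20 seconds long")
    model = ChatterboxTTS.from_pretrained(device="cuda")
    audio = model.generate(
        text,
        audio_prompt_path=reference_path,  # the short voice sample to clone
        exaggeration=exaggeration,         # how much emotion to push
    )
    torchaudio.save(out_path, audio, model.sr)
```

The duration check only handles plain WAV files; an mp3 sample would need converting first.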

The extended version (more capabilities) allows you to stitch together segments. I've been making around 15 minute stories but I think you can go to around 30 minutes without issue.
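To give an idea of what stitching segments involves, here is a minimal sketch, assuming the long text is split on sentence boundaries, each chunk is rendered to its own WAV, and the WAVs are then joined. This is not the Extended version's actual code. The function names and the 300-character limit are placeholders I picked.

```python
import re
import wave


def split_into_chunks(text, max_chars=300):
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole in its own chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def stitch_wavs(segment_paths, out_path):
    """Concatenate WAV segments that all share the same audio format."""
    if not segment_paths:
        raise ValueError("need at least one segment to stitch")
    with wave.open(out_path, "wb") as out:
        for index, path in enumerate(segment_paths):
            with wave.open(path, "rb") as segment:
                if index == 0:
                    # Copy channels, sample width and rate from the first segment.
                    out.setparams(segment.getparams())
                out.writeframes(segment.readframes(segment.getnframes()))
```

Splitting on sentence ends keeps each generated segment short without cutting a sentence in half, so the joins land on natural pauses.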
------------------------------

Another really cool feature of the extended version is voice conversion. It allows you to take an audio file and convert the voice to the sample voice you provide. It changes the voice but not the accent etc.

For example, I took the first chapter of the first Harry Potter audio book. I provided a sample of a Japanese anime actress. I converted the entire first chapter in about 90 seconds (RTX 4090). The result was the female actress's voice, but she sounded British and used the exact tone/inflection of the original male narrator. Same high quality as the "professional" narrator.

What this means is people like Spang can use their favorite Trans voice actor and convert any audio he hears into that special Trans actor's voice that he holds dear. For the rest of us, it means that if you have an audio book and you can't stand the voice of that "professional" voice actor, you can convert the entire book into a voice you do like with zero loss in quality or voice emphasis/emotion, etc.

If you also don't like their tone/emotion, you could redo it completely using just the text, setting the temperature/CFG, etc. for the voice of your choice, and create an entirely new one.

For quality control, you can have it generate each sentence multiple times, and then it has Whisper check each line for accuracy and will regenerate the line if it doesn't sound right. That takes a lot more time, but if you want really high quality, it's capable.
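The idea behind that check can be sketched roughly like this: generate a few takes, transcribe each one, and keep whichever transcript is closest to the text you asked for. This is only my sketch of the general approach, not the Extended version's actual code. `synthesize` and `transcribe` are hypothetical callables you would wire to your TTS model and to Whisper, and the word-level `difflib` comparison, the 0.9 threshold, and the take/round counts are my own choices.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and split into words for comparison."""
    return re.sub(r"[^a-z0-9\s]+", "", text.lower()).split()


def similarity(expected, heard):
    """Word-level similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, normalize(expected), normalize(heard)).ratio()


def best_take(sentence, synthesize, transcribe, takes=3, threshold=0.9, max_rounds=2):
    """Generate several takes and keep the one whose transcript matches best.

    Another round is generated only when no take so far reaches threshold.
    Returns (audio, score) for the best take seen.
    """
    best_audio, best_score = None, -1.0
    for _ in range(max_rounds):
        for _ in range(takes):
            audio = synthesize(sentence)                       # hypothetical TTS call
            score = similarity(sentence, transcribe(audio))    # hypothetical Whisper call
            if score > best_score:
                best_audio, best_score = audio, score
        if best_score >= threshold:
            break
    return best_audio, best_score
```

Every extra take or round costs another generation plus a transcription, which is why this mode takes so much longer.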
---------------------

This is a huge step. Voice AI is now able to reproduce emotion, and it's on the cusp of making it easy to break down a book/script into the individual voices and have the AI use as many voices as you want for the book....way better than those poor soon-to-be-out-of-work voice actors can do. How many times have you (except Spang/trans stuff) been disgusted by a voice actor trying to emulate the opposite sex of what they are? No more. You will very soon be able to have high-quality voices with emotion settings for narration and every character in the book. Badass. The tech is already there, but it will be another month or two before it's as easy to do as Chatterbox above. They're working on the model already.

I attached a zip file with an mp3 of this post with Graham Hancock's voice. (can't attach mp3s directly) It's always more interesting if Graham Hancock is saying it!

Note: I upped the exaggeration just a tad so Graham is speaking a little fast : )

It's worth a listen! Listen to it as you read along so you can evaluate its quality. I did change "Badass" to "It's badass bitches!" in the audio version and also left the sentence "It's still takes some effort and not worth trying to show how do it here for now." in, so that's on me, not the AI, for repeating what I said.