It's the real deal. It can change an image but keep the context: same pose (or a changed pose), add text, or change text while keeping the same text style. It can change style ("make realistic", "make the image a sketch"). It can combine images, instantly colorize images, etc.
It can do a shitload of things. I've been messing with it for 4 days straight and barely scratched the surface.
It takes about 60 seconds to generate an image (RTX 4090). It is able to manipulate the characters/items but keep them consistent.
The old man is sitting in a chair. The robot is looking at the old man. The robot is smiling
It is good with text. It kept the text on the shirt the same while making the T-shirt dirty. It added the additional text. It kept the hat the same.
The old man is weating the baseball cap from the second image. He is wearing a dirty old T-Shirt. He is sitting at a table with a wooden sign on it that says "Spang was Right!"
expand the image for more detail.
It is amazing at changing text. Look at how well it changed "Bowling for Columbine" to the new text, keeping the exact formatting and even his feet on top of the text. 60 seconds and a simple prompt to do this. These are not cherry-picked. I used the first result I got. I even typo'd "The old man is weating the baseball cap from the second image." and it still got it right.
Change Snow White text on bottom to "Woke Garbage" keep same style.
Change Bowling Columbine text on bottom to "Spang Loves this Movie" keep same style.
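The two text-change prompts above share one shape, so here is a minimal sketch of a helper that fills it in. The function name `text_swap_prompt` and its parameters are made up for illustration; they are not part of Kontext or any library. The model just receives the resulting plain-text string.

```python
def text_swap_prompt(old_text: str, new_text: str, location: str = "on bottom") -> str:
    """Build a text-replacement prompt in the same shape as the ones in this post.

    Hypothetical helper: this is plain string formatting, not a Kontext API.
    """
    # Name the existing text, give the new text in quotes, ask to keep the style.
    return f'Change {old_text} text {location} to "{new_text}" keep same style.'


if __name__ == "__main__":
    print(text_swap_prompt("Bowling Columbine", "Spang Loves this Movie"))
```

Whether other wordings work as well or better is something to test for yourself; this only reproduces the pattern that worked here.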
make image realistic
"Make image realistic" is all you need to prompt for Kontext to convert any art style to realistic (or vice versa, changing styles of all kinds).
Optimally you want 24 GB of VRAM to run this. I've seen people do it with as little as 8 GB, but I'm sure it will be really slow instead of under a minute. At least you can actually use some form of it. Do not buy any new GPU with less than 24 GB.
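As a rough sketch, the VRAM guidance above boils down to two thresholds. The function name `vram_tier` and the wording of the labels are my own; the only numbers used are the 24 GB and 8 GB figures from this post, not official requirements.

```python
def vram_tier(vram_gb: float) -> str:
    """Classify a GPU by the VRAM figures mentioned in this post (hypothetical helper)."""
    if vram_gb >= 24:
        return "optimal"  # about a minute per image on an RTX 4090, per this post
    if vram_gb >= 8:
        return "reported to work, likely slow"  # secondhand reports only
    return "below the lowest figure mentioned here"


if __name__ == "__main__":
    for gb in (24, 12, 6):
        print(gb, "GB:", vram_tier(gb))
```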
This model is AMAZING. AI is constantly improving but this kind of image manipulation on your own PC is a huge step. The recent emotion/inflection AI voices are amazing as well.
Way less need for LoRAs. You can make full story comics keeping character consistency using a single image (you can change poses, clothes, etc. while keeping the face etc. consistent). I'll show that in another post.