Black Forest Labs dropped FLUX.1 Kontext, the next generation of their flagship image generation model. Apart from the usual improvements (better, faster, more), this time we also get image editing through text prompts. That is precisely the part that interested me most, since I already know BFL know what they're doing when it comes to image generation (FLUX.1 [dev], duh).
So I decided to take it for a spin - full notebook for those interested (including detailed explanations of the code):
https://github.com/tng-konrad/tutorials/blob/main/flux_kontext.ipynb
Courtesy of the brilliant people at Hugging Face, we have a pipeline for Flux Kontext - but we need the cutting-edge version of diffusers:
!pip install git+https://github.com/huggingface/diffusers.git
We define the pipeline:
from diffusers import FluxKontextPipeline

pipe = FluxKontextPipeline.from_pretrained(CFG.model, torch_dtype=CFG.dtype).to(CFG.device)
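The CFG object here is just my own little config holder from the notebook, not anything from the diffusers API. A minimal sketch of what it looks like (the values below are illustrative - in the notebook dtype is torch.bfloat16, and the model id is assumed to be BFL's Hugging Face repo):

```python
class CFG:
    """Plain config holder -- my own convention, check the notebook for exact values."""
    model = "black-forest-labs/FLUX.1-Kontext-dev"  # assumed HF repo id
    dtype = "bfloat16"  # torch.bfloat16 in the actual notebook
    device = "cuda"
    seed = 42
```

Keeping all the knobs in one place like this makes it trivial to swap the model, dtype, or seed without hunting through the notebook.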
And in a pattern familiar to anyone who’s ever built anything with Gradio, we wrap the main functionality in a single function:
import torch
import gradio as gr

def infer(input_image, prompt, guidance_scale=2.5, steps=28, progress=gr.Progress(track_tqdm=True)):
    input_image = input_image.convert("RGB")
    image = pipe(
        image=input_image, prompt=prompt, guidance_scale=guidance_scale,
        width=input_image.size[0], height=input_image.size[1],
        num_inference_steps=steps, generator=torch.Generator().manual_seed(CFG.seed),
    ).images[0]
    return image, gr.Button(visible=True)
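One small caveat: diffusion pipelines typically want width and height divisible by a fixed factor, and the function above forwards input_image.size verbatim. If you ever hit a shape error on an oddly sized photo, a tiny helper like this snaps the dimensions down (the multiple of 16 is my assumption - check the pipeline's vae_scale_factor for the real constraint):

```python
def snap_to_multiple(width: int, height: int, multiple: int = 16) -> tuple[int, int]:
    """Round dimensions down to the nearest multiple, never below one tile."""
    snap = lambda v: max(multiple, (v // multiple) * multiple)
    return snap(width), snap(height)
```

Then pass the snapped values as width and height to the pipe call - the output will differ from the input by at most multiple - 1 pixels per side.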
And that’s it! We are now ready to test drive the latest offering from BFL.
The nice thing is that the origin of the photo we edit does not matter - it can be a real photo or something generated by a text-to-image model - so I started with a somewhat artistic portrait I had lying around:
change the head cover to a black hat


I rather like the result. What about some home design variations?
make the chairs red and the table blue


Works like a charm, although one can't help noticing the details in the background getting somewhat sharper (most notably the clock on the top shelf).
Let’s try something more complicated: we start with a picture of a cute robot, who looks like Wall-E after a week in Eastern Europe. Maybe getting some more company could cheer him up?
there are two of those robots, they are raising their arms


Ok, it looks like while one of them is happy, the one on the right is ending it all - my bad, I should've made the prompt more specific. The important thing is different, though: despite its simplicity, the prompt requires inpainting a second object (cloning it), modifying both, and changing the dynamics of the scene - and it worked, although the background again came out sharper and more focused (which I did not ask for).
Nice and easy, isn't it? I don't know about you, but I rather like our timeline - one where you write a few lines of code and get to do things that used to be possible only in Photoshop (with a triple-digit price tag).