Everybody and their dog has been talking about image generation for a while now, and it’s never been easier to jump on one of the gazillions of apps that let you do it by prompting. But entertaining as it is, there are two good reasons to DIY:
Cost control: you avoid per-image API fees and know exactly what costs you are incurring.
Model fine-tuning: you can adapt a model to specific styles, brands, or domains. This is often impossible with closed platforms like Gemini due to their alignment (read: censorship) policies, where even the most innocent prompts trigger a block.
I hope I’ve convinced you that it's worth learning to generate images with open-weights models. In the remainder of this piece, I will show you how.
New image generation models are popping up regularly; I decided to go for Flux, which strikes a good balance between quality and compute requirements. Flux is a family of text-to-image generation models developed by Black Forest Labs - an outfit founded by the former Stability AI researchers who created Stable Diffusion (the previous king of the hill). Flux is available in several versions: a high-performance "Pro" model, an open-weight "dev" model for non-commercial use, and a fast "Schnell" version optimized for speed. Project website: https://flux-ai.io/
If you want to understand how the diffusion models do what they do, check out the “intro” post at the bottom of this one. Full notebook for those interested (including detailed explanations of the code):
https://github.com/tng-konrad/united_states_of_banan/blob/main/diffusion_image_generation.ipynb
The entire notebook should work elsewhere, but Colab is where I tested it - no matter your preferences, chances are that in 2025 you have a Google account and can access Colab. The first thing is to install the necessary libraries:
!pip install -qU gradio transformers diffusers accelerate safetensors
We're grabbing everything we need from the wonderful world of Hugging Face: diffusers to handle the heavy lifting of image generation, transformers to understand our text prompts, and a few other helpers to make everything run smoothly. Some of the libraries come preinstalled on Colab, but the field is changing rapidly and you frequently need the most recent version of a library - and there's no guarantee the underlying image is fresh enough.
Import what we need:
import torch
from diffusers import FluxPipeline
from random import sample
import os
import itertools
from IPython.display import Image
The main thing here is FluxPipeline from diffusers: courtesy of the creators of this wonderful package, most of the details are abstracted away and all we need to care about are parameters.
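One housekeeping note: the snippets below read their settings from a small CFG object that is defined in the full notebook but not shown here. A minimal sketch of what it could look like - the model id, dtype, and counts are my assumptions based on how CFG is used later, not the notebook's actual values:

import torch

class CFG:
    model = "black-forest-labs/FLUX.1-schnell"  # assumed open-weights checkpoint; the "dev" variant also works
    dtype = torch.bfloat16                      # half precision keeps VRAM usage manageable
    infsteps = 8                                # denoising steps per generated image
    howmany = 2                                 # images per prompt in the batch loop further down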
pipe = FluxPipeline.from_pretrained(CFG.model, torch_dtype = CFG.dtype)
pipe.enable_model_cpu_offload()
What’s going on here? The from_pretrained function is doing the heavy lifting of downloading the model for us. enable_model_cpu_offload() is a neat trick to save GPU memory by shuffling parts of the model to the CPU when they're not being used.
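If the model still doesn't fit in your GPU memory, diffusers offers a more aggressive (and slower) alternative - an option the library provides, not something used in the notebook:

# offloads the model layer by layer instead of component by component:
# much lower VRAM usage, at the cost of noticeably slower generation
pipe.enable_sequential_cpu_offload()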
So that takes care of the model side of things - but this is a text-to-image model, which is where the prompt comes in: here we describe the image we want to create:
prompt = """
A statuesque beautiful woman sitting on a dark yellow platform, wearing long blue dress and barefoot in bright room, side view, full body shot, black hair, white walls, sunlight from window, soft shadows, watercolour and alcohol ink paint art abstract
"""
You can go elaborate or simplistic - the only real limitation is the token limit, which for Flux is 512 tokens. We can now call the pipe, with two non-obvious arguments:
guidance_scale: the best way to think about it is as a creativity knob. A higher value forces the model to stick very closely to your prompt, while a lower value gives it more creative freedom to interpret the text.
num_inference_steps: the number of steps the model takes to denoise the image from random static into your final picture. More steps lead to a more detailed, higher-quality image, but take longer to generate; fewer steps are faster but might result in lower quality. A value between 8 and 20 is often a good sweet spot for Flux.
out = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    height=768,
    width=1360,
    num_inference_steps=CFG.infsteps,
).images[0]
out.save("image.png")
Open the image and voila:
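Since we imported Image from IPython.display earlier, we can view the result right inside the notebook:

Image("image.png")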
That’s fun, but can we do more? We might want to create multiple images in one go, testing different options for the characters shown therein. The fastest way to do it is to generate multiple prompts programmatically: first we set up lists of characteristics / dimensions along which we will vary - gender, origin, age, profession, and image style.
gender_list = ['woman', 'man']
origin_list = [ 'North European', 'Middle Eastern', 'South East Asian' ]
age_list = ['young', 'middle aged', 'elderly']
profession_list = [ 'doctor', 'athlete', 'singer']
style_list = ['realistic photograph', 'Rembrandt painting', 'minimalist graphic']
Then we combine the lists into a Cartesian product and map it into a list:
totality = [origin_list, age_list, gender_list, profession_list, style_list]
combo_list = list(itertools.product(*totality))
Add a few extra words and we’re good to go:
prompt_list = []
for (origin, age, gender, profession, style) in combo_list:
    prom = f"Cinematic, full-body image of {origin} {age} {gender} {profession}, in the style of {style}"
    prompt_list.append(prom)
Which gives us prompts like these:
'Cinematic, full-body image of South East Asian middle aged man doctor, in the style of Rembrandt painting',
'Cinematic, full-body image of Middle Eastern middle aged woman singer, in the style of minimalist graphic',
'Cinematic, full-body image of North European young woman athlete, in the style of realistic photograph',
We can re-use the code from earlier and just loop over the prompts:
# a seeded generator for reproducibility - 'g' is used but not defined in the excerpt, so we create one here (seed is arbitrary)
g = torch.Generator("cpu").manual_seed(42)

for (ii, prompt) in enumerate(prompt_list):
    for jj in range(CFG.howmany):
        image = pipe(prompt=prompt, num_inference_steps=CFG.infsteps, generator=g).images[0]
        imgname = f"img_{ii}x{jj}.jpg"
        image.save(imgname)  # save() returns None, so don't reassign it to image
    print(prompt)
A few selected examples of the results:
As you can see, the diffusers library makes it very straightforward to get started with a powerful model like Flux. In just a few lines of Python, we've gone from installing libraries to generating a high-quality image, and even creating a whole batch of diverse portraits by programmatically combining prompts.
As usual, I encourage you to take this code and make it your own. Try different prompts, mess with the guidance_scale and num_inference_steps to see how they affect the output, or even swap in a different open-weights model from Hugging Face.
If you're interested in learning more, here is a theoretical intro (absolutely minimal jargon, pinky swear):
Diffusion models: the intro
In this post I will explain the core ideas powering image generation models - in a rigorous manner, but with the least amount of jargon. In the good tradition of academic books (once a math guy, always a math guy), we start with the theory.
And a reading list: