Visual Storytelling Using SDXL, LoRA, and Gradio

Exploring Dali- and Bosch-inspired storytelling with SDXL + Gradio

Our project is a tool that helps architects turn stories into surreal image sequences. We explored styles inspired by Dali and Bosch to create dreamlike visuals. Although we trained a custom LoRA model, the final version relies on prompt-based image generation. Our Gradio app lets users enter a short text and get a video showing a visual journey, offering a creative way to present projects.

Use Case

Storytelling Tool for Architectural Presentations

User: Architects & designers

Need: Architects need to tell compelling project stories in early design or competition stages.

Problem Solved: Manually composing narratives from renders is time-consuming and fragmented.

Frequency of Use: During client presentations, design reviews, competitions, or stakeholder pitches.

SDXL LoRA + User Input: Story Text + Style

30 images as a dataset

From Mycelium to Society: A Surreal Narrative in LoRA Training

“Mycelium awakens, evolving into a human-like creature. It walks, spreading spores, forming a society shaped by surreal architecture”

Gradio tool uses an LLM to turn stories into animated zoom sequences

Code insight – How it works

1. Narrative to visual prompt

2. Prepare the base frame

3. Simulate zoom-out

4. Extend scene with inpainting

5. Composite the new outpainted image

6. Generate interpolation frames

7. Repeat for multiple steps

8. Compile the frames into a video
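The geometry behind the zoom-out, inpainting, and interpolation steps can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's code: it assumes a square canvas and a fixed pixel border around the shrunken previous frame, and the function names (`paste_box`, `outpaint_mask`, `interpolation_crops`) are hypothetical. The linear growth of the crop boxes is also a simplification chosen for clarity.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def paste_box(size: int, border: int) -> Box:
    """Where the shrunken previous frame sits on the new square canvas."""
    if not 0 < 2 * border < size:
        raise ValueError("border must leave a non-empty centre")
    return (border, border, size - border, size - border)


def outpaint_mask(size: int, border: int) -> List[List[int]]:
    """255 where the model should paint new content, 0 where the old frame stays."""
    left, top, right, bottom = paste_box(size, border)
    return [
        [0 if (left <= x < right and top <= y < bottom) else 255 for x in range(size)]
        for y in range(size)
    ]


def interpolation_crops(size: int, border: int, n_frames: int) -> List[Box]:
    """Crop boxes growing linearly from the old frame's area to the full canvas.

    Scaling each crop back up to size x size would give the in-between frames.
    """
    crops = []
    for i in range(1, n_frames + 1):
        inset = round(border * (1 - i / n_frames))
        crops.append((inset, inset, size - inset, size - inset))
    return crops
```

In a real pipeline, the mask and the canvas holding the shrunken frame would be passed to an inpainting model, and the crops would be resized with an image library before being written to the video.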

Beyond the Frame Gradio Interface

Output

Users Can View a Frame Carousel Alongside the Full Prompt Sequence

Inpainting Model Comparison

Prompt: a castle in a jungle, a jungle in a cave, a cave in a cathedral

Model ID: stabilityai/stable-diffusion-2-inpainting

Outpaint Steps: 20

Zoom Strength: 128

Frames per Step: 30

Seed: 0

Prompt: a castle in a jungle, a jungle in a cave, a cave in a cathedral

Model ID: parlance/dreamlike-diffusion-1.0-inpainting

Outpaint Steps: 20

Zoom Strength: 128

Frames per Step: 30

Seed: 0

Zoom Factor Comparison

Prompt: a castle in a jungle, a jungle in a cave, a cave in a cathedral

Model ID: stabilityai/stable-diffusion-2-inpainting

Outpaint Steps: 20

Zoom Strength: 128

Frames per Step: 30

Seed: 0

Prompt: a castle in a jungle, a jungle in a cave, a cave in a cathedral

Model ID: stabilityai/stable-diffusion-2-inpainting

Outpaint Steps: 20

Zoom Strength: 64

Frames per Step: 30

Seed: 0
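To get a feel for what the two zoom settings compared here might mean, the sketch below computes how much the field of view widens per step. It rests on two assumptions that the source does not confirm: that Zoom Strength is the width in pixels of the border added on each side, and that the canvas is 512 pixels square. The function names are hypothetical.

```python
def zoom_per_step(size: int, border: int) -> float:
    """Factor by which the field of view widens in one zoom-out step.

    Assumes the previous frame is shrunk to (size - 2 * border) pixels
    and centred on a size x size canvas.
    """
    inner = size - 2 * border
    if inner <= 0:
        raise ValueError("border too large for this canvas")
    return size / inner


def cumulative_zoom(size: int, border: int, steps: int) -> float:
    """Total widening after a number of identical zoom-out steps."""
    return zoom_per_step(size, border) ** steps
```

Under these assumptions, a value of 128 doubles the field of view at every step, while 64 widens it by about a third, so the lower setting would give a slower zoom with more overlap between consecutive frames.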

Prompt input

The primary input mode is a list of individual prompts. An automated system appends a style suffix to each prompt to ensure visual and stylistic continuity.

Prompt: a beautiful room with a window, a corridor with dark boiserie, a window carved in stone

Model ID: parlance/dreamlike-diffusion-1.0-inpainting

Outpaint Steps: 35

Zoom Strength: 128

Frames per Step: 30

Seed: 170727

Prompt generated

a beautiful room with a window — Hieronymus Bosch and Salvador Dali surreal style

a corridor with dark boiserie — Hieronymus Bosch and Salvador Dali surreal style

a window carved in stone — Hieronymus Bosch and Salvador Dali surreal style
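The suffixing shown in this example can be reproduced with a short helper. This is a sketch under assumptions: it supposes the prompt list arrives as one comma-separated string, as in the example, and the names `split_prompts` and `add_style` are hypothetical rather than taken from the project's code.

```python
from typing import List

STYLE_SUFFIX = "Hieronymus Bosch and Salvador Dali surreal style"
SEPARATOR = " \u2014 "  # the dash separator seen in the generated prompts


def split_prompts(raw: str) -> List[str]:
    """Split a comma-separated prompt string into individual prompts (assumed format)."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def add_style(prompts: List[str], suffix: str = STYLE_SUFFIX) -> List[str]:
    """Append the same style suffix to every prompt for visual continuity."""
    return [f"{prompt}{SEPARATOR}{suffix}" for prompt in prompts]


styled = add_style(split_prompts(
    "a beautiful room with a window, a corridor with dark boiserie, "
    "a window carved in stone"
))
```

Because every prompt ends with the same style phrase, each outpainting step is pulled toward the same look even when the scene content changes.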

Prompt + image input

In this mode, generation begins from an initial image and a prompt. The prompt guides the style and content of the first frame, which then evolves through successive zoom-out steps.

Prompt: a floating castle, a sea full of boat and mythological creatures, a land made of ice and mysterious rock formation

Model ID: stabilityai/stable-diffusion-2-inpainting

Outpaint Steps: 20

Zoom Strength: 128

Frames per Step: 30

Seed: 0

The same prompts were also tested without the initial image input.

Prompt: a floating castle, a sea full of boat and mythological creatures, a land made of ice and mysterious rock formation

Model ID: stabilityai/stable-diffusion-2-inpainting

Outpaint Steps: 20

Zoom Strength: 128

Frames per Step: 3

Seed: 0

Inside the LLM’s Process

This function uses a large language model (LLM) to turn a short text description into a sequence of visual prompts that gradually zoom out, which suits frame-by-frame animations and infinite-zoom effects. It sends the user’s input to OpenAI’s GPT model with a special instruction: break the description into a fixed number of short, Stable Diffusion-style prompts, each showing a slightly wider view than the last while keeping the subject, style, and lighting consistent. The result is a coherent series of prompts that can be used to generate a zooming visual narrative. If the LLM call fails, the function falls back to a default sequence so the pipeline keeps running.
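The behaviour described above, including the fallback, might look roughly like the sketch below. It is not the project's function: the instruction wording, the `default_sequence` fallback, and all names are hypothetical, and the LLM call is passed in as a plain callable (`ask_llm`) so the example runs without any API client or network access.

```python
from typing import Callable, List


def build_instruction(description: str, n_prompts: int) -> str:
    """Compose the instruction sent to the LLM (wording is illustrative only)."""
    return (
        f"Break the following description into {n_prompts} short, "
        "Stable Diffusion-style prompts, one per line, no numbering. "
        "Each prompt shows a slightly wider view than the last while keeping "
        f"subject, style, and lighting consistent.\n\nDescription: {description}"
    )


def default_sequence(description: str, n_prompts: int) -> List[str]:
    """Hypothetical fallback used when the LLM is unavailable."""
    return [
        f"{description}, zoomed-out view {i + 1} of {n_prompts}"
        for i in range(n_prompts)
    ]


def story_to_prompts(
    description: str, n_prompts: int, ask_llm: Callable[[str], str]
) -> List[str]:
    """Return n_prompts visual prompts, falling back if the LLM call fails."""
    try:
        reply = ask_llm(build_instruction(description, n_prompts))
        prompts = [line.strip() for line in reply.splitlines() if line.strip()]
        if len(prompts) < n_prompts:
            raise ValueError("LLM returned too few prompts")
        return prompts[:n_prompts]
    except Exception:
        # Any failure (network, quota, malformed reply) keeps the pipeline running.
        return default_sequence(description, n_prompts)
```

In the app, `ask_llm` would wrap the actual request to the GPT model. Keeping it as a parameter makes the fallback path easy to exercise with a stub that raises an error.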

Prompt: A men in Venice

Model ID: parlance/dreamlike-diffusion-1.0-inpainting

Outpaint Steps: 35

Zoom Strength: 128

Frames per Step: 3

Seed: 0

Prompt generated

Mysterious man in a gondola on Venice canal — Hieronymus Bosch and Salvador Dali surreal style

Silhouetted figure against Venetian sunset — Hieronymus Bosch and Salvador Dali surreal style

Masked man wandering through narrow Venetian streets — Hieronymus Bosch and Salvador Dali surreal style

Glimpse of a man disappearing into Venetian mist — Hieronymus Bosch and Salvador Dali surreal style

Specter of a man haunting Venice’s grand square — Hieronymus Bosch and Salvador Dali surreal style

Story input with LLM

Providing a story triggers a call to the LLM, which expands it into a sequence of visual prompts. A style suffix is appended automatically to keep the style consistent across frames.

Prompt: A child stands at the edge of a quiet forest, staring into a floating mirror framed in ivy. As she steps through it, the forest unravels into a bioluminescent garden, where flowers pulse like jellyfish and trees stretch into the stars

Outpaint Steps: 20

Zoom Strength: 128

Frames per Step: 30

Seed: 0

Prompt generated

Child enters mirror, forest transforms into glowing garden — Hieronymus Bosch and Salvador Dali surreal style

Bioluminescent flowers pulse, trees reach for the stars — Hieronymus Bosch and Salvador Dali surreal style

Trained LoRA Model Didn’t Integrate with Zoom Tool—But Future Integration Remains Promising