Why “Faster” Matters

A single photorealistic render can easily become very expensive and take days to produce, locking design changes behind hefty bills and long waits for architects. Generative AI, through models like Stable Diffusion and custom LoRAs, slashes both cost and time, turning sketches into near-final visuals in seconds. By democratizing high-fidelity imagery, it empowers every team member to iterate rapidly and collaborate more creatively.

Introduction & Context

Pain Points in Architectural Visualization

Traditional pipelines demand specialized software and skilled operators, driving up the cost of each image and stretching delivery times. Iterations are cumbersome: each revision means reconfiguring lighting, materials, and camera setups, locking out non-experts and slowing design dialogues.

Primer on Generative AI in Creative Fields

Generative AI uses deep learning models to synthesize or transform images based on text prompts or reference inputs. Closed platforms like Midjourney and DALL·E offer polished, user-friendly interfaces but can be restrictive and limited in customization. Open-source engines such as Stable Diffusion enable full control over model weights, fine-tuning (e.g. training custom Low-Rank Adaptations, LoRAs), and local deployment, making them ideal for integration into bespoke studio workflows.

Over this term, the project explored AI image generation, first testing SDXL by Stability AI and then moving to Flux.1-Dev by Black Forest Labs, all through experimentation with ComfyUI and Google Colab workflows.

ComfyUI preview for Image to Image with LoRA workflow

Core Techniques in Action

Text-to-Image (using Flux.1-Dev)

Text-to-Image lets you go from a simple prompt, “ring-shaped house perched on a forested mountain island”, to a high-fidelity concept render in seconds. Just write the description, optionally apply a LoRA, and the AI interprets your vision with realistic lighting, materials, and mood, with no other inputs required.
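Outside ComfyUI, the same step can be scripted in a few lines with Hugging Face diffusers. The sketch below is an assumed minimal setup, not our exact ComfyUI graph; the step count and guidance value are illustrative:

```python
# Minimal text-to-image sketch with Hugging Face diffusers (an assumed setup,
# not our exact ComfyUI graph). Needs a GPU and access to the gated
# black-forest-labs/FLUX.1-dev weights on the Hugging Face Hub.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps the ~12B-parameter model fit on one GPU

prompt = (
    "ring-shaped house perched on a forested mountain island, "
    "photorealistic, soft evening light"
)
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,          # Flux.1-Dev works with low distilled guidance
    num_inference_steps=28,      # illustrative value
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("concept.png")
```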

Image-to-Image (using Flux.1-Dev)

With Image-to-Image, you transform a rough sketch, model screenshot, or photo into a polished visualization in seconds. By leveraging Canny-edge or depth-map conditioning, tuning the detection thresholds, and applying your chosen LoRA, you retain your original composition while instantly boosting detail, lighting, and style.
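In code, one way to reproduce the Canny branch is a Flux ControlNet pipeline in diffusers. The checkpoint names, thresholds, and conditioning scale below are assumptions rather than our exact ComfyUI node settings:

```python
# Sketch of Canny-conditioned Image-to-Image with diffusers (assumed checkpoint
# names; our actual experiments used ComfyUI nodes). The cv2 thresholds are the
# knobs we tune to decide how much of the original drawing is preserved.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import FluxControlNetModel, FluxControlNetPipeline

# Edge map from a sketch, model screenshot, or 3D view.
src = cv2.imread("sketch.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(src, threshold1=100, threshold2=200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = FluxControlNetModel.from_pretrained(
    "InstantX/FLUX.1-dev-Controlnet-Canny", torch_dtype=torch.bfloat16
)
pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", controlnet=controlnet, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

image = pipe(
    "minimal concrete villa at dusk, warm interior lighting, photorealistic",
    control_image=control_image,
    controlnet_conditioning_scale=0.6,  # how strongly the edges constrain the output
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("render.png")
```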

Image gallery: AI Image · 3D model (Rhino) · Hand drawing · (AI) Image to Depth · 3D to Canny · Depth on hand drawing · Depth to Image (+ inpainting house) · Depth to Image (+ inpainting origami) · Canny to Image · Depth to Image

Fine-tuning LoRAs

After experimenting with the basics of AI image generation, we moved on to fine-tuning our own custom LoRAs. We first gathered a dataset of 60 images from leading architectural visualization offices and wrote a caption for each. A first LoRA trained with DreamBooth was unsuccessful. Going back through the literature and published papers helped us realize that far fewer images are required. We therefore trained two LoRAs: one with 5 images, the Sunset Golden Hour, and one with 9 images, the Sunset Contrast.
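For illustration, a tiny style dataset can be laid out as image/caption pairs that share a trigger word. The folder, file names, trigger word, and captions below are placeholders, not our training data:

```python
# Illustrative dataset prep for a tiny style LoRA: one .txt caption per image,
# all sharing a trigger word. Paths, trigger word, and captions are
# placeholders, not our actual training data.
from pathlib import Path

TRIGGER = "sunsetgoldenhour style"  # hypothetical trigger word
dataset = Path("lora_dataset/sunset_golden_hour")
dataset.mkdir(parents=True, exist_ok=True)

captions = {
    "img_01.jpg": "exterior view of a timber pavilion at golden hour, long shadows",
    "img_02.jpg": "courtyard house at sunset, warm haze, backlit trees",
    # ...three to seven more carefully chosen, well-lit reference images
}

for filename, caption in captions.items():
    # Most LoRA trainers (kohya_ss, ai-toolkit, the diffusers DreamBooth scripts)
    # accept image/caption pairs in this layout; check your trainer's exact format.
    (dataset / Path(filename).with_suffix(".txt")).write_text(
        f"{TRIGGER}, {caption}\n", encoding="utf-8"
    )
```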

Sunset Contrast LoRA at scales 0 (without LoRA), 0.2, 0.4, 0.6, 0.8, and 1
Sunset Golden Hour LoRA at scales 0 (without LoRA), 0.2, 0.4, 0.6, 0.8, and 1
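These scale strips can be reproduced by loading a LoRA into the pipeline and sweeping its adapter weight; the file name and trigger word below are placeholders:

```python
# Sketch of the LoRA strength sweep (assumed file name and trigger word).
# The adapter weight interpolates between the base Flux.1-Dev look (0) and the
# full fine-tuned style (1), matching the 0-1 comparison strips above.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.load_lora_weights("sunset_contrast_lora.safetensors", adapter_name="sunset_contrast")

prompt = "sunsetcontrast style, lakeside concrete house, dramatic evening sky"
for scale in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    pipe.set_adapters(["sunset_contrast"], adapter_weights=[scale])
    image = pipe(
        prompt,
        num_inference_steps=28,
        guidance_scale=3.5,
        generator=torch.Generator("cpu").manual_seed(0),  # same seed for a fair comparison
    ).images[0]
    image.save(f"sunset_contrast_scale_{scale:.1f}.png")
```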
Before / after Sunset Golden Hour LoRA
Image to image with only keywords and basic Flux.1-Dev, without any LoRA.
Image to image with Flux.1-Dev and our custom Golden Hour LoRA.
Image to image with depth (seen above)
Image to image & fine-tuned Sunset Contrast LoRA

From an image generated with one LoRA to an image with another LoRA

Text to Image & fine-tuned Sunset Golden Hour LoRA
Image to Image & fine-tuned Sunset Contrast LoRA
Sunset Golden Hour LoRA to Sunset Contrast LoRA

Bringing It Into the Studio

Gradio Deployment

With tight competition deadlines, every minute counts. To accelerate ideation and rapidly explore design directions, we wrapped everything into a Gradio app where you can:

  • Paste a brief for a Large Language Model (LLM) to generate five starter prompts.
  • Select or type keywords to put more weight on your architectural priorities.
  • Turn drawings into renders: upload a sketch, model screenshot, or 3D view, and adjust the Canny / Depth thresholds to bring your concept to life.
  • Select a LoRA (or upload your own) by trigger word to apply custom styles.
  • Browse a live gallery of all your results and download PNGs.

All in under a minute, no coding skills required, yet fully tunable for power users.
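For illustration, a stripped-down version of such a front end can be wired up with Gradio roughly as follows; the callbacks are placeholders standing in for our Colab backend:

```python
# Minimal Gradio front-end sketch (placeholder callbacks, not our production
# Colab backend). The tab names mirror the app shown below.
import gradio as gr

def brief_to_prompts(brief: str) -> str:
    # Placeholder: in the real app an LLM turns the brief into five starter prompts.
    return "\n".join(f"{i + 1}. {brief} (prompt variant)" for i in range(5))

def drawing_to_render(sketch, prompt, canny_low, canny_high, lora_scale):
    # Placeholder: the real backend runs Canny/Depth preprocessing + Flux.1-Dev + LoRA.
    return sketch

with gr.Blocks(title="Studio Render Assistant") as demo:
    with gr.Tab("Concept Generator"):
        brief = gr.Textbox(label="Paste your competition brief")
        prompts = gr.Textbox(label="Five starter prompts", lines=5)
        gr.Button("Generate prompts").click(brief_to_prompts, brief, prompts)
    with gr.Tab("Moodboard Generator"):
        sketch = gr.Image(label="Sketch / model screenshot / 3D view", type="pil")
        prompt = gr.Textbox(label="Prompt + LoRA trigger word")
        canny_low = gr.Slider(0, 1, value=0.3, label="Canny low threshold")
        canny_high = gr.Slider(0, 1, value=0.7, label="Canny high threshold")
        lora_scale = gr.Slider(0, 1, value=0.8, label="LoRA strength")
        result = gr.Image(label="Render")
        gr.Button("Render").click(
            drawing_to_render,
            [sketch, prompt, canny_low, canny_high, lora_scale],
            result,
        )
    with gr.Tab("Gallery of Work"):
        gr.Gallery(label="Results")

demo.launch()
```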

Frontend: Gradio UI

Backend: Google Colab script

Tab 1: Concept Generator
Tab 2: Moodboard Generator
Tab 3: Gallery of Work

Workflow Gains

Iteration Speed: What once took hours or days on a render farm now completes in seconds, letting teams test multiple design variants in a single meeting.

Democratization: Junior designers, interns, and clients can generate and tweak visuals themselves, no specialized 3D or rendering expertise required.

Reflection on Process

We found that a tightly curated dataset of just 5–9 high-quality images outperforms larger, noisier collections: proof that “less is more” when training Flux LoRAs. Equally vital was dialing in the Canny and Depth thresholds (even a 0.1 shift dramatically altered detail retention), refined through three rounds of fine-tuning feedback to lock in each model’s unique aesthetic.
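To see why such small shifts matter, a quick sweep over the thresholds on the same drawing makes the effect visible; the input file below is a placeholder, and the 0–1 values are rescaled to OpenCV's 0–255 range:

```python
# Quick sweep over normalized Canny thresholds (assumed input file). ComfyUI-style
# 0-1 thresholds are rescaled to OpenCV's 0-255 range for comparison.
import cv2

src = cv2.imread("hand_drawing.png", cv2.IMREAD_GRAYSCALE)

for low, high in [(0.2, 0.6), (0.3, 0.7), (0.4, 0.8)]:  # 0.1 shifts
    edges = cv2.Canny(src, int(low * 255), int(high * 255))
    cv2.imwrite(f"edges_low{low}_high{high}.png", edges)
    # Higher thresholds drop fine hatching; lower ones keep noise that the
    # diffusion model will then faithfully "render" as geometry.
```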

Our pipeline still faces edge-artifact challenges around intricate geometry, a hard 1024×1024 px resolution cap (or 512×512 px when speed is paramount), and no built-in multi-view consistency for fully coherent 3D scenes. To bridge these gaps, we’re exploring direct integration with architectural software, building a shared repository of curated datasets for cross-studio collaboration, and running user studies to polish the interface and validate real-world design benefits.

On the economic side, replacing a €2,000 render with near-zero variable cost shifts the conversation from gate-kept visualization to architect-empowered ideation. While this raises complex questions about future skill sets and studio roles, it ultimately returns budget and creative control to architects, letting them invest in the ideas that matter most.