ISOGEN — Isometric Building Generation with AI

By Joaquín Broquedis & Marco Durand
Final project for Generative AI

Introduction

This project explores the use of generative AI to produce isometric visualizations of architectural typologies. We developed ISOGEN, a small-scale prototype aimed at testing whether a custom-trained image-to-image LoRA (Low-Rank Adaptation) model could learn to generate building designs from simplified voxel massing inputs and descriptive prompts.

The prototype consists of two main components:

  • A dataset creation and model training pipeline.
  • An interactive frontend application that integrates the model inference process via ComfyUI.

Concept: Satellite Images as a Design Dataset

We started by collecting 45° satellite images from urban areas, since they offer clear isometric views of buildings with minimal distortion. The focus was not on location or context, but rather on the typological characteristics of the buildings.

After selecting relevant images, we performed background removal to isolate the architectural forms. These cropped images served as the core dataset for training.

Typologies included in the dataset:

  • U-shaped blocks
  • L-shaped forms
  • Courtyard buildings
  • Blocks

Each image was tagged with descriptive prompts related to form, size, and structural features, such as “L-shaped, mid-rise, residential buildings, slanted roofs, with balconies.”

Dataset: From Pixels to Prompts

The dataset was structured with a combination of:

  • Cropped PNG images (uniform size and angle)
  • Associated textual prompts for each image
  • Manual metadata including typology category, height estimation, and roof types (where visible)

A total of 42 images were included in the training set. This was enough to fine-tune a LoRA model for testing, but we acknowledge that the dataset is too small for broader generalization.
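As a rough illustration of how the metadata and prompts described above could be paired with each image, the sketch below builds a caption from a few metadata fields and can write it to a text file next to the PNG. The `BuildingSample` class, its field names, the example path, and the sidecar `.txt` convention are assumptions for illustration, not the exact format used in ISOGEN.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class BuildingSample:
    """One dataset entry. All field names here are hypothetical."""
    image_path: Path
    typology: str                 # e.g. "L-shaped", "U-shaped", "courtyard", "block"
    height: str                   # manual estimate, e.g. "mid-rise"
    use: str                      # e.g. "residential buildings"
    roof: str = ""                # left empty when the roof is not visible
    features: list[str] = field(default_factory=list)

    def caption(self) -> str:
        # Join the non-empty fields into a tag-style descriptive prompt.
        parts = [self.typology, self.height, self.use, self.roof, *self.features]
        return ", ".join(p for p in parts if p)

    def write_caption(self) -> Path:
        # Store the caption beside the image, e.g. l_shape_01.png -> l_shape_01.txt
        out = self.image_path.with_suffix(".txt")
        out.write_text(self.caption(), encoding="utf-8")
        return out


sample = BuildingSample(
    image_path=Path("dataset/l_shape_01.png"),   # hypothetical path
    typology="L-shaped",
    height="mid-rise",
    use="residential buildings",
    roof="slanted roofs",
    features=["with balconies"],
)
print(sample.caption())
```

Keeping the metadata structured and generating the caption from it makes it easier to keep tags consistent across a small dataset, where a few inconsistent captions can have a visible effect on training.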

LoRA Training: Custom Typology Generator

We used Flux as the base model and trained a LoRA adapter on the 42 tagged images and their prompts.

Training details:

  • Model base: FLUX_fp8
  • Steps: 2,000
  • Prompting approach: Tag-based, descriptive sentences
  • Training tool: Custom Colab workflow

Our goal was to test whether the model could learn to apply stylistic features and architectural detailing onto base geometries provided later as simplified voxel renderings. The LoRA performed well within this scope, especially in reproducing geometric traits like stacked volumes or enclosed courtyards.
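For readers unfamiliar with LoRA, the short sketch below shows why an adapter is cheap to train on a small dataset: instead of updating a full weight matrix, it learns two low-rank factors whose product is added to the frozen weights. The layer size and rank used here are illustrative assumptions, not the values used for ISOGEN.

```python
def full_param_count(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning a full d_out x d_in weight matrix."""
    return d_out * d_in


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_out + d_in)


# Illustrative values only (assumed, not taken from the ISOGEN training run).
d_out, d_in, rank = 3072, 3072, 16

full = full_param_count(d_out, d_in)
lora = lora_param_count(d_out, d_in, rank)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
```

With these assumed numbers the adapter trains about 1% of the parameters of the full matrix for that layer.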

The App: ISOGEN Playground

To test the trained LoRA in a user-facing environment, we built a custom voxel editor in Three.js, embedded into a Gradio-based frontend.

Key features of the app:

  • Editable isometric grid
  • Voxel placing and deleting
  • Adjustable grid size
  • Color toggle and visualization modes (solid or wireframe)
  • Prompt input field
  • Export functions (image and 3D model)
  • AI generation button linked to the ComfyUI API

The app captures a screenshot of the voxel geometry, combines it with the user’s prompt, and sends both to ComfyUI where the LoRA inference runs.
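The hand-off to ComfyUI can be sketched roughly as follows. ComfyUI accepts a workflow in its API (JSON) format via a POST to `/prompt`; the sketch patches a workflow template with the prompt, the screenshot filename, and the generation parameters, then builds the request. The node IDs, the stub workflow, the function name, and the assumption that the screenshot has already been placed in ComfyUI's input folder are all hypothetical; a real exported workflow will have its own IDs and may keep the step count in a different node.

```python
import json
import urllib.request

COMFYUI_URL = "http://127.0.0.1:8188"  # ComfyUI's default local address; adjust as needed

# Stub template: NOT a complete graph, only the inputs this sketch patches.
# In practice this would be a workflow exported from ComfyUI in API format.
STUB_WORKFLOW = {
    "1": {"class_type": "LoadImage", "inputs": {"image": ""}},
    "2": {"class_type": "LoraLoader", "inputs": {"strength_model": 1.0}},
    "3": {"class_type": "CLIPTextEncode", "inputs": {"text": ""}},
    "4": {"class_type": "KSampler", "inputs": {"steps": 20}},
}


def build_generation_request(workflow, prompt_text, image_name,
                             lora_weight=1.0, steps=20, node_ids=None):
    """Return a POST request for ComfyUI's /prompt endpoint (not yet sent)."""
    # Hypothetical node IDs; they must match the exported workflow.
    ids = node_ids or {"image": "1", "lora": "2", "text": "3", "sampler": "4"}

    # Deep-copy via JSON so the template is left untouched between requests.
    wf = json.loads(json.dumps(workflow))
    wf[ids["image"]]["inputs"]["image"] = image_name          # voxel screenshot
    wf[ids["lora"]]["inputs"]["strength_model"] = lora_weight
    wf[ids["text"]]["inputs"]["text"] = prompt_text           # user's prompt
    wf[ids["sampler"]]["inputs"]["steps"] = steps

    body = json.dumps({"prompt": wf}).encode("utf-8")
    return urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_generation_request(
    STUB_WORKFLOW, "L-shaped, mid-rise, residential buildings", "voxels.png"
)
# To queue the job on a running ComfyUI server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The defaults (LoRA weight 1.0, 20 sampling steps) mirror the generation parameters reported in this write-up. Building the request separately from sending it keeps the patching logic easy to check without a running server.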

Output and Evaluation

After processing, the user receives an image generated by the LoRA, styled according to the descriptive prompt and aligned to the spatial logic of the voxel input.

We included a comparison slider for visual reference between the original voxel shape and the AI-enhanced version.

Parameters during generation:

  • LoRA weight: 1.0
  • Sampling steps: 20

Results were visually consistent with the original dataset typologies, although fidelity decreased for complex inputs or vague prompts.

Limitations and Next Steps

Possible next steps to address the current limitations include:

  • Adding prompt templates or auto-complete suggestions for more consistent results.
  • Using semantic segmentation or depth maps for improved geometry transfer, since Canny edge lines can be limiting.
  • Exploring ControlNet for voxel-aware generation.
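As one illustration of the depth-map idea, the voxel grid already contains the geometry needed to compute depth directly, without estimating it from a screenshot. The sketch below is a minimal depth buffer for a true isometric view along the (1, 1, 1) axis, with one sample per voxel rather than a rasterised image. The function name, the coordinate convention (y up), and the choice of projection are assumptions and would need to match the camera used in the voxel editor.

```python
def isometric_depth_buffer(voxels):
    """Map each screen cell (u, v) to the depth of the voxel nearest the camera.

    voxels: iterable of (x, y, z) integer cells, with y pointing up.
    The camera is assumed to look down the (1, 1, 1) diagonal.
    """
    buffer = {}
    for x, y, z in voxels:
        u = x - z              # horizontal screen coordinate
        v = x + z - 2 * y      # vertical screen coordinate (smaller = higher on screen)
        depth = x + y + z      # position along the view axis (larger = closer to camera)
        # Voxels sharing (u, v) lie on the same view ray; keep the closest one.
        if depth > buffer.get((u, v), float("-inf")):
            buffer[(u, v)] = depth
    return buffer


# A voxel at the origin is hidden behind the one at (1, 1, 1).
cells = [(0, 0, 0), (1, 1, 1), (1, 0, 0)]
depth_map = isometric_depth_buffer(cells)
```

Normalising these depth values to a grayscale image would give a conditioning input that encodes which volumes sit in front of others, information that edge lines alone do not carry.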