A Closed-Loop RL-GAN-ML Pipeline for Generative Design

Objective: Demonstrate an intelligent, iterative pipeline for the automated generation of stable Kapla plank stacks.
1. Introduction
The Python script (`loop.py`) presents a sophisticated framework for automating the design of stable stacks of Kapla planks, a wooden construction toy consisting of rectangular blocks with fixed dimensions (120 mm length, 24 mm width, 8 mm height). The system optimizes stacks for either maximum height ("tallest" mode) or horizontal extension ("cantilever" mode), ensuring physical stability through constraints such as collision avoidance, center of mass (COM) alignment, contact area sufficiency, and torque balance. This is achieved through a closed-loop system integrating multiple intelligent components:
- Reinforcement Learning (RL) via Q-learning: A Q-learning agent learns optimal plank placement strategies by trial and error.
- Generative Adversarial Network (GAN): A simplified GAN proposes diverse stack parameters (e.g., number of layers, offsets).
- Meta-Learning: A meta-optimizer adaptively tunes RL parameters and stability thresholds based on performance.
- Visualization and Export: Interactive 3D visualization and export capabilities for `.3dm`, `.csv`, and `.png` formats.
This chapter provides a comprehensive analysis of the script, focusing on the role of the GAN, the interplay of intelligent components, and the dynamics of the closed-loop system. It is organized as follows:
- 2. Overview and Objectives: High-level description of the script’s purpose.
- 3. Key Concepts and Terminologies: Explanation of RL, GAN, meta-learning, and stability constraints.
- 4. Code Structure and Dependencies: Organization and required libraries.
- 5. Detailed Component Analysis: In-depth breakdown of classes, methods, and functions.
- 6. Role of the GAN: Specific contributions and integration in the closed-loop system.
- 7. Closed-Loop System Dynamics: Interactions among intelligent components.
- 8. Mathematical Formulations: Equations governing Q-learning, GAN, and stability checks.
- 9. Examples and Illustrations: Practical illustrations for clarity.
- 10. Conclusion: Summary and potential extensions.

2. Overview and Objectives
The script automates the generation of stable Kapla plank stacks, addressing the challenge of constructing physically viable structures under user-defined constraints (e.g., number of planks, mode). It ensures stability by enforcing physical rules and employs intelligent components to explore and optimize stack configurations. Key objectives include:
- Stability: Ensure stacks are physically stable, avoiding collisions and maintaining COM and torque balance.
- Optimization: Maximize height (tallest mode) or cantilever extension (cantilever mode) while using all specified planks.
- Diversity: Generate varied stack configurations for user selection.
- Interactivity: Provide visualization and export options for practical use.
The system operates as a closed-loop framework, where feedback from stability checks and placement outcomes informs parameter adjustments, iteratively improving performance. The GAN initializes diverse configurations, the Q-learning agent places planks, the meta-optimizer tunes parameters, and visualization allows user interaction, forming a robust design pipeline.
Analogy: The system resembles a self-improving architectural team. The GAN proposes blueprints, the Q-learning agent builds structures, the meta-optimizer refines strategies based on past failures, and stability checks ensure structural integrity. The user, as the client, reviews and selects designs.
3. Key Concepts and Terminologies
3.1 Reinforcement Learning (RL)
RL is a machine learning paradigm in which an agent learns to make decisions by interacting with an environment. The agent observes the state (\( s \)), takes an action (\( a \)), receives a reward (\( r \)), and transitions to a new state (\( s' \)), updating its policy (\( \pi \)) to maximize cumulative rewards.
- State: A representation of the environment (e.g., stack height, COM).
- Action: A decision (e.g., placing a plank at an offset).
- Reward: Feedback on action quality (e.g., +10 for stable placement, -100 for unstable).
- Policy: A mapping from states to actions, learned via Q-learning.
- Q-value (\( Q(s, a) \)): Expected cumulative reward for taking action \( a \) in state \( s \).
- Example: A robot stacking blocks learns to place them stably (high reward) versus causing collapses (low reward).
3.2 Q-Learning
Q-learning is an off-policy RL algorithm that learns the optimal action-value function \( Q(s, a) \).
- Example: In a maze, a Q-learning agent learns paths to the exit by updating Q-values based on rewards for each move.
3.3 Generative Adversarial Network (GAN)
A GAN typically comprises a generator producing data and a discriminator evaluating authenticity. In this script, `SimpleGAN` is a simplified generator, transforming noise into stack parameters.
- Example: A GAN generating cat images learns to produce realistic images (generator) while distinguishing them from real photos (discriminator). Here, it proposes stack configurations.
3.4 Meta-Learning
Meta-learning optimizes the learning process. The `MetaOptimizer` adjusts RL parameters (e.g., \( \epsilon \)) and stability thresholds based on performance.
- Example: A meta-learner tuning a chess RL agent increases exploration if it repeatedly loses due to predictable moves.
3.5 Stability Constraints
Physical stability is enforced via:
- Collision Check: No plank overlap in the same layer.
- Center of Mass (COM): COM within the base’s support polygon.
- Contact Area: Sufficient overlap with lower planks.
- Torque: Rotational forces below a threshold.
- Example: Stacking books requires aligning COM over the table and ensuring enough contact to prevent slipping.

4. Code Structure and Dependencies
4.1 Dependencies
- rhino3dm: For exporting 3D models in Rhino (.3dm) format.
- numpy: For numerical operations such as rotations and center-of-mass calculations.
- csv: For logging and exporting plank or stack data to CSV files.
- matplotlib: For 3D visualization of the generated stacks.
- torch: To build and train the Generative Adversarial Network (GAN).
- logging, os, datetime, uuid, random: For debugging, file management, timestamps, unique IDs, and random number generation.
- dataclasses, typing, collections: For organizing structured data, providing type hints, and working with grouped data structures.
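
A minimal import header consistent with this dependency list might look as follows (a sketch only; the actual `loop.py` may group or alias these imports differently):

```python
# Sketch of the import header implied by the dependency list above;
# the actual loop.py may organize these differently.
import csv
import logging
import os
import random
import uuid
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple

import matplotlib.pyplot as plt
import numpy as np
import rhino3dm
import torch
import torch.nn as nn
```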
4.2 Structure
The code is organized into:
- Classes:
  - `KaplaPlank`: Models the geometry and properties of a single plank.
  - `SimpleGAN`: Creates and trains the GAN for generating plank stack parameters.
  - `MetaOptimizer`: Optimizes reinforcement learning (RL) and stability-related parameters.
  - `QLearningAgent`: Implements Q-learning logic for improving stacking strategies.
  - `StackGenerator`: Coordinates plank stacking and checks for stability conditions.
  - `StackVisualizer`: Creates visual outputs and exports stacks to `.3dm` and CSV files.
- Functions:
  - `export_to_rhino3dm`, `export_to_csv`: Handle exporting data to Rhino and CSV formats.
  - `get_valid_input`: Ensures user-provided input is valid before execution.
  - `main`: Main execution function that runs the whole stack generation process.
4.3 Constants
- Plank Dimensions: PLANK_LENGTH = 120.0 mm, PLANK_WIDTH = 24.0 mm, PLANK_HEIGHT = 8.0 mm; PLANK_MASS = 1.0.
- Physical Constants: GRAVITY = 9.81 m/s².
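
In code, these would likely appear as module-level constants; the sketch below simply restates the values above (the mass unit is not specified in the script description, so it is treated as an arbitrary unit):

```python
# Module-level constants as listed in Section 4.3 (dimensions in millimetres).
PLANK_LENGTH = 120.0   # mm
PLANK_WIDTH = 24.0     # mm
PLANK_HEIGHT = 8.0     # mm
PLANK_MASS = 1.0       # arbitrary mass units (uniform for all planks)
GRAVITY = 9.81         # m/s^2
```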
5. Detailed Component Analysis
Generative Design:

GAN:
- Model Name: SimpleGAN
- Type: Feedforward Neural Network (simulated GAN generator)
- Where It Works: StackGenerator, initializing stack configurations.
- How It Works: Transforms random noise into stack parameters (number of layers, offsets) through its network layers.
- Math (Simplified): Think of the noise as a random seed; the GAN mixes it through its layers to output a design recipe. A minimal code sketch follows below.
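
As a concrete illustration, the sketch below shows a plausible PyTorch generator of this kind. The layer sizes, output dimension, and the `propose_config` mapping are assumptions for illustration, not the exact `SimpleGAN` implementation:

```python
import torch
import torch.nn as nn

class SimpleGAN(nn.Module):
    """Illustrative generator: maps a noise vector to stack parameters.

    The real SimpleGAN in loop.py may use different depths, widths, and
    output mappings; this sketch only shows the noise -> parameters idea.
    """
    def __init__(self, noise_dim: int = 8, param_dim: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 32),
            nn.ReLU(),
            nn.Linear(32, param_dim),
            nn.Sigmoid(),          # squash outputs into (0, 1)
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

def propose_config(gan: SimpleGAN, total_planks: int) -> dict:
    """Map raw generator outputs to interpretable stack parameters (heuristic)."""
    z = torch.randn(1, 8)                                # random noise seed
    p = gan(z).squeeze(0)                                # values in (0, 1)
    n_layers = max(1, int(p[0].item() * total_planks))   # heuristic layer count
    offset = (p[1].item() - 0.5) * 24.0                  # lateral offset in mm
    return {"layers": n_layers, "offset_mm": offset}
```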
Q-Learning Agent:
- Model Name: QLearningAgent
- Type: Tabular Q-learning (reinforcement learning) with a Q-table
- Where It Works: StackGenerator, placing planks in tallest/cantilever modes.
- How It Works:
  - State: the current stack configuration (e.g., height, COM, recent offsets).
  - Action: a candidate plank placement (position, orientation, offset).
  - Reward: positive for stable placements, negative for failed ones.
  - Learning: Q-values are updated after each placement attempt.
- Math (Simplified): The agent scores actions (Q-values) like a game scorecard and updates the scores to prefer actions that lead to stable stacks. A minimal sketch follows below.
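
The sketch below illustrates the tabular Q-learning mechanics described above (epsilon-greedy action selection and the Q-value update). The state/action encoding is deliberately abstract; the real agent in `loop.py` encodes stack-specific features:

```python
import random
from collections import defaultdict

class QLearningAgent:
    """Minimal tabular Q-learning sketch; the real agent's state/action
    encoding in loop.py is richer, this only illustrates the update rule."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.2):
        self.q = defaultdict(float)   # (state, action) -> Q-value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state, actions):
        # Epsilon-greedy: explore occasionally, otherwise exploit the best Q-value.
        if random.random() < self.epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        # Standard Q-learning backup toward r + gamma * max_a' Q(s', a').
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```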
Stability Constraints:
- Model Name: Stability Checks
- Type: Physics-Based Constraint Model
- Where It Works: StackGenerator, validating plank placements.
- How It Works:
  - Collision: rejects placements that overlap other planks in the same layer.
  - COM: keeps the stack's center of mass over the base's support polygon.
  - Contact Area: requires sufficient overlap with the planks below.
  - Torque: keeps rotational forces below a threshold.
- Math (Simplified): COM is the stack's balance point; torque is the twist that could tip it over. A minimal sketch follows below.
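
The following sketch illustrates these checks in simplified 1-D form (a per-axis overlap test, COM over the support, and torque about a support edge); the actual checks in `loop.py` operate on full plank geometry:

```python
import numpy as np

def overlaps(a_min, a_max, b_min, b_max):
    """Axis-aligned interval overlap; applied per axis for same-layer collisions."""
    return a_min < b_max and b_min < a_max

def com_within_support(plank_positions, plank_mass, support_x_min, support_x_max):
    """Check that the stack's center of mass lies over the base support along x
    (1-D sketch; the real check uses the full support polygon)."""
    xs = np.array([p[0] for p in plank_positions])
    com_x = np.sum(xs * plank_mass) / (plank_mass * len(xs))
    return support_x_min <= com_x <= support_x_max

def torque_about_edge(plank_positions, plank_mass, edge_x, gravity=9.81):
    """Net gravitational torque about a support edge; larger positive values mean
    more mass-weighted position beyond that edge, i.e. a greater tipping tendency."""
    xs = np.array([p[0] for p in plank_positions])
    return float(np.sum(plank_mass * gravity * (xs - edge_x)))
```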
Meta-Optimization:
- Name: MetaOptimizer
- Type: Meta-Learning (rule-based optimization)
- Where It Works: StackGenerator, adjusting system parameters.
- How It Works:
  - Tracks the success rate and failure reasons (e.g., collision, torque).
  - Adjusts exploration (epsilon) and the contact/torque thresholds based on observed failures.
  - Warm-starts the Q-table with weighted best-performing placements.
- Math (Simplified): The success rate is like a report-card average; if it is low (e.g., < 80%), the optimizer tweaks the settings (more randomization, relaxed thresholds). A minimal sketch follows below.
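
A rule-based sketch of this behaviour is shown below; the concrete thresholds and adjustment factors are assumptions chosen to be consistent with the example in Section 9.3 (0.05 × 1.1 = 0.055), not the script's exact rules:

```python
class MetaOptimizer:
    """Rule-based meta-optimizer sketch: nudges exploration and stability
    thresholds from observed failures (the real rules in loop.py may differ)."""

    def __init__(self, epsilon=0.2, torque_threshold=0.05, contact_threshold=0.3):
        self.epsilon = epsilon
        self.torque_threshold = torque_threshold
        self.contact_threshold = contact_threshold

    def adjust(self, success_rate: float, failure_counts: dict) -> None:
        if success_rate < 0.8:
            # Low success rate: explore more aggressively.
            self.epsilon = min(0.5, self.epsilon * 1.2)
        dominant = max(failure_counts, key=failure_counts.get, default=None)
        if dominant == "torque":
            self.torque_threshold *= 1.1      # relax the torque limit slightly
        elif dominant == "contact":
            self.contact_threshold *= 0.9     # accept slightly less overlap
```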
Stack Visualization:
- Name: StackVisualizer
- Type: Interactive 3D visualization
- Where It Works: Called from the `main` function for post-generation display.
- How It Works:
  - Renders the 3D stack.
  - Displays stack metrics.
  - Displays RL statistics.
  - Provides export functions (`.3dm`, `.csv`, `.png`). A minimal rendering sketch follows below.
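
A minimal rendering sketch using matplotlib's `bar3d` is shown below; the `planks` input format and the function name are assumptions, and the real `StackVisualizer` additionally shows metrics, RL statistics, and export controls:

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (enables the 3D projection)

def render_stack(planks, plank_size=(120.0, 24.0, 8.0)):
    """Draw each plank as an axis-aligned box (a sketch, not the real visualizer).

    `planks` is assumed to be a list of (x, y, z) corner positions in mm."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    dx, dy, dz = plank_size
    for (x, y, z) in planks:
        ax.bar3d(x, y, z, dx, dy, dz, shade=True)
    ax.set_xlabel("x (mm)")
    ax.set_ylabel("y (mm)")
    ax.set_zlabel("z (mm)")
    plt.show()

# Example: a three-layer stack with a small lateral offset per layer.
render_stack([(0, 0, 0), (10, 0, 8), (20, 0, 16)])
```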
6. Role of the GAN
The `SimpleGAN` is the diversity engine, generating varied stack parameters.
Contributions:
- Diversity: Prevents convergence to a single design.
- Initialization: Sets up the environment in which the Q-learning agent builds.
- Scalability: Enables many stack variations to be generated.
- Feedback: The meta-optimizer adjusts thresholds, improving the success rate of GAN proposals.
Limitations:
- Static weights limit adaptability (the generator is not trained online).
- The simplified architecture is less expressive than a full GAN with a discriminator.
- Output-to-parameter mapping is heuristic.
7. Closed-Loop System Dynamics
The system forms a feedback-driven pipeline:
- 1. GAN: Proposes parameters.
- 2. Q-Learning: Places planks, updates Q-table.
- 3. Stability Checks: Enforce constraints, provide rewards.
- 4. Meta-Optimizer: Adjusts parameters based on failures.
- 5. Visualization: User selects stacks.
Analogy: A construction crew where the GAN designs, Q-learning builds, stability checks ensure safety, and the meta-optimizer refines strategies.
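
The loop can be summarized in a few lines of Python-style pseudocode; the callables `propose`, `build`, and `adjust` stand in for the GAN, the Q-learning StackGenerator, and the MetaOptimizer, and their names and signatures are illustrative only:

```python
def closed_loop_generate(propose, build, adjust, n_candidates=10):
    """Sketch of the feedback loop described above (illustrative API, not loop.py's)."""
    stacks = []
    for _ in range(n_candidates):
        config = propose()                 # 1. GAN proposes stack parameters
        stack, failures = build(config)    # 2-3. Q-learning places planks;
                                           #      stability checks reward or reject
        adjust(stack, failures)            # 4. meta-optimizer tunes epsilon/thresholds
        stacks.append(stack)
    return stacks                          # 5. visualizer presents stacks for selection
```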
8. Mathematical Formulations
– Q-Learning update (standard tabular form):
\[
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
\]
where \( \alpha \) is the learning rate, \( \gamma \) the discount factor, and \( r \) the placement reward.
– GAN (generator): stack parameters are produced from a noise vector \( z \),
\[
p = G(z),
\]
where \( G \) is the feedforward generator network of `SimpleGAN`.
– COM: for planks of mass \( m_i \) at positions \( \mathbf{x}_i \),
\[
\mathbf{x}_{\mathrm{COM}} = \frac{\sum_i m_i \, \mathbf{x}_i}{\sum_i m_i},
\]
which must lie within the base's support polygon.
– Torque about a support edge at \( x_{\mathrm{edge}} \):
\[
\tau = \sum_i m_i \, g \, (x_i - x_{\mathrm{edge}}),
\]
which must remain below the torque threshold.
– Success rate:
\[
\text{success rate} = \frac{\text{stable placements}}{\text{attempted placements}}.
\]
9. Examples and Illustrations
9.1 GAN Example

- Noise \( z = [0.4, 0.6, \ldots] \), \( p = [0.7, 0.5, \ldots] \), total_planks = 30:
- Layers: 18.
- Planks per layer: [5, 4, 2, …].
9.2 Q-Learning Example
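A minimal worked update (illustrative values, not taken from a specific run): assume \( \alpha = 0.1 \), \( \gamma = 0.9 \), current \( Q(s, a) = 2.0 \), reward \( r = 10 \) for a stable placement, and \( \max_{a'} Q(s', a') = 5.0 \). Then \( Q(s, a) \leftarrow 2.0 + 0.1\,(10 + 0.9 \times 5.0 - 2.0) = 3.25 \), so the agent becomes more likely to repeat that placement.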

9.3 Meta-Optimizer Example
A success rate of 0.64 with torque-dominated failures: the meta-optimizer increases the torque threshold to 0.055.
9.4 Closed-Loop Example
GAN proposes 10 configurations, Q-learning places planks, meta-optimizer relaxes COM margin, user exports 3 stacks.
Matplotlib outputs: 3D renderings of the generated stacks (figures omitted).
Iterations and evaluations: per-run placement attempts and stability evaluations (figures omitted).
10. Conclusion
The script offers a robust framework for generating stable Kapla stacks, leveraging a closed-loop system of GAN, Q-learning, meta-optimization, and stability checks. The GAN drives diversity, the Q-learning agent optimizes placements, the meta-optimizer ensures adaptability, and visualization enables user interaction.

Future enhancements include training the GAN online, adding complex stability checks, and parallelizing generation.