Abstract

This project explores the use of machine learning models to generate LEGO building instruction manuals together with a detailed description of each set. We employ a diffusion model along with a large language model (LLM) to generate both the images of the LEGO set and its description. Using LoRA (Low-Rank Adaptation), we fine-tune a Stable Diffusion model on a set of LEGO building instruction manuals so that it generates imagery in a similar style. We also fine-tune the LLM so that the text it generates has a tone similar to LEGO's.

Workflow

Our workflow starts by scraping existing LEGO manuals to build the training dataset. The manuals are extracted from PDFs found on LEGO's own website as well as on fan websites such as Rebrickable and Brickset.
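
One way to organise the scraped material is sketched below, under stated assumptions. It presumes the manual pages have already been exported from the PDFs as PNG files (that export step is not shown) and pairs each image with a caption in a `metadata.jsonl` file, the layout read by the Hugging Face `datasets` image-folder loader. The function name `build_manifest`, the `caption_for` callback, and the choice of PNG are illustrative and not details taken from the project.

```python
import json
from pathlib import Path


def build_manifest(image_dir, caption_for):
    """Write metadata.jsonl next to the images and return the rows written.

    image_dir   -- folder of page images already extracted from the manual PDFs
    caption_for -- hypothetical callable mapping an image Path to its caption text
    """
    image_dir = Path(image_dir)
    rows = []
    # Sort so the manifest order is stable between runs.
    for image_path in sorted(image_dir.glob("*.png")):
        rows.append({"file_name": image_path.name, "text": caption_for(image_path)})

    manifest_path = image_dir / "metadata.jsonl"
    with manifest_path.open("w", encoding="utf-8") as handle:
        for row in rows:
            # One JSON object per line, as the image-folder layout expects.
            handle.write(json.dumps(row) + "\n")
    return rows
```

A call such as `build_manifest("pages", lambda p: "LEGO building instruction page")` would caption every page with the same placeholder text; in practice the caption would likely describe the step or the set shown on the page.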

Diffusion Model and LoRA Training

We run LoRA training on our diffusion model for 5000 steps, after which the model is fine-tuned to generate images in the desired style. We use the logging capabilities of Weights & Biases to observe training performance. This lets us confirm that the model is learning effectively and see how the adaptation progresses across the training steps.
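
To make the idea behind LoRA concrete, here is a minimal sketch in plain Python of the standard formulation: the frozen weight W is left untouched, and a low-rank product B·A, scaled by alpha / rank, is added on top. It is not the project's training code. The matrix sizes, rank, and alpha below are illustrative assumptions, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the weight used after adaptation.

    w is d x k (frozen); b is d x rank and a is rank x k (both trainable).
    """
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d, k, rank):
    """Compare full fine-tuning (d * k) with LoRA (rank * (d + k))."""
    return {"full": d * k, "lora": rank * (d + k)}


# Tiny worked example: a 2 x 2 frozen weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
b = [[1.0], [2.0]]
a = [[0.5, 0.5]]
adapted = lora_effective_weight(w, b, a, alpha=1.0, rank=1)
```

The parameter count is what makes the approach cheap: for a hypothetical 768 x 768 layer at rank 4, `trainable_params(768, 768, 4)` compares 589,824 weights for full fine-tuning with 6,144 for the LoRA matrices.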

LoRA Scales

LLM Fine Tuning

Conclusion

By combining a fine-tuned diffusion model and an LLM using LoRA, we successfully brought two different machine learning approaches together in a single workflow. Workflows like this demonstrate the potential of these tools and how they can change our approach to creating LEGO sets.