Mas Profundo is a pipeline that allows designers to generate interesting architectural images from simple models using a live camera feed. 

Architectural models tend to use a reduced color palette, and their detail is limited by workmanship and time.

Mas Profundo allows each model ideation to be visualized more realistically than a simple foam model.

Mas Profundo allows a user to generate detailed renderings of even simple block models using depth mapping.  Our goal is to allow even simple models to have deep meaning.

We also let the user invert the depth map, which can produce subtle changes in tone and lighting. Inverted depth maps, combined with the right user prompt, can generate interior views from models.

Can you tell that the input image is a human fist?

Demo

Mas Profundo allows a user to record a short video clip of their model.

In the capture settings, the frame count and the recording length can be changed.
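
As a rough illustration of how these two capture settings could interact, the sketch below picks evenly spaced frames from a recording. The function name, the `camera_fps` parameter, and the even-spacing strategy are assumptions made for this example, not the actual Mas Profundo capture code.

```python
def sample_frame_indices(recording_seconds: float, camera_fps: int, frame_count: int) -> list[int]:
    """Pick `frame_count` evenly spaced frame indices from a recording.

    Hypothetical helper: `camera_fps` and the even-spacing rule are assumptions.
    """
    total_frames = int(recording_seconds * camera_fps)
    if total_frames <= 0 or frame_count <= 0:
        return []
    # Never request more frames than the clip contains.
    frame_count = min(frame_count, total_frames)
    if frame_count == 1:
        return [0]
    # Integer arithmetic spreads the picks from the first to the last frame.
    last = total_frames - 1
    return [i * last // (frame_count - 1) for i in range(frame_count)]
```

For example, a one-second clip at an assumed 10 frames per second with a frame count of 4 would keep frames 0, 3, 6, and 9.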

Mas Profundo extracts the frames from the video and produces a depth map for each one.

Depth contrast can be increased or decreased, extending or flattening the frame of reference, and the depth can also be inverted.
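
The contrast and inversion controls described above can be sketched with NumPy. This is a minimal illustration under stated assumptions: the function name `adjust_depth`, the normalization to [0, 1], and the choice to scale contrast around the midpoint are hypothetical, not the method Mas Profundo necessarily uses.

```python
import numpy as np


def adjust_depth(depth: np.ndarray, contrast: float = 1.0, invert: bool = False) -> np.ndarray:
    """Normalize a depth map to [0, 1], then apply contrast and optional inversion.

    Hypothetical helper: the midpoint scaling rule is an assumption.
    """
    depth = depth.astype(np.float32)
    lo, hi = depth.min(), depth.max()
    if hi == lo:
        # A perfectly flat map has no depth range; place it at the midpoint.
        normalized = np.full_like(depth, 0.5)
    else:
        normalized = (depth - lo) / (hi - lo)
    # contrast > 1 stretches values away from the midpoint (extends the range);
    # contrast < 1 pulls them toward it (flattens the range).
    adjusted = np.clip((normalized - 0.5) * contrast + 0.5, 0.0, 1.0)
    # Inversion swaps near and far.
    return 1.0 - adjusted if invert else adjusted
```

With this convention, a contrast of 0.5 compresses the full depth range into [0.25, 0.75], while `invert=True` turns the nearest surfaces into the farthest ones.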

Next, the user enters a prompt and clicks run to generate the renders.

Then Mas Profundo generates images for each frame.

Reflections

In producing Mas Profundo we overcame several challenges. Critically, the responsiveness of the app was greatly improved by batching the frames for inference. Mas Profundo runs with a Gradio interface through a Google Colab notebook, which adds computational overhead and affects throughput. Additionally, we struggled with controlling the image storage and recording system; a future version of Mas Profundo would have a customized native application tailored to the pipeline.
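
The batching idea mentioned above can be sketched as follows. This is only an illustration: `generate_batch` stands in for whatever image-generation call the pipeline makes, and its signature, the helper names, and the default batch size are all assumptions.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def render_frames(
    depth_maps: Sequence[T],
    prompt: str,
    generate_batch: Callable[[Sequence[T], str], Sequence[R]],  # hypothetical model call
    batch_size: int = 4,  # assumed default, not taken from the project
) -> list[R]:
    """Run generation once per batch instead of once per frame, keeping frame order."""
    renders: list[R] = []
    for batch in batched(depth_maps, batch_size):
        renders.extend(generate_batch(batch, prompt))
    return renders
```

Grouping frames this way means the model is invoked once per batch rather than once per frame, which is one plausible reason batching would improve responsiveness.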

For future work, it would be interesting to use an LLM to improve the user’s prompt, but at this time we did not feel that the added processing delay would be acceptable. We would also like to adjust the batching so that each image carries visual context from the previous one, closer to video generation. Another improvement would be a native or dedicated web interface that accesses a cloud graphics server for generation. This would allow the user to run Mas Profundo on a phone or laptop without needing a high-end GPU.