Syllabus

Data Encoding Seminar 

MaCAD Digital Tools for DATA ENCODING SEMINAR

 


Hesham Shawky & Cami Quinteros – Imaginary Vessels

Machine learning algorithms offer an alternative modelling paradigm for complex problems. Unlike analytical or procedural models, in which processes are computed through iterative, time-based, computationally heavy calculations with pre-assigned property parameters, machine learning models learn by example, searching for an approximate relationship between inputs and targets. Once trained, they can easily be deployed within a design workflow to provide targeted predictions. While these models promise clear advantages in productivity, they are notoriously data-hungry. Many open-source datasets are available online for training state-of-the-art models; however, these datasets are not relevant to architectural applications.
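To make "learning by example" concrete, here is a minimal sketch that fits the simplest possible model, a straight line, to input-target pairs by ordinary least squares, then reuses it for instant predictions instead of rerunning the slow model. The function `expensive_simulation` and the sample values are hypothetical stand-ins chosen for illustration; real projects would use richer models such as gradient-boosted trees or neural networks.

```python
def expensive_simulation(x):
    # Hypothetical stand-in for a slow analytical or procedural model.
    return 2.0 * x + 1.0

def fit_linear(xs, ys):
    """Return (w, b) minimizing the squared error of y ≈ w * x + b (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b

def predict(model, x):
    """Cheap prediction with the fitted model instead of rerunning the simulation."""
    w, b = model
    return w * x + b

# Training examples: sampled inputs paired with simulated targets.
xs = [0.1, 0.2, 0.3, 0.4, 0.5]
ys = [expensive_simulation(x) for x in xs]

model = fit_linear(xs, ys)
print(model)                 # close to (2.0, 1.0)
print(predict(model, 0.35))  # close to 1.7
```

The model never sees the rule inside `expensive_simulation`; it only recovers an approximation of it from the examples, which is why the quality and coverage of those examples matter so much.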

Dataset design is what enables the use of machine learning algorithms in different fields. While an existing ecology of algorithms provides architectures suited to specific types of problems, be they classification or regression, the models must be trained on datasets pertaining to the problem they are expected to solve. These custom datasets can be compiled through web scraping or generated, either computationally through heuristic algorithms or by sensing and digitizing physical samples. Dataset design should follow several criteria: increased sample diversity, bias avoidance, and ambiguity avoidance. Together, these criteria ensure that the problem is well represented, evenly distributed, and free of confusing samples, which is the key to a successful ML training campaign.
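The three criteria can be partly checked with simple heuristics before any training. The sketch below assumes a labelled classification dataset stored as `(features, label)` pairs and interprets the criteria as follows (an assumption, not a standard definition): bias as a skewed label distribution, ambiguity as identical inputs carrying different labels, and diversity as the share of distinct inputs. The feature names and values are hypothetical.

```python
from collections import Counter, defaultdict

def audit_dataset(samples):
    """Simple heuristic checks on a list of (features, label) pairs.

    Returns:
      shares    - fraction of samples per label (a strongly skewed split hints at bias)
      conflicts - inputs that appear with more than one label (ambiguity)
      diversity - distinct inputs / total samples (1.0 means no repeated inputs)
    """
    counts = Counter(label for _, label in samples)
    total = len(samples)
    shares = {label: n / total for label, n in counts.items()}

    labels_by_input = defaultdict(set)
    for features, label in samples:
        labels_by_input[features].add(label)
    conflicts = {f: labels for f, labels in labels_by_input.items() if len(labels) > 1}

    diversity = len(labels_by_input) / total
    return shares, conflicts, diversity

# Hypothetical samples: (room_area_m2, window_count) -> comfort class.
samples = [
    ((20, 1), "low"),
    ((20, 1), "high"),  # same input, different label: ambiguous
    ((35, 2), "high"),
    ((50, 3), "high"),
    ((45, 2), "high"),
]

shares, conflicts, diversity = audit_dataset(samples)
print(shares)           # {'low': 0.2, 'high': 0.8}
print(list(conflicts))  # [(20, 1)]
print(diversity)        # 0.8
```

Here the audit flags an 80/20 label split and one contradictory input; in practice such findings would prompt collecting or generating more samples for the under-represented class and resolving the conflicting records.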

In this seminar, you will be introduced to conceptual perspectives on dataset design and feature encoding for targeted ML applications. You will develop your own ML pipeline for a use case of your choice, starting from a common building-specific dataset. Working in groups, you will curate a dataset and train an appropriate model for your predictive task. By studying the parameter space as well as the distribution and representation of features, you will propose a dataset encoding method and evaluate it by training both shallow models and artificial neural networks. A key milestone will be designing a computational pipeline that brings the prediction back into the design workflow. This is an opportunity to step back from “mega-models” such as LLMs and diffusion models and build a foundational understanding of the basics of ML.
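As one illustration of what a dataset encoding method can look like, the sketch below turns mixed building records into flat numeric vectors that a shallow model or neural network can consume: a categorical feature is one-hot encoded and a numeric feature is min-max scaled to [0, 1]. The field names `orientation` and `floor_area` and their values are hypothetical, and these are only two of many possible encodings.

```python
def one_hot(value, categories):
    """Encode a categorical value as a binary vector, one slot per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max(value, lo, hi):
    """Rescale a numeric value to [0, 1]; constant columns map to 0."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0

def encode(rows, categories, area_range):
    """Turn each row into a flat numeric feature vector."""
    lo, hi = area_range
    return [
        one_hot(row["orientation"], categories) + [min_max(row["floor_area"], lo, hi)]
        for row in rows
    ]

# Hypothetical building-unit records.
train_rows = [
    {"orientation": "N", "floor_area": 40.0},
    {"orientation": "S", "floor_area": 80.0},
    {"orientation": "E", "floor_area": 60.0},
]
categories = ["N", "E", "S", "W"]

# Fit the scaling range on the training rows only, then reuse it for new data.
areas = [row["floor_area"] for row in train_rows]
area_range = (min(areas), max(areas))

X_train = encode(train_rows, categories, area_range)
for vector in X_train:
    print(vector)
# [1.0, 0.0, 0.0, 0.0, 0.0]
# [0.0, 0.0, 1.0, 0.0, 1.0]
# [0.0, 1.0, 0.0, 0.0, 0.5]
```

One-hot encoding avoids implying a false order among orientations, which an integer code such as N=0, E=1 would do. Reusing the training range for new designs keeps the encoding consistent between training and deployment, although values outside that range will fall outside [0, 1].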

Learning Objectives

At course completion the student will:

  •   Become knowledgeable about fundamental machine learning concepts and workflows
  •   Reflect analytically on notions of parameter space, data encoding and feature selection
  •   Acquire competences in developing custom datasets for architectural applications
  •   Acquire competences in feature engineering (dimensionality reduction, data analysis)
  •   Gain hands-on experience in using state-of-the-art machine learning libraries
  •   Develop appropriate representational methods and tools to showcase their findings
  •   Collaborate effectively in a group exercise

Projects from this course

From Raw Acoustics to Predictive Insights: Modeling Comfort in Architectural Spaces

In this blog, we walk through the complete data science process applied to a unique challenge: predicting the acoustic comfort index of apartment units from architectural and environmental features. This includes data preprocessing, feature engineering, model training, interpretation, and tuning, using tools such as XGBoost, neural networks, and SHAP for explainability. The workflow is split …

Predicting Microclimatic Comfort (UTCI) in Courtyard via Data-Driven Regression Models

Our team developed a simple, data-driven process that predicts courtyard thermal comfort in real time. Our method gives designers immediate feedback on how their decisions impact the Universal Thermal Climate Index (UTCI), a standard measure of outdoor comfort. In this post, we’ll take you through our approach, from dataset creation to deployment in Grasshopper, and …

CarbonAI

Bridging Carbon Estimations in Early Design Stages

As the built environment accounts for nearly 40% of global carbon emissions, the urgency to integrate sustainability into architectural and engineering processes has never been greater. Yet decisions made in the earliest stages of building design often lock in the majority of a structure’s lifetime carbon footprint, long before …

Fire Risk Prediction

Fire Spread Prediction in Building Layouts

Project goal: predict fire safety risk at the unit level in a building. We are conducting a fire spread analysis of building layouts by examining spatial, structural, and material features. The objective is to model and understand how fire propagates through different parts of a …