GitHub Repository. https://github.com/LaurenD66/ROS-GridWorld-RL-with-Obstacles

Construction Industry Fabrication Workflow + Machine Learning Integration

Workflow: Robotic navigation in dynamic construction environments, focusing on obstacle avoidance during material handling tasks.

Construction sites are inherently dynamic, with unpredictable obstacles and changing layouts. Traditional rule-based navigation systems struggle to adapt in real time. Integrating Reinforcement Learning (RL) allows robots to learn optimal navigation strategies through interaction with the environment, improving adaptability and efficiency.

Application Details

  • What: Develop an RL-based navigation system enabling robots to perform pick-and-place tasks while dynamically avoiding obstacles in a grid-based environment.
  • Why: Enhance robotic autonomy and efficiency in construction settings by enabling real-time adaptation to changing environments, reducing human intervention, and minimizing downtime.
  • How: Utilize Q-learning within a ROS-integrated GridWorld environment, employing Gymnasium for environment simulation. Implement reward shaping to guide learning and obstacle clustering to manage dynamic obstacles. (A minimal environment sketch follows this list.)
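
A minimal sketch of such a GridWorld environment using the Gymnasium `Env` API is shown below. The grid size, obstacle cells, goal position, and reward values are illustrative assumptions, not the repository's actual configuration.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class GridWorldEnv(gym.Env):
    """Toy grid world: the agent starts at (0, 0) and moves toward a goal while avoiding obstacle cells."""

    def __init__(self, size=5, obstacles=((1, 1), (2, 3)), goal=(4, 4)):
        self.size = size
        self.obstacles = set(obstacles)   # static obstacles; a dynamic variant would move these each step
        self.goal = goal
        self.agent = (0, 0)
        self.observation_space = spaces.MultiDiscrete([size, size])
        self.action_space = spaces.Discrete(4)   # 0=up, 1=down, 2=left, 3=right

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.agent = (0, 0)
        return np.array(self.agent), {}

    def step(self, action):
        dr, dc = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}[int(action)]
        r = min(max(self.agent[0] + dr, 0), self.size - 1)   # clip moves to the grid
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        if self.agent in self.obstacles:
            return np.array(self.agent), -10.0, True, False, {}   # collision: penalty, episode ends
        if self.agent == self.goal:
            return np.array(self.agent), 10.0, True, False, {}    # goal reached: bonus, episode ends
        return np.array(self.agent), -1.0, False, False, {}       # per-step cost encourages short paths
```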

Assumptions.

  • The environment can be simplified into a grid format.
  • Robots will ultimately have access to sensors for obstacle detection.
  • Obstacles can be static or dynamic, and their behaviors can be modeled or learned.
  • For moving obstacles, robots may ultimately be able to apply the trained policy in real time.

Expectations.

  • Robots will learn to navigate around obstacles and reach goals efficiently.
  • The system will generalize to various obstacle configurations and dynamics.

Inputs, Outputs, Data Flow, and Data Types

Inputs.

  • Current robot position (coordinates)
  • Obstacle positions and dynamics
  • Goal position

Outputs.

  • Next action for the robot (e.g., move up, down, left, right)

Data Flow.

  1. Robot perceives the environment (state).
  2. Reinforcement Learning agent selects an action based on the current policy.
  3. Environment updates based on the action, providing new state and reward.
  4. Agent updates its policy based on the reward and new state (see the Q-learning sketch after this list).
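
A minimal tabular Q-learning sketch of this loop is given below. It assumes a Gymnasium-style environment (such as the GridWorldEnv sketch above) and illustrative hyperparameters rather than the repository's actual values.

```python
from collections import defaultdict
import numpy as np

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # illustrative learning rate, discount, and exploration rate


def run_episode(env, q_table, rng):
    """One pass through the perceive -> act -> learn loop above, using tabular Q-learning."""
    state, _ = env.reset()
    state = tuple(state)                                    # step 1: perceive the current state
    done = False
    while not done:
        if rng.random() < EPSILON:                          # step 2: epsilon-greedy action selection
            action = int(env.action_space.sample())
        else:
            action = int(np.argmax(q_table[state]))
        obs, reward, terminated, truncated, _ = env.step(action)   # step 3: environment transition
        next_state, done = tuple(obs), terminated or truncated
        best_next = 0.0 if done else float(np.max(q_table[next_state]))
        q_table[state][action] += ALPHA * (reward + GAMMA * best_next - q_table[state][action])  # step 4
        state = next_state


# Example setup: q_table = defaultdict(lambda: np.zeros(4)); rng = np.random.default_rng(0)
```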

Data Types.

  • States: Tuples representing grid positions (e.g., (x, y))
  • Actions: Discrete actions (e.g., ‘up’, ‘down’, ‘left’, ‘right’)
  • Rewards: Floating-point numbers indicating the desirability of actions (see the typed sketch after this list)
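
These types could be captured directly in code. The sketch below is one illustrative encoding; the names and reward values are assumptions, not the repository's exact definitions.

```python
from typing import Dict, Tuple

State = Tuple[int, int]                                                  # grid position, e.g. (x, y)
ACTIONS: Dict[int, str] = {0: "up", 1: "down", 2: "left", 3: "right"}    # discrete action set
GOAL_REWARD: float = 10.0                                                # illustrative reward values
STEP_PENALTY: float = -1.0
```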

Proof of Concept Design

Dataset.

  • Simulated grid environments with varying obstacle configurations.
  • Logs of robot interactions, including states, actions, rewards, and outcomes (an illustrative log structure is sketched after this list).
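
One way the interaction logs could be structured is sketched below; the field names are assumptions chosen for illustration rather than the repository's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transition:
    """A single state-action-reward step recorded during an episode."""
    state: Tuple[int, int]
    action: int
    reward: float
    next_state: Tuple[int, int]
    done: bool


@dataclass
class EpisodeLog:
    """One episode's obstacle layout, recorded transitions, and outcome."""
    obstacle_layout: List[Tuple[int, int]]
    transitions: List[Transition] = field(default_factory=list)
    reached_goal: bool = False
```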

Implementation Steps.

  1. Set up the ROS environment + Docker container with the provided repository.
  2. Define the GridWorld environment using Gymnasium, incorporating dynamic obstacles.
  3. Implement Q-learning with reward shaping to guide the learning process.
  4. Train the agent over multiple episodes, allowing it to learn optimal navigation strategies.
  5. Evaluate performance by measuring success rates and efficiency in reaching goals (a training-and-evaluation sketch follows this list).
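
A sketch of steps 4 and 5, assuming the GridWorldEnv and run_episode sketches above; the episode counts and step cap are illustrative choices.

```python
from collections import defaultdict
import numpy as np


def train_and_evaluate(env, episodes=20_000, eval_episodes=100, max_steps=200):
    """Train a Q-table over many episodes, then measure the greedy policy's success rate."""
    q_table = defaultdict(lambda: np.zeros(env.action_space.n))
    rng = np.random.default_rng(0)
    for _ in range(episodes):                     # step 4: learn from repeated interaction
        run_episode(env, q_table, rng)
    successes = 0
    for _ in range(eval_episodes):                # step 5: evaluate without exploration
        state, _ = env.reset()
        state = tuple(state)
        for _ in range(max_steps):                # cap steps so a looping policy still terminates
            obs, reward, terminated, truncated, _ = env.step(int(np.argmax(q_table[state])))
            state = tuple(obs)
            if terminated or truncated:
                successes += reward > 0           # positive terminal reward means the goal was reached
                break
    return successes / eval_episodes
```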

Proof of Concept Video

Video Demonstration.

The animation below demonstrates the agent moving through four variant obstacle behaviors:

  1. The original environment.
  2. Clustered obstacles.
  3. Moving obstacles.
  4. Reward-shaped (increased penalty for proximity to red obstacles; a minimal sketch of this shaping follows the list).
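
A minimal sketch of the proximity penalty used in the reward-shaped variant; the penalty size and distance threshold here are illustrative assumptions, not the repository's exact values.

```python
def shaped_reward(base_reward, agent, obstacles, penalty=0.5):
    """Subtract a penalty whenever the agent sits on or next to an obstacle cell (Manhattan distance <= 1)."""
    near_obstacle = any(abs(agent[0] - r) + abs(agent[1] - c) <= 1 for r, c in obstacles)
    return base_reward - penalty if near_obstacle else base_reward
```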

The following animation shows the trained agent's learned behavior after 20,000 episodes of training on the variants above.

Results + Conclusion

The RL method was likely ineffective due to sparse rewards, poor exploration, and a limited state representation. Fixed learning parameters and Q-table scalability issues also hindered performance, especially in dynamic or large environments. Potential improvements include reward shaping, better exploration strategies, and replacing the tabular approach with Deep Q-Networks. Training across varied environments and applying curriculum learning could also improve generalization and overall navigation success.
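
As one example of the suggested exploration improvements, a decaying exploration schedule could replace the fixed epsilon used during training; the start, end, and decay values below are illustrative assumptions.

```python
def epsilon_by_episode(episode, eps_start=1.0, eps_end=0.05, decay_episodes=10_000):
    """Linearly anneal exploration from eps_start down to eps_end over decay_episodes."""
    frac = min(episode / decay_episodes, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

Early episodes would then explore broadly while later episodes increasingly exploit the learned Q-values, directly addressing the poor-exploration issue noted above.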