Progress report

Project Summary

Our project focuses on stabilizing a quadrotor using reinforcement learning (RL) with Proximal Policy Optimization (PPO). We use PyDrake and PyFlyt for simulation and Stable-Baselines3 for training. The goal is to train an RL agent to control the quadrotor efficiently, with a traditional control method such as LQR providing stability before PPO takes over. We aim to improve performance through hyperparameter tuning, vectorized environments, and robust evaluation.
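To make the LQR part of this idea concrete, here is a minimal sketch of a discrete-time LQR controller for a simplified one-dimensional altitude model. The double-integrator model, the cost weights, the time step, and the function names `lqr_gain` and `simulate` are all illustrative assumptions, not the project's actual quadrotor model or code. It uses only the Python standard library and iterates the discrete Riccati equation until it converges.

```python
# Illustrative sketch only: LQR for a 1-D altitude double integrator.
# Model, weights, and time step are assumptions, not values from the project.

def lqr_gain(dt=0.02, q_pos=10.0, q_vel=1.0, r=0.1, max_iters=10000, tol=1e-9):
    """Return [k_pos, k_vel] by iterating the discrete-time Riccati equation."""
    a = [[1.0, dt], [0.0, 1.0]]        # dynamics of [altitude error, vertical speed]
    b = [0.5 * dt * dt, dt]            # effect of a vertical acceleration command
    q = [[q_pos, 0.0], [0.0, q_vel]]   # state cost matrix Q
    p = [row[:] for row in q]          # Riccati iterate P, initialized to Q
    k = [0.0, 0.0]
    for _ in range(max_iters):
        # P b and the scalar r + b^T P b
        pb = [p[i][0] * b[0] + p[i][1] * b[1] for i in range(2)]
        s = r + b[0] * pb[0] + b[1] * pb[1]
        # Gain k = (b^T P A) / s, using the symmetry of P
        k = [(pb[0] * a[0][j] + pb[1] * a[1][j]) / s for j in range(2)]
        # Closed-loop matrix A - b k
        acl = [[a[i][j] - b[i] * k[j] for j in range(2)] for i in range(2)]
        # Riccati update in the form P_next = Q + A^T P (A - b k)
        pacl = [[p[i][0] * acl[0][j] + p[i][1] * acl[1][j] for j in range(2)]
                for i in range(2)]
        nxt = [[q[i][j] + a[0][i] * pacl[0][j] + a[1][i] * pacl[1][j]
                for j in range(2)] for i in range(2)]
        # Symmetrize to suppress floating-point drift
        nxt[0][1] = nxt[1][0] = 0.5 * (nxt[0][1] + nxt[1][0])
        change = max(abs(nxt[i][j] - p[i][j]) for i in range(2) for j in range(2))
        p = nxt
        if change < tol:
            break
    return k


def simulate(k, z0=1.0, dt=0.02, steps=500):
    """Roll out the closed loop u = -k x from an initial altitude error z0."""
    z, vz = z0, 0.0
    for _ in range(steps):
        u = -(k[0] * z + k[1] * vz)
        z, vz = z + dt * vz + 0.5 * dt * dt * u, vz + dt * u
    return z, vz


if __name__ == "__main__":
    gain = lqr_gain()
    print("gain:", gain)
    print("state after 10 s:", simulate(gain))
```

A stabilizing controller of this kind could hold the vehicle near hover during an initial phase, with the learned PPO policy supplying actions afterwards. A full quadrotor would need a higher-dimensional linearized model rather than this single-axis one.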

Approach

The system follows a structured RL pipeline:

1 - Environment setup

PyFlyt

PyDrake

2 - Control algorithms

3 - Enhancements Implemented
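As background for the control-algorithms step of the pipeline above, the following sketch computes PPO's clipped surrogate policy loss in plain Python. It is a standalone illustration of the published objective, not the Stable-Baselines3 implementation or the project's code. The function name `ppo_clip_loss` and its list-based inputs are assumptions made for this example. The default `clip_range=0.2` matches the Stable-Baselines3 default for PPO.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_range=0.2):
    """Mean clipped surrogate loss (negated so that lower is better).

    Inputs are per-sample log-probabilities of the taken actions under the
    new and old policies, plus advantage estimates. Hypothetical helper for
    illustration only.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
        # Pessimistic bound: take the smaller of the two surrogate terms
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


if __name__ == "__main__":
    # A policy that doubled the probability of a good action gains no extra
    # credit beyond the clip boundary (ratio 2.0 is clipped to 1.2).
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the main reason PPO tends to train stably on continuous-control tasks such as quadrotor stabilization.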

Evaluation

We evaluate our approach using both qualitative and quantitative metrics:

Quantitative Metrics

Qualitative Metrics

Current Results

Training Loss for both environments

PyFlyt training loss with default hyperparameters

PyFlyt training reward with default hyperparameters

Remaining Goals and Challenges

Next steps

Challenges

Resources Used

Video

Watch Video