Deploying Locomotion Policies Trained in Differentiable Simulation on Real Hardware

In recent years, deep Reinforcement Learning (RL) for robotic motion policies has demonstrated impressive performance and unprecedented robustness on real hardware. Current sim2real approaches rely on large-scale pre-training with domain randomization to make policies robust, but they struggle with high-dimensional spaces, and RL methods in general are primarily limited by low sample efficiency. Differentiable simulators provide first-order gradient information, which has shown great results for improving sample efficiency. Although the simulation results are promising, policies trained this way are rarely deployed on hardware. The goal of this thesis is to train quadrupedal locomotion policies in a differentiable simulation framework and then enable real-world deployment by modifying the simulation, the policy training, or the learning algorithm. Ideally, the properties of differentiable simulators can also be leveraged in this process, for example by fitting the simulation to real data to improve sim2real transfer.
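To make the idea of fitting real data with first-order gradients concrete, the following is a minimal sketch under strong assumptions, not the method of this thesis. It uses a toy 1-D damped point mass instead of a quadruped, propagates the derivative of the state with respect to a single damping parameter alongside the rollout, and uses that gradient to fit the parameter to a reference trajectory. The "measured" trajectory is synthetic, and every name here (rollout_with_sensitivity, loss_and_grad, fit_damping, DT, STEPS) is hypothetical.

```python
# Toy illustration only: a 1-D damped point mass, not a quadruped simulator.
# All names and constants below are hypothetical choices for this sketch.

DT = 0.05     # integration step [s]
STEPS = 100   # rollout length


def rollout_with_sensitivity(damping, v0=1.0):
    """Simulate positions and their derivative w.r.t. the damping coefficient."""
    x, v = 0.0, v0
    dx_dc, dv_dc = 0.0, 0.0  # sensitivities of the state w.r.t. damping
    xs, dxs = [], []
    for _ in range(STEPS):
        # Differentiate each update rule (chain rule) before overwriting v.
        dv_dc = dv_dc + DT * (-v - damping * dv_dc)
        v = v + DT * (-damping * v)
        dx_dc = dx_dc + DT * dv_dc
        x = x + DT * v
        xs.append(x)
        dxs.append(dx_dc)
    return xs, dxs


def loss_and_grad(damping, measured):
    """Mean squared position error and its first-order gradient."""
    xs, dxs = rollout_with_sensitivity(damping)
    n = len(xs)
    loss = sum((x - m) ** 2 for x, m in zip(xs, measured)) / n
    grad = sum(2.0 * (x - m) * dx for x, m, dx in zip(xs, measured, dxs)) / n
    return loss, grad


def fit_damping(measured, damping=0.2, lr=0.02, iters=1000):
    """Plain gradient descent on the simulation parameter."""
    for _ in range(iters):
        _, grad = loss_and_grad(damping, measured)
        damping -= lr * grad
    return damping


# Synthetic stand-in for hardware measurements (assumption: true damping 0.8).
measured, _ = rollout_with_sensitivity(0.8)
fitted = fit_damping(measured)
print(f"fitted damping: {fitted:.3f}")
```

In a differentiable simulation framework the sensitivities would typically come from automatic differentiation rather than hand-written update rules, and real trajectories would contain noise and unmodeled effects that this sketch ignores.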

Keywords: Deep Reinforcement Learning, Differentiable Simulation, Quadrupedal Locomotion Control, Sim2Real

Description
Related literature:

Xu, Jie, et al. "Accelerated policy learning with parallel differentiable simulation." arXiv preprint arXiv:2204.07137 (2022).

https://adaptive-horizon-actor-critic.github.io/

Rudin, Nikita, et al. "Learning to walk in minutes using massively parallel deep reinforcement learning." Conference on Robot Learning. PMLR, 2022.
Work Packages
Literature research

Train quadrupedal locomotion policies leveraging differentiable simulation

Modify the simulation, policy training, or the learning algorithm for better sim2real

Validate on real hardware
Requirements
Highly motivated student eager to test on real hardware

Familiar with basic concepts of RL and ML

Comfortable with Python, PyTorch, C++, and ROS stacks

Familiar with physics simulation and optimization for robotics
Contact Details
Victor Klemm (RSL), vklemm@ethz.ch

Ignat Georgiev (Georgia Tech), ignat@imgeorgiev.com
Student(s) Name(s)
Not specified
Project Report Abstract
Not specified

Calendar

Earliest start	No date
Latest end	No date

Location

Robotic Systems Lab (ETHZ)

Labels

Semester Project
Master Thesis
