Universal Humanoid Motion Representations for Expressive Learning-based Control
Recent advances in physically simulated humanoids have broadened their application spectrum, including animation, gaming, augmented and virtual reality (AR/VR), and robotics, with significant gains in both performance and practicality. Combining motion capture (MoCap) data with reinforcement learning (RL), these simulated humanoids can replicate extensive human motion datasets, execute complex animations, and follow intricate motion patterns from minimal sensor input. Nevertheless, generating such detailed and naturalistic motions requires meticulous curation of motion data and training new physics-based policies from scratch, a process that is not only labor-intensive but also fraught with challenges in reward design, dataset curation, and the choice of learning algorithm, any of which can result in unnatural motions.
To circumvent these challenges, researchers have explored latent spaces or skill embeddings derived from pre-trained motion controllers, facilitating their reuse in hierarchical RL frameworks. In this setup, a low-level policy is first trained, for example via motion imitation or adversarial learning, to induce a representation space of motor skills; a high-level policy then navigates this space, producing latent codes that correspond to specific motor actions. This approach promotes the reuse of learned motor skills and efficient sampling of the action space. However, its effectiveness is often limited by the scope of the latent space, which is traditionally built from specialized and relatively narrow motion datasets, restricting the range of achievable behaviors.
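The hierarchy described above can be sketched in a few lines. The network sizes, names, and the use of random weights as stand-ins for trained networks are illustrative assumptions, not the architecture of any cited paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    """Random-weight MLP, a stand-in for a trained policy network."""
    return [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for w in layers[:-1]:
        x = np.tanh(x @ w)
    return x @ layers[-1]

OBS_DIM, LATENT_DIM, ACT_DIM = 64, 32, 23  # illustrative dimensions

# Low-level policy: pre-trained (here frozen) decoder from latent skill codes to actions.
low_level = mlp([OBS_DIM + LATENT_DIM, 256, ACT_DIM])
# High-level policy: trained per downstream task, outputs a latent skill code.
high_level = mlp([OBS_DIM, 128, LATENT_DIM])

def act(obs):
    """High-level policy picks a latent code; the low-level policy decodes it to joint actions."""
    z = np.tanh(forward(high_level, obs))            # latent code in [-1, 1]^LATENT_DIM
    return forward(low_level, np.concatenate([obs, z]))
```

The key design point is that the high-level policy samples in the compact latent space rather than in the raw action or kinematic space, so only behaviors covered by the low-level training data are reachable.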
An alternative strategy employs the low-level controller as a motion imitator, with full-body kinematic motions serving as the high-level control signal. This approach is particularly prevalent in motion tracking applications, where supervised learning can be applied to paired input data such as video and kinematic motion. For generative tasks without paired data, RL becomes necessary, yet raw kinematic motion is a poor sampling space due to its high dimensionality and lack of physical constraints, so generative use requires a learned kinematic motion latent space. Moreover, purely kinematic control signals are limited for tasks requiring interaction with the environment or other agents, where an understanding of interaction dynamics is crucial.
In this project, we would like to extend the idea of a low-level controller serving as a motion imitator to full-body motions driven by real-time, expressive kinematic targets.
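As a concrete entry point for the motion representation learning work package, a minimal linear autoencoder over fixed-length motion windows can illustrate the idea. The window length, joint count, latent size, random stand-in data, and training details below are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
WINDOW, DOF, LATENT = 16, 23, 8  # illustrative: 16-frame windows, 23 DoF, 8-d latent

# Linear encoder/decoder trained by gradient descent on reconstruction error.
enc = rng.standard_normal((WINDOW * DOF, LATENT)) * 0.1
dec = rng.standard_normal((LATENT, WINDOW * DOF)) * 0.1

# Stand-in for flattened MoCap windows; real data would come from a motion dataset.
motions = rng.standard_normal((256, WINDOW * DOF))

loss0 = float(np.mean((motions @ enc @ dec - motions) ** 2))

lr = 1e-3
for _ in range(200):
    z = motions @ enc                 # encode windows to latent codes
    err = z @ dec - motions           # reconstruction error
    # Gradients of the mean squared reconstruction loss
    # (constant factors folded into the learning rate).
    grad_dec = z.T @ err / len(motions)
    grad_enc = motions.T @ (err @ dec.T) / len(motions)
    dec -= lr * grad_dec
    enc -= lr * grad_enc

loss = float(np.mean((motions @ enc @ dec - motions) ** 2))
```

A real pipeline would replace the linear maps with deeper networks (or periodic/phase-structured encoders as in the related literature below) and the random windows with curated MoCap segments; the point here is only the shape of the objective.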
**Work packages**
Literature research
Motion representation learning
**Requirements**
Strong programming skills in Python
Experience in reinforcement learning and imitation learning frameworks
Good understanding of autoencoder-based representation learning
**Related literature**
Peng, Xue Bin, et al. "DeepMimic: Example-guided deep reinforcement learning of physics-based character skills." ACM Transactions on Graphics (TOG) 37.4 (2018): 1-14.
Starke, Sebastian, et al. "DeepPhase: Periodic autoencoders for learning motion phase manifolds." ACM Transactions on Graphics (TOG) 41.4 (2022): 1-13.
Luo, Zhengyi, et al. "Perpetual humanoid control for real-time simulated avatars." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.
Luo, Zhengyi, et al. "Universal Humanoid Motion Representations for Physics-Based Control." arXiv preprint arXiv:2310.04582 (2023).
Li, Chenhao, et al. "FLD: Fourier Latent Dynamics for Structured Motion Representation and Learning." arXiv preprint arXiv:2402.13820 (2024).
Please include your CV and transcript in the submission.
**Chenhao Li**
https://breadli428.github.io/
chenhli@ethz.ch