A research team from the University of Hong Kong, Zhiyuan AGIBOT, Fudan University, and Shanghai Chuangzhi College has recently unveiled WholeBodyVLA, a Vision–Language–Action (VLA) framework for whole-body control of humanoid robots in real-world scenarios. Using the Zhiyuan Lingxi X2 as its hardware platform, the study extends VLA to whole-body control of a bipedal humanoid robot and demonstrates the framework's viability on a series of tasks that combine whole-body motion and manipulation.
