Xiaomi has officially announced the release and open-sourcing of its autonomous driving model, Xiaomi OneVL. The model is the first in the industry to integrate several technical approaches, including Vision-Language-Action (VLA), world models, and latent space reasoning, within a single framework. Building on the reasoning capabilities of the XLA model, Xiaomi OneVL further improves both reasoning speed and accuracy.
By unifying VLA and world models within a single, cohesive framework, Xiaomi OneVL has set new benchmarks for latent reasoning methods across a range of mainstream tests. Notably, it is the only implicit reasoning method to surpass explicit autoregressive Chain-of-Thought (CoT) approaches on all evaluated benchmarks. In addition, a variant equipped with an MLP regression head delivers low-latency inference, making it a practical and efficient option for real-time deployment in mass-produced vehicles.
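The appeal of an MLP regression head is that it emits a full planned trajectory in one forward pass, instead of decoding a reasoning chain token by token. The announcement does not describe Xiaomi's actual architecture; the sketch below is a minimal NumPy illustration of the general idea, with all dimensions and weights hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- illustrative only, not from the announcement.
D_MODEL = 64      # size of the fused scene feature from the backbone
N_WAYPOINTS = 6   # number of future trajectory points to regress
HIDDEN = 128

# MLP regression head: a single forward pass maps the scene feature
# directly to (x, y) waypoints, with no token-by-token decoding.
W1 = rng.normal(0, 0.02, (D_MODEL, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.02, (HIDDEN, N_WAYPOINTS * 2))
b2 = np.zeros(N_WAYPOINTS * 2)

def mlp_regression_head(scene_feature: np.ndarray) -> np.ndarray:
    """Map a fused scene feature to a planned trajectory in one pass."""
    h = np.maximum(scene_feature @ W1 + b1, 0.0)   # ReLU hidden layer
    return (h @ W2 + b2).reshape(N_WAYPOINTS, 2)   # one (x, y) per waypoint

trajectory = mlp_regression_head(rng.normal(size=D_MODEL))
print(trajectory.shape)  # six waypoints, each an (x, y) offset
```

Because latency here is a single matrix-multiply pipeline rather than N autoregressive decode steps, this style of head is what makes real-time, in-vehicle deployment plausible.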
Ablation experiments conducted during development show that compressing dynamic information from the physical world significantly improves overall performance. Xiaomi OneVL also offers interpretability in both the linguistic and visual dimensions, providing insight into its decision-making process.
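"Compressing dynamic information" can be read as summarizing how a scene changes over time into a compact latent code, in the spirit of a world model. The announcement gives no architectural details, so the following NumPy sketch is only one plausible, simplified interpretation: frame-to-frame feature differences isolate the motion signal, and a linear projection compresses it.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes -- illustrative only, not Xiaomi's architecture.
T, D_FRAME, D_LATENT = 8, 32, 4   # 8 frames, 32-dim features, 4-dim latent

# Linear projection that compresses the motion signal into a small latent.
P = rng.normal(0, 0.1, (D_FRAME, D_LATENT))

def compress_dynamics(frames: np.ndarray) -> np.ndarray:
    """Summarize per-frame features of shape (T, D_FRAME) into one latent."""
    deltas = np.diff(frames, axis=0)   # (T-1, D_FRAME): frame-to-frame change
    return deltas.mean(axis=0) @ P     # (D_LATENT,): compact dynamic code

latent = compress_dynamics(rng.normal(size=(T, D_FRAME)))
print(latent.shape)  # a 4-dim code summarizing the scene's dynamics
```

The downstream planner would then consume this small code instead of the full frame stack, which is one way such compression could both speed up reasoning and improve it by discarding static clutter.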
