On March 17, 2026, at the NVIDIA GTC 2026 conference, Li Auto took the wraps off its latest autonomous driving foundation model, MindVLA-o1. The model is positioned as a cornerstone for real-world autonomous driving intelligence, built on five key technological breakthroughs:

- 3D spatial comprehension: fuses camera and LiDAR data to accurately perceive the three-dimensional physical environment.
- Multimodal reasoning: uses a latent world model to anticipate future scenarios, improving the foresight of decision making.
- Unified behavior generation: adopts a VLA-MoE (vision-language-action mixture-of-experts) architecture to produce smooth, physically feasible driving trajectories.
- Closed-loop reinforcement learning: reduces training cost by running the learning loop against a world simulator.
- Hardware-software co-optimization: balances model accuracy against hardware latency, streamlining the architecture and improving deployment efficiency.
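The announcement does not describe how the VLA-MoE behavior-generation layer works internally. As a generic, hypothetical sketch of the mixture-of-experts idea (not Li Auto's actual implementation, and with all dimensions and names invented for illustration), a learned router scores a set of expert networks per token and blends the outputs of only the top-k experts:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoELayer:
    """Sparse mixture-of-experts: a router picks top-k experts per token.

    Hypothetical toy layer; expert count, dimensions, and routing rule
    are illustrative assumptions, not details from the announcement.
    """
    def __init__(self, dim, n_experts, k=2):
        self.k = k
        # Router weights: project each token to one score per expert.
        self.w_gate = rng.standard_normal((dim, n_experts)) * 0.02
        # Each expert is a simple linear map here.
        self.experts = [rng.standard_normal((dim, dim)) * 0.02
                        for _ in range(n_experts)]

    def __call__(self, x):
        # x: (tokens, dim)
        logits = x @ self.w_gate                        # (tokens, n_experts)
        topk = np.argsort(logits, axis=-1)[:, -self.k:]  # top-k expert indices
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            idx = topk[t]
            # Renormalize gate scores over the chosen experts only.
            weights = softmax(logits[t, idx])
            for w, e in zip(weights, idx):
                out[t] += w * (x[t] @ self.experts[e])
        return out

layer = MoELayer(dim=16, n_experts=8, k=2)
y = layer(rng.standard_normal((4, 16)))
print(y.shape)  # (4, 16)
```

Sparse routing of this kind is one common way to grow model capacity without a proportional increase in per-token compute, which is consistent with the announcement's emphasis on balancing accuracy against hardware latency.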
