SenseTime Releases Open-Source SenseNova-MARS: Shattering the Limits of Multimodal Search and Reasoning
Author: Editor

On January 29, 2026, SenseTime announced the official open-sourcing of its cutting-edge multimodal autonomous reasoning model, SenseNova-MARS. The model comes in two variants: an 8-billion-parameter (8B) version and a more powerful 32-billion-parameter (32B) version.

In rigorous benchmark tests for multimodal search and reasoning, such as MMSearch and HR-MMSearch, SenseNova-MARS achieved an impressive score of 69.74. This performance not only outshines that of Gemini-3-Pro (69.06) and GPT-5.2 (67.64) but also positions it as the state-of-the-art (SOTA) among all open-source models.

What sets SenseNova-MARS apart is its status as the first Agentic Vision-Language Model (VLM) to deeply integrate dynamic visual reasoning with image-text search. It can autonomously plan a sequence of steps and invoke external tools, which enables it to tackle complex tasks that demand multi-step reasoning and coordinated tool use.

Take, for instance, a task that involves identifying a tiny logo on a racing suit, then querying the founding year of the corresponding company, matching it with the driver's birth date, and finally calculating the difference between the two dates. SenseNova-MARS can seamlessly and autonomously invoke tools for image cropping, image search, and text search, completing the entire closed-loop solution without any human intervention.
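To make this workflow concrete, here is a minimal sketch of what such an agentic tool-calling loop can look like. The tool names (crop_image, image_search, text_search), the controller interface, and the stopping convention are illustrative assumptions; the article does not disclose SenseNova-MARS's actual tool API.

```python
# Illustrative sketch of an agentic tool-use loop (not SenseNova-MARS's actual API).
# The model proposes one tool call at a time; the controller executes it and feeds
# the observation back until the model emits a final answer.

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolCall:
    name: str        # e.g. "crop_image", "image_search", "text_search" (hypothetical names)
    arguments: dict  # tool-specific arguments proposed by the model

# Hypothetical tool implementations registered with the controller.
TOOLS: dict[str, Callable[..., str]] = {
    "crop_image": lambda image, box: f"<cropped region {box} of {image}>",
    "image_search": lambda image: "<entities matching the cropped logo>",
    "text_search": lambda query: f"<web results for: {query}>",
}

def run_agent(model_step: Callable[[list], tuple[Optional[ToolCall], Optional[str]]],
              task: str, max_steps: int = 8) -> str:
    """Drive the model until it returns a final answer or the step budget runs out."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        tool_call, answer = model_step(history)  # the VLM plans the next action
        if answer is not None:                   # the model decided it is done
            return answer
        observation = TOOLS[tool_call.name](**tool_call.arguments)
        history.append({"role": "tool", "name": tool_call.name, "content": observation})
    return "No answer within step budget."
```

In the racing-suit example, a loop of this kind would crop the logo, run an image search to identify the company, run text searches for the founding year and the driver's birth date, and then compute the difference.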

The model's remarkable capabilities stem from a two-stage training mechanism. In the first stage, an automated multimodal-agent data synthesis engine constructs highly complex multi-hop reasoning chains, ensuring the model is exposed to a wide range of challenging scenarios during training. The second stage applies reinforcement learning with the BN-GSPO algorithm to maintain training stability and convergence, further refining the model's performance.
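The article names BN-GSPO but does not define it. Assuming it resembles a group sequence policy optimization objective with batch-level normalization of group-relative advantages, a minimal sketch of the core computations might look like the following; every detail here is an assumption rather than SenseTime's published method.

```python
# Hypothetical sketch of a group-normalized RL objective, assuming BN-GSPO behaves
# like group sequence policy optimization with batch-level advantage normalization.
# The article does not specify the algorithm's actual formulation.

import torch

def group_normalized_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: [num_prompts, group_size], one scalar reward per sampled rollout.

    Each rollout's advantage is its reward minus the mean of its group (rollouts that
    share a prompt), then scaled by a batch-level standard deviation so advantages
    stay comparable across prompts of very different difficulty.
    """
    group_mean = rewards.mean(dim=1, keepdim=True)   # per-prompt baseline
    centered = rewards - group_mean                  # group-relative advantage
    batch_std = centered.std().clamp_min(eps)        # batch-level normalization ("BN")
    return centered / batch_std

def clipped_sequence_loss(seq_logprob_new: torch.Tensor,
                          seq_logprob_old: torch.Tensor,
                          advantages: torch.Tensor,
                          clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate applied at the sequence level (all tensors same shape)."""
    ratio = torch.exp(seq_logprob_new - seq_logprob_old)  # sequence importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```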

Now, the entire SenseNova-MARS package is available to the global community. The model weights, training code, and synthetic dataset have all been open-sourced. Researchers, developers, and enthusiasts can easily access and download these resources directly from the Hugging Face platform, fostering further innovation and collaboration in the field of multimodal AI.
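For readers who want to try the release, fetching the weights typically takes a single call to the huggingface_hub library. The repository id below is a placeholder assumption; the actual model names should be taken from SenseTime's Hugging Face page.

```python
# Hypothetical example of fetching the open-sourced weights from Hugging Face.
# The repo_id below is a placeholder (assumption), not a confirmed repository name.

from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="SenseTime/SenseNova-MARS-8B",  # placeholder repo id
)
print(f"Model files downloaded to {local_dir}")
```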