On January 29, 2026, SenseTime open-sourced its multimodal autonomous reasoning model, SenseNova-MARS, in two variants with 8 billion (8B) and 32 billion (32B) parameters. On core multimodal search and reasoning benchmarks, SenseNova-MARS averaged 69.74, ahead of Gemini-3-Pro (69.06) and GPT-5.2 (67.64). It is billed as the world's first agentic Vision-Language Model (VLM) to integrate dynamic visual reasoning with image-text search, able to autonomously plan step by step, call tools, and handle complex tasks.
On benchmarks such as MMSearch and HR-MMSearch, SenseNova-MARS achieves state-of-the-art (SOTA) results among open-source models across the two key areas of search reasoning and visual understanding. The model can crop images, search by image, and search by text, which lets it zero in on fine details, quickly match relevant information, and retrieve precise data.
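To make the agentic workflow concrete, the sketch below shows what a tool loop built around those three capabilities could look like. It is a minimal illustration in Python, not SenseTime's actual interface: the tool names, their signatures, and the run_agent_step helper are hypothetical, and the two search functions are stubs you would wire to a real backend.

```python
from dataclasses import dataclass

from PIL import Image


def crop_image(image: Image.Image, box: tuple) -> Image.Image:
    """Crop a region of interest so fine details can be inspected up close."""
    return image.crop(box)  # box = (left, upper, right, lower)


def image_search(image: Image.Image, top_k: int = 5) -> list:
    """Stand-in for a reverse image search backend; returns matching snippets."""
    raise NotImplementedError("connect this to your image search service")


def text_search(query: str, top_k: int = 5) -> list:
    """Stand-in for a web/text search backend; returns text snippets."""
    raise NotImplementedError("connect this to your text search service")


TOOLS = {
    "crop_image": crop_image,
    "image_search": image_search,
    "text_search": text_search,
}


@dataclass
class ToolCall:
    name: str
    arguments: dict


def run_agent_step(decide, history: list) -> list:
    """One turn of the loop: `decide` (the model) returns either a final answer
    string or a ToolCall; tool results are appended to the history so the model
    can reason over them on the next turn."""
    action = decide(history)
    if isinstance(action, str):  # final answer, stop calling tools
        history.append({"role": "assistant", "content": action})
    else:  # execute the requested tool and feed the result back
        result = TOOLS[action.name](**action.arguments)
        history.append({"role": "tool", "name": action.name, "content": result})
    return history
```

In practice the decide callback would be the VLM itself, emitting either a tool call or a final answer at every turn.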
SenseNova-MARS is trained in two stages. First, an automated data synthesis engine constructs complex multi-hop reasoning chains. Second, the model is trained with reinforcement learning, using the BN-GSPO algorithm to keep training stable and convergent. SenseTime has open-sourced not only the model weights and training code but also the synthetic datasets, all available for download from the Hugging Face platform.
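For readers who want to try the released checkpoints, the snippet below is a minimal loading sketch that assumes the weights follow the standard Hugging Face Transformers layout. The repository id is a placeholder, not a verified name, so check SenseTime's organization page on Hugging Face for the actual model ids.

```python
from transformers import AutoProcessor, AutoModelForVision2Seq

# Placeholder repo id: the published name on Hugging Face may differ.
REPO_ID = "SenseTime/SenseNova-MARS-8B"

processor = AutoProcessor.from_pretrained(REPO_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    REPO_ID,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # shard across available GPUs if needed
    trust_remote_code=True,  # custom VLM code, if the repo ships any
)
```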
