According to HuggingFace trending data as of February 22, the Nanbeige4.1-3B model, open-sourced by Boss Zhipin's Nanbeige Lab, ranks in the top three of the overall model trending list and first on the text-model trending list. Despite having only 3 billion parameters, the model shows strong cross-task generalization, performing well across core tasks including general question answering, complex reasoning, code generation, and deep search.
The model's core breakthrough is the systematic integration of strong reasoning, human-preference alignment, and deep-search agent capabilities into a 3B-parameter model. Thanks to a carefully designed training regimen, it outperforms models with roughly ten times as many parameters, such as Qwen3-32B, on multiple benchmarks.
Key technical highlights include:
General Capabilities: full-pipeline optimization of SFT data construction and RL training upgrades the instruction recipe, extends the context window to 256K tokens, and improves response quality (a minimal loading sketch follows this list).
Preference Alignment: a two-stage regimen of point-wise RL followed by pair-wise RL markedly reduces incorrect responses (the standard loss formulations are sketched after this list).
Deep Search Capabilities: large-scale, complex multi-hop datasets plus a turn-level judging mechanism enable long-context multi-hop reasoning (see the turn-level reward sketch below).
Code Generation: a multi-stage training strategy drives generated code toward both functional correctness and low time complexity (an illustrative reward sketch closes this section).
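
For readers who want to try the model, the snippet below shows a standard Hugging Face transformers loading path. The repo id Nanbeige/Nanbeige4.1-3B is an assumption; check the Nanbeige Lab organization page for the published name.

```python
# Minimal usage sketch. The repo id is an assumption, not confirmed by the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Nanbeige/Nanbeige4.1-3B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",
    trust_remote_code=True,  # may be unnecessary if the architecture is standard
)

messages = [{"role": "user", "content": "Summarize the idea of multi-hop reasoning."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```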
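
The article does not spell out the two alignment objectives, but point-wise and pair-wise reward training are usually formulated as independent scoring and Bradley-Terry ranking, respectively. A minimal PyTorch sketch of those standard losses, not of Nanbeige Lab's exact recipe:

```python
# Standard formulations of point-wise (BCE scoring) and pair-wise (Bradley-Terry
# ranking) reward losses; the lab's actual objectives are not published here.
import torch
import torch.nn.functional as F

def pointwise_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Score each response independently: 1 = acceptable, 0 = incorrect."""
    return F.binary_cross_entropy_with_logits(scores, labels)

def pairwise_loss(chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry ranking: push the preferred response above the other."""
    return -F.logsigmoid(chosen - rejected).mean()

scores = torch.tensor([1.2, -0.4])   # reward-model logits, one per response
labels = torch.tensor([1.0, 0.0])    # human correctness judgments
print(pointwise_loss(scores, labels))
print(pairwise_loss(torch.tensor([1.2]), torch.tensor([-0.4])))
```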
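
A turn-level judgment mechanism scores each step of a multi-turn search rollout rather than only the final answer. The sketch below assumes hypothetical run_search_turn and judge_turn helpers; it illustrates the shape of the idea, not Nanbeige's implementation.

```python
# Hedged sketch: reward every intermediate turn (query, retrieved evidence,
# reasoning step) of a search agent. `run_search_turn` and `judge_turn` are
# hypothetical stand-ins, not Nanbeige APIs.
from typing import Callable, List, Tuple

def rollout_with_turn_rewards(
    question: str,
    run_search_turn: Callable[[str, List[str]], Tuple[str, bool]],
    judge_turn: Callable[[str, str], float],
    max_turns: int = 8,
) -> Tuple[List[str], List[float]]:
    history: List[str] = []
    rewards: List[float] = []
    for _ in range(max_turns):
        step, done = run_search_turn(question, history)  # agent acts for one turn
        rewards.append(judge_turn(question, step))       # judge scores this turn
        history.append(step)
        if done:
            break
    return history, rewards
```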
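
Similarly, a reward that trades off test-pass rate against runtime is one plausible way to encourage correct and fast code; the weighting and the test harness below are illustrative assumptions, not the article's stated method.

```python
# Illustrative code-generation reward: fraction of tests passed, discounted by
# how much of a time budget the run consumes. Weights are arbitrary choices.
import time
from typing import Callable, List

def code_reward(candidate: str, tests: List[Callable[[str], bool]],
                time_budget_s: float = 1.0) -> float:
    start = time.perf_counter()
    passed = sum(1 for t in tests if t(candidate))   # each test runs the candidate
    elapsed = time.perf_counter() - start
    correctness = passed / len(tests)                # fraction of tests passed
    speed = max(0.0, 1.0 - elapsed / time_budget_s)  # faster code scores higher
    return 0.8 * correctness + 0.2 * speed           # illustrative weighting
```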
