The Team Led by Peng Yuxin of Peking University Achieves Series of Breakthroughs in Fine-Grained Multimodal Large Models
2026-01-19
Author: Editor

Multimodal large models handle general tasks well but fall short in fine-grained perception. Balancing open-domain generalization with fine-grained perception is essential for moving large models beyond chat assistants and into real-world applications such as autonomous driving, embodied intelligence, medical imaging, and industrial manufacturing. To address this challenge, the team led by Professor Peng Yuxin at the Wang Xuan Institute of Computer Technology, Peking University, has recently made significant progress. Its results include developing and open-sourcing Finedefics, the first fine-grained multimodal large model, and publishing the first review paper on the topic.