xbench and UniPat Jointly Launch a New Evaluation Dataset: BabyVision
Author: Editor

On January 12, 2026, xbench, a project under Sequoia China, and the UniPat AI team jointly released the BabyVision multimodal understanding evaluation dataset. The dataset is designed to evaluate the purely visual foundational abilities of large models, independent of language-based prompts. The findings revealed a notable performance gap: mainstream large models still trail the visual capabilities of three-year-old children.

The BabyVision evaluation dataset systematically categorizes visual capabilities into four primary domains: fine discrimination, visual tracking, spatial perception, and visual pattern recognition. In total, it encompasses 22 subtasks and 388 individual questions, providing a comprehensive framework for assessing visual understanding.
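For readers who want a concrete picture of how such a taxonomy could be organized for scoring, the following is a minimal Python sketch. The class names, fields, and the per-domain accuracy helper are illustrative assumptions based only on the figures reported above (four domains, 22 subtasks, 388 questions); they are not the official BabyVision schema or tooling.

```python
# Hypothetical representation of the BabyVision taxonomy for scoring.
# Domain names and counts come from the article; the schema itself is an assumption.
from dataclasses import dataclass, field

DOMAINS = (
    "fine discrimination",
    "visual tracking",
    "spatial perception",
    "visual pattern recognition",
)

@dataclass
class Question:
    question_id: str
    domain: str       # one of the four primary domains
    subtask: str      # one of the 22 subtasks
    image_path: str
    answer: str

@dataclass
class BabyVisionBenchmark:
    questions: list = field(default_factory=list)  # 388 questions in total

    def accuracy_by_domain(self, predictions: dict) -> dict:
        """Per-domain accuracy given a mapping of question_id -> predicted answer."""
        correct, total = {}, {}
        for q in self.questions:
            total[q.domain] = total.get(q.domain, 0) + 1
            if predictions.get(q.question_id) == q.answer:
                correct[q.domain] = correct.get(q.domain, 0) + 1
        return {d: correct.get(d, 0) / total[d] for d in total}
```

Under this sketch, a model's predictions would be collected as a dictionary keyed by question ID and passed to `accuracy_by_domain` to obtain a score for each of the four capability domains.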