The technical blog on Tencent Hunyuan's official website has launched, presenting research developed jointly by the Tencent Hunyuan team and Fudan University. It is the first research result released since Yao Shunyu was appointed Tencent's Chief AI Scientist. Large language models are already strong problem solvers, but in real-world use they rely mainly on 'parametric knowledge', the knowledge stored in their weights during training. Humans, by contrast, learn in real time from the context in front of them, which leaves a gap between how models are trained and how they are actually used. To measure how far current models are from being true 'Context Learners,' Yao Shunyu's team built the CL-bench benchmark. It covers four real-world context learning scenarios, is designed to be contamination-free, and features high complexity, sequential dependency, and verifiable tasks. The team used CL-bench to evaluate ten frontier language models and found an average task solve rate of just 17.2%; the best performer, GPT-5.1 (High), reached only 23.7%. The results suggest that current large language models are still largely ineffective at learning from context.
