On September 25, 2025, Samsung Electronics introduced TRUEBench, a benchmarking platform developed by Samsung Research to evaluate the productivity of artificial intelligence. TRUEBench offers an extensive set of metrics for assessing how well large language models perform in real-world workplace productivity scenarios.
To keep the assessment realistic, TRUEBench incorporates a wide range of conversational situations and multilingual environments. It draws on Samsung's in-house AI productivity applications to evaluate typical enterprise tasks. These tasks are organized into 10 categories and 46 subcategories, including content creation, data analysis, summarization, and translation.
The benchmark is built on criteria tailored for human-machine collaboration, and it relies on AI-powered automated evaluation to keep scoring reliable. TRUEBench comprises 2,485 test sets spanning 10 categories and 12 languages, and it also supports cross-lingual scenarios. Test set lengths range from as few as 8 characters to more than 20,000 characters.
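To make the shape of such a benchmark concrete, the following is a minimal Python sketch of how one test item might be represented and how a collection of items could be tallied by category, cross-lingual status, and prompt length. The `BenchItem` class, its field names, and the `summarize` helper are assumptions made for illustration; they are not taken from the published TRUEBench data format, and the sample items are placeholders rather than real benchmark content.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchItem:
    # Hypothetical fields; the published TRUEBench schema may differ.
    category: str      # one of the top-level categories
    subcategory: str   # one of the subcategories
    input_lang: str    # language of the prompt
    output_lang: str   # language expected in the response
    prompt: str

    @property
    def is_cross_lingual(self) -> bool:
        # An item is cross-lingual when prompt and response languages differ.
        return self.input_lang != self.output_lang


def summarize(items):
    """Tally items per category and report the prompt-length range."""
    lengths = [len(item.prompt) for item in items]
    return {
        "total": len(items),
        "per_category": dict(Counter(item.category for item in items)),
        "cross_lingual": sum(item.is_cross_lingual for item in items),
        "min_chars": min(lengths),
        "max_chars": max(lengths),
    }


# Placeholder items for illustration only, not real TRUEBench data.
SAMPLE_ITEMS = [
    BenchItem("Summarization", "Meeting notes", "en", "en",
              "Summarize the notes below."),
    BenchItem("Translation", "Email", "ko", "en", "Translate"),
    BenchItem("Translation", "Report", "en", "en",
              "Rewrite this report in plain English."),
]

summary = summarize(SAMPLE_ITEMS)
print(summary)
```

Applied to the real data samples, a summary of this kind would be expected to reflect the figures quoted above (2,485 items, 10 categories, 12 languages), provided the actual fields are mapped onto the hypothetical ones used here.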
TRUEBench's data samples and leaderboard are openly accessible on the Hugging Face platform, where users can compare the performance of up to five models at once and view average response time data alongside the results.