May 26: As foundation models advance rapidly and AI agents see widespread adoption, benchmarks for assessing AI capabilities face a significant hurdle: they struggle to reflect the true potential of AI systems. The cause is that foundation models perform remarkably well on the publicly available benchmark question banks, often achieving high or even perfect scores.