Recent research conducted by the Oxford Internet Institute finds that current methods for assessing the capabilities of artificial intelligence (AI) systems often lack scientific rigor, and that this leads to AI performance being overestimated. The study, carried out in partnership with more than thirty academics, examined 445 widely recognized AI benchmark tests. Developers and researchers commonly rely on these tests to measure model performance and to support claims of technological progress. The study argues, however, that the credibility of these benchmarks is doubtful and calls for their validity to be reassessed.
