According to an official announcement on May 8, the Beijing Academy of Artificial Intelligence (BAAI) released CCI 4.0, a large-scale open-source text dataset, at the GOSIM Global Open Source Innovation Forum in Paris, France, on May 6. The dataset covers Chinese and English, and versions in additional languages are planned. BAAI led the release, with contributions from Alibaba Cloud, Shanghai AI Laboratory, Huawei, Mobvoi, Kingsoft Office, Kunlun Tech, and other institutions. CCI 4.0 is expected to support broader use of text data in artificial intelligence research and applications.
