Tibet's AI Progress: From Scenario-Specific Applications to the Development of a Tibetan Large Language Model
2025-11-30
Author: Staff editor

According to China News Service, "Yangguang Qingyan" V1.0, a foundational large language model for Tibetan with hundreds of billions of parameters, was recently released to the public. On the 30th, Ni Ma Zhaxi, an academician of the Chinese Academy of Engineering and a professor at Tibet University, said that Tibet has made substantial progress in developing Tibetan large language models. This progress marks a shift from AI applications built for specific scenarios to systematic research and development in the field.

A self-developed milestone for AI in Tibet, "Yangguang Qingyan" V1.0 was trained on roughly 28.8 billion tokens of high-quality Tibetan-language data. The dataset covers domains including news, law, medicine, education, and technology, and it includes Tibetan monolingual text, multilingual parallel corpora, and bilingual dictionary entries.