China's Cyberspace Administration: Service Providers Must Bolster Training Data Management During Pre-training and Optimization
6 days ago
Author: Editor

China's Cyberspace Administration has opened a public comment period on the 'Interim Measures for the Administration of AI Personified Interactive Services (Draft for Comment)'. The draft requires service providers to strengthen their management of training data when carrying out data processing activities such as pre-training and optimization. Specifically, providers must:

  • Use datasets that align with core socialist values and reflect the richness of traditional Chinese culture.
  • Clean and label training data to improve its transparency and reliability, and guard against data poisoning and tampering.
  • Diversify training data sources, and improve the safety of model-generated content through techniques such as negative sampling and adversarial training.
  • Evaluate the safety of synthetic data used for model training and for optimizing key capabilities.
  • Inspect training data routinely, and iterate and upgrade it regularly to keep improving product and service performance.
  • Ensure that training data sources are lawful and traceable, and take measures to safeguard data security and guard against leakage risks.