A research team from Google, Carnegie Mellon University, and MultiOn has investigated the use of synthetic data for training large models. They introduce an approach that combines positive and negative data to strengthen the mathematical reasoning abilities of these models. Their findings show that pre-training on synthetic data improves the reasoning performance of large models by up to eightfold, offering an efficient and timely answer to the growing demand for training data.
