In the realm of training large-scale models, reinforcement learning algorithms hold paramount importance for enhancing performance. However, these algorithms grapple with significant challenges, including steep demands for computational resources and sluggish training speeds, which render them financially infeasible for average enterprises and institutions. Addressing these issues necessitates the optimization of algorithms, refinement of model structures, augmentation of parallelization, and judicious allocation of computational resources.