Meta has introduced LlamaRL, a reinforcement learning framework built on a fully asynchronous distributed architecture that substantially improves training efficiency for large models. On a 405-billion-parameter model, each reinforcement learning step now takes 59.5 seconds instead of 635.8, a speedup of roughly 10.7x. Through its modular design and efficient data-transmission scheme, the framework addresses common bottlenecks such as excessive memory usage and poor GPU utilization, offering a scalable foundation for training even larger models.
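The core idea behind a fully asynchronous design is that rollout generation and policy training run concurrently and hand off data through a buffer, so neither side idles waiting for the other. The sketch below illustrates that producer-consumer decoupling in plain Python; all names here are illustrative and do not come from LlamaRL's actual codebase.

```python
import queue
import threading

def generator(rollouts: queue.Queue, num_rollouts: int) -> None:
    """Simulates rollout workers producing trajectories independently
    of the trainer's pace."""
    for step in range(num_rollouts):
        trajectory = {"step": step, "reward": float(step % 3)}
        rollouts.put(trajectory)
    rollouts.put(None)  # sentinel: no more rollouts

def trainer(rollouts: queue.Queue, results: list) -> None:
    """Consumes trajectories as they arrive rather than waiting for a
    synchronized batch from all workers -- the essence of the
    asynchronous design."""
    while True:
        trajectory = rollouts.get()
        if trajectory is None:
            break
        results.append(trajectory["reward"])

# Bounded queue caps in-flight data, limiting memory growth.
rollouts: queue.Queue = queue.Queue(maxsize=8)
results: list = []
gen = threading.Thread(target=generator, args=(rollouts, 10))
train = threading.Thread(target=trainer, args=(rollouts, results))
gen.start(); train.start()
gen.join(); train.join()
print(len(results))
```

In a real distributed setup the queue would be replaced by high-throughput inter-GPU or inter-node transfers, but the scheduling principle is the same: bounded buffering plus independent loops keeps every device busy.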