Large models have attracted significant attention for their remarkable performance across fields such as computer vision and natural language processing. However, training these models is severely constrained by GPU memory capacity. To address this challenge, Tang Yu, Li Dongsheng, and their team at the National University of Defense Technology have systematically investigated techniques for training large language models under limited GPU memory. Their paper compiles a series of optimization strategies that address this constraint.
