NVIDIA and MIT Partner to Unveil Fast-dLLM Framework, Boosting Diffusion LLM Inference Speed by Up to 27.6x

NVIDIA, in partnership with MIT and the University of Hong Kong, has released Fast-dLLM, a framework that accelerates inference for diffusion-based large language models by up to 27.6 times. By combining a block-wise approximate KV caching mechanism with a confidence-aware parallel decoding strategy, Fast-dLLM cuts redundant computation and avoids the dependency conflicts that arise when many tokens are decoded in a single step. Across a range of benchmark tests, it delivers this acceleration while keeping generation quality close to the baseline, clearing a practical hurdle for deploying diffusion models in production.
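To make the confidence-aware parallel decoding idea concrete, here is a minimal Python sketch. It is not the paper's implementation: the mask token id, threshold value, toy model, and single-token fallback rule are all illustrative assumptions. The core idea it demonstrates is that, at each denoising step, every masked position whose top prediction clears a confidence threshold is committed in parallel, while low-confidence positions stay masked for later steps.

```python
import numpy as np

MASK = -1   # hypothetical mask-token id (assumption, not the paper's value)
TAU = 0.9   # confidence threshold; illustrative, the paper tunes this

def toy_model(tokens, vocab_size=100, rng=np.random.default_rng(0)):
    """Stand-in for a diffusion LLM forward pass: returns a probability
    distribution over the vocabulary for every position."""
    logits = rng.normal(size=(len(tokens), vocab_size))
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

def confidence_aware_step(tokens, probs, tau=TAU):
    """Commit, in parallel, every masked position whose top prediction
    clears the confidence threshold; leave the rest masked."""
    committed = 0
    for i, t in enumerate(tokens):
        if t != MASK:
            continue
        best = probs[i].argmax()
        if probs[i, best] >= tau:
            tokens[i] = int(best)
            committed += 1
    # Guarantee progress: if nothing cleared the bar, commit only the
    # single most confident masked position (a fallback assumed here).
    if committed == 0:
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if masked:
            i = max(masked, key=lambda j: probs[j].max())
            tokens[i] = int(probs[i].argmax())
    return tokens

def decode_block(block_len=16, max_steps=32):
    """Iteratively unmask one block of tokens; in the full framework,
    finished blocks would feed the approximate KV cache so later blocks
    skip recomputing their attention states."""
    tokens = [MASK] * block_len
    for _ in range(max_steps):
        if MASK not in tokens:
            break
        probs = toy_model(tokens)
        tokens = confidence_aware_step(tokens, probs)
    return tokens

print(decode_block())
```

The threshold trades speed for safety: a high tau commits fewer tokens per step but reduces the risk of conflicting parallel decisions, which is what lets the method keep quality close to the sequential baseline.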