NVIDIA and the University of Maryland Jointly Release Audio Flamingo Next, an Open-Source Long-Audio Understanding Model
1 week ago
Author: Editor

According to Marktechpost, a research team from NVIDIA and the University of Maryland has released Audio Flamingo Next (AF-Next). The team describes it as the most capable open-source large audio language model in the Audio Flamingo series, designed specifically for long-audio understanding and complex reasoning.

AF-Next is built on Qwen-2.5-7B. It accepts audio inputs of up to 30 minutes and has a 128k-token context window. The team introduces a technique it calls "Temporal Audio Chain-of-Thought," which it reports substantially improves the model's ability to aggregate evidence across a recording and its accuracy on long-audio tasks.

The open-source release includes three variants, each optimized for a different task:
AF-Next-Instruct, for general question answering;
AF-Next-Think, for multi-step reasoning;
AF-Next-Captioner, for audio captioning.

According to the reported results, AF-Next substantially outperforms open-source models of comparable size across 20 benchmarks and surpasses Gemini 2.5 Pro on challenging benchmarks such as MMAU-Pro. The team cites these results as evidence of strong generalization and practical value.