On December 29, 2025, the multimodal interaction team at Tongyi Lab open-sourced MAI-UI, a universal GUI agent foundation model. MAI-UI is the first model to integrate three key capabilities (user interaction, MCP tool invocation, and edge-cloud collaboration) into a single architecture, targeting fundamental challenges such as cross-application operation and the interpretation of ambiguous instructions. The model can proactively ask clarifying questions when an instruction is ambiguous, prioritizes structured tool invocation, and protects user privacy through edge-cloud collaboration.

MAI-UI ranks first across five authoritative benchmarks, including ScreenSpot-Pro and AndroidWorld, outperforming mainstream models such as Gemini-3-Pro and UI-Tars-2. To evaluate agent capabilities under more realistic conditions, the team also introduced MobileWorld, a benchmark built around real-world mobile scenarios, on which MAI-UI achieves a 41.7% success rate; its largest variant reaches a 76.7% success rate on AndroidWorld. Both the code and the paper have been released as open source.
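
To make the three-capability design concrete, the sketch below shows how such an agent decision loop might look. This is a minimal illustration under stated assumptions, not the actual MAI-UI implementation or API: every name in it (`Observation`, `plan_step`, `route_model`, and so on) is hypothetical, and the routing and clarification policies are simplified stand-ins for what the release describes.

```python
"""Illustrative sketch of a GUI-agent decision loop in the spirit of MAI-UI.

All names below are hypothetical and do NOT reflect the real MAI-UI API;
this only mirrors the three capabilities the release describes: clarifying
questions, structured MCP tool calls, and edge-cloud routing.
"""
from dataclasses import dataclass


@dataclass
class Observation:
    instruction: str          # the user's (possibly ambiguous) request
    screen_text: str          # OCR/accessibility dump of the current screen
    contains_sensitive: bool  # e.g. password or payment fields detected


@dataclass
class AgentAction:
    kind: str     # "ask_user" | "mcp_tool" | "gui_action"
    payload: dict


def route_model(obs: Observation) -> str:
    # Edge-cloud collaboration (hypothetical policy): keep screens with
    # sensitive content on the on-device model, send the rest to the cloud.
    return "edge" if obs.contains_sensitive else "cloud"


def plan_step(obs: Observation, tools: dict) -> AgentAction:
    model = route_model(obs)
    # 1) Proactive clarification: ask the user instead of guessing
    #    when the instruction is empty or vague.
    if obs.instruction.strip() == "" or "something" in obs.instruction:
        return AgentAction("ask_user", {"question": "Which app or item did you mean?"})
    # 2) Prefer structured MCP tool calls when a registered tool matches.
    for name in tools:
        if name in obs.instruction.lower():
            return AgentAction("mcp_tool", {"tool": name, "model": model})
    # 3) Fall back to a grounded GUI action on the current screen.
    return AgentAction("gui_action", {"op": "tap", "target": obs.screen_text[:30], "model": model})


if __name__ == "__main__":
    obs = Observation("add a calendar event for Friday", "Calendar  +  Today", False)
    print(plan_step(obs, tools={"calendar": None}))
    # AgentAction(kind='mcp_tool', payload={'tool': 'calendar', 'model': 'cloud'})
```

The routing heuristic is the notable design point in this sketch: by keeping screens with sensitive fields on the on-device model and sending everything else to the cloud, such an agent trades some model capability for privacy on exactly the steps where privacy matters.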
