AI startup Inception Labs has officially launched Mercury 2, a large reasoning model built on a diffusion architecture. Rather than generating text one token at a time, the model processes multiple text segments in parallel, enabling efficient reasoning. Running on NVIDIA Blackwell GPUs, Mercury 2 achieves an end-to-end latency of just 1.7 seconds, outpacing both Gemini 3 Flash and Claude Haiku 4.5 while matching the generation quality of leading high-speed models.

Mercury 2 is priced at $0.25 per million input tokens and $0.75 per million output tokens, and supports a 128K context window, tool invocation, and JSON output formatting, making it well suited to low-latency applications such as voice assistants and coding tools. Early access is now available.
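
To illustrate how the JSON output formatting might be exercised in practice, here is a minimal sketch using the OpenAI Python SDK against an OpenAI-compatible endpoint. The base URL, the model identifier `mercury-2`, and support for the `response_format` parameter are all assumptions for illustration, not details confirmed in the announcement:

```python
# Minimal sketch of requesting structured JSON output from Mercury 2.
# Assumptions (not confirmed by the announcement): an OpenAI-compatible
# chat-completions endpoint, the model id "mercury-2", and the base URL below.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumed endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-2",  # assumed model identifier
    response_format={"type": "json_object"},  # JSON output formatting
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'answer' and 'reason'."},
        {"role": "user", "content": "Is 7919 a prime number?"},
    ],
)

print(response.choices[0].message.content)
```

Under the same assumptions, tool invocation would follow the standard chat-completions pattern via a `tools` parameter, with the model returning structured tool calls for the client to execute.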
