ElevenLabs Unveils Scribe v2 Realtime: Achieving Ultra-Low Latency in Speech Transcription
2026-01-15
Author: Editor

As reported by Quasa, ElevenLabs has launched Scribe v2 Realtime, a speech recognition model built for real-time applications. It supports more than 90 languages and achieves end-to-end latency of 30–150 milliseconds, which suits it to latency-sensitive uses such as AI agents, live captioning, real-time translation, and call-center operations.
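
Whatever the model's own latency, a client that streams audio must buffer each chunk before sending it, so every chunk adds up to one chunk duration of delay on top of the model's figure. The sketch below computes chunk sizes for raw audio. The helper names are hypothetical, and the 16 kHz, 16-bit mono defaults are illustrative assumptions, not documented Scribe v2 Realtime input requirements.

```python
# Hypothetical helpers for sizing raw-audio chunks before streaming them.
# The 16 kHz / 16-bit / mono defaults are illustrative assumptions only.


def chunk_bytes(duration_ms: int, sample_rate_hz: int = 16_000,
                bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Return how many bytes of raw audio cover `duration_ms` milliseconds."""
    samples_per_channel = sample_rate_hz * duration_ms // 1000
    return samples_per_channel * bytes_per_sample * channels


def split_into_chunks(audio: bytes, duration_ms: int, **fmt: int) -> list[bytes]:
    """Cut a raw audio buffer into fixed-duration chunks (the last may be shorter)."""
    size = chunk_bytes(duration_ms, **fmt)
    if size <= 0:
        raise ValueError("chunk duration too short for this audio format")
    return [audio[i:i + size] for i in range(0, len(audio), size)]


if __name__ == "__main__":
    print(chunk_bytes(100))  # 3200: 100 ms of 16 kHz, 16-bit mono PCM
    print(chunk_bytes(20, sample_rate_hz=8_000, bytes_per_sample=1))  # 160: 20 ms of 8 kHz mu-law
```

For example, 100 ms of 16 kHz 16-bit mono audio is 3,200 bytes, while a 20 ms frame of 8 kHz μ-law telephony audio is 160 bytes. Smaller chunks reduce buffering delay at the cost of sending more messages per second.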

Scribe v2 Realtime uses a streaming-first architecture and accepts several audio formats, including PCM and μ-law. Its features include predictive transcription, voice activity detection, contextual memory, and accurate recognition of complex terminology. On the FLEURS multilingual benchmark, the model reached 93.5% accuracy, ahead of competing models such as Google's Gemini 2.5 Flash and OpenAI's GPT-4o Mini.
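
μ-law is the G.711 companding scheme widely used for telephony audio, which is why its support matters for call-center use. It packs each 16-bit linear PCM sample into a single byte by quantizing loud samples more coarsely than quiet ones. The Python sketch below follows the classic G.711 reference-style encoder (bias 0x84, clip level 32635). It illustrates the format only and says nothing about how Scribe v2 Realtime handles audio internally.

```python
# Classic G.711 mu-law companding for 16-bit signed linear PCM samples.
# Illustrates the audio format only; this is not ElevenLabs code.
BIAS = 0x84   # 132, added so every magnitude has a leading 1 bit in bits 7..14
CLIP = 32635  # magnitudes are clipped so that magnitude + BIAS fits in 15 bits


def linear_to_ulaw(sample: int) -> int:
    """Encode one 16-bit signed PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The exponent is the position of the highest set bit, counted from bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    # The mantissa is the four bits just below that leading bit.
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign | exponent | mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_to_linear(byte: int) -> int:
    """Decode an 8-bit mu-law byte back to a 16-bit signed PCM sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    # Rebuild the midpoint of the quantization step, then remove the bias.
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if sign else magnitude
```

Round-tripping a sample through both functions returns a value within half a quantization step of the original: silence (0) comes back exactly, while the loudest unclipped samples can be off by up to 512.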

The model is available now through the ElevenLabs API with enterprise-grade security, including SOC 2 certification, GDPR and HIPAA compliance, and support for EU data residency. Pricing starts at $0.28 per hour of audio, and enterprise customers get high-concurrency access and dedicated support.
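
Because pricing is quoted per hour of audio, a rough budget is simple arithmetic. The sketch below assumes the $0.28 entry rate applies linearly and ignores billing granularity, minimums, and enterprise pricing, none of which the announcement specifies; the constant and function names are hypothetical.

```python
# Back-of-the-envelope cost estimate. Assumes the quoted $0.28/hour entry
# rate applies linearly; billing granularity and enterprise pricing are unknown.
ENTRY_RATE_USD_PER_HOUR = 0.28


def estimate_cost_usd(audio_seconds: float,
                      rate_per_hour: float = ENTRY_RATE_USD_PER_HOUR) -> float:
    """Convert seconds of audio to hours and multiply by the hourly rate."""
    return audio_seconds / 3600 * rate_per_hour


if __name__ == "__main__":
    print(f"1,000 hours: ${estimate_cost_usd(1_000 * 3600):,.2f}")
    print(f"one minute:  ${estimate_cost_usd(60):.4f}")
```

At that rate, 1,000 hours of audio comes to about $280, and a single minute costs less than half a cent.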