Microsoft has recently expanded its lineup of in-house artificial intelligence models with MAI-Transcribe-1, a speech-to-text model. The company reports an average Word Error Rate (WER) of 3.9% across 25 languages, a figure it says makes MAI-Transcribe-1 the most accurate transcription model in the world. Microsoft had previously released the speech synthesis model MAI-Voice-1 and the image generation model MAI-Image-2. With MAI-Transcribe-1, its MAI series now comprises three self-developed models.
