Artificial Intelligence (AI)
Microsoft unveils its first in-house AI models for transcription and voice generation.
Gianro Compagno
2026-04-04
Microsoft advances in AI with new voice and transcription models integrated into Copilot and Azure
Microsoft has taken a significant step in artificial intelligence development by launching its first proprietary models for voice generation and transcription, now available in services such as Copilot and Azure Speech. The launch is part of a long-term strategy to place the company's models among the most advanced in the industry by 2027.
Microsoft has introduced three key models in public early access: MAI-Image-2, a photorealistic text-to-image generator; MAI-Voice-1, an ultra-fast voice generator; and MAI-Transcribe-1, a high-precision transcription system. MAI-Image-2, launched in March, stands out for its ability to create professional-quality images, while MAI-Voice-1 and MAI-Transcribe-1 mark the beginning of a comprehensive audio AI platform aimed at developers.
MAI-Transcribe-1 supports 25 languages and, according to Microsoft, cuts GPU costs by 50% compared with alternatives, making real-time transcription and subtitling more practical for events, virtual assistants, call centers, and educational settings. Meanwhile, MAI-Voice-1 can generate up to 60 seconds of audio in less than a second on a single GPU, which works out to more than 60 times faster than real time, enabling expressive voice experiences in Copilot's audio and podcast features.
These models are already integrated into services such as Copilot, Bing, PowerPoint, and Azure Speech, and are available to developers in Playground and Foundry. With its commitment to in-house development, Microsoft aims to compete directly with industry leaders such as OpenAI and Anthropic. Mustafa Suleyman, CEO of Microsoft AI, told Bloomberg that the goal is to reach the absolute technological frontier in models capable of generating text, images, and audio by 2027.