Microsoft’s new Text-to-Speech voices are more ‘realistic, lifelike, and engaging’

The new Text-to-Speech (TTS) voices promise more realistic and lifelike user interactions.

What you need to know

With the rapid growth of AI and its capabilities across the world, there’s a rise in the demand for “naturalness and expressiveness in Text-to-Speech voices,” according to Microsoft. The company recently announced four new voices: en-US-AndrewNeural, en-US-BrianNeural, en-US-EmmaNeural, and zh-CN-YunjieNeural.

The tech giant indicated that the new voices are designed for conversational scenarios to ensure user interactions are “more realistic, lifelike, and engaging.” The four new voices are available in public preview in three regions: East US, Southeast Asia, and West Europe.

To demystify the difference between existing voices designed for general purposes and the new voices optimized for conversations, Microsoft also included several demos showcasing the different flavors of the newly incorporated voices.

Microsoft explained that the voices can be integrated into existing applications via Azure OpenAI, the Azure Speech SDK, or the REST API, and that developers can use Azure Bot Framework’s capabilities to build intelligent bots that speak with the new Text-to-Speech (TTS) voices.
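As a rough illustration of the REST route, the sketch below builds (but does not send) a synthesis request for one of the new voices. It is not taken from Microsoft’s announcement: the endpoint pattern, header names, output format string, and lowercase region identifiers reflect the Azure Speech REST API as commonly documented and should be checked against the current reference, while the helper names, the `User-Agent` value, and the key are placeholders invented for this example.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

# Regions the article lists for the public preview, written as Azure
# region identifiers (assumed spelling: lowercase, no spaces).
PREVIEW_REGIONS = {"eastus", "southeastasia", "westeurope"}


def build_ssml(text: str, voice: str = "en-US-AndrewNeural") -> str:
    """Wrap plain text in a minimal SSML document for the given voice."""
    # Derive the locale from the voice name, e.g. "en-US-AndrewNeural" -> "en-US".
    locale = "-".join(voice.split("-")[:2])
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{locale}">'
        f'<voice name="{voice}">{escape(text)}</voice>'
        "</speak>"
    )


def build_tts_request(text: str, voice: str, region: str, key: str) -> Request:
    """Prepare an HTTP POST for the speech endpoint without sending it."""
    if region not in PREVIEW_REGIONS:
        raise ValueError(f"{region!r} is not one of the preview regions")
    # Assumed endpoint pattern for the text-to-speech REST API.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # placeholder credential
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "audio-24khz-48kbitrate-mono-mp3",
        "User-Agent": "tts-voice-demo",  # hypothetical application name
    }
    body = build_ssml(text, voice).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    request = build_tts_request(
        "Hi there! How can I help today?",
        voice="en-US-EmmaNeural",
        region="westeurope",
        key="YOUR-SPEECH-KEY",
    )
    # Passing `request` to urllib.request.urlopen would perform the call
    # and, with a valid key, return audio bytes.
    print(request.full_url)
    print(request.data.decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the SSML and swap voices before spending any quota. The Speech SDK covers the same ground with less manual setup.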

“We began by crafting the persona of each voice as if it were a real person who is friendly and optimistic about life, always eager to assist others and share intriguing or practical knowledge. The speaking style of the voice resembles a conversation with an acquaintance over a cup of tea, maintaining a natural and unexaggerated tone. Furthermore, we continuously enhance our Text-to-Speech (TTS) modeling techniques to improve the quality of our AI voices. Our most recent projects, such as DelightfulTTS 2, and MuLanTTS, have significantly narrowed the quality gap between AI voices and professional human recordings, producing more natural and realistic voices than ever before. These technological advancements serve as the foundation upon which these new AI voices are built.”

Adding a natural and expressive touch

AI has seen both wins and setbacks, with the balance lately leaning toward the latter. There have been several reports indicating that chatbots are getting dumber and also experiencing a decline in accuracy and user base.

Perhaps the debut of the new voices will positively impact this trend. Microsoft “offers over 400 neural voices covering more than 140 languages and locales,” and those figures seem likely to expand over time.

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You’ll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.
