Over the past several months, Microsoft has made it a goal to integrate artificial intelligence (AI) into all of its products, from Microsoft 365 Copilot for enterprises to the consumer-focused Microsoft Office. At its Ignite 2023 conference, the tech giant rebranded Bing Chat as Copilot and unveiled a number of new AI-based products, including Copilot Studio and Windows AI Studio. The company also introduced a text-to-speech avatar feature in Azure AI Speech that makes it easier to create talking avatar videos. The feature is launching in public preview. Here is what is known about it so far.
Microsoft Azure AI Speech
With the Azure AI Speech text-to-speech avatar, you can turn text into a 2D video of a talking avatar that resembles a human. According to Microsoft, the avatar models are deep neural networks trained on video recordings of human speakers, while text-to-speech voice models supply the avatar's voice. Users can create training videos, product introductions, customer testimonials, and more from text input alone, which allows for more engaging digital interactions.
How it works
The Azure AI Speech avatar content generation pipeline has three components: the text analyzer, the TTS audio synthesizer, and the TTS avatar video synthesizer. First, the user enters text, which the text analyzer converts into a phoneme sequence. Next, the TTS audio synthesizer predicts the acoustic features of the input text and synthesizes the voice. Text-to-speech voice models power both of these components.
Finally, the neural text-to-speech avatar model predicts lip-synced images from the acoustic features to generate the synthetic video.
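To make the flow of data between the three stages concrete, here is a toy sketch in Python. It is not Microsoft's implementation and uses no Azure API: every function and class name is hypothetical, letters stand in for phonemes, a duration stands in for the acoustic features, and a mouth-shape label stands in for a rendered image.

```python
from dataclasses import dataclass
from typing import List

VOWELS = set("aeiou")


@dataclass
class AcousticFrame:
    """Toy stand-in for the acoustic features of one phoneme."""
    phoneme: str
    duration_ms: int


@dataclass
class VideoFrame:
    """Toy stand-in for one lip-synced avatar image."""
    mouth_shape: str
    duration_ms: int


def analyze_text(text: str) -> List[str]:
    # Stage 1 (hypothetical): text in, phoneme sequence out.
    # Each letter is treated as one "phoneme" for illustration only.
    return [ch for ch in text.lower() if ch.isalpha()]


def synthesize_audio(phonemes: List[str]) -> List[AcousticFrame]:
    # Stage 2 (hypothetical): predict acoustic features per phoneme.
    # Vowels are simply given a longer duration than consonants.
    return [AcousticFrame(p, 120 if p in VOWELS else 60) for p in phonemes]


def synthesize_video(frames: List[AcousticFrame]) -> List[VideoFrame]:
    # Stage 3 (hypothetical): predict a mouth shape from the acoustic features.
    return [
        VideoFrame("open" if f.duration_ms > 60 else "closed", f.duration_ms)
        for f in frames
    ]


def text_to_avatar(text: str) -> List[VideoFrame]:
    # Chain the three stages in the order the article describes.
    return synthesize_video(synthesize_audio(analyze_text(text)))


if __name__ == "__main__":
    for frame in text_to_avatar("Hi!"):
        print(frame)
```

The point of the sketch is the ordering: the video stage never sees the raw text, only the acoustic output of the audio stage, which is why the avatar's lips can stay in sync with the synthesized voice.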
The Azure AI Speech service offers two kinds of neural voice. The first is prebuilt neural voice, a set of natural-sounding voices that are available out of the box. To use it, users need to create an Azure account and subscribe to the Speech service. They can then choose from the prebuilt voices through Speech Studio or the Speech SDK.
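As a rough illustration of what choosing a prebuilt voice looks like in practice, the sketch below builds a Speech Synthesis Markup Language (SSML) request that names a voice. The helper `build_ssml` is hypothetical, and `en-US-JennyNeural` is used only as an example voice name. Sending the request to the service is left out because it requires an Azure subscription key and region.

```python
from xml.sax.saxutils import escape, quoteattr

# Standard SSML namespace defined by the W3C.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Hypothetical helper: wrap text in SSML that selects a named voice."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Welcome to our product tour."))
```

Escaping the text matters because characters such as `&` and `<` would otherwise produce invalid XML. Switching to a different prebuilt voice is then just a matter of passing a different `voice` argument.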
Microsoft also lets customers create their own neural voices through a capability called Custom Neural Voice. It is a simple self-service tool that helps establish a natural brand voice. To support responsible use, Microsoft currently offers this capability through limited access only.