OpenAI’s Voice Engine can clone a voice from a 15-second clip. Hear for yourself

Since releasing ChatGPT and ushering in the generative AI era, OpenAI has stayed ahead of the curve with cutting-edge AI technology such as Sora, its impressive text-to-video generator. On Friday, the company took another step forward by sharing insights from its small-scale preview of Voice Engine, a voice cloning AI model that can create realistic, emotive voices using text input and a 15-second audio sample.

As seen in the clip below, the technology can generate a highly realistic-sounding voice that closely resembles the voice in the reference clip. An AI voice generator capable of impersonating someone’s voice from just a 15-second sample? What could go wrong?

OpenAI just released Voice Engine,
It uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker.
Reference and Generated audio is very close and hard to differentiate.
More details in 🧵 pic.twitter.com/tJRrCO2WZP

— AshutoshShrivastava (@ai_for_success) March 29, 2024

OpenAI is aware of the risks of a voice cloning model and, as a result, has not yet released it to the public, despite first developing Voice Engine in late 2022. “We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the company said in its blog post.

In 2023, OpenAI began privately testing Voice Engine with a small group of partners to help the company learn more about the model, including its potential use cases and the safeguards it needs.

The partners testing Voice Engine had to agree to OpenAI’s usage policies, which explicitly prohibit them from impersonating an individual or organization without the original speaker’s consent. Other safeguards include disclosing to the audience that the voice they’re hearing is AI-generated, adding watermarks that trace back to Voice Engine, monitoring the model’s usage, and prohibiting individual users from creating their own voices.

OpenAI’s partners have taken Voice Engine and developed use cases with a potentially positive impact.

For example, edtech startup Age of Learning used Voice Engine to provide non-readers and children with reading assistance by generating pre-scripted voice-over content and personalized responses. Similarly, AI avatar-generating startup HeyGen built a tool on Voice Engine that translates a speaker’s voice into multiple languages.

While OpenAI is keeping Voice Engine in preview for now, other similar models are already available to the public. Take ElevenLabs, a startup that has made headlines for both positive and negative use cases of its AI-powered voice-generating platform. The best-known example of ElevenLabs’s tech may be the recent fake robocall of President Joe Biden that encouraged voters not to show up at the polls.

The ElevenLabs Voice Cloning tool is easy to access and use. All you need is an ElevenLabs account, a few minutes of voice samples, and a text prompt.

OpenAI is smart to delay its entrance into the voice cloning space. The tech industry needs to raise awareness of the risks of AI-generated voices and emphasize to users the importance of verifying sources before they believe what they hear and see.
