OpenAI Claims to Clone a Voice from Just 15 Seconds of Audio

OpenAI Unveils Voice Engine: A New Voice Cloning Technology

OpenAI has previewed Voice Engine, a voice cloning tool that the company says can mimic a speaker from a single 15-second audio sample. According to OpenAI, Voice Engine generates "natural-sounding speech" with "emotive and realistic voices."

The technology builds on OpenAI's existing text-to-speech API and has been in development since 2022. OpenAI already uses a version of the toolset to power the preset voices in the current text-to-speech API and the Read Aloud feature. Samples posted on the company's official blog showcase generated voices that closely resemble the original speakers.

OpenAI envisions Voice Engine being used for reading assistance, language translation, and aiding individuals with sudden or degenerative speech conditions. The company highlighted a Brown University pilot program that successfully assisted a patient with speech impairment by creating a Voice Engine clone from audio recorded for a school project.

Despite these potential benefits, OpenAI acknowledges that bad actors could misuse the technology to create deepfakes, which are already a concern. Because of these risks, Voice Engine is not yet ready for widespread use, and serious privacy concerns need to be addressed before a full rollout.

OpenAI is actively soliciting feedback from US and international partners in government, media, entertainment, education, and civil society to mitigate these risks. All preview testers are required to adhere to OpenAI's usage policies, which prohibit the impersonation of another individual without consent or legal right.

OpenAI has implemented safety measures, such as watermarking to trace the origin of generated audio and "proactive monitoring" of how the system is used. When Voice Engine officially launches, there will be a "no-go voice list" to block AI-generated voices that sound too similar to prominent figures.

Voice Engine is expected to cost $15 per one million characters, or roughly 162,500 words, about the length of Stephen King's novel "The Shining." OpenAI also mentions an "HD" version that will cost twice as much, but it has not yet said how that version differs.
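
To make the pricing arithmetic concrete, here is a minimal Python sketch. It is an illustration only: the $15 rate, the doubled "HD" rate, and the 162,500-word figure are the expected numbers cited above, not confirmed pricing. The ratio of about 6.15 characters per word is simply what those figures imply, and the helper names `estimate_cost_usd` and `words_to_chars` are hypothetical, not part of any OpenAI API.

```python
# Expected figures cited in the article; not confirmed pricing.
PRICE_PER_MILLION_CHARS_USD = 15.0
HD_MULTIPLIER = 2.0  # the "HD" version is said to cost twice as much

# Ratio implied by the article: 1,000,000 characters ~ 162,500 words.
CHARS_PER_WORD = 1_000_000 / 162_500


def estimate_cost_usd(num_chars: int, hd: bool = False) -> float:
    """Estimate the cost of synthesizing num_chars characters (hypothetical helper)."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    rate = PRICE_PER_MILLION_CHARS_USD * (HD_MULTIPLIER if hd else 1.0)
    return num_chars / 1_000_000 * rate


def words_to_chars(num_words: int) -> int:
    """Convert a word count to an approximate character count."""
    return round(num_words * CHARS_PER_WORD)


if __name__ == "__main__":
    novel_chars = words_to_chars(162_500)  # a novel-length text
    print(f"{novel_chars:,} characters")
    print(f"standard: ${estimate_cost_usd(novel_chars):.2f}")
    print(f"HD:       ${estimate_cost_usd(novel_chars, hd=True):.2f}")
```

Under these assumptions, reading a novel-length text aloud would cost about $15 at the standard rate and about $30 at the HD rate. Real texts will vary, since average word length differs by language and genre.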

Separately, OpenAI and Microsoft are reportedly planning to build an AI supercomputer called "Stargate," with an estimated cost of $100 billion.