ElevenLabs’ voice-generating tools launch out of beta
ElevenLabs, the viral AI-powered platform for creating synthetic voices, today launched its platform out of beta with support for more than 30 languages.
Using a new AI model developed in-house, ElevenLabs says that its tools are now capable of automatically identifying languages, including Korean, Dutch and Vietnamese, and generating “emotionally rich” speech in those languages.
In combination with the new model, ElevenLabs customers can leverage the platform’s voice-cloning tool to speak across all of the supported languages without first having to type text.
“ElevenLabs was started with the dream of making all content universally accessible in any language and in any voice,” ElevenLabs CEO and co-founder Mati Staniszewski said in a statement. “With this release, we’re one step closer to making this dream a reality and making human-quality AI voices available in every dialect. Our text-to-speech generation tools help level the playing field and bring top quality spoken audio capabilities to all the creators out there.”
Founded by Staniszewski, who previously worked at Palantir, and his childhood friend Piotr Dabkowski, an ex-Google employee, ElevenLabs has made headlines over the past few months for reasons both good and abhorrent. Inspired by the mediocre dubbing of American movies Staniszewski and Dabkowski watched growing up in Poland, the pair set about designing a platform that could do better, using AI, of course.
ElevenLabs launched in beta in late January, and picked up steam rather quickly — owing to the high quality of its generated voices and generous free tier. But as alluded to earlier, the publicity hasn’t been consistently positive — particularly once bad actors exploited the platform for their own ends.
Users of the infamous message board 4chan, known for its conspiratorial content, used ElevenLabs’ tools to share hateful messages mimicking celebrities like actor Emma Watson. Elsewhere, The Verge’s James Vincent was able to tap ElevenLabs to clone targets’ voices in a matter of seconds, generating audio samples containing everything from threats of violence to expressions of racism and transphobia.
In response, ElevenLabs said that it would introduce a set of new safeguards, like limiting voice cloning to paid accounts and providing a new AI detection tool.
ElevenLabs has yet to grapple with the other controversy brewing around its platform and other platforms like it, though: their threat to the voice acting industry.
Motherboard writes about how voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them. Meanwhile, internal emails seen by The New York Times indicate that Activision Blizzard, one of the biggest game publishers in the world, is working on tools for AI-assisted “voice cloning.”
It would appear that ElevenLabs sees this as the natural progression of things, touting its work with publishers like Storytel and media platforms like TheSoul Publishing and MNTN for audiobooks and radio content, and with publishers like Embark Studios and Paradox Interactive for video games. (Storytel and TheSoul Publishing are strategic investors.) The company claims that it has more than a million registered users across the creative, entertainment and publishing spaces who’ve created 10 years’ worth of audio content.
ElevenLabs, which recently raised $19 million at a $99 million valuation from investors including Andreessen Horowitz and DeepMind co-founder Mustafa Suleyman, plans to eventually extend its AI models to voice dubbing, following in the footsteps of startups like Papercup and Deepdub and building what it calls “a foundation to be able to transfer emotions and intonation from one language to another.”
Beyond this, ElevenLabs says it plans to introduce a mechanism that’ll allow users to share voices on the platform, although the details remain hazy.