OpenAI gives ChatGPT a voice for verbal conversations

September 25, 2023 ndowd

ChatGPT is evolving into much more than a text-based chatbot, with OpenAI announcing today that it’s adding new voice and image-based smarts to the mix.

The wildly popular generative AI assistant has been one of the biggest technology success stories of recent times since its debut some ten months ago, allowing anyone to generate essays, poems and summaries from simple text-based prompts. Now ChatGPT is about to get a lot more interactive, with users also able to hold voice conversations with the chatbot.

The announcement comes on the same day Amazon committed to invest up to $4 billion in OpenAI rival Anthropic. That move is part of a broader generative AI battle among the world’s tech giants: Google is trying to play catch-up with its Bard chatbot, Meta has adopted a firm open-source ethos to get a leg up, and Microsoft has aligned itself closely with OpenAI.

Conversation starter

Today marks a notable evolution for the generative AI movement, with OpenAI meshing the familiar world of voice-based assistants with its powerful large language models (LLMs).

For instance, a user will be able to verbally ask ChatGPT to make up a bedtime story on the spot, with a few vocal prompts to guide the narrative. Or the user can simply ask it a question, and ChatGPT will respond aloud.

Elsewhere, ChatGPT users will also be able to search for answers using images: for instance, by uploading a picture of something and asking ChatGPT to explain what it is, or to provide instructions for completing a task.

ChatGPT image search. Image Credits: OpenAI

The voice feature is powered by a new text-to-speech model that can generate human-like voices from text and a few seconds of sampled speech. OpenAI said it worked with professional voice actors to create five different voices, while its open-source Whisper speech recognition system transcribes users’ spoken words into text.

Spotify was also unveiled as a launch partner, with the music-streaming giant introducing a pretty neat new feature that lets podcasters sample their voice and translate their shows from English into Spanish, French or German while retaining their own original voice. However, OpenAI seems to be wary of attracting criticism: it isn’t making this technology available to everyone, and for the launch it has worked specifically with podcasters including Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons and Steven Bartlett.

“The new voice technology — capable of crafting realistic synthetic voices from just a few seconds of real speech — opens doors to many creative and accessibility-focused applications,” the company wrote in a blog post. “However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.”

The new features will begin rolling out to paying Plus and Enterprise subscribers over the next two weeks. To activate voice, users need to open the “settings” menu in the app, go to “new features” and opt in to voice conversations. They then tap the headphone button in the top-right corner and select the voice they want.

Voice will be limited to the ChatGPT Android and iOS apps on an opt-in beta basis initially, while image search will be landing on all platforms by default.
