OpenAI delays ChatGPT’s new Voice Mode
In May, when OpenAI first demoed an eerily realistic, nearly real-time “advanced Voice Mode” for its AI-powered chatbot platform ChatGPT, the company said that the feature would roll out to paying ChatGPT users within a few weeks.
Weeks later, OpenAI says that it needs more time.
In a post on its official Discord server, OpenAI says that it had planned to start rolling out advanced Voice Mode in alpha to a small group of ChatGPT Plus users in late June, but that lingering issues forced it to postpone the launch to sometime in July.
“For example, we’re improving the model’s ability to detect and refuse certain content,” OpenAI writes. “We’re also working on improving the user experience and preparing our infrastructure to scale to millions while maintaining real-time responses. As part of our iterative deployment strategy, we’ll start the alpha with a small group of users to gather feedback and expand based on what we learn.”
Advanced Voice Mode might not launch for all ChatGPT Plus customers until the fall, OpenAI says, depending on whether it passes certain internal safety and reliability checks. The delay will not, however, affect the rollout of the new video and screen-sharing capabilities demoed separately during OpenAI’s spring press event.
Those capabilities include solving math problems given a picture of the problem and explaining the various settings menus on a device. They’re designed to work across ChatGPT on smartphones as well as desktop clients, like the app for macOS, which became available to all ChatGPT users earlier today.
“ChatGPT’s advanced Voice Mode can understand and respond with emotions and nonverbal cues, moving us closer to real-time, natural conversations with AI,” OpenAI writes. “Our mission is to bring these new experiences to you thoughtfully.”
Onstage at the launch event, OpenAI employees showed off ChatGPT responding almost instantly to requests such as solving a math problem written on a piece of paper held in front of a researcher’s smartphone camera.
OpenAI’s advanced Voice Mode generated considerable controversy over the default “Sky” voice’s resemblance to actress Scarlett Johansson’s. Johansson later released a statement saying that she had hired legal counsel to inquire about the voice and get exact details about how it was developed, and that she had refused repeated entreaties from OpenAI to license her voice for ChatGPT.
OpenAI, while denying that it used Johansson’s voice without permission or hired a soundalike, later removed the offending voice.