Retell AI lets companies build ‘voice agents’ to answer phone calls

May 9, 2024 ndowd

Call centers are embracing automation. There’s debate as to whether that’s a good thing, but it’s happening — and quite possibly accelerating.

According to research firm TechSci Research, the global market for contact center AI could grow to nearly $3 billion in 2028, from $2.4 billion in 2022. Meanwhile, a recent survey found that around half of contact centers plan to adopt some form of AI in the next year.

The motivation is rather obvious: Call centers are looking to reduce costs while scaling up their operations.

“Companies with heavy call center operations, looking to scale quickly without the constraints of human contact center agents, are highly receptive to adopting effective AI voice agent solutions,” entrepreneur Evie Wang told TechCrunch. “This approach not only reduces their overall costs but also decreases wait times.”

Wang is one of the co-founders of Retell AI, which provides a platform companies can use to create AI-powered “voice agents” that answer customer phone calls and perform basic tasks such as scheduling appointments. Retell’s agents are powered by a combination of large language models (LLMs) fine-tuned for customer service use cases and a speech model that gives voice to text generated by the LLMs.
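To make that two-stage design concrete, here is a minimal sketch of a single conversational turn: a language model produces the reply text, and a speech model gives it a voice. Every name in it (`VoiceAgent`, `toy_llm`, `toy_tts`) is a hypothetical stand-in invented for illustration. None of it is Retell's API, and the canned rule and byte encoding merely take the place of real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoiceAgent:
    """Hypothetical two-stage agent: text reply first, then speech."""
    generate_reply: Callable[[str], str]  # stand-in for a fine-tuned LLM
    synthesize: Callable[[str], bytes]    # stand-in for a speech model

    def respond(self, caller_text: str) -> bytes:
        # Stage 1: the language model decides what to say.
        reply_text = self.generate_reply(caller_text)
        # Stage 2: the speech model turns that text into audio.
        return self.synthesize(reply_text)


def toy_llm(caller_text: str) -> str:
    # A canned rule in place of a real model (assumption for illustration).
    if "appointment" in caller_text.lower():
        return "Sure. What date and time work best for you?"
    return "Let me connect you with the office manager."


def toy_tts(text: str) -> bytes:
    # Real synthesis would return audio samples; this just encodes the text.
    return text.encode("utf-8")


agent = VoiceAgent(generate_reply=toy_llm, synthesize=toy_tts)
audio = agent.respond("I'd like to book a dentist appointment.")
```

Keeping the two stages behind separate callables mirrors the idea that either one could be swapped out, for instance by substituting a custom LLM, without touching the other.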

Retell’s customers include some contact center operators but also small- and medium-sized businesses that regularly deal with high call volumes, like telehealth company Ro. They can build voice agents using the platform’s low-code tooling, or they can upload a custom LLM (e.g. an open model like Meta’s Llama 3) to further tailor the experience.

“We invest a lot in the voice conversation experience, as we see that as the most critical aspect of the AI voice agent experience,” Wang said. “We don’t view AI voice agents as mere toys that one can create with a few lines of prompts, but rather as tools that can offer substantial value to businesses and replace complex workflows.”

Retell worked well enough in my brief testing, at least on the caller-facing side.

I arranged a call with a Retell bot using the demo form on Retell’s website. The bot walked me through the process of scheduling a hypothetical dentist’s appointment, asking questions like my preferred date and time, phone number and so on.

I can’t say the bot’s synthetic voice was the best I’ve heard in terms of realism; it was certainly not on par with ElevenLabs or OpenAI’s text-to-speech API. Wang, in Retell’s defense, said that the team’s been mostly focused on reducing latency and handling edge cases, like interruptions that might occur in a conversation.

The latency is low: In my test, the bot responded pretty much without hesitation to my answers and follow-up questions. And it stuck to its script. Try as I might, I couldn’t confuse it or prompt it to behave in a way it shouldn’t. (When I asked the bot about my dental records, it insisted that I speak with the office manager.)

So are platforms like Retell the future of call centers?

Maybe. For basic tasks like appointment scheduling, automation makes a lot of sense, which is probably why startups and big tech firms alike offer solutions that compete head-on with Retell’s. (See Parloa, PolyAI, Google Cloud’s Contact Center AI, etc.)

It’s low-hanging — and seemingly revenue-generating — fruit. Retell claims to have hundreds of customers, all of which are paying per minute of voice agent conversation. Retell has raised a total of $4.53 million in capital to date, courtesy of backers including Y Combinator (where the company was incubated).

But the jury’s out on more-complicated queries, particularly given LLMs’ tendency to make up facts and go off the rails even with safeguards in place.

As Retell’s ambitions grow, I’m curious to see how the company navigates the many well-established technical challenges in the space. Wang, at least, seems confident in Retell’s approach.

“With the advent of LLMs and recent breakthroughs in speech synthesis, conversational AI is getting good enough to create really exciting use cases,” Wang said. “For example, with sub-one-second latency and the ability to interrupt the AI, we’ve observed users speaking in fuller sentences and conversing as they would with another person. We’re trying to make it easy for developers to build, test, deploy and monitor AI voice agents, ultimately to help them achieve production readiness.”
