AI hallucinations will be solvable within a year, ex-Google AI researcher says—but that may not be a good thing

April 16, 2024 ndowd

If you’ve ever asked ChatGPT, Gemini or another generative AI chatbot a question, you’ll have found that the answers it gives can be fascinating at best, or completely made up at worst.

But you won’t need to worry about AI hallucinating for much longer, according to AI experts who told Fortune that the phenomenon is “obviously solvable” and estimated that a fix will arrive soon.

“I’m optimistic that we can solve it,” Raza Habib, who went from Google AI research intern to founding Humanloop in less than six months, said on a panel at Fortune’s Brainstorm AI conference in London.

“When you train a large language model, it goes through three stages—there’s pre-training, fine-tuning and then reinforcement learning from human feedback as the last stage,” Habib explained.

His London-based startup, which has raised $2.6 million, pioneered methods to make the training of large language models, such as those that underpin OpenAI’s ChatGPT, more efficient.

“If you look at the models before they are fine-tuned on human preferences, they’re surprisingly well calibrated. So if you ask the model for its confidence [in] an answer, that confidence correlates really well with whether or not the model is telling the truth. We then train them on human preferences and undo this.

“So the knowledge is kind of there already,” Habib added. “The thing we need to figure out is how to preserve it once we make the models more steerable.”

“But the fact that the models are learning calibration in the process already makes me very optimistic that it should be much easier to solve.”
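“Calibration,” in the sense Habib describes, means that a model’s stated confidence matches how often it is actually right: answers given with 90% confidence should be correct about 90% of the time. One common way researchers quantify this is expected calibration error (ECE). The sketch below is a minimal illustration under stated assumptions, not Humanloop’s method: it assumes you already have a confidence score between 0 and 1 for each answer plus a label saying whether the answer was correct, and the function name and toy data are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Hypothetical helper: weighted average gap between confidence and
    accuracy, using equal-width confidence bins over [0, 1]."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Group each (confidence, correct?) pair into a bin by its confidence.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of answers.
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece


# Toy data (made up): ten answers each given with 90% confidence.
well_calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(round(well_calibrated, 3), round(overconfident, 3))
```

A score near zero means stated confidence tracks accuracy closely; the overconfident toy model, right only half the time despite claiming 90%, scores about 0.4. In Habib’s account, models score well on this kind of measure after pre-training and lose that property during preference tuning.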

A year, but…

When pressed for how long this will take, Habib responded: “Within a year.”

But he doesn’t think it needs to be solved, because we are so used to clunky tech anyway.

“We’re used to designing user experiences that are fault tolerant in some way,” added the UCL graduate, who has a PhD in machine learning. “You go to Google search, you get a ranked list of links, you don’t get an answer, and people who are in Perplexity get citations back now.

“So I don’t think it has to be solved to make it useful.”

Plus, Habib thinks that a little bit of hallucination could be good—necessary even—if we want AI to help humanity think outside of the box.

“If we want to have models that will one day be able to create new knowledge for us, then we need them to be able to act as conjecture machines, we want them to propose things that are weird and novel—and then be able to filter that in some way,” he explained.

“So in some senses, [especially] if you’re doing creative tasks, having the models be able to sort of fabricate things that are going off the data domain is not necessarily a terrible thing.”

Last year, Habib was among around 20 software developers and startup CEOs who attended a closed-door meeting with Sam Altman, OpenAI’s co-founder and CEO, who has since been ousted and reinstated. At the meeting, Altman told the group about his plans for the company, including how it was coping with chip shortages, and Habib came under fire for leaking the entire conversation in a blog post.

Air Canada chatbot mishap ‘completely avoidable’

The panel—made up of Habib, ServiceNow’s VP of AI product Jeremy Barnes, and Rebecca Gorman, co-founder and CEO of Aligned AI—also debated Air Canada’s chatbot mishap.

If you’re unfamiliar with the story: in 2022, Jake Moffatt bought an Air Canada ticket to attend his grandmother’s funeral. The company’s AI chatbot convinced him to buy a full-price ticket on the premise that he’d be able to claim a partial refund retroactively under the airline’s reduced bereavement fare policy.

So Moffatt bought a full-price ticket from Vancouver to Toronto for about $590 and, a few days later, paid $630 for the return flight.

But when he requested some money back, Air Canada said the chatbot was wrong. Now, Canada’s main airline has been ordered to pay the customer compensation—but the entire situation was, in Habib’s eyes, “completely avoidable.”

“I just don’t think that they had done enough around testing,” he said, adding that the airline should at least have “had sufficient guardrails in place.”

“They gave the chatbot a much wider range than what it should have been able to say,” he added. “They more or less gave people almost raw access to ChatGPT with a little bit of RAG [retrieval-augmented generation] attached, and sure, that’s a dangerous thing to do. But most people shouldn’t do that.”

Barnes, a serial tech entrepreneur, echoed the point: even if something “seems to work in a proof of concept, you probably don’t just want to put it straight into production, with real customers who have expectations and terms and conditions and things like that.”

Fortune has contacted Air Canada for comment.
