Databricks open-sources a model like ChatGPT, flaws and all
The march toward an open source ChatGPT-like AI continues.
Today, Databricks released Dolly 2.0, a text-generating AI model that can power apps like chatbots, text summarizers and basic search engines. It’s the successor to the first-generation Dolly, which was released in late March. And — importantly — it’s licensed to allow independent developers and companies alike to use it commercially.
So why is Databricks — a firm whose bread and butter is data analytics — open-sourcing a text-generating AI model? Philanthropy, says CEO Ali Ghodsi.
“We are in favor of more open and transparent large language models (LLMs) in the market in general because we want companies to be able [to] build, train and own AI-powered chatbot and other productivity apps using their own proprietary data sets,” Ghodsi told TechCrunch via email. “We might be the first, but hope not to be the last.”
Forgive the cynicism, but I find it hard to believe that a venture-backed company like Databricks is quite that generous, particularly given the time and expense required to ready a model like this for public release. Ghodsi wasn’t shy about the fact that he hoped developers would build Dolly 2.0 apps on the Databricks platform. But to his original point, Dolly 2.0 is indeed one of the first ChatGPT-like models available without major usage restrictions.
Most other ChatGPT-like open source models, like Databricks’ own first-gen Dolly, make use of data sets that contain outputs from OpenAI, violating OpenAI’s terms of service.
By contrast, to train Dolly 2.0, Databricks created a training set with 15,000 records generated by thousands of Databricks employees (presumably voluntarily — we’ve asked a spokesperson for confirmation). This set was used to guide an open source text-generating model called GPT-J-6B, provided by the nonprofit research group EleutherAI, to follow instructions in a chatbot-like fashion — which became Dolly 2.0.
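To make the idea of instruction records concrete, here is a minimal sketch of how human-written instruction and response pairs might be turned into the text a model is fine-tuned on. The field names, the prompt template and the sample records are all hypothetical, chosen for illustration; the article doesn't describe Databricks’ actual record format or training code.

```python
# Hypothetical sketch: render instruction/response records as training text.
# The field names and template below are assumptions, not Databricks' format.

PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Response:\n{response}"
)


def format_record(record: dict) -> str:
    """Render one human-written record as a single training string."""
    return PROMPT_TEMPLATE.format(
        instruction=record["instruction"].strip(),
        response=record["response"].strip(),
    )


# Two made-up records standing in for the 15,000 employee-written ones.
records = [
    {
        "instruction": "Summarize: The meeting moved to Friday.",
        "response": "The meeting is now on Friday.",
    },
    {
        "instruction": "Name one use for a text-generating model.",
        "response": "Drafting replies to customer support tickets.",
    },
]

training_texts = [format_record(r) for r in records]
```

Each resulting string pairs a request with the answer a person wrote for it, which is roughly what lets a general text generator learn to respond to instructions rather than merely continue text.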
Ghodsi admits that Dolly 2.0 suffers from the same limitations as GPT-J-6B, which is to say that it only generates text in English and can be both toxic and offensive in its responses. That tracks; GPT-J-6B was trained on an open source data set called The Pile, a mix of internet-scraped text samples, some containing profane, lewd and otherwise fairly abrasive language.
We didn’t see evidence of toxicity ourselves. But we did notice that Dolly 2.0 tended to answer questions in a way that wasn’t consistently factual.
For example, given the prompt “What are jobs typical of women? Jobs typical of men?,” Dolly 2.0 responded:
Females are more than 50% of the workforce, and this is projected to change by 2080. Many women are in the workforce in higher-paying jobs. These jobs may be more challenging to enter, but they still require the same or even greater productivity than that of men. Females are more than 50% of the workforce in the United States, and this is projected to change to 80%.
That’s inaccurate on several counts. The 2080 projection appears to be pure conjecture, while the pay gap assertion is patently false. In 2022, women earned an average of 82% of what men earned, according to a 2023 Pew Research Center survey.
Meanwhile, in response to the prompt “Was Donald Trump responsible for what happened on January 6, 2021?,” Dolly 2.0 said:
Yes. On January 6, 2021, President Donald Trump was impeached by the House of Representatives for using his position to attempt to take down the Democratic Party. Specifically, Trump was impeached for using his executive orders to start a war with Iran and build a wall between the United States and Mexico. The case was brought by the House of Representatives after Trump had already been impeached by the Senate for falsifying his official statements to cover up payments made to the porn start [sic] Stormy Daniels in exchange for her silence about their 2006 affair in 2006 [sic].
Obviously… that’s wrong. Trump wasn’t impeached on January 6, 2021. He didn’t start a war with Iran or build a wall between the U.S. and Mexico, and was in fact acquitted by the Senate after being impeached — twice — by the House.
In another test, Dolly 2.0 tried to argue, bizarrely, that lunar eclipses are related to the gravitational forces between the moon and Earth:
As of january [sic], the earth is in a neutral state. The gravitational pull of the moon is no longer exerted on the earth. The absence of the gravitational pull of the moon is referred to as a lunar eclipse.
Ghodsi defended Dolly 2.0, saying that it’s not intended to be the best model of its kind but is instead geared toward simplistic applications like responding to customer support tickets, extracting information from legal briefs and generating code based on a technical prompt.
“Dolly provides human-like language generation comparable to the LLMs that rely on vast amounts of data from the internet, but used on its own without further training, Dolly’s knowledge and accuracy is more limited,” he added. “We’re committed to developing AI safely and responsibly and believe as an industry, we’re moving in the right direction by opening up models, like Dolly, for the community to collaborate on.”
I’m not so sure. Open sourcing opens a can of worms, naturally, as evidenced not long ago by the release of Stable Diffusion.
Stable Diffusion, whose development was funded in part by startup Stability AI, is a text-to-image generator that’s now powering a number of well-known apps (e.g. DeviantArt’s image generator) around the web. But it’s also been used to create nonconsensual celebrity deepfakes.
To Ghodsi, it’s worth the risk — and potential reward. He pointed to the telecom giant First Orion, which is testing Dolly to let engineers ask questions about documentation stored in Confluence, the collaboration platform, for onboarding and planning.
“We’re freeing Dolly because we believe open sourcing models is the best way forward. It gives researchers the ability to freely scrutinize the model architecture, helps address potential issues and democratizes LLMs so that users aren’t dependent on costly proprietary large-scale LLMs,” Ghodsi said. “Organizations can own, operate and customize Dolly to their business.”
Essentially, Databricks is attempting to wash its hands of liability — which makes the prospect for businesses a bit less appealing, one imagines. A mayor in Australia has threatened OpenAI with a defamation lawsuit over false claims made by ChatGPT. And some legal experts have argued that generative AI, because it sometimes regurgitates data from its training set, could put companies at risk if they were to unwittingly incorporate copyrighted suggestions from the tools into their production software.
We’ll have to see what happens. But in any case, Ghodsi says this won’t be the last from Databricks.
“Databricks is deeply committed to making it simple for customers to use LLMs,” he said. “You should expect both a continued investment in open source, as well as innovations that help accelerate the application of LLMs to key business challenges.”