Lakera launches to protect large language models from malicious prompts

October 12, 2023 ndowd

Large language models (LLMs) are the driving force behind the burgeoning generative AI movement, capable of interpreting and creating human-language texts from simple prompts — this could be anything from summarizing a document to writing a poem to answering a question using data from myriad sources.

However, these prompts can also be manipulated by bad actors to achieve far more dubious outcomes, using so-called “prompt injection” techniques whereby an individual inputs carefully crafted text prompts into an LLM-powered chatbot with the purpose of tricking it into giving unauthorized access to systems, for example, or otherwise enabling the user to bypass strict security measures.

And it’s against that backdrop that Swiss startup Lakera is officially launching to the world today, with the promise of protecting enterprises from various LLM security weaknesses such as prompt injections and data leakage. Alongside its launch, the company also revealed that it raised a hitherto undisclosed $10 million round of funding earlier this year.

Data wizardry

Lakera has developed a database comprising insights from various sources, including publicly available open source datasets, its own in-house research and — interestingly — data gleaned from an interactive game the company launched earlier this year called Gandalf.

With Gandalf, users are invited to “hack” the underlying LLM through linguistic trickery, trying to get it to reveal a secret password. If the user manages this, they advance to the next level, with Gandalf getting more sophisticated at defending against this as each level progresses.

Lakera’s Gandalf. Image Credits: TechCrunch

Powered by OpenAI’s GPT3.5, alongside LLMs from Cohere and Anthropic, Gandalf — on the surface, at least — seems little more than a fun game designed to showcase LLMs’ weaknesses. Nonetheless, insights from Gandalf will feed into the startup’s flagship Lakera Guard product, which companies integrate into their applications through an API.

“Gandalf is literally played all the way from like six-year-olds to my grandmother, and everyone in between,” Lakera CEO and co-founder David Haber explained to TechCrunch. “But a large chunk of the people playing this game is actually the cybersecurity community.”

Haber said the company has recorded some 30 million interactions from 1 million users over the past six months, allowing it to develop what Haber calls a “prompt injection taxonomy” that divides the types of attacks into 10 different categories. These are: direct attacks; jailbreaks; sidestepping attacks; multi-prompt attacks; role-playing; model duping; obfuscation (token smuggling); multi-language attacks; and accidental context leakage.

From this, Lakera’s customers can compare their inputs against these structures at scale.

“We are turning prompt injections into statistical structures — that’s ultimately what we’re doing,” Haber said.

Prompt injections are just one cyber risk vertical Lakera is focused on though, as it’s also working to protect companies from private or confidential data inadvertently leaking into the public domain, as well as moderating content to ensure that LLMs don’t serve up anything unsuitable for kids.

“When it comes to safety, the most popular feature that people are asking for is around detecting toxic language,” Haber said. “So we are working with a big company that is providing generative AI applications for children, to make sure that these children are not exposed to any harmful content.”

Lakera Guard. Image Credits: Lakera

On top of that, Lakera is also addressing LLM-enabled misinformation or factual inaccuracies. According to Haber, there are two scenarios where Lakera can help with so-called “hallucinations” — when the output of the LLM contradicts the initial system instructions, and where the output of the model is factually incorrect based on reference knowledge.

“In either case, our customers provide Lakera with the context that the model interacts in, and we make sure that the model doesn’t act outside of those bounds,” Haber said.

So really, Lakera is a bit of a mixed bag spanning security, safety and data privacy.

EU AI Act

With the first major set of AI regulations on the horizon in the form of the EU AI Act, Lakera is launching at an opportune moment in time. Specifically, Article 28b of the EU AI Act focuses on safeguarding generative AI models through imposing legal requirements on LLM providers, obliging them to identify risks and put appropriate measures in place.

In fact, Haber and his two co-founders have served in advisory roles to the Act, helping to lay some of the technical foundations ahead of the introduction — which is expected some time in the next year or two.

“There are some uncertainties around how to actually regulate generative AI models, distinct from the rest of AI,” Haber said. “We see technological progress advancing much more quickly than the regulatory landscape, which is very challenging. Our role in these conversations is to share developer-first perspectives, because we want to complement policymaking with an understanding of when you put out these regulatory requirements, what do they actually mean for the people in the trenches that are bringing these models out into production?”

Lakera founders: CEO David Haber flanked by CPO Matthias Kraft (left) and CTO Mateo Rojas-Carulla. Image Credits: Lakera

The security blocker

The bottom line is that while ChatGPT and its ilk have taken the world by storm these past nine months like few other technologies have in recent times, enterprises are perhaps more hesitant to adopt generative AI in their applications due to security concerns.

“We speak to some of the coolest startups, to some of the world’s leading enterprises — they either already have these [generative AI apps] in production, or they’re looking at the next three to six months,” Haber said. “And we are already working with them behind the scenes to make sure they can roll this out without any problems. Security is a big blocker for many of these [companies] to bring their generative AI apps to production, which is where we come in.”

Founded out of Zurich in 2021, Lakera already claims major paying customers, which it says it’s not able to name-check due to the security implications of revealing too much about the kinds of protective tools that they’re using. However, the company has confirmed that LLM developer Cohere — a company that recently attained a $2 billion valuation — is a customer, alongside a “leading enterprise cloud platform” and “one of the world’s largest cloud storage services.”

With $10 million in the bank, the company is fairly well-financed to build out its platform now that it’s officially in the public domain.

“We want to be there as people integrate generative AI into their stacks, to make sure these are secure and the risks are mitigated,” Haber said. “So we will evolve the product based on the threat landscape.”

Lakera’s investment was led by Swiss VC Redalpine, with additional capital provided by Fly Ventures, Inovia Capital and several angel investors.

source