HoundDog.ai helps developers prevent personal information from leaking

May 23, 2024 ndowd

HoundDog.ai, a startup that helps developers ensure their code doesn’t leak personally identifiable information (PII), came out of stealth Wednesday and announced a $3.1 million seed round led by E14, Mozilla Ventures and ex/ante, in addition to a number of angel investors. Unlike other scanning tools, HoundDog actually looks at the code a developer is writing, using both traditional pattern matching and large language models (LLMs) to find potential issues.

HoundDog was founded by Amjad Afanah, who previously co-founded DCHQ, which was later acquired by Gridstore (which, to complicate things, then changed its name to HyperGrid) in 2016. Afanah also co-founded apisec.ai, which is still up and running, and worked at self-driving startup Cruise. The inspiration for HoundDog came from his time at data security startup Cyral and from talking to privacy teams there, he told me.

“When I was at Cyral, we had a lot of data,” he said. “What Cyral does — like many others in the data security space — is they focus on production systems. They help you discover, classify your structured data and your databases, and then help you apply access controls. But the overwhelming feedback that I kept hearing from security and privacy teams alike was: ‘You know, it’s a little too reactive and it doesn’t keep up with the changes in the code base.’”

So HoundDog shifts this process even further left. While it still sits in the continuous integration flow and not yet in the development environment (though that may happen in the future), the idea here is to find potential data leaks before the code is merged. And most importantly, HoundDog does so by looking at the actual code, not the data flow it produces. “Our source of truth is the code base,” Afanah said.

Thanks to this, if a development team starts collecting Social Security numbers, for example, HoundDog would raise a flag and warn the team before the code is ever merged; it would also alert the security team. After all, that could become a major and costly issue.
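To make the pattern-matching half of that idea concrete, here is a minimal sketch of how a scanner might flag PII-like identifiers in source code before a merge. It is not HoundDog's implementation: the pattern table, the `scan_source` function and the Java-like sample are all hypothetical, chosen only to illustrate the general technique.

```python
import re

# Hypothetical identifier patterns (an assumption for illustration only);
# a production scanner would rely on a much richer rule set.
PII_PATTERNS = {
    "ssn": re.compile(r"ssn|social_?security", re.IGNORECASE),
    "email": re.compile(r"e_?mail", re.IGNORECASE),
    "phone": re.compile(r"phone", re.IGNORECASE),
}


def scan_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, pii_type) for each line mentioning a PII-like name."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for pii_type, pattern in PII_PATTERNS.items():
            # Report each PII type at most once per line.
            if pattern.search(line):
                findings.append((line_number, pii_type))
    return findings


# Made-up Java-like snippet standing in for code under review.
sample = """\
String name = request.getParameter("name");
String ssn = request.getParameter("socialSecurityNumber");
logger.info("user phone: " + phoneNumber);
"""

for line_number, pii_type in scan_source(sample):
    print(f"line {line_number}: possible {pii_type} handling")
```

Naive substring matching like this is prone to false positives (the identifier `className`, for instance, contains "ssn"), which is one reason a real tool would need more than a handful of regular expressions.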

The service currently supports code written in Java, C#, JavaScript and TypeScript, as well as SQL and GraphQL queries and OpenAPI/Swagger specifications. Support for Python is imminent, the company says.

Afanah noted that a tool like this is becoming especially important in this age of AI-generated code, something Replit CEO (and HoundDog angel investor) Amjad Masad also echoed.

“As an increasing number of companies turn to AI-generated code to accelerate development, embedding security best practices and ensuring the security of the generated code becomes essential,” Masad said. “HoundDog.ai is leading the way in securing PII data early in the development cycle, making it an indispensable component of any AI code generation workflow. This is the reason I chose to invest in this company.”

HoundDog itself uses AI, too. It currently relies on OpenAI’s models, but it’s important to stress that this is optional. Users who worry about their code leaving their private repositories can choose to rely only on the company’s more traditional code scanner.

A major part of HoundDog’s value proposition is that it can cut compliance costs for startups thanks to its automated reporting capabilities. The service can automatically generate a record of processing activities (RoPA). HoundDog uses generative AI to produce these reports, which involves sending data to OpenAI. The team stresses that only the tokens the service has discovered through its regular scanner are shared with OpenAI and that the actual source code isn’t shared.

The company offers a limited free plan, with paid plans starting at $200/month for scanning up to two repos.
