Betterdata uses synthetic data to keep real data safe
Betterdata, a Singapore-based startup that uses programmable synthetic data to keep real data secure, announced today it has raised $1.55 million. The seed round, which it says was oversubscribed, was led by Investible with participation from Franklin Templeton, Xcel Next, Singapore University of Technology and Design, Bon Auxilium, Tenity, Plug and Play and Entrepreneur First.
The startup was founded in 2021 by Dr. Uzair Javaid, its CEO, and chief technologist Kevin Yee, with the goal of making data sharing faster and more secure as data protection regulations increased around the world. The company is currently in research and development partnerships with two major universities in Singapore and the United States (it can’t publicly disclose who they are) and its clients include Shanghai Pudong Development Bank.
Betterdata says it differs from traditional data sharing methods, which rely on data anonymization that destroys data, because it uses generative AI and privacy engineering instead.
Yee explained to TechCrunch that programmatic synthetic data uses generative models, including deep learning models such as the generative adversarial networks used in deepfakes, the transformers used in ChatGPT and the diffusion models used in Stable Diffusion, to create and augment new datasets.
These synthetic datasets have similar characteristics and structure to real-world data without disclosing sensitive or private information about individuals.
“The idea is to create a fictional version of a real dataset that can be used safely for a variety of purposes including safeguarding confidential data, reducing bias and also improving machine learning models,” he said.
Programmatic synthetic data helps developers in many ways. Examples include protecting sensitive data; complying with data protection regulations like GDPR and HIPAA; increasing data availability between teams; creating more data to train, test and validate machine learning models; and addressing data imbalance by creating more records for underrepresented groups or classes.
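To make the data imbalance point concrete, here is a minimal sketch in Python. It is not Betterdata's method: it swaps in a very simple stand-in for a generative model (a per-class normal distribution fitted to one numeric column), and the function name `synthesize_minority`, the `label` and `amount` fields, and the sample values are all hypothetical.

```python
import random
import statistics


def synthesize_minority(records, label_key, feature_key, seed=0):
    """Add synthetic records until every class is as large as the biggest one.

    Assumption: each class is modeled by a normal distribution over a single
    numeric feature. Real generative models are far richer than this.
    """
    rng = random.Random(seed)

    # Group the feature values by class label.
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec[feature_key])

    target = max(len(values) for values in by_class.values())
    balanced = list(records)

    for label, values in by_class.items():
        mean = statistics.mean(values)
        # A class with one record has no spread to estimate, so use 0.
        stdev = statistics.stdev(values) if len(values) > 1 else 0.0
        for _ in range(target - len(values)):
            balanced.append({
                label_key: label,
                feature_key: rng.gauss(mean, stdev),
                "synthetic": True,  # mark generated rows so they stay traceable
            })
    return balanced


# Hypothetical, made-up transactions: six legitimate, two fraudulent.
records = [{"label": "legit", "amount": a} for a in (12.0, 30.5, 18.2, 25.0, 40.1, 22.3)]
records += [{"label": "fraud", "amount": a} for a in (950.0, 1200.0)]

balanced = synthesize_minority(records, "label", "amount")

counts = {}
for rec in balanced:
    counts[rec["label"]] = counts.get(rec["label"], 0) + 1
print(counts)  # {'legit': 6, 'fraud': 6}
```

The four added fraud rows are drawn from the fitted distribution rather than copied, which is the basic idea behind generating, instead of duplicating, records for an underrepresented class.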
Betterdata’s funding will be used for its product launch and to enhance its programmable synthetic data tech stack, including support for single-table, multi-table and time-series datasets. These are different variations of tabular datasets, and Yee explains that the main differences are their structures and the problems they are created to address.
For example, single-table datasets focus on standalone tables, while multi-table datasets are meant to consider relationships between multiple tables, and time-series datasets deal with data collected over time.
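The three structures can be illustrated with a small Python sketch. The tables, column names and helper functions below are hypothetical and only show what each structure requires of a synthetic copy; they do not reflect Betterdata's product.

```python
# Single-table: one standalone table where each row is independent.
customers = [
    {"customer_id": 1, "age": 34, "country": "SG"},
    {"customer_id": 2, "age": 51, "country": "US"},
]

# Multi-table: a second table points back to the first through a key,
# so a synthetic version has to keep that relationship valid.
orders = [
    {"order_id": 100, "customer_id": 1, "total": 59.90},
    {"order_id": 101, "customer_id": 1, "total": 12.50},
    {"order_id": 102, "customer_id": 2, "total": 230.00},
]

# Time-series: observations of one entity collected over time,
# where the ordering of rows carries meaning.
balance_history = [
    {"customer_id": 1, "day": "2023-01-01", "balance": 1000.0},
    {"customer_id": 1, "day": "2023-01-02", "balance": 940.1},
    {"customer_id": 1, "day": "2023-01-03", "balance": 927.6},
]


def orphan_orders(orders, customers):
    """Return orders whose customer_id matches no row in the customers table."""
    known_ids = {c["customer_id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known_ids]


def is_chronological(series):
    """Check that rows are in date order (ISO dates sort correctly as strings)."""
    days = [row["day"] for row in series]
    return days == sorted(days)


print(orphan_orders(orders, customers))   # []
print(is_chronological(balance_history))  # True
```

Checks like these hint at why the structures are handled differently: a single table only needs realistic rows, related tables also need consistent keys, and a time series also needs a plausible order.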
Betterdata also plans to hire more people, including sales and marketing employees, and expand beyond Singapore to more of the Asia-Pacific region over the next one to two years.
In a statement about Investible’s investment, principal Khairu Rejal said, “Betterdata solves one of the biggest issues the AI industry is facing today: lack of high-quality data that also meets privacy requirements. Through its powerful platform, Betterdata generates synthetic data that mimics real-world data without compromising quality and privacy, helping businesses meet global compliance and privacy laws at scale.”