The market for synthetic data is bigger than you think
“By 2024, 60% of the data used for the development of AI and analytics projects will be synthetically generated.” This is a prediction from Gartner that you will find in almost every single article, deck, or press release related to synthetic data.
We are repeating this quote here despite its ubiquity because it says a lot about the total addressable market (TAM) for synthetic data.
Let’s unpack it. First, describing data as “synthetically generated” may seem tautological, but the meaning is clear: We are talking about data that is artificial (or fake) and created rather than gathered in the real world.
Next, there’s the core of the prediction: Synthetic data will be used in the development of most AI and analytics projects. Since such projects are on the rise, it follows that the market for synthetic data is also set to grow.
Last but not least is the time horizon. In our startup world, 2024 is almost today, and Gartner already has a longer-term prediction: Members of its team published a piece of research titled “Forget About Your Real Data – Synthetic Data Is the Future of AI.”
“The future of AI” is the kind of promise that investors like to hear, so it’s no surprise that checks have been flowing into synthetic data startups.
In 2022 alone, MOSTLY AI raised a $25 million Series B round led by Molten Ventures; Datagen landed a $50 million Series B led by Scale Venture Partners; and Synthesis AI pocketed a $17 million Series A.
Synthetic data startups that have raised significant funding already serve a wide range of sectors, from banking and healthcare to transportation and retail. But they expect use cases to keep expanding, both in new sectors and in those where synthetic data is already common.
To understand what’s happening, but also what’s coming if synthetic data does get more broadly adopted, we talked to various CEOs and VCs over the last few months. We learned about the two main categories of synthetic data companies, which sectors they address, how to size the market, and more.
The tip of the iceberg
Quiet Capital’s founding partner, Astasia Myers, is one of the investors bullish about synthetic data and its applications. She declined to disclose whether she has invested in this space, but said that “there’s a lot to be excited about in the synthetic data world.”
Why the enthusiasm? “Because it gives teams faster access to data in a secure way at a lower cost,” she told TechCrunch.
“We can simply say that the TAM of synthetic data and the TAM of data will converge.” Ofir Zuk (Chakon)
Access to large troves of data has become critical for machine learning teams, but real data often falls short, whether because it is slow or costly to obtain or because privacy and security constraints limit how it can be used. This is the gap that synthetic data startups are hoping to fill.
These startups focus on two main types of data: structured and unstructured. The former refers to datasets that sit in tables and spreadsheets, while the latter covers what we could call media files, such as audio, text, and visual data.
“It makes sense to distinguish between structured and unstructured synthetic data companies,” Myers said, “because the synthetic data type is applied to different use cases and therefore different buyers.”