Yesterday Snowflake released their Q4 FY 2024 earnings, during which they announced that Frank Slootman is retiring. The stock proceeded to drop by ~20% after hours. Losing Frank Slootman as CEO is obviously of huge significance, but there were other worrying trends on the company’s earnings call. One of them is Apache Iceberg.
Iceberg was mentioned no fewer than 18 times during the company’s earnings call, second only to “AI” (28), per my very non-scientific analysis of the call. Now, why would Snowflake’s CEOs, both new and outgoing, along with their CFO, mention a somewhat obscure Apache project on the call? The answer is simple: Iceberg is moving data out of Snowflake.
We are forecasting increased revenue headwinds associated with product efficiency gains, tiered storage pricing, and the expectation that some of our customers will leverage Iceberg tables for their storage. Source: Snowflake (SNOW) Q4 2024 Earnings Call Transcript
The impact is not just the lost storage revenue, but the more valuable compute revenue too.
So, the amount of revenue associated with storage is coming down. But on top of that, we do expect a number of our large customers are going to adopt Iceberg formats and move their data out of Snowflake where we lose that storage revenue and also the compute revenue associated with moving that data into Snowflake. Source: Snowflake (SNOW) Q4 2024 Earnings Call Transcript
Not only is Apache Iceberg moving data out of Snowflake, it is moving that data to other systems for processing and analysis. Both storage and the highly lucrative compute revenue suffer.
So what is Apache Iceberg, and why is it enabling this data migration off the world’s most popular data warehouse?
A brief tour of Iceberg
Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Source: Apache Iceberg
In non-data-nerd terms, Iceberg is a technology that lets you create tables with the same transactional guarantees that databases offer, but on storage systems, like AWS S3, that do not natively support tables or transactions. Not only that, but the data in Iceberg tables is stored in non-proprietary formats like Parquet, Avro, and ORC.
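To make that concrete, here is a minimal sketch using PySpark, assuming the Iceberg Spark runtime jar is on Spark’s classpath; the `demo` catalog name, the S3 bucket, and the table are all hypothetical:

```python
# Minimal sketch: an Iceberg catalog whose "warehouse" is just an S3 prefix.
# Assumes the iceberg-spark-runtime jar is available to Spark.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-sketch")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "s3://my-bucket/warehouse")  # hypothetical bucket
    .getOrCreate()
)

# CREATE and INSERT are atomic, transactional commits; the files written
# underneath are plain, open Parquet plus a little JSON/Avro metadata.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, payload STRING) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, 'hello'), (2, 'world')")
```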
Iceberg is cheaper and tremendously more flexible than data warehouses like Snowflake. Data stored in Iceberg tables can be queried with different query engines like Dremio, Spark, Trino, and others. In short, Iceberg allows you to build the backend of a data warehouse at a fraction of the cost of one like Snowflake, and gives you the freedom to store your data in different formats and use the query engines of your choice. Iceberg gives you data and compute choices.
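Because a table like the one above is just open metadata and Parquet files sitting on S3, a completely different engine can read it with no Snowflake, or even Spark, in the picture. Here is a sketch with pyiceberg; the metadata file path is hypothetical and would come from the table’s actual location:

```python
# Sketch: reading the same table with pyiceberg instead of Spark.
# Requires `pip install "pyiceberg[s3fs,pyarrow]"`; the metadata path below
# is hypothetical.
from pyiceberg.table import StaticTable

table = StaticTable.from_metadata(
    "s3://my-bucket/warehouse/db/events/metadata/v1.metadata.json"
)
print(table.scan().to_arrow())  # same files, entirely different engine
```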
I wrote about this a few months ago when I compared traditional data warehouses with some of the modern lakehouse approaches.
Looking into the future, I suspect that the lakehouse approach will be the dominant model. The benefits this model offers - openness, lower cost, & interoperability - far outweigh its current limitations of complexity and performance relative to data warehouses.
Paradigm shifts: enterprise vs consumer
I first encountered Iceberg in 2019 when I was working at Dremio. I remember the excitement about this technology. It promised everything you can do with a data warehouse or database (backend only), with interoperability and openness, at a fraction of the cost. What’s not to love?
Well, big enterprise paradigm shifts, and this is one, take time to gain widespread adoption. It took about 5 years for this somewhat obscure piece of technology to become the star of the earnings call of the most successful data company of the modern era. Compare that to consumer shifts. Take AI, which is taking both the consumer and enterprise worlds by storm and has accomplished that in a matter of months. AI’s broad adoption started in the consumer space: millions were able to experience ChatGPT very quickly without going through any enterprise red tape. That consumer-led adoption then propagated into the enterprise soon after. The same pattern could be observed with mobile and social, both consumer-led adoptions that gained ground in the enterprise quickly, though much more slowly than AI.
So, Iceberg has arrived in the enterprise. The question I now ponder is: who is best positioned to capture the lucrative compute dollars to process this data? Storage is like a bookshelf filled with books. The books are of little value, other than decor, until they are read. The same applies to data: the high value is in compute, the analysis.
I suspect the folks at Tabular, Dremio, Starburst and Databricks are all eager to capitalize on this moment.
[I work for Snowflake but do not speak for them.]
Apache Iceberg support isn't GA in Snowflake yet, so it isn't affecting any current revenue. What was stated on the earnings call is that it *could* affect some future revenue. However, adding native support was the right thing to do and ultimately should bring more workloads to Snowflake where customers can't or don't want to move data out of a data lake. Snowflake storage pricing runs from $23 down to $13.80 per compressed TB per month, depending on contract ACV. Some companies would pay more than that storing the data on their own, so they won't move. Others will move, but will still use Snowflake compute to maintain their Iceberg tables.
Snowflake's native table format (FDN) was the first cloud-native table format and arguably still the best. I think most companies will evaluate the feature and governance advantages of FDN and keep using it, but it is nice for customers to have a choice.
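A quick back-of-envelope on the storage pricing cited in the comment above. The Snowflake rates come from the comment; the S3 Standard rate (~$23/TB-month at public list pricing) and the 3x compression ratio are my assumptions, not figures from the post:

```python
# Back-of-envelope sketch of the storage-pricing point. The Snowflake rates
# are from the comment; the S3 rate and compression ratio are assumptions.
raw_tb = 100.0                     # raw, uncompressed data size
compression = 3.0                  # hypothetical compression ratio
compressed_tb = raw_tb / compression

rates = {
    "Snowflake (low ACV)":  23.00,  # $/compressed TB-month
    "Snowflake (high ACV)": 13.80,  # $/compressed TB-month
    "S3 Standard (DIY)":    23.00,  # $/TB-month; S3 holds the same compressed Parquet
}
for name, rate in rates.items():
    print(f"{name:22s} ${compressed_tb * rate:8,.2f}/month")
```

Under those assumptions, do-it-yourself S3 storage roughly matches Snowflake’s low end and costs more than its high-ACV tier, which is exactly the comment’s point about why some companies won’t bother moving.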