Operational Benefits of Databases Built on Object Storage

I had the honor of both attending and speaking at the last two International Workshops on High Performance Transaction Systems, fondly known as HPTS.[1] This is a special event where experts in systems research, development, and operations are invited to privately discuss the latest trends in the industry, with a particular focus on systems performance and scalability. My position statement is below. I look forward to discussing these experiences and learning from the community, whether or not I end up on stage.

Position Statement

In recent years, databases have increasingly adopted object storage in place of block storage or direct-attached storage.[2] Object storage offers exceptional durability and scalability, while the clean separation of storage from compute provides superior operational flexibility. The primary trade-off is higher latency compared with block or direct-attached storage.[3] Mitigating this requires batching writes and caching reads, but these techniques have been used in databases for decades and they extend naturally to object storage. Because objects are immutable, object storage pairs particularly well with log-structured merge (LSM) trees, where writes are batched for scalability, and periodic compaction keeps reads efficient. Examples of database systems that use object storage include WarpStream (a Kafka-compatible streaming platform), Turbopuffer (a search engine), InfluxDB (a time-series database), SlateDB (a key-value store), and Snowflake (a data analytics platform).
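To make the batching and caching idea concrete, here is a minimal sketch in Python. It is an illustration only, not the design of any system named above. The `InMemoryObjectStore` and `BatchedWriter` classes, the `sst-` object naming, and the tab-separated object layout are all hypothetical. Writes are buffered and flushed as one sorted, immutable object, and reads go through a cache that never needs invalidation because objects never change. Compaction is left out.

```python
from typing import Dict, List, Optional


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: write-once objects, counted requests."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self.put_count = 0
        self.get_count = 0

    def put(self, name: str, data: bytes) -> None:
        if name in self._objects:
            raise ValueError(f"object {name!r} already exists; objects are immutable")
        self._objects[name] = data
        self.put_count += 1

    def get(self, name: str) -> bytes:
        self.get_count += 1
        return self._objects[name]


class BatchedWriter:
    """Buffers writes in memory and flushes each batch as one sorted, immutable object."""

    def __init__(self, store: InMemoryObjectStore, max_batch: int = 3) -> None:
        self.store = store
        self.max_batch = max_batch
        self.buffer: Dict[str, str] = {}             # pending writes, not yet durable
        self.object_names: List[str] = []            # flushed objects, oldest first
        self.cache: Dict[str, Dict[str, str]] = {}   # parsed objects, never invalidated

    def write(self, key: str, value: str) -> None:
        # Assumption: keys and values contain no tab or newline characters.
        self.buffer[key] = value
        if len(self.buffer) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        name = f"sst-{len(self.object_names):06d}"
        body = "\n".join(f"{k}\t{v}" for k, v in sorted(self.buffer.items()))
        self.store.put(name, body.encode("utf-8"))   # one PUT for the whole batch
        self.object_names.append(name)
        self.buffer = {}

    def _load(self, name: str) -> Dict[str, str]:
        if name not in self.cache:                   # only a cache miss costs a GET
            text = self.store.get(name).decode("utf-8")
            self.cache[name] = dict(line.split("\t", 1) for line in text.split("\n"))
        return self.cache[name]

    def read(self, key: str) -> Optional[str]:
        if key in self.buffer:                       # newest data first
            return self.buffer[key]
        for name in reversed(self.object_names):     # then newest object to oldest
            table = self._load(name)
            if key in table:
                return table[key]
        return None
```

In this sketch six writes with `max_batch=3` cost two PUT requests rather than six, and repeated reads of the same object cost a single GET. A real system would also need a write-ahead path or an acknowledgment policy for data still sitting in the buffer, which this sketch does not address.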

Over the past two years, I developed a purpose-built database for managing critical operations of battery energy storage systems (BESS), at both industrial and IoT scale, that relies on object storage. If invited to speak, I will describe the database architecture—including its LSM structure, caching strategies, and integration with open-source query optimizers—with a focus on the operational advantages of object storage from my practical experience.

The advantages are numerous. Object storage delivers atomic writes, strong durability, linear scalability, and essentially unlimited capacity. Immutable objects simplify read caching, data replication (for redundancy or disaster recovery), and multi-resource access, because the same objects can be read by isolated compute resources to meet different service-level objectives or to support side-by-side testing and validation. Efficient random access to large objects is achieved via byte-range reads, while tail latencies can be reduced through read hedging (issuing a duplicate request when the first is slow and taking whichever response arrives first). Data reprocessing, replication, and repartitioning can rely on standard open-source tools rather than custom software. Object tagging enables straightforward data retention policies and supports regulatory compliance.
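As a rough illustration of read hedging combined with range reads, the Python sketch below sends a second, identical range read if the first has not returned within a chosen delay, then uses whichever finishes first. The names `hedged_read`, `make_flaky_fetch`, and `BLOB`, and the simulated delays, are assumptions made for this example. Against a real S3-compatible store, the `fetch` callable would issue a GET with an HTTP `Range: bytes=start-end` header, where the end offset is inclusive.

```python
import concurrent.futures as cf
import threading
import time
from typing import Callable, Dict, Tuple

BLOB = bytes(range(256)) * 4  # stand-in for a large immutable object


def make_flaky_fetch(slow_calls: int, slow_delay: float) -> Tuple[Callable[[int, int], bytes], Dict[str, int]]:
    """Builds a simulated range read whose first `slow_calls` calls are slow."""
    lock = threading.Lock()
    calls = {"n": 0}

    def fetch(start: int, end: int) -> bytes:
        with lock:
            calls["n"] += 1
            n = calls["n"]
        if n <= slow_calls:
            time.sleep(slow_delay)       # simulate a tail-latency outlier
        return BLOB[start:end + 1]       # inclusive end, like an HTTP Range header

    return fetch, calls


def hedged_read(fetch: Callable[[int, int], bytes], start: int, end: int,
                hedge_after: float, pool: cf.ThreadPoolExecutor) -> bytes:
    """Returns the first response; sends one duplicate request after `hedge_after` seconds."""
    first = pool.submit(fetch, start, end)
    done, _ = cf.wait([first], timeout=hedge_after)
    if done:
        return first.result()            # fast path: no hedge needed
    second = pool.submit(fetch, start, end)
    done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
    # Sketch only: if the first request to finish failed, its exception propagates here.
    return next(iter(done)).result()
```

Hedging is safe here because objects are immutable, so both requests are guaranteed to return the same bytes. The cost is extra requests, so the hedge delay is typically set near a high latency percentile to keep the duplicate rate low.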

When objects are stored in open formats and are accessible by multiple systems, data are no longer locked inside a single database. When the number of reads and writes is bounded, object storage is highly cost-effective, and data tiering makes it even more so. Finally, the independence of compute from storage allows flexible scaling of compute resources to match load and control costs, while the ubiquitous S3 API ensures portability across public cloud providers, on-premises object stores, and even edge platforms and embedded systems.[4]


  1. My position statements for the previous two HPTS workshops are also available on my blog: Object Storage and In-Process Databases are Changing Distributed Systems and Our Transition to Renewable Energy: Motivating the Most Challenging Problems in Distributed Computing and IoT. The picture above was taken at HPTS in 2022. ↩︎

  2. See my talk Predicting the Future of Distributed Systems and my essay of the same title. ↩︎

  3. S3 Express One Zone is an exception: it offers the S3 API with latency comparable to block storage, in exchange for reduced availability guarantees. See my talk Bridging the Gap: The Convergence of Transactional and Analytical Data. ↩︎

  4. I have now used five different object storage providers without changing the design of the database. ↩︎