Object Storage and In-Process Databases are Changing Distributed Systems
I attended the 19th International Workshop on High Performance Transaction Systems (HPTS) in 2022.[1] It was my first time attending. Highlights for me were the talks HPTS Comes Full Circle by James Hamilton, Formal Methods Only Solve Half My Problems by Marc Brooker, and Coordination: The Partial Collapse of Partial Order by Pat Helland. I was humbled to present Scaling Systems for Critical Infrastructure[2] in Randy Shoup’s session on scalability. Most valuable was discussing the challenges I work on with other experts in high-scale transactional systems in an informal and collaborative setting. This is a special gathering. Its uniqueness was apparent throughout my experience and reinforced by the words people shared about Jim Gray, a person synonymous with HPTS, in the closing session, Remembering Jim Gray.
Jim Gray was a proponent of writing things down and sharing them.[3] I am hoping to attend the 20th International Workshop on High Performance Transaction Systems (HPTS) in September 2024. I am sharing my one-page position statement and biography with the hope it might inspire others. As an aside, if you are motivated to work on these challenges, my team is hiring.
Position Statement
In my 2022 HPTS position statement, Our Transition to Renewable Energy: Motivating the Most Challenging Problems in Distributed Computing and IoT, I described tensions between local computing and cloud computing when IoT systems are used for critical infrastructure, like electric vehicle chargers or battery storage systems for renewable energy.
WebAssembly is attractive for moving computation between the cloud and the IoT edge.[4] While computation is critical, it must act upon the transactional and analytical data that underpins the system, and these data platforms remain largely in the cloud. Why ship all the data from IoT devices to the cloud, merging it together in message brokers or databases to achieve the necessary economies of scale, only to pull it apart again to serve queries, build applications, or share the data with third parties? Increasingly, cloud platforms are relying on durable and inexpensive object storage,[5] like Amazon S3,[6] as the database;[7] storing data in open file formats, like Parquet, and using open table formats, like Iceberg, to support efficient queries using data manifests; and using in-process databases, like SQLite and DuckDB, rather than large clusters dedicated to analytical workloads. This provides the opportunity to run widely distributed and independent workloads for each IoT device, even projecting the embedded IoT databases into the cloud, rather than using aggregated transactional or analytical data systems. As a consequence, the distinction between transactional and analytical workloads is blurring.[8] Object storage is also evolving to match block-storage latency, within certain availability guarantees.[9]
But why stop there? When storage is an open format, like Parquet, and the data systems, like SQLite or DuckDB, can be embedded in-process in as little as tens of megabytes of memory and can perform queries on data sets much larger than memory, it becomes feasible to take these workloads, even exactly the same code, and move them from the cloud directly onto the IoT device or IoT gateway. This can drastically reduce costs (cloud networking, computation, and storage; cellular networking); improve latency and reliability; offer traditionally cloud-based services locally, independent of the cloud or with intermittent connectivity to the cloud; improve data privacy by keeping sensitive data local; and support direct integration with third-party services (service providers for installation and maintenance; platforms for command and control; enterprise historians in industrial environments).[10]
The cloud will remain critical for the coordination of many IoT devices (e.g., communicating weather forecasts or energy prices), data aggregations (e.g., energy forecasts for market participation; electric vehicle charger status), and managing firmware updates and device settings, among other things, but the cloud will become increasingly less important for transactional and analytical data processing and storage. Where it does play a role, there will be a tighter relationship between the computation and storage at the edge and in the cloud. I would like to share the innovations I have been working on and learn from others working in this rapidly evolving space, particularly people who are leading the way, are skeptical, or are pursuing different approaches.
Biography
I lead the cloud platforms organization for Tesla Energy, developing real-time services and critical infrastructure for power generation, battery storage, vehicle charging, and grid services. My presentation Tesla Virtual Power Plant details the architecture and technologies used in this large-scale, distributed IoT platform, including some of the unique grid services it can provide. I am currently most focused on creating platforms that bridge storage and computation in the cloud and on the IoT device for both transactional and analytical workloads.
Before joining Tesla, I worked at OSIsoft developing real-time infrastructures for the monitoring and control of industrial applications. This included developing a distributed time-series database that could scale to millions of series and millions of transactions per second, and publish-subscribe services for messaging among distributed, event-based applications. This software is widely used for critical operations in the process industries, manufacturing, and for power generation, transmission, and distribution.
The picture at the top of this page is from a walk I took at Asilomar where the workshop is held. ↩︎
One of the great things about HPTS is that the talks are not recorded, so speakers can be a little more open about their work. My slides are not available, but much of what I covered is available on my blog. See WebAssembly at the IoT Edge: A Motivating Example and The State of the Art for IoT and the linked talks. ↩︎
I learned from the stories in the Remembering Jim Gray session that Jim was a wonderful mentor to so many people. He believed writing things down allowed ideas to have a life far beyond one person or the present moment. He also believed the first name on a paper should be the first person to write down and share the idea, not the first person with the idea. ↩︎
I am continuing to explore WebAssembly, but its widespread adoption remains dependent on the widespread adoption of the WebAssembly Component Model. ↩︎
I am pleased to see an increasing interest in software systems built around object storage. Over a decade ago, I built a durable messaging system using Microsoft Azure Page Blobs for sharing industrial data between enterprises. It allowed us to build a very simple and reliable system without having to worry about leadership election, quorum, data replication, or data re-partitioning. For more information, see Shared-Nothing Architectures for Server Replication and Synchronization. ↩︎
Amazon S3 is unimaginably durable and inexpensive. The talk Building and Operating a Pretty Big Storage System (My Adventures in Amazon S3) by Andy Warfield is an excellent history of the amazing engineering that has gone into S3, as well as some of the social and organizational aspects of large-scale engineering and engineering excellence. It is one of my all-time favourite talks. I had the pleasure of meeting with Andy a couple of times in the past year to discuss some of my ideas in the position statement. ↩︎
It might seem strange to view object storage as the database, but at the end of the day, a database is also a collection of data and indexes stored on disk with various abstractions to hide the latency of the underlying storage. Sometimes I hesitate to call DuckDB a database, because it can just be used as a library for efficient data processing without ever creating a DuckDB database, and I have seen this confuse people. But what constitutes the database in this case is the combination of the files on disk with DuckDB as the query planner and execution engine. See Databases Are Falling Apart: Database Disassembly and Its Implications by Chris Riccomini for an excellent review of what is happening in this space. ↩︎
This can be thought of as an extension of the idea of a “digital twin” to include not just the current state of the hardware, but all of the operational data and metadata available. ↩︎
See Andy Warfield’s talk AWS storage: The backbone for your data-driven business from AWS re:Invent 2023 for more information on innovations in S3 including S3 Express One Zone and Mountpoint for Amazon S3. ↩︎
I am hopeful that the use of in-process databases, both in the cloud and on the IoT edge, will encourage a renewed focus on more efficient computing by considering programming languages, power consumption, resource consumption, performance, and data formats. For example, many analytical systems use strings because of their flexibility and because they remove the need to join a binary format with a dictionary to understand the data. But strings are wildly inefficient and can be abused to inject just about anything. A bit field can provide a much more efficient, flexible, and correct representation of state. See Masking the Problem: Representing Complex State Without Strings. ↩︎
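The bit-field point in the last footnote can be sketched with Python's standard-library `enum.Flag`; the `ChargerState` flags here are hypothetical, not an actual Tesla schema:

```python
from enum import Flag, auto

class ChargerState(Flag):
    # Hypothetical charger status as a bit field: each member is one
    # bit, so any combination of states fits in a single small integer.
    CONNECTED = auto()
    CHARGING = auto()
    FAULTED = auto()
    UPDATING_FIRMWARE = auto()

state = ChargerState.CONNECTED | ChargerState.CHARGING
print(state.value)                    # 3 -- compact on disk or on the wire
print(ChargerState.FAULTED in state)  # False -- one bitwise AND, no parsing

# Unlike free-form strings ("Charging", "charging", "CHARGING!"),
# only defined states are representable, and states compose without
# inventing new string values.
```

A membership test is a single bitwise AND, and the whole status serializes as one small integer, versus allocating, comparing, and validating strings.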