Future Software Should Be Memory Safe: Reflections on A Path Toward Secure and Measurable Software

The United States White House Office of the National Cyber Director recently released a report entitled Back to the Building Blocks: A Path Toward Secure and Measurable Software, encouraging the software industry to proactively reduce the likelihood of cyber attacks. The report received attention for its guidance on using memory-safe programming languages,[1] discouraging the use of C and C++ and proactively recommending Rust. The press release was even titled Future Software Should Be Memory Safe. However, the report itself focused equally on formal verification, secure hardware, and measuring software quality. I appreciate the conclusions of the report, but I don’t think it went far enough. In this article, I relate the report to my work engineering software for critical infrastructure, including memory-safe languages, formal verification, making quality visible, sandboxing with WebAssembly, and embedding open-source databases written in memory-unsafe languages, like SQLite and DuckDB.

Memory-Safe Languages

I have been writing software for critical infrastructure for twenty-five years. For more than half of my career, I used programming languages that were not memory-safe. I started programming in C, writing services for data collection embedded in SCADA and EMS[2] that controlled the transmission and distribution of electricity to millions of customers in North America and Australia. I also wrote drivers in C++ for data collection from paper machines and building automation. I then spent over a decade writing a time-series database in C++. This was complex multi-process, multi-threaded code with a number of memory managers and indexing mechanisms.[3] This database began development before the C++ standard template library existed. A lot of my work involved evolving code to use the standard template library[4] and language features introduced in C++11 for safer and more efficient memory management, like range-based for loops, null-pointer constants, rvalue references, move constructors, and smart pointers.[5] This database, including C++ code I wrote over fifteen years ago, remains widely deployed for operating critical infrastructure, including water, electricity, oil, gas, and manufacturing.

For the past decade, I have been using memory-safe languages, primarily C# and Scala. For the past year, I have been programming in Rust.[6] I’m interested in Rust for memory safety, but also for the performance improvements and cost savings of native code,[7] Rust’s support for functional programming constructs,[8] the ease of using the same code in cloud environments and embedded environments, and Rust’s excellent support for WebAssembly (more on this later).

As I have noted on this blog a few times, when performance and resource constraints are important, application developers should favour Rust over C or C++. But reality is not that simple. There has been an enormous amount of C and C++ code written over the past decades. It cannot all be rewritten in a memory-safe language, or magically become safe. Over the years, the C++ standard has added many memory-safe features and it continues to get better—the whole point of C++ was to write safer code. In this evolution, however, C++ has not deprecated or banned the “unsafe” parts of the standard. Doing so often does more harm than good and it is easy to underestimate the impact of breaking changes. They can cause people to never upgrade their C++ version. Bringing memory safety to C and C++ requires an incremental approach, something the report failed to detail.[9]

At the end of the day, computer programs require some “unsafe” code to do useful things. Encouraging people to use Rust over C++ just to then write a bunch of unsafe Rust means we are no further ahead. In general, unsafe code should be the domain of library developers anyway, not something the average software developer is concerned with. Rust’s approach is to clearly demarcate the unsafe code to signal the programmer’s intent. We can give this code extra scrutiny for memory safety, and then rely on the type system, the compiler, and the static analyzer to ruthlessly eliminate errors everywhere else. This feels like the right approach and it is the incremental approach C++ is also taking. The C++ Core Guidelines has profiles for type safety, bounds safety, and lifetime safety, and profiles are supported by a number of static analyzers.
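
To illustrate the demarcation, here is a minimal sketch (mine, not from the report or the C++ Core Guidelines) of Rust’s approach: the unsafe operation is confined to a small, auditable block behind a safe interface, and a SAFETY comment records the invariant a reviewer must scrutinize.

```rust
/// A safe wrapper around an unsafe operation. Callers cannot misuse it;
/// only this function needs extra scrutiny for memory safety.
fn first_byte(bytes: &[u8]) -> Option<u8> {
    if bytes.is_empty() {
        None
    } else {
        // SAFETY: the slice is non-empty, so index 0 is in bounds.
        Some(unsafe { *bytes.get_unchecked(0) })
    }
}

fn main() {
    assert_eq!(first_byte(b"abc"), Some(b'a'));
    assert_eq!(first_byte(b""), None);
}
```

In practice, the standard library already provides safe equivalents for this particular operation; the point is the pattern of a safe API over a small, scrutinized unsafe core.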

But safe code for critical infrastructure is about a lot more than just memory safety. Integer overflow and underflow, timing, and race conditions are equally important. Go is often described as a memory-safe language. It is arguably safer than C or C++, but I regularly see data races related to concurrency in Go that would be prevented in Rust, or even C++, due to the ability to statically check that only one thread at a time can mutate an object. Data races are possible in other memory-safe languages, but they are not something I have been concerned with in Scala because of Akka and its use of actors, streams, and immutable data as the concurrency primitives. The concurrency model matters a lot.[10]
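
As a minimal sketch of what that static check looks like in practice (my example, not from any referenced project), the Rust compiler rejects two threads mutating a bare counter; the programmer is forced to make the sharing and locking explicit:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared mutable state must be wrapped in a synchronization
    // primitive; without the Mutex, the compiler rejects mutation
    // from multiple threads at compile time.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    // Deterministic result: no lost updates are possible.
    assert_eq!(*counter.lock().unwrap(), 4_000);
}
```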

In addition, it doesn’t matter how memory safe a language is if the software fails to validate input. Studies by Microsoft, Google, and Mozilla are often quoted to demonstrate that seventy percent of critical vulnerabilities are related to memory safety.[11] On the other hand, some of the most high-profile vulnerabilities of the past few years, like Log4j, have been related to input validation or the deserialization of untrusted data. The ICS Advisory Project compiles vulnerability data for the critical infrastructure sector and lists CWE-20: Improper Input Validation as the single most enumerated vulnerability.

Finally, when people stress that it is possible to write memory-safe code in languages like C++, they often point to rockets or autonomous space vehicles as exemplars. The report on memory-safe languages even calls out Rust as being unproven for these systems. However, these software systems are generally not connected to the Internet, and while satellites are certainly critical infrastructure, no one is going to be without food, water, electricity, transportation, or medical care if a lunar rover is disabled via a memory vulnerability. These exemplars obscure the report’s real concern: the software used in everyday critical infrastructure, an industry that has traditionally not had the same rigor or focus on quality as the space industry, with code that is developed, deployed, and maintained by many organizations, under much different circumstances.

Formal Methods

Formal methods involve writing formal specifications and checking system behaviours against those specifications. Initially, I was surprised to see the report mention formal methods as a path to more secure software, given the challenges of formal verification, like modelling and verifying large state spaces, or the software implementation failing to match, or diverging from, the model. However, the report defines formal methods fairly broadly, including static analysis, model checkers, and even assertion-based testing.

In a distributed system, there is no such thing as a perfect failure detector.
Peter Alvaro, The Twilight of the Experts: The Future of Observability

Despite its challenges, formal verification is becoming more accessible. At the 20th International Workshop on High Performance Transaction Systems (HPTS) in 2022,[12] Ankush Desai of Amazon described the P programming language, a state-machine-based programming framework for verifying distributed systems. It was used to formally verify code used by Amazon S3 when it implemented read-after-write consistency. P specifications are run-time monitors that listen to messages, maintain local state, and assert global invariants. The message-based, state-machine approach reminds me of actor-model programming. P has been used on top of logs to verify invariants at run-time, in production.[13] As of this writing, P also supports code generation for C and C#, with support for Java in the works, which means production code can be generated directly from the model. Not to be lost in the technical details, one of the takeaways Ankush emphasized was that the discipline itself is invaluable because it gets product managers and engineers to discuss, agree on, and document the system’s invariants.
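
P specifications themselves are written in the P language, but to make the monitor idea concrete, here is a minimal sketch in Rust of a run-time monitor in that style; the event shapes and the read-after-write invariant are my illustrative assumptions, not Amazon’s actual specification:

```rust
use std::collections::HashMap;

/// Events observed by the monitor, in the style of P's message-based
/// specifications. The shapes here are illustrative.
enum Event {
    WriteAck { key: String, value: u64 },
    Read { key: String, value: u64 },
}

/// Listens to messages, maintains local state, and asserts a global
/// invariant: a read after an acknowledged write returns that value.
#[derive(Default)]
struct ReadAfterWriteMonitor {
    last_acked: HashMap<String, u64>,
}

impl ReadAfterWriteMonitor {
    fn observe(&mut self, event: Event) {
        match event {
            Event::WriteAck { key, value } => {
                self.last_acked.insert(key, value);
            }
            Event::Read { key, value } => {
                if let Some(expected) = self.last_acked.get(&key) {
                    assert_eq!(value, *expected, "read-after-write violated for {key}");
                }
            }
        }
    }
}

fn main() {
    let mut monitor = ReadAfterWriteMonitor::default();
    monitor.observe(Event::WriteAck { key: "k".into(), value: 1 });
    monitor.observe(Event::Read { key: "k".into(), value: 1 }); // invariant holds
}
```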

The challenge of designing a system from the top down is that you never confront whether there are any agencies that can deliver appropriate promises.
Mark Burgess, Thinking in Promises: Designing Systems for Cooperation

The challenge with formal methods is that it is hard to formally verify the promises of the entire system. We can build bottom up and use formal methods, broadly defined, to help verify that each agent in a complex system is capable of fulfilling its promises, but guarantees do not necessarily compose into systems.

Figuring out how systemic promises emerge from the real promises of individual agents is the task of the modern engineer. We should not make light of the challenge. System designers think more like town planners than bricklayers, and need to know how to think around corners, not merely in straight lines.
Mark Burgess, Thinking in Promises: Designing Systems for Cooperation

I am glad the report defined formal methods broadly and viewed them as complementary. In a two-part essay, I shared how I view errors in distributed software systems. In the first article, On Eliminating Error in Distributed Software Systems, I encouraged a hierarchical approach that involves automated tests, the type system, functional programming, and formal verification. In the second article, On Embracing Error in Distributed Software Systems, I considered embracing failure as part of the programming model, rather than trying to eliminate it, handling run-time system dynamics, and considering the people who develop, operate, and rely on these systems as part of the system itself.

Quality Views

It was interesting to see the report emphasize the importance of the visual communication of software quality:

The power of visualization in conveying complex data is well-established, and in the context of cybersecurity, proves invaluable. With a quality graph, trends and patterns that might have remained obscured in tables of data would become more comprehensible.

While the research presented in the Accelerate book includes many aspects of software development and operations and isn’t focused on cybersecurity or critical infrastructure, one of the key conclusions is the importance of making quality issues and process issues visual:

The use of WIP [work in progress] limits and visual displays is well known in the Lean community. They are used to ensure that teams don’t become overburdened (which may lead to longer lead times) and to expose obstacles to flow. What is most interesting is that WIP limits on their own do not strongly predict delivery performance. It’s only when they’re combined with the use of visual displays and have a feedback loop from production monitoring tools back to delivery teams or the business that we see a strong effect. When teams use these tools together, we see a much stronger positive effect on software delivery performance.
— Accelerate: Building and Scaling High Performing Technology Organizations

In 2015, I developed a technique called quality views for visualizing software quality in a distributed system. My intent was to model risk holistically, including scalability, deployability, high availability, operations, and security, making it visual, and showing how it changes over time. Quality views helped me communicate and reduce latent risks that were going unrecognized. I described how quality views could be used for security concerns by visualizing the high-risk parts of the system that also have a large number of high-severity, unpatched security vulnerabilities, or significant threats enumerated from a threat-modeling exercise. I’m glad to see the report encourage software quality metrics and the importance of making them visual. As part of managing risk, perhaps quality views have a role to play in mapping quality metrics to a visual representation of the software system.

WebAssembly

The report dedicates a whole section to how secure hardware can offer protections against memory-unsafe languages.[14] It specifically mentions the memory-tagging extension (MTE) and Capability Hardware Enhanced RISC Instructions (CHERI).[15] Surprisingly, the report does not provide a similar level of guidance on sandboxing unsafe code through software, only mentioning in passing that containers can be used to limit system privileges. Sandboxing unsafe code is a particularly attractive option if the hardware or software cannot be modified. Even with hardware protections, sandboxing offers additional protections and operational flexibility.

WebAssembly is a binary instruction format paired with a stack-based virtual machine that executes with near-native performance. WebAssembly was originally developed to run C++ safely in a web browser, but it now has the potential to run sandboxed code in many environments, including on the server, or in embedded environments, like IoT.[16]

WebAssembly offers a number of ways to make software more secure. First, unsafe code can be compiled to WebAssembly and the WebAssembly run-time can ensure the code is safely executed within the linear memory of the virtual machine.[17] Second, the WebAssembly run-time ensures the code only has access to the imports and exports it declares, and nothing more. For example, the code cannot open ports, or read from or write to the file system, if these capabilities have not been explicitly provided by the host run-time. Third, the run-time can limit resources, like the amount of memory, the execution time, or the number of CPU instructions, and terminate the process if it violates these limits.[18] Fourth, WebAssembly binaries can be signed, including their capabilities, providing a strong guarantee of their interfaces. Finally, because WebAssembly components can declare the capabilities they need, like reading messages from a message broker, storing data in a key-value database, or logging messages using a logging function, a lot of code normally maintained by the software development team and compiled into application binaries moves down into the infrastructure instead. This could profoundly change how software is developed and deployed through the ability to audit, update, or restrict access to these capabilities. Imagine if the Log4j vulnerability could have been mitigated just by patching the WebAssembly infrastructure, rather than trying to track down every last application in your organization using Log4j.[19]
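
As an illustration of the third point, here is a minimal sketch using the Wasmtime Rust API to bound execution with fuel metering; treat it as indicative, since API names have changed across Wasmtime versions (for example, set_fuel superseded the older add_fuel):

```rust
use wasmtime::{Config, Engine, Instance, Module, Store};

fn main() -> wasmtime::Result<()> {
    // Enable fuel metering so the run-time can bound CPU work.
    let mut config = Config::new();
    config.consume_fuel(true);
    let engine = Engine::new(&config)?;

    // A guest that loops forever; it will trap when fuel is exhausted.
    let module = Module::new(&engine, r#"(module (func (export "spin") (loop br 0)))"#)?;

    let mut store = Store::new(&engine, ());
    store.set_fuel(1_000_000)?; // roughly bounds the instruction count

    let instance = Instance::new(&mut store, &module, &[])?;
    let spin = instance.get_typed_func::<(), ()>(&mut store, "spin")?;

    // Instead of hanging, the call traps with an out-of-fuel error.
    assert!(spin.call(&mut store, ()).is_err());
    Ok(())
}
```

A similar mechanism, the store’s resource limiter, can cap linear-memory growth in the same way.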

I am looking forward to the day that WebAssembly is recommended. While a step in the right direction, memory safe languages is only a part of writing secure software. We need secure-by-default and capability-driven deployment targets and platforms.
Bailey Hayes

While WebAssembly provides a sandboxed execution environment, the integrity of the run-time is critically important for providing that sandbox.[20] One also needs to consider the host environment in which the WebAssembly run-time executes. Layered protections through operating system security, virtual machines, containers, hypervisors, and hardware security are important and complementary.[21] I go into more detail on this subject in my article Choosing a WebAssembly Run-Time, but, in short, I appreciate the attention to security, and the transparency, demonstrated by Wasmtime. For example, it is written in Rust, it uses fuzz testing and formal methods, and it has a documented vulnerability disclosure policy.[22]

Options for sandboxing code continue to improve and I wish the report paid more attention to sandboxing unsafe or third-party code as an important and practical tool for protecting critical infrastructure, especially when the hardware or software cannot be modified.

Embedded Databases

Over the past year, I have been doing a lot of work with both SQLite and DuckDB, databases that can be embedded in-process. SQLite is a relational database and DuckDB is a columnar database, although DuckDB can also read and write SQLite, CSV, and Parquet files, among other formats, and can be used just as a data-processing library.
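
For example, here is a minimal sketch of using DuckDB as an in-process data-processing library, assuming the duckdb Rust crate (which mirrors rusqlite’s API) and a placeholder Parquet file name:

```rust
use duckdb::Connection;

fn main() -> duckdb::Result<()> {
    // DuckDB runs in-process; there is no server to deploy or manage.
    let conn = Connection::open_in_memory()?;

    // Query a Parquet file directly with SQL, without first loading
    // it into a database.
    let rows: i64 = conn.query_row(
        "SELECT COUNT(*) FROM read_parquet('measurements.parquet')",
        [],
        |row| row.get(0),
    )?;
    println!("{rows} rows");
    Ok(())
}
```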

As of this writing, SQLite has been around for twenty-four years. SQLite is probably the most ubiquitous piece of software in the world, used everywhere from mobile phones, to servers, to airplanes, to IoT devices. It was written in C for its efficiency and portability. Rust wasn’t even an option when SQLite was started, and even the version of C++ available at the time, C++98, had few of the memory-safe features it has today.[23] DuckDB is a different story, since it was started in 2018 and is primarily written in C++.[24] It is hard to fault the creators for choosing C++ since Rust was not a mature language in 2018. The quality of the C++ code in the DuckDB project is generally very high. Because SQLite and DuckDB are databases, it is inevitable that they need to use unsafe code. Databases need memory managers that maintain multiple indexes with pointers to the same objects in memory. Rewriting code like this in Rust, only to use unsafe Rust to get around Rust’s exclusive ownership model, won’t make the code any safer.
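
Here is a minimal sketch (my illustration) of the pattern just described: two indexes pointing at the same record. Safe Rust can express it with shared ownership, but in-place mutation then requires interior mutability or unsafe code, which is why rewriting an engine built around raw pointers does not automatically make it safer:

```rust
use std::collections::BTreeMap;
use std::rc::Rc;

struct Record {
    id: u64,
    timestamp: i64,
}

fn main() {
    let record = Rc::new(Record { id: 7, timestamp: 1_700_000_000 });

    // Two indexes over the same record: one by id, one by time.
    let mut by_id: BTreeMap<u64, Rc<Record>> = BTreeMap::new();
    let mut by_time: BTreeMap<i64, Rc<Record>> = BTreeMap::new();

    by_id.insert(record.id, Rc::clone(&record));
    by_time.insert(record.timestamp, Rc::clone(&record));

    // Both indexes point at the same object in memory; mutating it
    // in place now requires RefCell, a lock, or unsafe code.
    assert!(Rc::ptr_eq(&by_id[&7], &by_time[&1_700_000_000]));
}
```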

SQLite has a security policy. It communicates how seriously the project considers security defects, especially errors related to memory safety. SQLite has comprehensive documentation of its testing, including fuzz testing, static analysis, dynamic analysis, and malformed inputs. Richard Hipp, the creator of SQLite, said he views the extensive set of tests his company has accumulated over the years as the major intellectual property of the company and as critically important for continuing to evolve SQLite reliably and securely.[25] The security documentation also describes how to take precautions when loading SQLite databases from external sources, like disabling triggers and virtual tables if they are not used. Because SQLite has been around for so long, many bugs have been identified and fixed.[26] A rewrite of SQLite in Rust would take many years to become as stable and as memory safe as SQLite is now. However, SQLite continues to evolve and be widely deployed, and I would appreciate detailed documentation on how memory safety is addressed.

As of this writing, DuckDB does not publish a security policy.[27] It does publish documentation on testing, including fuzzing, and the DuckDB fuzzer is open source. DuckDB uses Clang-Tidy as a static analyzer, but it doesn’t appear to use all of the C++ Core Guidelines checks, including some related to memory safety. If DuckDB is going to be used as widely as SQLite, and embedded in server infrastructure, industrial control systems (ICS), and industrial IoT (IIoT), I want to see the project publish a comprehensive security program, similar to the Wasmtime program mentioned above, detailing its approach to memory-safe C++, fuzzing, formal verification, input validation, supply-chain security, and vulnerability reporting and remediation.

SQLite and DuckDB both support WebAssembly. When running SQLite and DuckDB in embedded environments, it is worth considering WebAssembly to further sandbox the application. Finally, since both of these databases support SQL, protecting against SQL injection is a perpetually important consideration for reducing software vulnerabilities.
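
Parameterized queries are the standard defence; here is a minimal sketch using the rusqlite crate (my example; both databases’ client libraries support the same pattern):

```rust
use rusqlite::{params, Connection, Result};

fn main() -> Result<()> {
    let conn = Connection::open_in_memory()?;
    conn.execute(
        "CREATE TABLE sensor (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
        [],
    )?;

    // Untrusted input: never splice it into the SQL string.
    let untrusted_name = "pump'; DROP TABLE sensor; --";

    // The ?1 placeholder binds the value; the input is treated as
    // data, not as SQL, so the injection attempt is inert.
    conn.execute(
        "INSERT INTO sensor (name) VALUES (?1)",
        params![untrusted_name],
    )?;

    let count: i64 =
        conn.query_row("SELECT COUNT(*) FROM sensor", [], |row| row.get(0))?;
    assert_eq!(count, 1); // the table survived; one row was inserted
    Ok(())
}
```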

Conclusion

We haven’t seen cyber attacks cause major disruptions to critical infrastructure in North America in the past few years. Attacks on IT networks have arguably caused more disruption to critical infrastructure than attacks on OT networks. This makes it easy to become complacent or put our focus elsewhere. Given that time and budgets are limited, investing in asset management and the hardening of software systems in critical infrastructure is likely more valuable than active detection and response.[28] Attackers are not going to disrupt critical infrastructure, like power distribution, every time they discover an exploit. They will reserve these exploits to attack critical infrastructure when it will have the greatest impact: in a war, or when it achieves a financially or politically advantageous outcome. In my opinion, what the report is really encouraging is a strengthening of our defensive position. The time is now to improve the quality of software deployed on critical infrastructure.

The report is focused on the vast amount of code used in industrial control systems (ICS) and industrial IoT (IIoT) that has received much less scrutiny than web browsers or operating systems. While I appreciate the report’s attention to memory-safe programming languages, its broad definition of formal methods, and its attention to measuring and visualizing software quality, it should have gone further in describing how to incrementally move C and C++ forward, since there is so much code written in C and C++ that is not going away for years. Concretely, writing memory-safe C++ is getting closer to the ease of writing Rust, WebAssembly can further sandbox code without sacrificing performance, and projects using languages that are not memory safe must publish their security programs and demonstrate their comprehensiveness if they are to be considered for critical infrastructure.

Future software should be memory safe, but secure code is about a lot more than memory safety.


  1. Memory-safe languages prevent entire classes of software defects related to memory management. ↩︎

  2. SCADA and EMS are common industry terms. SCADA is Supervisory Control and Data Acquisition and EMS is Energy Management System. ↩︎

  3. See my essay The Most Difficult Bug I Have Encountered for an example of subtle errors in concurrent programming. ↩︎

  4. When this database was first developed, memory was at a premium. The data structures favoured a small memory footprint over scalability and performance. For instance, using a sorted array as an index uses minimal memory, but insertions can be expensive because of memory copying. As scalability and performance became more important for larger systems, and memory was cheaper and more abundant, simply replacing many of the original data structures with ones from the standard template library improved performance considerably. ↩︎

  5. If you come to Rust from C++, it is easy to see the influence of C++11. In particular, the ownership model with unique and shared smart pointers, move semantics, and rvalue references for perfect forwarding. ↩︎

  6. If you are not familiar with the advantages of Rust, particularly for critical infrastructure, see Adam Crain’s talk from S4x20 entitled Applying the Rust Programming Language in ICS. ↩︎

  7. For some applications, Rust uses an order of magnitude fewer resources than a comparable Java Virtual Machine (JVM) application that relies on garbage collection. ↩︎

  8. Rust supports functional programming concepts like immutability, Option and Result types, map and flatMap functions, higher-order functions, and pattern matching. The conclusion of many people I work with is they prefer Rust when performance or resource constraints matter, but they prefer the richer functional programming concepts in Scala, and tools like Akka Streams, for applications with a lot of business logic where correctness, testability, and composition are more important than raw performance. ↩︎

  9. Refer to these two excellent talks by Bjarne Stroustrup, the creator of C++, on the history of safe C++, as well as the state of the art: Approaching C++ Safety and Delivering Safe C++. Also refer to the paper A call to action: Think seriously about “safety”; then do something sensible about it. ↩︎

  10. While I appreciate Rust’s static checks for detecting data races, with the exception of channels, the ability to express concurrent code still feels complex and primitive. ↩︎

  11. See the United States Cybersecurity and Infrastructure Security Agency (CISA) report The Urgent Need for Memory Safety in Software Products for references to the Microsoft, Google, and Mozilla studies. ↩︎

  12. My position paper for HPTS 2022 was about secure software for critical infrastructure: Our Transition to Renewable Energy: Motivating the Most Challenging Problems in Distributed Computing and IoT. ↩︎

  13. P could be even more powerful if applied on top of actor persistence (event sourcing) like Akka Persistence or Akka Projection, or Concordance from Cosmonic. ↩︎

  14. While not intended for general-purpose computing, Amazon’s AWS Nitro demonstrates how hosting providers are also using hardware isolation to improve security and performance. A similar separation of concerns could be valuable in critical infrastructure. ↩︎

  15. Mo Javadi gave a talk at S4x24 on CHERI entitled Stop Panicking Over Patching: CHERI Morello Memory Safety. The video is not posted yet, but I will try and remember to update this footnote once it is. ↩︎

  16. If you are unfamiliar with WebAssembly, see my introduction Why Am I Excited About WebAssembly? and WebAssembly at the IoT Edge: A Motivating Example. While the source code in the second article is valuable for demonstrating the basic approach, it is now somewhat dated given the WebAssembly Component Model can be used to describe the imports and exports. I touched on this in the linked talks, but it has evolved significantly since. ↩︎

  17. It is an older article, but see Securing Firefox with WebAssembly for an example of this. Note, unsafe code can still corrupt linear memory, but it shouldn’t impact other processes; more on this in a minute. ↩︎

  18. See Katie Bell’s talk Don’t Trust Anything! Real-world Uses For WebAssembly for a practical example of limiting the size, execution time, and the number of instructions in hosted third-party code. ↩︎

  19. This point is better demonstrated visually. See my talk WebAssembly at the IoT Edge, linked to the appropriate timestamp, from S4x23. ↩︎

  20. In WebAssembly, all of the linear memory is writable and it does not employ all of the protections used by native programs, like memory page protections, guard pages, or stack canaries. See Everything Old is New Again: Binary Security of WebAssembly for examples of WebAssembly vulnerabilities, including process vulnerabilities and host vulnerabilities. ↩︎

  21. See the talk Sandboxing Your Sandbox: Leveraging Hypervisors for WebAssembly Security, by Dan Chiarlone, for a demonstration of some of the vulnerabilities documented in the Everything Old is New Again: Binary Security of WebAssembly paper and some details on Microsoft’s Hyperlight hypervisor for WebAssembly. More details on Hyperlight can be found in Aaron Schlesinger’s article What is Hyperlight? ↩︎

  22. See the blog Security and Correctness in Wasmtime, and the accompanying talk, by Nick Fitzgerald. As of this writing, Wasmtime has only had one critical vulnerability, CVE-2023-26489. It was demonstrated in Dan Chiarlone’s talk linked above. However, this hasn’t been the only vulnerability discovered with Wasmtime’s pooling allocator. See also CVE-2022-39393. ↩︎

  23. Probably the most notable improvement C++ would have offered over C is resource acquisition is initialization, or RAII. See Revisiting Test Context Classes in C++ for a practical example. ↩︎

  24. Interestingly, since DuckDB can read and write SQLite databases, it includes the C code for the SQLite client. It also contains the Postgres parser which is written in C but stripped down and converted to C++. ↩︎

  25. This was discussed on The Changelog podcast Richard Hipp Returns. ↩︎

  26. As of this writing, SQLite has 159 CVEs. ↩︎

  27. As of this writing, DuckDB has only 1 CVE, a vulnerability that could inject malicious code through loading a database with an extension. ↩︎

  28. Ralph Langner made a similar point on the S4x24 closing panel. The video is not posted yet, but I will try and remember to update this footnote once it is. ↩︎