We live in a complex world. Since the industrial revolution we have been building machines to create more complex machines. The digital revolution has accelerated this, as we exploit digital electronics to deliver advanced functionality in the devices we use and the machines that make these devices.
Although they’ve been designed and built by humans, these devices are beyond the comprehension of any one person. The time when everyday items could be built by a single craftsman is long gone, and with it any chance of a complete understanding of how a device works.
Unsatisfied with building complex devices we do not understand, we now delight in going even further – software. Not only is software a collection of complex components interacting in complex ways, it can also modify itself. We rely on this flexibility: it’s why we can download a new app onto our phones without rebuilding them. But it also means we have no hope of predicting everything the device might do.
In practice, we can live without knowing everything about a device’s potential behavior. If it normally does the right thing, we can live with the occasional glitch. It’s the same with software: we live with the fact that our mail client seizes up occasionally, our word processor sometimes draws text off the page for no reason, and the printer never works when you need it. But given that no-one can fully understand a system’s behavior, how can we be sure the failures that do occur won’t be disastrous?
This question first arose during the Cold War. Defense was becoming dependent on computers the size of buildings, with all the computing power of today’s wristwatch. These machines were shared, and not every user could be trusted with all of the information being processed.
But these machines were already too complex to be fully understood. The challenge was to find a way of being certain that, even if something goes wrong, a user couldn’t access defense secrets and leak them.
The answer was the Trusted Computing Base (TCB): a small, simple component that sits at the heart of the system and controls how the other components interact. The TCB is designed to prevent disastrous things from happening – in the Cold War case, to stop sensitive information from ending up in the hands of someone who might leak it. The trick is to design the TCB to be simple enough that it is certain to work, yet powerful enough to allow the other components to function as required.
While you rarely hear anyone talk about TCBs now, they are everywhere. The principle is at the heart of every operating system – the modern kernel is a TCB, though it is far larger than anything envisaged in the 1970s. The kernel is the TCB software that drives the memory management features of the hardware to isolate applications from one another, preventing one application from corrupting another or stealing its information.
The TCB software configures the hardware to get the required isolation, but it relies on the hardware to enforce it. Computers in the 1970s were big and slow, but they had one virtue: simplicity. It was possible to understand the effect of their memory management units, and therefore to understand what would happen when the TCB configured the hardware. With modern hardware, that understanding is lost.
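This division of labor – TCB configures, hardware enforces – can be sketched in a few lines. The following is a toy model in C with hypothetical names, not any real kernel’s interface: the privileged “TCB” side assigns each page an owner, and the “hardware” side checks ownership on every access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_PAGES 8

/* Toy isolation policy: each page has exactly one owning process.
 * In a real system this role is played by page tables and an MMU. */
static int page_owner[NUM_PAGES];

/* "TCB" side: runs privileged, configures the policy. */
static void tcb_assign_page(size_t page, int process)
{
    if (page < NUM_PAGES)
        page_owner[page] = process;
}

/* "Hardware" side: enforces the policy on every single access.
 * The TCB never sees these checks; it only set up the table. */
static bool mmu_check_access(size_t page, int process)
{
    return page < NUM_PAGES && page_owner[page] == process;
}
```

The security of the whole scheme rests on `mmu_check_access` behaving exactly as this model suggests – which is precisely the assumption that modern hardware undermines.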
Advanced features like speculative execution and caching make it hard to reason about the hardware’s properties. That means we cannot properly understand what the TCB is doing, and so we do not understand the security properties of our systems.
The result? Flawed designs – and attacks like the recent Meltdown and Spectre. These attacks are stunning in their simplicity of concept and elegance of execution, and they arise simply because we cannot understand the behavior of complex hardware/software systems. Such failures are bad news for security, but what can we do? Do we wait until the next fatal flaw is discovered, hoping we can patch it before disaster strikes?
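To see how innocuous such a flaw can look, consider the widely published Spectre variant 1 (“bounds check bypass”) pattern, shown here as a sketch in C. The code is architecturally correct; the vulnerability lives entirely in the hardware’s speculative behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

uint8_t array1[16];
uint8_t array2[256 * 64];
size_t  array1_size = 16;

/* The bounds check below is correct as written. But while the branch
 * is still unresolved, a speculating CPU may predict it as taken and
 * read array1[x] for an out-of-bounds x anyway. The dependent load
 * from array2 then leaves a footprint in the cache that a later
 * timing measurement can recover, even though the speculative work
 * is architecturally discarded. */
uint8_t victim(size_t x)
{
    if (x < array1_size)
        return array2[array1[x] * 64];
    return 0;
}
```

No amount of inspecting this source reveals a problem: the leak happens in microarchitectural state the program never sees, which is exactly the failure of understanding described above.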
There isn’t a general solution that fixes the problem everywhere, because we want the very complexity that caused it. But we can trade complexity against risk: tolerate complexity where the damage is minimal and recoverable, and insist on less of it where failure means disaster – in Critical National Infrastructure, for example.
The basic requirement for a TCB is to separate application software. We can’t rely on modern processor hardware, because of its complexity, and we can’t rely on software, because it changes. But we can use physically separate processors, glued together with simple hardware logic, to give us separation we can rely on.
This is common practice in government-grade devices and is the approach taken by a new breed of commercial security products, such as Garrison’s Silicon Assured Video Isolation and Deep Secure’s High Speed Verifier. It doesn’t deliver infallible security everywhere, but it lets us build resilient systems able to survive future disastrous failures in our ability to understand complexity.