As details of the Meltdown and Spectre vulnerabilities[1] have become clearer, the many affected vendors have published a number of statements. Canonical has issued advisories and updates on fixes and mitigations, the latest of which includes a first round of Spectre mitigations. However, most of these statements focus on the mechanics of applying fixes and on damage control, not on explaining what the problems are, how the mitigations work, and how they may affect you.
Because the vulnerabilities and their fixes are CPU-dependent and involve a major performance-security tradeoff, understanding their general model is important to every system administrator and developer. In the spirit of Ubuntu, which is known for providing easily accessible technology — we are Linux for Human Beings, after all — this post attempts to provide an accessible description of the impact of these vulnerabilities and their mitigations.
What are the vulnerabilities?
The essence of both vulnerabilities is that a program running on a computer can read memory it is not supposed to. First, this is not an arbitrary code execution issue: rather, a malicious program can trick the CPU into exposing memory that the program would not otherwise have access to. Second, because of the way the CPU is tricked, the exposed memory can only be retrieved relatively slowly[2]. In summary:
- A CPU needs to be running untrusted code in order to be attackable
- The direct consequence of an attack is unrestricted, read-only access to all system memory
- Memory is not exposed at a very fast rate.
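To put "not very fast" in perspective, the short sketch below estimates how long it would take to read a full memory image at the rates quoted in the footnotes (roughly 500KB/s for Meltdown and 10KB/s for Spectre). The 16 GiB memory size and the function name are illustrative assumptions, and a real attacker would likely target specific regions rather than dump everything.

```python
def hours_to_read(size_bytes: int, rate_bytes_per_s: float) -> float:
    """Return the hours needed to read size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s / 3600


GIB = 1024 ** 3
KB = 1000  # assumption: the quoted rates are decimal kilobytes per second

# Rates taken from the figures quoted in the footnotes of this post.
for name, rate in (("Meltdown", 500 * KB), ("Spectre", 10 * KB)):
    hours = hours_to_read(16 * GIB, rate)  # 16 GiB is a hypothetical machine
    print(f"{name}: about {hours:.1f} hours to read 16 GiB")
```

Under these assumptions the difference is between hours and weeks, which is why slow retrieval limits, but does not remove, the practical risk.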
Although an attacker can’t directly use Meltdown or Spectre to read disks or change anything on a system — for instance, to modify a password or write a file — the problem is that passwords, keys and other secrets are usually stored unencrypted in memory. Once stolen, these can then be used to escalate privileges, which can then be used to modify the system or obtain sensitive information.
Proof of concept code exploiting these vulnerabilities has been published that demonstrates:
- Reading passwords typed into web browsers, and reading images displayed on a web page from a separate, malicious application running on the same machine.
- Stealing hypervisor credentials, and reading secrets inside any hosted virtual machines from a malicious application running inside a virtualized guest, such as a public cloud instance.
- Generally reading any kernel and user-space memory from userspace and VM-hosted applications, including memory belonging to other VMs and the hypervisor.
Other exploits that follow the patterns above will almost certainly emerge over time.
Who is most affected?
These issues affect practically every computer in server and end-user contexts, but due to the nature of the possible attack vectors, different use cases are exposed to different degrees. We have therefore provided the following risk grading:
How do the vulnerabilities work?
In order to understand how mitigations are being implemented, it’s important to grasp what the underlying issues are, which in turn requires an understanding of operating system and processor concepts.
There are two good “kindergarten-class” analogies that may serve as useful introductions, which we have nicknamed the Helpful Grandson Analogy and the Book Voyeur Analogy. A quick read through William Cohen’s excellent post on Red Hat’s developer blog will also provide basic knowledge on CPU pipelines and cache behavior, and an optional hour-long read of Dan Luu’s branch prediction treatise will complete the necessary background for those unfamiliar with that aspect of processor design.
And that’s where our explanation starts. Summarizing the root cause:
- All modern CPUs pre-fetch data from system memory in advance of code being executed.
- Pre-fetched data is placed in internal registers and the CPU caches, which can be read much faster than system memory.
- Nearly all CPUs released since 1995[3] pre-fetch data out of order, and in addition, most pre-fetch speculatively[4].
- Now, the fundamental issue behind the Meltdown and Spectre vulnerabilities: it is possible to trick a CPU into pre-fetching arbitrary data into internal registers and the cache, something that CPU and operating system protections would otherwise prevent.
- Through sophisticated techniques called cache timing side-channel attacks, the pre-fetched data can be exposed.
- Attacks can apply a targeted brute-force approach using these vulnerabilities to progressively read system memory regardless of any protections.
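The last two points can be illustrated with a toy model. The sketch below is not an exploit and does not touch real hardware: it simulates a cache as a set of "warm" lines with made-up latencies, lets a stand-in for speculative execution warm the line indexed by a secret byte, and then has the attacker time every line to see which one is fast. All names (`ToyCache`, `speculative_touch`, `recover_byte`) and the latency values are hypothetical, chosen only to show the shape of a cache timing side channel.

```python
FAST, SLOW = 1, 100  # pretend latencies for cached and uncached lines


class ToyCache:
    """A toy cache: a set of warm line numbers, nothing like real hardware."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def access(self, line):
        latency = FAST if line in self.lines else SLOW
        self.lines.add(line)  # any access leaves the line warm
        return latency


def speculative_touch(cache, secret_byte):
    # Stands in for the CPU speculatively loading probe[secret_byte]:
    # the value is never handed to the attacker, but the line stays warm.
    cache.access(secret_byte)


def recover_byte(cache, trigger):
    cache.flush()   # start from a cold cache
    trigger()       # provoke the speculative access
    # Time each of the 256 probe lines; the fast one reveals the byte.
    timings = {line: cache.access(line) for line in range(256)}
    return min(timings, key=timings.get)


cache = ToyCache()
secret = b"hunter2"
leaked = bytes(
    recover_byte(cache, lambda b=b: speculative_touch(cache, b)) for b in secret
)
print(leaked)  # prints b'hunter2'
```

Each recovered byte costs a flush, a trigger and 256 timed accesses, which gives some intuition for why real attacks read memory slowly.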
Meltdown and Spectre are caused by slightly different processor design choices, and expose system memory in different ways:
Neither Meltdown nor Spectre can be directly addressed by CPU manufacturers in existing hardware: they are a consequence of fundamental hardware design, which cannot be modified in the field. Mitigations involve a combination of software and CPU microcode updates to both hypervisors and guests, which we will discuss next.
What mitigations are available?
Mitigations are specific to each attack, and have different performance implications. Existing mitigations are summarized in the following table:
It’s worth calling out that some of the mitigations for Spectre are not yet fully mature, which implies multiple iterations will be implemented and rolled out before the situation is fully addressed.
How will mitigations be deployed?
In order to benefit from the mitigations being provided, the following must hold true:
- Operating system and affected application code must be patched.
- For full protection against Spectre, CPU microcode or system firmware will need to be updated.
- The operating system’s protections must be active.
- In virtualized environments, all of the above must be true for both the hypervisor and the guest in order to protect against all known attacks.
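One way to check whether the operating system's protections are active is to ask the kernel itself. The sketch below assumes a Linux kernel recent enough to expose the `/sys/devices/system/cpu/vulnerabilities` directory, which is not described in this post; on kernels without it the function simply returns an empty result. The function names are illustrative, and in a virtualized environment this only reports on the system it runs on, so it would need to be run on both the hypervisor and the guest.

```python
from pathlib import Path

# Assumption: the running kernel provides this reporting directory.
SYSFS_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(sysfs_dir: Path = SYSFS_DIR) -> dict:
    """Map each vulnerability file name to the kernel's one-line status."""
    if not sysfs_dir.is_dir():
        return {}  # kernel without this reporting interface
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(sysfs_dir.iterdir())
        if entry.is_file()
    }


def unmitigated(status: dict) -> list:
    """Return the entries whose status line starts with 'Vulnerable'."""
    return [name for name, text in status.items() if text.startswith("Vulnerable")]


if __name__ == "__main__":
    status = mitigation_status()
    for name, text in status.items():
        print(f"{name}: {text}")
    print("Needs attention:", unmitigated(status) or "nothing reported")
```

An empty result means the kernel did not report anything, not that the system is safe.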
Ubuntu provides security updates free of charge to all Ubuntu users. Updates will automatically install all necessary code and — where available — CPU microcode. Ubuntu will also, by default, enable all protections that are stable and safely implemented.
However, not all vulnerabilities identified have protections available covering their full extent, as outlined in the Current Status section of our Knowledge Base entry on the vulnerabilities. To further complicate matters, the protections that are available have a significant performance impact on a number of workloads. The next section will discuss this in more detail.
How do the mitigations affect performance?
Performance impact from the mitigations has been a top concern, and at the highest level, the answer is that performance regressions are entirely workload-dependent; the slowdowns we have directly observed vary from 0 to 50%. To put the complexity of producing useful performance impact data in perspective, let’s review the different ways in which mitigations can apply to a test scenario:
- For virtualized environments, mitigations must be applied to both hypervisor and guests — including microcode, which on x86 is currently unavailable.
- Public clouds have been at the forefront of development of these mitigations, and we have directly observed shifts between approaches during the past 3 weeks.
- Performance degradation from Spectre mitigation depends on the CPU family being used: the more advanced its branch predictor, the greater the impact of the features being used to protect from attack.
- Retpoline, which reduces that impact, is not an effective mitigation on Skylake platforms; on Skylake, however, the non-retpoline mitigations have less of an impact on performance.
- Finally, Meltdown impact is greatly reduced on platforms that have the PCID feature available and enabled.
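Whether a given x86 machine advertises PCID can be read from the CPU feature flags. The sketch below parses the text of `/proc/cpuinfo` and looks for the `pcid` flag; the helper names are illustrative. It only shows that the CPU (or the virtual CPU presented to a guest) offers the feature, not that the running kernel actually uses it.

```python
import os


def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU listed in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":  # x86 naming; other architectures differ
            return set(value.split())
    return set()


def has_pcid(cpuinfo_text: str) -> bool:
    return "pcid" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("PCID advertised:", has_pcid(f.read()))
    else:
        print("No /proc/cpuinfo on this system")
```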
The possible permutations make communicating through benchmarks incredibly difficult, and recognizing that, we will focus on practical advice first. The information in this section assumes all protections published by Ubuntu for Spectre and Meltdown have been enabled.
For Ubuntu Desktop users, including users running official flavors derived from Ubuntu, here is a summary of the impact of mitigations being applied:
For server workloads, impact is described in more detail in the following table.
Where necessary, offsetting performance impact will involve a combination of scaling out workloads, increasing compute power (by choosing a larger cloud instance type, for instance) or selectively disabling mitigations in contexts where the tradeoffs justify it. We are maintaining a Mitigation Controls page which describes the relevant knobs available on Ubuntu.
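As a rough sizing aid for the scale-out option, the sketch below computes how many workers would be needed to keep the same throughput when each job takes a given percentage longer. The 30% and 50% inputs are the preliminary build-farm figures reported later in this post; the pool of 10 workers and the function name are hypothetical, and the model assumes the workload scales linearly across workers.

```python
import math


def instances_needed(current: int, time_increase_pct: int) -> int:
    """Workers required to keep throughput constant when each job
    takes time_increase_pct percent longer than before."""
    return math.ceil(current * (100 + time_increase_pct) / 100)


# Hypothetical pool of 10 workers, using this post's preliminary figures.
for label, pct in (("package builds", 30), ("kernel builds", 50)):
    print(f"{label}: 10 workers -> {instances_needed(10, pct)} workers")
```

Workloads that do not parallelize cleanly will need more headroom than this simple estimate suggests.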
What performance data is available?
Canonical is in the process of finalizing a set of performance runs across private and cloud environments and applications. We aim to publish our performance findings, based on our internal experience applying the mitigations, by February 12th.
For one preliminary datapoint, on our pre-Skylake build farm, the currently published Meltdown and Spectre mitigations for Ubuntu (including pre-release Intel microcode) show us that:
- Kernel build times have increased by 50%: prior to the mitigations, build times averaged around 3.5 hours, whereas they now average 5 hours.
- Package build times, across the board, have increased by 30%.
In the meantime, we’re providing a Published Application Performance summary page which collects per-application performance descriptions published by third parties. That page will be updated as new information gets published and can assist administrators in evaluating trade-offs in risk and performance to determine their own mitigation plans.
We appreciate that this is frustrating to many administrators aiming to frame decisions with hard data, and we ourselves have faced the same issue internally. But the immature nature of the mitigations, made worse by an evolving understanding of which mitigation strategies should be active in which contexts (kernel/userspace, hypervisor/guest, and specific code paths), makes the topic anything but straightforward.
We have built this document with the intention of putting forward a practical framework to support decision making in this unusual situation. The evolving nature of the industry’s collective understanding of the vulnerabilities has led to an excess of public information, in part incomplete and in part contradictory, and we have selected here a set of links that are coherent, well-written and expand on what we have presented above:
We will issue updates to this post and additional information as the situation evolves. We encourage Ubuntu users who seek more information to contact an Ubuntu Advantage support representative for an in-depth discussion of their use cases.
[1] These vulnerabilities are tracked as CVE-2017-5754, CVE-2017-5753 and CVE-2017-5715.
[2] Read rates are specific to the CPU and the method of attack, but the Meltdown paper measured reads of 500KB/s and the Spectre paper measured reads of 10KB/s. Two independent runs of a simple Spectre PoC on Intel Core i5 based laptops averaged reads of 8.5KB/s.
[3] The Pentium Pro (1995) is probably the first CPU affected by Meltdown, which doesn’t strictly require speculative execution; pipelining and out-of-order execution are sufficient. Spectre requires speculative execution.
[4] “Prefetch speculatively” means that, when a CPU sees an if clause (a condition) or a for/while block (a loop), it will try to guess what will happen, and in the process fetch data it speculates it will need.
[5] It is likely that every Intel CPU since the Pentium Pro is affected. Atom-based processors released prior to 2013, Quark and Itanium are in-order and are not affected.
[6] Technically, only CPUs with speculative execution are affected, but it is practical to assume that every CPU used in server and desktop environments is affected by this issue, as the non-exhaustive list of affected vendors includes AMD, Apple, ARM, IBM, Intel, Marvell, Nvidia, Qualcomm and Samsung.
[7] Meltdown cannot break out from a non-PV VM because hypervisor memory is generally not mapped when executing in guest context. However, the hypervisor has all memory mapped.
[8] PCID is a feature present in many post-2010 CPUs which has been enabled in the 4.14 kernel and included with the KPTI patchset Ubuntu has backported; see this forum post by Gil Tene for more details.
[9] This paste outlines a simple, but very repeatable demonstration of a 50% slowdown by enabling just the Spectre mitigations.
[10] “HTTP application servers” and “Load balancer impact” assume userspace implementations.