
Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:22 UTC (Sun) by matthias (subscriber, #94967)
In reply to: Addressing Meltdown and Spectre in the kernel by dskoll
Parent article: Addressing Meltdown and Spectre in the kernel

The delay of a cache miss is on the order of a few hundred CPU cycles. The advantage of speculatively fetching into the cache is that the needed data arrives shortly after the information about whether the speculative execution was correct. If there were no speculative fetching, the CPU would stall for a few hundred cycles waiting for the jump condition to resolve, and then for another few hundred cycles waiting for the data. With speculative fetching, the performance hit is only taken once.
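To make that concrete, here is a minimal sketch of the pattern being discussed (the names limit, table and lookup are placeholders, not anything from this thread):

#include <stddef.h>

extern size_t limit;   /* assumed to live in memory, so the compare may miss the cache */
extern int table[];    /* assumed to be large, so the load may miss as well */

int lookup(size_t i)
{
    if (i < limit)        /* resolving this condition can take ~100s of cycles */
        return table[i];  /* ...and so can this dependent load */
    return -1;
}

Without speculative fetching the two misses are paid one after the other (~100 + ~100 cycles); with it, the load of table[i] is issued while the compare against limit is still outstanding, so the two stalls largely overlap.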

Also, for Spectre, the privileged information might already be in the cache, allowing the speculative execution to run without a stall. Running the same procedure twice should force the needed code into the cache on the first run, and use it on the second.



Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:32 UTC (Sun) by dskoll (subscriber, #1630) [Link] (18 responses)

OK, how about this: When something is fetched into the cache by speculatively-executing code, tag it as "speculatively fetched". If the speculatively-executed code turns out to be required, the data is in cache and the speculatively-fetched tag is cleared. If the speculatively-executed code is abandoned, then pretend the data is not in cache if some other code requires it.
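A rough sketch of that idea in C-flavoured pseudocode (purely illustrative, nothing like real cache hardware):

#include <stdbool.h>
#include <stdint.h>

struct cache_line {
    uint64_t tag;
    bool     valid;
    bool     spec;    /* set when the fill was triggered by speculative execution */
};

/* A non-speculative lookup treats a still-speculative line as a miss. */
bool hits(const struct cache_line *l, uint64_t tag, bool speculative_access)
{
    return l->valid && l->tag == tag && (!l->spec || speculative_access);
}

/* When the speculated path retires, clear l->spec; when it is abandoned,
 * clear l->valid (or simply keep reporting a miss to everyone else). */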

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:42 UTC (Sun) by Otus (subscriber, #67685) [Link] (5 responses)

That could still leak that something was *removed* from cache by the speculatively executed path.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 22:03 UTC (Sun) by dskoll (subscriber, #1630) [Link] (4 responses)

Well, you could have a separate dedicated cache only used by speculatively-executed code and you only move it to the main cache (and evict something else) if the speculative execution was needed. This means more cache memory, some of which is "wasted".

I agree that you can never hope to shut all covert channels, but I think it is worth brainstorming how to reduce their bandwidth and make attacks harder.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 23:19 UTC (Sun) by excors (subscriber, #95769) [Link] (3 responses)

Maybe you could add a new buffer to store speculative data before it goes into the L1 cache; but what about the L2 cache, and L3, and eDRAM, and buffers inside the DRAM? Any of those could be modified by the memory read in an observable way.

Also, what would happen if you try to read a cache line that's currently dirty in another core's L1? The read would normally trigger that other core to write back to RAM (or share its data in some other way), which may be observable even if the first core perfectly hides the read from itself.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 0:35 UTC (Mon) by dskoll (subscriber, #1630) [Link] (2 responses)

Ok. :) I get it. So then it seems to me that speculative execution is by its very nature a covert channel that is impossible to shut down completely. That's a somewhat unsettling reality.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 16:51 UTC (Mon) by dw (subscriber, #12017) [Link] (1 responses)

Speaking as the mostly clueless, would there be any sense in halting speculation if it leads to a load that was not present in L1? In that case, at least the latency of L2/L3 will always be involved, if not a bus transaction. This would seem to be efficient where it matters, e.g. tight loops where the instruction stream is already cached; for other cases, PREFETCHxx could be used to explicitly request population where it was known it could not create problems.
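For the explicit-prefetch part, a minimal sketch of what that hand-crafting might look like (illustrative only; it uses the GCC/Clang __builtin_prefetch builtin rather than writing PREFETCHxx by hand):

#include <stddef.h>

/* Walk an array and ask for the data a few cache lines ahead. A prefetch only
 * populates the cache and never faults, so the programmer issues it only where
 * it is known to be safe. */
long sum(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16], 0, 3);  /* read, keep in all cache levels */
        total += data[i];
    }
    return total;
}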

Addressing Meltdown and Spectre in the kernel

Posted Jan 17, 2018 14:55 UTC (Wed) by mstone_ (subscriber, #66309) [Link]

Since the point of speculation is basically to paper over memory latency, this would probably have a significant performance impact.

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 18:47 UTC (Sun) by matthias (subscriber, #94967) [Link]

Good idea, but probably not enough. At least it will make the attack harder. Even if the data is tagged in the cache, there is still the effect that for some data to get into the cache, some other data has to be evicted from it. The attacker can look at which data was evicted and (because of cache associativity) learn which of two possible addresses has been loaded into the cache. It will make the attack harder, but not mitigate it completely.
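A sketch of how that observation is typically made (prime+probe style; set selection, noise handling, and the victim step are all omitted):

#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc, _mm_lfence */

/* Eight attacker-owned lines spaced one page apart, so they all index the
 * same L1 set. */
char probe[8][4096];

uint64_t time_read(const volatile char *p)
{
    _mm_lfence();
    uint64_t t0 = __rdtsc();
    (void)*p;                 /* the access whose latency is being measured */
    _mm_lfence();
    return __rdtsc() - t0;
}

/* Prime: touch probe[0][0] .. probe[7][0] so the set holds only our data;
 * then let the victim (or its speculative path) run;
 * then probe: re-read the same lines with time_read() - a slow re-read means
 * one of them was evicted, i.e. something else was pulled into this set. */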

There will always be some side channels. Even the timing of the speculative execution itself could reveal some information. The goal with side channels has to be to make the bandwidth so small that they become unusable. Closing all side channels ultimately means that the execution time must not depend on the data; in particular, the best-case performance has to be the same as the worst-case performance. This would be a big performance hit that people usually will not pay. It is what is done in cryptography (where best and worst case usually do not differ that much anyway), but I do not think this is a valid option for each and every piece of code.
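The cryptographic style referred to here, in miniature, is a comparison whose running time does not depend on where (or whether) the inputs differ (an illustrative sketch, not any particular library's implementation):

#include <stddef.h>

int ct_equal(const unsigned char *a, const unsigned char *b, size_t len)
{
    unsigned char diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= a[i] ^ b[i];   /* always walks the whole buffer */
    return diff == 0;          /* no early exit, no data-dependent branch */
}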

Addressing Meltdown and Spectre in the kernel

Posted Jan 7, 2018 21:54 UTC (Sun) by khim (subscriber, #9252) [Link] (10 responses)

You couldn't insert data into the cache without evicting something else first. Thus your scheme would need only a minor modification to circumvent it: see what data was EVICTED from the cache by the speculative read and go from there. It would be less reliable than the current Spectre, though.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 12:20 UTC (Mon) by nix (subscriber, #2304) [Link] (9 responses)

As I noted elsewhere, you'd need to keep a bit of the cache free, populate only that free bit with speculated fetches, cease speculation if it filled up and more reads were required, and evict such bits if only required by speculations that never retired.

Making this all more complex is that you might have multiple speculations requiring the same bit of cacheline data, only some of which might fail so you need refcounting, and now you have counter overflow problems and oh gods I'm glad I'm not a silicon engineer right now.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 15:15 UTC (Mon) by mgb (guest, #3226) [Link] (8 responses)

Maybe speculative fetches should not look beyond the L1 cache.

Throwing away cached information, and spending a hundred cycles to fetch something that might not be needed, can be counter-productive.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 15:27 UTC (Mon) by nix (subscriber, #2304) [Link] (3 responses)

But one major benefit of speculation is that it can take the huge 100-clock hit of going to main memory in advance! We don't want to lose that *entirely*.

Maybe this will require a memory controller redesign as well (a signal that this is a speculative fetch, reset the RAS and buffers of affected DIMMs to some default value before any subsequent nonspeculative fetch to those DIMMs, perhaps).

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 15:57 UTC (Mon) by mgb (guest, #3226) [Link] (2 responses)

I suspect that memory running far enough ahead of the CPU for speculative fetches to be beneficial is exceedingly rare.

Such rare use cases - staggering amounts of floating point ops on each fetched datum - could be hand crafted to use a speculative fetch without the risk of Spectre.

And remember that speculation can just as easily be counter-productive - speculatively replacing a cache line not only leaks information but also throws away good cached information and replaces it with information of unknown merit.

Addressing Meltdown and Spectre in the kernel

Posted Jan 9, 2018 23:46 UTC (Tue) by immibis (subscriber, #105511) [Link]

> I suspect that memory running far enough ahead of the CPU for speculative fetches to be beneficial is exceedingly rare.

It would be quite common if most of the data the CPU is working on is in the cache already - which, in a well-designed and well-tuned program, should be the case.

Addressing Meltdown and Spectre in the kernel

Posted Jan 10, 2018 11:51 UTC (Wed) by farnz (subscriber, #17727) [Link]

If the latency hit is 100 clocks, your cacheline size is 64 bytes, and the CPU is running sequentially through the data, each 100-clock delay gets you 64 bytes to work on. If the datum size is a 32-bit integer, that's 16 items to work on for every 100-clock latency hit. If my workload takes more than 6 clock cycles per item, then speculating far enough ahead that I can trigger the next cacheline fetch as soon as I've finished the first cacheline fetch means that my workload never sees a cache miss.
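Spelling out the break-even arithmetic with the same numbers (purely for illustration):

    64 bytes per line / 4 bytes per item = 16 items per line
    100 cycles / 16 items = 6.25 cycles per item

so a workload that spends more than about 6 cycles of real work per item can keep the next line's fetch entirely hidden behind that work.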

I suspect this type of case isn't that rare - while I've described the absolute best case which can also be done easily by a prefetch engine, it also covers workloads where the code fits in L1I, the bytes you need to work on any one datum fit in L1D, but the bytes you need to work on the next datum are not all going to be in L1D immediately after finishing the last datum.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 15:50 UTC (Mon) by excors (subscriber, #95769) [Link] (3 responses)

Branch predictors are apparently correct ~90% of the time, so it's worth doing things that have some performance cost on misprediction if they give a similar performance benefit in correctly-predicted cases.

I'd imagine there's plenty of code that does something a bit like "for (linked_list_node *n = head; n->data->key != key; n = n->next) { }". If the CPU waits for n->data before fetching n->next, I think it's going to take two memory-latency periods per iteration. If it speculatively fetches n->next concurrently with n->data, it should run twice as fast, which is a huge improvement, with only a single incorrectly-predicted fetch at the end of the loop. I can't imagine CPU designers or marketers would be happy with throwing away so much performance in what seems like fairly common code.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 17:03 UTC (Mon) by mgb (guest, #3226) [Link] (2 responses)

In your example if n->data and n->next are in the same cache line there will be one fetch and if not there will be two fetches.

It makes no difference whether speculative fetches are enabled, disabled, or enabled only to the L1 cache.

Addressing Meltdown and Spectre in the kernel

Posted Jan 8, 2018 17:30 UTC (Mon) by excors (subscriber, #95769) [Link] (1 responses)

I was thinking of something like "struct data { int key; ... }; struct node { node *next; data *data; };". Each iteration, it has to read one 'struct node' and one 'struct data', and they likely won't be in the same cache line as each other. It needs to read the first node to get the address of the first data (so there's a data dependency), and it needs to read the first data to determine whether it's safe to dereference node->next and read the second node (so there's a control dependency).

I tried testing that code on a Haswell CPU. Nodes were 64B-aligned and randomly shuffled in memory (to avoid simple prefetching). Simply iterating over the list (i.e. one uncached memory read per node) takes about 310 cycles per node, which sounds plausible for RAM latency. Adding an 'lfence' instruction (which should prevent out-of-order reads) makes basically no difference (since these reads can't be reordered anyway). With the extra read of a 'data' pointer (i.e. two uncached memory reads per node, with control and/or data dependencies between them all), and no lfence, it takes about 370 cycles per node. With an lfence between the two reads, it goes up to 650 cycles.

That suggests that (without lfence) it is indeed doing two memory reads in parallel, and must be speculatively ignoring the control dependency, so the second read is nearly free. Preventing speculation almost doubles the cost.

(On a Cortex-A53 (which is in-order and doesn't really speculate), the one-read-per-node version takes 200 cycles, and the two-read-per-node version takes 420 cycles, so it's equivalent to the lfenced x86 version.)
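For reference, a rough sketch of the kind of microbenchmark described above (this is not the actual code used; the node layout, shuffle, and timing details here are assumptions, and error handling is omitted):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>   /* __rdtsc, _mm_lfence */

struct data { int key; };
struct node { struct node *next; struct data *data; };

#define N (1 << 20)
#define NODE(i) ((struct node *)(nodes_mem + (size_t)(i) * 64))
#define DATA(i) ((struct data *)(datas_mem + (size_t)(i) * 64))

int main(void)
{
    /* One node and one data record per 64-byte line, linked in randomly
     * shuffled order so the hardware prefetcher cannot guess the next address. */
    char *nodes_mem = aligned_alloc(64, (size_t)N * 64);
    char *datas_mem = aligned_alloc(64, (size_t)N * 64);
    size_t *order = malloc(N * sizeof *order);

    for (size_t i = 0; i < N; i++)
        order[i] = i;
    for (size_t i = N - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < N; i++) {
        struct node *nd = NODE(order[i]);
        nd->data = DATA(order[i]);
        nd->data->key = (i == N - 1) ? 0 : 1;          /* key 0 ends the walk */
        nd->next = (i == N - 1) ? NULL : NODE(order[i + 1]);
    }

    struct node *n = NODE(order[0]);
    uint64_t t0 = __rdtsc();
    while (n->data->key != 0) {
        /* _mm_lfence();   uncomment to forbid overlapping the two reads */
        n = n->next;
    }
    uint64_t t1 = __rdtsc();
    printf("%.1f cycles per node\n", (double)(t1 - t0) / N);
    return 0;
}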

Addressing Meltdown and Spectre in the kernel

Posted Jan 26, 2018 8:08 UTC (Fri) by mcortese (guest, #52099) [Link]

Hats off, sir! I wish more comments were as well-articulated and proof-backed as this one.

