Addressing Meltdown and Spectre in the kernel
Posted Jan 7, 2018 18:22 UTC (Sun) by matthias (subscriber, #94967)
In reply to: Addressing Meltdown and Spectre in the kernel by dskoll
Parent article: Addressing Meltdown and Spectre in the kernel
Also, for Spectre, the privileged information might already be in the cache, allowing speculative execution to run without a stall. Running the same procedure twice should force the needed code into the cache the first time and let the second run use it.
Posted Jan 7, 2018 18:32 UTC (Sun)
by dskoll (subscriber, #1630)
[Link] (18 responses)
OK, how about this: When something is fetched into the cache by speculatively-executing code, tag it as "speculatively fetched". If the speculatively-executed code turns out to be required, the data is in cache and the speculatively-fetched tag is cleared. If the speculatively-executed code is abandoned, then pretend the data is not in cache if some other code requires it.
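To make the bookkeeping concrete, a minimal software model of what I mean (all names made up, obviously not a real hardware interface):

#include <stdbool.h>
#include <stdint.h>

/* One cache line's metadata in this toy model. */
struct cache_line {
    uint64_t tag;
    bool     valid;
    bool     spec_fetched;   /* set when the line was filled speculatively */
};

/* A line that is only present because of a (possibly abandoned)
 * speculation looks like a miss to non-speculative code. */
bool lookup(const struct cache_line *line, uint64_t addr_tag, bool speculative)
{
    if (!line->valid || line->tag != addr_tag)
        return false;                     /* ordinary miss */
    if (line->spec_fetched && !speculative)
        return false;                     /* pretend the data is not cached */
    return true;
}

/* If the speculation retires, the line becomes an ordinary cached line. */
void commit(struct cache_line *line)
{
    line->spec_fetched = false;
}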
Posted Jan 7, 2018 18:42 UTC (Sun)
by Otus (subscriber, #67685)
[Link] (5 responses)
Posted Jan 7, 2018 22:03 UTC (Sun)
by dskoll (subscriber, #1630)
[Link] (4 responses)
Well, you could have a separate, dedicated cache used only by speculatively-executed code, and only move the data into the main cache (evicting something else) if the speculative execution turned out to be needed. This means more cache memory, some of which is "wasted".
I agree that you can never hope to shut all covert channels, but I think it is worth brainstorming how to reduce their bandwidth and make attacks harder.
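Roughly this shape, as a sketch (again, made-up names, nothing like real silicon): fills from speculative code land in a small side buffer, and only a commit promotes them into the main cache.

#include <stdbool.h>
#include <stdint.h>

#define SPEC_WAYS 8   /* the extra, partly "wasted" cache memory */

struct spec_entry {
    uint64_t tag;
    bool     valid;
};

struct spec_buffer {
    struct spec_entry entry[SPEC_WAYS];
};

/* On abort, the side buffer is simply cleared; the main cache never saw
 * the speculative fills, so nothing was evicted and nothing leaks into
 * its state. On commit (not shown), an entry would be promoted into the
 * main cache, evicting something there as usual. */
void spec_abort_all(struct spec_buffer *sb)
{
    for (int i = 0; i < SPEC_WAYS; i++)
        sb->entry[i].valid = false;
}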
Posted Jan 7, 2018 23:19 UTC (Sun)
by excors (subscriber, #95769)
[Link] (3 responses)
Also, what would happen if you try to read a cache line that's currently dirty in another core's L1? The read would normally trigger that other core to write back to RAM (or share its data in some other way), which may be observable even if the first core perfectly hides the read from itself.
Posted Jan 8, 2018 0:35 UTC (Mon)
by dskoll (subscriber, #1630)
[Link] (2 responses)
Ok. :) I get it. So it seems to me that speculative execution is, by its very nature, a covert channel that is impossible to shut down completely. That's a somewhat unsettling reality.
Posted Jan 8, 2018 16:51 UTC (Mon)
by dw (subscriber, #12017)
[Link] (1 responses)
Posted Jan 17, 2018 14:55 UTC (Wed)
by mstone_ (subscriber, #66309)
[Link]
Posted Jan 7, 2018 18:47 UTC (Sun)
by matthias (subscriber, #94967)
[Link]
There will always be some side channels. Even the timing of speculative execution itself could reveal some information. The goal with side channels has to be to make the bandwidth so small that they become unusable. Closing all side channels, after all, means that the execution time must not depend on the data; in particular, the best-case performance has to be the same as the worst-case performance. That would be a big performance hit that people usually will not pay. It is what is done in cryptography (where the best and worst cases usually do not differ much anyway), but I do not think it is a valid option for each and every piece of code.
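For comparison, the data-independent discipline that cryptographic code follows looks roughly like this (a generic constant-time comparison, shown only to illustrate the cost model):

#include <stddef.h>
#include <stdint.h>

/* Constant-time equality check: it always touches every byte and never
 * branches on secret data, so the best case equals the worst case by
 * construction. Doing this for general-purpose code would be far too
 * expensive. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < len; i++)
        diff |= a[i] ^ b[i];
    return diff == 0;
}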
Posted Jan 7, 2018 21:54 UTC (Sun)
by khim (subscriber, #9252)
[Link] (10 responses)
Posted Jan 8, 2018 12:20 UTC (Mon)
by nix (subscriber, #2304)
[Link] (9 responses)
Making this all more complex is that you might have multiple speculations requiring the same bit of cacheline data, only some of which might fail, so you need refcounting; and now you have counter-overflow problems, and oh gods, I'm glad I'm not a silicon engineer right now.
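The usual dodge for the overflow part, sketched in software terms (purely hypothetical, no real core does exactly this): saturate the counter and treat a pinned line as in use until everything drains.

#include <stdint.h>

#define REFS_MAX UINT8_MAX

/* Saturating increment: once the count pins at the maximum, the line is
 * simply treated as "referenced until all outstanding speculations drain". */
static inline uint8_t ref_get(uint8_t refs)
{
    return refs < REFS_MAX ? refs + 1 : REFS_MAX;
}

/* A pinned count cannot be decremented safely; it stays pinned until some
 * coarser event (say, a pipeline flush) resets it. */
static inline uint8_t ref_put(uint8_t refs)
{
    return (refs == 0 || refs == REFS_MAX) ? refs : refs - 1;
}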
Posted Jan 8, 2018 15:15 UTC (Mon)
by mgb (guest, #3226)
[Link] (8 responses)
Throwing away cached information, and spending a hundred cycles to fetch something that might not be needed, can be counter-productive.
Posted Jan 8, 2018 15:27 UTC (Mon)
by nix (subscriber, #2304)
[Link] (3 responses)
Maybe this will require a memory controller redesign as well (a signal that this is a speculative fetch, and perhaps resetting the RAS and buffers of the affected DIMMs to some default value before any subsequent non-speculative fetch to those DIMMs).
Posted Jan 8, 2018 15:57 UTC (Mon)
by mgb (guest, #3226)
[Link] (2 responses)
Such rare use cases - staggering amounts of floating-point ops on each fetched datum - could be hand-crafted to use a speculative fetch without the risk of Spectre.
And remember that speculation can just as easily be counter-productive - speculatively replacing a cache line not only leaks information but also throws away good cached information and replaces it with information of unknown merit.
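Hand-crafted here could be as simple as an explicit prefetch, which is issued because the programmer asked for it rather than because a mispredicted path touched a secret-dependent address (sketch with a placeholder FP kernel):

#include <stddef.h>

/* Explicit prefetch of a later element while doing lots of FP work on the
 * current one; the fetch address does not depend on anything speculative. */
void heavy_fp(double *data, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 8 < n)
            __builtin_prefetch(&data[i + 8], 0, 3);  /* read, keep in cache */
        /* ...staggering amounts of floating point work on data[i]... */
        data[i] = data[i] * data[i] + 1.0;
    }
}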
Posted Jan 9, 2018 23:46 UTC (Tue)
by immibis (subscriber, #105511)
[Link]
It would be quite common if most of the data the CPU is working on is in the cache already - which, in a well-designed and well-tuned program, should be the case.
Posted Jan 10, 2018 11:51 UTC (Wed)
by farnz (subscriber, #17727)
[Link]
If the latency hit is 100 clocks, your cacheline size is 64 bytes, and the CPU is running sequentially through the data, each 100-clock delay gets you 64 bytes to work on. If the datum size is a 32-bit integer, that's 16 items to work on for every 100-clock latency hit. If my workload takes more than 6 clock cycles per item, then speculating far enough ahead to trigger the next cacheline fetch as soon as the previous one has finished means that my workload never sees a cache miss.
I suspect this type of case isn't that rare - while I've described the absolute best case, which could also be handled easily by a prefetch engine, it also covers workloads where the code fits in L1I, the bytes you need to work on any one datum fit in L1D, but the bytes you need to work on the next datum are not all going to be in L1D immediately after finishing the last datum.
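A sketch of that best case (array and names are just illustrative): 64-byte lines hold 16 32-bit items, so anything over 100/16 ≈ 6.25 cycles of work per item hides the whole miss, provided the next line's fetch is started while the current line is being processed.

#include <stddef.h>
#include <stdint.h>

#define ITEMS_PER_LINE (64 / sizeof(uint32_t))   /* 16 items per cacheline */

uint64_t process(const uint32_t *items, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Start fetching the next cacheline while working on this one;
         * with >6 cycles of work per item, it arrives before it is needed. */
        if (i + ITEMS_PER_LINE < n)
            __builtin_prefetch(&items[i + ITEMS_PER_LINE]);
        sum += (uint64_t)items[i] * 7;   /* stand-in for the real work */
    }
    return sum;
}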
Posted Jan 8, 2018 15:50 UTC (Mon)
by excors (subscriber, #95769)
[Link] (3 responses)
I'd imagine there's plenty of code that does something a bit like "for (linked_list_node *n = head; n->data->key != key; n = n->next) { }". If the CPU waits for n->data before fetching n->next, I think it's going to take two memory-latency periods per iteration. If it speculatively fetches n->next concurrently with n->data, it should run twice as fast, which is a huge improvement, with only a single incorrectly-predicted fetch at the end of the loop. I can't imagine CPU designers or marketers would be happy with throwing away so much performance in what seems like fairly common code.
Posted Jan 8, 2018 17:03 UTC (Mon)
by mgb (guest, #3226)
[Link] (2 responses)
It makes no difference whether speculative fetches are enabled, disabled, or enabled only to the L1 cache.
Posted Jan 8, 2018 17:30 UTC (Mon)
by excors (subscriber, #95769)
[Link] (1 responses)
I tried testing that code on a Haswell CPU. Nodes were 64B-aligned and randomly shuffled in memory (to avoid simple prefetching). Simply iterating over the list (i.e. one uncached memory read per node) takes about 310 cycles per node, which sounds plausible for RAM latency. Adding an 'lfence' instruction (which should prevent out-of-order reads) makes basically no difference (since these reads can't be reordered anyway). With the extra read of a 'data' pointer (i.e. two uncached memory reads per node, with control and/or data dependencies between them all), and no lfence, it takes about 370 cycles per node. With an lfence between the two reads, it goes up to 650 cycles.
That suggests that (without lfence) it is indeed doing two memory reads in parallel, and must be speculatively ignoring the control dependency, so the second read is nearly free. Preventing speculation almost doubles the cost.
(On a Cortex-A53 (which is in-order and doesn't really speculate), the one-read-per-node version takes 200 cycles, and the two-read-per-node version takes 420 cycles, so it's equivalent to the lfenced x86 version.)
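For concreteness, the shape of the test was roughly the following (a reconstruction, not the exact code that produced the numbers above; nodes and payloads are padded to one 64-byte line each, and the list is shuffled before the walk):

#include <stdint.h>

struct payload { uint64_t key; char pad[56]; };                 /* 64 bytes */
struct node    { struct node *next; struct payload *data; char pad[48]; };

/* Walk the shuffled list. With USE_LFENCE defined, the (x86) lfence sits
 * between the load of n->data->key and the load of n->next, so the two
 * memory reads per node can no longer overlap. */
uint64_t walk(struct node *head, uint64_t key)
{
    uint64_t steps = 0;
    for (struct node *n = head; n && n->data->key != key; n = n->next) {
#ifdef USE_LFENCE
        __asm__ volatile("lfence" ::: "memory");
#endif
        steps++;
    }
    return steps;
}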
Posted Jan 26, 2018 8:08 UTC (Fri)
by mcortese (guest, #52099)
[Link]
Hats off, sir!
I wish more comments were as articulate and as well-backed by evidence as this one.