boringssl/crypto/fipsmodule/bn
David Benjamin 4188c3f495 Remove cacheline striping in copy_from_prebuf.
The standard computation model for constant-time code is that memory
access patterns must be independent of secret data.
BN_mod_exp_mont_consttime was previously written to a slightly weaker
model: only cacheline access patterns must be independent of secret
data. It assumed accesses within a cacheline were indistinguishable.

The CacheBleed attack (https://eprint.iacr.org/2016/224.pdf) showed this
assumption was false. Cache lines may be divided into cache banks, and
the researchers were able to measure cache bank contention pre-Haswell.
For Haswell, the researchers note "But, as Haswell does show timing
variations that depend on low address bits [19], it may be vulnerable to
similar attacks."

OpenSSL's fix to CacheBleed was not to adopt the standard constant-time
computation model. Rather, it now assumes accesses within a 16-byte
cache bank are indistinguishable, at least in the C copy_from_prebuf
path. These weaker models failed before with CacheBleed, so avoiding
such assumptions seems prudent. (The [19] citation above notes a false
dependence between memory addresses with a distance of 4k, which may be
what the paper was referring to.) Moreover, the C path is largely unused
on x86_64 (which uses mont5 asm), so it is especially questionable for
the generic C code to make assumptions based on x86_64.

Just walk the entire table in the C implementation. Doing so as-is comes
with a performance hit, but the striped memory layout is, at that point,
useless. We regain the performance loss (and then some) by using a more
natural layout. Benchmarks below.

This CL does not touch the mont5 assembly; I haven't figured out what
it's doing yet.

Pixel 3, aarch64:
Before:
Did 3146 RSA 2048 signing operations in 10009070us (314.3 ops/sec)
Did 447 RSA 4096 signing operations in 10026666us (44.6 ops/sec)
After:
Did 3210 RSA 2048 signing operations in 10010712us (320.7 ops/sec)
Did 456 RSA 4096 signing operations in 10063543us (45.3 ops/sec)

Pixel 3, armv7:
Before:
Did 2688 RSA 2048 signing operations in 10002266us (268.7 ops/sec)
Did 459 RSA 4096 signing operations in 10004785us (45.9 ops/sec)
After:
Did 2709 RSA 2048 signing operations in 10001299us (270.9 ops/sec)
Did 459 RSA 4096 signing operations in 10063737us (45.6 ops/sec)

x86_64 Broadwell, mont5 assembly disabled:
(This configuration is not actually shipped anywhere, but seemed a
useful data point.)
Before:
Did 14274 RSA 2048 signing operations in 10009130us (1426.1 ops/sec)
Did 2448 RSA 4096 signing operations in 10046921us (243.7 ops/sec)
After:
Did 14706 RSA 2048 signing operations in 10037908us (1465.0 ops/sec)
Did 2538 RSA 4096 signing operations in 10059986us (252.3 ops/sec)

Change-Id: If41da911d4281433856a86c6c8eadf99cd33e2d8
Reviewed-on: https://boringssl-review.googlesource.com/c/33268
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2018-11-19 19:10:09 +00:00
..
asm Tidy up type signature of BN_mod_exp_mont_consttime table. 2018-11-19 17:44:44 +00:00
add.c
bn_test_to_fuzzer.go
bn_test.cc Speculatively remove __STDC_*_MACROS. 2018-11-14 16:14:37 +00:00
bn_tests.txt
bn.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
bytes.c
check_bn_tests.go
cmp.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
ctx.c
div_extra.c
div.c Fix div.c to divide BN_ULLONG only if BN_CAN_DIVIDE_ULLONG defined. 2018-10-10 15:33:35 +00:00
exponentiation.c Remove cacheline striping in copy_from_prebuf. 2018-11-19 19:10:09 +00:00
gcd_extra.c
gcd.c
generic.c
internal.h Fix div.c to divide BN_ULLONG only if BN_CAN_DIVIDE_ULLONG defined. 2018-10-10 15:33:35 +00:00
jacobi.c
montgomery_inv.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
montgomery.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
mul.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
prime.c Update Miller–Rabin check numbers. 2018-08-14 23:10:53 +00:00
random.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
rsaz_exp.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
rsaz_exp.h Include bn/internal.h for RSAZ code. 2018-06-04 17:26:29 +00:00
shift.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
sqrt.c