The standard computation model for constant-time code is that memory
access patterns must be independent of secret data.
BN_mod_exp_mont_consttime was previously written to a slightly weaker
model: only cacheline access patterns must be independent of secret
data. It assumed accesses within a cacheline were indistinguishable.
The CacheBleed attack (https://eprint.iacr.org/2016/224.pdf) showed this
assumption was false. Cache lines may be divided into cache banks, and
the researchers were able to measure cache bank contention pre-Haswell.
For Haswell, the researchers note "But, as Haswell does show timing
variations that depend on low address bits [19], it may be vulnerable to
similar attacks."
OpenSSL's fix to CacheBleed was not to adopt the standard constant-time
computation model. Rather, it now assumes accesses within a 16-byte
cache bank are indistinguishable, at least in the C copy_from_prebuf
path. These weaker models failed before with CacheBleed, so avoiding
such assumptions seems prudent. (The [19] citation above notes a false
dependence between memory addresses with a distance of 4k, which may be
what the paper was referring to.) Moreover, the C path is largely unused
on x86_64 (which uses mont5 asm), so it is especially questionable for
the generic C code to make assumptions based on x86_64.
Just walk the entire table in the C implementation. Doing so as-is comes
with a performance hit, but the striped memory layout is, at that point,
useless. We regain the performance loss (and then some) by using a more
natural layout. Benchmarks below.
This CL does not touch the mont5 assembly; I haven't figured out what
it's doing yet.
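To sketch the idea (illustrative C only, not the actual copy_from_prebuf or
its types; the helper names here are made up): every table entry is read on
every lookup, and a constant-time mask selects the entry matching the secret
index, so neither branches nor addresses depend on the secret. The entries
are also stored contiguously, which is the "more natural layout" mentioned
above.

  #include <stddef.h>
  #include <stdint.h>

  /* Illustrative constant-time full-table lookup. Every entry is read
   * regardless of |secret_index|; a mask (all ones when i == secret_index,
   * zero otherwise) accumulates the wanted entry. */
  typedef uint64_t word_t;

  static word_t ct_eq_mask(word_t a, word_t b) {
    word_t t = a ^ b;
    return ((t | (0u - t)) >> 63) - 1;  /* all ones iff a == b */
  }

  static void ct_select(word_t *out, const word_t *table, size_t num_entries,
                        size_t entry_words, size_t secret_index) {
    for (size_t j = 0; j < entry_words; j++) {
      out[j] = 0;
    }
    for (size_t i = 0; i < num_entries; i++) {
      word_t mask = ct_eq_mask((word_t)i, (word_t)secret_index);
      for (size_t j = 0; j < entry_words; j++) {
        out[j] |= table[i * entry_words + j] & mask;
      }
    }
  }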
Pixel 3, aarch64:
Before:
Did 3146 RSA 2048 signing operations in 10009070us (314.3 ops/sec)
Did 447 RSA 4096 signing operations in 10026666us (44.6 ops/sec)
After:
Did 3210 RSA 2048 signing operations in 10010712us (320.7 ops/sec)
Did 456 RSA 4096 signing operations in 10063543us (45.3 ops/sec)
Pixel 3, armv7:
Before:
Did 2688 RSA 2048 signing operations in 10002266us (268.7 ops/sec)
Did 459 RSA 4096 signing operations in 10004785us (45.9 ops/sec)
After:
Did 2709 RSA 2048 signing operations in 10001299us (270.9 ops/sec)
Did 459 RSA 4096 signing operations in 10063737us (45.6 ops/sec)
x86_64 Broadwell, mont5 assembly disabled:
(This configuration is not actually shipped anywhere, but seemed a
useful data point.)
Before:
Did 14274 RSA 2048 signing operations in 10009130us (1426.1 ops/sec)
Did 2448 RSA 4096 signing operations in 10046921us (243.7 ops/sec)
After:
Did 14706 RSA 2048 signing operations in 10037908us (1465.0 ops/sec)
Did 2538 RSA 4096 signing operations in 10059986us (252.3 ops/sec)
Change-Id: If41da911d4281433856a86c6c8eadf99cd33e2d8
Reviewed-on: https://boringssl-review.googlesource.com/c/33268
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
It's a table of BN_ULONGs. No particular need to use unsigned char.
Change-Id: I397883cef9f39fb162c2b0bfbd6a70fe399757a2
Reviewed-on: https://boringssl-review.googlesource.com/c/33267
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
https://boringssl-review.googlesource.com/31085 wasn't right. We already forbid
creating BN_MONT_CTX on negative numbers, which means almost all negative
moduli already don't work with BN_mod_exp_mont. Only -1 happened to not get
rejected, but it
computed the wrong value. Reject it instead.
Update-Note: BN_mod_exp* will no longer work for negative moduli. It already
didn't work for any negative odd modulus other than -1, so rejecting -1 and
negative even moduli is unlikely to be noticed.
Bug: 71
Change-Id: I7c713d417e2e6512f3e78f402de88540809977e3
Reviewed-on: https://boringssl-review.googlesource.com/31484
Reviewed-by: Adam Langley <agl@google.com>
Historically, OpenSSL's modular exponentiation functions tolerated negative
moduli by ignoring the sign bit. The special case for a modulus of 1 should do
the same. That said, this is ridiculous, and the only reason I'm importing
this is that BN_abs_is_word(1) is marginally more efficient than BN_is_one()
and we haven't gotten around to enforcing positive moduli yet.
Thanks to Guido Vranken and OSSFuzz for finding this issue and reporting to
OpenSSL.
(Imported from upstream's 235119f015e46a74040b78b10fd6e954f7f07774.)
Change-Id: I526889dfbe2356753aa1e6ecfd3aa3dc3a8cd2b8
Reviewed-on: https://boringssl-review.googlesource.com/31085
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
|alloca| is dangerous and poorly specified, as any description of
|alloca| will note. It's also hard for some analysis tools to
reason about.
The code here assumed |alloca| is a macro, which isn't a valid
assumption. Depending on which headers are included and what
toolchain is being used, |alloca| may or may not be defined as a macro,
and this might change over time if/when toolchains are updated. Or, we
might be doing static analysis and/or dynamic analysis with a different
configuration w.r.t. the availability of |alloca| than production
builds use.
Regardless, the |alloca| code path only kicked in when the inputs were
840 bits or smaller. Since the multi-prime RSA support was removed, the
input will be at least 1024 bits for interesting RSA key sizes, and this
code path won't be triggered since powerbufLen will be larger than 3072
bytes in those cases. ECC inversion via Fermat's Little Theorem has its
own constant-time exponentiation, so there are no cases where smaller
inputs need to be fast.
The RSAZ code avoids the |OPENSSL_malloc| for 2048-bit RSA keys.
Increasingly, though, the RSAZ code won't be used, since it will be
skipped over on Broadwell+ CPUs. Generalize the RSAZ stack allocation
to work for non-RSAZ code paths. In order to ensure this doesn't cause
too much stack usage on platforms where RSAZ wasn't already being used,
only do so on x86-64, which already has this large stack size
requirement due to RSAZ.
This change will make it easier to refactor |BN_mod_exp_mont_consttime|
to do this allocation more safely and in a way that's more compatible
with various analysis tools.
This is also a step towards eliminating the |uintptr_t|-based alignment
hack.
Since this change increases the number of times |OPENSSL_free| is
skipped, I've added an explicit |OPENSSL_cleanse| to ensure the
zeroization is done. This should be done regardless of the other changes
here.
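A rough sketch of the resulting allocation pattern (names and the threshold
are illustrative, not the actual code; OPENSSL_malloc, OPENSSL_cleanse, and
OPENSSL_free are the usual helpers from <openssl/mem.h>):

  #include <stddef.h>
  #include <stdint.h>
  #include <openssl/mem.h>

  /* Use a fixed on-stack buffer when the request fits (x86-64 already has
   * the large stack requirement because of RSAZ); fall back to the heap
   * otherwise. Cleanse either way, since secrets pass through the buffer
   * and skipping OPENSSL_free must not skip the zeroization. */
  #define EXP_SCRATCH_BYTES 3072

  static int with_scratch(size_t needed,
                          int (*fn)(uint8_t *scratch, size_t len)) {
    uint8_t stack_buf[EXP_SCRATCH_BYTES];
    uint8_t *buf = stack_buf;
    if (needed > sizeof(stack_buf)) {
      buf = OPENSSL_malloc(needed);
      if (buf == NULL) {
        return 0;
      }
    }

    int ret = fn(buf, needed);

    OPENSSL_cleanse(buf, needed);
    if (buf != stack_buf) {
      OPENSSL_free(buf);
    }
    return ret;
  }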
Change-Id: I8a161ce2720a26127e85fff7513f394883e50b2e
Reviewed-on: https://boringssl-review.googlesource.com/28584
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
If the caller asked for the base to be treated as secret, we should
provide that. Allowing unbounded inputs is not compatible with being
constant-time.
Additionally, this aligns with the guidance here:
https://github.com/HACS-workshop/spectre-mitigations/blob/master/crypto_guidelines.md#1-do-not-conditionally-choose-between-constant-and-non-constant-time
Update-Note: BN_mod_exp_mont_consttime and BN_mod_exp_mont now require
inputs be fully reduced. I believe current callers tolerate this.
Additionally, due to a quirk of how certain operations were ordered,
using a (publicly) zero exponent tolerated a NULL BN_CTX while other
exponents required a non-NULL BN_CTX. A non-NULL BN_CTX is now required
uniformly. This is unlikely to cause problems. Any call site where the
exponent is always zero should just be replaced with BN_value_one().
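For example, a caller that previously passed an unreduced (public) base or a
NULL BN_CTX would now look something like this hedged sketch (the wrapper and
its error handling are illustrative, not part of the CL):

  #include <openssl/bn.h>

  /* The base must satisfy 0 <= base < m, and a non-NULL BN_CTX is required
   * even when the exponent is publicly zero. BN_nnmod is used here only to
   * reduce an unreduced *public* base; it is not constant-time. */
  static int mod_exp_reduced(BIGNUM *r, const BIGNUM *base, const BIGNUM *exp,
                             const BIGNUM *m) {
    int ok = 0;
    BN_CTX *ctx = BN_CTX_new();
    BIGNUM *reduced = BN_new();
    if (ctx == NULL || reduced == NULL ||
        !BN_nnmod(reduced, base, m, ctx) ||
        !BN_mod_exp_mont_consttime(r, reduced, exp, m, ctx, /*mont=*/NULL)) {
      goto err;
    }
    ok = 1;

  err:
    BN_free(reduced);
    BN_CTX_free(ctx);
    return ok;
  }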
Change-Id: I7c941953ea05f36dc2754facb9f4cf83a6789c61
Reviewed-on: https://boringssl-review.googlesource.com/27665
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Steven Valdez <svaldez@google.com>
No sense in adding impossible error cases we need to handle.
Additionally, tighten them a bit and require strong bounds. (I wasn't
sure what we'd need at first and made them unnecessarily general.)
Change-Id: I21a0afde90a55be2e9a0b8d7288f595252844f5f
Reviewed-on: https://boringssl-review.googlesource.com/27586
Reviewed-by: Adam Langley <alangley@gmail.com>
https://boringssl-review.googlesource.com/10520 and then later
https://boringssl-review.googlesource.com/25285 made BN_MONT_CTX_set
constant-time, which is necessary for RSA's mont_p and mont_q. However,
due to a typo in the benchmark, the cost of doing so was not correctly
measured.
Split BN_MONT_CTX creation into constant-time and variable-time versions.
The constant-time one uses our current algorithm and the latter restores
the original BN_mod codepath.
Should we wish to avoid BN_mod, I have an alternate version lying
around:
First, BN_set_bit + bn_mod_lshift1_consttime as now to count up to 2*R.
Next, observe that 2*R = BN_to_montgomery(2) and R*R =
BN_to_montgomery(R) = BN_to_montgomery(2^r_bits). Also observe that
BN_mod_mul_montgomery only needs n0, not RR. Split the core of
BN_mod_exp_mont into its own function so the caller handles conversion.
Raise 2*R to the r_bits power to get 2^r_bits*R = R*R.
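(Spelling out the arithmetic in that sketch, writing mont(x) for x*R mod n
with R = 2^r_bits: the shift loop gives mont(2) = 2*R mod n, and raising it
to the r_bits power with Montgomery multiplications gives mont(2^r_bits) =
2^r_bits*R mod n = R*R mod n, which is exactly RR.)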
The advantage of that algorithm is that it is still constant-time, so we
only need one BN_MONT_CTX_new. Additionally, it avoids BN_mod which is
otherwise (almost, but the remaining links should be easy to cut) out of
the critical path for correctness. One less operation to worry about.
The disadvantage is that it gives 25% (RSA-2048) or 32% (RSA-4096)
slower RSA verification speed. I went with the BN_mod one for the time
being.
Before:
Did 9204 RSA 2048 signing operations in 10052053us (915.6 ops/sec)
Did 326000 RSA 2048 verify (same key) operations in 10028823us (32506.3 ops/sec)
Did 50830 RSA 2048 verify (fresh key) operations in 10033794us (5065.9 ops/sec)
Did 1269 RSA 4096 signing operations in 10019204us (126.7 ops/sec)
Did 88435 RSA 4096 verify (same key) operations in 10031129us (8816.1 ops/sec)
Did 14552 RSA 4096 verify (fresh key) operations in 10053411us (1447.5 ops/sec)
After:
Did 9150 RSA 2048 signing operations in 10022831us (912.9 ops/sec)
Did 322000 RSA 2048 verify (same key) operations in 10028604us (32108.2 ops/sec)
Did 289000 RSA 2048 verify (fresh key) operations in 10017205us (28850.4 ops/sec)
Did 1270 RSA 4096 signing operations in 10072950us (126.1 ops/sec)
Did 87480 RSA 4096 verify (same key) operations in 10036328us (8716.3 ops/sec)
Did 80730 RSA 4096 verify (fresh key) operations in 10073614us (8014.0 ops/sec)
Change-Id: Ie8916d1634ccf8513ceda458fa302f09f3e93c07
Reviewed-on: https://boringssl-review.googlesource.com/27287
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
The first non-zero window (which we can condition on for public
exponents) always multiplies by one. This means we can cut out one
Montgomery multiplication. It also means we never actually need to
initialize r to one, saving another Montgomery multiplication for P-521.
This, in turn, means we don't need the bn_one_to_montgomery optimization
for the public-exponent exponentiations, so we can delete
bn_one_to_montgomery_small. (The function does currently promise to
handle p = 0, but this is not actually reachable, so it can just do a
reduction on RR.)
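As a toy illustration of the saved multiplication (plain 64-bit modular
arithmetic instead of Montgomery form, simplified window handling, made-up
names; the exponent is treated as public, so branching on its windows is
fine):

  #include <stdint.h>

  /* Toy fixed-window modular exponentiation, only to show the loop shape:
   * squarings of the implicit "one" are skipped, and the first non-zero
   * window copies the table entry instead of multiplying by it. Relies on
   * the common unsigned __int128 extension. */
  #define WINDOW_BITS 4
  #define TABLE_SIZE (1u << WINDOW_BITS)

  static uint64_t mul_mod(uint64_t a, uint64_t b, uint64_t m) {
    return (uint64_t)((unsigned __int128)a * b % m);
  }

  static uint64_t pow_mod_public(uint64_t base, uint64_t exp, uint64_t m) {
    uint64_t table[TABLE_SIZE];  /* table[i] = base^i mod m */
    table[0] = 1 % m;
    for (unsigned i = 1; i < TABLE_SIZE; i++) {
      table[i] = mul_mod(table[i - 1], base, m);
    }

    uint64_t r = 1 % m;  /* only consumed when exp == 0; in Montgomery form,
                          * producing this "one" is the work being saved */
    int r_is_one = 1;
    for (int window = 64 - WINDOW_BITS; window >= 0; window -= WINDOW_BITS) {
      if (!r_is_one) {
        for (int i = 0; i < WINDOW_BITS; i++) {
          r = mul_mod(r, r, m);  /* square once per bit in the window */
        }
      }
      uint64_t wval = (exp >> window) & (TABLE_SIZE - 1);
      if (wval != 0) {
        if (r_is_one) {
          r = table[wval];  /* first non-zero window: copy, no multiply */
          r_is_one = 0;
        } else {
          r = mul_mod(r, table[wval], m);
        }
      }
    }
    return r;
  }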
For RSA, where we're not doing many multiplications to begin with,
saving one is noticeable.
Before:
Did 92000 RSA 2048 verify (same key) operations in 3002557us (30640.6 ops/sec)
Did 25165 RSA 4096 verify (same key) operations in 3045046us (8264.2 ops/sec)
After:
Did 100000 RSA 2048 verify (same key) operations in 3002483us (33305.8 ops/sec)
Did 26603 RSA 4096 verify (same key) operations in 3010942us (8835.4 ops/sec)
(Not looking at the fresh key number yet as that still needs to be
fixed.)
Change-Id: I81a025a68d9b0f8eb0f9c6c04ec4eedf0995a345
Reviewed-on: https://boringssl-review.googlesource.com/27286
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
It's defined to return one in Montgomery form, not a normal one.
(Not that this matters. This function is only used for Fermat's Little
Theorem. Probably it should have been less general, though we'd need to
make new test vectors first.)
Change-Id: Ia8d7588e6a413b25f01280af9aacef0192283771
Reviewed-on: https://boringssl-review.googlesource.com/27285
Reviewed-by: Adam Langley <agl@google.com>
BN_mod_exp_mont is intended to protect the base, but not the exponent.
Accordingly, it shouldn't treat a base of zero as special.
Change-Id: Ib053e8ce65ab1741973a9f9bfeff8c353567439c
Reviewed-on: https://boringssl-review.googlesource.com/27284
Reviewed-by: Adam Langley <agl@google.com>
Functions that deserialize from bytes and Montgomery multiplication have
no reason to minimize their inputs.
Bug: 232
Change-Id: I121cc9b388033d684057b9df4ad0c08364849f58
Reviewed-on: https://boringssl-review.googlesource.com/25258
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
This has no behavior change, but it has a semantic one. This CL is an
assertion that all BIGNUM functions tolerate non-minimal BIGNUMs now.
Specifically:
- Functions that do not touch top/width are assumed to not care.
- Functions that do touch top/width will be changed by this CL. These
should be checked in review to confirm they tolerate non-minimal BIGNUMs.
Subsequent CLs will start adjusting the widths that BIGNUM functions
output, to fix timing leaks.
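To make "non-minimal" concrete, here is a hypothetical sketch of the same
value in both representations (word counts follow the top/width wording
above):

  /* The value one, minimal vs. non-minimal. Both must behave identically. */
  BN_ULONG minimal_words[]     = {1};           /* top/width = 1 */
  BN_ULONG non_minimal_words[] = {1, 0, 0, 0};  /* top/width = 4 */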
Bug: 232
Change-Id: I3a2b41b071f2174452f8d3801bce5c78947bb8f7
Reviewed-on: https://boringssl-review.googlesource.com/25257
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
Test this by re-running bn_tests.txt tests a lot. For the most part,
this was done by scattering bn_minimal_width or bn_correct_top calls as
needed. We'll incrementally tease apart the functions that need to act
on non-minimal BIGNUMs in constant-time.
BN_sqr was switched to call bn_correct_top at the end, rather than
sample bn_minimal_width, in anticipation of later splitting it into
BN_sqr (for calculators) and BN_sqr_fixed (for BN_mod_mul_montgomery).
BN_div_word also uses bn_correct_top because it calls BN_lshift and so
officially shouldn't rely on BN_lshift returning something
minimal-width, though I expect we'd want to split off a BN_lshift_fixed
rather than change that anyway.
The shifts sample bn_minimal_width rather than bn_correct_top because
they all seem to try to be very clever around the bit width. If we need
constant-time versions of them, we can adjust them later.
Bug: 232
Change-Id: Ie17b39034a713542dbe906cf8954c0c5483c7db7
Reviewed-on: https://boringssl-review.googlesource.com/25255
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
These empty states aren't any use to either caller or implementor.
Change-Id: If0b748afeeb79e4a1386182e61c5b5ecf838de62
Reviewed-on: https://boringssl-review.googlesource.com/25254
Reviewed-by: Adam Langley <agl@google.com>
This cuts down on a duplicated place where we mess with bn->top. It also
better abstracts away what determines the value of R.
(I ordered this wrong and rebasing will be annoying. Specifically, the
question is what happens if the modulus is non-minimal. In
https://boringssl-review.googlesource.com/c/boringssl/+/25250/, R will
be determined by the stored width of mont->N, so we want to use mont's
copy of the modulus. Though, one way or another, the important part is
that it's inside the Montgomery abstraction.)
Bug: 232
Change-Id: I74212e094c8a47f396b87982039e49048a130916
Reviewed-on: https://boringssl-review.googlesource.com/25247
Reviewed-by: Adam Langley <agl@google.com>
(See also https://github.com/openssl/openssl/pull/5154.)
The exponent here is one of d, dmp1, or dmq1 for RSA. This value and its
bit length are both secret. The only public upper bound is the bit width
of the corresponding modulus (RSA n, p, and q, respectively).
Although BN_num_bits is constant-time (sort of; see bn_correct_top notes
in preceding patch), this does not fix the root problem, which is that
the windows are based on the minimal bit width, not the upper bound. We
could use BN_num_bits(m), but BN_mod_exp_mont_consttime is public API
and may be called with larger exponents. Instead, use all top*BN_BITS2
bits in the BIGNUM. This is still sensitive to the long-standing
bn_correct_top leak, but we need to fix that regardless.
This may cause us to do a handful of extra multiplications for RSA keys
which are just above a whole number of words, but that is not a standard
RSA key size.
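(Worked example with illustrative numbers: a 2048-bit RSA private exponent d
with only 2041 significant bits would, under BN_num_bits, drive the window
loop off a 2041-bit count, leaking d's exact bit length. With 64-bit words, d
still occupies top = 32 words, so top*BN_BITS2 = 2048 bits, and the count
depends only on the word-level width, i.e. the residual bn_correct_top leak
noted above.)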
Change-Id: I5e2f12b70c303b27c597a7e513b7bf7288f7b0e3
Reviewed-on: https://boringssl-review.googlesource.com/25185
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
These can be used to invert values in ECDSA. Unlike their BIGNUM
counterparts, the caller is responsible for taking values in and out of
Montgomery domain. This will save some work later on in the ECDSA
computation.
Change-Id: Ib7292900a0fdeedce6cb3e9a9123c94863659043
Reviewed-on: https://boringssl-review.googlesource.com/23071
Reviewed-by: Adam Langley <agl@google.com>
This was primarily for my own understanding, but this should hopefully
also be clearer and more amenable to using unsigned indices later.
Change-Id: I09cc3d55de0f7d9284d3b3168d8b0446274b2ab7
Reviewed-on: https://boringssl-review.googlesource.com/22889
Reviewed-by: Adam Langley <agl@google.com>
It always returns one, so just void it.
Change-Id: I8733cc3d6b20185e782cf0291e9c0dc57712bb63
Reviewed-on: https://boringssl-review.googlesource.com/22564
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This is an OpenSSL thing to support platforms where BN_ULONG is not
actually the size it claims to be. We define BN_ULONG to uint32_t and
uint64_t, which are guaranteed by C to implement arithmetic modulo 2^32
and 2^64, respectively. Thus there is no need for any of this.
Change-Id: I098cd4cc050a136b9f2c091dfbc28dd83e01f531
Reviewed-on: https://boringssl-review.googlesource.com/21784
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
crypto/{asn1,x509,x509v3,pem} were skipped as they are still OpenSSL
style.
Change-Id: I3cd9a60e1cb483a981aca325041f3fbce294247c
Reviewed-on: https://boringssl-review.googlesource.com/19504
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This isn't all of our pointer games by far, but for any code which
doesn't run on armv6, memcpy and pointer cast compile to the same code.
For code which does care about armv6 (do we care?), it'll need a bit more
work. armv6 makes memcpy into a function call.
Ironically, the one platform where C needs its alignment rules is the
one platform that makes it hard to honor C's alignment rules.
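A minimal sketch of the substitution (function names are made up):

  #include <stdint.h>
  #include <string.h>

  static uint32_t load_u32_cast(const uint8_t *in) {
    return *(const uint32_t *)in;  /* undefined if |in| is misaligned */
  }

  static uint32_t load_u32_memcpy(const uint8_t *in) {
    uint32_t v;
    memcpy(&v, in, sizeof(v));  /* well-defined regardless of alignment;
                                 * compiles to the same load on most targets,
                                 * but a function call on armv6 */
    return v;
  }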
Change-Id: Ib9775aa4d9df9381995df8698bd11eb260aac58c
Reviewed-on: https://boringssl-review.googlesource.com/17707
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>