boringssl

Author	SHA1	Message	Date
David Benjamin	ffbf95ad41	Devirtualize ec_simple_{add,dbl}. Now that the tuned add/dbl implementations are exposed, these can be specific to EC_GFp_mont_method and call the felem_mul and felem_sqr implementations directly. felem_sqr and felem_mul are still used elsewhere in simple.c, however, so we cannot get rid of them yet. Change-Id: I5ea22a8815279931afc98a6fc578bc85e3f8bdcc Reviewed-on: https://boringssl-review.googlesource.com/c/32849 Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: Adam Langley <agl@google.com>	2018-11-06 18:32:11 +00:00
David Benjamin	8370fb6b41	Implement constant-time generic multiplication. This is slower, but constant-time. It intentionally omits the signed digit optimization because we cannot be sure the doubling case will be unreachable for all curves. This is a fallback generic implementation for curves which we must support for compatibility but which are not common or important enough to justify curve-specific work. Before: Did 814 ECDH P-384 operations in 1085384us (750.0 ops/sec) Did 1430 ECDSA P-384 signing operations in 1081988us (1321.6 ops/sec) Did 308 ECDH P-521 operations in 1057741us (291.2 ops/sec) Did 539 ECDSA P-521 signing operations in 1049797us (513.4 ops/sec) After: Did 715 ECDH P-384 operations in 1080161us (661.9 ops/sec) Did 1188 ECDSA P-384 verify operations in 1069567us (1110.7 ops/sec) Did 275 ECDH P-521 operations in 1060503us (259.3 ops/sec) Did 506 ECDSA P-521 signing operations in 1084739us (466.5 ops/sec) But we're still faster than the old BIGNUM implementation. EC_FELEM more than paid for both the loss of points_make_affine and this CL. Bug: 239 Change-Id: I65d71a731aad16b523928ee47618822d503ea704 Reviewed-on: https://boringssl-review.googlesource.com/27708 Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: Adam Langley <agl@google.com>	2018-04-27 20:11:29 +00:00
David Benjamin	8b0dc7a720	Simplify ec_wNAF_mul table sizing. w=4 appears to be the correct answer for P-224 through P-521. There's nominally some optimizations in here for 70- and 20-bit primes, but that's absurd. Change-Id: Id4ccec779b17e375e9258c1784e46d7d3651c59a Reviewed-on: https://boringssl-review.googlesource.com/27707 Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: Adam Langley <agl@google.com>	2018-04-27 19:49:08 +00:00
David Benjamin	041dd68cec	Clear mallocs in ec_wNAF_mul. EC_POINT is split into the existing public EC_POINT (where the caller is sanity-checked about group mismatches) and the low-level EC_RAW_POINT (which, like EC_FELEM and EC_SCALAR, assumes that is your problem and is a plain old struct). Having both EC_POINT and EC_RAW_POINT is a little silly, but we're going to want different type signatures for functions which return void anyway (my plan is to lift a non-BIGNUM get_affine_coordinates up through the ECDSA and ECDH code), so I think it's fine. This wasn't strictly necessary, but wnaf.c is a lot tidier now. Perf is a wash; once we get up to this layer, it's only 8 entries in the table so not particularly interesting. Bug: 239 Change-Id: I8ace749393d359f42649a5bb0734597bb7c07a2e Reviewed-on: https://boringssl-review.googlesource.com/27706 Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: Adam Langley <agl@google.com>	2018-04-27 19:44:58 +00:00
David Benjamin	e14e4a7ee3	Remove ec_compute_wNAF's failure cases. Replace them with asserts and better justify why each of the internal cases are not reachable. Also change the loop to count up to bits+1 so it is obvious there is no memory error. (The previous loop shape made more sense when ec_compute_wNAF would return a variable length schedule.) Change-Id: I9c7df6abac4290b7a3e545e3d4aa1462108e239e Reviewed-on: https://boringssl-review.googlesource.com/27705 Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: Adam Langley <agl@google.com>	2018-04-27 19:24:58 +00:00
David Benjamin	6a289b3ec4	Remove EC_POINTs_make_affine and related logic. This does not appear to actually pull its weight. The purpose of this logic is to switch some adds to the faster add_mixed in the wNAF code, at the cost of a rather expensive inversion. This optimization kicks in for generic curves, so P-384 and P-521: With: Did 32130 ECDSA P-384 signing operations in 30077563us (1068.2 ops/sec) Did 27456 ECDSA P-384 verify operations in 30073086us (913.0 ops/sec) Did 14122 ECDSA P-521 signing operations in 30077407us (469.5 ops/sec) Did 11973 ECDSA P-521 verify operations in 30037330us (398.6 ops/sec) Without: Did 32445 ECDSA P-384 signing operations in 30069721us (1079.0 ops/sec) Did 27056 ECDSA P-384 verify operations in 30032303us (900.9 ops/sec) Did 13905 ECDSA P-521 signing operations in 30000430us (463.5 ops/sec) Did 11433 ECDSA P-521 verify operations in 30021876us (380.8 ops/sec) For single-point multiplication, the optimization is not useful. This makes sense as we only have one table's worth of additions to convert but still pay for the inversion. For double-point multiplication, it is slightly useful for P-384 and very useful for P-521. However, the next change to stack-allocate EC_FELEMs will more than compensate for removing it. (The immediate goal here is to simplify the EC_FELEM story.) Additionally, that this optimization was not useful for single-point multiplication implies that, should we wish to recover this, a modest 8-entry pre-computed (affine) base point table should have the same effect or better. Update-Note: I do not believe anything was calling either of these functions. (If necessary, we can always add no-op stubs as whether a point is affine is not visible to external code. It previously kicked in some optimizations, but those were removed for constant-time needs anyway.) Bug: 239 Change-Id: Ic9c51b001c45595cfe592274c7d5d652f4234839 Reviewed-on: https://boringssl-review.googlesource.com/27667 Reviewed-by: Adam Langley <agl@google.com>	2018-04-25 16:12:06 +00:00
David Benjamin	eda47f5d98	Make generic point arithmetic slightly less variable-time. The generic code special-cases affine points, but this leaks information. (Of course, the generic code also doesn't have a constant-time multiply and other problems, but one thing at a time.) The optimization in point doubling is not useful. Point multiplication more-or-less never doubles an affine point. The optimization in point addition is useful because the wNAF code converts the tables to affine. Accordingly, align with the P-256 code which adds a 'mixed' parameter. (I haven't aligned the formally-verified point formulas themselves yet; initial testing suggests that the large number of temporaries take a perf hit with BIGNUM. I'll check the results in EC_FELEM, which will be stack-allocated, to see if we still need to help the compiler out.) Strangely, it actually got a bit faster with this change. I'm guessing because now it doesn't need to bother with unnecessary comparisons and maybe was kinder to the branch predictor? Before: Did 2201 ECDH P-384 operations in 3068341us (717.3 ops/sec) Did 4092 ECDSA P-384 signing operations in 3076981us (1329.9 ops/sec) Did 3503 ECDSA P-384 verify operations in 3024753us (1158.1 ops/sec) Did 992 ECDH P-521 operations in 3017884us (328.7 ops/sec) Did 1798 ECDSA P-521 signing operations in 3059000us (587.8 ops/sec) Did 1581 ECDSA P-521 verify operations in 3033142us (521.2 ops/sec) After: Did 2310 ECDH P-384 operations in 3092648us (746.9 ops/sec) Did 4080 ECDSA P-384 signing operations in 3044588us (1340.1 ops/sec) Did 3520 ECDSA P-384 verify operations in 3056070us (1151.8 ops/sec) Did 992 ECDH P-521 operations in 3012779us (329.3 ops/sec) Did 1792 ECDSA P-521 signing operations in 3019459us (593.5 ops/sec) Did 1600 ECDSA P-521 verify operations in 3047749us (525.0 ops/sec) Bug: 239 Change-Id: If5d13825fc98e4c58bdd1580cf0245bf7ce93a82 Reviewed-on: https://boringssl-review.googlesource.com/27004 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-04-04 21:33:22 +00:00
Daniel Hirche	36714fc8ee	Remove redundant length-check in |ec_wNAF_mul|. Right now, |g_wNAF| and |p_wNAF| are of same size. This change makes GCC's "-Werror=logical-op" happy and adds a compile-time assertion in case the initial size of either array ever changes. Change-Id: I29e39a7a121a0a9d016c53da6b7c25675ddecbdc Reviewed-on: https://boringssl-review.googlesource.com/26104 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-02-21 17:03:14 +00:00
David Benjamin	638a408cd2	Add a tuned variable-time P-256 multiplication function. This reuses wnaf.c's window scheduling, but has access to the tuned field arithmetic and pre-computed base point table. Unlike wnaf.c, we do not make the points affine as it's not worth it for a single table. (We already precomputed the base point table.) Annoyingly, 32-bit x86 gets slower by a bit, but the other platforms are faster. My guess is that the generic code gets to use the bn_mul_mont assembly and the compiler, faced with the increased 32-bit register pressure and the extremely register-poor x86, is making bad decisions on the otherwise P-256-tuned C code. The three platforms that see much larger gains are significantly more important than 32-bit x86 at this point, so go with this change. armv7a (Nexus 5X) before/after [+14.4%]: Did 2703 ECDSA P-256 verify operations in 5034539us (536.9 ops/sec) Did 3127 ECDSA P-256 verify operations in 5091379us (614.2 ops/sec) aarch64 (Nexus 5X) before/after [+9.2%]: Did 6783 ECDSA P-256 verify operations in 5031324us (1348.2 ops/sec) Did 7410 ECDSA P-256 verify operations in 5033291us (1472.2 ops/sec) x86 before/after [-2.7%]: Did 8961 ECDSA P-256 verify operations in 10075901us (889.3 ops/sec) Did 8568 ECDSA P-256 verify operations in 10003001us (856.5 ops/sec) x86_64 before/after [+8.6%]: Did 29808 ECDSA P-256 verify operations in 10008662us (2978.2 ops/sec) Did 32528 ECDSA P-256 verify operations in 10057137us (3234.3 ops/sec) Change-Id: I5fa643149f5bfbbda9533e3008baadfee9979b93 Reviewed-on: https://boringssl-review.googlesource.com/25684 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-02-12 22:00:48 +00:00
David Benjamin	09633cc34e	Rename bn->top to bn->width. This has no behavior change, but it has a semantic one. This CL is an assertion that all BIGNUM functions tolerate non-minimal BIGNUMs now. Specifically: - Functions that do not touch top/width are assumed to not care. - Functions that do touch top/width will be changed by this CL. These should be checked in review that they tolerate non-minimal BIGNUMs. Subsequent CLs will start adjusting the widths that BIGNUM functions output, to fix timing leaks. Bug: 232 Change-Id: I3a2b41b071f2174452f8d3801bce5c78947bb8f7 Reviewed-on: https://boringssl-review.googlesource.com/25257 Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: Adam Langley <agl@google.com>	2018-02-05 23:44:24 +00:00
David Benjamin	b9f30bb6fe	Unwind total_num from wNAF_mul. The EC_POINTs are still allocated (for now), but everything else fits on the stack nicely, which saves a lot of fiddling with cleanup and allocations. Change-Id: Ib8480737ecc97e6b40b2c05f217cd8d3dc82cb72 Reviewed-on: https://boringssl-review.googlesource.com/25150 Reviewed-by: Adam Langley <agl@google.com>	2018-01-23 22:04:58 +00:00
David Benjamin	d86c0d2889	Pull the malloc out of compute_wNAF. This is to simplify clearing unnecessary mallocs out of ec_wNAF_mul, and perhaps to use it in tuned variable-time multiplication functions. Change-Id: Ic390d2e8e20d0ee50f3643830a582e94baebba95 Reviewed-on: https://boringssl-review.googlesource.com/25149 Reviewed-by: Adam Langley <agl@google.com>	2018-01-23 21:53:58 +00:00
David Benjamin	6ca09409cc	Always compute the maximum-length wNAF. This cuts out another total_num-length array and simplifies things. Leading zeros at the front of the schedule don't do anything, so it's easier to just produce a fixed-length one. (I'm also hoping to ultimately reuse this function in //third_party/fiat/p256.c and get the best of both worlds for ECDSA verification; tuned field arithmetic operations, precomputed table, and variable-time multiply.) Change-Id: I771f4ff7dcfdc3ee0eff8d9038d6dc9a0be3d4e0 Reviewed-on: https://boringssl-review.googlesource.com/25148 Reviewed-by: Adam Langley <agl@google.com>	2018-01-23 21:51:25 +00:00
David Benjamin	522ad7e8fc	Use EC_SCALAR for compute_wNAF. Note this switches from walking BN_num_bits to the full bit length of the scalar. But that can only cause it to add a few extra zeros to the front of the schedule, which r_is_at_infinity will skip over. Change-Id: I91e087c9c03505566b68f75fb37dfb53db467652 Reviewed-on: https://boringssl-review.googlesource.com/25147 Reviewed-by: Adam Langley <agl@google.com>	2018-01-23 21:34:50 +00:00
David Benjamin	338eeb0c4f	Remove r_is_inverted logic. This appears to be pointless. Before, we would have a 50% chance of doing an inversion at each non-zero bit but the first (r_is_at_infinity), plus a 50% chance of doing an inversion at the end. Now we would have a 50% chance of doing an inversion at each non-zero bit. That's the same number of coin flips. Change-Id: I8158fd48601cb041188826d4f68ac1a31a6fbbbc Reviewed-on: https://boringssl-review.googlesource.com/25146 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-01-23 21:29:13 +00:00
David Benjamin	5d9408714c	Remove unnecessary window size cases. The optimization for wsize = 1 only kicks in for 19-bit primes. The cases for b >= 800 and cannot happen due to EC_MAX_SCALAR_BYTES. Change-Id: If5ca908563f027172cdf31c9a22342152fecd12f Reviewed-on: https://boringssl-review.googlesource.com/25145 Reviewed-by: Adam Langley <agl@google.com>	2018-01-23 21:08:39 +00:00
David Benjamin	4111dd2fc2	Don't compute a per-scalar window size in wNAF code. Simplify things slightly. The probability of the scalar being small enough to go down a window size is astronomically small. (2^-186 for P-256 and 2^-84 for P-384.) Change-Id: Ie879f0b06bcfd1e6e6e3bf3f54e0d7d6567525a4 Reviewed-on: https://boringssl-review.googlesource.com/25144 Reviewed-by: Adam Langley <agl@google.com>	2018-01-23 21:06:42 +00:00
David Benjamin	a838f9dc7e	Make ECDSA signing 10% faster and plug some timing leaks. None of the asymmetric crypto we inherited from OpenSSL is constant-time because of BIGNUM. BIGNUM chops leading zeros off the front of everything, so we end up leaking information about the first word, in theory. BIGNUM functions additionally tend to take the full range of inputs and then call into BN_nnmod at various points. All our secret values should be acted on in constant-time, but k in ECDSA is a particularly sensitive value. So, ecdsa_sign_setup, in an attempt to mitigate the BIGNUM leaks, would add a couple copies of the order. This does not work at all. k is used to compute two values: k^-1 and kG. The first operation when computing k^-1 is to call BN_nnmod if k is out of range. The entry point to our tuned constant-time curve implementations is to call BN_nnmod if the scalar has too many bits, which this causes. The result is both corrections are immediately undone but cause us to do more variable-time work in the meantime. Replace all these computations around k with the word-based functions added in the various preceding CLs. In doing so, replace the BN_mod_mul calls (which internally call BN_nnmod) with Montgomery reduction. We can avoid taking k^-1 out of Montgomery form, which combines nicely with Brian Smith's trick in `3426d10119`. Along the way, we avoid some unnecessary mallocs. BIGNUM still affects the private key itself, as well as the EC_POINTs. But this should hopefully be much better now. Also it's 10% faster: Before: Did 15000 ECDSA P-224 signing operations in 1069117us (14030.3 ops/sec) Did 18000 ECDSA P-256 signing operations in 1053908us (17079.3 ops/sec) Did 1078 ECDSA P-384 signing operations in 1087853us (990.9 ops/sec) Did 473 ECDSA P-521 signing operations in 1069835us (442.1 ops/sec) After: Did 16000 ECDSA P-224 signing operations in 1064799us (15026.3 ops/sec) Did 19000 ECDSA P-256 signing operations in 1007839us (18852.2 ops/sec) Did 1078 ECDSA P-384 signing operations in 1079413us (998.7 ops/sec) Did 484 ECDSA P-521 signing operations in 1083616us (446.7 ops/sec) Change-Id: I2a25e90fc99dac13c0616d0ea45e125a4bd8cca1 Reviewed-on: https://boringssl-review.googlesource.com/23075 Reviewed-by: Adam Langley <agl@google.com>	2017-11-22 22:51:40 +00:00
David Benjamin	d24fd47ff4	Fold EC_POINT_clear_free into EC_POINT_free. All frees zero memory now. Change-Id: I5b04a0d14f38d5a7422e148d077fcba85a593594 Reviewed-on: https://boringssl-review.googlesource.com/22225 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2017-10-27 17:41:19 +00:00
David Benjamin	808f832917	Run the comment converter on libcrypto. crypto/{asn1,x509,x509v3,pem} were skipped as they are still OpenSSL style. Change-Id: I3cd9a60e1cb483a981aca325041f3fbce294247c Reviewed-on: https://boringssl-review.googlesource.com/19504 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2017-08-18 21:49:04 +00:00
Adam Langley	aacb72c1b7	Move ec/ and ecdsa/ into fipsmodule/ The names in the P-224 code collided with the P-256 code and thus many of the functions and constants in the P-224 code have been prefixed. Change-Id: I6bcd304640c539d0483d129d5eaf1702894929a8 Reviewed-on: https://boringssl-review.googlesource.com/15847 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2017-05-04 20:27:23 +00:00

21 Commits
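
Several entries above (6ca09409cc, e14e4a7ee3, 8b0dc7a720) describe a fixed-length wNAF schedule of bits+1 digits with w=4 and an 8-entry table. As a rough illustration only, the sketch below computes such a schedule in Python. It is not BoringSSL's code: the names `compute_wnaf` and `wnaf_value` are hypothetical, and it uses Python's arbitrary-precision integers where the C code works on EC_SCALAR words. Like wnaf.c, it is variable-time.

```python
def compute_wnaf(scalar: int, bits: int, w: int = 4) -> list[int]:
    """Return a fixed-length schedule of bits+1 signed digits, least significant first.

    Hypothetical helper for illustration. Each nonzero digit is odd with
    absolute value below 2**w, so for w=4 the digits are +/-1, +/-3, ..., +/-15
    and a table of 8 odd multiples of the point suffices.
    """
    if scalar < 0 or scalar >> bits:
        raise ValueError("scalar must satisfy 0 <= scalar < 2**bits")
    digits = []
    k = scalar
    for _ in range(bits + 1):
        if k & 1:
            d = k & ((1 << (w + 1)) - 1)  # k mod 2^(w+1)
            if d >= (1 << w):
                d -= 1 << (w + 1)         # pick the signed representative
            k -= d                        # now k is a multiple of 2^(w+1)
        else:
            d = 0
        digits.append(d)
        k >>= 1
    # bits+1 digits always suffice, so nothing is left over.
    assert k == 0
    return digits


def wnaf_value(digits: list[int]) -> int:
    """Recombine a schedule into the scalar it encodes."""
    return sum(d << i for i, d in enumerate(digits))


if __name__ == "__main__":
    schedule = compute_wnaf(31, 5)
    print(schedule, wnaf_value(schedule))
```

Small scalars simply yield zeros in the high-order digits (the front of the schedule, which a multiplication loop processes first), matching the observation in 6ca09409cc that leading zeros do nothing and a fixed-length schedule is simpler to produce.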