The generic code special-cases affine points, but this leaks
information. (Of course, the generic code also doesn't have a
constant-time multiply and other problems, but one thing at a time.)
The optimization in point doubling is not useful. Point multiplication
more-or-less never doubles an affine point. The optimization in point
addition *is* useful because the wNAF code converts the tables to
affine. Accordingly, align with the P-256 code which adds a 'mixed'
parameter.
(I haven't aligned the formally-verified point formulas themselves yet;
initial testing suggests that the large number of temporaries take a
perf hit with BIGNUM. I'll check the results in EC_FELEM, which will be
stack-allocated, to see if we still need to help the compiler out.)
Strangly, it actually got a bit faster with this change. I'm guessing
because now it doesn't need to bother with unnecessary comparisons and
maybe was kinder to the branch predictor?
Before:
Did 2201 ECDH P-384 operations in 3068341us (717.3 ops/sec)
Did 4092 ECDSA P-384 signing operations in 3076981us (1329.9 ops/sec)
Did 3503 ECDSA P-384 verify operations in 3024753us (1158.1 ops/sec)
Did 992 ECDH P-521 operations in 3017884us (328.7 ops/sec)
Did 1798 ECDSA P-521 signing operations in 3059000us (587.8 ops/sec)
Did 1581 ECDSA P-521 verify operations in 3033142us (521.2 ops/sec)
After:
Did 2310 ECDH P-384 operations in 3092648us (746.9 ops/sec)
Did 4080 ECDSA P-384 signing operations in 3044588us (1340.1 ops/sec)
Did 3520 ECDSA P-384 verify operations in 3056070us (1151.8 ops/sec)
Did 992 ECDH P-521 operations in 3012779us (329.3 ops/sec)
Did 1792 ECDSA P-521 signing operations in 3019459us (593.5 ops/sec)
Did 1600 ECDSA P-521 verify operations in 3047749us (525.0 ops/sec)
Bug: 239
Change-Id: If5d13825fc98e4c58bdd1580cf0245bf7ce93a82
Reviewed-on: https://boringssl-review.googlesource.com/27004
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This is in preparation for representing field elements with
stack-allocated types in the generic code. While there is likely little
benefit in threading all the turned field arithmetic through all the
generic code, and the P-224 logic, in particular, does not have a tight
enough abstraction for this, the current implementations depend on
BN_div, which is not compatible with stack-allocating things and avoiding
malloc.
This also speeds things up slightly, now that benchmarks cover point
validation.
Before:
Did 82786 ECDH P-224 operations in 10024326us (8258.5 ops/sec)
After:
Did 89991 ECDH P-224 operations in 10012429us (8987.9 ops/sec)
Change-Id: I468483b49f5dc69187aebd62834365ce5caab795
Reviewed-on: https://boringssl-review.googlesource.com/26971
Reviewed-by: Adam Langley <agl@google.com>
Alas, it is reachable by way of the legacy custom curves API. Add a
basic test to ensure those codepaths work.
Change-Id: If631110045a664001133a0d07fdac4c67971a15f
Reviewed-on: https://boringssl-review.googlesource.com/26970
Reviewed-by: Adam Langley <agl@google.com>
I'm not sure why I separated "fixed" and "quick_ctx" names. That's
annoying and doesn't generalize well to, say, adding a bn_div_consttime
function for RSA keygen.
Change-Id: I751d52b30e079de2f0d37a952de380fbf2c1e6b7
Reviewed-on: https://boringssl-review.googlesource.com/26364
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
Probably worth having actual test vectors for these, rather than
checking our code against itself. Additionally, small negative numbers
have, in the past been valuable test vectors (see long comment in
point_add from OpenSSL's ecp_nistp521.c).
Change-Id: Ia5aa8a80eb5b6d0089c3601c5fec2364e699794d
Reviewed-on: https://boringssl-review.googlesource.com/26848
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
p224_felem_neg does not produce an output within the tight bounds
suitable for p224_felem_contract. This was found by inspection of the
code.
This only affects the final y-coordinate output of arbitrary-point
multiplication, so it is a no-op for ECDH and ECDSA.
Change-Id: I1d929458d1f21d02cd8e745d2f0f7040a6bb0627
Reviewed-on: https://boringssl-review.googlesource.com/26847
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
No semantic change: the table is the same as before, but now with less
magic.
Change-Id: I351c2446e9765f25b7dfb901c9e98f12099a325c
Reviewed-on: https://boringssl-review.googlesource.com/26744
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Rather than writing the answer into the output, it wrote it into some
awkwardly-named temporaries. Thanks to Daniel Hirche for reporting this
issue!
Bug: chromium:825273
Change-Id: I5def4be045cd1925453c9873218e5449bf25e3f5
Reviewed-on: https://boringssl-review.googlesource.com/26785
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This isn't strictly necessary now that BIGNUMs are safe, but we get to
rely on type-system annotations from EC_SCALAR. Additionally,
EC_POINT_mul depends on BN_div, while the EC_SCALAR version does not.
Change-Id: I75e6967f3d35aef17278b94862f4e506baff5c23
Reviewed-on: https://boringssl-review.googlesource.com/26424
Reviewed-by: Steven Valdez <svaldez@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
EC_KEY_copy left unset fields alone, which meant it was possible to
create an EC_KEY with mismatched private key and group. Nothing was
using EC_KEY_copy anyway, and in keeping of us generally preferring
fresh objects over object reuse, remove it. EC_KEY_dup itself can also
be made simpler by using the very setters available.
Additionally, skip copying the method table. As of
https://boringssl-review.googlesource.com/16344, we no longer copy the
ex_data, so we probably shouldn't copy the method pointers either,
aligning with RSAPrivateKey_dup.
Update-Note: If I missed anything and someone uses EC_KEY_copy, it
should be easy to port them to EC_KEY_dup.
Change-Id: Ibbdcea73345d91fa143fbe70a15bb527972693e8
Reviewed-on: https://boringssl-review.googlesource.com/26404
Reviewed-by: Steven Valdez <svaldez@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Right now, |g_wNAF| and |p_wNAF| are of same size.
This change makes GCC's "-Werror=logical-op" happy and adds a compile-time
assertion in case the initial size of either array ever changes.
Change-Id: I29e39a7a121a0a9d016c53da6b7c25675ddecbdc
Reviewed-on: https://boringssl-review.googlesource.com/26104
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This reuses wnaf.c's window scheduling, but has access to the tuned
field arithemetic and pre-computed base point table. Unlike wnaf.c, we
do not make the points affine as it's not worth it for a single table.
(We already precomputed the base point table.)
Annoyingly, 32-bit x86 gets slower by a bit, but the other platforms are
faster. My guess is that that the generic code gets to use the
bn_mul_mont assembly and the compiler, faced with the increased 32-bit
register pressure and the extremely register-poor x86, is making
bad decisions on the otherwise P-256-tuned C code. The three platforms
that see much larger gains are significantly more important than 32-bit
x86 at this point, so go with this change.
armv7a (Nexus 5X) before/after [+14.4%]:
Did 2703 ECDSA P-256 verify operations in 5034539us (536.9 ops/sec)
Did 3127 ECDSA P-256 verify operations in 5091379us (614.2 ops/sec)
aarch64 (Nexus 5X) before/after [+9.2%]:
Did 6783 ECDSA P-256 verify operations in 5031324us (1348.2 ops/sec)
Did 7410 ECDSA P-256 verify operations in 5033291us (1472.2 ops/sec)
x86 before/after [-2.7%]:
Did 8961 ECDSA P-256 verify operations in 10075901us (889.3 ops/sec)
Did 8568 ECDSA P-256 verify operations in 10003001us (856.5 ops/sec)
x86_64 before/after [+8.6%]:
Did 29808 ECDSA P-256 verify operations in 10008662us (2978.2 ops/sec)
Did 32528 ECDSA P-256 verify operations in 10057137us (3234.3 ops/sec)
Change-Id: I5fa643149f5bfbbda9533e3008baadfee9979b93
Reviewed-on: https://boringssl-review.googlesource.com/25684
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This was done by OpenSSL with the kind permission of Intel. This change
is imported from upstream's commit
dcf6e50f48e6bab92dcd2dacb27fc17c0de34199.
Change-Id: Ie8d3b700cd527a6e8cf66e0728051b2acd8cc6b9
Reviewed-on: https://boringssl-review.googlesource.com/25588
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
This imports 384e6de4c7e35e37fb3d6fbeb32ddcb5eb0d3d3f and
79ca382d4762c58c4b92fceb4e202e90c71292ae from upstream.
Differences from upstream:
- We've removed a number of unused functions.
- We never imported 3ff08e1dde56747011a702a9a5aae06cfa8ae5fc, which was
to give the assembly control over the memory layout in the tables. So
our "gather" is "select" (which is implemented the same because the
memory layout never did change) and our "scatter" is in C.
Change-Id: I90d4a17da9f5f693f4dc4706887dec15f010071b
Reviewed-on: https://boringssl-review.googlesource.com/25586
Reviewed-by: Adam Langley <agl@google.com>
As of upstream's 6aa36e8e5a062e31543e7796f0351ff9628832ce, the
corresponding file in OpenSSL has both an Intel and OpenSSL copyright
blocks. To properly sync up with OpenSSL, use the OpenSSL copyright
block and our version of the Intel copyright block.
Change-Id: I4dc072a11390a54d0ce38ec0b8893e48f52638de
Reviewed-on: https://boringssl-review.googlesource.com/25585
Reviewed-by: Adam Langley <agl@google.com>
If a caller is in the process on constructing an arbitrary |EC_GROUP|,
and they try to create an |EC_POINT| to set as the generator which is
invalid, we would previously crash.
Change-Id: Ida91354257a02bd56ac29ba3104c9782b8d70f6b
Reviewed-on: https://boringssl-review.googlesource.com/25764
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This is a bit easier to read than BN_less_than_consttime when we must do
>= or <=, about as much work to compute, and lots of code calls BN_cmp
on secret data. This also, by extension, makes BN_cmp_word
constant-time.
BN_equal_consttime is probably a little more efficient and is perfectly
readable, so leave that one around.
Change-Id: Id2e07fe312f01cb6fd10a1306dcbf6397990cf13
Reviewed-on: https://boringssl-review.googlesource.com/25444
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
As the EC code will ultimately want to use these in "words" form by way
of EC_FELEM, and because it's much easier, I've implement these as
low-level words-based functions that require all inputs have the same
width. The BIGNUM versions which RSA and, for now, EC calls are
implemented on top of that.
Unfortunately, doing such things in constant-time and accounting for
undersized inputs requires some scratch space, and these functions don't
take BN_CTX. So I've added internal bn_mod_*_quick_ctx functions that
take a BN_CTX and the old functions now allocate a bit unnecessarily.
RSA only needs lshift (for BN_MONT_CTX) and sub (for CRT), but the
generic EC code wants add as well.
The generic EC code isn't even remotely constant-time, and I hope to
ultimately use stack-allocated EC_FELEMs, so I've made the actual
implementations here implemented in "words", which is much simpler
anyway due to not having to take care of widths.
I've also gone ahead and switched the EC code to these functions,
largely as a test of their performance (an earlier iteration made the EC
code noticeably slower). These operations are otherwise not
performance-critical in RSA.
The conversion from BIGNUM to BIGNUM+BN_CTX should be dropped by the
static linker already, and the unused BIGNUM+BN_CTX functions will fall
off when EC_FELEM happens.
Update-Note: BN_mod_*_quick bounce on malloc a bit now, but they're not
really used externally. The one caller I found was wpa_supplicant
which bounces on malloc already. They appear to be implementing
compressed coordinates by hand? We may be able to convince them to
call EC_POINT_set_compressed_coordinates_GFp.
Bug: 233, 236
Change-Id: I2bf361e9c089e0211b97d95523dbc06f1168e12b
Reviewed-on: https://boringssl-review.googlesource.com/25261
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
This has no behavior change, but it has a semantic one. This CL is an
assertion that all BIGNUM functions tolerate non-minimal BIGNUMs now.
Specifically:
- Functions that do not touch top/width are assumed to not care.
- Functions that do touch top/width will be changed by this CL. These
should be checked in review that they tolerate non-minimal BIGNUMs.
Subsequent CLs will start adjusting the widths that BIGNUM functions
output, to fix timing leaks.
Bug: 232
Change-Id: I3a2b41b071f2174452f8d3801bce5c78947bb8f7
Reviewed-on: https://boringssl-review.googlesource.com/25257
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
The order (and later the field) are used to size stack-allocated fixed
width word arrays. They're also entirely public, so this is fine.
Bug: 232
Change-Id: Ie98869cdbbdfea92dcad64a300f7e0b47bef6bf2
Reviewed-on: https://boringssl-review.googlesource.com/25256
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
(Just happened to see these as I went by.)
Change-Id: I348b163e6986bfca8b58e56885c35a813efe28f6
Reviewed-on: https://boringssl-review.googlesource.com/25725
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Processing off-curve points is sufficiently dangerous to worry about
code that doesn't check the return value of
|EC_POINT_set_affine_coordinates| and |EC_POINT_oct2point|. While we
have integrated on-curve checks into these functions, code that ignores
the return value will still be able to work with an invalid point
because it's already been installed in the output by the time the check
is done.
Instead, in the event of an off-curve point, set the output point to the
generator, which is certainly on the curve and hopefully safe.
Change-Id: Ibc73dceb2d8d21920e07c4f6def2c8249cb78ca0
Reviewed-on: https://boringssl-review.googlesource.com/25724
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
These empty states aren't any use to either caller or implementor.
Change-Id: If0b748afeeb79e4a1386182e61c5b5ecf838de62
Reviewed-on: https://boringssl-review.googlesource.com/25254
Reviewed-by: Adam Langley <agl@google.com>
This makes it easier going to and from non-minimal BIGNUMs and words
without worrying about the widths which are ultimately to become less
friendly.
Bug: 232
Change-Id: Ia57cb29164c560b600573c27b112ad9375a86aad
Reviewed-on: https://boringssl-review.googlesource.com/25245
Reviewed-by: Adam Langley <agl@google.com>
The EC_POINTs are still allocated (for now), but everything else fits on
the stack nicely, which saves a lot of fiddling with cleanup and
allocations.
Change-Id: Ib8480737ecc97e6b40b2c05f217cd8d3dc82cb72
Reviewed-on: https://boringssl-review.googlesource.com/25150
Reviewed-by: Adam Langley <agl@google.com>
This is to simplify clearing unnecessary mallocs out of ec_wNAF_mul, and
perhaps to use it in tuned variable-time multiplication functions.
Change-Id: Ic390d2e8e20d0ee50f3643830a582e94baebba95
Reviewed-on: https://boringssl-review.googlesource.com/25149
Reviewed-by: Adam Langley <agl@google.com>
This cuts out another total_num-length array and simplifies things.
Leading zeros at the front of the schedule don't do anything, so it's
easier to just produce a fixed-length one. (I'm also hoping to
ultimately reuse this function in //third_party/fiat/p256.c and get the
best of both worlds for ECDSA verification; tuned field arithmetic
operations, precomputed table, and variable-time multiply.)
Change-Id: I771f4ff7dcfdc3ee0eff8d9038d6dc9a0be3d4e0
Reviewed-on: https://boringssl-review.googlesource.com/25148
Reviewed-by: Adam Langley <agl@google.com>
Note this switches from walking BN_num_bits to the full bit length of
the scalar. But that can only cause it to add a few extra zeros to the
front of the schedule, which r_is_at_infinity will skip over.
Change-Id: I91e087c9c03505566b68f75fb37dfb53db467652
Reviewed-on: https://boringssl-review.googlesource.com/25147
Reviewed-by: Adam Langley <agl@google.com>
This appears to be pointless. Before, we would have a 50% chance of
doing an inversion at each non-zero bit but the first
(r_is_at_infinity), plus a 50% chance of doing an inversion at the end.
Now we would have a 50% chance of doing an inversion at each non-zero
bit. That's the same number of coin flips.
Change-Id: I8158fd48601cb041188826d4f68ac1a31a6fbbbc
Reviewed-on: https://boringssl-review.googlesource.com/25146
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
The optimization for wsize = 1 only kicks in for 19-bit primes. The
cases for b >= 800 and cannot happen due to EC_MAX_SCALAR_BYTES.
Change-Id: If5ca908563f027172cdf31c9a22342152fecd12f
Reviewed-on: https://boringssl-review.googlesource.com/25145
Reviewed-by: Adam Langley <agl@google.com>
Simplify things slightly. The probability of the scalar being small
enough to go down a window size is astronomically small. (2^-186 for
P-256 and 2^-84 for P-384.)
Change-Id: Ie879f0b06bcfd1e6e6e3bf3f54e0d7d6567525a4
Reviewed-on: https://boringssl-review.googlesource.com/25144
Reviewed-by: Adam Langley <agl@google.com>
The P-224 implementation was missing the optimization to avoid doing
extra work when asking for only one coordinate (ECDH and ECDSA both
involve an x-coordinate query). The P-256 implementation was missing the
optimization to do one less Montgomery reduction.
TODO - Benchmarks
Change-Id: I268d9c24737c6da9efaf1c73395b73dd97355de7
Reviewed-on: https://boringssl-review.googlesource.com/24690
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
EC_POINT_set_affine_coordinates_GFp already rejects coordinates which
are out of range. There's no need to double-check.
Change-Id: Id1685355c555dda66d2a14125cb0083342f37e53
Reviewed-on: https://boringssl-review.googlesource.com/24688
Reviewed-by: Adam Langley <agl@google.com>
p224-64.c can just write straight into the EC_POINT, as the other files
do, which saves the mess around BN_CTX. It's also more correct.
ec_point_set_Jprojective_coordinates_GFp abstracts out field_encode, but
then we would want to abstract out field_decode too when reading.
That then allows us to inline ec_point_set_Jprojective_coordinates_GFp
into ec_GFp_simple_point_set_affine_coordinates and get rid of an
unnecessary tower of helper functions. Also we can use the precomputed
value of one rather than recompute it each time.
Change-Id: I8282dc66a4a437f5a3b6a1a59cc39be4cb71ccf9
Reviewed-on: https://boringssl-review.googlesource.com/24687
Reviewed-by: Adam Langley <agl@google.com>
All the messing around with field_mul and field_sqr does the same thing
as calling EC_GROUP_get_curve_GFp. This is in preparation for ultimately
moving the field elements to an EC_FELEM type.
Where we draw the BIGNUM / EC_FELEM line determines what EC_FELEM
operations we need. Since we don't care much about the performance of
this function, leave it in BIGNUM so we don't need an EC_FELEM
BN_mod_sqrt just yet. We can push it down later if we feel so inclined.
Change-Id: Iec07240d40828df6b7a29fd1f430e3b390d5f506
Reviewed-on: https://boringssl-review.googlesource.com/24686
Reviewed-by: Adam Langley <agl@google.com>
This is to simplify
https://boringssl-review.googlesource.com/c/boringssl/+/24445/.
Setting or changing an EC_KEY's group after the public or private keys
have been configured is quite awkward w.r.t. consistency checks. It
becomes additionally messy if we mean to store private keys as
EC_SCALARs (and avoid the BIGNUM timing leak), whose size is
curve-dependent.
Instead, require that callers configure the group before setting either
half of the keypair. Additionally, reject EC_KEY_set_group calls that
change the group. This will simplify clearing one more BIGNUM timing
leak.
Update-Note: This will break code which sets the group and key in a
weird order. I checked calls of EC_KEY_new and confirmed they all
set the group first. If I missed any, let me know.
Change-Id: Ie89f90a318b31b6b98f71138e5ff3de5323bc9a6
Reviewed-on: https://boringssl-review.googlesource.com/24425
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
It actually works fine. I just forgot one of the typedefs last time.
This gives a roughly 2x improvement on P-256 in clang-cl +
OPENSSL_SMALL, the configuration used by Chrome.
Before:
Did 1302 ECDH P-256 operations in 1015000us (1282.8 ops/sec)
Did 4250 ECDSA P-256 signing operations in 1047000us (4059.2 ops/sec)
Did 1750 ECDSA P-256 verify operations in 1094000us (1599.6 ops/sec)
After:
Did 3250 ECDH P-256 operations in 1078000us (3014.8 ops/sec)
Did 8250 ECDSA P-256 signing operations in 1016000us (8120.1 ops/sec)
Did 3250 ECDSA P-256 verify operations in 1063000us (3057.4 ops/sec)
(These were taken on a VM, so the measurements are extremely noisy, but
this sort of improvement is visible regardless.)
Alas, we do need a little extra bit of fiddling because division does
not work (crbug.com/787617).
Bug: chromium:787617
Update-Note: This removes the MSan uint128_t workaround which does not
appear to be necessary anymore.
Change-Id: I8361314608521e5bdaf0e7eeae7a02c33f55c69f
Reviewed-on: https://boringssl-review.googlesource.com/23984
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
The fiat-crypto-generated code uses the Montgomery form implementation
strategy, for both 32-bit and 64-bit code.
64-bit throughput seems slower, but the difference is smaller than noise between repetitions (-2%?)
32-bit throughput has decreased significantly for ECDH (-40%). I am
attributing this to the change from varibale-time scalar multiplication
to constant-time scalar multiplication. Due to the same bottleneck,
ECDSA verification still uses the old code (otherwise there would have
been a 60% throughput decrease). On the other hand, ECDSA signing
throughput has increased slightly (+10%), perhaps due to the use of a
precomputed table of multiples of the base point.
64-bit benchmarks (Google Cloud Haswell):
with this change:
Did 9126 ECDH P-256 operations in 1009572us (9039.5 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039832us (22119.0 ops/sec)
Did 8820 ECDSA P-256 verify operations in 1024242us (8611.2 ops/sec)
master (40e8c921ca):
Did 9340 ECDH P-256 operations in 1017975us (9175.1 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039820us (22119.2 ops/sec)
Did 8688 ECDSA P-256 verify operations in 1021108us (8508.4 ops/sec)
benchmarks on ARMv7 (LG Nexus 4):
with this change:
Did 150 ECDH P-256 operations in 1029726us (145.7 ops/sec)
Did 506 ECDSA P-256 signing operations in 1065192us (475.0 ops/sec)
Did 363 ECDSA P-256 verify operations in 1033298us (351.3 ops/sec)
master (2fce1beda0):
Did 245 ECDH P-256 operations in 1017518us (240.8 ops/sec)
Did 473 ECDSA P-256 signing operations in 1086281us (435.4 ops/sec)
Did 360 ECDSA P-256 verify operations in 1003846us (358.6 ops/sec)
64-bit tables converted as follows:
import re, sys, math
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
R = 2**256
def convert(t):
x0, s1, x1, s2, x2, s3, x3 = t.groups()
v = int(x0, 0) + 2**64 * (int(x1, 0) + 2**64*(int(x2,0) + 2**64*(int(x3, 0)) ))
w = v*R%p
y0 = hex(w%(2**64))
y1 = hex((w>>64)%(2**64))
y2 = hex((w>>(2*64))%(2**64))
y3 = hex((w>>(3*64))%(2**64))
ww = int(y0, 0) + 2**64 * (int(y1, 0) + 2**64*(int(y2,0) + 2**64*(int(y3, 0)) ))
if ww != v*R%p:
print(x0,x1,x2,x3)
print(hex(v))
print(y0,y1,y2,y3)
print(hex(w))
print(hex(ww))
assert 0
return '{'+y0+s1+y1+s2+y2+s3+y3+'}'
fe_re = re.compile('{'+r'(\s*,\s*)'.join(r'(\d+|0x[abcdefABCDEF0123456789]+)' for i in range(4)) + '}')
print (re.sub(fe_re, convert, sys.stdin.read()).rstrip('\n'))
32-bit tables converted from 64-bit tables
Change-Id: I52d6e5504fcb6ca2e8b0ee13727f4500c80c1799
Reviewed-on: https://boringssl-review.googlesource.com/23244
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Along the way, this allows us to tidy up the invariants associated with
EC_SCALAR. They were fuzzy around ec_point_mul_scalar and some
computations starting from the digest in ECDSA. The latter I've put into
the type system with EC_LOOSE_SCALAR.
As for the former, Andres points out that particular EC implementations
are only good for scalars within a certain range, otherwise you may need
extra work to avoid the doubling case. To simplify curve
implementations, we reduce them fully rather than do the looser bit size
check, so they can have the stronger precondition to work with.
Change-Id: Iff9a0404f89adf8f7f914f8e8246c9f3136453f1
Reviewed-on: https://boringssl-review.googlesource.com/23664
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
We check that the private key is less than the order, but we forgot the
other end.
Update-Note: It's possible some caller was relying on this, but since
that function already checked the other half of the range, I'm
expecting this to be a no-op change.
Change-Id: I4a53357d7737735b3cfbe97d379c8ca4eca5d5ac
Reviewed-on: https://boringssl-review.googlesource.com/23665
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
Rejecting values where we'd previous called BN_nnmod may have been
overly ambitious. In the long run, all the supported ECC APIs (ECDSA*,
ECDH_compute_key, and probably some additional new ECDH API) will be
using the EC_SCALAR version anyway, so this doesn't really matter.
Change-Id: I79cd4015f2d6daf213e4413caa2a497608976f93
Reviewed-on: https://boringssl-review.googlesource.com/23584
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Hasse's theorem implies at most one subtraction is necessary. This is
still using BIGNUM for now because field elements
(EC_POINT_get_affine_coordinates_GFp) are BIGNUMs.
This gives an additional 2% speedup for signing.
Before:
Did 16000 ECDSA P-224 signing operations in 1064799us (15026.3 ops/sec)
Did 19000 ECDSA P-256 signing operations in 1007839us (18852.2 ops/sec)
Did 1078 ECDSA P-384 signing operations in 1079413us (998.7 ops/sec)
Did 484 ECDSA P-521 signing operations in 1083616us (446.7 ops/sec)
After:
Did 16000 ECDSA P-224 signing operations in 1054918us (15167.1 ops/sec)
Did 20000 ECDSA P-256 signing operations in 1037338us (19280.1 ops/sec)
Did 1045 ECDSA P-384 signing operations in 1049073us (996.1 ops/sec)
Did 484 ECDSA P-521 signing operations in 1085492us (445.9 ops/sec)
Change-Id: I2bfe214f968eca7a8e317928c0f3daf1a14bca90
Reviewed-on: https://boringssl-review.googlesource.com/23076
Reviewed-by: Adam Langley <agl@google.com>
None of the asymmetric crypto we inherented from OpenSSL is
constant-time because of BIGNUM. BIGNUM chops leading zeros off the
front of everything, so we end up leaking information about the first
word, in theory. BIGNUM functions additionally tend to take the full
range of inputs and then call into BN_nnmod at various points.
All our secret values should be acted on in constant-time, but k in
ECDSA is a particularly sensitive value. So, ecdsa_sign_setup, in an
attempt to mitigate the BIGNUM leaks, would add a couple copies of the
order.
This does not work at all. k is used to compute two values: k^-1 and kG.
The first operation when computing k^-1 is to call BN_nnmod if k is out
of range. The entry point to our tuned constant-time curve
implementations is to call BN_nnmod if the scalar has too many bits,
which this causes. The result is both corrections are immediately undone
but cause us to do more variable-time work in the meantime.
Replace all these computations around k with the word-based functions
added in the various preceding CLs. In doing so, replace the BN_mod_mul
calls (which internally call BN_nnmod) with Montgomery reduction. We can
avoid taking k^-1 out of Montgomery form, which combines nicely with
Brian Smith's trick in 3426d10119. Along
the way, we avoid some unnecessary mallocs.
BIGNUM still affects the private key itself, as well as the EC_POINTs.
But this should hopefully be much better now. Also it's 10% faster:
Before:
Did 15000 ECDSA P-224 signing operations in 1069117us (14030.3 ops/sec)
Did 18000 ECDSA P-256 signing operations in 1053908us (17079.3 ops/sec)
Did 1078 ECDSA P-384 signing operations in 1087853us (990.9 ops/sec)
Did 473 ECDSA P-521 signing operations in 1069835us (442.1 ops/sec)
After:
Did 16000 ECDSA P-224 signing operations in 1064799us (15026.3 ops/sec)
Did 19000 ECDSA P-256 signing operations in 1007839us (18852.2 ops/sec)
Did 1078 ECDSA P-384 signing operations in 1079413us (998.7 ops/sec)
Did 484 ECDSA P-521 signing operations in 1083616us (446.7 ops/sec)
Change-Id: I2a25e90fc99dac13c0616d0ea45e125a4bd8cca1
Reviewed-on: https://boringssl-review.googlesource.com/23075
Reviewed-by: Adam Langley <agl@google.com>
Change-Id: Id12ab478b6ba441fb1b6f4c2f9479384fc3fbdb6
Reviewed-on: https://boringssl-review.googlesource.com/23144
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
|EC_POINT_mul| is almost exclusively used with reduced scalars, with
this exception. This comes from consumers following NIST SP 800-56A
section 5.6.2.3.2. (Though all our curves have cofactor one, so this
check isn't useful.)
Add a test for this so we don't accidentally break it.
Change-Id: I42492db38a1ea03acec4febdd7945c8a3933530a
Reviewed-on: https://boringssl-review.googlesource.com/23084
Reviewed-by: Adam Langley <agl@google.com>
I still need to revive the original CL, but right now I'm interested in
giving every EC_GROUP an order_mont and having different ownership of
that field between built-in and custom groups is kind of a nuisance. If
I'm going to do that anyway, better to avoid computing the entire
EC_GROUP in one go.
I'm using some manual locking rather than CRYPTO_once here so that it
behaves well in the face of malloc errors. Not that we especially care,
but it was easy to do.
This speeds up our ECDH benchmark a bit which otherwise must construct the
EC_GROUP each time (matching real world usage).
Before:
Did 7619 ECDH P-224 operations in 1003190us (7594.8 ops/sec)
Did 7518 ECDH P-256 operations in 1060844us (7086.8 ops/sec)
Did 572 ECDH P-384 operations in 1055878us (541.7 ops/sec)
Did 264 ECDH P-521 operations in 1062375us (248.5 ops/sec)
After:
Did 8415 ECDH P-224 operations in 1066695us (7888.9 ops/sec)
Did 7952 ECDH P-256 operations in 1022819us (7774.6 ops/sec)
Did 572 ECDH P-384 operations in 1055817us (541.8 ops/sec)
Did 264 ECDH P-521 operations in 1060008us (249.1 ops/sec)
Bug: 20
Change-Id: I7446cd0a69a840551dcc2dfabadde8ee1e3ff3e2
Reviewed-on: https://boringssl-review.googlesource.com/23073
Reviewed-by: Adam Langley <agl@google.com>
Later code will take advantage of these invariants. Enforcing them on
custom curves avoids making them go through a custom codepath.
Change-Id: I23cee72a90c2e4846b41e03e6be26bc3abeb4a45
Reviewed-on: https://boringssl-review.googlesource.com/23072
Reviewed-by: Adam Langley <agl@google.com>
Currently we only check that the underlying EC_METHODs match, which
avoids the points being in different forms, but not that the points are
on the same curves. (We fixed the APIs early on so off-curve EC_POINTs
cannot be created.)
In particular, this comes up with folks implementating Java's crypto
APIs with ECDH_compute_key. These APIs are both unfortunate and should
not be mimicked, as they allow folks to mismatch the groups on the two
multiple EC_POINTs. Instead, ECDH APIs should take the public value as a
byte string.
Thanks also to Java's poor crypto APIs, we must support custom curves,
which makes this particularly gnarly. This CL makes EC_GROUP_cmp work
with custom curves and adds an additional subtle requirement to
EC_GROUP_set_generator.
Annoyingly, this change is additionally subtle because we now have a
reference cycle to hack around.
Change-Id: I2efbc4bd5cb65fee5f66527bd6ccad6b9d5120b9
Reviewed-on: https://boringssl-review.googlesource.com/22245
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
I really need to resurrect the CL to make them entirely static
(https://crbug.com/boringssl/20), but, in the meantime, to make
replacing the EC_METHOD pointer in EC_POINT with EC_GROUP not
*completely* insane, make them refcounted.
OpenSSL did not do this because their EC_GROUPs are mutable
(EC_GROUP_set_asn1_flag and EC_GROUP_set_point_conversion_form). Ours
are immutable but for the two-function dance around custom curves (more
of OpenSSL's habit of making their objects too complex), which is good
enough to refcount.
Change-Id: I3650993737a97da0ddcf0e5fb7a15876e724cadc
Reviewed-on: https://boringssl-review.googlesource.com/22244
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This reverts commit f6942f0d22.
Reason for revert: This doesn't actually work in clang-cl. I
forgot we didn't have the clang-cl try bots enabled! :-( I
believe __asm__ is still okay, but I'll try it by hand
tomorrow.
Original change's description:
> Use uint128_t and __asm__ in clang-cl.
>
> clang-cl does not define __GNUC__ but is still a functioning clang. We
> should be able to use our uint128_t and __asm__ code in it on Windows.
>
> Change-Id: I67310ee68baa0c0c947b2441c265b019ef12af7e
> Reviewed-on: https://boringssl-review.googlesource.com/22184
> Commit-Queue: Adam Langley <agl@google.com>
> Reviewed-by: Adam Langley <agl@google.com>
> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
TBR=agl@google.com,davidben@google.com
Change-Id: I5c7e0391cd9c2e8cc0dfde37e174edaf5d17db22
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://boringssl-review.googlesource.com/22224
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
clang-cl does not define __GNUC__ but is still a functioning clang. We
should be able to use our uint128_t and __asm__ code in it on Windows.
Change-Id: I67310ee68baa0c0c947b2441c265b019ef12af7e
Reviewed-on: https://boringssl-review.googlesource.com/22184
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
crypto/{asn1,x509,x509v3,pem} were skipped as they are still OpenSSL
style.
Change-Id: I3cd9a60e1cb483a981aca325041f3fbce294247c
Reviewed-on: https://boringssl-review.googlesource.com/19504
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
It's confusing to have both mont and mont_data on EC_GROUP. The
documentation was also wrong.
Change-Id: I4e2e3169ed79307018212fba51d015bbbe5c4227
Reviewed-on: https://boringssl-review.googlesource.com/10348
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Using ADX instructions requires relatively new assemblers. Conscrypt are
currently using Yasm 1.2.0. Revert these for the time being to unbreak
their build.
Change-Id: Iaba5761ccedcafaffb5ca79a8eaf7fa565583c32
Reviewed-on: https://boringssl-review.googlesource.com/19244
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
We can test these with Intel SDE now. The AVX2 code just affects the two
select functions while the ADX code is a separate implementation.
Haswell numbers:
Before:
Did 84630 ECDH P-256 operations in 10031494us (8436.4 ops/sec)
Did 206000 ECDSA P-256 signing operations in 10015055us (20569.0 ops/sec)
Did 77256 ECDSA P-256 verify operations in 10064556us (7676.0 ops/sec)
After:
Did 86112 ECDH P-256 operations in 10015008us (8598.3 ops/sec)
Did 211000 ECDSA P-256 signing operations in 10025104us (21047.2 ops/sec)
Did 79344 ECDSA P-256 verify operations in 10017076us (7920.9 ops/sec)
Skylake numbers:
Before:
Did 75684 ECDH P-256 operations in 10016019us (7556.3 ops/sec)
Did 185000 ECDSA P-256 signing operations in 10012090us (18477.7 ops/sec)
Did 72885 ECDSA P-256 verify operations in 10027154us (7268.8 ops/sec)
After:
Did 89598 ECDH P-256 operations in 10032162us (8931.1 ops/sec)
Did 203000 ECDSA P-256 signing operations in 10019739us (20260.0 ops/sec)
Did 87040 ECDSA P-256 verify operations in 10000441us (8703.6 ops/sec)
The code was slightly patched for delocate.go compatibility.
Change-Id: Ic44ced4eca65c656bbe07d5a7fee91ec6925eb59
Reviewed-on: https://boringssl-review.googlesource.com/18967
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
The AVX2 code has alignment requirements.
Change-Id: Ieb0774f7595a76eef0f3a15aabd63d056bbaa463
Reviewed-on: https://boringssl-review.googlesource.com/18966
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This isn't all of our pointer games by far, but for any code which
doesn't run on armv6, memcpy and pointer cast compile to the same code.
For code with does care about armv6 (do we care?), it'll need a bit more
work. armv6 makes memcpy into a function call.
Ironically, the one platform where C needs its alignment rules is the
one platform that makes it hard to honor C's alignment rules.
Change-Id: Ib9775aa4d9df9381995df8698bd11eb260aac58c
Reviewed-on: https://boringssl-review.googlesource.com/17707
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This is the last of the non-GTest tests. We never did end up writing
example files or doc.go tooling for them. And probably examples should
be in C++ at this point.
Bug: 129
Change-Id: Icbc43c9639cfed7423df20df1cdcb8c35f23fc1a
Reviewed-on: https://boringssl-review.googlesource.com/17669
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Change-Id: I8512c6bfb62f1a83afc8f763d681bf5db3b4ceae
Reviewed-on: https://boringssl-review.googlesource.com/17144
Commit-Queue: Adam Langley <alangley@gmail.com>
Reviewed-by: David Benjamin <davidben@google.com>
Drop some redundant instructions in reduction in ecp_nistz256_sqr_montx.
(Imported from upstream's 8fc063dcc9668589fd95533d25932396d60987f9.)
I believe this is a no-op for us as we do not currently enable the
ADX-based optimizations.
Change-Id: I34a5f5ffb965d59c67f6b9f0ca7937e49ba6e820
Reviewed-on: https://boringssl-review.googlesource.com/16884
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
The only place it is used is EC_KEY_{dup,copy} and no one calls that
function on an EC_KEY with ex_data. This aligns with functions like
RSAPublicKey_dup which do not copy ex_data. The logic is also somewhat
subtle in the face of malloc errors (upstream's PR 3323).
In fact, we'd even changed the function pointer signature from upstream,
so BoringSSL-only code is needed to pass this pointer in anyway. (I
haven't switched it to CRYPTO_EX_unused because there are some callers
which pass in an implementation anyway.)
Note, in upstream, the dup hook is also used for SSL_SESSIONs when those
are duplicated (for TLS 1.2 ticket renewal or TLS 1.3 resumption). Our
interpretation is that callers should treat those SSL_SESSIONs
equivalently to newly-established ones. This avoids every consumer
providing a dup hook and simplifies the interface.
(I've gone ahead and removed the TODO(fork). I don't think we'll be able
to change this API. Maybe introduce a new one, but it may not be worth
it? Then again, this API is atrocious... I've never seen anyone use argl
and argp even.)
BUG=21
Change-Id: I6c9e9d5a02347cb229d4c084c1e85125bd741d2b
Reviewed-on: https://boringssl-review.googlesource.com/16344
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This avoids depending the FIPS module on crypto/bytestring and moves
ECDSA_SIG_{new,free} into the module.
Change-Id: I7b45ef07f1140873a0da300501141b6ae272a5d9
Reviewed-on: https://boringssl-review.googlesource.com/15984
Reviewed-by: Adam Langley <agl@google.com>
The names in the P-224 code collided with the P-256 code and thus many
of the functions and constants in the P-224 code have been prefixed.
Change-Id: I6bcd304640c539d0483d129d5eaf1702894929a8
Reviewed-on: https://boringssl-review.googlesource.com/15847
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>