We have little-endian BIGNUM functions now.
Change-Id: Iffc46a14e75c6bba2e170b824b1a08c69d2e9d18
Reviewed-on: https://boringssl-review.googlesource.com/27594
Reviewed-by: Adam Langley <alangley@gmail.com>
This introduces a hook for the OpenSSL assembly.
Change-Id: I35e0588f0ed5bed375b12f738d16c9f46ceedeea
Reviewed-on: https://boringssl-review.googlesource.com/27592
Reviewed-by: Adam Langley <alangley@gmail.com>
This is in preparation for representing field elements with
stack-allocated types in the generic code. While there is likely little
benefit in threading all the turned field arithmetic through all the
generic code, and the P-224 logic, in particular, does not have a tight
enough abstraction for this, the current implementations depend on
BN_div, which is not compatible with stack-allocating things and avoiding
malloc.
This also speeds things up slightly, now that benchmarks cover point
validation.
Before:
Did 82786 ECDH P-224 operations in 10024326us (8258.5 ops/sec)
After:
Did 89991 ECDH P-224 operations in 10012429us (8987.9 ops/sec)
Change-Id: I468483b49f5dc69187aebd62834365ce5caab795
Reviewed-on: https://boringssl-review.googlesource.com/26971
Reviewed-by: Adam Langley <agl@google.com>
p224_felem_neg does not produce an output within the tight bounds
suitable for p224_felem_contract. This was found by inspection of the
code.
This only affects the final y-coordinate output of arbitrary-point
multiplication, so it is a no-op for ECDH and ECDSA.
Change-Id: I1d929458d1f21d02cd8e745d2f0f7040a6bb0627
Reviewed-on: https://boringssl-review.googlesource.com/26847
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
The P-224 implementation was missing the optimization to avoid doing
extra work when asking for only one coordinate (ECDH and ECDSA both
involve an x-coordinate query). The P-256 implementation was missing the
optimization to do one less Montgomery reduction.
TODO - Benchmarks
Change-Id: I268d9c24737c6da9efaf1c73395b73dd97355de7
Reviewed-on: https://boringssl-review.googlesource.com/24690
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
p224-64.c can just write straight into the EC_POINT, as the other files
do, which saves the mess around BN_CTX. It's also more correct.
ec_point_set_Jprojective_coordinates_GFp abstracts out field_encode, but
then we would want to abstract out field_decode too when reading.
That then allows us to inline ec_point_set_Jprojective_coordinates_GFp
into ec_GFp_simple_point_set_affine_coordinates and get rid of an
unnecessary tower of helper functions. Also we can use the precomputed
value of one rather than recompute it each time.
Change-Id: I8282dc66a4a437f5a3b6a1a59cc39be4cb71ccf9
Reviewed-on: https://boringssl-review.googlesource.com/24687
Reviewed-by: Adam Langley <agl@google.com>
It actually works fine. I just forgot one of the typedefs last time.
This gives a roughly 2x improvement on P-256 in clang-cl +
OPENSSL_SMALL, the configuration used by Chrome.
Before:
Did 1302 ECDH P-256 operations in 1015000us (1282.8 ops/sec)
Did 4250 ECDSA P-256 signing operations in 1047000us (4059.2 ops/sec)
Did 1750 ECDSA P-256 verify operations in 1094000us (1599.6 ops/sec)
After:
Did 3250 ECDH P-256 operations in 1078000us (3014.8 ops/sec)
Did 8250 ECDSA P-256 signing operations in 1016000us (8120.1 ops/sec)
Did 3250 ECDSA P-256 verify operations in 1063000us (3057.4 ops/sec)
(These were taken on a VM, so the measurements are extremely noisy, but
this sort of improvement is visible regardless.)
Alas, we do need a little extra bit of fiddling because division does
not work (crbug.com/787617).
Bug: chromium:787617
Update-Note: This removes the MSan uint128_t workaround which does not
appear to be necessary anymore.
Change-Id: I8361314608521e5bdaf0e7eeae7a02c33f55c69f
Reviewed-on: https://boringssl-review.googlesource.com/23984
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
The fiat-crypto-generated code uses the Montgomery form implementation
strategy, for both 32-bit and 64-bit code.
64-bit throughput seems slower, but the difference is smaller than noise between repetitions (-2%?)
32-bit throughput has decreased significantly for ECDH (-40%). I am
attributing this to the change from varibale-time scalar multiplication
to constant-time scalar multiplication. Due to the same bottleneck,
ECDSA verification still uses the old code (otherwise there would have
been a 60% throughput decrease). On the other hand, ECDSA signing
throughput has increased slightly (+10%), perhaps due to the use of a
precomputed table of multiples of the base point.
64-bit benchmarks (Google Cloud Haswell):
with this change:
Did 9126 ECDH P-256 operations in 1009572us (9039.5 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039832us (22119.0 ops/sec)
Did 8820 ECDSA P-256 verify operations in 1024242us (8611.2 ops/sec)
master (40e8c921ca):
Did 9340 ECDH P-256 operations in 1017975us (9175.1 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039820us (22119.2 ops/sec)
Did 8688 ECDSA P-256 verify operations in 1021108us (8508.4 ops/sec)
benchmarks on ARMv7 (LG Nexus 4):
with this change:
Did 150 ECDH P-256 operations in 1029726us (145.7 ops/sec)
Did 506 ECDSA P-256 signing operations in 1065192us (475.0 ops/sec)
Did 363 ECDSA P-256 verify operations in 1033298us (351.3 ops/sec)
master (2fce1beda0):
Did 245 ECDH P-256 operations in 1017518us (240.8 ops/sec)
Did 473 ECDSA P-256 signing operations in 1086281us (435.4 ops/sec)
Did 360 ECDSA P-256 verify operations in 1003846us (358.6 ops/sec)
64-bit tables converted as follows:
import re, sys, math
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
R = 2**256
def convert(t):
x0, s1, x1, s2, x2, s3, x3 = t.groups()
v = int(x0, 0) + 2**64 * (int(x1, 0) + 2**64*(int(x2,0) + 2**64*(int(x3, 0)) ))
w = v*R%p
y0 = hex(w%(2**64))
y1 = hex((w>>64)%(2**64))
y2 = hex((w>>(2*64))%(2**64))
y3 = hex((w>>(3*64))%(2**64))
ww = int(y0, 0) + 2**64 * (int(y1, 0) + 2**64*(int(y2,0) + 2**64*(int(y3, 0)) ))
if ww != v*R%p:
print(x0,x1,x2,x3)
print(hex(v))
print(y0,y1,y2,y3)
print(hex(w))
print(hex(ww))
assert 0
return '{'+y0+s1+y1+s2+y2+s3+y3+'}'
fe_re = re.compile('{'+r'(\s*,\s*)'.join(r'(\d+|0x[abcdefABCDEF0123456789]+)' for i in range(4)) + '}')
print (re.sub(fe_re, convert, sys.stdin.read()).rstrip('\n'))
32-bit tables converted from 64-bit tables
Change-Id: I52d6e5504fcb6ca2e8b0ee13727f4500c80c1799
Reviewed-on: https://boringssl-review.googlesource.com/23244
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
None of the asymmetric crypto we inherented from OpenSSL is
constant-time because of BIGNUM. BIGNUM chops leading zeros off the
front of everything, so we end up leaking information about the first
word, in theory. BIGNUM functions additionally tend to take the full
range of inputs and then call into BN_nnmod at various points.
All our secret values should be acted on in constant-time, but k in
ECDSA is a particularly sensitive value. So, ecdsa_sign_setup, in an
attempt to mitigate the BIGNUM leaks, would add a couple copies of the
order.
This does not work at all. k is used to compute two values: k^-1 and kG.
The first operation when computing k^-1 is to call BN_nnmod if k is out
of range. The entry point to our tuned constant-time curve
implementations is to call BN_nnmod if the scalar has too many bits,
which this causes. The result is both corrections are immediately undone
but cause us to do more variable-time work in the meantime.
Replace all these computations around k with the word-based functions
added in the various preceding CLs. In doing so, replace the BN_mod_mul
calls (which internally call BN_nnmod) with Montgomery reduction. We can
avoid taking k^-1 out of Montgomery form, which combines nicely with
Brian Smith's trick in 3426d10119. Along
the way, we avoid some unnecessary mallocs.
BIGNUM still affects the private key itself, as well as the EC_POINTs.
But this should hopefully be much better now. Also it's 10% faster:
Before:
Did 15000 ECDSA P-224 signing operations in 1069117us (14030.3 ops/sec)
Did 18000 ECDSA P-256 signing operations in 1053908us (17079.3 ops/sec)
Did 1078 ECDSA P-384 signing operations in 1087853us (990.9 ops/sec)
Did 473 ECDSA P-521 signing operations in 1069835us (442.1 ops/sec)
After:
Did 16000 ECDSA P-224 signing operations in 1064799us (15026.3 ops/sec)
Did 19000 ECDSA P-256 signing operations in 1007839us (18852.2 ops/sec)
Did 1078 ECDSA P-384 signing operations in 1079413us (998.7 ops/sec)
Did 484 ECDSA P-521 signing operations in 1083616us (446.7 ops/sec)
Change-Id: I2a25e90fc99dac13c0616d0ea45e125a4bd8cca1
Reviewed-on: https://boringssl-review.googlesource.com/23075
Reviewed-by: Adam Langley <agl@google.com>
I really need to resurrect the CL to make them entirely static
(https://crbug.com/boringssl/20), but, in the meantime, to make
replacing the EC_METHOD pointer in EC_POINT with EC_GROUP not
*completely* insane, make them refcounted.
OpenSSL did not do this because their EC_GROUPs are mutable
(EC_GROUP_set_asn1_flag and EC_GROUP_set_point_conversion_form). Ours
are immutable but for the two-function dance around custom curves (more
of OpenSSL's habit of making their objects too complex), which is good
enough to refcount.
Change-Id: I3650993737a97da0ddcf0e5fb7a15876e724cadc
Reviewed-on: https://boringssl-review.googlesource.com/22244
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
crypto/{asn1,x509,x509v3,pem} were skipped as they are still OpenSSL
style.
Change-Id: I3cd9a60e1cb483a981aca325041f3fbce294247c
Reviewed-on: https://boringssl-review.googlesource.com/19504
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This isn't all of our pointer games by far, but for any code which
doesn't run on armv6, memcpy and pointer cast compile to the same code.
For code with does care about armv6 (do we care?), it'll need a bit more
work. armv6 makes memcpy into a function call.
Ironically, the one platform where C needs its alignment rules is the
one platform that makes it hard to honor C's alignment rules.
Change-Id: Ib9775aa4d9df9381995df8698bd11eb260aac58c
Reviewed-on: https://boringssl-review.googlesource.com/17707
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
The names in the P-224 code collided with the P-256 code and thus many
of the functions and constants in the P-224 code have been prefixed.
Change-Id: I6bcd304640c539d0483d129d5eaf1702894929a8
Reviewed-on: https://boringssl-review.googlesource.com/15847
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>