The P-224 implementation was missing the optimization to avoid doing
extra work when asking for only one coordinate (ECDH and ECDSA both
involve an x-coordinate query). The P-256 implementation was missing the
optimization to do one less Montgomery reduction.
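As an illustration of the single-coordinate shortcut described above, here is a minimal Python sketch, assuming the point is held in Jacobian coordinates (X, Y, Z) with x = X/Z^2 and y = Y/Z^3. The helper name to_affine and its want_y flag are hypothetical and are not taken from the P-224 code; the real implementation works on limbs and in constant time.

```python
# P-224 field prime.
P224 = 2**224 - 2**96 + 1

def to_affine(X, Y, Z, p=P224, want_y=True):
    """Convert Jacobian (X, Y, Z) to affine coordinates.

    When want_y is False, the two multiplications needed only for y
    are skipped, which is the saving an x-coordinate query allows.
    """
    z_inv = pow(Z, -1, p)          # modular inverse, Python 3.8+
    z_inv2 = z_inv * z_inv % p     # 1/Z^2
    x = X * z_inv2 % p
    if not want_y:
        return x, None
    y = Y * z_inv2 % p * z_inv % p  # Y/Z^3
    return x, y
```

ECDH and ECDSA callers would take the want_y=False path in this sketch, since both only consume x.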
TODO - Benchmarks
Change-Id: I268d9c24737c6da9efaf1c73395b73dd97355de7
Reviewed-on: https://boringssl-review.googlesource.com/24690
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
These are remnants of the old code which had a bunch of ftmp variables.
Change-Id: Id14cf414cb67ff08e240970767f7a5a58e883ce4
Reviewed-on: https://boringssl-review.googlesource.com/24689
Reviewed-by: Adam Langley <agl@google.com>
It requires a handful of additional intrinsics for now.
Fiat's freeze function only works on the tight bounds, so fe_isnonzero
gains an extra fe_carry. All other callers of fe_tobytes already pass
tightly bounded values anyway.
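To show why a carry is needed before freezing, here is a rough Python sketch of a carry pass over ten limbs with the mixed-radix weights 2^ceil(25.5*i). It is not Fiat's actual carry chain and does not reproduce the exact fe or fe_loose bounds from the header. The names weight, value and carry are illustrative only, and the inputs are assumed to be non-negative.

```python
import math

P25519 = 2**255 - 19

def weight(i):
    # Limb i is scaled by 2^ceil(25.5*i), so widths alternate 26, 25, 26, ...
    return 2**int(math.ceil(25.5 * i))

def value(limbs):
    return sum(weight(i) * x for i, x in enumerate(limbs))

def carry(limbs):
    """Propagate overflow upward; the top carry wraps around times 19."""
    out = list(limbs)
    for _ in range(2):  # second pass absorbs the wrapped-around carry
        for i in range(10):
            width = weight(i + 1) // weight(i)
            c = out[i] // width
            out[i] -= c * width
            if i < 9:
                out[i + 1] += c
            else:
                out[0] += 19 * c  # 2^255 is congruent to 19 mod p
    return out
```

The field element is unchanged modulo 2^255 - 19, but every limb ends up back inside its nominal width, which is the kind of precondition a freeze step relies on.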
Change-Id: I834858cee7863c7344e456d7a7dbf4f414f04ae5
Reviewed-on: https://boringssl-review.googlesource.com/24545
Reviewed-by: Adam Langley <agl@google.com>
These date to the old code and have been replaced by the fe and fe_loose
bounds in the header file. Also fix up a comment that the comment
converter didn't manage to convert.
Change-Id: I2e3ea867a8cea2b347d09c304a17e532b2e36545
Reviewed-on: https://boringssl-review.googlesource.com/24525
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Change-Id: Ie4060121f6bc8da07d87db8ec8133ea17e99e1fe
Reviewed-on: https://boringssl-review.googlesource.com/24344
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
It actually works fine. I just forgot one of the typedefs last time.
This gives a roughly 2x improvement on P-256 in clang-cl +
OPENSSL_SMALL, the configuration used by Chrome.
Before:
Did 1302 ECDH P-256 operations in 1015000us (1282.8 ops/sec)
Did 4250 ECDSA P-256 signing operations in 1047000us (4059.2 ops/sec)
Did 1750 ECDSA P-256 verify operations in 1094000us (1599.6 ops/sec)
After:
Did 3250 ECDH P-256 operations in 1078000us (3014.8 ops/sec)
Did 8250 ECDSA P-256 signing operations in 1016000us (8120.1 ops/sec)
Did 3250 ECDSA P-256 verify operations in 1063000us (3057.4 ops/sec)
(These were taken on a VM, so the measurements are extremely noisy, but
this sort of improvement is visible regardless.)
Alas, we do need a little extra bit of fiddling because division does
not work (crbug.com/787617).
Bug: chromium:787617
Update-Note: This removes the MSan uint128_t workaround which does not
appear to be necessary anymore.
Change-Id: I8361314608521e5bdaf0e7eeae7a02c33f55c69f
Reviewed-on: https://boringssl-review.googlesource.com/23984
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
The fiat-crypto-generated code uses the Montgomery form implementation
strategy, for both 32-bit and 64-bit code.
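For readers unfamiliar with Montgomery form, a minimal Python sketch of the idea follows, assuming R = 2^256 for the P-256 prime. The helper names are hypothetical. The generated code performs word-by-word Montgomery reduction on limbs rather than computing a modular inverse as this sketch does.

```python
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
R = 2**256
R_INV = pow(R, -1, p)  # Python 3.8+

def to_mont(x):
    # Montgomery form of x is x*R mod p.
    return x * R % p

def from_mont(xm):
    return xm * R_INV % p

def mont_mul(am, bm):
    # (a*R)*(b*R)/R = (a*b)*R, so the product stays in Montgomery form.
    return am * bm * R_INV % p
```

Multiplying two values in Montgomery form and converting back gives the ordinary product modulo p.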
64-bit throughput seems slightly slower (roughly -2%), but the difference
is smaller than the noise between repetitions.
32-bit throughput has decreased significantly for ECDH (-40%). I am
attributing this to the change from variable-time scalar multiplication
to constant-time scalar multiplication. Due to the same bottleneck,
ECDSA verification still uses the old code (otherwise there would have
been a 60% throughput decrease). On the other hand, ECDSA signing
throughput has increased slightly (+10%), perhaps due to the use of a
precomputed table of multiples of the base point.
64-bit benchmarks (Google Cloud Haswell):
with this change:
Did 9126 ECDH P-256 operations in 1009572us (9039.5 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039832us (22119.0 ops/sec)
Did 8820 ECDSA P-256 verify operations in 1024242us (8611.2 ops/sec)
master (40e8c921ca):
Did 9340 ECDH P-256 operations in 1017975us (9175.1 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039820us (22119.2 ops/sec)
Did 8688 ECDSA P-256 verify operations in 1021108us (8508.4 ops/sec)
benchmarks on ARMv7 (LG Nexus 4):
with this change:
Did 150 ECDH P-256 operations in 1029726us (145.7 ops/sec)
Did 506 ECDSA P-256 signing operations in 1065192us (475.0 ops/sec)
Did 363 ECDSA P-256 verify operations in 1033298us (351.3 ops/sec)
master (2fce1beda0):
Did 245 ECDH P-256 operations in 1017518us (240.8 ops/sec)
Did 473 ECDSA P-256 signing operations in 1086281us (435.4 ops/sec)
Did 360 ECDSA P-256 verify operations in 1003846us (358.6 ops/sec)
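The percentages quoted above can be rechecked from the raw output. The following Python sketch parses the bssl speed lines and computes the relative change. The function names are made up for this example, and the two log strings are copied from the ARMv7 runs above.

```python
import re

LINE_RE = re.compile(r"Did \d+ (.+?) operations in \d+us \(([\d.]+) ops/sec\)")

def parse_speed(log):
    """Map each benchmark name to its reported ops/sec."""
    return {m.group(1): float(m.group(2)) for m in LINE_RE.finditer(log)}

def percent_change(before, after):
    """Relative throughput change in percent, per benchmark."""
    return {k: 100.0 * (after[k] - before[k]) / before[k]
            for k in before if k in after}

MASTER = """\
Did 245 ECDH P-256 operations in 1017518us (240.8 ops/sec)
Did 473 ECDSA P-256 signing operations in 1086281us (435.4 ops/sec)
Did 360 ECDSA P-256 verify operations in 1003846us (358.6 ops/sec)
"""

CHANGE = """\
Did 150 ECDH P-256 operations in 1029726us (145.7 ops/sec)
Did 506 ECDSA P-256 signing operations in 1065192us (475.0 ops/sec)
Did 363 ECDSA P-256 verify operations in 1033298us (351.3 ops/sec)
"""

if __name__ == "__main__":
    deltas = percent_change(parse_speed(MASTER), parse_speed(CHANGE))
    for name, delta in deltas.items():
        print(f"{name}: {delta:+.1f}%")
```

On these single runs the ECDH drop comes out near -39% and the signing gain near +9%, in line with the rounded figures in the description.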
64-bit tables converted as follows:
import re, sys, math
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
R = 2**256
def convert(t):
    # Groups alternate between limb literals and the separators between them.
    x0, s1, x1, s2, x2, s3, x3 = t.groups()
    # Reassemble the four little-endian 64-bit limbs into one integer.
    v = int(x0, 0) + 2**64 * (int(x1, 0) + 2**64*(int(x2,0) + 2**64*(int(x3, 0)) ))
    # Convert to Montgomery form.
    w = v*R%p
    y0 = hex(w%(2**64))
    y1 = hex((w>>64)%(2**64))
    y2 = hex((w>>(2*64))%(2**64))
    y3 = hex((w>>(3*64))%(2**64))
    # Sanity check: the re-split limbs must reassemble to the same value.
    ww = int(y0, 0) + 2**64 * (int(y1, 0) + 2**64*(int(y2,0) + 2**64*(int(y3, 0)) ))
    if ww != v*R%p:
        print(x0,x1,x2,x3)
        print(hex(v))
        print(y0,y1,y2,y3)
        print(hex(w))
        print(hex(ww))
        assert 0
    return '{'+y0+s1+y1+s2+y2+s3+y3+'}'
# Matches a brace-enclosed list of four decimal or hex literals.
fe_re = re.compile('{'+r'(\s*,\s*)'.join(r'(\d+|0x[abcdefABCDEF0123456789]+)' for i in range(4)) + '}')
print (re.sub(fe_re, convert, sys.stdin.read()).rstrip('\n'))
32-bit tables converted from 64-bit tables
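The 32-bit conversion script is not included here. One plausible way to do it, assuming little-endian limb order as in the 64-bit script, is to split each 64-bit limb into a low and a high 32-bit half. The helper names below are hypothetical.

```python
def split_limbs_64_to_32(limbs64):
    """Split little-endian 64-bit limbs into little-endian 32-bit limbs."""
    out = []
    for limb in limbs64:
        assert 0 <= limb < 2**64
        out.append(limb & 0xffffffff)  # low half first
        out.append(limb >> 32)         # then high half
    return out

def limbs_to_int(limbs, bits):
    """Reassemble little-endian limbs of the given width into an integer."""
    return sum(x << (bits * i) for i, x in enumerate(limbs))
```

Four 64-bit limbs become eight 32-bit limbs that encode the same 256-bit value.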
Change-Id: I52d6e5504fcb6ca2e8b0ee13727f4500c80c1799
Reviewed-on: https://boringssl-review.googlesource.com/23244
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Each operation was translated from fiat-crypto output using fiat-crypto
prettyprint.py. For example, fe_mul is synthesized in
https://github.com/mit-plv/fiat-crypto/blob/master/src/Specific/X25519/C32/femul.v,
and shown in the last Coq-compatible form at
https://github.com/mit-plv/fiat-crypto/blob/master/src/Specific/X25519/C32/femulDisplay.log.
Benchmarks on Google Cloud's unidentified Intel Xeon with AVX2:
git checkout $VARIANT && ( cd build && rm -rf * && CC=clang CXX=clang++ cmake -GNinja -DCMAKE_TOOLCHAIN_FILE=../util/32-bit-toolchain.cmake -DCMAKE_BUILD_TYPE=Release .. && ninja && ./tool/bssl speed -filter 25519 )
this branch:
Did 11382 Ed25519 key generation operations in 1053046us (10808.6 ops/sec)
Did 11169 Ed25519 signing operations in 1038080us (10759.3 ops/sec)
Did 2925 Ed25519 verify operations in 1001346us (2921.1 ops/sec)
Did 12000 Curve25519 base-point multiplication operations in 1084851us (11061.4 ops/sec)
Did 3850 Curve25519 arbitrary point multiplication operations in 1085565us (3546.5 ops/sec)
Did 11466 Ed25519 key generation operations in 1049821us (10921.9 ops/sec)
Did 11000 Ed25519 signing operations in 1013317us (10855.4 ops/sec)
Did 3047 Ed25519 verify operations in 1043846us (2919.0 ops/sec)
Did 12000 Curve25519 base-point multiplication operations in 1068924us (11226.2 ops/sec)
Did 3850 Curve25519 arbitrary point multiplication operations in 1090598us (3530.2 ops/sec)
Did 10309 Ed25519 key generation operations in 1003320us (10274.9 ops/sec)
Did 11000 Ed25519 signing operations in 1017862us (10807.0 ops/sec)
Did 3135 Ed25519 verify operations in 1098624us (2853.6 ops/sec)
Did 9000 Curve25519 base-point multiplication operations in 1046608us (8599.2 ops/sec)
Did 3132 Curve25519 arbitrary point multiplication operations in 1038963us (3014.5 ops/sec)
master:
Did 11564 Ed25519 key generation operations in 1068762us (10820.0 ops/sec)
Did 11104 Ed25519 signing operations in 1024278us (10840.8 ops/sec)
Did 3206 Ed25519 verify operations in 1049179us (3055.7 ops/sec)
Did 12000 Curve25519 base-point multiplication operations in 1073619us (11177.1 ops/sec)
Did 3550 Curve25519 arbitrary point multiplication operations in 1000279us (3549.0 ops/sec)
andreser@linux-andreser:~/boringssl$ build/tool/bssl speed -filter 25519
Did 11760 Ed25519 key generation operations in 1072495us (10965.1 ops/sec)
Did 10800 Ed25519 signing operations in 1003486us (10762.5 ops/sec)
Did 3245 Ed25519 verify operations in 1080399us (3003.5 ops/sec)
Did 12000 Curve25519 base-point multiplication operations in 1076021us (11152.2 ops/sec)
Did 3570 Curve25519 arbitrary point multiplication operations in 1005087us (3551.9 ops/sec)
andreser@linux-andreser:~/boringssl$ build/tool/bssl speed -filter 25519
Did 11438 Ed25519 key generation operations in 1041115us (10986.3 ops/sec)
Did 11000 Ed25519 signing operations in 1012589us (10863.2 ops/sec)
Did 3312 Ed25519 verify operations in 1082834us (3058.6 ops/sec)
Did 12000 Curve25519 base-point multiplication operations in 1061318us (11306.7 ops/sec)
Did 3580 Curve25519 arbitrary point multiplication operations in 1004923us (3562.5 ops/sec)
squashed: curve25519: convert field constants to unsigned.
import re, sys, math
def weight(i):
    # Limb i is scaled by 2^ceil(25.5*i).
    return 2**int(math.ceil(25.5*i))
def convert(t):
    # Groups alternate between (possibly negative) limbs and separators.
    limbs = [x for x in t.groups() if x.replace('-','').isdigit()]
    # Evaluate the signed limbs, then reduce mod 2^255-19.
    v = sum(weight(i)*x for (i,x) in enumerate(map(int, limbs))) % (2**255-19)
    # Re-split the reduced value into ten non-negative limbs.
    limbs = [(v % weight(i+1)) // weight(i) for i in range(10)]
    assert v == sum(weight(i)*x for (i,x) in enumerate(limbs))
    i = 0
    ret = ''
    for s in t.groups():
        if s.replace('-','').isdigit():
            ret += str(limbs[i])
            i += 1
        else:
            ret += s
    return ret
fe_re = re.compile(r'(\s*,\s*)'.join(r'(-?\d+)' for i in range(10)))
print (re.sub(fe_re, convert, sys.stdin.read()))
Change-Id: Ibd4f7f5c38e5c4d61c9826afb406baebe2be5168
Reviewed-on: https://boringssl-review.googlesource.com/22385
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This change doesn't actually introduce any Fiat code yet. It sets up the
directory structure to make the diffs in the next change clearer.
Change-Id: I38a21fb36b18a08b0907f9d37b7ef5d7d3137ede
Reviewed-on: https://boringssl-review.googlesource.com/22624
Reviewed-by: David Benjamin <davidben@google.com>