46304abf7d
The fiat-crypto-generated code uses the Montgomery form implementation strategy, for both 32-bit and 64-bit code. 64-bit throughput seems slower, but the difference is smaller than noise between repetitions (-2%?) 32-bit throughput has decreased significantly for ECDH (-40%). I am attributing this to the change from varibale-time scalar multiplication to constant-time scalar multiplication. Due to the same bottleneck, ECDSA verification still uses the old code (otherwise there would have been a 60% throughput decrease). On the other hand, ECDSA signing throughput has increased slightly (+10%), perhaps due to the use of a precomputed table of multiples of the base point. 64-bit benchmarks (Google Cloud Haswell): with this change: Did 9126 ECDH P-256 operations in 1009572us (9039.5 ops/sec) Did 23000 ECDSA P-256 signing operations in 1039832us (22119.0 ops/sec) Did 8820 ECDSA P-256 verify operations in 1024242us (8611.2 ops/sec) master ( |
||
---|---|---|
.. | ||
asn1 | ||
base64 | ||
bio | ||
bn_extra | ||
buf | ||
bytestring | ||
chacha | ||
cipher_extra | ||
cmac | ||
conf | ||
curve25519 | ||
dh | ||
digest_extra | ||
dsa | ||
ec_extra | ||
ecdh | ||
ecdsa_extra | ||
engine | ||
err | ||
evp | ||
fipsmodule | ||
hkdf | ||
hmac_extra | ||
lhash | ||
obj | ||
pem | ||
perlasm | ||
pkcs7 | ||
pkcs8 | ||
poly1305 | ||
pool | ||
rand_extra | ||
rc4 | ||
rsa_extra | ||
stack | ||
test | ||
x509 | ||
x509v3 | ||
CMakeLists.txt | ||
compiler_test.cc | ||
constant_time_test.cc | ||
cpu-aarch64-linux.c | ||
cpu-arm-linux.c | ||
cpu-arm.c | ||
cpu-intel.c | ||
cpu-ppc64le.c | ||
crypto.c | ||
ex_data.c | ||
internal.h | ||
mem.c | ||
refcount_c11.c | ||
refcount_lock.c | ||
refcount_test.cc | ||
thread_none.c | ||
thread_pthread.c | ||
thread_test.cc | ||
thread_win.c | ||
thread.c |