3d450d2844
This commit improves the performance of ECDSA signature verification (over NIST P-256 curve) for x86 platforms. The speedup is by a factor of 1.15x. It does so by: 1) Leveraging the fact that the verification does not need to run in constant time. To this end, we implemented: a) the function ecp_nistz256_points_mul_public in a similar way to the current ecp_nistz256_points_mul function by removing its constant time features. b) the Binary Extended Euclidean Algorithm (BEEU) in x86 assembly to replace the current modular inverse function used for the inversion. 2) The last step in the ECDSA_verify function compares the (x) affine coordinate with the signature (r) value. Converting x from the Jacobian's representation to the affine coordinate requires to perform one inversions (x_affine = x * z^(-2)). We save this inversion and speed up the computations by instead bringing r to x (r_jacobian = r*z^2) which is faster. The measured results are: Before (on a Kaby Lake desktop with gcc-5): Did 26000 ECDSA P-224 signing operations in 1002372us (25938.5 ops/sec) Did 11000 ECDSA P-224 verify operations in 1043821us (10538.2 ops/sec) Did 55000 ECDSA P-256 signing operations in 1017560us (54050.9 ops/sec) Did 17000 ECDSA P-256 verify operations in 1051280us (16170.8 ops/sec) After (on a Kaby Lake desktop with gcc-5): Did 27000 ECDSA P-224 signing operations in 1011287us (26698.7 ops/sec) Did 11640 ECDSA P-224 verify operations in 1076698us (10810.8 ops/sec) Did 55000 ECDSA P-256 signing operations in 1016880us (54087.0 ops/sec) Did 20000 ECDSA P-256 verify operations in 1038736us (19254.2 ops/sec) Before (on a Skylake server platform with gcc-5): Did 25000 ECDSA P-224 signing operations in 1021651us (24470.2 ops/sec) Did 10373 ECDSA P-224 verify operations in 1046563us (9911.5 ops/sec) Did 50000 ECDSA P-256 signing operations in 1002774us (49861.7 ops/sec) Did 15000 ECDSA P-256 verify operations in 1006471us (14903.6 ops/sec) After (on a Skylake server platform with gcc-5): Did 25000 ECDSA P-224 signing operations in 1020958us (24486.8 ops/sec) Did 10373 ECDSA P-224 verify operations in 1046359us (9913.4 ops/sec) Did 50000 ECDSA P-256 signing operations in 1003996us (49801.0 ops/sec) Did 18000 ECDSA P-256 verify operations in 1021604us (17619.4 ops/sec) Developers and authors: *************************************************************************** Nir Drucker (1,2), Shay Gueron (1,2) (1) Amazon Web Services Inc. (2) University of Haifa, Israel *************************************************************************** Change-Id: Idd42a7bc40626bce974ea000b61fdb5bad33851c Reviewed-on: https://boringssl-review.googlesource.com/c/31304 Commit-Queue: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com> |
||
---|---|---|
.. | ||
BUILD.gn | ||
curve25519_tables.h | ||
curve25519.c | ||
internal.h | ||
LICENSE | ||
make_curve25519_tables.py | ||
METADATA | ||
p256.c | ||
README.chromium | ||
README.md |
Fiat
Some of the code in this directory is generated by Fiat and thus these files are licensed under the MIT license. (See LICENSE file.)
Curve25519
To generate the field arithmetic procedures in curve25519.c
from a fiat-crypto
checkout (as of 7892c66d5e0e5770c79463ce551193ceef870641
), run
make src/Specific/solinas32_2e255m19_10limbs/femul.c
(replacing femul
with
the desired field operation). The "source" file specifying the finite field and
referencing the desired implementation strategy is
src/Specific/solinas32_2e255m19_10limbs/CurveParameters.v
, specifying roughly
"unsaturated arithmetic modulo 2^255-19 using 10 limbs of radix 2^25.5 in 32-bit
unsigned integers with a single carry chain and two wraparound carries" where
only the prime is considered normative and everything else is treated as
"compiler hints".
The 64-bit implementation uses 5 limbs of radix 2^51 with instruction scheduling
taken from curve25519-donna-c64. It is found in
src/Specific/solinas64_2e255m19_5limbs_donna
.
P256
To generate the field arithmetic procedures in p256.c
from a fiat-crypto
checkout, run
make src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/femul.c
.
The corresponding "source" file is
src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/CurveParameters.v
,
specifying roughly "64-bit saturated word-by-word Montgomery reduction modulo
2^256 - 2^224 + 2^192 + 2^96 - 1". Again, everything except for the prime is
untrusted. There is currently a known issue where fesub.c
for p256 does not
manage to complete the build (specialization) within a week on Coq 8.7.0.
https://github.com/JasonGross/fiat-crypto/tree/3e6851ddecaac70d0feb484a75360d57f6e41244/src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs
does manage to build that file, but the work on that branch was never finished
(the correctness proofs of implementation templates still apply, but the
now abandoned prototype specialization facilities there are unverified).
Working With Fiat Crypto Field Arithmetic
The fiat-crypto readme https://github.com/mit-plv/fiat-crypto#arithmetic-core contains an overview of the implementation templates followed by a tour of the specialization machinery. It may be helpful to first read about the less messy parts of the system from chapter 3 of http://adam.chlipala.net/theses/andreser.pdf. There is work ongoing to replace the entire specialization mechanism with something much more principled https://github.com/mit-plv/fiat-crypto/projects/4.