boringssl/third_party/fiat
Nir Drucker 3d450d2844 Speed up ECDSA verify on x86-64.
This commit improves the performance of ECDSA signature verification
(over NIST P-256 curve) for x86 platforms. The speedup is by a factor of 1.15x.
It does so by:
  1) Leveraging the fact that the verification does not need
     to run in constant time. To this end, we implemented:
    a) the function ecp_nistz256_points_mul_public in a similar way to
       the current ecp_nistz256_points_mul function by removing its constant
       time features.
    b) the Binary Extended Euclidean Algorithm (BEEU) in x86 assembly to
       replace the current modular inverse function used for the inversion.
  2) The last step in the ECDSA_verify function compares the (x) affine
     coordinate with the signature (r) value. Converting x from the Jacobian's
     representation to the affine coordinate requires to perform one inversions
     (x_affine = x * z^(-2)). We save this inversion and speed up the computations
     by instead bringing r to x (r_jacobian = r*z^2) which is faster.

The measured results are:
Before (on a Kaby Lake desktop with gcc-5):
Did 26000 ECDSA P-224 signing operations in 1002372us (25938.5 ops/sec)
Did 11000 ECDSA P-224 verify operations in 1043821us (10538.2 ops/sec)
Did 55000 ECDSA P-256 signing operations in 1017560us (54050.9 ops/sec)
Did 17000 ECDSA P-256 verify operations in 1051280us (16170.8 ops/sec)

After (on a Kaby Lake desktop with gcc-5):
Did 27000 ECDSA P-224 signing operations in 1011287us (26698.7 ops/sec)
Did 11640 ECDSA P-224 verify operations in 1076698us (10810.8 ops/sec)
Did 55000 ECDSA P-256 signing operations in 1016880us (54087.0 ops/sec)
Did 20000 ECDSA P-256 verify operations in 1038736us (19254.2 ops/sec)

Before (on a Skylake server platform with gcc-5):
Did 25000 ECDSA P-224 signing operations in 1021651us (24470.2 ops/sec)
Did 10373 ECDSA P-224 verify operations in 1046563us (9911.5 ops/sec)
Did 50000 ECDSA P-256 signing operations in 1002774us (49861.7 ops/sec)
Did 15000 ECDSA P-256 verify operations in 1006471us (14903.6 ops/sec)

After (on a Skylake server platform with gcc-5):
Did 25000 ECDSA P-224 signing operations in 1020958us (24486.8 ops/sec)
Did 10373 ECDSA P-224 verify operations in 1046359us (9913.4 ops/sec)
Did 50000 ECDSA P-256 signing operations in 1003996us (49801.0 ops/sec)
Did 18000 ECDSA P-256 verify operations in 1021604us (17619.4 ops/sec)

Developers and authors:
***************************************************************************
Nir Drucker (1,2), Shay Gueron (1,2)
(1) Amazon Web Services Inc.
(2) University of Haifa, Israel
***************************************************************************

Change-Id: Idd42a7bc40626bce974ea000b61fdb5bad33851c
Reviewed-on: https://boringssl-review.googlesource.com/c/31304
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2018-11-05 23:48:07 +00:00
..
BUILD.gn Add files in third_party/fiat for Chromium to pick up. 2018-01-10 22:02:03 +00:00
curve25519_tables.h Use 51-bit limbs from fiat-crypto in 64-bit. 2018-01-23 22:25:07 +00:00
curve25519.c Document that ED25519_sign only fails on allocation failure 2018-08-29 18:35:12 +00:00
internal.h Remove x86_64 x25519 assembly. 2018-02-01 21:44:58 +00:00
LICENSE curve25519: fiat-crypto field arithmetic. 2017-11-03 22:39:31 +00:00
make_curve25519_tables.py Use 51-bit limbs from fiat-crypto in 64-bit. 2018-01-23 22:25:07 +00:00
METADATA third_party: re-format METATADA files 2018-02-27 19:57:12 +00:00
p256.c Speed up ECDSA verify on x86-64. 2018-11-05 23:48:07 +00:00
README.chromium Add files in third_party/fiat for Chromium to pick up. 2018-01-10 22:02:03 +00:00
README.md Use 51-bit limbs from fiat-crypto in 64-bit. 2018-01-23 22:25:07 +00:00

Fiat

Some of the code in this directory is generated by Fiat and thus these files are licensed under the MIT license. (See LICENSE file.)

Curve25519

To generate the field arithmetic procedures in curve25519.c from a fiat-crypto checkout (as of 7892c66d5e0e5770c79463ce551193ceef870641), run make src/Specific/solinas32_2e255m19_10limbs/femul.c (replacing femul with the desired field operation). The "source" file specifying the finite field and referencing the desired implementation strategy is src/Specific/solinas32_2e255m19_10limbs/CurveParameters.v, specifying roughly "unsaturated arithmetic modulo 2^255-19 using 10 limbs of radix 2^25.5 in 32-bit unsigned integers with a single carry chain and two wraparound carries" where only the prime is considered normative and everything else is treated as "compiler hints".

The 64-bit implementation uses 5 limbs of radix 2^51 with instruction scheduling taken from curve25519-donna-c64. It is found in src/Specific/solinas64_2e255m19_5limbs_donna.

P256

To generate the field arithmetic procedures in p256.c from a fiat-crypto checkout, run make src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/femul.c. The corresponding "source" file is src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/CurveParameters.v, specifying roughly "64-bit saturated word-by-word Montgomery reduction modulo 2^256 - 2^224 + 2^192 + 2^96 - 1". Again, everything except for the prime is untrusted. There is currently a known issue where fesub.c for p256 does not manage to complete the build (specialization) within a week on Coq 8.7.0. https://github.com/JasonGross/fiat-crypto/tree/3e6851ddecaac70d0feb484a75360d57f6e41244/src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs does manage to build that file, but the work on that branch was never finished (the correctness proofs of implementation templates still apply, but the now abandoned prototype specialization facilities there are unverified).

Working With Fiat Crypto Field Arithmetic

The fiat-crypto readme https://github.com/mit-plv/fiat-crypto#arithmetic-core contains an overview of the implementation templates followed by a tour of the specialization machinery. It may be helpful to first read about the less messy parts of the system from chapter 3 of http://adam.chlipala.net/theses/andreser.pdf. There is work ongoing to replace the entire specialization mechanism with something much more principled https://github.com/mit-plv/fiat-crypto/projects/4.