boringssl/third_party/fiat
David Benjamin 638a408cd2 Add a tuned variable-time P-256 multiplication function.
This reuses wnaf.c's window scheduling, but has access to the tuned
field arithmetic and pre-computed base point table. Unlike wnaf.c, we
do not make the points affine as it's not worth it for a single table.
(We already precomputed the base point table.)

Annoyingly, 32-bit x86 gets slower by a bit, but the other platforms are
faster. My guess is that the generic code gets to use the
bn_mul_mont assembly and the compiler, faced with the increased 32-bit
register pressure and the extremely register-poor x86, is making
bad decisions on the otherwise P-256-tuned C code. The three platforms
that see much larger gains are significantly more important than 32-bit
x86 at this point, so go with this change.

armv7a (Nexus 5X) before/after [+14.4%]:
Did 2703 ECDSA P-256 verify operations in 5034539us (536.9 ops/sec)
Did 3127 ECDSA P-256 verify operations in 5091379us (614.2 ops/sec)

aarch64 (Nexus 5X) before/after [+9.2%]:
Did 6783 ECDSA P-256 verify operations in 5031324us (1348.2 ops/sec)
Did 7410 ECDSA P-256 verify operations in 5033291us (1472.2 ops/sec)

x86 before/after [-3.7%]:
Did 8961 ECDSA P-256 verify operations in 10075901us (889.3 ops/sec)
Did 8568 ECDSA P-256 verify operations in 10003001us (856.5 ops/sec)

x86_64 before/after [+8.6%]:
Did 29808 ECDSA P-256 verify operations in 10008662us (2978.2 ops/sec)
Did 32528 ECDSA P-256 verify operations in 10057137us (3234.3 ops/sec)

Change-Id: I5fa643149f5bfbbda9533e3008baadfee9979b93
Reviewed-on: https://boringssl-review.googlesource.com/25684
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-02-12 22:00:48 +00:00
BUILD.gn Add files in third_party/fiat for Chromium to pick up. 2018-01-10 22:02:03 +00:00
CMakeLists.txt
curve25519_tables.h Use 51-bit limbs from fiat-crypto in 64-bit. 2018-01-23 22:25:07 +00:00
curve25519.c Require that Ed25519 |s| values be < order. 2018-02-02 20:45:08 +00:00
internal.h Remove x86_64 x25519 assembly. 2018-02-01 21:44:58 +00:00
LICENSE curve25519: fiat-crypto field arithmetic. 2017-11-03 22:39:31 +00:00
make_curve25519_tables.py Use 51-bit limbs from fiat-crypto in 64-bit. 2018-01-23 22:25:07 +00:00
METADATA change URL type in third_party METADATA files to GIT 2017-11-07 21:38:33 +00:00
p256.c Add a tuned variable-time P-256 multiplication function. 2018-02-12 22:00:48 +00:00
README.chromium Add files in third_party/fiat for Chromium to pick up. 2018-01-10 22:02:03 +00:00
README.md Use 51-bit limbs from fiat-crypto in 64-bit. 2018-01-23 22:25:07 +00:00

Fiat

Some of the code in this directory is generated by Fiat and thus these files are licensed under the MIT license. (See LICENSE file.)

Curve25519

To generate the field arithmetic procedures in curve25519.c from a fiat-crypto checkout (as of 7892c66d5e0e5770c79463ce551193ceef870641), run make src/Specific/solinas32_2e255m19_10limbs/femul.c, replacing femul with the desired field operation. The "source" file that specifies the finite field and references the desired implementation strategy is src/Specific/solinas32_2e255m19_10limbs/CurveParameters.v. It specifies roughly "unsaturated arithmetic modulo 2^255-19 using 10 limbs of radix 2^25.5 in 32-bit unsigned integers with a single carry chain and two wraparound carries". Only the prime is considered normative; everything else is treated as "compiler hints".
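A radix of 2^25.5 means the limb widths alternate between 26 and 25 bits. The Python sketch below illustrates that layout only; it is not the generated C code. The helper names (`to_limbs`, `from_limbs`, `fe_add`) are hypothetical, and it assumes limb i starts at bit offset ceil(25.5 * i).

```python
# Illustrative sketch of an unsaturated 10-limb layout modulo 2^255 - 19.
# Assumption: limb i starts at bit ceil(25.5 * i), so widths alternate 26, 25, ...
# All helper names here are hypothetical, not taken from curve25519.c.

P = 2**255 - 19
NUM_LIMBS = 10

# ceil(25.5 * i) computed with integers as ceil(51 * i / 2).
OFFSETS = [(51 * i + 1) // 2 for i in range(NUM_LIMBS + 1)]
WIDTHS = [OFFSETS[i + 1] - OFFSETS[i] for i in range(NUM_LIMBS)]


def to_limbs(x):
    """Split 0 <= x < 2^255 into 10 limbs of alternating 26 and 25 bits."""
    return [(x >> OFFSETS[i]) & ((1 << WIDTHS[i]) - 1) for i in range(NUM_LIMBS)]


def from_limbs(limbs):
    """Recombine limbs into an integer mod P; limbs may exceed their nominal width."""
    return sum(limb << OFFSETS[i] for i, limb in enumerate(limbs)) % P


def fe_add(a, b):
    """Limb-wise addition with no carry propagation (the point of 'unsaturated')."""
    return [x + y for x, y in zip(a, b)]
```

Because each limb occupies fewer bits than the 32-bit word that holds it, a few additions can be accumulated limb by limb before any carry chain has to run.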

The 64-bit implementation uses 5 limbs of radix 2^51 with instruction scheduling taken from curve25519-donna-c64. It is found in src/Specific/solinas64_2e255m19_5limbs_donna.
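As a rough illustration of the 5-limb, radix-2^51 representation, the Python sketch below multiplies two field elements and folds the high partial products back in using 2^255 ≡ 19 (mod 2^255 - 19). It makes no attempt to reproduce the curve25519-donna-c64 instruction scheduling, and the function names are hypothetical. In C the accumulators would need 128-bit integers; Python integers are unbounded.

```python
# Illustrative sketch of multiplication with 5 limbs of radix 2^51.
# Function names are hypothetical; this does not mirror the generated code.

P = 2**255 - 19
MASK51 = (1 << 51) - 1


def to_limbs51(x):
    """Split 0 <= x < 2^255 into five 51-bit limbs."""
    return [(x >> (51 * i)) & MASK51 for i in range(5)]


def from_limbs51(limbs):
    """Recombine limbs into an integer mod P; limbs may exceed 51 bits."""
    return sum(limb << (51 * i) for i, limb in enumerate(limbs)) % P


def fe_mul51(a, b):
    """Schoolbook product with reduction folded in via 2^255 = 19 (mod P)."""
    acc = [0] * 5
    for i in range(5):
        for j in range(5):
            k = i + j
            if k >= 5:
                # Weight 2^(51*k) = 2^255 * 2^(51*(k-5)), and 2^255 = 19 mod P.
                acc[k - 5] += 19 * a[i] * b[j]
            else:
                acc[k] += a[i] * b[j]
    # Single carry chain from limb 0 up to limb 4.
    out = []
    carry = 0
    for k in range(5):
        v = acc[k] + carry
        out.append(v & MASK51)
        carry = v >> 51
    # Wraparound: the carry out of the top limb has weight 2^255 = 19 mod P.
    out[0] += 19 * carry
    # One more carry so limb 0 is back under 51 bits; limb 1 may be slightly over.
    out[1] += out[0] >> 51
    out[0] &= MASK51
    return out
```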

P256

To generate the field arithmetic procedures in p256.c from a fiat-crypto checkout, run make src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/femul.c. The corresponding "source" file is src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/CurveParameters.v, specifying roughly "64-bit saturated word-by-word Montgomery reduction modulo 2^256 - 2^224 + 2^192 + 2^96 - 1". Again, everything except for the prime is untrusted. There is currently a known issue where fesub.c for p256 does not manage to complete the build (specialization) within a week on Coq 8.7.0. The branch at https://github.com/JasonGross/fiat-crypto/tree/3e6851ddecaac70d0feb484a75360d57f6e41244/src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs does manage to build that file, but the work on that branch was never finished: the correctness proofs of the implementation templates still apply, but the now-abandoned prototype specialization facilities there are unverified.
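For readers unfamiliar with word-by-word Montgomery reduction, the Python sketch below shows the general technique for this prime with four 64-bit words. It is a textbook formulation operating on whole integers, not the generated code in p256.c, and the function names are hypothetical.

```python
# Illustrative sketch of word-by-word Montgomery multiplication for the P-256 prime.
# Textbook formulation; names are hypothetical and unrelated to p256.c.

P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1
W = 64                      # word size in bits
N = 4                       # number of words
R = 1 << (W * N)            # Montgomery radix, 2^256
MASK = (1 << W) - 1

# m' = -p^(-1) mod 2^64 (requires Python 3.8+ for the modular inverse).
# For this prime p = -1 mod 2^64, so m' works out to 1.
M_PRIME = (-pow(P256, -1, 1 << W)) % (1 << W)


def mont_mul(a, b):
    """Return a * b * R^(-1) mod p for 0 <= a, b < p."""
    t = 0
    for i in range(N):
        a_i = (a >> (W * i)) & MASK
        t += a_i * b
        # Choose q so that the low word of t + q * p is zero, then shift it out.
        q = ((t & MASK) * M_PRIME) & MASK
        t = (t + q * P256) >> W
    # t < 2p here, so one conditional subtraction fully reduces it.
    if t >= P256:
        t -= P256
    return t


def to_mont(x):
    """Convert x to Montgomery form x * R mod p."""
    return (x * R) % P256


def from_mont(x):
    """Convert out of Montgomery form by multiplying by 1."""
    return mont_mul(x, 1)
```

A product of two ordinary values x and y is then `from_mont(mont_mul(to_mont(x), to_mont(y)))`.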

Working With Fiat Crypto Field Arithmetic

The fiat-crypto readme (https://github.com/mit-plv/fiat-crypto#arithmetic-core) contains an overview of the implementation templates followed by a tour of the specialization machinery. It may be helpful to first read about the less messy parts of the system in chapter 3 of http://adam.chlipala.net/theses/andreser.pdf. Work is ongoing to replace the entire specialization mechanism with something much more principled (https://github.com/mit-plv/fiat-crypto/projects/4).