638a408cd2
This reuses wnaf.c's window scheduling, but has access to the tuned field arithmetic and pre-computed base point table. Unlike wnaf.c, we do not make the points affine, as it's not worth it for a single table. (We already precomputed the base point table.)

Annoyingly, 32-bit x86 gets a bit slower, but the other platforms are faster. My guess is that the generic code gets to use the bn_mul_mont assembly, while the compiler, faced with the increased 32-bit register pressure and the extremely register-poor x86, makes bad decisions on the otherwise P-256-tuned C code. The three platforms that see much larger gains are significantly more important than 32-bit x86 at this point, so go with this change.

armv7a (Nexus 5X) before/after [+14.4%]:
Did 2703 ECDSA P-256 verify operations in 5034539us (536.9 ops/sec)
Did 3127 ECDSA P-256 verify operations in 5091379us (614.2 ops/sec)

aarch64 (Nexus 5X) before/after [+9.2%]:
Did 6783 ECDSA P-256 verify operations in 5031324us (1348.2 ops/sec)
Did 7410 ECDSA P-256 verify operations in 5033291us (1472.2 ops/sec)

x86 before/after [-3.7%]:
Did 8961 ECDSA P-256 verify operations in 10075901us (889.3 ops/sec)
Did 8568 ECDSA P-256 verify operations in 10003001us (856.5 ops/sec)

x86_64 before/after [+8.6%]:
Did 29808 ECDSA P-256 verify operations in 10008662us (2978.2 ops/sec)
Did 32528 ECDSA P-256 verify operations in 10057137us (3234.3 ops/sec)

Change-Id: I5fa643149f5bfbbda9533e3008baadfee9979b93
Reviewed-on: https://boringssl-review.googlesource.com/25684
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
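For readers unfamiliar with wNAF window scheduling, the following is a minimal, textbook width-w NAF recoding in C. It is not the wnaf.c implementation; the function name wnaf_sketch and the restriction to scalars below 2^62 are assumptions of this sketch. Each nonzero digit is odd with absolute value below 2^(w-1), so only the odd multiples P, 3P, ..., (2^(w-1) - 1)P need to be precomputed (negation is cheap), and any w consecutive digits contain at most one nonzero, which keeps the number of point additions low.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes the width-w NAF of k into digits, least
 * significant first, and returns the number of digits written. */
static int wnaf_sketch(int8_t *digits, int max_digits, uint64_t k, int w) {
  assert(w >= 2 && w <= 7);         /* keeps every digit inside int8_t */
  assert(k < ((uint64_t)1 << 62));  /* headroom for k - d below */
  int n = 0;
  while (k != 0) {
    assert(n < max_digits);
    int d = 0;
    if (k & 1) {
      d = (int)(k & (((uint64_t)1 << w) - 1)); /* k mod 2^w, which is odd */
      if (d >= (1 << (w - 1))) {
        d -= 1 << w; /* choose the signed residue in (-2^(w-1), 2^(w-1)) */
      }
      /* k - d is now divisible by 2^w, so the next w - 1 digits are zero.
       * Unsigned wraparound makes a negative d add |d| to k. */
      k -= (uint64_t)(int64_t)d;
    }
    digits[n++] = (int8_t)d;
    k >>= 1;
  }
  return n;
}
```

For example, with w = 3 the scalar 7 recodes to the digits -1, 0, 0, 1 (least significant first), i.e. 7 = 8 - 1.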
Fiat
Some of the code in this directory is generated by Fiat, and those files are therefore licensed under the MIT license (see the LICENSE file).
Curve25519
To generate the field arithmetic procedures in curve25519.c from a fiat-crypto
checkout (as of 7892c66d5e0e5770c79463ce551193ceef870641), run

make src/Specific/solinas32_2e255m19_10limbs/femul.c

replacing femul with the desired field operation. The "source" file specifying
the finite field and referencing the desired implementation strategy is
src/Specific/solinas32_2e255m19_10limbs/CurveParameters.v, which specifies
roughly "unsaturated arithmetic modulo 2^255-19 using 10 limbs of radix 2^25.5
in 32-bit unsigned integers with a single carry chain and two wraparound
carries", where only the prime is considered normative and everything else is
treated as "compiler hints".
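To make the radix-2^25.5 layout concrete, here is a minimal C sketch, not the fiat-generated code. Limb i has 26 bits when i is even and 25 bits when i is odd, so the ten limbs cover 5*26 + 5*25 = 255 bits, and a carry out of the top limb has weight 2^255, which is congruent to 19 modulo 2^255 - 19. The helper name fe_carry_sketch is hypothetical, the limbs sit in 64-bit slots for headroom, and the carry order is a simplified illustration rather than the generated schedule with its two wraparound carries.

```c
#include <assert.h>
#include <stdint.h>

/* Bit width of limb i in radix 2^25.5: 26, 25, 26, 25, ... (255 bits total). */
static int limb_width(int i) { return (i % 2 == 0) ? 26 : 25; }

/* Hypothetical helper: one carry pass over a 10-limb element modulo
 * 2^255 - 19. Each limb keeps its low limb_width(i) bits and passes the rest
 * up; the carry out of limb 9 wraps into limb 0 multiplied by 19. Afterwards
 * limb 1 may end up slightly above 2^25 ("loosely reduced"). */
static void fe_carry_sketch(uint64_t h[10]) {
  for (int i = 0; i < 10; i++) {
    assert(h[i] < ((uint64_t)1 << 62)); /* headroom so nothing overflows */
  }
  for (int i = 0; i < 10; i++) {
    int w = limb_width(i);
    uint64_t carry = h[i] >> w;
    h[i] &= ((uint64_t)1 << w) - 1;
    if (i < 9) {
      h[i + 1] += carry;
    } else {
      h[0] += 19 * carry; /* wraparound carry: 2^255 = 19 (mod p) */
    }
  }
  /* The wraparound can push limb 0 past 26 bits again; carry it once more. */
  uint64_t carry = h[0] >> 26;
  h[0] &= ((uint64_t)1 << 26) - 1;
  h[1] += carry;
}
```

For instance, limbs h[0] = 2^26 + 5 and h[9] = 2^25 + 3 (all others zero) come out as h[0] = 24, h[1] = 1 and h[9] = 3: the excess of limb 0 moves into limb 1, and the 2^255 overflow of limb 9 comes back as 19.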
The 64-bit implementation uses 5 limbs of radix 2^51 with instruction scheduling
taken from curve25519-donna-c64. It is found in
src/Specific/solinas64_2e255m19_5limbs_donna.
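A minimal C sketch of multiplication in the 5-limb radix-2^51 representation, assuming the GCC/Clang unsigned __int128 extension for 128-bit products. Partial products whose weight reaches 2^255 are multiplied by 19 and folded down five limbs. The name fe51_mul_sketch and the input bound (limbs below 2^52) are assumptions of this sketch; it shows only the arithmetic, not the donna-c64 instruction scheduling.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension */

#define MASK51 (((uint64_t)1 << 51) - 1)

/* Hypothetical helper: h = f * g mod 2^255 - 19, with limb i weighing
 * 2^(51*i). Limbs below 2^52 keep every accumulator inside its type. */
static void fe51_mul_sketch(uint64_t h[5], const uint64_t f[5],
                            const uint64_t g[5]) {
  u128 r[5] = {0};
  for (int i = 0; i < 5; i++) {
    assert(f[i] < ((uint64_t)1 << 52) && g[i] < ((uint64_t)1 << 52));
  }
  for (int i = 0; i < 5; i++) {
    for (int j = 0; j < 5; j++) {
      u128 prod = (u128)f[i] * g[j];
      if (i + j < 5) {
        r[i + j] += prod;
      } else {
        /* 2^(51*(i+j)) = 2^255 * 2^(51*(i+j-5)) = 19 * 2^(51*(i+j-5)) */
        r[i + j - 5] += 19 * prod;
      }
    }
  }
  /* Carry chain: keep 51 bits per limb and push the excess upward. */
  uint64_t carry = 0;
  for (int i = 0; i < 5; i++) {
    r[i] += carry;
    h[i] = (uint64_t)r[i] & MASK51;
    carry = (uint64_t)(r[i] >> 51);
  }
  /* The carry out of limb 4 has weight 2^255, i.e. 19 (mod p). */
  h[0] += 19 * carry;
  h[1] += h[0] >> 51;
  h[0] &= MASK51;
}
```

As a quick check, multiplying 2^254 (limb 4 set to 2^50) by 2 yields the single limb h[0] = 19, since 2^255 = 19 mod p. All reads of f and g happen before h is written, so h may alias an input.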
P256
To generate the field arithmetic procedures in p256.c from a fiat-crypto
checkout, run

make src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/femul.c
The corresponding "source" file is
src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/CurveParameters.v,
specifying roughly "64-bit saturated word-by-word Montgomery reduction modulo
2^256 - 2^224 + 2^192 + 2^96 - 1". Again, everything except the prime is
untrusted. There is currently a known issue where fesub.c for p256 does not
finish building (specialization) within a week on Coq 8.7.0.
The branch at
https://github.com/JasonGross/fiat-crypto/tree/3e6851ddecaac70d0feb484a75360d57f6e41244/src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs
does build that file, but the work on that branch was never finished: the
correctness proofs of the implementation templates still apply, but the
now-abandoned prototype specialization facilities there are unverified.
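For the P-256 strategy described above, here is a minimal C sketch of 64-bit word-by-word Montgomery multiplication (the CIOS form), computing a*b*2^-256 mod p for inputs below p. It assumes the GCC/Clang unsigned __int128 extension, and the name p256_mont_mul_sketch is hypothetical; the fiat-generated code in p256.c is structured differently. One detail specific to this prime: its low 64-bit word is all ones, so p = -1 mod 2^64 and the per-word Montgomery factor -p^-1 mod 2^64 is 1, which is why each reduction multiplier is simply the current low word.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension */

/* p = 2^256 - 2^224 + 2^192 + 2^96 - 1 as little-endian 64-bit words. */
static const uint64_t kP256[4] = {0xffffffffffffffffULL, 0x00000000ffffffffULL,
                                  0x0000000000000000ULL, 0xffffffff00000001ULL};

/* Hypothetical helper: out = a * b * 2^-256 mod p for a, b < p. */
static void p256_mont_mul_sketch(uint64_t out[4], const uint64_t a[4],
                                 const uint64_t b[4]) {
  uint64_t t[6] = {0};
  for (int i = 0; i < 4; i++) {
    u128 acc;
    /* t += a * b[i] */
    uint64_t carry = 0;
    for (int j = 0; j < 4; j++) {
      acc = (u128)a[j] * b[i] + t[j] + carry;
      t[j] = (uint64_t)acc;
      carry = (uint64_t)(acc >> 64);
    }
    acc = (u128)t[4] + carry;
    t[4] = (uint64_t)acc;
    t[5] = (uint64_t)(acc >> 64);

    /* m = t[0] * (-p^-1 mod 2^64) = t[0]. Adding m * p clears the low word;
     * shifting the remaining words down divides t by 2^64. */
    uint64_t m = t[0];
    acc = (u128)m * kP256[0] + t[0];
    carry = (uint64_t)(acc >> 64);
    for (int j = 1; j < 4; j++) {
      acc = (u128)m * kP256[j] + t[j] + carry;
      t[j - 1] = (uint64_t)acc;
      carry = (uint64_t)(acc >> 64);
    }
    acc = (u128)t[4] + carry;
    t[3] = (uint64_t)acc;
    t[4] = t[5] + (uint64_t)(acc >> 64);
    t[5] = 0;
  }
  /* Now t < 2p: compute t - p, and keep t only if t < p (branch-free). */
  uint64_t r[4];
  uint64_t borrow = 0;
  for (int j = 0; j < 4; j++) {
    u128 d = (u128)t[j] - kP256[j] - borrow;
    r[j] = (uint64_t)d;
    borrow = (uint64_t)(d >> 64) & 1;
  }
  uint64_t keep_t = 0 - (borrow & (t[4] ^ 1)); /* all ones iff t < p */
  for (int j = 0; j < 4; j++) {
    out[j] = (t[j] & keep_t) | (r[j] & ~keep_t);
  }
}
```

A convenient sanity check: R mod p = 2^224 - 2^192 - 2^96 + 1 is the Montgomery form of 1, so multiplying any x < p by it returns x unchanged.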
Working With Fiat Crypto Field Arithmetic
The fiat-crypto README (https://github.com/mit-plv/fiat-crypto#arithmetic-core) contains an overview of the implementation templates followed by a tour of the specialization machinery. It may be helpful to first read about the less messy parts of the system in chapter 3 of http://adam.chlipala.net/theses/andreser.pdf. Work is ongoing to replace the entire specialization mechanism with something much more principled (https://github.com/mit-plv/fiat-crypto/projects/4).