boringssl/third_party/fiat/README.md
David Benjamin cb1ad205d0 Use 51-bit limbs from fiat-crypto in 64-bit.
Our 64-bit performance was much lower than it could have been, since we
weren't using the 64-bit multipliers. Fortunately, fiat-crypto is
awesome, so this is just a matter of synthesizing new code and
integration work.

Functions without the signature fiat-crypto curly braces were written by
hand and warrant more review. (It's just redistributing some bits.)

These use the donna variants which takes (and proves) some of the
instruction scheduling from donna as that's significantly faster.
Glancing over things, I suspect but have not confirmed the gap is due to
this:
https://github.com/mit-plv/fiat-crypto/pull/295#issuecomment-356892413

Clang without OPENSSL_SMALL (ECDH omitted since that uses assembly and
is unaffected by this CL).

Before:
Did 105149 Ed25519 key generation operations in 5025208us (20924.3 ops/sec)
Did 125000 Ed25519 signing operations in 5024003us (24880.6 ops/sec)
Did 37642 Ed25519 verify operations in 5072539us (7420.7 ops/sec)

After:
Did 206000 Ed25519 key generation operations in 5020547us (41031.4 ops/sec)
Did 227000 Ed25519 signing operations in 5005232us (45352.5 ops/sec)
Did 69840 Ed25519 verify operations in 5004769us (13954.7 ops/sec)

Clang + OPENSSL_SMALL:

Before:
Did 68598 Ed25519 key generation operations in 5024629us (13652.4 ops/sec)
Did 73000 Ed25519 signing operations in 5067837us (14404.6 ops/sec)
Did 36765 Ed25519 verify operations in 5078684us (7239.1 ops/sec)
Did 74000 Curve25519 base-point multiplication operations in 5016465us (14751.4 ops/sec)
Did 45600 Curve25519 arbitrary point multiplication operations in 5034680us (9057.2 ops/sec)

After:
Did 117315 Ed25519 key generation operations in 5021860us (23360.9 ops/sec)
Did 126000 Ed25519 signing operations in 5003521us (25182.3 ops/sec)
Did 64974 Ed25519 verify operations in 5047790us (12871.8 ops/sec)
Did 134000 Curve25519 base-point multiplication operations in 5058946us (26487.7 ops/sec)
Did 86000 Curve25519 arbitrary point multiplication operations in 5050478us (17028.1 ops/sec)

GCC without OPENSSL_SMALL (ECDH omitted since that uses assembly and
is unaffected by this CL).

Before:
Did 35552 Ed25519 key generation operations in 5030756us (7066.9 ops/sec)
Did 38286 Ed25519 signing operations in 5001648us (7654.7 ops/sec)
Did 10584 Ed25519 verify operations in 5068158us (2088.3 ops/sec)

After:
Did 92158 Ed25519 key generation operations in 5024021us (18343.5 ops/sec)
Did 99000 Ed25519 signing operations in 5011908us (19753.0 ops/sec)
Did 31122 Ed25519 verify operations in 5069878us (6138.6 ops/sec)

Change-Id: Ic0c24d50b4ee2bbc408b94965e9d63319936107d
Reviewed-on: https://boringssl-review.googlesource.com/24805
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2018-01-23 22:25:07 +00:00

2.5 KiB

Fiat

Some of the code in this directory is generated by Fiat and thus these files are licensed under the MIT license. (See LICENSE file.)

Curve25519

To generate the field arithmetic procedures in curve25519.c from a fiat-crypto checkout (as of 7892c66d5e0e5770c79463ce551193ceef870641), run make src/Specific/solinas32_2e255m19_10limbs/femul.c (replacing femul with the desired field operation). The "source" file specifying the finite field and referencing the desired implementation strategy is src/Specific/solinas32_2e255m19_10limbs/CurveParameters.v, specifying roughly "unsaturated arithmetic modulo 2^255-19 using 10 limbs of radix 2^25.5 in 32-bit unsigned integers with a single carry chain and two wraparound carries" where only the prime is considered normative and everything else is treated as "compiler hints".

The 64-bit implementation uses 5 limbs of radix 2^51 with instruction scheduling taken from curve25519-donna-c64. It is found in src/Specific/solinas64_2e255m19_5limbs_donna.

P256

To generate the field arithmetic procedures in p256.c from a fiat-crypto checkout, run make src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/femul.c. The corresponding "source" file is src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs/CurveParameters.v, specifying roughly "64-bit saturated word-by-word Montgomery reduction modulo 2^256 - 2^224 + 2^192 + 2^96 - 1". Again, everything except for the prime is untrusted. There is currently a known issue where fesub.c for p256 does not manage to complete the build (specialization) within a week on Coq 8.7.0. https://github.com/JasonGross/fiat-crypto/tree/3e6851ddecaac70d0feb484a75360d57f6e41244/src/Specific/montgomery64_2e256m2e224p2e192p2e96m1_4limbs does manage to build that file, but the work on that branch was never finished (the correctness proofs of implementation templates still apply, but the now abandoned prototype specialization facilities there are unverified).

Working With Fiat Crypto Field Arithmetic

The fiat-crypto readme https://github.com/mit-plv/fiat-crypto#arithmetic-core contains an overview of the implementation templates followed by a tour of the specialization machinery. It may be helpful to first read about the less messy parts of the system from chapter 3 of http://adam.chlipala.net/theses/andreser.pdf. There is work ongoing to replace the entire specialization mechanism with something much more principled https://github.com/mit-plv/fiat-crypto/projects/4.