22 Commits

Author SHA1 Message Date
David Benjamin
f109f20873 Clear out a bunch of -Wextra-semi warnings.
Unfortunately, it's not enough to be able to turn it on thanks to the
PURE_VIRTUAL macro. But it gets us most of the way there.

Change-Id: Ie6ad5119fcfd420115fa49d7312f3586890244f4
Reviewed-on: https://boringssl-review.googlesource.com/c/34949
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-02-21 19:12:39 +00:00
Adam Langley
899835fad4 Rename Fiat include files to end in .h
Otherwise generate_build_files.py thinks that they're top-level source
files.

Fixes grpc/grpc#17780

Change-Id: I9f14a816a5045c1101841a2ef7ef9868abcd5d12
Reviewed-on: https://boringssl-review.googlesource.com/c/34364
Reviewed-by: Adam Langley <agl@google.com>
2019-01-21 17:29:45 +00:00
David Benjamin
32e59d2d32 Switch to new fiat pipeline.
This new version makes it much easier to tell which code is handwritten
and which is verified. For some reason, it also is *dramatically* faster
for 32-bit x86 GCC. Clang x86_64, however, does take a small hit.
Benchmarks below.

x86, GCC 7.3.0, OPENSSL_SMALL
(For some reason, GCC used to be really bad at compiling the 32-bit curve25519
code. The new one fixes this. I'm not sure what changed.)
Before:
Did 17135 Ed25519 key generation operations in 10026402us (1709.0 ops/sec)
Did 17170 Ed25519 signing operations in 10074192us (1704.4 ops/sec)
Did 9180 Ed25519 verify operations in 10034025us (914.9 ops/sec)
Did 17271 Curve25519 base-point multiplication operations in 10050837us (1718.4 ops/sec)
Did 10605 Curve25519 arbitrary point multiplication operations in 10047714us (1055.5 ops/sec)
Did 7800 ECDH P-256 operations in 10018331us (778.6 ops/sec)
Did 24308 ECDSA P-256 signing operations in 10019241us (2426.1 ops/sec)
Did 9191 ECDSA P-256 verify operations in 10081639us (911.7 ops/sec)
After:
Did 99873 Ed25519 key generation operations in 10021810us (9965.6 ops/sec) [+483.1%]
Did 99960 Ed25519 signing operations in 10052236us (9944.1 ops/sec) [+483.4%]
Did 53676 Ed25519 verify operations in 10009078us (5362.7 ops/sec) [+486.2%]
Did 102000 Curve25519 base-point multiplication operations in 10039764us (10159.6 ops/sec) [+491.2%]
Did 60802 Curve25519 arbitrary point multiplication operations in 10056897us (6045.8 ops/sec) [+472.8%]
Did 7900 ECDH P-256 operations in 10054509us (785.7 ops/sec) [+0.9%]
Did 24926 ECDSA P-256 signing operations in 10050919us (2480.0 ops/sec) [+2.2%]
Did 9494 ECDSA P-256 verify operations in 10064659us (943.3 ops/sec) [+3.5%]

x86, Clang 8.0.0 trunk 349417, OPENSSL_SMALL
Before:
Did 82750 Ed25519 key generation operations in 10051177us (8232.9 ops/sec)
Did 82400 Ed25519 signing operations in 10035806us (8210.6 ops/sec)
Did 41511 Ed25519 verify operations in 10048919us (4130.9 ops/sec)
Did 83300 Curve25519 base-point multiplication operations in 10044283us (8293.3 ops/sec)
Did 49700 Curve25519 arbitrary point multiplication operations in 10007005us (4966.5 ops/sec)
Did 14039 ECDH P-256 operations in 10093929us (1390.8 ops/sec)
Did 40950 ECDSA P-256 signing operations in 10006757us (4092.2 ops/sec)
Did 16068 ECDSA P-256 verify operations in 10095996us (1591.5 ops/sec)
After:
Did 80476 Ed25519 key generation operations in 10048648us (8008.6 ops/sec) [-2.7%]
Did 79050 Ed25519 signing operations in 10049180us (7866.3 ops/sec) [-4.2%]
Did 40501 Ed25519 verify operations in 10048347us (4030.6 ops/sec) [-2.4%]
Did 81300 Curve25519 base-point multiplication operations in 10017480us (8115.8 ops/sec) [-2.1%]
Did 48278 Curve25519 arbitrary point multiplication operations in 10092500us (4783.6 ops/sec) [-3.7%]
Did 15402 ECDH P-256 operations in 10096705us (1525.4 ops/sec) [+9.7%]
Did 44200 ECDSA P-256 signing operations in 10037715us (4403.4 ops/sec) [+7.6%]
Did 17000 ECDSA P-256 verify operations in 10008813us (1698.5 ops/sec) [+6.7%]

x86_64, GCC 7.3.0
(Note these P-256 numbers are not affected by this change. Included to get a
sense of noise.)
Before:
Did 557000 Ed25519 key generation operations in 10011721us (55634.8 ops/sec)
Did 550000 Ed25519 signing operations in 10016449us (54909.7 ops/sec)
Did 190000 Ed25519 verify operations in 10014565us (18972.4 ops/sec)
Did 587000 Curve25519 base-point multiplication operations in 10015402us (58609.7 ops/sec)
Did 230000 Curve25519 arbitrary point multiplication operations in 10023827us (22945.3 ops/sec)
Did 179000 ECDH P-256 operations in 10016294us (17870.9 ops/sec)
Did 557000 ECDSA P-256 signing operations in 10014158us (55621.3 ops/sec)
Did 198000 ECDSA P-256 verify operations in 10036694us (19727.6 ops/sec)
After:
Did 569000 Ed25519 key generation operations in 10004965us (56871.8 ops/sec) [+2.2%]
Did 563000 Ed25519 signing operations in 10000064us (56299.6 ops/sec) [+2.5%]
Did 196000 Ed25519 verify operations in 10025650us (19549.9 ops/sec) [+3.0%]
Did 596000 Curve25519 base-point multiplication operations in 10008666us (59548.4 ops/sec) [+1.6%]
Did 229000 Curve25519 arbitrary point multiplication operations in 10028921us (22834.0 ops/sec) [-0.5%]
Did 182910 ECDH P-256 operations in 10014905us (18263.8 ops/sec) [+2.2%]
Did 562000 ECDSA P-256 signing operations in 10011944us (56133.0 ops/sec) [+0.9%]
Did 202000 ECDSA P-256 verify operations in 10046901us (20105.7 ops/sec) [+1.9%]

x86_64, GCC 7.3.0, OPENSSL_SMALL
Before:
Did 350000 Ed25519 key generation operations in 10002540us (34991.1 ops/sec)
Did 344000 Ed25519 signing operations in 10010420us (34364.2 ops/sec)
Did 197000 Ed25519 verify operations in 10030593us (19639.9 ops/sec)
Did 362000 Curve25519 base-point multiplication operations in 10004615us (36183.3 ops/sec)
Did 235000 Curve25519 arbitrary point multiplication operations in 10025951us (23439.2 ops/sec)
Did 32032 ECDH P-256 operations in 10056486us (3185.2 ops/sec)
Did 96354 ECDSA P-256 signing operations in 10007297us (9628.4 ops/sec)
Did 37774 ECDSA P-256 verify operations in 10044892us (3760.5 ops/sec)
After:
Did 343000 Ed25519 key generation operations in 10025108us (34214.1 ops/sec) [-2.2%]
Did 340000 Ed25519 signing operations in 10014870us (33949.5 ops/sec) [-1.2%]
Did 192000 Ed25519 verify operations in 10025082us (19152.0 ops/sec) [-2.5%]
Did 355000 Curve25519 base-point multiplication operations in 10013220us (35453.1 ops/sec) [-2.0%]
Did 231000 Curve25519 arbitrary point multiplication operations in 10010775us (23075.1 ops/sec) [-1.6%]
Did 31540 ECDH P-256 operations in 10009664us (3151.0 ops/sec) [-1.1%]
Did 99012 ECDSA P-256 signing operations in 10090296us (9812.6 ops/sec) [+1.9%]
Did 37695 ECDSA P-256 verify operations in 10092859us (3734.8 ops/sec) [-0.7%]

x86_64, Clang 8.0.0 trunk 349417
(Note these P-256 numbers are not affected by this change. Included to get a
sense of noise.)
Before:
Did 600000 Ed25519 key generation operations in 10000278us (59998.3 ops/sec)
Did 595000 Ed25519 signing operations in 10010375us (59438.3 ops/sec)
Did 184000 Ed25519 verify operations in 10013984us (18374.3 ops/sec)
Did 636000 Curve25519 base-point multiplication operations in 10005250us (63566.6 ops/sec)
Did 229000 Curve25519 arbitrary point multiplication operations in 10006059us (22886.1 ops/sec)
Did 179250 ECDH P-256 operations in 10026354us (17877.9 ops/sec)
Did 547000 ECDSA P-256 signing operations in 10017585us (54604.0 ops/sec)
Did 197000 ECDSA P-256 verify operations in 10013020us (19674.4 ops/sec)
After:
Did 560000 Ed25519 key generation operations in 10009295us (55948.0 ops/sec) [-6.8%]
Did 548000 Ed25519 signing operations in 10007912us (54756.7 ops/sec) [-7.9%]
Did 170000 Ed25519 verify operations in 10056948us (16903.7 ops/sec) [-8.0%]
Did 592000 Curve25519 base-point multiplication operations in 10016818us (59100.6 ops/sec) [-7.0%]
Did 214000 Curve25519 arbitrary point multiplication operations in 10043918us (21306.4 ops/sec) [-6.9%]
Did 180000 ECDH P-256 operations in 10026019us (17953.3 ops/sec) [+0.4%]
Did 550000 ECDSA P-256 signing operations in 10004943us (54972.8 ops/sec) [+0.7%]
Did 198000 ECDSA P-256 verify operations in 10021714us (19757.1 ops/sec) [+0.4%]

x86_64, Clang 8.0.0 trunk 349417, OPENSSL_SMALL
Before:
Did 326000 Ed25519 key generation operations in 10003266us (32589.4 ops/sec)
Did 322000 Ed25519 signing operations in 10026783us (32114.0 ops/sec)
Did 181000 Ed25519 verify operations in 10015635us (18071.7 ops/sec)
Did 335000 Curve25519 base-point multiplication operations in 10000359us (33498.8 ops/sec)
Did 224000 Curve25519 arbitrary point multiplication operations in 10027245us (22339.1 ops/sec)
Did 68552 ECDH P-256 operations in 10018900us (6842.3 ops/sec)
Did 184000 ECDSA P-256 signing operations in 10014516us (18373.3 ops/sec)
Did 76020 ECDSA P-256 verify operations in 10016891us (7589.2 ops/sec)
After:
Did 310000 Ed25519 key generation operations in 10022086us (30931.7 ops/sec) [-5.1%]
Did 308000 Ed25519 signing operations in 10007543us (30776.8 ops/sec) [-4.2%]
Did 173000 Ed25519 verify operations in 10005829us (17289.9 ops/sec) [-4.3%]
Did 321000 Curve25519 base-point multiplication operations in 10027058us (32013.4 ops/sec) [-4.4%]
Did 212000 Curve25519 arbitrary point multiplication operations in 10015203us (21167.8 ops/sec) [-5.2%]
Did 64059 ECDH P-256 operations in 10042781us (6378.6 ops/sec) [-6.8%]
Did 170000 ECDSA P-256 signing operations in 10030896us (16947.6 ops/sec) [-7.8%]
Did 72176 ECDSA P-256 verify operations in 10075369us (7163.6 ops/sec) [-5.6%]
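
The bracketed percentages above follow directly from the reported ops/sec figures. A minimal sketch of that calculation, assuming the benchmark line format shown in this message; the helper names `parse_ops_per_sec` and `percent_change` are illustrative and not part of BoringSSL's tooling:

```python
import re

# Matches lines such as:
#   Did 17135 Ed25519 key generation operations in 10026402us (1709.0 ops/sec)
LINE_RE = re.compile(r"Did \d+ (.+) operations in \d+us \(([\d.]+) ops/sec\)")


def parse_ops_per_sec(line):
    """Return (benchmark name, ops/sec) for one benchmark output line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a benchmark line: %r" % line)
    return m.group(1), float(m.group(2))


def percent_change(before_line, after_line):
    """Relative throughput change, in percent, between two runs."""
    name_before, before = parse_ops_per_sec(before_line)
    name_after, after = parse_ops_per_sec(after_line)
    if name_before != name_after:
        raise ValueError("benchmark names differ")
    return (after / before - 1.0) * 100.0


before = "Did 17135 Ed25519 key generation operations in 10026402us (1709.0 ops/sec)"
after = "Did 99873 Ed25519 key generation operations in 10021810us (9965.6 ops/sec)"
print("%+.1f%%" % percent_change(before, after))
```

Comparing ops/sec rather than raw operation counts matters because each run lasts a slightly different number of microseconds.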

Bug: 254
Change-Id: Ib04c773f01b542bcb8611cceb582466bfa6f6d52
Reviewed-on: https://boringssl-review.googlesource.com/c/34306
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-18 00:24:03 +00:00
David Benjamin
5ecfb10d54 Modernize OPENSSL_COMPILE_ASSERT, part 2.
The change seems to have stuck, so bring us closer to C11/C++11 static asserts.

(If we later find we need to support worse toolchains, we can always use
__LINE__ or __COUNTER__ to avoid duplicate typedef names and just punt on
embedding the message into the type name.)

Change-Id: I0e5bb1106405066f07740728e19ebe13cae3e0ee
Reviewed-on: https://boringssl-review.googlesource.com/c/33145
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-11-14 16:06:37 +00:00
David Benjamin
8618f2bfe0 Optimize EC_GFp_mont_method's cmp_x_coordinate.
For simplicity, punt on order > field or width mismatches. Analogous
optimizations are possible, but the generic path works fine and no
commonly-used curve hits those cases.

Before:
Did 5888 ECDSA P-384 verify operations in 3094535us (1902.7 ops/sec)
After [+6.7%]:
Did 6107 ECDSA P-384 verify operations in 3007515us (2030.6 ops/sec)

Also we can fill in p - order generically and avoid extra copies of some
constants.

Change-Id: I38e1b6d51b28ed4f8cb74697b00a4f0fbc5efc3c
Reviewed-on: https://boringssl-review.googlesource.com/c/33068
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2018-11-13 01:48:21 +00:00
David Benjamin
0b3f497bcd Optimize EC_GFp_nistp256_method's cmp_x_coordinate.
Before:
Did 35496 ECDSA P-256 verify operations in 10027999us (3539.7 ops/sec)
After [+6.9%]:
Did 38170 ECDSA P-256 verify operations in 10090160us (3782.9 ops/sec)

Change-Id: Ib272d19954f46d96efc2b6d5dd480b5b85a34523
Reviewed-on: https://boringssl-review.googlesource.com/c/33067
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2018-11-13 00:52:18 +00:00
David Benjamin
fa3aadcd40 Push BIGNUM out of EC_METHOD's affine coordinates hook.
This is in preparation for removing the BIGNUM from cmp_x_coordinate.

Change-Id: Id8394248e3019a4897c238289f039f436a13679d
Reviewed-on: https://boringssl-review.googlesource.com/c/33064
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-11-12 21:32:53 +00:00
Adam Langley
9edbc7ff9f Revert "Revert "Speed up ECDSA verify on x86-64.""
This reverts commit e907ed4c4b. CPUID
checks have been added so hopefully this time sticks.

Change-Id: I5e0e5b87427c1230132681f936b3c70bac8263b8
Reviewed-on: https://boringssl-review.googlesource.com/c/32924
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-11-07 23:57:22 +00:00
Adam Langley
e907ed4c4b Revert "Speed up ECDSA verify on x86-64."
This reverts commit 3d450d2844. It fails
SDE, looks like a missing CPUID check before using vector instructions.

Change-Id: I6b7dd71d9e5b1f509d2e018bd8be38c973476b4e
Reviewed-on: https://boringssl-review.googlesource.com/c/32864
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2018-11-06 00:29:15 +00:00
David Benjamin
cfd50c63a1 Route the tuned add/dbl implementations out of EC_METHOD.
Some consumer stumbled upon EC_POINT_{add,dbl} being faster with a
"custom" P-224 curve than the built-in one and made "custom" clones to
work around this. Before the EC_FELEM refactor, EC_GFp_nistp224_method
used BN_mod_mul for all reductions in fallback point arithmetic (we
primarily support the multiplication functions and keep the low-level
point arithmetic for legacy reasons) which took quite a performance hit.

EC_FELEM fixed this, but standalone felem_{mul,sqr} calls out of
nistp224 perform a lot of reductions, rather than batching them up as
that implementation is intended. So it is still slightly faster to use a
"custom" curve.

Custom curves are the last thing we want to encourage, so just route the
tuned implementations out of EC_METHOD to close this gap. Now the
built-in implementation is always solidly faster than (or identical to)
the custom clone.  This also reduces the number of places where we mix
up tuned vs. generic implementation, which gets us closer to making
EC_POINT's representation EC_METHOD-specific.

Change-Id: I843e1101a6208eaabb56d29d342e886e523c78b4
Reviewed-on: https://boringssl-review.googlesource.com/c/32848
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2018-11-06 00:17:19 +00:00
Nir Drucker
3d450d2844 Speed up ECDSA verify on x86-64.
This commit improves the performance of ECDSA signature verification
(over NIST P-256 curve) for x86 platforms. The speedup is by a factor of 1.15x.
It does so by:
  1) Leveraging the fact that the verification does not need
     to run in constant time. To this end, we implemented:
    a) the function ecp_nistz256_points_mul_public in a similar way to
       the current ecp_nistz256_points_mul function by removing its constant
       time features.
    b) the Binary Extended Euclidean Algorithm (BEEU) in x86 assembly to
       replace the current modular inverse function used for the inversion.
  2) The last step in the ECDSA_verify function compares the (x) affine
     coordinate with the signature (r) value. Converting x from the Jacobian
     representation to the affine coordinate requires performing one inversion
     (x_affine = x * z^(-2)). We save this inversion and speed up the computations
     by instead bringing r to x (r_jacobian = r*z^2) which is faster.
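
The comparison in step 2 can be sketched with plain integers. This is only an illustration of the arithmetic, not the assembly or C from this change: the function names are hypothetical, and it assumes 0 < r < n, a nonzero Z, and the P-256 relation n < p < 2n, so the affine x reduced mod n can only equal r when x is r or r + n.

```python
# P-256 field prime p and group order n.
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551


def cmp_x_with_inversion(X, Z, r):
    """Baseline: convert to affine (one field inversion), reduce mod n, compare."""
    z_inv = pow(Z, P - 2, P)  # Fermat inversion; requires Z != 0 mod p
    x_affine = X * z_inv * z_inv % P
    return x_affine % N == r


def cmp_x_without_inversion(X, Z, r):
    """Bring r to Jacobian form instead: compare r*Z^2 with X mod p."""
    z2 = Z * Z % P
    if r * z2 % P == X % P:
        return True
    # The affine x is reduced mod n before the comparison, so it may also
    # equal r + n, provided that value is still a valid field element.
    if r + N < P:
        return (r + N) * z2 % P == X % P
    return False


# Example: a Jacobian X built from the affine value x = r + n still matches r.
Z = 0xDEADBEEF
r = 12345
X = (r + N) * (Z * Z) % P
print(cmp_x_with_inversion(X, Z, r), cmp_x_without_inversion(X, Z, r))
```

The second function replaces the inversion with at most three field multiplications, which is where the saving comes from.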

The measured results are:
Before (on a Kaby Lake desktop with gcc-5):
Did 26000 ECDSA P-224 signing operations in 1002372us (25938.5 ops/sec)
Did 11000 ECDSA P-224 verify operations in 1043821us (10538.2 ops/sec)
Did 55000 ECDSA P-256 signing operations in 1017560us (54050.9 ops/sec)
Did 17000 ECDSA P-256 verify operations in 1051280us (16170.8 ops/sec)

After (on a Kaby Lake desktop with gcc-5):
Did 27000 ECDSA P-224 signing operations in 1011287us (26698.7 ops/sec)
Did 11640 ECDSA P-224 verify operations in 1076698us (10810.8 ops/sec)
Did 55000 ECDSA P-256 signing operations in 1016880us (54087.0 ops/sec)
Did 20000 ECDSA P-256 verify operations in 1038736us (19254.2 ops/sec)

Before (on a Skylake server platform with gcc-5):
Did 25000 ECDSA P-224 signing operations in 1021651us (24470.2 ops/sec)
Did 10373 ECDSA P-224 verify operations in 1046563us (9911.5 ops/sec)
Did 50000 ECDSA P-256 signing operations in 1002774us (49861.7 ops/sec)
Did 15000 ECDSA P-256 verify operations in 1006471us (14903.6 ops/sec)

After (on a Skylake server platform with gcc-5):
Did 25000 ECDSA P-224 signing operations in 1020958us (24486.8 ops/sec)
Did 10373 ECDSA P-224 verify operations in 1046359us (9913.4 ops/sec)
Did 50000 ECDSA P-256 signing operations in 1003996us (49801.0 ops/sec)
Did 18000 ECDSA P-256 verify operations in 1021604us (17619.4 ops/sec)

Developers and authors:
***************************************************************************
Nir Drucker (1,2), Shay Gueron (1,2)
(1) Amazon Web Services Inc.
(2) University of Haifa, Israel
***************************************************************************

Change-Id: Idd42a7bc40626bce974ea000b61fdb5bad33851c
Reviewed-on: https://boringssl-review.googlesource.com/c/31304
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2018-11-05 23:48:07 +00:00
David Benjamin
041dd68cec Clear mallocs in ec_wNAF_mul.
EC_POINT is split into the existing public EC_POINT (where the caller is
sanity-checked about group mismatches) and the low-level EC_RAW_POINT
(which, like EC_FELEM and EC_SCALAR, assumes that is your problem and is
a plain old struct). Having both EC_POINT and EC_RAW_POINT is a little
silly, but we're going to want different type signatures for functions
which return void anyway (my plan is to lift a non-BIGNUM
get_affine_coordinates up through the ECDSA and ECDH code), so I think
it's fine.

This wasn't strictly necessary, but wnaf.c is a lot tidier now. Perf is
a wash; once we get up to this layer, it's only 8 entries in the table
so not particularly interesting.

Bug: 239
Change-Id: I8ace749393d359f42649a5bb0734597bb7c07a2e
Reviewed-on: https://boringssl-review.googlesource.com/27706
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2018-04-27 19:44:58 +00:00
David Benjamin
e14e4a7ee3 Remove ec_compute_wNAF's failure cases.
Replace them with asserts and better justify why each of the internal
cases is not reachable. Also change the loop to count up to bits+1 so
it is obvious there is no memory error. (The previous loop shape made
more sense when ec_compute_wNAF would return a variable length
schedule.)
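
For readers unfamiliar with the schedule being computed, here is a minimal sketch of a textbook width-w NAF recoding with a fixed-length loop. It is not a transcription of the C code; `compute_wnaf` and its parameters are named for illustration, and it assumes 0 <= scalar < 2^bits.

```python
def compute_wnaf(scalar, bits, w):
    """Return bits+1 signed digits d[i] with sum(d[i] * 2**i) == scalar.

    Each nonzero digit is odd and satisfies |d[i]| < 2**w.
    """
    assert 0 <= scalar < (1 << bits)
    digits = []
    k = scalar
    # Always emit exactly bits+1 digits so the output length is fixed.
    for _ in range(bits + 1):
        if k & 1:
            # Take the low w+1 bits as a signed value in (-2**w, 2**w).
            d = k % (1 << (w + 1))
            if d >= (1 << w):
                d -= 1 << (w + 1)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    # With bits+1 digits the scalar is always fully consumed.
    assert k == 0
    return digits


print(compute_wnaf(255, 8, 4))
```

The final assert plays the role the commit describes: the remaining value is provably zero after bits+1 steps, so there is no failure case to report.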

Change-Id: I9c7df6abac4290b7a3e545e3d4aa1462108e239e
Reviewed-on: https://boringssl-review.googlesource.com/27705
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2018-04-27 19:24:58 +00:00
David Benjamin
32e0d10069 Add EC_FELEM for EC_POINTs and related temporaries.
This introduces EC_FELEM, which is analogous to EC_SCALAR. It is used
for EC_POINT's representation in the generic EC_METHOD, as well as
random operations on tuned EC_METHODs that still are implemented
generically.

Unlike EC_SCALAR, EC_FELEM's exact representation is awkwardly specific
to the EC_METHOD, analogous to how the old values were BIGNUMs but may
or may not have been in Montgomery form. This is kind of a nuisance, but
no more than before. (If p224-64.c were easily convertible to Montgomery
form, we could say |EC_FELEM| is always in Montgomery form. If we
exposed the internal add and double implementations in each of the
curves, we could give |EC_POINT| an |EC_METHOD|-specific representation
and |EC_FELEM| is purely a |EC_GFp_mont_method| type. I'll leave this
for later.)

The generic add and doubling formulas are aligned with the formulas
proved in fiat-crypto. Those only applied to a = -3, so I've proved a
generic one in https://github.com/mit-plv/fiat-crypto/pull/356, in case
someone uses a custom curve.  The new formulas are verified,
constant-time, and swap a multiply for a square. As expressed in
fiat-crypto they do use more temporaries, but this seems to be fine with
stack-allocated EC_FELEMs. (We can try to help the compiler later,
but benchmarks below suggest this isn't necessary.)

Unlike BIGNUM, EC_FELEM can be stack-allocated. It also captures the
bounds in the type system and, in particular, that the width is correct,
which will make it easier to select a point in constant-time in the
future. (Indeed the old code did not always have the correct width. Its
point formula involved halving and implemented this in variable time and
variable width.)

Before:
Did 77274 ECDH P-256 operations in 10046087us (7692.0 ops/sec)
Did 5959 ECDH P-384 operations in 10031701us (594.0 ops/sec)
Did 10815 ECDSA P-384 signing operations in 10087892us (1072.1 ops/sec)
Did 8976 ECDSA P-384 verify operations in 10071038us (891.3 ops/sec)
Did 2600 ECDH P-521 operations in 10091688us (257.6 ops/sec)
Did 4590 ECDSA P-521 signing operations in 10055195us (456.5 ops/sec)
Did 3811 ECDSA P-521 verify operations in 10003574us (381.0 ops/sec)

After:
Did 77736 ECDH P-256 operations in 10029858us (7750.5 ops/sec) [+0.8%]
Did 7519 ECDH P-384 operations in 10068076us (746.8 ops/sec) [+25.7%]
Did 13335 ECDSA P-384 signing operations in 10029962us (1329.5 ops/sec) [+24.0%]
Did 11021 ECDSA P-384 verify operations in 10088600us (1092.4 ops/sec) [+22.6%]
Did 2912 ECDH P-521 operations in 10001325us (291.2 ops/sec) [+13.0%]
Did 5150 ECDSA P-521 signing operations in 10027462us (513.6 ops/sec) [+12.5%]
Did 4264 ECDSA P-521 verify operations in 10069694us (423.4 ops/sec) [+11.1%]

This more than pays for removing points_make_affine previously and even
speeds up ECDH P-256 slightly. (The point-on-curve check uses the
generic code.)

Next is to push the stack-allocating up to ec_wNAF_mul, followed by a
constant-time single-point multiplication.

Bug: 239
Change-Id: I44a2dff7c52522e491d0f8cffff64c4ab5cd353c
Reviewed-on: https://boringssl-review.googlesource.com/27668
Reviewed-by: Adam Langley <agl@google.com>
2018-04-25 16:39:58 +00:00
David Benjamin
364a51ec3a Abstract scalar inversion in EC_METHOD.
This introduces a hook for the OpenSSL assembly.

Change-Id: I35e0588f0ed5bed375b12f738d16c9f46ceedeea
Reviewed-on: https://boringssl-review.googlesource.com/27592
Reviewed-by: Adam Langley <alangley@gmail.com>
2018-04-24 16:13:24 +00:00
David Benjamin
5fca613918 Fix typo in point_add.
Rather than writing the answer into the output, it wrote it into some
awkwardly-named temporaries. Thanks to Daniel Hirche for reporting this
issue!

Bug: chromium:825273
Change-Id: I5def4be045cd1925453c9873218e5449bf25e3f5
Reviewed-on: https://boringssl-review.googlesource.com/26785
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-03-23 21:12:29 +00:00
David Benjamin
638a408cd2 Add a tuned variable-time P-256 multiplication function.
This reuses wnaf.c's window scheduling, but has access to the tuned
field arithmetic and pre-computed base point table. Unlike wnaf.c, we
do not make the points affine as it's not worth it for a single table.
(We already precomputed the base point table.)

Annoyingly, 32-bit x86 gets slower by a bit, but the other platforms are
faster. My guess is that the generic code gets to use the
bn_mul_mont assembly and the compiler, faced with the increased 32-bit
register pressure and the extremely register-poor x86, is making
bad decisions on the otherwise P-256-tuned C code. The three platforms
that see much larger gains are significantly more important than 32-bit
x86 at this point, so go with this change.

armv7a (Nexus 5X) before/after [+14.4%]:
Did 2703 ECDSA P-256 verify operations in 5034539us (536.9 ops/sec)
Did 3127 ECDSA P-256 verify operations in 5091379us (614.2 ops/sec)

aarch64 (Nexus 5X) before/after [+9.2%]:
Did 6783 ECDSA P-256 verify operations in 5031324us (1348.2 ops/sec)
Did 7410 ECDSA P-256 verify operations in 5033291us (1472.2 ops/sec)

x86 before/after [-2.7%]:
Did 8961 ECDSA P-256 verify operations in 10075901us (889.3 ops/sec)
Did 8568 ECDSA P-256 verify operations in 10003001us (856.5 ops/sec)

x86_64 before/after [+8.6%]:
Did 29808 ECDSA P-256 verify operations in 10008662us (2978.2 ops/sec)
Did 32528 ECDSA P-256 verify operations in 10057137us (3234.3 ops/sec)

Change-Id: I5fa643149f5bfbbda9533e3008baadfee9979b93
Reviewed-on: https://boringssl-review.googlesource.com/25684
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-02-12 22:00:48 +00:00
David Benjamin
0c9b7b5de2 Align various point_get_affine_coordinates implementations.
The P-224 implementation was missing the optimization to avoid doing
extra work when asking for only one coordinate (ECDH and ECDSA both
involve an x-coordinate query). The P-256 implementation was missing the
optimization to do one less Montgomery reduction.
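
The single-coordinate optimization can be sketched as follows. This is a plain-integer illustration with a hypothetical function name; it does not model the Montgomery-reduction saving, and it uses Fermat inversion only for brevity.

```python
P = 2**256 - 2**224 + 2**192 + 2**96 - 1  # P-256 field prime


def jacobian_to_affine(X, Y, Z, want_x=True, want_y=True):
    """Convert Jacobian (X, Y, Z) to affine, computing only what is requested.

    x = X / Z^2 and y = Y / Z^3 (mod p). A coordinate that was not requested
    is returned as None.
    """
    if Z % P == 0:
        raise ValueError("point at infinity has no affine coordinates")
    z_inv = pow(Z, P - 2, P)
    z_inv2 = z_inv * z_inv % P
    x = X * z_inv2 % P if want_x else None
    # The extra multiplications for y are skipped for x-only queries,
    # which is the case ECDH and ECDSA care about.
    y = Y * z_inv2 % P * z_inv % P if want_y else None
    return x, y


# Build a Jacobian triple from known affine values and recover only x.
x0, y0, z = 1234567, 7654321, 0xC0FFEE
X, Y = x0 * z * z % P, y0 * z * z * z % P
print(jacobian_to_affine(X, Y, z, want_y=False))
```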

TODO - Benchmarks

Change-Id: I268d9c24737c6da9efaf1c73395b73dd97355de7
Reviewed-on: https://boringssl-review.googlesource.com/24690
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-01-08 20:03:42 +00:00
David Benjamin
9112631c1f Remove ftmp* comments from P-256 addition code.
These are remnants of the old code which had a bunch of ftmp variables.

Change-Id: Id14cf414cb67ff08e240970767f7a5a58e883ce4
Reviewed-on: https://boringssl-review.googlesource.com/24689
Reviewed-by: Adam Langley <agl@google.com>
2018-01-08 19:51:03 +00:00
Andres Erbsen
0a54e99848 Add links to proofs of elliptic curve formulas.
Change-Id: I166f740185f26770b51759714efd5d634fbcc173
Reviewed-on: https://boringssl-review.googlesource.com/24424
Reviewed-by: David Benjamin <davidben@google.com>
2017-12-22 19:52:44 +00:00
David Benjamin
6fe960d174 Enable __asm__ and uint128_t code in clang-cl.
It actually works fine. I just forgot one of the typedefs last time.
This gives a roughly 2x improvement on P-256 in clang-cl +
OPENSSL_SMALL, the configuration used by Chrome.

Before:
Did 1302 ECDH P-256 operations in 1015000us (1282.8 ops/sec)
Did 4250 ECDSA P-256 signing operations in 1047000us (4059.2 ops/sec)
Did 1750 ECDSA P-256 verify operations in 1094000us (1599.6 ops/sec)

After:
Did 3250 ECDH P-256 operations in 1078000us (3014.8 ops/sec)
Did 8250 ECDSA P-256 signing operations in 1016000us (8120.1 ops/sec)
Did 3250 ECDSA P-256 verify operations in 1063000us (3057.4 ops/sec)

(These were taken on a VM, so the measurements are extremely noisy, but
this sort of improvement is visible regardless.)

Alas, we do need a little extra bit of fiddling because division does
not work (crbug.com/787617).

Bug: chromium:787617
Update-Note: This removes the MSan uint128_t workaround which does not
    appear to be necessary anymore.
Change-Id: I8361314608521e5bdaf0e7eeae7a02c33f55c69f
Reviewed-on: https://boringssl-review.googlesource.com/23984
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-12-11 22:46:26 +00:00
Andres Erbsen
46304abf7d ec/p256.c: fiat-crypto field arithmetic (64, 32)
The fiat-crypto-generated code uses the Montgomery form implementation
strategy, for both 32-bit and 64-bit code.
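
As background, Montgomery form stores a field element a as a*R mod p, so a product can be reduced with shifts and multiplications instead of a division by p. A minimal sketch with Python integers, assuming R = 2^256 as in the table-conversion script in this message; the function names are illustrative, and the generated code works limb by limb rather than on big integers.

```python
P = 2**256 - 2**224 + 2**192 + 2**96 - 1  # P-256 field prime
R = 2**256                                # Montgomery radix
P_NEG_INV = (-pow(P, -1, R)) % R          # -p^-1 mod R (needs Python 3.8+)


def to_mont(a):
    """Enter Montgomery form: a -> a*R mod p."""
    return a * R % P


def redc(t):
    """Montgomery reduction: return t * R^-1 mod p for 0 <= t < p*R."""
    m = (t % R) * P_NEG_INV % R
    u = (t + m * P) // R  # exact: t + m*p is a multiple of R by construction
    return u - P if u >= P else u


def mont_mul(x, y):
    """Multiply two Montgomery-form values; the result stays in Montgomery form."""
    return redc(x * y)


def from_mont(x):
    """Leave Montgomery form: x -> x*R^-1 mod p."""
    return redc(x)


a, b = 0x1234567890ABCDEF, 0xFEDCBA0987654321
print(from_mont(mont_mul(to_mont(a), to_mont(b))) == a * b % P)
```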

64-bit throughput seems slower, but the difference is smaller than noise
between repetitions (-2%?)

32-bit throughput has decreased significantly for ECDH (-40%). I am
attributing this to the change from variable-time scalar multiplication
to constant-time scalar multiplication. Due to the same bottleneck,
ECDSA verification still uses the old code (otherwise there would have
been a 60% throughput decrease). On the other hand, ECDSA signing
throughput has increased slightly (+10%), perhaps due to the use of a
precomputed table of multiples of the base point.

64-bit benchmarks (Google Cloud Haswell):

with this change:
Did 9126 ECDH P-256 operations in 1009572us (9039.5 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039832us (22119.0 ops/sec)
Did 8820 ECDSA P-256 verify operations in 1024242us (8611.2 ops/sec)

master (40e8c921ca):
Did 9340 ECDH P-256 operations in 1017975us (9175.1 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039820us (22119.2 ops/sec)
Did 8688 ECDSA P-256 verify operations in 1021108us (8508.4 ops/sec)

benchmarks on ARMv7 (LG Nexus 4):

with this change:
Did 150 ECDH P-256 operations in 1029726us (145.7 ops/sec)
Did 506 ECDSA P-256 signing operations in 1065192us (475.0 ops/sec)
Did 363 ECDSA P-256 verify operations in 1033298us (351.3 ops/sec)

master (2fce1beda0):
Did 245 ECDH P-256 operations in 1017518us (240.8 ops/sec)
Did 473 ECDSA P-256 signing operations in 1086281us (435.4 ops/sec)
Did 360 ECDSA P-256 verify operations in 1003846us (358.6 ops/sec)

64-bit tables converted as follows:

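import re, sys

# P-256 field prime and the Montgomery radix.
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
R = 2**256

def convert(t):
    # Groups alternate between 64-bit limbs and the separators between them.
    x0, s1, x1, s2, x2, s3, x3 = t.groups()
    # Reassemble the four little-endian limbs into one integer.
    v = int(x0, 0) + 2**64 * (int(x1, 0) + 2**64*(int(x2,0) + 2**64*(int(x3, 0)) ))
    # Convert to Montgomery form.
    w = v*R%p
    # Split the result back into 64-bit limbs.
    y0 = hex(w%(2**64))
    y1 = hex((w>>64)%(2**64))
    y2 = hex((w>>(2*64))%(2**64))
    y3 = hex((w>>(3*64))%(2**64))
    # Sanity check: the limbs must reassemble to the converted value.
    ww = int(y0, 0) + 2**64 * (int(y1, 0) + 2**64*(int(y2,0) + 2**64*(int(y3, 0)) ))
    if ww != v*R%p:
        print(x0,x1,x2,x3)
        print(hex(v))
        print(y0,y1,y2,y3)
        print(hex(w))
        print(hex(ww))
        assert 0
    # Reuse the original separators so the table layout is preserved.
    return '{'+y0+s1+y1+s2+y2+s3+y3+'}'

# Match a brace-enclosed list of four decimal or hexadecimal limbs.
fe_re = re.compile('{'+r'(\s*,\s*)'.join(r'(\d+|0x[abcdefABCDEF0123456789]+)' for i in range(4)) + '}')
print (re.sub(fe_re, convert, sys.stdin.read()).rstrip('\n'))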

32-bit tables converted from 64-bit tables

Change-Id: I52d6e5504fcb6ca2e8b0ee13727f4500c80c1799
Reviewed-on: https://boringssl-review.googlesource.com/23244
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-12-11 17:55:46 +00:00