638a408cd2
This reuses wnaf.c's window scheduling, but has access to the tuned field arithemetic and pre-computed base point table. Unlike wnaf.c, we do not make the points affine as it's not worth it for a single table. (We already precomputed the base point table.) Annoyingly, 32-bit x86 gets slower by a bit, but the other platforms are faster. My guess is that that the generic code gets to use the bn_mul_mont assembly and the compiler, faced with the increased 32-bit register pressure and the extremely register-poor x86, is making bad decisions on the otherwise P-256-tuned C code. The three platforms that see much larger gains are significantly more important than 32-bit x86 at this point, so go with this change. armv7a (Nexus 5X) before/after [+14.4%]: Did 2703 ECDSA P-256 verify operations in 5034539us (536.9 ops/sec) Did 3127 ECDSA P-256 verify operations in 5091379us (614.2 ops/sec) aarch64 (Nexus 5X) before/after [+9.2%]: Did 6783 ECDSA P-256 verify operations in 5031324us (1348.2 ops/sec) Did 7410 ECDSA P-256 verify operations in 5033291us (1472.2 ops/sec) x86 before/after [-2.7%]: Did 8961 ECDSA P-256 verify operations in 10075901us (889.3 ops/sec) Did 8568 ECDSA P-256 verify operations in 10003001us (856.5 ops/sec) x86_64 before/after [+8.6%]: Did 29808 ECDSA P-256 verify operations in 10008662us (2978.2 ops/sec) Did 32528 ECDSA P-256 verify operations in 10057137us (3234.3 ops/sec) Change-Id: I5fa643149f5bfbbda9533e3008baadfee9979b93 Reviewed-on: https://boringssl-review.googlesource.com/25684 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> |
||
---|---|---|
.. | ||
aes | ||
bn | ||
cipher | ||
des | ||
digest | ||
ec | ||
ecdsa | ||
hmac | ||
md4 | ||
md5 | ||
modes | ||
policydocs | ||
rand | ||
rsa | ||
self_check | ||
sha | ||
tls | ||
bcm.c | ||
CMakeLists.txt | ||
delocate.h | ||
FIPS.md | ||
intcheck1.png | ||
intcheck2.png | ||
intcheck3.png | ||
is_fips.c |