boringssl/crypto/fipsmodule
David Benjamin 55db667c62 Enable vpaes for aarch64, with CTR optimizations.
This patches vpaes-armv8.pl to add vpaes_ctr32_encrypt_blocks. CTR mode
is by far the most important mode these days. It should have access to
_vpaes_encrypt_2x, which gives a considerable speed boost. Also exclude
vpaes_ecb_* as they're not even used.

For iOS, this change is completely a no-op. iOS ARMv8 always has crypto
extensions, and we already statically drop all other AES
implementations.

Android ARMv8 is *not* required to have crypto extensions, but every
ARMv8 device I've seen has them. For those, it is a no-op
performance-wise and a win on size. vpaes appears to be about 5.6KiB
smaller than the tables. ARMv8 always makes SIMD (NEON) available, so we
can statically drop aes_nohw.

In theory, however, crypto-less Android ARMv8 is possible. Today such
chips get a variable-time AES. This CL fixes this, but the performance
story is complex.

The Raspberry Pi 3 is not Android but has a Cortex-A53 chip
without crypto extensions. (But the official images are 32-bit, so even
this is slightly artificial...) There, vpaes is a performance win.

Raspberry Pi 3, Model B+, Cortex-A53
Before:
Did 265000 AES-128-GCM (16 bytes) seal operations in 1003312us (264125.2 ops/sec): 4.2 MB/s
Did 44000 AES-128-GCM (256 bytes) seal operations in 1002141us (43906.0 ops/sec): 11.2 MB/s
Did 9394 AES-128-GCM (1350 bytes) seal operations in 1032104us (9101.8 ops/sec): 12.3 MB/s
Did 1562 AES-128-GCM (8192 bytes) seal operations in 1008982us (1548.1 ops/sec): 12.7 MB/s
After:
Did 277000 AES-128-GCM (16 bytes) seal operations in 1001884us (276479.1 ops/sec): 4.4 MB/s
Did 52000 AES-128-GCM (256 bytes) seal operations in 1001480us (51923.2 ops/sec): 13.3 MB/s
Did 11000 AES-128-GCM (1350 bytes) seal operations in 1007979us (10912.9 ops/sec): 14.7 MB/s
Did 2013 AES-128-GCM (8192 bytes) seal operations in 1085545us (1854.4 ops/sec): 15.2 MB/s

The Pixel 3 has a Cortex-A75 with crypto extensions, so it would never
run this code. However, artificially ignoring them gives another data
point (ARM documentation[*] suggests the extensions are still optional
on a Cortex-A75.) Sadly, vpaes no longer wins on perf over aes_nohw.
But, it is constant-time:

Pixel 3, AES/PMULL extensions ignored, Cortex-A75:
Before:
Did 2102000 AES-128-GCM (16 bytes) seal operations in 1000378us (2101205.7 ops/sec): 33.6 MB/s
Did 358000 AES-128-GCM (256 bytes) seal operations in 1002658us (357051.0 ops/sec): 91.4 MB/s
Did 75000 AES-128-GCM (1350 bytes) seal operations in 1012830us (74049.9 ops/sec): 100.0 MB/s
Did 13000 AES-128-GCM (8192 bytes) seal operations in 1036524us (12541.9 ops/sec): 102.7 MB/s
After:
Did 1453000 AES-128-GCM (16 bytes) seal operations in 1000213us (1452690.6 ops/sec): 23.2 MB/s
Did 285000 AES-128-GCM (256 bytes) seal operations in 1002227us (284366.7 ops/sec): 72.8 MB/s
Did 60000 AES-128-GCM (1350 bytes) seal operations in 1016106us (59049.0 ops/sec): 79.7 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1094184us (10053.2 ops/sec): 82.4 MB/s

Note the numbers above run with PMULL off, so the slow GHASH is
dampening the regression. If we test aes_nohw and vpaes paired with
PMULL on, the 20% perf hit becomes a 31% hit. The PMULL-less variant is
more likely to represent a real chip.

This is consistent with upstream's note in the comment, though it is
unclear if 20% is the right order of magnitude: "these results are worse
than scalar compiler-generated code, but it's constant-time and
therefore preferred".

[*] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100458_0301_00_en/lau1442495529696.html

Bug: 246
Change-Id: If1dc87f5131fce742052498295476fbae4628dbf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35026
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-04 20:31:39 +00:00
..
aes Enable vpaes for aarch64, with CTR optimizations. 2019-03-04 20:31:39 +00:00
bn Add ABI tests for x86_64-mont5.pl. 2019-02-11 19:27:13 +00:00
cipher Enable vpaes for aarch64, with CTR optimizations. 2019-03-04 20:31:39 +00:00
des Move OPENSSL_FALLTHROUGH to internal headers. 2018-01-29 18:17:57 +00:00
digest Match OpenSSL's EVP_MD_CTX_reset return value. 2018-05-29 17:07:16 +00:00
ec Handle NULL public key in |EC_KEY_set_public_key|. 2019-03-04 19:45:29 +00:00
ecdh Clean up EC_POINT to byte conversions. 2018-11-13 17:27:59 +00:00
ecdsa Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
hmac Switch OPENSSL_VERSION_NUMBER to 1.1.0. 2017-09-29 04:51:27 +00:00
md4 Run the comment converter on libcrypto. 2017-08-18 21:49:04 +00:00
md5 Add ABI tests for MD5. 2019-01-08 18:01:07 +00:00
modes Add a 32-bit SSSE3 GHASH implementation. 2019-03-04 19:02:52 +00:00
policydocs Include details about latest FIPS certification. 2018-11-05 19:03:25 +00:00
rand Clear out a bunch of -Wextra-semi warnings. 2019-02-21 19:12:39 +00:00
rsa Clear out a bunch of -Wextra-semi warnings. 2019-02-21 19:12:39 +00:00
self_check Always print some diagnostic information when POST fails. 2018-09-28 19:33:38 +00:00
sha Remove union from |SHA512_CTX|. 2019-01-22 23:36:46 +00:00
tls Fix include path. 2018-05-08 16:26:05 +00:00
bcm.c Always print some diagnostic information when POST fails. 2018-09-28 19:33:38 +00:00
CMakeLists.txt Enable vpaes for aarch64, with CTR optimizations. 2019-03-04 20:31:39 +00:00
delocate.h Use a pool of |rand_state| objects. 2018-07-06 21:25:37 +00:00
FIPS.md Include details about latest FIPS certification. 2018-11-05 19:03:25 +00:00
intcheck1.png
intcheck2.png Inject FIPS hash without running module. 2017-04-12 23:09:38 +00:00
intcheck3.png
is_fips.c Add some more compatibility functions. 2018-05-08 20:51:15 +00:00