boringssl/crypto
David Benjamin 55db667c62 Enable vpaes for aarch64, with CTR optimizations.
This patches vpaes-armv8.pl to add vpaes_ctr32_encrypt_blocks. CTR mode
is by far the most important mode these days. It should have access to
_vpaes_encrypt_2x, which gives a considerable speed boost. Also exclude
vpaes_ecb_* as they're not even used.

For iOS, this change is completely a no-op. iOS ARMv8 always has crypto
extensions, and we already statically drop all other AES
implementations.

Android ARMv8 is *not* required to have crypto extensions, but every
ARMv8 device I've seen has them. For those, it is a no-op
performance-wise and a win on size. vpaes appears to be about 5.6KiB
smaller than the tables. ARMv8 always makes SIMD (NEON) available, so we
can statically drop aes_nohw.

In theory, however, crypto-less Android ARMv8 is possible. Today such
chips get a variable-time AES. This CL fixes this, but the performance
story is complex.

The Raspberry Pi 3 is not Android but has a Cortex-A53 chip
without crypto extensions. (But the official images are 32-bit, so even
this is slightly artificial...) There, vpaes is a performance win.

Raspberry Pi 3, Model B+, Cortex-A53
Before:
Did 265000 AES-128-GCM (16 bytes) seal operations in 1003312us (264125.2 ops/sec): 4.2 MB/s
Did 44000 AES-128-GCM (256 bytes) seal operations in 1002141us (43906.0 ops/sec): 11.2 MB/s
Did 9394 AES-128-GCM (1350 bytes) seal operations in 1032104us (9101.8 ops/sec): 12.3 MB/s
Did 1562 AES-128-GCM (8192 bytes) seal operations in 1008982us (1548.1 ops/sec): 12.7 MB/s
After:
Did 277000 AES-128-GCM (16 bytes) seal operations in 1001884us (276479.1 ops/sec): 4.4 MB/s
Did 52000 AES-128-GCM (256 bytes) seal operations in 1001480us (51923.2 ops/sec): 13.3 MB/s
Did 11000 AES-128-GCM (1350 bytes) seal operations in 1007979us (10912.9 ops/sec): 14.7 MB/s
Did 2013 AES-128-GCM (8192 bytes) seal operations in 1085545us (1854.4 ops/sec): 15.2 MB/s

The Pixel 3 has a Cortex-A75 with crypto extensions, so it would never
run this code. However, artificially ignoring them gives another data
point (ARM documentation[*] suggests the extensions are still optional
on a Cortex-A75.) Sadly, vpaes no longer wins on perf over aes_nohw.
But, it is constant-time:

Pixel 3, AES/PMULL extensions ignored, Cortex-A75:
Before:
Did 2102000 AES-128-GCM (16 bytes) seal operations in 1000378us (2101205.7 ops/sec): 33.6 MB/s
Did 358000 AES-128-GCM (256 bytes) seal operations in 1002658us (357051.0 ops/sec): 91.4 MB/s
Did 75000 AES-128-GCM (1350 bytes) seal operations in 1012830us (74049.9 ops/sec): 100.0 MB/s
Did 13000 AES-128-GCM (8192 bytes) seal operations in 1036524us (12541.9 ops/sec): 102.7 MB/s
After:
Did 1453000 AES-128-GCM (16 bytes) seal operations in 1000213us (1452690.6 ops/sec): 23.2 MB/s
Did 285000 AES-128-GCM (256 bytes) seal operations in 1002227us (284366.7 ops/sec): 72.8 MB/s
Did 60000 AES-128-GCM (1350 bytes) seal operations in 1016106us (59049.0 ops/sec): 79.7 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1094184us (10053.2 ops/sec): 82.4 MB/s

Note the numbers above run with PMULL off, so the slow GHASH is
dampening the regression. If we test aes_nohw and vpaes paired with
PMULL on, the 20% perf hit becomes a 31% hit. The PMULL-less variant is
more likely to represent a real chip.

This is consistent with upstream's note in the comment, though it is
unclear if 20% is the right order of magnitude: "these results are worse
than scalar compiler-generated code, but it's constant-time and
therefore preferred".

[*] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100458_0301_00_en/lau1442495529696.html

Bug: 246
Change-Id: If1dc87f5131fce742052498295476fbae4628dbf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35026
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-04 20:31:39 +00:00
..
asn1 Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
base64 Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
bio Fix d2i_*_bio on partial reads. 2018-12-05 22:05:28 +00:00
bn_extra Add some Node compatibility functions. 2019-01-25 16:50:30 +00:00
buf Flatten most of the crypto target. 2018-09-05 23:41:25 +00:00
bytestring Add uint64_t support in CBS and CBB. 2019-02-22 20:38:17 +00:00
chacha Add ABI tests for ChaCha20_ctr32. 2019-01-09 03:11:45 +00:00
cipher_extra sync EVP_get_cipherbyname with EVP_do_all_sorted 2019-02-11 17:20:23 +00:00
cmac Flatten most of the crypto target. 2018-09-05 23:41:25 +00:00
conf Use proper functions for lh_*. 2018-10-15 23:37:04 +00:00
curve25519 Automatically disable assembly with MSAN. 2018-09-07 21:12:37 +00:00
dh Flatten most of the crypto target. 2018-09-05 23:41:25 +00:00
digest_extra Fix undefined pointer casts in SHA-512 code. 2019-01-22 23:18:36 +00:00
dsa Tidy up dsa_sign_setup. 2018-10-25 21:51:57 +00:00
ec_extra Use EC_RAW_POINT in ECDSA. 2018-11-13 02:06:46 +00:00
ecdh_extra Clean up EC_POINT to byte conversions. 2018-11-13 17:27:59 +00:00
ecdsa_extra Remove unreachable code. 2018-11-12 23:34:36 +00:00
engine Flatten most of the crypto target. 2018-09-05 23:41:25 +00:00
err Enforce key usage for RSA keys in TLS 1.2. 2019-01-30 21:28:34 +00:00
evp Add a very roundabout EC keygen API. 2019-01-25 23:08:12 +00:00
fipsmodule Enable vpaes for aarch64, with CTR optimizations. 2019-03-04 20:31:39 +00:00
hkdf Flatten most of the crypto target. 2018-09-05 23:41:25 +00:00
hmac_extra Convert a number of tests to GTest. 2017-06-01 17:02:13 +00:00
hrss HRSS: flatten sample distribution. 2019-01-22 22:06:43 +00:00
lhash Fix undefined function pointer casts in LHASH. 2018-10-15 23:53:24 +00:00
obj Add initial HRSS support. 2018-12-12 17:35:02 +00:00
pem Rewrite PEM_X509_INFO_read_bio. 2018-10-01 17:35:10 +00:00
perlasm Fix x86_64-xlate.pl comment regex. 2019-02-21 16:50:17 +00:00
pkcs7 Fix undefined function pointer casts in {d2i,i2d}_Foo_{bio,fp} 2018-10-01 17:34:53 +00:00
pkcs8 Fix undefined function pointer casts in {d2i,i2d}_Foo_{bio,fp} 2018-10-01 17:34:53 +00:00
poly1305 Automatically disable assembly with MSAN. 2018-09-07 21:12:37 +00:00
pool Clear out a bunch of -Wextra-semi warnings. 2019-02-21 19:12:39 +00:00
rand_extra Unwind RDRAND functions correctly on Windows. 2019-02-12 20:24:27 +00:00
rc4 Flatten most of the crypto target. 2018-09-05 23:41:25 +00:00
rsa_extra Rename OPENSSL_NO_THREADS, part 1. 2018-09-26 19:10:02 +00:00
stack Don't pass NULL,0 to qsort. 2019-01-22 23:28:38 +00:00
test Add a reference for Linux ARM ABI. 2019-02-27 17:18:02 +00:00
x509 Fix d2i_*_bio on partial reads. 2018-12-05 22:05:28 +00:00
x509v3 Unexport and rename hex_to_string, string_to_hex, and name_cmp. 2018-11-27 00:08:39 +00:00
abi_self_test.cc Use Windows symbol APIs in the unwind tester. 2019-02-12 20:42:47 +00:00
CMakeLists.txt Implement ABI testing for aarch64. 2019-02-05 21:44:04 +00:00
compiler_test.cc Test that nullptr has the obvious memory representation. 2017-07-28 17:39:28 +00:00
constant_time_test.cc Add a test for CRYPTO_memcmp. 2018-03-27 16:22:47 +00:00
cpu-aarch64-fuchsia.c Add cpu-aarch64-fuchsia.c 2018-02-13 20:12:47 +00:00
cpu-aarch64-linux.c Add cpu-aarch64-fuchsia.c 2018-02-13 20:12:47 +00:00
cpu-arm-linux_test.cc Move ARM cpuinfo functions to the header. 2018-11-21 00:46:57 +00:00
cpu-arm-linux.c Move ARM cpuinfo functions to the header. 2018-11-21 00:46:57 +00:00
cpu-arm-linux.h Move ARM cpuinfo functions to the header. 2018-11-21 00:46:57 +00:00
cpu-arm.c Rewrite ARM feature detection. 2016-03-26 04:54:44 +00:00
cpu-intel.c Pretend AMD XOP was never a thing. 2018-12-03 22:59:55 +00:00
cpu-ppc64le.c Run the comment converter on libcrypto. 2017-08-18 21:49:04 +00:00
crypto.c Add test of assembly code dispatch. 2019-01-22 20:22:53 +00:00
ex_data.c Unexport more of lhash. 2017-10-25 04:17:18 +00:00
impl_dispatch_test.cc Enable vpaes for AES_* functions. 2019-02-22 23:09:19 +00:00
internal.h Fix header file for _byteswap_ulong and _byteswap_uint64 from MSVC CRT 2019-01-14 19:49:39 +00:00
mem.c silence unused variable warnings when using OPENSSL_clear_free 2019-03-04 19:55:29 +00:00
refcount_c11.c Run the comment converter on libcrypto. 2017-08-18 21:49:04 +00:00
refcount_lock.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
refcount_test.cc Rename OPENSSL_NO_THREADS, part 1. 2018-09-26 19:10:02 +00:00
self_test.cc Extract FIPS KAT tests into a function. 2018-01-22 20:16:38 +00:00
thread_none.c Rename OPENSSL_NO_THREADS, part 1. 2018-09-26 19:10:02 +00:00
thread_pthread.c Modernize OPENSSL_COMPILE_ASSERT, part 2. 2018-11-14 16:06:37 +00:00
thread_test.cc Rename OPENSSL_NO_THREADS, part 1. 2018-09-26 19:10:02 +00:00
thread_win.c Replace the last CRITICAL_SECTION with SRWLOCK. 2018-12-03 20:37:35 +00:00
thread.c Remove a bunch of unnecessary includes. 2016-06-28 20:31:14 +00:00