boringssl

David Benjamin 55db667c62 Enable vpaes for aarch64, with CTR optimizations.

This patches vpaes-armv8.pl to add vpaes_ctr32_encrypt_blocks. CTR mode is by far the most important mode these days. It should have access to _vpaes_encrypt_2x, which gives a considerable speed boost. Also exclude vpaes_ecb_* as they're not even used.

For iOS, this change is completely a no-op. iOS ARMv8 always has crypto extensions, and we already statically drop all other AES implementations.

Android ARMv8 is not required to have crypto extensions, but every ARMv8 device I've seen has them. For those, it is a no-op performance-wise and a win on size. vpaes appears to be about 5.6KiB smaller than the tables. ARMv8 always makes SIMD (NEON) available, so we can statically drop aes_nohw.

In theory, however, crypto-less Android ARMv8 is possible. Today such chips get a variable-time AES. This CL fixes this, but the performance story is complex.

The Raspberry Pi 3 is not Android but has a Cortex-A53 chip without crypto extensions. (But the official images are 32-bit, so even this is slightly artificial...) There, vpaes is a performance win.

Raspberry Pi 3, Model B+, Cortex-A53
Before:
Did 265000 AES-128-GCM (16 bytes) seal operations in 1003312us (264125.2 ops/sec): 4.2 MB/s
Did 44000 AES-128-GCM (256 bytes) seal operations in 1002141us (43906.0 ops/sec): 11.2 MB/s
Did 9394 AES-128-GCM (1350 bytes) seal operations in 1032104us (9101.8 ops/sec): 12.3 MB/s
Did 1562 AES-128-GCM (8192 bytes) seal operations in 1008982us (1548.1 ops/sec): 12.7 MB/s
After:
Did 277000 AES-128-GCM (16 bytes) seal operations in 1001884us (276479.1 ops/sec): 4.4 MB/s
Did 52000 AES-128-GCM (256 bytes) seal operations in 1001480us (51923.2 ops/sec): 13.3 MB/s
Did 11000 AES-128-GCM (1350 bytes) seal operations in 1007979us (10912.9 ops/sec): 14.7 MB/s
Did 2013 AES-128-GCM (8192 bytes) seal operations in 1085545us (1854.4 ops/sec): 15.2 MB/s

The Pixel 3 has a Cortex-A75 with crypto extensions, so it would never run this code. However, artificially ignoring them gives another data point (ARM documentation[] suggests the extensions are still optional on a Cortex-A75.) Sadly, vpaes no longer wins on perf over aes_nohw. But, it is constant-time:

Pixel 3, AES/PMULL extensions ignored, Cortex-A75:
Before:
Did 2102000 AES-128-GCM (16 bytes) seal operations in 1000378us (2101205.7 ops/sec): 33.6 MB/s
Did 358000 AES-128-GCM (256 bytes) seal operations in 1002658us (357051.0 ops/sec): 91.4 MB/s
Did 75000 AES-128-GCM (1350 bytes) seal operations in 1012830us (74049.9 ops/sec): 100.0 MB/s
Did 13000 AES-128-GCM (8192 bytes) seal operations in 1036524us (12541.9 ops/sec): 102.7 MB/s
After:
Did 1453000 AES-128-GCM (16 bytes) seal operations in 1000213us (1452690.6 ops/sec): 23.2 MB/s
Did 285000 AES-128-GCM (256 bytes) seal operations in 1002227us (284366.7 ops/sec): 72.8 MB/s
Did 60000 AES-128-GCM (1350 bytes) seal operations in 1016106us (59049.0 ops/sec): 79.7 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1094184us (10053.2 ops/sec): 82.4 MB/s

Note the numbers above run with PMULL off, so the slow GHASH is dampening the regression. If we test aes_nohw and vpaes paired with PMULL on, the 20% perf hit becomes a 31% hit. The PMULL-less variant is more likely to represent a real chip.

This is consistent with upstream's note in the comment, though it is unclear if 20% is the right order of magnitude: "these results are worse than scalar compiler-generated code, but it's constant-time and therefore preferred".

[] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100458_0301_00_en/lau1442495529696.html

Bug: 246
Change-Id: If1dc87f5131fce742052498295476fbae4628dbf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35026
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>

2019-03-04 20:31:39 +00:00
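
The "20% perf hit" quoted in the commit message can be checked against the reported throughput figures. The following is a minimal sketch, not part of the commit: the helper name percent_change is hypothetical, and the MB/s values are the 8192-byte AES-128-GCM rows copied from the benchmark output above.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


# MB/s for 8192-byte AES-128-GCM seal operations, taken from the commit message.
results = {
    "Raspberry Pi 3, Cortex-A53": (12.7, 15.2),
    "Pixel 3, extensions ignored, Cortex-A75": (102.7, 82.4),
}

for name, (before, after) in results.items():
    print(f"{name}: {percent_change(before, after):+.1f}%")
```

On these rows the Raspberry Pi 3 gains a little under 20% and the Pixel 3 (with AES/PMULL ignored) loses a little under 20%, which matches the figure the message rounds to. The 31% figure for the PMULL-on pairing cannot be reproduced this way because the commit message does not include those raw numbers.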
..
asn1	Modernize OPENSSL_COMPILE_ASSERT, part 2.	2018-11-14 16:06:37 +00:00
base64	Modernize OPENSSL_COMPILE_ASSERT, part 2.	2018-11-14 16:06:37 +00:00
bio	Fix d2i_*_bio on partial reads.	2018-12-05 22:05:28 +00:00
bn_extra	Add some Node compatibility functions.	2019-01-25 16:50:30 +00:00
buf	Flatten most of the crypto target.	2018-09-05 23:41:25 +00:00
bytestring	Add uint64_t support in CBS and CBB.	2019-02-22 20:38:17 +00:00
chacha	Add ABI tests for ChaCha20_ctr32.	2019-01-09 03:11:45 +00:00
cipher_extra	sync EVP_get_cipherbyname with EVP_do_all_sorted	2019-02-11 17:20:23 +00:00
cmac	Flatten most of the crypto target.	2018-09-05 23:41:25 +00:00
conf	Use proper functions for lh_*.	2018-10-15 23:37:04 +00:00
curve25519	Automatically disable assembly with MSAN.	2018-09-07 21:12:37 +00:00
dh	Flatten most of the crypto target.	2018-09-05 23:41:25 +00:00
digest_extra	Fix undefined pointer casts in SHA-512 code.	2019-01-22 23:18:36 +00:00
dsa	Tidy up dsa_sign_setup.	2018-10-25 21:51:57 +00:00
ec_extra	Use EC_RAW_POINT in ECDSA.	2018-11-13 02:06:46 +00:00
ecdh_extra	Clean up EC_POINT to byte conversions.	2018-11-13 17:27:59 +00:00
ecdsa_extra	Remove unreachable code.	2018-11-12 23:34:36 +00:00
engine	Flatten most of the crypto target.	2018-09-05 23:41:25 +00:00
err	Enforce key usage for RSA keys in TLS 1.2.	2019-01-30 21:28:34 +00:00
evp	Add a very roundabout EC keygen API.	2019-01-25 23:08:12 +00:00
fipsmodule	Enable vpaes for aarch64, with CTR optimizations.	2019-03-04 20:31:39 +00:00
hkdf	Flatten most of the crypto target.	2018-09-05 23:41:25 +00:00
hmac_extra	Convert a number of tests to GTest.	2017-06-01 17:02:13 +00:00
hrss	HRSS: flatten sample distribution.	2019-01-22 22:06:43 +00:00
lhash	Fix undefined function pointer casts in LHASH.	2018-10-15 23:53:24 +00:00
obj	Add initial HRSS support.	2018-12-12 17:35:02 +00:00
pem	Rewrite PEM_X509_INFO_read_bio.	2018-10-01 17:35:10 +00:00
perlasm	Fix x86_64-xlate.pl comment regex.	2019-02-21 16:50:17 +00:00
pkcs7	Fix undefined function pointer casts in {d2i,i2d}_Foo_{bio,fp}	2018-10-01 17:34:53 +00:00
pkcs8	Fix undefined function pointer casts in {d2i,i2d}_Foo_{bio,fp}	2018-10-01 17:34:53 +00:00
poly1305	Automatically disable assembly with MSAN.	2018-09-07 21:12:37 +00:00
pool	Clear out a bunch of -Wextra-semi warnings.	2019-02-21 19:12:39 +00:00
rand_extra	Unwind RDRAND functions correctly on Windows.	2019-02-12 20:24:27 +00:00
rc4	Flatten most of the crypto target.	2018-09-05 23:41:25 +00:00
rsa_extra	Rename OPENSSL_NO_THREADS, part 1.	2018-09-26 19:10:02 +00:00
stack	Don't pass NULL,0 to qsort.	2019-01-22 23:28:38 +00:00
test	Add a reference for Linux ARM ABI.	2019-02-27 17:18:02 +00:00
x509	Fix d2i_*_bio on partial reads.	2018-12-05 22:05:28 +00:00
x509v3	Unexport and rename hex_to_string, string_to_hex, and name_cmp.	2018-11-27 00:08:39 +00:00
abi_self_test.cc	Use Windows symbol APIs in the unwind tester.	2019-02-12 20:42:47 +00:00
CMakeLists.txt	Implement ABI testing for aarch64.	2019-02-05 21:44:04 +00:00
compiler_test.cc	Test that nullptr has the obvious memory representation.	2017-07-28 17:39:28 +00:00
constant_time_test.cc	Add a test for CRYPTO_memcmp.	2018-03-27 16:22:47 +00:00
cpu-aarch64-fuchsia.c	Add cpu-aarch64-fuchsia.c	2018-02-13 20:12:47 +00:00
cpu-aarch64-linux.c	Add cpu-aarch64-fuchsia.c	2018-02-13 20:12:47 +00:00
cpu-arm-linux_test.cc	Move ARM cpuinfo functions to the header.	2018-11-21 00:46:57 +00:00
cpu-arm-linux.c	Move ARM cpuinfo functions to the header.	2018-11-21 00:46:57 +00:00
cpu-arm-linux.h	Move ARM cpuinfo functions to the header.	2018-11-21 00:46:57 +00:00
cpu-arm.c	Rewrite ARM feature detection.	2016-03-26 04:54:44 +00:00
cpu-intel.c	Pretend AMD XOP was never a thing.	2018-12-03 22:59:55 +00:00
cpu-ppc64le.c	Run the comment converter on libcrypto.	2017-08-18 21:49:04 +00:00
crypto.c	Add test of assembly code dispatch.	2019-01-22 20:22:53 +00:00
ex_data.c	Unexport more of lhash.	2017-10-25 04:17:18 +00:00
impl_dispatch_test.cc	Enable vpaes for AES_* functions.	2019-02-22 23:09:19 +00:00
internal.h	Fix header file for _byteswap_ulong and _byteswap_uint64 from MSVC CRT	2019-01-14 19:49:39 +00:00
mem.c	silence unused variable warnings when using OPENSSL_clear_free	2019-03-04 19:55:29 +00:00
refcount_c11.c	Run the comment converter on libcrypto.	2017-08-18 21:49:04 +00:00
refcount_lock.c	Modernize OPENSSL_COMPILE_ASSERT, part 2.	2018-11-14 16:06:37 +00:00
refcount_test.cc	Rename OPENSSL_NO_THREADS, part 1.	2018-09-26 19:10:02 +00:00
self_test.cc	Extract FIPS KAT tests into a function.	2018-01-22 20:16:38 +00:00
thread_none.c	Rename OPENSSL_NO_THREADS, part 1.	2018-09-26 19:10:02 +00:00
thread_pthread.c	Modernize OPENSSL_COMPILE_ASSERT, part 2.	2018-11-14 16:06:37 +00:00
thread_test.cc	Rename OPENSSL_NO_THREADS, part 1.	2018-09-26 19:10:02 +00:00
thread_win.c	Replace the last CRITICAL_SECTION with SRWLOCK.	2018-12-03 20:37:35 +00:00
thread.c	Remove a bunch of unnecessary includes.	2016-06-28 20:31:14 +00:00