boringssl

Author	SHA1	Message	Date
David Benjamin	d22578f366	Adapt gcm_*_neon to aarch64. This makes AES-GCM always constant-time on aarch64 (provided assembly is enabled). Unlike vpaes, this does come at a binary size penalty of 1K compared to the gcm_*_4bit version. ABI testing already covered by GCMTest.ABI (GHASH_ASM_ARM covers both OPENSSL_ARM and OPENSSL_AARCH64.) Cortex-A53 (Raspberry Pi 3 Model B+) Before: Did 274000 AES-128-GCM (16 bytes) seal operations in 1003461us (273055.0 ops/sec): 4.4 MB/s Did 53000 AES-128-GCM (256 bytes) seal operations in 1007689us (52595.6 ops/sec): 13.5 MB/s Did 12000 AES-128-GCM (1350 bytes) seal operations in 1075908us (11153.4 ops/sec): 15.1 MB/s Did 2068 AES-128-GCM (8192 bytes) seal operations in 1089037us (1898.9 ops/sec): 15.6 MB/s After: Did 298000 AES-128-GCM (16 bytes) seal operations in 1002917us (297133.3 ops/sec): 4.8 MB/s Did 64000 AES-128-GCM (256 bytes) seal operations in 1001124us (63928.1 ops/sec): 16.4 MB/s Did 14000 AES-128-GCM (1350 bytes) seal operations in 1015477us (13786.6 ops/sec): 18.6 MB/s Did 2497 AES-128-GCM (8192 bytes) seal operations in 1057951us (2360.2 ops/sec): 19.3 MB/s Bug: 265 Change-Id: I251bf0f2eae0578580bb14192755e5d8ff64cd14 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35285 Reviewed-by: Adam Langley <agl@google.com>	2019-03-14 21:43:27 +00:00
David Benjamin	5ce12e6436	Add a 32-bit SSSE3 GHASH implementation. The 64-bit version can be fairly straightforwardly translated. Ironically, this makes 32-bit x86 the first architecture to meet the goal of constant-time AES-GCM given SIMD assembly. (Though x86_64 could join by simply giving up on bsaes...) Bug: 263 Change-Id: Icb2cec936457fac7132bbb5dbb094433bc14b86e Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35024 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-04 19:02:52 +00:00
David Benjamin	a57435e138	Remove __ARM_ARCH__ guard on gcm_*_v8. OpenSSL's c1669e1c205dc8e695fb0c10a655f434e758b9f7 switched it to __ARM_MAX_ARCH__, which we mirrored in assembly but not C. The C version should be __ARM_MAX_ARCH__ to match. However, __ARM_MAX_ARCH__ is hardcoded to 8, so just remove the check. Change-Id: Ic873203db1478f49437b889b84ee7fb28eba1a6d Reviewed-on: https://boringssl-review.googlesource.com/c/35045 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-27 02:26:21 +00:00
David Benjamin	104306f587	Remove STRICT_ALIGNMENT code from modes. STRICT_ALIGNMENT is a remnant of OpenSSL code that would cast pointers to size_t* and load more than one byte at a time. Not all architectures support unaligned access, so it did an alignment check and only entered this path if aligned or the underlying architecture didn't care. This is UB. Unaligned casts in C are undefined on all architectures, so we switched these to memcpy some time ago. Compilers can optimize memcpy to the unaligned accesses we wanted. That left our modes logic as: - If STRICT_ALIGNMENT is 1 and things are unaligned, work byte-by-byte. - Otherwise, use the memcpy-based word-by-word code, which now works independent of STRICT_ALIGNMENT. Remove the first check to simplify things. On x86, x86_64, and aarch64, STRICT_ALIGNMENT is zero and this is a no-op. ARM is more complex. Per [0], ARMv7 and up support unaligned access. ARMv5 does not. ARMv6 does, but can run in a mode where it looks more like ARMv5. For ARMv7 and up, STRICT_ALIGNMENT should have been zero, but was one. Thus this change should be an improvement for ARMv7 (right now unaligned inputs lose bsaes-armv7). The Android NDK does not even support the pre-ARMv7 ABI anymore[1]. Nonetheless, Cronet still supports ARMv6 as a library. It builds with -march=armv6 which GCC interprets as supporting unaligned access, so it too did not want this code. For completeness, should anyone still care about ARMv5 or be building with an overly permissive -march flag, GCC does appear unable to inline the memcpy calls. However, GCC also does not interpret (uintptr_t)ptr % sizeof(size_t) as an alignment assertion, so such consumers have already been paying for the memcpy here and throughout the library. In general, C's arcane pointer rules mean we must resort to memcpy often, so, realistically, we must require that the compiler optimize memcpy well. [0] https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965 [1] https://developer.android.com/ndk/guides/abis#armeabi Change-Id: I3c7dea562adaeb663032e395499e69530dd8e145 Reviewed-on: https://boringssl-review.googlesource.com/c/34873 Reviewed-by: Adam Langley <agl@google.com>	2019-02-14 17:39:36 +00:00
David Benjamin	fb35b147ca	Remove stray prototype. The function's since been renamed. Change-Id: Id1a9788dfeb5c46b3463611b08318b3f253d03df Reviewed-on: https://boringssl-review.googlesource.com/c/34870 Reviewed-by: Adam Langley <agl@google.com>	2019-02-14 17:31:14 +00:00
David Benjamin	4545503926	Add a constant-time pshufb-based GHASH implementation. We currently require clmul instructions for constant-time GHASH on x86_64. Otherwise, it falls back to a variable-time 4-bit table implementation. However, a significant proportion of clients lack these instructions. Inspired by vpaes, we can use pshufb and a slightly different order of incorporating the bits to make a constant-time GHASH. This requires SSSE3, which is very common. Benchmarking old machines we had on hand, it appears to be a no-op on Sandy Bridge and a small slowdown for Penryn. Sandy Bridge (Intel Pentium CPU 987 @ 1.50GHz): (Note: these numbers are before 16-byte-aligning the table. That was an improvement on Penryn, so it's possible Sandy Bridge is now better.) Before: Did 4244750 AES-128-GCM (16 bytes) seal operations in 4015000us (1057222.9 ops/sec): 16.9 MB/s Did 442000 AES-128-GCM (1350 bytes) seal operations in 4016000us (110059.8 ops/sec): 148.6 MB/s Did 84000 AES-128-GCM (8192 bytes) seal operations in 4015000us (20921.5 ops/sec): 171.4 MB/s Did 3349250 AES-256-GCM (16 bytes) seal operations in 4016000us (833976.6 ops/sec): 13.3 MB/s Did 343500 AES-256-GCM (1350 bytes) seal operations in 4016000us (85532.9 ops/sec): 115.5 MB/s Did 65250 AES-256-GCM (8192 bytes) seal operations in 4015000us (16251.6 ops/sec): 133.1 MB/s After: Did 4229250 AES-128-GCM (16 bytes) seal operations in 4016000us (1053100.1 ops/sec): 16.8 MB/s [-0.4%] Did 442250 AES-128-GCM (1350 bytes) seal operations in 4016000us (110122.0 ops/sec): 148.7 MB/s [+0.1%] Did 83500 AES-128-GCM (8192 bytes) seal operations in 4015000us (20797.0 ops/sec): 170.4 MB/s [-0.6%] Did 3286500 AES-256-GCM (16 bytes) seal operations in 4016000us (818351.6 ops/sec): 13.1 MB/s [-1.9%] Did 342750 AES-256-GCM (1350 bytes) seal operations in 4015000us (85367.4 ops/sec): 115.2 MB/s [-0.2%] Did 65250 AES-256-GCM (8192 bytes) seal operations in 4016000us (16247.5 ops/sec): 133.1 MB/s [-0.0%] Penryn (Intel Core 2 Duo CPU P8600 @ 2.40GHz): Before: Did 1179000 AES-128-GCM (16 bytes) seal operations in 1000139us (1178836.1 ops/sec): 18.9 MB/s Did 97000 AES-128-GCM (1350 bytes) seal operations in 1006347us (96388.2 ops/sec): 130.1 MB/s Did 18000 AES-128-GCM (8192 bytes) seal operations in 1028943us (17493.7 ops/sec): 143.3 MB/s Did 977000 AES-256-GCM (16 bytes) seal operations in 1000197us (976807.6 ops/sec): 15.6 MB/s Did 82000 AES-256-GCM (1350 bytes) seal operations in 1012434us (80992.9 ops/sec): 109.3 MB/s Did 15000 AES-256-GCM (8192 bytes) seal operations in 1006528us (14902.7 ops/sec): 122.1 MB/s After: Did 1306000 AES-128-GCM (16 bytes) seal operations in 1000153us (1305800.2 ops/sec): 20.9 MB/s [+10.8%] Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009852us (93082.9 ops/sec): 125.7 MB/s [-3.4%] Did 17000 AES-128-GCM (8192 bytes) seal operations in 1012096us (16796.8 ops/sec): 137.6 MB/s [-4.0%] Did 1070000 AES-256-GCM (16 bytes) seal operations in 1000929us (1069006.9 ops/sec): 17.1 MB/s [+9.4%] Did 79000 AES-256-GCM (1350 bytes) seal operations in 1002209us (78825.9 ops/sec): 106.4 MB/s [-2.7%] Did 15000 AES-256-GCM (8192 bytes) seal operations in 1061489us (14131.1 ops/sec): 115.8 MB/s [-5.2%] Change-Id: I1c3760a77af7bee4aee3745d1c648d9e34594afb Reviewed-on: https://boringssl-review.googlesource.com/c/34267 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-24 17:19:21 +00:00
David Benjamin	73b1f181b6	Add ABI tests for GCM. Change-Id: If28096e677104c6109e31e31a636fee82ef4ba11 Reviewed-on: https://boringssl-review.googlesource.com/c/34266 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-15 22:49:37 +00:00
Aaron Green	28babde159	Include aes.h in mode/internal.h. block128_f was recently changed to take an AES_KEY instead of a void*, but AES_KEY is not defined in base.h. internal.h should not depend on other sources to include aes.h for it. Change-Id: I81aab5124ce4397eb76a83ff09779bfaea66d3c1 Reviewed-on: https://boringssl-review.googlesource.com/32364 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-10-03 17:36:04 +00:00
David Benjamin	73535ab252	Fix undefined block128_f, etc., casts. This one is a little thorny. All the various block cipher modes functions and callbacks take a void *key. This allows them to be used with multiple kinds of block ciphers. However, the implementations of those callbacks are the normal typed functions, like AES_encrypt. Those take AES_KEY *key. While, at the ABI level, this is perfectly fine, C considers this undefined behavior. If we wish to preserve this genericness, we could either instantiate multiple versions of these mode functions or create wrappers of AES_encrypt, etc., that take void *key. The former means more code and is tedious without C++ templates (maybe someday...). The latter would not be difficult for a compiler to optimize out. C mistakenly allowed comparing function pointers for equality, which means a compiler cannot replace pointers to wrapper functions with the real thing. (That said, the performance-sensitive bits already act in chunks, e.g. ctr128_f, so the function call overhead shouldn't matter.) But our only 128-bit block cipher is AES anyway, so I just switched things to use AES_KEY throughout. AES is doing fine, and hopefully we would have the sense not to pair a hypothetical future block cipher with so many modes! Change-Id: Ied3e843f0e3042a439f09e655b29847ade9d4c7d Reviewed-on: https://boringssl-review.googlesource.com/32107 Reviewed-by: Adam Langley <agl@google.com>	2018-10-01 17:35:02 +00:00
David Benjamin	302ef5ee12	Keep the GCM bits in one place. This avoids needing to duplicate the "This API differs [...]" comment. Change-Id: If07c77bb66ecdae4e525fa01cc8c762dbacb52f1 Reviewed-on: https://boringssl-review.googlesource.com/32005 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-09-17 22:12:21 +00:00
David Benjamin	580be2b184	Trim 88 bytes from each AES-GCM EVP_AEAD. EVP_AEAD reused portions of EVP_CIPHER's GCM128_CONTEXT which contains both the key and intermediate state for each operation. (The legacy OpenSSL EVP_CIPHER API has no way to store just a key.) Split out a GCM128_KEY and store that instead. Change-Id: Ibc550084fa82963d3860346ed26f9cf170dceda5 Reviewed-on: https://boringssl-review.googlesource.com/32004 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-09-17 22:05:51 +00:00
Adam Langley	05750f23ae	Revert "Revert "Revert "Revert "Make x86(-64) use the same aes_hw_* infrastructure as POWER and the ARMs."""" This was reverted a second time because it ended up always setting the final argument to CRYPTO_gcm128_init to zero, which disabled some acceleration of GCM on ≥Haswell. With this update, that argument will be set to 1 if |aes_hw_*| functions are being used. Probably this will need to be reverted too for some reason. I'm hoping to fill the entire git short description with “Revert”. Change-Id: Ib4a06f937d35d95affdc0b63f29f01c4a8c47d03 Reviewed-on: https://boringssl-review.googlesource.com/28484 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-05-14 22:09:29 +00:00
Steven Valdez	f16cd4278f	Add AES_128_CCM AEAD. Change-Id: I830be64209deada0f24c3b6d50dc86155085c377 Reviewed-on: https://boringssl-review.googlesource.com/25904 Commit-Queue: Steven Valdez <svaldez@google.com> Reviewed-by: Steven Valdez <svaldez@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-02-16 15:57:27 +00:00
Adam Langley	f8d05579b4	Add ASN1_INTEGER_set_uint64. Change-Id: I3298875a376c98cbb60deb8c99b9548c84b014df Reviewed-on: https://boringssl-review.googlesource.com/24484 Commit-Queue: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: David Benjamin <davidben@google.com>	2018-01-02 16:01:31 +00:00
David Benjamin	896332581e	Appease UBSan on pointer alignment. Even without strict-aliasing, C does not allow casting pointers to types that don't match their alignment. After this change, UBSan is happy with our code at default settings but for the negative left shift language bug. Note: architectures without unaligned loads do not generate the same code for memcpy and pointer casts. But even ARMv6 can perform unaligned loads and stores (ARMv5 couldn't), so we should be okay here. Before: Did 11086000 AES-128-GCM (16 bytes) seal operations in 5000391us (2217026.6 ops/sec): 35.5 MB/s Did 370000 AES-128-GCM (1350 bytes) seal operations in 5005208us (73923.0 ops/sec): 99.8 MB/s Did 63000 AES-128-GCM (8192 bytes) seal operations in 5029958us (12525.0 ops/sec): 102.6 MB/s Did 9894000 AES-256-GCM (16 bytes) seal operations in 5000017us (1978793.3 ops/sec): 31.7 MB/s Did 316000 AES-256-GCM (1350 bytes) seal operations in 5005564us (63129.7 ops/sec): 85.2 MB/s Did 54000 AES-256-GCM (8192 bytes) seal operations in 5054156us (10684.3 ops/sec): 87.5 MB/s After: Did 11026000 AES-128-GCM (16 bytes) seal operations in 5000197us (2205113.1 ops/sec): 35.3 MB/s Did 370000 AES-128-GCM (1350 bytes) seal operations in 5005781us (73914.5 ops/sec): 99.8 MB/s Did 63000 AES-128-GCM (8192 bytes) seal operations in 5032695us (12518.1 ops/sec): 102.5 MB/s Did 9831750 AES-256-GCM (16 bytes) seal operations in 5000010us (1966346.1 ops/sec): 31.5 MB/s Did 316000 AES-256-GCM (1350 bytes) seal operations in 5005702us (63128.0 ops/sec): 85.2 MB/s Did 54000 AES-256-GCM (8192 bytes) seal operations in 5053642us (10685.4 ops/sec): 87.5 MB/s (Tested with the no-asm builds; most of this code isn't reachable otherwise.) Change-Id: I025c365d26491abed0116b0de3b7612159e52297 Reviewed-on: https://boringssl-review.googlesource.com/22804 Reviewed-by: Adam Langley <agl@google.com>	2017-11-10 21:07:03 +00:00
David Benjamin	808f832917	Run the comment converter on libcrypto. crypto/{asn1,x509,x509v3,pem} were skipped as they are still OpenSSL style. Change-Id: I3cd9a60e1cb483a981aca325041f3fbce294247c Reviewed-on: https://boringssl-review.googlesource.com/19504 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2017-08-18 21:49:04 +00:00
David Benjamin	9f579bfe6c	Use unions rather than aliasing when possible. This is less likely to make the compiler grumpy and generates the same code. (Although this file has worse casts here which I'm still trying to get the compiler to cooperate on.) Change-Id: If7ac04c899d2cba2df34eac51d932a82d0c502d9 Reviewed-on: https://boringssl-review.googlesource.com/16986 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2017-06-08 00:21:18 +00:00
David Benjamin	1997ef22d7	Tidy up aesni_gcm_crypt logic. CRYPTO_gcm128_init is currently assuming that it gets passed in aesni_encrypt whenever it selects the AVX implementation. This is true, but we can easily avoid this assumption by adding an extra boolean input. Change-Id: Ie7888323f0c93ff9df8f1cf3ba784fb35bb07076 Reviewed-on: https://boringssl-review.googlesource.com/15370 Reviewed-by: Adam Langley <agl@google.com>	2017-04-21 22:49:04 +00:00
Adam Langley	0648129566	Move modes/ into the FIPS module. The changes to delocate.go are needed because modes/ does things like return the address of a module function. Both of these need to be changed from referencing the GOT to using local symbols. Rather than testing whether |ghash| is |gcm_ghash_avx|, we can just keep that information in a flag. The test for |aesni_ctr32_encrypt_blocks| is more problematic, but I believe that it's superfluous and can be dropped: if you passed in a stream function that was semantically different from |aesni_ctr32_encrypt_blocks| you would already have a bug because |CRYPTO_gcm128_[en|de]crypt_ctr32| will handle a block at the end themselves, and assume a big-endian, 32-bit counter anyway. Change-Id: I68a84ebdab6c6006e11e9467e3362d7585461385 Reviewed-on: https://boringssl-review.googlesource.com/15064 Reviewed-by: Adam Langley <agl@google.com>	2017-04-21 17:46:37 +00:00

19 Commits
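
Note on the memcpy-based word access described in 104306f587 and 896332581e: those commits replace casts of byte pointers to size_t* with memcpy so that word-at-a-time code is defined for any alignment. The following is a minimal sketch of that pattern, not code from the tree. The names load_word, store_word, and xor_bytes are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: read one machine word from a possibly unaligned
// address. memcpy is defined for any alignment, whereas dereferencing
// (const size_t *)in is undefined when |in| is misaligned.
static size_t load_word(const uint8_t *in) {
  size_t v;
  memcpy(&v, in, sizeof(v));
  return v;
}

// Hypothetical helper: write one machine word to a possibly unaligned
// address.
static void store_word(uint8_t *out, size_t v) {
  memcpy(out, &v, sizeof(v));
}

// Hypothetical helper: XOR |len| bytes of |a| and |b| into |out|. Whole
// words are processed first, then any remaining tail bytes one at a time.
// No alignment check is needed on any of the three pointers.
static void xor_bytes(uint8_t *out, const uint8_t *a, const uint8_t *b,
                      size_t len) {
  assert(out != NULL && a != NULL && b != NULL);
  size_t i = 0;
  for (; i + sizeof(size_t) <= len; i += sizeof(size_t)) {
    store_word(out + i, load_word(a + i) ^ load_word(b + i));
  }
  for (; i < len; i++) {
    out[i] = a[i] ^ b[i];
  }
}
```

Whether the memcpy calls compile down to single load and store instructions depends on the compiler and target flags, which is the concern the ARMv5 and ARMv6 remarks in 104306f587 address.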