boringssl

Author	SHA1	Message	Date
David Benjamin	17d553d299	Add a CFI tester to CHECK_ABI. This uses the x86 trap flag and libunwind to test CFI works at each instruction. For now, it just uses the system one out of pkg-config and disables unwind tests if unavailable. We'll probably want to stick a copy into //third_party and perhaps try the LLVM one later. This tester caught two bugs in P-256 CFI annotations already: I47b5f9798b3bcee1748e537b21c173d312a14b42 and I9f576d868850312d6c14d1386f8fbfa85021b347 An earlier design used PTRACE_SINGLESTEP with libunwind's remote unwinding features. ptrace is a mess around stop signals (see group-stop discussion in ptrace(2)) and this is 10x faster, so I went with it. The question of which is more future-proof is complex: - There are two libunwinds with the same API, https://www.nongnu.org/libunwind/ and LLVM's. This currently uses the system nongnu.org for convenience. In future, LLVM's should be easier to bundle (less complex build) and appears to even support Windows, but I haven't tested this. Moreover, setting the trap flag keeps the test single-process, which is less complex on Windows. That suggests the trap flag design and switching to LLVM later. However... - Not all architectures have a trap flag settable by userspace. As far as I can tell, ARMv8's PSTATE.SS can only be set from the kernel. If we stick with nongnu.org libunwind, we can use PTRACE_SINGLESTEP and remote unwinding. Or we implement it for LLVM. Another thought is for the ptracer to bounce SIGTRAP back into the process, to share the local unwinding code. - ARMv7 has no trap flag at all and PTRACE_SINGLESTEP fails. Debuggers single-step by injecting breakpoints instead. However, ARMv8's trap flag seems to work in both AArch32 and AArch64 modes, so we may be able to condition it on a 64-bit kernel. Sadly, neither strategy works with Intel SDE. Adding flags to cpucap vectors as we do with ARM would help, but it would not emulate CPUs newer than the host CPU. For now, I've just had SDE tests disable these. Annoyingly, CMake does not allow object libraries to have dependencies, so make test_support a proper static library. Rename the target to test_support_lib to avoid https://gitlab.kitware.com/cmake/cmake/issues/17785 Update-Note: This adds a new optional test dependency, but it's disabled by default (define BORINGSSL_HAVE_LIBUNWIND), so consumers do not need to do anything. We'll probably want to adjust this in the future. Bug: 181 Change-Id: I817263d7907aff0904a9cee83f8b26747262cc0c Reviewed-on: https://boringssl-review.googlesource.com/c/33966 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-03 22:01:55 +00:00
Adam Langley	6effbf24bc	Add EVP_CIPHER support for Blowfish and CAST to decrepit. Postgres contains a “pqcrypto” module that showcases the worst of 90's crypto, including Blowfish and CAST5 in CFB, CBC, and ECB modes. (Also, 64-bit keys for both of those.) In order to minimise the patching needed to build Postgres, put these things in decrepit. Change-Id: I8390c5153dd7227eef07293a4363878d79df8b21 Reviewed-on: https://boringssl-review.googlesource.com/c/34044 Reviewed-by: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-01-03 21:34:46 +00:00
David Benjamin	f77c8a38be	Be less clever with CHECK_ABI. Unwind testing will make CHECK_ABI much slower. The original ptrace-based design is some 10,000x slower. I've found an alternate design that's a mere 1,000x slower, but this probably warrants being more straightforward. It also removes the weirdness where NDEBUG controlled which tests were run. While it does mean we need to write some extra tests for p256-x86_64.pl, we otherwise do not directly unit test our assembly anyway. Usually we test the public crypto APIs themselves. So, for most files, this isn't actually extra work. Bug: 181 Change-Id: I7cbb7f930c2ea6ae32a201da503dcd36844704f0 Reviewed-on: https://boringssl-review.googlesource.com/c/33965 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-03 21:02:24 +00:00
Adam Langley	e6bf9065af	Remove pooling of PRNG state. Prior to `82639e6f` we used thread-local data for the PRNG state. That change switched to using a mutex-protected pool instead in order to save memory in heavily-threaded applications. However, the pool mutex can get extremely hot in cases where the PRNG is heavily used. `8e8f2504` was a short-term work around, but supporting both modes is overly complex. This change moves back to the state of the prior to `82639e6f`. The best way to review this is to diff the changed files against '82639e6f^' and note that the only difference is a comment added in rand.c: https://paste.googleplex.com/4997991748337664 Change-Id: I8febce089696fa6bc39f94f4a1e268127a8f78db Reviewed-on: https://boringssl-review.googlesource.com/c/34024 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2019-01-03 20:19:44 +00:00
Jeremy Apthorp	7177c1d29f	Add EC_KEY_key2buf for OpenSSL compatibility Change-Id: If45ef3a9bb757bd0c7f592f40ececaf4aa2f607d Reviewed-on: https://boringssl-review.googlesource.com/c/33824 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-01-03 16:32:21 +00:00
Jeremy Apthorp	79c7ec06f6	Add EC_GROUP_order_bits for OpenSSL compatibility Change-Id: I37149fa4274357d84befff85728ce2337131afa7 Reviewed-on: https://boringssl-review.googlesource.com/c/33804 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-02 23:51:14 +00:00
David Benjamin	0eaf783fbf	Annotate leaf functions with .cfi_{startproc,endproc} While gdb can figure it out, libunwind requires CFI directives to unwind a leaf function, even though the directives are trivial. Adding them matches what GCC outputs, and likely gdb has many heuristics that less complex tools (e.g. profilers) may not. Bug: 181 Change-Id: I25c72152de33109a29710a828aeb99c608dd0470 Reviewed-on: https://boringssl-review.googlesource.com/c/33964 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-02 23:49:24 +00:00
David Benjamin	c2e8d016f5	Fix beeu_mod_inverse_vartime CFI annotations and preamble. This was also caught by the in-progress unwind tester. There are two issues here. First, .cfi_endproc must come after ret to fully cover the function. More importantly, this function is confused about whether it has a frame pointer or not. It looks like it does (movq %rsp, %rbp), and annotates accordingly, but it does not actually use the frame pointer. It cannot. $y4 is rbp and gets clobbered immediately after the preamble! Remove this instruction and align the CFI annotations with a frame-pointer-less function. Bug: 181 Change-Id: I47b5f9798b3bcee1748e537b21c173d312a14b42 Reviewed-on: https://boringssl-review.googlesource.com/c/33947 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2019-01-02 23:47:34 +00:00
David Benjamin	a306b1b908	Fix CFI annotations in p256-x86_64-asm.pl. This was caught by in-progress work to test unwind information. It was incorrect at two instructions: immediately before we jump to .Lpoint_double_shortcut$x. This is needed because ecp_nistz256_point_add$x tries to be clever about not unwinding the stack frame in its tail call. It's also unlikely that the SEH handlers in this file are correct at this point, but that will be handled separately while overhauling everything else here. (For Win64, probably the only ABI-compliant option is to just properly unwind the stack frame. Without a custom handler, Win64 unwind codes are very restrictive.) Bug: 181 Change-Id: I9f576d868850312d6c14d1386f8fbfa85021b347 Reviewed-on: https://boringssl-review.googlesource.com/c/33946 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-02 23:39:21 +00:00
David Benjamin	6ef1b64558	Add a comment about ecp_nistz256_point_add_affine's limitations. ecp_nistz256_point_add_affine does not support the doubling case and, unlike ecp_nistz256_point_add which does a tail call, computes the wrong answer. Note TestPointAdd in the unit tests skips this case. This works fine because we only use ecp_nistz256_point_add_affine for the g_scalar term, which is fully computed before the p_scalar term. (Additionally it requires that the windowing pattern never hit the doubling case for single multiplication.) But this is not obvious from reading the multiplication functions, so leave a comment at the call site to point this out. Change-Id: I08882466d98030cdc882a5be9e702ee404e80cce Reviewed-on: https://boringssl-review.googlesource.com/c/33945 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-01-02 23:33:31 +00:00
David Benjamin	1c55e54eda	Refresh p256-x86_64_tests.txt. The old points weren't even on the curve. I probably had no clue what I was doing at the time when I generated them. Refresh them with a checked-in generate script. Change-Id: Ib4613fe922edcf45fc4ea49fc4c2cc23a9a2a9bd Reviewed-on: https://boringssl-review.googlesource.com/c/33944 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2019-01-02 23:29:31 +00:00
David Benjamin	fb3f0638ba	Fix some indentation nits. perlasm's bizarre mix of asm and perl indentation and clever editors always mess me up. Change-Id: Iac906a636207867939cc327b4c21b8a982abce29 Reviewed-on: https://boringssl-review.googlesource.com/c/33844 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-02 19:26:54 +00:00
Adam Langley	8e8f250422	Use thread-local storage for PRNG states if fork-unsafe buffering is enabled. We switched from thread-local storage to a mutex-pool in `82639e6f53` because, for highly-threaded processes, the memory used by all the states could be quite large. I had judged that a mutex-pool should be fine, but had underestimated the PRNG requirements of some of our jobs. This change makes rand.c support using either thread-locals or a mutex-pool. Thread-locals are used if fork-unsafe buffering is enabled. While not strictly related to fork-safety, we already have the fork-unsafe control, and it's already set by jobs that care a lot about PRNG performance, so fits quite nicely here. Change-Id: Iaf1e0171c70d4c8dbe1e42283ea13df5b613cb2d Reviewed-on: https://boringssl-review.googlesource.com/c/31564 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2018-12-28 18:05:18 +00:00
David Benjamin	74944287e1	Add Win64 SEH unwind codes for the ABI test trampoline. This is all manual right now. Once we've added SEH tests, we can add support for emitting these in x86_64-xlate.pl, probably based on MASM and Yasm's unwind directives, and unify with CFI. (Sadly, NASM does not support these directives.) Then we can push that upstream to replace the error-prone and non-standard custom handlers. Change-Id: I5a734fd494b7eaafab24a00e6df624bd03b37d43 Reviewed-on: https://boringssl-review.googlesource.com/c/33785 Reviewed-by: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <alangley@gmail.com> Commit-Queue: David Benjamin <davidben@google.com>	2018-12-21 16:38:03 +00:00
David Benjamin	5edf8957b5	Translate .L directives inside .byte too. Win64 unwind tables place distances from the start of a function in byte-wide values. Change-Id: Ie2aad7f6f5b702a60933bd52d872a83cba4e73a9 Reviewed-on: https://boringssl-review.googlesource.com/c/33784 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <alangley@gmail.com>	2018-12-21 16:35:32 +00:00
David Benjamin	54efa1afc0	Add an ABI testing framework. Dear reader, I must apologize in advance. This CL contains the following: - A new 256-line perlasm file with non-trivial perl bits and a dual-ABI variadic function caller. - C preprocessor gymnastics, with variadic macros and fun facts about __VA_ARGS__'s behavior on empty argument lists. - C++ template gymnastics, including variadic arguments, template specialization, std::enable_if, and machinery to control template argument deduction. Enjoy. This tests that our assembly functions correctly honor platform ABI conventions. Right now this only tests callee-saved registers, but it should be extendable to SEH/CFI unwind testing with single-step debugging APIs. Register-checking does not involve anything funny and should be compatible with SDE. (The future unwind testing is unlikely to be compatible.) This CL adds support for x86_64 SysV and Win64 ABIs. ARM, AArch64, and x86 can be added in the future. The testing is injected in two places. First, all the assembly tests in p256-x86_64-test.cc are now instrumented. This is the intended workflow and should capture all registers. However, we currently do not unit-test our assembly much directly. We should do that as follow-up work[0] but, in the meantime, I've also wrapped all of the GTest main function in an ABI test. This is imperfect as ABI failures may be masked by other stack frames, but it costs nothing[1] and is pretty reliable at catching Win64 xmm register failures. [0] An alternate strategy would be, in debug builds, unconditionally instrument every assembly call in libcrypto. But the CHECK_ABI macro would be difficult to replicate in pure C, and unwind testing may be too invasive for this. Still, something to consider when we C++ libcrypto. [1] When single-stepped unwind testing exists, it won't cost nothing. The gtest_main.cc call will turn unwind testing off. Change-Id: I6643b26445891fd46abfacac52bc024024c8d7f6 Reviewed-on: https://boringssl-review.googlesource.com/c/33764 Reviewed-by: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <alangley@gmail.com> Commit-Queue: David Benjamin <davidben@google.com>	2018-12-21 16:09:32 +00:00
Adam Langley	9700b44ff5	HRSS: omit reconstruction of ciphertext. In [1], section 5.1, an optimised re-encryption process is given. In the code, this simplifies to not needing to rebuild the ciphertext at all. Thanks to John Schanck for pointing this out. [1] https://eprint.iacr.org/2018/1174.pdf Change-Id: I807bd509e936b7e82a43e8656444431546e9bbdf Reviewed-on: https://boringssl-review.googlesource.com/c/33705 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2018-12-19 20:09:34 +00:00
Adam Langley	a6a049a6fb	Add start of infrastructure for checking constant-time properties. Valgrind's checking of uninitialised memory behaves very much like a check for constant-time code: branches and memory indexes based on uninitialised memory trigger warnings. Therefore, if we can tell Valgrind that some secret is “uninitialised”, it'll give us a warning if we do something non-constant-time with it. This was the idea behind https://github.com/agl/ctgrind. But tricks like that are no longer needed because Valgrind now comes with support for marking regions of memory as defined or not. Therefore we can use that API to check constant-time code. This CL defines \|CONSTTIME_SECRET\| and \|CONSTTIME_DECLASSIFY\|, which are no-ops unless the code is built with \|BORINGSSL_CONSTANT_TIME_VALIDATION\| defined, which it isn't by default. So this CL is a no-op itself so far. But it does show that a couple of bits of constant-time time are, in fact, constant-time—seemingly even when compiled with optimisations, which is nice. The annotations in the RSA code are a) probably not marking all the secrets as secret, and b) triggers warnings that are a little interesting: The anti-glitch check calls \|BN_mod_exp_mont\| which checks that the input is less than the modulus. Of course, it is because the input is the RSA plaintext that we just decrypted, but the plaintext is supposed to be secret and so branching based on its contents isn't allows by Valgrind. The answer isn't totally clear, but I've run out of time on this for now. Change-Id: I1608ed0b22d201e97595fafe46127159e02d5b1b Reviewed-on: https://boringssl-review.googlesource.com/c/33504 Reviewed-by: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com>	2018-12-18 22:43:02 +00:00
Adam Langley	c2897a158a	Don't enable intrinsics on x86 without ABI support. At some point after GCC 7.3, but before 8.2, GCC enabled the SSE ABI by default. However, if it isn't enabled, the vector intrinsics in HRSS cannot be used. (See https://github.com/grpc/grpc/issues/17540.) Note that the intrinsics used are SSE2, but that should be ok because they are guarded by a run-time check. The compile-time check for __SSE__ just ensures that GCC will build the code at all. (SDE does not simulate anything that doesn't have SSE2, however.) Change-Id: If092a06a441ed9d38576ea30351b3b40693a3399 Reviewed-on: https://boringssl-review.googlesource.com/c/33744 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2018-12-18 17:06:48 +00:00
Adam Langley	f8068ce885	HRSS: be strict about unused bits being zero. It's excessively complex to worry about leaving these few bits for extensions. If we need to change things, we can spin a new curve ID in TLS. We don't need to support two versions during the transition because a fallback to X25519 is still fine. Change-Id: I0a4019d5693db0f0f3a5379909d99c2e2c762560 Reviewed-on: https://boringssl-review.googlesource.com/c/33704 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2018-12-17 21:02:58 +00:00
David Benjamin	41c10e2b5f	Disable AES-GCM-SIV assembly on Windows. I'm working on a test harness to check our assembly correctly restores callee-saved registers. It caught this. While perlasm tries to smooth over the differences between Windows and SysV ABIs, it does not capture the difference in xmm registers. All xmm registers are volatile in SysV, while Windows makes xmm6 through xmm15 callee-saved. Change-Id: Ia549b0f126885768f7fb330271a590174c483a3d Reviewed-on: https://boringssl-review.googlesource.com/c/33685 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2018-12-17 17:54:07 +00:00
David Benjamin	e1b2a65e7f	Fix typo in AES-GCM-SIV comments. Change-Id: I73bd495cf99bbc8a993a726b009d68e74c893420 Reviewed-on: https://boringssl-review.googlesource.com/c/33684 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-12-17 17:39:46 +00:00
Alessandro Ghedini	9b0970f1b0	Fix HRSS build error on ARM Seeing the following errors with GCC 6 on ARM: crypto/hrss/hrss.c:212:12: error: function declaration isn't a prototype [-Werror=strict-prototypes] static int vec_capable() { return CRYPTO_is_NEON_capable(); } ^~~~~~~~~~~ crypto/hrss/hrss.c: In function 'vec_capable': crypto/hrss/hrss.c:212:12: error: old-style function definition [-Werror=old-style-definition] Change-Id: Ice540e6d436b8ada1dbc494f1feca10efff11687 Reviewed-on: https://boringssl-review.googlesource.com/c/33624 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2018-12-14 17:12:49 +00:00
Adam Langley	200fe6786b	Remove HRSS confirmation hash. Since the underlying operation is deterministic the confirmation hash isn't needed and SXY didn't use it in their proof. Change-Id: I3a03c20ee79645cf94b10dbfe654c1b88d9aa416 Reviewed-on: https://boringssl-review.googlesource.com/c/33605 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2018-12-13 18:42:02 +00:00
Adam Langley	35a66d4aae	Drop NEON assembly for HRSS. Since we build Chrome with -mfpu=neon anyway, this isn't currently needed. Additionally, I had included poly3_invert_vec in the assembly but hadn't gotten around to wiring it up yet. That assembly referenced a couple of functions in the C code that had been renamed. Surprisingly, the NDK linker didn't have a problem with the undefined symbols since it could statically find them to be unreachable. But that isn't true everywhere. Some builds did fail because of the undefined symbols although we're not sure what's different about them. (Different NDK version perhaps?) Change-Id: Ibac4724d24df05d6f6007499e1cd884e59889101 Reviewed-on: https://boringssl-review.googlesource.com/c/33604 Commit-Queue: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2018-12-13 17:43:07 +00:00
David Benjamin	3adb1e5a37	Patch out the XTS implementation in bsaes. We don't call it, so ship less code and reduce the number of places where we must think about the bsaes -> aes_nohw fallback. Bug: 256 Change-Id: I10ac2d70e18ec81e679631a9532c36d9edab1c6e Reviewed-on: https://boringssl-review.googlesource.com/c/33586 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2018-12-12 22:27:13 +00:00
Adam Langley	fc30467f28	Remove .file and .loc directives from HRSS ARM asm. This is a workaround for https://bugs.llvm.org/show_bug.cgi?id=38740. Change-Id: I74d5066c4c782745e003a608b3ccc002599bf6b4 Reviewed-on: https://boringssl-review.googlesource.com/c/33587 Commit-Queue: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2018-12-12 22:26:53 +00:00
Adam Langley	1ea083d8b2	Always 16-byte align \|poly\| elements. Even if the vector code isn't used in hrss.c, it might call external assembly that still requires alignment. Change-Id: I11ceb88f96deec6b20883872030ca090506ca150 Reviewed-on: https://boringssl-review.googlesource.com/c/33584 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2018-12-12 18:58:50 +00:00
Adam Langley	2526c66b72	Fix bug in HRSS tests. I moved the \|poly3_rand\| code into a function and omitted to update a \|sizeof\|. Change-Id: I861fac4fe26ee3b5e5116d5cee71e64d9af9d175 Reviewed-on: https://boringssl-review.googlesource.com/c/33564 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2018-12-12 18:04:25 +00:00
Adam Langley	7b935937b1	Add initial HRSS support. This change includes support for a variant of [HRSS], a post-quantum KEM based on NTRU. It includes changes suggested in [SXY]. This is not yet ready for any deployment: some breaking changes, like removing the confirmation hash, are still planned. (CLA for HRSS's assembly code noted in b/119426559.) [HRSS] https://eprint.iacr.org/2017/667.pdf [SXY] https://eprint.iacr.org/2017/1005.pdf Change-Id: I85d813733b066d5c578484bdd248de3f764194db Reviewed-on: https://boringssl-review.googlesource.com/c/33105 Reviewed-by: David Benjamin <davidben@google.com>	2018-12-12 17:35:02 +00:00
Adam Langley	bf5021a6b8	Eliminate \|OPENSSL_ia32cap_P\| in C code in the FIPS module. This can break delocate with certain compiler settings. Change-Id: I76cf0f780d0e967390feed754e39b0ab25068f42 Reviewed-on: https://boringssl-review.googlesource.com/c/33485 Commit-Queue: Adam Langley <alangley@gmail.com> Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2018-12-06 00:58:14 +00:00
David Benjamin	750fea158a	Fix d2i__bio on partial reads. If BIO_read returns partial reads, d2i__bio currently fails. This is a partial (hah) regression from `419144adce`. The old a_d2i_fp.c code did not tolerate partial reads in the ASN.1 header, but it did tolerate them in the ASN.1 body. Since partial reads are more likely to land in the body than the header, I think we can say d2i_*_bio was "supposed to" tolerate this but had a bug in the first few bytes. Fix it for both cases. Add a regression test for this and the partial write case (which works fine). See also https://github.com/google/conscrypt/pull/587. Change-Id: I886f6388f0b80621960e196cf2a56f5c02a14a04 Reviewed-on: https://boringssl-review.googlesource.com/c/33484 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-12-05 22:05:28 +00:00
Brian Smith	90247be1d9	Remove XOP code from sha512-x86_64.pl. Other XOP code was removed already. Change-Id: I0c457effebd22f89e722653b93905a0b2e3eb5c0 Reviewed-on: https://boringssl-review.googlesource.com/c/33424 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-12-04 01:10:32 +00:00
Brian Smith	36ee9a5a0d	Pretend AMD XOP was never a thing. It's not clear that any AMD XOP code paths are being properly tested. AMD dropped XOP starting in Zen. Here's the one place I found (without looking too hard) where it seems there is a XOP code path in BoringSSL, in sha512-x86_64.pl. Most of the other XOP code was removed. ``` $code.=<<___ if ($avx && $SZ==8); test \$`1<<11`,%r10d # check for XOP jnz .Lxop_shortcut ``` Change-Id: Id3301b2c84648790d010dae546b8e21ece1c528d Reviewed-on: https://boringssl-review.googlesource.com/c/33405 Reviewed-by: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2018-12-03 22:59:55 +00:00
Brian Smith	96b05ed487	Assume hyper-threading-like vulnerabilities are always present. It's not clear that CPUID will always report the correct value here, especially for hyper-threading environments. It also isn't clear that the assumptions made by AMD processors are correct and will always be correct. It also seems likely that, if a code path is security-sensitive w.r.t. SMT, it is probably also security-sensitive w.r.t. other processor (mis)features. Finally, it isn't clear that all dynamic analysis (fuzzing, SDE, etc.) is done separately for the cross product of all CPU feature combinations * the value of this bit. With all that in mind, instruct code sensitive to this bit to always choose the more conservative path. I only found one place that's sensitive to this bit, though I didn't look too hard: ``` aes_nohw_cbc_encrypt: [...] leaq OPENSSL_ia32cap_P(%rip),%r10 mov (%r10), %r10d [...] bt \$28,%r10d jc .Lcbc_slow_prologue ``` I didn't verify that the code in the HTT-enabled paths is any better than the code in the HTT-disabled paths. Change-Id: Ifd643e6a1301e5ca2174b84c344eb933d49e0067 Reviewed-on: https://boringssl-review.googlesource.com/c/33404 Reviewed-by: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2018-12-03 22:32:24 +00:00
David Benjamin	eb7d5b69e9	Replace the last CRITICAL_SECTION with SRWLOCK. We don't support Windows XP, so we can rely on SRWLOCK. Per https://crbug.com/592752, SRWLOCKs are more efficient and less of a hassle to use. We'd previously converted CRYPTO_MUTEX to SRWLOCK, but I missed this one. Not that this one lock matters much, may as well. It's less initialization code. Change-Id: I7ae435be5202b0a19f42015c9abff932dc04dbc7 Reviewed-on: https://boringssl-review.googlesource.com/c/33445 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-12-03 20:37:35 +00:00
Brian Smith	0f5ecd3a85	Re-enable AES-NI on 32-bit x86 too. commit `05750f23ae` disabled AES-NI for 32-bit x86, perhaps unintentionally. Change-Id: Ie950c4f49526257138ecc803df5ecfc115bc648d Reviewed-on: https://boringssl-review.googlesource.com/c/33365 Reviewed-by: Adam Langley <agl@google.com>	2018-11-28 00:32:30 +00:00
David Benjamin	e157dc9208	Make symbol-prefixing work on 32-bit x86. On Linux, this introduces yet another symbol to blacklist. Change-Id: Ieafe45a25f3b41da6c6934dd9488f4ee400bcab9 Reviewed-on: https://boringssl-review.googlesource.com/c/33350 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-11-27 22:35:17 +00:00
David Benjamin	8c23d3a5df	Make Windows symbol-prefixing work. This teaches read_symbols.go to use debug/pe, and fixes miscellaneous issues with NASM. It also reveals a problem with this strategy of getting symbols out at the linker level: inline functions. I'm thinking a better long-term mechanism may be to parse our header files. Change-Id: I11b008543a7a97db3db9d4062ee4ddb910d174b7 Reviewed-on: https://boringssl-review.googlesource.com/c/33349 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-11-27 22:13:22 +00:00
David Benjamin	00d72d342f	Fix stack_test.cc in the prefixed build. Uses of BORINGSSL_MAKE_DELETER must be inside BSSL_NAMESPACE_BEGIN for the specializations to work. Change-Id: Ib96cf5d235586b24c052973d7034c0e5a8019f17 Reviewed-on: https://boringssl-review.googlesource.com/c/33346 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-11-27 21:35:56 +00:00
David Benjamin	045ee41928	Unexport and rename hex_to_string, string_to_hex, and name_cmp. Squatting these names is rather rude. Also hex_to_string and string_to_hex do the opposite of what one would expect, so rename them to something a bit less confusing. Update-Note: This removes some random utility functions. name_cmp is very specific to OpenSSL's config file format, so it's unlikely anyone is relying on it. I removed the one use of hex_to_string and string_to_hex I could find. Change-Id: I01554885ad306251e6982100d0b15cd89b1cdea7 Reviewed-on: https://boringssl-review.googlesource.com/c/33364 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-11-27 00:08:39 +00:00
David Benjamin	bbc429148f	Add a note that generated files are generated. Folks keep assuming checked-in assembly files are the source. Between the preprocessor, delocate, NASM not using the C preprocessor, and GAS's arch-specific comment syntax, comment markers are kind of a disaster. This set appears to work for now. Change-Id: I48e26dafb444dfa310df80dcce87ac291fde8037 Reviewed-on: https://boringssl-review.googlesource.com/c/33304 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-11-21 20:05:05 +00:00
David Benjamin	4f746a9073	Move ARM cpuinfo functions to the header. ClusterFuzz folks want to switch to a shared library build, so call into these another way. The new setup isn't quite ideal because the real code builds as C and now tests as C++, but it should work. Bug: chromium:907115 Change-Id: Ia1ffc18832739b09fee21b84ee5d181e61feaa15 Reviewed-on: https://boringssl-review.googlesource.com/c/33285 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-11-21 00:46:57 +00:00
Adam Langley	a3ba8b3289	Regenerate obj_dat.h clang-format seems to have decided to format things differently now. This will eliminate diff noise in the future when there are actual changes. Change-Id: I1f94cf0f0859023b6c926119f39bf0a587464e52 Reviewed-on: https://boringssl-review.googlesource.com/c/33266 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2018-11-19 20:26:03 +00:00
David Benjamin	293d9ee4e8	Support execute-only memory for AArch64 assembly. Put data in .rodata and, rather than adr, use the combination of adrp :pg_hi21: and add :lo12:. Unfortunately, iOS uses different syntax, so we must add more transforms to arm-xlate.pl. Tested manually by: 1. Use Android NDK r19-beta1 2. Follow usual instructions to configure CMake for aarch64, but pass -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld -Wl,-execute-only". 3. Build. Confirm with readelf -l tool/bssl that .text is not marked readable. 4. Push the test binaries onto a Pixel 3. Test normally and with --cpu={none,neon,crypto}. I had to pass --gtest_filter=-Thread to crypto_test. There appears to be an issue with some runtime function that's unrelated to our assembly. No measurable performance difference. Going forward, to support this, we will need to apply similar changes to all other AArch64 assembly. This is relatively straightforward, but may be a little finicky for dual-AArch32/AArch64 files (aesv8-armx.pl). Update-Note: Assembly syntax is a mess. There's a decent chance some assembler will get offend. Change-Id: Ib59b921d4cce76584320fefd23e6bb7ebd4847eb Reviewed-on: https://boringssl-review.googlesource.com/c/33245 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2018-11-19 19:58:15 +00:00
David Benjamin	4188c3f495	Remove cacheline striping in copy_from_prebuf. The standard computation model for constant-time code is that memory access patterns must be independent of secret data. BN_mod_exp_mont_consttime was previously written to a slightly weaker model: only cacheline access patterns must be independent of secret data. It assumed accesses within a cacheline were indistinguishable. The CacheBleed attack (https://eprint.iacr.org/2016/224.pdf) showed this assumption was false. Cache lines may be divided into cache banks, and the researchers were able to measure cache bank contention pre-Haswell. For Haswell, the researchers note "But, as Haswell does show timing variations that depend on low address bits [19], it may be vulnerable to similar attacks." OpenSSL's fix to CacheBleed was not to adopt the standard constant-time computation model. Rather, it now assumes accesses within a 16-byte cache bank are indistinguishable, at least in the C copy_from_prebuf path. These weaker models failed before with CacheBleed, so avoiding such assumptions seems prudent. (The [19] citation above notes a false dependence between memory addresses with a distance of 4k, which may be what the paper was referring to.) Moreover, the C path is largely unused on x86_64 (which uses mont5 asm), so it is especially questionable for the generic C code to make assumptions based on x86_64. Just walk the entire table in the C implementation. Doing so as-is comes with a performance hit, but the striped memory layout is, at that point, useless. We regain the performance loss (and then some) by using a more natural layout. Benchmarks below. This CL does not touch the mont5 assembly; I haven't figured out what it's doing yet. Pixel 3, aarch64: Before: Did 3146 RSA 2048 signing operations in 10009070us (314.3 ops/sec) Did 447 RSA 4096 signing operations in 10026666us (44.6 ops/sec) After: Did 3210 RSA 2048 signing operations in 10010712us (320.7 ops/sec) Did 456 RSA 4096 signing operations in 10063543us (45.3 ops/sec) Pixel 3, armv7: Before: Did 2688 RSA 2048 signing operations in 10002266us (268.7 ops/sec) Did 459 RSA 4096 signing operations in 10004785us (45.9 ops/sec) After: Did 2709 RSA 2048 signing operations in 10001299us (270.9 ops/sec) Did 459 RSA 4096 signing operations in 10063737us (45.6 ops/sec) x86_64 Broadwell, mont5 assembly disabled: (This configuration is not actually shipped anywhere, but seemed a useful data point.) Before: Did 14274 RSA 2048 signing operations in 10009130us (1426.1 ops/sec) Did 2448 RSA 4096 signing operations in 10046921us (243.7 ops/sec) After: Did 14706 RSA 2048 signing operations in 10037908us (1465.0 ops/sec) Did 2538 RSA 4096 signing operations in 10059986us (252.3 ops/sec) Change-Id: If41da911d4281433856a86c6c8eadf99cd33e2d8 Reviewed-on: https://boringssl-review.googlesource.com/c/33268 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2018-11-19 19:10:09 +00:00
David Benjamin	5963bff237	Tidy up type signature of BN_mod_exp_mont_consttime table. It's a table of BN_ULONGs. No particular need to use unsigned char. Change-Id: I397883cef9f39fb162c2b0bfbd6a70fe399757a2 Reviewed-on: https://boringssl-review.googlesource.com/c/33267 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2018-11-19 17:44:44 +00:00
David Benjamin	46e12b03f9	Print a message when simulating CPUs. Make it more obvious something is happening. Change-Id: Ie68d1e96a9bedd4b572c1cc99910348f89f07624 Reviewed-on: https://boringssl-review.googlesource.com/c/33244 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>	2018-11-16 23:12:27 +00:00
David Benjamin	6ce93ccb80	Simulate other ARM CPUs when running tests. We test all Intel variants via SDE. For ARM, we can do the next best thing and tweak with OPENSSL_armcap_P. If the host CPU does not support the instructions we wish to test, skip it, but print something so we know whether we need a more featureful test device. Also fix the "CRASHED" status to "CRASH", to match https://chromium.googlesource.com/chromium/src/+/master/docs/testing/json_test_results_format.md (It's unclear if anything actually parses that JSON very carefully...) Bug: 19 Change-Id: I811cc00a0d210a454287ac79c06f18fbc54f96dd Reviewed-on: https://boringssl-review.googlesource.com/c/33204 Commit-Queue: David Benjamin <davidben@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: Adam Langley <agl@google.com>	2018-11-15 00:58:09 +00:00
Adam Langley	444c2e59fb	Merge P-224 contract into serialisation. Contraction was always and only done immediately prior to calling \|p224_felem_to_generic\| so merge it into that function. Change-Id: If4fb46c6305ba724dfff15e8362a094c599f3f2c Reviewed-on: https://boringssl-review.googlesource.com/c/33165 Commit-Queue: Adam Langley <agl@google.com> CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org> Reviewed-by: David Benjamin <davidben@google.com>	2018-11-14 23:47:13 +00:00

1 2 3 4 5 ...

2599 Commits