Commit Graph

5691 Commits

Author SHA1 Message Date
David Benjamin
d8598ce03f Remove non-STRICT_ALIGNMENT code from xts.c.
Independent of the underlying CPU architecture, casting unaligned
pointers to uint64_t* is undefined. Just use a memcpy. The compiler
should be able to optimize that itself.

Change-Id: I39210871fca3eaf1f4b1d205b2bb0c337116d9cc
Reviewed-on: https://boringssl-review.googlesource.com/c/34872
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:32:11 +00:00
David Benjamin
4d8e1ce5e9 Patch XTS out of ARMv7 bsaes too.
Bug: 256
Change-Id: I822274bf05901d82b41dc9c9c4e6d0b5d622f3ff
Reviewed-on: https://boringssl-review.googlesource.com/c/34871
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:31:37 +00:00
David Benjamin
fb35b147ca Remove stray prototype.
The function's since been renamed.

Change-Id: Id1a9788dfeb5c46b3463611b08318b3f253d03df
Reviewed-on: https://boringssl-review.googlesource.com/c/34870
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:31:14 +00:00
David Benjamin
eb2c2cdf17 Always define GHASH.
There is a C implementation of gcm_ghash_4bit to pair with
gcm_gmult_4bit. It's even slightly faster per the numbers below (x86_64
OPENSSL_NO_ASM build), but, more importantly, we trim down the
combinatorial explosion of GCM implementations and free up complexity
budget for potentially using bsaes better in the future.

Old:
Did 2557000 AES-128-GCM (16 bytes) seal operations in 1000057us (2556854.3 ops/sec): 40.9 MB/s
Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009613us (93105.0 ops/sec): 125.7 MB/s
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1024768us (16589.1 ops/sec): 135.9 MB/s
Did 2511000 AES-256-GCM (16 bytes) seal operations in 1000196us (2510507.9 ops/sec): 40.2 MB/s
Did 84000 AES-256-GCM (1350 bytes) seal operations in 1000412us (83965.4 ops/sec): 113.4 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1046963us (14327.2 ops/sec): 117.4 MB/s

New:
Did 2739000 AES-128-GCM (16 bytes) seal operations in 1000322us (2738118.3 ops/sec): 43.8 MB/s
Did 100000 AES-128-GCM (1350 bytes) seal operations in 1008190us (99187.7 ops/sec): 133.9 MB/s
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1006360us (16892.6 ops/sec): 138.4 MB/s
Did 2546000 AES-256-GCM (16 bytes) seal operations in 1000150us (2545618.2 ops/sec): 40.7 MB/s
Did 86000 AES-256-GCM (1350 bytes) seal operations in 1000970us (85916.7 ops/sec): 116.0 MB/s
Did 14850 AES-256-GCM (8192 bytes) seal operations in 1023459us (14509.6 ops/sec): 118.9 MB/s

While I'm here, tighten up some of the functions and align the ctr32 and
non-ctr32 paths.

Bug: 256
Change-Id: Id4df699cefc8630dd5a350d44f927900340f5e60
Reviewed-on: https://boringssl-review.googlesource.com/c/34869
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:30:55 +00:00
Watson Ladd
2f213f643f Update delegated credentials to draft-03
Change-Id: I0c648340ac7bb134fcda42c56a83f4815bbaa557
Reviewed-on: https://boringssl-review.googlesource.com/c/34884
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-02-13 20:04:33 +00:00
David Benjamin
b22c9fea47 Use Windows symbol APIs in the unwind tester.
This should make things a bit easier to debug.

Update-Note: Test binaries on Windows now link to dbghelp.
Bug: 259
Change-Id: I9da1fc89d429080c5250238e4341445922b1dd8e
Reviewed-on: https://boringssl-review.googlesource.com/c/34868
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-02-12 20:42:47 +00:00
David Benjamin
2e819d8be4 Unwind RDRAND functions correctly on Windows.
But for the ABI conversion bits, these are just leaf functions and don't
even need unwind tables. Just renumber the registers on Windows to only
used volatile ones.

In doing so, this switches to writing rdrand explicitly. perlasm already
knows how to manually encode it and our minimum assembler versions
surely cover rdrand by now anyway. Also add the .size directive. I'm not
sure what it's used for, but the other files have it.

(This isn't a generally reusable technique. The more complex functions
will need actual unwind codes.)

Bug: 259
Change-Id: I1d5669bcf8b6e34939885d78aea6f60597be1528
Reviewed-on: https://boringssl-review.googlesource.com/c/34867
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-12 20:24:27 +00:00
David Benjamin
15ba2d11a9 Patch out unused aesni-x86_64 functions.
This shrinks the bssl binary by about 8k.

Change-Id: I571f258ccf7032ae34db3f20904ad9cc81cca839
Reviewed-on: https://boringssl-review.googlesource.com/c/34866
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-11 20:25:22 +00:00
David Benjamin
cc2b8e2552 Add ABI tests for aesni-gcm-x86_64.pl.
Change-Id: Ic23fc5fbec2c4f8df5d06f807c6bd2c5e1f0e99c
Reviewed-on: https://boringssl-review.googlesource.com/c/34865
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-11 20:08:38 +00:00
David Benjamin
7a3b94cd2c Add ABI tests for x86_64-mont5.pl.
Fix some missing CFI bits.

Change-Id: I42114527f0ef8e03079d37a9f466d64a63a313f5
Reviewed-on: https://boringssl-review.googlesource.com/c/34864
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-11 19:27:13 +00:00
Jeremy Apthorp
7ef4223fb3 sync EVP_get_cipherbyname with EVP_do_all_sorted
EVP_get_cipherbyname should work on everything that EVP_do_all_sorted
lists, and conversely, there should be nothing that
EVP_get_cipherbyname works on that EVP_do_all_sorted doesn't list.

node.js uses these APIs to enumerate and instantiate ciphers.

Change-Id: I87fcedce62d06774f7c6ee7acc898326276be089
Reviewed-on: https://boringssl-review.googlesource.com/c/33984
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-02-11 17:20:23 +00:00
Katrin Leinweber
d2a0ffdfa7 Hyperlink DOI to preferred resolver
Change-Id: Ib9983a74d5d2f8be7c96cedde17be5a4e9223d5e
Reviewed-on: https://boringssl-review.googlesource.com/c/34844
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-02-08 19:20:05 +00:00
David Benjamin
a6c689e0da Remove stray semicolons.
Thanks to Nico Weber for pointing this out.

Change-Id: I763fd4a6f8fe467a027d5b249d9f76633ab4375a
Reviewed-on: https://boringssl-review.googlesource.com/c/34824
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
2019-02-07 17:36:54 +00:00
Adam Langley
2d38b83976 Remove separate default group list for servers.
It's the same as for clients, and we're probably not going to change
that any time soon.

Change-Id: Ic48cb640e98b0957d264267b97b5393f1977c6e6
Reviewed-on: https://boringssl-review.googlesource.com/c/34665
Reviewed-by: David Benjamin <davidben@google.com>
2019-02-06 00:33:29 +00:00
Adam Langley
fcc1ad78f9 Enable all curves (inc CECPQ2) during fuzzing.
Change-Id: I8083e841de135e9ec244609b1c20f0280ce20072
Reviewed-on: https://boringssl-review.googlesource.com/c/34664
Reviewed-by: David Benjamin <davidben@google.com>
2019-02-06 00:32:45 +00:00
David Benjamin
70fe610556 Implement ABI testing for aarch64.
This caught a bug in bn_mul_mont. Tested manually on iOS and Android.

Change-Id: I1819fcd9ad34dbe3ba92bba952507d86dd12185a
Reviewed-on: https://boringssl-review.googlesource.com/c/34805
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 21:44:04 +00:00
David Benjamin
55b9acda99 Fix ABI error in bn_mul_mont on aarch64.
This was caught by an aarch64 ABI tester. aarch64 has the same
considerations around small arguments as x86_64 does. The aarch64
version of bn_mul_mont does not mask off the upper words of the
argument.

The x86_64 version does, so size_t is, strictly speaking, wrong for
aarch64, but bn_mul_mont already has an implicit size limit to support
its internal alloca, so this doesn't really make things worse than
before.

Change-Id: I39bffc8fdb2287e45a2d1f0d1b4bd5532bbf3868
Reviewed-on: https://boringssl-review.googlesource.com/c/34804
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 21:17:54 +00:00
David Benjamin
0a87c4982c Implement ABI testing for ARM.
Update-Note: There's some chance this'll break iOS since I was unable to
test it there. The iPad I have to test on is too new to run 32-bit code
at all.

Change-Id: I6593f91b67a5e8a82828237d3b69ed948b07922d
Reviewed-on: https://boringssl-review.googlesource.com/c/34725
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 21:01:44 +00:00
David Benjamin
0a67eba62d Fix the order of Windows unwind codes.
The unwind tester suggests Windows doesn't care, but the documentation
says that unwind codes should be sorted in descending offset, which
means the last instruction should be first.

https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2017#struct-unwind_code

Bug: 259
Change-Id: I21e54c362e18e0405f980005112cc3f7c417c70c
Reviewed-on: https://boringssl-review.googlesource.com/c/34785
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 19:38:23 +00:00
David Benjamin
28f035f48b Implement unwind testing for Windows.
Unfortunately, due to most OpenSSL assembly using custom exception
handlers to unwind, most of our assembly doesn't work with
non-destructive unwind. For now, CHECK_ABI behaves like
CHECK_ABI_NO_UNWIND on Windows, and CHECK_ABI_SEH will test unwinding on
both platforms.

The tests do, however, work with the unwind-code-based assembly we
recently added, as well as the clmul-based GHASH which is also
code-based. Remove the ad-hoc SEH tests which intentionally hit memory
access exceptions, now that we can test unwind directly.

Now that we can test it, the next step is to implement SEH directives in
perlasm so writing these unwind codes is less of a chore.

Bug: 259
Change-Id: I23a57a22c5dc9fa4513f575f18192335779678a5
Reviewed-on: https://boringssl-review.googlesource.com/c/34784
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 19:22:15 +00:00
David Benjamin
fc31677a1d Tolerate spaces when parsing .type directives.
The .type foo, @abi-omnipotent lines weren't being parsed correctly.
This doesn't change the generated files, but some internal state (used
in-progress work on perlasm SEH directives) wasn't quite right.

Change-Id: Id6aec79281a59f45b2eb2aea9f1fb8806b4c483e
Reviewed-on: https://boringssl-review.googlesource.com/c/34786
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 15:47:26 +00:00
David Benjamin
20a9b409bb runner: Don't generate an RSA key on startup.
RSA keygen isn't the fastest. Just use the existing one in
rsaCertificate.

Change-Id: Icd151232928e67e0a7d5becabf9dc96b0e9bfa22
Reviewed-on: https://boringssl-review.googlesource.com/c/34764
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
2019-02-04 16:08:41 +00:00
David Benjamin
33f456b8b0 Don't use bsaes over vpaes for CTR-DRBG.
RAND_bytes rarely uses large enough inputs for bsaes to be worth it.
https://boringssl-review.googlesource.com/c/boringssl/+/33589 includes some
rough benchmarks of various bits here. Some observations:

- 8 blocks of bsaes costs roughly 6.5 blocks of vpaes. Note the comparison
  isn't quite accurate because I'm measuring bsaes_ctr32_encrypt_blocks against
  vpaes_encrypt and vpaes in CTR mode today must make do with a C loop. Even
  assuming a cutoff of 6 rather than 7 blocks, it's rare to ask for 96 bytes
  of entropy at a time.

- CTR-DRBG performs some stray block operations (ctr_drbg_update), which bsaes
  is bad at without extra work to fold them into the CTR loop (not really worth
  it).

- CTR-DRBG calculates a couple new key schedules every RAND_bytes call. We
  don't currently have a constant-time bsaes key schedule. Unfortunately, even
  plain vpaes loses to the current aes_nohw used by bsaes, but it's not
  constant-time. Also taking CTR-DRBG out of the bsaes equation

- Machines without AES hardware (clients) are not going to be RNG-bound. It's
  mostly servers pushing way too many CBC IVs that care. This means bsaes's
  current side channel tradeoffs make even less sense here.

I'm not sure yet what we should do for the rest of the bsaes mess, but it seems
clear that we want to stick with vpaes for the RNG.

Bug: 256
Change-Id: Iec8f13af232794afd007cb1065913e8117eeee24
Reviewed-on: https://boringssl-review.googlesource.com/c/34744
Reviewed-by: Adam Langley <agl@google.com>
2019-02-01 18:03:39 +00:00
David Benjamin
470bd56c9b perlasm/x86_64-xlate.pl: refine symbol recognition in .xdata.
Hexadecimals were erroneously recognized as symbols in .xdata.

(Imported from upstream's b068a9b914887af5cc99895754412582fbb0e10b)

Change-Id: I5d8e8e1969669a8961733802d9f034cf26c45552
Reviewed-on: https://boringssl-review.googlesource.com/c/34704
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-02-01 18:02:44 +00:00
David Benjamin
9978f0a865 Add instructions for debugging on Android with gdb.
Android's official documentation seems to assume you're using the NDK
build system or Android Studio. I extracted this from one of their
scripts a while back. May as well put it somewhere we can easily find
it.

Change-Id: I259abc54e6935ab537956a7cbf9f80e924a60b7a
Reviewed-on: https://boringssl-review.googlesource.com/c/34724
Reviewed-by: Adam Langley <agl@google.com>
2019-02-01 02:51:11 +00:00
Jesse Selover
d7266ecc9b Enforce key usage for RSA keys in TLS 1.2.
For now, this is off by default and controlled by SSL_set_enforce_rsa_key_usage.
This may be set as late as certificate verification so we may start by enforcing
it for known roots.

Generalizes ssl_cert_check_digital_signature_key_usage to check any part of the
key_usage, and adds a new error KEY_USAGE_BIT_INCORRECT for the generalized
method.

Bug: chromium:795089
Change-Id: Ifa504c321bec3263a4e74f2dc48513e3b895d3ee
Reviewed-on: https://boringssl-review.googlesource.com/c/34604
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-30 21:28:34 +00:00
David Benjamin
1a51a5b4a6 Remove infra/config folder in master branch.
As of https://boringssl-review.googlesource.com/c/34584, the LUCI config
has been consolidated on the infra/config branch.

Change-Id: Idd9f38b99197b9ff324d98c4aecb5d8fe94a2f9e
Reviewed-on: https://boringssl-review.googlesource.com/c/34684
Reviewed-by: Andrii Shyshkalov <tandrii@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-30 00:21:43 +00:00
Filippo Valsorda
73308b6606 Avoid SCT/OCSP extensions in SH on {Omit|Empty}Extensions
They were causing a "panic: ServerHello unexpectedly contained extensions"
if the client unconditionally signals support for OCSP or SCTs.

Change-Id: Ia60639431daf78679b269dfe337c1af171fd7d8b
Reviewed-on: https://boringssl-review.googlesource.com/c/34644
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-29 00:51:31 +00:00
David Benjamin
23e1a1f2d3 Test and fix an ABI issue with small parameters.
Calling conventions must specify how to handle arguments smaller than a
machine word. Should the caller pad them up to a machine word size with
predictable values (zero/sign-extended), or should the callee tolerate
an arbitrary bit pattern?

Annoyingly, I found no text in either SysV or Win64 ABI documentation
describing any of this and resorted to experiment. The short answer is
that callees must tolerate an arbitrary bit pattern on x86_64, which
means we must test this. See the comment in abi_test::internal::ToWord
for the long answer.

CHECK_ABI now, if the type of the parameter is smaller than
crypto_word_t, fills the remaining bytes with 0xaa. This is so the
number is out of bounds for code expecting either zero or sign
extension. (Not that crypto assembly has any business seeing negative
numbers.)

Doing so reveals a bug in ecp_nistz256_ord_sqr_mont. The rep parameter
is typed int, but the code expected uint64_t. In practice, the compiler
will always compile this correctly because:

- On both Win64 and SysV, rep is a register parameter.

- The rep parameter is always a constant, so the compiler has no reason
  to leave garbage in the upper half.

However, I was indeed able to get a bug out of GCC via:

  uint64_t foo = (1ull << 63) | 2;  // Some global the compiler can't
                                    // prove constant.
  ecp_nistz256_ord_sqr_mont(res, a, foo >> 1);

Were ecp_nistz256_ord_sqr_mont a true int-taking function, this would
act like ecp_nistz256_ord_sqr_mont(res, a, 1). Instead, it hung. Fix
this by having it take a full-width word.

This mess has several consequences:

- ABI testing now ideally needs a functional testing component to fully cover
  this case. A bad input might merely produce the wrong answer. Still,
  this is fairly effective as it will cause most code to either segfault
  or loop forever. (Not the enc parameter to AES however...)

- We cannot freely change the type of assembly function prototypes. If the
  prototype says int or unsigned, it must be ignoring the upper half and
  thus "fixing" it to size_t cannot have handled the full range. (Unless
  it was simply wrong of the parameter is already bounded.) If the
  prototype says size_t, switching to int or unsigned will hit this type
  of bug. The former is a safer failure mode though.

- The simplest path out of this mess: new assembly code should *only*
  ever take word-sized parameters. This is not a tall order as the bad
  parameters are usually ints that should have been size_t.

Calling conventions are hard.

Change-Id: If8254aff8953844679fbce4bd3e345e5e2fa5213
Reviewed-on: https://boringssl-review.googlesource.com/c/34627
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-28 21:09:40 +00:00
David Benjamin
ab578adf44 Add RSAZ ABI tests.
As part of this, move the CPU checks to C.

Change-Id: I17b701e1196c1ca116bbd23e0e669cf603ad464d
Reviewed-on: https://boringssl-review.googlesource.com/c/34626
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-28 21:00:49 +00:00
David Benjamin
3859fc883d Better document RSAZ and tidy up types.
It's an assembly function, so types are a little meaningless, but
everything is passed through as BN_ULONG, so be consistent. Also
annotate all the RSAZ prototypes with sizes.

Change-Id: I32e59e896da39e79c30ce9db52652fd645a033b4
Reviewed-on: https://boringssl-review.googlesource.com/c/34625
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-28 20:54:27 +00:00
David Benjamin
e569c7e25d Add ABI testing for 32-bit x86.
This is much less interesting (stack-based parameters, Windows and SysV
match, no SEH concerns as far as I can tell) than x86_64, but it was
easy to do and I'm more familiar with x86 than ARM, so it made a better
second architecture to make sure all the architecture ifdefs worked out.

Also fix a bug in the x86_64 direction flag code. It was shifting in the
wrong direction, making give 0 or 1<<20 rather than 0 or 1.

(Happily, x86_64 appears to be unique in having vastly different calling
conventions between OSs. x86 is the same between SysV and Windows, and
ARM had the good sense to specify a (mostly) common set of rules.)

Since a lot of the assembly functions use the same names and the tests
were written generically, merely dropping in a trampoline and
CallerState implementation gives us a bunch of ABI tests for free.

Change-Id: I15408c18d43e88cfa1c5c0634a8b268a150ed961
Reviewed-on: https://boringssl-review.googlesource.com/c/34624
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-28 20:40:06 +00:00
David Benjamin
8cbb5f8f20 Add a very roundabout EC keygen API.
OpenSSL's EVP-level EC API involves a separate "paramgen" operation,
which is ultimately just a roundabout way to go from a NID to an
EC_GROUP. But Node uses this, and it's the pattern used within OpenSSL
these days, so this appears to be the official upstream recommendation.

Also add a #define for OPENSSL_EC_EXPLICIT_CURVE, because Node uses it,
but fail attempts to use it. Explicit curve encodings are forbidden by
RFC 5480 and generally a bad idea. (Parsing such keys back into OpenSSL
will cause it to lose the optimized path.)

Change-Id: I5e97080e77cf90fc149f6cf6f2cc4900f573fc64
Reviewed-on: https://boringssl-review.googlesource.com/c/34565
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-25 23:08:12 +00:00
David Benjamin
23dcf88e18 Add some Node compatibility functions.
This doesn't cover all the functions used by Node, but it's the easy
bits. (EVP_PKEY_paramgen will be done separately as its a non-trivial
bit of machinery.)

Change-Id: I6501e99f9239ffcdcc57b961ebe85d0ad3965549
Reviewed-on: https://boringssl-review.googlesource.com/c/34544
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-01-25 16:50:30 +00:00
Christopher Patton
6c1b376e1d Implement server support for delegated credentials.
This implements the server-side of delegated credentials, a proposed
extension for TLS:
https://tools.ietf.org/html/draft-ietf-tls-subcerts-02

Change-Id: I6a29cf1ead87b90aeca225335063aaf190a417ff
Reviewed-on: https://boringssl-review.googlesource.com/c/33666
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-01-24 20:06:58 +00:00
David Benjamin
4545503926 Add a constant-time pshufb-based GHASH implementation.
We currently require clmul instructions for constant-time GHASH
on x86_64. Otherwise, it falls back to a variable-time 4-bit table
implementation. However, a significant proportion of clients lack these
instructions.

Inspired by vpaes, we can use pshufb and a slightly different order of
incorporating the bits to make a constant-time GHASH. This requires
SSSE3, which is very common. Benchmarking old machines we had on hand,
it appears to be a no-op on Sandy Bridge and a small slowdown for
Penryn.

Sandy Bridge (Intel Pentium CPU 987 @ 1.50GHz):
(Note: these numbers are before 16-byte-aligning the table. That was an
improvement on Penryn, so it's possible Sandy Bridge is now better.)
Before:
Did 4244750 AES-128-GCM (16 bytes) seal operations in 4015000us (1057222.9 ops/sec): 16.9 MB/s
Did 442000 AES-128-GCM (1350 bytes) seal operations in 4016000us (110059.8 ops/sec): 148.6 MB/s
Did 84000 AES-128-GCM (8192 bytes) seal operations in 4015000us (20921.5 ops/sec): 171.4 MB/s
Did 3349250 AES-256-GCM (16 bytes) seal operations in 4016000us (833976.6 ops/sec): 13.3 MB/s
Did 343500 AES-256-GCM (1350 bytes) seal operations in 4016000us (85532.9 ops/sec): 115.5 MB/s
Did 65250 AES-256-GCM (8192 bytes) seal operations in 4015000us (16251.6 ops/sec): 133.1 MB/s
After:
Did 4229250 AES-128-GCM (16 bytes) seal operations in 4016000us (1053100.1 ops/sec): 16.8 MB/s [-0.4%]
Did 442250 AES-128-GCM (1350 bytes) seal operations in 4016000us (110122.0 ops/sec): 148.7 MB/s [+0.1%]
Did 83500 AES-128-GCM (8192 bytes) seal operations in 4015000us (20797.0 ops/sec): 170.4 MB/s [-0.6%]
Did 3286500 AES-256-GCM (16 bytes) seal operations in 4016000us (818351.6 ops/sec): 13.1 MB/s [-1.9%]
Did 342750 AES-256-GCM (1350 bytes) seal operations in 4015000us (85367.4 ops/sec): 115.2 MB/s [-0.2%]
Did 65250 AES-256-GCM (8192 bytes) seal operations in 4016000us (16247.5 ops/sec): 133.1 MB/s [-0.0%]

Penryn (Intel Core 2 Duo CPU P8600 @ 2.40GHz):
Before:
Did 1179000 AES-128-GCM (16 bytes) seal operations in 1000139us (1178836.1 ops/sec): 18.9 MB/s
Did 97000 AES-128-GCM (1350 bytes) seal operations in 1006347us (96388.2 ops/sec): 130.1 MB/s
Did 18000 AES-128-GCM (8192 bytes) seal operations in 1028943us (17493.7 ops/sec): 143.3 MB/s
Did 977000 AES-256-GCM (16 bytes) seal operations in 1000197us (976807.6 ops/sec): 15.6 MB/s
Did 82000 AES-256-GCM (1350 bytes) seal operations in 1012434us (80992.9 ops/sec): 109.3 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1006528us (14902.7 ops/sec): 122.1 MB/s
After:
Did 1306000 AES-128-GCM (16 bytes) seal operations in 1000153us (1305800.2 ops/sec): 20.9 MB/s [+10.8%]
Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009852us (93082.9 ops/sec): 125.7 MB/s [-3.4%]
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1012096us (16796.8 ops/sec): 137.6 MB/s [-4.0%]
Did 1070000 AES-256-GCM (16 bytes) seal operations in 1000929us (1069006.9 ops/sec): 17.1 MB/s [+9.4%]
Did 79000 AES-256-GCM (1350 bytes) seal operations in 1002209us (78825.9 ops/sec): 106.4 MB/s [-2.7%]
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1061489us (14131.1 ops/sec): 115.8 MB/s [-5.2%]

Change-Id: I1c3760a77af7bee4aee3745d1c648d9e34594afb
Reviewed-on: https://boringssl-review.googlesource.com/c/34267
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-24 17:19:21 +00:00
Adam Langley
9801a07145 Tweak some slightly fragile tests.
These tests failed when CECPQ2 was enabled by default. Even if we're
not going to make CECPQ2 the default, it's worth fixing them to be more
robust.

Change-Id: Idef508bca9e17a4ef0e0a8a396755abd975f9908
Reviewed-on: https://boringssl-review.googlesource.com/c/34524
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-23 22:48:16 +00:00
Adam Langley
4bfab5d9d7 Make 256-bit ciphers a preference for CECPQ2, not a requirement.
If 256-bit ciphers are a requirement for CECPQ2 then that introduces a
link between supported ciphers and supported groups: offering CECPQ2
without a 256-bit cipher is invalid. But that's a little weird since
these things were otherwise independent.

So, rather than require a 256-bit cipher for CECPQ2, just prefer them.

Change-Id: I491749e41708cd9c5eeed5b4ae23c11e5c0b9725
Reviewed-on: https://boringssl-review.googlesource.com/c/34504
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-23 22:38:56 +00:00
David Benjamin
fa81cc65dd Update comments around JDK11 workaround.
11.0.2 has since been released, but we are now aware of several more
bugs, so the workaround is unlikely to be removable for the foreseeable
future.

Change-Id: I8e7edcba2f002d0558a21e607306ddf9a205bfb3
Reviewed-on: https://boringssl-review.googlesource.com/c/34484
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-23 20:00:38 +00:00
David Benjamin
c47f7936d0 Add a RelWithAsserts build configuration.
On our bots, debug unit tests take around 2.5x as long to complete as
release tests on Linux, 3x as long on macOS, and 6x as long on Windows.
Our tests are fast, so this does not particularly matter, but SDE
inflates a 13 second test run to 8 minutes. On Windows (MSVC), where we
don't but would like to test with SDE, the difference between optimized
and unoptimized is even larger, and test runs are slower in general.

This suggests running SDE tests in release mode. Release mode tests,
however, are less effective because they do not include asserts. Thus,
add a RelWithAsserts option.

(Chromium does something similar. I believe most of the test-running
configurations on the critical path run is_debug = false and
dcheck_always_on = true.)

Change-Id: I273dd86ab8ea039f34eca431483827c87dc5c461
Reviewed-on: https://boringssl-review.googlesource.com/c/34464
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-23 17:21:56 +00:00
Adam Langley
51011b4a26 Remove union from |SHA512_CTX|.
With 2fe0360a4e, we no longer use the
other member of this union so it can be removed.

Change-Id: Ideb7c47a72df0b420eb1e7d8c718e1cacb2129f5
Reviewed-on: https://boringssl-review.googlesource.com/c/34449
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-22 23:36:46 +00:00
David Benjamin
4f3f597d32 Avoid unwind tests on libc functions.
When built under UBSan, it gets confused inside a PLT stub.

Change-Id: Ib082ecc076ba2111337ff5921e465e4beb99aab5
Reviewed-on: https://boringssl-review.googlesource.com/c/34448
Reviewed-by: Adam Langley <agl@google.com>
2019-01-22 23:29:24 +00:00
David Benjamin
14c611cf91 Don't pass NULL,0 to qsort.
qsort shares the same C language bug as mem*. Two of our calls may see
zero-length lists. This trips UBSan.

Change-Id: Id292dd277129881001eb57b1b2db78438cf4642e
Reviewed-on: https://boringssl-review.googlesource.com/c/34447
Reviewed-by: Adam Langley <agl@google.com>
2019-01-22 23:28:38 +00:00
David Benjamin
9847cdd785 Fix signed left-shifts in curve25519.c.
Due to a language flaw in C, left-shifts on signed integers are
undefined for negative numbers. This makes them all but useless. Cast to
the unsigned type, left-shift, and cast back (casts are defined to wrap)
to silence UBSan.

Change-Id: I8fbe739aee1c99cf553462b675863e6d68c2b302
Reviewed-on: https://boringssl-review.googlesource.com/c/34446
Reviewed-by: Adam Langley <agl@google.com>
2019-01-22 23:27:34 +00:00
David Benjamin
fc27a1919c Add an option to build with UBSan.
Change-Id: I31d5660fa4792bbb1ef8a721bad4bdbdb0e56863
Reviewed-on: https://boringssl-review.googlesource.com/c/34445
Reviewed-by: Adam Langley <agl@google.com>
2019-01-22 23:26:35 +00:00
David Benjamin
2fe0360a4e Fix undefined pointer casts in SHA-512 code.
Casting an unaligned pointer to uint64_t* is undefined, even on
platforms that support unaligned access. Additionally, dereferencing as
uint64_t violates strict aliasing rules. Instead, use memcpys which we
assume any sensible compiler can optimize. Also simplify the PULL64
business with the existing CRYPTO_bswap8.

This also removes the need for the
SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA logic. The generic C code now
handles unaligned data and the assembly already can as well. (The only
problematic platform with assembly is old ARM, but sha512-armv4.pl
already handles this via an __ARM_ARCH__ check.  See also OpenSSL's
version of this file which always defines
SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA if SHA512_ASM is defined.)

Add unaligned tests to digest_test.cc, so we retain coverage of
unaligned EVP_MD inputs.

Change-Id: Idfd8586c64bab2a77292af2fa8eebbd193e57c7d
Reviewed-on: https://boringssl-review.googlesource.com/c/34444
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-22 23:18:36 +00:00
Adam Langley
72f015562c HRSS: flatten sample distribution.
With HRSS-SXY, the sampling algorithm now longer has to be the same
between the two parties. Therefore we can change it at will (as long as
it remains reasonably uniform) and thus take the opportunity to make the
output distribution flatter.

Change-Id: I74c667fcf919fe11ddcf2f4fb8a540b5112268bf
Reviewed-on: https://boringssl-review.googlesource.com/c/34404
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-22 22:06:43 +00:00
Adam Langley
c1615719ce Add test of assembly code dispatch.
The first attempt involved using Linux's support for hardware
breakpoints to detect when assembly code was run. However, this doesn't
work with SDE, which is a problem.

This version has the assembly code update a global flags variable when
it's run, but only in non-FIPS and non-debug builds.

Update-Note: Assembly files now pay attention to the NDEBUG preprocessor
symbol. Ensure the build passes the symbol in. (If release builds fail
to link due to missing BORINGSSL_function_hit, this is the cause.)

Change-Id: I6b7ced442b7a77d0b4ae148b00c351f68af89a6e
Reviewed-on: https://boringssl-review.googlesource.com/c/33384
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-22 20:22:53 +00:00
Adam Langley
eadef4730e Simplify HRSS mod3 circuits.
The multiplication and subtraction circuits were found by djb using GNU
Superoptimizer, and the addition circuit is derived from the subtraction
one by hand. They depend on a different representation: -1 is now (1, 1)
rather than (1, 0), and the latter becomes undefined.

The following Python program checks that the circuits work:

values = [0, 1, -1]

def toBits(v):
    if v == 0:
        return 0, 0
    elif v == 1:
        return 0, 1
    elif v == -1:
        return 1, 1
    else:
        raise ValueError(v)

def mul((s1, a1), (s2, a2)):
    return ((s1 ^ s2) & a1 & a2, a1 & a2)

def add((s1, a1), (s2, a2)):
    t = s1 ^ a2
    return (t & (s2 ^ a1), (a1 ^ a2) | (t ^ s2))

def sub((s1, a1), (s2, a2)):
    t = a1 ^ a2
    return ((s1 ^ a2) & (t ^ s2), t | (s1 ^ s2))

def fromBits((s, a)):
    if s == 0 and a == 0:
        return 0
    if s == 0 and a == 1:
        return 1
    if s == 1 and a == 1:
        return -1
    else:
        raise ValueError((s, a))

def wrap(v):
    if v == 2:
        return -1
    elif v == -2:
        return 1
    else:
        return v

for v1 in values:
    for v2 in values:
        print v1, v2

        result = fromBits(mul(toBits(v1), toBits(v2)))
        if result != v1 * v2:
            raise ValueError((v1, v2, result))

        result = fromBits(add(toBits(v1), toBits(v2)))
        if result != wrap(v1 + v2):
            raise ValueError((v1, v2, result))

        result = fromBits(sub(toBits(v1), toBits(v2)))
        if result != wrap(v1 - v2):
            raise ValueError((v1, v2, result))

Change-Id: Ie1a4ca5a82c2651057efc62330eca6fdd9878122
Reviewed-on: https://boringssl-review.googlesource.com/c/34344
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-21 21:32:35 +00:00
Adam Langley
20f4a043eb Add SSL_OP_NO_RENEGOTIATION
Since |ssl_renegotiate_never| is the default, this option is moot.
However, OpenSSL defines and supports it so this helps code that wishes
to support both.

Change-Id: I3a2f6e93a078d39526d10f9cd0a990953bd45825
Reviewed-on: https://boringssl-review.googlesource.com/c/34384
Reviewed-by: Adam Langley <alangley@gmail.com>
Commit-Queue: Adam Langley <alangley@gmail.com>
2019-01-21 18:08:55 +00:00