Commit Graph

2650 Commits

Author SHA1 Message Date
David Benjamin
104306f587 Remove STRICT_ALIGNMENT code from modes.
STRICT_ALIGNMENT is a remnant of OpenSSL code would cast pointers to
size_t* and load more than one byte at a time. Not all architectures
support unaligned access, so it did an alignment check and only enterred
this path if aligned or the underlying architecture didn't care.

This is UB. Unaligned casts in C are undefined on all architectures, so
we switch these to memcpy some time ago. Compilers can optimize memcpy
to the unaligned accesses we wanted. That left our modes logic as:

- If STRICT_ALIGNMENT is 1 and things are unaligned, work byte-by-byte.

- Otherwise, use the memcpy-based word-by-word code, which now works
  independent of STRICT_ALIGNMENT.

Remove the first check to simplify things. On x86, x86_64, and aarch64,
STRICT_ALIGNMENT is zero and this is a no-op. ARM is more complex. Per
[0], ARMv7 and up support unaligned access. ARMv5 do not. ARMv6 does,
but can run in a mode where it looks more like ARMv5.

For ARMv7 and up, STRICT_ALIGNMENT should have been zero, but was one.
Thus this change should be an improvement for ARMv7 (right now unaligned
inputs lose bsaes-armv7). The Android NDK does not even support the
pre-ARMv7 ABI anymore[1]. Nonetheless, Cronet still supports ARMv6 as a
library. It builds with -march=armv6 which GCC interprets as supporting
unaligned access, so it too did not want this code.

For completeness, should anyone still care about ARMv5 or be building
with an overly permissive -march flag, GCC does appear unable to inline
the memcpy calls. However, GCC also does not interpret
(uintptr_t)ptr % sizeof(size_t) as an alignment assertion, so such
consumers have already been paying for the memcpy here and throughout
the library.

In general, C's arcane pointer rules mean we must resort to memcpy
often, so, realistically, we must require that the compiler optimize
memcpy well.

[0] https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965
[1] https://developer.android.com/ndk/guides/abis#armeabi

Change-Id: I3c7dea562adaeb663032e395499e69530dd8e145
Reviewed-on: https://boringssl-review.googlesource.com/c/34873
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:39:36 +00:00
David Benjamin
4d8e1ce5e9 Patch XTS out of ARMv7 bsaes too.
Bug: 256
Change-Id: I822274bf05901d82b41dc9c9c4e6d0b5d622f3ff
Reviewed-on: https://boringssl-review.googlesource.com/c/34871
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:31:37 +00:00
David Benjamin
fb35b147ca Remove stray prototype.
The function's since been renamed.

Change-Id: Id1a9788dfeb5c46b3463611b08318b3f253d03df
Reviewed-on: https://boringssl-review.googlesource.com/c/34870
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:31:14 +00:00
David Benjamin
eb2c2cdf17 Always define GHASH.
There is a C implementation of gcm_ghash_4bit to pair with
gcm_gmult_4bit. It's even slightly faster per the numbers below (x86_64
OPENSSL_NO_ASM build), but, more importantly, we trim down the
combinatorial explosion of GCM implementations and free up complexity
budget for potentially using bsaes better in the future.

Old:
Did 2557000 AES-128-GCM (16 bytes) seal operations in 1000057us (2556854.3 ops/sec): 40.9 MB/s
Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009613us (93105.0 ops/sec): 125.7 MB/s
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1024768us (16589.1 ops/sec): 135.9 MB/s
Did 2511000 AES-256-GCM (16 bytes) seal operations in 1000196us (2510507.9 ops/sec): 40.2 MB/s
Did 84000 AES-256-GCM (1350 bytes) seal operations in 1000412us (83965.4 ops/sec): 113.4 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1046963us (14327.2 ops/sec): 117.4 MB/s

New:
Did 2739000 AES-128-GCM (16 bytes) seal operations in 1000322us (2738118.3 ops/sec): 43.8 MB/s
Did 100000 AES-128-GCM (1350 bytes) seal operations in 1008190us (99187.7 ops/sec): 133.9 MB/s
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1006360us (16892.6 ops/sec): 138.4 MB/s
Did 2546000 AES-256-GCM (16 bytes) seal operations in 1000150us (2545618.2 ops/sec): 40.7 MB/s
Did 86000 AES-256-GCM (1350 bytes) seal operations in 1000970us (85916.7 ops/sec): 116.0 MB/s
Did 14850 AES-256-GCM (8192 bytes) seal operations in 1023459us (14509.6 ops/sec): 118.9 MB/s

While I'm here, tighten up some of the functions and align the ctr32 and
non-ctr32 paths.

Bug: 256
Change-Id: Id4df699cefc8630dd5a350d44f927900340f5e60
Reviewed-on: https://boringssl-review.googlesource.com/c/34869
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:30:55 +00:00
David Benjamin
b22c9fea47 Use Windows symbol APIs in the unwind tester.
This should make things a bit easier to debug.

Update-Note: Test binaries on Windows now link to dbghelp.
Bug: 259
Change-Id: I9da1fc89d429080c5250238e4341445922b1dd8e
Reviewed-on: https://boringssl-review.googlesource.com/c/34868
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-02-12 20:42:47 +00:00
David Benjamin
2e819d8be4 Unwind RDRAND functions correctly on Windows.
But for the ABI conversion bits, these are just leaf functions and don't
even need unwind tables. Just renumber the registers on Windows to only
used volatile ones.

In doing so, this switches to writing rdrand explicitly. perlasm already
knows how to manually encode it and our minimum assembler versions
surely cover rdrand by now anyway. Also add the .size directive. I'm not
sure what it's used for, but the other files have it.

(This isn't a generally reusable technique. The more complex functions
will need actual unwind codes.)

Bug: 259
Change-Id: I1d5669bcf8b6e34939885d78aea6f60597be1528
Reviewed-on: https://boringssl-review.googlesource.com/c/34867
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-12 20:24:27 +00:00
David Benjamin
15ba2d11a9 Patch out unused aesni-x86_64 functions.
This shrinks the bssl binary by about 8k.

Change-Id: I571f258ccf7032ae34db3f20904ad9cc81cca839
Reviewed-on: https://boringssl-review.googlesource.com/c/34866
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-11 20:25:22 +00:00
David Benjamin
cc2b8e2552 Add ABI tests for aesni-gcm-x86_64.pl.
Change-Id: Ic23fc5fbec2c4f8df5d06f807c6bd2c5e1f0e99c
Reviewed-on: https://boringssl-review.googlesource.com/c/34865
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-11 20:08:38 +00:00
David Benjamin
7a3b94cd2c Add ABI tests for x86_64-mont5.pl.
Fix some missing CFI bits.

Change-Id: I42114527f0ef8e03079d37a9f466d64a63a313f5
Reviewed-on: https://boringssl-review.googlesource.com/c/34864
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-11 19:27:13 +00:00
Jeremy Apthorp
7ef4223fb3 sync EVP_get_cipherbyname with EVP_do_all_sorted
EVP_get_cipherbyname should work on everything that EVP_do_all_sorted
lists, and conversely, there should be nothing that
EVP_get_cipherbyname works on that EVP_do_all_sorted doesn't list.

node.js uses these APIs to enumerate and instantiate ciphers.

Change-Id: I87fcedce62d06774f7c6ee7acc898326276be089
Reviewed-on: https://boringssl-review.googlesource.com/c/33984
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-02-11 17:20:23 +00:00
Katrin Leinweber
d2a0ffdfa7 Hyperlink DOI to preferred resolver
Change-Id: Ib9983a74d5d2f8be7c96cedde17be5a4e9223d5e
Reviewed-on: https://boringssl-review.googlesource.com/c/34844
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-02-08 19:20:05 +00:00
David Benjamin
70fe610556 Implement ABI testing for aarch64.
This caught a bug in bn_mul_mont. Tested manually on iOS and Android.

Change-Id: I1819fcd9ad34dbe3ba92bba952507d86dd12185a
Reviewed-on: https://boringssl-review.googlesource.com/c/34805
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 21:44:04 +00:00
David Benjamin
55b9acda99 Fix ABI error in bn_mul_mont on aarch64.
This was caught by an aarch64 ABI tester. aarch64 has the same
considerations around small arguments as x86_64 does. The aarch64
version of bn_mul_mont does not mask off the upper words of the
argument.

The x86_64 version does, so size_t is, strictly speaking, wrong for
aarch64, but bn_mul_mont already has an implicit size limit to support
its internal alloca, so this doesn't really make things worse than
before.

Change-Id: I39bffc8fdb2287e45a2d1f0d1b4bd5532bbf3868
Reviewed-on: https://boringssl-review.googlesource.com/c/34804
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 21:17:54 +00:00
David Benjamin
0a87c4982c Implement ABI testing for ARM.
Update-Note: There's some chance this'll break iOS since I was unable to
test it there. The iPad I have to test on is too new to run 32-bit code
at all.

Change-Id: I6593f91b67a5e8a82828237d3b69ed948b07922d
Reviewed-on: https://boringssl-review.googlesource.com/c/34725
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 21:01:44 +00:00
David Benjamin
0a67eba62d Fix the order of Windows unwind codes.
The unwind tester suggests Windows doesn't care, but the documentation
says that unwind codes should be sorted in descending offset, which
means the last instruction should be first.

https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2017#struct-unwind_code

Bug: 259
Change-Id: I21e54c362e18e0405f980005112cc3f7c417c70c
Reviewed-on: https://boringssl-review.googlesource.com/c/34785
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 19:38:23 +00:00
David Benjamin
28f035f48b Implement unwind testing for Windows.
Unfortunately, due to most OpenSSL assembly using custom exception
handlers to unwind, most of our assembly doesn't work with
non-destructive unwind. For now, CHECK_ABI behaves like
CHECK_ABI_NO_UNWIND on Windows, and CHECK_ABI_SEH will test unwinding on
both platforms.

The tests do, however, work with the unwind-code-based assembly we
recently added, as well as the clmul-based GHASH which is also
code-based. Remove the ad-hoc SEH tests which intentionally hit memory
access exceptions, now that we can test unwind directly.

Now that we can test it, the next step is to implement SEH directives in
perlasm so writing these unwind codes is less of a chore.

Bug: 259
Change-Id: I23a57a22c5dc9fa4513f575f18192335779678a5
Reviewed-on: https://boringssl-review.googlesource.com/c/34784
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 19:22:15 +00:00
David Benjamin
fc31677a1d Tolerate spaces when parsing .type directives.
The .type foo, @abi-omnipotent lines weren't being parsed correctly.
This doesn't change the generated files, but some internal state (used
in-progress work on perlasm SEH directives) wasn't quite right.

Change-Id: Id6aec79281a59f45b2eb2aea9f1fb8806b4c483e
Reviewed-on: https://boringssl-review.googlesource.com/c/34786
Reviewed-by: Adam Langley <agl@google.com>
2019-02-05 15:47:26 +00:00
David Benjamin
33f456b8b0 Don't use bsaes over vpaes for CTR-DRBG.
RAND_bytes rarely uses large enough inputs for bsaes to be worth it.
https://boringssl-review.googlesource.com/c/boringssl/+/33589 includes some
rough benchmarks of various bits here. Some observations:

- 8 blocks of bsaes costs roughly 6.5 blocks of vpaes. Note the comparison
  isn't quite accurate because I'm measuring bsaes_ctr32_encrypt_blocks against
  vpaes_encrypt and vpaes in CTR mode today must make do with a C loop. Even
  assuming a cutoff of 6 rather than 7 blocks, it's rare to ask for 96 bytes
  of entropy at a time.

- CTR-DRBG performs some stray block operations (ctr_drbg_update), which bsaes
  is bad at without extra work to fold them into the CTR loop (not really worth
  it).

- CTR-DRBG calculates a couple new key schedules every RAND_bytes call. We
  don't currently have a constant-time bsaes key schedule. Unfortunately, even
  plain vpaes loses to the current aes_nohw used by bsaes, but it's not
  constant-time. Also taking CTR-DRBG out of the bsaes equation

- Machines without AES hardware (clients) are not going to be RNG-bound. It's
  mostly servers pushing way too many CBC IVs that care. This means bsaes's
  current side channel tradeoffs make even less sense here.

I'm not sure yet what we should do for the rest of the bsaes mess, but it seems
clear that we want to stick with vpaes for the RNG.

Bug: 256
Change-Id: Iec8f13af232794afd007cb1065913e8117eeee24
Reviewed-on: https://boringssl-review.googlesource.com/c/34744
Reviewed-by: Adam Langley <agl@google.com>
2019-02-01 18:03:39 +00:00
David Benjamin
470bd56c9b perlasm/x86_64-xlate.pl: refine symbol recognition in .xdata.
Hexadecimals were erroneously recognized as symbols in .xdata.

(Imported from upstream's b068a9b914887af5cc99895754412582fbb0e10b)

Change-Id: I5d8e8e1969669a8961733802d9f034cf26c45552
Reviewed-on: https://boringssl-review.googlesource.com/c/34704
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-02-01 18:02:44 +00:00
Jesse Selover
d7266ecc9b Enforce key usage for RSA keys in TLS 1.2.
For now, this is off by default and controlled by SSL_set_enforce_rsa_key_usage.
This may be set as late as certificate verification so we may start by enforcing
it for known roots.

Generalizes ssl_cert_check_digital_signature_key_usage to check any part of the
key_usage, and adds a new error KEY_USAGE_BIT_INCORRECT for the generalized
method.

Bug: chromium:795089
Change-Id: Ifa504c321bec3263a4e74f2dc48513e3b895d3ee
Reviewed-on: https://boringssl-review.googlesource.com/c/34604
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-30 21:28:34 +00:00
David Benjamin
23e1a1f2d3 Test and fix an ABI issue with small parameters.
Calling conventions must specify how to handle arguments smaller than a
machine word. Should the caller pad them up to a machine word size with
predictable values (zero/sign-extended), or should the callee tolerate
an arbitrary bit pattern?

Annoyingly, I found no text in either SysV or Win64 ABI documentation
describing any of this and resorted to experiment. The short answer is
that callees must tolerate an arbitrary bit pattern on x86_64, which
means we must test this. See the comment in abi_test::internal::ToWord
for the long answer.

CHECK_ABI now, if the type of the parameter is smaller than
crypto_word_t, fills the remaining bytes with 0xaa. This is so the
number is out of bounds for code expecting either zero or sign
extension. (Not that crypto assembly has any business seeing negative
numbers.)

Doing so reveals a bug in ecp_nistz256_ord_sqr_mont. The rep parameter
is typed int, but the code expected uint64_t. In practice, the compiler
will always compile this correctly because:

- On both Win64 and SysV, rep is a register parameter.

- The rep parameter is always a constant, so the compiler has no reason
  to leave garbage in the upper half.

However, I was indeed able to get a bug out of GCC via:

  uint64_t foo = (1ull << 63) | 2;  // Some global the compiler can't
                                    // prove constant.
  ecp_nistz256_ord_sqr_mont(res, a, foo >> 1);

Were ecp_nistz256_ord_sqr_mont a true int-taking function, this would
act like ecp_nistz256_ord_sqr_mont(res, a, 1). Instead, it hung. Fix
this by having it take a full-width word.

This mess has several consequences:

- ABI testing now ideally needs a functional testing component to fully cover
  this case. A bad input might merely produce the wrong answer. Still,
  this is fairly effective as it will cause most code to either segfault
  or loop forever. (Not the enc parameter to AES however...)

- We cannot freely change the type of assembly function prototypes. If the
  prototype says int or unsigned, it must be ignoring the upper half and
  thus "fixing" it to size_t cannot have handled the full range. (Unless
  it was simply wrong of the parameter is already bounded.) If the
  prototype says size_t, switching to int or unsigned will hit this type
  of bug. The former is a safer failure mode though.

- The simplest path out of this mess: new assembly code should *only*
  ever take word-sized parameters. This is not a tall order as the bad
  parameters are usually ints that should have been size_t.

Calling conventions are hard.

Change-Id: If8254aff8953844679fbce4bd3e345e5e2fa5213
Reviewed-on: https://boringssl-review.googlesource.com/c/34627
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-28 21:09:40 +00:00
David Benjamin
ab578adf44 Add RSAZ ABI tests.
As part of this, move the CPU checks to C.

Change-Id: I17b701e1196c1ca116bbd23e0e669cf603ad464d
Reviewed-on: https://boringssl-review.googlesource.com/c/34626
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-28 21:00:49 +00:00
David Benjamin
3859fc883d Better document RSAZ and tidy up types.
It's an assembly function, so types are a little meaningless, but
everything is passed through as BN_ULONG, so be consistent. Also
annotate all the RSAZ prototypes with sizes.

Change-Id: I32e59e896da39e79c30ce9db52652fd645a033b4
Reviewed-on: https://boringssl-review.googlesource.com/c/34625
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-28 20:54:27 +00:00
David Benjamin
e569c7e25d Add ABI testing for 32-bit x86.
This is much less interesting (stack-based parameters, Windows and SysV
match, no SEH concerns as far as I can tell) than x86_64, but it was
easy to do and I'm more familiar with x86 than ARM, so it made a better
second architecture to make sure all the architecture ifdefs worked out.

Also fix a bug in the x86_64 direction flag code. It was shifting in the
wrong direction, making give 0 or 1<<20 rather than 0 or 1.

(Happily, x86_64 appears to be unique in having vastly different calling
conventions between OSs. x86 is the same between SysV and Windows, and
ARM had the good sense to specify a (mostly) common set of rules.)

Since a lot of the assembly functions use the same names and the tests
were written generically, merely dropping in a trampoline and
CallerState implementation gives us a bunch of ABI tests for free.

Change-Id: I15408c18d43e88cfa1c5c0634a8b268a150ed961
Reviewed-on: https://boringssl-review.googlesource.com/c/34624
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-28 20:40:06 +00:00
David Benjamin
8cbb5f8f20 Add a very roundabout EC keygen API.
OpenSSL's EVP-level EC API involves a separate "paramgen" operation,
which is ultimately just a roundabout way to go from a NID to an
EC_GROUP. But Node uses this, and it's the pattern used within OpenSSL
these days, so this appears to be the official upstream recommendation.

Also add a #define for OPENSSL_EC_EXPLICIT_CURVE, because Node uses it,
but fail attempts to use it. Explicit curve encodings are forbidden by
RFC 5480 and generally a bad idea. (Parsing such keys back into OpenSSL
will cause it to lose the optimized path.)

Change-Id: I5e97080e77cf90fc149f6cf6f2cc4900f573fc64
Reviewed-on: https://boringssl-review.googlesource.com/c/34565
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-25 23:08:12 +00:00
David Benjamin
23dcf88e18 Add some Node compatibility functions.
This doesn't cover all the functions used by Node, but it's the easy
bits. (EVP_PKEY_paramgen will be done separately as its a non-trivial
bit of machinery.)

Change-Id: I6501e99f9239ffcdcc57b961ebe85d0ad3965549
Reviewed-on: https://boringssl-review.googlesource.com/c/34544
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-01-25 16:50:30 +00:00
Christopher Patton
6c1b376e1d Implement server support for delegated credentials.
This implements the server-side of delegated credentials, a proposed
extension for TLS:
https://tools.ietf.org/html/draft-ietf-tls-subcerts-02

Change-Id: I6a29cf1ead87b90aeca225335063aaf190a417ff
Reviewed-on: https://boringssl-review.googlesource.com/c/33666
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-01-24 20:06:58 +00:00
David Benjamin
4545503926 Add a constant-time pshufb-based GHASH implementation.
We currently require clmul instructions for constant-time GHASH
on x86_64. Otherwise, it falls back to a variable-time 4-bit table
implementation. However, a significant proportion of clients lack these
instructions.

Inspired by vpaes, we can use pshufb and a slightly different order of
incorporating the bits to make a constant-time GHASH. This requires
SSSE3, which is very common. Benchmarking old machines we had on hand,
it appears to be a no-op on Sandy Bridge and a small slowdown for
Penryn.

Sandy Bridge (Intel Pentium CPU 987 @ 1.50GHz):
(Note: these numbers are before 16-byte-aligning the table. That was an
improvement on Penryn, so it's possible Sandy Bridge is now better.)
Before:
Did 4244750 AES-128-GCM (16 bytes) seal operations in 4015000us (1057222.9 ops/sec): 16.9 MB/s
Did 442000 AES-128-GCM (1350 bytes) seal operations in 4016000us (110059.8 ops/sec): 148.6 MB/s
Did 84000 AES-128-GCM (8192 bytes) seal operations in 4015000us (20921.5 ops/sec): 171.4 MB/s
Did 3349250 AES-256-GCM (16 bytes) seal operations in 4016000us (833976.6 ops/sec): 13.3 MB/s
Did 343500 AES-256-GCM (1350 bytes) seal operations in 4016000us (85532.9 ops/sec): 115.5 MB/s
Did 65250 AES-256-GCM (8192 bytes) seal operations in 4015000us (16251.6 ops/sec): 133.1 MB/s
After:
Did 4229250 AES-128-GCM (16 bytes) seal operations in 4016000us (1053100.1 ops/sec): 16.8 MB/s [-0.4%]
Did 442250 AES-128-GCM (1350 bytes) seal operations in 4016000us (110122.0 ops/sec): 148.7 MB/s [+0.1%]
Did 83500 AES-128-GCM (8192 bytes) seal operations in 4015000us (20797.0 ops/sec): 170.4 MB/s [-0.6%]
Did 3286500 AES-256-GCM (16 bytes) seal operations in 4016000us (818351.6 ops/sec): 13.1 MB/s [-1.9%]
Did 342750 AES-256-GCM (1350 bytes) seal operations in 4015000us (85367.4 ops/sec): 115.2 MB/s [-0.2%]
Did 65250 AES-256-GCM (8192 bytes) seal operations in 4016000us (16247.5 ops/sec): 133.1 MB/s [-0.0%]

Penryn (Intel Core 2 Duo CPU P8600 @ 2.40GHz):
Before:
Did 1179000 AES-128-GCM (16 bytes) seal operations in 1000139us (1178836.1 ops/sec): 18.9 MB/s
Did 97000 AES-128-GCM (1350 bytes) seal operations in 1006347us (96388.2 ops/sec): 130.1 MB/s
Did 18000 AES-128-GCM (8192 bytes) seal operations in 1028943us (17493.7 ops/sec): 143.3 MB/s
Did 977000 AES-256-GCM (16 bytes) seal operations in 1000197us (976807.6 ops/sec): 15.6 MB/s
Did 82000 AES-256-GCM (1350 bytes) seal operations in 1012434us (80992.9 ops/sec): 109.3 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1006528us (14902.7 ops/sec): 122.1 MB/s
After:
Did 1306000 AES-128-GCM (16 bytes) seal operations in 1000153us (1305800.2 ops/sec): 20.9 MB/s [+10.8%]
Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009852us (93082.9 ops/sec): 125.7 MB/s [-3.4%]
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1012096us (16796.8 ops/sec): 137.6 MB/s [-4.0%]
Did 1070000 AES-256-GCM (16 bytes) seal operations in 1000929us (1069006.9 ops/sec): 17.1 MB/s [+9.4%]
Did 79000 AES-256-GCM (1350 bytes) seal operations in 1002209us (78825.9 ops/sec): 106.4 MB/s [-2.7%]
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1061489us (14131.1 ops/sec): 115.8 MB/s [-5.2%]

Change-Id: I1c3760a77af7bee4aee3745d1c648d9e34594afb
Reviewed-on: https://boringssl-review.googlesource.com/c/34267
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-24 17:19:21 +00:00
Adam Langley
51011b4a26 Remove union from |SHA512_CTX|.
With 2fe0360a4e, we no longer use the
other member of this union so it can be removed.

Change-Id: Ideb7c47a72df0b420eb1e7d8c718e1cacb2129f5
Reviewed-on: https://boringssl-review.googlesource.com/c/34449
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-22 23:36:46 +00:00
David Benjamin
4f3f597d32 Avoid unwind tests on libc functions.
When built under UBSan, it gets confused inside a PLT stub.

Change-Id: Ib082ecc076ba2111337ff5921e465e4beb99aab5
Reviewed-on: https://boringssl-review.googlesource.com/c/34448
Reviewed-by: Adam Langley <agl@google.com>
2019-01-22 23:29:24 +00:00
David Benjamin
14c611cf91 Don't pass NULL,0 to qsort.
qsort shares the same C language bug as mem*. Two of our calls may see
zero-length lists. This trips UBSan.

Change-Id: Id292dd277129881001eb57b1b2db78438cf4642e
Reviewed-on: https://boringssl-review.googlesource.com/c/34447
Reviewed-by: Adam Langley <agl@google.com>
2019-01-22 23:28:38 +00:00
David Benjamin
2fe0360a4e Fix undefined pointer casts in SHA-512 code.
Casting an unaligned pointer to uint64_t* is undefined, even on
platforms that support unaligned access. Additionally, dereferencing as
uint64_t violates strict aliasing rules. Instead, use memcpys which we
assume any sensible compiler can optimize. Also simplify the PULL64
business with the existing CRYPTO_bswap8.

This also removes the need for the
SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA logic. The generic C code now
handles unaligned data and the assembly already can as well. (The only
problematic platform with assembly is old ARM, but sha512-armv4.pl
already handles this via an __ARM_ARCH__ check.  See also OpenSSL's
version of this file which always defines
SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA if SHA512_ASM is defined.)

Add unaligned tests to digest_test.cc, so we retain coverage of
unaligned EVP_MD inputs.

Change-Id: Idfd8586c64bab2a77292af2fa8eebbd193e57c7d
Reviewed-on: https://boringssl-review.googlesource.com/c/34444
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-22 23:18:36 +00:00
Adam Langley
72f015562c HRSS: flatten sample distribution.
With HRSS-SXY, the sampling algorithm now longer has to be the same
between the two parties. Therefore we can change it at will (as long as
it remains reasonably uniform) and thus take the opportunity to make the
output distribution flatter.

Change-Id: I74c667fcf919fe11ddcf2f4fb8a540b5112268bf
Reviewed-on: https://boringssl-review.googlesource.com/c/34404
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-22 22:06:43 +00:00
Adam Langley
c1615719ce Add test of assembly code dispatch.
The first attempt involved using Linux's support for hardware
breakpoints to detect when assembly code was run. However, this doesn't
work with SDE, which is a problem.

This version has the assembly code update a global flags variable when
it's run, but only in non-FIPS and non-debug builds.

Update-Note: Assembly files now pay attention to the NDEBUG preprocessor
symbol. Ensure the build passes the symbol in. (If release builds fail
to link due to missing BORINGSSL_function_hit, this is the cause.)

Change-Id: I6b7ced442b7a77d0b4ae148b00c351f68af89a6e
Reviewed-on: https://boringssl-review.googlesource.com/c/33384
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-22 20:22:53 +00:00
Adam Langley
eadef4730e Simplify HRSS mod3 circuits.
The multiplication and subtraction circuits were found by djb using GNU
Superoptimizer, and the addition circuit is derived from the subtraction
one by hand. They depend on a different representation: -1 is now (1, 1)
rather than (1, 0), and the latter becomes undefined.

The following Python program checks that the circuits work:

values = [0, 1, -1]

def toBits(v):
    if v == 0:
        return 0, 0
    elif v == 1:
        return 0, 1
    elif v == -1:
        return 1, 1
    else:
        raise ValueError(v)

def mul((s1, a1), (s2, a2)):
    return ((s1 ^ s2) & a1 & a2, a1 & a2)

def add((s1, a1), (s2, a2)):
    t = s1 ^ a2
    return (t & (s2 ^ a1), (a1 ^ a2) | (t ^ s2))

def sub((s1, a1), (s2, a2)):
    t = a1 ^ a2
    return ((s1 ^ a2) & (t ^ s2), t | (s1 ^ s2))

def fromBits((s, a)):
    if s == 0 and a == 0:
        return 0
    if s == 0 and a == 1:
        return 1
    if s == 1 and a == 1:
        return -1
    else:
        raise ValueError((s, a))

def wrap(v):
    if v == 2:
        return -1
    elif v == -2:
        return 1
    else:
        return v

for v1 in values:
    for v2 in values:
        print v1, v2

        result = fromBits(mul(toBits(v1), toBits(v2)))
        if result != v1 * v2:
            raise ValueError((v1, v2, result))

        result = fromBits(add(toBits(v1), toBits(v2)))
        if result != wrap(v1 + v2):
            raise ValueError((v1, v2, result))

        result = fromBits(sub(toBits(v1), toBits(v2)))
        if result != wrap(v1 - v2):
            raise ValueError((v1, v2, result))

Change-Id: Ie1a4ca5a82c2651057efc62330eca6fdd9878122
Reviewed-on: https://boringssl-review.googlesource.com/c/34344
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-21 21:32:35 +00:00
David Benjamin
73b1f181b6 Add ABI tests for GCM.
Change-Id: If28096e677104c6109e31e31a636fee82ef4ba11
Reviewed-on: https://boringssl-review.googlesource.com/c/34266
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-15 22:49:37 +00:00
David Benjamin
8285ccd8fc Fix SSL_R_TOO_MUCH_READ_EARLY_DATA.
https://boringssl-review.googlesource.com/15164 allocated a new error code by
hand, rather than using the make_errors.go script, which caused it to clobber
the error space reserved for alerts.

Change-Id: Ife92c45da2c1d3c5506439bd5781ae91240d16d8
Reviewed-on: https://boringssl-review.googlesource.com/c/34307
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-15 21:53:52 +00:00
David Benjamin
b65ce68c8f Test CRYPTO_gcm128_tag in gcm_test.cc.
CRYPTO_gcm128_encrypt should be paired with CRYPTO_gcm128_tag, not
CRYPTO_gcm128_finish.

Change-Id: Ia3023a196fe5b613e9309b5bac19ea849dbc33b7
Reviewed-on: https://boringssl-review.googlesource.com/c/34265
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-15 18:19:57 +00:00
David Benjamin
f18bd55240 Remove pointer cast in P-256 table.
We expect the table to have a slightly nested structure, so just
generate it that way. Avoid risking strict aliasing problems. Thanks to
Brian Smith for pointing this out.

Change-Id: Ie21610c4afab07a610d914265079135dba17b3b7
Reviewed-on: https://boringssl-review.googlesource.com/c/34264
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-15 00:16:17 +00:00
Adam Langley
3eac8b7708 Ignore new fields in forthcoming Wycheproof tests.
Change-Id: I95dd20bb71c18cecd4cae72bcdbd708ee5e92e77
Reviewed-on: https://boringssl-review.googlesource.com/c/34284
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-14 22:02:37 +00:00
David Benjamin
5349ddb747 Fix RSAZ's OPENSSL_cleanse.
https://boringssl-review.googlesource.com/28584 switched RSAZ's buffer
to being externally-allocated, which means the OPENSSL_cleanse needs to
be tweaked to match.

Change-Id: I0a7307ac86aa10933d10d380ef652c355fed3ee9
Reviewed-on: https://boringssl-review.googlesource.com/c/34191
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-14 20:04:39 +00:00
Tom Tan
de3c1f69cc Fix header file for _byteswap_ulong and _byteswap_uint64 from MSVC CRT
_byteswap_ulong and _byteswap_uint64 are documented (see below link) as coming from stdlib.h.
 On some build configurations stdlib.h is pulled in by intrin.h but that is not guaranteed. In particular,
this assumption causes build breaks when building Chromium for Windows ARM64 with clang-cl. This
 change switches the #include to use the documented header file, thus fixing Windows ARM64 with clang-cl.


https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/byteswap-uint64-byteswap-ulong-byteswap-ushort

Bug: chromium:893460
Change-Id: I738c7227a9e156c894c2be62b52228a5bbd88414
Reviewed-on: https://boringssl-review.googlesource.com/c/34244
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Bruce Dawson <brucedawson@chromium.org>
Commit-Queue: David Benjamin <davidben@google.com>
2019-01-14 19:49:39 +00:00
David Benjamin
2bee229103 Add ABI tests for HRSS assembly.
The last instruction did not unwind correctly. Also add .type and .size
annotations so that errors show up properly.

Change-Id: Id18e12b4ed51bdabb90bd5ac66631fd989649eec
Reviewed-on: https://boringssl-review.googlesource.com/c/34190
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-01-09 04:10:25 +00:00
David Benjamin
d99b549b8e Add AES ABI tests.
This involves fixing some bugs in aes_nohw_cbc_encrypt's annotations,
and working around a libunwind bug. In doing so, support .cfi_remember_state
and .cfi_restore_state in perlasm.

Change-Id: Iaedfe691356b0468327a6be0958d034dafa760e5
Reviewed-on: https://boringssl-review.googlesource.com/c/34189
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-01-09 03:54:55 +00:00
David Benjamin
c0f4dbe4e2 Move aes_nohw, bsaes, and vpaes prototypes to aes/internal.h.
This is in preparation for adding ABI tests to them.

In doing so, update delocate.go so that OPENSSL_ia32cap_get is consistently
callable outside the module. Right now it's callable both inside and outside
normally, but not in FIPS mode because the function is generated. This is
needed for tests and the module to share headers that touch OPENSSL_ia32cap_P.

Change-Id: Idbc7d694acfb974e0b04adac907dab621e87de62
Reviewed-on: https://boringssl-review.googlesource.com/c/34188
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-09 03:35:55 +00:00
David Benjamin
e592d595c4 Add direction flag checking to CHECK_ABI.
Linux and Windows ABIs both require that the direction flag be cleared
on function exit, so that functions can rely on it being cleared on
entry. (Some OpenSSL assembly preserves it, which is stronger, but we
only require what is specified by the ABI so CHECK_ABI works with C
compiler output.)

Change-Id: I1a320aed4371176b4b44fe672f1a90167b84160f
Reviewed-on: https://boringssl-review.googlesource.com/c/34187
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-09 03:22:15 +00:00
David Benjamin
b2f56f9283 Add ABI tests for ChaCha20_ctr32.
Change-Id: I1fad7f954284000474e5723c3fa59fedceb52ad4
Reviewed-on: https://boringssl-review.googlesource.com/c/34186
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-09 03:11:45 +00:00
David Benjamin
5e350d13f5 Add ABI tests for MD5.
This does not actually matter, but writing new CFI directives with the
tester seemed like fun. (It caught two typos, one intentional and one
accidental.)

Change-Id: Iff3e0358f2e56caa26079f658fa7a682772150a1
Reviewed-on: https://boringssl-review.googlesource.com/c/34185
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-08 18:01:07 +00:00
David Benjamin
1aaa7aa83c Add ABI tests for bn_mul_mont.
Bug: 181
Change-Id: Ibd606329278c6b727d95e762920a12b58bb8687a
Reviewed-on: https://boringssl-review.googlesource.com/c/33969
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-04 19:21:31 +00:00
David Benjamin
005f616217 Add ABI tests for SHA*.
Bug: 181
Change-Id: Ica9299613d7fd1b803533b7e489b9ba8fe816a24
Reviewed-on: https://boringssl-review.googlesource.com/c/33968
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-04 19:14:11 +00:00