Commit Graph

5761 Commits

Author SHA1 Message Date
a4efa781dd WIP
Change-Id: I0e8551993f67a6b5a9ef453678996b856e560fa7
2019-04-25 09:48:02 +01:00
daa8e7a14a WIP
Change-Id: I9905e17519d58ac33cdf70e9465923f0ed122a67
2019-04-24 19:39:49 +01:00
c823d1090a WIP.1
Change-Id: I72239a08e36e6818a220074096d34774d5e324b9
2019-04-24 19:34:02 +01:00
4fd8221fc5 WIP 2019-04-24 19:08:23 +01:00
f091e62e14 Integrate SIKE with TLS key exchange.
Implements support for hybrid key exchange based on SIKEp503, a post
quantum, isogeny based KEM. This is a hybrid construction mixed with
X25519 key agreement. Code point is 0xFE32. Cloudflare's SIDH
implementation is used for testing. Key exchange can be used with TLS1.3
only.

Change-Id: I3a5f38d6f7d016274e5bcfb629249664e1d983eb
2019-04-23 14:43:32 +01:00
4b728181d2 Add support for SIKE/p503 post-quantum KEM
Based on Microsoft's implementation available on github:
Source: https://github.com/Microsoft/PQCrypto-SIDH
Commit: 77044b76181eb61c744ac8eb7ddc7a8fe72f6919

The following changes have been applied:

* In Intel assembly, use MOV instead of MOVQ:
  According to the Intel Software Developer's Manual, volume 2A, MOVQ
  has 4 forms. None of them mentions moving a literal to a GPR, hence
  "movq $rax, 0x0" is wrong. Instead, on 64-bit systems, MOV can be
  used.

* Some variables were wrongly zero-initialized (as per C99 spec).

* Rewrite x86_64 assembly to AT&T format.

* Move assembly for x86_64 and aarch64 to perlasm.

* Move constant values to .RODATA segment, as keeping them in .TEXT
  segment is not compatible with XOM.

* Fix an issue in arm64 code: the compiler doesn't reserve enough
  space for the linker to relocate the address of a global variable
  when it is used by 'ldr' instructions. The solution is to use 'adrp'
  followed by an 'add' instruction. Relocations for the 'adrp' and
  'add' instructions are generated by prefixing the label with
  :pg_hi21: and :lo12: respectively.

* Enable MULX and ADX. Code from MS doesn't support PIC. MULX can't
  reference a global variable directly. Instead, RIP-relative
  addressing can be used. This improves performance by around 10%-13%
  on Skylake.

* Check at runtime whether the CPU supports the BMI2 and ADOX
  instructions. On AMD64, the optimized Montgomery multiplication and
  reduction have 2 implementations; the faster one takes advantage of
  the BMI2 instruction set introduced in Haswell and ADOX introduced
  in Broadwell. Thanks to OPENSSL_ia32cap_P, the implementation can be
  chosen at runtime. As CPU configuration is static by nature, the
  branch predictor will be correct most of the time and hence this
  check very often has no cost.

* Reuse some utilities from boringssl instead of reimplementing them.
  This includes things like:
  * definition of a limb size (use crypto_word_t instead of digit_t)
  * use functions for checking in constant time if value is 0 and/or
    less than
  * #define's used for conditional compilation

* Use SSE2 for conditional swap on vector registers. Improves
  performance a little bit.

* Fix f2elm_t definition. Code imported from MSR defines the f2elm_t
  type as an array of arrays. This decays to a pointer to an array
  (when passed as an argument). In C, one can't assign a const pointer
  to an array with a non-const pointer to an array. It seems this
  violates 6.7.3/8 from C99 (same for C11). This problem occurs in GCC
  6 only when the -pedantic flag is specified, and it always occurs in
  GCC 4.9 (Debian jessie).

* Fix definition of eval_3_isog. Second argument in eval_3_isog mustn't be
  const. Similar reason as above.

* Use HMAC-SHA256 instead of cSHAKE-256 to avoid upstreaming cSHAKE
  and SHA3 code.

* Add speed and unit tests for SIKE.
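
Two of the items above (constant-time zero checks and the conditional
swap) can be illustrated with a minimal portable sketch. The helper
names below are hypothetical and are not taken from the SIKE or
BoringSSL sources; the change itself reuses BoringSSL's own utilities
and adds an SSE2 path for the swap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns all-ones if a == 0 and 0 otherwise,
 * without branching on the (possibly secret) value. */
static uint64_t ct_is_zero_mask(uint64_t a) {
  /* The top bit of (~a & (a - 1)) is set only when a == 0. */
  return (uint64_t)0 - ((~a & (a - 1)) >> 63);
}

/* Hypothetical helper: swaps the n-word arrays a and b when mask is
 * all-ones and leaves them unchanged when mask is 0. The memory access
 * pattern is the same in both cases. */
static void ct_cswap(uint64_t *a, uint64_t *b, size_t n, uint64_t mask) {
  for (size_t i = 0; i < n; i++) {
    uint64_t t = (a[i] ^ b[i]) & mask;
    a[i] ^= t;
    b[i] ^= t;
  }
}
```

Calling ct_cswap(x, y, n, ct_is_zero_mask(bit)) swaps exactly when bit
is 0. A vectorized version would apply the same XOR-and-mask step to
wider registers.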

Change-Id: I22f0bb1f9edff314a35cd74b48e8c4962568e330
2019-04-23 14:42:03 +01:00
Adam Langley
7540cc2ec0 Predeclare enums in base.h
Including ssl.h is quite a chunk of code and #defines, so we've tried to
limit its spread internally in the interests of code hygiene given that
we have a multi-billion-line repo.

However, header files that mention enums from ssl.h currently need to
include ssl.h. For example, your class may have static class member
functions intended to be callbacks, and they need to be class members
because they'll call other private methods.

C cannot predeclare enums, but C++ can if you explicitly type them.
Sadly C doesn't support explicit types. So option one is to move the
enums into base.h. That works, but the enums properly live in ssl.h and
reading the header file is a lot clearer if you don't have to jump
around to see all the pieces.

So option two (this change) is to explicitly type and predeclare the
enums in base.h for C++ only. The worry now is that C and C++ might
disagree about the type of the enums. However, this has already
happened: at least for |ssl_private_key_result_t|, g++ thinks that it's
an |int| (without any explicit type) and gcc thinks that it's an
|unsigned|. At least they're the same length, I guess?

So, to make sure that this doesn't slip any more, this change also adds
|ssl_test_c.c| which tests that C views the enums as having the same
size as an |int|, at least.
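
The kind of check described here can be sketched as a C11 compile-time
assertion. The enum below is a hypothetical stand-in, not one of the
enums from ssl.h, and this is not the content of |ssl_test_c.c|. The
assertion holds on common compilers with default flags, but the C
standard does not guarantee it.

```c
#include <stddef.h>

/* Hypothetical stand-in for an enum declared in a public header. */
enum example_result_t {
  example_success,
  example_retry,
  example_failure,
};

/* Fails the build if the C compiler picks a different size than |int|,
 * which is the size a C++ predeclaration with an explicit int-sized
 * underlying type would assume. */
_Static_assert(sizeof(enum example_result_t) == sizeof(int),
               "enum size differs from int");

/* Exposes the size for a runtime test as well. */
static size_t example_enum_size(void) {
  return sizeof(enum example_result_t);
}
```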

Change-Id: I8248583ec997021f8226d5a798609f6afc96dac4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35664
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-04-22 21:49:12 +00:00
David Benjamin
c67076d653 Require certificates under name constraints use SANs.
The common name fallback does not interact well with name constraints.
Until we remove this fallback, we must resolve this conflict.

Blindly applying name constraints to the common name will reject
"decorative" common names that aren't intended to be hostnames (e.g.
[0]). We need to guess based on format whether the common name is a DNS
name. It is important this same check is applied to *both* name
constraints and name matching, which means the OpenSSL version (see
5bd5dcd49605ca2aa7931599894302a3ac4b0b04,
d02d80b2e80adfdde49f76cf7c7af4e013f45005, and
55a6250f1e7336e8a7d89fb609eb23398715ff6f) is unsuitable as a
compatibility data point.

In theory we could limit this to chains with name constraints, which are
uncommon, but X509_check_host sees only the leaf. We must apply it
uniformly. That means a strict check risks problems with malformed
non-WebPKI setups like [1].

For a first pass, mirror Go's behavior. Like Go, rather than run
SAN-less DNS-like common names through name constraints, we simply
reject all such certificates. Name constraints now exclude all leaf
certificates that can trigger the common name fallback. They are rare
enough that we can hopefully hold them to a higher standard.

Note this does not make misclassified decorative common names any worse,
compared to checking the name constraint. Such names would not have
matched the constraint anyway.

Update-Note: This may cause two kinds of errors:

1. Leaf certificates whose chain contains a name constraint and lack
   SANs may be rejected with X509_V_ERR_NAME_CONSTRAINTS_WITHOUT_SANS.

2. Leaf certificates which use the common name fallback and verify
   against an insufficiently DNS-looking hostname may fail with
   X509_V_ERR_HOSTNAME_MISMATCH.

In both cases, the fix is to include the subjectAltName in the
certificate, rather than rely on the common name fallback. (Refining the
heuristic is also an option, but the two failure modes pull it in
opposite directions, so this is tricky.)
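
The policy described above can be summarized in a small decision
sketch. All types and names here are hypothetical and only model the
rule as stated in this message; the format-based guess about the common
name is taken as an input rather than implemented.

```c
#include <stdbool.h>

/* Hypothetical summary of a leaf certificate; not a BoringSSL type. */
struct example_leaf {
  bool has_san_list;      /* certificate carries a subjectAltName */
  bool cn_looks_like_dns; /* format-based guess about the common name */
};

enum example_verdict {
  EXAMPLE_OK,
  EXAMPLE_NAME_CONSTRAINTS_WITHOUT_SANS,
};

/* The common name fallback is only possible without a SAN list and
 * with a DNS-like common name. */
static bool example_may_use_cn_fallback(const struct example_leaf *leaf) {
  return !leaf->has_san_list && leaf->cn_looks_like_dns;
}

/* Under name constraints, any leaf that could trigger the fallback is
 * rejected instead of running its common name through the constraints. */
static enum example_verdict example_check_leaf(
    const struct example_leaf *leaf, bool chain_has_name_constraints) {
  if (chain_has_name_constraints && example_may_use_cn_fallback(leaf)) {
    return EXAMPLE_NAME_CONSTRAINTS_WITHOUT_SANS;
  }
  return EXAMPLE_OK;
}
```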

[0] https://github.com/golang/go/issues/24151
[1] https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/194

Change-Id: If25557de428768292a14ba3bdeeffbd74e3a3bf8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35665
Reviewed-by: Adam Langley <agl@google.com>
2019-04-22 21:32:29 +00:00
David Benjamin
e55c64fdd3 Make X509_verify_cert_error_string thread-safe.
If the error is unknown, we should not return a static buffer. See also
c0a445a9f279d8c4a519b58e52a50112f2341070 from upstream.
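
A minimal sketch of the thread-safe pattern, assuming a made-up set of
error codes: every path returns a string literal, so there is no shared
buffer that concurrent callers could overwrite. The function name and
messages are hypothetical.

```c
/* Hypothetical lookup: never formats into a static buffer. */
static const char *example_error_string(long err) {
  switch (err) {
    case 0:
      return "ok";
    case 1:
      return "hypothetical verification failure";
    default:
      /* Unknown codes get a fixed message instead of a message that
       * embeds the number in shared, writable storage. */
      return "unknown certificate verification error";
  }
}
```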

Change-Id: I23e1a3b9e29b34ab3dff41b8a58155683bbb9bd2
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35684
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-04-18 18:58:03 +00:00
David Benjamin
d86eb1bbb3 Disable the common name fallback on *any* SAN list.
This aligns with the Go crypto/x509 behavior and reduces the cases when
the SAN to CN fallback occurs. If the certificate is new enough to have
a SAN list, even if it only contains email or IP addresses, it is
reasonable to assume the certificate is new enough that the common name
is not a DNS name.

Update-Note: Our certificate verification is getting slightly stricter.
Change-Id: I9e3466d8dd8a722405c546181a589f797efa43f9
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35647
Reviewed-by: Adam Langley <agl@google.com>
2019-04-18 18:37:36 +00:00
David Benjamin
923feba608 Silently ignore X509_CHECK_FLAG_ALWAYS_CHECK_SUBJECT.
This flag is backwards. We want to check the common name less, not more. See if
anything was actually relying on this.

Update-Note: X509_CHECK_FLAG_ALWAYS_CHECK_SUBJECT is now ignored.
Change-Id: I8288d57540f8117059e58d72cc173aa4d3077fb6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35646
Reviewed-by: Adam Langley <agl@google.com>
2019-04-18 18:36:46 +00:00
David Benjamin
c60b42bf7e Add X509_CHECK_FLAG_NEVER_CHECK_SUBJECT.
cryptography.io uses this and it's also the correct behavior. Ideally it would
be default, but start with just adding the flag. See also
dd60efea955e41a6f0926f93ec1503c6f83c4e58 from upstream.

Change-Id: I9e13cdbfd44c904ba5bd69a5a66c68c4b7596867
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35645
Reviewed-by: Adam Langley <agl@google.com>
2019-04-18 18:14:12 +00:00
David Benjamin
9df41ae953 Give ENGINE_free a return value.
This simplifies building against cryptography.io, which expects
ENGINE_free to return something.

Change-Id: Id1590abab7f47dae6b3a9d593fa7b0fe371c9912
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35644
Reviewed-by: Adam Langley <agl@google.com>
2019-04-17 20:57:57 +00:00
Adam Langley
c9827e073f Output a ClientHello during handoff.
This will allow edge servers to pass judgement on the ClientHello before
completing the handoff process. This also means that edge servers will
now enforce ClientHello well-formedness — previously that check didn't
occur until the handshaker tried to parse the handoff submission.

Change-Id: I9804ac0224632b4b4381c1a81f434d188e0b9376
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35584
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-04-15 22:29:15 +00:00
David Benjamin
2e26348e25 Fix and test EVP_PKEY_CTX copying.
The RSA-PSS salt length was not being copied, and copying an Ed25519
EVP_MD_CTX did not work.

This is rather pointless (an EVP_PKEY_CTX is just a bundle of
parameters), and it's unlikely anyone ever will use this. But since
OpenSSL's EVP_PKEY signing API reuses EVP_MD_CTX and EVP_MD_CTX_copy_ex
is plausible in that scenario, we're stuck making EVP_MD_CTX_copy_ex
reachable for EVP_PKEY too. That then implies EVP_PKEY_dup should exist,
and if it exists we should be testing it.

Change-Id: I189435d0c716a83f58e1d8ac4abc2c409ecfea64
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35626
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-04-15 22:22:35 +00:00
David Benjamin
d1a6d23686 Test copying an EVP_MD_CTX.
We should have test coverage for this path.

Change-Id: I8bcd9e2481562b3ad1e447c03a52b8ff4ff25606
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35625
Reviewed-by: Adam Langley <agl@google.com>
2019-04-15 21:56:07 +00:00
David Benjamin
65dc45cb57 Fix EVP_CIPHER_CTX_copy for AES-GCM.
7578f3f0de made it work, but
26ba48a6fb regressed it by losing the
EVP_CIPH_CUSTOM_COPY flag. Additionally, we've since added an alignment
requirement to EVP_AES_GCM_CTX, which complicates things.

Thanks to Guido Vranken for catching this!

Bug: 270
Change-Id: I71784593dc5a34d1334c92a4daa93546ec0ee2c3
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35624
Reviewed-by: Adam Langley <agl@google.com>
2019-04-15 21:55:06 +00:00
David Benjamin
4a8c05ffe8 Check key sizes in AES_set_*_key.
AES_set_*_key used to call directly into aes_nohw_set_*_key which
gracefully handles some NULL parameters and invalid bit sizes. However,
we now enable optimized assembly implementations, not all of which
perform these checks. (vpaes does not.)

This is fine for the internal assembly functions themselves. Such checks
are better written in C than assembly, and the calling C code usually
already knows the key size. (Indeed aes_ctr_set_key already assumes the
assembly functions are infallible.) AES_set_*_key are public APIs,
however. The NULL check is silly, but we should handle length-like
checks in public APIs.
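
A minimal sketch of a length check at a public entry point, assuming a
hypothetical key structure and return codes. The key expansion itself
is elided; a real implementation would dispatch to an assembly or C
backend after the check.

```c
#include <stdint.h>

/* Hypothetical key schedule holder; only the round count is modeled. */
typedef struct {
  unsigned rounds;
} example_aes_key;

/* Returns 0 on success and -2 for an unsupported key size (the return
 * values are assumptions of this sketch). */
static int example_aes_set_key(const uint8_t *key, unsigned bits,
                               example_aes_key *out) {
  if (bits != 128 && bits != 192 && bits != 256) {
    return -2;
  }
  (void)key; /* a backend would expand the key material here */
  out->rounds = 6 + bits / 32; /* 10, 12, or 14 rounds */
  return 0;
}
```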

Change-Id: I259ae6b9811ceaa9dc5bd7173d5754ca7079cff8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35564
Reviewed-by: Adam Langley <agl@google.com>
2019-04-11 15:33:57 +00:00
David Benjamin
31ef16ac2d Add missing nonce_len check to aead_aes_gcm_siv_asm_open.
Test invalid nonce lengths more thoroughly to cover this case on all our
AEADs. Thanks to Guido Vranken for catching this!

In doing so, this also reveals we have a ton of redundant error codes
(https://crbug.com/boringssl/269). I'll tidy that up in a separate
change as it may require some changes to code in Android. For now, this
change uses CIPHER_R_UNSUPPORTED_NONCE_SIZE just to be consistent with
the rest of that file.

Bug: 268
Change-Id: I0a479000ec3005ee55c828eaa92c8302b4625847
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35545
Reviewed-by: Adam Langley <agl@google.com>
2019-04-11 15:31:38 +00:00
David Benjamin
4a136ea005 Test AES-GCM-SIV with OPENSSL_SMALL.
https://boringssl-review.googlesource.com/16805 inadvertently restored
the OPENSSL_SMALL condition in aead_test.cc. I probably handled some
merge conflict wrong.

Change-Id: I1b29fbd4a0a57d94cd8b5bddf7c81ae10063e2a8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35544
Reviewed-by: Adam Langley <agl@google.com>
2019-04-11 15:17:47 +00:00
David Benjamin
ad9eee1628 Handle CBB_cleanup on child CBBs more gracefully.
Child and root CBBs share a type, but are different kinds of things. C++
programmers sometimes mistakenly believe they should use ScopedCBB for
everything. This mostly works because we NULL cbb->child->base on flush,
making CBB_cleanup a no-op. This zeroing also skips the assert in
CBB_cleanup. (If we ran it unconditionally, CBB_zero + CBB_cleanup would
not work.)

However, if a CBB operation fails and a function returns early, the
child CBB is not cleared. ScopedCBB will then call CBB_cleanup, which
trips the assert in debug builds and misbehaves in release builds.

Run the assert unconditionally and, when the assert fails, still behave
well. To make this work with CBB_zero, negate is_top_level to is_child,
so a flushed child CBB and a (presumably) root CBB in the zero state are
distinguishable.

Update-Note: Code that was using CBB wrong may trip an assert in debug builds.
Change-Id: Ifea7759e1d0331f2e727c59bbafa355d70fb9dba
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35524
Reviewed-by: Adam Langley <agl@google.com>
2019-04-10 22:12:42 +00:00
David Benjamin
be7006adac Update third_party/googletest.
The new version of googletest deprecates INSTANTIATE_TEST_CASE_P in
favor of INSTANTIATE_TEST_SUITE_P, so apply the change.

This requires blacklisting C4628 on MSVC 2015, which complains about
digraphs given foo<::std::tuple<...>>. Disable that warning. Digraphs
are not useful and C++11 apparently explicitly disambiguates that.

It also requires applying
https://github.com/google/googletest/pull/2226, to deal with a warning
in older MSVC.

Update-Note: Consumers using BoringSSL with their own copy of googletest
must ensure googletest was updated to a version from 2019-01-03 or
later for INSTANTIATE_TEST_SUITE_P to work. (I believe all relevant
consumers are fine here. If anyone can't update googletest and is
building BoringSSL tests, building with
-DINSTANTIATE_TEST_SUITE_P=INSTANTIATE_TEST_CASE_P would work as
workaround.)

Bug: chromium:936651
Change-Id: I23ada8de34a53131cab88a36a88d3185ab085c64
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35504
Reviewed-by: Adam Langley <agl@google.com>
2019-04-10 22:09:43 +00:00
David Benjamin
387b07b78d Rename 'md' output parameter to 'out' and add bounds.
We usually name output parameters 'out'. (Someone made a C++ templating
change in Chromium which messed up const-ness, saw the compile error,
and thought it was in MD5_Final.) Also tag the parameters with the
sizes.

Sadly, there's a bit of goofiness around SHA224_Final/SHA256_Final and
SHA384_Final/SHA512_Final, but they're just documentation anyway.
(Though it does touch on the mess that is sha->md_len which would be
nice to clear through somehow.)

Change-Id: I1918b7eecfe13f13b217d01d4414ac2358802354
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35484
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-04-08 18:19:01 +00:00
David Benjamin
a26d01719b Update other build tools.
Change-Id: If3c8de4b81559acd88e32928ac9884ace294fd1d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35465
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-04-04 17:42:09 +00:00
David Benjamin
98348562f0 Update SDE to 8.35.0-2019-03-11.
The new version has trap flag emulation, which is great for our ABI
tests. This CL doesn't enable it yet, however. The emulation is slightly
off on when traps start and stop, so the ABI tester will need to be
tweaked to be more lenient.

Change-Id: I0eb20176dc63eaa1c35f77379b34f7bb6c0b0407
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35464
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-04-04 17:41:28 +00:00
Christopher Patton
be9953accf nit: Update references to draft-ietf-tls-subcerts.
Change-Id: Ica6ea6eaff1849c7ee42be671b22006fe3ee5ff4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35444
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-04-01 19:54:35 +00:00
Nitish Sakhawalkar
a4af5f85bd Support get versions with get_{min,max}_proto_version for context
When building node with BoringSSL, `SSL_CTX_get_min_proto_version` and
`SSL_CTX_get_max_proto_version` are used. OpenSSL exposes those; this
change adds support for them in BoringSSL.

For this to work right in DTLS, we switch conf_{min,max}_version to store wire
versions, rather than our internal normalized versions.
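
The distinction matters because DTLS wire versions do not order the
same way as TLS ones (DTLS 1.0 is 0xfeff, DTLS 1.2 is 0xfefd). A
minimal sketch of mapping a wire version to a normalized TLS-style
version follows; the function and macro names are hypothetical, while
the numeric code points are the standard ones.

```c
#include <stdint.h>

#define EXAMPLE_TLS1_VERSION 0x0301
#define EXAMPLE_TLS1_1_VERSION 0x0302
#define EXAMPLE_TLS1_2_VERSION 0x0303
#define EXAMPLE_TLS1_3_VERSION 0x0304
#define EXAMPLE_DTLS1_VERSION 0xfeff
#define EXAMPLE_DTLS1_2_VERSION 0xfefd

/* Maps a wire version to a normalized TLS-style version so that
 * versions can be compared numerically. Returns 1 on success and 0 for
 * an unknown version. */
static int example_normalize_version(int is_dtls, uint16_t wire,
                                     uint16_t *out) {
  if (!is_dtls) {
    if (wire < EXAMPLE_TLS1_VERSION || wire > EXAMPLE_TLS1_3_VERSION) {
      return 0;
    }
    *out = wire; /* TLS wire versions are already ordered */
    return 1;
  }
  switch (wire) {
    case EXAMPLE_DTLS1_VERSION:
      *out = EXAMPLE_TLS1_1_VERSION; /* DTLS 1.0 is based on TLS 1.1 */
      return 1;
    case EXAMPLE_DTLS1_2_VERSION:
      *out = EXAMPLE_TLS1_2_VERSION;
      return 1;
    default:
      return 0;
  }
}
```

Storing the wire value and normalizing on use lets a getter return
exactly what the caller configured.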

Change-Id: I282ed224806c41f69e6f166ca97c6cc05ff51f17
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35404
Reviewed-by: Nitish Sakhawalkar <nitsakh@gmail.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-03-27 12:46:26 +00:00
David Benjamin
df11bed9ee Update ImplDispatchTest for bsaes-x86_64 removal.
I always forget to update this.

Bug: 256
Change-Id: I85fea8fa48da8d4ed6a1e1f001f5e1a74f1b706d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35384
Reviewed-by: Adam Langley <agl@google.com>
2019-03-23 15:15:48 +00:00
David Benjamin
1a36dd4930 Unwind the large_inputs hint in aes_ctr_set_key.
With bsaes-x86_64.pl gone, it is no longer needed. Depending on how armv7 works
(if vpaes-armv7.pl is too slow AND on-demand vpaes->bsaes key conversion is not
viable), we may need to bring it back, but get it out of the way for now.

Bug: 256
Change-Id: I762c83097bd03d88574ae1ae16b88fca6826f655
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35365
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-23 07:06:02 +00:00
David Benjamin
32ce6032ff Add an optimized x86_64 vpaes ctr128_f and remove bsaes.
Brian Smith suggested applying vpaes-armv8's "2x" optimization to
vpaes-x86_64. The registers are a little tight (aarch64 has a whole 32
SIMD registers, while x86_64 only has 16), but it's doable with some
spills and makes vpaes much more competitive with bsaes. At small- and
medium-sized inputs, vpaes now matches bsaes. At large inputs, it's a
~10% perf hit.

bsaes is thus pulling much less weight. Losing an entire AES
implementation and having constant-time AES for SSSE3 is attractive.
Some notes:

- The fact that these are older CPUs tempers the perf hit, but CPUs
  without AES-NI are still common enough to matter.

- This CL does regress CBC decrypt performance nontrivially (see below).
  If this matters, we can double-up CBC decryption too. CBC in TLS is
  legacy and already pays a costly Lucky13 mitigation.

- The difference between 1350 and 8192 bytes is likely bsaes AES-GCM
  paying for two slow (and variable-time!) aes_nohw_encrypt
  calls for EK0 and the trailing partial block. At larger inputs, those
  two calls are more amortized.

- To that end, bsaes would likely be much faster on AES-GCM with smarter
  use of bsaes. (Fold one-off calls above into bulk data.) Implementing
  this is a bit of a nuisance though, especially considering we don't
  wish to regress hwaes.

- I'd discarded the key conversion idea, but I think I did it wrong.
  Benchmarks from
  https://boringssl-review.googlesource.com/c/boringssl/+/33589 suggest
  converting to bsaes format on-demand for large ctr32 inputs should
  give the best of both worlds, but at the cost of an entire AES
  implementation relative to this CL.

- ARMv7 still depends on bsaes and has no vpaes. It also has 16 SIMD
  registers, so my plan is to translate it, with the same 2x
  optimization, and see how it compares. Hopefully that, or some
  combination of the above, will work for ARMv7.

Sandy Bridge
bsaes (before):
Did 3144750 AES-128-GCM (16 bytes) seal operations in 5016000us (626943.8 ops/sec): 10.0 MB/s
Did 2053750 AES-128-GCM (256 bytes) seal operations in 5016000us (409439.8 ops/sec): 104.8 MB/s
Did 469000 AES-128-GCM (1350 bytes) seal operations in 5015000us (93519.4 ops/sec): 126.3 MB/s
Did 92500 AES-128-GCM (8192 bytes) seal operations in 5016000us (18441.0 ops/sec): 151.1 MB/s
Did 46750 AES-128-GCM (16384 bytes) seal operations in 5032000us (9290.5 ops/sec): 152.2 MB/s
vpaes-1x (for reference, not this CL):
Did 8684750 AES-128-GCM (16 bytes) seal operations in 5015000us (1731754.7 ops/sec): 27.7 MB/s [+177%]
Did 1731500 AES-128-GCM (256 bytes) seal operations in 5016000us (345195.4 ops/sec): 88.4 MB/s [-15.6%]
Did 346500 AES-128-GCM (1350 bytes) seal operations in 5016000us (69078.9 ops/sec): 93.3 MB/s [-26.1%]
Did 61250 AES-128-GCM (8192 bytes) seal operations in 5015000us (12213.4 ops/sec): 100.1 MB/s [-33.8%]
Did 32500 AES-128-GCM (16384 bytes) seal operations in 5031000us (6459.9 ops/sec): 105.8 MB/s [-30.5%]
vpaes-2x (this CL):
Did 8840000 AES-128-GCM (16 bytes) seal operations in 5015000us (1762711.9 ops/sec): 28.2 MB/s [+182%]
Did 2167750 AES-128-GCM (256 bytes) seal operations in 5016000us (432167.1 ops/sec): 110.6 MB/s [+5.5%]
Did 474000 AES-128-GCM (1350 bytes) seal operations in 5016000us (94497.6 ops/sec): 127.6 MB/s [+1.0%]
Did 81750 AES-128-GCM (8192 bytes) seal operations in 5015000us (16301.1 ops/sec): 133.5 MB/s [-11.6%]
Did 41750 AES-128-GCM (16384 bytes) seal operations in 5031000us (8298.5 ops/sec): 136.0 MB/s [-10.6%]

Penryn
bsaes (before):
Did 958000 AES-128-GCM (16 bytes) seal operations in 1000264us (957747.2 ops/sec): 15.3 MB/s
Did 420000 AES-128-GCM (256 bytes) seal operations in 1000480us (419798.5 ops/sec): 107.5 MB/s
Did 96000 AES-128-GCM (1350 bytes) seal operations in 1001083us (95896.1 ops/sec): 129.5 MB/s
Did 18000 AES-128-GCM (8192 bytes) seal operations in 1042491us (17266.3 ops/sec): 141.4 MB/s
Did 9482 AES-128-GCM (16384 bytes) seal operations in 1095703us (8653.8 ops/sec): 141.8 MB/s
Did 758000 AES-256-GCM (16 bytes) seal operations in 1000769us (757417.5 ops/sec): 12.1 MB/s
Did 359000 AES-256-GCM (256 bytes) seal operations in 1001993us (358285.9 ops/sec): 91.7 MB/s
Did 82000 AES-256-GCM (1350 bytes) seal operations in 1009583us (81221.7 ops/sec): 109.6 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1022294us (14672.9 ops/sec): 120.2 MB/s
Did 7884 AES-256-GCM (16384 bytes) seal operations in 1070934us (7361.8 ops/sec): 120.6 MB/s
vpaes-1x (for reference, not this CL):
Did 2030000 AES-128-GCM (16 bytes) seal operations in 1000227us (2029539.3 ops/sec): 32.5 MB/s [+112%]
Did 382000 AES-128-GCM (256 bytes) seal operations in 1001949us (381256.9 ops/sec): 97.6 MB/s [-9.2%]
Did 81000 AES-128-GCM (1350 bytes) seal operations in 1007297us (80413.2 ops/sec): 108.6 MB/s [-16.1%]
Did 14000 AES-128-GCM (8192 bytes) seal operations in 1031499us (13572.5 ops/sec): 111.2 MB/s [-21.4%]
Did 7008 AES-128-GCM (16384 bytes) seal operations in 1030706us (6799.2 ops/sec): 111.4 MB/s [-21.4%]
Did 1838000 AES-256-GCM (16 bytes) seal operations in 1000238us (1837562.7 ops/sec): 29.4 MB/s [+143%]
Did 321000 AES-256-GCM (256 bytes) seal operations in 1001666us (320466.1 ops/sec): 82.0 MB/s [-10.6%]
Did 67000 AES-256-GCM (1350 bytes) seal operations in 1010359us (66313.1 ops/sec): 89.5 MB/s [-18.3%]
Did 12000 AES-256-GCM (8192 bytes) seal operations in 1072706us (11186.7 ops/sec): 91.6 MB/s [-23.8%]
Did 5680 AES-256-GCM (16384 bytes) seal operations in 1009214us (5628.1 ops/sec): 92.2 MB/s [-23.5%]
vpaes-2x (this CL):
Did 2072000 AES-128-GCM (16 bytes) seal operations in 1000066us (2071863.3 ops/sec): 33.1 MB/s [+116%]
Did 432000 AES-128-GCM (256 bytes) seal operations in 1000732us (431684.0 ops/sec): 110.5 MB/s [+2.8%]
Did 92000 AES-128-GCM (1350 bytes) seal operations in 1000580us (91946.7 ops/sec): 124.1 MB/s [-4.2%]
Did 16000 AES-128-GCM (8192 bytes) seal operations in 1016422us (15741.5 ops/sec): 129.0 MB/s [-8.8%]
Did 8448 AES-128-GCM (16384 bytes) seal operations in 1073962us (7866.2 ops/sec): 128.9 MB/s [-9.1%]
Did 1865000 AES-256-GCM (16 bytes) seal operations in 1000043us (1864919.8 ops/sec): 29.8 MB/s [+146%]
Did 364000 AES-256-GCM (256 bytes) seal operations in 1001561us (363432.7 ops/sec): 93.0 MB/s [+1.4%]
Did 77000 AES-256-GCM (1350 bytes) seal operations in 1004123us (76683.8 ops/sec): 103.5 MB/s [-5.6%]
Did 14000 AES-256-GCM (8192 bytes) seal operations in 1071179us (13069.7 ops/sec): 107.1 MB/s [-10.9%]
Did 7008 AES-256-GCM (16384 bytes) seal operations in 1074125us (6524.4 ops/sec): 106.9 MB/s [-11.4%]

Penryn, CBC mode decryption
bsaes (before):
Did 159000 AES-128-CBC-SHA1 (16 bytes) open operations in 1001019us (158838.1 ops/sec): 2.5 MB/s
Did 114000 AES-128-CBC-SHA1 (256 bytes) open operations in 1006485us (113265.5 ops/sec): 29.0 MB/s
Did 65000 AES-128-CBC-SHA1 (1350 bytes) open operations in 1008441us (64455.9 ops/sec): 87.0 MB/s
Did 17000 AES-128-CBC-SHA1 (8192 bytes) open operations in 1005440us (16908.0 ops/sec): 138.5 MB/s
vpaes (after):
Did 167000 AES-128-CBC-SHA1 (16 bytes) open operations in 1003556us (166408.3 ops/sec): 2.7 MB/s [+8%]
Did 112000 AES-128-CBC-SHA1 (256 bytes) open operations in 1005673us (111368.2 ops/sec): 28.5 MB/s [-1.7%]
Did 56000 AES-128-CBC-SHA1 (1350 bytes) open operations in 1005647us (55685.5 ops/sec): 75.2 MB/s [-13.6%]
Did 13635 AES-128-CBC-SHA1 (8192 bytes) open operations in 1020486us (13361.3 ops/sec): 109.5 MB/s [-20.9%]

Bug: 256
Change-Id: I11ed773323ec7a5ee61080c9ed9ed4761849828a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35364
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-23 06:59:22 +00:00
David Benjamin
5501a26915 Add 16384 to the default bssl speed sizes.
When servers have a lot of data to send and aren't as latency-sensitive,
it makes sense to send large TLS records, so we care about measuring
both packet-sized and full-sized payloads.

Change-Id: Ib0cf5e0f8660f68a98a04fa86b5989d4a485528b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35344
Reviewed-by: Adam Langley <agl@google.com>
2019-03-20 23:01:43 +00:00
David Benjamin
4ca8d131d3 Rewrite BN_CTX.
While allocating near INT_MAX BIGNUMs or stack frames would never happen, we
should properly handle overflow here. Rewrite it to just be a STACK_OF(BIGNUM)
plus a stack of indices. Also simplify the error-handling. If we make the
errors truly sticky (rather than just sticky per frame), we don't need to keep
track of err_stack and friends.
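
The design described above (a pool of values plus a stack of indices,
with a truly sticky error) can be sketched as follows. This is a
simplified model with fixed capacities and |long| values standing in
for BIGNUMs; all names are hypothetical and this is not the BN_CTX
code.

```c
#include <stddef.h>

#define EXAMPLE_POOL_CAP 8
#define EXAMPLE_FRAME_CAP 4

typedef struct {
  long values[EXAMPLE_POOL_CAP];    /* stand-in for pooled BIGNUMs */
  size_t used;                      /* values handed out so far */
  size_t frames[EXAMPLE_FRAME_CAP]; /* |used| saved at each start */
  size_t depth;                     /* number of open frames */
  int error;                        /* sticky once set */
} example_ctx;

static void example_ctx_init(example_ctx *c) {
  c->used = 0;
  c->depth = 0;
  c->error = 0;
}

/* Opens a frame. Cannot report failure, so it only records the error. */
static void example_ctx_start(example_ctx *c) {
  if (c->error) {
    return;
  }
  if (c->depth == EXAMPLE_FRAME_CAP) {
    c->error = 1;
    return;
  }
  c->frames[c->depth++] = c->used;
}

/* Hands out a zeroed value, or NULL after any earlier failure. */
static long *example_ctx_get(example_ctx *c) {
  if (c->error) {
    return NULL;
  }
  if (c->used == EXAMPLE_POOL_CAP) {
    c->error = 1;
    return NULL;
  }
  c->values[c->used] = 0;
  return &c->values[c->used++];
}

/* Closes the innermost frame, releasing everything obtained in it. */
static void example_ctx_end(example_ctx *c) {
  if (c->error || c->depth == 0) {
    return;
  }
  c->used = c->frames[--c->depth];
}
```

Because the error never clears, the frame stack does not have to stay
consistent after a failure, which is what removes the need for separate
per-frame error bookkeeping.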

Thanks to mlbrown for reporting the integer overflows in the original
implementation.

Bug: chromium:942269
Change-Id: Ie9c9baea3eeb82d65d88b1cb1388861f5cd84fe5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35328
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-18 19:18:31 +00:00
David Benjamin
c93be52c9e Save a temporary in BN_mod_exp_mont's w=1 case.
BN_mod_exp_mont is most commonly used in RSA verification, where the exponent
sizes are small enough to use 1-bit "windows". There's no need to allocate the
extra BIGNUM.
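
For illustration, exponentiation with 1-bit windows is plain
left-to-right square-and-multiply, which needs only the accumulator and
the base. The sketch below uses 32-bit operands and ordinary modular
reduction rather than Montgomery form, and is not constant-time; the
function name is hypothetical.

```c
#include <stdint.h>

/* Computes base^exp mod mod, scanning the exponent one bit at a time
 * from the most significant bit. Requires mod != 0. */
static uint32_t example_mod_exp(uint32_t base, uint32_t exp, uint32_t mod) {
  if (mod == 1) {
    return 0;
  }
  uint64_t acc = 1;
  uint64_t b = base % mod;
  for (int i = 31; i >= 0; i--) {
    acc = (acc * acc) % mod; /* square for every exponent bit */
    if ((exp >> i) & 1) {
      acc = (acc * b) % mod; /* multiply only when the bit is set */
    }
  }
  return (uint32_t)acc;
}
```

With wider windows a table of precomputed powers is needed; with w=1
the only "table entry" is the base itself, so no extra temporary is
required.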

Change-Id: I14fb523dfae7d77d2cec10a0209f09f22031d1af
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35327
Reviewed-by: Adam Langley <agl@google.com>
2019-03-18 17:20:32 +00:00
David Benjamin
1c71844ef5 Reject long inputs in c2i_ASN1_INTEGER.
Thanks to mlbrown for reporting this.

Bug: chromium:942269
Change-Id: Ie06970f25a6ab0e08a8861d604b2177c8fd1d1a8
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35326
Reviewed-by: Adam Langley <agl@google.com>
2019-03-18 17:19:52 +00:00
David Benjamin
0dcab9302f Harden the lower level parts of crypto/asn1 against overflows.
The legacy ASN.1 stack contains an unsalvageable mix of integer types.
82dfea8d9e bounded all inputs to the template
machinery, but sometimes code will call ASN1_get_object directly, such as the
just deleted d2i_ASN1_UINTEGER.
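
As a sketch of the kind of bounds checking involved, the following
parses a DER-style definite length with a single unsigned type and
explicit limits. It is not the crypto/asn1 code (which also accepts
non-DER encodings); the function name and the 4-byte cap are
assumptions of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Parses a definite length starting at in[0]. On success returns 1 and
 * sets *out_len to the content length and *header_len to the number of
 * length bytes consumed. Returns 0 on malformed, non-minimal, oversized,
 * or truncated input, leaving the outputs untouched. */
static int example_parse_der_length(const uint8_t *in, size_t in_len,
                                    size_t *out_len, size_t *header_len) {
  if (in_len == 0) {
    return 0;
  }
  size_t len, hdr;
  if ((in[0] & 0x80) == 0) {
    len = in[0]; /* short form: 0..127 */
    hdr = 1;
  } else {
    size_t num_bytes = in[0] & 0x7f;
    if (num_bytes == 0 || num_bytes > 4 || num_bytes > in_len - 1) {
      return 0; /* indefinite, too large to represent, or truncated */
    }
    uint32_t v = 0;
    for (size_t i = 0; i < num_bytes; i++) {
      v = (v << 8) | in[1 + i];
    }
    /* Reject encodings that are not minimal. */
    if (v < 128 || (v >> ((num_bytes - 1) * 8)) == 0) {
      return 0;
    }
    len = v;
    hdr = 1 + num_bytes;
  }
  if (len > in_len - hdr) {
    return 0; /* the content would run past the end of the input */
  }
  *out_len = len;
  *header_len = hdr;
  return 1;
}
```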

Thanks to mlbrown for reporting the d2i_ASN1_UINTEGER overflow.

Bug: chromium:942269
Change-Id: I2d4c8b7faf5dadd1b68dbdb51a5feae071ea2cb6
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35325
Reviewed-by: Adam Langley <agl@google.com>
2019-03-18 17:19:12 +00:00
David Benjamin
bab14fa753 Remove d2i_ASN1_UINTEGER.
It is unused. It dates to an old OpenSSL DSA serialization bug.

Bug: chromium:942269
Update-Note: Removing a function.
Change-Id: Ia98f7eb1dafcd832c744387475cc13b58bc82ffe
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35324
Reviewed-by: Adam Langley <agl@google.com>
2019-03-18 17:18:26 +00:00
David Benjamin
fdb48f9861 Drop some unused bsaes to aes_nohw dependencies.
When the CBC and CTR EVP_CIPHER implementations use bsaes, they never
call dat->block. Note this is *not* true of aes_ctr_set_key which is
used in contexts where it needs single-block operations.

Bug: 256
Change-Id: Ibea4f2117a2220cd5cb09f6cf12b7a50c28bf794
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35168
Reviewed-by: Adam Langley <agl@google.com>
2019-03-14 21:43:58 +00:00
David Benjamin
d22578f366 Adapt gcm_*_neon to aarch64.
This makes AES-GCM always constant-time on aarch64 (provided assembly is
enabled). Unlike vpaes, this does come at a binary size penalty of 1K
compared to the gcm_*_4bit version.

ABI testing already covered by GCMTest.ABI (GHASH_ASM_ARM covers both
OPENSSL_ARM and OPENSSL_AARCH64.)

Cortex-A53 (Raspberry Pi 3 Model B+)
Before:
Did 274000 AES-128-GCM (16 bytes) seal operations in 1003461us (273055.0 ops/sec): 4.4 MB/s
Did 53000 AES-128-GCM (256 bytes) seal operations in 1007689us (52595.6 ops/sec): 13.5 MB/s
Did 12000 AES-128-GCM (1350 bytes) seal operations in 1075908us (11153.4 ops/sec): 15.1 MB/s
Did 2068 AES-128-GCM (8192 bytes) seal operations in 1089037us (1898.9 ops/sec): 15.6 MB/s
After:
Did 298000 AES-128-GCM (16 bytes) seal operations in 1002917us (297133.3 ops/sec): 4.8 MB/s
Did 64000 AES-128-GCM (256 bytes) seal operations in 1001124us (63928.1 ops/sec): 16.4 MB/s
Did 14000 AES-128-GCM (1350 bytes) seal operations in 1015477us (13786.6 ops/sec): 18.6 MB/s
Did 2497 AES-128-GCM (8192 bytes) seal operations in 1057951us (2360.2 ops/sec): 19.3 MB/s

Bug: 265
Change-Id: I251bf0f2eae0578580bb14192755e5d8ff64cd14
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35285
Reviewed-by: Adam Langley <agl@google.com>
2019-03-14 21:43:27 +00:00
David Benjamin
4851041967 Patch out the aes_nohw fallback in bsaes_cbc_encrypt.
This plugs all bsaes fallback leaks for CBC outside of the key schedule.
The CBC EVP_CIPHERs never call the block function directly when there's
a stream.cbc function available.

This affects CBC decryptions of length < 128 or 16 mod 128.
Performance-wise, we don't really care about CBC apart from passing
glances at its use in TLS. There, the Lucky13 workaround mutes the
effects.
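
The length condition above can be written as a small predicate. This is only a sketch of one reading of "length < 128 or 16 mod 128" (the input is shorter than eight blocks, or a single 16-byte block is left over after the 128-byte batches). The function name is hypothetical and does not appear in the CL.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: returns 1 if a CBC decryption of |len| bytes falls in
// the class of lengths the message describes, 0 otherwise. Assumes |len| is a
// whole number of 16-byte AES blocks.
static int cbc_len_is_affected(size_t len) {
  assert(len % 16 == 0);
  return len < 128 || len % 128 == 16;
}
```

Under this reading, 16, 112, 144 and 272 bytes are affected, while 128, 240 and 256 bytes are not.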

Cortex-A53 (Raspberry Pi 3 Model B+)
Before:
Did 78000 AES-128-CBC-SHA1 (16 bytes) open operations in 3020254us (25825.6 ops/sec): 0.4 MB/s
Did 75000 AES-128-CBC-SHA1 (32 bytes) open operations in 3005760us (24952.1 ops/sec): 0.8 MB/s
Did 71000 AES-128-CBC-SHA1 (64 bytes) open operations in 3038137us (23369.6 ops/sec): 1.5 MB/s
Did 67000 AES-128-CBC-SHA1 (96 bytes) open operations in 3027686us (22129.1 ops/sec): 2.1 MB/s
Did 64000 AES-128-CBC-SHA1 (112 bytes) open operations in 3005491us (21294.4 ops/sec): 2.4 MB/s
Did 59000 AES-128-CBC-SHA1 (128 bytes) open operations in 3020083us (19535.9 ops/sec): 2.5 MB/s
Did 53000 AES-128-CBC-SHA1 (240 bytes) open operations in 3020105us (17549.1 ops/sec): 4.2 MB/s
After:
Did 71668 AES-128-CBC-SHA1 (16 bytes) open operations in 3020896us (23724.1 ops/sec): 0.4 MB/s
Did 71000 AES-128-CBC-SHA1 (32 bytes) open operations in 3040826us (23348.9 ops/sec): 0.7 MB/s
Did 68000 AES-128-CBC-SHA1 (64 bytes) open operations in 3009913us (22592.0 ops/sec): 1.4 MB/s
Did 66000 AES-128-CBC-SHA1 (96 bytes) open operations in 3007597us (21944.4 ops/sec): 2.1 MB/s
Did 59000 AES-128-CBC-SHA1 (112 bytes) open operations in 3002878us (19647.8 ops/sec): 2.2 MB/s
Did 59000 AES-128-CBC-SHA1 (128 bytes) open operations in 3046786us (19364.7 ops/sec): 2.5 MB/s
Did 50000 AES-128-CBC-SHA1 (240 bytes) open operations in 3043643us (16427.7 ops/sec): 3.9 MB/s

Penryn (Mac mini, mid 2010)
Before:
Did 152000 AES-128-CBC-SHA1 (16 bytes) open operations in 1004422us (151330.8 ops/sec): 2.4 MB/s
Did 143000 AES-128-CBC-SHA1 (32 bytes) open operations in 1000443us (142936.7 ops/sec): 4.6 MB/s
Did 136000 AES-128-CBC-SHA1 (48 bytes) open operations in 1006580us (135111.0 ops/sec): 6.5 MB/s
Did 146000 AES-128-CBC-SHA1 (96 bytes) open operations in 1005731us (145168.0 ops/sec): 13.9 MB/s
Did 138000 AES-128-CBC-SHA1 (112 bytes) open operations in 1003330us (137542.0 ops/sec): 15.4 MB/s
Did 133000 AES-128-CBC-SHA1 (128 bytes) open operations in 1005876us (132223.1 ops/sec): 16.9 MB/s
Did 117000 AES-128-CBC-SHA1 (240 bytes) open operations in 1004922us (116426.9 ops/sec): 27.9 MB/s
After:
Did 159000 AES-128-CBC-SHA1 (16 bytes) open operations in 1000505us (158919.7 ops/sec): 2.5 MB/s
Did 157000 AES-128-CBC-SHA1 (32 bytes) open operations in 1006091us (156049.5 ops/sec): 5.0 MB/s
Did 154000 AES-128-CBC-SHA1 (48 bytes) open operations in 1002720us (153582.3 ops/sec): 7.4 MB/s
Did 146000 AES-128-CBC-SHA1 (96 bytes) open operations in 1002567us (145626.2 ops/sec): 14.0 MB/s
Did 135000 AES-128-CBC-SHA1 (112 bytes) open operations in 1001212us (134836.6 ops/sec): 15.1 MB/s
Did 133000 AES-128-CBC-SHA1 (128 bytes) open operations in 1006441us (132148.8 ops/sec): 16.9 MB/s
Did 115000 AES-128-CBC-SHA1 (240 bytes) open operations in 1005246us (114399.9 ops/sec): 27.5 MB/s

Bug: 256
Change-Id: I864b4455ada0d4d245380fce6f869dabb0686354
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35167
Reviewed-by: Adam Langley <agl@google.com>
2019-03-14 21:38:28 +00:00
David Benjamin
885a63fb74 Patch out the aes_nohw fallback in bsaes_ctr32_encrypt_blocks.
bsaes_ctr32_encrypt_blocks previously fell back to the table-based
aes_nohw_encrypt for inputs under 128 bytes. Instead, just run the usual
bsaes code, though it means we compute more blocks than needed.
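
As a rough illustration of "compute more blocks than needed": the sketch below always generates a full eight-block batch of keystream and then uses only the bytes the caller asked for. It is not the bsaes assembly. toy_bulk_keystream is a made-up, insecure stand-in for the bulk routine, and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_BYTES 128  // eight 16-byte blocks per batch

// Toy stand-in for a bulk keystream routine (NOT a cipher). It always fills
// all eight blocks, regardless of how many the caller needs.
static void toy_bulk_keystream(uint8_t ks[BATCH_BYTES], const uint8_t ivec[16]) {
  for (size_t i = 0; i < BATCH_BYTES; i++) {
    ks[i] = ivec[i % 16] ^ (uint8_t)(i / 16);
  }
}

// Encrypts a short input (at most one batch) without a separate small-input
// code path: the same bulk work runs for every length.
static void ctr_short(const uint8_t *in, uint8_t *out, size_t len,
                      const uint8_t ivec[16]) {
  assert(len <= BATCH_BYTES);
  uint8_t ks[BATCH_BYTES];
  toy_bulk_keystream(ks, ivec);
  for (size_t i = 0; i < len; i++) {
    out[i] = in[i] ^ ks[i];  // only the first |len| keystream bytes are used
  }
}
```

The cost is wasted work on small inputs, which matches the ARM numbers below; the benefit is that no table-based fallback is reachable from this path.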

This fixes some (but not all) of the timing leaks and is needed for later
bsaes work.

Performance-wise, x86_64 actually sees an improvement for all but tiny
inputs. ARM does see a loss at small inputs, however.

Cortex-A53 (Raspberry Pi 3 Model B+)
Before:
Did 299000 AES-128-GCM (16 bytes) seal operations in 1001123us (298664.6 ops/sec): 4.8 MB/s
Did 236000 AES-128-GCM (32 bytes) seal operations in 1001611us (235620.4 ops/sec): 7.5 MB/s
Did 167000 AES-128-GCM (64 bytes) seal operations in 1005706us (166052.5 ops/sec): 10.6 MB/s
Did 129000 AES-128-GCM (96 bytes) seal operations in 1006129us (128214.2 ops/sec): 12.3 MB/s
Did 116000 AES-128-GCM (112 bytes) seal operations in 1006302us (115273.5 ops/sec): 12.9 MB/s
Did 107000 AES-128-GCM (128 bytes) seal operations in 1000986us (106894.6 ops/sec): 13.7 MB/s
After:
Did 132000 AES-128-GCM (16 bytes) seal operations in 1005165us (131321.7 ops/sec): 2.1 MB/s
Did 128000 AES-128-GCM (32 bytes) seal operations in 1005966us (127240.9 ops/sec): 4.1 MB/s
Did 120000 AES-128-GCM (64 bytes) seal operations in 1003080us (119631.5 ops/sec): 7.7 MB/s
Did 113000 AES-128-GCM (96 bytes) seal operations in 1000557us (112937.1 ops/sec): 10.8 MB/s
Did 110000 AES-128-GCM (112 bytes) seal operations in 1000407us (109955.2 ops/sec): 12.3 MB/s
Did 108000 AES-128-GCM (128 bytes) seal operations in 1008830us (107054.7 ops/sec): 13.7 MB/s
(Inputs 128 bytes and up are unaffected by this CL.)

Nexus 7
Before:
Did 544000 AES-128-GCM (16 bytes) seal operations in 1001282us (543303.5 ops/sec): 8.7 MB/s
Did 475750 AES-128-GCM (32 bytes) seal operations in 1000244us (475633.9 ops/sec): 15.2 MB/s
Did 370500 AES-128-GCM (64 bytes) seal operations in 1000519us (370307.8 ops/sec): 23.7 MB/s
Did 300750 AES-128-GCM (96 bytes) seal operations in 1000122us (300713.3 ops/sec): 28.9 MB/s
Did 275750 AES-128-GCM (112 bytes) seal operations in 1000702us (275556.6 ops/sec): 30.9 MB/s
Did 251000 AES-128-GCM (128 bytes) seal operations in 1000214us (250946.3 ops/sec): 32.1 MB/s
After:
Did 296000 AES-128-GCM (16 bytes) seal operations in 1001129us (295666.2 ops/sec): 4.7 MB/s
Did 288750 AES-128-GCM (32 bytes) seal operations in 1000488us (288609.2 ops/sec): 9.2 MB/s
Did 267250 AES-128-GCM (64 bytes) seal operations in 1000641us (267078.8 ops/sec): 17.1 MB/s
Did 253250 AES-128-GCM (96 bytes) seal operations in 1000915us (253018.5 ops/sec): 24.3 MB/s
Did 248000 AES-128-GCM (112 bytes) seal operations in 1000091us (247977.4 ops/sec): 27.8 MB/s
Did 249000 AES-128-GCM (128 bytes) seal operations in 1000794us (248802.5 ops/sec): 31.8 MB/s

Penryn (Mac mini, mid 2010)
Before:
Did 1331000 AES-128-GCM (16 bytes) seal operations in 1000263us (1330650.0 ops/sec): 21.3 MB/s
Did 991000 AES-128-GCM (32 bytes) seal operations in 1000274us (990728.5 ops/sec): 31.7 MB/s
Did 780000 AES-128-GCM (48 bytes) seal operations in 1000278us (779783.2 ops/sec): 37.4 MB/s
Did 483000 AES-128-GCM (96 bytes) seal operations in 1000137us (482933.8 ops/sec): 46.4 MB/s
Did 428000 AES-128-GCM (112 bytes) seal operations in 1001132us (427516.1 ops/sec): 47.9 MB/s
Did 682000 AES-128-GCM (128 bytes) seal operations in 1000564us (681615.6 ops/sec): 87.2 MB/s
After:
Did 953000 AES-128-GCM (16 bytes) seal operations in 1000385us (952633.2 ops/sec): 15.2 MB/s
Did 903000 AES-128-GCM (32 bytes) seal operations in 1000998us (902099.7 ops/sec): 28.9 MB/s
Did 850000 AES-128-GCM (48 bytes) seal operations in 1000938us (849203.4 ops/sec): 40.8 MB/s
Did 736000 AES-128-GCM (96 bytes) seal operations in 1000886us (735348.5 ops/sec): 70.6 MB/s
Did 702000 AES-128-GCM (112 bytes) seal operations in 1000657us (701539.1 ops/sec): 78.6 MB/s
Did 676000 AES-128-GCM (128 bytes) seal operations in 1000405us (675726.3 ops/sec): 86.5 MB/s

Bug: 256
Change-Id: I9403da607dd1feaff7b3c9b76fe78b66018fb753
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35166
Reviewed-by: Adam Langley <agl@google.com>
2019-03-14 21:37:46 +00:00
David Benjamin
aadcce380f Implement sk_find manually.
glibc inlines bsearch, so CFI does observe the function pointer mishap.
Binary search is easy enough, aside from thinking through the edge case
at the end, so just implement it by hand. As a bonus, it actually gives
O(lg N) behavior.

sk_*_find needs to return the *first* match, while bsearch does not
promise a particular one. sk_find previously performed a fixup step to
find the first one, but that step is linear in the number of matching
elements. Instead, the binary search itself should take this into account.
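
A minimal sketch of a binary search that lands on the first match directly, so no linear fixup is needed. This is not the code from the CL: find_first, cmp_fn and cmp_int are hypothetical names, and the generic void-pointer comparator is used only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical comparator type: returns <0, 0 or >0, like strcmp.
typedef int (*cmp_fn)(const void *a, const void *b);

// Returns the index of the first element equal to |key| in the sorted array
// |elems| of |n| pointers, or (size_t)-1 if there is no match.
static size_t find_first(const void *const *elems, size_t n, const void *key,
                         cmp_fn cmp) {
  assert(cmp != NULL);
  size_t lo = 0, hi = n;  // the answer, if any, lies in [lo, hi]
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (cmp(elems[mid], key) < 0) {
      lo = mid + 1;  // everything up to mid is too small
    } else {
      hi = mid;  // elems[mid] >= key, so the first match is at mid or earlier
    }
  }
  // The edge case at the end: lo may be n, or may point at a larger element.
  if (lo < n && cmp(elems[lo], key) == 0) {
    return lo;
  }
  return (size_t)-1;
}

// Example comparator over pointers to int.
static int cmp_int(const void *a, const void *b) {
  int x = *(const int *)a, y = *(const int *)b;
  return (x > y) - (x < y);
}
```

Because the loop keeps narrowing toward the left edge on equality, the cost stays O(lg N) even when many elements compare equal.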

This still leaves qsort, but it's not inlined, so hopefully we can leave
it alone.

Bug: chromium:941463
Change-Id: I5c94d6b15423beea3bdb389639466f8b3ff0dc5d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35304
Reviewed-by: Adam Langley <agl@google.com>
2019-03-14 15:21:48 +00:00
David Benjamin
35941f2923 Make vpaes-armv8.pl compatible with XOM.
Change-Id: I27413467e5cac4e16ecbbb8d9a238ba5a8bcb9e7
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35284
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-11 23:17:06 +00:00
Adam Langley
1d1345377a Support three-argument instructions on x86-64.
Change-Id: I81c855cd4805d4a5016999669a0cb5261838f23a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35224
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-03-11 21:41:40 +00:00
Watson Ladd
3390fd88d7 Correct outdated comments
Change-Id: Idc3a41d025fefa9017fce108bed63cb8af426c9b
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35244
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2019-03-07 21:55:09 +00:00
David Benjamin
f9c8d30897 Remove SSL_get_structure_sizes.
With all those structures made opaque, it's not really useful as a build
sanity-check anymore.

Update-Note: This function is removed, but I don't see any actual uses.
Change-Id: Ib5640e778466da980596e7085d97104d22aa9d33
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35184
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-05 17:58:10 +00:00
David Benjamin
b8d7b7498c Prefer vpaes over bsaes in AES-GCM-SIV and AES-CCM.
The AES-GCM-SIV code does not use ctr128_f at all so bsaes is simply
identical to aes_nohw. Also, while CCM encrypts with CTR mode, its MAC
is not parallelizable at all.

(Given the existence of non-parallelizable modes, we ought to make a
vpaes-armv7.pl to ensure constant-time AES on NEON. For now, pick the
right implementation for x86_64 at least.)

aes_ctr_set_key and friends probably aren't the right abstraction
(observe the large vs small inputs hint *almost* matches whether you
touch block128_f), but the right abstraction depends on a couple of
questions:

- If you don't provide ctr128_f, is there a perf hit to implementing
  ctr128_f on top of your block128_f to unify calling code?

- It is almost certainly better to use bsaes with gcm.c by calling
  ctr128_f exclusively and paying some copies (a dedicated calling
  convention would be even better, but would be a headache) to integrate
  leading and trailing blocks into the CTR pass. Is this a win, loss, or
  no-op for hwaes, where block128_f is just fine? hwaes is the one mode
  we really should not regress.
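
For the first question, a sketch of what "implementing ctr128_f on top of your block128_f" could look like: a CTR function with a 32-bit big-endian counter in the last four bytes of the IV, built from a single-block function. The typedef and function names here are hypothetical, only loosely modeled on the ones named above, and toy_block is an insecure placeholder for a real block cipher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical single-block function type.
typedef void (*block_fn)(const uint8_t in[16], uint8_t out[16],
                         const void *key);

// Encrypts |blocks| whole blocks in CTR mode. The counter is the big-endian
// 32-bit value in ivec[12..15]; it wraps mod 2^32 and the other 96 bits are
// left untouched. |ivec| itself is not updated.
static void ctr32_from_block(const uint8_t *in, uint8_t *out, size_t blocks,
                             const void *key, const uint8_t ivec[16],
                             block_fn block) {
  assert(block != NULL);
  uint8_t ctr[16], ks[16];
  for (int i = 0; i < 16; i++) ctr[i] = ivec[i];
  uint32_t n = ((uint32_t)ctr[12] << 24) | ((uint32_t)ctr[13] << 16) |
               ((uint32_t)ctr[14] << 8) | (uint32_t)ctr[15];
  for (size_t b = 0; b < blocks; b++) {
    block(ctr, ks, key);  // one indirect call per block
    for (int i = 0; i < 16; i++) out[16 * b + i] = in[16 * b + i] ^ ks[i];
    n++;
    ctr[12] = (uint8_t)(n >> 24);
    ctr[13] = (uint8_t)(n >> 16);
    ctr[14] = (uint8_t)(n >> 8);
    ctr[15] = (uint8_t)n;
  }
}

// Toy stand-in for a block cipher (NOT secure): XORs the input with the key.
static void toy_block(const uint8_t in[16], uint8_t out[16], const void *key) {
  const uint8_t *k = (const uint8_t *)key;
  for (int i = 0; i < 16; i++) out[i] = in[i] ^ k[i];
}
```

The per-block call and counter bookkeeping in the loop are where any perf hit from such a wrapper would come from; whether it is measurable is exactly the open question.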

Hopefully those will get answered as we continue to chip away at this.

Bug: 256
Change-Id: I8f0150b223b671e68f7da6faaff94a3bea398d4d
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35169
Reviewed-by: Adam Langley <agl@google.com>
2019-03-05 17:55:03 +00:00
David Benjamin
da8bb847fd Tell ASan about the OPENSSL_malloc prefix.
OpenSSL's BN_mul function had a single-word buffer underflow (see
576129cd72ae054d246221f111aabf42b9c6d76d). We already independently
fixed this but, if we hadn't, ASan wouldn't have noticed because of
OPENSSL_malloc.

ASan has runtime hooks we can call to make it more accurate.
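
A minimal sketch of the idea, assuming an allocator that stores the allocation size in a prefix before the pointer it returns. The ASAN_POISON_MEMORY_REGION and ASAN_UNPOISON_MEMORY_REGION macros come from <sanitizer/asan_interface.h>; the wrapper names, the 16-byte prefix and the build-detection macro are assumptions made for illustration, not the code in this CL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Detect ASan builds (clang via __has_feature, GCC via __SANITIZE_ADDRESS__).
#if defined(__has_feature)
#if __has_feature(address_sanitizer)
#define SKETCH_ASAN 1
#endif
#endif
#if defined(__SANITIZE_ADDRESS__)
#define SKETCH_ASAN 1
#endif

#if defined(SKETCH_ASAN)
#include <sanitizer/asan_interface.h>
#else
// Outside ASan builds the hooks are no-ops.
#define ASAN_POISON_MEMORY_REGION(p, n) ((void)(p), (void)(n))
#define ASAN_UNPOISON_MEMORY_REGION(p, n) ((void)(p), (void)(n))
#endif

#define SKETCH_PREFIX 16  // hypothetical prefix size; holds the length

static void *sketch_malloc(size_t size) {
  assert(sizeof(size_t) <= SKETCH_PREFIX);
  if (size > SIZE_MAX - SKETCH_PREFIX) return NULL;
  uint8_t *base = malloc(size + SKETCH_PREFIX);
  if (base == NULL) return NULL;
  memcpy(base, &size, sizeof(size));
  // Without this, a one-word underflow from the returned pointer would land
  // in the prefix, which ASan considers valid heap memory.
  ASAN_POISON_MEMORY_REGION(base, SKETCH_PREFIX);
  return base + SKETCH_PREFIX;
}

static void sketch_free(void *ptr) {
  if (ptr == NULL) return;
  uint8_t *base = (uint8_t *)ptr - SKETCH_PREFIX;
  ASAN_UNPOISON_MEMORY_REGION(base, SKETCH_PREFIX);  // allow reading the size
  size_t size;
  memcpy(&size, base, sizeof(size));
  memset(base, 0, size + SKETCH_PREFIX);  // real code would use a cleanse routine
  free(base);
}
```

With the prefix poisoned, an access just below the returned pointer is reported by ASan instead of silently reading or writing the stored length.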

Change-Id: Ifc9c3837ece2bc456c5bdc960be707d7b1759904
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35165
Reviewed-by: Adam Langley <agl@google.com>
2019-03-05 17:53:16 +00:00
David Benjamin
8d685ec867 modes/asm/ghash-armv4.pl: address "infixes are deprecated" warnings.
This imports ce5eb5e8149d8d03660575f4b8504c993851988a and
1212818eb07add297fe562eba80ac46a9893781e from OpenSSL's 1.1.1 branch.

Change-Id: I121c0771371697191a163a28d972a7b3cee37762
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35164
Reviewed-by: Adam Langley <agl@google.com>
2019-03-05 17:52:28 +00:00
David Benjamin
55db667c62 Enable vpaes for aarch64, with CTR optimizations.
This patches vpaes-armv8.pl to add vpaes_ctr32_encrypt_blocks. CTR mode
is by far the most important mode these days. It should have access to
_vpaes_encrypt_2x, which gives a considerable speed boost. Also exclude
vpaes_ecb_* as they're not even used.

For iOS, this change is completely a no-op. iOS ARMv8 always has crypto
extensions, and we already statically drop all other AES
implementations.

Android ARMv8 is *not* required to have crypto extensions, but every
ARMv8 device I've seen has them. For those, it is a no-op
performance-wise and a win on size. vpaes appears to be about 5.6KiB
smaller than the tables. ARMv8 always makes SIMD (NEON) available, so we
can statically drop aes_nohw.

In theory, however, crypto-less Android ARMv8 is possible. Today such
chips get a variable-time AES. This CL fixes this, but the performance
story is complex.

The Raspberry Pi 3 is not Android but has a Cortex-A53 chip
without crypto extensions. (But the official images are 32-bit, so even
this is slightly artificial...) There, vpaes is a performance win.

Raspberry Pi 3, Model B+, Cortex-A53
Before:
Did 265000 AES-128-GCM (16 bytes) seal operations in 1003312us (264125.2 ops/sec): 4.2 MB/s
Did 44000 AES-128-GCM (256 bytes) seal operations in 1002141us (43906.0 ops/sec): 11.2 MB/s
Did 9394 AES-128-GCM (1350 bytes) seal operations in 1032104us (9101.8 ops/sec): 12.3 MB/s
Did 1562 AES-128-GCM (8192 bytes) seal operations in 1008982us (1548.1 ops/sec): 12.7 MB/s
After:
Did 277000 AES-128-GCM (16 bytes) seal operations in 1001884us (276479.1 ops/sec): 4.4 MB/s
Did 52000 AES-128-GCM (256 bytes) seal operations in 1001480us (51923.2 ops/sec): 13.3 MB/s
Did 11000 AES-128-GCM (1350 bytes) seal operations in 1007979us (10912.9 ops/sec): 14.7 MB/s
Did 2013 AES-128-GCM (8192 bytes) seal operations in 1085545us (1854.4 ops/sec): 15.2 MB/s

The Pixel 3 has a Cortex-A75 with crypto extensions, so it would never
run this code. However, artificially ignoring them gives another data
point (ARM documentation[*] suggests the extensions are still optional
on a Cortex-A75.) Sadly, vpaes no longer wins on perf over aes_nohw.
But, it is constant-time:

Pixel 3, AES/PMULL extensions ignored, Cortex-A75:
Before:
Did 2102000 AES-128-GCM (16 bytes) seal operations in 1000378us (2101205.7 ops/sec): 33.6 MB/s
Did 358000 AES-128-GCM (256 bytes) seal operations in 1002658us (357051.0 ops/sec): 91.4 MB/s
Did 75000 AES-128-GCM (1350 bytes) seal operations in 1012830us (74049.9 ops/sec): 100.0 MB/s
Did 13000 AES-128-GCM (8192 bytes) seal operations in 1036524us (12541.9 ops/sec): 102.7 MB/s
After:
Did 1453000 AES-128-GCM (16 bytes) seal operations in 1000213us (1452690.6 ops/sec): 23.2 MB/s
Did 285000 AES-128-GCM (256 bytes) seal operations in 1002227us (284366.7 ops/sec): 72.8 MB/s
Did 60000 AES-128-GCM (1350 bytes) seal operations in 1016106us (59049.0 ops/sec): 79.7 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1094184us (10053.2 ops/sec): 82.4 MB/s

Note the numbers above run with PMULL off, so the slow GHASH is
dampening the regression. If we test aes_nohw and vpaes paired with
PMULL on, the 20% perf hit becomes a 31% hit. The PMULL-less variant is
more likely to represent a real chip.
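
The 20% figure can be checked against the throughput columns above. The helper below is a hypothetical convenience for that arithmetic. The PMULL-on numbers behind the 31% figure are not included in this message, so that value cannot be reproduced here.

```c
#include <assert.h>

// Relative change from |before| to |after|, in percent. Negative values are
// regressions.
static double pct_change(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (after - before) / before;
}
```

For the Pixel 3 8192-byte rows, pct_change(102.7, 82.4) is about -19.8. For the Raspberry Pi 3 8192-byte rows, pct_change(12.7, 15.2) is about +19.7.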

This is consistent with upstream's note in the comment, though it is
unclear if 20% is the right order of magnitude: "these results are worse
than scalar compiler-generated code, but it's constant-time and
therefore preferred".

[*] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100458_0301_00_en/lau1442495529696.html

Bug: 246
Change-Id: If1dc87f5131fce742052498295476fbae4628dbf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35026
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-04 20:31:39 +00:00
David Benjamin
b1b4ff93ca Check in vpaes-armv8.pl from OpenSSL unused and unmodified.
This is done separately to make the diffs in the subsequent CL easier to
see. Imported from OpenSSL at revision
25ca718150cef41e1c1d9c2c8c58e2b1e2cad3fa.

Bug: 246
Change-Id: I9e7067ea177963fb9b77bf6fb39702ffe6e34ed4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35025
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-04 20:23:09 +00:00