Patch from https://mta.openssl.org/pipermail/openssl-dev/2016-March/005625.html.
Upstream has yet to make a decision on aliasing requirements for their
assembly. If they choose to go with the stricter aliasing requirement rather
than land this patch, we'll probably want to tweak EVP_AEAD's API guarantees
accordingly and then undiverge.
In the meantime, import this to avoid a regression on x86 from when we had
compiler-vectorized code on GCC platforms. Per our assembly coverage tools and
pending multi-CPU-variant tests, we have good coverage here. Unlike Poly1305
(which is currently waiting on yet another upstream bugfix), where there is
risk of missed carries everywhere, it is much more difficult to accidentally
make a ChaCha20 implementation that fails based on the data passed into it.
This restores a sizeable speed improvement on x86.
Before:
Did 1131000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000205us (1130768.2 ops/sec): 18.1 MB/s
Did 161000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1006136us (160018.1 ops/sec): 216.0 MB/s
Did 28000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1023264us (27363.4 ops/sec): 224.2 MB/s
Did 1166000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000447us (1165479.0 ops/sec): 18.6 MB/s
Did 160000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1004818us (159232.8 ops/sec): 215.0 MB/s
Did 30000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1016977us (29499.2 ops/sec): 241.7 MB/s
After:
Did 2208000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000031us (2207931.6 ops/sec): 35.3 MB/s
Did 402000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001717us (401310.9 ops/sec): 541.8 MB/s
Did 97000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005394us (96479.6 ops/sec): 790.4 MB/s
Did 2444000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000089us (2443782.5 ops/sec): 39.1 MB/s
Did 459000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000563us (458741.7 ops/sec): 619.3 MB/s
Did 97000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1007942us (96235.7 ops/sec): 788.4 MB/s
Change-Id: I976da606dae062a776e0cc01229ec03a074035d1
Reviewed-on: https://boringssl-review.googlesource.com/7561
Reviewed-by: Steven Valdez <svaldez@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
The coverage tool revealed that we weren't testing all codepaths of the ChaCha
assembly. Add a standalone test as it's much easier to iterate over all lengths
when there isn't the entire AEAD in the way.
I wasn't able to find a really long test vector, so I generated a random one
with the Go implementation we have in runner.
This test gives us full coverage on the ChaCha20_ssse3 variant. (We'll see how
it fares on the other codepaths when the multi-variant test harnesses get in. I
certainly hope there isn't a more novel way to call ChaCha20 than this...)
Change-Id: I087e421c7351f46ea65dacdc7127e4fbf5f4c0aa
Reviewed-on: https://boringssl-review.googlesource.com/7299
Reviewed-by: Adam Langley <agl@google.com>
They fail the newly-added in-place tests. Since we don't have bots for them
yet, verified manually that the arm and aarch64 code is fine.
Change-Id: Ic6f4060f63e713e09707af05e6b7736b7b65c5df
Reviewed-on: https://boringssl-review.googlesource.com/7252
Reviewed-by: Adam Langley <agl@google.com>
This removes chacha_vec_arm.S and chacha_vec.c in favor of unifying on
upstream's code. Upstream's is faster and this cuts down on the number of
distinct codepaths. Our old scheme also didn't give vectorized code on
Windows or aarch64.
BoringSSL-specific modifications made to the assembly:
- As usual, the shelling out to $CC is replaced with hardcoding $avx. I've
tested up to the AVX2 codepath, so enable it all.
- I've removed the AMD XOP code as I have not tested it.
- As usual, the ARM file need the arm_arch.h include tweaked.
Speed numbers follow. We can hope for further wins on these benchmarks after
importing the Poly1305 assembly.
x86
---
Old:
Did 1422000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000433us (1421384.5 ops/sec): 22.7 MB/s
Did 123000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1003803us (122534.0 ops/sec): 165.4 MB/s
Did 22000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000282us (21993.8 ops/sec): 180.2 MB/s
Did 1428000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000214us (1427694.5 ops/sec): 22.8 MB/s
Did 124000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1006332us (123219.8 ops/sec): 166.3 MB/s
Did 22000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1020771us (21552.3 ops/sec): 176.6 MB/s
New:
Did 1520000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000567us (1519138.6 ops/sec): 24.3 MB/s
Did 152000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1004216us (151361.9 ops/sec): 204.3 MB/s
Did 31000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1009085us (30720.9 ops/sec): 251.7 MB/s
Did 1797000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000141us (1796746.7 ops/sec): 28.7 MB/s
Did 171000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1003204us (170453.9 ops/sec): 230.1 MB/s
Did 31000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1005349us (30835.1 ops/sec): 252.6 MB/s
x86_64, no AVX2
---
Old:
Did 1782000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000204us (1781636.5 ops/sec): 28.5 MB/s
Did 317000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001579us (316500.2 ops/sec): 427.3 MB/s
Did 62000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1012146us (61256.0 ops/sec): 501.8 MB/s
Did 1778000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000220us (1777608.9 ops/sec): 28.4 MB/s
Did 315000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1002886us (314093.5 ops/sec): 424.0 MB/s
Did 71000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1014606us (69977.9 ops/sec): 573.3 MB/s
New:
Did 1866000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000019us (1865964.5 ops/sec): 29.9 MB/s
Did 399000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001017us (398594.6 ops/sec): 538.1 MB/s
Did 84000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005645us (83528.5 ops/sec): 684.3 MB/s
Did 1881000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000325us (1880388.9 ops/sec): 30.1 MB/s
Did 404000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000004us (403998.4 ops/sec): 545.4 MB/s
Did 85000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1010048us (84154.4 ops/sec): 689.4 MB/s
x86_64, AVX2
---
Old:
Did 2375000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000282us (2374330.4 ops/sec): 38.0 MB/s
Did 448000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001865us (447166.0 ops/sec): 603.7 MB/s
Did 88000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005217us (87543.3 ops/sec): 717.2 MB/s
Did 2409000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000188us (2408547.2 ops/sec): 38.5 MB/s
Did 446000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001003us (445553.1 ops/sec): 601.5 MB/s
Did 90000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1006722us (89399.1 ops/sec): 732.4 MB/s
New:
Did 2622000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000266us (2621302.7 ops/sec): 41.9 MB/s
Did 794000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000783us (793378.8 ops/sec): 1071.1 MB/s
Did 173000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000176us (172969.6 ops/sec): 1417.0 MB/s
Did 2623000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000330us (2622134.7 ops/sec): 42.0 MB/s
Did 783000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000531us (782584.4 ops/sec): 1056.5 MB/s
Did 174000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1000840us (173854.0 ops/sec): 1424.2 MB/s
arm, Nexus 4
---
Old:
Did 388550 ChaCha20-Poly1305 (16 bytes) seal operations in 1000580us (388324.8 ops/sec): 6.2 MB/s
Did 90000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1003816us (89657.9 ops/sec): 121.0 MB/s
Did 19000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1045750us (18168.8 ops/sec): 148.8 MB/s
Did 398500 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000305us (398378.5 ops/sec): 6.4 MB/s
Did 90500 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000305us (90472.4 ops/sec): 122.1 MB/s
Did 19000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1043278us (18211.8 ops/sec): 149.2 MB/s
New:
Did 424788 ChaCha20-Poly1305 (16 bytes) seal operations in 1000641us (424515.9 ops/sec): 6.8 MB/s
Did 115000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001526us (114824.8 ops/sec): 155.0 MB/s
Did 27000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1033023us (26136.9 ops/sec): 214.1 MB/s
Did 447750 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000549us (447504.3 ops/sec): 7.2 MB/s
Did 117500 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001923us (117274.5 ops/sec): 158.3 MB/s
Did 27000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1025118us (26338.4 ops/sec): 215.8 MB/s
aarch64, Nexus 6p
(Note we didn't have aarch64 assembly before at all, and still don't have it
for Poly1305. Hopefully once that's added this will be faster than the arm
numbers...)
---
Old:
Did 145040 ChaCha20-Poly1305 (16 bytes) seal operations in 1003065us (144596.8 ops/sec): 2.3 MB/s
Did 14000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1042605us (13427.9 ops/sec): 18.1 MB/s
Did 2618 ChaCha20-Poly1305 (8192 bytes) seal operations in 1093241us (2394.7 ops/sec): 19.6 MB/s
Did 148000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000709us (147895.1 ops/sec): 2.4 MB/s
Did 14000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1047294us (13367.8 ops/sec): 18.0 MB/s
Did 2607 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1090745us (2390.1 ops/sec): 19.6 MB/s
New:
Did 358000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000769us (357724.9 ops/sec): 5.7 MB/s
Did 45000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1021267us (44062.9 ops/sec): 59.5 MB/s
Did 8591 ChaCha20-Poly1305 (8192 bytes) seal operations in 1047136us (8204.3 ops/sec): 67.2 MB/s
Did 343000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000489us (342832.4 ops/sec): 5.5 MB/s
Did 44000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1008326us (43636.7 ops/sec): 58.9 MB/s
Did 8866 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1083341us (8183.9 ops/sec): 67.0 MB/s
Change-Id: I629fe195d072f2c99e8f947578fad6d70823c4c8
Reviewed-on: https://boringssl-review.googlesource.com/7202
Reviewed-by: Adam Langley <agl@google.com>
Most of the OPENSSL_armcap_P accesses in assembly use named constants from
arm_arch.h, but some don't. Consistently use the constants. The dispatch really
should be in C, but in the meantime, make it easier to tell what's going on.
I'll send this patch upstream so we won't be carrying a diff here.
Change-Id: I63c68d2351ea5ce11005813314988e32b6459526
Reviewed-on: https://boringssl-review.googlesource.com/7203
Reviewed-by: Adam Langley <agl@google.com>
They won't be used as-is. This is just to make the diffs easier to see. Taken
from upstream's 4f16039efe3589aa4d63a6f1fab799d0cd9338ca.
Change-Id: I34d8be409f9c8f15b8a6da4b2d98ba3e60aa2210
Reviewed-on: https://boringssl-review.googlesource.com/7200
Reviewed-by: Adam Langley <agl@google.com>
Commit 75a64c08fc missed one case where
the GCC syntax should have been replaced with |alignas|.
Change-Id: Iebdaa9c9a2c0aff171f0b5d4daac607e351a4b7e
Reviewed-on: https://boringssl-review.googlesource.com/6974
Reviewed-by: David Benjamin <davidben@google.com>
MSVC doesn't have stdalign.h and so doesn't support |alignas| in C
code. Define |alignas(x)| as a synonym for |__decltype(align(x))|
instead for it.
This also fixes -Wcast-qual warnings in rsaz_exp.c.
Change-Id: Ifce9031724cb93f5a4aa1f567e7af61b272df9d5
Reviewed-on: https://boringssl-review.googlesource.com/6924
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Fix casts from const to non-const where dropping the constness is
completely unnecessary. The changes to chacha_vec.c don't result in any
changes to chacha_vec_arm.S.
Change-Id: I2f10081fd0e73ff5db746347c5971f263a5221a6
Reviewed-on: https://boringssl-review.googlesource.com/6923
Reviewed-by: David Benjamin <davidben@google.com>
The new function |CRYPTO_chacha_96_bit_nonce_from_64_bit_nonce| can be
used to adapt code from that uses 64 bit nonces, in a way that is
compatible with the old semantics.
Change-Id: I83d5b2d482e006e82982f58c9f981e8078c3e1b0
Reviewed-on: https://boringssl-review.googlesource.com/6100
Reviewed-by: Adam Langley <alangley@gmail.com>
Also, organize the links in BUILDING.md sensibly.
Change-Id: Ie9c65750849fcdab7a6a6bf11d1c9cdafb53bc00
Reviewed-on: https://boringssl-review.googlesource.com/6140
Reviewed-by: Adam Langley <alangley@gmail.com>
The ChaCha20 ARM asm is generated from GCC. This change updates the GCC
command line to include -ffunction-sections, which causes GCC to put
each function in its own section so that the linker with --gc-sections
can trim unused functions.
Since the file only has a single function, this is a bit useless, but
it'll now be consistent with the other ARM asm.
Change-Id: If12c675700310ea55af817b5433844eeffc9d029
Reviewed-on: https://boringssl-review.googlesource.com/6006
Reviewed-by: Adam Langley <agl@google.com>
arm_arch.h is included from ARM asm files, but lives in crypto/, not
openssl/include/. Since the asm files are often built from a different
location than their position in the source tree, relative include paths
are unlikely to work so, rather than having crypto/ be a de-facto,
second global include path, this change moves arm_arch.h to
include/openssl/.
It also removes entries from many include paths because they should be
needed as relative includes are always based on the locations of the
source file.
Change-Id: I638ff43d641ca043a4fc06c0d901b11c6ff73542
Reviewed-on: https://boringssl-review.googlesource.com/5746
Reviewed-by: Adam Langley <agl@google.com>
This change causes the generated assembly files for ARM and AArch64 to
have #if guards for __arm__ and __aarch64__, respectively. Since
building on ARM is only supported for Linux, we only have to worry about
GCC/Clang's predefines.
Change-Id: I7198eab6230bcfc26257f0fb6a0cc3166df0bb29
Reviewed-on: https://boringssl-review.googlesource.com/5173
Reviewed-by: Adam Langley <agl@google.com>
Change-Id: I34ee66fcc53d3371591beee3373c46598c31b5c5
Reviewed-on: https://boringssl-review.googlesource.com/3460
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
When addressing [1], I checked the AEAD code but brain-farted: a key is
aligned in that code, but it's the Poly1305 key, which doesn't matter
here.
It would be nice to align the ChaCha key too, but Android doesn't have
|posix_memalign| in the versions that we care about. It does have
|memalign|, but that's documented as "obsolete" and we don't have a
concept of an Android OS yet and I don't want to add one just for this.
So this change uses the buffer for loading the key again.
(Note that we never used to check for alignment of the |key| before
calling this. We must have gotten it for free somehow when checking the
alignment of |in| and |out|. But there are clearly some paths that don't
have an aligned key:
https://code.google.com/p/chromium/issues/detail?id=454308.)
At least the generation script started paying off immediately ☺.
[1] https://boringssl-review.googlesource.com/#/c/3132/1/crypto/chacha/chacha_vec.c@185
Change-Id: I4f893ba0733440fddd453f9636cc2aeaf05076ed
Reviewed-on: https://boringssl-review.googlesource.com/3270
Reviewed-by: Adam Langley <agl@google.com>
Including string.h in base.h causes any file that includes a BoringSSL
header to include string.h. Generally this wouldn't be a problem,
although string.h might slow down the compile if it wasn't otherwise
needed. However, it also causes problems for ipsec-tools in Android
because OpenSSL didn't have this behaviour.
This change removes string.h from base.h and, instead, adds it to each
.c file that requires it.
Change-Id: I5968e50b0e230fd3adf9b72dd2836e6f52d6fb37
Reviewed-on: https://boringssl-review.googlesource.com/3200
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
|num_rounds| is neither a parameter nor manifest constant.
Change-Id: I6c1d3a3819731f53fdd01eef6bb4de8a45176a1d
Reviewed-on: https://boringssl-review.googlesource.com/3180
Reviewed-by: Adam Langley <agl@google.com>
Obviously I shouldn't be doing this by hand each time.
Change-Id: I64e3f5ede5c47eddbff3b18172a95becc681b486
Reviewed-on: https://boringssl-review.googlesource.com/3170
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
I put the header back, but missed the #endif at the end of the file.
Regenerating this is clearly too error prone – I'll write a script to do
it for the future.
Change-Id: I06968c9f7a4673f5942725e727c67cb4e01d361a
By copying the input and output data via an aligned buffer, the
alignment requirements for the NEON ChaCha implementation on ARM can be
eliminted. This does, however, reduce the speed when aligned buffers are
used. However, updating the GCC version used to generate the ASM more
than makes up for that.
On a SnapDragon 801 (OnePlus One) the aligned speed was 214.6 MB/s and
the unaligned speed was 112.1 MB/s. Now both are 218.4 MB/s. A Nexus 7
also shows a slight speed up.
Change-Id: I68321ba56767fa5354b31a1491a539b299236e9a
Reviewed-on: https://boringssl-review.googlesource.com/3132
Reviewed-by: Adam Langley <agl@google.com>
Change-Id: Id77fb7c904cbfe8172466dff20b6a715d90b806c
Reviewed-on: https://boringssl-review.googlesource.com/1710
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
This change marks public symbols as dynamically exported. This means
that it becomes viable to build a shared library of libcrypto and libssl
with -fvisibility=hidden.
On Windows, one not only needs to mark functions for export in a
component, but also for import when using them from a different
component. Because of this we have to build with
|BORINGSSL_IMPLEMENTATION| defined when building the code. Other
components, when including our headers, won't have that defined and then
the |OPENSSL_EXPORT| tag becomes an import tag instead. See the #defines
in base.h
In the asm code, symbols are now hidden by default and those that need
to be exported are wrapped by a C function.
In order to support Chromium, a couple of libssl functions were moved to
ssl.h from ssl_locl.h: ssl_get_new_session and ssl_update_cache.
Change-Id: Ib4b76e2f1983ee066e7806c24721e8626d08a261
Reviewed-on: https://boringssl-review.googlesource.com/1350
Reviewed-by: Adam Langley <agl@google.com>
Previously, public headers lived next to the respective code and there
were symlinks from include/openssl to them.
This doesn't work on Windows.
This change moves the headers to live in include/openssl. In cases where
some symlinks pointed to the same header, I've added a file that just
includes the intended target. These cases are all for backwards-compat.
Change-Id: I6e285b74caf621c644b5168a4877db226b07fd92
Reviewed-on: https://boringssl-review.googlesource.com/1180
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
PNaCl needs OPENSSL_NO_ASM to work and a couple of cases were missing
because it hasn't previously been tested.
Additionally, it defined _BSD_SOURCE and others on the command line,
causing duplicate definition errors when defined in source code.
It's missing readdir_r.
It uses newlib, which appears to use u_short in socket.h without ever
defining it.
Change-Id: Ieccfc7365723d0521f6327eebe9f44a2afc57406
Reviewed-on: https://boringssl-review.googlesource.com/1140
Reviewed-by: Adam Langley <agl@google.com>
The function was hard-coded to 20 rounds already so the argument was
already useless. Thanks to Huzaifa Sidhpurwala for noticing.
Change-Id: I5f9d6ca6d46c6ab769b19820f8f918349544846d