Commit Graph

57 Commits

Author SHA1 Message Date
David Benjamin
6dc994265e Sync up some perlasm license headers and easy fixes.
These files are otherwise up-to-date with OpenSSL master as of
50ea9d2b3521467a11559be41dcf05ee05feabd6, modulo a couple of spelling
fixes which I've imported.

I've also reverted the same-line label and instruction patch to
x86_64-mont*.pl. The new delocate parser handles that fine.

Change-Id: Ife35c671a8104c3cc2fb6c5a03127376fccc4402
Reviewed-on: https://boringssl-review.googlesource.com/25644
Reviewed-by: Adam Langley <agl@google.com>
2018-02-11 01:00:35 +00:00
David Benjamin
875095aa7c Silence ARMv8 deprecated IT instruction warnings.
ARMv8 kindly deprecated most of its IT instructions in Thumb mode.
These files are taken from upstream and are used on both ARMv7 and ARMv8
processors. Accordingly, silence the warnings by marking the file as
targetting ARMv7. In other files, they were accidentally silenced anyway
by way of the existing .arch lines.

This can be reproduced by building with the new NDK and passing
-DCMAKE_ASM_FLAGS=-march=armv8-a. Some of our downstream code ends up
passing that to the assembly.

Note this change does not attempt to arrange for ARMv8-A/T32 to get
code which honors the constraints. It only silences the warnings and
continues to give it the same ARMv7-A/Thumb-2 code that backwards
compatibility dictates it continue to run.

Bug: chromium:575886, b/63131949
Change-Id: I24ce0b695942eaac799347922b243353b43ad7df
Reviewed-on: https://boringssl-review.googlesource.com/24166
Reviewed-by: Adam Langley <agl@google.com>
2017-12-14 01:56:22 +00:00
David Benjamin
808f832917 Run the comment converter on libcrypto.
crypto/{asn1,x509,x509v3,pem} were skipped as they are still OpenSSL
style.

Change-Id: I3cd9a60e1cb483a981aca325041f3fbce294247c
Reviewed-on: https://boringssl-review.googlesource.com/19504
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-08-18 21:49:04 +00:00
David Benjamin
d4e37951b4 x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
The changes to the assembly files are synced from upstream's
64d92d74985ebb3d0be58a9718f9e080a14a8e7f. cpu-intel.c is translated to C
from that commit and d84df594404ebbd71d21fec5526178d935e4d88d.

Change-Id: I02c8f83aa4780df301c21f011ef2d8d8300e2f2a
Reviewed-on: https://boringssl-review.googlesource.com/18411
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2017-07-26 22:01:37 +00:00
David Benjamin
3c9729212b Fix chacha-armv4.pl with clang -fno-integrated-as.
The __clang__-guarded #defines cause gas to complain if clang is passed
-fno-integrated-as. Emitting .syntax unified when those are used fixes
this. This matches the change made to ghash-armv4.pl in upstream's
6cf412c473d8145562b76219ce3da73b201b3255.

See also https://github.com/openssl/openssl/pull/3694. This fixes the
build with the latest Android NDK (use the NDK-supplied toolchain file)
with the armeabi ABI.

Bug: chromium:732066
Change-Id: Ic6ca633a58edbe8ae8c7d501bd9515c2476fd7c2
Reviewed-on: https://boringssl-review.googlesource.com/17404
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-06-28 13:35:29 +00:00
David Benjamin
f03cdc3a93 Sync ARM assembly up to 609b0852e4d50251857dbbac3141ba042e35a9ae.
This change was made by copying over the files as of that commit and
then discarding the parts of the diff which corresponding to our own
changes.

Change-Id: I28c5d711f7a8cec30749b8174687434129af5209
Reviewed-on: https://boringssl-review.googlesource.com/17111
Reviewed-by: Adam Langley <agl@google.com>
2017-06-13 17:47:20 +00:00
David Benjamin
583c12ea97 Remove filename argument to x86 asm_init.
43e5a26b53 removed the .file directive
from x86asm.pl. This removes the parameter from asm_init altogether. See
also upstream's e195c8a2562baef0fdcae330556ed60b1e922b0e.

Change-Id: I65761bc962d09f9210661a38ecf6df23eae8743d
Reviewed-on: https://boringssl-review.googlesource.com/16247
Reviewed-by: Steven Valdez <svaldez@google.com>
Commit-Queue: Steven Valdez <svaldez@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-05-12 14:58:27 +00:00
Adam Langley
cb1b333c2b x86_64 assembly pack: Win64 SEH face-lift.
(Imports upstream's 384e6de4c7e35e37fb3d6fbeb32ddcb5eb0d3d3f. Changes to
P-256 assembly dropped because we're so different there.)

 - harmonize handlers with guidelines and themselves;
 - fix some bugs in handlers;

Change-Id: Ic0b6a37bed6baedc50448c72fab088327f12898d
Reviewed-on: https://boringssl-review.googlesource.com/13782
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-16 21:55:04 +00:00
David Benjamin
b19b6626c5 Convert chacha_test to GTest.
BUG=129

Change-Id: Ibbd6d0804a75cb17ff33f64d4cdf9ae80b26e9df
Reviewed-on: https://boringssl-review.googlesource.com/13867
Reviewed-by: Steven Valdez <svaldez@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-02-15 17:16:44 +00:00
Adam Langley
004bff3a14 chacha/asm/chacha-x86_64.pl: add AVX512 path optimized for shorter inputs.
(Imports upstream's 3c274a6e2016b6724fbfe3ff1487efa2a536ece4.)

Change-Id: I2f0c0abff04decd347d4770e6d1d190f1e08afa0
Reviewed-on: https://boringssl-review.googlesource.com/13781
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 01:11:42 +00:00
Adam Langley
cf9a98cc0c x86 assembly pack: update performance results.
(Imports upstream's a30b0522cb937be54e172c68b0e9f5fa6ec30bf3.)

Change-Id: I6b9e67f97de935ecaaa9524943c6bdbe3540c0d0
Reviewed-on: https://boringssl-review.googlesource.com/13780
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:44:17 +00:00
Adam Langley
51079b4ebe x86_64 assembly pack: add AVX512 ChaCha20 path.
(Imports upstream's abb8c44fbaf6b88f4f4879b89b32e423aa75617b.)

Note that the AVX512 code is disabled for now. This just reduces the
diff with upstream.

Change-Id: I61da414e53747ecc869f27883e6ab12c1f8513ff
Reviewed-on: https://boringssl-review.googlesource.com/13779
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:44:01 +00:00
Adam Langley
b99dc55f21 chacha/asm/chacha-x86.pl: improve [backward] portability.
(Imports upstream's d89773d659129368a341df746476da445d47ad31.)

In order to minimize dependency on assembler version a number of
post-SSE2 instructions are encoded manually. But in order to simplify
the procedure only register operands are considered. Non-register
operands are passed down to assembler. Module in question uses pshufb
with memory operands, and old [GNU] assembler can't handle it.
Fortunately in this case it's possible skip just the problematic
segment without skipping SSSE3 support altogether.

Change-Id: Ic3ba1eef14170f9922c2cc69e0d57315e99a788b
Reviewed-on: https://boringssl-review.googlesource.com/13778
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:35:12 +00:00
Adam Langley
5ca18d8a47 chacha-x86.pl: simplify feature setting.
We do pass -DOPENSSL_IA32_SSE2 on the command line, so this just had the
effect of setting both values to 1 anyway.

Change-Id: Ia34714bb2fe51cc79d51ef9ee3ffe0354049ed0c
Reviewed-on: https://boringssl-review.googlesource.com/13777
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:35:09 +00:00
Adam Langley
ff7fb71ab5 x86_64 assembly pack: add Goldmont performance results.
(Imports upstream's ace05265d2d599e350cf84ed60955b7f2b173bc9.)

Change-Id: I151a03d662f7effe87f22fd9db7e0265368798b8
Reviewed-on: https://boringssl-review.googlesource.com/13774
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:14:15 +00:00
Adam Langley
952f7bff7c Spelling fixes in Perl files.
(Imports upstream's 6025001707fd65679d758c877200469d4e72ea88.)

Change-Id: I2f237d675b029cfc7ba3640aa9ce7248cc230013
Reviewed-on: https://boringssl-review.googlesource.com/13773
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:14:06 +00:00
Adam Langley
c948d46569 Remove trailing whitespace from Perl files.
Upstream did this in 609b0852e4d50251857dbbac3141ba042e35a9ae and it's
easier to apply patches if we do also.

Change-Id: I5142693ed1e26640987ff16f5ea510e81bba200e
Reviewed-on: https://boringssl-review.googlesource.com/13771
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:13:55 +00:00
David Benjamin
17cf2cb1d2 Work around language and compiler bug in memcpy, etc.
Most C standard library functions are undefined if passed NULL, even
when the corresponding length is zero. This gives them (and, in turn,
all functions which call them) surprising behavior on empty arrays.
Some compilers will miscompile code due to this rule. See also
https://www.imperialviolet.org/2016/06/26/nonnull.html

Add OPENSSL_memcpy, etc., wrappers which avoid this problem.

BUG=23

Change-Id: I95f42b23e92945af0e681264fffaf578e7f8465e
Reviewed-on: https://boringssl-review.googlesource.com/12928
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2016-12-21 20:34:47 +00:00
David Benjamin
3b6fb5903c Use fewer macros in C ChaCha implementation.
I hear our character set includes such novel symbols as '+'.

Change-Id: I96591a563317e71299748a948d68a849e15b5d60
Reviewed-on: https://boringssl-review.googlesource.com/11009
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2016-09-13 01:56:09 +00:00
David Benjamin
fdd8e9c8c7 Switch perlasm calling convention.
Depending on architecture, perlasm differed on which one or both of:

  perl foo.pl flavor output.S
  perl foo.pl flavor > output.S

Upstream has now unified on the first form after making a number of
changes to their files (the second does not even work for their x86
files anymore). Sync those portions of our perlasm scripts with upstream
and update CMakeLists.txt and generate_build_files.py per the new
convention.

This imports various commits like this one:
184bc45f683c76531d7e065b6553ca9086564576 (this was done by taking a
diff, so I don't have the full list)

Confirmed that generate_build_files.py sees no change.

BUG=14

Change-Id: Id2fb5b8bc2a7369d077221b5df9a6947d41f50d2
Reviewed-on: https://boringssl-review.googlesource.com/8518
Reviewed-by: Adam Langley <agl@google.com>
2016-06-27 21:59:26 +00:00
David Benjamin
bf1905a910 Revert "Import chacha-x86.pl fix."
This reverts commit 762e1d039c. We no longer need
to support out < in. Better to keep the assembly aligned with upstream.

Change-Id: I345bf822953bd0e1e79ad5ab4d337dcb22e7676b
Reviewed-on: https://boringssl-review.googlesource.com/8232
Reviewed-by: Adam Langley <agl@google.com>
2016-06-09 19:49:12 +00:00
David Benjamin
2446db0f52 Require in == out for in-place encryption.
While most of OpenSSL's assembly allows out < in too, some of it doesn't.
Upstream seems to not consider this a problem (or, at least, they're failing to
make a decision on whether it is a problem, so we should assume they'll stay
their course). Accordingly, require aliased buffers to exactly align so we
don't have to keep chasing this down.

Change-Id: I00eb3df3e195b249116c68f7272442918d7077eb
Reviewed-on: https://boringssl-review.googlesource.com/8231
Reviewed-by: Adam Langley <agl@google.com>
2016-06-09 19:49:03 +00:00
David Benjamin
0fe4d8bef5 chacha/asm/chacha-armv8.pl: fix intermittent build failures.
(Imported from b9077d85b0042d3d5d877d5cf7f06a8a8c035673.)

Change-Id: I6df3b3d0913b001712a78671c69b9468e059047f
Reviewed-on: https://boringssl-review.googlesource.com/7682
Reviewed-by: Steven Valdez <svaldez@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2016-04-14 20:50:36 +00:00
David Benjamin
762e1d039c Import chacha-x86.pl fix.
Patch from https://mta.openssl.org/pipermail/openssl-dev/2016-March/005625.html.

Upstream has yet to make a decision on aliasing requirements for their
assembly. If they choose to go with the stricter aliasing requirement rather
than land this patch, we'll probably want to tweak EVP_AEAD's API guarantees
accordingly and then undiverge.

In the meantime, import this to avoid a regression on x86 from when we had
compiler-vectorized code on GCC platforms.  Per our assembly coverage tools and
pending multi-CPU-variant tests, we have good coverage here. Unlike Poly1305
(which is currently waiting on yet another upstream bugfix), where there is
risk of missed carries everywhere, it is much more difficult to accidentally
make a ChaCha20 implementation that fails based on the data passed into it.

This restores a sizeable speed improvement on x86.

Before:
Did 1131000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000205us (1130768.2 ops/sec): 18.1 MB/s
Did 161000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1006136us (160018.1 ops/sec): 216.0 MB/s
Did 28000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1023264us (27363.4 ops/sec): 224.2 MB/s
Did 1166000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000447us (1165479.0 ops/sec): 18.6 MB/s
Did 160000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1004818us (159232.8 ops/sec): 215.0 MB/s
Did 30000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1016977us (29499.2 ops/sec): 241.7 MB/s

After:
Did 2208000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000031us (2207931.6 ops/sec): 35.3 MB/s
Did 402000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001717us (401310.9 ops/sec): 541.8 MB/s
Did 97000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005394us (96479.6 ops/sec): 790.4 MB/s
Did 2444000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000089us (2443782.5 ops/sec): 39.1 MB/s
Did 459000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000563us (458741.7 ops/sec): 619.3 MB/s
Did 97000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1007942us (96235.7 ops/sec): 788.4 MB/s

Change-Id: I976da606dae062a776e0cc01229ec03a074035d1
Reviewed-on: https://boringssl-review.googlesource.com/7561
Reviewed-by: Steven Valdez <svaldez@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2016-03-28 15:58:24 +00:00
David Benjamin
d7166d07ad Add a standalone ChaCha test.
The coverage tool revealed that we weren't testing all codepaths of the ChaCha
assembly. Add a standalone test as it's much easier to iterate over all lengths
when there isn't the entire AEAD in the way.

I wasn't able to find a really long test vector, so I generated a random one
with the Go implementation we have in runner.

This test gives us full coverage on the ChaCha20_ssse3 variant. (We'll see how
it fares on the other codepaths when the multi-variant test harnesses get in. I
certainly hope there isn't a more novel way to call ChaCha20 than this...)

Change-Id: I087e421c7351f46ea65dacdc7127e4fbf5f4c0aa
Reviewed-on: https://boringssl-review.googlesource.com/7299
Reviewed-by: Adam Langley <agl@google.com>
2016-03-04 19:11:03 +00:00
David Benjamin
886119b9f7 Disable ChaCha20 assembly for OPENSSL_X86.
They fail the newly-added in-place tests. Since we don't have bots for them
yet, verified manually that the arm and aarch64 code is fine.

Change-Id: Ic6f4060f63e713e09707af05e6b7736b7b65c5df
Reviewed-on: https://boringssl-review.googlesource.com/7252
Reviewed-by: Adam Langley <agl@google.com>
2016-02-29 22:30:46 +00:00
David Benjamin
35be688078 Enable upstream's ChaCha20 assembly for x86 and ARM (32- and 64-bit).
This removes chacha_vec_arm.S and chacha_vec.c in favor of unifying on
upstream's code. Upstream's is faster and this cuts down on the number of
distinct codepaths. Our old scheme also didn't give vectorized code on
Windows or aarch64.

BoringSSL-specific modifications made to the assembly:

- As usual, the shelling out to $CC is replaced with hardcoding $avx. I've
  tested up to the AVX2 codepath, so enable it all.

- I've removed the AMD XOP code as I have not tested it.

- As usual, the ARM file need the arm_arch.h include tweaked.

Speed numbers follow. We can hope for further wins on these benchmarks after
importing the Poly1305 assembly.

x86
---
Old:
Did 1422000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000433us (1421384.5 ops/sec): 22.7 MB/s
Did 123000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1003803us (122534.0 ops/sec): 165.4 MB/s
Did 22000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000282us (21993.8 ops/sec): 180.2 MB/s
Did 1428000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000214us (1427694.5 ops/sec): 22.8 MB/s
Did 124000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1006332us (123219.8 ops/sec): 166.3 MB/s
Did 22000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1020771us (21552.3 ops/sec): 176.6 MB/s
New:
Did 1520000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000567us (1519138.6 ops/sec): 24.3 MB/s
Did 152000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1004216us (151361.9 ops/sec): 204.3 MB/s
Did 31000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1009085us (30720.9 ops/sec): 251.7 MB/s
Did 1797000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000141us (1796746.7 ops/sec): 28.7 MB/s
Did 171000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1003204us (170453.9 ops/sec): 230.1 MB/s
Did 31000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1005349us (30835.1 ops/sec): 252.6 MB/s

x86_64, no AVX2
---
Old:
Did 1782000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000204us (1781636.5 ops/sec): 28.5 MB/s
Did 317000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001579us (316500.2 ops/sec): 427.3 MB/s
Did 62000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1012146us (61256.0 ops/sec): 501.8 MB/s
Did 1778000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000220us (1777608.9 ops/sec): 28.4 MB/s
Did 315000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1002886us (314093.5 ops/sec): 424.0 MB/s
Did 71000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1014606us (69977.9 ops/sec): 573.3 MB/s
New:
Did 1866000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000019us (1865964.5 ops/sec): 29.9 MB/s
Did 399000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001017us (398594.6 ops/sec): 538.1 MB/s
Did 84000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005645us (83528.5 ops/sec): 684.3 MB/s
Did 1881000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000325us (1880388.9 ops/sec): 30.1 MB/s
Did 404000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000004us (403998.4 ops/sec): 545.4 MB/s
Did 85000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1010048us (84154.4 ops/sec): 689.4 MB/s

x86_64, AVX2
---
Old:
Did 2375000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000282us (2374330.4 ops/sec): 38.0 MB/s
Did 448000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001865us (447166.0 ops/sec): 603.7 MB/s
Did 88000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005217us (87543.3 ops/sec): 717.2 MB/s
Did 2409000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000188us (2408547.2 ops/sec): 38.5 MB/s
Did 446000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001003us (445553.1 ops/sec): 601.5 MB/s
Did 90000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1006722us (89399.1 ops/sec): 732.4 MB/s
New:
Did 2622000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000266us (2621302.7 ops/sec): 41.9 MB/s
Did 794000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000783us (793378.8 ops/sec): 1071.1 MB/s
Did 173000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000176us (172969.6 ops/sec): 1417.0 MB/s
Did 2623000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000330us (2622134.7 ops/sec): 42.0 MB/s
Did 783000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000531us (782584.4 ops/sec): 1056.5 MB/s
Did 174000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1000840us (173854.0 ops/sec): 1424.2 MB/s

arm, Nexus 4
---
Old:
Did 388550 ChaCha20-Poly1305 (16 bytes) seal operations in 1000580us (388324.8 ops/sec): 6.2 MB/s
Did 90000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1003816us (89657.9 ops/sec): 121.0 MB/s
Did 19000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1045750us (18168.8 ops/sec): 148.8 MB/s
Did 398500 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000305us (398378.5 ops/sec): 6.4 MB/s
Did 90500 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000305us (90472.4 ops/sec): 122.1 MB/s
Did 19000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1043278us (18211.8 ops/sec): 149.2 MB/s
New:
Did 424788 ChaCha20-Poly1305 (16 bytes) seal operations in 1000641us (424515.9 ops/sec): 6.8 MB/s
Did 115000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001526us (114824.8 ops/sec): 155.0 MB/s
Did 27000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1033023us (26136.9 ops/sec): 214.1 MB/s
Did 447750 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000549us (447504.3 ops/sec): 7.2 MB/s
Did 117500 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001923us (117274.5 ops/sec): 158.3 MB/s
Did 27000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1025118us (26338.4 ops/sec): 215.8 MB/s

aarch64, Nexus 6p
(Note we didn't have aarch64 assembly before at all, and still don't have it
for Poly1305. Hopefully once that's added this will be faster than the arm
numbers...)
---
Old:
Did 145040 ChaCha20-Poly1305 (16 bytes) seal operations in 1003065us (144596.8 ops/sec): 2.3 MB/s
Did 14000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1042605us (13427.9 ops/sec): 18.1 MB/s
Did 2618 ChaCha20-Poly1305 (8192 bytes) seal operations in 1093241us (2394.7 ops/sec): 19.6 MB/s
Did 148000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000709us (147895.1 ops/sec): 2.4 MB/s
Did 14000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1047294us (13367.8 ops/sec): 18.0 MB/s
Did 2607 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1090745us (2390.1 ops/sec): 19.6 MB/s
New:
Did 358000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000769us (357724.9 ops/sec): 5.7 MB/s
Did 45000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1021267us (44062.9 ops/sec): 59.5 MB/s
Did 8591 ChaCha20-Poly1305 (8192 bytes) seal operations in 1047136us (8204.3 ops/sec): 67.2 MB/s
Did 343000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000489us (342832.4 ops/sec): 5.5 MB/s
Did 44000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1008326us (43636.7 ops/sec): 58.9 MB/s
Did 8866 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1083341us (8183.9 ops/sec): 67.0 MB/s

Change-Id: I629fe195d072f2c99e8f947578fad6d70823c4c8
Reviewed-on: https://boringssl-review.googlesource.com/7202
Reviewed-by: Adam Langley <agl@google.com>
2016-02-23 17:19:45 +00:00
David Benjamin
0182ecd346 Consistently use named constants in ARM assembly files.
Most of the OPENSSL_armcap_P accesses in assembly use named constants from
arm_arch.h, but some don't. Consistently use the constants. The dispatch really
should be in C, but in the meantime, make it easier to tell what's going on.

I'll send this patch upstream so we won't be carrying a diff here.

Change-Id: I63c68d2351ea5ce11005813314988e32b6459526
Reviewed-on: https://boringssl-review.googlesource.com/7203
Reviewed-by: Adam Langley <agl@google.com>
2016-02-23 17:18:18 +00:00
David Benjamin
295960044b Fix chacha-armv4.pl.
Patch taken from https://rt.openssl.org/Ticket/Display.html?id=4323.

Change-Id: Icbaf8f9a0f92da48f213b251b0afa4b0d5aa627d
Reviewed-on: https://boringssl-review.googlesource.com/7201
Reviewed-by: Adam Langley <agl@google.com>
2016-02-23 01:07:48 +00:00
David Benjamin
ea4d6863c7 Check in pristine copies of OpenSSL's chacha-{arm*,x86}.pl.
They won't be used as-is. This is just to make the diffs easier to see. Taken
from upstream's 4f16039efe3589aa4d63a6f1fab799d0cd9338ca.

Change-Id: I34d8be409f9c8f15b8a6da4b2d98ba3e60aa2210
Reviewed-on: https://boringssl-review.googlesource.com/7200
Reviewed-by: Adam Langley <agl@google.com>
2016-02-23 01:06:43 +00:00
Brian Smith
f547007332 Use |alignas| more in crypto/chacha/chacha_vec.c.
Commit 75a64c08fc missed one case where
the GCC syntax should have been replaced with |alignas|.

Change-Id: Iebdaa9c9a2c0aff171f0b5d4daac607e351a4b7e
Reviewed-on: https://boringssl-review.googlesource.com/6974
Reviewed-by: David Benjamin <davidben@google.com>
2016-01-27 22:12:22 +00:00
Brian Smith
7cae9f5b6c Use |alignas| for alignment.
MSVC doesn't have stdalign.h and so doesn't support |alignas| in C
code. Define |alignas(x)| as a synonym for |__decltype(align(x))|
instead for it.

This also fixes -Wcast-qual warnings in rsaz_exp.c.

Change-Id: Ifce9031724cb93f5a4aa1f567e7af61b272df9d5
Reviewed-on: https://boringssl-review.googlesource.com/6924
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2016-01-25 23:05:04 +00:00
Brian Smith
d3a4e280db Fix trivial -Wcast-qual violations.
Fix casts from const to non-const where dropping the constness is
completely unnecessary. The changes to chacha_vec.c don't result in any
changes to chacha_vec_arm.S.

Change-Id: I2f10081fd0e73ff5db746347c5971f263a5221a6
Reviewed-on: https://boringssl-review.googlesource.com/6923
Reviewed-by: David Benjamin <davidben@google.com>
2016-01-21 21:06:02 +00:00
Brian Smith
e80a2ecd0d Change |CRYPTO_chacha_20| to use 96-bit nonces, 32-bit counters.
The new function |CRYPTO_chacha_96_bit_nonce_from_64_bit_nonce| can be
used to adapt code from that uses 64 bit nonces, in a way that is
compatible with the old semantics.

Change-Id: I83d5b2d482e006e82982f58c9f981e8078c3e1b0
Reviewed-on: https://boringssl-review.googlesource.com/6100
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 23:58:46 +00:00
William Hesse
6dc1851f30 Fix aarch64 (64-bit ARM) guard on chacha_vec_arm.S.
Change-Id: Ia3632639daa8655ea5e2f81ba2a5163949f522b2
Reviewed-on: https://boringssl-review.googlesource.com/6110
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 23:32:38 +00:00
Brian Smith
953cfc837f Document how to regenerate crypto/chacha/chacha_vec_arm.S.
Also, organize the links in BUILDING.md sensibly.

Change-Id: Ie9c65750849fcdab7a6a6bf11d1c9cdafb53bc00
Reviewed-on: https://boringssl-review.googlesource.com/6140
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 23:29:57 +00:00
Adam Langley
0dd93002dd Revert section changes for ASM.
This change reverts the following commits:
  72d9cba7cb
  5b61b9ebc5
  3f85e04f40
  2ab24a2d40

Change-Id: I669b83f2269cf96aa71a649a346147b9407a811e
Reviewed-on: https://boringssl-review.googlesource.com/6056
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-09-30 22:09:52 +00:00
Adam Langley
5b61b9ebc5 Update ChaCha20 ARM asm with sections.
The ChaCha20 ARM asm is generated from GCC. This change updates the GCC
command line to include -ffunction-sections, which causes GCC to put
each function in its own section so that the linker with --gc-sections
can trim unused functions.

Since the file only has a single function, this is a bit useless, but
it'll now be consistent with the other ARM asm.

Change-Id: If12c675700310ea55af817b5433844eeffc9d029
Reviewed-on: https://boringssl-review.googlesource.com/6006
Reviewed-by: Adam Langley <agl@google.com>
2015-09-29 18:07:54 +00:00
Adam Langley
73415b6aa0 Move arm_arch.h and fix up lots of include paths.
arm_arch.h is included from ARM asm files, but lives in crypto/, not
openssl/include/. Since the asm files are often built from a different
location than their position in the source tree, relative include paths
are unlikely to work so, rather than having crypto/ be a de-facto,
second global include path, this change moves arm_arch.h to
include/openssl/.

It also removes entries from many include paths because they should be
needed as relative includes are always based on the locations of the
source file.

Change-Id: I638ff43d641ca043a4fc06c0d901b11c6ff73542
Reviewed-on: https://boringssl-review.googlesource.com/5746
Reviewed-by: Adam Langley <agl@google.com>
2015-08-26 01:57:59 +00:00
Adam Langley
9eaf07d460 Emit #if guards for ARM assembly files.
This change causes the generated assembly files for ARM and AArch64 to
have #if guards for __arm__ and __aarch64__, respectively. Since
building on ARM is only supported for Linux, we only have to worry about
GCC/Clang's predefines.

Change-Id: I7198eab6230bcfc26257f0fb6a0cc3166df0bb29
Reviewed-on: https://boringssl-review.googlesource.com/5173
Reviewed-by: Adam Langley <agl@google.com>
2015-06-23 21:00:32 +00:00
Adam Langley
26c2b929ba Switch nonce type in chacha_vec.c to uint32_t.
This was suggested in https://boringssl-review.googlesource.com/#/c/3460
but I forgot to upload the change before submitting in Gerrit.

Change-Id: I3a333fe2e8880603a9027638dd013f21d8270638
2015-02-13 13:16:59 -08:00
Adam Langley
d306f165a4 Don't require the ChaCha nonce to be aligned on ARM.
Change-Id: I34ee66fcc53d3371591beee3373c46598c31b5c5
Reviewed-on: https://boringssl-review.googlesource.com/3460
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-02-13 20:35:36 +00:00
Adam Langley
868c7ef1f4 Don't assume alignment of ChaCha key on ARM.
When addressing [1], I checked the AEAD code but brain-farted: a key is
aligned in that code, but it's the Poly1305 key, which doesn't matter
here.

It would be nice to align the ChaCha key too, but Android doesn't have
|posix_memalign| in the versions that we care about. It does have
|memalign|, but that's documented as "obsolete" and we don't have a
concept of an Android OS yet and I don't want to add one just for this.

So this change uses the buffer for loading the key again.

(Note that we never used to check for alignment of the |key| before
calling this. We must have gotten it for free somehow when checking the
alignment of |in| and |out|. But there are clearly some paths that don't
have an aligned key:
https://code.google.com/p/chromium/issues/detail?id=454308.)

At least the generation script started paying off immediately ☺.

[1] https://boringssl-review.googlesource.com/#/c/3132/1/crypto/chacha/chacha_vec.c@185

Change-Id: I4f893ba0733440fddd453f9636cc2aeaf05076ed
Reviewed-on: https://boringssl-review.googlesource.com/3270
Reviewed-by: Adam Langley <agl@google.com>
2015-02-03 00:34:17 +00:00
Adam Langley
2b2d66d409 Remove string.h from base.h.
Including string.h in base.h causes any file that includes a BoringSSL
header to include string.h. Generally this wouldn't be a problem,
although string.h might slow down the compile if it wasn't otherwise
needed. However, it also causes problems for ipsec-tools in Android
because OpenSSL didn't have this behaviour.

This change removes string.h from base.h and, instead, adds it to each
.c file that requires it.

Change-Id: I5968e50b0e230fd3adf9b72dd2836e6f52d6fb37
Reviewed-on: https://boringssl-review.googlesource.com/3200
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-02-02 19:14:15 +00:00
Brian Smith
efed2210e8 Enable more warnings & treat warnings as errors on Windows.
Change-Id: I2bf0144aaa8b670ff00b8e8dfe36bd4d237b9a8a
Reviewed-on: https://boringssl-review.googlesource.com/3140
Reviewed-by: Adam Langley <agl@google.com>
2015-01-31 00:18:55 +00:00
Douglas Katzman
0bb81fcd66 Fix misleading comment.
|num_rounds| is neither a parameter nor manifest constant.

Change-Id: I6c1d3a3819731f53fdd01eef6bb4de8a45176a1d
Reviewed-on: https://boringssl-review.googlesource.com/3180
Reviewed-by: Adam Langley <agl@google.com>
2015-01-30 22:31:00 +00:00
Adam Langley
ab21891e34 Add a script to generate the ChaCha ARM asm.
Obviously I shouldn't be doing this by hand each time.

Change-Id: I64e3f5ede5c47eddbff3b18172a95becc681b486
Reviewed-on: https://boringssl-review.googlesource.com/3170
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-01-30 21:47:22 +00:00
Adam Langley
be629e0e92 Another fix for the regenerated chacha_vec_arm.S.
I put the header back, but missed the #endif at the end of the file.
Regenerating this is clearly too error prone – I'll write a script to do
it for the future.

Change-Id: I06968c9f7a4673f5942725e727c67cb4e01d361a
2015-01-30 10:28:37 -08:00
Adam Langley
d5c0980e33 Update the command line for generating the ChaCha asm.
This command line matches what was used to generate the current file.

Change-Id: I2f730768f4efc1b23860cc7b27d5005311c554ea
2015-01-29 12:20:07 -08:00
Adam Langley
8de7597af3 Don't require alignment in ChaCha20 on ARM.
By copying the input and output data via an aligned buffer, the
alignment requirements for the NEON ChaCha implementation on ARM can be
eliminted. This does, however, reduce the speed when aligned buffers are
used. However, updating the GCC version used to generate the ASM more
than makes up for that.

On a SnapDragon 801 (OnePlus One) the aligned speed was 214.6 MB/s and
the unaligned speed was 112.1 MB/s. Now both are 218.4 MB/s. A Nexus 7
also shows a slight speed up.

Change-Id: I68321ba56767fa5354b31a1491a539b299236e9a
Reviewed-on: https://boringssl-review.googlesource.com/3132
Reviewed-by: Adam Langley <agl@google.com>
2015-01-29 20:06:41 +00:00