(Imports upstream's 384e6de4c7e35e37fb3d6fbeb32ddcb5eb0d3d3f. Changes to
P-256 assembly dropped because we're so different there.)
- harmonize handlers with guidelines and themselves;
- fix some bugs in handlers;
Change-Id: Ic0b6a37bed6baedc50448c72fab088327f12898d
Reviewed-on: https://boringssl-review.googlesource.com/13782
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
(Imports upstream's ace05265d2d599e350cf84ed60955b7f2b173bc9.)
Change-Id: I151a03d662f7effe87f22fd9db7e0265368798b8
Reviewed-on: https://boringssl-review.googlesource.com/13774
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
(Imports upstream's 6025001707fd65679d758c877200469d4e72ea88.)
Change-Id: I2f237d675b029cfc7ba3640aa9ce7248cc230013
Reviewed-on: https://boringssl-review.googlesource.com/13773
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
(Imports upstream's b7f5503fa6e1feebec2ac12b8ddcb5b5672452a6.)
Change-Id: Ia8d2a8f71c97265d77ef8f6fc3cdfb7cf411c5ce
Reviewed-on: https://boringssl-review.googlesource.com/13772
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Upstream did this in 609b0852e4d50251857dbbac3141ba042e35a9ae and it's
easier to apply patches if we do also.
Change-Id: I5142693ed1e26640987ff16f5ea510e81bba200e
Reviewed-on: https://boringssl-review.googlesource.com/13771
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Measured on a SkyLake processor:
Before:
Did 11373750 AES-128-GCM (16 bytes) seal operations in 1016000us (11194635.8 ops/sec): 179.1 MB/s
Did 2253000 AES-128-GCM (1350 bytes) seal operations in 1016000us (2217519.7 ops/sec): 2993.7 MB/s
Did 453750 AES-128-GCM (8192 bytes) seal operations in 1015000us (447044.3 ops/sec): 3662.2 MB/s
Did 10753500 AES-256-GCM (16 bytes) seal operations in 1016000us (10584153.5 ops/sec): 169.3 MB/s
Did 1898750 AES-256-GCM (1350 bytes) seal operations in 1015000us (1870689.7 ops/sec): 2525.4 MB/s
Did 374000 AES-256-GCM (8192 bytes) seal operations in 1016000us (368110.2 ops/sec): 3015.6 MB/s
After:
Did 11074000 AES-128-GCM (16 bytes) seal operations in 1015000us (10910344.8 ops/sec): 174.6 MB/s
Did 3178250 AES-128-GCM (1350 bytes) seal operations in 1016000us (3128198.8 ops/sec): 4223.1 MB/s
Did 734500 AES-128-GCM (8192 bytes) seal operations in 1016000us (722933.1 ops/sec): 5922.3 MB/s
Did 10394750 AES-256-GCM (16 bytes) seal operations in 1015000us (10241133.0 ops/sec): 163.9 MB/s
Did 2502250 AES-256-GCM (1350 bytes) seal operations in 1016000us (2462844.5 ops/sec): 3324.8 MB/s
Did 544500 AES-256-GCM (8192 bytes) seal operations in 1015000us (536453.2 ops/sec): 4394.6 MB/s
Change-Id: If058935796441ed4e577b9a72d3aa43422edba58
Reviewed-on: https://boringssl-review.googlesource.com/7273
Reviewed-by: Adam Langley <alangley@gmail.com>
BoringSSL will always use the SSE version so this is all dead code.
Change-Id: I0f3b51ee29144b5c83d2553c92bebae901b6366f
Reviewed-on: https://boringssl-review.googlesource.com/13023
Reviewed-by: Adam Langley <alangley@gmail.com>
BoringSSL can assume that MMX, SSE, and SSE2 is always supported so
there is no need for a runtime check and there's no need for this
fallback code. Removing the code improves coverage analysis and shrinks
code size.
Change-Id: I782a1bae228f700895ada0bc56687e53cd02b5df
Reviewed-on: https://boringssl-review.googlesource.com/13022
Reviewed-by: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <alangley@gmail.com>
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <alangley@gmail.com>
This change adds AES and GHASH assembly from upstream, with the aim of
speeding up AES-GCM.
The PPC64LE assembly matches the interface of the ARMv8 assembly so I've
changed the prefix of both sets of asm functions to be the same
("aes_hw_").
Otherwise, the new assmebly files and Perlasm match exactly those from
upstream's c536b6be1a (from their master branch).
Before:
Did 1879000 AES-128-GCM (16 bytes) seal operations in 1000428us (1878196.1 ops/sec): 30.1 MB/s
Did 61000 AES-128-GCM (1350 bytes) seal operations in 1006660us (60596.4 ops/sec): 81.8 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1072649us (10255.0 ops/sec): 84.0 MB/s
Did 1665000 AES-256-GCM (16 bytes) seal operations in 1000591us (1664016.6 ops/sec): 26.6 MB/s
Did 52000 AES-256-GCM (1350 bytes) seal operations in 1006971us (51640.0 ops/sec): 69.7 MB/s
Did 8840 AES-256-GCM (8192 bytes) seal operations in 1013294us (8724.0 ops/sec): 71.5 MB/s
After:
Did 4994000 AES-128-GCM (16 bytes) seal operations in 1000017us (4993915.1 ops/sec): 79.9 MB/s
Did 1389000 AES-128-GCM (1350 bytes) seal operations in 1000073us (1388898.6 ops/sec): 1875.0 MB/s
Did 319000 AES-128-GCM (8192 bytes) seal operations in 1000101us (318967.8 ops/sec): 2613.0 MB/s
Did 4668000 AES-256-GCM (16 bytes) seal operations in 1000149us (4667304.6 ops/sec): 74.7 MB/s
Did 1202000 AES-256-GCM (1350 bytes) seal operations in 1000646us (1201224.0 ops/sec): 1621.7 MB/s
Did 269000 AES-256-GCM (8192 bytes) seal operations in 1002804us (268247.8 ops/sec): 2197.5 MB/s
Change-Id: Id848562bd4e1aa79a4683012501dfa5e6c08cfcc
Reviewed-on: https://boringssl-review.googlesource.com/11262
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Some, very recent, versions of Clang now support `.arch`. Allow them to
see these directives with BORINGSSL_CLANG_SUPPORTS_DOT_ARCH.
BUG=39
Change-Id: I122ab4b3d5f14502ffe0c6e006950dc64abf0201
Reviewed-on: https://boringssl-review.googlesource.com/10600
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Depending on architecture, perlasm differed on which one or both of:
perl foo.pl flavor output.S
perl foo.pl flavor > output.S
Upstream has now unified on the first form after making a number of
changes to their files (the second does not even work for their x86
files anymore). Sync those portions of our perlasm scripts with upstream
and update CMakeLists.txt and generate_build_files.py per the new
convention.
This imports various commits like this one:
184bc45f683c76531d7e065b6553ca9086564576 (this was done by taking a
diff, so I don't have the full list)
Confirmed that generate_build_files.py sees no change.
BUG=14
Change-Id: Id2fb5b8bc2a7369d077221b5df9a6947d41f50d2
Reviewed-on: https://boringssl-review.googlesource.com/8518
Reviewed-by: Adam Langley <agl@google.com>
There was some uncertainty about what the code is doing with |$end0|
and whether it was necessary for |$len| to be a multiple of 16 or 96.
Hopefully these added comments make it clear that the code is correct
except for the caveat regarding low memory addresses.
Change-Id: Iea546a59dc7aeb400f50ac5d2d7b9cb88ace9027
Reviewed-on: https://boringssl-review.googlesource.com/7194
Reviewed-by: Adam Langley <agl@google.com>
This change makes the AEAD and EVP code paths use the same code for
AES-GCM. When AVX instructions are enabled in the assembly this will
allow them to use the stitched AES-GCM implementation.
Note that the stitched implementations are no-ops for small inputs
(smaller than 288 bytes for encryption; smaller than 96 bytes for
decryption). This means that only a handful of test cases with longish
inputs actually test the stitched code.
Change-Id: Iece8003d90448dcac9e0bde1f42ff102ebe1a1c9
Reviewed-on: https://boringssl-review.googlesource.com/7173
Reviewed-by: Adam Langley <agl@google.com>
See OpenSSL df057ea6c8a20e4babc047689507dfafde59ffd6.
Change-Id: Ife10dc13ca335cd51434d7769ff85c6929f10226
Reviewed-on: https://boringssl-review.googlesource.com/7172
Reviewed-by: David Benjamin <davidben@google.com>
We'd manually marked some of them hidden, but missed some. Do it in the perlasm
driver instead since we will never expose an asm symbol directly. This reduces
some of our divergence from upstream on these files (and indeed we'd
accidentally lose some .hiddens at one point).
BUG=586141
Change-Id: Ie1bfc6f38ba73d33f5c56a8a40c2bf1668562e7e
Reviewed-on: https://boringssl-review.googlesource.com/7140
Reviewed-by: Adam Langley <agl@google.com>
Triggered by RT#3989.
(Imported from upstream's fbab8baddef8d3346ae40ff068871e2ddaf10270. This
doesn't seem to affect us, but avoid getting out of sync.)
Change-Id: I164e2a72e4b75e286ceaa03745ed9bcbf6c3e32e
Reviewed-on: https://boringssl-review.googlesource.com/6512
Reviewed-by: Adam Langley <agl@google.com>
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change or if developers do or don't have CC variables set.
Previously, all compiler-version-gated features were turned on in
https://boringssl-review.googlesource.com/6260, but this broke the build. I
also wasn't thorough enough in gathering performance numbers. So, flip them all
to off instead. I'll enable them one-by-one as they're tested.
This should result in no change to generated assembly.
Change-Id: Ib4259b3f97adc4939cb0557c5580e8def120d5bc
Reviewed-on: https://boringssl-review.googlesource.com/6383
Reviewed-by: Adam Langley <agl@google.com>
This reverts commit b9c26014de.
The win64 bot seems unhappy. Will sniff at it tomorrow. In
the meantime, get the tree green again.
Change-Id: I058ddb3ec549beee7eabb2f3f72feb0a4a5143b2
Reviewed-on: https://boringssl-review.googlesource.com/6353
Reviewed-by: Adam Langley <alangley@gmail.com>
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change.
Enable all compiler-version-gated features as they should all be runtime-gated
anyway. This should align with what upstream's files would have produced on
modern toolschains. We should assume our assemblers can take whatever we'd like
to throw at them. (If it turns out some can't, we'd rather find out and
probably switch the problematic instructions to explicit byte sequences.)
This actually results in a fairly significant change to the assembly we
generate. I'm guessing upstream's buildsystem sets the CC environment variable,
while ours doesn't and so the version checks were all coming out conservative.
diffstat of generated files:
linux-x86/crypto/sha/sha1-586.S | 1176 ++++++++++++
linux-x86/crypto/sha/sha256-586.S | 2248 ++++++++++++++++++++++++
linux-x86_64/crypto/bn/rsaz-avx2.S | 1644 +++++++++++++++++
linux-x86_64/crypto/bn/rsaz-x86_64.S | 638 ++++++
linux-x86_64/crypto/bn/x86_64-mont.S | 332 +++
linux-x86_64/crypto/bn/x86_64-mont5.S | 1130 ++++++++++++
linux-x86_64/crypto/modes/aesni-gcm-x86_64.S | 754 ++++++++
linux-x86_64/crypto/modes/ghash-x86_64.S | 475 +++++
linux-x86_64/crypto/sha/sha1-x86_64.S | 1121 ++++++++++++
linux-x86_64/crypto/sha/sha256-x86_64.S | 1062 +++++++++++
linux-x86_64/crypto/sha/sha512-x86_64.S | 2241 ++++++++++++++++++++++++
mac-x86/crypto/sha/sha1-586.S | 1174 ++++++++++++
mac-x86/crypto/sha/sha256-586.S | 2248 ++++++++++++++++++++++++
mac-x86_64/crypto/bn/rsaz-avx2.S | 1637 +++++++++++++++++
mac-x86_64/crypto/bn/rsaz-x86_64.S | 638 ++++++
mac-x86_64/crypto/bn/x86_64-mont.S | 331 +++
mac-x86_64/crypto/bn/x86_64-mont5.S | 1130 ++++++++++++
mac-x86_64/crypto/modes/aesni-gcm-x86_64.S | 750 ++++++++
mac-x86_64/crypto/modes/ghash-x86_64.S | 475 +++++
mac-x86_64/crypto/sha/sha1-x86_64.S | 1121 ++++++++++++
mac-x86_64/crypto/sha/sha256-x86_64.S | 1062 +++++++++++
mac-x86_64/crypto/sha/sha512-x86_64.S | 2241 ++++++++++++++++++++++++
win-x86/crypto/sha/sha1-586.asm | 1173 ++++++++++++
win-x86/crypto/sha/sha256-586.asm | 2248 ++++++++++++++++++++++++
win-x86_64/crypto/bn/rsaz-avx2.asm | 1858 +++++++++++++++++++-
win-x86_64/crypto/bn/rsaz-x86_64.asm | 638 ++++++
win-x86_64/crypto/bn/x86_64-mont.asm | 352 +++
win-x86_64/crypto/bn/x86_64-mont5.asm | 1184 ++++++++++++
win-x86_64/crypto/modes/aesni-gcm-x86_64.asm | 933 ++++++++++
win-x86_64/crypto/modes/ghash-x86_64.asm | 515 +++++
win-x86_64/crypto/sha/sha1-x86_64.asm | 1152 ++++++++++++
win-x86_64/crypto/sha/sha256-x86_64.asm | 1088 +++++++++++
win-x86_64/crypto/sha/sha512-x86_64.asm | 2499 ++++++
SHA* gets faster. RSA and AES-GCM seem to be more of a wash and even slower
sometimes! This is a little concerning. Though when I repeated the latter two,
it's definitely noisy (RSA in particular), so we may wish to repeat in a more
controlled environment. We could also flip some of these toggles to something
other than the highest setting if it seems some of the variants aren't
desirable. We just shouldn't have them enabled or disabled on accident. This
aligns us closer to upstream though.
$ /tmp/bssl.old speed SHA-
Did 5028000 SHA-1 (16 bytes) operations in 1000048us (5027758.7 ops/sec): 80.4 MB/s
Did 1708000 SHA-1 (256 bytes) operations in 1000257us (1707561.2 ops/sec): 437.1 MB/s
Did 73000 SHA-1 (8192 bytes) operations in 1008406us (72391.5 ops/sec): 593.0 MB/s
Did 3041000 SHA-256 (16 bytes) operations in 1000311us (3040054.5 ops/sec): 48.6 MB/s
Did 779000 SHA-256 (256 bytes) operations in 1000820us (778361.7 ops/sec): 199.3 MB/s
Did 26000 SHA-256 (8192 bytes) operations in 1009875us (25745.8 ops/sec): 210.9 MB/s
Did 1837000 SHA-512 (16 bytes) operations in 1000251us (1836539.0 ops/sec): 29.4 MB/s
Did 803000 SHA-512 (256 bytes) operations in 1000969us (802222.6 ops/sec): 205.4 MB/s
Did 41000 SHA-512 (8192 bytes) operations in 1016768us (40323.8 ops/sec): 330.3 MB/s
$ /tmp/bssl.new speed SHA-
Did 5354000 SHA-1 (16 bytes) operations in 1000104us (5353443.2 ops/sec): 85.7 MB/s
Did 1779000 SHA-1 (256 bytes) operations in 1000121us (1778784.8 ops/sec): 455.4 MB/s
Did 87000 SHA-1 (8192 bytes) operations in 1012641us (85914.0 ops/sec): 703.8 MB/s
Did 3517000 SHA-256 (16 bytes) operations in 1000114us (3516599.1 ops/sec): 56.3 MB/s
Did 935000 SHA-256 (256 bytes) operations in 1000096us (934910.2 ops/sec): 239.3 MB/s
Did 38000 SHA-256 (8192 bytes) operations in 1004476us (37830.7 ops/sec): 309.9 MB/s
Did 2930000 SHA-512 (16 bytes) operations in 1000259us (2929241.3 ops/sec): 46.9 MB/s
Did 1008000 SHA-512 (256 bytes) operations in 1000509us (1007487.2 ops/sec): 257.9 MB/s
Did 45000 SHA-512 (8192 bytes) operations in 1000593us (44973.3 ops/sec): 368.4 MB/s
$ /tmp/bssl.old speed RSA
Did 820 RSA 2048 signing operations in 1017008us (806.3 ops/sec)
Did 27000 RSA 2048 verify operations in 1015400us (26590.5 ops/sec)
Did 1292 RSA 2048 (3 prime, e=3) signing operations in 1008185us (1281.5 ops/sec)
Did 65000 RSA 2048 (3 prime, e=3) verify operations in 1011388us (64268.1 ops/sec)
Did 120 RSA 4096 signing operations in 1061027us (113.1 ops/sec)
Did 8208 RSA 4096 verify operations in 1002717us (8185.8 ops/sec)
$ /tmp/bssl.new speed RSA
Did 760 RSA 2048 signing operations in 1003351us (757.5 ops/sec)
Did 25900 RSA 2048 verify operations in 1028931us (25171.8 ops/sec)
Did 1320 RSA 2048 (3 prime, e=3) signing operations in 1040806us (1268.2 ops/sec)
Did 63000 RSA 2048 (3 prime, e=3) verify operations in 1016042us (62005.3 ops/sec)
Did 104 RSA 4096 signing operations in 1008718us (103.1 ops/sec)
Did 6875 RSA 4096 verify operations in 1093441us (6287.5 ops/sec)
$ /tmp/bssl.old speed GCM
Did 5316000 AES-128-GCM (16 bytes) seal operations in 1000082us (5315564.1 ops/sec): 85.0 MB/s
Did 712000 AES-128-GCM (1350 bytes) seal operations in 1000252us (711820.6 ops/sec): 961.0 MB/s
Did 149000 AES-128-GCM (8192 bytes) seal operations in 1003182us (148527.4 ops/sec): 1216.7 MB/s
Did 5919750 AES-256-GCM (16 bytes) seal operations in 1000016us (5919655.3 ops/sec): 94.7 MB/s
Did 800000 AES-256-GCM (1350 bytes) seal operations in 1000951us (799239.9 ops/sec): 1079.0 MB/s
Did 152000 AES-256-GCM (8192 bytes) seal operations in 1000765us (151883.8 ops/sec): 1244.2 MB/s
$ /tmp/bssl.new speed GCM
Did 5315000 AES-128-GCM (16 bytes) seal operations in 1000125us (5314335.7 ops/sec): 85.0 MB/s
Did 755000 AES-128-GCM (1350 bytes) seal operations in 1000878us (754337.7 ops/sec): 1018.4 MB/s
Did 151000 AES-128-GCM (8192 bytes) seal operations in 1005655us (150150.9 ops/sec): 1230.0 MB/s
Did 5913500 AES-256-GCM (16 bytes) seal operations in 1000041us (5913257.6 ops/sec): 94.6 MB/s
Did 782000 AES-256-GCM (1350 bytes) seal operations in 1001484us (780841.2 ops/sec): 1054.1 MB/s
Did 121000 AES-256-GCM (8192 bytes) seal operations in 1006389us (120231.8 ops/sec): 984.9 MB/s
Change-Id: I0efb32f896c597abc7d7e55c31d038528a5c72a1
Reviewed-on: https://boringssl-review.googlesource.com/6260
Reviewed-by: Adam Langley <alangley@gmail.com>
∙ Some comments had the wrong function name at the beginning.
∙ Some ARM asm ended up with two #if defined(__arm__) lines – one from
the .pl file and one inserted by the translation script.
Change-Id: Ia8032cd09f06a899bf205feebc2d535a5078b521
Reviewed-on: https://boringssl-review.googlesource.com/6000
Reviewed-by: Adam Langley <agl@google.com>
arm_arch.h is included from ARM asm files, but lives in crypto/, not
openssl/include/. Since the asm files are often built from a different
location than their position in the source tree, relative include paths
are unlikely to work so, rather than having crypto/ be a de-facto,
second global include path, this change moves arm_arch.h to
include/openssl/.
It also removes entries from many include paths because they should be
needed as relative includes are always based on the locations of the
source file.
Change-Id: I638ff43d641ca043a4fc06c0d901b11c6ff73542
Reviewed-on: https://boringssl-review.googlesource.com/5746
Reviewed-by: Adam Langley <agl@google.com>
See upstream's 9f0b86c68bb96d49301bbd6473c8235ca05ca06b. Generated by
using upstream's script in 5a3ce86e21715a683ff0d32421ed5c6d5e84234d and
then manually throwing out the false positives. (We converted a bunch of
stuff already in 91157550061d5d794898fe47b95384a7ba5f7b9d.)
This may require some wrestling with depot_tools to land in Chromium due
to Rietveld's encoding bugs, but hopefully that will avoid future
problems; Rietveld breaks if either old or new file is Latin-1.
Change-Id: I26dcb20c7377f92a0c843ef5d74d440a82ea8ceb
Reviewed-on: https://boringssl-review.googlesource.com/5483
Reviewed-by: Adam Langley <agl@google.com>
Clang (3.6, at least) doesn't like .arch when its internal as is used.
Instead, one has to pass -march=armv8-a+crypto on the command line.
Change-Id: Ifc5b57fbebd0eb53658481b0a0c111e808c81d93
Reviewed-on: https://boringssl-review.googlesource.com/4411
Reviewed-by: Adam Langley <agl@google.com>
This facilitates "universal" builds, ones that target multiple
architectures, e.g. ARMv5 through ARMv7.
(Imported from upstream's c1669e1c205dc8e695fb0c10a655f434e758b9f7)
This is a change from a while ago which was a source of divergence between our
perlasm and upstream's. This change in upstream came with the following comment
in Configure:
Note that -march is not among compiler options in below linux-armv4
target line. Not specifying one is intentional to give you choice to:
a) rely on your compiler default by not specifying one;
b) specify your target platform explicitly for optimal performance,
e.g. -march=armv6 or -march=armv7-a;
c) build "universal" binary that targets *range* of platforms by
specifying minimum and maximum supported architecture;
As for c) option. It actually makes no sense to specify maximum to be
less than ARMv7, because it's the least requirement for run-time
switch between platform-specific code paths. And without run-time
switch performance would be equivalent to one for minimum. Secondly,
there are some natural limitations that you'd have to accept and
respect. Most notably you can *not* build "universal" binary for
big-endian platform. This is because ARMv7 processor always picks
instructions in little-endian order. Another similar limitation is
that -mthumb can't "cross" -march=armv6t2 boundary, because that's
where it became Thumb-2. Well, this limitation is a bit artificial,
because it's not really impossible, but it's deemed too tricky to
support. And of course you have to be sure that your binutils are
actually up to the task of handling maximum target platform.
Change-Id: Ie5f674d603393f0a1354a0d0973987484a4a650c
Reviewed-on: https://boringssl-review.googlesource.com/4488
Reviewed-by: Adam Langley <agl@google.com>
Pointer out and suggested by: Ard Biesheuvel.
(Imported from upstream's 5dcf70a1c57c2019bfad640fe14fd4a73212860a)
This is from a while ago, but it's one source of divergence between our copy of
these files and master's.
Change-Id: I6525a27f25eb86a92420c32996af47ecc42ee020
Reviewed-on: https://boringssl-review.googlesource.com/4487
Reviewed-by: Adam Langley <agl@google.com>
(Imported from upstream's 7eeeb49e1103533bc81c234eb19613353866e474)
Here are the performance numbers on a Nexus 9 (32-bit binary):
Before:
Did 4376000 AES-128-GCM (16 bytes) seal operations in 1000016us (4375930.0 ops/sec): 70.0 MB/s
Did 642000 AES-128-GCM (1350 bytes) seal operations in 1001090us (641301.0 ops/sec): 865.8 MB/s
Did 126000 AES-128-GCM (8192 bytes) seal operations in 1001460us (125816.3 ops/sec): 1030.7 MB/s
Did 4120000 AES-256-GCM (16 bytes) seal operations in 1000004us (4119983.5 ops/sec): 65.9 MB/s
Did 547000 AES-256-GCM (1350 bytes) seal operations in 1001165us (546363.5 ops/sec): 737.6 MB/s
Did 99000 AES-256-GCM (8192 bytes) seal operations in 1000027us (98997.3 ops/sec): 811.0 MB/s
After:
Did 4569000 AES-128-GCM (16 bytes) seal operations in 1000011us (4568949.7 ops/sec): 73.1 MB/s
Did 796000 AES-128-GCM (1350 bytes) seal operations in 1000161us (795871.9 ops/sec): 1074.4 MB/s
Did 162000 AES-128-GCM (8192 bytes) seal operations in 1003828us (161382.2 ops/sec): 1322.0 MB/s
Did 4398000 AES-256-GCM (16 bytes) seal operations in 1000001us (4397995.6 ops/sec): 70.4 MB/s
Did 634000 AES-256-GCM (1350 bytes) seal operations in 1001290us (633183.2 ops/sec): 854.8 MB/s
Did 122000 AES-256-GCM (8192 bytes) seal operations in 1005650us (121314.6 ops/sec): 993.8 MB/s
Change-Id: I2fef921069ad174f5651dfe59be262625fb3f7c9
Reviewed-on: https://boringssl-review.googlesource.com/4483
Reviewed-by: Adam Langley <agl@google.com>
This is as partial import of upstream's
9b05cbc33e7895ed033b1119e300782d9e0cf23c. It includes the perlasm changes, but
not the CPU feature detection bits as we do those differently. This is largely
so we don't diverge from upstream, but it'll help with iOS assembly in the
future.
sha512-armv8.pl is modified slightly from upstream to switch from conditioning
on the output file to conditioning on an extra argument. This makes our
previous change from upstream (removing the 'open STDOUT' line) more explicit.
BUG=338886
Change-Id: Ic8ca1388ae20e94566f475bad3464ccc73f445df
Reviewed-on: https://boringssl-review.googlesource.com/4405
Reviewed-by: Adam Langley <agl@google.com>
This affects bignum and sha. Also now that we're passing the SSE2 flag, revert
the change to ghash-x86.pl which unconditionally sets $sse2, just to minimize
upstream divergence. Chromium assumes SSE2 support, so relying on it is okay.
See https://crbug.com/349320.
Note: this change needs to be mirrored in Chromium to take.
bssl speed numbers:
SSE2:
Did 552 RSA 2048 signing operations in 3007814us (183.5 ops/sec)
Did 19003 RSA 2048 verify operations in 3070779us (6188.3 ops/sec)
Did 72 RSA 4096 signing operations in 3055885us (23.6 ops/sec)
Did 4650 RSA 4096 verify operations in 3024926us (1537.2 ops/sec)
Without SSE2:
Did 350 RSA 2048 signing operations in 3042021us (115.1 ops/sec)
Did 11760 RSA 2048 verify operations in 3003197us (3915.8 ops/sec)
Did 46 RSA 4096 signing operations in 3042692us (15.1 ops/sec)
Did 3400 RSA 4096 verify operations in 3083035us (1102.8 ops/sec)
SSE2:
Did 16407000 SHA-1 (16 bytes) operations in 3000141us (5468743.0 ops/sec): 87.5 MB/s
Did 4367000 SHA-1 (256 bytes) operations in 3000436us (1455455.1 ops/sec): 372.6 MB/s
Did 185000 SHA-1 (8192 bytes) operations in 3002666us (61611.9 ops/sec): 504.7 MB/s
Did 9444000 SHA-256 (16 bytes) operations in 3000052us (3147945.4 ops/sec): 50.4 MB/s
Did 2283000 SHA-256 (256 bytes) operations in 3000457us (760884.1 ops/sec): 194.8 MB/s
Did 89000 SHA-256 (8192 bytes) operations in 3016024us (29509.0 ops/sec): 241.7 MB/s
Did 5550000 SHA-512 (16 bytes) operations in 3000350us (1849784.2 ops/sec): 29.6 MB/s
Did 1820000 SHA-512 (256 bytes) operations in 3001039us (606456.6 ops/sec): 155.3 MB/s
Did 93000 SHA-512 (8192 bytes) operations in 3007874us (30918.8 ops/sec): 253.3 MB/s
Without SSE2:
Did 10573000 SHA-1 (16 bytes) operations in 3000261us (3524026.7 ops/sec): 56.4 MB/s
Did 2937000 SHA-1 (256 bytes) operations in 3000621us (978797.4 ops/sec): 250.6 MB/s
Did 123000 SHA-1 (8192 bytes) operations in 3033202us (40551.2 ops/sec): 332.2 MB/s
Did 5846000 SHA-256 (16 bytes) operations in 3000294us (1948475.7 ops/sec): 31.2 MB/s
Did 1377000 SHA-256 (256 bytes) operations in 3000335us (458948.8 ops/sec): 117.5 MB/s
Did 54000 SHA-256 (8192 bytes) operations in 3027962us (17833.8 ops/sec): 146.1 MB/s
Did 2075000 SHA-512 (16 bytes) operations in 3000967us (691443.8 ops/sec): 11.1 MB/s
Did 638000 SHA-512 (256 bytes) operations in 3000576us (212625.8 ops/sec): 54.4 MB/s
Did 30000 SHA-512 (8192 bytes) operations in 3042797us (9859.3 ops/sec): 80.8 MB/s
BUG=430237
Change-Id: I47d1c1ffcd71afe4f4a192272f8cb92af9505ee1
Reviewed-on: https://boringssl-review.googlesource.com/4130
Reviewed-by: Adam Langley <agl@google.com>
This reverts the non-ARM portions of 97999919bb.
x86_64 perlasm already makes .globl imply .hidden. (Confusingly, ARM does not.)
Since we don't need it, revert those to minimize divergence with upstream.
Change-Id: I2d205cfb1183e65d4f18a62bde187d206b1a96de
Reviewed-on: https://boringssl-review.googlesource.com/3610
Reviewed-by: Adam Langley <agl@google.com>
We are leaking asm symbols in Android builds because the asm code isn't
affected by -fvisibility=hidden. This change hides all asm symbols.
This assumes that no asm symbols are public API and that should be true.
Some points to note:
In crypto/rc4/asm/rc4-md5-x86_64.pl there are |RC4_set_key| and
|RC4_options| functions which aren't getting marked as hidden. That's
because those functions aren't actually ever generated. (I'm just trying
to minimise drift with upstream here.)
In crypto/rc4/asm/rc4-x86_64.pl there's |RC4_options| which is "public"
API, except that we've never had it in the header files. So I've just
deleted it. Since we have an internal caller, we'll probably have to put
it back in the future, but it can just be done in rc4.c to save
problems.
BUG=448386
Change-Id: I3846617a0e3d73ec9e5ec3638a53364adbbc6260
Reviewed-on: https://boringssl-review.googlesource.com/3520
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
(Imported from upstream's b3d7294976c58e0e05d0ee44a0e7c9c3b8515e05.)
May as well avoid diverging.
Change-Id: I3edec4fe15b492dd3bfb3146a8944acc6575f861
Reviewed-on: https://boringssl-review.googlesource.com/3020
Reviewed-by: Adam Langley <agl@google.com>
This is an initial cut at aarch64 support. I have only qemu to test it
however—hopefully hardware will be coming soon.
This also affects 32-bit ARM in that aarch64 chips can run 32-bit code
and we would like to be able to take advantage of the crypto operations
even in 32-bit mode. AES and GHASH should Just Work in this case: the
-armx.pl files can be built for either 32- or 64-bit mode based on the
flavour argument given to the Perl script.
SHA-1 and SHA-256 don't work like this however because they've never
support for multiple implementations, thus BoringSSL built for 32-bit
won't use the SHA instructions on an aarch64 chip.
No dedicated ChaCha20 or Poly1305 support yet.
Change-Id: Ib275bc4894a365c8ec7c42f4e91af6dba3bd686c
Reviewed-on: https://boringssl-review.googlesource.com/2801
Reviewed-by: Adam Langley <agl@google.com>
Clang's integrated as accepts unified ARM syntax only. This change
updates the GHASH ARM asm to use that syntax and thus be compatible.
Patch from Nico Weber.
https://code.google.com/p/chromium/issues/detail?id=124610
Change-Id: Ie6f3de4e37286f0af39196fad33905f7dee7402e
This change marks public symbols as dynamically exported. This means
that it becomes viable to build a shared library of libcrypto and libssl
with -fvisibility=hidden.
On Windows, one not only needs to mark functions for export in a
component, but also for import when using them from a different
component. Because of this we have to build with
|BORINGSSL_IMPLEMENTATION| defined when building the code. Other
components, when including our headers, won't have that defined and then
the |OPENSSL_EXPORT| tag becomes an import tag instead. See the #defines
in base.h
In the asm code, symbols are now hidden by default and those that need
to be exported are wrapped by a C function.
In order to support Chromium, a couple of libssl functions were moved to
ssl.h from ssl_locl.h: ssl_get_new_session and ssl_update_cache.
Change-Id: Ib4b76e2f1983ee066e7806c24721e8626d08a261
Reviewed-on: https://boringssl-review.googlesource.com/1350
Reviewed-by: Adam Langley <agl@google.com>
(Imported from upstream's 912f08dd5ed4f68fb275f3b2db828349fcffba14,
52f856526c46ee80ef4c8c37844f084423a3eff7 and
377551b9c4e12aa7846f4d80cf3604f2e396c964)
Change-Id: Ic2bf93371f6d246818729810e7a45b3f0021845a
Câmara, D.; Gouvêa, C. P. L.; López, J. & Dahab, R.: Fast Software
Polynomial Multiplication on ARM Processors using the NEON Engine.
http://conradoplg.cryptoland.net/files/2010/12/mocrysen13.pdf
(Imported from upstream's 0fb3d5b4fdc76b8d4a4700d03480cda135c6c117)
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).
(This change contains substantial changes from the original and
effectively starts a new history.)