Commit Graph

72 Commits

Author SHA1 Message Date
David Benjamin
17cf2cb1d2 Work around language and compiler bug in memcpy, etc.
Most C standard library functions are undefined if passed NULL, even
when the corresponding length is zero. This gives them (and, in turn,
all functions which call them) surprising behavior on empty arrays.
Some compilers will miscompile code due to this rule. See also
https://www.imperialviolet.org/2016/06/26/nonnull.html

Add OPENSSL_memcpy, etc., wrappers which avoid this problem.

BUG=23

Change-Id: I95f42b23e92945af0e681264fffaf578e7f8465e
Reviewed-on: https://boringssl-review.googlesource.com/12928
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2016-12-21 20:34:47 +00:00
Adam Langley
509889d3d0 Sync with upstream's version of sha256-armv4.pl.
This change imports sha256-armv4.pl from upstream at rev 8d1ebff4. This
includes changes to remove the use of adrl, which is not supported by
Clang.

Change-Id: I429e7051d63b59acad21601e40883fc3bd8dd2f5
Reviewed-on: https://boringssl-review.googlesource.com/12480
Commit-Queue: Adam Langley <alangley@gmail.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2016-11-30 17:37:24 +00:00
Doug Kwan
7da8ea72a6 Add forward declaration to avoid a compiler warning
This prevents a compiler warning from breaking ppc64le build.

Change-Id: I6752109bd02c6d078e656f89327093f8fb13a125
Reviewed-on: https://boringssl-review.googlesource.com/12363
Reviewed-by: Adam Langley <agl@google.com>
2016-11-18 00:25:50 +00:00
Doug Kwan
5f04b6bc3a Add ppc64le vector implementation of SHA-1.
This change contains a C implementation of SHA-1 for POWER using
AltiVec. It is almost as fast as the scalar-only assembly implementation
for POWER/POWERPC family in OpenSSL but it is easier to maintain and it
allows error checking with tools like ASAN.

This is tested only for ppc64le. It may nor may not work for other
platforms in the POWER/POWERPC familiy.

Before:

SHA-1 @ 16 bytes: ~30 MB/s
SHA-1 @ 8K: ~140 MB/s

After:

SHA-1 @ 16 bytes: ~70 MB/s
SHA-1 @ 8K: ~480 MB/s

Change-Id: I790352e86d9c0cc4e1e57d11c5a0aa5b0780ca6b
Reviewed-on: https://boringssl-review.googlesource.com/12203
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2016-11-17 18:38:14 +00:00
David Benjamin
b1133e9565 Fix up macros.
Macros need a healthy dose of parentheses to avoid expression-level
misparses. Most of this comes from the clang-tidy CL here:
https://android-review.googlesource.com/c/235696/

Also switch most of the macros to use do { ... } while (0) to avoid all
the excessive comma operators and statement-level misparses.

Change-Id: I4c2ee51e347d2aa8c74a2d82de63838b03bbb0f9
Reviewed-on: https://boringssl-review.googlesource.com/11660
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2016-10-18 18:28:23 +00:00
David Benjamin
722ba2db77 sha/asm/sha1-x86_64.pl: fix crash in SHAEXT code on Windows.
RT#4530

(Imported from upstream's 7123aa81e9fb19afb11fdf3850662c5f7ff1f19c.)

We've yet to enable this code, but this confirms that we do indeed need
to get our future all-variants stuff working on Windows as well as
Linux and find an AVX2-capable CI setup on each.

The crash here is caused by some win64-only code using %rax as a frame
pointer (perlasm injects a mov rax,rsp in the prologue of every win64
function).

Change-Id: Ifbe59ceb6ae29266d9cf8a461920344a32b6e555
Reviewed-on: https://boringssl-review.googlesource.com/10366
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2016-08-16 19:46:06 +00:00
Adam Langley
5a8d48ee8c Fix the comments for |SHA[256|384|512]_Transform|.
Change-Id: I6d552d26b3d72f6fffdc4d4d9fc3b5d82fb4e8bb
Reviewed-on: https://boringssl-review.googlesource.com/9010
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2016-07-28 21:49:48 +00:00
David Benjamin
fdd8e9c8c7 Switch perlasm calling convention.
Depending on architecture, perlasm differed on which one or both of:

  perl foo.pl flavor output.S
  perl foo.pl flavor > output.S

Upstream has now unified on the first form after making a number of
changes to their files (the second does not even work for their x86
files anymore). Sync those portions of our perlasm scripts with upstream
and update CMakeLists.txt and generate_build_files.py per the new
convention.

This imports various commits like this one:
184bc45f683c76531d7e065b6553ca9086564576 (this was done by taking a
diff, so I don't have the full list)

Confirmed that generate_build_files.py sees no change.

BUG=14

Change-Id: Id2fb5b8bc2a7369d077221b5df9a6947d41f50d2
Reviewed-on: https://boringssl-review.googlesource.com/8518
Reviewed-by: Adam Langley <agl@google.com>
2016-06-27 21:59:26 +00:00
David Benjamin
862c0aa880 Revert md_len removal from SHA256_CTX and SHA512_CTX.
This reverts commits:
- 9158637142
- a90aa64302
- c0d8b83b44

It turns out code outside of BoringSSL also mismatches Init and Update/Final
functions. Since this is largely cosmetic, it's probably not worth the cost to
do this.

Change-Id: I14e7b299172939f69ced2114be45ccba1dbbb704
Reviewed-on: https://boringssl-review.googlesource.com/7793
Reviewed-by: Adam Langley <agl@google.com>
2016-04-27 19:01:23 +00:00
David Benjamin
88e27bcbe0 Don't mismatch Init and Update/Final hash functions.
Fixes the ASan bot.

Change-Id: I29b9b98680b634c5e486a734afa38f9d4e458518
Reviewed-on: https://boringssl-review.googlesource.com/7792
Reviewed-by: Adam Langley <agl@google.com>
2016-04-27 18:53:00 +00:00
David Benjamin
9158637142 Make SHA256_Final actually only return one.
As with SHA512_Final, use the different APIs rather than store md_len.

Change-Id: Ie1150de6fefa96f283d47aa03de0f18de38c93eb
Reviewed-on: https://boringssl-review.googlesource.com/7722
Reviewed-by: Adam Langley <agl@google.com>
2016-04-27 18:46:17 +00:00
David Benjamin
a90aa64302 Pull HASH_MAKE_STRING out of md32_common.h.
This is in preparation for taking md_len out of SHA256_CTX by allowing us to do
something similar to SHA512_CTX. md32_common.h now emits a static "finish"
function which Final composes with the extraction step.

Change-Id: I314fb31e2482af642fd280500cc0e4716aef1ac6
Reviewed-on: https://boringssl-review.googlesource.com/7721
Reviewed-by: Adam Langley <agl@google.com>
2016-04-27 18:45:12 +00:00
David Benjamin
c0d8b83b44 Make SHA512_Final actually only return one.
Rather than store md_len, factor out the common parts of SHA384_Final and
SHA512_Final and then extract the right state. Also add a missing
SHA384_Transform and be consistent about "1" vs "one" in comments.

This also removes the NULL output special-case which no other hash function
had.

Change-Id: If60008bae7d7d5b123046a46d8fd64139156a7c5
Reviewed-on: https://boringssl-review.googlesource.com/7720
Reviewed-by: Adam Langley <agl@google.com>
2016-04-27 18:42:37 +00:00
David Benjamin
0182ecd346 Consistently use named constants in ARM assembly files.
Most of the OPENSSL_armcap_P accesses in assembly use named constants from
arm_arch.h, but some don't. Consistently use the constants. The dispatch really
should be in C, but in the meantime, make it easier to tell what's going on.

I'll send this patch upstream so we won't be carrying a diff here.

Change-Id: I63c68d2351ea5ce11005813314988e32b6459526
Reviewed-on: https://boringssl-review.googlesource.com/7203
Reviewed-by: Adam Langley <agl@google.com>
2016-02-23 17:18:18 +00:00
David Benjamin
017231a544 Remove asm __asm__ define.
It's only used in one file. No sense in polluting the namespace here.

Change-Id: Iaf3870a4be2d2cad950f4d080e25fe7f0d3929c7
Reviewed-on: https://boringssl-review.googlesource.com/6660
Reviewed-by: Adam Langley <agl@google.com>
2015-12-16 20:03:17 +00:00
David Benjamin
793c21e266 Make HOST_l2c return void.
Nothing ever uses the return value. It'd be better off discarding it rather
than make callers stick (void) everywhere.

Change-Id: Ia28c970a1e5a27db441e4511249589d74408849b
Reviewed-on: https://boringssl-review.googlesource.com/6653
Reviewed-by: Adam Langley <agl@google.com>
2015-12-16 20:02:37 +00:00
David Benjamin
5a19d7dfa8 Use the straight-forward ROTATE macro.
I would hope any sensible compiler would recognize the rotation. (If
not, we should at least pull this into crypto/internal.h.) Confirmed
that clang at least produces the exact same instructions for
sha256_block_data_order for release + NO_ASM. This is also mostly moot
as SHA-1 and SHA-256 both have assembly versions on x86 that sidestep
most of this.

For the digests, take it out of md32_common.h since it doesn't use the
macro. md32_common.h isn't sure whether it's a multiply-included header
or not. It should be, but it has an #include guard (doesn't quite do
what you'd want) and will get HOST_c2l, etc., confused if one tries to
include it twice.

Change-Id: I1632801de6473ffd2c6557f3412521ec5d6b305c
Reviewed-on: https://boringssl-review.googlesource.com/6650
Reviewed-by: Adam Langley <agl@google.com>
2015-12-16 19:57:31 +00:00
David Benjamin
1246670caa Use UINT64_C in sha512.c table.
From the naclports patch:
https://chromium.googlesource.com/external/naclports/+/master/ports/boringssl/nacl.patch

Change-Id: I37ad45fbde0b1b1437d3abd8ed7422bd72cfd959
Reviewed-on: https://boringssl-review.googlesource.com/6623
Reviewed-by: Adam Langley <agl@google.com>
2015-12-15 20:30:10 +00:00
David Benjamin
2077cf9152 Use UINT64_C instead of OPENSSL_U64.
stdint.h already has macros for this. The spec says that, in C++,
__STDC_CONSTANT_MACROS is needed, so define it for bytestring_test.cc.
Chromium seems to use these macros without trouble, so I'm assuming we
can rely on them.

Change-Id: I56d178689b44d22c6379911bbb93d3b01dd832a3
Reviewed-on: https://boringssl-review.googlesource.com/6510
Reviewed-by: Adam Langley <agl@google.com>
2015-11-16 23:18:00 +00:00
David Benjamin
ce7ae6fa27 Enable AVX code for SHA-*.
SHA-1, SHA-256, and SHA-512 get a 12-26%, 17-23%, and 33-37% improvement,
respectively on x86-64. SHA-1 and SHA-256 get a 8-20% and 14-17% improvement on
x86. (x86 does not have AVX code for SHA-512.) This costs us 12k of binary size
on x86-64 and 8k of binary size on x86.

$ bssl speed SHA- (x86-64, before)
Did 4811000 SHA-1 (16 bytes) operations in 1000013us (4810937.5 ops/sec): 77.0 MB/s
Did 1414000 SHA-1 (256 bytes) operations in 1000253us (1413642.3 ops/sec): 361.9 MB/s
Did 56000 SHA-1 (8192 bytes) operations in 1002640us (55852.5 ops/sec): 457.5 MB/s
Did 2536000 SHA-256 (16 bytes) operations in 1000140us (2535645.0 ops/sec): 40.6 MB/s
Did 603000 SHA-256 (256 bytes) operations in 1001613us (602028.9 ops/sec): 154.1 MB/s
Did 25000 SHA-256 (8192 bytes) operations in 1010132us (24749.2 ops/sec): 202.7 MB/s
Did 1767000 SHA-512 (16 bytes) operations in 1000477us (1766157.5 ops/sec): 28.3 MB/s
Did 638000 SHA-512 (256 bytes) operations in 1000933us (637405.3 ops/sec): 163.2 MB/s
Did 32000 SHA-512 (8192 bytes) operations in 1025646us (31199.8 ops/sec): 255.6 MB/s

$ bssl speed SHA- (x86-64, after)
Did 5438000 SHA-1 (16 bytes) operations in 1000060us (5437673.7 ops/sec): 87.0 MB/s
Did 1590000 SHA-1 (256 bytes) operations in 1000181us (1589712.3 ops/sec): 407.0 MB/s
Did 71000 SHA-1 (8192 bytes) operations in 1007958us (70439.4 ops/sec): 577.0 MB/s
Did 2955000 SHA-256 (16 bytes) operations in 1000251us (2954258.5 ops/sec): 47.3 MB/s
Did 740000 SHA-256 (256 bytes) operations in 1000628us (739535.6 ops/sec): 189.3 MB/s
Did 31000 SHA-256 (8192 bytes) operations in 1019619us (30403.5 ops/sec): 249.1 MB/s
Did 2348000 SHA-512 (16 bytes) operations in 1000285us (2347331.0 ops/sec): 37.6 MB/s
Did 878000 SHA-512 (256 bytes) operations in 1001064us (877066.8 ops/sec): 224.5 MB/s
Did 43000 SHA-512 (8192 bytes) operations in 1002485us (42893.4 ops/sec): 351.4 MB/s

$ bssl speed SHA- (x86, before, SHA-512 redacted because irrelevant)
Did 4319000 SHA-1 (16 bytes) operations in 1000066us (4318715.0 ops/sec): 69.1 MB/s
Did 1306000 SHA-1 (256 bytes) operations in 1000437us (1305429.5 ops/sec): 334.2 MB/s
Did 58000 SHA-1 (8192 bytes) operations in 1014807us (57153.7 ops/sec): 468.2 MB/s
Did 2291000 SHA-256 (16 bytes) operations in 1000343us (2290214.5 ops/sec): 36.6 MB/s
Did 594000 SHA-256 (256 bytes) operations in 1000684us (593594.0 ops/sec): 152.0 MB/s
Did 25000 SHA-256 (8192 bytes) operations in 1030688us (24255.6 ops/sec): 198.7 MB/s

$ bssl speed SHA- (x86, after, SHA-512 redacted because irrelevant)
Did 4673000 SHA-1 (16 bytes) operations in 1000063us (4672705.6 ops/sec): 74.8 MB/s
Did 1484000 SHA-1 (256 bytes) operations in 1000453us (1483328.1 ops/sec): 379.7 MB/s
Did 69000 SHA-1 (8192 bytes) operations in 1008305us (68431.7 ops/sec): 560.6 MB/s
Did 2684000 SHA-256 (16 bytes) operations in 1000196us (2683474.0 ops/sec): 42.9 MB/s
Did 679000 SHA-256 (256 bytes) operations in 1000525us (678643.7 ops/sec): 173.7 MB/s
Did 29000 SHA-256 (8192 bytes) operations in 1033251us (28066.8 ops/sec): 229.9 MB/s

Change-Id: I952a3b4fc4c52ebb50690da3b8c97770e8342e98
Reviewed-on: https://boringssl-review.googlesource.com/6470
Reviewed-by: Adam Langley <agl@google.com>
2015-11-12 20:03:32 +00:00
Brian Smith
2e24b9bf73 Allow SHA-512 unaligned data access in |OPENSSL_NO_ASM| mode.
The previous logic only defined
|SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA| when the assembly language
optimizations were enabled, but
|SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA| is also useful when the C
implementations are used.

If support for ARM processors that don't support unaligned access is
important, then it might be better to condition the enabling of
|SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA| on ARM based on more specific
flags.

Change-Id: Ie8c37c73aba308c3ccf79371ce5831512e419989
Reviewed-on: https://boringssl-review.googlesource.com/6402
Reviewed-by: Adam Langley <agl@google.com>
2015-11-06 20:06:54 +00:00
Brian Smith
9d94d5e4ae Remove untested, unnecessary big-endian SHA-1/SHA-256 optimizations.
Change-Id: I3939923d297b901719cb3fb4cff20e770f780c7a
Reviewed-on: https://boringssl-review.googlesource.com/6441
Reviewed-by: Adam Langley <agl@google.com>
2015-11-06 19:36:24 +00:00
Brian Smith
ac9404c3a8 Improve crypto/digest/md32_common.h mechanism.
The documentation in md32_common.h is now (more) correct with respect
to the most important details of the layout of |HASH_CTX|. The
documentation explaining why sha512.c doesn't use md32_common.h is now
more accurate as well.

Before, the C implementations of HASH_BLOCK_DATA_ORDER took a pointer
to the |HASH_CTX| and the assembly language implementations took a
pointer to the hash state |h| member of |HASH_CTX|. (This worked
because |h| is always the first member of |HASH_CTX|.) Now, the C
implementations take a pointer directly to |h| too.

The definitions of |MD4_CTX|, |MD5_CTX|, and |SHA1_CTX| were changed to
be consistent with |SHA256_CTX| and |SHA512_CTX| in storing the hash
state in an array. This will break source compatibility with any
external code that accesses the hash state directly, but will not
affect binary compatibility.

The second parameter of |HASH_BLOCK_DATA_ORDER| is now of type
|const uint8_t *|; previously it was |void *| and all implementations
had a |uint8_t *data| variable to access it as an array of bytes.

This change paves the way for future refactorings such as automatically
generating the |*_Init| functions and/or sharing one I-U-F
implementation across all digest algorithms.

Change-Id: I6e9dd09ff057c67941021d324a4fa1d39f58b0db
Reviewed-on: https://boringssl-review.googlesource.com/6405
Reviewed-by: Adam Langley <agl@google.com>
2015-11-04 00:01:09 +00:00
Adam Langley
f1c1cf8794 Revert "Improve crypto/digest/md32_common.h mechanism."
This reverts commit 00461cf201.

Sadly it broke wpa_supplicant.
2015-11-02 18:14:34 -08:00
Brian Smith
00461cf201 Improve crypto/digest/md32_common.h mechanism.
The documentation in md32_common.h is now (more) correct with respect
to the most important details of the layout of |HASH_CTX|. The
documentation explaining why sha512.c doesn't use md32_common.h is now
more accurate as well.

Before, the C implementations of HASH_BLOCK_DATA_ORDER took a pointer
to the |HASH_CTX| and the assembly language implementations tool a
pointer to the hash state |h| member of |HASH_CTX|. (This worked
because |h| is always the first member of |HASH_CTX|.) Now, the C
implementations take a pointer directly to |h| too.

The definitions of |MD4_CTX|, |MD5_CTX|, and |SHA1_CTX| were changed to
be consistent with |SHA256_CTX| and |SHA512_CTX| in storing the hash
state in an array. This will break source compatibility with any
external code that accesses the hash state directly, but will not
affect binary compatibility.

The second parameter of |HASH_BLOCK_DATA_ORDER| is now of type
|const uint8_t *|; previously it was |void *| and all implementations
had a |uint8_t *data| variable to access it as an array of bytes.

This change paves the way for future refactorings such as automatically
generating the |*_Init| functions and/or sharing one I-U-F
implementation across all digest algorithms.

Change-Id: I30513bb40b5f1d2c8932551d54073c35484b3f8b
Reviewed-on: https://boringssl-review.googlesource.com/6401
Reviewed-by: Adam Langley <agl@google.com>
2015-11-03 02:04:38 +00:00
David Benjamin
278d34234f Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change or if developers do or don't have CC variables set.

Previously, all compiler-version-gated features were turned on in
https://boringssl-review.googlesource.com/6260, but this broke the build. I
also wasn't thorough enough in gathering performance numbers. So, flip them all
to off instead. I'll enable them one-by-one as they're tested.

This should result in no change to generated assembly.

Change-Id: Ib4259b3f97adc4939cb0557c5580e8def120d5bc
Reviewed-on: https://boringssl-review.googlesource.com/6383
Reviewed-by: Adam Langley <agl@google.com>
2015-10-28 19:33:04 +00:00
David Benjamin
75885e29c4 Revert "Get rid of all compiler version checks in perlasm files."
This reverts commit b9c26014de.

The win64 bot seems unhappy. Will sniff at it tomorrow. In
the meantime, get the tree green again.

Change-Id: I058ddb3ec549beee7eabb2f3f72feb0a4a5143b2
Reviewed-on: https://boringssl-review.googlesource.com/6353
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 23:12:39 +00:00
David Benjamin
b9c26014de Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change.

Enable all compiler-version-gated features as they should all be runtime-gated
anyway. This should align with what upstream's files would have produced on
modern toolschains. We should assume our assemblers can take whatever we'd like
to throw at them. (If it turns out some can't, we'd rather find out and
probably switch the problematic instructions to explicit byte sequences.)

This actually results in a fairly significant change to the assembly we
generate. I'm guessing upstream's buildsystem sets the CC environment variable,
while ours doesn't and so the version checks were all coming out conservative.

diffstat of generated files:

 linux-x86/crypto/sha/sha1-586.S              | 1176 ++++++++++++
 linux-x86/crypto/sha/sha256-586.S            | 2248 ++++++++++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-avx2.S           | 1644 +++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-x86_64.S         |  638 ++++++
 linux-x86_64/crypto/bn/x86_64-mont.S         |  332 +++
 linux-x86_64/crypto/bn/x86_64-mont5.S        | 1130 ++++++++++++
 linux-x86_64/crypto/modes/aesni-gcm-x86_64.S |  754 ++++++++
 linux-x86_64/crypto/modes/ghash-x86_64.S     |  475 +++++
 linux-x86_64/crypto/sha/sha1-x86_64.S        | 1121 ++++++++++++
 linux-x86_64/crypto/sha/sha256-x86_64.S      | 1062 +++++++++++
 linux-x86_64/crypto/sha/sha512-x86_64.S      | 2241 ++++++++++++++++++++++++
 mac-x86/crypto/sha/sha1-586.S                | 1174 ++++++++++++
 mac-x86/crypto/sha/sha256-586.S              | 2248 ++++++++++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-avx2.S             | 1637 +++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-x86_64.S           |  638 ++++++
 mac-x86_64/crypto/bn/x86_64-mont.S           |  331 +++
 mac-x86_64/crypto/bn/x86_64-mont5.S          | 1130 ++++++++++++
 mac-x86_64/crypto/modes/aesni-gcm-x86_64.S   |  750 ++++++++
 mac-x86_64/crypto/modes/ghash-x86_64.S       |  475 +++++
 mac-x86_64/crypto/sha/sha1-x86_64.S          | 1121 ++++++++++++
 mac-x86_64/crypto/sha/sha256-x86_64.S        | 1062 +++++++++++
 mac-x86_64/crypto/sha/sha512-x86_64.S        | 2241 ++++++++++++++++++++++++
 win-x86/crypto/sha/sha1-586.asm              | 1173 ++++++++++++
 win-x86/crypto/sha/sha256-586.asm            | 2248 ++++++++++++++++++++++++
 win-x86_64/crypto/bn/rsaz-avx2.asm           | 1858 +++++++++++++++++++-
 win-x86_64/crypto/bn/rsaz-x86_64.asm         |  638 ++++++
 win-x86_64/crypto/bn/x86_64-mont.asm         |  352 +++
 win-x86_64/crypto/bn/x86_64-mont5.asm        | 1184 ++++++++++++
 win-x86_64/crypto/modes/aesni-gcm-x86_64.asm |  933 ++++++++++
 win-x86_64/crypto/modes/ghash-x86_64.asm     |  515 +++++
 win-x86_64/crypto/sha/sha1-x86_64.asm        | 1152 ++++++++++++
 win-x86_64/crypto/sha/sha256-x86_64.asm      | 1088 +++++++++++
 win-x86_64/crypto/sha/sha512-x86_64.asm      | 2499 ++++++

SHA* gets faster. RSA and AES-GCM seem to be more of a wash and even slower
sometimes!  This is a little concerning. Though when I repeated the latter two,
it's definitely noisy (RSA in particular), so we may wish to repeat in a more
controlled environment. We could also flip some of these toggles to something
other than the highest setting if it seems some of the variants aren't
desirable. We just shouldn't have them enabled or disabled on accident. This
aligns us closer to upstream though.

$ /tmp/bssl.old speed SHA-
Did 5028000 SHA-1 (16 bytes) operations in 1000048us (5027758.7 ops/sec): 80.4 MB/s
Did 1708000 SHA-1 (256 bytes) operations in 1000257us (1707561.2 ops/sec): 437.1 MB/s
Did 73000 SHA-1 (8192 bytes) operations in 1008406us (72391.5 ops/sec): 593.0 MB/s
Did 3041000 SHA-256 (16 bytes) operations in 1000311us (3040054.5 ops/sec): 48.6 MB/s
Did 779000 SHA-256 (256 bytes) operations in 1000820us (778361.7 ops/sec): 199.3 MB/s
Did 26000 SHA-256 (8192 bytes) operations in 1009875us (25745.8 ops/sec): 210.9 MB/s
Did 1837000 SHA-512 (16 bytes) operations in 1000251us (1836539.0 ops/sec): 29.4 MB/s
Did 803000 SHA-512 (256 bytes) operations in 1000969us (802222.6 ops/sec): 205.4 MB/s
Did 41000 SHA-512 (8192 bytes) operations in 1016768us (40323.8 ops/sec): 330.3 MB/s
$ /tmp/bssl.new speed SHA-
Did 5354000 SHA-1 (16 bytes) operations in 1000104us (5353443.2 ops/sec): 85.7 MB/s
Did 1779000 SHA-1 (256 bytes) operations in 1000121us (1778784.8 ops/sec): 455.4 MB/s
Did 87000 SHA-1 (8192 bytes) operations in 1012641us (85914.0 ops/sec): 703.8 MB/s
Did 3517000 SHA-256 (16 bytes) operations in 1000114us (3516599.1 ops/sec): 56.3 MB/s
Did 935000 SHA-256 (256 bytes) operations in 1000096us (934910.2 ops/sec): 239.3 MB/s
Did 38000 SHA-256 (8192 bytes) operations in 1004476us (37830.7 ops/sec): 309.9 MB/s
Did 2930000 SHA-512 (16 bytes) operations in 1000259us (2929241.3 ops/sec): 46.9 MB/s
Did 1008000 SHA-512 (256 bytes) operations in 1000509us (1007487.2 ops/sec): 257.9 MB/s
Did 45000 SHA-512 (8192 bytes) operations in 1000593us (44973.3 ops/sec): 368.4 MB/s

$ /tmp/bssl.old speed RSA
Did 820 RSA 2048 signing operations in 1017008us (806.3 ops/sec)
Did 27000 RSA 2048 verify operations in 1015400us (26590.5 ops/sec)
Did 1292 RSA 2048 (3 prime, e=3) signing operations in 1008185us (1281.5 ops/sec)
Did 65000 RSA 2048 (3 prime, e=3) verify operations in 1011388us (64268.1 ops/sec)
Did 120 RSA 4096 signing operations in 1061027us (113.1 ops/sec)
Did 8208 RSA 4096 verify operations in 1002717us (8185.8 ops/sec)
$ /tmp/bssl.new speed RSA
Did 760 RSA 2048 signing operations in 1003351us (757.5 ops/sec)
Did 25900 RSA 2048 verify operations in 1028931us (25171.8 ops/sec)
Did 1320 RSA 2048 (3 prime, e=3) signing operations in 1040806us (1268.2 ops/sec)
Did 63000 RSA 2048 (3 prime, e=3) verify operations in 1016042us (62005.3 ops/sec)
Did 104 RSA 4096 signing operations in 1008718us (103.1 ops/sec)
Did 6875 RSA 4096 verify operations in 1093441us (6287.5 ops/sec)

$ /tmp/bssl.old speed GCM
Did 5316000 AES-128-GCM (16 bytes) seal operations in 1000082us (5315564.1 ops/sec): 85.0 MB/s
Did 712000 AES-128-GCM (1350 bytes) seal operations in 1000252us (711820.6 ops/sec): 961.0 MB/s
Did 149000 AES-128-GCM (8192 bytes) seal operations in 1003182us (148527.4 ops/sec): 1216.7 MB/s
Did 5919750 AES-256-GCM (16 bytes) seal operations in 1000016us (5919655.3 ops/sec): 94.7 MB/s
Did 800000 AES-256-GCM (1350 bytes) seal operations in 1000951us (799239.9 ops/sec): 1079.0 MB/s
Did 152000 AES-256-GCM (8192 bytes) seal operations in 1000765us (151883.8 ops/sec): 1244.2 MB/s
$ /tmp/bssl.new speed GCM
Did 5315000 AES-128-GCM (16 bytes) seal operations in 1000125us (5314335.7 ops/sec): 85.0 MB/s
Did 755000 AES-128-GCM (1350 bytes) seal operations in 1000878us (754337.7 ops/sec): 1018.4 MB/s
Did 151000 AES-128-GCM (8192 bytes) seal operations in 1005655us (150150.9 ops/sec): 1230.0 MB/s
Did 5913500 AES-256-GCM (16 bytes) seal operations in 1000041us (5913257.6 ops/sec): 94.6 MB/s
Did 782000 AES-256-GCM (1350 bytes) seal operations in 1001484us (780841.2 ops/sec): 1054.1 MB/s
Did 121000 AES-256-GCM (8192 bytes) seal operations in 1006389us (120231.8 ops/sec): 984.9 MB/s

Change-Id: I0efb32f896c597abc7d7e55c31d038528a5c72a1
Reviewed-on: https://boringssl-review.googlesource.com/6260
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 20:31:30 +00:00
David Benjamin
e189c86bc7 Consistently disable the Intel SHA Extensions code.
We haven't tested it yet, but it was only disabled on 64-bit. Disable it on
32-bit as well until we're ready to turn it on.

Change-Id: I50e74aef2c5c3ba539a868c2bb6fb90fdf28a5f0
Reviewed-on: https://boringssl-review.googlesource.com/6271
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 20:27:52 +00:00
David Benjamin
178a88c26f Synchronize sha512-x86_64.pl with upstream.
We missed 7eb9680ae1bf5dd9aeb61c401f2c3bd900ac9aeb. This is a no-op as we don't
set shaext right now anyway. This also includes some cosmetic changes to
minimize the diff with upstream. ("cosmetic". Upstream's perl doesn't like
spaces.)

Change-Id: I17fa663ddaa38c27854d4f59fb83960528d9ba78
Reviewed-on: https://boringssl-review.googlesource.com/6250
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 20:27:28 +00:00
Adam Langley
0dd93002dd Revert section changes for ASM.
This change reverts the following commits:
  72d9cba7cb
  5b61b9ebc5
  3f85e04f40
  2ab24a2d40

Change-Id: I669b83f2269cf96aa71a649a346147b9407a811e
Reviewed-on: https://boringssl-review.googlesource.com/6056
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-09-30 22:09:52 +00:00
Adam Langley
72d9cba7cb Move .align directives next to their labels for ARM.
2ab24a2d40 added sections to ARM assembly
files. However, in cases where .align directives were not next to the
labels that they were intended to apply to, the section directives would
cause them to be ignored.

Change-Id: I32117f6747ff8545b80c70dd3b8effdc6e6f67e0
Reviewed-on: https://boringssl-review.googlesource.com/6050
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-09-30 18:35:29 +00:00
Adam Langley
2ab24a2d40 Put arm/aarch64 assembly functions in their own section.
This change causes each global arm or aarch64 asm function to be put
into its own section by default. This matches the behaviour of the
-ffunction-sections option to GCC and allows the --gc-sections option to
the linker to discard unused asm functions on a function-by-function
basis.

Sometimes several asm functions will share the same data an, in that
situation, the data is put into the section of one of the functions and
the section of the other function is merged with the added
“.global_with_section” directive.

Change-Id: I12c9b844d48d104d28beb816764358551eac4456
Reviewed-on: https://boringssl-review.googlesource.com/6003
Reviewed-by: Adam Langley <agl@google.com>
2015-09-29 18:02:14 +00:00
Adam Langley
73415b6aa0 Move arm_arch.h and fix up lots of include paths.
arm_arch.h is included from ARM asm files, but lives in crypto/, not
openssl/include/. Since the asm files are often built from a different
location than their position in the source tree, relative include paths
are unlikely to work so, rather than having crypto/ be a de-facto,
second global include path, this change moves arm_arch.h to
include/openssl/.

It also removes entries from many include paths because they should be
needed as relative includes are always based on the locations of the
source file.

Change-Id: I638ff43d641ca043a4fc06c0d901b11c6ff73542
Reviewed-on: https://boringssl-review.googlesource.com/5746
Reviewed-by: Adam Langley <agl@google.com>
2015-08-26 01:57:59 +00:00
David Benjamin
a3a80b23eb Convert remaining Latin-1 files to UTF-8.
See upstream's 9f0b86c68bb96d49301bbd6473c8235ca05ca06b. Generated by
using upstream's script in 5a3ce86e21715a683ff0d32421ed5c6d5e84234d and
then manually throwing out the false positives. (We converted a bunch of
stuff already in 91157550061d5d794898fe47b95384a7ba5f7b9d.)

This may require some wrestling with depot_tools to land in Chromium due
to Rietveld's encoding bugs, but hopefully that will avoid future
problems; Rietveld breaks if either old or new file is Latin-1.

Change-Id: I26dcb20c7377f92a0c843ef5d74d440a82ea8ceb
Reviewed-on: https://boringssl-review.googlesource.com/5483
Reviewed-by: Adam Langley <agl@google.com>
2015-07-29 19:22:55 +00:00
Joel Klinghed
9a4996e359 Fix compilation of sha256-armv4.S when using -march=armv6
sha256-armv4.S:1884: Error: invalid constant (ffffffffffffef90) after fixup

BUG=495695

Change-Id: I5b7423c2f7a10657c92c7b1ccae970f33c569455
Reviewed-on: https://boringssl-review.googlesource.com/4944
Reviewed-by: Adam Langley <agl@google.com>
2015-06-02 18:15:37 +00:00
David Benjamin
e2375e139e Low-level hash 'final' functions cannot fail.
The SHA-2 family has some exceptions, but they're all programmer errors
and should be documented as such. (Are the failure cases even
necessary?)

Change-Id: I00bd0a9450cff78d8caac479817fbd8d3de872b8
Reviewed-on: https://boringssl-review.googlesource.com/4953
Reviewed-by: Adam Langley <agl@google.com>
2015-06-01 22:14:01 +00:00
David Benjamin
049756be46 Fix integer types in low-level hash functions.
Use sized integer types rather than unsigned char/int/long. The latter
two are especially a mess as they're both used in lieu of uint32_t.
Sometimes the code just blindly uses unsigned long and sometimes it uses
unsigned int when an LP64 architecture would notice.

Change-Id: I4c5c6aaf82cfe9fe523435588d286726a7c43056
Reviewed-on: https://boringssl-review.googlesource.com/4952
Reviewed-by: Adam Langley <agl@google.com>
2015-06-01 22:12:21 +00:00
David Benjamin
2a2dbaa9e4 Add assembly support for 32-bit iOS.
(Imported from upstream's 313e6ec11fb8a7bda1676ce5804bee8755664141)

BUG=338886

Change-Id: Id635e78b9afaad5ca311e3aeed888c9aedeb9637
Reviewed-on: https://boringssl-review.googlesource.com/4490
Reviewed-by: Adam Langley <agl@google.com>
2015-05-04 22:44:24 +00:00
David Benjamin
69752b09e4 sha/asm/sha*-armv8.pl: add Denver and X-Gene esults.
(Imported from upstream's be5a87a1b00aceba5484a7ec198ac622c9283def)

Change-Id: I21c16b56949387a0eb3794c98550b8d7dfc4a376
Reviewed-on: https://boringssl-review.googlesource.com/4482
Reviewed-by: Adam Langley <agl@google.com>
2015-04-28 21:28:30 +00:00
David Benjamin
74f79b601d aes/asm/aesv8-armx.pl: optimize for Cortex-A5x.
ARM has optimized Cortex-A5x pipeline to favour pairs of complementary
AES instructions. While modified code improves performance of post-r0p0
Cortex-A53 performance by >40% (for CBC decrypt and CTR), it hurts
original r0p0. We favour later revisions, because one can't prevent
future from coming. Improvement on post-r0p0 Cortex-A57 exceeds 50%,
while new code is not slower on r0p0, or Apple A7 for that matter.

[Update even SHA results for latest Cortex-A53.]

(Imported from upstream's 94376cccb4ed5b376220bffe0739140ea9dad8c8)

Change-Id: I581c65b566116b1f4211fb1bd5a1a54479889d70
Reviewed-on: https://boringssl-review.googlesource.com/4481
Reviewed-by: Adam Langley <agl@google.com>
2015-04-28 21:28:06 +00:00
David Benjamin
7af16eb49f sha/asm/sha512-armv4.pl: adapt for use in Linux kernel context.
Follow-up to sha256-armv4.pl in cooperation with Ard Biesheuvel
(Linaro) and Sami Tolvanen (Google).

(Imported from upstream's b1a5d1c652086257930a1f62ae51c9cdee654b2c.)

Change-Id: Ibc4f289cc8f499924ade8d6b8d494f53bc08bda7
Reviewed-on: https://boringssl-review.googlesource.com/4467
Reviewed-by: Adam Langley <agl@google.com>
2015-04-28 20:55:54 +00:00
David Benjamin
0fd37062b6 sha/asm/sha256-armv4.pl: fix compile issue in kernel and eliminate little-endian dependency.
(Imported from upstream's 51f8d095562f36cdaa6893597b5c609e943b0565.)

I don't see why we'd care, but just to minimize divergence.

Change-Id: I4b07e72c88fcb04654ad28d8fd371e13d59a61b5
Reviewed-on: https://boringssl-review.googlesource.com/4466
Reviewed-by: Adam Langley <agl@google.com>
2015-04-28 20:55:29 +00:00
David Benjamin
256451c461 sha/asm/sha256-armv4.pl: adapt for use in Linux kernel context.
In cooperation with Ard Biesheuvel (Linaro) and Sami Tolvanen (Google).

(Imported from upstream's 2ecd32a1f8f0643ae7b38f59bbaf9f0d6ef326fe)

Change-Id: Iac5853220654b6ef4cb3bb7f8d1efe0eb2ecf634
Reviewed-on: https://boringssl-review.googlesource.com/4463
Reviewed-by: Adam Langley <agl@google.com>
2015-04-28 20:40:39 +00:00
David Benjamin
f06802f1e4 Add arm-xlate.pl and initial iOS asm support.
This is as partial import of upstream's
9b05cbc33e7895ed033b1119e300782d9e0cf23c. It includes the perlasm changes, but
not the CPU feature detection bits as we do those differently. This is largely
so we don't diverge from upstream, but it'll help with iOS assembly in the
future.

sha512-armv8.pl is modified slightly from upstream to switch from conditioning
on the output file to conditioning on an extra argument. This makes our
previous change from upstream (removing the 'open STDOUT' line) more explicit.

BUG=338886

Change-Id: Ic8ca1388ae20e94566f475bad3464ccc73f445df
Reviewed-on: https://boringssl-review.googlesource.com/4405
Reviewed-by: Adam Langley <agl@google.com>
2015-04-20 19:08:26 +00:00
David Benjamin
389939422a ARMv4 assembly pack: add Cortex-A15 performance data.
(Imported from upstream's e390ae50e0bc41676994c6fa23f7b65a8afc4d7f)

Change-Id: Ifee85b0936c06c42cc7c09f8327d15fec51da48a
Reviewed-on: https://boringssl-review.googlesource.com/3832
Reviewed-by: Adam Langley <agl@google.com>
2015-03-10 02:32:05 +00:00
David Benjamin
09bdb2a2c3 Remove explicit .hiddens from x86_64 perlasm files.
This reverts the non-ARM portions of 97999919bb.
x86_64 perlasm already makes .globl imply .hidden. (Confusingly, ARM does not.)
Since we don't need it, revert those to minimize divergence with upstream.

Change-Id: I2d205cfb1183e65d4f18a62bde187d206b1a96de
Reviewed-on: https://boringssl-review.googlesource.com/3610
Reviewed-by: Adam Langley <agl@google.com>
2015-02-25 21:26:16 +00:00
David Benjamin
2b48d6b7dd sha/asm/sha1-586.pl: fix typo.
The typo doesn't affect supported configuration, only unsupported masm.

(Imported from upstream's 3372c4fffa0556a688f8f1f550b095051398f596)

Change-Id: Ib6a2f1d9f6fc244a33da1e079188acdf69d5e2f3
Reviewed-on: https://boringssl-review.googlesource.com/3580
Reviewed-by: Adam Langley <agl@google.com>
2015-02-23 19:44:50 +00:00
Adam Langley
97999919bb Hide all asm symbols.
We are leaking asm symbols in Android builds because the asm code isn't
affected by -fvisibility=hidden. This change hides all asm symbols.

This assumes that no asm symbols are public API and that should be true.
Some points to note:

In crypto/rc4/asm/rc4-md5-x86_64.pl there are |RC4_set_key| and
|RC4_options| functions which aren't getting marked as hidden. That's
because those functions aren't actually ever generated. (I'm just trying
to minimise drift with upstream here.)

In crypto/rc4/asm/rc4-x86_64.pl there's |RC4_options| which is "public"
API, except that we've never had it in the header files. So I've just
deleted it. Since we have an internal caller, we'll probably have to put
it back in the future, but it can just be done in rc4.c to save
problems.

BUG=448386

Change-Id: I3846617a0e3d73ec9e5ec3638a53364adbbc6260
Reviewed-on: https://boringssl-review.googlesource.com/3520
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-02-20 21:24:01 +00:00
Adam Langley
16e38b2b8f Mark OPENSSL_armcap_P as hidden in ARM asm.
This is an import from ARM. Without this, one of the Android builds of
BoringSSL was failing with:
  (sha512-armv4.o): requires unsupported dynamic reloc R_ARM_REL32; recompile with -fPIC

This is (I believe) a very misleading error message. The R_ARM_REL32
relocation type is the correct type for position independent code. But
unless the target symbol is hidden then the linker doesn't know that
it's not going to be overridden by a different ELF module.

Chromium probably gets away with this because of different default
compiler flags than Android.

Change-Id: I967eabc4d6b33d1e6635caaf6e7a306e4e77c101
Reviewed-on: https://boringssl-review.googlesource.com/3471
Reviewed-by: Adam Langley <agl@google.com>
2015-02-19 19:58:17 +00:00