Commit Graph

43 Commits

Author SHA1 Message Date
Adam Langley
df2a5562f3 bn/asm/x86_64-mont5.pl: unify gather procedure in hardly used path and reorganize/harmonize post-conditions.
(Imported from upstream's 515f3be47a0b58eec808cf365bc5e8ef6917266b)

Additional hardening following on from CVE-2016-0702.

Change-Id: I19a6739b401887a42eb335fe5838379dc8d04100
Reviewed-on: https://boringssl-review.googlesource.com/7245
Reviewed-by: Adam Langley <agl@google.com>
2016-03-01 18:04:20 +00:00
Adam Langley
b360eaf001 crypto/bn/x86_64-mont5.pl: constant-time gather procedure.
(Imported from upstream's 25d14c6c29b53907bf614b9964d43cd98401a7fc.)

At the same time remove miniscule bias in final subtraction. Performance
penalty varies from platform to platform, and even with key length. For
rsa2048 sign it was observed to be 4% for Sandy Bridge and 7% on
Broadwell.

(This is part of the fix for CVE-2016-0702.)

Change-Id: I43a13d592c4a589d04c17c33c0ca40c2d7375522
Reviewed-on: https://boringssl-review.googlesource.com/7244
Reviewed-by: Adam Langley <agl@google.com>
2016-03-01 18:04:15 +00:00
Adam Langley
1168fc72fc bn/asm/rsaz-avx2.pl: constant-time gather procedure.
(Imported from upstream's 08ea966c01a39e38ef89e8920d53085e4807a43a)

Performance penalty is 2%.

(This is part of the fix for CVE-2016-0702.)

Change-Id: Id3b6262c5d3201dd64b93bdd34601a51794a9275
Reviewed-on: https://boringssl-review.googlesource.com/7243
Reviewed-by: Adam Langley <agl@google.com>
2016-03-01 18:04:09 +00:00
Adam Langley
842a06c2b9 bn/asm/rsax-x86_64.pl: constant-time gather procedure.
(Imported from upstream's ef98503eeef5c108018081ace902d28e609f7772.)

Performance penalty is 2% on Linux and 5% on Windows.

(This is part of the fix for CVE-2016-0702.)

Change-Id: If82f95131c93168282a46ac5a35e2b007cc2bd67
Reviewed-on: https://boringssl-review.googlesource.com/7242
Reviewed-by: Adam Langley <agl@google.com>
2016-03-01 18:03:16 +00:00
Brian Smith
d1425f69df Simplify division-with-remainder calculations in crypto/bn/div.c.
Create a |bn_div_rem_words| that is used for double-word/single-word
divisions and division-with-remainder. Remove all implementations of
|bn_div_words| except for the implementation needed for 64-bit MSVC.
This allows more code to be shared across platforms and also removes
an instance of the dangerous pattern wherein the |div_asm| macro
modified a variable that wasn't passed as a parameter.

Also, document the limitations of the compiler-generated code for the
non-asm code paths more fully. Compilers indeed have not improved in
this respect.

Change-Id: I5a36a2edd7465de406d47d72dcd6bf3e63e5c232
Reviewed-on: https://boringssl-review.googlesource.com/7127
Reviewed-by: David Benjamin <davidben@google.com>
2016-02-25 16:13:22 +00:00
David Benjamin
0182ecd346 Consistently use named constants in ARM assembly files.
Most of the OPENSSL_armcap_P accesses in assembly use named constants from
arm_arch.h, but some don't. Consistently use the constants. The dispatch really
should be in C, but in the meantime, make it easier to tell what's going on.

I'll send this patch upstream so we won't be carrying a diff here.

Change-Id: I63c68d2351ea5ce11005813314988e32b6459526
Reviewed-on: https://boringssl-review.googlesource.com/7203
Reviewed-by: Adam Langley <agl@google.com>
2016-02-23 17:18:18 +00:00
David Benjamin
3ab3e3db6e Mark ARM assembly globals hidden uniformly in arm-xlate.pl.
We'd manually marked some of them hidden, but missed some. Do it in the perlasm
driver instead since we will never expose an asm symbol directly. This reduces
some of our divergence from upstream on these files (and indeed we'd
accidentally lose some .hiddens at one point).

BUG=586141

Change-Id: Ie1bfc6f38ba73d33f5c56a8a40c2bf1668562e7e
Reviewed-on: https://boringssl-review.googlesource.com/7140
Reviewed-by: Adam Langley <agl@google.com>
2016-02-11 17:28:03 +00:00
Brian Smith
a051bdd6cd Remove dead non-|BN_ULLONG|, non-64-bit-MSVC code in crypto/bn.
It is always the case that either |BN_ULLONG| is defined or
|BN_UMULT_LOHI| is defined because |BN_ULLONG| is defined everywhere
except 64-bit MSVC, and BN_UMULT_LOHI is defined for 64-bit MSVC.

Change-Id: I85e5d621458562501af1af65d587c0b8d937ba3b
Reviewed-on: https://boringssl-review.googlesource.com/7044
Reviewed-by: David Benjamin <davidben@google.com>
2016-02-09 16:21:41 +00:00
Brian Smith
767e1210e0 Remove unused Simics code in crypto/bn/asm/x86_64-gcc.c.
Change-Id: If9c5031855c0acfafb73caba169e146f0e16f706
Reviewed-on: https://boringssl-review.googlesource.com/7093
Reviewed-by: David Benjamin <davidben@google.com>
2016-02-08 23:41:47 +00:00
David Benjamin
6d9e5a7448 Re-apply 75b833cc81
I messed up and missed that we were carrying a diff on x86_64-mont5.pl. This
was accidentally dropped in https://boringssl-review.googlesource.com/6616.

To confirm the merge is good now, check out at this revision and run:

  git diff e701f16bd69b6f251ed537e40364c281e85a63b2^ crypto/bn/asm/x86_64-mont5.pl > /tmp/A

Then in OpenSSL's repository:

  git diff d73cc256c8e256c32ed959456101b73ba9842f72^ d73cc256c8e256c32ed959456101b73ba9842f72 crypto/bn/asm/x86_64-mont5.pl  > /tmp/B

And confirm the diffs vary in only metadata:

  diff -u /tmp/A /tmp/B

--- /tmp/A	2015-12-03 11:53:23.127034998 -0500
+++ /tmp/B	2015-12-03 11:53:53.099314287 -0500
@@ -1,8 +1,8 @@
 diff --git a/crypto/bn/asm/x86_64-mont5.pl b/crypto/bn/asm/x86_64-mont5.pl
-index 38def07..3c5a8fc 100644
+index 388e3c6..64e668f 100755
 --- a/crypto/bn/asm/x86_64-mont5.pl
 +++ b/crypto/bn/asm/x86_64-mont5.pl
-@@ -1770,6 +1770,15 @@ sqr8x_reduction:
+@@ -1784,6 +1784,15 @@ sqr8x_reduction:
  .align	32
  .L8x_tail_done:
  	add	(%rdx),%r8		# can this overflow?
@@ -18,7 +18,7 @@
  	xor	%rax,%rax

  	neg	$carry
-@@ -3116,6 +3125,15 @@ sqrx8x_reduction:
+@@ -3130,6 +3139,15 @@ sqrx8x_reduction:
  .align	32
  .Lsqrx8x_tail_done:
  	add	24+8(%rsp),%r8		# can this overflow?
@@ -34,7 +34,7 @@
  	mov	$carry,%rax		# xor	%rax,%rax

  	sub	16+8(%rsp),$carry	# mov 16(%rsp),%cf
-@@ -3159,13 +3177,11 @@ my ($rptr,$nptr)=("%rdx","%rbp");
+@@ -3173,13 +3191,11 @@ my ($rptr,$nptr)=("%rdx","%rbp");
  my @ri=map("%r$_",(10..13));
  my @ni=map("%r$_",(14..15));
  $code.=<<___;

Change-Id: I3fb5253783ed82e4831f5bffde75273bd9609c23
Reviewed-on: https://boringssl-review.googlesource.com/6618
Reviewed-by: Adam Langley <agl@google.com>
2015-12-03 17:25:12 +00:00
David Benjamin
e701f16bd6 bn/asm/x86_64-mont5.pl: fix carry propagating bug (CVE-2015-3193).
(Imported from upstream's d73cc256c8e256c32ed959456101b73ba9842f72.)

Change-Id: I673301fee57f0ab5bef24553caf8b2aac67fb3a9
Reviewed-on: https://boringssl-review.googlesource.com/6616
Reviewed-by: Adam Langley <agl@google.com>
2015-12-03 16:44:35 +00:00
Adam Langley
4ab254017c Add AArch64 Montgomery assembly.
The file armv8-mont.pl is taken from upstream. The speed ups are fairly
modest (~30%) but seem worthwhile.

Before:

Did 231 RSA 2048 signing operations in 1008671us (229.0 ops/sec)
Did 11208 RSA 2048 verify operations in 1036997us (10808.1 ops/sec)
Did 342 RSA 2048 (3 prime, e=3) signing operations in 1021545us (334.8 ops/sec)
Did 32000 RSA 2048 (3 prime, e=3) verify operations in 1016162us (31491.0 ops/sec)
Did 45 RSA 4096 signing operations in 1039805us (43.3 ops/sec)
Did 3608 RSA 4096 verify operations in 1060283us (3402.9 ops/sec)

After:

Did 300 RSA 2048 signing operations in 1009772us (297.1 ops/sec)
Did 12740 RSA 2048 verify operations in 1075413us (11846.6 ops/sec)
Did 408 RSA 2048 (3 prime, e=3) signing operations in 1016139us (401.5 ops/sec)
Did 33000 RSA 2048 (3 prime, e=3) verify operations in 1017510us (32432.1 ops/sec)
Did 52 RSA 4096 signing operations in 1067678us (48.7 ops/sec)
Did 3408 RSA 4096 verify operations in 1062863us (3206.4 ops/sec)

Change-Id: Ife74fac784067fce3668b5c87f51d481732ff855
Reviewed-on: https://boringssl-review.googlesource.com/6444
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-11-10 19:13:46 +00:00
David Benjamin
278d34234f Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change or if developers do or don't have CC variables set.

Previously, all compiler-version-gated features were turned on in
https://boringssl-review.googlesource.com/6260, but this broke the build. I
also wasn't thorough enough in gathering performance numbers. So, flip them all
to off instead. I'll enable them one-by-one as they're tested.

This should result in no change to generated assembly.

Change-Id: Ib4259b3f97adc4939cb0557c5580e8def120d5bc
Reviewed-on: https://boringssl-review.googlesource.com/6383
Reviewed-by: Adam Langley <agl@google.com>
2015-10-28 19:33:04 +00:00
David Benjamin
75885e29c4 Revert "Get rid of all compiler version checks in perlasm files."
This reverts commit b9c26014de.

The win64 bot seems unhappy. Will sniff at it tomorrow. In
the meantime, get the tree green again.

Change-Id: I058ddb3ec549beee7eabb2f3f72feb0a4a5143b2
Reviewed-on: https://boringssl-review.googlesource.com/6353
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 23:12:39 +00:00
David Benjamin
b9c26014de Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change.

Enable all compiler-version-gated features as they should all be runtime-gated
anyway. This should align with what upstream's files would have produced on
modern toolschains. We should assume our assemblers can take whatever we'd like
to throw at them. (If it turns out some can't, we'd rather find out and
probably switch the problematic instructions to explicit byte sequences.)

This actually results in a fairly significant change to the assembly we
generate. I'm guessing upstream's buildsystem sets the CC environment variable,
while ours doesn't and so the version checks were all coming out conservative.

diffstat of generated files:

 linux-x86/crypto/sha/sha1-586.S              | 1176 ++++++++++++
 linux-x86/crypto/sha/sha256-586.S            | 2248 ++++++++++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-avx2.S           | 1644 +++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-x86_64.S         |  638 ++++++
 linux-x86_64/crypto/bn/x86_64-mont.S         |  332 +++
 linux-x86_64/crypto/bn/x86_64-mont5.S        | 1130 ++++++++++++
 linux-x86_64/crypto/modes/aesni-gcm-x86_64.S |  754 ++++++++
 linux-x86_64/crypto/modes/ghash-x86_64.S     |  475 +++++
 linux-x86_64/crypto/sha/sha1-x86_64.S        | 1121 ++++++++++++
 linux-x86_64/crypto/sha/sha256-x86_64.S      | 1062 +++++++++++
 linux-x86_64/crypto/sha/sha512-x86_64.S      | 2241 ++++++++++++++++++++++++
 mac-x86/crypto/sha/sha1-586.S                | 1174 ++++++++++++
 mac-x86/crypto/sha/sha256-586.S              | 2248 ++++++++++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-avx2.S             | 1637 +++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-x86_64.S           |  638 ++++++
 mac-x86_64/crypto/bn/x86_64-mont.S           |  331 +++
 mac-x86_64/crypto/bn/x86_64-mont5.S          | 1130 ++++++++++++
 mac-x86_64/crypto/modes/aesni-gcm-x86_64.S   |  750 ++++++++
 mac-x86_64/crypto/modes/ghash-x86_64.S       |  475 +++++
 mac-x86_64/crypto/sha/sha1-x86_64.S          | 1121 ++++++++++++
 mac-x86_64/crypto/sha/sha256-x86_64.S        | 1062 +++++++++++
 mac-x86_64/crypto/sha/sha512-x86_64.S        | 2241 ++++++++++++++++++++++++
 win-x86/crypto/sha/sha1-586.asm              | 1173 ++++++++++++
 win-x86/crypto/sha/sha256-586.asm            | 2248 ++++++++++++++++++++++++
 win-x86_64/crypto/bn/rsaz-avx2.asm           | 1858 +++++++++++++++++++-
 win-x86_64/crypto/bn/rsaz-x86_64.asm         |  638 ++++++
 win-x86_64/crypto/bn/x86_64-mont.asm         |  352 +++
 win-x86_64/crypto/bn/x86_64-mont5.asm        | 1184 ++++++++++++
 win-x86_64/crypto/modes/aesni-gcm-x86_64.asm |  933 ++++++++++
 win-x86_64/crypto/modes/ghash-x86_64.asm     |  515 +++++
 win-x86_64/crypto/sha/sha1-x86_64.asm        | 1152 ++++++++++++
 win-x86_64/crypto/sha/sha256-x86_64.asm      | 1088 +++++++++++
 win-x86_64/crypto/sha/sha512-x86_64.asm      | 2499 ++++++

SHA* gets faster. RSA and AES-GCM seem to be more of a wash and even slower
sometimes!  This is a little concerning. Though when I repeated the latter two,
it's definitely noisy (RSA in particular), so we may wish to repeat in a more
controlled environment. We could also flip some of these toggles to something
other than the highest setting if it seems some of the variants aren't
desirable. We just shouldn't have them enabled or disabled on accident. This
aligns us closer to upstream though.

$ /tmp/bssl.old speed SHA-
Did 5028000 SHA-1 (16 bytes) operations in 1000048us (5027758.7 ops/sec): 80.4 MB/s
Did 1708000 SHA-1 (256 bytes) operations in 1000257us (1707561.2 ops/sec): 437.1 MB/s
Did 73000 SHA-1 (8192 bytes) operations in 1008406us (72391.5 ops/sec): 593.0 MB/s
Did 3041000 SHA-256 (16 bytes) operations in 1000311us (3040054.5 ops/sec): 48.6 MB/s
Did 779000 SHA-256 (256 bytes) operations in 1000820us (778361.7 ops/sec): 199.3 MB/s
Did 26000 SHA-256 (8192 bytes) operations in 1009875us (25745.8 ops/sec): 210.9 MB/s
Did 1837000 SHA-512 (16 bytes) operations in 1000251us (1836539.0 ops/sec): 29.4 MB/s
Did 803000 SHA-512 (256 bytes) operations in 1000969us (802222.6 ops/sec): 205.4 MB/s
Did 41000 SHA-512 (8192 bytes) operations in 1016768us (40323.8 ops/sec): 330.3 MB/s
$ /tmp/bssl.new speed SHA-
Did 5354000 SHA-1 (16 bytes) operations in 1000104us (5353443.2 ops/sec): 85.7 MB/s
Did 1779000 SHA-1 (256 bytes) operations in 1000121us (1778784.8 ops/sec): 455.4 MB/s
Did 87000 SHA-1 (8192 bytes) operations in 1012641us (85914.0 ops/sec): 703.8 MB/s
Did 3517000 SHA-256 (16 bytes) operations in 1000114us (3516599.1 ops/sec): 56.3 MB/s
Did 935000 SHA-256 (256 bytes) operations in 1000096us (934910.2 ops/sec): 239.3 MB/s
Did 38000 SHA-256 (8192 bytes) operations in 1004476us (37830.7 ops/sec): 309.9 MB/s
Did 2930000 SHA-512 (16 bytes) operations in 1000259us (2929241.3 ops/sec): 46.9 MB/s
Did 1008000 SHA-512 (256 bytes) operations in 1000509us (1007487.2 ops/sec): 257.9 MB/s
Did 45000 SHA-512 (8192 bytes) operations in 1000593us (44973.3 ops/sec): 368.4 MB/s

$ /tmp/bssl.old speed RSA
Did 820 RSA 2048 signing operations in 1017008us (806.3 ops/sec)
Did 27000 RSA 2048 verify operations in 1015400us (26590.5 ops/sec)
Did 1292 RSA 2048 (3 prime, e=3) signing operations in 1008185us (1281.5 ops/sec)
Did 65000 RSA 2048 (3 prime, e=3) verify operations in 1011388us (64268.1 ops/sec)
Did 120 RSA 4096 signing operations in 1061027us (113.1 ops/sec)
Did 8208 RSA 4096 verify operations in 1002717us (8185.8 ops/sec)
$ /tmp/bssl.new speed RSA
Did 760 RSA 2048 signing operations in 1003351us (757.5 ops/sec)
Did 25900 RSA 2048 verify operations in 1028931us (25171.8 ops/sec)
Did 1320 RSA 2048 (3 prime, e=3) signing operations in 1040806us (1268.2 ops/sec)
Did 63000 RSA 2048 (3 prime, e=3) verify operations in 1016042us (62005.3 ops/sec)
Did 104 RSA 4096 signing operations in 1008718us (103.1 ops/sec)
Did 6875 RSA 4096 verify operations in 1093441us (6287.5 ops/sec)

$ /tmp/bssl.old speed GCM
Did 5316000 AES-128-GCM (16 bytes) seal operations in 1000082us (5315564.1 ops/sec): 85.0 MB/s
Did 712000 AES-128-GCM (1350 bytes) seal operations in 1000252us (711820.6 ops/sec): 961.0 MB/s
Did 149000 AES-128-GCM (8192 bytes) seal operations in 1003182us (148527.4 ops/sec): 1216.7 MB/s
Did 5919750 AES-256-GCM (16 bytes) seal operations in 1000016us (5919655.3 ops/sec): 94.7 MB/s
Did 800000 AES-256-GCM (1350 bytes) seal operations in 1000951us (799239.9 ops/sec): 1079.0 MB/s
Did 152000 AES-256-GCM (8192 bytes) seal operations in 1000765us (151883.8 ops/sec): 1244.2 MB/s
$ /tmp/bssl.new speed GCM
Did 5315000 AES-128-GCM (16 bytes) seal operations in 1000125us (5314335.7 ops/sec): 85.0 MB/s
Did 755000 AES-128-GCM (1350 bytes) seal operations in 1000878us (754337.7 ops/sec): 1018.4 MB/s
Did 151000 AES-128-GCM (8192 bytes) seal operations in 1005655us (150150.9 ops/sec): 1230.0 MB/s
Did 5913500 AES-256-GCM (16 bytes) seal operations in 1000041us (5913257.6 ops/sec): 94.6 MB/s
Did 782000 AES-256-GCM (1350 bytes) seal operations in 1001484us (780841.2 ops/sec): 1054.1 MB/s
Did 121000 AES-256-GCM (8192 bytes) seal operations in 1006389us (120231.8 ops/sec): 984.9 MB/s

Change-Id: I0efb32f896c597abc7d7e55c31d038528a5c72a1
Reviewed-on: https://boringssl-review.googlesource.com/6260
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 20:31:30 +00:00
Adam Langley
0dd93002dd Revert section changes for ASM.
This change reverts the following commits:
  72d9cba7cb
  5b61b9ebc5
  3f85e04f40
  2ab24a2d40

Change-Id: I669b83f2269cf96aa71a649a346147b9407a811e
Reviewed-on: https://boringssl-review.googlesource.com/6056
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-09-30 22:09:52 +00:00
Adam Langley
72d9cba7cb Move .align directives next to their labels for ARM.
2ab24a2d40 added sections to ARM assembly
files. However, in cases where .align directives were not next to the
labels that they were intended to apply to, the section directives would
cause them to be ignored.

Change-Id: I32117f6747ff8545b80c70dd3b8effdc6e6f67e0
Reviewed-on: https://boringssl-review.googlesource.com/6050
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-09-30 18:35:29 +00:00
Adam Langley
2ab24a2d40 Put arm/aarch64 assembly functions in their own section.
This change causes each global arm or aarch64 asm function to be put
into its own section by default. This matches the behaviour of the
-ffunction-sections option to GCC and allows the --gc-sections option to
the linker to discard unused asm functions on a function-by-function
basis.

Sometimes several asm functions will share the same data an, in that
situation, the data is put into the section of one of the functions and
the section of the other function is merged with the added
“.global_with_section” directive.

Change-Id: I12c9b844d48d104d28beb816764358551eac4456
Reviewed-on: https://boringssl-review.googlesource.com/6003
Reviewed-by: Adam Langley <agl@google.com>
2015-09-29 18:02:14 +00:00
Adam Langley
73415b6aa0 Move arm_arch.h and fix up lots of include paths.
arm_arch.h is included from ARM asm files, but lives in crypto/, not
openssl/include/. Since the asm files are often built from a different
location than their position in the source tree, relative include paths
are unlikely to work so, rather than having crypto/ be a de-facto,
second global include path, this change moves arm_arch.h to
include/openssl/.

It also removes entries from many include paths because they should be
needed as relative includes are always based on the locations of the
source file.

Change-Id: I638ff43d641ca043a4fc06c0d901b11c6ff73542
Reviewed-on: https://boringssl-review.googlesource.com/5746
Reviewed-by: Adam Langley <agl@google.com>
2015-08-26 01:57:59 +00:00
Adam Langley
966003273d Don't use x86_64-gcc.c with NO_ASM.
Android (on OS X) builds with NO_ASM and was getting both generic.c and
x86_64-gcc.c. This change updates the latter so that it's excluded in
NO_ASM builds.

Change-Id: I1f0e1c5e551eed9c575ce632ec3016fce7ec9d2e
Reviewed-on: https://boringssl-review.googlesource.com/4741
Reviewed-by: Adam Langley <agl@google.com>
2015-05-15 22:23:49 +00:00
David Benjamin
2a2dbaa9e4 Add assembly support for 32-bit iOS.
(Imported from upstream's 313e6ec11fb8a7bda1676ce5804bee8755664141)

BUG=338886

Change-Id: Id635e78b9afaad5ca311e3aeed888c9aedeb9637
Reviewed-on: https://boringssl-review.googlesource.com/4490
Reviewed-by: Adam Langley <agl@google.com>
2015-05-04 22:44:24 +00:00
David Benjamin
96ac819197 Remove inconsistency in ARM support.
This facilitates "universal" builds, ones that target multiple
architectures, e.g. ARMv5 through ARMv7.

(Imported from upstream's c1669e1c205dc8e695fb0c10a655f434e758b9f7)

This is a change from a while ago which was a source of divergence between our
perlasm and upstream's. This change in upstream came with the following comment
in Configure:

 Note that -march is not among compiler options in below linux-armv4
 target line. Not specifying one is intentional to give you choice to:

 a) rely on your compiler default by not specifying one;
 b) specify your target platform explicitly for optimal performance,
    e.g. -march=armv6 or -march=armv7-a;
 c) build "universal" binary that targets *range* of platforms by
    specifying minimum and maximum supported architecture;

 As for c) option. It actually makes no sense to specify maximum to be
 less than ARMv7, because it's the least requirement for run-time
 switch between platform-specific code paths. And without run-time
 switch performance would be equivalent to one for minimum. Secondly,
 there are some natural limitations that you'd have to accept and
 respect. Most notably you can *not* build "universal" binary for
 big-endian platform. This is because ARMv7 processor always picks
 instructions in little-endian order. Another similar limitation is
 that -mthumb can't "cross" -march=armv6t2 boundary, because that's
 where it became Thumb-2. Well, this limitation is a bit artificial,
 because it's not really impossible, but it's deemed too tricky to
 support. And of course you have to be sure that your binutils are
 actually up to the task of handling maximum target platform.

Change-Id: Ie5f674d603393f0a1354a0d0973987484a4a650c
Reviewed-on: https://boringssl-review.googlesource.com/4488
Reviewed-by: Adam Langley <agl@google.com>
2015-05-04 22:43:51 +00:00
David Benjamin
4ae52cddad ARM assembly pack: get ARMv7 instruction endianness right.
Pointer out and suggested by: Ard Biesheuvel.

(Imported from upstream's 5dcf70a1c57c2019bfad640fe14fd4a73212860a)

This is from a while ago, but it's one source of divergence between our copy of
these files and master's.

Change-Id: I6525a27f25eb86a92420c32996af47ecc42ee020
Reviewed-on: https://boringssl-review.googlesource.com/4487
Reviewed-by: Adam Langley <agl@google.com>
2015-05-04 22:41:59 +00:00
David Benjamin
09bdb2a2c3 Remove explicit .hiddens from x86_64 perlasm files.
This reverts the non-ARM portions of 97999919bb.
x86_64 perlasm already makes .globl imply .hidden. (Confusingly, ARM does not.)
Since we don't need it, revert those to minimize divergence with upstream.

Change-Id: I2d205cfb1183e65d4f18a62bde187d206b1a96de
Reviewed-on: https://boringssl-review.googlesource.com/3610
Reviewed-by: Adam Langley <agl@google.com>
2015-02-25 21:26:16 +00:00
Adam Langley
97999919bb Hide all asm symbols.
We are leaking asm symbols in Android builds because the asm code isn't
affected by -fvisibility=hidden. This change hides all asm symbols.

This assumes that no asm symbols are public API and that should be true.
Some points to note:

In crypto/rc4/asm/rc4-md5-x86_64.pl there are |RC4_set_key| and
|RC4_options| functions which aren't getting marked as hidden. That's
because those functions aren't actually ever generated. (I'm just trying
to minimise drift with upstream here.)

In crypto/rc4/asm/rc4-x86_64.pl there's |RC4_options| which is "public"
API, except that we've never had it in the header files. So I've just
deleted it. Since we have an internal caller, we'll probably have to put
it back in the future, but it can just be done in rc4.c to save
problems.

BUG=448386

Change-Id: I3846617a0e3d73ec9e5ec3638a53364adbbc6260
Reviewed-on: https://boringssl-review.googlesource.com/3520
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-02-20 21:24:01 +00:00
Adam Langley
16e38b2b8f Mark OPENSSL_armcap_P as hidden in ARM asm.
This is an import from ARM. Without this, one of the Android builds of
BoringSSL was failing with:
  (sha512-armv4.o): requires unsupported dynamic reloc R_ARM_REL32; recompile with -fPIC

This is (I believe) a very misleading error message. The R_ARM_REL32
relocation type is the correct type for position independent code. But
unless the target symbol is hidden then the linker doesn't know that
it's not going to be overridden by a different ELF module.

Chromium probably gets away with this because of different default
compiler flags than Android.

Change-Id: I967eabc4d6b33d1e6635caaf6e7a306e4e77c101
Reviewed-on: https://boringssl-review.googlesource.com/3471
Reviewed-by: Adam Langley <agl@google.com>
2015-02-19 19:58:17 +00:00
David Benjamin
c9a202fee3 Add in missing curly braces part 1.
Everything before crypto/ec.

Change-Id: Icbfab8e4ffe5cc56bf465eb57d3fdad3959a085c
Reviewed-on: https://boringssl-review.googlesource.com/3401
Reviewed-by: Adam Langley <agl@google.com>
2015-02-11 19:31:01 +00:00
Adam Langley
9115755006 Convert latin-1 files to UTF-8.
A handful of latin-1 codepoints existed a trio of files. This change
switches the encoding to UTF-8.

Change-Id: I00309e4d1ee3101e0cc02abc53196eafa17a4fa5
2015-01-29 17:26:36 -08:00
David Benjamin
347f025d75 Remove unused modexp512-x86_64.pl.
See upstream's c436e05bdc7f49985a750df64122c960240b3ae1.

Change-Id: I7cbe5315a769450e4630dd4e8f465cdfd45c2e08
Reviewed-on: https://boringssl-review.googlesource.com/3025
Reviewed-by: Adam Langley <agl@google.com>
2015-01-26 18:45:45 +00:00
David Benjamin
8604eda634 Add Broadwell performance results.
(Imported from upstream's b3d7294976c58e0e05d0ee44a0e7c9c3b8515e05.)

May as well avoid diverging.

Change-Id: I3edec4fe15b492dd3bfb3146a8944acc6575f861
Reviewed-on: https://boringssl-review.googlesource.com/3020
Reviewed-by: Adam Langley <agl@google.com>
2015-01-26 18:35:35 +00:00
Adam Langley
a83cc803b1 Fix for CVE-2014-3570.
(With minor bn/generic.c revamp.)

(Imported from upstream's 56df92efb6893abe323307939425957ce878c8f0)

Change-Id: I9d85cfde4dfb29e64ff7417f781d0c9f1685e905
Reviewed-on: https://boringssl-review.googlesource.com/2780
Reviewed-by: Adam Langley <agl@google.com>
2015-01-09 02:49:10 +00:00
Adam Langley
25cb99c149 crypto/bn/asm/rsaz-*.pl: allow spaces in Perl path name.
(Imported from upstream's ef908777218bd4a362dbe9cebb8e18fa8ab384cf.)

Change-Id: Id9b288d230cc9d8ab308690a18e687e2132e3293
Reviewed-on: https://boringssl-review.googlesource.com/2168
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2014-11-05 21:26:15 +00:00
David Benjamin
f44aa68a26 Fix standalone Win64 build.
generic.c still needs to include generic implementations in Win64.
Those are currently done with inline assembly and won't work on
MSVC.

Change-Id: Ifeb5470872d8c97b2ccffeae6f3ccb5661051de3
Reviewed-on: https://boringssl-review.googlesource.com/2102
Reviewed-by: Adam Langley <agl@google.com>
2014-10-31 22:00:45 +00:00
David Benjamin
bf681a40d6 Fix out-of-bounds read in BN_mod_exp_mont_consttime.
bn_get_bits5 always reads two bytes, even when it doesn't need to. For some
sizes of |p|, this can result in reading just past the edge of the array.
Unroll the first iteration of the loop and avoid reading out of bounds.

Replace bn_get_bits5 altogether in C as it's not doing anything interesting.

Change-Id: Ibcc8cea7d9c644a2639445396455da47fe869a5c
Reviewed-on: https://boringssl-review.googlesource.com/1393
Reviewed-by: Adam Langley <agl@google.com>
2014-08-06 00:11:47 +00:00
Adam Langley
eb7d2ed1fe Add visibility rules.
This change marks public symbols as dynamically exported. This means
that it becomes viable to build a shared library of libcrypto and libssl
with -fvisibility=hidden.

On Windows, one not only needs to mark functions for export in a
component, but also for import when using them from a different
component. Because of this we have to build with
|BORINGSSL_IMPLEMENTATION| defined when building the code. Other
components, when including our headers, won't have that defined and then
the |OPENSSL_EXPORT| tag becomes an import tag instead. See the #defines
in base.h

In the asm code, symbols are now hidden by default and those that need
to be exported are wrapped by a C function.

In order to support Chromium, a couple of libssl functions were moved to
ssl.h from ssl_locl.h: ssl_get_new_session and ssl_update_cache.

Change-Id: Ib4b76e2f1983ee066e7806c24721e8626d08a261
Reviewed-on: https://boringssl-review.googlesource.com/1350
Reviewed-by: Adam Langley <agl@google.com>
2014-07-31 22:03:11 +00:00
Adam Langley
4b5979b3fa x86_64 assembly pack: improve masm support.
(Imported from upstream's 371feee876dd8b58531cb6e50fe79262db8e4ed7)

Change-Id: Id3b5ece6b5e5f0565060d5e598ea265d64dac9df
2014-07-28 17:05:13 -07:00
Adam Langley
2811da2eca x86_64 assembly pack: allow clang to compile AVX code.
(Imported from upstream's 912f08dd5ed4f68fb275f3b2db828349fcffba14,
52f856526c46ee80ef4c8c37844f084423a3eff7 and
377551b9c4e12aa7846f4d80cf3604f2e396c964)

Change-Id: Ic2bf93371f6d246818729810e7a45b3f0021845a
2014-07-28 17:05:13 -07:00
Adam Langley
b351d83875 bn/asm/rsaz-avx2.pl: fix occasional failures.
(Imported from upstream's 1067663d852435b1adff32ec01e9b8e54d2b5896)

Change-Id: I39e2a24176306f4170449145d3dee2c2edbf6dfe
2014-07-28 17:05:12 -07:00
Adam Langley
25ba90e34a move check for AD*X to rsaz-avx2.pl.
This ensures high performance is situations when assembler supports
AVX2, but not AD*X.

(Imported from upstream's 82a9dafe32e1e39b5adff18f9061e43d8df3d3c5)

Change-Id: Ie67f49a1c5467807139b6a8a0d4e62162d8a974f
2014-07-28 17:05:12 -07:00
Adam Langley
7ac79ebe55 The asm files bn/asm/x86* weren't actually used.
(This appears to be the case with upstream too, it's not that BoringSSL
is missing optimisations from what I can see.)

Change-Id: I0e54762ef0d09e60994ec82c5cca1ff0b3b23ea4
Reviewed-on: https://boringssl-review.googlesource.com/1080
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2014-07-02 00:29:12 +00:00
Adam Langley
ebebf87d6d Add needed volatile qualifications.
Add volatile qualifications to two blocks of inline asm to stop GCC from
eliminating them as dead code.
2014-06-20 13:17:33 -07:00
Adam Langley
75b833cc81 OpenSSL: make final reduction in Montgomery multiplication constant-time.
(The issue was reported by Shay Gueron.)

The final reduction in Montgomery multiplication computes if (X >= m) then X =
X - m else X = X

In OpenSSL, this was done by computing T = X - m,  doing a constant-time
selection of the *addresses* of X and T, and loading from the resulting
address. But this is not cache-neutral.

This patch changes the behaviour by loading both X and T into registers, and
doing a constant-time selection of the *values*.

TODO(fork): only some of the fixes from the original patch still apply to
the 1.0.2 code.
2014-06-20 13:17:33 -07:00
Adam Langley
95c29f3cd1 Inital import.
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).

(This change contains substantial changes from the original and
effectively starts a new history.)
2014-06-20 13:17:32 -07:00