Commit Graph

21 Commits

Author SHA1 Message Date
Adam Langley
628f518cdc bn/asm/x86_64*: add DWARF CFI directives.
(Imports upstream's 76e624a003db22db2d99ece04a15e20fe44c1fbe.)

Also includes the following fixes:
https://github.com/openssl/openssl/pull/2582
https://github.com/openssl/openssl/pull/2655

Change-Id: I6086a87a534d152cdbff104c62ad9dcd9b4e012a
Reviewed-on: https://boringssl-review.googlesource.com/13783
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-02-16 23:03:48 +00:00
Adam Langley
cb1b333c2b x86_64 assembly pack: Win64 SEH face-lift.
(Imports upstream's 384e6de4c7e35e37fb3d6fbeb32ddcb5eb0d3d3f. Changes to
P-256 assembly dropped because we're so different there.)

 - harmonize handlers with guidelines and themselves;
 - fix some bugs in handlers;

Change-Id: Ic0b6a37bed6baedc50448c72fab088327f12898d
Reviewed-on: https://boringssl-review.googlesource.com/13782
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-16 21:55:04 +00:00
Adam Langley
766a6fd151 Revert "OpenSSL: make final reduction in Montgomery multiplication constant-time."
This reverts commit 75b833cc81.

Sadly this needs to be redone because upstream never took this change.
Perhaps, once redone, we can try upstreaming it again.

Change-Id: Ic8aaa0728a43936cde1628ca031ff3821f0fbf5b
Reviewed-on: https://boringssl-review.googlesource.com/13776
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:35:05 +00:00
Adam Langley
0bf9d6d554 bn/asm/x86[_64]-mont*.pl: implement slightly alternative page-walking.
(Imports upstream's 3ba1ef829cf3dd36eaa5e819258d90291c6a1027.)

Original strategy for page-walking was adjust stack pointer and then
touch pages in order. This kind of asks for double-fault, because
if touch fails, then signal will be delivered to frame above adjusted
stack pointer. But touching pages prior adjusting stack pointer would
upset valgrind. As compromise let's adjust stack pointer in pages,
touching top of the stack. This still asks for double-fault, but at
least prevents corruption of neighbour stack if allocation is to
overstep the guard page.

Also omit predict-non-taken hints as they reportedly trigger illegal
instructions in some VM setups.

Change-Id: Ife42935319de79c6c76f8df60a76204c546fd1e0
Reviewed-on: https://boringssl-review.googlesource.com/13775
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:14:21 +00:00
Adam Langley
c948d46569 Remove trailing whitespace from Perl files.
Upstream did this in 609b0852e4d50251857dbbac3141ba042e35a9ae and it's
easier to apply patches if we do also.

Change-Id: I5142693ed1e26640987ff16f5ea510e81bba200e
Reviewed-on: https://boringssl-review.googlesource.com/13771
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:13:55 +00:00
Adam Langley
073a06d3da On Windows, page walking is known as __chkstk.
(Imports upstream's 0a86f668212acfa6b48abacbc17b99c234eedf33.)

Change-Id: Ie31d99f8cc3e93b6a9c7c5daa066de96941b3f7c
Reviewed-on: https://boringssl-review.googlesource.com/13770
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:13:50 +00:00
Adam Langley
b8344501d3 Explain *cough*-dows
(Imports upstream's 1bf80d93024e72628d4351c7ad19c0dfe635aa95.)

Change-Id: If1d61336edc7f63cdfd8ac14157376bde2651a31
Reviewed-on: https://boringssl-review.googlesource.com/13769
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:13:44 +00:00
Adam Langley
edcd8fda65 bn/asm/x86[_64]-mont*.pl: complement alloca with page-walking.
(Imports upstream's adc4f1fc25b2cac90076f1e1695b05b7aeeae501.)

Some OSes, *cough*-dows, insist on stack being "wired" to
physical memory in strictly sequential manner, i.e. if stack
allocation spans two pages, then reference to farmost one can
be punishable by SEGV. But page walking can do good even on
other OSes, because it guarantees that villain thread hits
the guard page before it can make damage to innocent one...

Change-Id: Ie1e278eb5982f26e596783b3d7820a71295688ec
Reviewed-on: https://boringssl-review.googlesource.com/13768
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2017-02-14 00:13:38 +00:00
David Benjamin
d103616db1 bn/asm/x86_64-mont5.pl: fix carry bug in bn_sqr8x_internal.
CVE-2017-3732

(Imported from upstream's 3f4bcf5bb664b47ed369a70b99fac4e0ad141bb3 and
3e7a496307ab1174c1f8f64eed4454c1c9cde1a8.)

Change-Id: I40255fdf4184e3b919758a72c3d3a7486d91ff65
Reviewed-on: https://boringssl-review.googlesource.com/13360
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-01-26 18:29:44 +00:00
David Benjamin
fdd8e9c8c7 Switch perlasm calling convention.
Depending on architecture, perlasm differed on which one or both of:

  perl foo.pl flavor output.S
  perl foo.pl flavor > output.S

Upstream has now unified on the first form after making a number of
changes to their files (the second does not even work for their x86
files anymore). Sync those portions of our perlasm scripts with upstream
and update CMakeLists.txt and generate_build_files.py per the new
convention.

This imports various commits like this one:
184bc45f683c76531d7e065b6553ca9086564576 (this was done by taking a
diff, so I don't have the full list)

Confirmed that generate_build_files.py sees no change.

BUG=14

Change-Id: Id2fb5b8bc2a7369d077221b5df9a6947d41f50d2
Reviewed-on: https://boringssl-review.googlesource.com/8518
Reviewed-by: Adam Langley <agl@google.com>
2016-06-27 21:59:26 +00:00
Adam Langley
df2a5562f3 bn/asm/x86_64-mont5.pl: unify gather procedure in hardly used path and reorganize/harmonize post-conditions.
(Imported from upstream's 515f3be47a0b58eec808cf365bc5e8ef6917266b)

Additional hardening following on from CVE-2016-0702.

Change-Id: I19a6739b401887a42eb335fe5838379dc8d04100
Reviewed-on: https://boringssl-review.googlesource.com/7245
Reviewed-by: Adam Langley <agl@google.com>
2016-03-01 18:04:20 +00:00
Adam Langley
b360eaf001 crypto/bn/x86_64-mont5.pl: constant-time gather procedure.
(Imported from upstream's 25d14c6c29b53907bf614b9964d43cd98401a7fc.)

At the same time remove miniscule bias in final subtraction. Performance
penalty varies from platform to platform, and even with key length. For
rsa2048 sign it was observed to be 4% for Sandy Bridge and 7% on
Broadwell.

(This is part of the fix for CVE-2016-0702.)

Change-Id: I43a13d592c4a589d04c17c33c0ca40c2d7375522
Reviewed-on: https://boringssl-review.googlesource.com/7244
Reviewed-by: Adam Langley <agl@google.com>
2016-03-01 18:04:15 +00:00
David Benjamin
6d9e5a7448 Re-apply 75b833cc81
I messed up and missed that we were carrying a diff on x86_64-mont5.pl. This
was accidentally dropped in https://boringssl-review.googlesource.com/6616.

To confirm the merge is good now, check out at this revision and run:

  git diff e701f16bd69b6f251ed537e40364c281e85a63b2^ crypto/bn/asm/x86_64-mont5.pl > /tmp/A

Then in OpenSSL's repository:

  git diff d73cc256c8e256c32ed959456101b73ba9842f72^ d73cc256c8e256c32ed959456101b73ba9842f72 crypto/bn/asm/x86_64-mont5.pl  > /tmp/B

And confirm the diffs vary in only metadata:

  diff -u /tmp/A /tmp/B

--- /tmp/A	2015-12-03 11:53:23.127034998 -0500
+++ /tmp/B	2015-12-03 11:53:53.099314287 -0500
@@ -1,8 +1,8 @@
 diff --git a/crypto/bn/asm/x86_64-mont5.pl b/crypto/bn/asm/x86_64-mont5.pl
-index 38def07..3c5a8fc 100644
+index 388e3c6..64e668f 100755
 --- a/crypto/bn/asm/x86_64-mont5.pl
 +++ b/crypto/bn/asm/x86_64-mont5.pl
-@@ -1770,6 +1770,15 @@ sqr8x_reduction:
+@@ -1784,6 +1784,15 @@ sqr8x_reduction:
  .align	32
  .L8x_tail_done:
  	add	(%rdx),%r8		# can this overflow?
@@ -18,7 +18,7 @@
  	xor	%rax,%rax

  	neg	$carry
-@@ -3116,6 +3125,15 @@ sqrx8x_reduction:
+@@ -3130,6 +3139,15 @@ sqrx8x_reduction:
  .align	32
  .Lsqrx8x_tail_done:
  	add	24+8(%rsp),%r8		# can this overflow?
@@ -34,7 +34,7 @@
  	mov	$carry,%rax		# xor	%rax,%rax

  	sub	16+8(%rsp),$carry	# mov 16(%rsp),%cf
-@@ -3159,13 +3177,11 @@ my ($rptr,$nptr)=("%rdx","%rbp");
+@@ -3173,13 +3191,11 @@ my ($rptr,$nptr)=("%rdx","%rbp");
  my @ri=map("%r$_",(10..13));
  my @ni=map("%r$_",(14..15));
  $code.=<<___;

Change-Id: I3fb5253783ed82e4831f5bffde75273bd9609c23
Reviewed-on: https://boringssl-review.googlesource.com/6618
Reviewed-by: Adam Langley <agl@google.com>
2015-12-03 17:25:12 +00:00
David Benjamin
e701f16bd6 bn/asm/x86_64-mont5.pl: fix carry propagating bug (CVE-2015-3193).
(Imported from upstream's d73cc256c8e256c32ed959456101b73ba9842f72.)

Change-Id: I673301fee57f0ab5bef24553caf8b2aac67fb3a9
Reviewed-on: https://boringssl-review.googlesource.com/6616
Reviewed-by: Adam Langley <agl@google.com>
2015-12-03 16:44:35 +00:00
David Benjamin
278d34234f Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change or if developers do or don't have CC variables set.

Previously, all compiler-version-gated features were turned on in
https://boringssl-review.googlesource.com/6260, but this broke the build. I
also wasn't thorough enough in gathering performance numbers. So, flip them all
to off instead. I'll enable them one-by-one as they're tested.

This should result in no change to generated assembly.

Change-Id: Ib4259b3f97adc4939cb0557c5580e8def120d5bc
Reviewed-on: https://boringssl-review.googlesource.com/6383
Reviewed-by: Adam Langley <agl@google.com>
2015-10-28 19:33:04 +00:00
David Benjamin
75885e29c4 Revert "Get rid of all compiler version checks in perlasm files."
This reverts commit b9c26014de.

The win64 bot seems unhappy. Will sniff at it tomorrow. In
the meantime, get the tree green again.

Change-Id: I058ddb3ec549beee7eabb2f3f72feb0a4a5143b2
Reviewed-on: https://boringssl-review.googlesource.com/6353
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 23:12:39 +00:00
David Benjamin
b9c26014de Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change.

Enable all compiler-version-gated features as they should all be runtime-gated
anyway. This should align with what upstream's files would have produced on
modern toolschains. We should assume our assemblers can take whatever we'd like
to throw at them. (If it turns out some can't, we'd rather find out and
probably switch the problematic instructions to explicit byte sequences.)

This actually results in a fairly significant change to the assembly we
generate. I'm guessing upstream's buildsystem sets the CC environment variable,
while ours doesn't and so the version checks were all coming out conservative.

diffstat of generated files:

 linux-x86/crypto/sha/sha1-586.S              | 1176 ++++++++++++
 linux-x86/crypto/sha/sha256-586.S            | 2248 ++++++++++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-avx2.S           | 1644 +++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-x86_64.S         |  638 ++++++
 linux-x86_64/crypto/bn/x86_64-mont.S         |  332 +++
 linux-x86_64/crypto/bn/x86_64-mont5.S        | 1130 ++++++++++++
 linux-x86_64/crypto/modes/aesni-gcm-x86_64.S |  754 ++++++++
 linux-x86_64/crypto/modes/ghash-x86_64.S     |  475 +++++
 linux-x86_64/crypto/sha/sha1-x86_64.S        | 1121 ++++++++++++
 linux-x86_64/crypto/sha/sha256-x86_64.S      | 1062 +++++++++++
 linux-x86_64/crypto/sha/sha512-x86_64.S      | 2241 ++++++++++++++++++++++++
 mac-x86/crypto/sha/sha1-586.S                | 1174 ++++++++++++
 mac-x86/crypto/sha/sha256-586.S              | 2248 ++++++++++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-avx2.S             | 1637 +++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-x86_64.S           |  638 ++++++
 mac-x86_64/crypto/bn/x86_64-mont.S           |  331 +++
 mac-x86_64/crypto/bn/x86_64-mont5.S          | 1130 ++++++++++++
 mac-x86_64/crypto/modes/aesni-gcm-x86_64.S   |  750 ++++++++
 mac-x86_64/crypto/modes/ghash-x86_64.S       |  475 +++++
 mac-x86_64/crypto/sha/sha1-x86_64.S          | 1121 ++++++++++++
 mac-x86_64/crypto/sha/sha256-x86_64.S        | 1062 +++++++++++
 mac-x86_64/crypto/sha/sha512-x86_64.S        | 2241 ++++++++++++++++++++++++
 win-x86/crypto/sha/sha1-586.asm              | 1173 ++++++++++++
 win-x86/crypto/sha/sha256-586.asm            | 2248 ++++++++++++++++++++++++
 win-x86_64/crypto/bn/rsaz-avx2.asm           | 1858 +++++++++++++++++++-
 win-x86_64/crypto/bn/rsaz-x86_64.asm         |  638 ++++++
 win-x86_64/crypto/bn/x86_64-mont.asm         |  352 +++
 win-x86_64/crypto/bn/x86_64-mont5.asm        | 1184 ++++++++++++
 win-x86_64/crypto/modes/aesni-gcm-x86_64.asm |  933 ++++++++++
 win-x86_64/crypto/modes/ghash-x86_64.asm     |  515 +++++
 win-x86_64/crypto/sha/sha1-x86_64.asm        | 1152 ++++++++++++
 win-x86_64/crypto/sha/sha256-x86_64.asm      | 1088 +++++++++++
 win-x86_64/crypto/sha/sha512-x86_64.asm      | 2499 ++++++

SHA* gets faster. RSA and AES-GCM seem to be more of a wash and even slower
sometimes!  This is a little concerning. Though when I repeated the latter two,
it's definitely noisy (RSA in particular), so we may wish to repeat in a more
controlled environment. We could also flip some of these toggles to something
other than the highest setting if it seems some of the variants aren't
desirable. We just shouldn't have them enabled or disabled on accident. This
aligns us closer to upstream though.

$ /tmp/bssl.old speed SHA-
Did 5028000 SHA-1 (16 bytes) operations in 1000048us (5027758.7 ops/sec): 80.4 MB/s
Did 1708000 SHA-1 (256 bytes) operations in 1000257us (1707561.2 ops/sec): 437.1 MB/s
Did 73000 SHA-1 (8192 bytes) operations in 1008406us (72391.5 ops/sec): 593.0 MB/s
Did 3041000 SHA-256 (16 bytes) operations in 1000311us (3040054.5 ops/sec): 48.6 MB/s
Did 779000 SHA-256 (256 bytes) operations in 1000820us (778361.7 ops/sec): 199.3 MB/s
Did 26000 SHA-256 (8192 bytes) operations in 1009875us (25745.8 ops/sec): 210.9 MB/s
Did 1837000 SHA-512 (16 bytes) operations in 1000251us (1836539.0 ops/sec): 29.4 MB/s
Did 803000 SHA-512 (256 bytes) operations in 1000969us (802222.6 ops/sec): 205.4 MB/s
Did 41000 SHA-512 (8192 bytes) operations in 1016768us (40323.8 ops/sec): 330.3 MB/s
$ /tmp/bssl.new speed SHA-
Did 5354000 SHA-1 (16 bytes) operations in 1000104us (5353443.2 ops/sec): 85.7 MB/s
Did 1779000 SHA-1 (256 bytes) operations in 1000121us (1778784.8 ops/sec): 455.4 MB/s
Did 87000 SHA-1 (8192 bytes) operations in 1012641us (85914.0 ops/sec): 703.8 MB/s
Did 3517000 SHA-256 (16 bytes) operations in 1000114us (3516599.1 ops/sec): 56.3 MB/s
Did 935000 SHA-256 (256 bytes) operations in 1000096us (934910.2 ops/sec): 239.3 MB/s
Did 38000 SHA-256 (8192 bytes) operations in 1004476us (37830.7 ops/sec): 309.9 MB/s
Did 2930000 SHA-512 (16 bytes) operations in 1000259us (2929241.3 ops/sec): 46.9 MB/s
Did 1008000 SHA-512 (256 bytes) operations in 1000509us (1007487.2 ops/sec): 257.9 MB/s
Did 45000 SHA-512 (8192 bytes) operations in 1000593us (44973.3 ops/sec): 368.4 MB/s

$ /tmp/bssl.old speed RSA
Did 820 RSA 2048 signing operations in 1017008us (806.3 ops/sec)
Did 27000 RSA 2048 verify operations in 1015400us (26590.5 ops/sec)
Did 1292 RSA 2048 (3 prime, e=3) signing operations in 1008185us (1281.5 ops/sec)
Did 65000 RSA 2048 (3 prime, e=3) verify operations in 1011388us (64268.1 ops/sec)
Did 120 RSA 4096 signing operations in 1061027us (113.1 ops/sec)
Did 8208 RSA 4096 verify operations in 1002717us (8185.8 ops/sec)
$ /tmp/bssl.new speed RSA
Did 760 RSA 2048 signing operations in 1003351us (757.5 ops/sec)
Did 25900 RSA 2048 verify operations in 1028931us (25171.8 ops/sec)
Did 1320 RSA 2048 (3 prime, e=3) signing operations in 1040806us (1268.2 ops/sec)
Did 63000 RSA 2048 (3 prime, e=3) verify operations in 1016042us (62005.3 ops/sec)
Did 104 RSA 4096 signing operations in 1008718us (103.1 ops/sec)
Did 6875 RSA 4096 verify operations in 1093441us (6287.5 ops/sec)

$ /tmp/bssl.old speed GCM
Did 5316000 AES-128-GCM (16 bytes) seal operations in 1000082us (5315564.1 ops/sec): 85.0 MB/s
Did 712000 AES-128-GCM (1350 bytes) seal operations in 1000252us (711820.6 ops/sec): 961.0 MB/s
Did 149000 AES-128-GCM (8192 bytes) seal operations in 1003182us (148527.4 ops/sec): 1216.7 MB/s
Did 5919750 AES-256-GCM (16 bytes) seal operations in 1000016us (5919655.3 ops/sec): 94.7 MB/s
Did 800000 AES-256-GCM (1350 bytes) seal operations in 1000951us (799239.9 ops/sec): 1079.0 MB/s
Did 152000 AES-256-GCM (8192 bytes) seal operations in 1000765us (151883.8 ops/sec): 1244.2 MB/s
$ /tmp/bssl.new speed GCM
Did 5315000 AES-128-GCM (16 bytes) seal operations in 1000125us (5314335.7 ops/sec): 85.0 MB/s
Did 755000 AES-128-GCM (1350 bytes) seal operations in 1000878us (754337.7 ops/sec): 1018.4 MB/s
Did 151000 AES-128-GCM (8192 bytes) seal operations in 1005655us (150150.9 ops/sec): 1230.0 MB/s
Did 5913500 AES-256-GCM (16 bytes) seal operations in 1000041us (5913257.6 ops/sec): 94.6 MB/s
Did 782000 AES-256-GCM (1350 bytes) seal operations in 1001484us (780841.2 ops/sec): 1054.1 MB/s
Did 121000 AES-256-GCM (8192 bytes) seal operations in 1006389us (120231.8 ops/sec): 984.9 MB/s

Change-Id: I0efb32f896c597abc7d7e55c31d038528a5c72a1
Reviewed-on: https://boringssl-review.googlesource.com/6260
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 20:31:30 +00:00
David Benjamin
bf681a40d6 Fix out-of-bounds read in BN_mod_exp_mont_consttime.
bn_get_bits5 always reads two bytes, even when it doesn't need to. For some
sizes of |p|, this can result in reading just past the edge of the array.
Unroll the first iteration of the loop and avoid reading out of bounds.

Replace bn_get_bits5 altogether in C as it's not doing anything interesting.

Change-Id: Ibcc8cea7d9c644a2639445396455da47fe869a5c
Reviewed-on: https://boringssl-review.googlesource.com/1393
Reviewed-by: Adam Langley <agl@google.com>
2014-08-06 00:11:47 +00:00
Adam Langley
4b5979b3fa x86_64 assembly pack: improve masm support.
(Imported from upstream's 371feee876dd8b58531cb6e50fe79262db8e4ed7)

Change-Id: Id3b5ece6b5e5f0565060d5e598ea265d64dac9df
2014-07-28 17:05:13 -07:00
Adam Langley
75b833cc81 OpenSSL: make final reduction in Montgomery multiplication constant-time.
(The issue was reported by Shay Gueron.)

The final reduction in Montgomery multiplication computes if (X >= m) then X =
X - m else X = X

In OpenSSL, this was done by computing T = X - m,  doing a constant-time
selection of the *addresses* of X and T, and loading from the resulting
address. But this is not cache-neutral.

This patch changes the behaviour by loading both X and T into registers, and
doing a constant-time selection of the *values*.

TODO(fork): only some of the fixes from the original patch still apply to
the 1.0.2 code.
2014-06-20 13:17:33 -07:00
Adam Langley
95c29f3cd1 Inital import.
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).

(This change contains substantial changes from the original and
effectively starts a new history.)
2014-06-20 13:17:32 -07:00