Commit Graph

14 Commits

Author SHA1 Message Date
David Benjamin
fdd8e9c8c7 Switch perlasm calling convention.
Depending on architecture, perlasm differed on which one or both of:

  perl foo.pl flavor output.S
  perl foo.pl flavor > output.S

Upstream has now unified on the first form after making a number of
changes to their files (the second does not even work for their x86
files anymore). Sync those portions of our perlasm scripts with upstream
and update CMakeLists.txt and generate_build_files.py per the new
convention.

This imports various commits like this one:
184bc45f683c76531d7e065b6553ca9086564576 (this was done by taking a
diff, so I don't have the full list)

Confirmed that generate_build_files.py sees no change.

BUG=14

Change-Id: Id2fb5b8bc2a7369d077221b5df9a6947d41f50d2
Reviewed-on: https://boringssl-review.googlesource.com/8518
Reviewed-by: Adam Langley <agl@google.com>
2016-06-27 21:59:26 +00:00
David Benjamin
ce7ae6fa27 Enable AVX code for SHA-*.
SHA-1, SHA-256, and SHA-512 get a 12-26%, 17-23%, and 33-37% improvement,
respectively on x86-64. SHA-1 and SHA-256 get a 8-20% and 14-17% improvement on
x86. (x86 does not have AVX code for SHA-512.) This costs us 12k of binary size
on x86-64 and 8k of binary size on x86.

$ bssl speed SHA- (x86-64, before)
Did 4811000 SHA-1 (16 bytes) operations in 1000013us (4810937.5 ops/sec): 77.0 MB/s
Did 1414000 SHA-1 (256 bytes) operations in 1000253us (1413642.3 ops/sec): 361.9 MB/s
Did 56000 SHA-1 (8192 bytes) operations in 1002640us (55852.5 ops/sec): 457.5 MB/s
Did 2536000 SHA-256 (16 bytes) operations in 1000140us (2535645.0 ops/sec): 40.6 MB/s
Did 603000 SHA-256 (256 bytes) operations in 1001613us (602028.9 ops/sec): 154.1 MB/s
Did 25000 SHA-256 (8192 bytes) operations in 1010132us (24749.2 ops/sec): 202.7 MB/s
Did 1767000 SHA-512 (16 bytes) operations in 1000477us (1766157.5 ops/sec): 28.3 MB/s
Did 638000 SHA-512 (256 bytes) operations in 1000933us (637405.3 ops/sec): 163.2 MB/s
Did 32000 SHA-512 (8192 bytes) operations in 1025646us (31199.8 ops/sec): 255.6 MB/s

$ bssl speed SHA- (x86-64, after)
Did 5438000 SHA-1 (16 bytes) operations in 1000060us (5437673.7 ops/sec): 87.0 MB/s
Did 1590000 SHA-1 (256 bytes) operations in 1000181us (1589712.3 ops/sec): 407.0 MB/s
Did 71000 SHA-1 (8192 bytes) operations in 1007958us (70439.4 ops/sec): 577.0 MB/s
Did 2955000 SHA-256 (16 bytes) operations in 1000251us (2954258.5 ops/sec): 47.3 MB/s
Did 740000 SHA-256 (256 bytes) operations in 1000628us (739535.6 ops/sec): 189.3 MB/s
Did 31000 SHA-256 (8192 bytes) operations in 1019619us (30403.5 ops/sec): 249.1 MB/s
Did 2348000 SHA-512 (16 bytes) operations in 1000285us (2347331.0 ops/sec): 37.6 MB/s
Did 878000 SHA-512 (256 bytes) operations in 1001064us (877066.8 ops/sec): 224.5 MB/s
Did 43000 SHA-512 (8192 bytes) operations in 1002485us (42893.4 ops/sec): 351.4 MB/s

$ bssl speed SHA- (x86, before, SHA-512 redacted because irrelevant)
Did 4319000 SHA-1 (16 bytes) operations in 1000066us (4318715.0 ops/sec): 69.1 MB/s
Did 1306000 SHA-1 (256 bytes) operations in 1000437us (1305429.5 ops/sec): 334.2 MB/s
Did 58000 SHA-1 (8192 bytes) operations in 1014807us (57153.7 ops/sec): 468.2 MB/s
Did 2291000 SHA-256 (16 bytes) operations in 1000343us (2290214.5 ops/sec): 36.6 MB/s
Did 594000 SHA-256 (256 bytes) operations in 1000684us (593594.0 ops/sec): 152.0 MB/s
Did 25000 SHA-256 (8192 bytes) operations in 1030688us (24255.6 ops/sec): 198.7 MB/s

$ bssl speed SHA- (x86, after, SHA-512 redacted because irrelevant)
Did 4673000 SHA-1 (16 bytes) operations in 1000063us (4672705.6 ops/sec): 74.8 MB/s
Did 1484000 SHA-1 (256 bytes) operations in 1000453us (1483328.1 ops/sec): 379.7 MB/s
Did 69000 SHA-1 (8192 bytes) operations in 1008305us (68431.7 ops/sec): 560.6 MB/s
Did 2684000 SHA-256 (16 bytes) operations in 1000196us (2683474.0 ops/sec): 42.9 MB/s
Did 679000 SHA-256 (256 bytes) operations in 1000525us (678643.7 ops/sec): 173.7 MB/s
Did 29000 SHA-256 (8192 bytes) operations in 1033251us (28066.8 ops/sec): 229.9 MB/s

Change-Id: I952a3b4fc4c52ebb50690da3b8c97770e8342e98
Reviewed-on: https://boringssl-review.googlesource.com/6470
Reviewed-by: Adam Langley <agl@google.com>
2015-11-12 20:03:32 +00:00
David Benjamin
278d34234f Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change or if developers do or don't have CC variables set.

Previously, all compiler-version-gated features were turned on in
https://boringssl-review.googlesource.com/6260, but this broke the build. I
also wasn't thorough enough in gathering performance numbers. So, flip them all
to off instead. I'll enable them one-by-one as they're tested.

This should result in no change to generated assembly.

Change-Id: Ib4259b3f97adc4939cb0557c5580e8def120d5bc
Reviewed-on: https://boringssl-review.googlesource.com/6383
Reviewed-by: Adam Langley <agl@google.com>
2015-10-28 19:33:04 +00:00
David Benjamin
75885e29c4 Revert "Get rid of all compiler version checks in perlasm files."
This reverts commit b9c26014de.

The win64 bot seems unhappy. Will sniff at it tomorrow. In
the meantime, get the tree green again.

Change-Id: I058ddb3ec549beee7eabb2f3f72feb0a4a5143b2
Reviewed-on: https://boringssl-review.googlesource.com/6353
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 23:12:39 +00:00
David Benjamin
b9c26014de Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change.

Enable all compiler-version-gated features as they should all be runtime-gated
anyway. This should align with what upstream's files would have produced on
modern toolschains. We should assume our assemblers can take whatever we'd like
to throw at them. (If it turns out some can't, we'd rather find out and
probably switch the problematic instructions to explicit byte sequences.)

This actually results in a fairly significant change to the assembly we
generate. I'm guessing upstream's buildsystem sets the CC environment variable,
while ours doesn't and so the version checks were all coming out conservative.

diffstat of generated files:

 linux-x86/crypto/sha/sha1-586.S              | 1176 ++++++++++++
 linux-x86/crypto/sha/sha256-586.S            | 2248 ++++++++++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-avx2.S           | 1644 +++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-x86_64.S         |  638 ++++++
 linux-x86_64/crypto/bn/x86_64-mont.S         |  332 +++
 linux-x86_64/crypto/bn/x86_64-mont5.S        | 1130 ++++++++++++
 linux-x86_64/crypto/modes/aesni-gcm-x86_64.S |  754 ++++++++
 linux-x86_64/crypto/modes/ghash-x86_64.S     |  475 +++++
 linux-x86_64/crypto/sha/sha1-x86_64.S        | 1121 ++++++++++++
 linux-x86_64/crypto/sha/sha256-x86_64.S      | 1062 +++++++++++
 linux-x86_64/crypto/sha/sha512-x86_64.S      | 2241 ++++++++++++++++++++++++
 mac-x86/crypto/sha/sha1-586.S                | 1174 ++++++++++++
 mac-x86/crypto/sha/sha256-586.S              | 2248 ++++++++++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-avx2.S             | 1637 +++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-x86_64.S           |  638 ++++++
 mac-x86_64/crypto/bn/x86_64-mont.S           |  331 +++
 mac-x86_64/crypto/bn/x86_64-mont5.S          | 1130 ++++++++++++
 mac-x86_64/crypto/modes/aesni-gcm-x86_64.S   |  750 ++++++++
 mac-x86_64/crypto/modes/ghash-x86_64.S       |  475 +++++
 mac-x86_64/crypto/sha/sha1-x86_64.S          | 1121 ++++++++++++
 mac-x86_64/crypto/sha/sha256-x86_64.S        | 1062 +++++++++++
 mac-x86_64/crypto/sha/sha512-x86_64.S        | 2241 ++++++++++++++++++++++++
 win-x86/crypto/sha/sha1-586.asm              | 1173 ++++++++++++
 win-x86/crypto/sha/sha256-586.asm            | 2248 ++++++++++++++++++++++++
 win-x86_64/crypto/bn/rsaz-avx2.asm           | 1858 +++++++++++++++++++-
 win-x86_64/crypto/bn/rsaz-x86_64.asm         |  638 ++++++
 win-x86_64/crypto/bn/x86_64-mont.asm         |  352 +++
 win-x86_64/crypto/bn/x86_64-mont5.asm        | 1184 ++++++++++++
 win-x86_64/crypto/modes/aesni-gcm-x86_64.asm |  933 ++++++++++
 win-x86_64/crypto/modes/ghash-x86_64.asm     |  515 +++++
 win-x86_64/crypto/sha/sha1-x86_64.asm        | 1152 ++++++++++++
 win-x86_64/crypto/sha/sha256-x86_64.asm      | 1088 +++++++++++
 win-x86_64/crypto/sha/sha512-x86_64.asm      | 2499 ++++++

SHA* gets faster. RSA and AES-GCM seem to be more of a wash and even slower
sometimes!  This is a little concerning. Though when I repeated the latter two,
it's definitely noisy (RSA in particular), so we may wish to repeat in a more
controlled environment. We could also flip some of these toggles to something
other than the highest setting if it seems some of the variants aren't
desirable. We just shouldn't have them enabled or disabled on accident. This
aligns us closer to upstream though.

$ /tmp/bssl.old speed SHA-
Did 5028000 SHA-1 (16 bytes) operations in 1000048us (5027758.7 ops/sec): 80.4 MB/s
Did 1708000 SHA-1 (256 bytes) operations in 1000257us (1707561.2 ops/sec): 437.1 MB/s
Did 73000 SHA-1 (8192 bytes) operations in 1008406us (72391.5 ops/sec): 593.0 MB/s
Did 3041000 SHA-256 (16 bytes) operations in 1000311us (3040054.5 ops/sec): 48.6 MB/s
Did 779000 SHA-256 (256 bytes) operations in 1000820us (778361.7 ops/sec): 199.3 MB/s
Did 26000 SHA-256 (8192 bytes) operations in 1009875us (25745.8 ops/sec): 210.9 MB/s
Did 1837000 SHA-512 (16 bytes) operations in 1000251us (1836539.0 ops/sec): 29.4 MB/s
Did 803000 SHA-512 (256 bytes) operations in 1000969us (802222.6 ops/sec): 205.4 MB/s
Did 41000 SHA-512 (8192 bytes) operations in 1016768us (40323.8 ops/sec): 330.3 MB/s
$ /tmp/bssl.new speed SHA-
Did 5354000 SHA-1 (16 bytes) operations in 1000104us (5353443.2 ops/sec): 85.7 MB/s
Did 1779000 SHA-1 (256 bytes) operations in 1000121us (1778784.8 ops/sec): 455.4 MB/s
Did 87000 SHA-1 (8192 bytes) operations in 1012641us (85914.0 ops/sec): 703.8 MB/s
Did 3517000 SHA-256 (16 bytes) operations in 1000114us (3516599.1 ops/sec): 56.3 MB/s
Did 935000 SHA-256 (256 bytes) operations in 1000096us (934910.2 ops/sec): 239.3 MB/s
Did 38000 SHA-256 (8192 bytes) operations in 1004476us (37830.7 ops/sec): 309.9 MB/s
Did 2930000 SHA-512 (16 bytes) operations in 1000259us (2929241.3 ops/sec): 46.9 MB/s
Did 1008000 SHA-512 (256 bytes) operations in 1000509us (1007487.2 ops/sec): 257.9 MB/s
Did 45000 SHA-512 (8192 bytes) operations in 1000593us (44973.3 ops/sec): 368.4 MB/s

$ /tmp/bssl.old speed RSA
Did 820 RSA 2048 signing operations in 1017008us (806.3 ops/sec)
Did 27000 RSA 2048 verify operations in 1015400us (26590.5 ops/sec)
Did 1292 RSA 2048 (3 prime, e=3) signing operations in 1008185us (1281.5 ops/sec)
Did 65000 RSA 2048 (3 prime, e=3) verify operations in 1011388us (64268.1 ops/sec)
Did 120 RSA 4096 signing operations in 1061027us (113.1 ops/sec)
Did 8208 RSA 4096 verify operations in 1002717us (8185.8 ops/sec)
$ /tmp/bssl.new speed RSA
Did 760 RSA 2048 signing operations in 1003351us (757.5 ops/sec)
Did 25900 RSA 2048 verify operations in 1028931us (25171.8 ops/sec)
Did 1320 RSA 2048 (3 prime, e=3) signing operations in 1040806us (1268.2 ops/sec)
Did 63000 RSA 2048 (3 prime, e=3) verify operations in 1016042us (62005.3 ops/sec)
Did 104 RSA 4096 signing operations in 1008718us (103.1 ops/sec)
Did 6875 RSA 4096 verify operations in 1093441us (6287.5 ops/sec)

$ /tmp/bssl.old speed GCM
Did 5316000 AES-128-GCM (16 bytes) seal operations in 1000082us (5315564.1 ops/sec): 85.0 MB/s
Did 712000 AES-128-GCM (1350 bytes) seal operations in 1000252us (711820.6 ops/sec): 961.0 MB/s
Did 149000 AES-128-GCM (8192 bytes) seal operations in 1003182us (148527.4 ops/sec): 1216.7 MB/s
Did 5919750 AES-256-GCM (16 bytes) seal operations in 1000016us (5919655.3 ops/sec): 94.7 MB/s
Did 800000 AES-256-GCM (1350 bytes) seal operations in 1000951us (799239.9 ops/sec): 1079.0 MB/s
Did 152000 AES-256-GCM (8192 bytes) seal operations in 1000765us (151883.8 ops/sec): 1244.2 MB/s
$ /tmp/bssl.new speed GCM
Did 5315000 AES-128-GCM (16 bytes) seal operations in 1000125us (5314335.7 ops/sec): 85.0 MB/s
Did 755000 AES-128-GCM (1350 bytes) seal operations in 1000878us (754337.7 ops/sec): 1018.4 MB/s
Did 151000 AES-128-GCM (8192 bytes) seal operations in 1005655us (150150.9 ops/sec): 1230.0 MB/s
Did 5913500 AES-256-GCM (16 bytes) seal operations in 1000041us (5913257.6 ops/sec): 94.6 MB/s
Did 782000 AES-256-GCM (1350 bytes) seal operations in 1001484us (780841.2 ops/sec): 1054.1 MB/s
Did 121000 AES-256-GCM (8192 bytes) seal operations in 1006389us (120231.8 ops/sec): 984.9 MB/s

Change-Id: I0efb32f896c597abc7d7e55c31d038528a5c72a1
Reviewed-on: https://boringssl-review.googlesource.com/6260
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 20:31:30 +00:00
David Benjamin
e189c86bc7 Consistently disable the Intel SHA Extensions code.
We haven't tested it yet, but it was only disabled on 64-bit. Disable it on
32-bit as well until we're ready to turn it on.

Change-Id: I50e74aef2c5c3ba539a868c2bb6fb90fdf28a5f0
Reviewed-on: https://boringssl-review.googlesource.com/6271
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 20:27:52 +00:00
David Benjamin
a3a80b23eb Convert remaining Latin-1 files to UTF-8.
See upstream's 9f0b86c68bb96d49301bbd6473c8235ca05ca06b. Generated by
using upstream's script in 5a3ce86e21715a683ff0d32421ed5c6d5e84234d and
then manually throwing out the false positives. (We converted a bunch of
stuff already in 91157550061d5d794898fe47b95384a7ba5f7b9d.)

This may require some wrestling with depot_tools to land in Chromium due
to Rietveld's encoding bugs, but hopefully that will avoid future
problems; Rietveld breaks if either old or new file is Latin-1.

Change-Id: I26dcb20c7377f92a0c843ef5d74d440a82ea8ceb
Reviewed-on: https://boringssl-review.googlesource.com/5483
Reviewed-by: Adam Langley <agl@google.com>
2015-07-29 19:22:55 +00:00
David Benjamin
2b48d6b7dd sha/asm/sha1-586.pl: fix typo.
The typo doesn't affect supported configuration, only unsupported masm.

(Imported from upstream's 3372c4fffa0556a688f8f1f550b095051398f596)

Change-Id: Ib6a2f1d9f6fc244a33da1e079188acdf69d5e2f3
Reviewed-on: https://boringssl-review.googlesource.com/3580
Reviewed-by: Adam Langley <agl@google.com>
2015-02-23 19:44:50 +00:00
Adam Langley
3dfbcc1f25 x86[_64] assembly pack: add Silvermont performance data.
(Imported from upstream's 9dd6240201fdd9a9a0ce2aa66df04c174d08cf99)

Change-Id: Ie0f6f876e06ac28c717ec949565f6b0126166b30
2014-11-10 13:45:32 -08:00
Adam Langley
2811da2eca x86_64 assembly pack: allow clang to compile AVX code.
(Imported from upstream's 912f08dd5ed4f68fb275f3b2db828349fcffba14,
52f856526c46ee80ef4c8c37844f084423a3eff7 and
377551b9c4e12aa7846f4d80cf3604f2e396c964)

Change-Id: Ic2bf93371f6d246818729810e7a45b3f0021845a
2014-07-28 17:05:13 -07:00
Adam Langley
006779a02c Add benchmarks for hash functions to bssl speed. 2014-06-20 13:17:42 -07:00
Adam Langley
cb5dd63e5e Add support for Intel SHA extension.
(Imported from upstream's 70fddbe32a7b3400a6ad0a9265f2c0ed72988d27)
2014-06-20 13:17:42 -07:00
Adam Langley
5c6ca976c8 Update SHA asm from master.
(Imported from upstream's 729d334106e6ef3a2b2f4f9cb2520669a07ae79d)
2014-06-20 13:17:37 -07:00
Adam Langley
95c29f3cd1 Inital import.
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).

(This change contains substantial changes from the original and
effectively starts a new history.)
2014-06-20 13:17:32 -07:00