Commit Graph

9 Commits

Author SHA1 Message Date
David Benjamin
aff72a3805 Add the start of standalone iOS build support.
The built-in CMake support seems to basically work, though it believes
you want to build a fat binary which doesn't work with how we build
perlasm. (We'd need to stop conditioning on CMAKE_SYSTEM_PROCESSOR at
all, wrap all the generated assembly files in ifdefs, and convince the
build to emit more than one. Probably not worth bothering for now.)

We still, of course, need to actually test the assembly on iOS before
this can be shipped anywhere.

BUG=48

Change-Id: I6ae71d98d706be03142b82f7844d1c9b02a2b832
Reviewed-on: https://boringssl-review.googlesource.com/14645
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-04-07 17:13:44 +00:00
David Benjamin
3c4a5cbb71 Revert "Enable upstream's Poly1305 code."
This reverts commit 6f0c4db90e except for the
imported assembly files, which are left as-is but unused. Until upstream fixes
https://rt.openssl.org/Ticket/Display.html?id=4483, we shouldn't ship this
code. Once that bug has been fixed, we'll restore it.

Change-Id: I74aea18ce31a4b79657d04f8589c18d6b17f1578
Reviewed-on: https://boringssl-review.googlesource.com/7602
Reviewed-by: Emily Stark (Dunn) <estark@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2016-03-29 22:47:11 +00:00
David Benjamin
6f0c4db90e Enable upstream's Poly1305 code.
The C implementation is still our existing C implementation, but slightly
tweaked to fit with upstream's init/block/emits convention.

I've tested this by looking at code coverage in kcachegrind and

  valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes

(NB: valgrind 3.11.0 is needed for AVX2. And even that only does 64-bit AVX2,
so we can't get coverage for the 32-bit code yet. But I had to disable that
anyway.)

This was paired with a hacked up version of poly1305_test that would repeat
tests with different ia32cap and armcap values. This isn't checked in, but we
badly need a story for testing all the different variants.

I'm not happy with upstream's code in either the C/asm boundary or how it
dispatches between different versions, but just debugging the code has been a
significant time investment. I'd hoped to extract the SIMD parts and do the
rest in C, but I think we need to focus on testing first (and use that to
guide what modifications would help). For now, this version seems to work at
least.

The x86 (not x86_64) AVX2 code needs to be disabled because it's broken. It
also seems pretty unnecessary.
https://rt.openssl.org/Ticket/Display.html?id=4346

Otherwise it seems to work and buys us a decent performance improvement.
Notably, my Nexus 6P is finally faster at ChaCha20-Poly1305 than my Nexus 4!

bssl speed numbers follow:

x86
---
Old:
Did 1554000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000536us (1553167.5 ops/sec): 24.9 MB/s
Did 136000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1003947us (135465.3 ops/sec): 182.9 MB/s
Did 30000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1022990us (29325.8 ops/sec): 240.2 MB/s
Did 1888000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000206us (1887611.2 ops/sec): 30.2 MB/s
Did 173000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1003036us (172476.4 ops/sec): 232.8 MB/s
Did 30000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1027759us (29189.7 ops/sec): 239.1 MB/s
New:
Did 2030000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000507us (2028971.3 ops/sec): 32.5 MB/s
Did 404000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000287us (403884.1 ops/sec): 545.2 MB/s
Did 83000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1001258us (82895.7 ops/sec): 679.1 MB/s
Did 2018000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000006us (2017987.9 ops/sec): 32.3 MB/s
Did 360000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001962us (359295.1 ops/sec): 485.0 MB/s
Did 85000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1002479us (84789.8 ops/sec): 694.6 MB/s

x86_64, no AVX2
---
Old:
Did 2023000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000258us (2022478.2 ops/sec): 32.4 MB/s
Did 466000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1002619us (464782.7 ops/sec): 627.5 MB/s
Did 90000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1001133us (89898.1 ops/sec): 736.4 MB/s
Did 2238000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000175us (2237608.4 ops/sec): 35.8 MB/s
Did 483000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001348us (482349.8 ops/sec): 651.2 MB/s
Did 90000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1003141us (89718.2 ops/sec): 735.0 MB/s
New:
Did 2558000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000275us (2557296.7 ops/sec): 40.9 MB/s
Did 510000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001810us (509078.6 ops/sec): 687.3 MB/s
Did 115000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1006457us (114262.2 ops/sec): 936.0 MB/s
Did 2818000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000187us (2817473.1 ops/sec): 45.1 MB/s
Did 418000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001140us (417524.0 ops/sec): 563.7 MB/s
Did 91000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1002539us (90769.5 ops/sec): 743.6 MB/s

x86_64, AVX2
---
Old:
Did 2516000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000115us (2515710.7 ops/sec): 40.3 MB/s
Did 774000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000300us (773767.9 ops/sec): 1044.6 MB/s
Did 171000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1004373us (170255.5 ops/sec): 1394.7 MB/s
Did 2580000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000144us (2579628.5 ops/sec): 41.3 MB/s
Did 769000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000472us (768637.2 ops/sec): 1037.7 MB/s
Did 169000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1000320us (168945.9 ops/sec): 1384.0 MB/s
New:
Did 3240000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000114us (3239630.7 ops/sec): 51.8 MB/s
Did 932000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000059us (931945.0 ops/sec): 1258.1 MB/s
Did 217000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1003282us (216290.1 ops/sec): 1771.8 MB/s
Did 3187000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000100us (3186681.3 ops/sec): 51.0 MB/s
Did 926000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000071us (925934.3 ops/sec): 1250.0 MB/s
Did 215000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1000479us (214897.1 ops/sec): 1760.4 MB/s

arm, Nexus 4
---
Old:
Did 430248 ChaCha20-Poly1305 (16 bytes) seal operations in 1000153us (430182.2 ops/sec): 6.9 MB/s
Did 115250 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000549us (115186.8 ops/sec): 155.5 MB/s
Did 27000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1030124us (26210.4 ops/sec): 214.7 MB/s
Did 451750 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000549us (451502.1 ops/sec): 7.2 MB/s
Did 118000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001557us (117816.6 ops/sec): 159.1 MB/s
Did 27000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1024263us (26360.4 ops/sec): 215.9 MB/s
New:
Did 553644 ChaCha20-Poly1305 (16 bytes) seal operations in 1000183us (553542.7 ops/sec): 8.9 MB/s
Did 126000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000396us (125950.1 ops/sec): 170.0 MB/s
Did 27000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000336us (26990.9 ops/sec): 221.1 MB/s
Did 559000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1001465us (558182.3 ops/sec): 8.9 MB/s
Did 124000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000824us (123897.9 ops/sec): 167.3 MB/s
Did 28000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1034854us (27057.0 ops/sec): 221.7 MB/s

aarch64, Nexus 6P
---
Old:
Did 358000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000358us (357871.9 ops/sec): 5.7 MB/s
Did 45000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1022386us (44014.7 ops/sec): 59.4 MB/s
Did 8657 ChaCha20-Poly1305 (8192 bytes) seal operations in 1063722us (8138.4 ops/sec): 66.7 MB/s
Did 350000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000074us (349974.1 ops/sec): 5.6 MB/s
Did 44000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1007907us (43654.8 ops/sec): 58.9 MB/s
Did 8525 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1042644us (8176.3 ops/sec): 67.0 MB/s
New:
Did 713000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000190us (712864.6 ops/sec): 11.4 MB/s
Did 180000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1004249us (179238.4 ops/sec): 242.0 MB/s
Did 41000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005811us (40763.1 ops/sec): 333.9 MB/s
Did 775000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000719us (774443.2 ops/sec): 12.4 MB/s
Did 182000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1003529us (181360.0 ops/sec): 244.8 MB/s
Did 41000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1010576us (40570.9 ops/sec): 332.4 MB/s

Change-Id: Iaa4ab86ac1174b79833077963cc3616cfb08e686
Reviewed-on: https://boringssl-review.googlesource.com/7226
Reviewed-by: Adam Langley <agl@google.com>
2016-02-26 16:05:14 +00:00
Adam Langley
0dd93002dd Revert section changes for ASM.
This change reverts the following commits:
  72d9cba7cb
  5b61b9ebc5
  3f85e04f40
  2ab24a2d40

Change-Id: I669b83f2269cf96aa71a649a346147b9407a811e
Reviewed-on: https://boringssl-review.googlesource.com/6056
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-09-30 22:09:52 +00:00
Adam Langley
3f85e04f40 Add sections to Poly1305 ARM asm code.
This code isn't generated by perlasm and so the section directives need
to be added manually.

Change-Id: I46158741743859679decbce99097fe6071bf8012
Reviewed-on: https://boringssl-review.googlesource.com/6005
Reviewed-by: Adam Langley <agl@google.com>
2015-09-29 18:04:14 +00:00
Adam Langley
041e4dd5e2 Fix ARM Clang build.
The immediate in this operation is too large for ARM. GCC will
automatically rewrite it to use bic (where bic does an AND NOT). Clang,
however doesn't, and reasonably throws an error.

This change switches to using bic in the source file, thus making both
happy.

Change-Id: I958fa29b88bffeab20c6ee11660736222a2e6986
Reviewed-on: https://boringssl-review.googlesource.com/4410
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2015-05-05 00:24:59 +00:00
Adam Langley
60d612fdcf Fix ARM build with OPENSSL_NO_ASM.
Change-Id: Id77fb7c904cbfe8172466dff20b6a715d90b806c
Reviewed-on: https://boringssl-review.googlesource.com/1710
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2014-09-03 19:23:25 +00:00
Adam Langley
eb7d2ed1fe Add visibility rules.
This change marks public symbols as dynamically exported. This means
that it becomes viable to build a shared library of libcrypto and libssl
with -fvisibility=hidden.

On Windows, one not only needs to mark functions for export in a
component, but also for import when using them from a different
component. Because of this we have to build with
|BORINGSSL_IMPLEMENTATION| defined when building the code. Other
components, when including our headers, won't have that defined and then
the |OPENSSL_EXPORT| tag becomes an import tag instead. See the #defines
in base.h

In the asm code, symbols are now hidden by default and those that need
to be exported are wrapped by a C function.

In order to support Chromium, a couple of libssl functions were moved to
ssl.h from ssl_locl.h: ssl_get_new_session and ssl_update_cache.

Change-Id: Ib4b76e2f1983ee066e7806c24721e8626d08a261
Reviewed-on: https://boringssl-review.googlesource.com/1350
Reviewed-by: Adam Langley <agl@google.com>
2014-07-31 22:03:11 +00:00
Adam Langley
de0b202684 ChaCha20-Poly1305 support. 2014-06-20 13:17:35 -07:00