Commit Graph

10 Commits

Author SHA1 Message Date
David Benjamin
884086e0e2 Remove x86_64 x25519 assembly.
Now that we have 64-bit C code, courtesy of fiat-crypto, the tradeoff
for carrying the assembly changes:

Assembly:
Did 16000 Curve25519 base-point multiplication operations in 1059932us (15095.3 ops/sec)
Did 16000 Curve25519 arbitrary point multiplication operations in 1060023us (15094.0 ops/sec)

fiat64:
Did 39000 Curve25519 base-point multiplication operations in 1004712us (38817.1 ops/sec)
Did 14000 Curve25519 arbitrary point multiplication operations in 1006827us (13905.1 ops/sec)

The assembly is still about 9% faster than fiat64, but fiat64 gets to
use the Ed25519 tables for the base point multiplication, so overall it
is actually faster to disable the assembly:

>>> 1/(1/15094.0 + 1/15095.3)
7547.324986004976
>>> 1/(1/38817.1 + 1/13905.1)
10237.73016319501

(At the cost of touching a 30kB table.)

The assembly implementation is no longer pulling its weight. Remove it
and use the fiat code in all build configurations.

Change-Id: Id736873177d5568bb16ea06994b9fcb1af104e33
Reviewed-on: https://boringssl-review.googlesource.com/25524
Reviewed-by: Adam Langley <agl@google.com>
2018-02-01 21:44:58 +00:00
Andreas Auernhammer
e7d3922b43 Improve Curve25519 cswap x64 assembly
This change replace the cmovq scheme with slightly faster SSE2 code.
The SSE2 code was first introduced in Go's curve25519 implementation.
See: https://go-review.googlesource.com/c/39693/

The implementation is basicly copied from the Go assembly.

Change-Id: I25931a421ba141ce33809875699f048b0941c061
Reviewed-on: https://boringssl-review.googlesource.com/16564
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-05-23 22:51:48 +00:00
David Benjamin
aff72a3805 Add the start of standalone iOS build support.
The built-in CMake support seems to basically work, though it believes
you want to build a fat binary which doesn't work with how we build
perlasm. (We'd need to stop conditioning on CMAKE_SYSTEM_PROCESSOR at
all, wrap all the generated assembly files in ifdefs, and convince the
build to emit more than one. Probably not worth bothering for now.)

We still, of course, need to actually test the assembly on iOS before
this can be shipped anywhere.

BUG=48

Change-Id: I6ae71d98d706be03142b82f7844d1c9b02a2b832
Reviewed-on: https://boringssl-review.googlesource.com/14645
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-04-07 17:13:44 +00:00
Adam Langley
772a5bed7d Reorder the X25519 ladderstep stack frame on x86-64.
The current X25519 assembly has a 352-byte stack frame and saves the
regsiters at the bottom. This means that the CFI information cannot be
represented in the “compact” form that MacOS seems to want to use (see
linked bug).

The stack frame looked like:

 360 CFA
 352 return address
 ⋮
 56  (296 bytes of scratch space)
 48  saved RBP
 40  saved RBX
 32  saved R15
 24  saved R14
 16  saved R13
 8   saved R12
 0   (hole left from 3f38d80b dropping the superfluous saving of R11)

Now it looks like:

 352 CFA
 344 return address
 336 saved RBP
 328 saved RBX
 320 saved R15
 312 saved R14
 304 saved R13
 296 saved R12
 ⋮
 0   (296 bytes of scratch space)

The bulk of the changes involve subtracting 56 from all the offsets to
RSP when working in the scratch space. This was done in Vim with:
  '<,'>s/\([1-9][0-9]*\)(%rsp)/\=submatch(1)-56."(%rsp)"/

BUG=176

Change-Id: I022830e8f896fe2d877015fa3ecfa1d073207679
Reviewed-on: https://boringssl-review.googlesource.com/13580
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-02-02 22:47:05 +00:00
David Benjamin
5c9d411e14 Fix some compact unwind errors.
The Mac ld gets unhappy about "weird" unwind directives:

In chacha20_poly1305_x86_64.pl, $keyp is being pushed on the stack
(according to the comment) because it gets clobbered in the computation
somewhere. $keyp is %r9 which is not callee-saved (it's an argument
register), so we don't need to tag it with .cfi_offset.

In x25519-asm-x86_64.S, x25519_x86_64_mul saves %rdi on the stack.
However it too is not callee-saved (it's an argument register) and
should not have a .cfi_offset. %rdi also does not appear to be written
to anywhere in the function, so there's no need to save it at all.

(This does not resolve the "r15 is saved too far from return address"
errors. Just the non-standard register ones.)

BUG=176

Change-Id: I53f3f7db3d1745384fb47cb52cd6536aabb5065e
Reviewed-on: https://boringssl-review.googlesource.com/13560
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2017-02-02 22:05:06 +00:00
Adam Langley
3f38d80b2f Add CFI information to the x86-64 X25519 asm.
This change serves to check that all our consumers can process assembly
with CFI directives in it.

For the first change I picked a file that's not perlasm to keep things
slightly simplier, but that might have been a mistake:

DJB's tooling always aligns the stack to 32 bytes and it's not possible
to express this in DWARF format (without using a register to store the
old stack pointer).

Since none of the functions here appear to care about that alignment, I
removed it from each of them. I also trimmed the set of saved registers
where possible and used the redzone for functions that didn't need much
stack.

Overall, this appears to have slightly improved the performance (by
about 0.7%):

Before:

Did 46000 Curve25519 base-point multiplication operations in 3023288us (15215.2 ops/sec)
Did 46000 Curve25519 arbitrary point multiplication operations in 3017315us (15245.3 ops/sec)
Did 46000 Curve25519 base-point multiplication operations in 3015346us (15255.3 ops/sec)
Did 46000 Curve25519 arbitrary point multiplication operations in 3018609us (15238.8 ops/sec)
Did 46000 Curve25519 base-point multiplication operations in 3019004us (15236.8 ops/sec)
Did 46000 Curve25519 arbitrary point multiplication operations in 3013135us (15266.5 ops/sec)

After:

Did 46000 Curve25519 base-point multiplication operations in 3007659us (15294.3 ops/sec)
Did 47000 Curve25519 arbitrary point multiplication operations in 3054202us (15388.6 ops/sec)
Did 46000 Curve25519 base-point multiplication operations in 3008714us (15288.9 ops/sec)
Did 46000 Curve25519 arbitrary point multiplication operations in 3004740us (15309.1 ops/sec)
Did 46000 Curve25519 base-point multiplication operations in 3009140us (15286.8 ops/sec)
Did 47000 Curve25519 arbitrary point multiplication operations in 3057518us (15371.9 ops/sec)

Change-Id: I31df11c45b2ea0bf44dde861d52c27f848331691
Reviewed-on: https://boringssl-review.googlesource.com/13200
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2017-01-31 17:55:19 +00:00
William Hesse
bf3335c621 Add #ifdef guards to crypto/curve25519 assembly files.
Add guards for the architecture and OPENSSL_NO_ASM to
the assembly-language files in crypto/curve25519/asm.
The Dart compilation of BoringSSL includes all files,
because the architecture is not known when gyp is run.

Change-Id: I66f5ae525266b63b0fe3a929012b771d545779b5
Reviewed-on: https://boringssl-review.googlesource.com/7030
Reviewed-by: Adam Langley <agl@google.com>
2016-02-02 16:03:33 +00:00
Adam Langley
7b8b9c17db Include 'asm' in the name of X25519 asm sources.
Some build systems don't like two targets with the same base name and
the curve25519 code had x25519-x86_64.[Sc].

Change-Id: If8382eb84996d7e75b34b28def57829d93019cff
Reviewed-on: https://boringssl-review.googlesource.com/6878
Reviewed-by: Adam Langley <agl@google.com>
2016-01-05 16:05:50 +00:00
Adam Langley
77a173efed Add x86-64 assembly for X25519.
This assembly is in gas syntax so is not built on Windows nor when
OPENSSL_SMALL is defined.

Change-Id: I1050cf1b16350fd4b758e4c463261b30a1b65390
Reviewed-on: https://boringssl-review.googlesource.com/6782
Reviewed-by: Adam Langley <agl@google.com>
2015-12-22 16:22:38 +00:00
Adam Langley
b1b6229fc8 Add NEON implementation of curve25519.
Nexus 7 goes from 1002.8 ops/sec to 4704.8 at a cost of 10KB of code.
(It'll actually save code if built with -mfpu=neon because then the
generic version can be discarded by the compiler.)

Change-Id: Ia6d02efb2c2d1bb02a07eb56ec4ca3b0dba99382
Reviewed-on: https://boringssl-review.googlesource.com/6524
Reviewed-by: Adam Langley <agl@google.com>
2015-11-19 00:20:38 +00:00