Commit Graph

7 Commits

Author SHA1 Message Date
Brian Smith
644539191b chacha20_poly1305_x86_64.pl: Suppress Yasm non-local label warnings.
Before, attempting to build the code using Yasm as the assembler would
result in warnings like this:

    warning : no non-local label before `.chacha20_consts'

Precede the local labels with a non-local label to suppress these
warnings.

It isn't clear why these labels are defined as local labels instead of
regular labels.  Making them non-local may be a better idea.

For reference, Yasm's interpretation of local labels is described
succinctly at
https://www.tortall.net/projects/yasm/manual/html/nasm-local-label.html.

Change-Id: Ifc92de7fd7379859fe33f1137ab20b6ec282cd0b
Reviewed-on: https://boringssl-review.googlesource.com/13384
Reviewed-by: Adam Langley <agl@google.com>
2017-02-09 18:05:41 +00:00
David Benjamin
5c9d411e14 Fix some compact unwind errors.
The Mac ld gets unhappy about "weird" unwind directives:

In chacha20_poly1305_x86_64.pl, $keyp is being pushed on the stack
(according to the comment) because it gets clobbered in the computation
somewhere. $keyp is %r9 which is not callee-saved (it's an argument
register), so we don't need to tag it with .cfi_offset.

In x25519-asm-x86_64.S, x25519_x86_64_mul saves %rdi on the stack.
However it too is not callee-saved (it's an argument register) and
should not have a .cfi_offset. %rdi also does not appear to be written
to anywhere in the function, so there's no need to save it at all.

(This does not resolve the "r15 is saved too far from return address"
errors. Just the non-standard register ones.)

BUG=176

Change-Id: I53f3f7db3d1745384fb47cb52cd6536aabb5065e
Reviewed-on: https://boringssl-review.googlesource.com/13560
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2017-02-02 22:05:06 +00:00
Brian Smith
360a4c2616 chacha20_poly1305_x86_64.pl: Use NASM-compatible syntax for |ldea|.
Cargo-cult the way other Perlasm scripts do it.

Change-Id: I86aaf725e41b601f24595518a8a6bc481fa0c7fc
Reviewed-on: https://boringssl-review.googlesource.com/13382
Reviewed-by: Adam Langley <agl@google.com>
2017-01-27 23:17:13 +00:00
Brian Smith
357a9f23fe chacha20_poly1305_x86_64.pl: Use |imulq| instead of |imul|.
Perlasm requires the size suffix when targeting NASM and Yasm; without
it, the resulting .asm file has |imu| instead of |imul|.

Change-Id: Icb95b8c0b68cf4f93becdc1930dc217398f56bec
Reviewed-on: https://boringssl-review.googlesource.com/13381
Reviewed-by: Adam Langley <agl@google.com>
2017-01-27 23:16:52 +00:00
Brian Smith
3416d28a57 chacha20_poly1305_x86_64.pl: Escape command line args like other PerlAsm scripts.
Use the same quoting used in other files so that this file can be built
the same way as other files on platforms that require the other kind of
quoting.

Change-Id: I808769bf014fbfe526fedcdc1e1f617b3490d03b
Reviewed-on: https://boringssl-review.googlesource.com/13380
Reviewed-by: Adam Langley <agl@google.com>
2017-01-27 23:16:27 +00:00
Adam Langley
1da9c67a99 Use a Perlasm variable rather than an #if to exclude the ChaCha20-Poly1305 asm on Windows.
The Windows assembler doesn't appear to do preprocessor macros but nor
can it cope with this style of label.

Change-Id: I0b8ca7372bb9ea0f20101ed138681d379944658e
Reviewed-on: https://boringssl-review.googlesource.com/13207
Reviewed-by: David Benjamin <davidben@google.com>
2017-01-23 22:05:06 +00:00
vkrasnov
8d56558031 Optimized Seal/Open routines for ChaCha20-Poly1305 for x86-64
This is basically the same implementation I wrote for Go

The Go implementation:
https://github.com/golang/crypto/blob/master/chacha20poly1305/chacha20poly1305_amd64.s
The Cloudflare patch for OpenSSL:
https://github.com/cloudflare/sslconfig/blob/master/patches/openssl__chacha20_poly1305_draft_and_rfc_ossl102j.patch

The Seal/Open is only available for the new version, the old one uses
the bundled Poly1305, and the existing ChaCha20 implementations

The benefits of this code, compared to the optimized code currently
disabled in BoringSSL:

* Passes test vectors
* Faster performance: The AVX2 code (on Haswell), is 55% faster for 16B,
  15% for 1350 and 6% for 8192 byte buffers
* Even faster on pre-AVX2 CPUs

Feel free to put whatever license, etc. is appropriate, under the
existing CLA.

Benchmarks are for 16/1350/8192 chunk sizes and given in MB/s:

Before (Ivy Bridge): 34.2   589.5  739.4
After:               68.4   692.1  799.4
Before (Skylake):    50    1233   1649
After:              119.4  1736   2196
After (Andy's):      63.6  1608   2261

Change-Id: I9186f721812655011fc17698b67ddbe8a1c7203b
Reviewed-on: https://boringssl-review.googlesource.com/13142
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2017-01-23 21:12:44 +00:00