Commit Graph

5 Commits

Author SHA1 Message Date
Brian Smith
360a4c2616 chacha20_poly1305_x86_64.pl: Use NASM-compatible syntax for |ldea|.
Cargo-cult the way other Perlasm scripts do it.

Change-Id: I86aaf725e41b601f24595518a8a6bc481fa0c7fc
Reviewed-on: https://boringssl-review.googlesource.com/13382
Reviewed-by: Adam Langley <agl@google.com>
2017-01-27 23:17:13 +00:00
Brian Smith
357a9f23fe chacha20_poly1305_x86_64.pl: Use |imulq| instead of |imul|.
Perlasm requires the size suffix when targeting NASM and Yasm; without
it, the resulting .asm file has |imu| instead of |imul|.

Change-Id: Icb95b8c0b68cf4f93becdc1930dc217398f56bec
Reviewed-on: https://boringssl-review.googlesource.com/13381
Reviewed-by: Adam Langley <agl@google.com>
2017-01-27 23:16:52 +00:00
Brian Smith
3416d28a57 chacha20_poly1305_x86_64.pl: Escape command line args like other PerlAsm scripts.
Use the same quoting used in other files so that this file can be built
the same way as other files on platforms that require the other kind of
quoting.

Change-Id: I808769bf014fbfe526fedcdc1e1f617b3490d03b
Reviewed-on: https://boringssl-review.googlesource.com/13380
Reviewed-by: Adam Langley <agl@google.com>
2017-01-27 23:16:27 +00:00
Adam Langley
1da9c67a99 Use a Perlasm variable rather than an #if to exclude the ChaCha20-Poly1305 asm on Windows.
The Windows assembler doesn't appear to do preprocessor macros but nor
can it cope with this style of label.

Change-Id: I0b8ca7372bb9ea0f20101ed138681d379944658e
Reviewed-on: https://boringssl-review.googlesource.com/13207
Reviewed-by: David Benjamin <davidben@google.com>
2017-01-23 22:05:06 +00:00
vkrasnov
8d56558031 Optimized Seal/Open routines for ChaCha20-Poly1305 for x86-64
This is basically the same implementation I wrote for Go

The Go implementation:
https://github.com/golang/crypto/blob/master/chacha20poly1305/chacha20poly1305_amd64.s
The Cloudflare patch for OpenSSL:
https://github.com/cloudflare/sslconfig/blob/master/patches/openssl__chacha20_poly1305_draft_and_rfc_ossl102j.patch

The Seal/Open is only available for the new version, the old one uses
the bundled Poly1305, and the existing ChaCha20 implementations

The benefits of this code, compared to the optimized code currently
disabled in BoringSSL:

* Passes test vectors
* Faster performance: The AVX2 code (on Haswell), is 55% faster for 16B,
  15% for 1350 and 6% for 8192 byte buffers
* Even faster on pre-AVX2 CPUs

Feel free to put whatever license, etc. is appropriate, under the
existing CLA.

Benchmarks are for 16/1350/8192 chunk sizes and given in MB/s:

Before (Ivy Bridge): 34.2   589.5  739.4
After:               68.4   692.1  799.4
Before (Skylake):    50    1233   1649
After:              119.4  1736   2196
After (Andy's):      63.6  1608   2261

Change-Id: I9186f721812655011fc17698b67ddbe8a1c7203b
Reviewed-on: https://boringssl-review.googlesource.com/13142
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2017-01-23 21:12:44 +00:00