The Windows assembler doesn't appear to support preprocessor macros,
nor can it cope with this style of label.
Change-Id: I0b8ca7372bb9ea0f20101ed138681d379944658e
Reviewed-on: https://boringssl-review.googlesource.com/13207
Reviewed-by: David Benjamin <davidben@google.com>
This is basically the same implementation I wrote for Go.
The Go implementation:
https://github.com/golang/crypto/blob/master/chacha20poly1305/chacha20poly1305_amd64.s
The Cloudflare patch for OpenSSL:
https://github.com/cloudflare/sslconfig/blob/master/patches/openssl__chacha20_poly1305_draft_and_rfc_ossl102j.patch
The Seal/Open code is only available for the new version; the old one
uses the bundled Poly1305 and the existing ChaCha20 implementations.
The benefits of this code, compared to the optimized code currently
disabled in BoringSSL:
* Passes test vectors
* Faster performance: the AVX2 code (on Haswell) is 55% faster for
16-byte, 15% for 1350-byte and 6% for 8192-byte buffers
* Even faster on pre-AVX2 CPUs
Feel free to put whatever license, etc. is appropriate, under the
existing CLA.
Benchmarks are for 16/1350/8192-byte chunk sizes, given in MB/s:
                         16     1350    8192
Before (Ivy Bridge):   34.2    589.5   739.4
After:                 68.4    692.1   799.4
Before (Skylake):      50     1233    1649
After:                119.4   1736    2196
After (Andy's):        63.6   1608    2261
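For a quick sense of the relative gains, the benchmark figures above can be turned into percentage speedups. This is only a minimal sketch: the throughput values are copied from the table, while the names SIZES, RESULTS and speedups are made up for illustration and are not part of the change. The "After (Andy's)" row is left out.

```python
# Throughput in MB/s for 16-, 1350- and 8192-byte chunks, copied from the
# benchmark table in this commit message.
SIZES = (16, 1350, 8192)
RESULTS = {
    "Ivy Bridge": {"before": (34.2, 589.5, 739.4), "after": (68.4, 692.1, 799.4)},
    "Skylake": {"before": (50.0, 1233.0, 1649.0), "after": (119.4, 1736.0, 2196.0)},
}


def speedups(before, after):
    """Return the percentage throughput gain of `after` over `before`, per chunk size."""
    return [round((a / b - 1.0) * 100.0, 1) for b, a in zip(before, after)]


if __name__ == "__main__":
    for cpu, r in RESULTS.items():
        gains = speedups(r["before"], r["after"])
        print(cpu + ": " + ", ".join(f"{s}B +{g}%" for s, g in zip(SIZES, gains)))
```

On both CPUs the small-buffer case gains the most: throughput for 16-byte chunks at least doubles.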
Change-Id: I9186f721812655011fc17698b67ddbe8a1c7203b
Reviewed-on: https://boringssl-review.googlesource.com/13142
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>