Previously, building the code with Yasm as the assembler produced
warnings like this:
warning : no non-local label before `.chacha20_consts'
Precede the local labels with a non-local label to suppress these
warnings.
It isn't clear why these labels are defined as local labels instead of
regular labels. Making them non-local may be a better idea.
For reference, Yasm's interpretation of local labels is described
succinctly at
https://www.tortall.net/projects/yasm/manual/html/nasm-local-label.html.
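The rule behind the warning can be sketched as a small check: a label
starting with a single period is local and needs an earlier non-local
label to attach to. This is only an illustration in Python, not how Yasm
itself is implemented; the label regex is a simplification, and the name
chacha20_poly1305_constants in the usage note is hypothetical.

```python
import re

# Simplified assumption: a label is an identifier at the start of a line
# followed directly by a colon. Real NASM/Yasm syntax is more permissive.
LABEL_RE = re.compile(r"^\s*([A-Za-z_.$?@][\w.$?@#~]*):")


def orphan_local_labels(asm_lines):
    """Return local labels that appear before any non-local label."""
    seen_non_local = False
    orphans = []
    for line in asm_lines:
        match = LABEL_RE.match(line)
        if not match:
            continue
        name = match.group(1)
        if name.startswith("..@"):
            # "..@" labels are special: neither local nor an anchor.
            continue
        if name.startswith("."):
            if not seen_non_local:
                orphans.append(name)
        else:
            seen_non_local = True
    return orphans
```

Given the lines "section .rodata" and ".chacha20_consts:", the function
reports ".chacha20_consts"; placing a non-local label such as
"chacha20_poly1305_constants:" before it makes the list empty.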
Change-Id: Ifc92de7fd7379859fe33f1137ab20b6ec282cd0b
Reviewed-on: https://boringssl-review.googlesource.com/13384
Reviewed-by: Adam Langley <agl@google.com>
The Mac ld gets unhappy about "weird" unwind directives:
In chacha20_poly1305_x86_64.pl, $keyp is being pushed on the stack
(according to the comment) because it gets clobbered in the computation
somewhere. $keyp is %r9 which is not callee-saved (it's an argument
register), so we don't need to tag it with .cfi_offset.
In x25519-asm-x86_64.S, x25519_x86_64_mul saves %rdi on the stack.
However, it too is not callee-saved (it's an argument register) and
should not have a .cfi_offset. %rdi also does not appear to be written
to anywhere in the function, so there's no need to save it at all.
(This does not resolve the "r15 is saved too far from return address"
errors, only the non-standard register ones.)
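The reasoning above amounts to a membership test against the callee-saved
registers of the System V AMD64 ABI. A minimal sketch in Python, assuming
(as this change does) that only callee-saved registers should carry a
.cfi_offset annotation; the helper name needs_cfi_offset is hypothetical:

```python
# General-purpose registers a callee must preserve under the System V
# AMD64 ABI. %rsp is also preserved, but it is tracked through the CFA
# rather than through .cfi_offset, so it is left out here.
CALLEE_SAVED = frozenset({"rbx", "rbp", "r12", "r13", "r14", "r15"})


def needs_cfi_offset(register):
    """Return True if a pushed register is callee-saved (assumed rule)."""
    name = register.lstrip("%").lower()
    return name in CALLEE_SAVED
```

Under this rule, %r9 and %rdi (both argument registers) return False,
while %r15 returns True.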
BUG=176
Change-Id: I53f3f7db3d1745384fb47cb52cd6536aabb5065e
Reviewed-on: https://boringssl-review.googlesource.com/13560
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Cargo-cult the way other Perlasm scripts do it.
Change-Id: I86aaf725e41b601f24595518a8a6bc481fa0c7fc
Reviewed-on: https://boringssl-review.googlesource.com/13382
Reviewed-by: Adam Langley <agl@google.com>
Perlasm requires the size suffix when targeting NASM and Yasm; without
it, the resulting .asm file has |imu| instead of |imul|.
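One way to picture how |imul| can turn into |imu|: if the translator
treats a trailing q, l, w or b as an AT&T size suffix and strips it, then
the final "l" of an unsuffixed |imul| is mistaken for the suffix. The
Python sketch below is an assumption about that behaviour for
illustration only; it is not the actual Perlasm code, and the function
name split_size_suffix is hypothetical.

```python
import re

# Assumed rule: at least three letters followed by one size-suffix letter.
SUFFIX_RE = re.compile(r"^([a-z]{3,})([qlwb])$")


def split_size_suffix(mnemonic):
    """Split a mnemonic into (opcode, size suffix or None)."""
    match = SUFFIX_RE.match(mnemonic)
    if match:
        return match.group(1), match.group(2)
    return mnemonic, None
```

With this rule, "imulq" splits into ("imul", "q") as intended, while a
bare "imul" splits into ("imu", "l").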
Change-Id: Icb95b8c0b68cf4f93becdc1930dc217398f56bec
Reviewed-on: https://boringssl-review.googlesource.com/13381
Reviewed-by: Adam Langley <agl@google.com>
Use the same quoting used in other files so that this file can be built
the same way as other files on platforms that require the other kind of
quoting.
Change-Id: I808769bf014fbfe526fedcdc1e1f617b3490d03b
Reviewed-on: https://boringssl-review.googlesource.com/13380
Reviewed-by: Adam Langley <agl@google.com>
The Windows assembler doesn't appear to support preprocessor macros, but
it also cannot cope with this style of label.
Change-Id: I0b8ca7372bb9ea0f20101ed138681d379944658e
Reviewed-on: https://boringssl-review.googlesource.com/13207
Reviewed-by: David Benjamin <davidben@google.com>
This is basically the same implementation I wrote for Go.
The Go implementation:
https://github.com/golang/crypto/blob/master/chacha20poly1305/chacha20poly1305_amd64.s
The Cloudflare patch for OpenSSL:
https://github.com/cloudflare/sslconfig/blob/master/patches/openssl__chacha20_poly1305_draft_and_rfc_ossl102j.patch
Seal/Open is only available for the new version; the old one uses
the bundled Poly1305 and the existing ChaCha20 implementations.
The benefits of this code, compared to the optimized code currently
disabled in BoringSSL:
* Passes test vectors
* Faster performance: the AVX2 code (on Haswell) is 55% faster for
16-byte, 15% for 1350-byte and 6% for 8192-byte buffers
* Even faster on pre-AVX2 CPUs
Feel free to put whatever license, etc. is appropriate, under the
existing CLA.
Benchmarks are for 16/1350/8192-byte chunk sizes and given in MB/s:
                        16      1350    8192
Before (Ivy Bridge):    34.2    589.5   739.4
After:                  68.4    692.1   799.4
Before (Skylake):       50      1233    1649
After:                  119.4   1736    2196
After (Andy's):         63.6    1608    2261
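For a quick read of the numbers above, the before/after ratios can be
computed directly. The Python below is only a convenience sketch over the
figures quoted in this message; the names are made up for illustration.

```python
CHUNK_SIZES = (16, 1350, 8192)

# Throughput in MB/s, copied from the benchmark figures above.
BENCHMARKS_MB_PER_S = {
    "Ivy Bridge": {
        "before": (34.2, 589.5, 739.4),
        "after": (68.4, 692.1, 799.4),
    },
    "Skylake": {
        "before": (50.0, 1233.0, 1649.0),
        "after": (119.4, 1736.0, 2196.0),
    },
}


def speedups(before, after):
    """Return the after/before throughput ratio for each chunk size."""
    return [a / b for a, b in zip(after, before)]


if __name__ == "__main__":
    for cpu, results in BENCHMARKS_MB_PER_S.items():
        ratios = speedups(results["before"], results["after"])
        for size, ratio in zip(CHUNK_SIZES, ratios):
            print(f"{cpu}, {size} bytes: {ratio:.2f}x")
```

This works out to roughly 2.0x, 1.17x and 1.08x on Ivy Bridge, and
roughly 2.39x, 1.41x and 1.33x on Skylake.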
Change-Id: I9186f721812655011fc17698b67ddbe8a1c7203b
Reviewed-on: https://boringssl-review.googlesource.com/13142
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>