6f0c4db90e
The C implementation is still our existing C implementation, but slightly tweaked to fit with upstream's init/block/emits convention. I've tested this by looking at code coverage in kcachegrind and valgrind --tool=callgrind --dump-instr=yes --collect-jumps=yes (NB: valgrind 3.11.0 is needed for AVX2. And even that only does 64-bit AVX2, so we can't get coverage for the 32-bit code yet. But I had to disable that anyway.) This was paired with a hacked up version of poly1305_test that would repeat tests with different ia32cap and armcap values. This isn't checked in, but we badly need a story for testing all the different variants. I'm not happy with upstream's code in either the C/asm boundary or how it dispatches between different versions, but just debugging the code has been a significant time investment. I'd hoped to extract the SIMD parts and do the rest in C, but I think we need to focus on testing first (and use that to guide what modifications would help). For now, this version seems to work at least. The x86 (not x86_64) AVX2 code needs to be disabled because it's broken. It also seems pretty unnecessary. https://rt.openssl.org/Ticket/Display.html?id=4346 Otherwise it seems to work and buys us a decent performance improvement. Notably, my Nexus 6P is finally faster at ChaCha20-Poly1305 than my Nexus 4! bssl speed numbers follow: x86 --- Old: Did 1554000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000536us (1553167.5 ops/sec): 24.9 MB/s Did 136000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1003947us (135465.3 ops/sec): 182.9 MB/s Did 30000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1022990us (29325.8 ops/sec): 240.2 MB/s Did 1888000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000206us (1887611.2 ops/sec): 30.2 MB/s Did 173000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1003036us (172476.4 ops/sec): 232.8 MB/s Did 30000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1027759us (29189.7 ops/sec): 239.1 MB/s New: Did 2030000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000507us (2028971.3 ops/sec): 32.5 MB/s Did 404000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000287us (403884.1 ops/sec): 545.2 MB/s Did 83000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1001258us (82895.7 ops/sec): 679.1 MB/s Did 2018000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000006us (2017987.9 ops/sec): 32.3 MB/s Did 360000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001962us (359295.1 ops/sec): 485.0 MB/s Did 85000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1002479us (84789.8 ops/sec): 694.6 MB/s x86_64, no AVX2 --- Old: Did 2023000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000258us (2022478.2 ops/sec): 32.4 MB/s Did 466000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1002619us (464782.7 ops/sec): 627.5 MB/s Did 90000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1001133us (89898.1 ops/sec): 736.4 MB/s Did 2238000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000175us (2237608.4 ops/sec): 35.8 MB/s Did 483000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001348us (482349.8 ops/sec): 651.2 MB/s Did 90000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1003141us (89718.2 ops/sec): 735.0 MB/s New: Did 2558000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000275us (2557296.7 ops/sec): 40.9 MB/s Did 510000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001810us (509078.6 ops/sec): 687.3 MB/s Did 115000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1006457us (114262.2 ops/sec): 936.0 MB/s Did 2818000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000187us (2817473.1 ops/sec): 45.1 MB/s Did 418000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001140us (417524.0 ops/sec): 563.7 MB/s Did 91000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1002539us (90769.5 ops/sec): 743.6 MB/s x86_64, AVX2 --- Old: Did 2516000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000115us (2515710.7 ops/sec): 40.3 MB/s Did 774000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000300us (773767.9 ops/sec): 1044.6 MB/s Did 171000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1004373us (170255.5 ops/sec): 1394.7 MB/s Did 2580000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000144us (2579628.5 ops/sec): 41.3 MB/s Did 769000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000472us (768637.2 ops/sec): 1037.7 MB/s Did 169000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1000320us (168945.9 ops/sec): 1384.0 MB/s New: Did 3240000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000114us (3239630.7 ops/sec): 51.8 MB/s Did 932000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000059us (931945.0 ops/sec): 1258.1 MB/s Did 217000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1003282us (216290.1 ops/sec): 1771.8 MB/s Did 3187000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000100us (3186681.3 ops/sec): 51.0 MB/s Did 926000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000071us (925934.3 ops/sec): 1250.0 MB/s Did 215000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1000479us (214897.1 ops/sec): 1760.4 MB/s arm, Nexus 4 --- Old: Did 430248 ChaCha20-Poly1305 (16 bytes) seal operations in 1000153us (430182.2 ops/sec): 6.9 MB/s Did 115250 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000549us (115186.8 ops/sec): 155.5 MB/s Did 27000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1030124us (26210.4 ops/sec): 214.7 MB/s Did 451750 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000549us (451502.1 ops/sec): 7.2 MB/s Did 118000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001557us (117816.6 ops/sec): 159.1 MB/s Did 27000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1024263us (26360.4 ops/sec): 215.9 MB/s New: Did 553644 ChaCha20-Poly1305 (16 bytes) seal operations in 1000183us (553542.7 ops/sec): 8.9 MB/s Did 126000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000396us (125950.1 ops/sec): 170.0 MB/s Did 27000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000336us (26990.9 ops/sec): 221.1 MB/s Did 559000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1001465us (558182.3 ops/sec): 8.9 MB/s Did 124000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000824us (123897.9 ops/sec): 167.3 MB/s Did 28000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1034854us (27057.0 ops/sec): 221.7 MB/s aarch64, Nexus 6P --- Old: Did 358000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000358us (357871.9 ops/sec): 5.7 MB/s Did 45000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1022386us (44014.7 ops/sec): 59.4 MB/s Did 8657 ChaCha20-Poly1305 (8192 bytes) seal operations in 1063722us (8138.4 ops/sec): 66.7 MB/s Did 350000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000074us (349974.1 ops/sec): 5.6 MB/s Did 44000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1007907us (43654.8 ops/sec): 58.9 MB/s Did 8525 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1042644us (8176.3 ops/sec): 67.0 MB/s New: Did 713000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000190us (712864.6 ops/sec): 11.4 MB/s Did 180000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1004249us (179238.4 ops/sec): 242.0 MB/s Did 41000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005811us (40763.1 ops/sec): 333.9 MB/s Did 775000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000719us (774443.2 ops/sec): 12.4 MB/s Did 182000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1003529us (181360.0 ops/sec): 244.8 MB/s Did 41000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1010576us (40570.9 ops/sec): 332.4 MB/s Change-Id: Iaa4ab86ac1174b79833077963cc3616cfb08e686 Reviewed-on: https://boringssl-review.googlesource.com/7226 Reviewed-by: Adam Langley <agl@google.com> |
||
---|---|---|
.. | ||
aead.h | ||
aes.h | ||
arm_arch.h | ||
asn1_mac.h | ||
asn1.h | ||
asn1t.h | ||
base64.h | ||
base.h | ||
bio.h | ||
blowfish.h | ||
bn.h | ||
buf.h | ||
buffer.h | ||
bytestring.h | ||
cast.h | ||
chacha.h | ||
cipher.h | ||
cmac.h | ||
conf.h | ||
cpu.h | ||
crypto.h | ||
curve25519.h | ||
des.h | ||
dh.h | ||
digest.h | ||
dsa.h | ||
dtls1.h | ||
ec_key.h | ||
ec.h | ||
ecdh.h | ||
ecdsa.h | ||
engine.h | ||
err.h | ||
evp.h | ||
ex_data.h | ||
hkdf.h | ||
hmac.h | ||
lhash_macros.h | ||
lhash.h | ||
md4.h | ||
md5.h | ||
mem.h | ||
obj_mac.h | ||
obj.h | ||
objects.h | ||
opensslconf.h | ||
opensslv.h | ||
ossl_typ.h | ||
pem.h | ||
pkcs7.h | ||
pkcs8.h | ||
pkcs12.h | ||
poly1305.h | ||
pqueue.h | ||
rand.h | ||
rc4.h | ||
rsa.h | ||
safestack.h | ||
sha.h | ||
srtp.h | ||
ssl3.h | ||
ssl.h | ||
stack_macros.h | ||
stack.h | ||
thread.h | ||
time_support.h | ||
tls1.h | ||
type_check.h | ||
x509_vfy.h | ||
x509.h | ||
x509v3.h |