boringssl

Commit Graph

Author	SHA1	Message	Date
Henry Case	e9a058315b	Integrate SIKE with TLS key exchange. Implements support for hybrid key exchange based on SIKEp503, a post-quantum, isogeny-based KEM. This is a hybrid construction mixed with X25519 key agreement. Code point is 0xFE32. Cloudflare's SIDH implementation is used for testing. Key exchange can be used with TLS 1.3 only. Change-Id: I3a5f38d6f7d016274e5bcfb629249664e1d983eb	5 years ago
Henry Case	628c450c81	Add support for SIKE/p503 post-quantum KEM. Based on Microsoft's implementation, available on GitHub: Source: https://github.com/Microsoft/PQCrypto-SIDH Commit: `77044b7618` The following changes have been applied: * In Intel assembly, use MOV instead of MOVQ: according to the instruction reference in the Intel Software Developer's Manual, volume 2A, MOVQ has 4 forms. None of them mentions moving a literal to a GPR, hence "movq $rax, 0x0" is wrong. Instead, on 64-bit systems, MOV can be used. * Some variables were wrongly zero-initialized (as per C99 spec). * Move constant values to the .RODATA segment, as keeping them in the .TEXT segment is not compatible with XOM. * Fix an issue in arm64 code: the compiler doesn't reserve enough space for the linker to relocate the address of a global variable when it is used by 'ldr' instructions. The solution is to use 'adrp' followed by an 'add' instruction. Relocations for the 'adrp' and 'add' instructions are generated by prefixing the label with :pg_hi21: and :lo12: respectively. * Enable MULX and ADX. Code from MS doesn't support PIC. MULX can't reference a global variable directly. Instead, RIP-relative addressing can be used. This improves performance by around 10%-13% on Skylake. * Check at runtime whether the CPU supports the BMI2 and ADOX instructions. On AMD64, the optimized Montgomery multiplication and reduction have 2 implementations; the faster one takes advantage of the BMI2 instruction set introduced in Haswell and ADOX introduced in Broadwell. Thanks to OPENSSL_ia32cap_P, it can be decided at runtime which implementation to choose. As CPU configuration is static by nature, the branch predictor will be correct most of the time and hence this check very often has no cost. * Reuse some utilities from boringssl instead of reimplementing them. 
This includes things like: * definition of a limb size (use crypto_word_t instead of digit_t) * use functions for checking in constant time if a value is 0 and/or less than another * #define's used for conditional compilation * Use SSE2 for conditional swap on vector registers. Improves performance a little bit. * Fix f2elm_t definition. Code imported from MSR defines the f2elm_t type as an array of arrays. This decays to a pointer to an array (when passed as an argument). In C, one can't assign a const pointer to an array from a non-const pointer to an array. This seems to violate 6.7.3/8 from C99 (same for C11). This problem occurs in GCC 6 only when the -pedantic flag is specified, and it always occurs in GCC 4.9 (Debian jessie). * Fix definition of eval_3_isog. The second argument of eval_3_isog mustn't be const, for a similar reason as above. * Use HMAC-SHA256 instead of cSHAKE-256 to avoid upstreaming cSHAKE and SHA3 code. * Add speed and unit tests for SIKE. Change-Id: I22f0bb1f9edff314a35cd74b48e8c4962568e330	5 years ago
David Benjamin	5501a26915	Add 16384 to the default bssl speed sizes. When servers have a lot of data to send and aren't as latency-sensitive, it makes sense to send large TLS records, so we care about measuring both packet-sized and full-sized payloads. Change-Id: Ib0cf5e0f8660f68a98a04fa86b5989d4a485528b Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35344 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	4ca8d131d3	Rewrite BN_CTX. While allocating near INT_MAX BIGNUMs or stack frames would never happen, we should properly handle overflow here. Rewrite it to just be a STACK_OF(BIGNUM) plus a stack of indices. Also simplify the error-handling. If we make the errors truly sticky (rather than just sticky per frame), we don't need to keep track of err_stack and friends. Thanks to mlbrown for reporting the integer overflows in the original implementation. Bug: chromium:942269 Change-Id: Ie9c9baea3eeb82d65d88b1cb1388861f5cd84fe5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35328 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	c93be52c9e	Save a temporary in BN_mod_exp_mont's w=1 case. BN_mod_exp_mont is most commonly used in RSA verification, where the exponent sizes are small enough to use 1-bit "windows". There's no need to allocate the extra BIGNUM. Change-Id: I14fb523dfae7d77d2cec10a0209f09f22031d1af Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35327 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	1c71844ef5	Reject long inputs in c2i_ASN1_INTEGER. Thanks to mlbrown for reporting this. Bug: chromium:942269 Change-Id: Ie06970f25a6ab0e08a8861d604b2177c8fd1d1a8 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35326 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	0dcab9302f	Harden the lower level parts of crypto/asn1 against overflows. The legacy ASN.1 stack contains an unsalvageable mix of integer types. `82dfea8d9e` bounded all inputs to the template machinery, but sometimes code will call ASN1_get_object directly, such as the just deleted d2i_ASN1_UINTEGER. Thanks to mlbrown for reporting the d2i_ASN1_UINTEGER overflow. Bug: chromium:942269 Change-Id: I2d4c8b7faf5dadd1b68dbdb51a5feae071ea2cb6 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35325 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	bab14fa753	Remove d2i_ASN1_UINTEGER. It is unused. It dates to an old OpenSSL DSA serialization bug. Bug: chromium:942269 Update-Note: Removing a function. Change-Id: Ia98f7eb1dafcd832c744387475cc13b58bc82ffe Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35324 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	fdb48f9861	Drop some unused bsaes to aes_nohw dependencies. When the CBC and CTR EVP_CIPHER implementations use bsaes, they never call dat->block. Note this is not true of aes_ctr_set_key which is used in contexts where it needs single-block operations. Bug: 256 Change-Id: Ibea4f2117a2220cd5cb09f6cf12b7a50c28bf794 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35168 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	d22578f366	Adapt gcm_*_neon to aarch64. This makes AES-GCM always constant-time on aarch64 (provided assembly is enabled). Unlike vpaes, this does come at a binary size penalty of 1K compared to the gcm_*_4bit version. ABI testing already covered by GCMTest.ABI (GHASH_ASM_ARM covers both OPENSSL_ARM and OPENSSL_AARCH64.) Cortex-A53 (Raspberry Pi 3 Model B+) Before: Did 274000 AES-128-GCM (16 bytes) seal operations in 1003461us (273055.0 ops/sec): 4.4 MB/s Did 53000 AES-128-GCM (256 bytes) seal operations in 1007689us (52595.6 ops/sec): 13.5 MB/s Did 12000 AES-128-GCM (1350 bytes) seal operations in 1075908us (11153.4 ops/sec): 15.1 MB/s Did 2068 AES-128-GCM (8192 bytes) seal operations in 1089037us (1898.9 ops/sec): 15.6 MB/s After: Did 298000 AES-128-GCM (16 bytes) seal operations in 1002917us (297133.3 ops/sec): 4.8 MB/s Did 64000 AES-128-GCM (256 bytes) seal operations in 1001124us (63928.1 ops/sec): 16.4 MB/s Did 14000 AES-128-GCM (1350 bytes) seal operations in 1015477us (13786.6 ops/sec): 18.6 MB/s Did 2497 AES-128-GCM (8192 bytes) seal operations in 1057951us (2360.2 ops/sec): 19.3 MB/s Bug: 265 Change-Id: I251bf0f2eae0578580bb14192755e5d8ff64cd14 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35285 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	4851041967	Patch out the aes_nohw fallback in bsaes_cbc_encrypt. This plugs all bsaes fallback leaks for CBC outside of the key schedule. The CBC EVP_CIPHERs never call the block function directly when there's a stream.cbc function available. This affects CBC decryptions of length < 128 or 16 mod 128. Performance-wise, we don't really care about CBC apart from passing glances at its use in TLS. There, the Lucky13 workaround mutes the effects. Cortex-A53 (Raspberry Pi 3 Model B+) Before: Did 78000 AES-128-CBC-SHA1 (16 bytes) open operations in 3020254us (25825.6 ops/sec): 0.4 MB/s Did 75000 AES-128-CBC-SHA1 (32 bytes) open operations in 3005760us (24952.1 ops/sec): 0.8 MB/s Did 71000 AES-128-CBC-SHA1 (64 bytes) open operations in 3038137us (23369.6 ops/sec): 1.5 MB/s Did 67000 AES-128-CBC-SHA1 (96 bytes) open operations in 3027686us (22129.1 ops/sec): 2.1 MB/s Did 64000 AES-128-CBC-SHA1 (112 bytes) open operations in 3005491us (21294.4 ops/sec): 2.4 MB/s Did 59000 AES-128-CBC-SHA1 (128 bytes) open operations in 3020083us (19535.9 ops/sec): 2.5 MB/s Did 53000 AES-128-CBC-SHA1 (240 bytes) open operations in 3020105us (17549.1 ops/sec): 4.2 MB/s After: Did 71668 AES-128-CBC-SHA1 (16 bytes) open operations in 3020896us (23724.1 ops/sec): 0.4 MB/s Did 71000 AES-128-CBC-SHA1 (32 bytes) open operations in 3040826us (23348.9 ops/sec): 0.7 MB/s Did 68000 AES-128-CBC-SHA1 (64 bytes) open operations in 3009913us (22592.0 ops/sec): 1.4 MB/s Did 66000 AES-128-CBC-SHA1 (96 bytes) open operations in 3007597us (21944.4 ops/sec): 2.1 MB/s Did 59000 AES-128-CBC-SHA1 (112 bytes) open operations in 3002878us (19647.8 ops/sec): 2.2 MB/s Did 59000 AES-128-CBC-SHA1 (128 bytes) open operations in 3046786us (19364.7 ops/sec): 2.5 MB/s Did 50000 AES-128-CBC-SHA1 (240 bytes) open operations in 3043643us (16427.7 ops/sec): 3.9 MB/s Penryn (Mac mini, mid 2010) Before: Did 152000 AES-128-CBC-SHA1 (16 bytes) open operations in 1004422us (151330.8 ops/sec): 2.4 MB/s Did 143000 
AES-128-CBC-SHA1 (32 bytes) open operations in 1000443us (142936.7 ops/sec): 4.6 MB/s Did 136000 AES-128-CBC-SHA1 (48 bytes) open operations in 1006580us (135111.0 ops/sec): 6.5 MB/s Did 146000 AES-128-CBC-SHA1 (96 bytes) open operations in 1005731us (145168.0 ops/sec): 13.9 MB/s Did 138000 AES-128-CBC-SHA1 (112 bytes) open operations in 1003330us (137542.0 ops/sec): 15.4 MB/s Did 133000 AES-128-CBC-SHA1 (128 bytes) open operations in 1005876us (132223.1 ops/sec): 16.9 MB/s Did 117000 AES-128-CBC-SHA1 (240 bytes) open operations in 1004922us (116426.9 ops/sec): 27.9 MB/s After: Did 159000 AES-128-CBC-SHA1 (16 bytes) open operations in 1000505us (158919.7 ops/sec): 2.5 MB/s Did 157000 AES-128-CBC-SHA1 (32 bytes) open operations in 1006091us (156049.5 ops/sec): 5.0 MB/s Did 154000 AES-128-CBC-SHA1 (48 bytes) open operations in 1002720us (153582.3 ops/sec): 7.4 MB/s Did 146000 AES-128-CBC-SHA1 (96 bytes) open operations in 1002567us (145626.2 ops/sec): 14.0 MB/s Did 135000 AES-128-CBC-SHA1 (112 bytes) open operations in 1001212us (134836.6 ops/sec): 15.1 MB/s Did 133000 AES-128-CBC-SHA1 (128 bytes) open operations in 1006441us (132148.8 ops/sec): 16.9 MB/s Did 115000 AES-128-CBC-SHA1 (240 bytes) open operations in 1005246us (114399.9 ops/sec): 27.5 MB/s Bug: 256 Change-Id: I864b4455ada0d4d245380fce6f869dabb0686354 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35167 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	885a63fb74	Patch out the aes_nohw fallback in bsaes_ctr32_encrypt_blocks. bsaes_ctr32_encrypt_blocks previously fell back to the table-based aes_nohw_encrypt for inputs under 128 bytes. Instead, just run the usual bsaes code, though it means we compute more blocks than needed. This fixes some (but not all) the timing leaks and is needed for later bsaes work. Performance-wise, x86_64 actually sees a performance improvement for all but tiny inputs. ARM does see a loss at small inputs however. Cortex-A53 (Raspberry Pi 3 Model B+) Before: Did 299000 AES-128-GCM (16 bytes) seal operations in 1001123us (298664.6 ops/sec): 4.8 MB/s Did 236000 AES-128-GCM (32 bytes) seal operations in 1001611us (235620.4 ops/sec): 7.5 MB/s Did 167000 AES-128-GCM (64 bytes) seal operations in 1005706us (166052.5 ops/sec): 10.6 MB/s Did 129000 AES-128-GCM (96 bytes) seal operations in 1006129us (128214.2 ops/sec): 12.3 MB/s Did 116000 AES-128-GCM (112 bytes) seal operations in 1006302us (115273.5 ops/sec): 12.9 MB/s Did 107000 AES-128-GCM (128 bytes) seal operations in 1000986us (106894.6 ops/sec): 13.7 MB/s After: Did 132000 AES-128-GCM (16 bytes) seal operations in 1005165us (131321.7 ops/sec): 2.1 MB/s Did 128000 AES-128-GCM (32 bytes) seal operations in 1005966us (127240.9 ops/sec): 4.1 MB/s Did 120000 AES-128-GCM (64 bytes) seal operations in 1003080us (119631.5 ops/sec): 7.7 MB/s Did 113000 AES-128-GCM (96 bytes) seal operations in 1000557us (112937.1 ops/sec): 10.8 MB/s Did 110000 AES-128-GCM (112 bytes) seal operations in 1000407us (109955.2 ops/sec): 12.3 MB/s Did 108000 AES-128-GCM (128 bytes) seal operations in 1008830us (107054.7 ops/sec): 13.7 MB/s (Inputs 128 bytes and up are unaffected by this CL.) 
Nexus 7 Before: Did 544000 AES-128-GCM (16 bytes) seal operations in 1001282us (543303.5 ops/sec): 8.7 MB/s Did 475750 AES-128-GCM (32 bytes) seal operations in 1000244us (475633.9 ops/sec): 15.2 MB/s Did 370500 AES-128-GCM (64 bytes) seal operations in 1000519us (370307.8 ops/sec): 23.7 MB/s Did 300750 AES-128-GCM (96 bytes) seal operations in 1000122us (300713.3 ops/sec): 28.9 MB/s Did 275750 AES-128-GCM (112 bytes) seal operations in 1000702us (275556.6 ops/sec): 30.9 MB/s Did 251000 AES-128-GCM (128 bytes) seal operations in 1000214us (250946.3 ops/sec): 32.1 MB/s After: Did 296000 AES-128-GCM (16 bytes) seal operations in 1001129us (295666.2 ops/sec): 4.7 MB/s Did 288750 AES-128-GCM (32 bytes) seal operations in 1000488us (288609.2 ops/sec): 9.2 MB/s Did 267250 AES-128-GCM (64 bytes) seal operations in 1000641us (267078.8 ops/sec): 17.1 MB/s Did 253250 AES-128-GCM (96 bytes) seal operations in 1000915us (253018.5 ops/sec): 24.3 MB/s Did 248000 AES-128-GCM (112 bytes) seal operations in 1000091us (247977.4 ops/sec): 27.8 MB/s Did 249000 AES-128-GCM (128 bytes) seal operations in 1000794us (248802.5 ops/sec): 31.8 MB/s Penryn (Mac mini, mid 2010) Before: Did 1331000 AES-128-GCM (16 bytes) seal operations in 1000263us (1330650.0 ops/sec): 21.3 MB/s Did 991000 AES-128-GCM (32 bytes) seal operations in 1000274us (990728.5 ops/sec): 31.7 MB/s Did 780000 AES-128-GCM (48 bytes) seal operations in 1000278us (779783.2 ops/sec): 37.4 MB/s Did 483000 AES-128-GCM (96 bytes) seal operations in 1000137us (482933.8 ops/sec): 46.4 MB/s Did 428000 AES-128-GCM (112 bytes) seal operations in 1001132us (427516.1 ops/sec): 47.9 MB/s Did 682000 AES-128-GCM (128 bytes) seal operations in 1000564us (681615.6 ops/sec): 87.2 MB/s After: Did 953000 AES-128-GCM (16 bytes) seal operations in 1000385us (952633.2 ops/sec): 15.2 MB/s Did 903000 AES-128-GCM (32 bytes) seal operations in 1000998us (902099.7 ops/sec): 28.9 MB/s Did 850000 AES-128-GCM (48 bytes) seal operations in 1000938us 
(849203.4 ops/sec): 40.8 MB/s Did 736000 AES-128-GCM (96 bytes) seal operations in 1000886us (735348.5 ops/sec): 70.6 MB/s Did 702000 AES-128-GCM (112 bytes) seal operations in 1000657us (701539.1 ops/sec): 78.6 MB/s Did 676000 AES-128-GCM (128 bytes) seal operations in 1000405us (675726.3 ops/sec): 86.5 MB/s Bug: 256 Change-Id: I9403da607dd1feaff7b3c9b76fe78b66018fb753 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35166 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	aadcce380f	Implement sk_find manually. glibc inlines bsearch, so CFI does observe the function pointer mishap. Binary search is easy enough, aside from thinking through the edge case at the end, so just implement it by hand. As a bonus, it actually gives O(lg N) behavior. sk_*_find needs to return the first match, while bsearch does not promise a particular one. sk_find thus performs a fixup step to find the first one, but this is linear in the number of matching elements. Instead, the binary search should take this into account. This still leaves qsort, but it's not inlined, so hopefully we can leave it alone. Bug: chromium:941463 Change-Id: I5c94d6b15423beea3bdb389639466f8b3ff0dc5d Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35304 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	35941f2923	Make vpaes-armv8.pl compatible with XOM. Change-Id: I27413467e5cac4e16ecbbb8d9a238ba5a8bcb9e7 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35284 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
Adam Langley	1d1345377a	Support three-argument instructions on x86-64. Change-Id: I81c855cd4805d4a5016999669a0cb5261838f23a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35224 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	5 years ago
Watson Ladd	3390fd88d7	Correct outdated comments Change-Id: Idc3a41d025fefa9017fce108bed63cb8af426c9b Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35244 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com>	5 years ago
David Benjamin	f9c8d30897	Remove SSL_get_structure_sizes. With all those structures made opaque, it's not really useful as a build sanity-check anymore. Update-Note: This function is removed, but I don't see any actual uses. Change-Id: Ib5640e778466da980596e7085d97104d22aa9d33 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35184 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	b8d7b7498c	Prefer vpaes over bsaes in AES-GCM-SIV and AES-CCM. The AES-GCM-SIV code does not use ctr128_f at all so bsaes is simply identical to aes_nohw. Also, while CCM encrypts with CTR mode, its MAC is not parallelizable at all. (Given the existence of non-parallelizable modes, we ought to make a vpaes-armv7.pl to ensure constant-time AES on NEON. For now, pick the right implementation for x86_64 at least.) aes_ctr_set_key and friends probably aren't the right abstraction (observe the large vs small inputs hint almost matches whether you touch block128_f), but the right abstraction depends on a couple questions: - If you don't provide ctr128_f, is there a perf hit to implementing ctr128_f on top of your block128_f to unify calling code? - It is almost certainly better to use bsaes with gcm.c by calling ctr128_f exclusively and paying some copies (a dedicated calling convention would be even better, but would be a headache) to integrate leading and trailing blocks into the CTR pass. Is this a win, loss, or no-op for hwaes, where block128_f is just fine? hwaes is the one mode we really should not regress. Hopefully those will get answered as we continue to chip away at this. Bug: 256 Change-Id: I8f0150b223b671e68f7da6faaff94a3bea398d4d Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35169 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	da8bb847fd	Tell ASan about the OPENSSL_malloc prefix. OpenSSL's BN_mul function had a single-word buffer underflow (see `576129cd72`). We already independently fixed this but, if we hadn't, ASan wouldn't have noticed because of OPENSSL_malloc. ASan has runtime hooks we can call to make it more accurate. Change-Id: Ifc9c3837ece2bc456c5bdc960be707d7b1759904 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35165 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	8d685ec867	modes/asm/ghash-armv4.pl: address "infixes are deprecated" warnings. This imports `ce5eb5e814` and `1212818eb0` from OpenSSL's 1.1.1 branch. Change-Id: I121c0771371697191a163a28d972a7b3cee37762 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35164 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	55db667c62	Enable vpaes for aarch64, with CTR optimizations. This patches vpaes-armv8.pl to add vpaes_ctr32_encrypt_blocks. CTR mode is by far the most important mode these days. It should have access to _vpaes_encrypt_2x, which gives a considerable speed boost. Also exclude vpaes_ecb_* as they're not even used. For iOS, this change is completely a no-op. iOS ARMv8 always has crypto extensions, and we already statically drop all other AES implementations. Android ARMv8 is not required to have crypto extensions, but every ARMv8 device I've seen has them. For those, it is a no-op performance-wise and a win on size. vpaes appears to be about 5.6KiB smaller than the tables. ARMv8 always makes SIMD (NEON) available, so we can statically drop aes_nohw. In theory, however, crypto-less Android ARMv8 is possible. Today such chips get a variable-time AES. This CL fixes this, but the performance story is complex. The Raspberry Pi 3 is not Android but has a Cortex-A53 chip without crypto extensions. (But the official images are 32-bit, so even this is slightly artificial...) There, vpaes is a performance win. 
Raspberry Pi 3, Model B+, Cortex-A53 Before: Did 265000 AES-128-GCM (16 bytes) seal operations in 1003312us (264125.2 ops/sec): 4.2 MB/s Did 44000 AES-128-GCM (256 bytes) seal operations in 1002141us (43906.0 ops/sec): 11.2 MB/s Did 9394 AES-128-GCM (1350 bytes) seal operations in 1032104us (9101.8 ops/sec): 12.3 MB/s Did 1562 AES-128-GCM (8192 bytes) seal operations in 1008982us (1548.1 ops/sec): 12.7 MB/s After: Did 277000 AES-128-GCM (16 bytes) seal operations in 1001884us (276479.1 ops/sec): 4.4 MB/s Did 52000 AES-128-GCM (256 bytes) seal operations in 1001480us (51923.2 ops/sec): 13.3 MB/s Did 11000 AES-128-GCM (1350 bytes) seal operations in 1007979us (10912.9 ops/sec): 14.7 MB/s Did 2013 AES-128-GCM (8192 bytes) seal operations in 1085545us (1854.4 ops/sec): 15.2 MB/s The Pixel 3 has a Cortex-A75 with crypto extensions, so it would never run this code. However, artificially ignoring them gives another data point (ARM documentation[*] suggests the extensions are still optional on a Cortex-A75.) Sadly, vpaes no longer wins on perf over aes_nohw. But, it is constant-time: Pixel 3, AES/PMULL extensions ignored, Cortex-A75: Before: Did 2102000 AES-128-GCM (16 bytes) seal operations in 1000378us (2101205.7 ops/sec): 33.6 MB/s Did 358000 AES-128-GCM (256 bytes) seal operations in 1002658us (357051.0 ops/sec): 91.4 MB/s Did 75000 AES-128-GCM (1350 bytes) seal operations in 1012830us (74049.9 ops/sec): 100.0 MB/s Did 13000 AES-128-GCM (8192 bytes) seal operations in 1036524us (12541.9 ops/sec): 102.7 MB/s After: Did 1453000 AES-128-GCM (16 bytes) seal operations in 1000213us (1452690.6 ops/sec): 23.2 MB/s Did 285000 AES-128-GCM (256 bytes) seal operations in 1002227us (284366.7 ops/sec): 72.8 MB/s Did 60000 AES-128-GCM (1350 bytes) seal operations in 1016106us (59049.0 ops/sec): 79.7 MB/s Did 11000 AES-128-GCM (8192 bytes) seal operations in 1094184us (10053.2 ops/sec): 82.4 MB/s Note the numbers above run with PMULL off, so the slow GHASH is dampening the regression. 
If we test aes_nohw and vpaes paired with PMULL on, the 20% perf hit becomes a 31% hit. The PMULL-less variant is more likely to represent a real chip. This is consistent with upstream's note in the comment, though it is unclear if 20% is the right order of magnitude: "these results are worse than scalar compiler-generated code, but it's constant-time and therefore preferred". [*] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100458_0301_00_en/lau1442495529696.html Bug: 246 Change-Id: If1dc87f5131fce742052498295476fbae4628dbf Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35026 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	b1b4ff93ca	Check in vpaes-armv8.pl from OpenSSL unused and unmodified. This is done separately to make the diffs in the subsequent CL easier to see. Imported from OpenSSL at revision `25ca718150`. Bug: 246 Change-Id: I9e7067ea177963fb9b77bf6fb39702ffe6e34ed4 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35025 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
Jeremy Apthorp	1fa5abc0b4	silence unused variable warnings when using OPENSSL_clear_free e.g. here: `adbe3b837e/src/node_crypto.cc (L3439)` Change-Id: I2d43a3439d6a56c8eee3636b3c1f5ba615b233ba Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35144 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	5 years ago
Jeremy Apthorp	19220dd6af	Handle NULL public key in |EC_KEY_set_public_key|. Node.js expects to be able to pass NULL to this function to clear the current public key: `adbe3b837e/src/node_crypto.cc (L5316)` Change-Id: Id4e34d8e8b556c28000e4df12ff6f4432ad9220c Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35124 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	5 years ago
David Benjamin	5ce12e6436	Add a 32-bit SSSE3 GHASH implementation. The 64-bit version can be fairly straightforwardly translated. Ironically, this makes 32-bit x86 the first architecture to meet the goal of constant-time AES-GCM given SIMD assembly. (Though x86_64 could join by simply giving up on bsaes...) Bug: 263 Change-Id: Icb2cec936457fac7132bbb5dbb094433bc14b86e Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35024 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
Robert Sloan	ae1e08709f	Also include abi_test.cc in ssl_test_files. Change-Id: I1225f1623d4438a2ccaf482eddbe4f460cfaf78c Reviewed-on: https://boringssl-review.googlesource.com/c/35104 Reviewed-by: David Benjamin <davidben@google.com> Commit-Queue: David Benjamin <davidben@google.com>	5 years ago
David Benjamin	c3889634a1	Don't pull abi_test.cc into non-GTest targets. The test_support is kind of a mess right now because it's sometimes used in GTest targets and sometimes not. It really should be split into two libraries, but do this for now to unbreak the Android build. Change-Id: I7cd2b0f6ed9eda1a529ec3c69a92390e20da66f8 Reviewed-on: https://boringssl-review.googlesource.com/c/35084 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	5 years ago
Alessandro Ghedini	a6124742d0	Update *_set_cert_cb documentation regarding resumption Since `34202b93b6` cert_cb is always called before resumption is checked. Change-Id: I27ca5653144027a1f545a90ecb6b68e64783a66a Reviewed-on: https://boringssl-review.googlesource.com/c/35004 Reviewed-by: David Benjamin <davidben@google.com>	5 years ago
David Benjamin	1e0262ad87	Add a reference for Linux ARM ABI. The Android NDK docs link to an ARM GNU/Linux Application Binary Interface Supplement document. Also fix a typo in trampoline-armv4.pl. The generic ARM document is usually shortened AAPCS, not APCS. I couldn't find a corresponding link for aarch64. Change-Id: I6e5543f5c9e26955cd3945e9e7a5dcff27c2bd78 Reviewed-on: https://boringssl-review.googlesource.com/c/35064 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	a57435e138	Remove __ARM_ARCH__ guard on gcm_*_v8. OpenSSL's `c1669e1c20` switched it to __ARM_MAX_ARCH__, which we mirrored in assembly but not C. The C version should be __ARM_MAX_ARCH__ to match. However, __ARM_MAX_ARCH__ is hardcoded to 8, so just remove the check. Change-Id: Ic873203db1478f49437b889b84ee7fb28eba1a6d Reviewed-on: https://boringssl-review.googlesource.com/c/35045 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	f1f73f8966	Fix bsaes-armv7.pl getting disabled by accident. https://boringssl-review.googlesource.com/c/34188 accidentally disabled it (__ARM_MAX_ARCH__ wasn't defined), which, in turn, masked a bug in https://boringssl-review.googlesource.com/c/34874. Remove the __ARM_MAX_ARCH__ check as that's hardcoded to 8 anyway. Then revert the problematic part of the bsaes-armv7.pl change. That brings back the somewhat questionable post-dispatch to pre-dispatch call, but I hope to patch the fallbacks out soon anyway. Change-Id: I567e55fe35cb716d5ed56580113a302617f5ad71 Reviewed-on: https://boringssl-review.googlesource.com/c/35044 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	6443173d03	Add an option to configure bssl speed chunk size. bsaes, in its current incarnation, hits various pathological behaviors at different input sizes. Make it easy to experiment around them. Bug: 256 Change-Id: Ib6c6ca7d06a570dbf7d4d2ea81c1db0d94d3d0c4 Reviewed-on: https://boringssl-review.googlesource.com/c/34876 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	98ad4d77e3	Appease GCC's uninitialized value warning. GCC notices that one function believes < 0 is the error while the other believes it's != 0. unw_get_reg never returns positive, but match them. Change-Id: I40af614e6b1400bf3d398bd32beb6d3ec702bc11 Reviewed-on: https://boringssl-review.googlesource.com/c/34985 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	5 years ago
Adam Langley	a367d9267f	Set VPAES flags in x86-64 code. The ImplDispatchTest was broken because the 64-bit VPAES code wasn't setting the hit flags. Change-Id: I30200db64337deba7ae9d70d8427decbdfceca58 Reviewed-on: https://boringssl-review.googlesource.com/c/34986 Reviewed-by: David Benjamin <davidben@google.com>	5 years ago
David Benjamin	65dc321492	Enable vpaes for AES_* functions. This makes the AES_* functions meet our constant-time goals for platforms where we have vpaes available. In particular, QUIC packet number encryption needs single-block operations and those should have vpaes available. As a bonus, when vpaes is statically available, the aes_nohw_* functions should be dropped by the linker. (Notably, NEON is guaranteed on aarch64. Although vpaes-armv8.pl itself may take some more exploration. https://crbug.com/boringssl/246#c4) Bug: 263 Change-Id: Ie1c4727a166ec101a8453761757c87dadc188769 Reviewed-on: https://boringssl-review.googlesource.com/c/34875 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	3c19830f6f	Avoid double-dispatch with AES_* vs aes_nohw_*. In particular, consistently pair bsaes with aes_nohw. Ideally the aes_nohw_* calls in bsaes-*.pl would be patched out and bsaes grows its own constant-time key setup (https://crbug.com/boringssl/256), but I'll sort that out separately. In the meantime, avoid going through AES_* which now dispatch. This avoids several nuisances: 1. If we were to add, say, a vpaes-armv7.pl the ABI tests would break. Fundamentally, we cannot assume that an AES_KEY has one and only one representation and must keep everything matching up. 2. AES_* functions should enable vpaes. This makes AES_* faster and constant-time for vector-capable CPUs (https://crbug.com/boringssl/263), relevant for QUIC packet number encryption, allowing us to add vpaes-armv8.pl (https://crbug.com/boringssl/246) without carrying a (likely) mostly unused AES implementation. 3. It's silly to double-dispatch when the EVP layer has already dispatched. 4. We should avoid asm calling into C. Otherwise, we need to test asm for ABI compliance as both caller and callee. Currently we only test it for callee compliance. When asm calls into asm, it should comply with the ABI as caller too, but mistakes don't matter as long as the called function triggers it. If the function is asm, this is fixed. If it is C, we must care about arbitrary C compiler output. Bug: 263 Change-Id: Ic85af5c765fd57cbffeaf301c3872bad6c5bbf78 Reviewed-on: https://boringssl-review.googlesource.com/c/34874 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
Kaustubha Govind	c18353d214	Add uint64_t support in CBS and CBB. We need these APIs to parse some Certificate Transparency structures. Bug: chromium:634570 Change-Id: I4eb46058985a7369dc119ba6a1214913b237da39 Reviewed-on: https://boringssl-review.googlesource.com/c/34944 Reviewed-by: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com>	5 years ago
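As a rough illustration of what 64-bit support in a byte-string parser and builder involves, the sketch below reads and writes a big-endian uint64_t. It does not use the CBS/CBB API: `struct byte_reader`, `reader_get_u64`, and `put_u64` are hypothetical names made up for this example, and the big-endian layout is assumed to match the other fixed-width integer accessors.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a byte-string reader; not the real CBS type. */
struct byte_reader {
  const uint8_t *data;
  size_t len;
};

/* Reads a big-endian uint64_t from the front of |r| and advances past it.
 * Returns 1 on success and 0 if fewer than 8 bytes remain. */
static int reader_get_u64(struct byte_reader *r, uint64_t *out) {
  if (r->len < 8) {
    return 0;
  }
  uint64_t v = 0;
  for (size_t i = 0; i < 8; i++) {
    v = (v << 8) | r->data[i];
  }
  r->data += 8;
  r->len -= 8;
  *out = v;
  return 1;
}

/* Writes |value| to out[0..7] in big-endian order. */
static void put_u64(uint8_t out[8], uint64_t value) {
  for (int i = 7; i >= 0; i--) {
    out[i] = (uint8_t)(value & 0xff);
    value >>= 8;
  }
}
```

Returning 0 on a short buffer, rather than reading past the end, mirrors the usual 1-on-success convention for parsers of this kind.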
David Benjamin	f109f20873	Clear out a bunch of -Wextra-semi warnings. Unfortunately, it's not enough to be able to turn it on thanks to the PURE_VIRTUAL macro. But it gets us most of the way there. Change-Id: Ie6ad5119fcfd420115fa49d7312f3586890244f4 Reviewed-on: https://boringssl-review.googlesource.com/c/34949 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	5 years ago
Steven Valdez	0326105aa9	Add compiled python files to .gitignore. Change-Id: If5d88d88bd1ea8189cc715cc38e70bd3b11c4b67 Reviewed-on: https://boringssl-review.googlesource.com/c/34950 Commit-Queue: Steven Valdez <svaldez@google.com> Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	5 years ago
David Benjamin	24a18b8a40	Fix x86_64-xlate.pl comment regex. This did not correctly capture lines like the following: https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/chacha/asm/chacha-x86_64.pl#260 https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/fipsmodule/aes/asm/aes-x86_64.pl#992 https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/fipsmodule/aes/asm/aesni-x86_64.pl#641 https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/fipsmodule/aes/asm/bsaes-x86_64.pl#387 https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/fipsmodule/modes/asm/ghash-x86_64.pl#455 https://boringssl.googlesource.com/boringssl/+/refs/heads/master/crypto/fipsmodule/ec/asm/p256-x86_64-asm.pl#92 Reportedly that last one causes problems with some assemblers. Change-Id: I82d6f0d81b902e48fad3c45947f84f02370eb1ab Reviewed-on: https://boringssl-review.googlesource.com/c/34925 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	1908667015	Add go 1.11 to go.mod. Go 1.12 really wants to record a version in go.mod if there is no version in there. 1.12 is not yet released, so stick 1.11 in there for now. We'll bump it to 1.12 and so on as we update our minimum versions. Change-Id: I79ac85837149ab7cadd2f23acd8ab2d207a1a355 Reviewed-on: https://boringssl-review.googlesource.com/c/34924 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	104306f587	Remove STRICT_ALIGNMENT code from modes. STRICT_ALIGNMENT is a remnant of OpenSSL code that would cast pointers to size_t* and load more than one byte at a time. Not all architectures support unaligned access, so it did an alignment check and only entered this path if aligned or the underlying architecture didn't care. This is UB. Unaligned casts in C are undefined on all architectures, so we switched these to memcpy some time ago. Compilers can optimize memcpy to the unaligned accesses we wanted. That left our modes logic as: - If STRICT_ALIGNMENT is 1 and things are unaligned, work byte-by-byte. - Otherwise, use the memcpy-based word-by-word code, which now works independent of STRICT_ALIGNMENT. Remove the first check to simplify things. On x86, x86_64, and aarch64, STRICT_ALIGNMENT is zero and this is a no-op. ARM is more complex. Per [0], ARMv7 and up support unaligned access. ARMv5 does not. ARMv6 does, but can run in a mode where it looks more like ARMv5. For ARMv7 and up, STRICT_ALIGNMENT should have been zero, but was one. Thus this change should be an improvement for ARMv7 (right now unaligned inputs lose bsaes-armv7). The Android NDK does not even support the pre-ARMv7 ABI anymore[1]. Nonetheless, Cronet still supports ARMv6 as a library. It builds with -march=armv6 which GCC interprets as supporting unaligned access, so it too did not want this code. For completeness, should anyone still care about ARMv5 or be building with an overly permissive -march flag, GCC does appear unable to inline the memcpy calls. However, GCC also does not interpret (uintptr_t)ptr % sizeof(size_t) as an alignment assertion, so such consumers have already been paying for the memcpy here and throughout the library. In general, C's arcane pointer rules mean we must resort to memcpy often, so, realistically, we must require that the compiler optimize memcpy well. 
[0] https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965 [1] https://developer.android.com/ndk/guides/abis#armeabi Change-Id: I3c7dea562adaeb663032e395499e69530dd8e145 Reviewed-on: https://boringssl-review.googlesource.com/c/34873 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
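The memcpy-based word-by-word pattern described in this commit message can be sketched roughly as follows. This is an illustrative example rather than the code in the modes files: `load_word`, `store_word`, and `xor_bytes` are hypothetical helpers, and the XOR loop merely stands in for the kind of block operation those files perform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a size_t from a possibly unaligned address.
 * memcpy is defined for any alignment, unlike *(const size_t *)p. */
static size_t load_word(const uint8_t *p) {
  size_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}

/* Hypothetical helper: store a size_t to a possibly unaligned address. */
static void store_word(uint8_t *p, size_t v) {
  memcpy(p, &v, sizeof(v));
}

/* XORs |len| bytes of |in| with |key| into |out|, one word at a time where
 * possible, then byte-by-byte for the tail. No alignment check is needed. */
static void xor_bytes(uint8_t *out, const uint8_t *in, const uint8_t *key,
                      size_t len) {
  size_t i = 0;
  for (; i + sizeof(size_t) <= len; i += sizeof(size_t)) {
    store_word(out + i, load_word(in + i) ^ load_word(key + i));
  }
  for (; i < len; i++) {
    out[i] = in[i] ^ key[i];
  }
}
```

On targets that permit unaligned access, an optimizing compiler will typically lower each fixed-size memcpy to a single load or store, which is the behavior the commit message relies on.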
David Benjamin	d8598ce03f	Remove non-STRICT_ALIGNMENT code from xts.c. Independent of the underlying CPU architecture, casting unaligned pointers to uint64_t* is undefined. Just use a memcpy. The compiler should be able to optimize that itself. Change-Id: I39210871fca3eaf1f4b1d205b2bb0c337116d9cc Reviewed-on: https://boringssl-review.googlesource.com/c/34872 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	4d8e1ce5e9	Patch XTS out of ARMv7 bsaes too. Bug: 256 Change-Id: I822274bf05901d82b41dc9c9c4e6d0b5d622f3ff Reviewed-on: https://boringssl-review.googlesource.com/c/34871 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	fb35b147ca	Remove stray prototype. The function's since been renamed. Change-Id: Id1a9788dfeb5c46b3463611b08318b3f253d03df Reviewed-on: https://boringssl-review.googlesource.com/c/34870 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	eb2c2cdf17	Always define GHASH. There is a C implementation of gcm_ghash_4bit to pair with gcm_gmult_4bit. It's even slightly faster per the numbers below (x86_64 OPENSSL_NO_ASM build), but, more importantly, we trim down the combinatorial explosion of GCM implementations and free up complexity budget for potentially using bsaes better in the future. Old: Did 2557000 AES-128-GCM (16 bytes) seal operations in 1000057us (2556854.3 ops/sec): 40.9 MB/s Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009613us (93105.0 ops/sec): 125.7 MB/s Did 17000 AES-128-GCM (8192 bytes) seal operations in 1024768us (16589.1 ops/sec): 135.9 MB/s Did 2511000 AES-256-GCM (16 bytes) seal operations in 1000196us (2510507.9 ops/sec): 40.2 MB/s Did 84000 AES-256-GCM (1350 bytes) seal operations in 1000412us (83965.4 ops/sec): 113.4 MB/s Did 15000 AES-256-GCM (8192 bytes) seal operations in 1046963us (14327.2 ops/sec): 117.4 MB/s New: Did 2739000 AES-128-GCM (16 bytes) seal operations in 1000322us (2738118.3 ops/sec): 43.8 MB/s Did 100000 AES-128-GCM (1350 bytes) seal operations in 1008190us (99187.7 ops/sec): 133.9 MB/s Did 17000 AES-128-GCM (8192 bytes) seal operations in 1006360us (16892.6 ops/sec): 138.4 MB/s Did 2546000 AES-256-GCM (16 bytes) seal operations in 1000150us (2545618.2 ops/sec): 40.7 MB/s Did 86000 AES-256-GCM (1350 bytes) seal operations in 1000970us (85916.7 ops/sec): 116.0 MB/s Did 14850 AES-256-GCM (8192 bytes) seal operations in 1023459us (14509.6 ops/sec): 118.9 MB/s While I'm here, tighten up some of the functions and align the ctr32 and non-ctr32 paths. Bug: 256 Change-Id: Id4df699cefc8630dd5a350d44f927900340f5e60 Reviewed-on: https://boringssl-review.googlesource.com/c/34869 Reviewed-by: Adam Langley <agl@google.com>	5 years ago
Watson Ladd	2f213f643f	Update delegated credentials to draft-03 Change-Id: I0c648340ac7bb134fcda42c56a83f4815bbaa557 Reviewed-on: https://boringssl-review.googlesource.com/c/34884 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	5 years ago
David Benjamin	b22c9fea47	Use Windows symbol APIs in the unwind tester. This should make things a bit easier to debug. Update-Note: Test binaries on Windows now link to dbghelp. Bug: 259 Change-Id: I9da1fc89d429080c5250238e4341445922b1dd8e Reviewed-on: https://boringssl-review.googlesource.com/c/34868 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	5 years ago
David Benjamin	2e819d8be4	Unwind RDRAND functions correctly on Windows. But for the ABI conversion bits, these are just leaf functions and don't even need unwind tables. Just renumber the registers on Windows to only use volatile ones. In doing so, this switches to writing rdrand explicitly. perlasm already knows how to manually encode it and our minimum assembler versions surely cover rdrand by now anyway. Also add the .size directive. I'm not sure what it's used for, but the other files have it. (This isn't a generally reusable technique. The more complex functions will need actual unwind codes.) Bug: 259 Change-Id: I1d5669bcf8b6e34939885d78aea6f60597be1528 Reviewed-on: https://boringssl-review.googlesource.com/c/34867 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago
David Benjamin	15ba2d11a9	Patch out unused aesni-x86_64 functions. This shrinks the bssl binary by about 8k. Change-Id: I571f258ccf7032ae34db3f20904ad9cc81cca839 Reviewed-on: https://boringssl-review.googlesource.com/c/34866 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	5 years ago

5733 Commits (e9a058315be44537d4696b584692e1a0da568645)
