boringssl

Author	SHA1	Message	Date
David Benjamin	1a36dd4930	Unwind the large_inputs hint in aes_ctr_set_key. With bsaes-x86_64.pl gone, it is no longer needed. Depending on how armv7 works (if vpaes-armv7.pl is too slow AND on-demand vpaes->bsaes key conversion is not viable), we may need to bring it back, but get it out of the way for now. Bug: 256 Change-Id: I762c83097bd03d88574ae1ae16b88fca6826f655 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35365 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-23 07:06:02 +00:00
David Benjamin	32ce6032ff	Add an optimized x86_64 vpaes ctr128_f and remove bsaes. Brian Smith suggested applying vpaes-armv8's "2x" optimization to vpaes-x86_64. The registers are a little tight (aarch64 has a whole 32 SIMD registers, while x86_64 only has 16), but it's doable with some spills and makes vpaes much more competitive with bsaes. At small- and medium-sized inputs, vpaes now matches bsaes. At large inputs, it's a ~10% perf hit. bsaes is thus pulling much less weight. Losing an entire AES implementation and having constant-time AES for SSSE3 is attractive. Some notes: - The fact that these are older CPUs tempers the perf hit, but CPUs without AES-NI are still common enough to matter. - This CL does regress CBC decrypt performance nontrivially (see below). If this matters, we can double-up CBC decryption too. CBC in TLS is legacy and already pays a costly Lucky13 mitigation. - The difference between 1350 and 8192 bytes is likely bsaes AES-GCM paying for two slow (and variable-time!) aes_nohw_encrypt calls for EK0 and the trailing partial block. At larger inputs, those two calls are more amortized. - To that end, bsaes would likely be much faster on AES-GCM with smarter use of bsaes. (Fold one-off calls above into bulk data.) Implementing this is a bit of a nuisance though, especially considering we don't wish to regress hwaes. - I'd discarded the key conversion idea, but I think I did it wrong. Benchmarks from https://boringssl-review.googlesource.com/c/boringssl/+/33589 suggest converting to bsaes format on-demand for large ctr32 inputs should give the best of both worlds, but at the cost of an entire AES implementation relative to this CL. - ARMv7 still depends on bsaes and has no vpaes. It also has 16 SIMD registers, so my plan is to translate it, with the same 2x optimization, and see how it compares. Hopefully that, or some combination of the above, will work for ARMv7. 
Sandy Bridge bsaes (before): Did 3144750 AES-128-GCM (16 bytes) seal operations in 5016000us (626943.8 ops/sec): 10.0 MB/s Did 2053750 AES-128-GCM (256 bytes) seal operations in 5016000us (409439.8 ops/sec): 104.8 MB/s Did 469000 AES-128-GCM (1350 bytes) seal operations in 5015000us (93519.4 ops/sec): 126.3 MB/s Did 92500 AES-128-GCM (8192 bytes) seal operations in 5016000us (18441.0 ops/sec): 151.1 MB/s Did 46750 AES-128-GCM (16384 bytes) seal operations in 5032000us (9290.5 ops/sec): 152.2 MB/s vpaes-1x (for reference, not this CL): Did 8684750 AES-128-GCM (16 bytes) seal operations in 5015000us (1731754.7 ops/sec): 27.7 MB/s [+177%] Did 1731500 AES-128-GCM (256 bytes) seal operations in 5016000us (345195.4 ops/sec): 88.4 MB/s [-15.6%] Did 346500 AES-128-GCM (1350 bytes) seal operations in 5016000us (69078.9 ops/sec): 93.3 MB/s [-26.1%] Did 61250 AES-128-GCM (8192 bytes) seal operations in 5015000us (12213.4 ops/sec): 100.1 MB/s [-33.8%] Did 32500 AES-128-GCM (16384 bytes) seal operations in 5031000us (6459.9 ops/sec): 105.8 MB/s [-30.5%] vpaes-2x (this CL): Did 8840000 AES-128-GCM (16 bytes) seal operations in 5015000us (1762711.9 ops/sec): 28.2 MB/s [+182%] Did 2167750 AES-128-GCM (256 bytes) seal operations in 5016000us (432167.1 ops/sec): 110.6 MB/s [+5.5%] Did 474000 AES-128-GCM (1350 bytes) seal operations in 5016000us (94497.6 ops/sec): 127.6 MB/s [+1.0%] Did 81750 AES-128-GCM (8192 bytes) seal operations in 5015000us (16301.1 ops/sec): 133.5 MB/s [-11.6%] Did 41750 AES-128-GCM (16384 bytes) seal operations in 5031000us (8298.5 ops/sec): 136.0 MB/s [-10.6%] Penryn bsaes (before): Did 958000 AES-128-GCM (16 bytes) seal operations in 1000264us (957747.2 ops/sec): 15.3 MB/s Did 420000 AES-128-GCM (256 bytes) seal operations in 1000480us (419798.5 ops/sec): 107.5 MB/s Did 96000 AES-128-GCM (1350 bytes) seal operations in 1001083us (95896.1 ops/sec): 129.5 MB/s Did 18000 AES-128-GCM (8192 bytes) seal operations in 1042491us (17266.3 ops/sec): 141.4 MB/s Did 
9482 AES-128-GCM (16384 bytes) seal operations in 1095703us (8653.8 ops/sec): 141.8 MB/s Did 758000 AES-256-GCM (16 bytes) seal operations in 1000769us (757417.5 ops/sec): 12.1 MB/s Did 359000 AES-256-GCM (256 bytes) seal operations in 1001993us (358285.9 ops/sec): 91.7 MB/s Did 82000 AES-256-GCM (1350 bytes) seal operations in 1009583us (81221.7 ops/sec): 109.6 MB/s Did 15000 AES-256-GCM (8192 bytes) seal operations in 1022294us (14672.9 ops/sec): 120.2 MB/s Did 7884 AES-256-GCM (16384 bytes) seal operations in 1070934us (7361.8 ops/sec): 120.6 MB/s vpaes-1x (for reference, not this CL): Did 2030000 AES-128-GCM (16 bytes) seal operations in 1000227us (2029539.3 ops/sec): 32.5 MB/s [+112%] Did 382000 AES-128-GCM (256 bytes) seal operations in 1001949us (381256.9 ops/sec): 97.6 MB/s [-9.2%] Did 81000 AES-128-GCM (1350 bytes) seal operations in 1007297us (80413.2 ops/sec): 108.6 MB/s [-16.1%] Did 14000 AES-128-GCM (8192 bytes) seal operations in 1031499us (13572.5 ops/sec): 111.2 MB/s [-21.4%] Did 7008 AES-128-GCM (16384 bytes) seal operations in 1030706us (6799.2 ops/sec): 111.4 MB/s [-21.4%] Did 1838000 AES-256-GCM (16 bytes) seal operations in 1000238us (1837562.7 ops/sec): 29.4 MB/s [+143%] Did 321000 AES-256-GCM (256 bytes) seal operations in 1001666us (320466.1 ops/sec): 82.0 MB/s [-10.6%] Did 67000 AES-256-GCM (1350 bytes) seal operations in 1010359us (66313.1 ops/sec): 89.5 MB/s [-18.3%] Did 12000 AES-256-GCM (8192 bytes) seal operations in 1072706us (11186.7 ops/sec): 91.6 MB/s [-23.8%] Did 5680 AES-256-GCM (16384 bytes) seal operations in 1009214us (5628.1 ops/sec): 92.2 MB/s [-23.5%] vpaes-2x (this CL): Did 2072000 AES-128-GCM (16 bytes) seal operations in 1000066us (2071863.3 ops/sec): 33.1 MB/s [+116%] Did 432000 AES-128-GCM (256 bytes) seal operations in 1000732us (431684.0 ops/sec): 110.5 MB/s [+2.8%] Did 92000 AES-128-GCM (1350 bytes) seal operations in 1000580us (91946.7 ops/sec): 124.1 MB/s [-4.2%] Did 16000 AES-128-GCM (8192 bytes) seal operations 
in 1016422us (15741.5 ops/sec): 129.0 MB/s [-8.8%] Did 8448 AES-128-GCM (16384 bytes) seal operations in 1073962us (7866.2 ops/sec): 128.9 MB/s [-9.1%] Did 1865000 AES-256-GCM (16 bytes) seal operations in 1000043us (1864919.8 ops/sec): 29.8 MB/s [+146%] Did 364000 AES-256-GCM (256 bytes) seal operations in 1001561us (363432.7 ops/sec): 93.0 MB/s [+1.4%] Did 77000 AES-256-GCM (1350 bytes) seal operations in 1004123us (76683.8 ops/sec): 103.5 MB/s [-5.6%] Did 14000 AES-256-GCM (8192 bytes) seal operations in 1071179us (13069.7 ops/sec): 107.1 MB/s [-10.9%] Did 7008 AES-256-GCM (16384 bytes) seal operations in 1074125us (6524.4 ops/sec): 106.9 MB/s [-11.4%] Penryn, CBC mode decryption bsaes (before): Did 159000 AES-128-CBC-SHA1 (16 bytes) open operations in 1001019us (158838.1 ops/sec): 2.5 MB/s Did 114000 AES-128-CBC-SHA1 (256 bytes) open operations in 1006485us (113265.5 ops/sec): 29.0 MB/s Did 65000 AES-128-CBC-SHA1 (1350 bytes) open operations in 1008441us (64455.9 ops/sec): 87.0 MB/s Did 17000 AES-128-CBC-SHA1 (8192 bytes) open operations in 1005440us (16908.0 ops/sec): 138.5 MB/s vpaes (after): Did 167000 AES-128-CBC-SHA1 (16 bytes) open operations in 1003556us (166408.3 ops/sec): 2.7 MB/s [+8%] Did 112000 AES-128-CBC-SHA1 (256 bytes) open operations in 1005673us (111368.2 ops/sec): 28.5 MB/s [-1.7%] Did 56000 AES-128-CBC-SHA1 (1350 bytes) open operations in 1005647us (55685.5 ops/sec): 75.2 MB/s [-13.6%] Did 13635 AES-128-CBC-SHA1 (8192 bytes) open operations in 1020486us (13361.3 ops/sec): 109.5 MB/s [-20.9%] Bug: 256 Change-Id: I11ed773323ec7a5ee61080c9ed9ed4761849828a Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35364 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-23 06:59:22 +00:00
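The MB/s column and the bracketed percentages in the benchmark output above can be reproduced from the raw figures. The following is a minimal sketch, not part of BoringSSL's speed tool: the helper names are hypothetical, and it assumes 1 MB = 10^6 bytes, which is consistent with the quoted numbers (18441.0 ops/sec at 8192 bytes is reported as 151.1 MB/s).

```c
// Hypothetical helpers for checking the benchmark arithmetic by hand.

// Throughput in MB/s, taking 1 MB = 10^6 bytes (an assumption that matches
// the figures quoted in the commit message).
static double throughput_mbps(double ops_per_sec, double bytes_per_op) {
  return ops_per_sec * bytes_per_op / 1e6;
}

// Relative change from |before| to |after|, in percent. Negative values are
// regressions when the inputs are throughputs.
static double percent_change(double before, double after) {
  return (after - before) / before * 100.0;
}
```

For the Sandy Bridge 8192-byte row, `percent_change(151.1, 133.5)` gives roughly -11.6, matching the `[-11.6%]` annotation for vpaes-2x against bsaes.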
David Benjamin	4ca8d131d3	Rewrite BN_CTX. While allocating near INT_MAX BIGNUMs or stack frames would never happen, we should properly handle overflow here. Rewrite it to just be a STACK_OF(BIGNUM) plus a stack of indices. Also simplify the error-handling. If we make the errors truly sticky (rather than just sticky per frame), we don't need to keep track of err_stack and friends. Thanks to mlbrown for reporting the integer overflows in the original implementation. Bug: chromium:942269 Change-Id: Ie9c9baea3eeb82d65d88b1cb1388861f5cd84fe5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35328 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-18 19:18:31 +00:00
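The "pool plus a stack of indices" layout described in the BN_CTX rewrite can be sketched as follows. This is an illustration under stated assumptions, not the BoringSSL implementation: `toy_ctx` and its functions are hypothetical, a fixed array of `uint64_t` stands in for STACK_OF(BIGNUM), and the capacities are arbitrary. It shows two ideas from the message: each frame only records where the pool stood when it was opened, and a single sticky error flag replaces per-frame error bookkeeping.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_CAP 16
#define FRAME_CAP 8

typedef struct {
  uint64_t pool[POOL_CAP];   // stand-in for the pool of BIGNUMs
  size_t used;               // number of pool entries handed out
  size_t frames[FRAME_CAP];  // value of |used| when each frame was opened
  size_t depth;              // number of open frames
  int error;                 // sticky: once set, all later requests fail
} toy_ctx;

static void ctx_init(toy_ctx *ctx) {
  ctx->used = 0;
  ctx->depth = 0;
  ctx->error = 0;
}

// Opens a frame. Running out of frame slots sets the sticky error rather
// than overflowing the index stack.
static void ctx_start(toy_ctx *ctx) {
  if (ctx->error || ctx->depth == FRAME_CAP) {
    ctx->error = 1;
    return;
  }
  ctx->frames[ctx->depth++] = ctx->used;
}

// Hands out the next pool entry, or NULL once the pool is exhausted or an
// earlier operation has failed.
static uint64_t *ctx_get(toy_ctx *ctx) {
  if (ctx->error || ctx->used == POOL_CAP) {
    ctx->error = 1;
    return NULL;
  }
  uint64_t *ret = &ctx->pool[ctx->used++];
  *ret = 0;
  return ret;
}

// Closes the innermost frame, releasing everything allocated inside it.
// After an error the frame stack may no longer match the pending calls, so
// this does nothing.
static void ctx_end(toy_ctx *ctx) {
  if (ctx->error || ctx->depth == 0) {
    return;
  }
  ctx->used = ctx->frames[--ctx->depth];
}
```

Because the error is sticky for the whole context, a caller only needs to check the result of its last `ctx_get` in a frame.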
David Benjamin	c93be52c9e	Save a temporary in BN_mod_exp_mont's w=1 case. BN_mod_exp_mont is most commonly used in RSA verification, where the exponent sizes are small enough to use 1-bit "windows". There's no need to allocate the extra BIGNUM. Change-Id: I14fb523dfae7d77d2cec10a0209f09f22031d1af Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35327 Reviewed-by: Adam Langley <agl@google.com>	2019-03-18 17:20:32 +00:00
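To see why a 1-bit window needs no extra temporary, consider plain left-to-right square-and-multiply: the only multiplicand is the base itself, so there is no table of precomputed powers to hold. The sketch below is an assumption-laden illustration using 64-bit integers rather than Montgomery-form BIGNUMs; `mod_exp_w1` is a hypothetical name, and the modulus is restricted to below 2^32 so products cannot overflow.

```c
#include <stdint.h>

// Toy left-to-right square-and-multiply with a 1-bit window.
// Requires 0 < m < 2^32 so that every product fits in a uint64_t.
// It branches on exponent bits, which is acceptable only when the exponent
// is public, as in RSA verification.
static uint64_t mod_exp_w1(uint64_t base, uint64_t exp, uint64_t m) {
  uint64_t acc = 1 % m;
  base %= m;
  for (int i = 63; i >= 0; i--) {
    acc = (acc * acc) % m;     // one squaring per exponent bit
    if ((exp >> i) & 1) {
      acc = (acc * base) % m;  // multiply by the base itself, no table
    }
  }
  return acc;
}
```

With wider windows the multiplicand would instead be looked up from a table of powers of the base, which is where the additional storage comes from.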
David Benjamin	fdb48f9861	Drop some unused bsaes to aes_nohw dependencies. When the CBC and CTR EVP_CIPHER implementations use bsaes, they never call dat->block. Note this is not true of aes_ctr_set_key which is used in contexts where it needs single-block operations. Bug: 256 Change-Id: Ibea4f2117a2220cd5cb09f6cf12b7a50c28bf794 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35168 Reviewed-by: Adam Langley <agl@google.com>	2019-03-14 21:43:58 +00:00
David Benjamin	d22578f366	Adapt gcm_*_neon to aarch64. This makes AES-GCM always constant-time on aarch64 (provided assembly is enabled). Unlike vpaes, this does come at a binary size penalty of 1K compared to the gcm_*_4bit version. ABI testing already covered by GCMTest.ABI (GHASH_ASM_ARM covers both OPENSSL_ARM and OPENSSL_AARCH64.) Cortex-A53 (Raspberry Pi 3 Model B+) Before: Did 274000 AES-128-GCM (16 bytes) seal operations in 1003461us (273055.0 ops/sec): 4.4 MB/s Did 53000 AES-128-GCM (256 bytes) seal operations in 1007689us (52595.6 ops/sec): 13.5 MB/s Did 12000 AES-128-GCM (1350 bytes) seal operations in 1075908us (11153.4 ops/sec): 15.1 MB/s Did 2068 AES-128-GCM (8192 bytes) seal operations in 1089037us (1898.9 ops/sec): 15.6 MB/s After: Did 298000 AES-128-GCM (16 bytes) seal operations in 1002917us (297133.3 ops/sec): 4.8 MB/s Did 64000 AES-128-GCM (256 bytes) seal operations in 1001124us (63928.1 ops/sec): 16.4 MB/s Did 14000 AES-128-GCM (1350 bytes) seal operations in 1015477us (13786.6 ops/sec): 18.6 MB/s Did 2497 AES-128-GCM (8192 bytes) seal operations in 1057951us (2360.2 ops/sec): 19.3 MB/s Bug: 265 Change-Id: I251bf0f2eae0578580bb14192755e5d8ff64cd14 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35285 Reviewed-by: Adam Langley <agl@google.com>	2019-03-14 21:43:27 +00:00
David Benjamin	4851041967	Patch out the aes_nohw fallback in bsaes_cbc_encrypt. This plugs all bsaes fallback leaks for CBC outside of the key schedule. The CBC EVP_CIPHERs never call the block function directly when there's a stream.cbc function available. This affects CBC decryptions of length < 128 or 16 mod 128. Performance-wise, we don't really care about CBC apart from passing glances at its use in TLS. There, the Lucky13 workaround mutes the effects. Cortex-A53 (Raspberry Pi 3 Model B+) Before: Did 78000 AES-128-CBC-SHA1 (16 bytes) open operations in 3020254us (25825.6 ops/sec): 0.4 MB/s Did 75000 AES-128-CBC-SHA1 (32 bytes) open operations in 3005760us (24952.1 ops/sec): 0.8 MB/s Did 71000 AES-128-CBC-SHA1 (64 bytes) open operations in 3038137us (23369.6 ops/sec): 1.5 MB/s Did 67000 AES-128-CBC-SHA1 (96 bytes) open operations in 3027686us (22129.1 ops/sec): 2.1 MB/s Did 64000 AES-128-CBC-SHA1 (112 bytes) open operations in 3005491us (21294.4 ops/sec): 2.4 MB/s Did 59000 AES-128-CBC-SHA1 (128 bytes) open operations in 3020083us (19535.9 ops/sec): 2.5 MB/s Did 53000 AES-128-CBC-SHA1 (240 bytes) open operations in 3020105us (17549.1 ops/sec): 4.2 MB/s After: Did 71668 AES-128-CBC-SHA1 (16 bytes) open operations in 3020896us (23724.1 ops/sec): 0.4 MB/s Did 71000 AES-128-CBC-SHA1 (32 bytes) open operations in 3040826us (23348.9 ops/sec): 0.7 MB/s Did 68000 AES-128-CBC-SHA1 (64 bytes) open operations in 3009913us (22592.0 ops/sec): 1.4 MB/s Did 66000 AES-128-CBC-SHA1 (96 bytes) open operations in 3007597us (21944.4 ops/sec): 2.1 MB/s Did 59000 AES-128-CBC-SHA1 (112 bytes) open operations in 3002878us (19647.8 ops/sec): 2.2 MB/s Did 59000 AES-128-CBC-SHA1 (128 bytes) open operations in 3046786us (19364.7 ops/sec): 2.5 MB/s Did 50000 AES-128-CBC-SHA1 (240 bytes) open operations in 3043643us (16427.7 ops/sec): 3.9 MB/s Penryn (Mac mini, mid 2010) Before: Did 152000 AES-128-CBC-SHA1 (16 bytes) open operations in 1004422us (151330.8 ops/sec): 2.4 MB/s Did 143000 
AES-128-CBC-SHA1 (32 bytes) open operations in 1000443us (142936.7 ops/sec): 4.6 MB/s Did 136000 AES-128-CBC-SHA1 (48 bytes) open operations in 1006580us (135111.0 ops/sec): 6.5 MB/s Did 146000 AES-128-CBC-SHA1 (96 bytes) open operations in 1005731us (145168.0 ops/sec): 13.9 MB/s Did 138000 AES-128-CBC-SHA1 (112 bytes) open operations in 1003330us (137542.0 ops/sec): 15.4 MB/s Did 133000 AES-128-CBC-SHA1 (128 bytes) open operations in 1005876us (132223.1 ops/sec): 16.9 MB/s Did 117000 AES-128-CBC-SHA1 (240 bytes) open operations in 1004922us (116426.9 ops/sec): 27.9 MB/s After: Did 159000 AES-128-CBC-SHA1 (16 bytes) open operations in 1000505us (158919.7 ops/sec): 2.5 MB/s Did 157000 AES-128-CBC-SHA1 (32 bytes) open operations in 1006091us (156049.5 ops/sec): 5.0 MB/s Did 154000 AES-128-CBC-SHA1 (48 bytes) open operations in 1002720us (153582.3 ops/sec): 7.4 MB/s Did 146000 AES-128-CBC-SHA1 (96 bytes) open operations in 1002567us (145626.2 ops/sec): 14.0 MB/s Did 135000 AES-128-CBC-SHA1 (112 bytes) open operations in 1001212us (134836.6 ops/sec): 15.1 MB/s Did 133000 AES-128-CBC-SHA1 (128 bytes) open operations in 1006441us (132148.8 ops/sec): 16.9 MB/s Did 115000 AES-128-CBC-SHA1 (240 bytes) open operations in 1005246us (114399.9 ops/sec): 27.5 MB/s Bug: 256 Change-Id: I864b4455ada0d4d245380fce6f869dabb0686354 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35167 Reviewed-by: Adam Langley <agl@google.com>	2019-03-14 21:38:28 +00:00
David Benjamin	885a63fb74	Patch out the aes_nohw fallback in bsaes_ctr32_encrypt_blocks. bsaes_ctr32_encrypt_blocks previously fell back to the table-based aes_nohw_encrypt for inputs under 128 bytes. Instead, just run the usual bsaes code, though it means we compute more blocks than needed. This fixes some (but not all) the timing leaks and is needed for later bsaes work. Performance-wise, x86_64 actually sees a performance improvement for all but tiny inputs. ARM does see a loss at small inputs however. Cortex-A53 (Raspberry Pi 3 Model B+) Before: Did 299000 AES-128-GCM (16 bytes) seal operations in 1001123us (298664.6 ops/sec): 4.8 MB/s Did 236000 AES-128-GCM (32 bytes) seal operations in 1001611us (235620.4 ops/sec): 7.5 MB/s Did 167000 AES-128-GCM (64 bytes) seal operations in 1005706us (166052.5 ops/sec): 10.6 MB/s Did 129000 AES-128-GCM (96 bytes) seal operations in 1006129us (128214.2 ops/sec): 12.3 MB/s Did 116000 AES-128-GCM (112 bytes) seal operations in 1006302us (115273.5 ops/sec): 12.9 MB/s Did 107000 AES-128-GCM (128 bytes) seal operations in 1000986us (106894.6 ops/sec): 13.7 MB/s After: Did 132000 AES-128-GCM (16 bytes) seal operations in 1005165us (131321.7 ops/sec): 2.1 MB/s Did 128000 AES-128-GCM (32 bytes) seal operations in 1005966us (127240.9 ops/sec): 4.1 MB/s Did 120000 AES-128-GCM (64 bytes) seal operations in 1003080us (119631.5 ops/sec): 7.7 MB/s Did 113000 AES-128-GCM (96 bytes) seal operations in 1000557us (112937.1 ops/sec): 10.8 MB/s Did 110000 AES-128-GCM (112 bytes) seal operations in 1000407us (109955.2 ops/sec): 12.3 MB/s Did 108000 AES-128-GCM (128 bytes) seal operations in 1008830us (107054.7 ops/sec): 13.7 MB/s (Inputs 128 bytes and up are unaffected by this CL.) 
Nexus 7 Before: Did 544000 AES-128-GCM (16 bytes) seal operations in 1001282us (543303.5 ops/sec): 8.7 MB/s Did 475750 AES-128-GCM (32 bytes) seal operations in 1000244us (475633.9 ops/sec): 15.2 MB/s Did 370500 AES-128-GCM (64 bytes) seal operations in 1000519us (370307.8 ops/sec): 23.7 MB/s Did 300750 AES-128-GCM (96 bytes) seal operations in 1000122us (300713.3 ops/sec): 28.9 MB/s Did 275750 AES-128-GCM (112 bytes) seal operations in 1000702us (275556.6 ops/sec): 30.9 MB/s Did 251000 AES-128-GCM (128 bytes) seal operations in 1000214us (250946.3 ops/sec): 32.1 MB/s After: Did 296000 AES-128-GCM (16 bytes) seal operations in 1001129us (295666.2 ops/sec): 4.7 MB/s Did 288750 AES-128-GCM (32 bytes) seal operations in 1000488us (288609.2 ops/sec): 9.2 MB/s Did 267250 AES-128-GCM (64 bytes) seal operations in 1000641us (267078.8 ops/sec): 17.1 MB/s Did 253250 AES-128-GCM (96 bytes) seal operations in 1000915us (253018.5 ops/sec): 24.3 MB/s Did 248000 AES-128-GCM (112 bytes) seal operations in 1000091us (247977.4 ops/sec): 27.8 MB/s Did 249000 AES-128-GCM (128 bytes) seal operations in 1000794us (248802.5 ops/sec): 31.8 MB/s Penryn (Mac mini, mid 2010) Before: Did 1331000 AES-128-GCM (16 bytes) seal operations in 1000263us (1330650.0 ops/sec): 21.3 MB/s Did 991000 AES-128-GCM (32 bytes) seal operations in 1000274us (990728.5 ops/sec): 31.7 MB/s Did 780000 AES-128-GCM (48 bytes) seal operations in 1000278us (779783.2 ops/sec): 37.4 MB/s Did 483000 AES-128-GCM (96 bytes) seal operations in 1000137us (482933.8 ops/sec): 46.4 MB/s Did 428000 AES-128-GCM (112 bytes) seal operations in 1001132us (427516.1 ops/sec): 47.9 MB/s Did 682000 AES-128-GCM (128 bytes) seal operations in 1000564us (681615.6 ops/sec): 87.2 MB/s After: Did 953000 AES-128-GCM (16 bytes) seal operations in 1000385us (952633.2 ops/sec): 15.2 MB/s Did 903000 AES-128-GCM (32 bytes) seal operations in 1000998us (902099.7 ops/sec): 28.9 MB/s Did 850000 AES-128-GCM (48 bytes) seal operations in 1000938us 
(849203.4 ops/sec): 40.8 MB/s Did 736000 AES-128-GCM (96 bytes) seal operations in 1000886us (735348.5 ops/sec): 70.6 MB/s Did 702000 AES-128-GCM (112 bytes) seal operations in 1000657us (701539.1 ops/sec): 78.6 MB/s Did 676000 AES-128-GCM (128 bytes) seal operations in 1000405us (675726.3 ops/sec): 86.5 MB/s Bug: 256 Change-Id: I9403da607dd1feaff7b3c9b76fe78b66018fb753 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35166 Reviewed-by: Adam Langley <agl@google.com>	2019-03-14 21:37:46 +00:00
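The idea of running the bulk code for short inputs, at the cost of computing more blocks than needed, can be sketched like this. Everything here is hypothetical: `toy_bulk8` stands in for a routine that always processes eight blocks, its "keystream" is a toy function rather than AES, and the scratch-buffer approach is an assumption about one way to do it, not a description of the bsaes assembly.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 16
#define BULK_BLOCKS 8

// Hypothetical bulk routine: always processes exactly eight blocks. The
// keystream byte is a toy function of the counter and position, not AES.
static void toy_bulk8(const uint8_t *in, uint8_t *out, uint32_t ctr) {
  for (size_t b = 0; b < BULK_BLOCKS; b++) {
    for (size_t i = 0; i < BLOCK; i++) {
      uint8_t ks = (uint8_t)((ctr + (uint32_t)b) * 31u + (uint32_t)i * 7u);
      out[b * BLOCK + i] = in[b * BLOCK + i] ^ ks;
    }
  }
}

// Encrypts |blocks| blocks starting at counter |ctr|. A short tail is copied
// into a scratch buffer and run through the same bulk routine, so there is
// no separate single-block fallback. The unused output blocks are discarded.
static void toy_ctr32_encrypt_blocks(const uint8_t *in, uint8_t *out,
                                     size_t blocks, uint32_t ctr) {
  while (blocks >= BULK_BLOCKS) {
    toy_bulk8(in, out, ctr);
    in += BULK_BLOCKS * BLOCK;
    out += BULK_BLOCKS * BLOCK;
    blocks -= BULK_BLOCKS;
    ctr += BULK_BLOCKS;
  }
  if (blocks > 0) {
    uint8_t tmp[BULK_BLOCKS * BLOCK] = {0};
    memcpy(tmp, in, blocks * BLOCK);
    toy_bulk8(tmp, tmp, ctr);
    memcpy(out, tmp, blocks * BLOCK);
  }
}
```

This trade-off is in line with the numbers above: tiny inputs pay for up to eight blocks of work, while the cost is independent of which path a given length would otherwise have taken.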
David Benjamin	35941f2923	Make vpaes-armv8.pl compatible with XOM. Change-Id: I27413467e5cac4e16ecbbb8d9a238ba5a8bcb9e7 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35284 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-11 23:17:06 +00:00
David Benjamin	8d685ec867	modes/asm/ghash-armv4.pl: address "infixes are deprecated" warnings. This imports ce5eb5e8149d8d03660575f4b8504c993851988a and 1212818eb07add297fe562eba80ac46a9893781e from OpenSSL's 1.1.1 branch. Change-Id: I121c0771371697191a163a28d972a7b3cee37762 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35164 Reviewed-by: Adam Langley <agl@google.com>	2019-03-05 17:52:28 +00:00
David Benjamin	55db667c62	Enable vpaes for aarch64, with CTR optimizations. This patches vpaes-armv8.pl to add vpaes_ctr32_encrypt_blocks. CTR mode is by far the most important mode these days. It should have access to _vpaes_encrypt_2x, which gives a considerable speed boost. Also exclude vpaes_ecb_* as they're not even used. For iOS, this change is completely a no-op. iOS ARMv8 always has crypto extensions, and we already statically drop all other AES implementations. Android ARMv8 is not required to have crypto extensions, but every ARMv8 device I've seen has them. For those, it is a no-op performance-wise and a win on size. vpaes appears to be about 5.6KiB smaller than the tables. ARMv8 always makes SIMD (NEON) available, so we can statically drop aes_nohw. In theory, however, crypto-less Android ARMv8 is possible. Today such chips get a variable-time AES. This CL fixes this, but the performance story is complex. The Raspberry Pi 3 is not Android but has a Cortex-A53 chip without crypto extensions. (But the official images are 32-bit, so even this is slightly artificial...) There, vpaes is a performance win. 
Raspberry Pi 3, Model B+, Cortex-A53 Before: Did 265000 AES-128-GCM (16 bytes) seal operations in 1003312us (264125.2 ops/sec): 4.2 MB/s Did 44000 AES-128-GCM (256 bytes) seal operations in 1002141us (43906.0 ops/sec): 11.2 MB/s Did 9394 AES-128-GCM (1350 bytes) seal operations in 1032104us (9101.8 ops/sec): 12.3 MB/s Did 1562 AES-128-GCM (8192 bytes) seal operations in 1008982us (1548.1 ops/sec): 12.7 MB/s After: Did 277000 AES-128-GCM (16 bytes) seal operations in 1001884us (276479.1 ops/sec): 4.4 MB/s Did 52000 AES-128-GCM (256 bytes) seal operations in 1001480us (51923.2 ops/sec): 13.3 MB/s Did 11000 AES-128-GCM (1350 bytes) seal operations in 1007979us (10912.9 ops/sec): 14.7 MB/s Did 2013 AES-128-GCM (8192 bytes) seal operations in 1085545us (1854.4 ops/sec): 15.2 MB/s The Pixel 3 has a Cortex-A75 with crypto extensions, so it would never run this code. However, artificially ignoring them gives another data point (ARM documentation[] suggests the extensions are still optional on a Cortex-A75.) Sadly, vpaes no longer wins on perf over aes_nohw. But, it is constant-time: Pixel 3, AES/PMULL extensions ignored, Cortex-A75: Before: Did 2102000 AES-128-GCM (16 bytes) seal operations in 1000378us (2101205.7 ops/sec): 33.6 MB/s Did 358000 AES-128-GCM (256 bytes) seal operations in 1002658us (357051.0 ops/sec): 91.4 MB/s Did 75000 AES-128-GCM (1350 bytes) seal operations in 1012830us (74049.9 ops/sec): 100.0 MB/s Did 13000 AES-128-GCM (8192 bytes) seal operations in 1036524us (12541.9 ops/sec): 102.7 MB/s After: Did 1453000 AES-128-GCM (16 bytes) seal operations in 1000213us (1452690.6 ops/sec): 23.2 MB/s Did 285000 AES-128-GCM (256 bytes) seal operations in 1002227us (284366.7 ops/sec): 72.8 MB/s Did 60000 AES-128-GCM (1350 bytes) seal operations in 1016106us (59049.0 ops/sec): 79.7 MB/s Did 11000 AES-128-GCM (8192 bytes) seal operations in 1094184us (10053.2 ops/sec): 82.4 MB/s Note the numbers above run with PMULL off, so the slow GHASH is dampening the regression. 
If we test aes_nohw and vpaes paired with PMULL on, the 20% perf hit becomes a 31% hit. The PMULL-less variant is more likely to represent a real chip. This is consistent with upstream's note in the comment, though it is unclear if 20% is the right order of magnitude: "these results are worse than scalar compiler-generated code, but it's constant-time and therefore preferred". [] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100458_0301_00_en/lau1442495529696.html Bug: 246 Change-Id: If1dc87f5131fce742052498295476fbae4628dbf Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35026 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-04 20:31:39 +00:00
David Benjamin	b1b4ff93ca	Check in vpaes-armv8.pl from OpenSSL unused and unmodified. This is done separately to make the diffs in the subsequent CL easier to see. Imported from OpenSSL at revision 25ca718150cef41e1c1d9c2c8c58e2b1e2cad3fa. Bug: 246 Change-Id: I9e7067ea177963fb9b77bf6fb39702ffe6e34ed4 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35025 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-04 20:23:09 +00:00
Jeremy Apthorp	19220dd6af	Handle NULL public key in |EC_KEY_set_public_key|. Node.js expects to be able to pass NULL to this function to clear the current public key: `adbe3b837e/src/node_crypto.cc (L5316)` Change-Id: Id4e34d8e8b556c28000e4df12ff6f4432ad9220c Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35124 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-03-04 19:45:29 +00:00
David Benjamin	5ce12e6436	Add a 32-bit SSSE3 GHASH implementation. The 64-bit version can be fairly straightforwardly translated. Ironically, this makes 32-bit x86 the first architecture to meet the goal of constant-time AES-GCM given SIMD assembly. (Though x86_64 could join by simply giving up on bsaes...) Bug: 263 Change-Id: Icb2cec936457fac7132bbb5dbb094433bc14b86e Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35024 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-03-04 19:02:52 +00:00
David Benjamin	a57435e138	Remove __ARM_ARCH__ guard on gcm_*_v8. OpenSSL's c1669e1c205dc8e695fb0c10a655f434e758b9f7 switched it to __ARM_MAX_ARCH__, which we mirrored in assembly but not C. The C version should be __ARM_MAX_ARCH__ to match. However, __ARM_MAX_ARCH__ is hardcoded to 8, so just remove the check. Change-Id: Ic873203db1478f49437b889b84ee7fb28eba1a6d Reviewed-on: https://boringssl-review.googlesource.com/c/35045 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-27 02:26:21 +00:00
David Benjamin	f1f73f8966	Fix bsaes-armv7.pl getting disabled by accident. https://boringssl-review.googlesource.com/c/34188 accidentally disabled it (__ARM_MAX_ARCH__ wasn't defined), which, in turn, masked a bug in https://boringssl-review.googlesource.com/c/34874. Remove the __ARM_MAX_ARCH__ check as that's hardcoded to 8 anyway. Then revert the problematic part of the bsaes-armv7.pl change. That brings back the somewhat questionable post-dispatch to pre-dispatch call, but I hope to patch the fallbacks out soon anyway. Change-Id: I567e55fe35cb716d5ed56580113a302617f5ad71 Reviewed-on: https://boringssl-review.googlesource.com/c/35044 Commit-Queue: David Benjamin <davidben@google.com> Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-27 02:06:21 +00:00
Adam Langley	a367d9267f	Set VPAES flags in x86-64 code. The ImplDispatchTest was broken because the 64-bit VPAES code wasn't setting the hit flags. Change-Id: I30200db64337deba7ae9d70d8427decbdfceca58 Reviewed-on: https://boringssl-review.googlesource.com/c/34986 Reviewed-by: David Benjamin <davidben@google.com>	2019-02-22 23:41:50 +00:00
David Benjamin	65dc321492	Enable vpaes for AES_* functions. This makes the AES_* functions meet our constant-time goals for platforms where we have vpaes available. In particular, QUIC packet number encryption needs single-block operations and those should have vpaes available. As a bonus, when vpaes is statically available, the aes_nohw_* functions should be dropped by the linker. (Notably, NEON is guaranteed on aarch64. Although vpaes-armv8.pl itself may take some more exploration. https://crbug.com/boringssl/246#c4) Bug: 263 Change-Id: Ie1c4727a166ec101a8453761757c87dadc188769 Reviewed-on: https://boringssl-review.googlesource.com/c/34875 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-22 23:09:19 +00:00
David Benjamin	3c19830f6f	Avoid double-dispatch with AES_* vs aes_nohw_*. In particular, consistently pair bsaes with aes_nohw. Ideally the aes_nohw_* calls in bsaes-*.pl would be patched out and bsaes grows its own constant-time key setup (https://crbug.com/boringssl/256), but I'll sort that out separately. In the meantime, avoid going through AES_* which now dispatch. This avoids several nuisances: 1. If we were to add, say, a vpaes-armv7.pl the ABI tests would break. Fundamentally, we cannot assume that an AES_KEY has one and only one representation and must keep everything matching up. 2. AES_* functions should enable vpaes. This makes AES_* faster and constant-time for vector-capable CPUs (https://crbug.com/boringssl/263), relevant for QUIC packet number encryption, allowing us to add vpaes-armv8.pl (https://crbug.com/boringssl/246) without carrying a (likely) mostly unused AES implementation. 3. It's silly to double-dispatch when the EVP layer has already dispatched. 4. We should avoid asm calling into C. Otherwise, we need to test asm for ABI compliance as both caller and callee. Currently we only test it for callee compliance. When asm calls into asm, it should comply with the ABI as caller too, but mistakes don't matter as long as the called function triggers it. If the function is asm, this is fixed. If it is C, we must care about arbitrary C compiler output. Bug: 263 Change-Id: Ic85af5c765fd57cbffeaf301c3872bad6c5bbf78 Reviewed-on: https://boringssl-review.googlesource.com/c/34874 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-22 22:51:51 +00:00
David Benjamin	f109f20873	Clear out a bunch of -Wextra-semi warnings. Unfortunately, it's not enough to be able to turn it on thanks to the PURE_VIRTUAL macro. But it gets us most of the way there. Change-Id: Ie6ad5119fcfd420115fa49d7312f3586890244f4 Reviewed-on: https://boringssl-review.googlesource.com/c/34949 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-02-21 19:12:39 +00:00
David Benjamin	104306f587	Remove STRICT_ALIGNMENT code from modes. STRICT_ALIGNMENT is a remnant of OpenSSL code that would cast pointers to size_t* and load more than one byte at a time. Not all architectures support unaligned access, so it did an alignment check and only entered this path if aligned or the underlying architecture didn't care. This is UB. Unaligned casts in C are undefined on all architectures, so we switched these to memcpy some time ago. Compilers can optimize memcpy to the unaligned accesses we wanted. That left our modes logic as: - If STRICT_ALIGNMENT is 1 and things are unaligned, work byte-by-byte. - Otherwise, use the memcpy-based word-by-word code, which now works independent of STRICT_ALIGNMENT. Remove the first check to simplify things. On x86, x86_64, and aarch64, STRICT_ALIGNMENT is zero and this is a no-op. ARM is more complex. Per [0], ARMv7 and up support unaligned access. ARMv5 does not. ARMv6 does, but can run in a mode where it looks more like ARMv5. For ARMv7 and up, STRICT_ALIGNMENT should have been zero, but was one. Thus this change should be an improvement for ARMv7 (right now unaligned inputs lose bsaes-armv7). The Android NDK does not even support the pre-ARMv7 ABI anymore[1]. Nonetheless, Cronet still supports ARMv6 as a library. It builds with -march=armv6 which GCC interprets as supporting unaligned access, so it too did not want this code. For completeness, should anyone still care about ARMv5 or be building with an overly permissive -march flag, GCC does appear unable to inline the memcpy calls. However, GCC also does not interpret (uintptr_t)ptr % sizeof(size_t) as an alignment assertion, so such consumers have already been paying for the memcpy here and throughout the library. In general, C's arcane pointer rules mean we must resort to memcpy often, so, realistically, we must require that the compiler optimize memcpy well. 
[0] https://medium.com/@iLevex/the-curious-case-of-unaligned-access-on-arm-5dd0ebe24965 [1] https://developer.android.com/ndk/guides/abis#armeabi Change-Id: I3c7dea562adaeb663032e395499e69530dd8e145 Reviewed-on: https://boringssl-review.googlesource.com/c/34873 Reviewed-by: Adam Langley <agl@google.com>	2019-02-14 17:39:36 +00:00
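The memcpy-based word-by-word pattern that the STRICT_ALIGNMENT message refers to looks roughly like the sketch below. The function names are hypothetical and this is not the code in BoringSSL's modes directory. It only illustrates the technique: loading and storing a `size_t` through `memcpy` is well defined for any alignment, and compilers targeting architectures with unaligned access typically lower it to a single load or store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Loads a size_t from a possibly unaligned address without casting the
// pointer, which would be undefined behavior for unaligned inputs.
static size_t load_word(const void *p) {
  size_t w;
  memcpy(&w, p, sizeof(w));
  return w;
}

// Stores a size_t to a possibly unaligned address.
static void store_word(void *p, size_t w) {
  memcpy(p, &w, sizeof(w));
}

// XORs |len| bytes of |a| and |b| into |out|, one word at a time with a
// byte-by-byte tail. No alignment check is needed on any of the pointers.
static void xor_bytes(uint8_t *out, const uint8_t *a, const uint8_t *b,
                      size_t len) {
  size_t i = 0;
  for (; i + sizeof(size_t) <= len; i += sizeof(size_t)) {
    store_word(out + i, load_word(a + i) ^ load_word(b + i));
  }
  for (; i < len; i++) {
    out[i] = a[i] ^ b[i];
  }
}
```

The same function handles aligned and unaligned buffers, which is why a separate byte-by-byte path guarded by an alignment check becomes unnecessary.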
David Benjamin	4d8e1ce5e9	Patch XTS out of ARMv7 bsaes too. Bug: 256 Change-Id: I822274bf05901d82b41dc9c9c4e6d0b5d622f3ff Reviewed-on: https://boringssl-review.googlesource.com/c/34871 Reviewed-by: Adam Langley <agl@google.com>	2019-02-14 17:31:37 +00:00
David Benjamin	fb35b147ca	Remove stray prototype. The function's since been renamed. Change-Id: Id1a9788dfeb5c46b3463611b08318b3f253d03df Reviewed-on: https://boringssl-review.googlesource.com/c/34870 Reviewed-by: Adam Langley <agl@google.com>	2019-02-14 17:31:14 +00:00
David Benjamin	eb2c2cdf17	Always define GHASH. There is a C implementation of gcm_ghash_4bit to pair with gcm_gmult_4bit. It's even slightly faster per the numbers below (x86_64 OPENSSL_NO_ASM build), but, more importantly, we trim down the combinatorial explosion of GCM implementations and free up complexity budget for potentially using bsaes better in the future. Old: Did 2557000 AES-128-GCM (16 bytes) seal operations in 1000057us (2556854.3 ops/sec): 40.9 MB/s Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009613us (93105.0 ops/sec): 125.7 MB/s Did 17000 AES-128-GCM (8192 bytes) seal operations in 1024768us (16589.1 ops/sec): 135.9 MB/s Did 2511000 AES-256-GCM (16 bytes) seal operations in 1000196us (2510507.9 ops/sec): 40.2 MB/s Did 84000 AES-256-GCM (1350 bytes) seal operations in 1000412us (83965.4 ops/sec): 113.4 MB/s Did 15000 AES-256-GCM (8192 bytes) seal operations in 1046963us (14327.2 ops/sec): 117.4 MB/s New: Did 2739000 AES-128-GCM (16 bytes) seal operations in 1000322us (2738118.3 ops/sec): 43.8 MB/s Did 100000 AES-128-GCM (1350 bytes) seal operations in 1008190us (99187.7 ops/sec): 133.9 MB/s Did 17000 AES-128-GCM (8192 bytes) seal operations in 1006360us (16892.6 ops/sec): 138.4 MB/s Did 2546000 AES-256-GCM (16 bytes) seal operations in 1000150us (2545618.2 ops/sec): 40.7 MB/s Did 86000 AES-256-GCM (1350 bytes) seal operations in 1000970us (85916.7 ops/sec): 116.0 MB/s Did 14850 AES-256-GCM (8192 bytes) seal operations in 1023459us (14509.6 ops/sec): 118.9 MB/s While I'm here, tighten up some of the functions and align the ctr32 and non-ctr32 paths. Bug: 256 Change-Id: Id4df699cefc8630dd5a350d44f927900340f5e60 Reviewed-on: https://boringssl-review.googlesource.com/c/34869 Reviewed-by: Adam Langley <agl@google.com>	2019-02-14 17:30:55 +00:00
David Benjamin	2e819d8be4	Unwind RDRAND functions correctly on Windows. But for the ABI conversion bits, these are just leaf functions and don't even need unwind tables. Just renumber the registers on Windows to only use volatile ones. In doing so, this switches to writing rdrand explicitly. perlasm already knows how to manually encode it and our minimum assembler versions surely cover rdrand by now anyway. Also add the .size directive. I'm not sure what it's used for, but the other files have it. (This isn't a generally reusable technique. The more complex functions will need actual unwind codes.) Bug: 259 Change-Id: I1d5669bcf8b6e34939885d78aea6f60597be1528 Reviewed-on: https://boringssl-review.googlesource.com/c/34867 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-12 20:24:27 +00:00
David Benjamin	15ba2d11a9	Patch out unused aesni-x86_64 functions. This shrinks the bssl binary by about 8k. Change-Id: I571f258ccf7032ae34db3f20904ad9cc81cca839 Reviewed-on: https://boringssl-review.googlesource.com/c/34866 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-11 20:25:22 +00:00
David Benjamin	cc2b8e2552	Add ABI tests for aesni-gcm-x86_64.pl. Change-Id: Ic23fc5fbec2c4f8df5d06f807c6bd2c5e1f0e99c Reviewed-on: https://boringssl-review.googlesource.com/c/34865 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-11 20:08:38 +00:00
David Benjamin	7a3b94cd2c	Add ABI tests for x86_64-mont5.pl. Fix some missing CFI bits. Change-Id: I42114527f0ef8e03079d37a9f466d64a63a313f5 Reviewed-on: https://boringssl-review.googlesource.com/c/34864 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-11 19:27:13 +00:00
Katrin Leinweber	d2a0ffdfa7	Hyperlink DOI to preferred resolver Change-Id: Ib9983a74d5d2f8be7c96cedde17be5a4e9223d5e Reviewed-on: https://boringssl-review.googlesource.com/c/34844 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-02-08 19:20:05 +00:00
David Benjamin	55b9acda99	Fix ABI error in bn_mul_mont on aarch64. This was caught by an aarch64 ABI tester. aarch64 has the same considerations around small arguments as x86_64 does. The aarch64 version of bn_mul_mont does not mask off the upper words of the argument. The x86_64 version does, so size_t is, strictly speaking, wrong for aarch64, but bn_mul_mont already has an implicit size limit to support its internal alloca, so this doesn't really make things worse than before. Change-Id: I39bffc8fdb2287e45a2d1f0d1b4bd5532bbf3868 Reviewed-on: https://boringssl-review.googlesource.com/c/34804 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-05 21:17:54 +00:00
David Benjamin	0a87c4982c	Implement ABI testing for ARM. Update-Note: There's some chance this'll break iOS since I was unable to test it there. The iPad I have to test on is too new to run 32-bit code at all. Change-Id: I6593f91b67a5e8a82828237d3b69ed948b07922d Reviewed-on: https://boringssl-review.googlesource.com/c/34725 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-05 21:01:44 +00:00
David Benjamin	0a67eba62d	Fix the order of Windows unwind codes. The unwind tester suggests Windows doesn't care, but the documentation says that unwind codes should be sorted in descending offset, which means the last instruction should be first. https://docs.microsoft.com/en-us/cpp/build/exception-handling-x64?view=vs-2017#struct-unwind_code Bug: 259 Change-Id: I21e54c362e18e0405f980005112cc3f7c417c70c Reviewed-on: https://boringssl-review.googlesource.com/c/34785 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-05 19:38:23 +00:00
David Benjamin	28f035f48b	Implement unwind testing for Windows. Unfortunately, due to most OpenSSL assembly using custom exception handlers to unwind, most of our assembly doesn't work with non-destructive unwind. For now, CHECK_ABI behaves like CHECK_ABI_NO_UNWIND on Windows, and CHECK_ABI_SEH will test unwinding on both platforms. The tests do, however, work with the unwind-code-based assembly we recently added, as well as the clmul-based GHASH which is also code-based. Remove the ad-hoc SEH tests which intentionally hit memory access exceptions, now that we can test unwind directly. Now that we can test it, the next step is to implement SEH directives in perlasm so writing these unwind codes is less of a chore. Bug: 259 Change-Id: I23a57a22c5dc9fa4513f575f18192335779678a5 Reviewed-on: https://boringssl-review.googlesource.com/c/34784 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-02-05 19:22:15 +00:00
David Benjamin	33f456b8b0	Don't use bsaes over vpaes for CTR-DRBG. RAND_bytes rarely uses large enough inputs for bsaes to be worth it. https://boringssl-review.googlesource.com/c/boringssl/+/33589 includes some rough benchmarks of various bits here. Some observations: - 8 blocks of bsaes costs roughly 6.5 blocks of vpaes. Note the comparison isn't quite accurate because I'm measuring bsaes_ctr32_encrypt_blocks against vpaes_encrypt and vpaes in CTR mode today must make do with a C loop. Even assuming a cutoff of 6 rather than 7 blocks, it's rare to ask for 96 bytes of entropy at a time. - CTR-DRBG performs some stray block operations (ctr_drbg_update), which bsaes is bad at without extra work to fold them into the CTR loop (not really worth it). - CTR-DRBG calculates a couple new key schedules every RAND_bytes call. We don't currently have a constant-time bsaes key schedule. Unfortunately, even plain vpaes loses to the current aes_nohw used by bsaes, but it's not constant-time. Also taking CTR-DRBG out of the bsaes equation - Machines without AES hardware (clients) are not going to be RNG-bound. It's mostly servers pushing way too many CBC IVs that care. This means bsaes's current side channel tradeoffs make even less sense here. I'm not sure yet what we should do for the rest of the bsaes mess, but it seems clear that we want to stick with vpaes for the RNG. Bug: 256 Change-Id: Iec8f13af232794afd007cb1065913e8117eeee24 Reviewed-on: https://boringssl-review.googlesource.com/c/34744 Reviewed-by: Adam Langley <agl@google.com>	2019-02-01 18:03:39 +00:00
David Benjamin	23e1a1f2d3	Test and fix an ABI issue with small parameters. Calling conventions must specify how to handle arguments smaller than a machine word. Should the caller pad them up to a machine word size with predictable values (zero/sign-extended), or should the callee tolerate an arbitrary bit pattern? Annoyingly, I found no text in either SysV or Win64 ABI documentation describing any of this and resorted to experiment. The short answer is that callees must tolerate an arbitrary bit pattern on x86_64, which means we must test this. See the comment in abi_test::internal::ToWord for the long answer. CHECK_ABI now, if the type of the parameter is smaller than crypto_word_t, fills the remaining bytes with 0xaa. This is so the number is out of bounds for code expecting either zero or sign extension. (Not that crypto assembly has any business seeing negative numbers.) Doing so reveals a bug in ecp_nistz256_ord_sqr_mont. The rep parameter is typed int, but the code expected uint64_t. In practice, the compiler will always compile this correctly because: - On both Win64 and SysV, rep is a register parameter. - The rep parameter is always a constant, so the compiler has no reason to leave garbage in the upper half. However, I was indeed able to get a bug out of GCC via: uint64_t foo = (1ull << 63) | 2; // Some global the compiler can't // prove constant. ecp_nistz256_ord_sqr_mont(res, a, foo >> 1); Were ecp_nistz256_ord_sqr_mont a true int-taking function, this would act like ecp_nistz256_ord_sqr_mont(res, a, 1). Instead, it hung. Fix this by having it take a full-width word. This mess has several consequences: - ABI testing now ideally needs a functional testing component to fully cover this case. A bad input might merely produce the wrong answer. Still, this is fairly effective as it will cause most code to either segfault or loop forever. (Not the enc parameter to AES however...) - We cannot freely change the type of assembly function prototypes. 
If the prototype says int or unsigned, it must be ignoring the upper half and thus "fixing" it to size_t cannot have handled the full range. (Unless it was simply wrong or the parameter is already bounded.) If the prototype says size_t, switching to int or unsigned will hit this type of bug. The former is a safer failure mode though. - The simplest path out of this mess: new assembly code should only ever take word-sized parameters. This is not a tall order as the bad parameters are usually ints that should have been size_t. Calling conventions are hard. Change-Id: If8254aff8953844679fbce4bd3e345e5e2fa5213 Reviewed-on: https://boringssl-review.googlesource.com/c/34627 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-28 21:09:40 +00:00
David Benjamin	ab578adf44	Add RSAZ ABI tests. As part of this, move the CPU checks to C. Change-Id: I17b701e1196c1ca116bbd23e0e669cf603ad464d Reviewed-on: https://boringssl-review.googlesource.com/c/34626 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2019-01-28 21:00:49 +00:00
David Benjamin	3859fc883d	Better document RSAZ and tidy up types. It's an assembly function, so types are a little meaningless, but everything is passed through as BN_ULONG, so be consistent. Also annotate all the RSAZ prototypes with sizes. Change-Id: I32e59e896da39e79c30ce9db52652fd645a033b4 Reviewed-on: https://boringssl-review.googlesource.com/c/34625 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2019-01-28 20:54:27 +00:00
David Benjamin	e569c7e25d	Add ABI testing for 32-bit x86. This is much less interesting (stack-based parameters, Windows and SysV match, no SEH concerns as far as I can tell) than x86_64, but it was easy to do and I'm more familiar with x86 than ARM, so it made a better second architecture to make sure all the architecture ifdefs worked out. Also fix a bug in the x86_64 direction flag code. It was shifting in the wrong direction, making it give 0 or 1<<20 rather than 0 or 1. (Happily, x86_64 appears to be unique in having vastly different calling conventions between OSs. x86 is the same between SysV and Windows, and ARM had the good sense to specify a (mostly) common set of rules.) Since a lot of the assembly functions use the same names and the tests were written generically, merely dropping in a trampoline and CallerState implementation gives us a bunch of ABI tests for free. Change-Id: I15408c18d43e88cfa1c5c0634a8b268a150ed961 Reviewed-on: https://boringssl-review.googlesource.com/c/34624 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2019-01-28 20:40:06 +00:00
David Benjamin	23dcf88e18	Add some Node compatibility functions. This doesn't cover all the functions used by Node, but it's the easy bits. (EVP_PKEY_paramgen will be done separately as it's a non-trivial bit of machinery.) Change-Id: I6501e99f9239ffcdcc57b961ebe85d0ad3965549 Reviewed-on: https://boringssl-review.googlesource.com/c/34544 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-01-25 16:50:30 +00:00
David Benjamin	4545503926	Add a constant-time pshufb-based GHASH implementation. We currently require clmul instructions for constant-time GHASH on x86_64. Otherwise, it falls back to a variable-time 4-bit table implementation. However, a significant proportion of clients lack these instructions. Inspired by vpaes, we can use pshufb and a slightly different order of incorporating the bits to make a constant-time GHASH. This requires SSSE3, which is very common. Benchmarking old machines we had on hand, it appears to be a no-op on Sandy Bridge and a small slowdown for Penryn. Sandy Bridge (Intel Pentium CPU 987 @ 1.50GHz): (Note: these numbers are before 16-byte-aligning the table. That was an improvement on Penryn, so it's possible Sandy Bridge is now better.) Before: Did 4244750 AES-128-GCM (16 bytes) seal operations in 4015000us (1057222.9 ops/sec): 16.9 MB/s Did 442000 AES-128-GCM (1350 bytes) seal operations in 4016000us (110059.8 ops/sec): 148.6 MB/s Did 84000 AES-128-GCM (8192 bytes) seal operations in 4015000us (20921.5 ops/sec): 171.4 MB/s Did 3349250 AES-256-GCM (16 bytes) seal operations in 4016000us (833976.6 ops/sec): 13.3 MB/s Did 343500 AES-256-GCM (1350 bytes) seal operations in 4016000us (85532.9 ops/sec): 115.5 MB/s Did 65250 AES-256-GCM (8192 bytes) seal operations in 4015000us (16251.6 ops/sec): 133.1 MB/s After: Did 4229250 AES-128-GCM (16 bytes) seal operations in 4016000us (1053100.1 ops/sec): 16.8 MB/s [-0.4%] Did 442250 AES-128-GCM (1350 bytes) seal operations in 4016000us (110122.0 ops/sec): 148.7 MB/s [+0.1%] Did 83500 AES-128-GCM (8192 bytes) seal operations in 4015000us (20797.0 ops/sec): 170.4 MB/s [-0.6%] Did 3286500 AES-256-GCM (16 bytes) seal operations in 4016000us (818351.6 ops/sec): 13.1 MB/s [-1.9%] Did 342750 AES-256-GCM (1350 bytes) seal operations in 4015000us (85367.4 ops/sec): 115.2 MB/s [-0.2%] Did 65250 AES-256-GCM (8192 bytes) seal operations in 4016000us (16247.5 ops/sec): 133.1 MB/s [-0.0%] Penryn (Intel Core 2 Duo CPU 
P8600 @ 2.40GHz): Before: Did 1179000 AES-128-GCM (16 bytes) seal operations in 1000139us (1178836.1 ops/sec): 18.9 MB/s Did 97000 AES-128-GCM (1350 bytes) seal operations in 1006347us (96388.2 ops/sec): 130.1 MB/s Did 18000 AES-128-GCM (8192 bytes) seal operations in 1028943us (17493.7 ops/sec): 143.3 MB/s Did 977000 AES-256-GCM (16 bytes) seal operations in 1000197us (976807.6 ops/sec): 15.6 MB/s Did 82000 AES-256-GCM (1350 bytes) seal operations in 1012434us (80992.9 ops/sec): 109.3 MB/s Did 15000 AES-256-GCM (8192 bytes) seal operations in 1006528us (14902.7 ops/sec): 122.1 MB/s After: Did 1306000 AES-128-GCM (16 bytes) seal operations in 1000153us (1305800.2 ops/sec): 20.9 MB/s [+10.8%] Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009852us (93082.9 ops/sec): 125.7 MB/s [-3.4%] Did 17000 AES-128-GCM (8192 bytes) seal operations in 1012096us (16796.8 ops/sec): 137.6 MB/s [-4.0%] Did 1070000 AES-256-GCM (16 bytes) seal operations in 1000929us (1069006.9 ops/sec): 17.1 MB/s [+9.4%] Did 79000 AES-256-GCM (1350 bytes) seal operations in 1002209us (78825.9 ops/sec): 106.4 MB/s [-2.7%] Did 15000 AES-256-GCM (8192 bytes) seal operations in 1061489us (14131.1 ops/sec): 115.8 MB/s [-5.2%] Change-Id: I1c3760a77af7bee4aee3745d1c648d9e34594afb Reviewed-on: https://boringssl-review.googlesource.com/c/34267 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-24 17:19:21 +00:00
Adam Langley	51011b4a26	Remove union from |SHA512_CTX|. With `2fe0360a4e`, we no longer use the other member of this union so it can be removed. Change-Id: Ideb7c47a72df0b420eb1e7d8c718e1cacb2129f5 Reviewed-on: https://boringssl-review.googlesource.com/c/34449 Commit-Queue: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2019-01-22 23:36:46 +00:00
David Benjamin	2fe0360a4e	Fix undefined pointer casts in SHA-512 code. Casting an unaligned pointer to uint64_t* is undefined, even on platforms that support unaligned access. Additionally, dereferencing as uint64_t violates strict aliasing rules. Instead, use memcpys which we assume any sensible compiler can optimize. Also simplify the PULL64 business with the existing CRYPTO_bswap8. This also removes the need for the SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA logic. The generic C code now handles unaligned data and the assembly already can as well. (The only problematic platform with assembly is old ARM, but sha512-armv4.pl already handles this via an __ARM_ARCH__ check. See also OpenSSL's version of this file which always defines SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA if SHA512_ASM is defined.) Add unaligned tests to digest_test.cc, so we retain coverage of unaligned EVP_MD inputs. Change-Id: Idfd8586c64bab2a77292af2fa8eebbd193e57c7d Reviewed-on: https://boringssl-review.googlesource.com/c/34444 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-22 23:18:36 +00:00
Adam Langley	c1615719ce	Add test of assembly code dispatch. The first attempt involved using Linux's support for hardware breakpoints to detect when assembly code was run. However, this doesn't work with SDE, which is a problem. This version has the assembly code update a global flags variable when it's run, but only in non-FIPS and non-debug builds. Update-Note: Assembly files now pay attention to the NDEBUG preprocessor symbol. Ensure the build passes the symbol in. (If release builds fail to link due to missing BORINGSSL_function_hit, this is the cause.) Change-Id: I6b7ced442b7a77d0b4ae148b00c351f68af89a6e Reviewed-on: https://boringssl-review.googlesource.com/c/33384 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: David Benjamin <davidben@google.com>	2019-01-22 20:22:53 +00:00
David Benjamin	73b1f181b6	Add ABI tests for GCM. Change-Id: If28096e677104c6109e31e31a636fee82ef4ba11 Reviewed-on: https://boringssl-review.googlesource.com/c/34266 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-15 22:49:37 +00:00
David Benjamin	b65ce68c8f	Test CRYPTO_gcm128_tag in gcm_test.cc. CRYPTO_gcm128_encrypt should be paired with CRYPTO_gcm128_tag, not CRYPTO_gcm128_finish. Change-Id: Ia3023a196fe5b613e9309b5bac19ea849dbc33b7 Reviewed-on: https://boringssl-review.googlesource.com/c/34265 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-15 18:19:57 +00:00
David Benjamin	f18bd55240	Remove pointer cast in P-256 table. We expect the table to have a slightly nested structure, so just generate it that way. Avoid risking strict aliasing problems. Thanks to Brian Smith for pointing this out. Change-Id: Ie21610c4afab07a610d914265079135dba17b3b7 Reviewed-on: https://boringssl-review.googlesource.com/c/34264 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-15 00:16:17 +00:00
David Benjamin	5349ddb747	Fix RSAZ's OPENSSL_cleanse. https://boringssl-review.googlesource.com/28584 switched RSAZ's buffer to being externally-allocated, which means the OPENSSL_cleanse needs to be tweaked to match. Change-Id: I0a7307ac86aa10933d10d380ef652c355fed3ee9 Reviewed-on: https://boringssl-review.googlesource.com/c/34191 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: David Benjamin <davidben@google.com>	2019-01-14 20:04:39 +00:00
David Benjamin	d99b549b8e	Add AES ABI tests. This involves fixing some bugs in aes_nohw_cbc_encrypt's annotations, and working around a libunwind bug. In doing so, support .cfi_remember_state and .cfi_restore_state in perlasm. Change-Id: Iaedfe691356b0468327a6be0958d034dafa760e5 Reviewed-on: https://boringssl-review.googlesource.com/c/34189 Reviewed-by: Adam Langley <agl@google.com> Commit-Queue: Adam Langley <agl@google.com>	2019-01-09 03:54:55 +00:00
David Benjamin	c0f4dbe4e2	Move aes_nohw, bsaes, and vpaes prototypes to aes/internal.h. This is in preparation for adding ABI tests to them. In doing so, update delocate.go so that OPENSSL_ia32cap_get is consistently callable outside the module. Right now it's callable both inside and outside normally, but not in FIPS mode because the function is generated. This is needed for tests and the module to share headers that touch OPENSSL_ia32cap_P. Change-Id: Idbc7d694acfb974e0b04adac907dab621e87de62 Reviewed-on: https://boringssl-review.googlesource.com/c/34188 Commit-Queue: Adam Langley <agl@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-09 03:35:55 +00:00
David Benjamin	5e350d13f5	Add ABI tests for MD5. This does not actually matter, but writing new CFI directives with the tester seemed like fun. (It caught two typos, one intentional and one accidental.) Change-Id: Iff3e0358f2e56caa26079f658fa7a682772150a1 Reviewed-on: https://boringssl-review.googlesource.com/c/34185 Commit-Queue: David Benjamin <davidben@google.com> Reviewed-by: Adam Langley <agl@google.com>	2019-01-08 18:01:07 +00:00
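
Commits 104306f587 and 2fe0360a4e above both replace unaligned pointer casts with memcpy-based word access. The following is a minimal sketch of that pattern, not BoringSSL's actual code: `xor_words` is a hypothetical helper name, and the choice of `size_t` as the word type is an assumption taken from the wording of the commit message.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper (not a BoringSSL function): XORs |len| bytes of |a|
// and |b| into |out|. None of the buffers needs to be aligned.
static void xor_words(uint8_t *out, const uint8_t *a, const uint8_t *b,
                      size_t len) {
  size_t i = 0;
  // Word-by-word path. memcpy is defined for any alignment, whereas casting
  // |a| to size_t* and dereferencing it would be undefined behavior when the
  // pointer is misaligned.
  for (; i + sizeof(size_t) <= len; i += sizeof(size_t)) {
    size_t x, y;
    memcpy(&x, a + i, sizeof(x));
    memcpy(&y, b + i, sizeof(y));
    x ^= y;
    memcpy(out + i, &x, sizeof(x));
  }
  // Byte-by-byte tail for whatever does not fill a whole word.
  for (; i < len; i++) {
    out[i] = a[i] ^ b[i];
  }
}
```

On targets that permit unaligned access, an optimizing compiler can typically lower each fixed-size memcpy to a single load or store, which is the assumption the commit messages rely on.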

495 Commits