Commit Graph

49 Commits

Author SHA1 Message Date
David Benjamin
32ce6032ff Add an optimized x86_64 vpaes ctr128_f and remove bsaes.
Brian Smith suggested applying vpaes-armv8's "2x" optimization to
vpaes-x86_64. The registers are a little tight (aarch64 has a whole 32
SIMD registers, while x86_64 only has 16), but it's doable with some
spills and makes vpaes much more competitive with bsaes. At small- and
medium-sized inputs, vpaes now matches bsaes. At large inputs, it's a
~10% perf hit.

bsaes is thus pulling much less weight. Losing an entire AES
implementation and having constant-time AES for SSSE3 is attractive.
Some notes:

- The fact that these are older CPUs tempers the perf hit, but CPUs
  without AES-NI are still common enough to matter.

- This CL does regress CBC decrypt performance nontrivially (see below).
  If this matters, we can double-up CBC decryption too. CBC in TLS is
  legacy and already pays a costly Lucky13 mitigation.

- The difference between 1350 and 8192 bytes is likely bsaes AES-GCM
  paying for two slow (and variable-time!) aes_nohw_encrypt
  calls for EK0 and the trailing partial block. At larger inputs, those
  two calls are more amortized.

- To that end, bsaes would likely be much faster on AES-GCM with smarter
  use of bsaes. (Fold one-off calls above into bulk data.) Implementing
  this is a bit of a nuisance though, especially considering we don't
  wish to regress hwaes.

- I'd discarded the key conversion idea, but I think I did it wrong.
  Benchmarks from
  https://boringssl-review.googlesource.com/c/boringssl/+/33589 suggest
  converting to bsaes format on-demand for large ctr32 inputs should
  give the best of both worlds, but at the cost of an entire AES
  implementation relative to this CL.

- ARMv7 still depends on bsaes and has no vpaes. It also has 16 SIMD
  registers, so my plan is to translate it, with the same 2x
  optimization, and see how it compares. Hopefully that, or some
  combination of the above, will work for ARMv7.

Sandy Bridge
bsaes (before):
Did 3144750 AES-128-GCM (16 bytes) seal operations in 5016000us (626943.8 ops/sec): 10.0 MB/s
Did 2053750 AES-128-GCM (256 bytes) seal operations in 5016000us (409439.8 ops/sec): 104.8 MB/s
Did 469000 AES-128-GCM (1350 bytes) seal operations in 5015000us (93519.4 ops/sec): 126.3 MB/s
Did 92500 AES-128-GCM (8192 bytes) seal operations in 5016000us (18441.0 ops/sec): 151.1 MB/s
Did 46750 AES-128-GCM (16384 bytes) seal operations in 5032000us (9290.5 ops/sec): 152.2 MB/s
vpaes-1x (for reference, not this CL):
Did 8684750 AES-128-GCM (16 bytes) seal operations in 5015000us (1731754.7 ops/sec): 27.7 MB/s [+177%]
Did 1731500 AES-128-GCM (256 bytes) seal operations in 5016000us (345195.4 ops/sec): 88.4 MB/s [-15.6%]
Did 346500 AES-128-GCM (1350 bytes) seal operations in 5016000us (69078.9 ops/sec): 93.3 MB/s [-26.1%]
Did 61250 AES-128-GCM (8192 bytes) seal operations in 5015000us (12213.4 ops/sec): 100.1 MB/s [-33.8%]
Did 32500 AES-128-GCM (16384 bytes) seal operations in 5031000us (6459.9 ops/sec): 105.8 MB/s [-30.5%]
vpaes-2x (this CL):
Did 8840000 AES-128-GCM (16 bytes) seal operations in 5015000us (1762711.9 ops/sec): 28.2 MB/s [+182%]
Did 2167750 AES-128-GCM (256 bytes) seal operations in 5016000us (432167.1 ops/sec): 110.6 MB/s [+5.5%]
Did 474000 AES-128-GCM (1350 bytes) seal operations in 5016000us (94497.6 ops/sec): 127.6 MB/s [+1.0%]
Did 81750 AES-128-GCM (8192 bytes) seal operations in 5015000us (16301.1 ops/sec): 133.5 MB/s [-11.6%]
Did 41750 AES-128-GCM (16384 bytes) seal operations in 5031000us (8298.5 ops/sec): 136.0 MB/s [-10.6%]

Penryn
bsaes (before):
Did 958000 AES-128-GCM (16 bytes) seal operations in 1000264us (957747.2 ops/sec): 15.3 MB/s
Did 420000 AES-128-GCM (256 bytes) seal operations in 1000480us (419798.5 ops/sec): 107.5 MB/s
Did 96000 AES-128-GCM (1350 bytes) seal operations in 1001083us (95896.1 ops/sec): 129.5 MB/s
Did 18000 AES-128-GCM (8192 bytes) seal operations in 1042491us (17266.3 ops/sec): 141.4 MB/s
Did 9482 AES-128-GCM (16384 bytes) seal operations in 1095703us (8653.8 ops/sec): 141.8 MB/s
Did 758000 AES-256-GCM (16 bytes) seal operations in 1000769us (757417.5 ops/sec): 12.1 MB/s
Did 359000 AES-256-GCM (256 bytes) seal operations in 1001993us (358285.9 ops/sec): 91.7 MB/s
Did 82000 AES-256-GCM (1350 bytes) seal operations in 1009583us (81221.7 ops/sec): 109.6 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1022294us (14672.9 ops/sec): 120.2 MB/s
Did 7884 AES-256-GCM (16384 bytes) seal operations in 1070934us (7361.8 ops/sec): 120.6 MB/s
vpaes-1x (for reference, not this CL):
Did 2030000 AES-128-GCM (16 bytes) seal operations in 1000227us (2029539.3 ops/sec): 32.5 MB/s [+112%]
Did 382000 AES-128-GCM (256 bytes) seal operations in 1001949us (381256.9 ops/sec): 97.6 MB/s [-9.2%]
Did 81000 AES-128-GCM (1350 bytes) seal operations in 1007297us (80413.2 ops/sec): 108.6 MB/s [-16.1%]
Did 14000 AES-128-GCM (8192 bytes) seal operations in 1031499us (13572.5 ops/sec): 111.2 MB/s [-21.4%]
Did 7008 AES-128-GCM (16384 bytes) seal operations in 1030706us (6799.2 ops/sec): 111.4 MB/s [-21.4%]
Did 1838000 AES-256-GCM (16 bytes) seal operations in 1000238us (1837562.7 ops/sec): 29.4 MB/s [+143%]
Did 321000 AES-256-GCM (256 bytes) seal operations in 1001666us (320466.1 ops/sec): 82.0 MB/s [-10.6%]
Did 67000 AES-256-GCM (1350 bytes) seal operations in 1010359us (66313.1 ops/sec): 89.5 MB/s [-18.3%]
Did 12000 AES-256-GCM (8192 bytes) seal operations in 1072706us (11186.7 ops/sec): 91.6 MB/s [-23.8%]
Did 5680 AES-256-GCM (16384 bytes) seal operations in 1009214us (5628.1 ops/sec): 92.2 MB/s [-23.5%]
vpaes-2x (this CL):
Did 2072000 AES-128-GCM (16 bytes) seal operations in 1000066us (2071863.3 ops/sec): 33.1 MB/s [+116%]
Did 432000 AES-128-GCM (256 bytes) seal operations in 1000732us (431684.0 ops/sec): 110.5 MB/s [+2.8%]
Did 92000 AES-128-GCM (1350 bytes) seal operations in 1000580us (91946.7 ops/sec): 124.1 MB/s [-4.2%]
Did 16000 AES-128-GCM (8192 bytes) seal operations in 1016422us (15741.5 ops/sec): 129.0 MB/s [-8.8%]
Did 8448 AES-128-GCM (16384 bytes) seal operations in 1073962us (7866.2 ops/sec): 128.9 MB/s [-9.1%]
Did 1865000 AES-256-GCM (16 bytes) seal operations in 1000043us (1864919.8 ops/sec): 29.8 MB/s [+146%]
Did 364000 AES-256-GCM (256 bytes) seal operations in 1001561us (363432.7 ops/sec): 93.0 MB/s [+1.4%]
Did 77000 AES-256-GCM (1350 bytes) seal operations in 1004123us (76683.8 ops/sec): 103.5 MB/s [-5.6%]
Did 14000 AES-256-GCM (8192 bytes) seal operations in 1071179us (13069.7 ops/sec): 107.1 MB/s [-10.9%]
Did 7008 AES-256-GCM (16384 bytes) seal operations in 1074125us (6524.4 ops/sec): 106.9 MB/s [-11.4%]

Penryn, CBC mode decryption
bsaes (before):
Did 159000 AES-128-CBC-SHA1 (16 bytes) open operations in 1001019us (158838.1 ops/sec): 2.5 MB/s
Did 114000 AES-128-CBC-SHA1 (256 bytes) open operations in 1006485us (113265.5 ops/sec): 29.0 MB/s
Did 65000 AES-128-CBC-SHA1 (1350 bytes) open operations in 1008441us (64455.9 ops/sec): 87.0 MB/s
Did 17000 AES-128-CBC-SHA1 (8192 bytes) open operations in 1005440us (16908.0 ops/sec): 138.5 MB/s
vpaes (after):
Did 167000 AES-128-CBC-SHA1 (16 bytes) open operations in 1003556us (166408.3 ops/sec): 2.7 MB/s [+8%]
Did 112000 AES-128-CBC-SHA1 (256 bytes) open operations in 1005673us (111368.2 ops/sec): 28.5 MB/s [-1.7%]
Did 56000 AES-128-CBC-SHA1 (1350 bytes) open operations in 1005647us (55685.5 ops/sec): 75.2 MB/s [-13.6%]
Did 13635 AES-128-CBC-SHA1 (8192 bytes) open operations in 1020486us (13361.3 ops/sec): 109.5 MB/s [-20.9%]

Bug: 256
Change-Id: I11ed773323ec7a5ee61080c9ed9ed4761849828a
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35364
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-23 06:59:22 +00:00
David Benjamin
4851041967 Patch out the aes_nohw fallback in bsaes_cbc_encrypt.
This plugs all bsaes fallback leaks for CBC outside of the key schedule.
The CBC EVP_CIPHERs never call the block function directly when there's
a stream.cbc function available.

This affects CBC decryptions of length < 128 or 16 mod 128.
Performance-wise, we don't really care about CBC apart from passing
glances at its use in TLS. There, the Lucky13 workaround mutes the
effects.

Cortex-A53 (Raspberry Pi 3 Model B+)
Before:
Did 78000 AES-128-CBC-SHA1 (16 bytes) open operations in 3020254us (25825.6 ops/sec): 0.4 MB/s
Did 75000 AES-128-CBC-SHA1 (32 bytes) open operations in 3005760us (24952.1 ops/sec): 0.8 MB/s
Did 71000 AES-128-CBC-SHA1 (64 bytes) open operations in 3038137us (23369.6 ops/sec): 1.5 MB/s
Did 67000 AES-128-CBC-SHA1 (96 bytes) open operations in 3027686us (22129.1 ops/sec): 2.1 MB/s
Did 64000 AES-128-CBC-SHA1 (112 bytes) open operations in 3005491us (21294.4 ops/sec): 2.4 MB/s
Did 59000 AES-128-CBC-SHA1 (128 bytes) open operations in 3020083us (19535.9 ops/sec): 2.5 MB/s
Did 53000 AES-128-CBC-SHA1 (240 bytes) open operations in 3020105us (17549.1 ops/sec): 4.2 MB/s
After:
Did 71668 AES-128-CBC-SHA1 (16 bytes) open operations in 3020896us (23724.1 ops/sec): 0.4 MB/s
Did 71000 AES-128-CBC-SHA1 (32 bytes) open operations in 3040826us (23348.9 ops/sec): 0.7 MB/s
Did 68000 AES-128-CBC-SHA1 (64 bytes) open operations in 3009913us (22592.0 ops/sec): 1.4 MB/s
Did 66000 AES-128-CBC-SHA1 (96 bytes) open operations in 3007597us (21944.4 ops/sec): 2.1 MB/s
Did 59000 AES-128-CBC-SHA1 (112 bytes) open operations in 3002878us (19647.8 ops/sec): 2.2 MB/s
Did 59000 AES-128-CBC-SHA1 (128 bytes) open operations in 3046786us (19364.7 ops/sec): 2.5 MB/s
Did 50000 AES-128-CBC-SHA1 (240 bytes) open operations in 3043643us (16427.7 ops/sec): 3.9 MB/s

Penryn (Mac mini, mid 2010)
Before:
Did 152000 AES-128-CBC-SHA1 (16 bytes) open operations in 1004422us (151330.8 ops/sec): 2.4 MB/s
Did 143000 AES-128-CBC-SHA1 (32 bytes) open operations in 1000443us (142936.7 ops/sec): 4.6 MB/s
Did 136000 AES-128-CBC-SHA1 (48 bytes) open operations in 1006580us (135111.0 ops/sec): 6.5 MB/s
Did 146000 AES-128-CBC-SHA1 (96 bytes) open operations in 1005731us (145168.0 ops/sec): 13.9 MB/s
Did 138000 AES-128-CBC-SHA1 (112 bytes) open operations in 1003330us (137542.0 ops/sec): 15.4 MB/s
Did 133000 AES-128-CBC-SHA1 (128 bytes) open operations in 1005876us (132223.1 ops/sec): 16.9 MB/s
Did 117000 AES-128-CBC-SHA1 (240 bytes) open operations in 1004922us (116426.9 ops/sec): 27.9 MB/s
After:
Did 159000 AES-128-CBC-SHA1 (16 bytes) open operations in 1000505us (158919.7 ops/sec): 2.5 MB/s
Did 157000 AES-128-CBC-SHA1 (32 bytes) open operations in 1006091us (156049.5 ops/sec): 5.0 MB/s
Did 154000 AES-128-CBC-SHA1 (48 bytes) open operations in 1002720us (153582.3 ops/sec): 7.4 MB/s
Did 146000 AES-128-CBC-SHA1 (96 bytes) open operations in 1002567us (145626.2 ops/sec): 14.0 MB/s
Did 135000 AES-128-CBC-SHA1 (112 bytes) open operations in 1001212us (134836.6 ops/sec): 15.1 MB/s
Did 133000 AES-128-CBC-SHA1 (128 bytes) open operations in 1006441us (132148.8 ops/sec): 16.9 MB/s
Did 115000 AES-128-CBC-SHA1 (240 bytes) open operations in 1005246us (114399.9 ops/sec): 27.5 MB/s

Bug: 256
Change-Id: I864b4455ada0d4d245380fce6f869dabb0686354
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35167
Reviewed-by: Adam Langley <agl@google.com>
2019-03-14 21:38:28 +00:00
David Benjamin
885a63fb74 Patch out the aes_nohw fallback in bsaes_ctr32_encrypt_blocks.
bsaes_ctr32_encrypt_blocks previously fell back to the table-based
aes_nohw_encrypt for inputs under 128 bytes. Instead, just run the usual
bsaes code, though it means we compute more blocks than needed.

This fixes some (but not all) the timing leaks and is needed for later
bsaes work.

Performance-wise, x86_64 actually sees a performance improvement for all but
tiny inputs. ARM does see a loss at small inputs however.

Cortex-A53 (Raspberry Pi 3 Model B+)
Before:
Did 299000 AES-128-GCM (16 bytes) seal operations in 1001123us (298664.6 ops/sec): 4.8 MB/s
Did 236000 AES-128-GCM (32 bytes) seal operations in 1001611us (235620.4 ops/sec): 7.5 MB/s
Did 167000 AES-128-GCM (64 bytes) seal operations in 1005706us (166052.5 ops/sec): 10.6 MB/s
Did 129000 AES-128-GCM (96 bytes) seal operations in 1006129us (128214.2 ops/sec): 12.3 MB/s
Did 116000 AES-128-GCM (112 bytes) seal operations in 1006302us (115273.5 ops/sec): 12.9 MB/s
Did 107000 AES-128-GCM (128 bytes) seal operations in 1000986us (106894.6 ops/sec): 13.7 MB/s
After:
Did 132000 AES-128-GCM (16 bytes) seal operations in 1005165us (131321.7 ops/sec): 2.1 MB/s
Did 128000 AES-128-GCM (32 bytes) seal operations in 1005966us (127240.9 ops/sec): 4.1 MB/s
Did 120000 AES-128-GCM (64 bytes) seal operations in 1003080us (119631.5 ops/sec): 7.7 MB/s
Did 113000 AES-128-GCM (96 bytes) seal operations in 1000557us (112937.1 ops/sec): 10.8 MB/s
Did 110000 AES-128-GCM (112 bytes) seal operations in 1000407us (109955.2 ops/sec): 12.3 MB/s
Did 108000 AES-128-GCM (128 bytes) seal operations in 1008830us (107054.7 ops/sec): 13.7 MB/s
(Inputs 128 bytes and up are unaffected by this CL.)

Nexus 7
Before:
Did 544000 AES-128-GCM (16 bytes) seal operations in 1001282us (543303.5 ops/sec): 8.7 MB/s
Did 475750 AES-128-GCM (32 bytes) seal operations in 1000244us (475633.9 ops/sec): 15.2 MB/s
Did 370500 AES-128-GCM (64 bytes) seal operations in 1000519us (370307.8 ops/sec): 23.7 MB/s
Did 300750 AES-128-GCM (96 bytes) seal operations in 1000122us (300713.3 ops/sec): 28.9 MB/s
Did 275750 AES-128-GCM (112 bytes) seal operations in 1000702us (275556.6 ops/sec): 30.9 MB/s
Did 251000 AES-128-GCM (128 bytes) seal operations in 1000214us (250946.3 ops/sec): 32.1 MB/s
After:
Did 296000 AES-128-GCM (16 bytes) seal operations in 1001129us (295666.2 ops/sec): 4.7 MB/s
Did 288750 AES-128-GCM (32 bytes) seal operations in 1000488us (288609.2 ops/sec): 9.2 MB/s
Did 267250 AES-128-GCM (64 bytes) seal operations in 1000641us (267078.8 ops/sec): 17.1 MB/s
Did 253250 AES-128-GCM (96 bytes) seal operations in 1000915us (253018.5 ops/sec): 24.3 MB/s
Did 248000 AES-128-GCM (112 bytes) seal operations in 1000091us (247977.4 ops/sec): 27.8 MB/s
Did 249000 AES-128-GCM (128 bytes) seal operations in 1000794us (248802.5 ops/sec): 31.8 MB/s

Penryn (Mac mini, mid 2010)
Before:
Did 1331000 AES-128-GCM (16 bytes) seal operations in 1000263us (1330650.0 ops/sec): 21.3 MB/s
Did 991000 AES-128-GCM (32 bytes) seal operations in 1000274us (990728.5 ops/sec): 31.7 MB/s
Did 780000 AES-128-GCM (48 bytes) seal operations in 1000278us (779783.2 ops/sec): 37.4 MB/s
Did 483000 AES-128-GCM (96 bytes) seal operations in 1000137us (482933.8 ops/sec): 46.4 MB/s
Did 428000 AES-128-GCM (112 bytes) seal operations in 1001132us (427516.1 ops/sec): 47.9 MB/s
Did 682000 AES-128-GCM (128 bytes) seal operations in 1000564us (681615.6 ops/sec): 87.2 MB/s
After:
Did 953000 AES-128-GCM (16 bytes) seal operations in 1000385us (952633.2 ops/sec): 15.2 MB/s
Did 903000 AES-128-GCM (32 bytes) seal operations in 1000998us (902099.7 ops/sec): 28.9 MB/s
Did 850000 AES-128-GCM (48 bytes) seal operations in 1000938us (849203.4 ops/sec): 40.8 MB/s
Did 736000 AES-128-GCM (96 bytes) seal operations in 1000886us (735348.5 ops/sec): 70.6 MB/s
Did 702000 AES-128-GCM (112 bytes) seal operations in 1000657us (701539.1 ops/sec): 78.6 MB/s
Did 676000 AES-128-GCM (128 bytes) seal operations in 1000405us (675726.3 ops/sec): 86.5 MB/s

Bug: 256
Change-Id: I9403da607dd1feaff7b3c9b76fe78b66018fb753
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35166
Reviewed-by: Adam Langley <agl@google.com>
2019-03-14 21:37:46 +00:00
David Benjamin
35941f2923 Make vpaes-armv8.pl compatible with XOM.
Change-Id: I27413467e5cac4e16ecbbb8d9a238ba5a8bcb9e7
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35284
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-11 23:17:06 +00:00
David Benjamin
55db667c62 Enable vpaes for aarch64, with CTR optimizations.
This patches vpaes-armv8.pl to add vpaes_ctr32_encrypt_blocks. CTR mode
is by far the most important mode these days. It should have access to
_vpaes_encrypt_2x, which gives a considerable speed boost. Also exclude
vpaes_ecb_* as they're not even used.

For iOS, this change is completely a no-op. iOS ARMv8 always has crypto
extensions, and we already statically drop all other AES
implementations.

Android ARMv8 is *not* required to have crypto extensions, but every
ARMv8 device I've seen has them. For those, it is a no-op
performance-wise and a win on size. vpaes appears to be about 5.6KiB
smaller than the tables. ARMv8 always makes SIMD (NEON) available, so we
can statically drop aes_nohw.

In theory, however, crypto-less Android ARMv8 is possible. Today such
chips get a variable-time AES. This CL fixes this, but the performance
story is complex.

The Raspberry Pi 3 is not Android but has a Cortex-A53 chip
without crypto extensions. (But the official images are 32-bit, so even
this is slightly artificial...) There, vpaes is a performance win.

Raspberry Pi 3, Model B+, Cortex-A53
Before:
Did 265000 AES-128-GCM (16 bytes) seal operations in 1003312us (264125.2 ops/sec): 4.2 MB/s
Did 44000 AES-128-GCM (256 bytes) seal operations in 1002141us (43906.0 ops/sec): 11.2 MB/s
Did 9394 AES-128-GCM (1350 bytes) seal operations in 1032104us (9101.8 ops/sec): 12.3 MB/s
Did 1562 AES-128-GCM (8192 bytes) seal operations in 1008982us (1548.1 ops/sec): 12.7 MB/s
After:
Did 277000 AES-128-GCM (16 bytes) seal operations in 1001884us (276479.1 ops/sec): 4.4 MB/s
Did 52000 AES-128-GCM (256 bytes) seal operations in 1001480us (51923.2 ops/sec): 13.3 MB/s
Did 11000 AES-128-GCM (1350 bytes) seal operations in 1007979us (10912.9 ops/sec): 14.7 MB/s
Did 2013 AES-128-GCM (8192 bytes) seal operations in 1085545us (1854.4 ops/sec): 15.2 MB/s

The Pixel 3 has a Cortex-A75 with crypto extensions, so it would never
run this code. However, artificially ignoring them gives another data
point (ARM documentation[*] suggests the extensions are still optional
on a Cortex-A75.) Sadly, vpaes no longer wins on perf over aes_nohw.
But, it is constant-time:

Pixel 3, AES/PMULL extensions ignored, Cortex-A75:
Before:
Did 2102000 AES-128-GCM (16 bytes) seal operations in 1000378us (2101205.7 ops/sec): 33.6 MB/s
Did 358000 AES-128-GCM (256 bytes) seal operations in 1002658us (357051.0 ops/sec): 91.4 MB/s
Did 75000 AES-128-GCM (1350 bytes) seal operations in 1012830us (74049.9 ops/sec): 100.0 MB/s
Did 13000 AES-128-GCM (8192 bytes) seal operations in 1036524us (12541.9 ops/sec): 102.7 MB/s
After:
Did 1453000 AES-128-GCM (16 bytes) seal operations in 1000213us (1452690.6 ops/sec): 23.2 MB/s
Did 285000 AES-128-GCM (256 bytes) seal operations in 1002227us (284366.7 ops/sec): 72.8 MB/s
Did 60000 AES-128-GCM (1350 bytes) seal operations in 1016106us (59049.0 ops/sec): 79.7 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1094184us (10053.2 ops/sec): 82.4 MB/s

Note the numbers above run with PMULL off, so the slow GHASH is
dampening the regression. If we test aes_nohw and vpaes paired with
PMULL on, the 20% perf hit becomes a 31% hit. The PMULL-less variant is
more likely to represent a real chip.

This is consistent with upstream's note in the comment, though it is
unclear if 20% is the right order of magnitude: "these results are worse
than scalar compiler-generated code, but it's constant-time and
therefore preferred".

[*] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100458_0301_00_en/lau1442495529696.html

Bug: 246
Change-Id: If1dc87f5131fce742052498295476fbae4628dbf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35026
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-04 20:31:39 +00:00
David Benjamin
b1b4ff93ca Check in vpaes-armv8.pl from OpenSSL unused and unmodified.
This is done separately to make the diffs in the subsequent CL easier to
see. Imported from OpenSSL at revision
25ca718150cef41e1c1d9c2c8c58e2b1e2cad3fa.

Bug: 246
Change-Id: I9e7067ea177963fb9b77bf6fb39702ffe6e34ed4
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35025
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-04 20:23:09 +00:00
David Benjamin
f1f73f8966 Fix bsaes-armv7.pl getting disabled by accident.
https://boringssl-review.googlesource.com/c/34188 accidentally disabled
it (__ARM_MAX_ARCH__ wasn't defined), which, in turn, masked a bug in
https://boringssl-review.googlesource.com/c/34874.

Remove the __ARM_MAX_ARCH__ check as that's hardcoded to 8 anyway. Then
revert the problematic part of the bsaes-armv7.pl change. That brings
back the somewhat questionable post-dispatch to pre-dispatch call, but I
hope to patch the fallbacks out soon anyway.

Change-Id: I567e55fe35cb716d5ed56580113a302617f5ad71
Reviewed-on: https://boringssl-review.googlesource.com/c/35044
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-27 02:06:21 +00:00
Adam Langley
a367d9267f Set VPAES flags in x86-64 code.
The ImplDispatchTest was broken because the 64-bit VPAES code wasn't
setting the hit flags.

Change-Id: I30200db64337deba7ae9d70d8427decbdfceca58
Reviewed-on: https://boringssl-review.googlesource.com/c/34986
Reviewed-by: David Benjamin <davidben@google.com>
2019-02-22 23:41:50 +00:00
David Benjamin
65dc321492 Enable vpaes for AES_* functions.
This makes the AES_* functions meet our constant-time goals for
platforms where we have vpaes available. In particular, QUIC packet
number encryption needs single-block operations and those should have
vpaes available.

As a bonus, when vpaes is statically available, the aes_nohw_* functions
should be dropped by the linker. (Notably, NEON is guaranteed on
aarch64. Although vpaes-armv8.pl itself may take some more exploration.
https://crbug.com/boringssl/246#c4)

Bug: 263
Change-Id: Ie1c4727a166ec101a8453761757c87dadc188769
Reviewed-on: https://boringssl-review.googlesource.com/c/34875
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-22 23:09:19 +00:00
David Benjamin
3c19830f6f Avoid double-dispatch with AES_* vs aes_nohw_*.
In particular, consistently pair bsaes with aes_nohw.

Ideally the aes_nohw_* calls in bsaes-*.pl would be patched out and
bsaes grows its own constant-time key setup
(https://crbug.com/boringssl/256), but I'll sort that out separately. In
the meantime, avoid going through AES_* which now dispatch. This avoids
several nuisances:

1. If we were to add, say, a vpaes-armv7.pl the ABI tests would break.
   Fundamentally, we cannot assume that an AES_KEY has one and only one
   representation and must keep everything matching up.

2. AES_* functions should enable vpaes. This makes AES_* faster and
   constant-time for vector-capable CPUs
   (https://crbug.com/boringssl/263), relevant for QUIC packet number
   encryption, allowing us to add vpaes-armv8.pl
   (https://crbug.com/boringssl/246) without carrying a (likely) mostly
   unused AES implementation.

3. It's silly to double-dispatch when the EVP layer has already
   dispatched.

4. We should avoid asm calling into C. Otherwise, we need to test asm
   for ABI compliance as both caller and callee. Currently we only test
   it for callee compliance. When asm calls into asm, it *should* comply
   with the ABI as caller too, but mistakes don't matter as long as the
   called function triggers it. If the function is asm, this is fixed.
   If it is C, we must care about arbitrary C compiler output.

Bug: 263
Change-Id: Ic85af5c765fd57cbffeaf301c3872bad6c5bbf78
Reviewed-on: https://boringssl-review.googlesource.com/c/34874
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-22 22:51:51 +00:00
David Benjamin
4d8e1ce5e9 Patch XTS out of ARMv7 bsaes too.
Bug: 256
Change-Id: I822274bf05901d82b41dc9c9c4e6d0b5d622f3ff
Reviewed-on: https://boringssl-review.googlesource.com/c/34871
Reviewed-by: Adam Langley <agl@google.com>
2019-02-14 17:31:37 +00:00
David Benjamin
15ba2d11a9 Patch out unused aesni-x86_64 functions.
This shrinks the bssl binary by about 8k.

Change-Id: I571f258ccf7032ae34db3f20904ad9cc81cca839
Reviewed-on: https://boringssl-review.googlesource.com/c/34866
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-02-11 20:25:22 +00:00
Adam Langley
c1615719ce Add test of assembly code dispatch.
The first attempt involved using Linux's support for hardware
breakpoints to detect when assembly code was run. However, this doesn't
work with SDE, which is a problem.

This version has the assembly code update a global flags variable when
it's run, but only in non-FIPS and non-debug builds.

Update-Note: Assembly files now pay attention to the NDEBUG preprocessor
symbol. Ensure the build passes the symbol in. (If release builds fail
to link due to missing BORINGSSL_function_hit, this is the cause.)

Change-Id: I6b7ced442b7a77d0b4ae148b00c351f68af89a6e
Reviewed-on: https://boringssl-review.googlesource.com/c/33384
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2019-01-22 20:22:53 +00:00
David Benjamin
d99b549b8e Add AES ABI tests.
This involves fixing some bugs in aes_nohw_cbc_encrypt's annotations,
and working around a libunwind bug. In doing so, support .cfi_remember_state
and .cfi_restore_state in perlasm.

Change-Id: Iaedfe691356b0468327a6be0958d034dafa760e5
Reviewed-on: https://boringssl-review.googlesource.com/c/34189
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2019-01-09 03:54:55 +00:00
David Benjamin
c0f4dbe4e2 Move aes_nohw, bsaes, and vpaes prototypes to aes/internal.h.
This is in preparation for adding ABI tests to them.

In doing so, update delocate.go so that OPENSSL_ia32cap_get is consistently
callable outside the module. Right now it's callable both inside and outside
normally, but not in FIPS mode because the function is generated. This is
needed for tests and the module to share headers that touch OPENSSL_ia32cap_P.

Change-Id: Idbc7d694acfb974e0b04adac907dab621e87de62
Reviewed-on: https://boringssl-review.googlesource.com/c/34188
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-01-09 03:35:55 +00:00
David Benjamin
3adb1e5a37 Patch out the XTS implementation in bsaes.
We don't call it, so ship less code and reduce the number of places
where we must think about the bsaes -> aes_nohw fallback.

Bug: 256
Change-Id: I10ac2d70e18ec81e679631a9532c36d9edab1c6e
Reviewed-on: https://boringssl-review.googlesource.com/c/33586
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
2018-12-12 22:27:13 +00:00
Adam Langley
bf5021a6b8 Eliminate |OPENSSL_ia32cap_P| in C code in the FIPS module.
This can break delocate with certain compiler settings.

Change-Id: I76cf0f780d0e967390feed754e39b0ab25068f42
Reviewed-on: https://boringssl-review.googlesource.com/c/33485
Commit-Queue: Adam Langley <alangley@gmail.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
2018-12-06 00:58:14 +00:00
Brian Smith
0f5ecd3a85 Re-enable AES-NI on 32-bit x86 too.
commit 05750f23ae disabled AES-NI for
32-bit x86, perhaps unintentionally.

Change-Id: Ie950c4f49526257138ecc803df5ecfc115bc648d
Reviewed-on: https://boringssl-review.googlesource.com/c/33365
Reviewed-by: Adam Langley <agl@google.com>
2018-11-28 00:32:30 +00:00
David Benjamin
293d9ee4e8 Support execute-only memory for AArch64 assembly.
Put data in .rodata and, rather than adr, use the combination of adrp :pg_hi21:
and add :lo12:. Unfortunately, iOS uses different syntax, so we must add more
transforms to arm-xlate.pl.

Tested manually by:

1. Use Android NDK r19-beta1

2. Follow usual instructions to configure CMake for aarch64, but pass
   -DCMAKE_EXE_LINKER_FLAGS="-fuse-ld=lld -Wl,-execute-only".

3. Build. Confirm with readelf -l tool/bssl that .text is not marked
   readable.

4. Push the test binaries onto a Pixel 3. Test normally and with
   --cpu={none,neon,crypto}. I had to pass --gtest_filter=-*Thread* to
   crypto_test. There appears to be an issue with some runtime function
   that's unrelated to our assembly.

No measurable performance difference.

Going forward, to support this, we will need to apply similar changes to
all other AArch64 assembly. This is relatively straightforward, but may
be a little finicky for dual-AArch32/AArch64 files (aesv8-armx.pl).

Update-Note: Assembly syntax is a mess. There's a decent chance some
assembler will get offend.

Change-Id: Ib59b921d4cce76584320fefd23e6bb7ebd4847eb
Reviewed-on: https://boringssl-review.googlesource.com/c/33245
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
2018-11-19 19:58:15 +00:00
David Benjamin
73535ab252 Fix undefined block128_f, etc., casts.
This one is a little thorny. All the various block cipher modes
functions and callbacks take a void *key. This allows them to be used
with multiple kinds of block ciphers.

However, the implementations of those callbacks are the normal typed
functions, like AES_encrypt. Those take AES_KEY *key. While, at the ABI
level, this is perfectly fine, C considers this undefined behavior.

If we wish to preserve this genericness, we could either instantiate
multiple versions of these mode functions or create wrappers of
AES_encrypt, etc., that take void *key.

The former means more code and is tedious without C++ templates (maybe
someday...). The latter would not be difficult for a compiler to
optimize out. C mistakenly allowed comparing function pointers for
equality, which means a compiler cannot replace pointers to wrapper
functions with the real thing. (That said, the performance-sensitive
bits already act in chunks, e.g. ctr128_f, so the function call overhead
shouldn't matter.)

But our only 128-bit block cipher is AES anyway, so I just switched
things to use AES_KEY throughout. AES is doing fine, and hopefully we
would have the sense not to pair a hypothetical future block cipher with
so many modes!

Change-Id: Ied3e843f0e3042a439f09e655b29847ade9d4c7d
Reviewed-on: https://boringssl-review.googlesource.com/32107
Reviewed-by: Adam Langley <agl@google.com>
2018-10-01 17:35:02 +00:00
Adam Langley
91254c244c Rename |asm_AES_*| to |aes_nohw_*|.
Rather than have plain-C functions, asm functions, and accelerated
functions, just have accelerated and non-accelerated, where the latter
are either provided by assembly or by C code.

Pertinently, this allows Aarch64 to use hardware accel for the basic
|AES_*| functions.

Change-Id: I0003c0c7a43d85a3eee8c8f37697f61a3070dd40
Reviewed-on: https://boringssl-review.googlesource.com/28385
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-05-15 23:02:52 +00:00
Adam Langley
05750f23ae Revert "Revert "Revert "Revert "Make x86(-64) use the same aes_hw_* infrastructure as POWER and the ARMs.""""
This was reverted a second time because it ended up always setting the
final argument to CRYPTO_gcm128_init to zero, which disabled some
acceleration of GCM on ≥Haswell. With this update, that argument will be
set to 1 if |aes_hw_*| functions are being used.

Probably this will need to be reverted too for some reason. I'm hoping
to fill the entire git short description with “Revert”.

Change-Id: Ib4a06f937d35d95affdc0b63f29f01c4a8c47d03
Reviewed-on: https://boringssl-review.googlesource.com/28484
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-05-14 22:09:29 +00:00
Adam Langley
69271b5d4f Revert "Revert "Revert "Make x86(-64) use the same aes_hw_* infrastructure as POWER and the ARMs."""
gcm.c's AES-NI code wasn't triggering. (Thanks Brain for noting.)

Change-Id: Ic740e498b94fece180ac35c449066aee1349cbd5
Reviewed-on: https://boringssl-review.googlesource.com/28424
Reviewed-by: Adam Langley <alangley@gmail.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-05-12 15:18:16 +00:00
Adam Langley
29d97ff333 Revert "Revert "Make x86(-64) use the same aes_hw_* infrastructure as POWER and the ARMs.""
This relands
https://boringssl-review.googlesource.com/c/boringssl/+/28026 with a
change to avoid calling the Aarch64 hardware functions when the set has
been set by C code, since these are seemingly incompatible.

Change-Id: I91f3ed41cf6f7a7ce7a0477753569fac084c528b
Reviewed-on: https://boringssl-review.googlesource.com/28384
Reviewed-by: Adam Langley <agl@google.com>
2018-05-11 19:16:49 +00:00
Adam Langley
aca24c8724 Revert "Make x86(-64) use the same aes_hw_* infrastructure as POWER and the ARMs."
Broke Aarch64 on the main builders (but not the trybots, somehow.)

Change-Id: I53eb09c99ef42a59628b0506b5ddb125299b554a
Reviewed-on: https://boringssl-review.googlesource.com/28364
Reviewed-by: Adam Langley <agl@google.com>
2018-05-11 17:39:50 +00:00
Adam Langley
26ba48a6fb Make x86(-64) use the same aes_hw_* infrastructure as POWER and the ARMs.
This also happens to make the AES_[en|de]crypt functions use AES-NI
(where available) on Intel.

Update-Note: this substantially changes how AES-NI is triggered. Worth running bssl speed (on both k8 and ppc), before and after, to confirm that there are no regressions.

Change-Id: I5f22c1975236bbc1633c24ab60d683bca8ddd4c3
Reviewed-on: https://boringssl-review.googlesource.com/28026
Reviewed-by: David Benjamin <davidben@google.com>
2018-05-11 00:16:39 +00:00
David Benjamin
bf33114b51 Rename third_party/wycheproof to satisfy a bureaucrat.
Make it clear this is not a pristine full copy of all of Wycheproof as a
library.

Change-Id: I1aa5253a1d7c696e69b2e8d7897924f15303d9ac
Reviewed-on: https://boringssl-review.googlesource.com/28188
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Martin Kreichgauer <martinkr@google.com>
Reviewed-by: Martin Kreichgauer <martinkr@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-05-07 18:33:50 +00:00
David Benjamin
179c4e257a Update Wycheproof, add keywrap tests, and fix a bug.
The bug, courtesy of Wycheproof, is that AES key wrap requires the input
be at least two blocks, not one. This also matches the OpenSSL behavior
of those two APIs.

Update-Note: AES_wrap_key with in_len = 8 and AES_unwrap_key with
in_len = 16 will no longer work.

Change-Id: I5fc63ebc16920c2f9fd488afe8c544e0647d7507
Reviewed-on: https://boringssl-review.googlesource.com/27925
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
2018-05-04 17:08:44 +00:00
Adam Langley
0c9ac2e7bf Drop FULL_UNROLL code in aes.c.
We've never defined this so this code has always been dead.

Change-Id: Ibcc4095bf812c7e1866c5f39968789606f0995ae
Reviewed-on: https://boringssl-review.googlesource.com/28024
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-05-03 16:10:14 +00:00
David Benjamin
672f6fc248 Always use adr with __thumb2__.
Thumb2 addresses are a bit a mess, depending on whether a label is
interpreted as a function pointer value (for use with BX and BLX) or as
a program counter value (for use with PC-relative addressing). Clang's
integrated assembler mis-assembles this code. See
https://crbug.com/124610#c54 for details.

Instead, use the ADR pseudo-instruction which has clear semantics and
should be supported by every assembler that handles the OpenSSL Thumb2
code. (In other files, the ADR vs SUB conditionals are based on
__thumb2__ already. For some reason, this one is based on __APPLE__, I'm
guessing to deal with an older version of clang assembler.)

It's unclear to me which of clang or binutils is "correct" or if this is
even a well-defined notion beyond "whatever binutils does". But I will
note that https://github.com/openssl/openssl/pull/4669 suggests binutils
has also changed behavior around this before.

See also https://github.com/openssl/openssl/pull/5431 in OpenSSL.

Bug: chromium:124610
Change-Id: I5e7a0c8c0f54a3f65cc324ad599a41883675f368
Reviewed-on: https://boringssl-review.googlesource.com/26164
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2018-02-22 22:28:15 +00:00
David Benjamin
f6cf8bbc84 Sync up AES assembly.
This syncs up with OpenSSL master as of
50ea9d2b3521467a11559be41dcf05ee05feabd6. The non-license non-spelling
changes are CFI bits, which were added in upstream in
b84460ad3a3e4fcb22efaa0a8365b826f4264ecf.

Change-Id: I42280985f834d5b9133eacafc8ff9dbd2f0ea59a
Reviewed-on: https://boringssl-review.googlesource.com/25704
Reviewed-by: Adam Langley <agl@google.com>
2018-02-11 01:03:17 +00:00
David Benjamin
6dc994265e Sync up some perlasm license headers and easy fixes.
These files are otherwise up-to-date with OpenSSL master as of
50ea9d2b3521467a11559be41dcf05ee05feabd6, modulo a couple of spelling
fixes which I've imported.

I've also reverted the same-line label and instruction patch to
x86_64-mont*.pl. The new delocate parser handles that fine.

Change-Id: Ife35c671a8104c3cc2fb6c5a03127376fccc4402
Reviewed-on: https://boringssl-review.googlesource.com/25644
Reviewed-by: Adam Langley <agl@google.com>
2018-02-11 01:00:35 +00:00
David Benjamin
875095aa7c Silence ARMv8 deprecated IT instruction warnings.
ARMv8 kindly deprecated most of its IT instructions in Thumb mode.
These files are taken from upstream and are used on both ARMv7 and ARMv8
processors. Accordingly, silence the warnings by marking the file as
targetting ARMv7. In other files, they were accidentally silenced anyway
by way of the existing .arch lines.

This can be reproduced by building with the new NDK and passing
-DCMAKE_ASM_FLAGS=-march=armv8-a. Some of our downstream code ends up
passing that to the assembly.

Note this change does not attempt to arrange for ARMv8-A/T32 to get
code which honors the constraints. It only silences the warnings and
continues to give it the same ARMv7-A/Thumb-2 code that backwards
compatibility dictates it continue to run.

Bug: chromium:575886, b/63131949
Change-Id: I24ce0b695942eaac799347922b243353b43ad7df
Reviewed-on: https://boringssl-review.googlesource.com/24166
Reviewed-by: Adam Langley <agl@google.com>
2017-12-14 01:56:22 +00:00
David Benjamin
4358f104cf Remove clang assembler .arch workaround.
This makes it difficult to build against the NDK's toolchain file. The
problem is __clang__ just means Clang is the frontend and implies
nothing about which assembler. When using as, it is fine. When using
clang-as on Linux, one needs a clang-as from this year.

The only places where we case about clang's integrated assembler are iOS
(where perlasm strips out .arch anyway) and build environments like
Chromium which have a regularly-updated clang. Thus we can remove this
now.

Bug: 39
Update-Note: Holler if this breaks the build. If it doesn't break the
   build, you can probably remove any BORINGSSL_CLANG_SUPPORTS_DOT_ARCH
   or explicit -march armv8-a+crypto lines in your BoringSSL build.
Change-Id: I21ce54b14c659830520c2f1d51c7bd13e0980c68
Reviewed-on: https://boringssl-review.googlesource.com/24124
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-12-13 22:22:41 +00:00
David Benjamin
b8e2d6327a es/asm/{aes-armv4|bsaes-armv7}.pl: make it work with binutils-2.29.
It's not clear if it's a feature or bug, but binutils-2.29[.1]
interprets 'adr' instruction with Thumb2 code reference differently,
in a way that affects calculation of addresses of constants' tables.

(Imported from upstream's b82acc3c1a7f304c9df31841753a0fa76b5b3cda.)

Change-Id: Ia0f5233a9fcfaf18b9d1164bf1c88217c0cbb60d
Reviewed-on: https://boringssl-review.googlesource.com/22724
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-11-08 16:53:04 +00:00
David Benjamin
808f832917 Run the comment converter on libcrypto.
crypto/{asn1,x509,x509v3,pem} were skipped as they are still OpenSSL
style.

Change-Id: I3cd9a60e1cb483a981aca325041f3fbce294247c
Reviewed-on: https://boringssl-review.googlesource.com/19504
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-08-18 21:49:04 +00:00
David Benjamin
d4e37951b4 x86_64 assembly pack: "optimize" for Knights Landing, add AVX-512 results.
The changes to the assembly files are synced from upstream's
64d92d74985ebb3d0be58a9718f9e080a14a8e7f. cpu-intel.c is translated to C
from that commit and d84df594404ebbd71d21fec5526178d935e4d88d.

Change-Id: I02c8f83aa4780df301c21f011ef2d8d8300e2f2a
Reviewed-on: https://boringssl-review.googlesource.com/18411
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2017-07-26 22:01:37 +00:00
David Benjamin
0a3663a64f ARMv4 assembly pack: harmonize Thumb-ification of iOS build.
Three modules were left behind in
I59df0b567e8e80befe5c399f817d6410ddafc577.

(Imported from upstream's c93f06c12f10c07cea935abd78a07a037e27f155.)

This actually meant functions defined in those two files were
non-functional. I'm guessing no one noticed upstream because, if you go
strictly by iOS compile-time capabilities, all this code is unreachable
on ios32, only ios64.

Change-Id: I55035edf2aebf96d14bdf66161afa2374643d4ec
Reviewed-on: https://boringssl-review.googlesource.com/17113
Reviewed-by: David Benjamin <davidben@google.com>
2017-06-13 17:49:16 +00:00
David Benjamin
f03cdc3a93 Sync ARM assembly up to 609b0852e4d50251857dbbac3141ba042e35a9ae.
This change was made by copying over the files as of that commit and
then discarding the parts of the diff which corresponding to our own
changes.

Change-Id: I28c5d711f7a8cec30749b8174687434129af5209
Reviewed-on: https://boringssl-review.googlesource.com/17111
Reviewed-by: Adam Langley <agl@google.com>
2017-06-13 17:47:20 +00:00
David Benjamin
8da59555c6 ARMv4 assembly pack: allow Thumb2 even in iOS build, and engage it in most modules.
(Imported from upstream's a285992763f3961f69a8d86bf7dfff020a08cef9.)

Change-Id: I59df0b567e8e80befe5c399f817d6410ddafc577
Reviewed-on: https://boringssl-review.googlesource.com/17110
Reviewed-by: Adam Langley <agl@google.com>
2017-06-13 17:47:10 +00:00
David Benjamin
ae96383af3 ARMv4 assembly pack: implement support for Thumb2.
As some of ARM processors, more specifically Cortex-Mx series, are
Thumb2-only, we need to support Thumb2-only builds even in assembly.

(Imported from upstream's 11208dcfb9105e8afa37233185decefd45e89e17.)

Change-Id: I7cb48ce6a842cf3cfdf553f6e6e6227d52d525c0
Reviewed-on: https://boringssl-review.googlesource.com/17108
Reviewed-by: Adam Langley <agl@google.com>
2017-06-13 17:46:35 +00:00
David Benjamin
c07635f869 Remove local __arm__ ifdef on aes-armv4.pl.
We patch arm-xlate.pl to add the ifdefs, so this isn't needed and
reduces our upstream diff.

(We do still have a diff from upstream here. Will go through them
shortly.)

Change-Id: I5b1e301b9111969815f58d69a98591c973465f42
Reviewed-on: https://boringssl-review.googlesource.com/17105
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-06-12 21:48:54 +00:00
Adam Langley
7c075b99e2 Change ppc64le AES code for FIPS.
The symbol “rcon” should be local in order to avoid collisions and it's
much easier on delocate if some of the expressions are evalulated in
Perl rather than left in the resulting .S file.

Also fix the perlasm style so the symbols are actually local.

Change-Id: Iddfc661fc3a6504bcc5732abaa1174da89ad805e
Reviewed-on: https://boringssl-review.googlesource.com/16524
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-05-25 22:02:22 +00:00
David Benjamin
3ecd0a5fca Convert aes_test to GTest.
This introduces machinery to start embedding the test data files into
the crypto_test binary. Figuring out every CI's test data story is more
trouble than is worth it. The GTest FileTest runner is considerably
different from the old one:

- It returns void and expects failures to use the GTest EXPECT_* and
  ASSERT_* macros, rather than ExpectBytesEqual. This is more monkey
  work to convert, but ultimately less work to add new tests. I think
  it's also valuable for our FileTest and normal test patterns to align
  as much as possible. The line number is emitted via SCOPED_TRACE.

- I've intentionally omitted the Error attribute handling, since that
  doesn't work very well with the new callback. This means evp_test.cc
  will take a little more work to convert, but this is again to keep our
  two test patterns aligned.

- The callback takes a std::function rather than a C-style void pointer.
  This means we can go nuts with lambdas. It also places the path first
  so clang-format doesn't go nuts.

BUG=129

Change-Id: I0d1920a342b00e64043e3ea05f5f5af57bfe77b3
Reviewed-on: https://boringssl-review.googlesource.com/16507
Reviewed-by: Adam Langley <agl@google.com>
2017-05-23 22:33:25 +00:00
David Benjamin
583c12ea97 Remove filename argument to x86 asm_init.
43e5a26b53 removed the .file directive
from x86asm.pl. This removes the parameter from asm_init altogether. See
also upstream's e195c8a2562baef0fdcae330556ed60b1e922b0e.

Change-Id: I65761bc962d09f9210661a38ecf6df23eae8743d
Reviewed-on: https://boringssl-review.googlesource.com/16247
Reviewed-by: Steven Valdez <svaldez@google.com>
Commit-Queue: Steven Valdez <svaldez@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-05-12 14:58:27 +00:00
Adam Langley
2e2a226ac9 Move cipher/ into crypto/fipsmodule/
Change-Id: Id65e0988534056a72d9b40cc9ba5194e2d9b8a7c
Reviewed-on: https://boringssl-review.googlesource.com/15904
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-05-05 22:39:40 +00:00
David Benjamin
def85b403d Revise OPENSSL_ia32cap_P strategy to avoid TEXTRELs.
OPENSSL_ia32cap_addr avoids any relocations within the module, at the
cost of a runtime TEXTREL, which causes problems in some cases.
(Notably, if someone links us into a binary which uses the GCC "ifunc"
attribute, the loader crashes.)

We add a OPENSSL_ia32cap_addr_delta symbol (which is reachable
relocation-free from the module) stores the difference between
OPENSSL_ia32cap_P and its own address.  Next, reference
OPENSSL_ia32cap_P in code as usual, but always doing LEAQ (or the
equivalent GOTPCREL MOVQ) into a register first. This pattern we can
then transform into a LEAQ and ADDQ on OPENSSL_ia32cap_addr_delta.

ADDQ modifies the FLAGS register, so this is only a safe transformation
if we safe and restore flags first. That, in turn, is only a safe
transformation if code always uses %rsp as a stack pointer (specifically
everything below the stack must be fair game for scribbling over). Linux
delivers signals on %rsp, so this should already be an ABI requirement.
Further, we must clear the red zone (using LEAQ to avoid touching FLAGS)
which signal handlers may not scribble over.

This also fixes the GOTTPOFF logic to clear the red zone.

Change-Id: I4ca6133ab936d5a13d5c8ef265a12ab6bd0073c9
Reviewed-on: https://boringssl-review.googlesource.com/15545
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-04-27 21:07:33 +00:00
Adam Langley
0648129566 Move modes/ into the FIPS module
The changes to delocate.go are needed because modes/ does things like
return the address of a module function. Both of these need to be
changed from referencing the GOT to using local symbols.

Rather than testing whether |ghash| is |gcm_ghash_avx|, we can just keep
that information in a flag.

The test for |aesni_ctr32_encrypt_blocks| is more problematic, but I
believe that it's superfluous and can be dropped: if you passed in a
stream function that was semantically different from
|aesni_ctr32_encrypt_blocks| you would already have a bug because
|CRYPTO_gcm128_[en|de]crypt_ctr32| will handle a block at the end
themselves, and assume a big-endian, 32-bit counter anyway.

Change-Id: I68a84ebdab6c6006e11e9467e3362d7585461385
Reviewed-on: https://boringssl-review.googlesource.com/15064
Reviewed-by: Adam Langley <agl@google.com>
2017-04-21 17:46:37 +00:00
Adam Langley
8c62d9dd8b Move AES code into the FIPS module.
Change-Id: Id94e71bce4dca25e77f52f38c07e0489ca072d2d
Reviewed-on: https://boringssl-review.googlesource.com/15027
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-04-14 23:28:00 +00:00