(Imported from upstream's 912f08dd5ed4f68fb275f3b2db828349fcffba14,
52f856526c46ee80ef4c8c37844f084423a3eff7 and
377551b9c4e12aa7846f4d80cf3604f2e396c964)
Change-Id: Ic2bf93371f6d246818729810e7a45b3f0021845a
This ensures high performance in situations where the assembler supports
AVX2 but not AD*X.
(Imported from upstream's 82a9dafe32e1e39b5adff18f9061e43d8df3d3c5)
Change-Id: Ie67f49a1c5467807139b6a8a0d4e62162d8a974f
(This appears to be the case with upstream too; as far as I can see,
BoringSSL is not missing any optimisations.)
Change-Id: I0e54762ef0d09e60994ec82c5cca1ff0b3b23ea4
Reviewed-on: https://boringssl-review.googlesource.com/1080
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
(The issue was reported by Shay Gueron.)
The final reduction in Montgomery multiplication computes: if (X >= m) then
X = X - m, else X = X.
In OpenSSL, this was done by computing T = X - m, doing a constant-time
selection of the *addresses* of X and T, and loading from the resulting
address. But this is not cache-neutral: which address gets loaded still
depends on the secret comparison result.
This patch changes the behaviour by loading both X and T into registers, and
doing a constant-time selection of the *values*.
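
As a rough illustration of selecting *values* rather than *addresses*, here
is a minimal C sketch. It is not the code from this patch (which lives in the
assembly); the function name reduce_once, the 64-bit limb layout and the
scratch buffer t are assumptions made for the example. Note also that a C
compiler does not guarantee constant-time output, so this only shows the
idea.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: r = (x >= m) ? x - m : x, over n little-endian
 * 64-bit limbs. t is caller-provided scratch space of n limbs.
 * Both x[i] and t[i] are always loaded; only the values are masked.
 */
static void reduce_once(uint64_t *r, const uint64_t *x, const uint64_t *m,
                        uint64_t *t, size_t n) {
  uint64_t borrow = 0;
  for (size_t i = 0; i < n; i++) {
    /* t = x - m with borrow propagation, no comparisons or branches. */
    uint64_t d = x[i] - m[i] - borrow;
    borrow = ((~x[i] & m[i]) | ((~x[i] | m[i]) & d)) >> 63;
    t[i] = d;
  }

  /* mask is all ones if the subtraction borrowed (x < m), else zero. */
  uint64_t mask = (uint64_t)0 - borrow;
  for (size_t i = 0; i < n; i++) {
    /* Select between the two loaded values, not between two pointers. */
    r[i] = (x[i] & mask) | (t[i] & ~mask);
  }
}
```

The memory access pattern here is the same whichever way the comparison
goes, which is the property the address-selecting version lacked.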
TODO(fork): only some of the fixes from the original patch still apply to
the 1.0.2 code.
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).
(This change contains substantial changes from the original and
effectively starts a new history.)