(This appears to be the case with upstream too, it's not that BoringSSL
is missing optimisations from what I can see.)
Change-Id: I0e54762ef0d09e60994ec82c5cca1ff0b3b23ea4
Reviewed-on: https://boringssl-review.googlesource.com/1080
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
(The issue was reported by Shay Gueron.)
The final reduction in Montgomery multiplication computes if (X >= m) then X =
X - m else X = X
In OpenSSL, this was done by computing T = X - m, doing a constant-time
selection of the *addresses* of X and T, and loading from the resulting
address. But this is not cache-neutral.
This patch changes the behaviour by loading both X and T into registers, and
doing a constant-time selection of the *values*.
TODO(fork): only some of the fixes from the original patch still apply to
the 1.0.2 code.
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).
(This change contains substantial changes from the original and
effectively starts a new history.)