The lazy-initialisation of BN_MONT_CTX was serialising all threads, as noted by
Daniel Sands and co at Sandia. This was to handle the case that 2 or more
threads race to lazy-init the same context, but stunted all scalability in the
case where 2 or more threads are doing unrelated things! We favour the latter
case by punishing the former. The init work gets done by each thread that finds
the context to be uninitialised, and we then lock the "set" logic after that
work is done - the winning thread's work gets used, the losing threads throw
away what they've done.
(Imported from upstream's bf43446835bfd3f9abf1898a99ae20f2285320f3)
It's not clear whether this inconsistency could lead to an actual
computation error, but it involved a BIGNUM being passed around the
montgomery logic in an inconsistent state. This was found using flags
-DBN_DEBUG -DBN_DEBUG_RAND, and working backwards from this assertion
in 'ectest';
ectest: bn_mul.c:960: BN_mul: Assertion `(_bnum2->top == 0) ||
(_bnum2->d[_bnum2->top - 1] != 0)' failed
(Imported from upstream's 3cc546a3bbcbf26cd14fc45fb133d36820ed0a75)
This change adds a new function, BN_bn2bin_padded, that attempts, as
much as possible, to serialise a BIGNUM in constant time.
This is used to avoid some timing leaks in RSA decryption.
Some RSA private keys are specified with only n, e and d. Although we
can use these keys directly, it's nice to have a uniform representation
that includes the precomputed CRT values. This change adds a function
that can recover the primes from a minimal private key of that form.
Ensure that, when generating small primes, the result is actually of the
requested size. Fixes OpenSSL #2701.
This change does not address the cases of generating safe primes, or
where the |add| parameter is non-NULL.
(The issue was reported by Shay Gueron.)
The final reduction in Montgomery multiplication computes if (X >= m) then X =
X - m else X = X
In OpenSSL, this was done by computing T = X - m, doing a constant-time
selection of the *addresses* of X and T, and loading from the resulting
address. But this is not cache-neutral.
This patch changes the behaviour by loading both X and T into registers, and
doing a constant-time selection of the *values*.
TODO(fork): only some of the fixes from the original patch still apply to
the 1.0.2 code.
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).
(This change contains substantial changes from the original and
effectively starts a new history.)