This is slower, but constant-time. It intentionally omits the signed
digit optimization because we cannot be sure the doubling case will be
unreachable for all curves. This is a fallback generic implementation
for curves which we must support for compatibility but which are not
common or important enough to justify curve-specific work.
Before:
Did 814 ECDH P-384 operations in 1085384us (750.0 ops/sec)
Did 1430 ECDSA P-384 signing operations in 1081988us (1321.6 ops/sec)
Did 308 ECDH P-521 operations in 1057741us (291.2 ops/sec)
Did 539 ECDSA P-521 signing operations in 1049797us (513.4 ops/sec)
After:
Did 715 ECDH P-384 operations in 1080161us (661.9 ops/sec)
Did 1188 ECDSA P-384 verify operations in 1069567us (1110.7 ops/sec)
Did 275 ECDH P-521 operations in 1060503us (259.3 ops/sec)
Did 506 ECDSA P-521 signing operations in 1084739us (466.5 ops/sec)
But we're still faster than the old BIGNUM implementation. EC_FELEM
more than paid for both the loss of points_make_affine and this CL.
Bug: 239
Change-Id: I65d71a731aad16b523928ee47618822d503ea704
Reviewed-on: https://boringssl-review.googlesource.com/27708
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
This introduces EC_FELEM, which is analogous to EC_SCALAR. It is used
for EC_POINT's representation in the generic EC_METHOD, as well as
random operations on tuned EC_METHODs that still are implemented
genericly.
Unlike EC_SCALAR, EC_FELEM's exact representation is awkwardly specific
to the EC_METHOD, analogous to how the old values were BIGNUMs but may
or may not have been in Montgomery form. This is kind of a nuisance, but
no more than before. (If p224-64.c were easily convertable to Montgomery
form, we could say |EC_FELEM| is always in Montgomery form. If we
exposed the internal add and double implementations in each of the
curves, we could give |EC_POINT| an |EC_METHOD|-specific representation
and |EC_FELEM| is purely a |EC_GFp_mont_method| type. I'll leave this
for later.)
The generic add and doubling formulas are aligned with the formulas
proved in fiat-crypto. Those only applied to a = -3, so I've proved a
generic one in https://github.com/mit-plv/fiat-crypto/pull/356, in case
someone uses a custom curve. The new formulas are verified,
constant-time, and swap a multiply for a square. As expressed in
fiat-crypto they do use more temporaries, but this seems to be fine with
stack-allocated EC_FELEMs. (We can try to help the compiler later,
but benchamrks below suggest this isn't necessary.)
Unlike BIGNUM, EC_FELEM can be stack-allocated. It also captures the
bounds in the type system and, in particular, that the width is correct,
which will make it easier to select a point in constant-time in the
future. (Indeed the old code did not always have the correct width. Its
point formula involved halving and implemented this in variable time and
variable width.)
Before:
Did 77274 ECDH P-256 operations in 10046087us (7692.0 ops/sec)
Did 5959 ECDH P-384 operations in 10031701us (594.0 ops/sec)
Did 10815 ECDSA P-384 signing operations in 10087892us (1072.1 ops/sec)
Did 8976 ECDSA P-384 verify operations in 10071038us (891.3 ops/sec)
Did 2600 ECDH P-521 operations in 10091688us (257.6 ops/sec)
Did 4590 ECDSA P-521 signing operations in 10055195us (456.5 ops/sec)
Did 3811 ECDSA P-521 verify operations in 10003574us (381.0 ops/sec)
After:
Did 77736 ECDH P-256 operations in 10029858us (7750.5 ops/sec) [+0.8%]
Did 7519 ECDH P-384 operations in 10068076us (746.8 ops/sec) [+25.7%]
Did 13335 ECDSA P-384 signing operations in 10029962us (1329.5 ops/sec) [+24.0%]
Did 11021 ECDSA P-384 verify operations in 10088600us (1092.4 ops/sec) [+22.6%]
Did 2912 ECDH P-521 operations in 10001325us (291.2 ops/sec) [+13.0%]
Did 5150 ECDSA P-521 signing operations in 10027462us (513.6 ops/sec) [+12.5%]
Did 4264 ECDSA P-521 verify operations in 10069694us (423.4 ops/sec) [+11.1%]
This more than pays for removing points_make_affine previously and even
speeds up ECDH P-256 slightly. (The point-on-curve check uses the
generic code.)
Next is to push the stack-allocating up to ec_wNAF_mul, followed by a
constant-time single-point multiplication.
Bug: 239
Change-Id: I44a2dff7c52522e491d0f8cffff64c4ab5cd353c
Reviewed-on: https://boringssl-review.googlesource.com/27668
Reviewed-by: Adam Langley <agl@google.com>
Some non-FIPS consumers exclude bcm.c and build each fragment file
separately. This means non-FIPS code cannot live in bcm.c.
https://boringssl-review.googlesource.com/25044 made the self-test
function exist outside of FIPS code, so it needed to be moved into is
own file.
To avoid confusing generate_build_files.py, this can't be named
self_test.c, so I went with self_check.c.
Change-Id: I337b39b158bc50d6ca0a8ad1b6e15eb851095e1e
Reviewed-on: https://boringssl-review.googlesource.com/25124
Reviewed-by: Martin Kreichgauer <martinkr@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This change adds |BORINGSSL_self_test|, which allows applications to run
the FIPS KAT tests on demand, even in non-FIPS builds.
Change-Id: I950b30a02ab030d5e05f2d86148beb4ee1b5929c
Reviewed-on: https://boringssl-review.googlesource.com/25044
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: David Benjamin <davidben@google.com>
NIAP requires that the TLS KDF be tested by CAVP so this change moves
the PRF into crypto/fipsmodule/tls and adds a test harness for it. Like
the KAS tests, this is only triggered when “-niap” is passed to
run_cavp.go.
Change-Id: Iaa4973d915853c8e367e6106d829e44fcf1b4ce5
Reviewed-on: https://boringssl-review.googlesource.com/24666
Reviewed-by: Adam Langley <agl@google.com>
The fiat-crypto-generated code uses the Montgomery form implementation
strategy, for both 32-bit and 64-bit code.
64-bit throughput seems slower, but the difference is smaller than noise between repetitions (-2%?)
32-bit throughput has decreased significantly for ECDH (-40%). I am
attributing this to the change from varibale-time scalar multiplication
to constant-time scalar multiplication. Due to the same bottleneck,
ECDSA verification still uses the old code (otherwise there would have
been a 60% throughput decrease). On the other hand, ECDSA signing
throughput has increased slightly (+10%), perhaps due to the use of a
precomputed table of multiples of the base point.
64-bit benchmarks (Google Cloud Haswell):
with this change:
Did 9126 ECDH P-256 operations in 1009572us (9039.5 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039832us (22119.0 ops/sec)
Did 8820 ECDSA P-256 verify operations in 1024242us (8611.2 ops/sec)
master (40e8c921ca):
Did 9340 ECDH P-256 operations in 1017975us (9175.1 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039820us (22119.2 ops/sec)
Did 8688 ECDSA P-256 verify operations in 1021108us (8508.4 ops/sec)
benchmarks on ARMv7 (LG Nexus 4):
with this change:
Did 150 ECDH P-256 operations in 1029726us (145.7 ops/sec)
Did 506 ECDSA P-256 signing operations in 1065192us (475.0 ops/sec)
Did 363 ECDSA P-256 verify operations in 1033298us (351.3 ops/sec)
master (2fce1beda0):
Did 245 ECDH P-256 operations in 1017518us (240.8 ops/sec)
Did 473 ECDSA P-256 signing operations in 1086281us (435.4 ops/sec)
Did 360 ECDSA P-256 verify operations in 1003846us (358.6 ops/sec)
64-bit tables converted as follows:
import re, sys, math
p = 2**256 - 2**224 + 2**192 + 2**96 - 1
R = 2**256
def convert(t):
x0, s1, x1, s2, x2, s3, x3 = t.groups()
v = int(x0, 0) + 2**64 * (int(x1, 0) + 2**64*(int(x2,0) + 2**64*(int(x3, 0)) ))
w = v*R%p
y0 = hex(w%(2**64))
y1 = hex((w>>64)%(2**64))
y2 = hex((w>>(2*64))%(2**64))
y3 = hex((w>>(3*64))%(2**64))
ww = int(y0, 0) + 2**64 * (int(y1, 0) + 2**64*(int(y2,0) + 2**64*(int(y3, 0)) ))
if ww != v*R%p:
print(x0,x1,x2,x3)
print(hex(v))
print(y0,y1,y2,y3)
print(hex(w))
print(hex(ww))
assert 0
return '{'+y0+s1+y1+s2+y2+s3+y3+'}'
fe_re = re.compile('{'+r'(\s*,\s*)'.join(r'(\d+|0x[abcdefABCDEF0123456789]+)' for i in range(4)) + '}')
print (re.sub(fe_re, convert, sys.stdin.read()).rstrip('\n'))
32-bit tables converted from 64-bit tables
Change-Id: I52d6e5504fcb6ca2e8b0ee13727f4500c80c1799
Reviewed-on: https://boringssl-review.googlesource.com/23244
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
crypto/{asn1,x509,x509v3,pem} were skipped as they are still OpenSSL
style.
Change-Id: I3cd9a60e1cb483a981aca325041f3fbce294247c
Reviewed-on: https://boringssl-review.googlesource.com/19504
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Change-Id: I8512c6bfb62f1a83afc8f763d681bf5db3b4ceae
Reviewed-on: https://boringssl-review.googlesource.com/17144
Commit-Queue: Adam Langley <alangley@gmail.com>
Reviewed-by: David Benjamin <davidben@google.com>
This matches the example code in IG 9.10.
Change-Id: Ie010d135d6c30acb9248b689302b0a27d65bc4f7
Reviewed-on: https://boringssl-review.googlesource.com/17006
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
We want to clarify that this isn't the PWCT that FIPS generally means,
but rather the power-on self-test. Since ECDSA is non-deterministic, we
have to implement that power-on self-test as a PWCT, but we have a
different flag to break that actual PWCT.
Change-Id: I3e27c6a6b0483a6c04e764d6af8a4a863e0b8b77
Reviewed-on: https://boringssl-review.googlesource.com/16765
Reviewed-by: Adam Langley <agl@google.com>
Change-Id: Iab7a738a8981de7c56d1585050e78699cb876dab
Reviewed-on: https://boringssl-review.googlesource.com/16467
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Saves having it in several places.
Change-Id: I329e1bf4dd4a7f51396e36e2604280fcca32b58c
Reviewed-on: https://boringssl-review.googlesource.com/16026
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This allows breaking Known Answer Tests for AES-GCM, DES, SHA-1,
SHA-256, SHA-512, RSA signing and DRBG as required by FIPS.
Change-Id: I8e59698a5048656021f296195229a09ca5cd767c
Reviewed-on: https://boringssl-review.googlesource.com/16088
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This is required by FIPS testing.
Change-Id: Ia399a0bf3d03182499c0565278a3713cebe771e3
Reviewed-on: https://boringssl-review.googlesource.com/16044
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
SHA-512 is faster to calculate on 64-bit systems and we're only
targetting 64-bit systems with FIPS.
Change-Id: I5e9b8419ad4ddc72ec682c4193ffb17975d228e5
Reviewed-on: https://boringssl-review.googlesource.com/16025
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This avoids depending the FIPS module on crypto/bytestring and moves
ECDSA_SIG_{new,free} into the module.
Change-Id: I7b45ef07f1140873a0da300501141b6ae272a5d9
Reviewed-on: https://boringssl-review.googlesource.com/15984
Reviewed-by: Adam Langley <agl@google.com>
The names in the P-224 code collided with the P-256 code and thus many
of the functions and constants in the P-224 code have been prefixed.
Change-Id: I6bcd304640c539d0483d129d5eaf1702894929a8
Reviewed-on: https://boringssl-review.googlesource.com/15847
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This will let us keep CBS/CBB out of the module. It also makes the PWCT
actually use a hard-coded public key since kEC was using the
private-key-only serialization.
Change-Id: I3769fa26fc789c4797a56534df73f810cf5441c4
Reviewed-on: https://boringssl-review.googlesource.com/15830
Reviewed-by: Adam Langley <agl@google.com>
This will let us keep CBS/CBB out of the module.
Change-Id: I780de0fa2c102cf27eee2cc242ee23740fbc16ce
Reviewed-on: https://boringssl-review.googlesource.com/15829
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
We check the length for MD5+SHA1 but not the normal cases. Instead,
EVP_PKEY_sign externally checks the length (largely because the silly
RSA-PSS padding function forces it). We especially should be checking
the length for these because otherwise the prefix built into the ASN.1
prefix is wrong.
The primary motivation is to avoid putting EVP_PKEY inside the FIPS
module. This means all logic for supported algorithms should live in
crypto/rsa.
This requires fixing up the verify_recover logic and some tests,
including bcm.c's KAT bits.
(evp_tests.txt is now this odd mixture of EVP-level and RSA-level error
codes. A follow-up change will add new APIs for RSA-PSS which will allow
p_rsa.c to be trimmed down and make things consistent.)
Change-Id: I29158e9695b28e8632b06b449234a5dded35c3e7
Reviewed-on: https://boringssl-review.googlesource.com/15824
Reviewed-by: Adam Langley <agl@google.com>
Change-Id: I167b7045c537d95294d387936f3d7bad530e1c6f
Reviewed-on: https://boringssl-review.googlesource.com/15844
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Support for platforms that we don't support FIPS on doesn't need to be
in the module. Also, functions for dealing with whether fork-unsafe
buffering is enabled are left out because they aren't implementing any
cryptography and they use global r/w state, making their inclusion
painful.
Change-Id: I71a0123db6f5449e9dfc7ec7dea0944428e661aa
Reviewed-on: https://boringssl-review.googlesource.com/15084
Reviewed-by: Adam Langley <agl@google.com>
With some optimisation settings, Clang was loading
BORINGSSL_bcm_text_hash with AVX2 instructions, which weren't getting
translated correctly. This seems to work and is less fragile.
The compiler just emits an leaq here. This is because it knows the
symbol is hidden (in the shared library sense), so it needn't go through
GOTPCREL. The assembler would have added a relocation, were the symbol
left undefined, but since we define the symbol later on, it all works
out without a relocation.
Were the symbol not hidden, the compiler would have emitted a movq by
way of GOTPCREL, but we can now translate those away anyway.
Change-Id: I442a22f4f8afaadaacbab7044f946a963ebfc46c
Reviewed-on: https://boringssl-review.googlesource.com/15384
Reviewed-by: Adam Langley <agl@google.com>
Change-Id: Ibd6b9b12b3b622f67f69da5c2add8b1b040882f1
Reviewed-on: https://boringssl-review.googlesource.com/15344
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
The changes to delocate.go are needed because modes/ does things like
return the address of a module function. Both of these need to be
changed from referencing the GOT to using local symbols.
Rather than testing whether |ghash| is |gcm_ghash_avx|, we can just keep
that information in a flag.
The test for |aesni_ctr32_encrypt_blocks| is more problematic, but I
believe that it's superfluous and can be dropped: if you passed in a
stream function that was semantically different from
|aesni_ctr32_encrypt_blocks| you would already have a bug because
|CRYPTO_gcm128_[en|de]crypt_ctr32| will handle a block at the end
themselves, and assume a big-endian, 32-bit counter anyway.
Change-Id: I68a84ebdab6c6006e11e9467e3362d7585461385
Reviewed-on: https://boringssl-review.googlesource.com/15064
Reviewed-by: Adam Langley <agl@google.com>
It's not obvious how to make ASAN happy with the integrity test but this
will let us test FIPS-only code with ASAN at least.
Change-Id: Iac983787e04cb86a158e4416c410d9b2d1e5e03f
Reviewed-on: https://boringssl-review.googlesource.com/14965
Reviewed-by: Adam Langley <agl@google.com>
This restores the original version of delocate.go, with the subsequent
bugfixes patched in. With this, the FIPS module builds with GCC and
Clang, with and without optimizations. I did patch over a variant of the
macro though, since it was otherwise really wordy.
Playing games with sections was a little overly clever and relied on the
compiler not performing a number of optimizations. Clang blew threw all
of those assumptions.
Change-Id: Ib4da468a5925998457994f9e392cf0c04573fe91
Reviewed-on: https://boringssl-review.googlesource.com/14805
Reviewed-by: Adam Langley <agl@google.com>