Independent of the underlying CPU architecture, casting unaligned
pointers to uint64_t* is undefined. Just use a memcpy. The compiler
should be able to optimize that itself.
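As a minimal sketch of the pattern (load_u64_le is an illustrative helper
name, not necessarily the one used in the tree):

  #include <stdint.h>
  #include <string.h>

  // Read a 64-bit value (host byte order) from a possibly unaligned
  // pointer. memcpy is defined behavior for any alignment; compilers
  // typically emit a single load on hardware with unaligned access.
  static uint64_t load_u64_le(const void *ptr) {
    uint64_t ret;
    memcpy(&ret, ptr, sizeof(ret));
    return ret;
  }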
Change-Id: I39210871fca3eaf1f4b1d205b2bb0c337116d9cc
Reviewed-on: https://boringssl-review.googlesource.com/c/34872
Reviewed-by: Adam Langley <agl@google.com>
There is a C implementation of gcm_ghash_4bit to pair with
gcm_gmult_4bit. It's even slightly faster per the numbers below (x86_64
OPENSSL_NO_ASM build), but, more importantly, we trim down the
combinatorial explosion of GCM implementations and free up complexity
budget for potentially using bsaes better in the future.
Old:
Did 2557000 AES-128-GCM (16 bytes) seal operations in 1000057us (2556854.3 ops/sec): 40.9 MB/s
Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009613us (93105.0 ops/sec): 125.7 MB/s
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1024768us (16589.1 ops/sec): 135.9 MB/s
Did 2511000 AES-256-GCM (16 bytes) seal operations in 1000196us (2510507.9 ops/sec): 40.2 MB/s
Did 84000 AES-256-GCM (1350 bytes) seal operations in 1000412us (83965.4 ops/sec): 113.4 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1046963us (14327.2 ops/sec): 117.4 MB/s
New:
Did 2739000 AES-128-GCM (16 bytes) seal operations in 1000322us (2738118.3 ops/sec): 43.8 MB/s
Did 100000 AES-128-GCM (1350 bytes) seal operations in 1008190us (99187.7 ops/sec): 133.9 MB/s
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1006360us (16892.6 ops/sec): 138.4 MB/s
Did 2546000 AES-256-GCM (16 bytes) seal operations in 1000150us (2545618.2 ops/sec): 40.7 MB/s
Did 86000 AES-256-GCM (1350 bytes) seal operations in 1000970us (85916.7 ops/sec): 116.0 MB/s
Did 14850 AES-256-GCM (8192 bytes) seal operations in 1023459us (14509.6 ops/sec): 118.9 MB/s
While I'm here, tighten up some of the functions and align the ctr32 and
non-ctr32 paths.
Bug: 256
Change-Id: Id4df699cefc8630dd5a350d44f927900340f5e60
Reviewed-on: https://boringssl-review.googlesource.com/c/34869
Reviewed-by: Adam Langley <agl@google.com>
Change-Id: I0c648340ac7bb134fcda42c56a83f4815bbaa557
Reviewed-on: https://boringssl-review.googlesource.com/c/34884
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
This should make things a bit easier to debug.
Update-Note: Test binaries on Windows now link to dbghelp.
Bug: 259
Change-Id: I9da1fc89d429080c5250238e4341445922b1dd8e
Reviewed-on: https://boringssl-review.googlesource.com/c/34868
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
But for the ABI conversion bits, these are just leaf functions and don't
even need unwind tables. Just renumber the registers on Windows to only
use volatile ones.
In doing so, this switches to writing rdrand explicitly. perlasm already
knows how to manually encode it and our minimum assembler versions
surely cover rdrand by now anyway. Also add the .size directive. I'm not
sure what it's used for, but the other files have it.
(This isn't a generally reusable technique. The more complex functions
will need actual unwind codes.)
Bug: 259
Change-Id: I1d5669bcf8b6e34939885d78aea6f60597be1528
Reviewed-on: https://boringssl-review.googlesource.com/c/34867
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This shrinks the bssl binary by about 8k.
Change-Id: I571f258ccf7032ae34db3f20904ad9cc81cca839
Reviewed-on: https://boringssl-review.googlesource.com/c/34866
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Change-Id: Ic23fc5fbec2c4f8df5d06f807c6bd2c5e1f0e99c
Reviewed-on: https://boringssl-review.googlesource.com/c/34865
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Fix some missing CFI bits.
Change-Id: I42114527f0ef8e03079d37a9f466d64a63a313f5
Reviewed-on: https://boringssl-review.googlesource.com/c/34864
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
EVP_get_cipherbyname should work on everything that EVP_do_all_sorted
lists, and conversely, there should be nothing that
EVP_get_cipherbyname works on that EVP_do_all_sorted doesn't list.
node.js uses these APIs to enumerate and instantiate ciphers.
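A sketch of the invariant being tested, using the EVP_CIPHER_do_all_sorted
callback form (signature assumed from BoringSSL's headers; error reporting
elided):

  #include <openssl/cipher.h>

  // Every cipher that enumeration lists should also resolve by name.
  static void check_cipher(const EVP_CIPHER *cipher, const char *name,
                           const char *unused, void *arg) {
    if (EVP_get_cipherbyname(name) == NULL) {
      *(int *)arg = 0;  // enumeration and lookup disagree
    }
  }

  static int ciphers_are_consistent(void) {
    int ok = 1;
    EVP_CIPHER_do_all_sorted(check_cipher, &ok);
    return ok;
  }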
Change-Id: I87fcedce62d06774f7c6ee7acc898326276be089
Reviewed-on: https://boringssl-review.googlesource.com/c/33984
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Thanks to Nico Weber for pointing this out.
Change-Id: I763fd4a6f8fe467a027d5b249d9f76633ab4375a
Reviewed-on: https://boringssl-review.googlesource.com/c/34824
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
It's the same as for clients, and we're probably not going to change
that any time soon.
Change-Id: Ic48cb640e98b0957d264267b97b5393f1977c6e6
Reviewed-on: https://boringssl-review.googlesource.com/c/34665
Reviewed-by: David Benjamin <davidben@google.com>
This caught a bug in bn_mul_mont. Tested manually on iOS and Android.
Change-Id: I1819fcd9ad34dbe3ba92bba952507d86dd12185a
Reviewed-on: https://boringssl-review.googlesource.com/c/34805
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This was caught by an aarch64 ABI tester. aarch64 has the same
considerations around small arguments as x86_64 does. The aarch64
version of bn_mul_mont does not mask off the upper words of the
argument.
The x86_64 version does, so size_t is, strictly speaking, wrong for
x86_64, but bn_mul_mont already has an implicit size limit to support
its internal alloca, so this doesn't really make things worse than
before.
Change-Id: I39bffc8fdb2287e45a2d1f0d1b4bd5532bbf3868
Reviewed-on: https://boringssl-review.googlesource.com/c/34804
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Update-Note: There's some chance this'll break iOS since I was unable to
test it there. The iPad I have to test on is too new to run 32-bit code
at all.
Change-Id: I6593f91b67a5e8a82828237d3b69ed948b07922d
Reviewed-on: https://boringssl-review.googlesource.com/c/34725
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
Unfortunately, due to most OpenSSL assembly using custom exception
handlers to unwind, most of our assembly doesn't work with
non-destructive unwind. For now, CHECK_ABI behaves like
CHECK_ABI_NO_UNWIND on Windows, and CHECK_ABI_SEH will test unwinding on
both platforms.
The tests do, however, work with the unwind-code-based assembly we
recently added, as well as the clmul-based GHASH which is also
code-based. Remove the ad-hoc SEH tests which intentionally hit memory
access exceptions, now that we can test unwind directly.
Now that we can test it, the next step is to implement SEH directives in
perlasm so writing these unwind codes is less of a chore.
Bug: 259
Change-Id: I23a57a22c5dc9fa4513f575f18192335779678a5
Reviewed-on: https://boringssl-review.googlesource.com/c/34784
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
The .type foo, @abi-omnipotent lines weren't being parsed correctly.
This doesn't change the generated files, but some internal state (used
by in-progress work on perlasm SEH directives) wasn't quite right.
Change-Id: Id6aec79281a59f45b2eb2aea9f1fb8806b4c483e
Reviewed-on: https://boringssl-review.googlesource.com/c/34786
Reviewed-by: Adam Langley <agl@google.com>
RSA keygen isn't the fastest. Just use the existing one in
rsaCertificate.
Change-Id: Icd151232928e67e0a7d5becabf9dc96b0e9bfa22
Reviewed-on: https://boringssl-review.googlesource.com/c/34764
Commit-Queue: Steven Valdez <svaldez@google.com>
Reviewed-by: Steven Valdez <svaldez@google.com>
RAND_bytes rarely uses large enough inputs for bsaes to be worth it.
https://boringssl-review.googlesource.com/c/boringssl/+/33589 includes some
rough benchmarks of various bits here. Some observations:
- 8 blocks of bsaes costs roughly 6.5 blocks of vpaes. Note the comparison
isn't quite accurate because I'm measuring bsaes_ctr32_encrypt_blocks against
vpaes_encrypt, and vpaes in CTR mode today must make do with a C loop. Even
assuming a cutoff of 6 rather than 7 blocks, it's rare to ask for 96 bytes
of entropy at a time.
- CTR-DRBG performs some stray block operations (ctr_drbg_update), which bsaes
is bad at without extra work to fold them into the CTR loop (not really worth
it).
- CTR-DRBG calculates a couple of new key schedules every RAND_bytes call. We
don't currently have a constant-time bsaes key schedule. Unfortunately, even
plain vpaes loses to the current aes_nohw used by bsaes, but aes_nohw is not
constant-time. Also, taking CTR-DRBG out of the bsaes equation frees up
complexity budget for sorting out the rest of bsaes.
- Machines without AES hardware (clients) are not going to be RNG-bound. It's
mostly servers pushing way too many CBC IVs that care. This means bsaes's
current side channel tradeoffs make even less sense here.
I'm not sure yet what we should do for the rest of the bsaes mess, but it seems
clear that we want to stick with vpaes for the RNG.
Bug: 256
Change-Id: Iec8f13af232794afd007cb1065913e8117eeee24
Reviewed-on: https://boringssl-review.googlesource.com/c/34744
Reviewed-by: Adam Langley <agl@google.com>
Hexadecimals were erroneously recognized as symbols in .xdata.
(Imported from upstream's b068a9b914887af5cc99895754412582fbb0e10b)
Change-Id: I5d8e8e1969669a8961733802d9f034cf26c45552
Reviewed-on: https://boringssl-review.googlesource.com/c/34704
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Android's official documentation seems to assume you're using the NDK
build system or Android Studio. I extracted this from one of their
scripts a while back. May as well put it somewhere we can easily find
it.
Change-Id: I259abc54e6935ab537956a7cbf9f80e924a60b7a
Reviewed-on: https://boringssl-review.googlesource.com/c/34724
Reviewed-by: Adam Langley <agl@google.com>
For now, this is off by default and controlled by SSL_set_enforce_rsa_key_usage.
This may be set as late as certificate verification, so we may start by
enforcing it for known roots.
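A minimal sketch of opting in, assuming an existing SSL *ssl:

  #include <openssl/ssl.h>

  // Enforce the keyUsage extension of RSA leaf certificates on this
  // connection; the check is off by default as of this change.
  SSL_set_enforce_rsa_key_usage(ssl, 1);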
Generalizes ssl_cert_check_digital_signature_key_usage to check any part of the
key_usage, and adds a new error KEY_USAGE_BIT_INCORRECT for the generalized
method.
Bug: chromium:795089
Change-Id: Ifa504c321bec3263a4e74f2dc48513e3b895d3ee
Reviewed-on: https://boringssl-review.googlesource.com/c/34604
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
They were causing a "panic: ServerHello unexpectedly contained extensions"
if the client unconditionally signaled support for OCSP or SCTs.
Change-Id: Ia60639431daf78679b269dfe337c1af171fd7d8b
Reviewed-on: https://boringssl-review.googlesource.com/c/34644
Reviewed-by: David Benjamin <davidben@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Calling conventions must specify how to handle arguments smaller than a
machine word. Should the caller pad them up to a machine word size with
predictable values (zero/sign-extended), or should the callee tolerate
an arbitrary bit pattern?
Annoyingly, I found no text in either SysV or Win64 ABI documentation
describing any of this and resorted to experiment. The short answer is
that callees must tolerate an arbitrary bit pattern on x86_64, which
means we must test this. See the comment in abi_test::internal::ToWord
for the long answer.
If the type of a parameter is smaller than crypto_word_t, CHECK_ABI now
fills the remaining bytes with 0xaa. This is so the
number is out of bounds for code expecting either zero or sign
extension. (Not that crypto assembly has any business seeing negative
numbers.)
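A sketch of the trick (a hypothetical helper mirroring
abi_test::internal::ToWord; assumes a little-endian host and a 64-bit
crypto_word_t):

  #include <stdint.h>
  #include <string.h>

  // Widen a small argument to a full word, filling the bytes beyond
  // the declared type with 0xaa instead of zero- or sign-extending.
  static uint64_t to_word(int arg) {
    uint64_t word = UINT64_C(0xaaaaaaaaaaaaaaaa);
    memcpy(&word, &arg, sizeof(arg));  // low bytes carry the value
    return word;
  }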
Doing so reveals a bug in ecp_nistz256_ord_sqr_mont. The rep parameter
is typed int, but the code expected uint64_t. In practice, the compiler
will always compile this correctly because:
- On both Win64 and SysV, rep is a register parameter.
- The rep parameter is always a constant, so the compiler has no reason
to leave garbage in the upper half.
However, I was indeed able to get a bug out of GCC via:
uint64_t foo = (1ull << 63) | 2; // Some global the compiler can't
// prove constant.
ecp_nistz256_ord_sqr_mont(res, a, foo >> 1);
Were ecp_nistz256_ord_sqr_mont a true int-taking function, this would
act like ecp_nistz256_ord_sqr_mont(res, a, 1). Instead, it hung. Fix
this by having it take a full-width word.
This mess has several consequences:
- ABI testing now ideally needs a functional testing component to fully cover
this case. A bad input might merely produce the wrong answer. Still,
this is fairly effective as it will cause most code to either segfault
or loop forever. (Not the enc parameter to AES however...)
- We cannot freely change the type of assembly function prototypes. If the
prototype says int or unsigned, it must be ignoring the upper half and
thus "fixing" it to size_t cannot have handled the full range. (Unless
it was simply wrong of the parameter is already bounded.) If the
prototype says size_t, switching to int or unsigned will hit this type
of bug. The former is a safer failure mode though.
- The simplest path out of this mess: new assembly code should *only*
ever take word-sized parameters. This is not a tall order as the bad
parameters are usually ints that should have been size_t.
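Concretely (hypothetical prototypes, not functions in the tree):

  #include <stddef.h>
  #include <stdint.h>

  // Good: the length occupies a full machine word, so no garbage bits.
  void asm_fn_good(uint8_t *out, const uint8_t *in, size_t blocks);

  // Risky: the upper half of the register holding |blocks| may be
  // arbitrary, and the assembly must remember to ignore it.
  void asm_fn_risky(uint8_t *out, const uint8_t *in, int blocks);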
Calling conventions are hard.
Change-Id: If8254aff8953844679fbce4bd3e345e5e2fa5213
Reviewed-on: https://boringssl-review.googlesource.com/c/34627
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
As part of this, move the CPU checks to C.
Change-Id: I17b701e1196c1ca116bbd23e0e669cf603ad464d
Reviewed-on: https://boringssl-review.googlesource.com/c/34626
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
It's an assembly function, so types are a little meaningless, but
everything is passed through as BN_ULONG, so be consistent. Also
annotate all the RSAZ prototypes with sizes.
Change-Id: I32e59e896da39e79c30ce9db52652fd645a033b4
Reviewed-on: https://boringssl-review.googlesource.com/c/34625
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
This is much less interesting (stack-based parameters, Windows and SysV
match, no SEH concerns as far as I can tell) than x86_64, but it was
easy to do and I'm more familiar with x86 than ARM, so it made a better
second architecture to make sure all the architecture ifdefs worked out.
Also fix a bug in the x86_64 direction flag code. It was shifting in the
wrong direction, making it give 0 or 1<<20 rather than 0 or 1.
(Happily, x86_64 appears to be unique in having vastly different calling
conventions between OSs. x86 is the same between SysV and Windows, and
ARM had the good sense to specify a (mostly) common set of rules.)
Since a lot of the assembly functions use the same names and the tests
were written generically, merely dropping in a trampoline and
CallerState implementation gives us a bunch of ABI tests for free.
Change-Id: I15408c18d43e88cfa1c5c0634a8b268a150ed961
Reviewed-on: https://boringssl-review.googlesource.com/c/34624
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
OpenSSL's EVP-level EC API involves a separate "paramgen" operation,
which is ultimately just a roundabout way to go from a NID to an
EC_GROUP. But Node uses this, and it's the pattern used within OpenSSL
these days, so this appears to be the official upstream recommendation.
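The dance, roughly (a sketch; error handling elided):

  #include <openssl/evp.h>
  #include <openssl/nid.h>

  // Go from a curve NID to an EVP_PKEY holding EC parameters.
  static EVP_PKEY *ec_params_from_nid(int nid) {
    EVP_PKEY *params = NULL;
    EVP_PKEY_CTX *ctx = EVP_PKEY_CTX_new_id(EVP_PKEY_EC, NULL);
    if (ctx != NULL &&
        EVP_PKEY_paramgen_init(ctx) &&
        EVP_PKEY_CTX_set_ec_paramgen_curve_nid(ctx, nid) &&
        EVP_PKEY_paramgen(ctx, &params)) {
      // |params| now wraps the EC_GROUP for |nid|.
    }
    EVP_PKEY_CTX_free(ctx);
    return params;
  }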
Also add a #define for OPENSSL_EC_EXPLICIT_CURVE, because Node uses it,
but fail any attempt to use it. Explicit curve encodings are forbidden by
RFC 5480 and generally a bad idea. (Parsing such keys back into OpenSSL
will cause it to lose the optimized path.)
Change-Id: I5e97080e77cf90fc149f6cf6f2cc4900f573fc64
Reviewed-on: https://boringssl-review.googlesource.com/c/34565
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
This doesn't cover all the functions used by Node, but it's the easy
bits. (EVP_PKEY_paramgen will be done separately as it's a non-trivial
bit of machinery.)
Change-Id: I6501e99f9239ffcdcc57b961ebe85d0ad3965549
Reviewed-on: https://boringssl-review.googlesource.com/c/34544
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
We currently require clmul instructions for constant-time GHASH
on x86_64. Otherwise, it falls back to a variable-time 4-bit table
implementation. However, a significant proportion of clients lack these
instructions.
Inspired by vpaes, we can use pshufb and a slightly different order of
incorporating the bits to make a constant-time GHASH. This requires
SSSE3, which is very common. Benchmarking old machines we had on hand,
it appears to be a no-op on Sandy Bridge and a small slowdown for
Penryn.
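The core building block is that pshufb performs a 16-entry table lookup
whose cost and addresses are independent of the (secret) index bytes. A
minimal intrinsics sketch of just that primitive (not the GHASH code
itself):

  #include <tmmintrin.h>  // SSSE3

  // Select bytes of |table| by the low nibble of each byte of
  // |indices|. No secret-dependent memory address is formed, unlike an
  // in-memory 4-bit table, so the lookup is constant-time.
  static __m128i lookup_16(__m128i table, __m128i indices) {
    return _mm_shuffle_epi8(table, indices);
  }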
Sandy Bridge (Intel Pentium CPU 987 @ 1.50GHz):
(Note: these numbers are before 16-byte-aligning the table. That was an
improvement on Penryn, so it's possible Sandy Bridge is now better.)
Before:
Did 4244750 AES-128-GCM (16 bytes) seal operations in 4015000us (1057222.9 ops/sec): 16.9 MB/s
Did 442000 AES-128-GCM (1350 bytes) seal operations in 4016000us (110059.8 ops/sec): 148.6 MB/s
Did 84000 AES-128-GCM (8192 bytes) seal operations in 4015000us (20921.5 ops/sec): 171.4 MB/s
Did 3349250 AES-256-GCM (16 bytes) seal operations in 4016000us (833976.6 ops/sec): 13.3 MB/s
Did 343500 AES-256-GCM (1350 bytes) seal operations in 4016000us (85532.9 ops/sec): 115.5 MB/s
Did 65250 AES-256-GCM (8192 bytes) seal operations in 4015000us (16251.6 ops/sec): 133.1 MB/s
After:
Did 4229250 AES-128-GCM (16 bytes) seal operations in 4016000us (1053100.1 ops/sec): 16.8 MB/s [-0.4%]
Did 442250 AES-128-GCM (1350 bytes) seal operations in 4016000us (110122.0 ops/sec): 148.7 MB/s [+0.1%]
Did 83500 AES-128-GCM (8192 bytes) seal operations in 4015000us (20797.0 ops/sec): 170.4 MB/s [-0.6%]
Did 3286500 AES-256-GCM (16 bytes) seal operations in 4016000us (818351.6 ops/sec): 13.1 MB/s [-1.9%]
Did 342750 AES-256-GCM (1350 bytes) seal operations in 4015000us (85367.4 ops/sec): 115.2 MB/s [-0.2%]
Did 65250 AES-256-GCM (8192 bytes) seal operations in 4016000us (16247.5 ops/sec): 133.1 MB/s [-0.0%]
Penryn (Intel Core 2 Duo CPU P8600 @ 2.40GHz):
Before:
Did 1179000 AES-128-GCM (16 bytes) seal operations in 1000139us (1178836.1 ops/sec): 18.9 MB/s
Did 97000 AES-128-GCM (1350 bytes) seal operations in 1006347us (96388.2 ops/sec): 130.1 MB/s
Did 18000 AES-128-GCM (8192 bytes) seal operations in 1028943us (17493.7 ops/sec): 143.3 MB/s
Did 977000 AES-256-GCM (16 bytes) seal operations in 1000197us (976807.6 ops/sec): 15.6 MB/s
Did 82000 AES-256-GCM (1350 bytes) seal operations in 1012434us (80992.9 ops/sec): 109.3 MB/s
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1006528us (14902.7 ops/sec): 122.1 MB/s
After:
Did 1306000 AES-128-GCM (16 bytes) seal operations in 1000153us (1305800.2 ops/sec): 20.9 MB/s [+10.8%]
Did 94000 AES-128-GCM (1350 bytes) seal operations in 1009852us (93082.9 ops/sec): 125.7 MB/s [-3.4%]
Did 17000 AES-128-GCM (8192 bytes) seal operations in 1012096us (16796.8 ops/sec): 137.6 MB/s [-4.0%]
Did 1070000 AES-256-GCM (16 bytes) seal operations in 1000929us (1069006.9 ops/sec): 17.1 MB/s [+9.4%]
Did 79000 AES-256-GCM (1350 bytes) seal operations in 1002209us (78825.9 ops/sec): 106.4 MB/s [-2.7%]
Did 15000 AES-256-GCM (8192 bytes) seal operations in 1061489us (14131.1 ops/sec): 115.8 MB/s [-5.2%]
Change-Id: I1c3760a77af7bee4aee3745d1c648d9e34594afb
Reviewed-on: https://boringssl-review.googlesource.com/c/34267
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
These tests failed when CECPQ2 was enabled by default. Even if we're
not going to make CECPQ2 the default, it's worth fixing them to be more
robust.
Change-Id: Idef508bca9e17a4ef0e0a8a396755abd975f9908
Reviewed-on: https://boringssl-review.googlesource.com/c/34524
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
If 256-bit ciphers are a requirement for CECPQ2, then that introduces a
link between supported ciphers and supported groups: offering CECPQ2
without a 256-bit cipher is invalid. But that's a little weird since
these things were otherwise independent.
So, rather than require a 256-bit cipher for CECPQ2, just prefer them.
Change-Id: I491749e41708cd9c5eeed5b4ae23c11e5c0b9725
Reviewed-on: https://boringssl-review.googlesource.com/c/34504
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
11.0.2 has since been released, but we are now aware of several more
bugs, so the workaround is unlikely to be removable for the foreseeable
future.
Change-Id: I8e7edcba2f002d0558a21e607306ddf9a205bfb3
Reviewed-on: https://boringssl-review.googlesource.com/c/34484
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
On our bots, debug unit tests take around 2.5x as long to complete as
release tests on Linux, 3x as long on macOS, and 6x as long on Windows.
Our tests are fast, so this does not particularly matter, but SDE
inflates a 13-second test run to 8 minutes. On Windows (MSVC), where we
don't yet test with SDE but would like to, the difference between optimized
and unoptimized is even larger, and test runs are slower in general.
This suggests running SDE tests in release mode. Release mode tests,
however, are less effective because they do not include asserts. Thus,
add a RelWithAsserts option.
(Chromium does something similar. I believe most of the test-running
configurations on the critical path run is_debug = false and
dcheck_always_on = true.)
Change-Id: I273dd86ab8ea039f34eca431483827c87dc5c461
Reviewed-on: https://boringssl-review.googlesource.com/c/34464
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
With 2fe0360a4e, we no longer use the
other member of this union, so it can be removed.
Change-Id: Ideb7c47a72df0b420eb1e7d8c718e1cacb2129f5
Reviewed-on: https://boringssl-review.googlesource.com/c/34449
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
When built under UBSan, it gets confused inside a PLT stub.
Change-Id: Ib082ecc076ba2111337ff5921e465e4beb99aab5
Reviewed-on: https://boringssl-review.googlesource.com/c/34448
Reviewed-by: Adam Langley <agl@google.com>
qsort shares the same C language bug as mem*. Two of our calls may see
zero-length lists. This trips UBSan.
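The guard, as a sketch:

  #include <stdlib.h>

  // qsort's pointer arguments must be valid even when the count is
  // zero, so skip the call for empty lists.
  static void qsort_if_nonempty(void *base, size_t num, size_t size,
                                int (*cmp)(const void *, const void *)) {
    if (num > 0) {
      qsort(base, num, size, cmp);
    }
  }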
Change-Id: Id292dd277129881001eb57b1b2db78438cf4642e
Reviewed-on: https://boringssl-review.googlesource.com/c/34447
Reviewed-by: Adam Langley <agl@google.com>
Due to a language flaw in C, left-shifts on signed integers are
undefined for negative numbers. This makes them all but useless. Cast to
the unsigned type, left-shift, and cast back (casts are defined to wrap)
to silence UBSan.
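The pattern, as a sketch:

  #include <stdint.h>

  // Shifting a negative value left is undefined in C, so shift in the
  // unsigned domain and convert back (which wraps on supported targets).
  static int32_t shift_left_one(int32_t v) {
    return (int32_t)((uint32_t)v << 1);
  }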
Change-Id: I8fbe739aee1c99cf553462b675863e6d68c2b302
Reviewed-on: https://boringssl-review.googlesource.com/c/34446
Reviewed-by: Adam Langley <agl@google.com>
Casting an unaligned pointer to uint64_t* is undefined, even on
platforms that support unaligned access. Additionally, dereferencing as
uint64_t violates strict aliasing rules. Instead, use memcpys, which we
assume any sensible compiler can optimize. Also simplify the PULL64
business with the existing CRYPTO_bswap8.
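For example, the big-endian load that PULL64 implemented reduces to
something like the following (a sketch; CRYPTO_bswap8 is the 64-bit byte
swap in BoringSSL's internal headers, and a little-endian host is
assumed):

  #include <stdint.h>
  #include <string.h>

  // Load a 64-bit big-endian value from a possibly unaligned pointer:
  // memcpy for the unaligned access, then byte-swap to host order.
  static uint64_t load_u64_be(const void *ptr) {
    uint64_t v;
    memcpy(&v, ptr, sizeof(v));
    return CRYPTO_bswap8(v);
  }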
This also removes the need for the
SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA logic. The generic C code now
handles unaligned data and the assembly already can as well. (The only
problematic platform with assembly is old ARM, but sha512-armv4.pl
already handles this via an __ARM_ARCH__ check. See also OpenSSL's
version of this file which always defines
SHA512_BLOCK_CAN_MANAGE_UNALIGNED_DATA if SHA512_ASM is defined.)
Add unaligned tests to digest_test.cc, so we retain coverage of
unaligned EVP_MD inputs.
Change-Id: Idfd8586c64bab2a77292af2fa8eebbd193e57c7d
Reviewed-on: https://boringssl-review.googlesource.com/c/34444
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
With HRSS-SXY, the sampling algorithm no longer has to be the same
between the two parties. Therefore we can change it at will (as long as
it remains reasonably uniform) and thus take the opportunity to make the
output distribution flatter.
Change-Id: I74c667fcf919fe11ddcf2f4fb8a540b5112268bf
Reviewed-on: https://boringssl-review.googlesource.com/c/34404
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
The first attempt involved using Linux's support for hardware
breakpoints to detect when assembly code was run. However, this doesn't
work with SDE, which is a problem.
This version has the assembly code update a global flags variable when
it's run, but only in non-FIPS and non-debug builds.
Update-Note: Assembly files now pay attention to the NDEBUG preprocessor
symbol. Ensure the build passes the symbol in. (If release builds fail
to link due to missing BORINGSSL_function_hit, this is the cause.)
Change-Id: I6b7ced442b7a77d0b4ae148b00c351f68af89a6e
Reviewed-on: https://boringssl-review.googlesource.com/c/33384
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
The multiplication and subtraction circuits were found by djb using GNU
Superoptimizer, and the addition circuit is derived from the subtraction
one by hand. They depend on a different representation: -1 is now (1, 1)
rather than (1, 0), and the latter becomes undefined.
The following Python program checks that the circuits work:
values = [0, 1, -1]

def toBits(v):
  if v == 0:
    return 0, 0
  elif v == 1:
    return 0, 1
  elif v == -1:
    return 1, 1
  else:
    raise ValueError(v)

def mul(x, y):
  (s1, a1), (s2, a2) = x, y
  return ((s1 ^ s2) & a1 & a2, a1 & a2)

def add(x, y):
  (s1, a1), (s2, a2) = x, y
  t = s1 ^ a2
  return (t & (s2 ^ a1), (a1 ^ a2) | (t ^ s2))

def sub(x, y):
  (s1, a1), (s2, a2) = x, y
  t = a1 ^ a2
  return ((s1 ^ a2) & (t ^ s2), t | (s1 ^ s2))

def fromBits(b):
  s, a = b
  if s == 0 and a == 0:
    return 0
  if s == 0 and a == 1:
    return 1
  if s == 1 and a == 1:
    return -1
  raise ValueError((s, a))

def wrap(v):
  if v == 2:
    return -1
  elif v == -2:
    return 1
  else:
    return v

for v1 in values:
  for v2 in values:
    print(v1, v2)
    result = fromBits(mul(toBits(v1), toBits(v2)))
    if result != v1 * v2:
      raise ValueError((v1, v2, result))
    result = fromBits(add(toBits(v1), toBits(v2)))
    if result != wrap(v1 + v2):
      raise ValueError((v1, v2, result))
    result = fromBits(sub(toBits(v1), toBits(v2)))
    if result != wrap(v1 - v2):
      raise ValueError((v1, v2, result))
Change-Id: Ie1a4ca5a82c2651057efc62330eca6fdd9878122
Reviewed-on: https://boringssl-review.googlesource.com/c/34344
Reviewed-by: David Benjamin <davidben@google.com>
Since |ssl_renegotiate_never| is the default, this option is moot.
However, OpenSSL defines and supports it, so this helps code that wishes
to support both.
Change-Id: I3a2f6e93a078d39526d10f9cd0a990953bd45825
Reviewed-on: https://boringssl-review.googlesource.com/c/34384
Reviewed-by: Adam Langley <alangley@gmail.com>
Commit-Queue: Adam Langley <alangley@gmail.com>