The first attempt involved using Linux's support for hardware
breakpoints to detect when assembly code was run. However, this doesn't
work with SDE, which is a problem.
This version has the assembly code update a global flags variable when
it's run, but only in non-FIPS and non-debug builds.
Update-Note: Assembly files now pay attention to the NDEBUG preprocessor
symbol. Ensure the build passes the symbol in. (If release builds fail
to link due to missing BORINGSSL_function_hit, this is the cause.)
Change-Id: I6b7ced442b7a77d0b4ae148b00c351f68af89a6e
Reviewed-on: https://boringssl-review.googlesource.com/c/33384
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
The Clang used in the Android SDK, at least, defines both __ARM_NEON__
and __ARM_NEON for ARMv7, but only the latter for AArch64.
This change switches each use of __ARM_NEON__ to accept either.
Change-Id: I3b5d5badc9ff0210888fd456e9329dc53a2b9b09
Reviewed-on: https://boringssl-review.googlesource.com/c/33104
Commit-Queue: Adam Langley <alangley@gmail.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
C and C++ handle inline functions differently. In C++, an inline function is
defined in just the header file, potentially emitted in multiple compilation
units (in cases the compiler did not inline), but each copy must be identical
to satsify ODR. In C, a non-static inline must be manually emitted in exactly
one compilation unit with a separate extern inline declaration.
In both languages, exported inline functions referencing file-local symbols are
problematic. C forbids this altogether (though GCC and Clang seem not to
enforce it). It works in C++, but ODR requires the definitions be identical,
including all names in the definitions resolving to the "same entity". In
practice, this is unlikely to be a problem, but an inline function that returns
a pointer to a file-local symbol could compile oddly.
Historically, we used static inline in headers. However, to satisfy ODR, use
plain inline in C++, to allow inline consumer functions to call our header
functions. Plain inline would also work better with C99 inline, but that is not
used much in practice, extern inline is tedious, and there are conflicts with
the old gnu89 model: https://stackoverflow.com/questions/216510/extern-inline
For dual C/C++ code, use a macro to dispatch between these. For C++-only
code, stop using static inline and just use plain inline.
Update-Note: If you see weird C++ compile or link failures in header
functions, this change is probably to blame. Though this change
doesn't affect C and non-static inline is extremely common in C++,
so I would expect this to be fine.
Change-Id: Ibb0bf8ff57143fc14e10342854e467f85a5e4a82
Reviewed-on: https://boringssl-review.googlesource.com/32116
Commit-Queue: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
ssl is all that's left. Will do that once that's at a quiet point.
Change-Id: Ia183aed5671e3b2de333def138d7f2c9296fb517
Reviewed-on: https://boringssl-review.googlesource.com/19564
Commit-Queue: David Benjamin <davidben@google.com>
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
Originally we had some confusion around whether the features could be
toggled individually or not. Per the ARM C Language Extensions doc[1],
__ARM_FEATURE_CRYPTO implies the "crypto extension" which encompasses
all of them. The runtime CPUID equivalent can report the features
individually, but it seems no one separates them in practice, for now.
(If they ever do, probably there'll be a new set of #defines.)
[1] http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053c/IHI0053C_acle_2_0.pdf
Change-Id: I12915dfc308f58fb005286db75e50d8328eeb3ea
Reviewed-on: https://boringssl-review.googlesource.com/16991
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
OPENSSL_ia32cap_addr avoids any relocations within the module, at the
cost of a runtime TEXTREL, which causes problems in some cases.
(Notably, if someone links us into a binary which uses the GCC "ifunc"
attribute, the loader crashes.)
Fix C references of OPENSSL_ia32cap_addr with a function. This is
analogous to the BSS getters. A follow-up commit will fix perlasm with a
different scheme which avoids calling into a function (clobbering
registers and complicating unwind directives.)
Change-Id: I09d6cda4cec35b693e16b5387611167da8c7a6de
Reviewed-on: https://boringssl-review.googlesource.com/15525
Reviewed-by: Adam Langley <agl@google.com>
(Thanks to Sam Panzer for the patch.)
At least some linkers will drop constructor functions if no symbols from
that translation unit are used elsewhere in the program. On POWER, since
the cached capability value isn't a global in crypto.o (like other
platforms), the constructor function is getting discarded.
The C++11 spec says (3.6.2, paragraph 4):
It is implementation-defined whether the dynamic initialization of a
non-local variable with static storage duration is done before the
first statement of main. If the initialization is deferred to some
point in time after the first statement of main, it shall occur
before the first odr-use (3.2) of any function or variable defined
in the same translation unit as the variable to be initialized.
Compilers appear to interpret that to mean they are allowed to drop
(i.e. indefinitely defer) constructors that occur in translation units
that are never used, so they can avoid initializing some part of a
library if it's dropped on the floor.
This change makes the hardware capability value for POWER a global in
crypto.c, which should prevent the constructor function from being
ignored.
Change-Id: I43ebe492d0ac1491f6f6c2097971a277f923dd3e
Reviewed-on: https://boringssl-review.googlesource.com/14664
Commit-Queue: Adam Langley <agl@google.com>
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: David Benjamin <davidben@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
This change adds AES and GHASH assembly from upstream, with the aim of
speeding up AES-GCM.
The PPC64LE assembly matches the interface of the ARMv8 assembly so I've
changed the prefix of both sets of asm functions to be the same
("aes_hw_").
Otherwise, the new assmebly files and Perlasm match exactly those from
upstream's c536b6be1a (from their master branch).
Before:
Did 1879000 AES-128-GCM (16 bytes) seal operations in 1000428us (1878196.1 ops/sec): 30.1 MB/s
Did 61000 AES-128-GCM (1350 bytes) seal operations in 1006660us (60596.4 ops/sec): 81.8 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1072649us (10255.0 ops/sec): 84.0 MB/s
Did 1665000 AES-256-GCM (16 bytes) seal operations in 1000591us (1664016.6 ops/sec): 26.6 MB/s
Did 52000 AES-256-GCM (1350 bytes) seal operations in 1006971us (51640.0 ops/sec): 69.7 MB/s
Did 8840 AES-256-GCM (8192 bytes) seal operations in 1013294us (8724.0 ops/sec): 71.5 MB/s
After:
Did 4994000 AES-128-GCM (16 bytes) seal operations in 1000017us (4993915.1 ops/sec): 79.9 MB/s
Did 1389000 AES-128-GCM (1350 bytes) seal operations in 1000073us (1388898.6 ops/sec): 1875.0 MB/s
Did 319000 AES-128-GCM (8192 bytes) seal operations in 1000101us (318967.8 ops/sec): 2613.0 MB/s
Did 4668000 AES-256-GCM (16 bytes) seal operations in 1000149us (4667304.6 ops/sec): 74.7 MB/s
Did 1202000 AES-256-GCM (1350 bytes) seal operations in 1000646us (1201224.0 ops/sec): 1621.7 MB/s
Did 269000 AES-256-GCM (8192 bytes) seal operations in 1002804us (268247.8 ops/sec): 2197.5 MB/s
Change-Id: Id848562bd4e1aa79a4683012501dfa5e6c08cfcc
Reviewed-on: https://boringssl-review.googlesource.com/11262
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
If we're to allow the buggy CPU workaround to fire when __ARM_NEON__ is set,
CRYPTO_is_NEON_capable also needs to be aware of it. Also add an API to export
this value out of BoringSSL, so we can get some metrics on how prevalent this
chip is.
BUG=chromium:606629
Change-Id: I97d65a47a6130689098b32ce45a8c57c468aa405
Reviewed-on: https://boringssl-review.googlesource.com/7796
Reviewed-by: Adam Langley <agl@google.com>
This removes the thread-unsafe SIGILL-based detection and the
multi-consumer-hostile CRYPTO_set_NEON_capable API. (Changing
OPENSSL_armcap_P after initialization is likely to cause problems.)
The right way to detect ARM features on Linux is getauxval. On aarch64,
we should be able to rely on this, so use it straight. Split this out
into its own file. The #ifdefs in the old cpu-arm.c meant it shared all
but no code with its arm counterpart anyway.
Unfortunately, various versions of Android have different missing APIs, so, on
arm, we need a series of workarounds. Previously, we used a SIGILL fallback
based on OpenSSL's logic, but this is inherently not thread-safe. (SIGILL also
does not tell us if the OS knows how to save and restore NEON state.) Instead,
base the behavior on Android NDK's cpu-features library, what Chromium
currently uses with CRYPTO_set_NEON_capable:
- Android before API level 20 does not provide getauxval. Where missing,
we can read from /proc/self/auxv.
- On some versions of Android, /proc/self/auxv is also not readable, so
use /proc/cpuinfo's Features line.
- Linux only advertises optional features in /proc/cpuinfo. ARMv8 makes NEON
mandatory, so /proc/cpuinfo can't be used without additional effort.
Finally, we must blacklist a particular chip because the NEON unit is broken
(https://crbug.com/341598).
Unfortunately, this means CRYPTO_library_init now depends on /proc being
available, which will require some care with Chromium's sandbox. The
simplest solution is to just call CRYPTO_library_init before entering
the sandbox.
It's worth noting that Chromium's current EnsureOpenSSLInit function already
depends on /proc/cpuinfo to detect the broken CPU, by way of base::CPU.
android_getCpuFeatures also interally depends on it. We were already relying on
both of those being stateful and primed prior to entering the sandbox.
BUG=chromium:589200
Change-Id: Ic5d1c341aab5a614eb129d8aa5ada2809edd6af8
Reviewed-on: https://boringssl-review.googlesource.com/7506
Reviewed-by: David Benjamin <davidben@google.com>
This depends on https://codereview.chromium.org/1730823002/. The bit was only
ever targetted to one (rather old) CPU. Disable NEON on it uniformly, so we
don't have to worry about whether any new NEON code breaks it.
BUG=589200
Change-Id: Icc7d17d634735aca5425fe0a765ec2fba3329326
Reviewed-on: https://boringssl-review.googlesource.com/7211
Reviewed-by: Adam Langley <agl@google.com>
If -mfpu=neon is passed then we don't need to worry about checking for
NEON support at run time. This change allows |CRYPTO_is_NEON_capable| to
statically return 1 in this case. This then allows the compiler to
discard generic code in several cases.
Change-Id: I3b229740ea3d5cb0a304f365c400a0996d0c66ef
Reviewed-on: https://boringssl-review.googlesource.com/6523
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
The other codepath is Linux-specific. This should get tidied up a bit but, in
the meantime, fix the Chromium iOS (NO_ASM) build. Even when the assembly gets
working, it seems iOS prefers you make fat binaries rather than detect features
at runtime, so this is what we want anyway.
BUG=548539
Change-Id: If19b2e380a96918b07bacc300a3a27b885697b99
Reviewed-on: https://boringssl-review.googlesource.com/6380
Reviewed-by: Adam Langley <agl@google.com>
Some ARM environments don't support |getauxval| or signals and need to
configure the capabilities of the chip at compile time. This change adds
defines that allow them to do so.
Change-Id: I4e6987f69dd13444029bc7ac7ed4dbf8fb1faa76
Reviewed-on: https://boringssl-review.googlesource.com/6280
Reviewed-by: Adam Langley <agl@google.com>
Rather, take a leaf out of Chromium's book and use MSVC's __cpuid and
_xgetbv built-in, with an inline assembly emulated version for other
compilers.
This preserves the behavior of the original assembly with the following
differences:
- CPUs without cpuid aren't support. Chromium's base/cpu.cc doesn't
check, and SSE2 support is part of our baseline; the perlasm code
is always built with OPENSSL_IA32_SSE2.
- The clear_xmm block in cpu-x86-asm.pl is removed. This was used to
clear some XMM-using features if OSXSAVE was set but XCR0 reports the
OS doesn't use XSAVE to store SSE state. This wasn't present in the
x86_64 and seems wrong. Section 13.5.2 of the Intel manual, volume 1,
explicitly says SSE may still be used in this case; the OS may save
that state in FXSAVE instead. A side discussion on upstream's RT#2633
agrees.
- The old code ran some AMD CPUs through the "intel" codepath and some
went straight to "generic" after duplicating some, but not all, logic.
The AMD copy didn't clear some reserved bits and didn't query CPUID 7
for AVX2 support. This is moot since AMD CPUs today don't support
AVX2, but it seems they're expected to in the future?
- Setting bit 10 is dropped. This doesn't appear to be queried anywhere,
was 32-bit only, and seems a remnant of upstream's
14e21f863a3e3278bb8660ea9844e92e52e1f2f7.
Change-Id: I0548877c97e997f7beb25e15f3fea71c68a951d2
Reviewed-on: https://boringssl-review.googlesource.com/5434
Reviewed-by: Adam Langley <agl@google.com>
Some other reserved bits are repurposed. Also explicitly mention that
bit 20 is zero (formerly RC4_CHAR), so it's not accidentally repurposed
later.
Change-Id: Idc4b32efe089ae7b7295472c4488f75258b7f962
Reviewed-on: https://boringssl-review.googlesource.com/5432
Reviewed-by: Adam Langley <agl@google.com>
RC4_CHAR is a bit in the x86(-64) CPUID information that switches the
RC4 asm code from using an array of 256 uint32_t's to 256 uint8_t's. It
was originally written for the P4, where the uint8_t style was faster.
(On modern chips, setting RC4_CHAR took RC4-MD5 from 458 to 304 MB/s.
Although I wonder whether, on a server with many connections, using less
cache wouldn't be better.)
However, I'm not too worried about a slowdown of RC4 on P4 systems these
days (the last new P4 chip was released nine years ago) and I want the
code to be simplier.
Also, RC4_CHAR was set when the CPUID family was 15, but Intel actually
lists 15 as a special code meaning "also check the extended family
bits", which the asm didn't do.
The RC4_CHAR support remains in the RC4 asm code to avoid drift with
upstream.
Change-Id: If3febc925a83a76f453b9e9f8de5ee43759927c6
Reviewed-on: https://boringssl-review.googlesource.com/3550
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
This is an initial cut at aarch64 support. I have only qemu to test it
however—hopefully hardware will be coming soon.
This also affects 32-bit ARM in that aarch64 chips can run 32-bit code
and we would like to be able to take advantage of the crypto operations
even in 32-bit mode. AES and GHASH should Just Work in this case: the
-armx.pl files can be built for either 32- or 64-bit mode based on the
flavour argument given to the Perl script.
SHA-1 and SHA-256 don't work like this however because they've never
support for multiple implementations, thus BoringSSL built for 32-bit
won't use the SHA instructions on an aarch64 chip.
No dedicated ChaCha20 or Poly1305 support yet.
Change-Id: Ib275bc4894a365c8ec7c42f4e91af6dba3bd686c
Reviewed-on: https://boringssl-review.googlesource.com/2801
Reviewed-by: Adam Langley <agl@google.com>
Add some missing headers and ensure each header has a short description. doc.go
gets confused at declarations that break before the first (, so avoid doing
that. Also skip a/an/deprecated: in markupFirstWord and process pipe words in
the table of contents.
Change-Id: Ia08ec5ae8e496dd617e377e154eeea74f4abf435
Reviewed-on: https://boringssl-review.googlesource.com/2839
Reviewed-by: Adam Langley <agl@google.com>
Otherwise, in C, it becomes a K&R function declaration which doesn't actually
type-check the number of arguments.
Change-Id: I0731a9fefca46fb1c266bfb1c33d464cf451a22e
Reviewed-on: https://boringssl-review.googlesource.com/1582
Reviewed-by: Adam Langley <agl@google.com>
Some phones have a buggy NEON unit and the Poly1305 NEON code fails on
them, even though other NEON code appears to work fine.
This change:
1) Fixes a bug where NEON was assumed even when the code wasn't compiled
in NEON mode.
2) Adds a second NEON control bit that can be disabled in order to run
NEON code, but not the Poly1305 NEON code.
https://code.google.com/p/chromium/issues/detail?id=341598
Change-Id: Icb121bf8dba47c7a46c7667f676ff7a4bc973625
Reviewed-on: https://boringssl-review.googlesource.com/1351
Reviewed-by: Adam Langley <agl@google.com>
This change marks public symbols as dynamically exported. This means
that it becomes viable to build a shared library of libcrypto and libssl
with -fvisibility=hidden.
On Windows, one not only needs to mark functions for export in a
component, but also for import when using them from a different
component. Because of this we have to build with
|BORINGSSL_IMPLEMENTATION| defined when building the code. Other
components, when including our headers, won't have that defined and then
the |OPENSSL_EXPORT| tag becomes an import tag instead. See the #defines
in base.h
In the asm code, symbols are now hidden by default and those that need
to be exported are wrapped by a C function.
In order to support Chromium, a couple of libssl functions were moved to
ssl.h from ssl_locl.h: ssl_get_new_session and ssl_update_cache.
Change-Id: Ib4b76e2f1983ee066e7806c24721e8626d08a261
Reviewed-on: https://boringssl-review.googlesource.com/1350
Reviewed-by: Adam Langley <agl@google.com>
Previously, public headers lived next to the respective code and there
were symlinks from include/openssl to them.
This doesn't work on Windows.
This change moves the headers to live in include/openssl. In cases where
some symlinks pointed to the same header, I've added a file that just
includes the intended target. These cases are all for backwards-compat.
Change-Id: I6e285b74caf621c644b5168a4877db226b07fd92
Reviewed-on: https://boringssl-review.googlesource.com/1180
Reviewed-by: David Benjamin <davidben@chromium.org>
Reviewed-by: Adam Langley <agl@google.com>
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).
(This change contains substantial changes from the original and
effectively starts a new history.)