boringssl/BUILDING.md
David Benjamin 35be688078 Enable upstream's ChaCha20 assembly for x86 and ARM (32- and 64-bit).
This removes chacha_vec_arm.S and chacha_vec.c in favor of unifying on
upstream's code. Upstream's is faster and this cuts down on the number of
distinct codepaths. Our old scheme also didn't give vectorized code on
Windows or aarch64.

BoringSSL-specific modifications made to the assembly:

- As usual, the shelling out to $CC is replaced with hardcoding $avx. I've
  tested up to the AVX2 codepath, so enable it all.

- I've removed the AMD XOP code as I have not tested it.

- As usual, the ARM file needs the arm_arch.h include tweaked.

Speed numbers follow. We can hope for further wins on these benchmarks after
importing the Poly1305 assembly.

x86
---
Old:
Did 1422000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000433us (1421384.5 ops/sec): 22.7 MB/s
Did 123000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1003803us (122534.0 ops/sec): 165.4 MB/s
Did 22000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000282us (21993.8 ops/sec): 180.2 MB/s
Did 1428000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000214us (1427694.5 ops/sec): 22.8 MB/s
Did 124000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1006332us (123219.8 ops/sec): 166.3 MB/s
Did 22000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1020771us (21552.3 ops/sec): 176.6 MB/s
New:
Did 1520000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000567us (1519138.6 ops/sec): 24.3 MB/s
Did 152000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1004216us (151361.9 ops/sec): 204.3 MB/s
Did 31000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1009085us (30720.9 ops/sec): 251.7 MB/s
Did 1797000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000141us (1796746.7 ops/sec): 28.7 MB/s
Did 171000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1003204us (170453.9 ops/sec): 230.1 MB/s
Did 31000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1005349us (30835.1 ops/sec): 252.6 MB/s

x86_64, no AVX2
---
Old:
Did 1782000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000204us (1781636.5 ops/sec): 28.5 MB/s
Did 317000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001579us (316500.2 ops/sec): 427.3 MB/s
Did 62000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1012146us (61256.0 ops/sec): 501.8 MB/s
Did 1778000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000220us (1777608.9 ops/sec): 28.4 MB/s
Did 315000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1002886us (314093.5 ops/sec): 424.0 MB/s
Did 71000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1014606us (69977.9 ops/sec): 573.3 MB/s
New:
Did 1866000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000019us (1865964.5 ops/sec): 29.9 MB/s
Did 399000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001017us (398594.6 ops/sec): 538.1 MB/s
Did 84000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005645us (83528.5 ops/sec): 684.3 MB/s
Did 1881000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000325us (1880388.9 ops/sec): 30.1 MB/s
Did 404000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000004us (403998.4 ops/sec): 545.4 MB/s
Did 85000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1010048us (84154.4 ops/sec): 689.4 MB/s

x86_64, AVX2
---
Old:
Did 2375000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000282us (2374330.4 ops/sec): 38.0 MB/s
Did 448000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001865us (447166.0 ops/sec): 603.7 MB/s
Did 88000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1005217us (87543.3 ops/sec): 717.2 MB/s
Did 2409000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000188us (2408547.2 ops/sec): 38.5 MB/s
Did 446000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001003us (445553.1 ops/sec): 601.5 MB/s
Did 90000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1006722us (89399.1 ops/sec): 732.4 MB/s
New:
Did 2622000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000266us (2621302.7 ops/sec): 41.9 MB/s
Did 794000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1000783us (793378.8 ops/sec): 1071.1 MB/s
Did 173000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1000176us (172969.6 ops/sec): 1417.0 MB/s
Did 2623000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000330us (2622134.7 ops/sec): 42.0 MB/s
Did 783000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000531us (782584.4 ops/sec): 1056.5 MB/s
Did 174000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1000840us (173854.0 ops/sec): 1424.2 MB/s

arm, Nexus 4
---
Old:
Did 388550 ChaCha20-Poly1305 (16 bytes) seal operations in 1000580us (388324.8 ops/sec): 6.2 MB/s
Did 90000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1003816us (89657.9 ops/sec): 121.0 MB/s
Did 19000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1045750us (18168.8 ops/sec): 148.8 MB/s
Did 398500 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000305us (398378.5 ops/sec): 6.4 MB/s
Did 90500 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1000305us (90472.4 ops/sec): 122.1 MB/s
Did 19000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1043278us (18211.8 ops/sec): 149.2 MB/s
New:
Did 424788 ChaCha20-Poly1305 (16 bytes) seal operations in 1000641us (424515.9 ops/sec): 6.8 MB/s
Did 115000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1001526us (114824.8 ops/sec): 155.0 MB/s
Did 27000 ChaCha20-Poly1305 (8192 bytes) seal operations in 1033023us (26136.9 ops/sec): 214.1 MB/s
Did 447750 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000549us (447504.3 ops/sec): 7.2 MB/s
Did 117500 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1001923us (117274.5 ops/sec): 158.3 MB/s
Did 27000 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1025118us (26338.4 ops/sec): 215.8 MB/s

aarch64, Nexus 6p
(Note we didn't have aarch64 assembly before at all, and still don't have it
for Poly1305. Hopefully once that's added this will be faster than the arm
numbers...)
---
Old:
Did 145040 ChaCha20-Poly1305 (16 bytes) seal operations in 1003065us (144596.8 ops/sec): 2.3 MB/s
Did 14000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1042605us (13427.9 ops/sec): 18.1 MB/s
Did 2618 ChaCha20-Poly1305 (8192 bytes) seal operations in 1093241us (2394.7 ops/sec): 19.6 MB/s
Did 148000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000709us (147895.1 ops/sec): 2.4 MB/s
Did 14000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1047294us (13367.8 ops/sec): 18.0 MB/s
Did 2607 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1090745us (2390.1 ops/sec): 19.6 MB/s
New:
Did 358000 ChaCha20-Poly1305 (16 bytes) seal operations in 1000769us (357724.9 ops/sec): 5.7 MB/s
Did 45000 ChaCha20-Poly1305 (1350 bytes) seal operations in 1021267us (44062.9 ops/sec): 59.5 MB/s
Did 8591 ChaCha20-Poly1305 (8192 bytes) seal operations in 1047136us (8204.3 ops/sec): 67.2 MB/s
Did 343000 ChaCha20-Poly1305-Old (16 bytes) seal operations in 1000489us (342832.4 ops/sec): 5.5 MB/s
Did 44000 ChaCha20-Poly1305-Old (1350 bytes) seal operations in 1008326us (43636.7 ops/sec): 58.9 MB/s
Did 8866 ChaCha20-Poly1305-Old (8192 bytes) seal operations in 1083341us (8183.9 ops/sec): 67.0 MB/s

Change-Id: I629fe195d072f2c99e8f947578fad6d70823c4c8
Reviewed-on: https://boringssl-review.googlesource.com/7202
Reviewed-by: Adam Langley <agl@google.com>
2016-02-23 17:19:45 +00:00


Building BoringSSL

Build Prerequisites

  • CMake 2.8.8 or later is required.

  • Perl 5.6.1 or later is required. On Windows, ActiveState Perl has been reported to work, as has MSYS Perl. Strawberry Perl also works, but it adds GCC to PATH, which can confuse some build tools when they identify the compiler (removing C:\Strawberry\c\bin from PATH should resolve any problems). If Perl is not found by CMake, it may be configured explicitly by setting PERL_EXECUTABLE.

  • On Windows you currently must use Ninja to build; on other platforms, it is not required, but recommended, because it makes builds faster.

  • If you need to build Ninja from source, then a recent version of Python is required (Python 2.7.5 works).

  • On Windows only, Yasm is required. If not found by CMake, it may be configured explicitly by setting CMAKE_ASM_NASM_COMPILER.

  • A C compiler is required. On Windows, MSVC 12 (Visual Studio 2013) or later with Platform SDK 8.1 or later are supported. Recent versions of GCC (4.8+) and Clang should work on non-Windows platforms, and maybe on Windows too.

  • Go is required. If not found by CMake, the go executable may be configured explicitly by setting GO_EXECUTABLE.
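
If CMake cannot locate these tools, the variables named in the list above can be passed together at configure time. The following is a minimal sketch, assuming the tools live at the hypothetical paths shown; the function name is illustrative only. On Windows, CMAKE_ASM_NASM_COMPILER could be added in the same way.

```shell
# Hypothetical tool locations; substitute the real paths on your machine.
PERL_PATH=/usr/bin/perl
GO_PATH=/usr/local/go/bin/go

# Configure from inside the build directory, pointing CMake at each tool.
configure_with_tools() {
  cmake -GNinja \
        -DPERL_EXECUTABLE="$PERL_PATH" \
        -DGO_EXECUTABLE="$GO_PATH" \
        ..
}
```

Call configure_with_tools from within the build directory in place of the plain cmake invocation.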

Building

Using Ninja (note the 'N' is capitalized in the cmake invocation):

mkdir build
cd build
cmake -GNinja ..
ninja

Using Make (does not work on Windows):

mkdir build
cd build
cmake ..
make

You usually don't need to run cmake again after changing CMakeLists.txt files because the build scripts will detect changes to them and rebuild themselves automatically.

Note that the default build flags in the top-level CMakeLists.txt are for debugging—optimisation isn't enabled. Pass -DCMAKE_BUILD_TYPE=Release to cmake to configure a release build.
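
As a sketch of the release configuration described above, the function below sets up a separate, optimised build. The directory name build-release and the function name are arbitrary choices for this example, not something the build requires.

```shell
# Configure and build an optimised tree in its own directory.
# Run from the top of the source checkout.
build_release() {
  mkdir -p build-release || return
  (
    cd build-release &&
    cmake -GNinja -DCMAKE_BUILD_TYPE=Release .. &&
    ninja
  )
}
```

Keeping the release tree apart from the default debug tree means neither has to be wiped when switching between them.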

If you want to cross-compile then there is an example toolchain file for 32-bit Intel in util/. Wipe out the build directory, recreate it and run cmake like this:

cmake -DCMAKE_TOOLCHAIN_FILE=../util/32-bit-toolchain.cmake -GNinja ..

If you want to build as a shared library, pass -DBUILD_SHARED_LIBS=1. On Windows, where functions need to be tagged with dllimport when coming from a shared library, define BORINGSSL_SHARED_LIBRARY in any code which #includes the BoringSSL headers.

In order to serve environments where code-size is important as well as those where performance is the overriding concern, OPENSSL_SMALL can be defined to remove some code that is especially large.
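
The text does not say how OPENSSL_SMALL should be defined. One plausible approach, shown here as an assumption rather than the project's prescribed method, is to pass it through CMake's standard CMAKE_C_FLAGS and CMAKE_CXX_FLAGS variables at configure time. The function name is illustrative.

```shell
# Configure a size-reduced build by defining OPENSSL_SMALL for every
# C and C++ translation unit. Run from inside the build directory.
configure_small() {
  cmake -GNinja \
        -DCMAKE_C_FLAGS="-DOPENSSL_SMALL" \
        -DCMAKE_CXX_FLAGS="-DOPENSSL_SMALL" \
        ..
}
```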

See CMake's documentation for other variables which may be used to configure the build.

Building for Android

It's possible to build BoringSSL with the Android NDK using CMake. This has been tested with version 10d of the NDK.

Unpack the Android NDK somewhere and export ANDROID_NDK to point to the directory. Clone https://github.com/taka-no-me/android-cmake into util/. Then make a build directory as above and run CMake twice like this:

cmake -DANDROID_ABI=armeabi-v7a \
      -DCMAKE_TOOLCHAIN_FILE=../util/android-cmake/android.toolchain.cmake \
      -DANDROID_NATIVE_API_LEVEL=16 \
      -GNinja ..

Once you've run that twice, Ninja should produce Android-compatible binaries. You can replace armeabi-v7a in the above with arm64-v8a to build aarch64 binaries.
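
The whole Android sequence can be sketched as a pair of shell functions. The NDK path and the function names are hypothetical. The sketch sets ANDROID_NATIVE_API_LEVEL once, to 16, on the assumption that only a single value is intended.

```shell
# Hypothetical NDK location; point this at your unpacked NDK.
export ANDROID_NDK="$HOME/android-ndk-r10d"

# One CMake pass. The ABI defaults to armeabi-v7a; pass arm64-v8a for aarch64.
android_configure() {
  cmake -DANDROID_ABI="${1:-armeabi-v7a}" \
        -DCMAKE_TOOLCHAIN_FILE=../util/android-cmake/android.toolchain.cmake \
        -DANDROID_NATIVE_API_LEVEL=16 \
        -GNinja ..
}

# Fetch the toolchain file, run CMake twice as instructed, then build.
# Run from the top of the source checkout.
build_android() {
  git clone https://github.com/taka-no-me/android-cmake util/android-cmake &&
  mkdir -p build &&
  (
    cd build &&
    android_configure "$1" &&
    android_configure "$1" &&
    ninja
  )
}
```

For example, build_android arm64-v8a would configure and build for aarch64, while build_android with no argument targets armeabi-v7a.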

Known Limitations on Windows

  • Versions of CMake since 3.0.2 have a bug in their Ninja generator that causes yasm to output warnings like this:

    yasm: warning: can open only one input file, only the last file will be processed

    These warnings can be safely ignored. The cmake bug is http://www.cmake.org/Bug/view.php?id=15253.

  • CMake can generate Visual Studio projects, but the generated project files don't have steps for assembling the assembly language source files, so they currently cannot be used to build BoringSSL.

Embedded ARM

ARM, unlike Intel, does not have an instruction that allows applications to discover the capabilities of the processor. Instead, the capability information has to be provided by the operating system somehow.

BoringSSL will try to use getauxval to discover the capabilities and, failing that, will probe for NEON support by executing a NEON instruction and handling any illegal-instruction signal. But some environments don't support that sort of thing and, for them, it's possible to configure the CPU capabilities at compile time.

If you define OPENSSL_STATIC_ARMCAP then you can define any of the following to enable the corresponding ARM feature.

  • OPENSSL_STATIC_ARMCAP_NEON or __ARM_NEON__ (note that the latter is set by compilers when NEON support is enabled).
  • OPENSSL_STATIC_ARMCAP_AES
  • OPENSSL_STATIC_ARMCAP_SHA1
  • OPENSSL_STATIC_ARMCAP_SHA256
  • OPENSSL_STATIC_ARMCAP_PMULL

Note that if a feature is enabled in this way, but not actually supported at run-time, BoringSSL will likely crash.
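
As an illustration of the compile-time configuration above, the sketch below turns a list of feature names into the matching defines and hands them to CMake through CMAKE_C_FLAGS and CMAKE_CXX_FLAGS. Passing the defines this way is an assumption, the function names are hypothetical, and a real embedded build would normally also need a cross-compilation toolchain file, which is omitted here.

```shell
# Print the -D flags for the given features, e.g. "NEON AES".
# OPENSSL_STATIC_ARMCAP itself is always included.
armcap_flags() {
  flags="-DOPENSSL_STATIC_ARMCAP"
  for feature in "$@"; do
    flags="$flags -DOPENSSL_STATIC_ARMCAP_$feature"
  done
  printf '%s\n' "$flags"
}

# Configure from inside the build directory with those features fixed
# at compile time.
configure_static_armcap() {
  flags=$(armcap_flags "$@")
  cmake -GNinja -DCMAKE_C_FLAGS="$flags" -DCMAKE_CXX_FLAGS="$flags" ..
}
```

For example, configure_static_armcap NEON AES would enable only those two features. Only list features that the target hardware really has.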

Running tests

There are two sets of tests: the C/C++ tests and the blackbox tests. The former are built by Ninja and can be run from the top-level directory with go run util/all_tests.go. The latter have to be run separately by running go test from within ssl/test/runner.

Both sets of tests may also be run with ninja -C build run_tests, but CMake 3.2 or later is required to avoid Ninja's output buffering.
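
The three ways of running the tests can be wrapped as small helpers. The function names are illustrative. Each helper assumes it is called from the top-level directory of a checkout that has already been built in build/.

```shell
# C/C++ tests, run from the top-level directory.
run_unit_tests() {
  go run util/all_tests.go
}

# Blackbox tests, which must run from within ssl/test/runner.
run_blackbox_tests() {
  ( cd ssl/test/runner && go test )
}

# Both sets through the build system (CMake 3.2 or later, as noted above).
run_all_via_ninja() {
  ninja -C build run_tests
}
```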