boringssl/crypto/fipsmodule/CMakeLists.txt
David Benjamin 55db667c62 Enable vpaes for aarch64, with CTR optimizations.
This patches vpaes-armv8.pl to add vpaes_ctr32_encrypt_blocks. CTR mode
is by far the most important mode these days. It should have access to
_vpaes_encrypt_2x, which gives a considerable speed boost. Also exclude
vpaes_ecb_* as they're not even used.

For iOS, this change is completely a no-op. iOS ARMv8 always has crypto
extensions, and we already statically drop all other AES
implementations.

Android ARMv8 is *not* required to have crypto extensions, but every
ARMv8 device I've seen has them. For those, it is a no-op
performance-wise and a win on size. vpaes appears to be about 5.6KiB
smaller than the tables. ARMv8 always makes SIMD (NEON) available, so we
can statically drop aes_nohw.

In theory, however, crypto-less Android ARMv8 is possible. Today such
chips get a variable-time AES. This CL fixes this, but the performance
story is complex.

The Raspberry Pi 3 is not Android but has a Cortex-A53 chip
without crypto extensions. (But the official images are 32-bit, so even
this is slightly artificial...) There, vpaes is a performance win.

Raspberry Pi 3, Model B+, Cortex-A53
Before:
Did 265000 AES-128-GCM (16 bytes) seal operations in 1003312us (264125.2 ops/sec): 4.2 MB/s
Did 44000 AES-128-GCM (256 bytes) seal operations in 1002141us (43906.0 ops/sec): 11.2 MB/s
Did 9394 AES-128-GCM (1350 bytes) seal operations in 1032104us (9101.8 ops/sec): 12.3 MB/s
Did 1562 AES-128-GCM (8192 bytes) seal operations in 1008982us (1548.1 ops/sec): 12.7 MB/s
After:
Did 277000 AES-128-GCM (16 bytes) seal operations in 1001884us (276479.1 ops/sec): 4.4 MB/s
Did 52000 AES-128-GCM (256 bytes) seal operations in 1001480us (51923.2 ops/sec): 13.3 MB/s
Did 11000 AES-128-GCM (1350 bytes) seal operations in 1007979us (10912.9 ops/sec): 14.7 MB/s
Did 2013 AES-128-GCM (8192 bytes) seal operations in 1085545us (1854.4 ops/sec): 15.2 MB/s

The Pixel 3 has a Cortex-A75 with crypto extensions, so it would never
run this code. However, artificially ignoring them gives another data
point (ARM documentation[*] suggests the extensions are still optional
on a Cortex-A75.) Sadly, vpaes no longer wins on perf over aes_nohw.
But, it is constant-time:

Pixel 3, AES/PMULL extensions ignored, Cortex-A75:
Before:
Did 2102000 AES-128-GCM (16 bytes) seal operations in 1000378us (2101205.7 ops/sec): 33.6 MB/s
Did 358000 AES-128-GCM (256 bytes) seal operations in 1002658us (357051.0 ops/sec): 91.4 MB/s
Did 75000 AES-128-GCM (1350 bytes) seal operations in 1012830us (74049.9 ops/sec): 100.0 MB/s
Did 13000 AES-128-GCM (8192 bytes) seal operations in 1036524us (12541.9 ops/sec): 102.7 MB/s
After:
Did 1453000 AES-128-GCM (16 bytes) seal operations in 1000213us (1452690.6 ops/sec): 23.2 MB/s
Did 285000 AES-128-GCM (256 bytes) seal operations in 1002227us (284366.7 ops/sec): 72.8 MB/s
Did 60000 AES-128-GCM (1350 bytes) seal operations in 1016106us (59049.0 ops/sec): 79.7 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1094184us (10053.2 ops/sec): 82.4 MB/s

Note the numbers above run with PMULL off, so the slow GHASH is
dampening the regression. If we test aes_nohw and vpaes paired with
PMULL on, the 20% perf hit becomes a 31% hit. The PMULL-less variant is
more likely to represent a real chip.

This is consistent with upstream's note in the comment, though it is
unclear if 20% is the right order of magnitude: "these results are worse
than scalar compiler-generated code, but it's constant-time and
therefore preferred".

[*] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100458_0301_00_en/lau1442495529696.html

Bug: 246
Change-Id: If1dc87f5131fce742052498295476fbae4628dbf
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/35026
Commit-Queue: David Benjamin <davidben@google.com>
Reviewed-by: Adam Langley <agl@google.com>
2019-03-04 20:31:39 +00:00

212 lines
6.1 KiB
CMake

include_directories(../../include)
if(${ARCH} STREQUAL "x86_64")
set(
BCM_ASM_SOURCES
aesni-gcm-x86_64.${ASM_EXT}
aesni-x86_64.${ASM_EXT}
aes-x86_64.${ASM_EXT}
bsaes-x86_64.${ASM_EXT}
ghash-ssse3-x86_64.${ASM_EXT}
ghash-x86_64.${ASM_EXT}
md5-x86_64.${ASM_EXT}
p256-x86_64-asm.${ASM_EXT}
p256_beeu-x86_64-asm.${ASM_EXT}
rdrand-x86_64.${ASM_EXT}
rsaz-avx2.${ASM_EXT}
sha1-x86_64.${ASM_EXT}
sha256-x86_64.${ASM_EXT}
sha512-x86_64.${ASM_EXT}
vpaes-x86_64.${ASM_EXT}
x86_64-mont5.${ASM_EXT}
x86_64-mont.${ASM_EXT}
)
endif()
if(${ARCH} STREQUAL "x86")
set(
BCM_ASM_SOURCES
aes-586.${ASM_EXT}
aesni-x86.${ASM_EXT}
bn-586.${ASM_EXT}
co-586.${ASM_EXT}
ghash-ssse3-x86.${ASM_EXT}
ghash-x86.${ASM_EXT}
md5-586.${ASM_EXT}
sha1-586.${ASM_EXT}
sha256-586.${ASM_EXT}
sha512-586.${ASM_EXT}
vpaes-x86.${ASM_EXT}
x86-mont.${ASM_EXT}
)
endif()
if(${ARCH} STREQUAL "arm")
set(
BCM_ASM_SOURCES
aes-armv4.${ASM_EXT}
aesv8-armx.${ASM_EXT}
armv4-mont.${ASM_EXT}
bsaes-armv7.${ASM_EXT}
ghash-armv4.${ASM_EXT}
ghashv8-armx.${ASM_EXT}
sha1-armv4-large.${ASM_EXT}
sha256-armv4.${ASM_EXT}
sha512-armv4.${ASM_EXT}
)
endif()
if(${ARCH} STREQUAL "aarch64")
set(
BCM_ASM_SOURCES
aesv8-armx.${ASM_EXT}
armv8-mont.${ASM_EXT}
ghashv8-armx.${ASM_EXT}
sha1-armv8.${ASM_EXT}
sha256-armv8.${ASM_EXT}
sha512-armv8.${ASM_EXT}
vpaes-armv8.${ASM_EXT}
)
endif()
if(${ARCH} STREQUAL "ppc64le")
set(
BCM_ASM_SOURCES
aesp8-ppc.${ASM_EXT}
ghashp8-ppc.${ASM_EXT}
)
endif()
perlasm(aes-586.${ASM_EXT} aes/asm/aes-586.pl)
perlasm(aes-armv4.${ASM_EXT} aes/asm/aes-armv4.pl)
perlasm(aesni-gcm-x86_64.${ASM_EXT} modes/asm/aesni-gcm-x86_64.pl)
perlasm(aesni-x86_64.${ASM_EXT} aes/asm/aesni-x86_64.pl)
perlasm(aesni-x86.${ASM_EXT} aes/asm/aesni-x86.pl)
perlasm(aesp8-ppc.${ASM_EXT} aes/asm/aesp8-ppc.pl)
perlasm(aesv8-armx.${ASM_EXT} aes/asm/aesv8-armx.pl)
perlasm(aes-x86_64.${ASM_EXT} aes/asm/aes-x86_64.pl)
perlasm(armv4-mont.${ASM_EXT} bn/asm/armv4-mont.pl)
perlasm(armv8-mont.${ASM_EXT} bn/asm/armv8-mont.pl)
perlasm(bn-586.${ASM_EXT} bn/asm/bn-586.pl)
perlasm(bsaes-armv7.${ASM_EXT} aes/asm/bsaes-armv7.pl)
perlasm(bsaes-x86_64.${ASM_EXT} aes/asm/bsaes-x86_64.pl)
perlasm(co-586.${ASM_EXT} bn/asm/co-586.pl)
perlasm(ghash-armv4.${ASM_EXT} modes/asm/ghash-armv4.pl)
perlasm(ghashp8-ppc.${ASM_EXT} modes/asm/ghashp8-ppc.pl)
perlasm(ghashv8-armx.${ASM_EXT} modes/asm/ghashv8-armx.pl)
perlasm(ghash-ssse3-x86_64.${ASM_EXT} modes/asm/ghash-ssse3-x86_64.pl)
perlasm(ghash-ssse3-x86.${ASM_EXT} modes/asm/ghash-ssse3-x86.pl)
perlasm(ghash-x86_64.${ASM_EXT} modes/asm/ghash-x86_64.pl)
perlasm(ghash-x86.${ASM_EXT} modes/asm/ghash-x86.pl)
perlasm(md5-586.${ASM_EXT} md5/asm/md5-586.pl)
perlasm(md5-x86_64.${ASM_EXT} md5/asm/md5-x86_64.pl)
perlasm(p256-x86_64-asm.${ASM_EXT} ec/asm/p256-x86_64-asm.pl)
perlasm(p256_beeu-x86_64-asm.${ASM_EXT} ec/asm/p256_beeu-x86_64-asm.pl)
perlasm(rdrand-x86_64.${ASM_EXT} rand/asm/rdrand-x86_64.pl)
perlasm(rsaz-avx2.${ASM_EXT} bn/asm/rsaz-avx2.pl)
perlasm(sha1-586.${ASM_EXT} sha/asm/sha1-586.pl)
perlasm(sha1-armv4-large.${ASM_EXT} sha/asm/sha1-armv4-large.pl)
perlasm(sha1-armv8.${ASM_EXT} sha/asm/sha1-armv8.pl)
perlasm(sha1-x86_64.${ASM_EXT} sha/asm/sha1-x86_64.pl)
perlasm(sha256-586.${ASM_EXT} sha/asm/sha256-586.pl)
perlasm(sha256-armv4.${ASM_EXT} sha/asm/sha256-armv4.pl)
perlasm(sha256-armv8.${ASM_EXT} sha/asm/sha512-armv8.pl)
perlasm(sha256-x86_64.${ASM_EXT} sha/asm/sha512-x86_64.pl)
perlasm(sha512-586.${ASM_EXT} sha/asm/sha512-586.pl)
perlasm(sha512-armv4.${ASM_EXT} sha/asm/sha512-armv4.pl)
perlasm(sha512-armv8.${ASM_EXT} sha/asm/sha512-armv8.pl)
perlasm(sha512-x86_64.${ASM_EXT} sha/asm/sha512-x86_64.pl)
perlasm(vpaes-armv8.${ASM_EXT} aes/asm/vpaes-armv8.pl)
perlasm(vpaes-x86_64.${ASM_EXT} aes/asm/vpaes-x86_64.pl)
perlasm(vpaes-x86.${ASM_EXT} aes/asm/vpaes-x86.pl)
perlasm(x86_64-mont5.${ASM_EXT} bn/asm/x86_64-mont5.pl)
perlasm(x86_64-mont.${ASM_EXT} bn/asm/x86_64-mont.pl)
perlasm(x86-mont.${ASM_EXT} bn/asm/x86-mont.pl)
if(FIPS_DELOCATE)
if(OPENSSL_NO_ASM)
# If OPENSSL_NO_ASM was defined then ASM will not have been enabled, but in
# FIPS mode we have to have it because the module build requires going via
# textual assembly.
enable_language(ASM)
endif()
add_library(
bcm_c_generated_asm
STATIC
bcm.c
)
add_dependencies(bcm_c_generated_asm global_target)
set_target_properties(bcm_c_generated_asm PROPERTIES COMPILE_OPTIONS "-S")
set_target_properties(bcm_c_generated_asm PROPERTIES POSITION_INDEPENDENT_CODE ON)
go_executable(delocate boringssl.googlesource.com/boringssl/util/fipstools/delocate)
add_custom_command(
OUTPUT bcm-delocated.S
COMMAND ./delocate -a $<TARGET_FILE:bcm_c_generated_asm> -o bcm-delocated.S ${BCM_ASM_SOURCES}
DEPENDS bcm_c_generated_asm delocate ${BCM_ASM_SOURCES}
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
)
add_library(
bcm_hashunset
STATIC
bcm-delocated.S
)
add_dependencies(bcm_hashunset global_target)
set_target_properties(bcm_hashunset PROPERTIES POSITION_INDEPENDENT_CODE ON)
set_target_properties(bcm_hashunset PROPERTIES LINKER_LANGUAGE C)
go_executable(inject_hash
boringssl.googlesource.com/boringssl/util/fipstools/inject_hash)
add_custom_command(
OUTPUT bcm.o
COMMAND ./inject_hash -o bcm.o -in-archive $<TARGET_FILE:bcm_hashunset>
DEPENDS bcm_hashunset inject_hash
WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
)
# The outputs of add_custom_command cannot be referenced outside of the
# CMakeLists.txt that defines it. Thus we have to wrap bcm.o in a custom target
# so that crypto can depend on it.
add_custom_target(bcm_o_target DEPENDS bcm.o)
add_library(
fipsmodule
OBJECT
is_fips.c
)
add_dependencies(fipsmodule global_target)
set_target_properties(fipsmodule PROPERTIES LINKER_LANGUAGE C)
else()
add_library(
fipsmodule
OBJECT
bcm.c
is_fips.c
${BCM_ASM_SOURCES}
)
add_dependencies(fipsmodule global_target)
endif()