(Imported from upstream's b3d7294976c58e0e05d0ee44a0e7c9c3b8515e05.)

May as well avoid diverging.

Change-Id: I3edec4fe15b492dd3bfb3146a8944acc6575f861
Reviewed-on: https://boringssl-review.googlesource.com/3020
Reviewed-by: Adam Langley <agl@google.com>
@@ -61,8 +61,12 @@
 #
 # rsa2048 sign/sec      OpenSSL 1.0.1   scalar(*)       this
 # 2.3GHz Haswell        621             765/+23%        1113/+79%
+# 2.3GHz Broadwell(**)  688             1200(***)/+74%  1120/+63%
 #
 # (*)   if system doesn't support AVX2, for reference purposes;
+# (**)  scaled to 2.3GHz to simplify comparison;
+# (***) scalar AD*X code is faster than AVX2 and is preferred code
+#       path for Broadwell;
 $flavour = shift;
 $output = shift;
@@ -22,7 +22,10 @@
 # [1] and [2], with MOVBE twist suggested by Ilya Albrekht and Max
 # Locktyukhin of Intel Corp. who verified that it reduces shuffles
 # pressure with notable relative improvement, achieving 1.0 cycle per
-# byte processed with 128-bit key on Haswell processor.
+# byte processed with 128-bit key on Haswell processor, and 0.74 -
+# on Broadwell. [Mentioned results are raw profiled measurements for
+# favourable packet size, one divisible by 96. Applications using the
+# EVP interface will observe a few percent worse performance.]
 #
 # [1] http://rt.openssl.org/Ticket/Display.html?id=2900&user=guest&pass=guest
 # [2] http://www.intel.com/content/dam/www/public/us/en/documents/software-support/enabling-high-performance-gcm.pdf
@@ -63,6 +63,7 @@
 # Sandy Bridge  1.80(+8%)
 # Ivy Bridge    1.80(+7%)
 # Haswell       0.55(+93%) (if system doesn't support AVX)
+# Broadwell     0.45(+110%)(if system doesn't support AVX)
 # Bulldozer     1.49(+27%)
 # Silvermont    2.88(+13%)
@@ -73,7 +74,8 @@
 # CPUs such as Sandy and Ivy Bridge can execute it, the code performs
 # sub-optimally in comparison to above mentioned version. But thanks
 # to Ilya Albrekht and Max Locktyukhin of Intel Corp. we knew that
-# it performs in 0.41 cycles per byte on Haswell processor.
+# it performs in 0.41 cycles per byte on Haswell processor, and in
+# 0.29 on Broadwell.
 #
 # [1] http://rt.openssl.org/Ticket/Display.html?id=2900&user=guest&pass=guest