boringssl/crypto/fipsmodule/ec/util.c
Andres Erbsen 46304abf7d ec/p256.c: fiat-crypto field arithmetic (64, 32)
The fiat-crypto-generated code uses the Montgomery form implementation
strategy, for both 32-bit and 64-bit code.

64-bit throughput seems slower, but the difference is smaller than noise between repetitions (-2%?)

32-bit throughput has decreased significantly for ECDH (-40%). I am
attributing this to the change from varibale-time scalar multiplication
to constant-time scalar multiplication. Due to the same bottleneck,
ECDSA verification still uses the old code (otherwise there would have
been a 60% throughput decrease). On the other hand, ECDSA signing
throughput has increased slightly (+10%), perhaps due to the use of a
precomputed table of multiples of the base point.

64-bit benchmarks (Google Cloud Haswell):

with this change:
Did 9126 ECDH P-256 operations in 1009572us (9039.5 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039832us (22119.0 ops/sec)
Did 8820 ECDSA P-256 verify operations in 1024242us (8611.2 ops/sec)

master (40e8c921ca):
Did 9340 ECDH P-256 operations in 1017975us (9175.1 ops/sec)
Did 23000 ECDSA P-256 signing operations in 1039820us (22119.2 ops/sec)
Did 8688 ECDSA P-256 verify operations in 1021108us (8508.4 ops/sec)

benchmarks on ARMv7 (LG Nexus 4):

with this change:
Did 150 ECDH P-256 operations in 1029726us (145.7 ops/sec)
Did 506 ECDSA P-256 signing operations in 1065192us (475.0 ops/sec)
Did 363 ECDSA P-256 verify operations in 1033298us (351.3 ops/sec)

master (2fce1beda0):
Did 245 ECDH P-256 operations in 1017518us (240.8 ops/sec)
Did 473 ECDSA P-256 signing operations in 1086281us (435.4 ops/sec)
Did 360 ECDSA P-256 verify operations in 1003846us (358.6 ops/sec)

64-bit tables converted as follows:

import re, sys, math

p = 2**256 - 2**224 + 2**192 + 2**96 - 1
R = 2**256

def convert(t):
    x0, s1, x1, s2, x2, s3, x3 = t.groups()
    v = int(x0, 0) + 2**64 * (int(x1, 0) + 2**64*(int(x2,0) + 2**64*(int(x3, 0)) ))
    w = v*R%p
    y0 = hex(w%(2**64))
    y1 = hex((w>>64)%(2**64))
    y2 = hex((w>>(2*64))%(2**64))
    y3 = hex((w>>(3*64))%(2**64))
    ww = int(y0, 0) + 2**64 * (int(y1, 0) + 2**64*(int(y2,0) + 2**64*(int(y3, 0)) ))
    if ww != v*R%p:
        print(x0,x1,x2,x3)
        print(hex(v))
        print(y0,y1,y2,y3)
        print(hex(w))
        print(hex(ww))
        assert 0
    return '{'+y0+s1+y1+s2+y2+s3+y3+'}'

fe_re = re.compile('{'+r'(\s*,\s*)'.join(r'(\d+|0x[abcdefABCDEF0123456789]+)' for i in range(4)) + '}')
print (re.sub(fe_re, convert, sys.stdin.read()).rstrip('\n'))

32-bit tables converted from 64-bit tables

Change-Id: I52d6e5504fcb6ca2e8b0ee13727f4500c80c1799
Reviewed-on: https://boringssl-review.googlesource.com/23244
Commit-Queue: Adam Langley <agl@google.com>
Reviewed-by: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
2017-12-11 17:55:46 +00:00

105 lines
4.8 KiB
C

/* Copyright (c) 2015, Google Inc.
*
* Permission to use, copy, modify, and/or distribute this software for any
* purpose with or without fee is hereby granted, provided that the above
* copyright notice and this permission notice appear in all copies.
*
* THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
* WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
* MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
* SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
* WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
* OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
* CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. */
#include <openssl/base.h>
#include <openssl/ec.h>
#include "internal.h"
// This function looks at 5+1 scalar bits (5 current, 1 adjacent less
// significant bit), and recodes them into a signed digit for use in fast point
// multiplication: the use of signed rather than unsigned digits means that
// fewer points need to be precomputed, given that point inversion is easy (a
// precomputed point dP makes -dP available as well).
//
// BACKGROUND:
//
// Signed digits for multiplication were introduced by Booth ("A signed binary
// multiplication technique", Quart. Journ. Mech. and Applied Math., vol. IV,
// pt. 2 (1951), pp. 236-240), in that case for multiplication of integers.
// Booth's original encoding did not generally improve the density of nonzero
// digits over the binary representation, and was merely meant to simplify the
// handling of signed factors given in two's complement; but it has since been
// shown to be the basis of various signed-digit representations that do have
// further advantages, including the wNAF, using the following general
// approach:
//
// (1) Given a binary representation
//
// b_k ... b_2 b_1 b_0,
//
// of a nonnegative integer (b_k in {0, 1}), rewrite it in digits 0, 1, -1
// by using bit-wise subtraction as follows:
//
// b_k b_(k-1) ... b_2 b_1 b_0
// - b_k ... b_3 b_2 b_1 b_0
// -------------------------------------
// s_k b_(k-1) ... s_3 s_2 s_1 s_0
//
// A left-shift followed by subtraction of the original value yields a new
// representation of the same value, using signed bits s_i = b_(i+1) - b_i.
// This representation from Booth's paper has since appeared in the
// literature under a variety of different names including "reversed binary
// form", "alternating greedy expansion", "mutual opposite form", and
// "sign-alternating {+-1}-representation".
//
// An interesting property is that among the nonzero bits, values 1 and -1
// strictly alternate.
//
// (2) Various window schemes can be applied to the Booth representation of
// integers: for example, right-to-left sliding windows yield the wNAF
// (a signed-digit encoding independently discovered by various researchers
// in the 1990s), and left-to-right sliding windows yield a left-to-right
// equivalent of the wNAF (independently discovered by various researchers
// around 2004).
//
// To prevent leaking information through side channels in point multiplication,
// we need to recode the given integer into a regular pattern: sliding windows
// as in wNAFs won't do, we need their fixed-window equivalent -- which is a few
// decades older: we'll be using the so-called "modified Booth encoding" due to
// MacSorley ("High-speed arithmetic in binary computers", Proc. IRE, vol. 49
// (1961), pp. 67-91), in a radix-2^5 setting. That is, we always combine five
// signed bits into a signed digit:
//
// s_(4j + 4) s_(4j + 3) s_(4j + 2) s_(4j + 1) s_(4j)
//
// The sign-alternating property implies that the resulting digit values are
// integers from -16 to 16.
//
// Of course, we don't actually need to compute the signed digits s_i as an
// intermediate step (that's just a nice way to see how this scheme relates
// to the wNAF): a direct computation obtains the recoded digit from the
// six bits b_(4j + 4) ... b_(4j - 1).
//
// This function takes those five bits as an integer (0 .. 63), writing the
// recoded digit to *sign (0 for positive, 1 for negative) and *digit (absolute
// value, in the range 0 .. 8). Note that this integer essentially provides the
// input bits "shifted to the left" by one position: for example, the input to
// compute the least significant recoded digit, given that there's no bit b_-1,
// has to be b_4 b_3 b_2 b_1 b_0 0.
void ec_GFp_nistp_recode_scalar_bits(uint8_t *sign, uint8_t *digit,
uint8_t in) {
uint8_t s, d;
s = ~((in >> 5) - 1); /* sets all bits to MSB(in), 'in' seen as
* 6-bit value */
d = (1 << 6) - in - 1;
d = (d & s) | (in & ~s);
d = (d >> 1) + (d & 1);
*sign = s & 1;
*digit = d;
}