In case of x86, there are few implementation of the montgomery reduction
and multiplication. At runtime, library chooses most performant
implementation according to information received from CPUID. The
implementation then is assigned to a function pointer which gets CALL'd
during program execution.
The problem is a mixture of following; function pointer points to an
assembly function, it's arguments are also pointers to some data and
arguments passed to the call are allocated on a stack before a call.
As variable can't be tagged as go:noescape, the go compiler will move
arguments passed to the call from stack to heap. This causes significant
performance degradataion.
The solution is not to use function pointer. Instead, both redc and mul,
will check at runtime CPU capabilities and do the JMP to the correct
part of the text. Thanks to branch prediction the cost of the solution
is minimal and smaller than function call. This patch also removes all
heap allocations done by functions operating on prime field.
The other goal of this patch is to remove x86 specific code from
arith_decl.go, which will be also compiled for ARM arch in near future.
Results:
--------
benchmark old ns/op new ns/op delta
BenchmarkFp2ElementMul 428 194 -54.67%
BenchmarkFp2ElementInv 67353 34447 -48.86%
BenchmarkFp2ElementSquare 335 139 -58.51%
BenchmarkFp503MontgomeryReduce 26.3 22.8 -13.31%
BenchmarkSidhKeyAgreementP503 12451199 6402396 -48.58%
BenchmarkAliceKeyGenPubP503 7349333 3590954 -51.14%
BenchmarkBobKeyGenPubP503 8253676 4094141 -50.40%
BenchmarkSharedSecretAliceP503 5888022 2916821 -50.46%
BenchmarkSharedSecretBobP503 6908018 3436713 -50.25%
Comparision with P751:
----------------------
BenchmarkBobKeyGenPubP751 13616876
BenchmarkBobKeyGenPubP503 4094141
BenchmarkSharedSecretAliceP751 9870216
BenchmarkSharedSecretAliceP503 2916821
There is a cost - possibly CMPB & JE each time mul and redc is called.
Also patch introduces arith_amd64.go file which keeps some variables -
this currently needs to be done for each field. Probably it could be
possible to get rid of it at some point.
Using golang.org/x/sys/cpu would be really great, but it makes it
hard to vendor the library with go 1.10.
This patch gets information about CPU capabilities directly from CPUID
* Makes it possible to use p503 in SIDH and SIKE
* Refactors tests so that unit tests in SIKE and SIDH are run for
each prime field. It adds test data array called 'tdata' which
describes test parameters for underlying prime field. When new field
is added, it's enough to simply add new test data to 'tdata' vector in
order to run all existing tests with new prime field.
* SIKE/p503 is now tested with official test vectors from NIST
submission
Private key generation can take advantage of the fact that keyspace for
secret key is (0, 2^x - 1), for some possitivite value of 'x' (see SIKE,
1.3.8). It means that all bytes in the secret key, but the last one, can
take any value between <0x00,0xFF>. Similarily for the last byte, but
generation needs to chop off some bits, to make sure generated value is
an element of a key-space.
Assuming uniform distribution of bytes generated by RNG, secret key is
still chosen uniformly at random, but there is no need to maintain field
specific assembly code.
In https://eprint.iacr.org/2017/1015.pdf a technique was described to
improve the performance of Montgomery reduction for Montgomery-friendly
moduli. This adds an implementation using the MULX, ADOX and ADCX
instructions, available in the BMI2 (since Haswell) and ADX (since
Broadwell) instruction set extensions.
Implementation uses golang.org/x/sys/cpu for CPU capabilities checking
at runtime.
Development done by: Ko- <ko@cloudflare.com>
It makes a little bit more sense to have GeneratePublicKey as a method
of PrivateKey. In this case code doesn't need to check if caller
provided pointer is nil. Object was created by NewPrivateKey(), so it
code can assume object was correctly initialized.
The old GeneratePublicKey was returning an error when caller provided
pointer was nil. As this possibility is now removed, method doesn't
return error anymore.
From/to bytes conversion will be refactored when p503 is introduced.
Patch splits part that uses field specific functions from part that
converts Fp element to bytes.
Patch also removes some testing helpers which are no longer needed.
Go 1.10 correctly translates MOVQ pseudo-instruction to MOV. It was
fixed in:
7b773946c0
We don't expect this library to compile with older version than Go 1.10
In https://eprint.iacr.org/2017/1015.pdf a technique was described to
improve the performance of Montgomery reduction for Montgomery-friendly
moduli. This adds an implementation using the mulx, adox and adcx
instructions, available in the BMI2 (since Haswell) and ADX (since
Broadwell) instruction set extensions.
* implements SIKE specified here:
http://www.sike.org/files/SIDH-spec.pdf
* methods for both - KEM and PKE - are added
* adds SIKE specific key variant
* tests: known answer tests for sike
* uses cSHKAE from nobs-crypto
* tests: adds continues integration
* Makefile has targets for running tests, benchmarks and code coverage. It also
contains target for env preparation. In order to run sidh tests
GOPATH must contain p751toolbox package. I've chosen to manualy
copy this package to the temporary GOPATH directory. It could also be done
by calling "go get", but then any commit to both p751toolbox and sidh would need
to be done in 2 steps.
* .travis.yml calls make and uploads code coverage to Codecov
* move sidh to seperated folder
* sidh: updates algorithm to SIDHv3 and refactoring
* makes an algorithm compatible with Microsoft's SIDHv3
implementation. This is required to implement SIKE key
encapsulation mechanism, as specified in PQC NIST submission
from Nov, 30 2017
* removes SIDHBobPublicKey/SIDHAlicePublicKey/SIDHBobPrivateKey/
SIDHAlicePrivateKey. Instead PrivateKey and PublicKey structures
where introduced. Each of this structure stores variant of a key
A or B. Implementation uses a key variant in order to differentiate
between 2- and 3-torision groups.
Main goal of removing "Alice" and "Bob" specific structures is to
remove code duplication
* Introduces SidhParams: structure to store prime field and SIDH
domain parameters.
* Refactors public API. Introduces:
- Functions to generate, import, export keypair
- DeriveSecret function to create shared secret
- Supporting functions and types
* Removes code which is not used by implementation anymore, like
DistortAndDifference(), SecretPoint(), DblAdd(),
OkeyaSakuraiCoordinateRecovery() and many more. Also tests for those
functions are removed.
* Adds fixes for key import/export