1
0
mirror of https://github.com/henrydcase/nobs.git synced 2024-11-27 01:21:21 +00:00
nobs/dh/csidh/curve_test.go

372 lines
17 KiB
Go
Raw Normal View History

cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
package csidh
import (
"math/big"
"testing"
)
2020-05-14 00:48:43 +01:00
// Actual test implementation.
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
func TestXAdd(t *testing.T) {
var P, Q, PdQ point
var PaQ point
var expPaQ big.Int
// points from a Elliptic Curve defined in sage as follows:
// A = 0x6055947AAFEBF773CE912680A6A32656073233D2FD6FDF4A143BE82D25B44ECC0431DE564C0F0D6591ACC62D6876E86F5D06B68C9EAF20D0DB0A6B99ED558512
// E = EllipticCurve(GF(p), [0, A, 0, 1, 0])
// where p is CSIDH's 511-bit prime
checkXAdd := func() {
xAdd(&PaQ, &P, &Q, &PdQ)
ret := toNormX(&PaQ)
if ret.Cmp(&expPaQ) != 0 {
t.Errorf("\nExp: %s\nGot: %s", expPaQ.Text(16), ret.Text(16))
}
}
expPaQ.SetString("0x41C98C5D7FF118B1A3987733581FD69C0CC27D7B63BCCA525106B9945869C6DAEDAA3D5D9D2679237EF0D013BE68EF12731DBFB26E12576BAD1E824C67ABD125", 0)
P.x = toFp("0x5840FD8E0165F7F474260F99337461AF195233F791FABE735EC2634B74A95559568B4CEB23959C8A01C5C57E215D22639868ED840D74FE2BAC04830CF75047AD")
P.z = toFp("1")
Q.x = toFp("0x3C1A003C71436698B4A181CEB12BA4B4D1FF7BB14AAAF6FBDA6957C4EBA20AD8E3893DF6F64E67E81163E024C19C7E975F3EC61862F75502C3ED802370E75A3F")
Q.z = toFp("1")
PdQ.x = toFp("0x519B1928F752B0B2143C1C23EB247B370DBB5B9C29B9A3A064D7FBC1B67FAC34B6D3DDA0F3CB87C387B425B36F31B93A8E73252BA701927B767A9DE89D5A92AE")
PdQ.z = toFp("1")
checkXAdd()
expPaQ.SetString("0x5840FD8E0165F7F474260F99337461AF195233F791FABE735EC2634B74A95559568B4CEB23959C8A01C5C57E215D22639868ED840D74FE2BAC04830CF75047AD", 0)
P.x = toFp("0x5840FD8E0165F7F474260F99337461AF195233F791FABE735EC2634B74A95559568B4CEB23959C8A01C5C57E215D22639868ED840D74FE2BAC04830CF75047AD")
P.z = toFp("1")
Q.x = toFp("1")
Q.z = toFp("0x0")
PdQ.x = toFp(expPaQ.Text(10))
PdQ.z = toFp("1")
checkXAdd()
}
func TestXDbl(t *testing.T) {
var P, A point
var PaP point
var expPaP big.Int
// points from a Elliptic Curve defined in sage as follows:
// A = 0x599841D7D1FCD92A85759B7A3D2D5E4C56EFB17F19F86EB70E121EA16305EDE45A55868BE069313F821F7D94069EC220A4AC3B85500376710538246E9B3BC138
// E = EllipticCurve(GF(p), [0, A, 0, 1, 0])
// where p is CSIDH's 511-bit prime
expPaP.SetString("0x6115B5D8BB613D11BDFEA70D436D87C1515553F6A15061727B4001E0AF745AAA9F39EB9464982829D931F77DAB9D71B24FF0D1D34C347F2A51FD45821F2EA06F", 0)
P.x = toFp("0x6C5B4D4AB0765AAB23C10F8455BE522D3A5363324D7AD641CC67C0A52FC1FFE9F3F8EDFE641478CA93D4D0016D83F21487FD4AF4E02F8A2C237CF27C5604BCC")
P.z = toFp("1")
A.x = toFp("0x599841D7D1FCD92A85759B7A3D2D5E4C56EFB17F19F86EB70E121EA16305EDE45A55868BE069313F821F7D94069EC220A4AC3B85500376710538246E9B3BC138")
A.z = toFp("1")
xDbl(&PaP, &P, &A)
ret := toNormX(&PaP)
if ret.Cmp(&expPaP) != 0 {
t.Errorf("\nExp: %s\nGot: %s", expPaP.Text(16), ret.Text(16))
}
}
func TestXDblAddNominal(t *testing.T) {
var P, Q, PdQ point
var PaP, PaQ point
var expPaP, expPaQ big.Int
var A coeff
checkXDblAdd := func() {
var A24 coeff
// A24.a = 2*A.z + A.a
addRdc(&A24.a, &A.c, &A.c)
addRdc(&A24.a, &A24.a, &A.a)
// A24.z = 4*A.z
mulRdc(&A24.c, &A.c, &four)
// Additionally will check if input can be same as output
PaP = P
PaQ = Q
xDblAdd(&PaP, &PaQ, &PaP, &PaQ, &PdQ, &A24)
retPaP := toNormX(&PaP)
retPaQ := toNormX(&PaQ)
if retPaP.Cmp(&expPaP) != 0 {
t.Errorf("\nExp: %s\nGot: %s", expPaP.Text(16), retPaP.Text(16))
}
if retPaQ.Cmp(&expPaQ) != 0 {
t.Errorf("\nExp: %s\nGot: %s", expPaQ.Text(16), retPaQ.Text(16))
}
}
// 2*P
expPaP.SetString("0x38F5B37271A3D8FA50107F88045D6F6B08355DD026C02E0306CE5875F47422736AD841B4122B2BD7DE6166BB6498F6A283378FF8250948E834F15CEA2D59A57B", 0)
// P+Q
expPaQ.SetString("0x53D9B44C5F61651612243CF7987F619FE6ACB5CF29538F96A63E7278E131F41A17D64388E31B028A5183EF9096AE82724BC34D8DDFD67AD68BD552A33C345B8C", 0)
P.x = toFp("0x4FE17B4CC66E85960F57033CD45996C99248DA09DF2E36F8840657B52F74ED8173E0D322FA57D7B4D0EE7F12967BBD59140B42F2626E29167D6419E851E5A4C9")
P.z = toFp("1")
Q.x = toFp("0x465047949CD6574FDBE00EA365CAF7A95DC9DEBE96A188823CA8C9DD9F527CF81290D49864F61DF0C08C1D6052139230735CA6CFDBDC1A8820610CCD71861176")
Q.z = toFp("1")
PdQ.x = toFp("0x49D3B999A0A020B34473568A8F75B5405F2D3BE5A006595015FC6DDC6BED8AB2A51A887B6DC62C64354466865FFD69E50AD37F6F4FBD74119EB65EBC9367B556")
PdQ.z = toFp("1")
A.a = toFp("0x118F955D498D902FD42E5B2926F297CC814CD7649EC5B070295622F97C4A0D9BD34058A7E0E00CB73ED32FCC237F9F6B7D2A15F5CC7C4EC61ECEF80ACBB0EFA4")
A.c = toFp("1")
checkXDblAdd()
// Case P=value, Q=(x=1, z=0). In this case PaQ==P; PaP=2*P
expPaP.SetString("0x38F5B37271A3D8FA50107F88045D6F6B08355DD026C02E0306CE5875F47422736AD841B4122B2BD7DE6166BB6498F6A283378FF8250948E834F15CEA2D59A57B", 0)
expPaQ.SetString("0x4FE17B4CC66E85960F57033CD45996C99248DA09DF2E36F8840657B52F74ED8173E0D322FA57D7B4D0EE7F12967BBD59140B42F2626E29167D6419E851E5A4C9", 0)
P.x = toFp("0x4FE17B4CC66E85960F57033CD45996C99248DA09DF2E36F8840657B52F74ED8173E0D322FA57D7B4D0EE7F12967BBD59140B42F2626E29167D6419E851E5A4C9")
P.z = toFp("1")
Q.x = toFp("1")
Q.z = toFp("0")
PdQ.x = toFp("0x4FE17B4CC66E85960F57033CD45996C99248DA09DF2E36F8840657B52F74ED8173E0D322FA57D7B4D0EE7F12967BBD59140B42F2626E29167D6419E851E5A4C9")
PdQ.z = toFp("1")
A.a = toFp("0x118F955D498D902FD42E5B2926F297CC814CD7649EC5B070295622F97C4A0D9BD34058A7E0E00CB73ED32FCC237F9F6B7D2A15F5CC7C4EC61ECEF80ACBB0EFA4")
A.c = toFp("1")
checkXDblAdd()
}
func TestXDblAddVSxDblxAdd(t *testing.T) {
var P, Q, PdQ point
var PaP1, PaQ1 point
var PaP2, PaQ2 point
var A point
var A24 coeff
P.x = toFp("0x4FE17B4CC66E85960F57033CD45996C99248DA09DF2E36F8840657B52F74ED8173E0D322FA57D7B4D0EE7F12967BBD59140B42F2626E29167D6419E851E5A4C9")
P.z = toFp("1")
Q.x = toFp("0x465047949CD6574FDBE00EA365CAF7A95DC9DEBE96A188823CA8C9DD9F527CF81290D49864F61DF0C08C1D6052139230735CA6CFDBDC1A8820610CCD71861176")
Q.z = toFp("1")
PdQ.x = toFp("0x49D3B999A0A020B34473568A8F75B5405F2D3BE5A006595015FC6DDC6BED8AB2A51A887B6DC62C64354466865FFD69E50AD37F6F4FBD74119EB65EBC9367B556")
PdQ.z = toFp("1")
A.x = toFp("0x118F955D498D902FD42E5B2926F297CC814CD7649EC5B070295622F97C4A0D9BD34058A7E0E00CB73ED32FCC237F9F6B7D2A15F5CC7C4EC61ECEF80ACBB0EFA4")
A.z = toFp("1")
// Precompute A24 for xDblAdd
// (A+2C:4C) => (A24.x = A.x+2A.z; A24.z = 4*A.z)
addRdc(&A24.a, &A.z, &A.z)
addRdc(&A24.a, &A24.a, &A.x)
mulRdc(&A24.c, &A.z, &four)
for i := 0; i < numIter; i++ {
xAdd(&PaQ2, &P, &Q, &PdQ)
xDbl(&PaP2, &P, &A)
xDblAdd(&PaP1, &PaQ1, &P, &Q, &PdQ, &A24)
if !ceqpoint(&PaQ1, &PaQ2) {
exp := toNormX(&PaQ1)
got := toNormX(&PaQ2)
t.Errorf("\nExp: \n\t%s\nGot from xAdd: \n\t%s", exp.Text(16), got.Text(16))
}
if !ceqpoint(&PaP1, &PaP2) {
exp := toNormX(&PaP1)
got := toNormX(&PaP2)
t.Errorf("\nExp: \n\t%s\nGot from xDbl: \n\t%s", exp.Text(16), got.Text(16))
}
// Swap values for next operation
PdQ = Q
Q = P
P = PaP1
}
}
func TestXMul(t *testing.T) {
var P point
var co coeff
var expKP big.Int
var k fp
checkXMul := func() {
var kP point
2020-05-14 00:48:43 +01:00
xMul(&kP, &P, &co, &k)
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
retKP := toNormX(&kP)
if expKP.Cmp(&retKP) != 0 {
t.Errorf("\nExp: %s\nGot: %s", expKP.Text(16), retKP.Text(16))
}
// Check if first and second argument can overlap
2020-05-14 00:48:43 +01:00
xMul(&P, &P, &co, &k)
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
retKP = toNormX(&P)
if expKP.Cmp(&retKP) != 0 {
t.Errorf("\nExp: %s\nGot: %s", expKP.Text(16), retKP.Text(16))
}
}
// Case C=1
expKP.SetString("0x582B866603E6FBEBD21FE660FB34EF9466FDEC55FFBCE1073134CC557071147821BBAD225E30F7B2B6790B00ED9C39A29AA043F58AF995E440AFB13DA8E6D788", 0)
P.x = toFp("0x1C5CA539C1D5B52DE4750C390C24C05251E8B1D33E48971FA86F5ADDED2D06C8CD31E94887541468BB2925EBD693C9DDFF5BD9508430F25FE28EE30C0760C0FE")
P.z = toFp("1")
co.a = toFp("0x538F785D52996919C8D5C73D842A0249669B5B6BB05338B74EAE8094AE5009A3BA2D73730F527D7403E8184D9B1FA11C0C4C40E7B328A84874A6DBCE99E1DF92")
co.c = toFp("1")
k = fp{0x7A36C930A83EFBD5, 0xD0E80041ED0DDF9F, 0x5AA17134F1B8F877, 0x975711EC94168E51, 0xB3CAD962BED4BAC5, 0x3026DFDD7E4F5687, 0xE67F91AB8EC9C3AF, 0x34671D3FD8C317E7}
checkXMul()
// Check if algorithms works correctly with k=1
expKP.SetString("0x1C5CA539C1D5B52DE4750C390C24C05251E8B1D33E48971FA86F5ADDED2D06C8CD31E94887541468BB2925EBD693C9DDFF5BD9508430F25FE28EE30C0760C0FE", 0)
P.x = toFp("0x1C5CA539C1D5B52DE4750C390C24C05251E8B1D33E48971FA86F5ADDED2D06C8CD31E94887541468BB2925EBD693C9DDFF5BD9508430F25FE28EE30C0760C0FE")
P.z = toFp("1")
co.a = toFp("0x538F785D52996919C8D5C73D842A0249669B5B6BB05338B74EAE8094AE5009A3BA2D73730F527D7403E8184D9B1FA11C0C4C40E7B328A84874A6DBCE99E1DF92")
co.c = toFp("1")
k = fp{1, 0, 0, 0, 0, 0, 0, 0}
checkXMul()
// Check if algorithms works correctly with value of k for which few small and high
// order bits are 0 (test for odd number of cswaps in xMul)
expKP.SetString("0x1925EDA0928C10F427B4E642E7E1481A670D1249956DED6A2292B9BAB841F6AA86A9F41459400845ED4A5E2531A14165F64FE4E43DBD85321B429C6DAE2E8987", 0)
P.x = toFp("0x4CE8603817B9BB06515E921AA201D26B31F3CE181D1E18CD5CD704708CCAD47546CEEAB42B98EE67925A5259E0684A0489F574A999DE127F708B849ACAA12A63")
P.z = toFp("1")
co.a = toFp("0x538F785D52996919C8D5C73D842A0249669B5B6BB05338B74EAE8094AE5009A3BA2D73730F527D7403E8184D9B1FA11C0C4C40E7B328A84874A6DBCE99E1DF92")
co.c = toFp("1")
k = fp{0, 7, 0, 0, 0, 0, 0, 0}
checkXMul()
// Check if algorithms works correctly with value of k for which few small and high
// order bits are 0 (test for even number of cswaps in xMul)
expKP.SetString("0x30C02915C5967C3B6EB2196A934ADF38A183E9C7E814B54121F93048A8FC12D5036992FABF8D807581017A4C1F93D07352413F38F6A902FC76A8894FE8D94805", 0)
P.x = toFp("0x2DDD15ED7C169BE6D9EC02CFE3DC507EC4A7A4D96DE3FAAB9BFCEA1B047807EA301E89830F2FDD0E7E642A85E7ACDE16BAD76DF140F719C4A7AB85153E7D69DC")
P.z = toFp("1")
co.a = toFp("0x538F785D52996919C8D5C73D842A0249669B5B6BB05338B74EAE8094AE5009A3BA2D73730F527D7403E8184D9B1FA11C0C4C40E7B328A84874A6DBCE99E1DF92")
co.c = toFp("1")
k = fp{0, 15, 0, 0, 0, 0, 0, 0}
checkXMul()
// xMul512 does NOT work correctly for k==0. In such case function will return 2*P. But
// thanks to that fact we don't need to handle k==0 case, we get some speedup.
expKP.SetString("0x6115B5D8BB613D11BDFEA70D436D87C1515553F6A15061727B4001E0AF745AAA9F39EB9464982829D931F77DAB9D71B24FF0D1D34C347F2A51FD45821F2EA06F", 0)
P.x = toFp("0x6C5B4D4AB0765AAB23C10F8455BE522D3A5363324D7AD641CC67C0A52FC1FFE9F3F8EDFE641478CA93D4D0016D83F21487FD4AF4E02F8A2C237CF27C5604BCC")
P.z = toFp("1")
co.a = toFp("0x599841D7D1FCD92A85759B7A3D2D5E4C56EFB17F19F86EB70E121EA16305EDE45A55868BE069313F821F7D94069EC220A4AC3B85500376710538246E9B3BC138")
co.c = toFp("1")
k = fp{0, 0, 0, 0, 0, 0, 0, 0}
checkXMul()
}
func TestMappointHardcoded3(t *testing.T) {
var P = point{
x: fp{0xca1a2fdec38c669b, 0xf2fe3678ebeb978b, 0xfda3e9a6f0c719d, 0x6f7bffa41772570b, 0x3d90cdd6283dc150, 0x21b55b738eb1ded9, 0x209515d0a9f41dd6, 0x5275cf397d154a12},
z: fp{0x1fff8309761576e, 0xef239cbeda7c2ba1, 0x6136ae2d76e95873, 0x1f8f6ac909570cec, 0x780fdf0cc7d676d8, 0x548098fe92ed04e1, 0xb39da564701ef35d, 0x5fec19626df41306}}
var A = coeff{
a: fp{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
c: fp{0xc8fc8df598726f0a, 0x7b1bc81750a6af95, 0x5d319e67c1e961b4, 0xb0aa7275301955f1, 0x4a080672d9ba6c64, 0x97a5ef8a246ee77b, 0x6ea9e5d4383676a, 0x3496e2e117e0ec80}}
var K = point{
x: fp{0x597616608e291c6f, 0xd14230b008736798, 0xa63099b1ace67e6e, 0xe37c13afd768bcfa, 0xc6ef718894f08135, 0x53a4fd09091f3522, 0xc9a1f9f670645fe1, 0x628c4a8efd83e5f0},
z: fp{0x8f18a654312ac1ad, 0xbc20a9b2472785c9, 0xdaf97c29bbf9e492, 0xf91a8c799e2f6119, 0xc8dc675cc8e528e6, 0x9a7b2c2f0df95171, 0x85629cd38cdd9fdb, 0x656d5253d3fd1a6e}}
var k uint64 = 3
var expA = coeff{
a: fp{0x6fa92a66e77cfc1, 0x9efbfb7118f1832c, 0x441894cc5d1d24ae, 0x5a2f0fafa26761de, 0x8095c36d3a20a78a, 0xb22be0023612a135, 0x5eb844d06ef0f430, 0x52e53309d1c90cf8},
c: fp{0x98173d5664a23e5c, 0xd8fe1c6306bbc11a, 0xa774fbc502648059, 0x766a0d839aa62c83, 0x4b074f9b93d1633d, 0xf306019dbf87f505, 0x77c720ca059234b0, 0x3d47ab65269c5908}}
var expP = point{
x: fp{0x91aba9b39f280495, 0xfbd8ea69d2990aeb, 0xb03e1b8ed7fe3dba, 0x3d30a41499f08998, 0xb15a42630de9c606, 0xa7dd487fef16f5c8, 0x8673948afed8e968, 0x57ecc8710004cd4d},
z: fp{0xce8819869a942526, 0xb98ca2ff79ef8969, 0xd49c9703743a1812, 0x21dbb090f9152e03, 0xbabdcac831b1adea, 0x8cee90762baa2ddd, 0xa0dd2ddcef809d96, 0x1de2a8887a32f19b}}
2020-05-14 00:48:43 +01:00
xIso(&P, &A, &K, k)
if !eqFp(&P.x, &expP.x) || !eqFp(&P.z, &expP.z) {
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
normP := toNormX(&P)
normPExp := toNormX(&expP)
t.Errorf("P != expP [\n %s != %s\n]", normP.Text(16), normPExp.Text(16))
}
2020-05-14 00:48:43 +01:00
if !eqFp(&A.a, &expA.a) || !eqFp(&A.c, &expA.c) {
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
t.Errorf("A != expA %X %X", A.a[0], expA.a[0])
}
}
func TestMappointHardcoded5(t *testing.T) {
var P = point{
x: fp{0xca1a2fdec38c669b, 0xf2fe3678ebeb978b, 0xfda3e9a6f0c719d, 0x6f7bffa41772570b, 0x3d90cdd6283dc150, 0x21b55b738eb1ded9, 0x209515d0a9f41dd6, 0x5275cf397d154a12},
z: fp{0x1fff8309761576e, 0xef239cbeda7c2ba1, 0x6136ae2d76e95873, 0x1f8f6ac909570cec, 0x780fdf0cc7d676d8, 0x548098fe92ed04e1, 0xb39da564701ef35d, 0x5fec19626df41306}}
var A = coeff{
a: fp{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
c: fp{0xc8fc8df598726f0a, 0x7b1bc81750a6af95, 0x5d319e67c1e961b4, 0xb0aa7275301955f1, 0x4a080672d9ba6c64, 0x97a5ef8a246ee77b, 0x6ea9e5d4383676a, 0x3496e2e117e0ec80}}
var K = point{
x: fp{0x597616608e291c6f, 0xd14230b008736798, 0xa63099b1ace67e6e, 0xe37c13afd768bcfa, 0xc6ef718894f08135, 0x53a4fd09091f3522, 0xc9a1f9f670645fe1, 0x628c4a8efd83e5f0},
z: fp{0x8f18a654312ac1ad, 0xbc20a9b2472785c9, 0xdaf97c29bbf9e492, 0xf91a8c799e2f6119, 0xc8dc675cc8e528e6, 0x9a7b2c2f0df95171, 0x85629cd38cdd9fdb, 0x656d5253d3fd1a6e}}
var k uint64 = 5
var expA = coeff{
a: fp{0x32076f58298ed474, 0x5094a1fc8696d307, 0x82e510594157944a, 0xb60ce760f88c83a9, 0xae8a28c325186983, 0xe31d2446a4ad2f18, 0xb266c612b5f141c1, 0x64283e618db5a705},
c: fp{0x4472b49b65272190, 0x2bd5919309778f56, 0x6132753691fe016c, 0x8f654849c09e6d34, 0xfa208dd9aea1ef12, 0xf7df0dd10071411a, 0x75afb7860500922c, 0x52fb7d34b129fb65}}
var expP = point{
x: fp{0x3b75fc94b2a6df2d, 0x96d53dc9b0e867a0, 0x22e87202421d274e, 0x30a361440697ee1a, 0x8b52ee078bdbddcd, 0x64425d500e6b934d, 0xf47d1f568f6df391, 0x5d9d3607431395ab},
z: fp{0x746e02dafa040976, 0xcd408f2cddbf3a8e, 0xf643354e0e13a93f, 0x7c39ed96ce9a5e29, 0xfcdf26f1a1a550ca, 0x2fc8aafc4ca0a559, 0x5d204a2b14cf19ba, 0xbd2c3406762f05d}}
2020-05-14 00:48:43 +01:00
xIso(&P, &A, &K, k)
if !eqFp(&P.x, &expP.x) || !eqFp(&P.z, &expP.z) {
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
normP := toNormX(&P)
normPExp := toNormX(&expP)
t.Errorf("P != expP [\n %s != %s\n]", normP.Text(16), normPExp.Text(16))
}
2020-05-14 00:48:43 +01:00
if !eqFp(&A.a, &expA.a) || !eqFp(&A.c, &expA.c) {
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
t.Errorf("A != expA %X %X", A.a[0], expA.a[0])
}
}
func BenchmarkXMul(b *testing.B) {
var kP, P point
var co coeff
var expKP big.Int
var k fp
// Case C=1
expKP.SetString("0x582B866603E6FBEBD21FE660FB34EF9466FDEC55FFBCE1073134CC557071147821BBAD225E30F7B2B6790B00ED9C39A29AA043F58AF995E440AFB13DA8E6D788", 0)
P.x = toFp("0x1C5CA539C1D5B52DE4750C390C24C05251E8B1D33E48971FA86F5ADDED2D06C8CD31E94887541468BB2925EBD693C9DDFF5BD9508430F25FE28EE30C0760C0FE")
P.z = toFp("1")
co.a = toFp("0x538F785D52996919C8D5C73D842A0249669B5B6BB05338B74EAE8094AE5009A3BA2D73730F527D7403E8184D9B1FA11C0C4C40E7B328A84874A6DBCE99E1DF92")
co.c = toFp("1")
k = fp{0x7A36C930A83EFBD5, 0xD0E80041ED0DDF9F, 0x5AA17134F1B8F877, 0x975711EC94168E51, 0xB3CAD962BED4BAC5, 0x3026DFDD7E4F5687, 0xE67F91AB8EC9C3AF, 0x34671D3FD8C317E7}
for n := 0; n < b.N; n++ {
2020-05-14 00:48:43 +01:00
xMul(&kP, &P, &co, &k)
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
}
}
func BenchmarkXAdd(b *testing.B) {
var P, Q, PdQ point
var PaQ point
P.x = toFp("0x5840FD8E0165F7F474260F99337461AF195233F791FABE735EC2634B74A95559568B4CEB23959C8A01C5C57E215D22639868ED840D74FE2BAC04830CF75047AD")
P.z = toFp("1")
Q.x = toFp("0x3C1A003C71436698B4A181CEB12BA4B4D1FF7BB14AAAF6FBDA6957C4EBA20AD8E3893DF6F64E67E81163E024C19C7E975F3EC61862F75502C3ED802370E75A3F")
Q.z = toFp("1")
PdQ.x = toFp("0x519B1928F752B0B2143C1C23EB247B370DBB5B9C29B9A3A064D7FBC1B67FAC34B6D3DDA0F3CB87C387B425B36F31B93A8E73252BA701927B767A9DE89D5A92AE")
PdQ.z = toFp("1")
for n := 0; n < b.N; n++ {
xAdd(&PaQ, &P, &Q, &PdQ)
}
}
func BenchmarkXDbl(b *testing.B) {
var P, A point
var PaP point
P.x = toFp("0x6C5B4D4AB0765AAB23C10F8455BE522D3A5363324D7AD641CC67C0A52FC1FFE9F3F8EDFE641478CA93D4D0016D83F21487FD4AF4E02F8A2C237CF27C5604BCC")
P.z = toFp("1")
A.x = toFp("0x599841D7D1FCD92A85759B7A3D2D5E4C56EFB17F19F86EB70E121EA16305EDE45A55868BE069313F821F7D94069EC220A4AC3B85500376710538246E9B3BC138")
A.z = toFp("1")
for n := 0; n < b.N; n++ {
xDbl(&PaP, &P, &A)
}
}
func BenchmarkIsom(b *testing.B) {
var P, kern point
var expPhiP big.Int
var co coeff
var k = uint64(2)
expPhiP.SetString("0x5FEBD68F795F9AEB732ECF0D1507904922F2B0736704E0751EF242B4E191E6F630D83778B5E5681161FD071CDEF7DF4C3A41D0ECEB30E90B119C5BF86C5AB51A", 0)
P.x = toFp("0x5FD8D226C228FD6AA3CCDCAB931C5D3AA000A46B47041F59D9724E517594F696D38F2CB45C987ACF68BB1057D8D518F926D8F55171F337D05354E0022BC66B23")
P.z = toFp("1")
co.a = toFp("0x9E8DBC4914E3C4F080592642DD0B08B9564AB3ADF75EE9B58A685443BA6E39A1ACD1201B7F034077AF344123880AF9D8C77575E6E782E00186881ECE8B87CA3")
co.c = toFp("1")
kern.x = toFp("0x594F77A49EABBF2A12025BC00E1DBC119CDA674B9FE8A00791724B42FEB7D225C4C9940B01B09B8F00B30B0E961212FB63E42614814E38EC9E5E5B0FEBF98C58")
kern.z = toFp("1")
for n := 0; n < b.N; n++ {
2020-05-14 00:48:43 +01:00
xIso(&P, &co, &kern, k)
cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh | TestName |Go | C | |------------------|----------|----------| |TestSharedSecret | 57.95774 | 57.91092 | |TestKeyGeneration | 62.23614 | 58.12980 | |TestSharedSecret | 55.28988 | 57.23132 | |TestKeyGeneration | 61.68745 | 58.66396 | |TestSharedSecret | 63.19408 | 58.64774 | |TestKeyGeneration | 62.34022 | 61.62539 | |TestSharedSecret | 62.85453 | 68.74503 | |TestKeyGeneration | 52.58518 | 58.40115 | |TestSharedSecret | 50.77081 | 61.91699 | |TestKeyGeneration | 59.91843 | 61.09266 | |TestSharedSecret | 59.97962 | 62.98151 | |TestKeyGeneration | 64.57525 | 56.22863 | |TestSharedSecret | 56.40521 | 55.77447 | |TestKeyGeneration | 67.85850 | 58.52604 | |TestSharedSecret | 60.54290 | 65.14052 | |TestKeyGeneration | 65.45766 | 58.42823 | On average Go implementation is 2% faster.
2019-11-24 03:39:35 +00:00
}
}