kris/nobs - nobs - GIT server of AmongBytes


Mirror of https://github.com/henrydcase/nobs.git, last synced 2024-11-29 10:21:23 +00:00

Author	SHA1	Message	Date
Kris Kwiatkowski	f5a7daf2bb	sidh: update to p434	2020-05-14 00:02:32 +00:00
Kris Kwiatkowski	91945fde1f	csidh: cosmetic updates	2020-05-13 23:48:43 +00:00
Kris Kwiatkowski	7efbbf4745	cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie-Hellman, based on the "A faster way to CSIDH" paper (2018/782).
* For fast isogeny calculation, the implementation converts a curve from Montgomery to Edwards form. All calculations are done on the Edwards curve, and the result is then converted back to Montgomery form.
* As multiplication in the field Fp511 is the most expensive operation, the code contains several implementations of it. The most performant one is written in assembly and uses the BMI2 and ADOX/ADCX instructions of modern CPUs; a slower implementation runs on older CPUs.
* Benchmarks (Intel Skylake):
BenchmarkGeneratePrivate          6459       172213 ns/op    0 B/op    0 allocs/op
BenchmarkGenerateKeyPair            25     45800356 ns/op    0 B/op    0 allocs/op
BenchmarkValidate                  297      3915983 ns/op    0 B/op    0 allocs/op
BenchmarkValidateRandom         184683         6231 ns/op    0 B/op    0 allocs/op
BenchmarkValidateGenerated          25     48481306 ns/op    0 B/op    0 allocs/op
BenchmarkDerive                     19     60928763 ns/op    0 B/op    0 allocs/op
BenchmarkDeriveGenerated             8    137342421 ns/op    0 B/op    0 allocs/op
BenchmarkXMul                     2311       494267 ns/op    1 B/op    0 allocs/op
BenchmarkXAdd                  2396754          501 ns/op    0 B/op    0 allocs/op
BenchmarkXDbl                  2072690          571 ns/op    0 B/op    0 allocs/op
BenchmarkIsom                    78004        15171 ns/op    0 B/op    0 allocs/op
BenchmarkFp512Sub            224635152         5.33 ns/op    0 B/op    0 allocs/op
BenchmarkFp512Mul            246633255         4.90 ns/op    0 B/op    0 allocs/op
BenchmarkCSwap               233228547         5.10 ns/op    0 B/op    0 allocs/op
BenchmarkAddRdc               87348240         12.6 ns/op    0 B/op    0 allocs/op
BenchmarkSubRdc               95112787         11.7 ns/op    0 B/op    0 allocs/op
BenchmarkModExpRdc               25436        46878 ns/op    0 B/op    0 allocs/op
BenchmarkMulBmiAsm            19527573         60.1 ns/op    0 B/op    0 allocs/op
BenchmarkMulGeneric            7117650          164 ns/op    0 B/op    0 allocs/op
* The Go code has performance very similar to the C implementation.
Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh:
| TestName          | Go       | C        |
|-------------------|----------|----------|
| TestSharedSecret  | 57.95774 | 57.91092 |
| TestKeyGeneration | 62.23614 | 58.12980 |
| TestSharedSecret  | 55.28988 | 57.23132 |
| TestKeyGeneration | 61.68745 | 58.66396 |
| TestSharedSecret  | 63.19408 | 58.64774 |
| TestKeyGeneration | 62.34022 | 61.62539 |
| TestSharedSecret  | 62.85453 | 68.74503 |
| TestKeyGeneration | 52.58518 | 58.40115 |
| TestSharedSecret  | 50.77081 | 61.91699 |
| TestKeyGeneration | 59.91843 | 61.09266 |
| TestSharedSecret  | 59.97962 | 62.98151 |
| TestKeyGeneration | 64.57525 | 56.22863 |
| TestSharedSecret  | 56.40521 | 55.77447 |
| TestKeyGeneration | 67.85850 | 58.52604 |
| TestSharedSecret  | 60.54290 | 65.14052 |
| TestKeyGeneration | 65.45766 | 58.42823 |
On average, the Go implementation is 2% faster.	2019-11-25 15:03:29 +00:00
Kris Kwiatkowski	b184944242	Nits for SIDH	2019-04-09 17:09:34 +01:00
Kris Kwiatkowski	e66cc99401	Improves comment	2019-02-19 14:44:11 +00:00
Kris Kwiatkowski	90f8cba329	SIDH: Update (#9) * Change license to BSD-3 * SIDH: Multiple developments	2018-12-03 23:07:01 +00:00
Kris Kwiatkowski	ea2ffa2d61	PERF: sidh-p503: Split sub and add into 2 uops instead of 3 (#8) The performance improvement comes from the fact that on Skylake "add mem, reg" splits into 2 uops: one arithmetic uop and one for loading a value from memory. However, changing the operand order to "add reg, mem" splits it into 3 uops: one for the arithmetic operation, one for the load, and an additional one for storing the result back. Using separate instructions for loading and storing helps to parallelize execution (load/store and arithmetic instructions are executed in parallel where possible). For details, see Agner Fog's instruction tables: https://www.agner.org/optimize/instruction_tables.pdf Benchmarks, new: BenchmarkFp503StrongReduce-4 300000000 5.57 ns/op; old: BenchmarkFp503StrongReduce-4 200000000 8.60 ns/op. This improves only one function, but more functions can be improved in the same way.	2018-11-18 20:57:29 +00:00
Kris Kwiatkowski	e9ddb6fb45	sidh/csidh: use SSE for performing CSWAP (#6) * Makefile * makefile: tools for profiling * sidh: use SIMD for performing CSWAP: loads data into 128-bit XMM registers and performs a conditional swap. This is probably less useful for SIDH, but will be useful for cSIDH.	2018-10-29 15:41:09 +00:00
Kris Kwiatkowski	1e34845d00	Complete rewrite for SIDH and SIKE; adds p503 (#5)	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	d6fc82531f	Doc	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	b769c88767	Improves some comments and hardcodes precomputed value (#4) * Improves some comments and hardcodes precomputed value * Tests curve coefficients recovery	2018-10-25 15:22:28 +01:00
Henry D. Case	ddbd866ee5	additional comments	2018-07-31 20:21:32 +01:00
Henry D. Case	73c9938c59	Use ADCB instead of SBBL in checkLessThanThree238	2018-07-31 17:10:03 +01:00
Henry D. Case	105532aa09	sidh: move p751 implementation to p751 folder	2018-07-27 00:09:34 +01:00
Henry D. Case	a4d12ceaae	adds SIKE and SIDH	2018-07-23 23:18:38 +01:00

15 Commits
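The most recent commit (f5a7daf2bb) moves SIDH to p434, and earlier commits mention p503 and p751. These names refer to the bit lengths of SIDH/SIKE primes of the form p = 2^eA · 3^eB − 1. The exponents used here are the standard SIKE parameters (216/137, 250/159, 372/239); they are not read from this repository. A minimal Go sketch, using only math/big, that rebuilds the three primes and checks their sizes:

```go
package main

import (
	"fmt"
	"math/big"
)

// sidhPrime returns p = 2^e2 * 3^e3 - 1, the prime shape used by SIDH/SIKE.
func sidhPrime(e2, e3 uint) *big.Int {
	p := new(big.Int).Lsh(big.NewInt(1), e2) // 2^e2
	pow3 := new(big.Int).Exp(big.NewInt(3), big.NewInt(int64(e3)), nil)
	p.Mul(p, pow3)                 // 2^e2 * 3^e3
	return p.Sub(p, big.NewInt(1)) // 2^e2 * 3^e3 - 1
}

func main() {
	params := []struct {
		name   string
		e2, e3 uint
	}{
		{"p434", 216, 137},
		{"p503", 250, 159},
		{"p751", 372, 239},
	}
	for _, prm := range params {
		p := sidhPrime(prm.e2, prm.e3)
		fmt.Printf("%s: %d bits, probable prime: %t\n", prm.name, p.BitLen(), p.ProbablyPrime(20))
	}
}
```

Each line should report the bit length matching the parameter name and `probable prime: true`.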
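The PERF commit (ea2ffa2d61) changes instruction selection inside assembly, which Go source cannot express. For context only: BenchmarkFp503StrongReduce measures a strong reduction, and assuming that name means the usual operation of mapping a value in [0, 2p) to the canonical range [0, p), a constant-time pure-Go sketch looks like this. The function name, limb layout and toy modulus are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// strongReduce maps x in [0, 2p) to x mod p in [0, p) without data-dependent branches.
// x and p are little-endian 64-bit limbs of equal length; x is updated in place.
func strongReduce(x, p []uint64) {
	// x = x - p, remembering whether the subtraction borrowed (x < p).
	var borrow uint64
	for i := range x {
		x[i], borrow = bits.Sub64(x[i], p[i], borrow)
	}
	// mask is all ones if x was below p, all zeros otherwise.
	mask := -borrow
	// Add p back only when mask is set; the final carry out is discarded.
	var carry uint64
	for i := range x {
		x[i], carry = bits.Add64(x[i], p[i]&mask, carry)
	}
}

func main() {
	p := []uint64{5, 1} // p = 2^64 + 5 (toy modulus)
	x := []uint64{7, 1} // x = 2^64 + 7, which is >= p
	strongReduce(x, p)
	fmt.Println(x) // [2 0]
	y := []uint64{3, 1} // y = 2^64 + 3, which is < p
	strongReduce(y, p)
	fmt.Println(y) // [3 1]
}
```

The borrow from the first pass becomes a mask that selects either p or 0 for the second pass, so the same instructions run regardless of the value being reduced.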
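The CSWAP commit (e9ddb6fb45) performs the conditional swap in 128-bit XMM registers. The same masking idea can be shown one 64-bit limb at a time in plain Go. A minimal scalar sketch; it is not the repository's SIMD code, and the function name is hypothetical:

```go
package main

import "fmt"

// cswap swaps the limbs of x and y when choice == 1 and leaves them unchanged when
// choice == 0, without branching on choice. choice must be exactly 0 or 1.
func cswap(x, y []uint64, choice uint64) {
	mask := -choice // 0 -> all zero bits, 1 -> all one bits
	for i := range x {
		t := mask & (x[i] ^ y[i]) // differing bits, or 0 when not swapping
		x[i] ^= t
		y[i] ^= t
	}
}

func main() {
	a := []uint64{1, 2, 3, 4}
	b := []uint64{5, 6, 7, 8}
	cswap(a, b, 0)
	fmt.Println(a, b) // [1 2 3 4] [5 6 7 8]
	cswap(a, b, 1)
	fmt.Println(a, b) // [5 6 7 8] [1 2 3 4]
}
```

Because the mask is derived arithmetically from `choice`, the same instructions execute whether or not the swap happens, which is what Montgomery-ladder style code needs to avoid leaking secret bits through timing.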