kris/nobs - nobs - GIT server of AmongBytes

kris/nobs

mirror of https://github.com/henrydcase/nobs.git synced 2024-11-25 00:21:29 +00:00

Author	SHA1	Message	Date
Jacob Appelbaum	20fffc2f35	add basic support for ppc64le, riscv64 (#48) This change set modifies build metadata to add support for ppc64le (POWER9) and riscv64 (RISC-V). The arm64 and amd64 assembler implementations are architecture specific and do not support ppc64le or riscv64; on those architectures a generic implementation is chosen. The drbg/internal/aes/cipher_noasm.go file was written by @mixmasala and myself. The csidh and sidh tests are extremely slow (>30m) on RISC-V using the sifive,u54-mc (HiFive Unleashed) development board. The top-level Makefile sets the test timeout to infinity on RISC-V because at least one test does not finish within the default 10 minutes. On RISC-V the csidh test finishes after around 30 minutes and the sidh test after around 71 minutes. These changes were tested with amd64 (Intel Core i7), arm64 (Raspberry Pi 4b), ppc64le (Talos POWER9, PowerNV T2P9D01 REV 1.00), and riscv64 (HiFive Unleashed, rv64imafdc,sifive,u54-mc). The kernel versions of those systems follow: Linux rpi4 5.13.0-1009-raspi #10-Ubuntu SMP PREEMPT Mon Oct 25 13:58:43 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux; Linux i7 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux; Linux power9 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:19:54 UTC 2021 ppc64le ppc64le ppc64le GNU/Linux; Linux risc-v-unleashed-000 5.11.0-1022-generic #23~20.04.1-Ubuntu SMP Thu Oct 21 10:16:27 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux	2023-03-13 23:12:45 +00:00
Kris Kwiatkowski	3a8ac85da1	two more benchmarks	2021-04-22 13:46:15 +01:00
Kris Kwiatkowski	c30f61923a	adds cycle count. fixes csidh which provides 128 not 512 bits of security	2020-08-25 11:21:11 +01:00
Kris Kwiatkowski	a02b9a77a0	mkem: add csidh	2020-08-25 11:19:07 +01:00
Kris Kwiatkowski	2500d74484	export more symbols from common	2020-05-16 22:37:41 +00:00
Henry Case	a152c09fd5	sike: move common (#33) * makes common reusable * exports some more symbols from common * remove kem for a moment	2020-05-16 20:14:48 +00:00
Henry Case	55957bbf5e	sike: move common (#32) * makes common reusable * exports some more symbols from common	2020-05-16 18:51:34 +00:00
Kris Kwiatkowski	ab962715d5	Fixes cSIDH key generation when run in a loop	2020-05-14 11:53:23 +00:00
Henry Case	bc32024729	sidh: updates (#31)	2020-05-14 08:51:20 +00:00
Kris Kwiatkowski	f5a7daf2bb	sidh: update to p434	2020-05-14 00:02:32 +00:00
Kris Kwiatkowski	91945fde1f	csidh: cosmetic updates	2020-05-13 23:48:43 +00:00
Kris Kwiatkowski	7efbbf4745	cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie-Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, the implementation converts a curve from Montgomery to Edwards form. All calculations are done on the Edwards curve and the result is then converted back to Montgomery form. * Because multiplication in the field Fp511 is the most expensive operation, the implementation contains multiple multiplication routines: the most performant is an assembly implementation that uses BMI2 and ADOX/ADCX instructions on modern CPUs, and a slower implementation runs on older CPUs. * Benchmarks (Intel Skylake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to the C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh \| TestName \|Go \| C \| \|------------------\|----------\|----------\| \|TestSharedSecret \| 57.95774 \| 57.91092 \| \|TestKeyGeneration \| 62.23614 \| 58.12980 \| \|TestSharedSecret \| 55.28988 \| 57.23132 \| \|TestKeyGeneration \| 61.68745 \| 58.66396 \| \|TestSharedSecret \| 63.19408 \| 58.64774 \| \|TestKeyGeneration \| 62.34022 \| 61.62539 \| \|TestSharedSecret \| 62.85453 \| 68.74503 \| \|TestKeyGeneration \| 52.58518 \| 58.40115 \| \|TestSharedSecret \| 50.77081 \| 61.91699 \| \|TestKeyGeneration \| 59.91843 \| 61.09266 \| \|TestSharedSecret \| 59.97962 \| 62.98151 \| \|TestKeyGeneration \| 64.57525 \| 56.22863 \| \|TestSharedSecret \| 56.40521 \| 55.77447 \| \|TestKeyGeneration \| 67.85850 \| 58.52604 \| \|TestSharedSecret \| 60.54290 \| 65.14052 \| \|TestKeyGeneration \| 65.45766 \| 58.42823 \| On average the Go implementation is 2% faster.	2019-11-25 15:03:29 +00:00
Kris Kwiatkowski	b184944242	Nits for SIDH	2019-04-09 17:09:34 +01:00
Kris Kwiatkowski	e66cc99401	Improves comment	2019-02-19 14:44:11 +00:00
Kris Kwiatkowski	90f8cba329	SIDH: Update (#9) * Change license to BSD-3 * SIDH: Multiple developments	2018-12-03 23:07:01 +00:00
Kris Kwiatkowski	ea2ffa2d61	PERF: sidh-p503: Split sub and add into 2 uops instead of 3 (#8) The performance improvement comes from the fact that on Skylake "add mem, reg" splits into 2 uops: one arithmetic uop and another one for loading a value from mem. However, changing the operand order to "add reg, mem" splits into 3 uops: one for the arithmetic op, one for the load and one additional uop for storing the result back. Using separate instructions for loading/storing helps to parallelize execution (load/store and arithmetic instructions are executed in parallel if possible). For details, see: https://www.agner.org/optimize/instruction_tables.pdf New: BenchmarkFp503StrongReduce-4 300000000 5.57 ns/op Old: BenchmarkFp503StrongReduce-4 200000000 8.60 ns/op This improves just one function, but more functions can be improved	2018-11-18 20:57:29 +00:00
Kris Kwiatkowski	e9ddb6fb45	sidh/csidh: use SSE for performing CSWAP (#6) * Makefile * makefile: tools for profiling * sidh: use SIMD for performing CSWAP Loads data into 128-bit XMM registers and performs a conditional swap. This is probably less useful for SIDH, but will be useful for cSIDH	2018-10-29 15:41:09 +00:00
Kris Kwiatkowski	1e34845d00	complete rewrite for SIDH and SIKE. adds p503 (#5)	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	d6fc82531f	Doc	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	b769c88767	Improves some comments and hardcodes precomputed value (#4) * Improves some comments and hardcodes precomputed value * Tests curve coefficients recovery	2018-10-25 15:22:28 +01:00
Henry D. Case	ddbd866ee5	additional comments	2018-07-31 20:21:32 +01:00
Henry D. Case	73c9938c59	Use ADCB instead of SBBL in checkLessThanThree238	2018-07-31 17:10:03 +01:00
Henry D. Case	105532aa09	sidh: move p751 implementation to p751 folder	2018-07-27 00:09:34 +01:00
Henry D. Case	a4d12ceaae	adds SIKE and SIDH	2018-07-23 23:18:38 +01:00

24 Commits