kris/nobs - nobs - GIT server of AmongBytes

kris/nobs

Fork 0

mirror of https://github.com/henrydcase/nobs.git synced 2024-11-22 15:18:57 +00:00

Commit Graph

Author	SHA1	Message	Date
Jacob Appelbaum	20fffc2f35	add basic support for ppc64le, riscv64 (#48 ) This change set modifies build metadata to add support for ppc64le (POWER9) and riscv64 (RISC-V). The arm64 and amd64 assembler implementations are architecture specific and do not support ppc64le or riscv64. On ppc64le or riscv64 a generic implementation is chosen. The drbg/internal/aes/cipher_noasm.go file was written by @mixmasala and myself. The csidh and sidh tests are extremely slow (>30m) on RISC-V using the sifive,u54-mc (HiFive Unleashed) development board. The test timeout is set to infinity on RISC-V by the top level Makefile as at least one test does not finish within the default 10 minutes on RISC-V. On RISC-V the csidh test finishes after around 30 minutes, the sidh test finishes after around 71 minutes. These changes were tested with amd64 (Intel Core i7), arm64 (Raspberry Pi 4b), ppc64le (Talos POWER9, PowerNV T2P9D01 REV 1.00), and riscv64 (HighFive Unleashed, rv64imafdc,sifive,u54-mc). The kernel versions of those systems follows: Linux rpi4 5.13.0-1009-raspi #10-Ubuntu SMP PREEMPT Mon Oct 25 13:58:43 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux Linux i7 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux Linux power9 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:19:54 UTC 2021 ppc64le ppc64le ppc64le GNU/Linux Linux risc-v-unleashed-000 5.11.0-1022-generic #23~20.04.1-Ubuntu SMP Thu Oct 21 10:16:27 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux	2023-03-13 23:12:45 +00:00
Kris Kwiatkowski	91945fde1f	csidh: cosmettic updates	2020-05-13 23:48:43 +00:00
Kris Kwiatkowski	7efbbf4745	cSIDH-511: (#26 ) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is most expensive operation the implementation contains multiple multiplications. It has most performant, assembly implementation which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation. Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh \| TestName \|Go \| C \| \|------------------\|----------\|----------\| \|TestSharedSecret \| 57.95774 \| 57.91092 \| \|TestKeyGeneration \| 62.23614 \| 58.12980 \| \|TestSharedSecret \| 55.28988 \| 57.23132 \| \|TestKeyGeneration \| 61.68745 \| 58.66396 \| \|TestSharedSecret \| 63.19408 \| 58.64774 \| \|TestKeyGeneration \| 62.34022 \| 61.62539 \| \|TestSharedSecret \| 62.85453 \| 68.74503 \| \|TestKeyGeneration \| 52.58518 \| 58.40115 \| \|TestSharedSecret \| 50.77081 \| 61.91699 \| \|TestKeyGeneration \| 59.91843 \| 61.09266 \| \|TestSharedSecret \| 59.97962 \| 62.98151 \| \|TestKeyGeneration \| 64.57525 \| 56.22863 \| \|TestSharedSecret \| 56.40521 \| 55.77447 \| \|TestKeyGeneration \| 67.85850 \| 58.52604 \| \|TestSharedSecret \| 60.54290 \| 65.14052 \| \|TestKeyGeneration \| 65.45766 \| 58.42823 \| On average Go implementation is 2% faster.	2019-11-25 15:03:29 +00:00

Author

SHA1

Message

Date

Jacob Appelbaum

20fffc2f35

add basic support for ppc64le, riscv64 (#48 )

This change set modifies build metadata to add support for ppc64le
(POWER9) and riscv64 (RISC-V).  The arm64 and amd64 assembler
implementations are architecture specific and do not support ppc64le or
riscv64. On ppc64le or riscv64 a generic implementation is chosen.  The
drbg/internal/aes/cipher_noasm.go file was written by @mixmasala and
myself.

The csidh and sidh tests are extremely slow (>30m) on RISC-V using the
sifive,u54-mc (HiFive Unleashed) development board. The test timeout is
set to infinity on RISC-V by the top level Makefile as at least one test
does not finish within the default 10 minutes on RISC-V. On RISC-V the
csidh test finishes after around 30 minutes, the sidh test finishes
after around 71 minutes.

These changes were tested with amd64 (Intel Core i7), arm64 (Raspberry
Pi 4b), ppc64le (Talos POWER9, PowerNV T2P9D01 REV 1.00), and riscv64
(HighFive Unleashed, rv64imafdc,sifive,u54-mc).

The kernel versions of those systems follows:

Linux rpi4 5.13.0-1009-raspi #10-Ubuntu SMP PREEMPT Mon Oct 25 13:58:43
UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

Linux i7 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021
x86_64 x86_64 x86_64 GNU/Linux

Linux power9 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:19:54 UTC
2021 ppc64le ppc64le ppc64le GNU/Linux

Linux risc-v-unleashed-000 5.11.0-1022-generic #23~20.04.1-Ubuntu SMP
Thu Oct 21 10:16:27 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux

2023-03-13 23:12:45 +00:00

Kris Kwiatkowski

91945fde1f

csidh: cosmettic updates

2020-05-13 23:48:43 +00:00

Kris Kwiatkowski

7efbbf4745

cSIDH-511: (#26 )

Implementation of Commutative Supersingular Isogeny Diffie Hellman,
based on "A faster way to CSIDH" paper (2018/782).

* For fast isogeny calculation, implementation converts a curve from
  Montgomery to Edwards. All calculations are done on Edwards curve
  and then converted back to Montgomery.
* As multiplication in a field Fp511 is most expensive operation
  the implementation contains multiple multiplications. It has
  most performant, assembly implementation which uses BMI2 and
  ADOX/ADCX instructions for modern CPUs. It also contains
  slower implementation which will run on older CPUs

* Benchmarks (Intel SkyLake):

  BenchmarkGeneratePrivate   	    6459	    172213 ns/op	       0 B/op	       0 allocs/op
  BenchmarkGenerateKeyPair   	      25	  45800356 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidate          	     297	   3915983 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidateRandom    	  184683	      6231 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidateGenerated 	      25	  48481306 ns/op	       0 B/op	       0 allocs/op
  BenchmarkDerive            	      19	  60928763 ns/op	       0 B/op	       0 allocs/op
  BenchmarkDeriveGenerated   	       8	 137342421 ns/op	       0 B/op	       0 allocs/op
  BenchmarkXMul              	    2311	    494267 ns/op	       1 B/op	       0 allocs/op
  BenchmarkXAdd              	 2396754	       501 ns/op	       0 B/op	       0 allocs/op
  BenchmarkXDbl              	 2072690	       571 ns/op	       0 B/op	       0 allocs/op
  BenchmarkIsom              	   78004	     15171 ns/op	       0 B/op	       0 allocs/op
  BenchmarkFp512Sub          	224635152	         5.33 ns/op	       0 B/op	       0 allocs/op
  BenchmarkFp512Mul          	246633255	         4.90 ns/op	       0 B/op	       0 allocs/op
  BenchmarkCSwap             	233228547	         5.10 ns/op	       0 B/op	       0 allocs/op
  BenchmarkAddRdc            	87348240	        12.6 ns/op	       0 B/op	       0 allocs/op
  BenchmarkSubRdc            	95112787	        11.7 ns/op	       0 B/op	       0 allocs/op
  BenchmarkModExpRdc         	   25436	     46878 ns/op	       0 B/op	       0 allocs/op
  BenchmarkMulBmiAsm         	19527573	        60.1 ns/op	       0 B/op	       0 allocs/op
  BenchmarkMulGeneric        	 7117650	       164 ns/op	       0 B/op	       0 allocs/op

* Go code has very similar performance when compared to C
  implementation.
  Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23)
  github.com:henrydcase/sidh-torture/csidh

  | TestName         |Go        | C        |
  |------------------|----------|----------|
  |TestSharedSecret  | 57.95774 | 57.91092 |
  |TestKeyGeneration | 62.23614 | 58.12980 |
  |TestSharedSecret  | 55.28988 | 57.23132 |
  |TestKeyGeneration | 61.68745 | 58.66396 |
  |TestSharedSecret  | 63.19408 | 58.64774 |
  |TestKeyGeneration | 62.34022 | 61.62539 |
  |TestSharedSecret  | 62.85453 | 68.74503 |
  |TestKeyGeneration | 52.58518 | 58.40115 |
  |TestSharedSecret  | 50.77081 | 61.91699 |
  |TestKeyGeneration | 59.91843 | 61.09266 |
  |TestSharedSecret  | 59.97962 | 62.98151 |
  |TestKeyGeneration | 64.57525 | 56.22863 |
  |TestSharedSecret  | 56.40521 | 55.77447 |
  |TestKeyGeneration | 67.85850 | 58.52604 |
  |TestSharedSecret  | 60.54290 | 65.14052 |
  |TestKeyGeneration | 65.45766 | 58.42823 |

  On average Go implementation is 2% faster.

2019-11-25 15:03:29 +00:00

3 Commits