1
0
mirror of https://github.com/henrydcase/nobs.git synced 2024-11-22 15:18:57 +00:00
Commit Graph

3 Commits

Author SHA1 Message Date
Jacob Appelbaum
20fffc2f35
add basic support for ppc64le, riscv64 (#48)
This change set modifies build metadata to add support for ppc64le
(POWER9) and riscv64 (RISC-V).  The arm64 and amd64 assembler
implementations are architecture specific and do not support ppc64le or
riscv64. On ppc64le or riscv64 a generic implementation is chosen.  The
drbg/internal/aes/cipher_noasm.go file was written by @mixmasala and
myself.

The csidh and sidh tests are extremely slow (>30m) on RISC-V using the
sifive,u54-mc (HiFive Unleashed) development board. The test timeout is
set to infinity on RISC-V by the top level Makefile as at least one test
does not finish within the default 10 minutes on RISC-V. On RISC-V the
csidh test finishes after around 30 minutes, the sidh test finishes
after around 71 minutes.

These changes were tested with amd64 (Intel Core i7), arm64 (Raspberry
Pi 4b), ppc64le (Talos POWER9, PowerNV T2P9D01 REV 1.00), and riscv64
(HighFive Unleashed, rv64imafdc,sifive,u54-mc).

The kernel versions of those systems follows:

Linux rpi4 5.13.0-1009-raspi #10-Ubuntu SMP PREEMPT Mon Oct 25 13:58:43
UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

Linux i7 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021
x86_64 x86_64 x86_64 GNU/Linux

Linux power9 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:19:54 UTC
2021 ppc64le ppc64le ppc64le GNU/Linux

Linux risc-v-unleashed-000 5.11.0-1022-generic #23~20.04.1-Ubuntu SMP
Thu Oct 21 10:16:27 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux
2023-03-13 23:12:45 +00:00
91945fde1f csidh: cosmettic updates 2020-05-13 23:48:43 +00:00
7efbbf4745 cSIDH-511: (#26)
Implementation of Commutative Supersingular Isogeny Diffie Hellman,
based on "A faster way to CSIDH" paper (2018/782).

* For fast isogeny calculation, implementation converts a curve from
  Montgomery to Edwards. All calculations are done on Edwards curve
  and then converted back to Montgomery.
* As multiplication in a field Fp511 is most expensive operation
  the implementation contains multiple multiplications. It has
  most performant, assembly implementation which uses BMI2 and
  ADOX/ADCX instructions for modern CPUs. It also contains
  slower implementation which will run on older CPUs

* Benchmarks (Intel SkyLake):

  BenchmarkGeneratePrivate   	    6459	    172213 ns/op	       0 B/op	       0 allocs/op
  BenchmarkGenerateKeyPair   	      25	  45800356 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidate          	     297	   3915983 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidateRandom    	  184683	      6231 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidateGenerated 	      25	  48481306 ns/op	       0 B/op	       0 allocs/op
  BenchmarkDerive            	      19	  60928763 ns/op	       0 B/op	       0 allocs/op
  BenchmarkDeriveGenerated   	       8	 137342421 ns/op	       0 B/op	       0 allocs/op
  BenchmarkXMul              	    2311	    494267 ns/op	       1 B/op	       0 allocs/op
  BenchmarkXAdd              	 2396754	       501 ns/op	       0 B/op	       0 allocs/op
  BenchmarkXDbl              	 2072690	       571 ns/op	       0 B/op	       0 allocs/op
  BenchmarkIsom              	   78004	     15171 ns/op	       0 B/op	       0 allocs/op
  BenchmarkFp512Sub          	224635152	         5.33 ns/op	       0 B/op	       0 allocs/op
  BenchmarkFp512Mul          	246633255	         4.90 ns/op	       0 B/op	       0 allocs/op
  BenchmarkCSwap             	233228547	         5.10 ns/op	       0 B/op	       0 allocs/op
  BenchmarkAddRdc            	87348240	        12.6 ns/op	       0 B/op	       0 allocs/op
  BenchmarkSubRdc            	95112787	        11.7 ns/op	       0 B/op	       0 allocs/op
  BenchmarkModExpRdc         	   25436	     46878 ns/op	       0 B/op	       0 allocs/op
  BenchmarkMulBmiAsm         	19527573	        60.1 ns/op	       0 B/op	       0 allocs/op
  BenchmarkMulGeneric        	 7117650	       164 ns/op	       0 B/op	       0 allocs/op

* Go code has very similar performance when compared to C
  implementation.
  Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23)
  github.com:henrydcase/sidh-torture/csidh

  | TestName         |Go        | C        |
  |------------------|----------|----------|
  |TestSharedSecret  | 57.95774 | 57.91092 |
  |TestKeyGeneration | 62.23614 | 58.12980 |
  |TestSharedSecret  | 55.28988 | 57.23132 |
  |TestKeyGeneration | 61.68745 | 58.66396 |
  |TestSharedSecret  | 63.19408 | 58.64774 |
  |TestKeyGeneration | 62.34022 | 61.62539 |
  |TestSharedSecret  | 62.85453 | 68.74503 |
  |TestKeyGeneration | 52.58518 | 58.40115 |
  |TestSharedSecret  | 50.77081 | 61.91699 |
  |TestKeyGeneration | 59.91843 | 61.09266 |
  |TestSharedSecret  | 59.97962 | 62.98151 |
  |TestKeyGeneration | 64.57525 | 56.22863 |
  |TestSharedSecret  | 56.40521 | 55.77447 |
  |TestKeyGeneration | 67.85850 | 58.52604 |
  |TestSharedSecret  | 60.54290 | 65.14052 |
  |TestKeyGeneration | 65.45766 | 58.42823 |

  On average Go implementation is 2% faster.
2019-11-25 15:03:29 +00:00