1
0
mirror of https://github.com/henrydcase/nobs.git synced 2024-11-25 00:21:29 +00:00
Commit Graph

15 Commits

Author SHA1 Message Date
Jacob Appelbaum
20fffc2f35
add basic support for ppc64le, riscv64 (#48)
This change set modifies build metadata to add support for ppc64le
(POWER9) and riscv64 (RISC-V).  The arm64 and amd64 assembler
implementations are architecture specific and do not support ppc64le or
riscv64. On ppc64le or riscv64 a generic implementation is chosen.  The
drbg/internal/aes/cipher_noasm.go file was written by @mixmasala and
myself.

The csidh and sidh tests are extremely slow (>30m) on RISC-V using the
sifive,u54-mc (HiFive Unleashed) development board. The test timeout is
set to infinity on RISC-V by the top level Makefile as at least one test
does not finish within the default 10 minutes on RISC-V. On RISC-V the
csidh test finishes after around 30 minutes, the sidh test finishes
after around 71 minutes.

These changes were tested with amd64 (Intel Core i7), arm64 (Raspberry
Pi 4b), ppc64le (Talos POWER9, PowerNV T2P9D01 REV 1.00), and riscv64
(HighFive Unleashed, rv64imafdc,sifive,u54-mc).

The kernel versions of those systems follows:

Linux rpi4 5.13.0-1009-raspi #10-Ubuntu SMP PREEMPT Mon Oct 25 13:58:43
UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

Linux i7 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021
x86_64 x86_64 x86_64 GNU/Linux

Linux power9 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:19:54 UTC
2021 ppc64le ppc64le ppc64le GNU/Linux

Linux risc-v-unleashed-000 5.11.0-1022-generic #23~20.04.1-Ubuntu SMP
Thu Oct 21 10:16:27 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux
2023-03-13 23:12:45 +00:00
bc32024729
sidh: updates (#31) 2020-05-14 08:51:20 +00:00
7efbbf4745 cSIDH-511: (#26)
Implementation of Commutative Supersingular Isogeny Diffie Hellman,
based on "A faster way to CSIDH" paper (2018/782).

* For fast isogeny calculation, implementation converts a curve from
  Montgomery to Edwards. All calculations are done on Edwards curve
  and then converted back to Montgomery.
* As multiplication in a field Fp511 is most expensive operation
  the implementation contains multiple multiplications. It has
  most performant, assembly implementation which uses BMI2 and
  ADOX/ADCX instructions for modern CPUs. It also contains
  slower implementation which will run on older CPUs

* Benchmarks (Intel SkyLake):

  BenchmarkGeneratePrivate   	    6459	    172213 ns/op	       0 B/op	       0 allocs/op
  BenchmarkGenerateKeyPair   	      25	  45800356 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidate          	     297	   3915983 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidateRandom    	  184683	      6231 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidateGenerated 	      25	  48481306 ns/op	       0 B/op	       0 allocs/op
  BenchmarkDerive            	      19	  60928763 ns/op	       0 B/op	       0 allocs/op
  BenchmarkDeriveGenerated   	       8	 137342421 ns/op	       0 B/op	       0 allocs/op
  BenchmarkXMul              	    2311	    494267 ns/op	       1 B/op	       0 allocs/op
  BenchmarkXAdd              	 2396754	       501 ns/op	       0 B/op	       0 allocs/op
  BenchmarkXDbl              	 2072690	       571 ns/op	       0 B/op	       0 allocs/op
  BenchmarkIsom              	   78004	     15171 ns/op	       0 B/op	       0 allocs/op
  BenchmarkFp512Sub          	224635152	         5.33 ns/op	       0 B/op	       0 allocs/op
  BenchmarkFp512Mul          	246633255	         4.90 ns/op	       0 B/op	       0 allocs/op
  BenchmarkCSwap             	233228547	         5.10 ns/op	       0 B/op	       0 allocs/op
  BenchmarkAddRdc            	87348240	        12.6 ns/op	       0 B/op	       0 allocs/op
  BenchmarkSubRdc            	95112787	        11.7 ns/op	       0 B/op	       0 allocs/op
  BenchmarkModExpRdc         	   25436	     46878 ns/op	       0 B/op	       0 allocs/op
  BenchmarkMulBmiAsm         	19527573	        60.1 ns/op	       0 B/op	       0 allocs/op
  BenchmarkMulGeneric        	 7117650	       164 ns/op	       0 B/op	       0 allocs/op

* Go code has very similar performance when compared to C
  implementation.
  Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23)
  github.com:henrydcase/sidh-torture/csidh

  | TestName         |Go        | C        |
  |------------------|----------|----------|
  |TestSharedSecret  | 57.95774 | 57.91092 |
  |TestKeyGeneration | 62.23614 | 58.12980 |
  |TestSharedSecret  | 55.28988 | 57.23132 |
  |TestKeyGeneration | 61.68745 | 58.66396 |
  |TestSharedSecret  | 63.19408 | 58.64774 |
  |TestKeyGeneration | 62.34022 | 61.62539 |
  |TestSharedSecret  | 62.85453 | 68.74503 |
  |TestKeyGeneration | 52.58518 | 58.40115 |
  |TestSharedSecret  | 50.77081 | 61.91699 |
  |TestKeyGeneration | 59.91843 | 61.09266 |
  |TestSharedSecret  | 59.97962 | 62.98151 |
  |TestKeyGeneration | 64.57525 | 56.22863 |
  |TestSharedSecret  | 56.40521 | 55.77447 |
  |TestKeyGeneration | 67.85850 | 58.52604 |
  |TestSharedSecret  | 60.54290 | 65.14052 |
  |TestKeyGeneration | 65.45766 | 58.42823 |

  On average Go implementation is 2% faster.
2019-11-25 15:03:29 +00:00
7298b650cc
Adds go.mod (#21)
* Reset Makefile after adding go.mod
* Remove ``build`` directory
* Simiplifies makefile
* shake: Make xorIn copyOut platform specific
2019-05-15 18:03:35 +01:00
71624cdc4c Improvements to makefile 2019-04-09 17:30:30 +01:00
08f7315b64 DRBG: Speed improvements
* CTR-DRBG doesn't call "NewCipher" for block encryption
* Changes API of CTR-DRBG, so that read operation implementes io.Reader

Benchmark results:
----------------------
benchmark           old ns/op     new ns/op     delta
BenchmarkInit-4     1118          3579          +220.13%
BenchmarkRead-4     5343          14589         +173.05%

benchmark           old allocs     new allocs     delta
BenchmarkInit-4     15             0              -100.00%
BenchmarkRead-4     67             0              -100.00%

benchmark           old bytes     new bytes     delta
BenchmarkInit-4     1824          0             -100.00%
BenchmarkRead-4     9488          0             -100.00%
2019-04-09 14:37:59 +01:00
90f8cba329
SIDH: Update (#9)
* Change license to BSD-3

* SIDH: Multiple developlemnts
2018-12-03 23:07:01 +00:00
e9ddb6fb45
sidh/csidh: use SEE for performing CSWAP (#6)
* Makefile

* makefile: tools for profiling

* sidh: use SIMD for performing CSWAP

Loads data into 128-bit XMM registers and performs conditional swap.
This is probably less useful for SIDH, but will be useful for cSIDH
2018-10-29 15:41:09 +00:00
51688dc4bb makefile: adds bench target 2018-10-25 15:18:54 +01:00
2ff456da90 Temporarily adds simple x448 implementation 2018-08-02 23:45:28 +01:00
dc58ebcd23 makefile formatting 2018-07-31 19:14:49 +01:00
2a25a09b4a improves makefile 2018-07-31 18:20:27 +01:00
34805fc1fb Improves Makefile 2018-07-31 18:00:55 +01:00
2fc873ca64 creates package ready to move to tls-tris 2018-07-27 00:38:21 +01:00
8cf7cfdc8d SM3 and cSHAKE 2018-06-23 16:34:45 +01:00