mirror of https://github.com/henrydcase/nobs.git synced 2024-11-22 15:18:57 +00:00
Commit Graph

82 Commits

Author SHA1 Message Date
dependabot[bot]
25b66236df
Bump golang.org/x/sys in /kem/mkem (#50)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.0.0-20191120155948-bd437916bb0e to 0.1.0.
- [Release notes](https://github.com/golang/sys/releases)
- [Commits](https://github.com/golang/sys/commits/v0.1.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-13 23:15:16 +00:00
dependabot[bot]
a7142b7412
Bump golang.org/x/sys from 0.0.0-20191120155948-bd437916bb0e to 0.1.0 (#49)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.0.0-20191120155948-bd437916bb0e to 0.1.0.
- [Release notes](https://github.com/golang/sys/releases)
- [Commits](https://github.com/golang/sys/commits/v0.1.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sys
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-13 23:13:08 +00:00
Jacob Appelbaum
20fffc2f35
add basic support for ppc64le, riscv64 (#48)
This change set modifies build metadata to add support for ppc64le
(POWER9) and riscv64 (RISC-V).  The arm64 and amd64 assembler
implementations are architecture specific and do not support ppc64le or
riscv64. On ppc64le or riscv64 a generic implementation is chosen.  The
drbg/internal/aes/cipher_noasm.go file was written by @mixmasala and
myself.

The csidh and sidh tests are extremely slow (>30m) on RISC-V using the
sifive,u54-mc (HiFive Unleashed) development board. The test timeout is
set to infinity on RISC-V by the top level Makefile as at least one test
does not finish within the default 10 minutes on RISC-V. On RISC-V the
csidh test finishes after around 30 minutes, the sidh test finishes
after around 71 minutes.

These changes were tested with amd64 (Intel Core i7), arm64 (Raspberry
Pi 4b), ppc64le (Talos POWER9, PowerNV T2P9D01 REV 1.00), and riscv64
(HiFive Unleashed, rv64imafdc,sifive,u54-mc).

The kernel versions of those systems follow:

Linux rpi4 5.13.0-1009-raspi #10-Ubuntu SMP PREEMPT Mon Oct 25 13:58:43
UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

Linux i7 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021
x86_64 x86_64 x86_64 GNU/Linux

Linux power9 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:19:54 UTC
2021 ppc64le ppc64le ppc64le GNU/Linux

Linux risc-v-unleashed-000 5.11.0-1022-generic #23~20.04.1-Ubuntu SMP
Thu Oct 21 10:16:27 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux
2023-03-13 23:12:45 +00:00
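The fallback described in 20fffc2f35 (assembler on amd64 and arm64, a generic implementation everywhere else) can be illustrated with a small sketch. It assumes the choice is keyed on the Go architecture name. The repository presumably makes this choice at compile time through build constraints rather than at run time, and `implFor` is a hypothetical helper that is not part of nobs.

```go
package main

import (
	"fmt"
	"runtime"
)

// implFor is a hypothetical helper (not part of nobs). It mirrors the rule
// stated in the commit message: only amd64 and arm64 have assembler
// implementations, every other architecture uses the generic code.
func implFor(goarch string) string {
	switch goarch {
	case "amd64", "arm64":
		return "asm"
	default:
		return "generic"
	}
}

func main() {
	for _, arch := range []string{"amd64", "arm64", "ppc64le", "riscv64"} {
		fmt.Printf("%-8s -> %s\n", arch, implFor(arch))
	}
	// runtime.GOARCH reports the architecture the binary was built for.
	fmt.Printf("this build (%s) -> %s\n", runtime.GOARCH, implFor(runtime.GOARCH))
}
```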
3a8ac85da1 two more benchmarks 2021-04-22 13:46:15 +01:00
cf196e90f2
Update README.md 2021-04-09 09:14:58 +01:00
73be4271a4
Update ctr_drbg.go 2021-04-09 07:45:00 +01:00
55bb2ea182
Update README.md 2021-04-02 17:34:03 +01:00
1e84ed09cc
Update README.md 2021-04-02 17:33:34 +01:00
9ddbe424a3 Edit README.md 2021-03-11 23:03:50 +00:00
7c32db8dd7 sm3: use fewer operations for ff1 and gg1 2021-03-08 23:58:08 +00:00
8474981cfc
SHA-3: speedups (#47)
* add function for one-off calculation

* sha3: simplifies Read function

* sha3: remove if from Read
2020-10-03 23:27:08 +01:00
45bc1a75f6
add function for one-off calculation (#45) 2020-10-03 15:12:26 +01:00
adfaf1e58c
fix: ebx -> ecx (#46) 2020-10-03 15:11:52 +01:00
24408329a5 Use bits.RotateLeft64 whenever possible 2020-09-28 21:03:08 +01:00
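As a minimal sketch of what 24408329a5 refers to, the snippet below compares a hand-written rotate with `bits.RotateLeft64` from the standard `math/bits` package. `rotlManual` is a hypothetical stand-in for the kind of shift-and-or code such a change replaces; it is not taken from the repository.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotlManual rotates x left by k bits using two shifts and an OR.
// Hypothetical example code; intended for k in 1..63.
func rotlManual(x uint64, k uint) uint64 {
	return x<<k | x>>(64-k)
}

func main() {
	x := uint64(0x8000000000000001)
	// The top bit wraps around to bit 0 and bit 0 moves to bit 1.
	fmt.Printf("%#x %#x\n", rotlManual(x, 1), bits.RotateLeft64(x, 1)) // prints "0x3 0x3"
}
```

On architectures with a rotate instruction the Go compiler can typically lower `bits.RotateLeft64` to that single instruction, which is the usual motivation for preferring it.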
0174e314a1
Update README.md 2020-08-29 02:13:24 +01:00
820906b7c7
sha3: optimizations and cleanup (#41)
* complete reset of the SHA-3 code. Affects mostly the code in sha3.go
* fixes a bug which causes the SHAKE implementation to crash
* implementation of Read()/Write() avoids unnecessary buffering as much
  as possible
* NOTE: at some point I wrote a separate implementation for the SumXXX
  functions, but after optimizing the implementation of Read/Write/Sum,
  the gain wasn't that big

Current speed on i7-8665U@1.90

BenchmarkPermutationFunction             	 1592787	       736 ns/op	 271.90 MB/s	       0 B/op	       0 allocs/op
BenchmarkSha3Chunk_x01/SHA-3/224         	   98752	     11630 ns/op	 176.02 MB/s	       0 B/op	       0 allocs/op
BenchmarkSha3Chunk_x01/SHA-3/256         	   92508	     12447 ns/op	 164.46 MB/s	       0 B/op	       0 allocs/op
BenchmarkSha3Chunk_x01/SHA-3/384         	   76765	     15206 ns/op	 134.62 MB/s	       0 B/op	       0 allocs/op
BenchmarkSha3Chunk_x01/SHA-3/512         	   54333	     21932 ns/op	  93.33 MB/s	       0 B/op	       0 allocs/op
BenchmarkSha3Chunk_x16/SHA-3/224         	   10000	    102161 ns/op	 160.37 MB/s	       0 B/op	       0 allocs/op
BenchmarkSha3Chunk_x16/SHA-3/256         	   10000	    106531 ns/op	 153.80 MB/s	       0 B/op	       0 allocs/op
BenchmarkSha3Chunk_x16/SHA-3/384         	    8641	    137272 ns/op	 119.35 MB/s	       0 B/op	       0 allocs/op
BenchmarkSha3Chunk_x16/SHA-3/512         	    6340	    189124 ns/op	  86.63 MB/s	       0 B/op	       0 allocs/op
BenchmarkShake_x01/SHAKE-128             	  167062	      7149 ns/op	 188.83 MB/s	       0 B/op	       0 allocs/op
BenchmarkShake_x01/SHAKE-256             	  151982	      7748 ns/op	 174.24 MB/s	       0 B/op	       0 allocs/op
BenchmarkShake_x16/SHAKE-128             	   12963	     87770 ns/op	 186.67 MB/s	       0 B/op	       0 allocs/op
BenchmarkShake_x16/SHAKE-256             	   10000	    105554 ns/op	 155.22 MB/s	       0 B/op	       0 allocs/op
BenchmarkCShake/cSHAKE-128               	  109148	     10940 ns/op	 187.11 MB/s	       0 B/op	       0 allocs/op
BenchmarkCShake/cSHAKE-256               	   90324	     13211 ns/op	 154.94 MB/s	       0 B/op	       0 allocs/op
PASS
2020-08-29 02:12:49 +01:00
3a320e1714
Create FUNDING.yml 2020-08-28 23:32:08 +01:00
7dcb72bf74 remove shake 2020-08-25 22:39:56 +01:00
ffd7590213 improve comment
Initial speed on i7-8665U

> go test -bench=. -test.cpu=1
goos: linux
goarch: amd64
pkg: github.com/henrydcase/nobs/hash/sha3
BenchmarkPermutationFunction 	 1634836	       732 ns/op	 273.18 MB/s
BenchmarkSha3_512_MTU        	   78438	     15340 ns/op	  88.00 MB/s
BenchmarkSha3_384_MTU        	  108807	     11025 ns/op	 122.45 MB/s
BenchmarkSha3_256_MTU        	  136902	      8767 ns/op	 153.98 MB/s
BenchmarkSha3_224_MTU        	  143377	      8355 ns/op	 161.57 MB/s
BenchmarkShake128_MTU        	  163569	      7108 ns/op	 189.94 MB/s
BenchmarkShake256_MTU        	  156534	      7643 ns/op	 176.64 MB/s
BenchmarkShake256_16x        	   10000	    112109 ns/op	 146.14 MB/s
BenchmarkShake256_1MiB       	     204	   5877014 ns/op	 178.42 MB/s
BenchmarkSha3_512_1MiB       	     100	  10967026 ns/op	  95.61 MB/s
PASS
ok  	github.com/henrydcase/nobs/hash/sha3	13.855s
2020-08-25 17:09:40 +01:00
516ea4f5e8 cleanup 2020-08-25 12:32:22 +01:00
68ba33e34f sha3: remove s390 2020-08-25 12:11:08 +01:00
b56c355c8d
adds cycle count. fixes csidh which provides 128 not 512 bits of security (#38) 2020-08-25 11:22:53 +01:00
c30f61923a adds cycle count. fixes csidh which provides 128 not 512 bits of security 2020-08-25 11:21:11 +01:00
a02b9a77a0 mkem: add csidh 2020-08-25 11:19:07 +01:00
2500d74484 export more symbols from common 2020-05-16 22:37:41 +00:00
a152c09fd5
sike: move common (#33)
* makes common reusable
* exports some more symbols from common
* remove kem for a moment
2020-05-16 20:14:48 +00:00
55957bbf5e
sike: move common (#32)
* makes common reusable
* exports some more symbols from common
2020-05-16 18:51:34 +00:00
ab962715d5 Fixes cSIDH key generation when run in a loop 2020-05-14 11:53:23 +00:00
bc32024729
sidh: updates (#31) 2020-05-14 08:51:20 +00:00
f5a7daf2bb sidh: update to p434 2020-05-14 00:02:32 +00:00
91945fde1f csidh: cosmetic updates 2020-05-13 23:48:43 +00:00
7d891c7eb8
support go 1.14 (#29)
NOBS now supports Go 1.14 for
* x86-64
* ARM
2020-03-05 11:19:51 +00:00
d0692c81f0 Remove BS from README.md 2020-02-13 10:27:42 +00:00
48ea6a583a Remove BS from README.md 2020-02-13 10:27:18 +00:00
c5bff4fa11 Remove BS from README.md 2020-02-13 10:25:47 +00:00
2a73461591 remove crappy x448 2020-02-13 10:17:54 +00:00
7efbbf4745 cSIDH-511: (#26)
Implementation of Commutative Supersingular Isogeny Diffie Hellman,
based on "A faster way to CSIDH" paper (2018/782).

* For fast isogeny calculation, implementation converts a curve from
  Montgomery to Edwards. All calculations are done on Edwards curve
  and then converted back to Montgomery.
* As multiplication in the field Fp511 is the most expensive operation,
  the implementation contains multiple multiplication routines. The
  most performant one is an assembly implementation which uses BMI2 and
  ADOX/ADCX instructions on modern CPUs. It also contains a
  slower implementation which will run on older CPUs

* Benchmarks (Intel SkyLake):

  BenchmarkGeneratePrivate   	    6459	    172213 ns/op	       0 B/op	       0 allocs/op
  BenchmarkGenerateKeyPair   	      25	  45800356 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidate          	     297	   3915983 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidateRandom    	  184683	      6231 ns/op	       0 B/op	       0 allocs/op
  BenchmarkValidateGenerated 	      25	  48481306 ns/op	       0 B/op	       0 allocs/op
  BenchmarkDerive            	      19	  60928763 ns/op	       0 B/op	       0 allocs/op
  BenchmarkDeriveGenerated   	       8	 137342421 ns/op	       0 B/op	       0 allocs/op
  BenchmarkXMul              	    2311	    494267 ns/op	       1 B/op	       0 allocs/op
  BenchmarkXAdd              	 2396754	       501 ns/op	       0 B/op	       0 allocs/op
  BenchmarkXDbl              	 2072690	       571 ns/op	       0 B/op	       0 allocs/op
  BenchmarkIsom              	   78004	     15171 ns/op	       0 B/op	       0 allocs/op
  BenchmarkFp512Sub          	224635152	         5.33 ns/op	       0 B/op	       0 allocs/op
  BenchmarkFp512Mul          	246633255	         4.90 ns/op	       0 B/op	       0 allocs/op
  BenchmarkCSwap             	233228547	         5.10 ns/op	       0 B/op	       0 allocs/op
  BenchmarkAddRdc            	87348240	        12.6 ns/op	       0 B/op	       0 allocs/op
  BenchmarkSubRdc            	95112787	        11.7 ns/op	       0 B/op	       0 allocs/op
  BenchmarkModExpRdc         	   25436	     46878 ns/op	       0 B/op	       0 allocs/op
  BenchmarkMulBmiAsm         	19527573	        60.1 ns/op	       0 B/op	       0 allocs/op
  BenchmarkMulGeneric        	 7117650	       164 ns/op	       0 B/op	       0 allocs/op

* Go code has very similar performance when compared to the C
  implementation.
  Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23)
  github.com:henrydcase/sidh-torture/csidh

  | TestName         |Go        | C        |
  |------------------|----------|----------|
  |TestSharedSecret  | 57.95774 | 57.91092 |
  |TestKeyGeneration | 62.23614 | 58.12980 |
  |TestSharedSecret  | 55.28988 | 57.23132 |
  |TestKeyGeneration | 61.68745 | 58.66396 |
  |TestSharedSecret  | 63.19408 | 58.64774 |
  |TestKeyGeneration | 62.34022 | 61.62539 |
  |TestSharedSecret  | 62.85453 | 68.74503 |
  |TestKeyGeneration | 52.58518 | 58.40115 |
  |TestSharedSecret  | 50.77081 | 61.91699 |
  |TestKeyGeneration | 59.91843 | 61.09266 |
  |TestSharedSecret  | 59.97962 | 62.98151 |
  |TestKeyGeneration | 64.57525 | 56.22863 |
  |TestSharedSecret  | 56.40521 | 55.77447 |
  |TestKeyGeneration | 67.85850 | 58.52604 |
  |TestSharedSecret  | 60.54290 | 65.14052 |
  |TestKeyGeneration | 65.45766 | 58.42823 |

  On average Go implementation is 2% faster.
2019-11-25 15:03:29 +00:00
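The Montgomery/Edwards round trip mentioned in 7efbbf4745 can be sketched with the standard textbook maps over a toy field. This is an illustration under stated assumptions, not the nobs code: it uses the small prime 419 instead of the 511-bit CSIDH prime, affine rather than projective coordinates, a Montgomery curve with B = 1, and hypothetical function names.

```go
package main

import "fmt"

// p is a toy prime chosen for illustration only.
const p = 419

// mod reduces x into the range [0, p).
func mod(x int64) int64 {
	x %= p
	if x < 0 {
		x += p
	}
	return x
}

// inv returns x^(p-2) mod p, the inverse of x for x != 0 (Fermat).
func inv(x int64) int64 {
	r, b, e := int64(1), mod(x), int64(p-2)
	for e > 0 {
		if e&1 == 1 {
			r = r * b % p
		}
		b = b * b % p
		e >>= 1
	}
	return r
}

// montgomeryToEdwards maps y^2 = x^3 + A*x^2 + x to the twisted Edwards
// curve a*x^2 + y^2 = 1 + d*x^2*y^2 with a = A+2 and d = A-2.
func montgomeryToEdwards(A int64) (a, d int64) {
	return mod(A + 2), mod(A - 2)
}

// edwardsToMontgomery inverts the map: A = 2(a+d)/(a-d).
func edwardsToMontgomery(a, d int64) int64 {
	return mod(2 * mod(a+d) * inv(mod(a-d)))
}

// pointToEdwards maps a Montgomery x-coordinate u (u != -1) to the
// Edwards y-coordinate (u-1)/(u+1).
func pointToEdwards(u int64) int64 {
	return mod(mod(u-1) * inv(mod(u+1)))
}

// pointToMontgomery maps an Edwards y-coordinate (y != 1) back to
// the Montgomery x-coordinate (1+y)/(1-y).
func pointToMontgomery(y int64) int64 {
	return mod(mod(1+y) * inv(mod(1-y)))
}

func main() {
	a, d := montgomeryToEdwards(7)
	fmt.Println(a, d, edwardsToMontgomery(a, d)) // prints "9 5 7"
	fmt.Println(pointToMontgomery(pointToEdwards(5))) // prints "5"
}
```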
15f6ee16b9 SHA-3: Fixes crash when cloning Shake state 2019-05-26 17:29:15 +01:00
9b3c0190b0 Updates P34 strategy calculation 2019-05-23 18:32:28 +01:00
7298b650cc
Adds go.mod (#21)
* Reset Makefile after adding go.mod
* Remove ``build`` directory
* Simplifies makefile
* shake: Make xorIn copyOut platform specific
2019-05-15 18:03:35 +01:00
49bf0db8fd SHAKE: Don't use function pointers (#20)
* xorIn and copyOut function pointers cause input and output data
  to be moved to heap. This degrades performance of calling code.

* This change removes usage of those function pointers. We will always
  use unaligned implementation as it's faster (but may crash on some
  systems)

* Benchmark compares generic vs unaligned xorIn and copyOut

benchmark                          old ns/op     new ns/op     delta
BenchmarkPermutationFunction-4     463           815           +76.03%
BenchmarkShake128_MTU-4            4443          8180          +84.11%
BenchmarkShake256_MTU-4            4739          9060          +91.18%
BenchmarkShake256_16x-4            71886         132629        +84.50%
BenchmarkShake256_1MiB-4           3695138       6649012       +79.94%
BenchmarkCShake128_448_16x-4       21210         24611         +16.03%
BenchmarkCShake128_1MiB-4          3009342       3396496       +12.87%
BenchmarkCShake256_448_16x-4       26034         27785         +6.73%
BenchmarkCShake256_1MiB-4          3654713       3829404       +4.78%
2019-05-14 17:08:33 +01:00
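The difference between the two call styles discussed in 49bf0db8fd can be sketched as follows. The types and functions here are hypothetical and only mirror the shape of the problem: with an indirect call through a function value the compiler generally cannot prove that the buffer stays local, so it may be moved to the heap, whereas a static call can be analysed. Whether an allocation actually happens depends on the compiler version; building with `go build -gcflags=-m` prints the escape-analysis decisions.

```go
package main

import (
	"bytes"
	"fmt"
)

// xorIn XORs src into dst byte by byte (a generic, alignment-safe variant).
func xorIn(dst, src []byte) {
	for i := range src {
		dst[i] ^= src[i]
	}
}

// viaPointer stores the routine as a function value, as in the old design.
// Hypothetical type, not the nobs state struct.
type viaPointer struct {
	xor func(dst, src []byte)
}

// absorb calls the routine indirectly; the compiler may have to assume
// that in[:] escapes, which can force a heap allocation.
func (v *viaPointer) absorb(state []byte, in [8]byte) {
	v.xor(state, in[:])
}

// absorbDirect calls the routine statically, so escape analysis can see
// that in[:] does not outlive the call.
func absorbDirect(state []byte, in [8]byte) {
	xorIn(state, in[:])
}

func main() {
	in := [8]byte{1, 2, 3, 4, 5, 6, 7, 8}
	a := make([]byte, 8)
	b := make([]byte, 8)

	v := &viaPointer{xor: xorIn}
	v.absorb(a, in)
	absorbDirect(b, in)

	// Both paths compute the same result; only the call mechanism differs.
	fmt.Println(bytes.Equal(a, b)) // prints "true"
}
```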
e6439f96ab Adds cSHAKE with 0 alloc interface (#19) 2019-05-14 01:19:29 +01:00
6f9706df01
CTR-DRBG: Use hardware acceleration on X86 (#18)
benchmark              old ns/op     new ns/op     delta
BenchmarkInit-4        3403          397           -88.33%
BenchmarkRead-4        14535         1560          -89.27%
2019-04-09 23:50:21 +01:00
71624cdc4c Improvements to makefile 2019-04-09 17:30:30 +01:00
b184944242 Nits for SIDH 2019-04-09 17:09:34 +01:00
08f7315b64 DRBG: Speed improvements
* CTR-DRBG doesn't call "NewCipher" for block encryption
* Changes the API of CTR-DRBG so that the read operation implements io.Reader

Benchmark results:
----------------------
benchmark           old ns/op     new ns/op     delta
BenchmarkInit-4     1118          3579          +220.13%
BenchmarkRead-4     5343          14589         +173.05%

benchmark           old allocs     new allocs     delta
BenchmarkInit-4     15             0              -100.00%
BenchmarkRead-4     67             0              -100.00%

benchmark           old bytes     new bytes     delta
BenchmarkInit-4     1824          0             -100.00%
BenchmarkRead-4     9488          0             -100.00%
2019-04-09 14:37:59 +01:00
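The two points of 08f7315b64 (no `NewCipher` call per block, and a read operation that satisfies `io.Reader`) can be sketched with the standard library alone. `toyCtrReader` is a hypothetical type written for this illustration. It is not the nobs CTR-DRBG, omits seeding, reseeding and key update, and must not be used as a random number generator.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"io"
)

// toyCtrReader is a hypothetical illustration, NOT a secure DRBG.
type toyCtrReader struct {
	block   cipher.Block        // created once and reused by every Read
	counter [aes.BlockSize]byte // big-endian block counter
}

// newToyCtrReader sets up the block cipher a single time.
func newToyCtrReader(key []byte) (*toyCtrReader, error) {
	b, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return &toyCtrReader{block: b}, nil
}

// Read implements io.Reader. It fills out with encrypted counter blocks
// and never calls aes.NewCipher, so the key schedule is not recomputed.
func (r *toyCtrReader) Read(out []byte) (int, error) {
	var ks [aes.BlockSize]byte
	n := 0
	for n < len(out) {
		// Increment the counter, propagating the carry.
		for i := len(r.counter) - 1; i >= 0; i-- {
			r.counter[i]++
			if r.counter[i] != 0 {
				break
			}
		}
		r.block.Encrypt(ks[:], r.counter[:])
		n += copy(out[n:], ks[:])
	}
	return n, nil
}

func main() {
	r, err := newToyCtrReader(make([]byte, 16))
	if err != nil {
		panic(err)
	}
	var _ io.Reader = r // compile-time check of the interface

	buf := make([]byte, 40)
	n, err := io.ReadFull(r, buf)
	fmt.Println(n, err) // prints "40 <nil>"
}
```

Satisfying `io.Reader` lets such a generator be passed to any API that accepts a randomness source as a reader.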
e66cc99401 Improves comment 2019-02-19 14:44:11 +00:00
b47a731959
Run tests on ARM64 (#11) 2019-02-16 21:29:20 +00:00
90f8cba329
SIDH: Update (#9)
* Change license to BSD-3

* SIDH: Multiple developments
2018-12-03 23:07:01 +00:00
ea2ffa2d61 PERF: sidh-p503: Split sub and add into 2 uops instead of 3 (#8)
The performance improvement comes from the fact that on Skylake
"add mem, reg" splits into 2 uops: one arithmetic uop and another one
for loading a value from memory.
However, changing the operand order to "add reg, mem" splits into 3 uops:
one for the arithmetic op, one for the load and an additional one for storing
the result back.
Using separate instructions for loading/storing helps to parallelize
execution (load/store and arithmetic instructions are executed in parallel
if possible)

For details, see: https://www.agner.org/optimize/instruction_tables.pdf

New: BenchmarkFp503StrongReduce-4    300000000            5.57 ns/op
Old: BenchmarkFp503StrongReduce-4    200000000            8.60 ns/op

This just improves one function, but more functions can be improved
2018-11-18 20:57:29 +00:00