dependabot[bot]
25b66236df
Bump golang.org/x/sys in /kem/mkem (#50)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.0.0-20191120155948-bd437916bb0e to 0.1.0.
- [Release notes](https://github.com/golang/sys/releases)
- [Commits](https://github.com/golang/sys/commits/v0.1.0)
---
updated-dependencies:
- dependency-name: golang.org/x/sys
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1 year ago
dependabot[bot]
a7142b7412
Bump golang.org/x/sys from 0.0.0-20191120155948-bd437916bb0e to 0.1.0 (#49)
Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.0.0-20191120155948-bd437916bb0e to 0.1.0.
- [Release notes](https://github.com/golang/sys/releases)
- [Commits](https://github.com/golang/sys/commits/v0.1.0)
---
updated-dependencies:
- dependency-name: golang.org/x/sys
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
1 year ago
Jacob Appelbaum
20fffc2f35
add basic support for ppc64le, riscv64 (#48)
This change set modifies build metadata to add support for ppc64le
(POWER9) and riscv64 (RISC-V). The arm64 and amd64 assembler
implementations are architecture-specific and do not support ppc64le or
riscv64, so on ppc64le and riscv64 a generic implementation is chosen. The
drbg/internal/aes/cipher_noasm.go file was written by @mixmasala and
myself.
The csidh and sidh tests are extremely slow (>30m) on RISC-V using the
sifive,u54-mc (HiFive Unleashed) development board. The top-level
Makefile sets the test timeout to infinity on RISC-V, because at least one
test does not finish within the default 10 minutes there. On RISC-V the
csidh test finishes after around 30 minutes and the sidh test after
around 71 minutes.
These changes were tested on amd64 (Intel Core i7), arm64 (Raspberry
Pi 4b), ppc64le (Talos POWER9, PowerNV T2P9D01 REV 1.00), and riscv64
(HiFive Unleashed, rv64imafdc,sifive,u54-mc).
The kernel versions of those systems follow:
Linux rpi4 5.13.0-1009-raspi #10-Ubuntu SMP PREEMPT Mon Oct 25 13:58:43 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
Linux i7 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Linux power9 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:19:54 UTC 2021 ppc64le ppc64le ppc64le GNU/Linux
Linux risc-v-unleashed-000 5.11.0-1022-generic #23~20.04.1-Ubuntu SMP Thu Oct 21 10:16:27 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux
1 year ago
Henry Case
3a8ac85da1
two more benchmarks
3 years ago
Henry Case
cf196e90f2
Update README.md
3 years ago
Henry Case
73be4271a4
Update ctr_drbg.go
3 years ago
Henry Case
55bb2ea182
Update README.md
3 years ago
Henry Case
1e84ed09cc
Update README.md
3 years ago
Henry Case
9ddbe424a3
Edit README.md
3 years ago
Henry Case
7c32db8dd7
sm3: use fewer operations for ff1 and gg1
3 years ago
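For context, SM3 (GB/T 32905-2016) uses FF(X,Y,Z) = (X∧Y)∨(X∧Z)∨(Y∧Z) and GG(X,Y,Z) = (X∧Y)∨(¬X∧Z) in rounds 16 to 63. The sketch below shows one common way to compute the same majority and choose functions with one Boolean operation fewer each; it is an illustration, not necessarily the rewrite made in this commit, and the function names are hypothetical.

```go
package main

import "fmt"

// Textbook SM3 round functions for rounds 16..63.
func ff1Textbook(x, y, z uint32) uint32 { return (x & y) | (x & z) | (y & z) } // 5 ops
func gg1Textbook(x, y, z uint32) uint32 { return (x & y) | (^x & z) }          // 4 ops

// Majority with 4 ops: where x and y agree that bit wins, otherwise z decides.
func ff1(x, y, z uint32) uint32 { return (x & y) | (z & (x | y)) }

// Choose with 3 ops: bits of y where x is set, bits of z elsewhere.
func gg1(x, y, z uint32) uint32 { return ((y ^ z) & x) ^ z }

func main() {
	// The low bytes of x, y, z together cover all 8 bit combinations,
	// so one comparison exercises every case of the bitwise functions.
	x, y, z := uint32(0xF0), uint32(0xCC), uint32(0xAA)
	fmt.Printf("ff1=%#x gg1=%#x\n", ff1(x, y, z), gg1(x, y, z)) // ff1=0xe8 gg1=0xca
	fmt.Println(ff1(x, y, z) == ff1Textbook(x, y, z), gg1(x, y, z) == gg1Textbook(x, y, z))
}
```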
Henry Case
8474981cfc
SHA-3: speedups (#47)
* add function for one-off calculation
* sha3: simplifies Read function
* sha3: remove if from Read
4 years ago
Henry Case
45bc1a75f6
add function for one-off calculation (#45)
4 years ago
Henry Case
adfaf1e58c
fix: ebx -> ecx (#46)
4 years ago
Henry Case
24408329a5
Use bits.RotateLeft64 whenever possible
4 years ago
Henry Case
0174e314a1
Update README.md
4 years ago
Henry Case
820906b7c7
sha3: optimizations and cleanup (#41)
* complete rewrite of the SHA-3 code; affects mostly the code in sha3.go
* fixes a bug which caused the SHAKE implementation to crash
* the implementation of Read()/Write() avoids unnecessary buffering as
much as possible
* NOTE: at some point I implemented separate SumXXX functions, but
after optimizing the implementation of Read/Write/Sum, the
gain wasn't that big
Current speed on i7-8665U @ 1.90GHz:
BenchmarkPermutationFunction 1592787 736 ns/op 271.90 MB/s 0 B/op 0 allocs/op
BenchmarkSha3Chunk_x01/SHA-3/224 98752 11630 ns/op 176.02 MB/s 0 B/op 0 allocs/op
BenchmarkSha3Chunk_x01/SHA-3/256 92508 12447 ns/op 164.46 MB/s 0 B/op 0 allocs/op
BenchmarkSha3Chunk_x01/SHA-3/384 76765 15206 ns/op 134.62 MB/s 0 B/op 0 allocs/op
BenchmarkSha3Chunk_x01/SHA-3/512 54333 21932 ns/op 93.33 MB/s 0 B/op 0 allocs/op
BenchmarkSha3Chunk_x16/SHA-3/224 10000 102161 ns/op 160.37 MB/s 0 B/op 0 allocs/op
BenchmarkSha3Chunk_x16/SHA-3/256 10000 106531 ns/op 153.80 MB/s 0 B/op 0 allocs/op
BenchmarkSha3Chunk_x16/SHA-3/384 8641 137272 ns/op 119.35 MB/s 0 B/op 0 allocs/op
BenchmarkSha3Chunk_x16/SHA-3/512 6340 189124 ns/op 86.63 MB/s 0 B/op 0 allocs/op
BenchmarkShake_x01/SHAKE-128 167062 7149 ns/op 188.83 MB/s 0 B/op 0 allocs/op
BenchmarkShake_x01/SHAKE-256 151982 7748 ns/op 174.24 MB/s 0 B/op 0 allocs/op
BenchmarkShake_x16/SHAKE-128 12963 87770 ns/op 186.67 MB/s 0 B/op 0 allocs/op
BenchmarkShake_x16/SHAKE-256 10000 105554 ns/op 155.22 MB/s 0 B/op 0 allocs/op
BenchmarkCShake/cSHAKE-128 109148 10940 ns/op 187.11 MB/s 0 B/op 0 allocs/op
BenchmarkCShake/cSHAKE-256 90324 13211 ns/op 154.94 MB/s 0 B/op 0 allocs/op
PASS
4 years ago
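The "avoid unnecessary buffering" point in #41 can be illustrated with a toy absorber: top up a pending partial block, absorb whole blocks straight from the caller's slice, and buffer only the tail. This is a minimal sketch under loud assumptions: the `sponge` type, its fields and the 136-byte rate are hypothetical, and `permute` is a toy mixing step standing in for Keccak-f[1600], not the repository's code.

```go
package main

import "fmt"

// rate is the SHA3-256 rate in bytes, chosen here for illustration.
const rate = 136

// sponge is a hypothetical, simplified absorber. Only the rate part of the
// state is modeled, and permute is a toy mix, NOT Keccak and NOT secure.
type sponge struct {
	state  [rate]byte
	buf    [rate]byte // holds at most one partial block
	n      int        // number of pending bytes in buf
	blocks int        // number of permutations run so far
}

func (s *sponge) permute() {
	t := byte(s.blocks)
	for i := range s.state {
		t = t*31 + s.state[i]
		s.state[i] = t ^ byte(i)
	}
	s.blocks++
}

func (s *sponge) absorbBlock(block []byte) {
	for i := 0; i < rate; i++ {
		s.state[i] ^= block[i]
	}
	s.permute()
}

// Write copies into buf only when it has to.
func (s *sponge) Write(p []byte) (int, error) {
	written := len(p)
	if s.n > 0 { // 1. finish a pending partial block first
		c := copy(s.buf[s.n:], p)
		s.n += c
		p = p[c:]
		if s.n < rate {
			return written, nil
		}
		s.absorbBlock(s.buf[:])
		s.n = 0
	}
	for len(p) >= rate { // 2. whole blocks straight from p, no copy
		s.absorbBlock(p[:rate])
		p = p[rate:]
	}
	s.n = copy(s.buf[:], p) // 3. keep only the tail
	return written, nil
}

func main() {
	data := make([]byte, 1000)
	for i := range data {
		data[i] = byte(i)
	}
	var oneShot, chunked sponge
	oneShot.Write(data)
	for p := data; len(p) > 0; {
		n := 7
		if len(p) < n {
			n = len(p)
		}
		chunked.Write(p[:n])
		p = p[n:]
	}
	fmt.Println(oneShot.blocks, oneShot.n, oneShot.state == chunked.state) // 7 48 true
}
```

Splitting the input differently must not change the absorbed state, which is why the chunked and one-shot writes are compared.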
Henry Case
3a320e1714
Create FUNDING.yml
4 years ago
Henry Case
7dcb72bf74
remove shake
4 years ago
Henry Case
ffd7590213
improve comment
Initial speed on i7-8665U
> go test -bench=. -test.cpu=1
goos: linux
goarch: amd64
pkg: github.com/henrydcase/nobs/hash/sha3
BenchmarkPermutationFunction 1634836 732 ns/op 273.18 MB/s
BenchmarkSha3_512_MTU 78438 15340 ns/op 88.00 MB/s
BenchmarkSha3_384_MTU 108807 11025 ns/op 122.45 MB/s
BenchmarkSha3_256_MTU 136902 8767 ns/op 153.98 MB/s
BenchmarkSha3_224_MTU 143377 8355 ns/op 161.57 MB/s
BenchmarkShake128_MTU 163569 7108 ns/op 189.94 MB/s
BenchmarkShake256_MTU 156534 7643 ns/op 176.64 MB/s
BenchmarkShake256_16x 10000 112109 ns/op 146.14 MB/s
BenchmarkShake256_1MiB 204 5877014 ns/op 178.42 MB/s
BenchmarkSha3_512_1MiB 100 10967026 ns/op 95.61 MB/s
PASS
ok github.com/henrydcase/nobs/hash/sha3 13.855s
4 years ago
Henry Case
516ea4f5e8
cleanup
4 years ago
Henry Case
68ba33e34f
sha3: remove s390
4 years ago
Henry Case
b56c355c8d
adds cycle count; fixes csidh, which provides 128 (not 512) bits of security (#38)
4 years ago
Henry Case
c30f61923a
adds cycle count; fixes csidh, which provides 128 (not 512) bits of security
4 years ago
Henry Case
a02b9a77a0
mkem: add csidh
4 years ago
Henry Case
2500d74484
export more symbols from common
4 years ago
Henry Case
a152c09fd5
sike: move common (#33)
* makes common reusable
* exports some more symbols from common
* temporarily removes kem
4 years ago
Henry Case
55957bbf5e
sike: move common (#32)
* makes common reusable
* exports some more symbols from common
4 years ago
Henry Case
ab962715d5
Fixes cSIDH key generation when run in a loop
4 years ago
Henry Case
bc32024729
sidh: updates (#31)
4 years ago
Henry Case
f5a7daf2bb
sidh: update to p434
4 years ago
Henry Case
91945fde1f
csidh: cosmetic updates
4 years ago
Kris K
7d891c7eb8
support go 1.14 (#29)
NOBS now supports Go 1.14 for
* x86-64
* ARM
4 years ago
Henry Case
d0692c81f0
Remove BS from README.md
4 years ago
Henry Case
48ea6a583a
Remove BS from README.md
4 years ago
Henry Case
c5bff4fa11
Remove BS from README.md
4 years ago
Henry Case
2a73461591
remove crappy x448
4 years ago
Kris Kwiatkowski
7efbbf4745
cSIDH-511 (#26)
Implementation of Commutative Supersingular Isogeny Diffie-Hellman,
based on the "A faster way to CSIDH" paper (2018/782).
* For fast isogeny calculation, the implementation converts a curve from
Montgomery to Edwards form. All calculations are done on the Edwards
curve, and the result is then converted back to Montgomery form.
* As multiplication in the field Fp511 is the most expensive operation,
the implementation contains multiple multiplication routines. The
most performant one is an assembly implementation that uses the BMI2 and
ADOX/ADCX instructions of modern CPUs; a slower implementation
runs on older CPUs.
* Benchmarks (Intel SkyLake):
BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op
BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op
BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op
BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op
BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op
BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op
BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op
BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op
BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op
BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op
BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op
BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op
BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op
BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op
BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op
BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op
BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op
BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op
BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op
* The Go code has performance very similar to that of the C
implementation.
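For reference, the standard birational correspondence between Montgomery and twisted Edwards curves (valid away from the exceptional points) is given below. It is textbook material rather than something read out of this code; in particular, that the implementation works with projective coefficients (A : C) is an assumption.

```latex
\begin{align*}
  \text{Montgomery: }      & B v^2 = u^3 + A u^2 + u \\
  \text{twisted Edwards: } & a x^2 + y^2 = 1 + d x^2 y^2 \\[4pt]
  a = \frac{A+2}{B},\quad d = \frac{A-2}{B}
    \quad & \Longleftrightarrow \quad
  A = \frac{2(a+d)}{a-d},\quad B = \frac{4}{a-d} \\
  (x, y) = \left(\frac{u}{v},\ \frac{u-1}{u+1}\right)
    \quad & \Longleftrightarrow \quad
  (u, v) = \left(\frac{1+y}{1-y},\ \frac{1+y}{(1-y)\,x}\right) \\
  B = 1,\ A \mapsto A/C: \quad & (a : d) = (A + 2C : A - 2C)
\end{align*}
```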
Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23)
github.com:henrydcase/sidh-torture/csidh
| TestName |Go | C |
|------------------|----------|----------|
|TestSharedSecret | 57.95774 | 57.91092 |
|TestKeyGeneration | 62.23614 | 58.12980 |
|TestSharedSecret | 55.28988 | 57.23132 |
|TestKeyGeneration | 61.68745 | 58.66396 |
|TestSharedSecret | 63.19408 | 58.64774 |
|TestKeyGeneration | 62.34022 | 61.62539 |
|TestSharedSecret | 62.85453 | 68.74503 |
|TestKeyGeneration | 52.58518 | 58.40115 |
|TestSharedSecret | 50.77081 | 61.91699 |
|TestKeyGeneration | 59.91843 | 61.09266 |
|TestSharedSecret | 59.97962 | 62.98151 |
|TestKeyGeneration | 64.57525 | 56.22863 |
|TestSharedSecret | 56.40521 | 55.77447 |
|TestKeyGeneration | 67.85850 | 58.52604 |
|TestSharedSecret | 60.54290 | 65.14052 |
|TestKeyGeneration | 65.45766 | 58.42823 |
On average, the Go implementation is 2% faster.
5 years ago
Henry Case
15f6ee16b9
SHA-3: Fixes crash when cloning Shake state
5 years ago
Henry Case
9b3c0190b0
Updates P34 strategy calculation
5 years ago
Henry Case
7298b650cc
Adds go.mod (#21)
* Reset Makefile after adding go.mod
* Remove ``build`` directory
* Simplifies Makefile
* shake: Make xorIn and copyOut platform-specific
5 years ago
Henry Case
49bf0db8fd
SHAKE: Don't use function pointers (#20)
* The xorIn and copyOut function pointers cause input and output data
to be moved to the heap, which degrades the performance of calling code.
* This change removes those function pointers. We always use the
unaligned implementation, as it's faster (but it may crash on some
systems).
* The benchmark compares generic vs unaligned xorIn and copyOut:
benchmark old ns/op new ns/op delta
BenchmarkPermutationFunction-4 463 815 +76.03%
BenchmarkShake128_MTU-4 4443 8180 +84.11%
BenchmarkShake256_MTU-4 4739 9060 +91.18%
BenchmarkShake256_16x-4 71886 132629 +84.50%
BenchmarkShake256_1MiB-4 3695138 6649012 +79.94%
BenchmarkCShake128_448_16x-4 21210 24611 +16.03%
BenchmarkCShake128_1MiB-4 3009342 3396496 +12.87%
BenchmarkCShake256_448_16x-4 26034 27785 +6.73%
BenchmarkCShake256_1MiB-4 3654713 3829404 +4.78%
5 years ago
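The heap-escape effect described in #20 can be observed with `testing.AllocsPerRun`, which also works outside test binaries. In this sketch (all names hypothetical; `xorIn` is a simplified lane XOR, not the repository's routine), the direct call lets a local buffer stay on the stack, while the call through a function variable forces escape analysis to assume the slice escapes, so on the gc toolchain we would expect one allocation per call there. `go build -gcflags=-m` shows the compiler's escape decisions.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"testing"
)

const rate = 136 // bytes per absorbed block (SHA3-256 / SHAKE256 rate)

var state [25]uint64 // 1600-bit Keccak state as 25 little-endian lanes

// xorIn XORs buf into the state lane by lane.
func xorIn(s *[25]uint64, buf []byte) {
	for i := 0; i+8 <= len(buf) && i/8 < len(s); i += 8 {
		s[i/8] ^= binary.LittleEndian.Uint64(buf[i:])
	}
}

// xorInFn models choosing the implementation through a function pointer.
// The compiler can't see which function it calls, so it assumes that
// slice arguments escape.
var xorInFn = xorIn

//go:noinline
func absorbDirect() {
	var buf [rate]byte // can stay on the stack
	xorIn(&state, buf[:])
}

//go:noinline
func absorbViaPointer() {
	var buf [rate]byte // expected to be moved to the heap
	xorInFn(&state, buf[:])
}

func main() {
	fmt.Println("allocs/op, direct call:     ", testing.AllocsPerRun(1000, absorbDirect))
	fmt.Println("allocs/op, function pointer:", testing.AllocsPerRun(1000, absorbViaPointer))
}
```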
Henry Case
e6439f96ab
Adds cSHAKE with 0 alloc interface (#19)
5 years ago
Henry Case
6f9706df01
CTR-DRBG: Use hardware acceleration on X86 (#18)
benchmark old ns/op new ns/op delta
BenchmarkInit-4 3403 397 -88.33%
BenchmarkRead-4 14535 1560 -89.27%
5 years ago
Kris Kwiatkowski
71624cdc4c
Improvements to makefile
5 years ago
Kris Kwiatkowski
b184944242
Nits for SIDH
5 years ago
Kris Kwiatkowski
08f7315b64
DRBG: Speed improvements
* CTR-DRBG doesn't call "NewCipher" for block encryption
* Changes the CTR-DRBG API so that the read operation implements io.Reader
Benchmark results:
----------------------
benchmark old ns/op new ns/op delta
BenchmarkInit-4 1118 3579 +220.13%
BenchmarkRead-4 5343 14589 +173.05%
benchmark old allocs new allocs delta
BenchmarkInit-4 15 0 -100.00%
BenchmarkRead-4 67 0 -100.00%
benchmark old bytes new bytes delta
BenchmarkInit-4 1824 0 -100.00%
BenchmarkRead-4 9488 0 -100.00%
5 years ago
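A minimal sketch of the two API points above, assuming a simplified NIST SP 800-90A CTR_DRBG (AES-256, no derivation function, no reseed counter or request limits): the AES key schedule is created once per state update and reused for every output block, and reads go through `io.Reader`. Go's `crypto/aes` already uses AES-NI on amd64 when the CPU supports it. All names are hypothetical, the code is not checked against NIST test vectors, and unlike the zero-allocation numbers above it still allocates in `aes.NewCipher` on each update.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"fmt"
	"io"
)

const (
	keyLen   = 32                 // AES-256 key length
	blockLen = aes.BlockSize      // 16-byte AES block
	seedLen  = keyLen + blockLen  // 48 bytes of seed material
)

// ctrDRBG is a hypothetical, simplified CTR_DRBG; a sketch, not vetted code.
type ctrDRBG struct {
	v     [blockLen]byte
	block cipher.Block // key schedule reused until the next update
}

// The read operation implements io.Reader.
var _ io.Reader = (*ctrDRBG)(nil)

// incV increments V as a 128-bit big-endian counter.
func incV(v *[blockLen]byte) {
	for i := blockLen - 1; i >= 0; i-- {
		v[i]++
		if v[i] != 0 {
			return
		}
	}
}

// update derives a new (Key, V), XORed with provided (nil means zeros).
func (d *ctrDRBG) update(provided *[seedLen]byte) {
	var temp [seedLen]byte
	for i := 0; i < seedLen; i += blockLen {
		incV(&d.v)
		d.block.Encrypt(temp[i:i+blockLen], d.v[:])
	}
	if provided != nil {
		for i := range temp {
			temp[i] ^= provided[i]
		}
	}
	b, err := aes.NewCipher(temp[:keyLen]) // one key expansion per update
	if err != nil {
		panic(err) // unreachable: keyLen is a valid AES key size
	}
	d.block = b
	copy(d.v[:], temp[keyLen:])
}

// newCtrDRBG starts from Key = 0, V = 0 and mixes in the seed.
func newCtrDRBG(seed *[seedLen]byte) *ctrDRBG {
	var zeroKey [keyLen]byte
	b, err := aes.NewCipher(zeroKey[:])
	if err != nil {
		panic(err)
	}
	d := &ctrDRBG{block: b}
	d.update(seed)
	return d
}

// Read fills p with AES(Key, V+1), AES(Key, V+2), ... then refreshes (Key, V).
func (d *ctrDRBG) Read(p []byte) (int, error) {
	var out [blockLen]byte
	for n := 0; n < len(p); n += blockLen {
		incV(&d.v)
		d.block.Encrypt(out[:], d.v[:])
		copy(p[n:], out[:])
	}
	d.update(nil)
	return len(p), nil
}

func main() {
	// Fixed seed for a reproducible demo; real seed material must come
	// from an entropy source such as crypto/rand.
	var seed [seedLen]byte
	for i := range seed {
		seed[i] = byte(i)
	}
	out := make([]byte, 32)
	if _, err := io.ReadFull(newCtrDRBG(&seed), out); err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", out)
}
```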
Kris Kwiatkowski
e66cc99401
Improves comment
5 years ago
Henry Case
b47a731959
Run tests on ARM64 (#11)
5 years ago
Henry Case
90f8cba329
SIDH: Update (#9)
* Change license to BSD-3
* SIDH: Multiple developments
6 years ago
Henry Case
ea2ffa2d61
PERF: sidh-p503: Split sub and add into 2 uops instead of 3 (#8)
The performance improvement comes from the fact that on Skylake
"add mem, reg" splits into 2 uops: one arithmetic uop and one
for loading a value from mem.
However, changing the operand order to "add reg, mem" splits it into 3 uops:
one for the arithmetic op, one for the load and one more for storing
the result back.
Using separate instructions for loading and storing helps to parallelize
execution (load/store and arithmetic instructions are executed in parallel
when possible).
For details, see: https://www.agner.org/optimize/instruction_tables.pdf
New: BenchmarkFp503StrongReduce-4 300000000 5.57 ns/op
Old: BenchmarkFp503StrongReduce-4 200000000 8.60 ns/op
This improves only one function, but more functions can be improved the same way.
6 years ago