kris/nobs - nobs - GIT server of AmongBytes

mirror of https://github.com/henrydcase/nobs.git synced 2024-11-22 15:18:57 +00:00

Author	SHA1	Message	Date
dependabot[bot]	25b66236df	Bump golang.org/x/sys in /kem/mkem (#50) Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.0.0-20191120155948-bd437916bb0e to 0.1.0. - [Release notes](https://github.com/golang/sys/releases) - [Commits](https://github.com/golang/sys/commits/v0.1.0) --- updated-dependencies: - dependency-name: golang.org/x/sys dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-13 23:15:16 +00:00
dependabot[bot]	a7142b7412	Bump golang.org/x/sys from 0.0.0-20191120155948-bd437916bb0e to 0.1.0 (#49) Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.0.0-20191120155948-bd437916bb0e to 0.1.0. - [Release notes](https://github.com/golang/sys/releases) - [Commits](https://github.com/golang/sys/commits/v0.1.0) --- updated-dependencies: - dependency-name: golang.org/x/sys dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-13 23:13:08 +00:00
Jacob Appelbaum	20fffc2f35	add basic support for ppc64le, riscv64 (#48) This change set modifies build metadata to add support for ppc64le (POWER9) and riscv64 (RISC-V). The arm64 and amd64 assembler implementations are architecture specific and do not support ppc64le or riscv64. On ppc64le or riscv64 a generic implementation is chosen. The drbg/internal/aes/cipher_noasm.go file was written by @mixmasala and myself. The csidh and sidh tests are extremely slow (>30m) on RISC-V using the sifive,u54-mc (HiFive Unleashed) development board. The test timeout is set to infinity on RISC-V by the top level Makefile as at least one test does not finish within the default 10 minutes on RISC-V. On RISC-V the csidh test finishes after around 30 minutes, the sidh test finishes after around 71 minutes. These changes were tested with amd64 (Intel Core i7), arm64 (Raspberry Pi 4b), ppc64le (Talos POWER9, PowerNV T2P9D01 REV 1.00), and riscv64 (HiFive Unleashed, rv64imafdc,sifive,u54-mc). The kernel versions of those systems follow: Linux rpi4 5.13.0-1009-raspi #10-Ubuntu SMP PREEMPT Mon Oct 25 13:58:43 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux Linux i7 5.8.0-63-generic #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux Linux power9 5.11.0-34-generic #36-Ubuntu SMP Thu Aug 26 19:19:54 UTC 2021 ppc64le ppc64le ppc64le GNU/Linux Linux risc-v-unleashed-000 5.11.0-1022-generic #23~20.04.1-Ubuntu SMP Thu Oct 21 10:16:27 UTC 2021 riscv64 riscv64 riscv64 GNU/Linux	2023-03-13 23:12:45 +00:00
Kris Kwiatkowski	3a8ac85da1	two more benchmarks	2021-04-22 13:46:15 +01:00
Henry Case	cf196e90f2	Update README.md	2021-04-09 09:14:58 +01:00
Henry Case	73be4271a4	Update ctr_drbg.go	2021-04-09 07:45:00 +01:00
Henry Case	55bb2ea182	Update README.md	2021-04-02 17:34:03 +01:00
Henry Case	1e84ed09cc	Update README.md	2021-04-02 17:33:34 +01:00
Kris Kwiatkowski	9ddbe424a3	Edit README.md	2021-03-11 23:03:50 +00:00
Kris Kwiatkowski	7c32db8dd7	sm3: use fewer operations for ff1 and gg1	2021-03-08 23:58:08 +00:00
Henry Case	8474981cfc	SHA-3: speedups (#47) * add function for one-off calculation * sha3: simplifies Read function * sha3: remove if from Read	2020-10-03 23:27:08 +01:00
Henry Case	45bc1a75f6	add function for one-off calculation (#45)	2020-10-03 15:12:26 +01:00
Henry Case	adfaf1e58c	fix: ebx -> ecx (#46)	2020-10-03 15:11:52 +01:00
Kris Kwiatkowski	24408329a5	Use bits.RotateLeft64 whenever possible	2020-09-28 21:03:08 +01:00
Henry Case	0174e314a1	Update README.md	2020-08-29 02:13:24 +01:00
Henry Case	820906b7c7	sha3: optimizations and cleanup (#41) * complete reset of the SHA-3 code. Affects mostly the code in sha3.go * fixes a bug which causes SHAKE implementation to crash * implementation of Read()/Write() avoids unnecessary buffering as much as possible * NOTE: at some point I did a separate implementation for the SumXXX functions, but after optimizing the implementation of Read/Write/Sum, the gain wasn't that big Current speed on Initial speed on i7-8665U@1.90 BenchmarkPermutationFunction 1592787 736 ns/op 271.90 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x01/SHA-3/224 98752 11630 ns/op 176.02 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x01/SHA-3/256 92508 12447 ns/op 164.46 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x01/SHA-3/384 76765 15206 ns/op 134.62 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x01/SHA-3/512 54333 21932 ns/op 93.33 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x16/SHA-3/224 10000 102161 ns/op 160.37 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x16/SHA-3/256 10000 106531 ns/op 153.80 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x16/SHA-3/384 8641 137272 ns/op 119.35 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x16/SHA-3/512 6340 189124 ns/op 86.63 MB/s 0 B/op 0 allocs/op BenchmarkShake_x01/SHAKE-128 167062 7149 ns/op 188.83 MB/s 0 B/op 0 allocs/op BenchmarkShake_x01/SHAKE-256 151982 7748 ns/op 174.24 MB/s 0 B/op 0 allocs/op BenchmarkShake_x16/SHAKE-128 12963 87770 ns/op 186.67 MB/s 0 B/op 0 allocs/op BenchmarkShake_x16/SHAKE-256 10000 105554 ns/op 155.22 MB/s 0 B/op 0 allocs/op BenchmarkCShake/cSHAKE-128 109148 10940 ns/op 187.11 MB/s 0 B/op 0 allocs/op BenchmarkCShake/cSHAKE-256 90324 13211 ns/op 154.94 MB/s 0 B/op 0 allocs/op PASS	2020-08-29 02:12:49 +01:00
Henry Case	3a320e1714	Create FUNDING.yml	2020-08-28 23:32:08 +01:00
Kris Kwiatkowski	7dcb72bf74	remove shake	2020-08-25 22:39:56 +01:00
Kris Kwiatkowski	ffd7590213	improve comment Initial speed on i7-8665U > go test -bench=. -test.cpu=1 goos: linux goarch: amd64 pkg: github.com/henrydcase/nobs/hash/sha3 BenchmarkPermutationFunction 1634836 732 ns/op 273.18 MB/s BenchmarkSha3_512_MTU 78438 15340 ns/op 88.00 MB/s BenchmarkSha3_384_MTU 108807 11025 ns/op 122.45 MB/s BenchmarkSha3_256_MTU 136902 8767 ns/op 153.98 MB/s BenchmarkSha3_224_MTU 143377 8355 ns/op 161.57 MB/s BenchmarkShake128_MTU 163569 7108 ns/op 189.94 MB/s BenchmarkShake256_MTU 156534 7643 ns/op 176.64 MB/s BenchmarkShake256_16x 10000 112109 ns/op 146.14 MB/s BenchmarkShake256_1MiB 204 5877014 ns/op 178.42 MB/s BenchmarkSha3_512_1MiB 100 10967026 ns/op 95.61 MB/s PASS ok github.com/henrydcase/nobs/hash/sha3 13.855s	2020-08-25 17:09:40 +01:00
Kris Kwiatkowski	516ea4f5e8	cleanup	2020-08-25 12:32:22 +01:00
Kris Kwiatkowski	68ba33e34f	sha3: remove s390	2020-08-25 12:11:08 +01:00
Henry Case	b56c355c8d	adds cycle count. fixes csidh which provides 128 not 512 bits of security (#38)	2020-08-25 11:22:53 +01:00
Kris Kwiatkowski	c30f61923a	adds cycle count. fixes csidh which provides 128 not 512 bits of security	2020-08-25 11:21:11 +01:00
Kris Kwiatkowski	a02b9a77a0	mkem: add csidh	2020-08-25 11:19:07 +01:00
Kris Kwiatkowski	2500d74484	export more symbols from common	2020-05-16 22:37:41 +00:00
Henry Case	a152c09fd5	sike: move common (#33) * makes common reusable * exports some more symbols from common * remove kem for a moment	2020-05-16 20:14:48 +00:00
Henry Case	55957bbf5e	sike: move common (#32) * makes common reusable * exports some more symbols from common	2020-05-16 18:51:34 +00:00
Kris Kwiatkowski	ab962715d5	Fixes cSIDH key generation when run in the loop	2020-05-14 11:53:23 +00:00
Henry Case	bc32024729	sidh: updates (#31)	2020-05-14 08:51:20 +00:00
Kris Kwiatkowski	f5a7daf2bb	sidh: update to p434	2020-05-14 00:02:32 +00:00
Kris Kwiatkowski	91945fde1f	csidh: cosmetic updates	2020-05-13 23:48:43 +00:00
Kris K	7d891c7eb8	support go 1.14 (#29) NOBS now supports Go 1.14 for * x86-64 * ARM	2020-03-05 11:19:51 +00:00
Kris Kwiatkowski	d0692c81f0	Remove BS from README.md	2020-02-13 10:27:42 +00:00
Kris Kwiatkowski	48ea6a583a	Remove BS from README.md	2020-02-13 10:27:18 +00:00
Kris Kwiatkowski	c5bff4fa11	Remove BS from README.md	2020-02-13 10:25:47 +00:00
Kris Kwiatkowski	2a73461591	remove crappy x448	2020-02-13 10:17:54 +00:00
Kris Kwiatkowski	7efbbf4745	cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie Hellman, based on "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, implementation converts a curve from Montgomery to Edwards. All calculations are done on Edwards curve and then converted back to Montgomery. * As multiplication in a field Fp511 is the most expensive operation, the implementation contains multiple multiplication implementations. It has the most performant assembly implementation, which uses BMI2 and ADOX/ADCX instructions for modern CPUs. It also contains a slower implementation which will run on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to C implementation.
Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh
| TestName         | Go       | C        |
|------------------|----------|----------|
| TestSharedSecret | 57.95774 | 57.91092 |
| TestKeyGeneration | 62.23614 | 58.12980 |
| TestSharedSecret | 55.28988 | 57.23132 |
| TestKeyGeneration | 61.68745 | 58.66396 |
| TestSharedSecret | 63.19408 | 58.64774 |
| TestKeyGeneration | 62.34022 | 61.62539 |
| TestSharedSecret | 62.85453 | 68.74503 |
| TestKeyGeneration | 52.58518 | 58.40115 |
| TestSharedSecret | 50.77081 | 61.91699 |
| TestKeyGeneration | 59.91843 | 61.09266 |
| TestSharedSecret | 59.97962 | 62.98151 |
| TestKeyGeneration | 64.57525 | 56.22863 |
| TestSharedSecret | 56.40521 | 55.77447 |
| TestKeyGeneration | 67.85850 | 58.52604 |
| TestSharedSecret | 60.54290 | 65.14052 |
| TestKeyGeneration | 65.45766 | 58.42823 |
On average Go implementation is 2% faster.	2019-11-25 15:03:29 +00:00
Henry Case	15f6ee16b9	SHA-3: Fixes crash when cloning Shake state	2019-05-26 17:29:15 +01:00
Henry Case	9b3c0190b0	Updates P34 strategy calculation	2019-05-23 18:32:28 +01:00
Henry Case	7298b650cc	Adds go.mod (#21) * Reset Makefile after adding go.mod * Remove ``build`` directory * Simplifies makefile * shake: Make xorIn copyOut platform specific	2019-05-15 18:03:35 +01:00
Henry Case	49bf0db8fd	SHAKE: Don't use function pointers (#20) * xorIn and copyOut function pointers cause input and output data to be moved to heap. This degrades performance of calling code. * This change removes usage of those function pointers. We will always use unaligned implementation as it's faster (but may crash on some systems) * Benchmark compares generic vs unaligned xorIn and copyOut benchmark old ns/op new ns/op delta BenchmarkPermutationFunction-4 463 815 +76.03% BenchmarkShake128_MTU-4 4443 8180 +84.11% BenchmarkShake256_MTU-4 4739 9060 +91.18% BenchmarkShake256_16x-4 71886 132629 +84.50% BenchmarkShake256_1MiB-4 3695138 6649012 +79.94% BenchmarkCShake128_448_16x-4 21210 24611 +16.03% BenchmarkCShake128_1MiB-4 3009342 3396496 +12.87% BenchmarkCShake256_448_16x-4 26034 27785 +6.73% BenchmarkCShake256_1MiB-4 3654713 3829404 +4.78%	2019-05-14 17:08:33 +01:00
Henry Case	e6439f96ab	Adds cSHAKE with 0 alloc interface (#19)	2019-05-14 01:19:29 +01:00
Henry Case	6f9706df01	CTR-DRBG: Use hardware acceleration on X86 (#18) benchmark old ns/op new ns/op delta BenchmarkInit-4 3403 397 -88.33% BenchmarkRead-4 14535 1560 -89.27%	2019-04-09 23:50:21 +01:00
Kris Kwiatkowski	71624cdc4c	Improvements to makefile	2019-04-09 17:30:30 +01:00
Kris Kwiatkowski	b184944242	Nits for SIDH	2019-04-09 17:09:34 +01:00
Kris Kwiatkowski	08f7315b64	DRBG: Speed improvements * CTR-DRBG doesn't call "NewCipher" for block encryption * Changes API of CTR-DRBG, so that read operation implements io.Reader Benchmark results: ---------------------- benchmark old ns/op new ns/op delta BenchmarkInit-4 1118 3579 +220.13% BenchmarkRead-4 5343 14589 +173.05% benchmark old allocs new allocs delta BenchmarkInit-4 15 0 -100.00% BenchmarkRead-4 67 0 -100.00% benchmark old bytes new bytes delta BenchmarkInit-4 1824 0 -100.00% BenchmarkRead-4 9488 0 -100.00%	2019-04-09 14:37:59 +01:00
Kris Kwiatkowski	e66cc99401	Improves comment	2019-02-19 14:44:11 +00:00
Henry Case	b47a731959	Run tests on ARM64 (#11)	2019-02-16 21:29:20 +00:00
Kris Kwiatkowski	90f8cba329	SIDH: Update (#9) * Change license to BSD-3 * SIDH: Multiple developments	2018-12-03 23:07:01 +00:00
Kris Kwiatkowski	ea2ffa2d61	PERF: sidh-p503: Split sub and add into 2 uops instead of 3 (#8) The performance improvement comes from the fact that on Skylake "add mem, reg" splits into 2 uops - one arithmetic uop and another one for loading a value from mem. However, changing operand order to "add reg, mem" splits into 3 uops: one for arithmetic op, one for load and one additional one for storing the result back. Using separate instructions for loading/storing helps to parallelize execution (load/store and arithmetic instructions are executed in parallel if possible) For details, see: https://www.agner.org/optimize/instruction_tables.pdf New: BenchmarkFp503StrongReduce-4 300000000 5.57 ns/op Old: BenchmarkFp503StrongReduce-4 200000000 8.60 ns/op This just improves one function, but more functions can be improved	2018-11-18 20:57:29 +00:00

82 Commits