kris/nobs - nobs - GIT server of AmongBytes

mirror of https://github.com/henrydcase/nobs.git synced 2024-11-22 15:18:57 +00:00

Author	SHA1	Message	Date
Kris Kwiatkowski	d2896a16e8	WIP	2021-03-08 23:54:47 +00:00
Kris Kwiatkowski	29c33a33a7	adds sha2: wip	2021-03-07 15:36:46 +00:00
Henry Case	8474981cfc	SHA-3: speedups (#47) * add function for one-off calculation * sha3: simplifies Read function * sha3: remove if from Read	2020-10-03 23:27:08 +01:00
Henry Case	45bc1a75f6	add function for one-off calculation (#45)	2020-10-03 15:12:26 +01:00
Henry Case	adfaf1e58c	fix: ebx -> ecx (#46)	2020-10-03 15:11:52 +01:00
Kris Kwiatkowski	24408329a5	Use bits.RotateLeft64 whenever possible	2020-09-28 21:03:08 +01:00
Henry Case	0174e314a1	Update README.md	2020-08-29 02:13:24 +01:00
Henry Case	820906b7c7	sha3: optimizations and cleanup (#41) * complete reset of the SHA-3 code. Affects mostly the code in sha3.go * fixes a bug which caused the SHAKE implementation to crash * implementation of Read()/Write() avoids unnecessary buffering as much as possible * NOTE: at some point I wrote a separate implementation for the SumXXX functions, but after optimizing the implementation of Read/Write/Sum, the gain wasn't that big Current speed on Initial speed on i7-8665U@1.90 BenchmarkPermutationFunction 1592787 736 ns/op 271.90 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x01/SHA-3/224 98752 11630 ns/op 176.02 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x01/SHA-3/256 92508 12447 ns/op 164.46 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x01/SHA-3/384 76765 15206 ns/op 134.62 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x01/SHA-3/512 54333 21932 ns/op 93.33 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x16/SHA-3/224 10000 102161 ns/op 160.37 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x16/SHA-3/256 10000 106531 ns/op 153.80 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x16/SHA-3/384 8641 137272 ns/op 119.35 MB/s 0 B/op 0 allocs/op BenchmarkSha3Chunk_x16/SHA-3/512 6340 189124 ns/op 86.63 MB/s 0 B/op 0 allocs/op BenchmarkShake_x01/SHAKE-128 167062 7149 ns/op 188.83 MB/s 0 B/op 0 allocs/op BenchmarkShake_x01/SHAKE-256 151982 7748 ns/op 174.24 MB/s 0 B/op 0 allocs/op BenchmarkShake_x16/SHAKE-128 12963 87770 ns/op 186.67 MB/s 0 B/op 0 allocs/op BenchmarkShake_x16/SHAKE-256 10000 105554 ns/op 155.22 MB/s 0 B/op 0 allocs/op BenchmarkCShake/cSHAKE-128 109148 10940 ns/op 187.11 MB/s 0 B/op 0 allocs/op BenchmarkCShake/cSHAKE-256 90324 13211 ns/op 154.94 MB/s 0 B/op 0 allocs/op PASS	2020-08-29 02:12:49 +01:00
Henry Case	3a320e1714	Create FUNDING.yml	2020-08-28 23:32:08 +01:00
Kris Kwiatkowski	7dcb72bf74	remove shake	2020-08-25 22:39:56 +01:00
Kris Kwiatkowski	ffd7590213	improve comment Initial speed on i7-8665U > go test -bench=. -test.cpu=1 goos: linux goarch: amd64 pkg: github.com/henrydcase/nobs/hash/sha3 BenchmarkPermutationFunction 1634836 732 ns/op 273.18 MB/s BenchmarkSha3_512_MTU 78438 15340 ns/op 88.00 MB/s BenchmarkSha3_384_MTU 108807 11025 ns/op 122.45 MB/s BenchmarkSha3_256_MTU 136902 8767 ns/op 153.98 MB/s BenchmarkSha3_224_MTU 143377 8355 ns/op 161.57 MB/s BenchmarkShake128_MTU 163569 7108 ns/op 189.94 MB/s BenchmarkShake256_MTU 156534 7643 ns/op 176.64 MB/s BenchmarkShake256_16x 10000 112109 ns/op 146.14 MB/s BenchmarkShake256_1MiB 204 5877014 ns/op 178.42 MB/s BenchmarkSha3_512_1MiB 100 10967026 ns/op 95.61 MB/s PASS ok github.com/henrydcase/nobs/hash/sha3 13.855s	2020-08-25 17:09:40 +01:00
Kris Kwiatkowski	516ea4f5e8	cleanup	2020-08-25 12:32:22 +01:00
Kris Kwiatkowski	68ba33e34f	sha3: remove s390	2020-08-25 12:11:08 +01:00
Henry Case	b56c355c8d	adds cycle count. fixes csidh which provides 128 not 512 bits of security (#38)	2020-08-25 11:22:53 +01:00
Kris Kwiatkowski	c30f61923a	adds cycle count. fixes csidh which provides 128 not 512 bits of security	2020-08-25 11:21:11 +01:00
Kris Kwiatkowski	a02b9a77a0	mkem: add csidh	2020-08-25 11:19:07 +01:00
Kris Kwiatkowski	2500d74484	export more symbols from common	2020-05-16 22:37:41 +00:00
Henry Case	a152c09fd5	sike: move common (#33) * makes common reusable * exports some more symbols from common * remove kem for a moment	2020-05-16 20:14:48 +00:00
Henry Case	55957bbf5e	sike: move common (#32) * makes common reusable * exports some more symbols from common	2020-05-16 18:51:34 +00:00
Kris Kwiatkowski	ab962715d5	Fixes cSIDH key generation when run in the loop	2020-05-14 11:53:23 +00:00
Henry Case	bc32024729	sidh: updates (#31)	2020-05-14 08:51:20 +00:00
Kris Kwiatkowski	f5a7daf2bb	sidh: update to p434	2020-05-14 00:02:32 +00:00
Kris Kwiatkowski	91945fde1f	csidh: cosmetic updates	2020-05-13 23:48:43 +00:00
Kris K	7d891c7eb8	support go 1.14 (#29) NOBS now supports Go 1.14 for * x86-64 * ARM	2020-03-05 11:19:51 +00:00
Kris Kwiatkowski	d0692c81f0	Remove BS from README.md	2020-02-13 10:27:42 +00:00
Kris Kwiatkowski	48ea6a583a	Remove BS from README.md	2020-02-13 10:27:18 +00:00
Kris Kwiatkowski	c5bff4fa11	Remove BS from README.md	2020-02-13 10:25:47 +00:00
Kris Kwiatkowski	2a73461591	remove crappy x448	2020-02-13 10:17:54 +00:00
Kris Kwiatkowski	7efbbf4745	cSIDH-511: (#26) Implementation of Commutative Supersingular Isogeny Diffie-Hellman, based on the "A faster way to CSIDH" paper (2018/782). * For fast isogeny calculation, the implementation converts a curve from Montgomery to Edwards form. All calculations are done on the Edwards curve and the result is then converted back to Montgomery form. * As multiplication in the field Fp511 is the most expensive operation, the implementation contains multiple multiplication routines. The fastest is an assembly implementation which uses BMI2 and ADOX/ADCX instructions on modern CPUs. A slower implementation runs on older CPUs * Benchmarks (Intel SkyLake): BenchmarkGeneratePrivate 6459 172213 ns/op 0 B/op 0 allocs/op BenchmarkGenerateKeyPair 25 45800356 ns/op 0 B/op 0 allocs/op BenchmarkValidate 297 3915983 ns/op 0 B/op 0 allocs/op BenchmarkValidateRandom 184683 6231 ns/op 0 B/op 0 allocs/op BenchmarkValidateGenerated 25 48481306 ns/op 0 B/op 0 allocs/op BenchmarkDerive 19 60928763 ns/op 0 B/op 0 allocs/op BenchmarkDeriveGenerated 8 137342421 ns/op 0 B/op 0 allocs/op BenchmarkXMul 2311 494267 ns/op 1 B/op 0 allocs/op BenchmarkXAdd 2396754 501 ns/op 0 B/op 0 allocs/op BenchmarkXDbl 2072690 571 ns/op 0 B/op 0 allocs/op BenchmarkIsom 78004 15171 ns/op 0 B/op 0 allocs/op BenchmarkFp512Sub 224635152 5.33 ns/op 0 B/op 0 allocs/op BenchmarkFp512Mul 246633255 4.90 ns/op 0 B/op 0 allocs/op BenchmarkCSwap 233228547 5.10 ns/op 0 B/op 0 allocs/op BenchmarkAddRdc 87348240 12.6 ns/op 0 B/op 0 allocs/op BenchmarkSubRdc 95112787 11.7 ns/op 0 B/op 0 allocs/op BenchmarkModExpRdc 25436 46878 ns/op 0 B/op 0 allocs/op BenchmarkMulBmiAsm 19527573 60.1 ns/op 0 B/op 0 allocs/op BenchmarkMulGeneric 7117650 164 ns/op 0 B/op 0 allocs/op * Go code has very similar performance when compared to the C implementation. 
Results from sidh_torturer (4e2996e12d68364761064341cbe1d1b47efafe23) github.com:henrydcase/sidh-torture/csidh \| TestName \|Go \| C \| \|------------------\|----------\|----------\| \|TestSharedSecret \| 57.95774 \| 57.91092 \| \|TestKeyGeneration \| 62.23614 \| 58.12980 \| \|TestSharedSecret \| 55.28988 \| 57.23132 \| \|TestKeyGeneration \| 61.68745 \| 58.66396 \| \|TestSharedSecret \| 63.19408 \| 58.64774 \| \|TestKeyGeneration \| 62.34022 \| 61.62539 \| \|TestSharedSecret \| 62.85453 \| 68.74503 \| \|TestKeyGeneration \| 52.58518 \| 58.40115 \| \|TestSharedSecret \| 50.77081 \| 61.91699 \| \|TestKeyGeneration \| 59.91843 \| 61.09266 \| \|TestSharedSecret \| 59.97962 \| 62.98151 \| \|TestKeyGeneration \| 64.57525 \| 56.22863 \| \|TestSharedSecret \| 56.40521 \| 55.77447 \| \|TestKeyGeneration \| 67.85850 \| 58.52604 \| \|TestSharedSecret \| 60.54290 \| 65.14052 \| \|TestKeyGeneration \| 65.45766 \| 58.42823 \| On average Go implementation is 2% faster.	2019-11-25 15:03:29 +00:00
Henry Case	15f6ee16b9	SHA-3: Fixes crash when cloning Shake state	2019-05-26 17:29:15 +01:00
Henry Case	9b3c0190b0	Updates P34 strategy calculation	2019-05-23 18:32:28 +01:00
Henry Case	7298b650cc	Adds go.mod (#21) * Reset Makefile after adding go.mod * Remove ``build`` directory * Simplifies makefile * shake: Make xorIn copyOut platform specific	2019-05-15 18:03:35 +01:00
Henry Case	49bf0db8fd	SHAKE: Don't use function pointers (#20) * xorIn and copyOut function pointers cause input and output data to be moved to heap. This degrades performance of calling code. * This change removes usage of those function pointers. We will always use unaligned implementation as it's faster (but may crash on some systems) * Benchmark compares generic vs unaligned xorIn and copyOut benchmark old ns/op new ns/op delta BenchmarkPermutationFunction-4 463 815 +76.03% BenchmarkShake128_MTU-4 4443 8180 +84.11% BenchmarkShake256_MTU-4 4739 9060 +91.18% BenchmarkShake256_16x-4 71886 132629 +84.50% BenchmarkShake256_1MiB-4 3695138 6649012 +79.94% BenchmarkCShake128_448_16x-4 21210 24611 +16.03% BenchmarkCShake128_1MiB-4 3009342 3396496 +12.87% BenchmarkCShake256_448_16x-4 26034 27785 +6.73% BenchmarkCShake256_1MiB-4 3654713 3829404 +4.78%	2019-05-14 17:08:33 +01:00
Henry Case	e6439f96ab	Adds cSHAKE with 0 alloc interface (#19)	2019-05-14 01:19:29 +01:00
Henry Case	6f9706df01	CTR-DRBG: Use hardware acceleration on X86 (#18) benchmark old ns/op new ns/op delta BenchmarkInit-4 3403 397 -88.33% BenchmarkRead-4 14535 1560 -89.27%	2019-04-09 23:50:21 +01:00
Kris Kwiatkowski	71624cdc4c	Improvements to makefile	2019-04-09 17:30:30 +01:00
Kris Kwiatkowski	b184944242	Nits for SIDH	2019-04-09 17:09:34 +01:00
Kris Kwiatkowski	08f7315b64	DRBG: Speed improvements * CTR-DRBG doesn't call "NewCipher" for block encryption * Changes API of CTR-DRBG, so that read operation implements io.Reader Benchmark results: ---------------------- benchmark old ns/op new ns/op delta BenchmarkInit-4 1118 3579 +220.13% BenchmarkRead-4 5343 14589 +173.05% benchmark old allocs new allocs delta BenchmarkInit-4 15 0 -100.00% BenchmarkRead-4 67 0 -100.00% benchmark old bytes new bytes delta BenchmarkInit-4 1824 0 -100.00% BenchmarkRead-4 9488 0 -100.00%	2019-04-09 14:37:59 +01:00
Kris Kwiatkowski	e66cc99401	Improves comment	2019-02-19 14:44:11 +00:00
Henry Case	b47a731959	Run tests on ARM64 (#11)	2019-02-16 21:29:20 +00:00
Kris Kwiatkowski	90f8cba329	SIDH: Update (#9) * Change license to BSD-3 * SIDH: Multiple developments	2018-12-03 23:07:01 +00:00
Kris Kwiatkowski	ea2ffa2d61	PERF: sidh-p503: Split sub and add into 2 uops instead of 3 (#8) The performance improvement comes from the fact that on Skylake "add mem, reg" splits into 2 uops - one arithmetic uop and another one for loading a value from mem. However, changing operand order to "add reg, mem" splits into 3 uops: one for the arithmetic op, one for the load and an additional one for storing the result back. Using separate instructions for loading/storing helps to parallelize execution (load/store and arithmetic instructions are executed in parallel if possible) For details, see: https://www.agner.org/optimize/instruction_tables.pdf New: BenchmarkFp503StrongReduce-4 300000000 5.57 ns/op Old: BenchmarkFp503StrongReduce-4 200000000 8.60 ns/op This just improves one function, but more functions can be improved	2018-11-18 20:57:29 +00:00
Kris Kwiatkowski	e9ddb6fb45	sidh/csidh: use SSE for performing CSWAP (#6) * Makefile * makefile: tools for profiling * sidh: use SIMD for performing CSWAP Loads data into 128-bit XMM registers and performs conditional swap. This is probably less useful for SIDH, but will be useful for cSIDH	2018-10-29 15:41:09 +00:00
Kris Kwiatkowski	a456dc4dd9	readme: License	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	ae57368c7b	License BS for sha3	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	00c16fe97e	License bullshit	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	65bbafeef5	script used for calculating sliding window strategy in SIDH P34	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	0531c3479b	Update README.md	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	1e34845d00	complete rewrite for SIDH and SIKE. adds p503 (#5)	2018-10-25 15:22:28 +01:00
Kris Kwiatkowski	d6fc82531f	Doc	2018-10-25 15:22:28 +01:00
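
Commit e9ddb6fb45 describes a conditional swap (CSWAP) done with SIMD registers. For readers unfamiliar with the operation, here is a minimal portable sketch in pure Go of a mask-based conditional swap. It is not the repository's assembly code: the name `cswap`, its signature, and the use of `[]uint64` limbs are assumptions made for illustration only, and both slices are assumed to have the same length.

```go
package main

import "fmt"

// cswap swaps the contents of x and y when choice is 1 and leaves them
// untouched when choice is 0, without branching on choice.
// Hypothetical helper for illustration; not part of the nobs API.
// Assumes len(x) == len(y).
func cswap(x, y []uint64, choice uint8) {
	// mask is all ones when choice == 1 and all zeros when choice == 0
	// (unsigned negation wraps around).
	mask := -uint64(choice & 1)
	for i := range x {
		// t is x[i]^y[i] when swapping, 0 otherwise.
		t := mask & (x[i] ^ y[i])
		x[i] ^= t
		y[i] ^= t
	}
}

func main() {
	a := []uint64{1, 2, 3}
	b := []uint64{7, 8, 9}

	cswap(a, b, 1)
	fmt.Println(a, b) // prints "[7 8 9] [1 2 3]"

	cswap(a, b, 0)
	fmt.Println(a, b) // prints "[7 8 9] [1 2 3]"
}
```

The same sequence of XOR and AND operations runs for either value of `choice`, which is why this pattern is common in code that must not branch on secret data. A SIMD version such as the one the commit mentions would apply the same masking to 128 bits at a time.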

74 Commits