nobs/sidh at 9137ce1ffcbaa7985a7c9e739ac4e8aa50e5a55e - nobs

kris/nobs

mirror of https://github.com/henrydcase/nobs.git synced 2024-11-22 23:28:57 +00:00

Kris Kwiatkowski ea2ffa2d61 PERF: sidh-p503: Split sub and add into 2 uops instead of 3 (#8)		2018-11-18 20:57:29 +00:00

The performance improvement comes from the fact that on Skylake "add mem, reg" splits into 2 uops - one arithmetic uop and another one for loading a value from mem. However, changing operand order to "add reg, mem" splits into 3 uops: one for arithmetic op, one for load and one additional one for storing the result back. Using separated instruction for loading/storing helps to parallelize execution (load/store and arithmetic instruction is done in parallel if possible)

For details, see: https://www.agner.org/optimize/instruction_tables.pdf

New: BenchmarkFp503StrongReduce-4 300000000 5.57 ns/op
Old: BenchmarkFp503StrongReduce-4 200000000 8.60 ns/op

This just improves one function, but more functions can be improved
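
To put the two benchmark lines in the commit message in perspective, the following is a minimal Go sketch that derives the relative gain from the quoted ns/op figures. The helper names `speedup` and `reduction` are hypothetical and are not part of the nobs code base; only the two timings come from the commit message.

```go
package main

import "fmt"

// speedup reports how many times faster the new timing is than the old one.
// Hypothetical helper for illustration only; not part of nobs.
func speedup(oldNs, newNs float64) float64 {
	return oldNs / newNs
}

// reduction reports the fraction of time per operation that was saved.
// Hypothetical helper for illustration only; not part of nobs.
func reduction(oldNs, newNs float64) float64 {
	return (oldNs - newNs) / oldNs
}

func main() {
	// ns/op figures quoted in the commit message for BenchmarkFp503StrongReduce-4.
	const oldNs, newNs = 8.60, 5.57

	fmt.Printf("speedup: %.2fx\n", speedup(oldNs, newNs))
	fmt.Printf("time saved per op: %.1f%%\n", 100*reduction(oldNs, newNs))
}
```

With the quoted figures this works out to roughly a 1.54x speedup, or about 35% less time per call, for this one function.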
internal/isogeny	complate rewrite for SIDH and SIKE. adds p503 (#5)	2018-10-25 15:22:28 +01:00
p503	PERF: sidh-p503: Split sub and add into 2 uops instead of 3 (#8)	2018-11-18 20:57:29 +00:00
p751	sidh/csidh: use SEE for performing CSWAP (#6)	2018-10-29 15:41:09 +00:00
api.go	complate rewrite for SIDH and SIKE. adds p503 (#5)	2018-10-25 15:22:28 +01:00
params.go	complate rewrite for SIDH and SIKE. adds p503 (#5)	2018-10-25 15:22:28 +01:00
sidh_test.go	complate rewrite for SIDH and SIKE. adds p503 (#5)	2018-10-25 15:22:28 +01:00
sidh.go	complate rewrite for SIDH and SIKE. adds p503 (#5)	2018-10-25 15:22:28 +01:00