mirror of
https://github.com/henrydcase/nobs.git
synced 2024-11-22 15:18:57 +00:00
Kris Kwiatkowski
ea2ffa2d61
The performance improvement comes from the fact that on Skylake "add mem, reg" splits into 2 uops - one arithmetic uop and another one for loading a value from mem. However, changing operand order to "add reg, mem" splits into 3 uops: one for arithmetic op, one for load and one additional one for storing the result back. Using separated instruction for loading/storing helps to parallelize execution (load/store and arithmetic instruction is done in parallel if possible) For details, see: https://www.agner.org/optimize/instruction_tables.pdf New: BenchmarkFp503StrongReduce-4 300000000 5.57 ns/op Old: BenchmarkFp503StrongReduce-4 200000000 8.60 ns/op This just improves one function, but more functions can be improved |
||
---|---|---|
.. | ||
internal/isogeny | ||
p503 | ||
p751 | ||
api.go | ||
params.go | ||
sidh_test.go | ||
sidh.go |