Nexus 7 goes from 1002.8 ops/sec to 4704.8 at a cost of 10KB of code.
(It'll actually save code if built with -mfpu=neon because then the
generic version can be discarded by the compiler.)
Change-Id: Ia6d02efb2c2d1bb02a07eb56ec4ca3b0dba99382
Reviewed-on: https://boringssl-review.googlesource.com/6524
Reviewed-by: Adam Langley <agl@google.com>
MSVC doesn't like unary minus on unsigned types. Also, the speed test
always failed because the inputs were all zeros and thus had small
order.
Change-Id: Ic2d3c2c9bd57dc66295d93891396871cebac1e0b