Nexus 7 goes from 1002.8 ops/sec to 4704.8 at a cost of 10KB of code.
(It'll actually save code if built with -mfpu=neon because then the
generic version can be discarded by the compiler.)
Change-Id: Ia6d02efb2c2d1bb02a07eb56ec4ca3b0dba99382
Reviewed-on: https://boringssl-review.googlesource.com/6524
Reviewed-by: Adam Langley <agl@google.com>