Implement field_{mul,sqr} in p224-64.c with p224_felems.
This is in preparation for representing field elements with
stack-allocated types in the generic code. While there is likely little
benefit in threading all the turned field arithmetic through all the
generic code, and the P-224 logic, in particular, does not have a tight
enough abstraction for this, the current implementations depend on
BN_div, which is not compatible with stack-allocating things and avoiding
malloc.
This also speeds things up slightly, now that benchmarks cover point
validation.
Before:
Did 82786 ECDH P-224 operations in 10024326us (8258.5 ops/sec)
After:
Did 89991 ECDH P-224 operations in 10012429us (8987.9 ops/sec)
Change-Id: I468483b49f5dc69187aebd62834365ce5caab795
Reviewed-on: https://boringssl-review.googlesource.com/26971
Reviewed-by: Adam Langley <agl@google.com>