Improve CBC decrypt and CTR by ~13/16%, which adds up to ~25/33%
improvement over "pre-Silvermont" version. [Add performance table to
aesni-x86.pl].
(Imported from upstream's b347341c75656cf8bc039bd0ea5e3571c9299687)
This change adjusts the stack pointer during CBC decryption. The code
was previously using the red zone across function calls and valgrind
thinks that the "unused" stack is undefined after a function call.
Initial fork from f2d678e6e89b6508147086610e985d4e8416e867 (1.0.2 beta).
(This change contains substantial changes from the original and
effectively starts a new history.)