boringssl/crypto/perlasm
David Benjamin b9c26014de Get rid of all compiler version checks in perlasm files.
Since we pre-generate our perlasm, having the output of these files be
sensitive to the environment the run in is unhelpful. It would be bad to
suddenly change what features we do or don't compile in whenever workstations'
toolchains change.

Enable all compiler-version-gated features as they should all be runtime-gated
anyway. This should align with what upstream's files would have produced on
modern toolschains. We should assume our assemblers can take whatever we'd like
to throw at them. (If it turns out some can't, we'd rather find out and
probably switch the problematic instructions to explicit byte sequences.)

This actually results in a fairly significant change to the assembly we
generate. I'm guessing upstream's buildsystem sets the CC environment variable,
while ours doesn't and so the version checks were all coming out conservative.

diffstat of generated files:

 linux-x86/crypto/sha/sha1-586.S              | 1176 ++++++++++++
 linux-x86/crypto/sha/sha256-586.S            | 2248 ++++++++++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-avx2.S           | 1644 +++++++++++++++++
 linux-x86_64/crypto/bn/rsaz-x86_64.S         |  638 ++++++
 linux-x86_64/crypto/bn/x86_64-mont.S         |  332 +++
 linux-x86_64/crypto/bn/x86_64-mont5.S        | 1130 ++++++++++++
 linux-x86_64/crypto/modes/aesni-gcm-x86_64.S |  754 ++++++++
 linux-x86_64/crypto/modes/ghash-x86_64.S     |  475 +++++
 linux-x86_64/crypto/sha/sha1-x86_64.S        | 1121 ++++++++++++
 linux-x86_64/crypto/sha/sha256-x86_64.S      | 1062 +++++++++++
 linux-x86_64/crypto/sha/sha512-x86_64.S      | 2241 ++++++++++++++++++++++++
 mac-x86/crypto/sha/sha1-586.S                | 1174 ++++++++++++
 mac-x86/crypto/sha/sha256-586.S              | 2248 ++++++++++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-avx2.S             | 1637 +++++++++++++++++
 mac-x86_64/crypto/bn/rsaz-x86_64.S           |  638 ++++++
 mac-x86_64/crypto/bn/x86_64-mont.S           |  331 +++
 mac-x86_64/crypto/bn/x86_64-mont5.S          | 1130 ++++++++++++
 mac-x86_64/crypto/modes/aesni-gcm-x86_64.S   |  750 ++++++++
 mac-x86_64/crypto/modes/ghash-x86_64.S       |  475 +++++
 mac-x86_64/crypto/sha/sha1-x86_64.S          | 1121 ++++++++++++
 mac-x86_64/crypto/sha/sha256-x86_64.S        | 1062 +++++++++++
 mac-x86_64/crypto/sha/sha512-x86_64.S        | 2241 ++++++++++++++++++++++++
 win-x86/crypto/sha/sha1-586.asm              | 1173 ++++++++++++
 win-x86/crypto/sha/sha256-586.asm            | 2248 ++++++++++++++++++++++++
 win-x86_64/crypto/bn/rsaz-avx2.asm           | 1858 +++++++++++++++++++-
 win-x86_64/crypto/bn/rsaz-x86_64.asm         |  638 ++++++
 win-x86_64/crypto/bn/x86_64-mont.asm         |  352 +++
 win-x86_64/crypto/bn/x86_64-mont5.asm        | 1184 ++++++++++++
 win-x86_64/crypto/modes/aesni-gcm-x86_64.asm |  933 ++++++++++
 win-x86_64/crypto/modes/ghash-x86_64.asm     |  515 +++++
 win-x86_64/crypto/sha/sha1-x86_64.asm        | 1152 ++++++++++++
 win-x86_64/crypto/sha/sha256-x86_64.asm      | 1088 +++++++++++
 win-x86_64/crypto/sha/sha512-x86_64.asm      | 2499 ++++++

SHA* gets faster. RSA and AES-GCM seem to be more of a wash and even slower
sometimes!  This is a little concerning. Though when I repeated the latter two,
it's definitely noisy (RSA in particular), so we may wish to repeat in a more
controlled environment. We could also flip some of these toggles to something
other than the highest setting if it seems some of the variants aren't
desirable. We just shouldn't have them enabled or disabled on accident. This
aligns us closer to upstream though.

$ /tmp/bssl.old speed SHA-
Did 5028000 SHA-1 (16 bytes) operations in 1000048us (5027758.7 ops/sec): 80.4 MB/s
Did 1708000 SHA-1 (256 bytes) operations in 1000257us (1707561.2 ops/sec): 437.1 MB/s
Did 73000 SHA-1 (8192 bytes) operations in 1008406us (72391.5 ops/sec): 593.0 MB/s
Did 3041000 SHA-256 (16 bytes) operations in 1000311us (3040054.5 ops/sec): 48.6 MB/s
Did 779000 SHA-256 (256 bytes) operations in 1000820us (778361.7 ops/sec): 199.3 MB/s
Did 26000 SHA-256 (8192 bytes) operations in 1009875us (25745.8 ops/sec): 210.9 MB/s
Did 1837000 SHA-512 (16 bytes) operations in 1000251us (1836539.0 ops/sec): 29.4 MB/s
Did 803000 SHA-512 (256 bytes) operations in 1000969us (802222.6 ops/sec): 205.4 MB/s
Did 41000 SHA-512 (8192 bytes) operations in 1016768us (40323.8 ops/sec): 330.3 MB/s
$ /tmp/bssl.new speed SHA-
Did 5354000 SHA-1 (16 bytes) operations in 1000104us (5353443.2 ops/sec): 85.7 MB/s
Did 1779000 SHA-1 (256 bytes) operations in 1000121us (1778784.8 ops/sec): 455.4 MB/s
Did 87000 SHA-1 (8192 bytes) operations in 1012641us (85914.0 ops/sec): 703.8 MB/s
Did 3517000 SHA-256 (16 bytes) operations in 1000114us (3516599.1 ops/sec): 56.3 MB/s
Did 935000 SHA-256 (256 bytes) operations in 1000096us (934910.2 ops/sec): 239.3 MB/s
Did 38000 SHA-256 (8192 bytes) operations in 1004476us (37830.7 ops/sec): 309.9 MB/s
Did 2930000 SHA-512 (16 bytes) operations in 1000259us (2929241.3 ops/sec): 46.9 MB/s
Did 1008000 SHA-512 (256 bytes) operations in 1000509us (1007487.2 ops/sec): 257.9 MB/s
Did 45000 SHA-512 (8192 bytes) operations in 1000593us (44973.3 ops/sec): 368.4 MB/s

$ /tmp/bssl.old speed RSA
Did 820 RSA 2048 signing operations in 1017008us (806.3 ops/sec)
Did 27000 RSA 2048 verify operations in 1015400us (26590.5 ops/sec)
Did 1292 RSA 2048 (3 prime, e=3) signing operations in 1008185us (1281.5 ops/sec)
Did 65000 RSA 2048 (3 prime, e=3) verify operations in 1011388us (64268.1 ops/sec)
Did 120 RSA 4096 signing operations in 1061027us (113.1 ops/sec)
Did 8208 RSA 4096 verify operations in 1002717us (8185.8 ops/sec)
$ /tmp/bssl.new speed RSA
Did 760 RSA 2048 signing operations in 1003351us (757.5 ops/sec)
Did 25900 RSA 2048 verify operations in 1028931us (25171.8 ops/sec)
Did 1320 RSA 2048 (3 prime, e=3) signing operations in 1040806us (1268.2 ops/sec)
Did 63000 RSA 2048 (3 prime, e=3) verify operations in 1016042us (62005.3 ops/sec)
Did 104 RSA 4096 signing operations in 1008718us (103.1 ops/sec)
Did 6875 RSA 4096 verify operations in 1093441us (6287.5 ops/sec)

$ /tmp/bssl.old speed GCM
Did 5316000 AES-128-GCM (16 bytes) seal operations in 1000082us (5315564.1 ops/sec): 85.0 MB/s
Did 712000 AES-128-GCM (1350 bytes) seal operations in 1000252us (711820.6 ops/sec): 961.0 MB/s
Did 149000 AES-128-GCM (8192 bytes) seal operations in 1003182us (148527.4 ops/sec): 1216.7 MB/s
Did 5919750 AES-256-GCM (16 bytes) seal operations in 1000016us (5919655.3 ops/sec): 94.7 MB/s
Did 800000 AES-256-GCM (1350 bytes) seal operations in 1000951us (799239.9 ops/sec): 1079.0 MB/s
Did 152000 AES-256-GCM (8192 bytes) seal operations in 1000765us (151883.8 ops/sec): 1244.2 MB/s
$ /tmp/bssl.new speed GCM
Did 5315000 AES-128-GCM (16 bytes) seal operations in 1000125us (5314335.7 ops/sec): 85.0 MB/s
Did 755000 AES-128-GCM (1350 bytes) seal operations in 1000878us (754337.7 ops/sec): 1018.4 MB/s
Did 151000 AES-128-GCM (8192 bytes) seal operations in 1005655us (150150.9 ops/sec): 1230.0 MB/s
Did 5913500 AES-256-GCM (16 bytes) seal operations in 1000041us (5913257.6 ops/sec): 94.6 MB/s
Did 782000 AES-256-GCM (1350 bytes) seal operations in 1001484us (780841.2 ops/sec): 1054.1 MB/s
Did 121000 AES-256-GCM (8192 bytes) seal operations in 1006389us (120231.8 ops/sec): 984.9 MB/s

Change-Id: I0efb32f896c597abc7d7e55c31d038528a5c72a1
Reviewed-on: https://boringssl-review.googlesource.com/6260
Reviewed-by: Adam Langley <alangley@gmail.com>
2015-10-26 20:31:30 +00:00
..
arm-xlate.pl Revert section changes for ASM. 2015-09-30 22:09:52 +00:00
cbc.pl Inital import. 2014-06-20 13:17:32 -07:00
readme Inital import. 2014-06-20 13:17:32 -07:00
x86_64-xlate.pl Get rid of all compiler version checks in perlasm files. 2015-10-26 20:31:30 +00:00
x86asm.pl Get MASM output working on Win32. 2014-10-29 23:13:20 +00:00
x86gas.pl Add visibility rules. 2014-07-31 22:03:11 +00:00
x86masm.pl perlasm/x86masm.pl: make it work. 2015-02-23 19:45:30 +00:00
x86nasm.pl Build Win32 with Yasm rather than MASM. 2014-10-29 23:14:11 +00:00

The perl scripts in this directory are my 'hack' to generate
multiple different assembler formats via the one origional script.

The way to use this library is to start with adding the path to this directory
and then include it.

push(@INC,"perlasm","../../perlasm");
require "x86asm.pl";

The first thing we do is setup the file and type of assember

&asm_init($ARGV[0],$0);

The first argument is the 'type'.  Currently
'cpp', 'sol', 'a.out', 'elf' or 'win32'.
Argument 2 is the file name.

The reciprocal function is
&asm_finish() which should be called at the end.

There are 2 main 'packages'. x86ms.pl, which is the microsoft assembler,
and x86unix.pl which is the unix (gas) version.

Functions of interest are:
&external_label("des_SPtrans");	declare and external variable
&LB(reg);			Low byte for a register
&HB(reg);			High byte for a register
&BP(off,base,index,scale)	Byte pointer addressing
&DWP(off,base,index,scale)	Word pointer addressing
&stack_push(num)		Basically a 'sub esp, num*4' with extra
&stack_pop(num)			inverse of stack_push
&function_begin(name,extra)	Start a function with pushing of
				edi, esi, ebx and ebp.  extra is extra win32
				external info that may be required.
&function_begin_B(name,extra)	Same as norma function_begin but no pushing.
&function_end(name)		Call at end of function.
&function_end_A(name)		Standard pop and ret, for use inside functions
&function_end_B(name)		Call at end but with poping or 'ret'.
&swtmp(num)			Address on stack temp word.
&wparam(num)			Parameter number num, that was push
				in C convention.  This all works over pushes
				and pops.
&comment("hello there")		Put in a comment.
&label("loop")			Refer to a label, normally a jmp target.
&set_label("loop")		Set a label at this point.
&data_word(word)		Put in a word of data.

So how does this all hold together?  Given

int calc(int len, int *data)
	{
	int i,j=0;

	for (i=0; i<len; i++)
		{
		j+=other(data[i]);
		}
	}

So a very simple version of this function could be coded as

	push(@INC,"perlasm","../../perlasm");
	require "x86asm.pl";
	
	&asm_init($ARGV[0],"cacl.pl");

	&external_label("other");

	$tmp1=	"eax";
	$j=	"edi";
	$data=	"esi";
	$i=	"ebp";

	&comment("a simple function");
	&function_begin("calc");
	&mov(	$data,		&wparam(1)); # data
	&xor(	$j,		$j);
	&xor(	$i,		$i);

	&set_label("loop");
	&cmp(	$i,		&wparam(0));
	&jge(	&label("end"));

	&mov(	$tmp1,		&DWP(0,$data,$i,4));
	&push(	$tmp1);
	&call(	"other");
	&add(	$j,		"eax");
	&pop(	$tmp1);
	&inc(	$i);
	&jmp(	&label("loop"));

	&set_label("end");
	&mov(	"eax",		$j);

	&function_end("calc");

	&asm_finish();

The above example is very very unoptimised but gives an idea of how
things work.

There is also a cbc mode function generator in cbc.pl

&cbc(	$name,
	$encrypt_function_name,
	$decrypt_function_name,
	$true_if_byte_swap_needed,
	$parameter_number_for_iv,
	$parameter_number_for_encrypt_flag,
	$first_parameter_to_pass,
	$second_parameter_to_pass,
	$third_parameter_to_pass);

So for example, given
void BF_encrypt(BF_LONG *data,BF_KEY *key);
void BF_decrypt(BF_LONG *data,BF_KEY *key);
void BF_cbc_encrypt(unsigned char *in, unsigned char *out, long length,
        BF_KEY *ks, unsigned char *iv, int enc);

&cbc("BF_cbc_encrypt","BF_encrypt","BF_encrypt",1,4,5,3,-1,-1);

&cbc("des_ncbc_encrypt","des_encrypt","des_encrypt",0,4,5,3,5,-1);
&cbc("des_ede3_cbc_encrypt","des_encrypt3","des_decrypt3",0,6,7,3,4,5);