This produces slightly better performance than the inline assembly, and has the added benefit that it should be portable to other systems that use gcc, not just x86-64.

Here are the results on my "AMD Athlon(tm) 7450 Dual-Core Processor" with "gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3":

with portable 64H macros:
  camellia        : Schedule at 1659
  camellia [ 23]  : Encrypt at 431, Decrypt at 434
  whirlpool       : Process at 55

with inline assembly (with "memory clobber" for correctness):
  camellia        : Schedule at 1380
  camellia [ 23]  : Encrypt at 406, Decrypt at 403
  whirlpool       : Process at 50

with __builtin_bswap64:
  camellia        : Schedule at 1352
  camellia [ 23]  : Encrypt at 396, Decrypt at 391
  whirlpool       : Process at 46
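For readers unfamiliar with the technique, here is a minimal sketch of a big-endian 64-bit load written both ways: byte by byte, and with __builtin_bswap64. The function names load64h_portable and load64h_builtin are hypothetical and chosen for illustration; they are not the library's actual macros, and the preprocessor guard is an assumption about how one might select the fast path, not the check this change uses.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: portable big-endian load, one byte at a time. */
static uint64_t load64h_portable(const unsigned char *p)
{
    return ((uint64_t)p[0] << 56) | ((uint64_t)p[1] << 48) |
           ((uint64_t)p[2] << 40) | ((uint64_t)p[3] << 32) |
           ((uint64_t)p[4] << 24) | ((uint64_t)p[5] << 16) |
           ((uint64_t)p[6] <<  8) |  (uint64_t)p[7];
}

/* Hypothetical helper: same result, using the gcc builtin where a
 * little-endian gcc-compatible compiler is detected (assumed guard). */
static uint64_t load64h_builtin(const unsigned char *p)
{
#if defined(__GNUC__) && defined(__BYTE_ORDER__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    uint64_t x;
    memcpy(&x, p, sizeof x);      /* avoids unaligned or aliasing access */
    return __builtin_bswap64(x);  /* reverse the byte order of the host word */
#else
    return load64h_portable(p);   /* fall back to the byte-wise version */
#endif
}
```

Both functions should return the same value for any 8-byte input. The builtin leaves instruction selection to the compiler, which is why it is not tied to x86-64 the way hand-written inline assembly is.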
See doc/crypt.pdf