
aesp8-ppc.pl 91 KiB

Add PPC64LE assembly for AES-GCM.

This change adds AES and GHASH assembly from upstream, with the aim of speeding up AES-GCM. The PPC64LE assembly matches the interface of the ARMv8 assembly, so I've changed the prefix of both sets of asm functions to be the same ("aes_hw_"). Otherwise, the new assembly files and Perlasm match exactly those from upstream's c536b6be1a (from their master branch).

Before:
Did 1879000 AES-128-GCM (16 bytes) seal operations in 1000428us (1878196.1 ops/sec): 30.1 MB/s
Did 61000 AES-128-GCM (1350 bytes) seal operations in 1006660us (60596.4 ops/sec): 81.8 MB/s
Did 11000 AES-128-GCM (8192 bytes) seal operations in 1072649us (10255.0 ops/sec): 84.0 MB/s
Did 1665000 AES-256-GCM (16 bytes) seal operations in 1000591us (1664016.6 ops/sec): 26.6 MB/s
Did 52000 AES-256-GCM (1350 bytes) seal operations in 1006971us (51640.0 ops/sec): 69.7 MB/s
Did 8840 AES-256-GCM (8192 bytes) seal operations in 1013294us (8724.0 ops/sec): 71.5 MB/s

After:
Did 4994000 AES-128-GCM (16 bytes) seal operations in 1000017us (4993915.1 ops/sec): 79.9 MB/s
Did 1389000 AES-128-GCM (1350 bytes) seal operations in 1000073us (1388898.6 ops/sec): 1875.0 MB/s
Did 319000 AES-128-GCM (8192 bytes) seal operations in 1000101us (318967.8 ops/sec): 2613.0 MB/s
Did 4668000 AES-256-GCM (16 bytes) seal operations in 1000149us (4667304.6 ops/sec): 74.7 MB/s
Did 1202000 AES-256-GCM (1350 bytes) seal operations in 1000646us (1201224.0 ops/sec): 1621.7 MB/s
Did 269000 AES-256-GCM (8192 bytes) seal operations in 1002804us (268247.8 ops/sec): 2197.5 MB/s

Change-Id: Id848562bd4e1aa79a4683012501dfa5e6c08cfcc
Reviewed-on: https://boringssl-review.googlesource.com/11262
Reviewed-by: Adam Langley <agl@google.com>
Commit-Queue: Adam Langley <agl@google.com>
CQ-Verified: CQ bot account: commit-bot@chromium.org <commit-bot@chromium.org>
8 years ago
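The figures above come from BoringSSL's internal speed tool; each "seal" operation is one authenticated-encryption call through the EVP_AEAD API. A minimal sketch of the operation being timed (buffer sizes are illustrative; this is not the benchmark harness itself):

```c
#include <openssl/aead.h>
#include <stdint.h>
#include <string.h>

int main(void) {
  // All-zero key/nonce for illustration only.
  uint8_t key[16] = {0};    // AES-128-GCM key
  uint8_t nonce[12] = {0};  // 96-bit GCM nonce
  uint8_t in[1350] = {0};   // matches the 1350-byte benchmark case
  uint8_t out[1350 + EVP_AEAD_MAX_OVERHEAD];
  size_t out_len;

  EVP_AEAD_CTX ctx;
  if (!EVP_AEAD_CTX_init(&ctx, EVP_aead_aes_128_gcm(), key, sizeof(key),
                         EVP_AEAD_DEFAULT_TAG_LENGTH, NULL)) {
    return 1;
  }
  // One "seal" operation: encrypt and authenticate in a single call.
  if (!EVP_AEAD_CTX_seal(&ctx, out, &out_len, sizeof(out), nonce,
                         sizeof(nonce), in, sizeof(in), NULL, 0)) {
    return 1;
  }
  EVP_AEAD_CTX_cleanup(&ctx);
  return 0;
}
```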
#! /usr/bin/env perl
# Copyright 2014-2016 The OpenSSL Project Authors. All Rights Reserved.
#
# Licensed under the OpenSSL license (the "License"). You may not use
# this file except in compliance with the License. You can obtain a copy
# in the file LICENSE in the source distribution or at
# https://www.openssl.org/source/license.html
#
# ====================================================================
# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
# project. The module is, however, dual licensed under OpenSSL and
# CRYPTOGAMS licenses depending on where you obtain it. For further
# details see http://www.openssl.org/~appro/cryptogams/.
# ====================================================================
#
# This module implements support for AES instructions as per PowerISA
# specification version 2.07, first implemented by the POWER8 processor.
# The module is endian-agnostic in the sense that it supports both big-
# and little-endian cases. Data alignment in parallelizable modes is
# handled with VSX loads and stores, which implies that the MSR.VSX flag
# must be set. It should also be noted that the ISA specification doesn't
# prohibit alignment exceptions for these instructions on page boundaries.
# Initially alignment was handled in a pure AltiVec/VMX way [with data
# aligned programmatically, which in turn guarantees exception-
# free execution], but that turned out to hamper performance when vcipher
# instructions are interleaved. It's reckoned that eventual
# misalignment penalties at page boundaries are on average lower
# than the additional overhead of the pure AltiVec approach.
#
# May 2016
#
# Added an XTS subroutine; 9x improvement on little-endian and 12x on
# big-endian systems was measured.
#
######################################################################
# Current large-block performance in cycles per byte processed with
# 128-bit key (less is better).
#
#		CBC en-/decrypt		CTR	XTS
# POWER8[le]	3.96/0.72		0.74	1.1
# POWER8[be]	3.75/0.65		0.66	1.0
$flavour = shift;

if ($flavour =~ /64/) {
	$SIZE_T	=8;
	$LRSAVE	=2*$SIZE_T;
	$STU	="stdu";
	$POP	="ld";
	$PUSH	="std";
	$UCMP	="cmpld";
	$SHL	="sldi";
} elsif ($flavour =~ /32/) {
	$SIZE_T	=4;
	$LRSAVE	=$SIZE_T;
	$STU	="stwu";
	$POP	="lwz";
	$PUSH	="stw";
	$UCMP	="cmplw";
	$SHL	="slwi";
} else { die "nonsense $flavour"; }

$LITTLE_ENDIAN = ($flavour=~/le$/) ? $SIZE_T : 0;

$0 =~ m/(.*[\/\\])[^\/\\]+$/; $dir=$1;
( $xlate="${dir}ppc-xlate.pl" and -f $xlate ) or
( $xlate="${dir}../../perlasm/ppc-xlate.pl" and -f $xlate) or
die "can't locate ppc-xlate.pl";

open STDOUT,"| $^X $xlate $flavour ".shift || die "can't call $xlate: $!";

$FRAME=8*$SIZE_T;
$prefix="aes_hw";

$sp="r1";
$vrsave="r12";
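# Usage note: this script is invoked as "perl aesp8-ppc.pl <flavour> <output>";
# the two shift()s above consume those arguments, and everything printed to
# STDOUT is post-processed by ppc-xlate.pl, which resolves the "?"-prefixed
# (endian-dependent) and "le?"-prefixed (little-endian-only) mnemonics used
# throughout the code below.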
#########################################################################
{{{	# Key setup procedures						#
my ($inp,$bits,$out,$ptr,$cnt,$rounds)=map("r$_",(3..8));
my ($zero,$in0,$in1,$key,$rcon,$mask,$tmp)=map("v$_",(0..6));
my ($stage,$outperm,$outmask,$outhead,$outtail)=map("v$_",(7..11));

$code.=<<___;
.machine "any"

.text
.align 7
rcon:
.long 0x01000000, 0x01000000, 0x01000000, 0x01000000 ?rev
.long 0x1b000000, 0x1b000000, 0x1b000000, 0x1b000000 ?rev
.long 0x0d0e0f0c, 0x0d0e0f0c, 0x0d0e0f0c, 0x0d0e0f0c ?rev
.long 0,0,0,0 ?asis
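# Lconsts below returns the address of the rcon table above in a
# position-independent way: bcl/mflr captures the address of the
# instruction after the bcl, and the addi of -0x48 steps back over the
# 8 bytes of code and 64 bytes of table to rcon.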
Lconsts:
mflr r0
bcl 20,31,\$+4
mflr $ptr # distance between . and rcon
addi $ptr,$ptr,-0x48
mtlr r0
blr
.long 0
.byte 0,12,0x14,0,0,0,0,0
.asciz "AES for PowerISA 2.07, CRYPTOGAMS by <appro\@openssl.org>"
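# C-level view of the routine below (a sketch, per BoringSSL convention):
#   int ${prefix}_set_encrypt_key(const uint8_t *user_key, int bits,
#                                 AES_KEY *key);
# Returns 0 on success, -1 if either pointer is NULL, and -2 for an
# unsupported key size (see the Lenc_key_abort paths).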
.globl .${prefix}_set_encrypt_key
.align 5
.${prefix}_set_encrypt_key:
Lset_encrypt_key:
mflr r11
$PUSH r11,$LRSAVE($sp)
li $ptr,-1
${UCMP}i $inp,0
beq- Lenc_key_abort # if ($inp==0) return -1;
${UCMP}i $out,0
beq- Lenc_key_abort # if ($out==0) return -1;
li $ptr,-2
cmpwi $bits,128
blt- Lenc_key_abort
cmpwi $bits,256
bgt- Lenc_key_abort
andi. r0,$bits,0x3f
bne- Lenc_key_abort
lis r0,0xfff0
mfspr $vrsave,256
mtspr 256,r0
bl Lconsts
mtlr r11
neg r9,$inp
lvx $in0,0,$inp
addi $inp,$inp,15 # 15 is not typo
lvsr $key,0,r9 # borrow $key
li r8,0x20
cmpwi $bits,192
lvx $in1,0,$inp
le?vspltisb $mask,0x0f # borrow $mask
lvx $rcon,0,$ptr
le?vxor $key,$key,$mask # adjust for byte swap
lvx $mask,r8,$ptr
addi $ptr,$ptr,0x10
vperm $in0,$in0,$in1,$key # align [and byte swap in LE]
li $cnt,8
vxor $zero,$zero,$zero
mtctr $cnt
?lvsr $outperm,0,$out
vspltisb $outmask,-1
lvx $outhead,0,$out
?vperm $outmask,$zero,$outmask,$outperm
blt Loop128
addi $inp,$inp,8
beq L192
addi $inp,$inp,8
b L256
.align 4
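# Each Loop128 iteration derives the next 128-bit round key: rotate-and-
# splat the last word, run it through the AES S-box via vcipherlast
# against the current round constant, then xor-cascade it across the
# previous key words (the vsldoi/vxor pairs). Stores go through the
# vperm/vsel pair so an unaligned output pointer is tolerated.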
Loop128:
vperm $key,$in0,$in0,$mask # rotate-n-splat
vsldoi $tmp,$zero,$in0,12 # >>32
vperm $outtail,$in0,$in0,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
vcipherlast $key,$key,$rcon
stvx $stage,0,$out
addi $out,$out,16
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vadduwm $rcon,$rcon,$rcon
vxor $in0,$in0,$key
bdnz Loop128
lvx $rcon,0,$ptr # last two round keys
vperm $key,$in0,$in0,$mask # rotate-n-splat
vsldoi $tmp,$zero,$in0,12 # >>32
vperm $outtail,$in0,$in0,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
vcipherlast $key,$key,$rcon
stvx $stage,0,$out
addi $out,$out,16
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vadduwm $rcon,$rcon,$rcon
vxor $in0,$in0,$key
vperm $key,$in0,$in0,$mask # rotate-n-splat
vsldoi $tmp,$zero,$in0,12 # >>32
vperm $outtail,$in0,$in0,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
vcipherlast $key,$key,$rcon
stvx $stage,0,$out
addi $out,$out,16
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vxor $in0,$in0,$key
vperm $outtail,$in0,$in0,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
stvx $stage,0,$out
addi $inp,$out,15 # 15 is not typo
addi $out,$out,0x50
li $rounds,10
b Ldone
.align 4
L192:
lvx $tmp,0,$inp
li $cnt,4
vperm $outtail,$in0,$in0,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
stvx $stage,0,$out
addi $out,$out,16
vperm $in1,$in1,$tmp,$key # align [and byte swap in LE]
vspltisb $key,8 # borrow $key
mtctr $cnt
vsububm $mask,$mask,$key # adjust the mask
Loop192:
vperm $key,$in1,$in1,$mask # rotate-n-splat
vsldoi $tmp,$zero,$in0,12 # >>32
vcipherlast $key,$key,$rcon
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vsldoi $stage,$zero,$in1,8
vspltw $tmp,$in0,3
vxor $tmp,$tmp,$in1
vsldoi $in1,$zero,$in1,12 # >>32
vadduwm $rcon,$rcon,$rcon
vxor $in1,$in1,$tmp
vxor $in0,$in0,$key
vxor $in1,$in1,$key
vsldoi $stage,$stage,$in0,8
vperm $key,$in1,$in1,$mask # rotate-n-splat
vsldoi $tmp,$zero,$in0,12 # >>32
vperm $outtail,$stage,$stage,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
vcipherlast $key,$key,$rcon
stvx $stage,0,$out
addi $out,$out,16
vsldoi $stage,$in0,$in1,8
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vperm $outtail,$stage,$stage,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
stvx $stage,0,$out
addi $out,$out,16
vspltw $tmp,$in0,3
vxor $tmp,$tmp,$in1
vsldoi $in1,$zero,$in1,12 # >>32
vadduwm $rcon,$rcon,$rcon
vxor $in1,$in1,$tmp
vxor $in0,$in0,$key
vxor $in1,$in1,$key
vperm $outtail,$in0,$in0,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
stvx $stage,0,$out
addi $inp,$out,15 # 15 is not typo
addi $out,$out,16
bdnz Loop192
li $rounds,12
addi $out,$out,0x20
b Ldone
.align 4
L256:
lvx $tmp,0,$inp
li $cnt,7
li $rounds,14
vperm $outtail,$in0,$in0,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
stvx $stage,0,$out
addi $out,$out,16
vperm $in1,$in1,$tmp,$key # align [and byte swap in LE]
mtctr $cnt
Loop256:
vperm $key,$in1,$in1,$mask # rotate-n-splat
vsldoi $tmp,$zero,$in0,12 # >>32
vperm $outtail,$in1,$in1,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
vcipherlast $key,$key,$rcon
stvx $stage,0,$out
addi $out,$out,16
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in0,$in0,$tmp
vadduwm $rcon,$rcon,$rcon
vxor $in0,$in0,$key
vperm $outtail,$in0,$in0,$outperm # rotate
vsel $stage,$outhead,$outtail,$outmask
vmr $outhead,$outtail
stvx $stage,0,$out
addi $inp,$out,15 # 15 is not typo
addi $out,$out,16
bdz Ldone
vspltw $key,$in0,3 # just splat
vsldoi $tmp,$zero,$in1,12 # >>32
vsbox $key,$key
vxor $in1,$in1,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in1,$in1,$tmp
vsldoi $tmp,$zero,$tmp,12 # >>32
vxor $in1,$in1,$tmp
vxor $in1,$in1,$key
b Loop256
.align 4
Ldone:
lvx $in1,0,$inp # redundant in aligned case
vsel $in1,$outhead,$in1,$outmask
stvx $in1,0,$inp
li $ptr,0
mtspr 256,$vrsave
stw $rounds,0($out)
Lenc_key_abort:
mr r3,$ptr
blr
.long 0
.byte 0,12,0x14,1,0,0,3,0
.long 0
.size .${prefix}_set_encrypt_key,.-.${prefix}_set_encrypt_key
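# ${prefix}_set_decrypt_key (C view, as a sketch:
#   int ${prefix}_set_decrypt_key(const uint8_t *user_key, int bits,
#                                 AES_KEY *key);)
# builds the encryption schedule via Lset_encrypt_key, then the Ldeckey
# loop below reverses the order of the 16-byte round keys in place by
# swapping entries from both ends; no other transform is applied.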
.globl .${prefix}_set_decrypt_key
.align 5
.${prefix}_set_decrypt_key:
$STU $sp,-$FRAME($sp)
mflr r10
$PUSH r10,$FRAME+$LRSAVE($sp)
bl Lset_encrypt_key
mtlr r10
cmpwi r3,0
bne- Ldec_key_abort
slwi $cnt,$rounds,4
subi $inp,$out,240 # first round key
srwi $rounds,$rounds,1
add $out,$inp,$cnt # last round key
mtctr $rounds
Ldeckey:
lwz r0, 0($inp)
lwz r6, 4($inp)
lwz r7, 8($inp)
lwz r8, 12($inp)
addi $inp,$inp,16
lwz r9, 0($out)
lwz r10,4($out)
lwz r11,8($out)
lwz r12,12($out)
stw r0, 0($out)
stw r6, 4($out)
stw r7, 8($out)
stw r8, 12($out)
subi $out,$out,16
stw r9, -16($inp)
stw r10,-12($inp)
stw r11,-8($inp)
stw r12,-4($inp)
bdnz Ldeckey
xor r3,r3,r3 # return value
Ldec_key_abort:
addi $sp,$sp,$FRAME
blr
.long 0
.byte 0,12,4,1,0x80,0,3,0
.long 0
.size .${prefix}_set_decrypt_key,.-.${prefix}_set_decrypt_key
___
}}}
#########################################################################
{{{	# Single block en- and decrypt procedures			#
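# gen_block emits one-block routines for both directions, aes_hw_encrypt
# and aes_hw_decrypt, with $n selecting vcipher vs. vncipher in the
# shared loop. The C view is AES_encrypt-style (a sketch):
#   void aes_hw_encrypt(const uint8_t *in, uint8_t *out, const AES_KEY *key);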
sub gen_block () {
my $dir = shift;
my $n = $dir eq "de" ? "n" : "";
my ($inp,$out,$key,$rounds,$idx)=map("r$_",(3..7));

$code.=<<___;
.globl .${prefix}_${dir}crypt
.align 5
.${prefix}_${dir}crypt:
lwz $rounds,240($key)
lis r0,0xfc00
mfspr $vrsave,256
li $idx,15 # 15 is not typo
mtspr 256,r0
lvx v0,0,$inp
neg r11,$out
lvx v1,$idx,$inp
lvsl v2,0,$inp # inpperm
le?vspltisb v4,0x0f
?lvsl v3,0,r11 # outperm
le?vxor v2,v2,v4
li $idx,16
vperm v0,v0,v1,v2 # align [and byte swap in LE]
lvx v1,0,$key
?lvsl v5,0,$key # keyperm
srwi $rounds,$rounds,1
lvx v2,$idx,$key
addi $idx,$idx,16
subi $rounds,$rounds,1
?vperm v1,v1,v2,v5 # align round key
vxor v0,v0,v1
lvx v1,$idx,$key
addi $idx,$idx,16
mtctr $rounds
Loop_${dir}c:
?vperm v2,v2,v1,v5
v${n}cipher v0,v0,v2
lvx v2,$idx,$key
addi $idx,$idx,16
?vperm v1,v1,v2,v5
v${n}cipher v0,v0,v1
lvx v1,$idx,$key
addi $idx,$idx,16
bdnz Loop_${dir}c
?vperm v2,v2,v1,v5
v${n}cipher v0,v0,v2
lvx v2,$idx,$key
?vperm v1,v1,v2,v5
v${n}cipherlast v0,v0,v1
vspltisb v2,-1
vxor v1,v1,v1
li $idx,15 # 15 is not typo
?vperm v2,v1,v2,v3 # outmask
le?vxor v3,v3,v4
lvx v1,0,$out # outhead
vperm v0,v0,v0,v3 # rotate [and byte swap in LE]
vsel v1,v1,v0,v2
lvx v4,$idx,$out
stvx v1,0,$out
vsel v0,v0,v4,v2
stvx v0,$idx,$out
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,3,0
.long 0
.size .${prefix}_${dir}crypt,.-.${prefix}_${dir}crypt
___
}
&gen_block("en");
&gen_block("de");
}}}
#########################################################################
{{{	# CBC en- and decrypt procedures				#
my ($inp,$out,$len,$key,$ivp,$enc,$rounds,$idx)=map("r$_",(3..10));
my ($rndkey0,$rndkey1,$inout,$tmp)= map("v$_",(0..3));
my ($ivec,$inptail,$inpperm,$outhead,$outperm,$outmask,$keyperm)=
map("v$_",(4..10));

$code.=<<___;
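# C-level view (a sketch, mirroring AES_cbc_encrypt):
#   void ${prefix}_cbc_encrypt(const uint8_t *in, uint8_t *out, size_t length,
#                              const AES_KEY *key, uint8_t *ivec, int enc);
# enc!=0 encrypts, enc==0 decrypts; decryption of 128 bytes or more takes
# the eight-way _aesp8_cbc_decrypt8x path further below.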
.globl .${prefix}_cbc_encrypt
.align 5
.${prefix}_cbc_encrypt:
${UCMP}i $len,16
bltlr-
cmpwi $enc,0 # test direction
lis r0,0xffe0
mfspr $vrsave,256
mtspr 256,r0
li $idx,15
vxor $rndkey0,$rndkey0,$rndkey0
le?vspltisb $tmp,0x0f
lvx $ivec,0,$ivp # load [unaligned] iv
lvsl $inpperm,0,$ivp
lvx $inptail,$idx,$ivp
le?vxor $inpperm,$inpperm,$tmp
vperm $ivec,$ivec,$inptail,$inpperm
neg r11,$inp
?lvsl $keyperm,0,$key # prepare for unaligned key
lwz $rounds,240($key)
lvsr $inpperm,0,r11 # prepare for unaligned load
lvx $inptail,0,$inp
addi $inp,$inp,15 # 15 is not typo
le?vxor $inpperm,$inpperm,$tmp
?lvsr $outperm,0,$out # prepare for unaligned store
vspltisb $outmask,-1
lvx $outhead,0,$out
?vperm $outmask,$rndkey0,$outmask,$outperm
le?vxor $outperm,$outperm,$tmp
srwi $rounds,$rounds,1
li $idx,16
subi $rounds,$rounds,1
beq Lcbc_dec
Lcbc_enc:
vmr $inout,$inptail
lvx $inptail,0,$inp
addi $inp,$inp,16
mtctr $rounds
subi $len,$len,16 # len-=16
lvx $rndkey0,0,$key
vperm $inout,$inout,$inptail,$inpperm
lvx $rndkey1,$idx,$key
addi $idx,$idx,16
?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
vxor $inout,$inout,$rndkey0
lvx $rndkey0,$idx,$key
addi $idx,$idx,16
vxor $inout,$inout,$ivec
Loop_cbc_enc:
?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
vcipher $inout,$inout,$rndkey1
lvx $rndkey1,$idx,$key
addi $idx,$idx,16
?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
vcipher $inout,$inout,$rndkey0
lvx $rndkey0,$idx,$key
addi $idx,$idx,16
bdnz Loop_cbc_enc
?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
vcipher $inout,$inout,$rndkey1
lvx $rndkey1,$idx,$key
li $idx,16
?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
vcipherlast $ivec,$inout,$rndkey0
${UCMP}i $len,16
vperm $tmp,$ivec,$ivec,$outperm
vsel $inout,$outhead,$tmp,$outmask
vmr $outhead,$tmp
stvx $inout,0,$out
addi $out,$out,16
bge Lcbc_enc
b Lcbc_done
.align 4
Lcbc_dec:
${UCMP}i $len,128
bge _aesp8_cbc_decrypt8x
vmr $tmp,$inptail
lvx $inptail,0,$inp
addi $inp,$inp,16
mtctr $rounds
subi $len,$len,16 # len-=16
lvx $rndkey0,0,$key
vperm $tmp,$tmp,$inptail,$inpperm
lvx $rndkey1,$idx,$key
addi $idx,$idx,16
?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
vxor $inout,$tmp,$rndkey0
lvx $rndkey0,$idx,$key
addi $idx,$idx,16
Loop_cbc_dec:
?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
vncipher $inout,$inout,$rndkey1
lvx $rndkey1,$idx,$key
addi $idx,$idx,16
?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
vncipher $inout,$inout,$rndkey0
lvx $rndkey0,$idx,$key
addi $idx,$idx,16
bdnz Loop_cbc_dec
?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
vncipher $inout,$inout,$rndkey1
lvx $rndkey1,$idx,$key
li $idx,16
?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
vncipherlast $inout,$inout,$rndkey0
${UCMP}i $len,16
vxor $inout,$inout,$ivec
vmr $ivec,$tmp
vperm $tmp,$inout,$inout,$outperm
vsel $inout,$outhead,$tmp,$outmask
vmr $outhead,$tmp
stvx $inout,0,$out
addi $out,$out,16
bge Lcbc_dec
Lcbc_done:
addi $out,$out,-1
lvx $inout,0,$out # redundant in aligned case
vsel $inout,$outhead,$inout,$outmask
stvx $inout,0,$out
neg $enc,$ivp # write [unaligned] iv
li $idx,15 # 15 is not typo
vxor $rndkey0,$rndkey0,$rndkey0
vspltisb $outmask,-1
le?vspltisb $tmp,0x0f
?lvsl $outperm,0,$enc
?vperm $outmask,$rndkey0,$outmask,$outperm
le?vxor $outperm,$outperm,$tmp
lvx $outhead,0,$ivp
vperm $ivec,$ivec,$ivec,$outperm
vsel $inout,$outhead,$ivec,$outmask
lvx $inptail,$idx,$ivp
stvx $inout,0,$ivp
vsel $inout,$ivec,$inptail,$outmask
stvx $inout,$idx,$ivp
mtspr 256,$vrsave
blr
.long 0
.byte 0,12,0x14,0,0,0,6,0
.long 0
___
#########################################################################
{{	# Optimized CBC decrypt procedure				#
my $key_="r11";
my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,8,26..31));
$x00=0 if ($flavour =~ /osx/);
my ($in0, $in1, $in2, $in3, $in4, $in5, $in6, $in7 )=map("v$_",(0..3,10..13));
my ($out0,$out1,$out2,$out3,$out4,$out5,$out6,$out7)=map("v$_",(14..21));
my $rndkey0="v23"; # v24-v25 rotating buffer for first found keys
                   # v26-v31 last 6 round keys
my ($tmp,$keyperm)=($in3,$in4); # aliases with "caller", redundant assignment

$code.=<<___;
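# Eight blocks are kept in flight to hide vncipher latency. Round keys
# [1] and [2] rotate through v24/v25, reloaded each iteration from the
# stack copy set up by Load_cbc_dec_key below, while the last six round
# keys stay resident in v26-v31.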
.align 5
_aesp8_cbc_decrypt8x:
$STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp)
li r10,`$FRAME+8*16+15`
li r11,`$FRAME+8*16+31`
stvx v20,r10,$sp # ABI says so
addi r10,r10,32
stvx v21,r11,$sp
addi r11,r11,32
stvx v22,r10,$sp
addi r10,r10,32
stvx v23,r11,$sp
addi r11,r11,32
stvx v24,r10,$sp
addi r10,r10,32
stvx v25,r11,$sp
addi r11,r11,32
stvx v26,r10,$sp
addi r10,r10,32
stvx v27,r11,$sp
addi r11,r11,32
stvx v28,r10,$sp
addi r10,r10,32
stvx v29,r11,$sp
addi r11,r11,32
stvx v30,r10,$sp
stvx v31,r11,$sp
li r0,-1
stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave
li $x10,0x10
$PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp)
li $x20,0x20
$PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp)
li $x30,0x30
$PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp)
li $x40,0x40
$PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp)
li $x50,0x50
$PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp)
li $x60,0x60
$PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp)
li $x70,0x70
mtspr 256,r0
subi $rounds,$rounds,3 # -4 in total
subi $len,$len,128 # bias
lvx $rndkey0,$x00,$key # load key schedule
lvx v30,$x10,$key
addi $key,$key,0x20
lvx v31,$x00,$key
?vperm $rndkey0,$rndkey0,v30,$keyperm
addi $key_,$sp,$FRAME+15
mtctr $rounds
Load_cbc_dec_key:
?vperm v24,v30,v31,$keyperm
lvx v30,$x10,$key
addi $key,$key,0x20
stvx v24,$x00,$key_ # off-load round[1]
?vperm v25,v31,v30,$keyperm
lvx v31,$x00,$key
stvx v25,$x10,$key_ # off-load round[2]
addi $key_,$key_,0x20
bdnz Load_cbc_dec_key
lvx v26,$x10,$key
?vperm v24,v30,v31,$keyperm
lvx v27,$x20,$key
stvx v24,$x00,$key_ # off-load round[3]
?vperm v25,v31,v26,$keyperm
lvx v28,$x30,$key
stvx v25,$x10,$key_ # off-load round[4]
addi $key_,$sp,$FRAME+15 # rewind $key_
?vperm v26,v26,v27,$keyperm
lvx v29,$x40,$key
?vperm v27,v27,v28,$keyperm
lvx v30,$x50,$key
?vperm v28,v28,v29,$keyperm
lvx v31,$x60,$key
?vperm v29,v29,v30,$keyperm
lvx $out0,$x70,$key # borrow $out0
?vperm v30,v30,v31,$keyperm
lvx v24,$x00,$key_ # pre-load round[1]
?vperm v31,v31,$out0,$keyperm
lvx v25,$x10,$key_ # pre-load round[2]
#lvx $inptail,0,$inp # "caller" already did this
#addi $inp,$inp,15 # 15 is not typo
subi $inp,$inp,15 # undo "caller"
le?li $idx,8
lvx_u $in0,$x00,$inp # load first 8 "words"
le?lvsl $inpperm,0,$idx
le?vspltisb $tmp,0x0f
lvx_u $in1,$x10,$inp
le?vxor $inpperm,$inpperm,$tmp # transform for lvx_u/stvx_u
lvx_u $in2,$x20,$inp
le?vperm $in0,$in0,$in0,$inpperm
lvx_u $in3,$x30,$inp
le?vperm $in1,$in1,$in1,$inpperm
lvx_u $in4,$x40,$inp
le?vperm $in2,$in2,$in2,$inpperm
vxor $out0,$in0,$rndkey0
lvx_u $in5,$x50,$inp
le?vperm $in3,$in3,$in3,$inpperm
vxor $out1,$in1,$rndkey0
lvx_u $in6,$x60,$inp
le?vperm $in4,$in4,$in4,$inpperm
vxor $out2,$in2,$rndkey0
lvx_u $in7,$x70,$inp
addi $inp,$inp,0x80
le?vperm $in5,$in5,$in5,$inpperm
vxor $out3,$in3,$rndkey0
le?vperm $in6,$in6,$in6,$inpperm
vxor $out4,$in4,$rndkey0
le?vperm $in7,$in7,$in7,$inpperm
vxor $out5,$in5,$rndkey0
vxor $out6,$in6,$rndkey0
vxor $out7,$in7,$rndkey0
mtctr $rounds
b Loop_cbc_dec8x
.align 5
Loop_cbc_dec8x:
vncipher $out0,$out0,v24
vncipher $out1,$out1,v24
vncipher $out2,$out2,v24
vncipher $out3,$out3,v24
vncipher $out4,$out4,v24
vncipher $out5,$out5,v24
vncipher $out6,$out6,v24
vncipher $out7,$out7,v24
lvx v24,$x20,$key_ # round[3]
addi $key_,$key_,0x20
vncipher $out0,$out0,v25
vncipher $out1,$out1,v25
vncipher $out2,$out2,v25
vncipher $out3,$out3,v25
vncipher $out4,$out4,v25
vncipher $out5,$out5,v25
vncipher $out6,$out6,v25
vncipher $out7,$out7,v25
lvx v25,$x10,$key_ # round[4]
bdnz Loop_cbc_dec8x
subic $len,$len,128 # $len-=128
vncipher $out0,$out0,v24
vncipher $out1,$out1,v24
vncipher $out2,$out2,v24
vncipher $out3,$out3,v24
vncipher $out4,$out4,v24
vncipher $out5,$out5,v24
vncipher $out6,$out6,v24
vncipher $out7,$out7,v24
subfe. r0,r0,r0 # borrow?-1:0
vncipher $out0,$out0,v25
vncipher $out1,$out1,v25
vncipher $out2,$out2,v25
vncipher $out3,$out3,v25
vncipher $out4,$out4,v25
vncipher $out5,$out5,v25
vncipher $out6,$out6,v25
vncipher $out7,$out7,v25
and r0,r0,$len
vncipher $out0,$out0,v26
vncipher $out1,$out1,v26
vncipher $out2,$out2,v26
vncipher $out3,$out3,v26
vncipher $out4,$out4,v26
vncipher $out5,$out5,v26
vncipher $out6,$out6,v26
vncipher $out7,$out7,v26
add $inp,$inp,r0 # $inp is adjusted in such
# way that at exit from the
# loop inX-in7 are loaded
# with last "words"
vncipher $out0,$out0,v27
vncipher $out1,$out1,v27
vncipher $out2,$out2,v27
vncipher $out3,$out3,v27
vncipher $out4,$out4,v27
vncipher $out5,$out5,v27
vncipher $out6,$out6,v27
vncipher $out7,$out7,v27
addi $key_,$sp,$FRAME+15 # rewind $key_
vncipher $out0,$out0,v28
vncipher $out1,$out1,v28
vncipher $out2,$out2,v28
vncipher $out3,$out3,v28
vncipher $out4,$out4,v28
vncipher $out5,$out5,v28
vncipher $out6,$out6,v28
vncipher $out7,$out7,v28
lvx v24,$x00,$key_ # re-pre-load round[1]
vncipher $out0,$out0,v29
vncipher $out1,$out1,v29
vncipher $out2,$out2,v29
vncipher $out3,$out3,v29
vncipher $out4,$out4,v29
vncipher $out5,$out5,v29
vncipher $out6,$out6,v29
vncipher $out7,$out7,v29
lvx v25,$x10,$key_ # re-pre-load round[2]
vncipher $out0,$out0,v30
vxor $ivec,$ivec,v31 # xor with last round key
vncipher $out1,$out1,v30
vxor $in0,$in0,v31
vncipher $out2,$out2,v30
vxor $in1,$in1,v31
vncipher $out3,$out3,v30
vxor $in2,$in2,v31
vncipher $out4,$out4,v30
vxor $in3,$in3,v31
vncipher $out5,$out5,v30
vxor $in4,$in4,v31
vncipher $out6,$out6,v30
vxor $in5,$in5,v31
vncipher $out7,$out7,v30
vxor $in6,$in6,v31
vncipherlast $out0,$out0,$ivec
vncipherlast $out1,$out1,$in0
lvx_u $in0,$x00,$inp # load next input block
vncipherlast $out2,$out2,$in1
lvx_u $in1,$x10,$inp
vncipherlast $out3,$out3,$in2
le?vperm $in0,$in0,$in0,$inpperm
lvx_u $in2,$x20,$inp
vncipherlast $out4,$out4,$in3
le?vperm $in1,$in1,$in1,$inpperm
lvx_u $in3,$x30,$inp
vncipherlast $out5,$out5,$in4
le?vperm $in2,$in2,$in2,$inpperm
lvx_u $in4,$x40,$inp
vncipherlast $out6,$out6,$in5
le?vperm $in3,$in3,$in3,$inpperm
lvx_u $in5,$x50,$inp
vncipherlast $out7,$out7,$in6
le?vperm $in4,$in4,$in4,$inpperm
lvx_u $in6,$x60,$inp
vmr $ivec,$in7
le?vperm $in5,$in5,$in5,$inpperm
lvx_u $in7,$x70,$inp
addi $inp,$inp,0x80
le?vperm $out0,$out0,$out0,$inpperm
le?vperm $out1,$out1,$out1,$inpperm
stvx_u $out0,$x00,$out
le?vperm $in6,$in6,$in6,$inpperm
vxor $out0,$in0,$rndkey0
le?vperm $out2,$out2,$out2,$inpperm
stvx_u $out1,$x10,$out
le?vperm $in7,$in7,$in7,$inpperm
vxor $out1,$in1,$rndkey0
le?vperm $out3,$out3,$out3,$inpperm
stvx_u $out2,$x20,$out
vxor $out2,$in2,$rndkey0
le?vperm $out4,$out4,$out4,$inpperm
stvx_u $out3,$x30,$out
vxor $out3,$in3,$rndkey0
le?vperm $out5,$out5,$out5,$inpperm
stvx_u $out4,$x40,$out
vxor $out4,$in4,$rndkey0
le?vperm $out6,$out6,$out6,$inpperm
stvx_u $out5,$x50,$out
vxor $out5,$in5,$rndkey0
le?vperm $out7,$out7,$out7,$inpperm
stvx_u $out6,$x60,$out
vxor $out6,$in6,$rndkey0
stvx_u $out7,$x70,$out
addi $out,$out,0x80
vxor $out7,$in7,$rndkey0
mtctr $rounds
beq Loop_cbc_dec8x # did $len-=128 borrow?
addic. $len,$len,128
beq Lcbc_dec8x_done
nop
nop
Loop_cbc_dec8x_tail: # up to 7 "words" tail...
vncipher $out1,$out1,v24
vncipher $out2,$out2,v24
vncipher $out3,$out3,v24
vncipher $out4,$out4,v24
vncipher $out5,$out5,v24
vncipher $out6,$out6,v24
vncipher $out7,$out7,v24
lvx v24,$x20,$key_ # round[3]
addi $key_,$key_,0x20
vncipher $out1,$out1,v25
vncipher $out2,$out2,v25
vncipher $out3,$out3,v25
vncipher $out4,$out4,v25
vncipher $out5,$out5,v25
vncipher $out6,$out6,v25
vncipher $out7,$out7,v25
lvx v25,$x10,$key_ # round[4]
bdnz Loop_cbc_dec8x_tail
vncipher $out1,$out1,v24
vncipher $out2,$out2,v24
vncipher $out3,$out3,v24
vncipher $out4,$out4,v24
vncipher $out5,$out5,v24
vncipher $out6,$out6,v24
vncipher $out7,$out7,v24
vncipher $out1,$out1,v25
vncipher $out2,$out2,v25
vncipher $out3,$out3,v25
vncipher $out4,$out4,v25
vncipher $out5,$out5,v25
vncipher $out6,$out6,v25
vncipher $out7,$out7,v25
vncipher $out1,$out1,v26
vncipher $out2,$out2,v26
vncipher $out3,$out3,v26
vncipher $out4,$out4,v26
vncipher $out5,$out5,v26
vncipher $out6,$out6,v26
vncipher $out7,$out7,v26
vncipher $out1,$out1,v27
vncipher $out2,$out2,v27
vncipher $out3,$out3,v27
vncipher $out4,$out4,v27
vncipher $out5,$out5,v27
vncipher $out6,$out6,v27
vncipher $out7,$out7,v27
vncipher $out1,$out1,v28
vncipher $out2,$out2,v28
vncipher $out3,$out3,v28
vncipher $out4,$out4,v28
vncipher $out5,$out5,v28
vncipher $out6,$out6,v28
vncipher $out7,$out7,v28
vncipher $out1,$out1,v29
vncipher $out2,$out2,v29
vncipher $out3,$out3,v29
vncipher $out4,$out4,v29
vncipher $out5,$out5,v29
vncipher $out6,$out6,v29
vncipher $out7,$out7,v29
vncipher $out1,$out1,v30
vxor $ivec,$ivec,v31 # last round key
vncipher $out2,$out2,v30
vxor $in1,$in1,v31
vncipher $out3,$out3,v30
vxor $in2,$in2,v31
vncipher $out4,$out4,v30
vxor $in3,$in3,v31
vncipher $out5,$out5,v30
vxor $in4,$in4,v31
vncipher $out6,$out6,v30
vxor $in5,$in5,v31
vncipher $out7,$out7,v30
vxor $in6,$in6,v31
cmplwi $len,32 # switch($len)
blt Lcbc_dec8x_one
nop
beq Lcbc_dec8x_two
cmplwi $len,64
blt Lcbc_dec8x_three
nop
beq Lcbc_dec8x_four
cmplwi $len,96
blt Lcbc_dec8x_five
nop
beq Lcbc_dec8x_six
Lcbc_dec8x_seven:
vncipherlast $out1,$out1,$ivec
vncipherlast $out2,$out2,$in1
vncipherlast $out3,$out3,$in2
vncipherlast $out4,$out4,$in3
vncipherlast $out5,$out5,$in4
vncipherlast $out6,$out6,$in5
vncipherlast $out7,$out7,$in6
vmr $ivec,$in7
le?vperm $out1,$out1,$out1,$inpperm
le?vperm $out2,$out2,$out2,$inpperm
stvx_u $out1,$x00,$out
le?vperm $out3,$out3,$out3,$inpperm
stvx_u $out2,$x10,$out
le?vperm $out4,$out4,$out4,$inpperm
stvx_u $out3,$x20,$out
le?vperm $out5,$out5,$out5,$inpperm
stvx_u $out4,$x30,$out
le?vperm $out6,$out6,$out6,$inpperm
stvx_u $out5,$x40,$out
le?vperm $out7,$out7,$out7,$inpperm
stvx_u $out6,$x50,$out
stvx_u $out7,$x60,$out
addi $out,$out,0x70
b Lcbc_dec8x_done
.align 5
Lcbc_dec8x_six:
vncipherlast $out2,$out2,$ivec
vncipherlast $out3,$out3,$in2
vncipherlast $out4,$out4,$in3
vncipherlast $out5,$out5,$in4
vncipherlast $out6,$out6,$in5
vncipherlast $out7,$out7,$in6
vmr $ivec,$in7
le?vperm $out2,$out2,$out2,$inpperm
le?vperm $out3,$out3,$out3,$inpperm
stvx_u $out2,$x00,$out
le?vperm $out4,$out4,$out4,$inpperm
stvx_u $out3,$x10,$out
le?vperm $out5,$out5,$out5,$inpperm
stvx_u $out4,$x20,$out
le?vperm $out6,$out6,$out6,$inpperm
stvx_u $out5,$x30,$out
le?vperm $out7,$out7,$out7,$inpperm
stvx_u $out6,$x40,$out
stvx_u $out7,$x50,$out
addi $out,$out,0x60
b Lcbc_dec8x_done
.align 5
Lcbc_dec8x_five:
vncipherlast $out3,$out3,$ivec
vncipherlast $out4,$out4,$in3
vncipherlast $out5,$out5,$in4
vncipherlast $out6,$out6,$in5
vncipherlast $out7,$out7,$in6
vmr $ivec,$in7
le?vperm $out3,$out3,$out3,$inpperm
le?vperm $out4,$out4,$out4,$inpperm
stvx_u $out3,$x00,$out
le?vperm $out5,$out5,$out5,$inpperm
stvx_u $out4,$x10,$out
le?vperm $out6,$out6,$out6,$inpperm
stvx_u $out5,$x20,$out
le?vperm $out7,$out7,$out7,$inpperm
stvx_u $out6,$x30,$out
stvx_u $out7,$x40,$out
addi $out,$out,0x50
b Lcbc_dec8x_done
.align 5
Lcbc_dec8x_four:
vncipherlast $out4,$out4,$ivec
vncipherlast $out5,$out5,$in4
vncipherlast $out6,$out6,$in5
vncipherlast $out7,$out7,$in6
vmr $ivec,$in7
le?vperm $out4,$out4,$out4,$inpperm
le?vperm $out5,$out5,$out5,$inpperm
stvx_u $out4,$x00,$out
le?vperm $out6,$out6,$out6,$inpperm
stvx_u $out5,$x10,$out
le?vperm $out7,$out7,$out7,$inpperm
stvx_u $out6,$x20,$out
stvx_u $out7,$x30,$out
addi $out,$out,0x40
b Lcbc_dec8x_done
.align 5
Lcbc_dec8x_three:
vncipherlast $out5,$out5,$ivec
vncipherlast $out6,$out6,$in5
vncipherlast $out7,$out7,$in6
vmr $ivec,$in7
le?vperm $out5,$out5,$out5,$inpperm
le?vperm $out6,$out6,$out6,$inpperm
stvx_u $out5,$x00,$out
le?vperm $out7,$out7,$out7,$inpperm
stvx_u $out6,$x10,$out
stvx_u $out7,$x20,$out
addi $out,$out,0x30
b Lcbc_dec8x_done
.align 5
Lcbc_dec8x_two:
vncipherlast $out6,$out6,$ivec
vncipherlast $out7,$out7,$in6
vmr $ivec,$in7
le?vperm $out6,$out6,$out6,$inpperm
le?vperm $out7,$out7,$out7,$inpperm
stvx_u $out6,$x00,$out
stvx_u $out7,$x10,$out
addi $out,$out,0x20
b Lcbc_dec8x_done
.align 5
Lcbc_dec8x_one:
vncipherlast $out7,$out7,$ivec
vmr $ivec,$in7
le?vperm $out7,$out7,$out7,$inpperm
stvx_u $out7,0,$out
addi $out,$out,0x10
Lcbc_dec8x_done:
le?vperm $ivec,$ivec,$ivec,$inpperm
stvx_u $ivec,0,$ivp # write [unaligned] iv
li r10,`$FRAME+15`
li r11,`$FRAME+31`
stvx $inpperm,r10,$sp # wipe copies of round keys
addi r10,r10,32
stvx $inpperm,r11,$sp
addi r11,r11,32
stvx $inpperm,r10,$sp
addi r10,r10,32
stvx $inpperm,r11,$sp
addi r11,r11,32
stvx $inpperm,r10,$sp
addi r10,r10,32
stvx $inpperm,r11,$sp
addi r11,r11,32
stvx $inpperm,r10,$sp
addi r10,r10,32
stvx $inpperm,r11,$sp
addi r11,r11,32
mtspr 256,$vrsave
lvx v20,r10,$sp # ABI says so
addi r10,r10,32
lvx v21,r11,$sp
addi r11,r11,32
lvx v22,r10,$sp
addi r10,r10,32
lvx v23,r11,$sp
addi r11,r11,32
lvx v24,r10,$sp
addi r10,r10,32
lvx v25,r11,$sp
addi r11,r11,32
lvx v26,r10,$sp
addi r10,r10,32
lvx v27,r11,$sp
addi r11,r11,32
lvx v28,r10,$sp
addi r10,r10,32
lvx v29,r11,$sp
addi r11,r11,32
lvx v30,r10,$sp
lvx v31,r11,$sp
$POP r26,`$FRAME+21*16+0*$SIZE_T`($sp)
$POP r27,`$FRAME+21*16+1*$SIZE_T`($sp)
$POP r28,`$FRAME+21*16+2*$SIZE_T`($sp)
$POP r29,`$FRAME+21*16+3*$SIZE_T`($sp)
$POP r30,`$FRAME+21*16+4*$SIZE_T`($sp)
$POP r31,`$FRAME+21*16+5*$SIZE_T`($sp)
addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T`
blr
.long 0
.byte 0,12,0x04,0,0x80,6,6,0
.long 0
.size .${prefix}_cbc_encrypt,.-.${prefix}_cbc_encrypt
___
}} }}}
#########################################################################
{{{	# CTR procedure[s]						#
my ($inp,$out,$len,$key,$ivp,$x10,$rounds,$idx)=map("r$_",(3..10));
my ($rndkey0,$rndkey1,$inout,$tmp)= map("v$_",(0..3));
my ($ivec,$inptail,$inpperm,$outhead,$outperm,$outmask,$keyperm,$one)=
map("v$_",(4..11));
my $dat=$tmp;

$code.=<<___;
  1140. .globl .${prefix}_ctr32_encrypt_blocks
  1141. .align 5
  1142. .${prefix}_ctr32_encrypt_blocks:
  1143. ${UCMP}i $len,1
  1144. bltlr-
  1145. lis r0,0xfff0
  1146. mfspr $vrsave,256
  1147. mtspr 256,r0
  1148. li $idx,15
  1149. vxor $rndkey0,$rndkey0,$rndkey0
  1150. le?vspltisb $tmp,0x0f
  1151. lvx $ivec,0,$ivp # load [unaligned] iv
  1152. lvsl $inpperm,0,$ivp
  1153. lvx $inptail,$idx,$ivp
  1154. vspltisb $one,1
  1155. le?vxor $inpperm,$inpperm,$tmp
  1156. vperm $ivec,$ivec,$inptail,$inpperm
  1157. vsldoi $one,$rndkey0,$one,1
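	# $one now holds {0,...,0,1}: vspltisb filled every byte with 1 and
	# vsldoi shifted in 15 zero bytes from $rndkey0, leaving a single 1
	# in the least-significant byte, so vadduwm below increments only
	# the low 32-bit counter word of the IV block.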
	neg r11,$inp
	?lvsl $keyperm,0,$key # prepare for unaligned key
	lwz $rounds,240($key)
	lvsr $inpperm,0,r11 # prepare for unaligned load
	lvx $inptail,0,$inp
	addi $inp,$inp,15 # 15 is not a typo
	le?vxor $inpperm,$inpperm,$tmp
	srwi $rounds,$rounds,1
	li $idx,16
	subi $rounds,$rounds,1
	${UCMP}i $len,8
	bge _aesp8_ctr32_encrypt8x
	?lvsr $outperm,0,$out # prepare for unaligned store
	vspltisb $outmask,-1
	lvx $outhead,0,$out
	?vperm $outmask,$rndkey0,$outmask,$outperm
	le?vxor $outperm,$outperm,$tmp
	lvx $rndkey0,0,$key
	mtctr $rounds
	lvx $rndkey1,$idx,$key
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $inout,$ivec,$rndkey0
	lvx $rndkey0,$idx,$key
	addi $idx,$idx,16
	b Loop_ctr32_enc
.align 5
Loop_ctr32_enc:
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vcipher $inout,$inout,$rndkey1
	lvx $rndkey1,$idx,$key
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vcipher $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key
	addi $idx,$idx,16
	bdnz Loop_ctr32_enc
	vadduwm $ivec,$ivec,$one
	vmr $dat,$inptail
	lvx $inptail,0,$inp
	addi $inp,$inp,16
	subic. $len,$len,1 # blocks--
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vcipher $inout,$inout,$rndkey1
	lvx $rndkey1,$idx,$key
	vperm $dat,$dat,$inptail,$inpperm
	li $idx,16
	?vperm $rndkey1,$rndkey0,$rndkey1,$keyperm
	lvx $rndkey0,0,$key
	vxor $dat,$dat,$rndkey1 # last round key
	vcipherlast $inout,$inout,$dat
	lvx $rndkey1,$idx,$key
	addi $idx,$idx,16
	vperm $inout,$inout,$inout,$outperm
	vsel $dat,$outhead,$inout,$outmask
	mtctr $rounds
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vmr $outhead,$inout
	vxor $inout,$ivec,$rndkey0
	lvx $rndkey0,$idx,$key
	addi $idx,$idx,16
	stvx $dat,0,$out
	addi $out,$out,16
	bne Loop_ctr32_enc
	addi $out,$out,-1
	lvx $inout,0,$out # redundant in aligned case
	vsel $inout,$outhead,$inout,$outmask
	stvx $inout,0,$out
	mtspr 256,$vrsave
	blr
	.long 0
	.byte 0,12,0x14,0,0,0,6,0
	.long 0
___
#########################################################################
{{ # Optimized CTR procedure #
my $key_="r11";
my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,8,26..31));
$x00=0 if ($flavour =~ /osx/);
my ($in0, $in1, $in2, $in3, $in4, $in5, $in6, $in7 )=map("v$_",(0..3,10,12..14));
my ($out0,$out1,$out2,$out3,$out4,$out5,$out6,$out7)=map("v$_",(15..22));
my $rndkey0="v23"; # v24-v25 rotating buffer for the first round keys
                   # v26-v31 last 6 round keys
my ($tmp,$keyperm)=($in3,$in4); # aliases with "caller", redundant assignment
my ($two,$three,$four)=($outhead,$outperm,$outmask);
$code.=<<___;
.align 5
_aesp8_ctr32_encrypt8x:
	$STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp)
	li r10,`$FRAME+8*16+15`
	li r11,`$FRAME+8*16+31`
	stvx v20,r10,$sp # ABI says so
	addi r10,r10,32
	stvx v21,r11,$sp
	addi r11,r11,32
	stvx v22,r10,$sp
	addi r10,r10,32
	stvx v23,r11,$sp
	addi r11,r11,32
	stvx v24,r10,$sp
	addi r10,r10,32
	stvx v25,r11,$sp
	addi r11,r11,32
	stvx v26,r10,$sp
	addi r10,r10,32
	stvx v27,r11,$sp
	addi r11,r11,32
	stvx v28,r10,$sp
	addi r10,r10,32
	stvx v29,r11,$sp
	addi r11,r11,32
	stvx v30,r10,$sp
	stvx v31,r11,$sp
	li r0,-1
	stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave
	li $x10,0x10
	$PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp)
	li $x20,0x20
	$PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp)
	li $x30,0x30
	$PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp)
	li $x40,0x40
	$PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp)
	li $x50,0x50
	$PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp)
	li $x60,0x60
	$PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp)
	li $x70,0x70
	mtspr 256,r0
	subi $rounds,$rounds,3 # -4 in total
	lvx $rndkey0,$x00,$key # load key schedule
	lvx v30,$x10,$key
	addi $key,$key,0x20
	lvx v31,$x00,$key
	?vperm $rndkey0,$rndkey0,v30,$keyperm
	addi $key_,$sp,$FRAME+15
	mtctr $rounds
Load_ctr32_enc_key:
	?vperm v24,v30,v31,$keyperm
	lvx v30,$x10,$key
	addi $key,$key,0x20
	stvx v24,$x00,$key_ # off-load round[1]
	?vperm v25,v31,v30,$keyperm
	lvx v31,$x00,$key
	stvx v25,$x10,$key_ # off-load round[2]
	addi $key_,$key_,0x20
	bdnz Load_ctr32_enc_key
	lvx v26,$x10,$key
	?vperm v24,v30,v31,$keyperm
	lvx v27,$x20,$key
	stvx v24,$x00,$key_ # off-load round[3]
	?vperm v25,v31,v26,$keyperm
	lvx v28,$x30,$key
	stvx v25,$x10,$key_ # off-load round[4]
	addi $key_,$sp,$FRAME+15 # rewind $key_
	?vperm v26,v26,v27,$keyperm
	lvx v29,$x40,$key
	?vperm v27,v27,v28,$keyperm
	lvx v30,$x50,$key
	?vperm v28,v28,v29,$keyperm
	lvx v31,$x60,$key
	?vperm v29,v29,v30,$keyperm
	lvx $out0,$x70,$key # borrow $out0
	?vperm v30,v30,v31,$keyperm
	lvx v24,$x00,$key_ # pre-load round[1]
	?vperm v31,v31,$out0,$keyperm
	lvx v25,$x10,$key_ # pre-load round[2]
	vadduwm $two,$one,$one
	subi $inp,$inp,15 # undo "caller"
	$SHL $len,$len,4
	vadduwm $out1,$ivec,$one # counter values ...
	vadduwm $out2,$ivec,$two
	vxor $out0,$ivec,$rndkey0 # ... xored with rndkey[0]
	le?li $idx,8
	vadduwm $out3,$out1,$two
	vxor $out1,$out1,$rndkey0
	le?lvsl $inpperm,0,$idx
	vadduwm $out4,$out2,$two
	vxor $out2,$out2,$rndkey0
	le?vspltisb $tmp,0x0f
	vadduwm $out5,$out3,$two
	vxor $out3,$out3,$rndkey0
	le?vxor $inpperm,$inpperm,$tmp # transform for lvx_u/stvx_u
	vadduwm $out6,$out4,$two
	vxor $out4,$out4,$rndkey0
	vadduwm $out7,$out5,$two
	vxor $out5,$out5,$rndkey0
	vadduwm $ivec,$out6,$two # next counter value
	vxor $out6,$out6,$rndkey0
	vxor $out7,$out7,$rndkey0
	mtctr $rounds
	b Loop_ctr32_enc8x
.align 5
Loop_ctr32_enc8x:
	vcipher $out0,$out0,v24
	vcipher $out1,$out1,v24
	vcipher $out2,$out2,v24
	vcipher $out3,$out3,v24
	vcipher $out4,$out4,v24
	vcipher $out5,$out5,v24
	vcipher $out6,$out6,v24
	vcipher $out7,$out7,v24
Loop_ctr32_enc8x_middle:
	lvx v24,$x20,$key_ # round[3]
	addi $key_,$key_,0x20
	vcipher $out0,$out0,v25
	vcipher $out1,$out1,v25
	vcipher $out2,$out2,v25
	vcipher $out3,$out3,v25
	vcipher $out4,$out4,v25
	vcipher $out5,$out5,v25
	vcipher $out6,$out6,v25
	vcipher $out7,$out7,v25
	lvx v25,$x10,$key_ # round[4]
	bdnz Loop_ctr32_enc8x
	subic r11,$len,256 # $len-256, borrow $key_
	vcipher $out0,$out0,v24
	vcipher $out1,$out1,v24
	vcipher $out2,$out2,v24
	vcipher $out3,$out3,v24
	vcipher $out4,$out4,v24
	vcipher $out5,$out5,v24
	vcipher $out6,$out6,v24
	vcipher $out7,$out7,v24
	subfe r0,r0,r0 # borrow?-1:0
	vcipher $out0,$out0,v25
	vcipher $out1,$out1,v25
	vcipher $out2,$out2,v25
	vcipher $out3,$out3,v25
	vcipher $out4,$out4,v25
	vcipher $out5,$out5,v25
	vcipher $out6,$out6,v25
	vcipher $out7,$out7,v25
	and r0,r0,r11
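	# Branchless tail adjustment: subic set CA only if $len>=256, so the
	# subfe above yielded r0 = ($len<256 ? -1 : 0) and the and leaves
	# r0 = ($len<256 ? $len-256 : 0). Adding r0 to $inp further down
	# pulls the pointer back so the final eight lvx_u loads stay inside
	# the input buffer, re-reading the last blocks at overlapping offsets.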
	addi $key_,$sp,$FRAME+15 # rewind $key_
	vcipher $out0,$out0,v26
	vcipher $out1,$out1,v26
	vcipher $out2,$out2,v26
	vcipher $out3,$out3,v26
	vcipher $out4,$out4,v26
	vcipher $out5,$out5,v26
	vcipher $out6,$out6,v26
	vcipher $out7,$out7,v26
	lvx v24,$x00,$key_ # re-pre-load round[1]
	subic $len,$len,129 # $len-=129
	vcipher $out0,$out0,v27
	addi $len,$len,1 # $len-=128 really
	vcipher $out1,$out1,v27
	vcipher $out2,$out2,v27
	vcipher $out3,$out3,v27
	vcipher $out4,$out4,v27
	vcipher $out5,$out5,v27
	vcipher $out6,$out6,v27
	vcipher $out7,$out7,v27
	lvx v25,$x10,$key_ # re-pre-load round[2]
	vcipher $out0,$out0,v28
	lvx_u $in0,$x00,$inp # load input
	vcipher $out1,$out1,v28
	lvx_u $in1,$x10,$inp
	vcipher $out2,$out2,v28
	lvx_u $in2,$x20,$inp
	vcipher $out3,$out3,v28
	lvx_u $in3,$x30,$inp
	vcipher $out4,$out4,v28
	lvx_u $in4,$x40,$inp
	vcipher $out5,$out5,v28
	lvx_u $in5,$x50,$inp
	vcipher $out6,$out6,v28
	lvx_u $in6,$x60,$inp
	vcipher $out7,$out7,v28
	lvx_u $in7,$x70,$inp
	addi $inp,$inp,0x80
	vcipher $out0,$out0,v29
	le?vperm $in0,$in0,$in0,$inpperm
	vcipher $out1,$out1,v29
	le?vperm $in1,$in1,$in1,$inpperm
	vcipher $out2,$out2,v29
	le?vperm $in2,$in2,$in2,$inpperm
	vcipher $out3,$out3,v29
	le?vperm $in3,$in3,$in3,$inpperm
	vcipher $out4,$out4,v29
	le?vperm $in4,$in4,$in4,$inpperm
	vcipher $out5,$out5,v29
	le?vperm $in5,$in5,$in5,$inpperm
	vcipher $out6,$out6,v29
	le?vperm $in6,$in6,$in6,$inpperm
	vcipher $out7,$out7,v29
	le?vperm $in7,$in7,$in7,$inpperm
	add $inp,$inp,r0 # $inp is adjusted in such
			 # way that at exit from the
			 # loop inX-in7 are loaded
			 # with last "words"
	subfe. r0,r0,r0 # borrow?-1:0
	vcipher $out0,$out0,v30
	vxor $in0,$in0,v31 # xor with last round key
	vcipher $out1,$out1,v30
	vxor $in1,$in1,v31
	vcipher $out2,$out2,v30
	vxor $in2,$in2,v31
	vcipher $out3,$out3,v30
	vxor $in3,$in3,v31
	vcipher $out4,$out4,v30
	vxor $in4,$in4,v31
	vcipher $out5,$out5,v30
	vxor $in5,$in5,v31
	vcipher $out6,$out6,v30
	vxor $in6,$in6,v31
	vcipher $out7,$out7,v30
	vxor $in7,$in7,v31
	bne Lctr32_enc8x_break # did $len-129 borrow?
	vcipherlast $in0,$out0,$in0
	vcipherlast $in1,$out1,$in1
	vadduwm $out1,$ivec,$one # counter values ...
	vcipherlast $in2,$out2,$in2
	vadduwm $out2,$ivec,$two
	vxor $out0,$ivec,$rndkey0 # ... xored with rndkey[0]
	vcipherlast $in3,$out3,$in3
	vadduwm $out3,$out1,$two
	vxor $out1,$out1,$rndkey0
	vcipherlast $in4,$out4,$in4
	vadduwm $out4,$out2,$two
	vxor $out2,$out2,$rndkey0
	vcipherlast $in5,$out5,$in5
	vadduwm $out5,$out3,$two
	vxor $out3,$out3,$rndkey0
	vcipherlast $in6,$out6,$in6
	vadduwm $out6,$out4,$two
	vxor $out4,$out4,$rndkey0
	vcipherlast $in7,$out7,$in7
	vadduwm $out7,$out5,$two
	vxor $out5,$out5,$rndkey0
	le?vperm $in0,$in0,$in0,$inpperm
	vadduwm $ivec,$out6,$two # next counter value
	vxor $out6,$out6,$rndkey0
	le?vperm $in1,$in1,$in1,$inpperm
	vxor $out7,$out7,$rndkey0
	mtctr $rounds
	vcipher $out0,$out0,v24
	stvx_u $in0,$x00,$out
	le?vperm $in2,$in2,$in2,$inpperm
	vcipher $out1,$out1,v24
	stvx_u $in1,$x10,$out
	le?vperm $in3,$in3,$in3,$inpperm
	vcipher $out2,$out2,v24
	stvx_u $in2,$x20,$out
	le?vperm $in4,$in4,$in4,$inpperm
	vcipher $out3,$out3,v24
	stvx_u $in3,$x30,$out
	le?vperm $in5,$in5,$in5,$inpperm
	vcipher $out4,$out4,v24
	stvx_u $in4,$x40,$out
	le?vperm $in6,$in6,$in6,$inpperm
	vcipher $out5,$out5,v24
	stvx_u $in5,$x50,$out
	le?vperm $in7,$in7,$in7,$inpperm
	vcipher $out6,$out6,v24
	stvx_u $in6,$x60,$out
	vcipher $out7,$out7,v24
	stvx_u $in7,$x70,$out
	addi $out,$out,0x80
	b Loop_ctr32_enc8x_middle
.align 5
Lctr32_enc8x_break:
	cmpwi $len,-0x60
	blt Lctr32_enc8x_one
	nop
	beq Lctr32_enc8x_two
	cmpwi $len,-0x40
	blt Lctr32_enc8x_three
	nop
	beq Lctr32_enc8x_four
	cmpwi $len,-0x20
	blt Lctr32_enc8x_five
	nop
	beq Lctr32_enc8x_six
	cmpwi $len,0x00
	blt Lctr32_enc8x_seven
Lctr32_enc8x_eight:
	vcipherlast $out0,$out0,$in0
	vcipherlast $out1,$out1,$in1
	vcipherlast $out2,$out2,$in2
	vcipherlast $out3,$out3,$in3
	vcipherlast $out4,$out4,$in4
	vcipherlast $out5,$out5,$in5
	vcipherlast $out6,$out6,$in6
	vcipherlast $out7,$out7,$in7
	le?vperm $out0,$out0,$out0,$inpperm
	le?vperm $out1,$out1,$out1,$inpperm
	stvx_u $out0,$x00,$out
	le?vperm $out2,$out2,$out2,$inpperm
	stvx_u $out1,$x10,$out
	le?vperm $out3,$out3,$out3,$inpperm
	stvx_u $out2,$x20,$out
	le?vperm $out4,$out4,$out4,$inpperm
	stvx_u $out3,$x30,$out
	le?vperm $out5,$out5,$out5,$inpperm
	stvx_u $out4,$x40,$out
	le?vperm $out6,$out6,$out6,$inpperm
	stvx_u $out5,$x50,$out
	le?vperm $out7,$out7,$out7,$inpperm
	stvx_u $out6,$x60,$out
	stvx_u $out7,$x70,$out
	addi $out,$out,0x80
	b Lctr32_enc8x_done
.align 5
Lctr32_enc8x_seven:
	vcipherlast $out0,$out0,$in1
	vcipherlast $out1,$out1,$in2
	vcipherlast $out2,$out2,$in3
	vcipherlast $out3,$out3,$in4
	vcipherlast $out4,$out4,$in5
	vcipherlast $out5,$out5,$in6
	vcipherlast $out6,$out6,$in7
	le?vperm $out0,$out0,$out0,$inpperm
	le?vperm $out1,$out1,$out1,$inpperm
	stvx_u $out0,$x00,$out
	le?vperm $out2,$out2,$out2,$inpperm
	stvx_u $out1,$x10,$out
	le?vperm $out3,$out3,$out3,$inpperm
	stvx_u $out2,$x20,$out
	le?vperm $out4,$out4,$out4,$inpperm
	stvx_u $out3,$x30,$out
	le?vperm $out5,$out5,$out5,$inpperm
	stvx_u $out4,$x40,$out
	le?vperm $out6,$out6,$out6,$inpperm
	stvx_u $out5,$x50,$out
	stvx_u $out6,$x60,$out
	addi $out,$out,0x70
	b Lctr32_enc8x_done
.align 5
Lctr32_enc8x_six:
	vcipherlast $out0,$out0,$in2
	vcipherlast $out1,$out1,$in3
	vcipherlast $out2,$out2,$in4
	vcipherlast $out3,$out3,$in5
	vcipherlast $out4,$out4,$in6
	vcipherlast $out5,$out5,$in7
	le?vperm $out0,$out0,$out0,$inpperm
	le?vperm $out1,$out1,$out1,$inpperm
	stvx_u $out0,$x00,$out
	le?vperm $out2,$out2,$out2,$inpperm
	stvx_u $out1,$x10,$out
	le?vperm $out3,$out3,$out3,$inpperm
	stvx_u $out2,$x20,$out
	le?vperm $out4,$out4,$out4,$inpperm
	stvx_u $out3,$x30,$out
	le?vperm $out5,$out5,$out5,$inpperm
	stvx_u $out4,$x40,$out
	stvx_u $out5,$x50,$out
	addi $out,$out,0x60
	b Lctr32_enc8x_done
.align 5
Lctr32_enc8x_five:
	vcipherlast $out0,$out0,$in3
	vcipherlast $out1,$out1,$in4
	vcipherlast $out2,$out2,$in5
	vcipherlast $out3,$out3,$in6
	vcipherlast $out4,$out4,$in7
	le?vperm $out0,$out0,$out0,$inpperm
	le?vperm $out1,$out1,$out1,$inpperm
	stvx_u $out0,$x00,$out
	le?vperm $out2,$out2,$out2,$inpperm
	stvx_u $out1,$x10,$out
	le?vperm $out3,$out3,$out3,$inpperm
	stvx_u $out2,$x20,$out
	le?vperm $out4,$out4,$out4,$inpperm
	stvx_u $out3,$x30,$out
	stvx_u $out4,$x40,$out
	addi $out,$out,0x50
	b Lctr32_enc8x_done
.align 5
Lctr32_enc8x_four:
	vcipherlast $out0,$out0,$in4
	vcipherlast $out1,$out1,$in5
	vcipherlast $out2,$out2,$in6
	vcipherlast $out3,$out3,$in7
	le?vperm $out0,$out0,$out0,$inpperm
	le?vperm $out1,$out1,$out1,$inpperm
	stvx_u $out0,$x00,$out
	le?vperm $out2,$out2,$out2,$inpperm
	stvx_u $out1,$x10,$out
	le?vperm $out3,$out3,$out3,$inpperm
	stvx_u $out2,$x20,$out
	stvx_u $out3,$x30,$out
	addi $out,$out,0x40
	b Lctr32_enc8x_done
.align 5
Lctr32_enc8x_three:
	vcipherlast $out0,$out0,$in5
	vcipherlast $out1,$out1,$in6
	vcipherlast $out2,$out2,$in7
	le?vperm $out0,$out0,$out0,$inpperm
	le?vperm $out1,$out1,$out1,$inpperm
	stvx_u $out0,$x00,$out
	le?vperm $out2,$out2,$out2,$inpperm
	stvx_u $out1,$x10,$out
	stvx_u $out2,$x20,$out
	addi $out,$out,0x30
	b Lctr32_enc8x_done
.align 5
Lctr32_enc8x_two:
	vcipherlast $out0,$out0,$in6
	vcipherlast $out1,$out1,$in7
	le?vperm $out0,$out0,$out0,$inpperm
	le?vperm $out1,$out1,$out1,$inpperm
	stvx_u $out0,$x00,$out
	stvx_u $out1,$x10,$out
	addi $out,$out,0x20
	b Lctr32_enc8x_done
.align 5
Lctr32_enc8x_one:
	vcipherlast $out0,$out0,$in7
	le?vperm $out0,$out0,$out0,$inpperm
	stvx_u $out0,0,$out
	addi $out,$out,0x10
Lctr32_enc8x_done:
	li r10,`$FRAME+15`
	li r11,`$FRAME+31`
	stvx $inpperm,r10,$sp # wipe copies of round keys
	addi r10,r10,32
	stvx $inpperm,r11,$sp
	addi r11,r11,32
	stvx $inpperm,r10,$sp
	addi r10,r10,32
	stvx $inpperm,r11,$sp
	addi r11,r11,32
	stvx $inpperm,r10,$sp
	addi r10,r10,32
	stvx $inpperm,r11,$sp
	addi r11,r11,32
	stvx $inpperm,r10,$sp
	addi r10,r10,32
	stvx $inpperm,r11,$sp
	addi r11,r11,32
	mtspr 256,$vrsave
	lvx v20,r10,$sp # ABI says so
	addi r10,r10,32
	lvx v21,r11,$sp
	addi r11,r11,32
	lvx v22,r10,$sp
	addi r10,r10,32
	lvx v23,r11,$sp
	addi r11,r11,32
	lvx v24,r10,$sp
	addi r10,r10,32
	lvx v25,r11,$sp
	addi r11,r11,32
	lvx v26,r10,$sp
	addi r10,r10,32
	lvx v27,r11,$sp
	addi r11,r11,32
	lvx v28,r10,$sp
	addi r10,r10,32
	lvx v29,r11,$sp
	addi r11,r11,32
	lvx v30,r10,$sp
	lvx v31,r11,$sp
	$POP r26,`$FRAME+21*16+0*$SIZE_T`($sp)
	$POP r27,`$FRAME+21*16+1*$SIZE_T`($sp)
	$POP r28,`$FRAME+21*16+2*$SIZE_T`($sp)
	$POP r29,`$FRAME+21*16+3*$SIZE_T`($sp)
	$POP r30,`$FRAME+21*16+4*$SIZE_T`($sp)
	$POP r31,`$FRAME+21*16+5*$SIZE_T`($sp)
	addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T`
	blr
	.long 0
	.byte 0,12,0x04,0,0x80,6,6,0
	.long 0
.size .${prefix}_ctr32_encrypt_blocks,.-.${prefix}_ctr32_encrypt_blocks
___
}} }}}
#########################################################################
{{{ # XTS procedures #
# int aes_p8_xts_[en|de]crypt(const char *inp, char *out, size_t len,	#
#                             const AES_KEY *key1, const AES_KEY *key2,	#
#                             [const] unsigned char iv[16]);		#
# If $key2 is NULL, a "tweak chaining" mode is engaged, in which the	#
# input tweak value is assumed to be encrypted already, and the last	#
# tweak value, suitable for a consecutive call on the same chunk of	#
# data, is written back to the original buffer. In addition, in		#
# "tweak chaining" mode only complete input blocks are processed.	#
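# A minimal caller-side sketch (hypothetical buffer names; assumes the
# AES_KEY schedules were set up with the matching set-encrypt-key call
# and that this build exports the aes_p8_ prefix):
#
#	unsigned char iv[16] = {0};		/* tweak seed, e.g. sector number */
#	aes_p8_xts_encrypt(in, out, len,	/* len must be >= 16              */
#			   &key1, &key2, iv);	/* key2 encrypts the tweak        */
#
# Passing key2 == NULL engages the "tweak chaining" mode described above.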
my ($inp,$out,$len,$key1,$key2,$ivp,$rounds,$idx) = map("r$_",(3..10));
my ($rndkey0,$rndkey1,$inout) = map("v$_",(0..2));
my ($output,$inptail,$inpperm,$leperm,$keyperm) = map("v$_",(3..7));
my ($tweak,$seven,$eighty7,$tmp,$tweak1) = map("v$_",(8..12));
my $taillen = $key2;
($inp,$idx) = ($idx,$inp); # reassign
$code.=<<___;
.globl .${prefix}_xts_encrypt
.align 5
.${prefix}_xts_encrypt:
	mr $inp,r3 # reassign
	li r3,-1
	${UCMP}i $len,16
	bltlr-
	lis r0,0xfff0
	mfspr r12,256 # save vrsave
	li r11,0
	mtspr 256,r0
	vspltisb $seven,0x07 # 0x070707..07
	le?lvsl $leperm,r11,r11
	le?vspltisb $tmp,0x0f
	le?vxor $leperm,$leperm,$seven
	li $idx,15
	lvx $tweak,0,$ivp # load [unaligned] iv
	lvsl $inpperm,0,$ivp
	lvx $inptail,$idx,$ivp
	le?vxor $inpperm,$inpperm,$tmp
	vperm $tweak,$tweak,$inptail,$inpperm
	neg r11,$inp
	lvsr $inpperm,0,r11 # prepare for unaligned load
	lvx $inout,0,$inp
	addi $inp,$inp,15 # 15 is not a typo
	le?vxor $inpperm,$inpperm,$tmp
	${UCMP}i $key2,0 # key2==NULL?
	beq Lxts_enc_no_key2
	?lvsl $keyperm,0,$key2 # prepare for unaligned key
	lwz $rounds,240($key2)
	srwi $rounds,$rounds,1
	subi $rounds,$rounds,1
	li $idx,16
	lvx $rndkey0,0,$key2
	lvx $rndkey1,$idx,$key2
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $tweak,$tweak,$rndkey0
	lvx $rndkey0,$idx,$key2
	addi $idx,$idx,16
	mtctr $rounds
Ltweak_xts_enc:
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vcipher $tweak,$tweak,$rndkey1
	lvx $rndkey1,$idx,$key2
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vcipher $tweak,$tweak,$rndkey0
	lvx $rndkey0,$idx,$key2
	addi $idx,$idx,16
	bdnz Ltweak_xts_enc
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vcipher $tweak,$tweak,$rndkey1
	lvx $rndkey1,$idx,$key2
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vcipherlast $tweak,$tweak,$rndkey0
	li $ivp,0 # don't chain the tweak
	b Lxts_enc
Lxts_enc_no_key2:
	li $idx,-16
	and $len,$len,$idx # in "tweak chaining"
			   # mode only complete
			   # blocks are processed
Lxts_enc:
	lvx $inptail,0,$inp
	addi $inp,$inp,16
	?lvsl $keyperm,0,$key1 # prepare for unaligned key
	lwz $rounds,240($key1)
	srwi $rounds,$rounds,1
	subi $rounds,$rounds,1
	li $idx,16
	vslb $eighty7,$seven,$seven # 0x808080..80
	vor $eighty7,$eighty7,$seven # 0x878787..87
	vspltisb $tmp,1 # 0x010101..01
	vsldoi $eighty7,$eighty7,$tmp,15 # 0x870101..01
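	# The tweak doubling used below is multiplication by x in GF(2^128)
	# modulo x^128+x^7+x^2+x+1: vaddubm doubles each byte independently,
	# vsrab extracts every byte's lost top bit as a 0x00/0xff mask, and
	# the rotated mask ANDed with 0x870101..01 re-injects the inter-byte
	# carries (0x01) plus the 0x87 reduction for the bit shifted out of
	# the most significant end.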
	${UCMP}i $len,96
	bge _aesp8_xts_encrypt6x
	andi. $taillen,$len,15
	subic r0,$len,32
	subi $taillen,$taillen,16
	subfe r0,r0,r0
	and r0,r0,$taillen
	add $inp,$inp,r0
	lvx $rndkey0,0,$key1
	lvx $rndkey1,$idx,$key1
	addi $idx,$idx,16
	vperm $inout,$inout,$inptail,$inpperm
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $inout,$inout,$tweak
	vxor $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key1
	addi $idx,$idx,16
	mtctr $rounds
	b Loop_xts_enc
.align 5
Loop_xts_enc:
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vcipher $inout,$inout,$rndkey1
	lvx $rndkey1,$idx,$key1
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vcipher $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key1
	addi $idx,$idx,16
	bdnz Loop_xts_enc
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vcipher $inout,$inout,$rndkey1
	lvx $rndkey1,$idx,$key1
	li $idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $rndkey0,$rndkey0,$tweak
	vcipherlast $output,$inout,$rndkey0
	le?vperm $tmp,$output,$output,$leperm
	be?nop
	le?stvx_u $tmp,0,$out
	be?stvx_u $output,0,$out
	addi $out,$out,16
	subic. $len,$len,16
	beq Lxts_enc_done
	vmr $inout,$inptail
	lvx $inptail,0,$inp
	addi $inp,$inp,16
	lvx $rndkey0,0,$key1
	lvx $rndkey1,$idx,$key1
	addi $idx,$idx,16
	subic r0,$len,32
	subfe r0,r0,r0
	and r0,r0,$taillen
	add $inp,$inp,r0
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vand $tmp,$tmp,$eighty7
	vxor $tweak,$tweak,$tmp
	vperm $inout,$inout,$inptail,$inpperm
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $inout,$inout,$tweak
	vxor $output,$output,$rndkey0 # just in case $len<16
	vxor $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key1
	addi $idx,$idx,16
	mtctr $rounds
	${UCMP}i $len,16
	bge Loop_xts_enc
	vxor $output,$output,$tweak
	lvsr $inpperm,0,$len # $inpperm is no longer needed
	vxor $inptail,$inptail,$inptail # $inptail is no longer needed
	vspltisb $tmp,-1
	vperm $inptail,$inptail,$tmp,$inpperm
	vsel $inout,$inout,$output,$inptail
	subi r11,$out,17
	subi $out,$out,16
	mtctr $len
	li $len,16
Loop_xts_enc_steal:
	lbzu r0,1(r11)
	stb r0,16(r11)
	bdnz Loop_xts_enc_steal
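	# Ciphertext stealing: the loop above copied the first $len bytes of
	# the last full ciphertext block out to the short tail, and the
	# merged block built by vsel (partial plaintext plus stolen
	# ciphertext) is now encrypted once more into the last full slot.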
	mtctr $rounds
	b Loop_xts_enc # one more time...
Lxts_enc_done:
	${UCMP}i $ivp,0
	beq Lxts_enc_ret
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vand $tmp,$tmp,$eighty7
	vxor $tweak,$tweak,$tmp
	le?vperm $tweak,$tweak,$tweak,$leperm
	stvx_u $tweak,0,$ivp
Lxts_enc_ret:
	mtspr 256,r12 # restore vrsave
	li r3,0
	blr
	.long 0
	.byte 0,12,0x04,0,0x80,6,6,0
	.long 0
.size .${prefix}_xts_encrypt,.-.${prefix}_xts_encrypt
.globl .${prefix}_xts_decrypt
.align 5
.${prefix}_xts_decrypt:
	mr $inp,r3 # reassign
	li r3,-1
	${UCMP}i $len,16
	bltlr-
	lis r0,0xfff8
	mfspr r12,256 # save vrsave
	li r11,0
	mtspr 256,r0
	andi. r0,$len,15
	neg r0,r0
	andi. r0,r0,16
	sub $len,$len,r0
	vspltisb $seven,0x07 # 0x070707..07
	le?lvsl $leperm,r11,r11
	le?vspltisb $tmp,0x0f
	le?vxor $leperm,$leperm,$seven
	li $idx,15
	lvx $tweak,0,$ivp # load [unaligned] iv
	lvsl $inpperm,0,$ivp
	lvx $inptail,$idx,$ivp
	le?vxor $inpperm,$inpperm,$tmp
	vperm $tweak,$tweak,$inptail,$inpperm
	neg r11,$inp
	lvsr $inpperm,0,r11 # prepare for unaligned load
	lvx $inout,0,$inp
	addi $inp,$inp,15 # 15 is not a typo
	le?vxor $inpperm,$inpperm,$tmp
	${UCMP}i $key2,0 # key2==NULL?
	beq Lxts_dec_no_key2
	?lvsl $keyperm,0,$key2 # prepare for unaligned key
	lwz $rounds,240($key2)
	srwi $rounds,$rounds,1
	subi $rounds,$rounds,1
	li $idx,16
	lvx $rndkey0,0,$key2
	lvx $rndkey1,$idx,$key2
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $tweak,$tweak,$rndkey0
	lvx $rndkey0,$idx,$key2
	addi $idx,$idx,16
	mtctr $rounds
Ltweak_xts_dec:
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vcipher $tweak,$tweak,$rndkey1
	lvx $rndkey1,$idx,$key2
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vcipher $tweak,$tweak,$rndkey0
	lvx $rndkey0,$idx,$key2
	addi $idx,$idx,16
	bdnz Ltweak_xts_dec
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vcipher $tweak,$tweak,$rndkey1
	lvx $rndkey1,$idx,$key2
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vcipherlast $tweak,$tweak,$rndkey0
	li $ivp,0 # don't chain the tweak
	b Lxts_dec
Lxts_dec_no_key2:
	neg $idx,$len
	andi. $idx,$idx,15
	add $len,$len,$idx # in "tweak chaining"
			   # mode only complete
			   # blocks are processed
Lxts_dec:
	lvx $inptail,0,$inp
	addi $inp,$inp,16
	?lvsl $keyperm,0,$key1 # prepare for unaligned key
	lwz $rounds,240($key1)
	srwi $rounds,$rounds,1
	subi $rounds,$rounds,1
	li $idx,16
	vslb $eighty7,$seven,$seven # 0x808080..80
	vor $eighty7,$eighty7,$seven # 0x878787..87
	vspltisb $tmp,1 # 0x010101..01
	vsldoi $eighty7,$eighty7,$tmp,15 # 0x870101..01
	${UCMP}i $len,96
	bge _aesp8_xts_decrypt6x
	lvx $rndkey0,0,$key1
	lvx $rndkey1,$idx,$key1
	addi $idx,$idx,16
	vperm $inout,$inout,$inptail,$inpperm
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $inout,$inout,$tweak
	vxor $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key1
	addi $idx,$idx,16
	mtctr $rounds
	${UCMP}i $len,16
	blt Ltail_xts_dec
	be?b Loop_xts_dec
.align 5
Loop_xts_dec:
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vncipher $inout,$inout,$rndkey1
	lvx $rndkey1,$idx,$key1
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vncipher $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key1
	addi $idx,$idx,16
	bdnz Loop_xts_dec
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vncipher $inout,$inout,$rndkey1
	lvx $rndkey1,$idx,$key1
	li $idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $rndkey0,$rndkey0,$tweak
	vncipherlast $output,$inout,$rndkey0
	le?vperm $tmp,$output,$output,$leperm
	be?nop
	le?stvx_u $tmp,0,$out
	be?stvx_u $output,0,$out
	addi $out,$out,16
	subic. $len,$len,16
	beq Lxts_dec_done
	vmr $inout,$inptail
	lvx $inptail,0,$inp
	addi $inp,$inp,16
	lvx $rndkey0,0,$key1
	lvx $rndkey1,$idx,$key1
	addi $idx,$idx,16
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vand $tmp,$tmp,$eighty7
	vxor $tweak,$tweak,$tmp
	vperm $inout,$inout,$inptail,$inpperm
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $inout,$inout,$tweak
	vxor $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key1
	addi $idx,$idx,16
	mtctr $rounds
	${UCMP}i $len,16
	bge Loop_xts_dec
Ltail_xts_dec:
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak1,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vand $tmp,$tmp,$eighty7
	vxor $tweak1,$tweak1,$tmp
	subi $inp,$inp,16
	add $inp,$inp,$len
	vxor $inout,$inout,$tweak # cancel the wrong tweak :-(
	vxor $inout,$inout,$tweak1 # apply the correct tweak1 :-)
Loop_xts_dec_short:
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vncipher $inout,$inout,$rndkey1
	lvx $rndkey1,$idx,$key1
	addi $idx,$idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vncipher $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key1
	addi $idx,$idx,16
	bdnz Loop_xts_dec_short
	?vperm $rndkey1,$rndkey1,$rndkey0,$keyperm
	vncipher $inout,$inout,$rndkey1
	lvx $rndkey1,$idx,$key1
	li $idx,16
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	vxor $rndkey0,$rndkey0,$tweak1
	vncipherlast $output,$inout,$rndkey0
	le?vperm $tmp,$output,$output,$leperm
	be?nop
	le?stvx_u $tmp,0,$out
	be?stvx_u $output,0,$out
	vmr $inout,$inptail
	lvx $inptail,0,$inp
	#addi $inp,$inp,16
	lvx $rndkey0,0,$key1
	lvx $rndkey1,$idx,$key1
	addi $idx,$idx,16
	vperm $inout,$inout,$inptail,$inpperm
	?vperm $rndkey0,$rndkey0,$rndkey1,$keyperm
	lvsr $inpperm,0,$len # $inpperm is no longer needed
	vxor $inptail,$inptail,$inptail # $inptail is no longer needed
	vspltisb $tmp,-1
	vperm $inptail,$inptail,$tmp,$inpperm
	vsel $inout,$inout,$output,$inptail
	vxor $rndkey0,$rndkey0,$tweak
	vxor $inout,$inout,$rndkey0
	lvx $rndkey0,$idx,$key1
	addi $idx,$idx,16
	subi r11,$out,1
	mtctr $len
	li $len,16
Loop_xts_dec_steal:
	lbzu r0,1(r11)
	stb r0,16(r11)
	bdnz Loop_xts_dec_steal
	mtctr $rounds
	b Loop_xts_dec # one more time...
Lxts_dec_done:
	${UCMP}i $ivp,0
	beq Lxts_dec_ret
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vand $tmp,$tmp,$eighty7
	vxor $tweak,$tweak,$tmp
	le?vperm $tweak,$tweak,$tweak,$leperm
	stvx_u $tweak,0,$ivp
Lxts_dec_ret:
	mtspr 256,r12 # restore vrsave
	li r3,0
	blr
	.long 0
	.byte 0,12,0x04,0,0x80,6,6,0
	.long 0
.size .${prefix}_xts_decrypt,.-.${prefix}_xts_decrypt
___
#########################################################################
{{ # Optimized XTS procedures #
my $key_=$key2;
my ($x00,$x10,$x20,$x30,$x40,$x50,$x60,$x70)=map("r$_",(0,3,26..31));
$x00=0 if ($flavour =~ /osx/);
my ($in0, $in1, $in2, $in3, $in4, $in5 )=map("v$_",(0..5));
my ($out0, $out1, $out2, $out3, $out4, $out5)=map("v$_",(7,12..16));
my ($twk0, $twk1, $twk2, $twk3, $twk4, $twk5)=map("v$_",(17..22));
my $rndkey0="v23"; # v24-v25 rotating buffer for the first round keys
                   # v26-v31 last 6 round keys
my ($keyperm)=($out0); # aliases with "caller", redundant assignment
my $taillen=$x70;
$code.=<<___;
.align 5
_aesp8_xts_encrypt6x:
	$STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp)
	mflr r11
	li r7,`$FRAME+8*16+15`
	li r3,`$FRAME+8*16+31`
	$PUSH r11,`$FRAME+21*16+6*$SIZE_T+$LRSAVE`($sp)
	stvx v20,r7,$sp # ABI says so
	addi r7,r7,32
	stvx v21,r3,$sp
	addi r3,r3,32
	stvx v22,r7,$sp
	addi r7,r7,32
	stvx v23,r3,$sp
	addi r3,r3,32
	stvx v24,r7,$sp
	addi r7,r7,32
	stvx v25,r3,$sp
	addi r3,r3,32
	stvx v26,r7,$sp
	addi r7,r7,32
	stvx v27,r3,$sp
	addi r3,r3,32
	stvx v28,r7,$sp
	addi r7,r7,32
	stvx v29,r3,$sp
	addi r3,r3,32
	stvx v30,r7,$sp
	stvx v31,r3,$sp
	li r0,-1
	stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave
	li $x10,0x10
	$PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp)
	li $x20,0x20
	$PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp)
	li $x30,0x30
	$PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp)
	li $x40,0x40
	$PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp)
	li $x50,0x50
	$PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp)
	li $x60,0x60
	$PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp)
	li $x70,0x70
	mtspr 256,r0
	subi $rounds,$rounds,3 # -4 in total
	lvx $rndkey0,$x00,$key1 # load key schedule
	lvx v30,$x10,$key1
	addi $key1,$key1,0x20
	lvx v31,$x00,$key1
	?vperm $rndkey0,$rndkey0,v30,$keyperm
	addi $key_,$sp,$FRAME+15
	mtctr $rounds
Load_xts_enc_key:
	?vperm v24,v30,v31,$keyperm
	lvx v30,$x10,$key1
	addi $key1,$key1,0x20
	stvx v24,$x00,$key_ # off-load round[1]
	?vperm v25,v31,v30,$keyperm
	lvx v31,$x00,$key1
	stvx v25,$x10,$key_ # off-load round[2]
	addi $key_,$key_,0x20
	bdnz Load_xts_enc_key
	lvx v26,$x10,$key1
	?vperm v24,v30,v31,$keyperm
	lvx v27,$x20,$key1
	stvx v24,$x00,$key_ # off-load round[3]
	?vperm v25,v31,v26,$keyperm
	lvx v28,$x30,$key1
	stvx v25,$x10,$key_ # off-load round[4]
	addi $key_,$sp,$FRAME+15 # rewind $key_
	?vperm v26,v26,v27,$keyperm
	lvx v29,$x40,$key1
	?vperm v27,v27,v28,$keyperm
	lvx v30,$x50,$key1
	?vperm v28,v28,v29,$keyperm
	lvx v31,$x60,$key1
	?vperm v29,v29,v30,$keyperm
	lvx $twk5,$x70,$key1 # borrow $twk5
	?vperm v30,v30,v31,$keyperm
	lvx v24,$x00,$key_ # pre-load round[1]
	?vperm v31,v31,$twk5,$keyperm
	lvx v25,$x10,$key_ # pre-load round[2]
	vperm $in0,$inout,$inptail,$inpperm
	subi $inp,$inp,31 # undo "caller"
	vxor $twk0,$tweak,$rndkey0
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vand $tmp,$tmp,$eighty7
	vxor $out0,$in0,$twk0
	vxor $tweak,$tweak,$tmp
	lvx_u $in1,$x10,$inp
	vxor $twk1,$tweak,$rndkey0
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	le?vperm $in1,$in1,$in1,$leperm
	vand $tmp,$tmp,$eighty7
	vxor $out1,$in1,$twk1
	vxor $tweak,$tweak,$tmp
	lvx_u $in2,$x20,$inp
	andi. $taillen,$len,15
	vxor $twk2,$tweak,$rndkey0
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	le?vperm $in2,$in2,$in2,$leperm
	vand $tmp,$tmp,$eighty7
	vxor $out2,$in2,$twk2
	vxor $tweak,$tweak,$tmp
	lvx_u $in3,$x30,$inp
	sub $len,$len,$taillen
	vxor $twk3,$tweak,$rndkey0
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	le?vperm $in3,$in3,$in3,$leperm
	vand $tmp,$tmp,$eighty7
	vxor $out3,$in3,$twk3
	vxor $tweak,$tweak,$tmp
	lvx_u $in4,$x40,$inp
	subi $len,$len,0x60
	vxor $twk4,$tweak,$rndkey0
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	le?vperm $in4,$in4,$in4,$leperm
	vand $tmp,$tmp,$eighty7
	vxor $out4,$in4,$twk4
	vxor $tweak,$tweak,$tmp
	lvx_u $in5,$x50,$inp
	addi $inp,$inp,0x60
	vxor $twk5,$tweak,$rndkey0
	vsrab $tmp,$tweak,$seven # next tweak value
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	le?vperm $in5,$in5,$in5,$leperm
	vand $tmp,$tmp,$eighty7
	vxor $out5,$in5,$twk5
	vxor $tweak,$tweak,$tmp
	vxor v31,v31,$rndkey0
	mtctr $rounds
	b Loop_xts_enc6x
.align 5
Loop_xts_enc6x:
	vcipher $out0,$out0,v24
	vcipher $out1,$out1,v24
	vcipher $out2,$out2,v24
	vcipher $out3,$out3,v24
	vcipher $out4,$out4,v24
	vcipher $out5,$out5,v24
	lvx v24,$x20,$key_ # round[3]
	addi $key_,$key_,0x20
	vcipher $out0,$out0,v25
	vcipher $out1,$out1,v25
	vcipher $out2,$out2,v25
	vcipher $out3,$out3,v25
	vcipher $out4,$out4,v25
	vcipher $out5,$out5,v25
	lvx v25,$x10,$key_ # round[4]
	bdnz Loop_xts_enc6x
	subic $len,$len,96 # $len-=96
	vxor $in0,$twk0,v31 # xor with last round key
	vcipher $out0,$out0,v24
	vcipher $out1,$out1,v24
	vsrab $tmp,$tweak,$seven # next tweak value
	vxor $twk0,$tweak,$rndkey0
	vaddubm $tweak,$tweak,$tweak
	vcipher $out2,$out2,v24
	vcipher $out3,$out3,v24
	vsldoi $tmp,$tmp,$tmp,15
	vcipher $out4,$out4,v24
	vcipher $out5,$out5,v24
	subfe. r0,r0,r0 # borrow?-1:0
	vand $tmp,$tmp,$eighty7
	vcipher $out0,$out0,v25
	vcipher $out1,$out1,v25
	vxor $tweak,$tweak,$tmp
	vcipher $out2,$out2,v25
	vcipher $out3,$out3,v25
	vxor $in1,$twk1,v31
	vsrab $tmp,$tweak,$seven # next tweak value
	vxor $twk1,$tweak,$rndkey0
	vcipher $out4,$out4,v25
	vcipher $out5,$out5,v25
	and r0,r0,$len
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vcipher $out0,$out0,v26
	vcipher $out1,$out1,v26
	vand $tmp,$tmp,$eighty7
	vcipher $out2,$out2,v26
	vcipher $out3,$out3,v26
	vxor $tweak,$tweak,$tmp
	vcipher $out4,$out4,v26
	vcipher $out5,$out5,v26
	add $inp,$inp,r0 # $inp is adjusted in such
			 # way that at exit from the
			 # loop inX-in5 are loaded
			 # with last "words"
	vxor $in2,$twk2,v31
	vsrab $tmp,$tweak,$seven # next tweak value
	vxor $twk2,$tweak,$rndkey0
	vaddubm $tweak,$tweak,$tweak
	vcipher $out0,$out0,v27
	vcipher $out1,$out1,v27
	vsldoi $tmp,$tmp,$tmp,15
	vcipher $out2,$out2,v27
	vcipher $out3,$out3,v27
	vand $tmp,$tmp,$eighty7
	vcipher $out4,$out4,v27
	vcipher $out5,$out5,v27
	addi $key_,$sp,$FRAME+15 # rewind $key_
	vxor $tweak,$tweak,$tmp
	vcipher $out0,$out0,v28
	vcipher $out1,$out1,v28
	vxor $in3,$twk3,v31
	vsrab $tmp,$tweak,$seven # next tweak value
	vxor $twk3,$tweak,$rndkey0
	vcipher $out2,$out2,v28
	vcipher $out3,$out3,v28
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vcipher $out4,$out4,v28
	vcipher $out5,$out5,v28
	lvx v24,$x00,$key_ # re-pre-load round[1]
	vand $tmp,$tmp,$eighty7
	vcipher $out0,$out0,v29
	vcipher $out1,$out1,v29
	vxor $tweak,$tweak,$tmp
	vcipher $out2,$out2,v29
	vcipher $out3,$out3,v29
	vxor $in4,$twk4,v31
	vsrab $tmp,$tweak,$seven # next tweak value
	vxor $twk4,$tweak,$rndkey0
	vcipher $out4,$out4,v29
	vcipher $out5,$out5,v29
	lvx v25,$x10,$key_ # re-pre-load round[2]
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vcipher $out0,$out0,v30
	vcipher $out1,$out1,v30
	vand $tmp,$tmp,$eighty7
	vcipher $out2,$out2,v30
	vcipher $out3,$out3,v30
	vxor $tweak,$tweak,$tmp
	vcipher $out4,$out4,v30
	vcipher $out5,$out5,v30
	vxor $in5,$twk5,v31
	vsrab $tmp,$tweak,$seven # next tweak value
	vxor $twk5,$tweak,$rndkey0
	vcipherlast $out0,$out0,$in0
	lvx_u $in0,$x00,$inp # load next input block
	vaddubm $tweak,$tweak,$tweak
	vsldoi $tmp,$tmp,$tmp,15
	vcipherlast $out1,$out1,$in1
	lvx_u $in1,$x10,$inp
	vcipherlast $out2,$out2,$in2
	le?vperm $in0,$in0,$in0,$leperm
	lvx_u $in2,$x20,$inp
	vand $tmp,$tmp,$eighty7
	vcipherlast $out3,$out3,$in3
	le?vperm $in1,$in1,$in1,$leperm
	lvx_u $in3,$x30,$inp
	vcipherlast $out4,$out4,$in4
	le?vperm $in2,$in2,$in2,$leperm
	lvx_u $in4,$x40,$inp
	vxor $tweak,$tweak,$tmp
	vcipherlast $tmp,$out5,$in5 # last block might be needed
				    # in stealing mode
	le?vperm $in3,$in3,$in3,$leperm
	lvx_u $in5,$x50,$inp
	addi $inp,$inp,0x60
	le?vperm $in4,$in4,$in4,$leperm
	le?vperm $in5,$in5,$in5,$leperm
	le?vperm $out0,$out0,$out0,$leperm
	le?vperm $out1,$out1,$out1,$leperm
	stvx_u $out0,$x00,$out # store output
	vxor $out0,$in0,$twk0
	le?vperm $out2,$out2,$out2,$leperm
	stvx_u $out1,$x10,$out
	vxor $out1,$in1,$twk1
	le?vperm $out3,$out3,$out3,$leperm
	stvx_u $out2,$x20,$out
	vxor $out2,$in2,$twk2
	le?vperm $out4,$out4,$out4,$leperm
	stvx_u $out3,$x30,$out
	vxor $out3,$in3,$twk3
	le?vperm $out5,$tmp,$tmp,$leperm
	stvx_u $out4,$x40,$out
	vxor $out4,$in4,$twk4
	le?stvx_u $out5,$x50,$out
	be?stvx_u $tmp,$x50,$out
	vxor $out5,$in5,$twk5
	addi $out,$out,0x60
	mtctr $rounds
	beq Loop_xts_enc6x # did $len-=96 borrow?
	addic. $len,$len,0x60
	beq Lxts_enc6x_zero
	cmpwi $len,0x20
	blt Lxts_enc6x_one
	nop
	beq Lxts_enc6x_two
	cmpwi $len,0x40
	blt Lxts_enc6x_three
	nop
	beq Lxts_enc6x_four
Lxts_enc6x_five:
	vxor $out0,$in1,$twk0
	vxor $out1,$in2,$twk1
	vxor $out2,$in3,$twk2
	vxor $out3,$in4,$twk3
	vxor $out4,$in5,$twk4
	bl _aesp8_xts_enc5x
	le?vperm $out0,$out0,$out0,$leperm
	vmr $twk0,$twk5 # unused tweak
	le?vperm $out1,$out1,$out1,$leperm
	stvx_u $out0,$x00,$out # store output
	le?vperm $out2,$out2,$out2,$leperm
	stvx_u $out1,$x10,$out
	le?vperm $out3,$out3,$out3,$leperm
	stvx_u $out2,$x20,$out
	vxor $tmp,$out4,$twk5 # last block prep for stealing
	le?vperm $out4,$out4,$out4,$leperm
	stvx_u $out3,$x30,$out
	stvx_u $out4,$x40,$out
	addi $out,$out,0x50
	bne Lxts_enc6x_steal
	b Lxts_enc6x_done
.align 4
Lxts_enc6x_four:
	vxor $out0,$in2,$twk0
	vxor $out1,$in3,$twk1
	vxor $out2,$in4,$twk2
	vxor $out3,$in5,$twk3
	vxor $out4,$out4,$out4
	bl _aesp8_xts_enc5x
	le?vperm $out0,$out0,$out0,$leperm
	vmr $twk0,$twk4 # unused tweak
	le?vperm $out1,$out1,$out1,$leperm
	stvx_u $out0,$x00,$out # store output
	le?vperm $out2,$out2,$out2,$leperm
	stvx_u $out1,$x10,$out
	vxor $tmp,$out3,$twk4 # last block prep for stealing
	le?vperm $out3,$out3,$out3,$leperm
	stvx_u $out2,$x20,$out
	stvx_u $out3,$x30,$out
	addi $out,$out,0x40
	bne Lxts_enc6x_steal
	b Lxts_enc6x_done
.align 4
Lxts_enc6x_three:
	vxor $out0,$in3,$twk0
	vxor $out1,$in4,$twk1
	vxor $out2,$in5,$twk2
	vxor $out3,$out3,$out3
	vxor $out4,$out4,$out4
	bl _aesp8_xts_enc5x
	le?vperm $out0,$out0,$out0,$leperm
	vmr $twk0,$twk3 # unused tweak
	le?vperm $out1,$out1,$out1,$leperm
	stvx_u $out0,$x00,$out # store output
	vxor $tmp,$out2,$twk3 # last block prep for stealing
	le?vperm $out2,$out2,$out2,$leperm
	stvx_u $out1,$x10,$out
	stvx_u $out2,$x20,$out
	addi $out,$out,0x30
	bne Lxts_enc6x_steal
	b Lxts_enc6x_done
.align 4
Lxts_enc6x_two:
	vxor $out0,$in4,$twk0
	vxor $out1,$in5,$twk1
	vxor $out2,$out2,$out2
	vxor $out3,$out3,$out3
	vxor $out4,$out4,$out4
	bl _aesp8_xts_enc5x
	le?vperm $out0,$out0,$out0,$leperm
	vmr $twk0,$twk2 # unused tweak
	vxor $tmp,$out1,$twk2 # last block prep for stealing
	le?vperm $out1,$out1,$out1,$leperm
	stvx_u $out0,$x00,$out # store output
	stvx_u $out1,$x10,$out
	addi $out,$out,0x20
	bne Lxts_enc6x_steal
	b Lxts_enc6x_done
.align 4
Lxts_enc6x_one:
	vxor $out0,$in5,$twk0
	nop
Loop_xts_enc1x:
	vcipher $out0,$out0,v24
	lvx v24,$x20,$key_ # round[3]
	addi $key_,$key_,0x20
	vcipher $out0,$out0,v25
	lvx v25,$x10,$key_ # round[4]
	bdnz Loop_xts_enc1x
	add $inp,$inp,$taillen
	cmpwi $taillen,0
	vcipher $out0,$out0,v24
	subi $inp,$inp,16
	vcipher $out0,$out0,v25
	lvsr $inpperm,0,$taillen
	vcipher $out0,$out0,v26
	lvx_u $in0,0,$inp
	vcipher $out0,$out0,v27
	addi $key_,$sp,$FRAME+15 # rewind $key_
	vcipher $out0,$out0,v28
	lvx v24,$x00,$key_ # re-pre-load round[1]
	vcipher $out0,$out0,v29
	lvx v25,$x10,$key_ # re-pre-load round[2]
	vxor $twk0,$twk0,v31
	le?vperm $in0,$in0,$in0,$leperm
	vcipher $out0,$out0,v30
	vperm $in0,$in0,$in0,$inpperm
	vcipherlast $out0,$out0,$twk0
	vmr $twk0,$twk1 # unused tweak
	vxor $tmp,$out0,$twk1 # last block prep for stealing
	le?vperm $out0,$out0,$out0,$leperm
	stvx_u $out0,$x00,$out # store output
	addi $out,$out,0x10
	bne Lxts_enc6x_steal
	b Lxts_enc6x_done
.align 4
Lxts_enc6x_zero:
	cmpwi $taillen,0
	beq Lxts_enc6x_done
	add $inp,$inp,$taillen
	subi $inp,$inp,16
	lvx_u $in0,0,$inp
	lvsr $inpperm,0,$taillen # $in5 is no more
	le?vperm $in0,$in0,$in0,$leperm
	vperm $in0,$in0,$in0,$inpperm
	vxor $tmp,$tmp,$twk0
Lxts_enc6x_steal:
	vxor $in0,$in0,$twk0
	vxor $out0,$out0,$out0
	vspltisb $out1,-1
	vperm $out0,$out0,$out1,$inpperm
	vsel $out0,$in0,$tmp,$out0 # $tmp is last block, remember?
	subi r30,$out,17
	subi $out,$out,16
	mtctr $taillen
Loop_xts_enc6x_steal:
	lbzu r0,1(r30)
	stb r0,16(r30)
	bdnz Loop_xts_enc6x_steal
	li $taillen,0
	mtctr $rounds
	b Loop_xts_enc1x # one more time...
.align 4
Lxts_enc6x_done:
	${UCMP}i $ivp,0
	beq Lxts_enc6x_ret
	vxor $tweak,$twk0,$rndkey0
	le?vperm $tweak,$tweak,$tweak,$leperm
	stvx_u $tweak,0,$ivp
Lxts_enc6x_ret:
	mtlr r11
	li r10,`$FRAME+15`
	li r11,`$FRAME+31`
	stvx $seven,r10,$sp # wipe copies of round keys
	addi r10,r10,32
	stvx $seven,r11,$sp
	addi r11,r11,32
	stvx $seven,r10,$sp
	addi r10,r10,32
	stvx $seven,r11,$sp
	addi r11,r11,32
	stvx $seven,r10,$sp
	addi r10,r10,32
	stvx $seven,r11,$sp
	addi r11,r11,32
	stvx $seven,r10,$sp
	addi r10,r10,32
	stvx $seven,r11,$sp
	addi r11,r11,32
	mtspr 256,$vrsave
	lvx v20,r10,$sp # ABI says so
	addi r10,r10,32
	lvx v21,r11,$sp
	addi r11,r11,32
	lvx v22,r10,$sp
	addi r10,r10,32
	lvx v23,r11,$sp
	addi r11,r11,32
	lvx v24,r10,$sp
	addi r10,r10,32
	lvx v25,r11,$sp
	addi r11,r11,32
	lvx v26,r10,$sp
	addi r10,r10,32
	lvx v27,r11,$sp
	addi r11,r11,32
	lvx v28,r10,$sp
	addi r10,r10,32
	lvx v29,r11,$sp
	addi r11,r11,32
	lvx v30,r10,$sp
	lvx v31,r11,$sp
	$POP r26,`$FRAME+21*16+0*$SIZE_T`($sp)
	$POP r27,`$FRAME+21*16+1*$SIZE_T`($sp)
	$POP r28,`$FRAME+21*16+2*$SIZE_T`($sp)
	$POP r29,`$FRAME+21*16+3*$SIZE_T`($sp)
	$POP r30,`$FRAME+21*16+4*$SIZE_T`($sp)
	$POP r31,`$FRAME+21*16+5*$SIZE_T`($sp)
	addi $sp,$sp,`$FRAME+21*16+6*$SIZE_T`
	blr
	.long 0
	.byte 0,12,0x04,1,0x80,6,6,0
	.long 0
.align 5
_aesp8_xts_enc5x:
	vcipher $out0,$out0,v24
	vcipher $out1,$out1,v24
	vcipher $out2,$out2,v24
	vcipher $out3,$out3,v24
	vcipher $out4,$out4,v24
	lvx v24,$x20,$key_ # round[3]
	addi $key_,$key_,0x20
	vcipher $out0,$out0,v25
	vcipher $out1,$out1,v25
	vcipher $out2,$out2,v25
	vcipher $out3,$out3,v25
	vcipher $out4,$out4,v25
	lvx v25,$x10,$key_ # round[4]
	bdnz _aesp8_xts_enc5x
	add $inp,$inp,$taillen
	cmpwi $taillen,0
	vcipher $out0,$out0,v24
	vcipher $out1,$out1,v24
	vcipher $out2,$out2,v24
	vcipher $out3,$out3,v24
	vcipher $out4,$out4,v24
	subi $inp,$inp,16
	vcipher $out0,$out0,v25
	vcipher $out1,$out1,v25
	vcipher $out2,$out2,v25
	vcipher $out3,$out3,v25
	vcipher $out4,$out4,v25
	vxor $twk0,$twk0,v31
	vcipher $out0,$out0,v26
	lvsr $inpperm,0,$taillen # $in5 is no more
	vcipher $out1,$out1,v26
	vcipher $out2,$out2,v26
	vcipher $out3,$out3,v26
	vcipher $out4,$out4,v26
	vxor $in1,$twk1,v31
	vcipher $out0,$out0,v27
	lvx_u $in0,0,$inp
	vcipher $out1,$out1,v27
	vcipher $out2,$out2,v27
	vcipher $out3,$out3,v27
	vcipher $out4,$out4,v27
	vxor $in2,$twk2,v31
	addi $key_,$sp,$FRAME+15 # rewind $key_
	vcipher $out0,$out0,v28
	vcipher $out1,$out1,v28
	vcipher $out2,$out2,v28
	vcipher $out3,$out3,v28
	vcipher $out4,$out4,v28
	lvx v24,$x00,$key_ # re-pre-load round[1]
	vxor $in3,$twk3,v31
	vcipher $out0,$out0,v29
	le?vperm $in0,$in0,$in0,$leperm
	vcipher $out1,$out1,v29
	vcipher $out2,$out2,v29
	vcipher $out3,$out3,v29
	vcipher $out4,$out4,v29
	lvx v25,$x10,$key_ # re-pre-load round[2]
	vxor $in4,$twk4,v31
	vcipher $out0,$out0,v30
	vperm $in0,$in0,$in0,$inpperm
	vcipher $out1,$out1,v30
	vcipher $out2,$out2,v30
	vcipher $out3,$out3,v30
	vcipher $out4,$out4,v30
	vcipherlast $out0,$out0,$twk0
	vcipherlast $out1,$out1,$in1
	vcipherlast $out2,$out2,$in2
	vcipherlast $out3,$out3,$in3
	vcipherlast $out4,$out4,$in4
	blr
  2737. .long 0
  2738. .byte 0,12,0x14,0,0,0,0,0
  2739. .align 5
  2740. _aesp8_xts_decrypt6x:
  2741. $STU $sp,-`($FRAME+21*16+6*$SIZE_T)`($sp)
  2742. mflr r11
  2743. li r7,`$FRAME+8*16+15`
  2744. li r3,`$FRAME+8*16+31`
  2745. $PUSH r11,`$FRAME+21*16+6*$SIZE_T+$LRSAVE`($sp)
  2746. stvx v20,r7,$sp # ABI says so
  2747. addi r7,r7,32
  2748. stvx v21,r3,$sp
  2749. addi r3,r3,32
  2750. stvx v22,r7,$sp
  2751. addi r7,r7,32
  2752. stvx v23,r3,$sp
  2753. addi r3,r3,32
  2754. stvx v24,r7,$sp
  2755. addi r7,r7,32
  2756. stvx v25,r3,$sp
  2757. addi r3,r3,32
  2758. stvx v26,r7,$sp
  2759. addi r7,r7,32
  2760. stvx v27,r3,$sp
  2761. addi r3,r3,32
  2762. stvx v28,r7,$sp
  2763. addi r7,r7,32
  2764. stvx v29,r3,$sp
  2765. addi r3,r3,32
  2766. stvx v30,r7,$sp
  2767. stvx v31,r3,$sp
  2768. li r0,-1
  2769. stw $vrsave,`$FRAME+21*16-4`($sp) # save vrsave
  2770. li $x10,0x10
  2771. $PUSH r26,`$FRAME+21*16+0*$SIZE_T`($sp)
  2772. li $x20,0x20
  2773. $PUSH r27,`$FRAME+21*16+1*$SIZE_T`($sp)
  2774. li $x30,0x30
  2775. $PUSH r28,`$FRAME+21*16+2*$SIZE_T`($sp)
  2776. li $x40,0x40
  2777. $PUSH r29,`$FRAME+21*16+3*$SIZE_T`($sp)
  2778. li $x50,0x50
  2779. $PUSH r30,`$FRAME+21*16+4*$SIZE_T`($sp)
  2780. li $x60,0x60
  2781. $PUSH r31,`$FRAME+21*16+5*$SIZE_T`($sp)
  2782. li $x70,0x70
  2783. mtspr 256,r0
  2784. subi $rounds,$rounds,3 # -4 in total
  2785. lvx $rndkey0,$x00,$key1 # load key schedule
  2786. lvx v30,$x10,$key1
  2787. addi $key1,$key1,0x20
  2788. lvx v31,$x00,$key1
  2789. ?vperm $rndkey0,$rndkey0,v30,$keyperm
  2790. addi $key_,$sp,$FRAME+15
  2791. mtctr $rounds
  2792. Load_xts_dec_key:
  2793. ?vperm v24,v30,v31,$keyperm
  2794. lvx v30,$x10,$key1
  2795. addi $key1,$key1,0x20
  2796. stvx v24,$x00,$key_ # off-load round[1]
  2797. ?vperm v25,v31,v30,$keyperm
  2798. lvx v31,$x00,$key1
  2799. stvx v25,$x10,$key_ # off-load round[2]
  2800. addi $key_,$key_,0x20
  2801. bdnz Load_xts_dec_key
  2802. lvx v26,$x10,$key1
  2803. ?vperm v24,v30,v31,$keyperm
  2804. lvx v27,$x20,$key1
  2805. stvx v24,$x00,$key_ # off-load round[3]
  2806. ?vperm v25,v31,v26,$keyperm
  2807. lvx v28,$x30,$key1
  2808. stvx v25,$x10,$key_ # off-load round[4]
  2809. addi $key_,$sp,$FRAME+15 # rewind $key_
  2810. ?vperm v26,v26,v27,$keyperm
  2811. lvx v29,$x40,$key1
  2812. ?vperm v27,v27,v28,$keyperm
  2813. lvx v30,$x50,$key1
  2814. ?vperm v28,v28,v29,$keyperm
  2815. lvx v31,$x60,$key1
  2816. ?vperm v29,v29,v30,$keyperm
  2817. lvx $twk5,$x70,$key1 # borrow $twk5
  2818. ?vperm v30,v30,v31,$keyperm
  2819. lvx v24,$x00,$key_ # pre-load round[1]
  2820. ?vperm v31,v31,$twk5,$keyperm
  2821. lvx v25,$x10,$key_ # pre-load round[2]
  2822. vperm $in0,$inout,$inptail,$inpperm
  2823. subi $inp,$inp,31 # undo "caller"
  2824. vxor $twk0,$tweak,$rndkey0
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	vand		$tmp,$tmp,$eighty7
	vxor		$out0,$in0,$twk0
	vxor		$tweak,$tweak,$tmp

	lvx_u		$in1,$x10,$inp
	vxor		$twk1,$tweak,$rndkey0
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	le?vperm	$in1,$in1,$in1,$leperm
	vand		$tmp,$tmp,$eighty7
	vxor		$out1,$in1,$twk1
	vxor		$tweak,$tweak,$tmp

	lvx_u		$in2,$x20,$inp
	andi.		$taillen,$len,15
	vxor		$twk2,$tweak,$rndkey0
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	le?vperm	$in2,$in2,$in2,$leperm
	vand		$tmp,$tmp,$eighty7
	vxor		$out2,$in2,$twk2
	vxor		$tweak,$tweak,$tmp

	lvx_u		$in3,$x30,$inp
	sub		$len,$len,$taillen
	vxor		$twk3,$tweak,$rndkey0
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	le?vperm	$in3,$in3,$in3,$leperm
	vand		$tmp,$tmp,$eighty7
	vxor		$out3,$in3,$twk3
	vxor		$tweak,$tweak,$tmp

	lvx_u		$in4,$x40,$inp
	subi		$len,$len,0x60
	vxor		$twk4,$tweak,$rndkey0
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	le?vperm	$in4,$in4,$in4,$leperm
	vand		$tmp,$tmp,$eighty7
	vxor		$out4,$in4,$twk4
	vxor		$tweak,$tweak,$tmp

	lvx_u		$in5,$x50,$inp
	addi		$inp,$inp,0x60
	vxor		$twk5,$tweak,$rndkey0
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	le?vperm	$in5,$in5,$in5,$leperm
	vand		$tmp,$tmp,$eighty7
	vxor		$out5,$in5,$twk5
	vxor		$tweak,$tweak,$tmp

	vxor		v31,v31,$rndkey0
	mtctr		$rounds
	b		Loop_xts_dec6x
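# Inner rounds run from the stack copy, two round keys (v24/v25) per
# iteration across all six lanes; the final rounds and vncipherlast are
# peeled below and interleaved with tweak and input work.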
.align	5
Loop_xts_dec6x:
	vncipher	$out0,$out0,v24
	vncipher	$out1,$out1,v24
	vncipher	$out2,$out2,v24
	vncipher	$out3,$out3,v24
	vncipher	$out4,$out4,v24
	vncipher	$out5,$out5,v24
	lvx		v24,$x20,$key_		# round[3]
	addi		$key_,$key_,0x20

	vncipher	$out0,$out0,v25
	vncipher	$out1,$out1,v25
	vncipher	$out2,$out2,v25
	vncipher	$out3,$out3,v25
	vncipher	$out4,$out4,v25
	vncipher	$out5,$out5,v25
	lvx		v25,$x10,$key_		# round[4]
	bdnz		Loop_xts_dec6x
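# Peeled final rounds, interleaved with next-tweak generation and the
# next input loads.  subic/subfe compute a branch-free mask: if fewer
# than 96 bytes remain, r0 rewinds $inp so the loads stay in bounds.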
	subic		$len,$len,96		# $len-=96
	vxor		$in0,$twk0,v31		# xor with last round key
	vncipher	$out0,$out0,v24
	vncipher	$out1,$out1,v24
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vxor		$twk0,$tweak,$rndkey0
	vaddubm		$tweak,$tweak,$tweak
	vncipher	$out2,$out2,v24
	vncipher	$out3,$out3,v24
	vsldoi		$tmp,$tmp,$tmp,15
	vncipher	$out4,$out4,v24
	vncipher	$out5,$out5,v24

	subfe.		r0,r0,r0		# borrow?-1:0
	vand		$tmp,$tmp,$eighty7
	vncipher	$out0,$out0,v25
	vncipher	$out1,$out1,v25
	vxor		$tweak,$tweak,$tmp
	vncipher	$out2,$out2,v25
	vncipher	$out3,$out3,v25
	vxor		$in1,$twk1,v31
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vxor		$twk1,$tweak,$rndkey0
	vncipher	$out4,$out4,v25
	vncipher	$out5,$out5,v25

	and		r0,r0,$len
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	vncipher	$out0,$out0,v26
	vncipher	$out1,$out1,v26
	vand		$tmp,$tmp,$eighty7
	vncipher	$out2,$out2,v26
	vncipher	$out3,$out3,v26
	vxor		$tweak,$tweak,$tmp
	vncipher	$out4,$out4,v26
	vncipher	$out5,$out5,v26
	add		$inp,$inp,r0		# $inp is adjusted so that at
						# exit from the loop inX-in5
						# hold the last input blocks
	vxor		$in2,$twk2,v31
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vxor		$twk2,$tweak,$rndkey0
	vaddubm		$tweak,$tweak,$tweak
	vncipher	$out0,$out0,v27
	vncipher	$out1,$out1,v27
	vsldoi		$tmp,$tmp,$tmp,15
	vncipher	$out2,$out2,v27
	vncipher	$out3,$out3,v27
	vand		$tmp,$tmp,$eighty7
	vncipher	$out4,$out4,v27
	vncipher	$out5,$out5,v27

	addi		$key_,$sp,$FRAME+15	# rewind $key_
	vxor		$tweak,$tweak,$tmp
	vncipher	$out0,$out0,v28
	vncipher	$out1,$out1,v28
	vxor		$in3,$twk3,v31
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vxor		$twk3,$tweak,$rndkey0
	vncipher	$out2,$out2,v28
	vncipher	$out3,$out3,v28
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	vncipher	$out4,$out4,v28
	vncipher	$out5,$out5,v28
	lvx		v24,$x00,$key_		# re-pre-load round[1]
	vand		$tmp,$tmp,$eighty7

	vncipher	$out0,$out0,v29
	vncipher	$out1,$out1,v29
	vxor		$tweak,$tweak,$tmp
	vncipher	$out2,$out2,v29
	vncipher	$out3,$out3,v29
	vxor		$in4,$twk4,v31
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vxor		$twk4,$tweak,$rndkey0
	vncipher	$out4,$out4,v29
	vncipher	$out5,$out5,v29
	lvx		v25,$x10,$key_		# re-pre-load round[2]
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15

	vncipher	$out0,$out0,v30
	vncipher	$out1,$out1,v30
	vand		$tmp,$tmp,$eighty7
	vncipher	$out2,$out2,v30
	vncipher	$out3,$out3,v30
	vxor		$tweak,$tweak,$tmp
	vncipher	$out4,$out4,v30
	vncipher	$out5,$out5,v30
	vxor		$in5,$twk5,v31
	vsrab		$tmp,$tweak,$seven	# next tweak value
	vxor		$twk5,$tweak,$rndkey0

	vncipherlast	$out0,$out0,$in0
	lvx_u		$in0,$x00,$inp		# load next input block
	vaddubm		$tweak,$tweak,$tweak
	vsldoi		$tmp,$tmp,$tmp,15
	vncipherlast	$out1,$out1,$in1
	lvx_u		$in1,$x10,$inp
	vncipherlast	$out2,$out2,$in2
	le?vperm	$in0,$in0,$in0,$leperm
	lvx_u		$in2,$x20,$inp
	vand		$tmp,$tmp,$eighty7
	vncipherlast	$out3,$out3,$in3
	le?vperm	$in1,$in1,$in1,$leperm
	lvx_u		$in3,$x30,$inp
	vncipherlast	$out4,$out4,$in4
	le?vperm	$in2,$in2,$in2,$leperm
	lvx_u		$in4,$x40,$inp
	vxor		$tweak,$tweak,$tmp
	vncipherlast	$out5,$out5,$in5
	le?vperm	$in3,$in3,$in3,$leperm
	lvx_u		$in5,$x50,$inp
	addi		$inp,$inp,0x60
	le?vperm	$in4,$in4,$in4,$leperm
	le?vperm	$in5,$in5,$in5,$leperm

	le?vperm	$out0,$out0,$out0,$leperm
	le?vperm	$out1,$out1,$out1,$leperm
	stvx_u		$out0,$x00,$out		# store output
	vxor		$out0,$in0,$twk0
	le?vperm	$out2,$out2,$out2,$leperm
	stvx_u		$out1,$x10,$out
	vxor		$out1,$in1,$twk1
	le?vperm	$out3,$out3,$out3,$leperm
	stvx_u		$out2,$x20,$out
	vxor		$out2,$in2,$twk2
	le?vperm	$out4,$out4,$out4,$leperm
	stvx_u		$out3,$x30,$out
	vxor		$out3,$in3,$twk3
	le?vperm	$out5,$out5,$out5,$leperm
	stvx_u		$out4,$x40,$out
	vxor		$out4,$in4,$twk4
	stvx_u		$out5,$x50,$out
	vxor		$out5,$in5,$twk5
	addi		$out,$out,0x60

	mtctr		$rounds
	beq		Loop_xts_dec6x		# did $len-=96 borrow?
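# Fewer than six whole blocks remain: restore $len and dispatch on how
# many are still queued (five down to one, or none but a partial tail).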
	addic.		$len,$len,0x60
	beq		Lxts_dec6x_zero
	cmpwi		$len,0x20
	blt		Lxts_dec6x_one
	nop
	beq		Lxts_dec6x_two
	cmpwi		$len,0x40
	blt		Lxts_dec6x_three
	nop
	beq		Lxts_dec6x_four

Lxts_dec6x_five:
	vxor		$out0,$in1,$twk0
	vxor		$out1,$in2,$twk1
	vxor		$out2,$in3,$twk2
	vxor		$out3,$in4,$twk3
	vxor		$out4,$in5,$twk4

	bl		_aesp8_xts_dec5x

	le?vperm	$out0,$out0,$out0,$leperm
	vmr		$twk0,$twk5		# unused tweak
	vxor		$twk1,$tweak,$rndkey0
	le?vperm	$out1,$out1,$out1,$leperm
	stvx_u		$out0,$x00,$out		# store output
	vxor		$out0,$in0,$twk1
	le?vperm	$out2,$out2,$out2,$leperm
	stvx_u		$out1,$x10,$out
	le?vperm	$out3,$out3,$out3,$leperm
	stvx_u		$out2,$x20,$out
	le?vperm	$out4,$out4,$out4,$leperm
	stvx_u		$out3,$x30,$out
	stvx_u		$out4,$x40,$out
	addi		$out,$out,0x50
	bne		Lxts_dec6x_steal
	b		Lxts_dec6x_done

.align	4
Lxts_dec6x_four:
	vxor		$out0,$in2,$twk0
	vxor		$out1,$in3,$twk1
	vxor		$out2,$in4,$twk2
	vxor		$out3,$in5,$twk3
	vxor		$out4,$out4,$out4

	bl		_aesp8_xts_dec5x

	le?vperm	$out0,$out0,$out0,$leperm
	vmr		$twk0,$twk4		# unused tweak
	vmr		$twk1,$twk5
	le?vperm	$out1,$out1,$out1,$leperm
	stvx_u		$out0,$x00,$out		# store output
	vxor		$out0,$in0,$twk5
	le?vperm	$out2,$out2,$out2,$leperm
	stvx_u		$out1,$x10,$out
	le?vperm	$out3,$out3,$out3,$leperm
	stvx_u		$out2,$x20,$out
	stvx_u		$out3,$x30,$out
	addi		$out,$out,0x40
	bne		Lxts_dec6x_steal
	b		Lxts_dec6x_done

.align	4
Lxts_dec6x_three:
	vxor		$out0,$in3,$twk0
	vxor		$out1,$in4,$twk1
	vxor		$out2,$in5,$twk2
	vxor		$out3,$out3,$out3
	vxor		$out4,$out4,$out4

	bl		_aesp8_xts_dec5x

	le?vperm	$out0,$out0,$out0,$leperm
	vmr		$twk0,$twk3		# unused tweak
	vmr		$twk1,$twk4
	le?vperm	$out1,$out1,$out1,$leperm
	stvx_u		$out0,$x00,$out		# store output
	vxor		$out0,$in0,$twk4
	le?vperm	$out2,$out2,$out2,$leperm
	stvx_u		$out1,$x10,$out
	stvx_u		$out2,$x20,$out
	addi		$out,$out,0x30
	bne		Lxts_dec6x_steal
	b		Lxts_dec6x_done

.align	4
Lxts_dec6x_two:
	vxor		$out0,$in4,$twk0
	vxor		$out1,$in5,$twk1
	vxor		$out2,$out2,$out2
	vxor		$out3,$out3,$out3
	vxor		$out4,$out4,$out4

	bl		_aesp8_xts_dec5x

	le?vperm	$out0,$out0,$out0,$leperm
	vmr		$twk0,$twk2		# unused tweak
	vmr		$twk1,$twk3
	le?vperm	$out1,$out1,$out1,$leperm
	stvx_u		$out0,$x00,$out		# store output
	vxor		$out0,$in0,$twk3
	stvx_u		$out1,$x10,$out
	addi		$out,$out,0x20
	bne		Lxts_dec6x_steal
	b		Lxts_dec6x_done

.align	4
Lxts_dec6x_one:
	vxor		$out0,$in5,$twk0
	nop
Loop_xts_dec1x:
	vncipher	$out0,$out0,v24
	lvx		v24,$x20,$key_		# round[3]
	addi		$key_,$key_,0x20

	vncipher	$out0,$out0,v25
	lvx		v25,$x10,$key_		# round[4]
	bdnz		Loop_xts_dec1x

	subi		r0,$taillen,1
	vncipher	$out0,$out0,v24
	andi.		r0,r0,16
	cmpwi		$taillen,0
	vncipher	$out0,$out0,v25
	sub		$inp,$inp,r0
	vncipher	$out0,$out0,v26
	lvx_u		$in0,0,$inp
	vncipher	$out0,$out0,v27
	addi		$key_,$sp,$FRAME+15	# rewind $key_
	vncipher	$out0,$out0,v28
	lvx		v24,$x00,$key_		# re-pre-load round[1]
	vncipher	$out0,$out0,v29
	lvx		v25,$x10,$key_		# re-pre-load round[2]
	vxor		$twk0,$twk0,v31
	le?vperm	$in0,$in0,$in0,$leperm
	vncipher	$out0,$out0,v30
	mtctr		$rounds
	vncipherlast	$out0,$out0,$twk0

	vmr		$twk0,$twk1		# unused tweak
	vmr		$twk1,$twk2
	le?vperm	$out0,$out0,$out0,$leperm
	stvx_u		$out0,$x00,$out		# store output
	addi		$out,$out,0x10
	vxor		$out0,$in0,$twk2
	bne		Lxts_dec6x_steal
	b		Lxts_dec6x_done

.align	4
Lxts_dec6x_zero:
	cmpwi		$taillen,0
	beq		Lxts_dec6x_done

	lvx_u		$in0,0,$inp
	le?vperm	$in0,$in0,$in0,$leperm
	vxor		$out0,$in0,$twk1
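# Ciphertext stealing: the block covering the tail is decrypted under
# the later tweak; its leading bytes are then spliced with the $taillen
# trailing input bytes (vperm/vsel below) and the reassembled block is
# sent through Loop_xts_dec1x once more under the earlier tweak ($twk0).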
Lxts_dec6x_steal:
	vncipher	$out0,$out0,v24
	lvx		v24,$x20,$key_		# round[3]
	addi		$key_,$key_,0x20

	vncipher	$out0,$out0,v25
	lvx		v25,$x10,$key_		# round[4]
	bdnz		Lxts_dec6x_steal

	add		$inp,$inp,$taillen
	vncipher	$out0,$out0,v24
	cmpwi		$taillen,0
	vncipher	$out0,$out0,v25
	lvx_u		$in0,0,$inp
	vncipher	$out0,$out0,v26
	lvsr		$inpperm,0,$taillen	# $in5 is no more
	vncipher	$out0,$out0,v27
	addi		$key_,$sp,$FRAME+15	# rewind $key_
	vncipher	$out0,$out0,v28
	lvx		v24,$x00,$key_		# re-pre-load round[1]
	vncipher	$out0,$out0,v29
	lvx		v25,$x10,$key_		# re-pre-load round[2]
	vxor		$twk1,$twk1,v31
	le?vperm	$in0,$in0,$in0,$leperm
	vncipher	$out0,$out0,v30
	vperm		$in0,$in0,$in0,$inpperm
	vncipherlast	$tmp,$out0,$twk1

	le?vperm	$out0,$tmp,$tmp,$leperm
	le?stvx_u	$out0,0,$out
	be?stvx_u	$tmp,0,$out

	vxor		$out0,$out0,$out0
	vspltisb	$out1,-1
	vperm		$out0,$out0,$out1,$inpperm
	vsel		$out0,$in0,$tmp,$out0
	vxor		$out0,$out0,$twk0

	subi		r30,$out,1
	mtctr		$taillen
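# Move the first $taillen bytes of the block just stored at $out up to
# $out+16 (they become the final partial output block), one byte per
# bdnz iteration, before the full block is recomputed.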
Loop_xts_dec6x_steal:
	lbzu		r0,1(r30)
	stb		r0,16(r30)
	bdnz		Loop_xts_dec6x_steal

	li		$taillen,0
	mtctr		$rounds
	b		Loop_xts_dec1x		# one more time...

.align	4
Lxts_dec6x_done:
	${UCMP}i	$ivp,0
	beq		Lxts_dec6x_ret

	vxor		$tweak,$twk0,$rndkey0
	le?vperm	$tweak,$tweak,$tweak,$leperm
	stvx_u		$tweak,0,$ivp

Lxts_dec6x_ret:
	mtlr		r11
	li		r10,`$FRAME+15`
	li		r11,`$FRAME+31`
	stvx		$seven,r10,$sp		# wipe copies of round keys
	addi		r10,r10,32
	stvx		$seven,r11,$sp
	addi		r11,r11,32
	stvx		$seven,r10,$sp
	addi		r10,r10,32
	stvx		$seven,r11,$sp
	addi		r11,r11,32
	stvx		$seven,r10,$sp
	addi		r10,r10,32
	stvx		$seven,r11,$sp
	addi		r11,r11,32
	stvx		$seven,r10,$sp
	addi		r10,r10,32
	stvx		$seven,r11,$sp
	addi		r11,r11,32

	mtspr		256,$vrsave
	lvx		v20,r10,$sp		# ABI says so
	addi		r10,r10,32
	lvx		v21,r11,$sp
	addi		r11,r11,32
	lvx		v22,r10,$sp
	addi		r10,r10,32
	lvx		v23,r11,$sp
	addi		r11,r11,32
	lvx		v24,r10,$sp
	addi		r10,r10,32
	lvx		v25,r11,$sp
	addi		r11,r11,32
	lvx		v26,r10,$sp
	addi		r10,r10,32
	lvx		v27,r11,$sp
	addi		r11,r11,32
	lvx		v28,r10,$sp
	addi		r10,r10,32
	lvx		v29,r11,$sp
	addi		r11,r11,32
	lvx		v30,r10,$sp
	lvx		v31,r11,$sp
	$POP		r26,`$FRAME+21*16+0*$SIZE_T`($sp)
	$POP		r27,`$FRAME+21*16+1*$SIZE_T`($sp)
	$POP		r28,`$FRAME+21*16+2*$SIZE_T`($sp)
	$POP		r29,`$FRAME+21*16+3*$SIZE_T`($sp)
	$POP		r30,`$FRAME+21*16+4*$SIZE_T`($sp)
	$POP		r31,`$FRAME+21*16+5*$SIZE_T`($sp)
	addi		$sp,$sp,`$FRAME+21*16+6*$SIZE_T`
	blr
	.long	0
	.byte	0,12,0x04,1,0x80,6,6,0
	.long	0
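# _aesp8_xts_dec5x runs the shared round schedule for up to five queued
# blocks on behalf of the tail handlers above; it also refetches the
# tail input around the rounds and resets ctr before returning.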
.align	5
_aesp8_xts_dec5x:
	vncipher	$out0,$out0,v24
	vncipher	$out1,$out1,v24
	vncipher	$out2,$out2,v24
	vncipher	$out3,$out3,v24
	vncipher	$out4,$out4,v24
	lvx		v24,$x20,$key_		# round[3]
	addi		$key_,$key_,0x20

	vncipher	$out0,$out0,v25
	vncipher	$out1,$out1,v25
	vncipher	$out2,$out2,v25
	vncipher	$out3,$out3,v25
	vncipher	$out4,$out4,v25
	lvx		v25,$x10,$key_		# round[4]
	bdnz		_aesp8_xts_dec5x

	subi		r0,$taillen,1
	vncipher	$out0,$out0,v24
	vncipher	$out1,$out1,v24
	vncipher	$out2,$out2,v24
	vncipher	$out3,$out3,v24
	vncipher	$out4,$out4,v24
	andi.		r0,r0,16
	cmpwi		$taillen,0
	vncipher	$out0,$out0,v25
	vncipher	$out1,$out1,v25
	vncipher	$out2,$out2,v25
	vncipher	$out3,$out3,v25
	vncipher	$out4,$out4,v25
	vxor		$twk0,$twk0,v31
	sub		$inp,$inp,r0
	vncipher	$out0,$out0,v26
	vncipher	$out1,$out1,v26
	vncipher	$out2,$out2,v26
	vncipher	$out3,$out3,v26
	vncipher	$out4,$out4,v26
	vxor		$in1,$twk1,v31
	vncipher	$out0,$out0,v27
	lvx_u		$in0,0,$inp
	vncipher	$out1,$out1,v27
	vncipher	$out2,$out2,v27
	vncipher	$out3,$out3,v27
	vncipher	$out4,$out4,v27
	vxor		$in2,$twk2,v31
	addi		$key_,$sp,$FRAME+15	# rewind $key_
	vncipher	$out0,$out0,v28
	vncipher	$out1,$out1,v28
	vncipher	$out2,$out2,v28
	vncipher	$out3,$out3,v28
	vncipher	$out4,$out4,v28
	lvx		v24,$x00,$key_		# re-pre-load round[1]
	vxor		$in3,$twk3,v31
	vncipher	$out0,$out0,v29
	le?vperm	$in0,$in0,$in0,$leperm
	vncipher	$out1,$out1,v29
	vncipher	$out2,$out2,v29
	vncipher	$out3,$out3,v29
	vncipher	$out4,$out4,v29
	lvx		v25,$x10,$key_		# re-pre-load round[2]
	vxor		$in4,$twk4,v31
	vncipher	$out0,$out0,v30
	vncipher	$out1,$out1,v30
	vncipher	$out2,$out2,v30
	vncipher	$out3,$out3,v30
	vncipher	$out4,$out4,v30
	vncipherlast	$out0,$out0,$twk0
	vncipherlast	$out1,$out1,$in1
	vncipherlast	$out2,$out2,$in2
	vncipherlast	$out3,$out3,$in3
	vncipherlast	$out4,$out4,$in4
	mtctr		$rounds
	blr
	.long	0
	.byte	0,12,0x14,0,0,0,0,0
___
}}	}}}
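# Post-process $code before printing: evaluate `...` arithmetic, emit
# the constants table as endianness-corrected .byte lines, and rewrite
# '?'-prefixed (endian-sensitive) instructions and le?/be? markers for
# the target flavour; e.g. on little-endian "?lvsr" becomes "lvsl" and
# the middle operands of "?vperm" are swapped.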
my $consts=1;
foreach(split("\n",$code)) {
	s/\`([^\`]*)\`/eval($1)/geo;

	# constants table endian-specific conversion
	if ($consts && m/\.(long|byte)\s+(.+)\s+(\?[a-z]*)$/o) {
	    my $conv=$3;
	    my @bytes=();

	    # convert to endian-agnostic format
	    if ($1 eq "long") {
		foreach (split(/,\s*/,$2)) {
		    my $l = /^0/?oct:int;
		    push @bytes,($l>>24)&0xff,($l>>16)&0xff,($l>>8)&0xff,$l&0xff;
		}
	    } else {
		@bytes = map(/^0/?oct:int,split(/,\s*/,$2));
	    }

	    # little-endian conversion
	    if ($flavour =~ /le$/o) {
		SWITCH: for($conv) {
		    /\?inv/ && do { @bytes=map($_^0xf,@bytes); last; };
		    /\?rev/ && do { @bytes=reverse(@bytes);    last; };
		}
	    }

	    # emit
	    print ".byte\t",join(',',map (sprintf("0x%02x",$_),@bytes)),"\n";
	    next;
	}
	$consts=0 if (m/Lconsts:/o);	# end of table

	# instructions prefixed with '?' are endian-specific and need
	# to be adjusted accordingly...
	if ($flavour =~ /le$/o) {	# little-endian
	    s/le\?//o		or
	    s/be\?/#be#/o	or
	    s/\?lvsr/lvsl/o	or
	    s/\?lvsl/lvsr/o	or
	    s/\?(vperm\s+v[0-9]+,\s*)(v[0-9]+,\s*)(v[0-9]+,\s*)(v[0-9]+)/$1$3$2$4/o or
	    s/\?(vsldoi\s+v[0-9]+,\s*)(v[0-9]+,)\s*(v[0-9]+,\s*)([0-9]+)/$1$3$2 16-$4/o or
	    s/\?(vspltw\s+v[0-9]+,\s*)(v[0-9]+,)\s*([0-9])/$1$2 3-$3/o;
	} else {			# big-endian
	    s/le\?/#le#/o	or
	    s/be\?//o		or
	    s/\?([a-z]+)/$1/o;
	}

	print $_,"\n";
}

close STDOUT;