#include "asm-common.h"

	.arch	armv8-a+crypto

	.extern	F(abort)
	.extern	F(rijndael_rcon)

	.text

///--------------------------------------------------------------------------
/// Main code.

/// The ARM crypto extension implements a little-endian version of AES
/// (though the manual doesn't actually spell this out and you have to
/// experiment); note that the internal interface presents as big-endian,
/// so as to work better with things like GCM.  We therefore maintain the
/// round keys in little-endian form, and have to end-swap blocks in and
/// out.
///
/// For added amusement, the crypto extension doesn't implement the
/// larger-block versions of Rijndael, so we have to end-swap the keys if
/// we're preparing for one of those.

	// Useful constants.
	.equ	maxrounds, 16		// maximum number of rounds
	.equ	maxblksz, 32		// maximum block size, in bytes
	.equ	kbufsz, maxblksz*(maxrounds + 1) // size of key-sched buffer

	// Context structure.
	.equ	nr, 0			// number of rounds
	.equ	w, nr + 4		// encryption key words
	.equ	wi, w + kbufsz		// decryption key words
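
	// (For reference, these offsets correspond to a context layout
	// along the lines of the following C sketch -- illustrative only,
	// not necessarily the exact declaration used elsewhere in the
	// library:
	//
	//	struct rijndael_ctx {
	//		uint32_t nr;		// number of rounds
	//		uint32_t w[kbufsz/4];	// encryption key words
	//		uint32_t wi[kbufsz/4];	// decryption key words
	//	};
	//
	// so `nr', `w' and `wi' live at offsets 0, 4, and 4 + kbufsz.)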

///--------------------------------------------------------------------------
/// Key setup.

FUNC(rijndael_setup_arm64_crypto)

	// Arguments:
	//	x0 = pointer to context
	//	w1 = block size in 32-bit words
	//	x2 = pointer to key material
	//	x3 = key size in words

	pushreg	x29, x30
	mov	x29, sp

	// The initial round key material is taken directly from the input
	// key, so copy it over.  Unfortunately, the key material is not
	// guaranteed to be aligned in any especially useful way.  Assume
	// that alignment traps are not enabled.  (Why would they be?  On
	// A32, alignment traps were part of a transition plan which changed
	// the way unaligned loads and stores behaved, but there's never
	// been any other behaviour on A64.)
	mov	x15, x3
	add	x4, x0, #w
0:	sub	x15, x15, #1
	ldr	w14, [x2], #4
	str	w14, [x4], #4
	cbnz	x15, 0b
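
	// (In C terms this is just `for (i = 0; i < keywords; i++)
	// ctx->w[i] = key[i];', done a word at a time because the key
	// pointer may be unaligned; on a little-endian system the plain
	// LDR yields each word in exactly the form the little-endian key
	// schedule wants.)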

	// Find out other useful things and prepare for the main loop.
9:	ldr	w9, [x0, #nr]		// number of rounds
	madd	w2, w1, w9, w1		// total key size in words
	leaext	x5, rijndael_rcon	// round constants
	sub	x6, x2, x3		// minus what we've copied already
	add	x7, x0, #w		// position in previous cycle
	movi	v1.4s, #0		// all-zero register for the key
	mov	x8, #0			// position in current cycle

	// Main key expansion loop.  Dispatch according to the position in
	// the cycle.
0:	ldr	w15, [x7], #4		// word from previous cycle
	cbz	x8, 1f			// first word of the cycle?
	cmp	x8, #4			// fourth word of the cycle?
	b.ne	2f
	cmp	x3, #7			// seven or eight words of key?
	b.cc	2f

	// Fourth word of the cycle, seven or eight words of key.  We must
	// do the byte substitution.
	dup	v0.4s, w14
	aese	v0.16b, v1.16b		// effectively, just SubBytes
	mov	w14, v0.s[0]
	b	2f
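
	// (AESE computes ShiftRows(SubBytes(state EOR key)).  With the
	// all-zero key in v1, and all four columns of the state equal
	// thanks to the DUP, ShiftRows merely permutes identical bytes
	// between identical columns, so lane 0 comes out holding exactly
	// SubBytes applied to w14.)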

	// First word of the cycle.  Byte substitution, rotation, and round
	// constant.
1:	ldrb	w13, [x5], #1		// next round constant
	dup	v0.4s, w14
	aese	v0.16b, v1.16b		// effectively, just SubBytes
	mov	w14, v0.s[0]
	eor	w14, w13, w14, ror #8

	// Common ending: mix in the word from the previous cycle and store.
2:	eor	w14, w14, w15
	str	w14, [x4], #4

	// Prepare for the next iteration.  If we're done, then stop; if
	// we've finished a cycle then reset the counter.
	add	x8, x8, #1
	sub	x6, x6, #1
	cmp	x8, x3
	cbz	x6, 9f
	cmov.cs	x8, xzr
	b	0b
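
	// (The loop above is the standard Rijndael key expansion.  As a
	// rough C sketch, writing `nk' for the key length in words (x3),
	// `total' for the full schedule length (w2), and a hypothetical
	// `subw' for the AESE-based SubBytes trick:
	//
	//	for (i = nk; i < total; i++) {
	//		t = w[i - 1];
	//		if (i%nk == 0) t = subw(rotr32(t, 8)) ^ *rcon++;
	//		else if (nk > 6 && i%nk == 4) t = subw(t);
	//		w[i] = w[i - nk] ^ t;
	//	}
	//
	// Substituting before or after the rotation makes no difference,
	// since SubBytes acts bytewise.)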

	// Next job is to construct the decryption keys.  The keys for the
	// first and last rounds don't need to be mangled, but the remaining
	// ones do -- and they all need to be reordered too.
	//
	// The plan of action, then, is to copy the final encryption round's
	// keys into place first, then to do each of the intermediate rounds
	// in reverse order, and finally do the first round.
	//
	// Do all the heavy lifting with the vector registers.  The order
	// we're doing this in means that it's OK if we read or write too
	// much, and there's easily enough buffer space for the
	// over-enthusiastic reads and writes because the context has space
	// for 32-byte blocks, which is our maximum and an exact fit for two
	// full-width registers.
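	//
	// (This is the usual `equivalent inverse cipher' arrangement: AESD
	// wants the middle rounds' keys run through InvMixColumns, which is
	// precisely what AESIMC computes, so decryption can then reuse the
	// same round structure as encryption; see `encdec' below.)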

9:	add	x5, x0, #wi
	add	x4, x0, #w
	add	x4, x4, w2, uxtw #2
	sub	x4, x4, w1, uxtw #2	// last round's keys

	// Copy the last encryption round's keys.
	ld1	{v0.4s, v1.4s}, [x4]
	st1	{v0.4s, v1.4s}, [x5]

	// Update the loop variables and stop if we've finished.
0:	sub	w9, w9, #1
	add	x5, x5, w1, uxtw #2
	sub	x4, x4, w1, uxtw #2
	cbz	w9, 9f

	// Do another middle round's keys...
	ld1	{v0.4s, v1.4s}, [x4]
	aesimc	v0.16b, v0.16b
	aesimc	v1.16b, v1.16b
	st1	{v0.4s, v1.4s}, [x5]
	b	0b

	// Finally do the first encryption round.
9:	ld1	{v0.4s, v1.4s}, [x4]
	st1	{v0.4s, v1.4s}, [x5]

	// If the block size is not exactly four words then we must
	// end-swap everything.  We can use fancy vector toys for this.
	cmp	w1, #4
	b.eq	9f

	// End-swap the encryption keys.
	add	x1, x0, #w
	bl	endswap_block

	// And the decryption keys.
	add	x1, x0, #wi
	bl	endswap_block

	// All done.
9:	popreg	x29, x30
	ret
ENDFUNC

INTFUNC(endswap_block)
	// End-swap w2 words starting at x1.  x1 is clobbered; w2 is not.
	// It's OK to work in 16-byte chunks.
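	//
	// (REV32 below reverses the bytes within each 32-bit element of
	// the vector, so each iteration is in effect four bswap32
	// operations done at once.)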
	mov	w3, w2
0:	subs	w3, w3, #4
	ld1	{v0.4s}, [x1]
	rev32	v0.16b, v0.16b
	st1	{v0.4s}, [x1], #16
	b.hi	0b
	ret
ENDFUNC

///--------------------------------------------------------------------------
/// Encrypting and decrypting blocks.

.macro	encdec	op, aes, mc, koff
FUNC(rijndael_\op\()_arm64_crypto)

	// Arguments:
	//	x0 = pointer to context
	//	x1 = pointer to input block
	//	x2 = pointer to output block

	// Set things up ready.
	ldr	w3, [x0, #nr]
	add	x0, x0, #\koff
	ld1	{v0.4s}, [x1]
	rev32	v0.16b, v0.16b

	// Check the number of rounds and dispatch.
	cmp	w3, #14
	b.eq	14f
	cmp	w3, #10
	b.eq	10f
	cmp	w3, #12
	b.eq	12f
	cmp	w3, #13
	b.eq	13f
	cmp	w3, #11
	b.eq	11f
	callext	F(abort)
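
	// (Rijndael uses max(Nk, Nb) + 6 rounds, and both the key and the
	// block are between four and eight 32-bit words, so a valid round
	// count always lies between 10 and 14; anything else means the
	// context is corrupt, hence the abort.)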

	// Eleven rounds.
11:	ld1	{v16.4s}, [x0], #16
	\aes	v0.16b, v16.16b
	\mc	v0.16b, v0.16b
	b	10f

	// Twelve rounds.
12:	ld1	{v16.4s, v17.4s}, [x0], #32
	\aes	v0.16b, v16.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v17.16b
	\mc	v0.16b, v0.16b
	b	10f

	// Thirteen rounds.
13:	ld1	{v16.4s-v18.4s}, [x0], #48
	\aes	v0.16b, v16.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v17.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v18.16b
	\mc	v0.16b, v0.16b
	b	10f

	// Fourteen rounds.  (Drops through to the ten-round case because
	// this is the next most common.)
14:	ld1	{v16.4s-v19.4s}, [x0], #64
	\aes	v0.16b, v16.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v17.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v18.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v19.16b
	\mc	v0.16b, v0.16b
	// Drop through...

	// Ten rounds.
10:	ld1	{v16.4s-v19.4s}, [x0], #64
	ld1	{v20.4s-v23.4s}, [x0], #64
	\aes	v0.16b, v16.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v17.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v18.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v19.16b
	\mc	v0.16b, v0.16b
	ld1	{v16.4s-v18.4s}, [x0], #48
	\aes	v0.16b, v20.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v21.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v22.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v23.16b
	\mc	v0.16b, v0.16b

	// Final round has no MixColumns, but is followed by final
	// whitening.
	\aes	v0.16b, v16.16b
	\mc	v0.16b, v0.16b
	\aes	v0.16b, v17.16b
	eor	v0.16b, v0.16b, v18.16b

	// All done.
	rev32	v0.16b, v0.16b
	st1	{v0.4s}, [x2]
	ret
ENDFUNC
.endm
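
	// (Keeping each AESE/AESD immediately followed by the matching
	// AESMC/AESIMC, with the same destination register, is deliberate:
	// many AArch64 implementations fuse such adjacent pairs into a
	// single operation, so the interleaving above is the fast path.)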

encdec	eblk, aese, aesmc, w
encdec	dblk, aesd, aesimc, wi
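
	// `eblk' encrypts, using AESE/AESMC and the forward key schedule
	// at offset `w'; `dblk' decrypts, using AESD/AESIMC and the
	// inverse schedule prepared above at offset `wi'.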

///----- That's all, folks --------------------------------------------------