This is relevant because signing speed differs enormously between the regular XMSS core and the BDS-traversal-based one.
The previous code allocated an array of mlen bytes on the stack, but it should also be possible to sign messages large enough to need heap space. Relying on the fact that sm and m are large enough to hold message + signature, we now move the message so that the 4*n bytes of prefix can be added in front of it.
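
To illustrate the idea of moving the message inside the caller's buffer instead of copying it to a stack array, here is a minimal sketch. The helper name place_message_with_prefix and its parameters (buf, sig_bytes, prefix) are hypothetical and not taken from the actual code; it only assumes that the buffer holds sig_bytes + mlen bytes and that the signature area is at least 4*n bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * buf must have room for sig_bytes + mlen bytes, with the message
 * initially stored at buf[0 .. mlen). The message is moved to
 * buf + sig_bytes, and a prefix of 4*n bytes is written directly in
 * front of it, inside the area that will later hold the signature.
 *
 * Returns a pointer to the start of prefix || message, or NULL if the
 * signature area is too small to hold the prefix.
 */
unsigned char *place_message_with_prefix(unsigned char *buf, size_t sig_bytes,
                                         size_t mlen,
                                         const unsigned char *prefix, size_t n)
{
    size_t prefix_len = 4 * n;
    unsigned char *start;

    assert(buf != NULL && prefix != NULL);

    if (sig_bytes < prefix_len) {
        return NULL; /* no room for the prefix in front of the message */
    }

    /* Source and destination may overlap, so memmove rather than memcpy. */
    memmove(buf + sig_bytes, buf, mlen);

    /* The prefix ends exactly where the moved message begins. */
    start = buf + sig_bytes - prefix_len;
    memcpy(start, prefix, prefix_len);

    return start;
}
```

No allocation of mlen bytes is needed: the only memory touched is the buffer the caller already provided, so the message size is limited by that buffer rather than by the stack. In this sketch the prefix occupies space that belongs to the signature, so it would have to be consumed (hashed) before the signature bytes are written there.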