Loads data into 128-bit XMM registers and performs a conditional swap. This is probably less useful for SIDH, but will be useful for CSIDH.
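For illustration, a conditional swap over XMM registers can be sketched with SSE2 intrinsics. This is a minimal sketch, not the code this change adds. It assumes the operands are arrays of 64-bit words with an even word count, that the swap flag is 0 or 1, and that the swap uses the common mask-and-XOR technique. The helper name cswap_xmm is hypothetical.

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_loadu_si128, _mm_xor_si128, ... */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (assumption, not the project's API):
 * if swap == 1, exchange a[0..nwords) and b[0..nwords);
 * if swap == 0, leave both arrays unchanged.
 * nwords must be even, since one 128-bit XMM register holds two 64-bit words.
 * The control flow does not branch on swap; the selection is done with a mask. */
static void cswap_xmm(uint64_t *a, uint64_t *b, size_t nwords, uint64_t swap)
{
    /* Mask is all ones (-1) when swap == 1 and all zeros when swap == 0. */
    const __m128i mask = _mm_set1_epi64x(-(long long)(swap & 1));

    for (size_t i = 0; i < nwords; i += 2) {
        /* Load two 64-bit words from each operand into XMM registers. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));

        /* t = (a ^ b) & mask: the XOR difference when swapping, zero otherwise. */
        __m128i t = _mm_and_si128(_mm_xor_si128(va, vb), mask);

        /* a ^ t and b ^ t yield the swapped values, or the originals if t == 0. */
        _mm_storeu_si128((__m128i *)(a + i), _mm_xor_si128(va, t));
        _mm_storeu_si128((__m128i *)(b + i), _mm_xor_si128(vb, t));
    }
}
```

Calling cswap_xmm(a, b, n, 1) exchanges the two arrays, while cswap_xmm(a, b, n, 0) performs the same loads, XORs and stores but leaves the contents as they were. Unaligned loads and stores are used here so the sketch does not depend on 16-byte alignment of the inputs.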