OpenVPN authentication hardened with ARM TrustZone

Published

January 12, 2021

The goal is to connect an embedded device to a VPN. The VPN authenticates clients with X.509 certificates, which means that the device needs to securely store a private key. The question is how to protect that key from being copied. Many ideas have been explored already; in this particular case, I’ll describe a solution which uses a secure enclave. The project itself is quite easy to implement and can serve as a hands-on intro to ARM TrustZone-based TEEs.

Earlier last year, I needed an implementation of a TLS server which stored private keys in a secure enclave, namely OP-TEE running in the Trusted Execution Environment (TEE) protected by ARM TrustZone. A similar idea will be used here, with the software stack integration being the main difference. The previous project integrated the solution with BoringSSL, which required changing the internals of the library. The preferred solution would not touch the internals of the TLS library, but rather work as a plugin to the existing framework. OpenSSL implements the ENGINE API, which can be (and actually is) used as a way to implement cryptographic backends.

Finally, this is what I want to end up with:

Flow for OpenVPN with private client key in TEE

The private key will be stored in a secure enclave. OpenVPN calls OpenSSL for cryptographic operations and operations related to TLS. At the init phase, OpenSSL will load my implementation of the ENGINE API, which I call OpTEE ENGINE. It implements a callback that is called by the TLS stack for message signing. Finally, the engine implementation forwards the signing request to OP-TEE, which is where the private key operation happens.

Security

But first, why do I think secure storage provides enough security?

The TEE that I’m using claims compliance with the GlobalPlatform API. Looking at the GP requirements in this specification (see 2.2.2), the secure storage must, at a minimum:

encrypt the data, obviously (providing confidentiality as well as integrity)
be bound to a device. This one is important: it means that sensitive data can be accessed only by applications running on that particular device and in that particular TEE (there may be multiple TEEs on the same device)
be able to hide sensitive key material from the TA itself (TA = Trusted Application, an application running in the TEE)
allow access to the data only by the TA which created it.

In my context, it means that the VPN private key is stored encrypted and can be used only by a single device. The secure storage can be copied to a different device, but as it is bound to a particular one, it can’t be decrypted there. The key can’t be accessed by a malicious TA installed on the same device, thanks to access separation. Finally, the TA that owns the key doesn’t have access to the sensitive key material, so a bug in the TA doesn’t leak the key. It may leak in case of a bug in the TEE itself, but then the whole system is probably already compromised.

The spec gives hope for a decent level of security. Looking at implementation details, the Key Manager is a component implemented in OP-TEE which ensures confidentiality and integrity of the data (see implementation details of secure storage). To provide device binding, it uses a Hardware Unique Key (HUK), which is defined as a globally unique, symmetric secret key stored in a piece of hardware of the device (often in the SoC itself). OP-TEE uses it to derive a key called the SSK (Secure Storage Key), which is then used to provide device binding. The SSK is created at boot time and kept in secure memory (it is never stored on disk):

SSK = HMAC-SHA256(HUK, Chip ID || “some data as salt”)

The SSK is then used to derive the TSK (Trusted Application Storage Key), which is unique per TA installed in the TEE. This makes it possible to restrict access to the data to the TA which owns it. Finally, there is the FEK (File Encryption Key), a randomly generated key used for file encryption.
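In the same notation as the SSK formula, the rest of the chain can be summarized as below. This is based on my reading of the OP-TEE secure storage documentation rather than on the demo code, and the cipher mode used to wrap the FEK differs between OP-TEE versions, so it is left abstract here (Encrypt is a placeholder, not a real function name).

```text
TSK        = HMAC-SHA256(SSK, TA_UUID)    one per Trusted Application
FEK        = random bytes                 one per file
stored FEK = Encrypt(TSK, FEK)            kept in the metadata of the file
```

The consequence is that a file copied to another device (different HUK, hence different SSK) or opened by another TA (different UUID, hence different TSK) cannot have its FEK unwrapped.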

An important part of this whole story, though just an implementation detail: OP-TEE, as published on GitHub, doesn’t actually try to use a real HUK. Retrieval of the HUK is specific to the SoC and needs to be implemented during integration with the concrete device/platform. Namely, there is a function called tee_otp_get_hw_unique_key, which must be filled with proper code for HUK retrieval. Similarly, the “chip ID” needed for secure storage is returned by tee_otp_get_die_id, which also needs to be filled with proper code. By default, OP-TEE uses a stream of 0 bytes as the HUK.

Finally, the secure storage is kept in the Normal World OS filesystem (/data/tee by default on Linux). This subsystem uses AES-128. My ultimate goal is to have a quantum-resistant TEE, and a 128-bit key is too small to resist quantum attacks (because of Grover’s algorithm), hence a migration to 256-bit symmetric keys is needed.
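The reasoning behind that key size can be made explicit. Grover’s algorithm searches an unstructured space of N candidates in roughly √N quantum evaluations, so for a k-bit key the brute-force cost drops as follows (a back-of-the-envelope estimate that ignores the overhead of evaluating the cipher on a quantum computer):

```latex
\[
  2^{k} \;\longrightarrow\; \sqrt{2^{k}} = 2^{k/2},
  \qquad
  k = 128:\ 2^{64},
  \qquad
  k = 256:\ 2^{128}.
\]
```

In other words, a 256-bit key under Grover lands at about the same nominal work factor that a 128-bit key offers against a classical attacker.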

TLS client authentication

X.509 certificates are used to authenticate a client to a VPN server. In this authentication method, a client sends a certificate and a proof of possession of the private key that corresponds to that certificate. In this case, the private key never leaves the TEE, hence the primary function of the application running in the TEE is to create that proof when requested.

Looking at the TLS level (TLSv1.3), client authentication starts with the server requesting it by sending a CertificateRequest message (section 4.3.2 of RFC 8446). In response, the client produces a proof by creating the following signature:

proof = sign(0x20 byte repeated 64 times || “TLS 1.3, client CertificateVerify” || 0 || transcript hash)

The client signs with the private key that corresponds to its X.509 certificate, using a signature scheme that matches the key type and that the server has advertised as acceptable. The signature is created over a concatenation of strings defined in the RFC (section 4.4.3) and the TLS transcript hash (section 4.4.1). Both the X.509 certificate and the proof are sent back to the server for verification.
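To make the layout of the signed content concrete, here is a minimal sketch in plain C. RFC 8446 section 4.4.3 specifies 64 bytes of 0x20, the context string, a single zero byte and the transcript hash. In the setup described in this post OpenSSL builds this buffer itself, so the function name build_cert_verify_input is made up and serves only as an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Context string for the client side, RFC 8446 section 4.4.3. */
static const char kClientCtx[] = "TLS 1.3, client CertificateVerify";

/*
 * Writes the to-be-signed content into out and returns its length,
 * or 0 if out is too small. hash is the transcript hash
 * (32 bytes when SHA-256 is the handshake hash).
 */
size_t build_cert_verify_input(const uint8_t *hash, size_t hash_len,
                               uint8_t *out, size_t out_cap) {
    const size_t ctx_len = sizeof(kClientCtx) - 1; /* without the NUL */
    const size_t total = 64 + ctx_len + 1 + hash_len;

    assert(hash != NULL && out != NULL);
    if (out_cap < total)
        return 0;

    memset(out, 0x20, 64);                          /* 64 space bytes */
    memcpy(out + 64, kClientCtx, ctx_len);          /* context string */
    out[64 + ctx_len] = 0x00;                       /* separator byte */
    memcpy(out + 64 + ctx_len + 1, hash, hash_len); /* transcript hash */
    return total;
}
```

With a SHA-256 transcript hash the buffer is 64 + 33 + 1 + 32 = 130 bytes long. The signature scheme then hashes and signs this buffer, and only that final private key operation has to happen inside the TEE.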

Secure world

The Trusted Application is mostly copied from the previous project. In its current state, it assumes that the key is loaded into the TEE at some initial point and is then used whenever the Normal World requests signing. An alternative implementation could generate the private key during the first boot and use it to create a CSR, which is then signed by the CA and returned to the device. It’s a more complicated process, but this way one can ensure that the client’s private key never existed anywhere but on the device.

The demo TA comes with a simple key management app which can be used to install keys on the device or remove them from it. It is also a good place to see how communication from Normal World to Secure World is implemented. Assuming the TEE is running on the device, and tee-supplicant with the Linux driver is loaded in the Normal World (see here for setup), an application can use the GlobalPlatform API to exchange requests and responses with the TEE. The code looks roughly like this:

    // TEE context
    TEEC_Context ctx;
    // Session with the TA
    TEEC_Session sess;
    // Operation context
    TEEC_Operation op;
    // ID of an app in the TEE
    TEEC_UUID uuid = TA_UUID;
    // Origin of an error, filled in by the calls below
    uint32_t err_origin;

    // Initialize a context connecting us to the TEE
    TEEC_InitializeContext(NULL, &ctx);
    // Open a session to the TA identified by uuid
    TEEC_OpenSession(&ctx, &sess, &uuid,
        TEEC_LOGIN_PUBLIC, NULL, NULL, &err_origin);

    // Initialize operation context 'op' (see github)
    // ...

    // Send command to the TA running in TEE
    TEEC_InvokeCommand(&sess, TA_INSTALL_KEYS, &op, &err_origin);

After opening a session with the TA by calling TEEC_OpenSession, the application sets up the op context by providing input arguments and setting buffers for the output. Then the call to TEEC_InvokeCommand triggers communication with the TEE. During this process, the signature of the TA is verified and the TA is started. Error handling is omitted above for brevity; each call returns a TEEC_Result that should be compared with TEEC_SUCCESS. The entry point to the TA is a function called TA_InvokeCommandEntryPoint.

TEE_Result TA_InvokeCommandEntryPoint(void __maybe_unused *sess_ctx,
            uint32_t cmd_id,
            uint32_t param_types, TEE_Param params[4]) {
    (void)&sess_ctx; /* Unused parameter */
    switch (cmd_id) {
    case TA_INSTALL_KEYS:
        return install_key(param_types, params);
    case TA_SIGN_ECC:
        return sign_ecdsa(param_types, params);
    case TA_GET_PUB_KEY:
        return get_public_key(param_types, params);
        ...
    }
}
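Each handler receives param_types next to the params array. That value packs the types of the four parameters into 4-bit fields, and a TA normally checks it before touching any buffer. The sketch below reimplements the packing with local constants so that it compiles without the TEE headers. The PT_* names, the helper names and the parameter layout of the signing command are assumptions for illustration and are not taken from the demo TA.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local copies of the GlobalPlatform parameter type codes. Real code
 * takes them from the TEE headers (TEEC_* on the client side,
 * TEE_PARAM_TYPE_* inside the TA).
 */
enum {
    PT_NONE          = 0x0,
    PT_VALUE_INPUT   = 0x1,
    PT_MEMREF_INPUT  = 0x5,
    PT_MEMREF_OUTPUT = 0x6
};

/* Packs four 4-bit type codes the way TEEC_PARAM_TYPES does. */
uint32_t pack_param_types(uint32_t p0, uint32_t p1,
                          uint32_t p2, uint32_t p3) {
    assert(p0 <= 0xF && p1 <= 0xF && p2 <= 0xF && p3 <= 0xF);
    return p0 | (p1 << 4) | (p2 << 8) | (p3 << 12);
}

/* Extracts the type of parameter idx (0..3). */
uint32_t param_type_get(uint32_t types, unsigned idx) {
    assert(idx < 4);
    return (types >> (idx * 4)) & 0xF;
}

/*
 * Hypothetical layout for a signing command: key name in, digest in,
 * signature out. Anything else is rejected up front.
 */
int sign_params_ok(uint32_t param_types) {
    const uint32_t expected = pack_param_types(
        PT_MEMREF_INPUT, PT_MEMREF_INPUT, PT_MEMREF_OUTPUT, PT_NONE);
    return param_types == expected;
}
```

Checking the packed value first means a handler never dereferences a memref that the Normal World actually passed as a plain value, which is one of the easier mistakes to make in a TA.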

The TA is instructed by cmd_id to run specific logic, like key installation, signing or returning the public key (the reason for the last one is described in the next section). When installing the key, the TA copies the private and public key attributes into a temporary transient_obj and then creates an object in persistent storage containing those attributes. The key is identified by the key_id received from the Normal World.

// Puts the key to the storage
static TEE_Result install_key(uint32_t param_types, TEE_Param params[4]) {
    //...
    TEE_ObjectHandle transient_obj = TEE_HANDLE_NULL;
    // ...
    TEE_AllocateTransientObject(TEE_TYPE_ECDSA_KEYPAIR,
            ecc->x.sz * 8, &transient_obj);
    ATTR_REF(cnt, TEE_ATTR_ECC_PRIVATE_VALUE, ecc->scalar);
    ATTR_REF(cnt, TEE_ATTR_ECC_PUBLIC_VALUE_X, ecc->x);
    ATTR_REF(cnt, TEE_ATTR_ECC_PUBLIC_VALUE_Y, ecc->y);
    TEE_InitValueAttribute(&attrs[cnt++], TEE_ATTR_ECC_CURVE, ecc->curve_id, 0);
    TEE_PopulateTransientObject(transient_obj, attrs, cnt);

    ret = TEE_CreatePersistentObject(
        TEE_STORAGE_PRIVATE,
        key_id, 32,
        TEE_DATA_FLAG_ACCESS_WRITE,
        transient_obj,
        NULL, 0, &persistant_obj);
    // ...
}

When signing, the TA initializes key_handle, the handle to the key, by calling TEE_OpenPersistentObject with the key_id. Then key_handle is attached to the operation identified by op (the TEE_SetOperationKey call) and finally used for signing (the TEE_AsymmetricSignDigest call). One should notice that the private key material stays in the TEE; it is never revealed to the TA.

// Performs ECDSA signing with a key from secure storage
static TEE_Result sign_ecdsa(uint32_t param_types, TEE_Param params[4]) {
    TEE_Result ret;
    TEE_OperationHandle op = TEE_HANDLE_NULL;
    TEE_ObjectHandle key_handle = TEE_HANDLE_NULL;
    // ... (param_types check and key_id setup omitted)

    // Open the stored key; the TA gets only a handle to it
    ret = TEE_OpenPersistentObject(
        TEE_STORAGE_PRIVATE,
        key_id, 32,
        TEE_DATA_FLAG_ACCESS_READ, &key_handle);
    // ... (error handling omitted)

    // Perform ECDSA signing of the digest passed in params[1];
    // the signature is written to the buffer in params[2]
    TEE_AllocateOperation(&op, TEE_ALG_ECDSA_P256, TEE_MODE_SIGN, 256);
    TEE_SetOperationKey(op, key_handle);
    ret = TEE_AsymmetricSignDigest(op, NULL, 0,
        params[1].memref.buffer, params[1].memref.size,
        params[2].memref.buffer, &params[2].memref.size);
    LOG_RET(ret);
}

The demo code (here) supports ECDSA/p256 only but can be easily extended to provide support for all the schemes used by TLS v1.3.
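One detail to keep in mind when the signature travels back to the Normal World: as far as I know, the GlobalPlatform ECDSA operation returns the signature as the raw concatenation r || s (64 bytes for P-256), while TLS carries ECDSA signatures in DER-encoded form. The sketch below shows such a conversion in plain C. The function names are made up for illustration, and whether the demo code converts in exactly this way is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Encodes one unsigned big-endian number as a DER INTEGER and returns
 * the number of bytes written (at most in_len + 3).
 */
static size_t der_put_uint(uint8_t *out, const uint8_t *in, size_t in_len) {
    size_t n = 0;
    int pad;

    assert(in_len > 0 && in_len <= 32);
    /* Drop leading zero bytes, but keep at least one byte. */
    while (in_len > 1 && in[0] == 0x00) {
        in++;
        in_len--;
    }
    /* A set top bit would read as negative, so prepend 0x00. */
    pad = (in[0] & 0x80) != 0;

    out[n++] = 0x02;                    /* INTEGER tag */
    out[n++] = (uint8_t)(in_len + pad); /* short-form length */
    if (pad)
        out[n++] = 0x00;
    memcpy(out + n, in, in_len);
    return n + in_len;
}

/*
 * Converts a raw P-256 signature (r || s, 64 bytes) into a DER
 * SEQUENCE of two INTEGERs. out must hold at least 72 bytes.
 * Returns the length of the DER encoding.
 */
size_t ecdsa_p256_raw_to_der(const uint8_t raw[64], uint8_t *out) {
    uint8_t body[70];
    size_t n = der_put_uint(body, raw, 32);

    n += der_put_uint(body + n, raw + 32, 32);
    out[0] = 0x30;       /* SEQUENCE tag */
    out[1] = (uint8_t)n; /* at most 70, so the short form is enough */
    memcpy(out + 2, body, n);
    return n + 2;
}
```

The encoded length varies between signatures (up to 72 bytes for P-256), so the caller has to report the actual length back rather than a fixed size.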

OpenSSL engine for OP-TEE

One of the goals for this project was to ease the integration with the TLS layer. It should be possible to provide the whole functionality as a plugin loaded into any modern version of OpenSSL, without code modifications. OpenSSL makes it possible to extend its functionality by implementing the so-called ENGINE API. A dynamically loadable library may implement some cryptographic operations (like signing, verification, key generation) and register them by calling the ENGINE API. When processing a cryptographic operation, OpenSSL uses the custom implementation if one is provided. The general architecture and a guide to building OpenSSL engines can be found in an excellent paper called Start your ENGINEs: dynamically loadable contemporary crypto.

For the OP-TEE engine, the code structure looks roughly like this:

// OPTEE_ENG_load_private_key and OPTEE_ENG_evp_cb_sign are
// implemented elsewhere in the engine
static int OPTEE_ENG_pkey_meths(ENGINE *e, EVP_PKEY_METHOD **pmeth,
    const int **nids, int nid);

static int OPTEE_ENG_bind(ENGINE *e, const char *id) {
    // ... some initialization code ...

    // Set name and ID of an engine
    ENGINE_set_id(e, OPTEE_ENG_ENGINE_ID);
    ENGINE_set_name(e, OPTEE_ENG_ENGINE_NAME);
    // Call OPTEE_ENG_load_private_key to load the private key
    ENGINE_set_load_privkey_function(e, OPTEE_ENG_load_private_key);
    // Register callback for signing
    ENGINE_set_pkey_meths(e, OPTEE_ENG_pkey_meths);
    // Report success to OpenSSL
    return 1;
}

static int OPTEE_ENG_pkey_meths(ENGINE *e, EVP_PKEY_METHOD **pmeth,
    const int **nids, int nid) {
    // (The query with pmeth == NULL, which reports the supported
    // NIDs, is omitted here)
    // Use EVP_PKEY_meth_copy to copy all the callbacks to new_meth
    EVP_PKEY_METHOD *new_meth = EVP_PKEY_meth_new(EVP_PKEY_EC, 0);
    EVP_PKEY_meth_copy(new_meth, EVP_PKEY_meth_find(EVP_PKEY_EC));
    // Set new callback for signing
    EVP_PKEY_meth_set_sign(new_meth, 0, OPTEE_ENG_evp_cb_sign);
    // Return new EVP_PKEY_METHOD structure
    *pmeth = new_meth;
    return 1;
}

// Tell the OpenSSL to call OPTEE_ENG_bind when plugin is loaded
IMPLEMENT_DYNAMIC_BIND_FN(OPTEE_ENG_bind)
IMPLEMENT_DYNAMIC_CHECK_FN()

The OP-TEE engine extends OpenSSL with the following 2 custom implementations. OPTEE_ENG_load_private_key provides the functionality behind the ENGINE_load_private_key function. The latter is an ENGINE API function used by OpenVPN to load private keys. The custom implementation, provided by optee_eng, checks if a key with the given ID exists in the TEE. It returns an initialized EVP_PKEY object, used by OpenSSL for message signing during TLS session establishment. Contrary to the standard implementation, the EVP_PKEY object returned by optee_eng doesn’t store the private key material; instead, it keeps an ID corresponding to the private key.

The second functionality is implemented by OPTEE_ENG_evp_cb_sign. This function gets invoked when signing is requested for a key returned by OPTEE_ENG_load_private_key. The EVP_PKEY_METHOD contains a list of function pointers implementing signing, verification, key generation, etc. This callback is assigned to the pointer for message signing. The implementation of this function calls the TA in the TEE with the ID of a key and a message to sign. Control is then transferred to the sign_ecdsa function implemented by the TA, which initializes a handle to the key and calls the TEE OS to perform ECDSA/p256 signing.

The IMPLEMENT_DYNAMIC_BIND_FN macro binds everything together. It defines the entry point of an engine, the first function that gets executed when the library is loaded into OpenSSL (OPTEE_ENG_bind in this case). The function sets the identifier and name of the engine and uses the ENGINE API to assign the callbacks (the ENGINE_set_load_privkey_function and EVP_PKEY_meth_set_sign calls in the listing above).

Side note: for the private key, OpenSSL v1.1.1 requires that the EVP_PKEY structure contains the public part of the key; otherwise loading of the certificate fails and the TLS client won’t be able to initialize the connection. In this program, the public part is also stored in the TEE.

OK, so the dynamic engine provides the implementation, but OpenSSL needs to know how to load such a library. The following configuration can be added to OpenSSL’s config file (/etc/ssl/openssl.cnf on my Linux), so that the framework knows where to find the dynamic library when an engine load is requested by ID, optee in this case. This assumes that the top-level section of the file already selects this section with openssl_conf = default_conf.

# Additional content of openssl.cnf

[default_conf]
engines = engine_section

[engine_section]
optee = optee_section

[optee_section]
engine_id = optee
dynamic_path = "/opt/liboptee_eng.so"
init = 1

Let’s see if it works. On QEMU emulating an ARMv8 machine I now get:

qemu> openssl
OpenSSL> engine -c -v optee
(optee) OpTEE OpenSSL ENGINE.
 [id-ecPublicKey]

It seems the engine can be loaded correctly. Now, when OpenSSL tries to sign a message, it needs to make a call to the TEE (which involves an SMC call to switch the CPU into the secure world), get the key from secure storage and return the signature. Also, the crypto operation is now done not by OpenSSL, but by the crypto library provided by the OP-TEE OS (in this case a fork of LibTomCrypt). All in all, there is a cost to all that dance. Measuring this difference gives some idea of the overhead and is also a good way to check that the whole flow works correctly. That is done by speed.cc, located in the project’s repository. The benchmark runs 2 functions: SignREE performs signing fully in the Normal World by calling the pure OpenSSL implementation, and SignTEE uses optee_eng for signing. I’ve got the following results when running it on HiKey960 (ARM Cortex-A73).

The operation works correctly: the benchmarking code loads the optee_eng ENGINE into vanilla OpenSSL and uses only the EVP API. Nevertheless, the slowdown is significant. At this point I need to say that I haven’t investigated any further, hence I’m not sure where exactly the slowdown comes from. I’ve run the release version of the software and used similar settings for the board as described here. I’m pretty sure the optimization level in OpenSSL is much better than in LibTomCrypt, hence there is probably lots of room for improvement.

Side note: the benchmark expects to find a key named bench_key in the TEE. It must be inserted with the key management app: optee_keymgnt put bench_key <PEM_file_with_a_key>.
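For readers who want to repeat such a comparison without the project’s speed.cc, a timing loop of the following shape is enough for a first impression. This is a generic sketch and not the code from the repository: bench_avg_ns and dummy_sign are hypothetical names, and dummy_sign only stands in for a real signing call made through the EVP API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Operation under test; returns 0 on success. */
typedef int (*bench_fn)(void *ctx);

/*
 * Runs fn(ctx) the given number of times and returns the average
 * wall-clock time of one call in nanoseconds, or a negative value
 * if any call failed.
 */
double bench_avg_ns(bench_fn fn, void *ctx, size_t iterations) {
    struct timespec start, end;
    double ns;

    assert(fn != NULL && iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (size_t i = 0; i < iterations; i++) {
        if (fn(ctx) != 0)
            return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
         (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iterations;
}

/* Stand-in for a signing call; it only counts invocations. */
int dummy_sign(void *ctx) {
    size_t *calls = ctx;
    (*calls)++;
    return 0;
}
```

Wall-clock time is used on purpose: the time spent in the Secure World has to be included, and a per-process CPU time counter may not account for it.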

Plugging to the OpenVPN

At this point, integration with OpenVPN is very easy. The only requirement is version 2.5 of the software (which includes this change). That change adds the possibility to use an OpenSSL ENGINE to load the private key, which is exactly what’s needed here.

There is a trick needed to configure OpenVPN correctly. The configuration file has a key parameter which specifies the name of the file with the private key corresponding to the certificate provided by the cert parameter. In the case of optee_eng, this is the name of the key stored in the TEE (this name is passed to ENGINE_load_private_key as the key_id argument). Additionally, a file with the same name must exist in the OpenVPN configuration directory. OpenVPN will try to use the engine to load the key only if loading from the file fails, so the file needs to be empty to make sure that loading fails. The configuration also needs to specify the engine parameter, to instruct OpenSSL to use optee_eng. The whole configuration file as used on the client can be found here.
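The relevant part of such a client configuration could look like the fragment below. The engine, ca, cert and key directives are standard OpenVPN options, but the certificate paths are placeholders I made up, and the key name is taken from the example session later in this post; the real file in the repository may differ.

```
# client.conf (fragment); certificate paths are hypothetical
# Engine ID, as configured in openssl.cnf
engine optee
ca certs/ca.crt
cert certs/client.crt
# Empty file in the configuration directory, named like the key in the TEE
key vpn.testlab.com
```

Because the file given to key is empty, reading it as PEM fails, and OpenVPN falls back to asking the engine for a key with that name.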

Setting-up OP-TEE image, building & running

The code of the solution is available on GitHub. It was tested against OP-TEE 3.11, running in QEMU and on a HiKey960 development board. To build and play with the solution, one first needs to build OP-TEE itself (instructions here).

To compile the solution:

git clone https://github.com/henrydcase/optee_eng
cd optee_eng
git submodule init && git submodule update
mkdir build && cd build
cmake -DOPTEE_BUILD_DIR=<OPTEE location> -DPLATFORM=qemu ..
make
make install

<OPTEE location> is the root directory of OP-TEE. -DPLATFORM specifies the platform for which the solution should be built; I’ll use QEMU in this example. The make install command copies all needed files to OP-TEE’s build directory.

OP-TEE uses buildroot to create the Normal World OS, where the examples can be run. By default, OpenVPN is not enabled. It can be enabled by applying 2 patches from the optee_eng repo:

cd optee_eng
patch -p1 -d <OPTEE location>/buildroot < optee-patches/0001-openvpn-2.4.9-to-2.5.0.patch
patch -p1 -d <OPTEE location>/build < 0002_build_enable_openvpn.patch
cd <OPTEE location>/build
make run

To connect to the VPN, we need a server. The repository contains configuration for the server and the client, as well as a set of X.509 certificates (create_cert.sh can be used to regenerate them). The commands below configure and start an OpenVPN server on the host machine.

> cd optee_eng
> sudo openvpn --cd cfg --config openvpn_srv.conf
2021-02-06 23:39:53 OpenVPN 2.5.0 [git:makepkg/a73072d8f780e888+] x86_64-pc-linux-gnu [SSL (OpenSSL)] [LZO] [LZ4] [EPOLL] [PKCS11] [MH/PKTINFO] [AEAD] built on Nov  6 2020
2021-02-06 23:39:53 library versions: OpenSSL 1.1.1h  22 Sep 2020, LZO 2.10
2021-02-06 23:39:53 TUN/TAP device tun0 opened
2021-02-06 23:39:53 net_iface_mtu_set: mtu 1500 for tun0
2021-02-06 23:39:53 net_iface_up: set tun0 up
2021-02-06 23:39:53 net_addr_v4_add: 172.16.0.1/16 dev tun0
2021-02-06 23:39:53 UDPv4 link local (bound): [AF_INET][undef]:1194
2021-02-06 23:39:53 UDPv4 link remote: [AF_UNSPEC]
2021-02-06 23:39:53 Initialization Sequence Completed

Once QEMU is started and the user is logged in as root in the NWd terminal, tee-supplicant -d needs to be started. The supplicant makes it possible to communicate between the Normal World and the Secure World. The next thing to do is to insert the client key into the TEE and start the VPN.

> optee_keymgnt put vpn.testlab.com /etc/openvpn/certs/client.key
> rm /etc/openvpn/certs/client.key
> openvpn --cd /etc/openvpn/ --config client.conf
2021-02-09 00:27:27 Initializing OpenSSL support for engine 'optee'
2021-02-09 00:27:27 OpenSSL: error:0909006C:PEM routines:get_name:no start line
2021-02-09 00:27:27 PEM_read_bio failed, now trying engine method to load private key
2021-02-09 00:27:27 TCP/UDP: Preserving recently used remote address: [AF_INET]172.16.0.1:1194
...
2021-02-09 00:27:28 [vpn.testlab.com] Peer Connection Initiated with [AF_INET]172.16.0.1:1194
...

The second terminal displays logs from the TEE OS, which runs in parallel to Linux. One should see the following traces there:

# When inserting a key to the TEE
I/TA: New key [F671A1B757] registered
# During TLS handshake
I/TA: Sign for a key ID [F671A1B757] requested
I/TA: Message signed with key ID [F671A1B757]

At this point the VPN tunnel should be correctly created.

Conclusion

Hopefully, this example shows how to utilize ARM TrustZone from OpenSSL to secure private keys for OpenVPN. Ideas similar to those implemented by optee_eng can be used with any software using OpenSSL: the same engine can be used by Nginx, ssh-agent or strongSwan, on both the client and the server side. The solution is fully “pluggable” and doesn’t require any modification to existing software. It’s worth noting that such isolation of private keys from internet-facing applications may help to avoid security incidents. For example, using optee_eng would be enough to keep the private key safe from Heartbleed, as the key is not stored in the process running the OpenSSL library.

As an improvement to this idea, one could think of using the PKCS#11 standard for communication with the TEE. It wasn’t done here for 2 reasons. First, PKCS#11 would require a TA implementing the standard, which is not finished yet (but ongoing). Second, my ultimate goal (which wasn’t presented here) is to use post-quantum cryptography, and those new schemes are not yet properly incorporated into the PKCS#11 standard.

Finally, the upcoming OpenSSL 3.0 deprecates the ENGINE API. Instead, there is a new concept called providers. Hence, an implementation of optee_eng for the upcoming version of OpenSSL will probably look slightly different. But on the one hand OpenVPN doesn’t support this new version yet, and on the other hand 3.0 doesn’t yet seem to provide similar functionality for loading private keys.