What the heck is AEAD again?


Here’s a problem you might be familiar with: I keep forgetting what exactly AEAD means and why you would ever use it. Yes, I know the acronym stands for “Authenticated Encryption with Associated Data”, but does that really clarify anything? Not to me, so I’ve finally decided to sit down and write this blog post as a reference for my future self… and for anyone else who finds AEAD hard to retain.

Why bother at all?

Simply put, AEAD encryption is the current industry standard. That sounds like a good reason to bother, at least if you care about Understanding Your Building Blocks. You don’t have to take my word for it, though. Below are some relevant data points:

In TLS 1.3, released in 2018, “all ciphers are modeled as Authenticated Encryption with Associated Data (AEAD)” (see RFC 8446).
Following TLS, the QUIC protocol (which underlies HTTP/3) requires AEAD as well (see RFC 9001).
Google’s Tink cryptography library exclusively supports AEAD cipher modes when encrypting data (see choose a primitive and list of available primitives).

The list could be longer, but this is hopefully enough to prove AEAD is here to stay. Or as Thomas Ptacek put it in his famous Cryptographic Right Answers: “[AEAD] is the only way you want to encrypt in 2015”. (Yes, that was 10 years ago.)

Part 1 - Authenticated Encryption

Authenticating what?

When I think of authentication, the association in my mind is that of logging in to a website. In cryptography, however, authentication means proving the encrypted message is authentic, i.e. that it wasn’t altered after encryption and thus originates in its entirety from someone with access to the secret key.

Authentication is not merely a “nice to have” feature, as you might initially think. It’s often a basic condition for the security of the system. In some cases, for instance, lack of authentication can let an interceptor decrypt messages even without having the secret key!
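To make the risk concrete, here is a minimal sketch of a simpler consequence of missing authentication: an interceptor rewriting a message without knowing the key. It assumes a toy stream cipher built from SHA-256 in counter mode (the names toy_keystream and toy_xor_cipher are made up for this example, and the construction is for illustration only, not a vetted cipher). It also assumes the attacker can guess the original plaintext.

```python
import hashlib
import secrets


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Illustration only, not a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts by XORing with the keystream (the operation is its own inverse)."""
    stream = toy_keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)
ciphertext = toy_xor_cipher(key, nonce, b"pay bob $100")

# The attacker guesses the plaintext but never sees the key.
# XORing the ciphertext with (old XOR new) rewrites the plaintext in place.
old, new = b"pay bob $100", b"pay eve $900"
delta = bytes(a ^ b for a, b in zip(old, new))
tampered = bytes(c ^ d for c, d in zip(ciphertext, delta))

# The receiver decrypts without any error and gets the attacker's message.
forged_plaintext = toy_xor_cipher(key, nonce, tampered)
print(forged_plaintext)  # b'pay eve $900'
```

Nothing in the decryption step can notice the change, which is exactly the gap an authentication tag is meant to close.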

Towards sane(r) defaults

Back when I first studied cryptography, it was common practice to perform encryption and authentication in separate steps. You would pick an encryption scheme (e.g. AES-256 in CBC mode), an authentication scheme (e.g. HMAC-SHA256), and carefully knit them together in your code to ensure everything was properly authenticated.

The following pseudo-code shows the encryption and decryption process from those days:

    # Sender: encrypt and generate authentication tag
    (nonce, ciphertext) = encrypt(key, "hello world")
    tag = hmac(key, nonce + ciphertext)
    send(nonce, ciphertext, tag)

    # Receiver: verify authentication tag and decrypt
    (nonce, ciphertext, tag) = receive()
    assert tag == hmac(key, nonce + ciphertext)
    assert decrypt(key, nonce, ciphertext) == "hello world"

Quite a mouthful, isn’t it? Not as simple as merely calling encrypt and decrypt. No wonder people often messed up, like in the case of Apple’s iMessage vulnerability caused by… failing to include an authentication step altogether! By the way, even if you remember to authenticate, you still need to apply encryption and authentication in the right order, or The Cryptographic Doom Principle will come for you.
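For the curious, the tag handling of that encrypt-then-authenticate flow might look roughly like this with Python’s standard library. This is a sketch under a few assumptions: the helper names make_tag and verify_tag are invented here, the ciphertext is a random stand-in for the output of a real cipher, and the MAC key is kept separate from the encryption key (a common recommendation that the pseudo-code glosses over).

```python
import hashlib
import hmac
import secrets


def make_tag(mac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Encrypt-then-MAC: the tag covers the nonce and the ciphertext, never the plaintext."""
    return hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()


def verify_tag(mac_key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time, so timing does not leak partial matches."""
    expected = make_tag(mac_key, nonce, ciphertext)
    return hmac.compare_digest(expected, tag)


mac_key = secrets.token_bytes(32)     # assumption: independent of the encryption key
nonce = secrets.token_bytes(16)       # fixed length, so nonce + ciphertext is unambiguous
ciphertext = secrets.token_bytes(11)  # stand-in for the output of a real cipher

tag = make_tag(mac_key, nonce, ciphertext)
assert verify_tag(mac_key, nonce, ciphertext, tag)

# Flipping a single bit of the ciphertext makes verification fail.
# The receiver checks the tag first and refuses to decrypt when it does not match.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
assert not verify_tag(mac_key, nonce, tampered, tag)
```

Note the use of hmac.compare_digest instead of ==: a plain comparison can return early on the first differing byte, which is one more detail you have to remember when wiring things up by hand.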

Fortunately, over the last decade the industry has introduced primitives that are more resistant to misuse. Have a look at the pseudo-code below (a simplified version of libsodium’s crypto_secretbox_easy functions):

    # Sender: encrypt, including an authentication tag in the ciphertext
    (nonce, ciphertext) = encrypt_auth(key, "hello world")
    send(nonce, ciphertext)

    # Receiver: verify message authenticity and decrypt
    # (`decrypt_auth` throws an exception if verification fails)
    (nonce, ciphertext) = receive()
    assert decrypt_auth(key, nonce, ciphertext) == "hello world"

Nice, isn’t it? Under the hood, this API is still using separate steps for encryption and authentication, but users of the API can’t mess up anymore. For someone like me, who leans heavily on an API’s design to guide me towards writing correct code, this is way better than older APIs that let you shoot yourself in the foot.

Part 2 - Associated Data

But why?

We have adopted authenticated encryption. Isn’t that enough to keep our messages secret? What’s all the “associated data” fuss about? Why the extra complexity?

Authenticated encryption is indeed enough to keep messages secret, but it turns out that you often need to send unencrypted data together with your encrypted message. That piece of unencrypted data is what cryptographers mean by “associated data”. Let me illustrate this with an example.

Imagine you are developing a multi-user chat application. When two users engage in a conversation, they negotiate a secret key and start exchanging messages through a server. As you might expect, the server is unable to see the content of the messages, since they are encrypted. Still, when new messages get sent, the server needs access to the user id of the receiver to properly route a message to them. For that purpose, when an encrypted message is sent from the client, it also includes the receiver’s user id in unencrypted form. In other words, the receiver’s user id is sent as associated data of the encrypted message.

Now what happens if a man-in-the-middle intercepts the message and replaces the original receiver’s user id with a different user id? There are two possibilities:

If the chat protocol authenticates the user id and not only the encrypted message: the server will detect that the user id has been tampered with and drop the message as invalid.
Otherwise: the server will not detect that the user id has been tampered with and will happily forward the message to the new receiver. This is unintended! Luckily in this case, the final receiver would still be unable to read the message, because it’s encrypted with a key unknown to them. However, the point stands that lack of authentication of associated data exposes you to the creativity of an attacker, which could lead to a more serious security issue.

Let’s authenticate

Similar to authenticating an encrypted array of bytes, we can use an authentication scheme (e.g. HMAC-SHA256) to authenticate an encrypted message together with its associated data. Something like:

    # Sender: encrypt and send together with tagged associated data
    associated_data = "an unencrypted string"
    (nonce, ciphertext) = encrypt(key, "hello world")
    tag = hmac(key, nonce + ciphertext + associated_data)
    send(nonce, ciphertext, associated_data, tag)

    # Receiver: verify encrypted and associated data, then decrypt
    (nonce, ciphertext, associated_data, tag) = receive()
    assert tag == hmac(key, nonce + ciphertext + associated_data)
    assert decrypt(key, nonce, ciphertext) == "hello world"

Quite a mouthful again, right? In fact, this looks complex enough in my eyes that I’m not even confident it’s correct… Couldn’t cryptography libraries make our lives easier? I’d rather trust them than my own code for something like this.
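That lack of confidence is justified. One subtle trap with tagging a plain concatenation is that the boundary between variable-length fields is not itself authenticated, so bytes can be shifted from one field to the next without changing the tag. The sketch below demonstrates this and one common fix, length-prefixing every field. The function names naive_tag and framed_tag, and the 8-byte big-endian length prefix, are choices made for this example only.

```python
import hashlib
import hmac
import secrets


def naive_tag(key: bytes, nonce: bytes, ciphertext: bytes, associated_data: bytes) -> bytes:
    """Plain concatenation: the boundary between the last two fields is not authenticated."""
    return hmac.new(key, nonce + ciphertext + associated_data, hashlib.sha256).digest()


def framed_tag(key: bytes, nonce: bytes, ciphertext: bytes, associated_data: bytes) -> bytes:
    """Prefix every field with its length (8 bytes, big-endian) before concatenating."""
    fields = (nonce, ciphertext, associated_data)
    message = b"".join(len(f).to_bytes(8, "big") + f for f in fields)
    return hmac.new(key, message, hashlib.sha256).digest()


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)

# Two different (ciphertext, associated_data) pairs that concatenate to the same bytes.
original = (b"\x01\x02\x03\x04", b"user-42")
shifted = (b"\x01\x02\x03\x04u", b"ser-42")

naive_collides = hmac.compare_digest(
    naive_tag(key, nonce, *original), naive_tag(key, nonce, *shifted)
)
framed_collides = hmac.compare_digest(
    framed_tag(key, nonce, *original), framed_tag(key, nonce, *shifted)
)
print(naive_collides, framed_collides)  # True False
```

With the naive version, a tag computed for one pair also validates the other, which is the kind of detail that is easy to miss in hand-rolled code.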

AEAD to the rescue

As I mentioned above, the industry has moved to primitives that are more resistant to misuse. The same libsodium library we referred to before provides encryption functions that authenticate both the encrypted bits and the associated data. Does that sound familiar? We are finally talking about Authenticated Encryption with Associated Data!

Let’s look at it in more detail. The simplified pseudo-code below has been adapted from libsodium and illustrates AEAD usage in practice:

    # Sender: encrypt, including an authentication tag in the ciphertext
    # The authentication tag applies to both the encrypted bits and the unencrypted associated data.
    associated_data = "an unencrypted string"
    (nonce, ciphertext) = encrypt_aead(key, "hello world", associated_data)
    send(nonce, ciphertext, associated_data)

    # Receiver: verify message authenticity and decrypt
    # (`decrypt_aead` throws an exception if verification fails for the encrypted bits or the associated data)
    (nonce, ciphertext, associated_data) = receive()
    assert decrypt_aead(key, nonce, ciphertext, associated_data) == "hello world"

As you can see, the API now “forces” us to authenticate the encrypted bits and the associated data, preventing a wide range of mistakes. You can still introduce bugs if you try hard enough, but the API at least guides you towards the pit of success.
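To see the shape of such an API end to end, here is a toy, standard-library-only sketch that mirrors the pseudo-code’s encrypt_aead and decrypt_aead. Everything inside it is an assumption made for illustration: the HMAC-based keystream, the subkey labels, the length-prefixed framing and the 32-byte tag are not libsodium’s construction, and real code should call a vetted AEAD from a library instead. The associated data is the receiver’s user id from the chat example.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 output size


def _subkey(key: bytes, label: bytes) -> bytes:
    """Derive separate encryption and MAC keys from one key (toy derivation)."""
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 in counter mode. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _tag(mac_key: bytes, nonce: bytes, body: bytes, associated_data: bytes) -> bytes:
    """Tag over length-prefixed nonce, encrypted body and associated data."""
    fields = (nonce, body, associated_data)
    framed = b"".join(len(f).to_bytes(8, "big") + f for f in fields)
    return hmac.new(mac_key, framed, hashlib.sha256).digest()


def encrypt_aead(key: bytes, plaintext: bytes, associated_data: bytes) -> tuple[bytes, bytes]:
    """Return (nonce, ciphertext), where the ciphertext ends with the tag."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce, body + _tag(_subkey(key, b"mac"), nonce, body, associated_data)


def decrypt_aead(key: bytes, nonce: bytes, ciphertext: bytes, associated_data: bytes) -> bytes:
    """Verify the tag first; raise ValueError instead of returning unauthenticated data."""
    if len(ciphertext) < TAG_LEN:
        raise ValueError("ciphertext too short")
    body, tag = ciphertext[:-TAG_LEN], ciphertext[-TAG_LEN:]
    expected = _tag(_subkey(key, b"mac"), nonce, body, associated_data)
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)
nonce, ciphertext = encrypt_aead(key, b"hello world", b"to:alice")
assert decrypt_aead(key, nonce, ciphertext, b"to:alice") == b"hello world"

# Changing only the associated data (the routing information) is detected.
try:
    decrypt_aead(key, nonce, ciphertext, b"to:mallory")
    rerouted_accepted = True
except ValueError:
    rerouted_accepted = False
print(rerouted_accepted)  # False
```

The caller never touches a tag or a MAC key: the only way to get plaintext out is through a function that has already verified both the encrypted bits and the associated data.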

Part 3 - Using AEAD across libraries

What if you can’t use libsodium? Given the popularity of AEAD, multiple AEAD ciphers have been standardized, which means you can pick the one that suits you best and use it across libraries and programming languages. You might have seen names like AES256-GCM and ChaCha20-Poly1305 out there, so now comes the obvious question: which AEAD primitive should I choose?

I’m not a cryptographer, so unless I have very special requirements, I’d follow whatever Tink’s choose a primitive page recommends. Bear in mind, however, that generic cryptography advice is by definition limited. There are situations in which even Tink’s recommendation needs to be taken with a grain of salt. Hopefully your local cryptographer can help you out with their sage advice :)

The End

What the heck is AEAD again? I’m afraid I’ll have to go back to the beginning of this article and read it for a second time…

With special thanks to @ctz and @cpu, who reviewed an early draft of this article, suggested improvements, and verified my claims were accurate. I wouldn’t have dared publish it without their review! Any remaining mistakes are my own, obviously.
