In my last cryptography post, I wrote about using message authentication codes
(MACs) as a way of guaranteeing message integrity. To review briefly, most ciphers
are designed to provide message confidentiality - which means that no one but the
sender and the intended receiver can see the plain-text of the message. But
ciphers that provide confidentiality don't necessarily make any guarantees that
the message received is exactly the message that was sent. There are a good number
of cryptographic attacks that work by altering the message in transit, and
depending on the cipher, that can result in a variety of undesirable
For example, if you use DES encryption with the ECB mode of operation,
you can insert new blocks anywhere in a message that you want. By using
a replay attack (where you take encrypted blocks from other messages using
the same encryption, and resend them), an attacker can alter your messages, and
you won't be able to detect it.
So in addition to just confidentiality, we need to provide integrity. What does integrity really mean? Basically, it expands the definition of the
decryption function. Written as a function signature, confidential message
decryption is a function
decrypt : ciphertext × key → plaintext. With message integrity, we add the
option that decrypt can return a result saying that the message is invalid:
decryptinteg : ciphertext × key → (plaintext | REJECT).
In the last post, I explained the basic concept behind integrity schemes: you add an addition chunk of data to the message, which summarizes the message; any change to the message will (with a very high degree of probability) result in
a change in the authentication code; so if the message is altered, that change
will be detected, because the authentication code will not match the rest
of the message.
The idea of the message authentication code seems quite simple. We know lots
of pseudo-random functions that can do a good job as MAC functions. But like
so many things in cryptography, the fact that the concept is simple doesn't
mean that the actual application of that concept is simple: it's incredibly
easy to get it disastrously wrong, even when all of the building blocks are right.
For example, let's look at a very simple (and very weak) integrity
system, called CBC-MAC.
In CBC-MAC, you compute an authentication code by running a your
block cipher on the message with an initialization vector set
to 0. The output for encrypting the last block is your MAC.
The reason that this works is fairly simple - CBC keeps pushing the
result from the previous encryption step through to the next one - so
the output from the final block conceptually includes information
from all of the other blocks. So you've got an authentication code
whose security is, roughly, equivalent to the security of the block
cipher used in CBC mode.
CBC-MAC isn't a bad mode of operation. It's got serious flaws - the biggest
one is that there are a number of attacks against it if it's used for
variable length messages. (And you can't fix it just by adding a message
length indicator to the message, it's deeper than that.) But CBC is
a pretty good integrity mode for fixed-length messages, and there's a widely
used variant, called CMAC, which works for variable length messages.
So how can this go wrong?
What do you use as the password to the MAC computation? CBC is
a secure cryptographic mode, so why not just use the same password? Lots
of novices (and even some not-no-novice folks!) have implemented CBC-MAC
where the integrity and confidentiality modes used the same key. If you
compute the MAC using the same key as the message, then you're open to
an attack which completely voids the integrity guarantee!
Imagine a message M, which is divided into n blocks: (M1, ..., Mn). For convenience, we'll treat the IV as if it were ciphertext block 0. Tuen using CBC with a key K, M encrypts to ciphertext blocks
(C1, ..., CK), where C_i=Encrypt(Mi ⊕ Ci-1, K), plus a CBC-MAC integrity block. If you look
at the CBC block, if it's using the same key as the main encryption,
then the value of the MAC is the result of computing
Encrypt(Mn ⊕Cn-1, k). That value
is exactly the same as CN! So it's trivial
to fake the MAC. If you use CBC-MAC with the same encryption key as
you used to encrypt the message, you're wasting your time: you might as well
not bother with the MAC at all, because you've got no integrity guarantee.
But that's an amazingly common mistake. A huge number of implementations
of supposedly secure cryptosystems that claim to guarantee integrity have
actually used CBC-MAC with one key.
I mentioned above that CBC-MAC is only safe for fixed-length messages. The reason for that is closely related to the trick I described above. Basically, the nature of CBC ultimately means that given a MAC value M, it's not hard to generate a string to append to a message which will result in it having the desired
MAC. Suppose we know that the message M will generate the mac TM,
and we know that the block we want to replace the message with, N has
the MAC TN. Then we can easily generate a string N', such that
appending MAC(M,N) = MAC(N). All we do is XOR the first block of N
with TM, and add it to the message. The new message
is is (M1,...,Mn,(N1⊕TN),
N2,...,Nn) - that is, we can remove the MAC from the
first message, use it to glue the two messages together, and tack TN on to the message as the MAC. TN, the MAC computed for T alone, will
be the MAC for the combined message.
It's easy to do that if we're allowed to change the length of the message. If the receiver doesn't know how long the message is going to be, then they can't
count on its integrity using CBC-MAC.
Next time, I'll look at a variation of CBC-MAC that provides integrity
for variable-length messages using 2 keys - one for confidentiality, and
one for integrity. Then I'll go on to a two-pass single-key mode of operation
that provides both confidentiality and integrity.