Algorithmic Assertions - Craig Gidney's Computer Science Blog

Catching a Crypto Bug in RedPhone

22 Feb 2016

RedPhone, now called Signal, is a well regarded open-source cryptographic phone-call/text-message app for iOS and Android. I use it all the time; it's great. RedPhone / Signal is developed and maintained by Open Whisper Systems.

Back in 2013, I was working on porting RedPhone to iOS. While doing that, I caught a head-slapper crypto bug in the Android version's ZRTP handshake. The bug didn't create any serious attacks, and was promptly fixed.

In this post: how I stumbled onto the bug, what it was, and lessons I learned from the experience.

Discovery

I found the bug because I was testing my implementation of the ZRTP handshake (which I'd written as part of the porting process). I'd captured a trace of the packets sent by the Android code, and I was writing unit tests asserting that the iOS code (when fed the same entropy) was generating packets bit-for-bit identical to the captured packet data.

The first thing I noticed wasn't a security bug, it was a spec compliance bug. Packets created by the android code had data 12 bytes to the right of where the spec said it should be. The Android code generates packets by manipulating a raw byte buffer, and apparently someone made a mistake when adding up one of the first few offset constants. That initial mistake spread to all the other offset constants.

Accidental shifting highlights a risk of doing parsing/packing via direct buffer manipulation, and of only testing your implementation against itself instead of a reference implementation, but it isn't a security issue. Ultimately, the exact same information was being communicated. Fixing the off-by-12 bug required writing gross legacy interop code, but that's all.

The real shock happened when I got around to testing the Confirm messages. The Android and iOS code disagreed on what the correct payload was:

computed (iOS):     ... f3 9e cc cc 9b 3d 3a cf fe 86 62 b7 ca 7b 18 d2 69 80 8c 8f 7b 9d ...
expected (Android): ... f3 9e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 8c 8f 7b 9d ...
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

computed (iOS):     ... 9b 83 a0 a0 81 05 3c 77 4c 78 bd 88 de c3 87 cd a4 a9 7e d2 bf c2 ...
expected (Android): ... 9b 83 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 7e d2 bf c2 ...
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This particular mismatch confused me. At first I misread it; I thought the iOS code was failing to fill in a field. But actually the error is indicating that the iOS version has a bunch of gibberish in a place where the data we captured from the Android version says there should be zeroes.

I figured my code was accidentally mangling some data, and started figuring out which field was affected. At which point I probably said something like "Oh FUCK!". The affected field was an IV field; those are supposed to contain gibberish. Initialization vectors should never be re-used, but the Android version is merrily using zero every time! That is a seriously disturbing mistake to find in a well-regarded crypto app.

At first, it wasn't clear where the mistake was in the Android code. The ConfirmPacket code (from 2013) did generate a random iv and write it into the packet, but somehow the iv was disappearing before the packet was written to the network. Very confusing... until I noticed this:

private byte[] getIv() {
    byte[] iv = new byte[16];
    System.arraycopy(iv, IV_OFFSET, this.data, 0, iv.length);
    return iv;
}

See it?

The parameters for System.arrayCopy are src, srcPos, dest, destPos, and length. In that order. (A nice trap for anyone used to memcpy.)

getIv is supposed to extract the IV by copying the bytes corresponding to the IV out of the packet buffer and into a fresh buffer. What getIv is actually doing is the opposite: copying a fresh buffer over the packet buffer's IV. And getIv happens to be called even when creating a packet to send. So the iv always ends up zeroed.

(The fact that it's so easy to copy in the wrong direction, and not notice, is another reason direct buffer manipulation is a risky way to do parsing/packing. This is what happens when you write C in Java: you get C-style mistakes.)

Cause

It's easy to see how this bug could make it into the code. Programmers make dumb mistakes all the time, although some of us don't like to admit it. We transpose arguments, forget semicolons, go off by one... (thank cute kittens for testing). So the real question isn't how did this bug happen, it's how did this bug survive? Why didn't someone notice?

My personal opinion is that this bug survived because, once it's established, it's invisible to integration tests. Sure, the call initiator and the call responder are both making a mistake. But they're making the same mistake. And this is a case where two wrongs make a right. The participants just end up agreeing that the Confirm messages' IVs were zero (and that everyone should be shifted 12 bytes to the right).

The bug would have been (and was) caught by minimal unit testing, but Moxie doesn't (didn't?) value unit tests. Open Whisper Systems put their efforts into integration testing instead. (It's one of the things we clashed over.) Integration testing would have caught the bug being introduced, since the new buggy version wouldn't have been able to call existing non-buggy versions, but would never have caught the bug if it was there from the start.

(So the fact that the bug wasn't caught actually tells us something: it probably was there from the start. As opposed to being surreptitiously introduced by a certain government agency, since that would have caused calls to existing clients to fail during testing.)

Outcome

Transposing those two arguments was a bone-headed mistake, but fortunately it wasn't a disaster. There's two reasons for that.

First, the affected IV wasn't crucial to RedPhone. The Confirm message IV exists to provide indistinguishability under chosen plaintext attacks (ind-cpa), but I have no idea how someone could have used distinguishability to hurt a RedPhone user (especially since preshared mode wasn't implemented).

Second, the problem was fixed as soon as I reported it. I can't link to the commit on github, since the RedPhone-iOS repository was merged into the Signal-iOS repository without preserving history, but here's the relevant diff according to some random commit caching site:

[...]

src/org/thoughtcrime/redphone/crypto/zrtp/ConfirmPacket.java :
    [...]
    private byte[] getIv() {
       byte[] iv = new byte[16];
    -  System.arraycopy(iv, 0, this.data, IV_OFFSET, iv.length);
    +  System.arraycopy(this.data, IV_OFFSET, iv, 0, iv.length);
       return iv;
    }
    [...]

[...]

And here's the commit message:

[Moxie Marlinspike (March 23, 2013)]

Fix for Commit packet protocol error. (aka, System.arraycopy is not memcpy)

@Strilanc pointed out that the arguments for a System.arraycopy call are transposed, such that the commit packet was miscalculating the IV for CFB encryption.

This doesn't look immediately exploitable, but it's definitely wrong. The problem is that fixing it will break compatibility with clients that haven't yet received the fix.

This fix checks the client ID of the communicating client to see if it contains the fix, in which case it calculates the IV properly. Otherwise, it intentionally miscalculates the IV in order to maintain compatibility. In a few months, we'll phase out the backwards compat support.

So in the end everything turned out okay.

(If you want to make a contribution to Signal, I recommend finding and deleting the interop code alluded to by this post. Now that the affected legacy clients are gone, there's no need to keep the code that detects them and then zeroes the confirm IV and shifts packet payloads 12 bytes to the right.)

Lessons Learned

Don't only test entire protocol handshakes, also test individual packets.

Don't only test against your own implementation, also test against a reference implementation.

Direct buffer manipulation is an error-prone way to do serialization. (I recommend using parser combinators instead.)

It doesn't matter that the bug is shallow if nobody looks. (*cough* *cough*)