###
»Squeezing a key through a carry bit«
2017-12-27, 13:00–14:00, Saal Dijkstra

The Go implementation of the P-256 elliptic curve had a small bug due to a misplaced carry bit affecting less than 0.00000003% of field subtraction operations. We show how to build a full practical key recovery attack on top of it, capable of targeting JSON Web Encryption.

Carry bugs are fairly common, and usually too small to have big impact, or so they are considered. This one was no exception.

Go issue #20040 affected the optimized x86_64 assembly implementation of scalar multiplication on the NIST P-256 elliptic curve in the standard library.

`p256SubInternal`

computes `x - y mod p`

. In order to be constant time it has to do both the math for `x >= y`

and for `x < y`

, it then chooses the result based on the carry bit of `x - y`

. The old code chose wrong (`CMOVQNE`

vs `CMOVQEQ`

), but most of the times compensated by adding a carry bit that didn't belong in there (`ADCQ`

vs `ANDQ`

). Except when it didn't, once in a billion times (when `x - y < 2^256 - p`

). The whole patch is 5 lines.

The bug was found by a Cloudflare engineer because it caused ECDSA verifications to fail erroneously but the security impact was initially unclear. We devised an adaptive bug attack that can recover a scalar input to `ScalarMult`

by submitting attacker-controlled points and checking if the result is correct. Elliptic Curve Diffie-Hellman involves a secret scalar, a peer-provided point, and fails to establish a key if the result is incorrect.

We reported this to the Go team, Go 1.7.6 and 1.8.2 were issued and the vulnerability was assigned CVE-2017-8932.

At a high level, this P-256 ScalarMult implementation processes the scalar in blocks of 5 bits. We can precompute points that trigger the bug for each (and only) 5 bit value, and submit them. When the protocol fails, we learned 5 key bits, and we move on to the next 5, Hollywood style. In about 500 submissions on average we recover the whole key.

The precomputation involves a lot of unusable points and edge cases, but by modifying the optimized assembly implementation and generating points intelligently, we can produce a full round of points in seconds on 1000 machines (or spot instances). Each round depends on the previous ones, so must be computed live during each attack.

Normal ECDH does not offer an attacker multiple attempts against the same scalar, making the attack impossible. However, a variant of ECDH with a static scalar is used as a public key encryption scheme, for example in JSON Web Encryption. The attack can fully recover the private key in that scenario.

No bug is small enough.