While eating breakfast this morning, I decided to finish watching “Will We Survive First Contact?”, an episode of *Morgan Freeman’s Through the Wormhole*, a nice series on the Science Channel that does a reasonable job of translating science into layman’s terms without simplifying too much. This episode dealt with how we might know when we have encountered alien communication. The topic of aliens was just a vehicle for talking about information theory. Topic modeling made its appearance, though no one called it that.

10^607 seconds, or about 10^600 years. Have the computer do a trillion multiplications per second and you decrease the time to 10^597 years. Have a thousand of these computers working at once and you have 10^594 years.

A less naive approach is to recognize that if we know the target number and assume one of the primes, we can find the other by dividing and then check by doing another multiplication. This reduces the time needed for a thousand computers, each doing a trillion operations per second, to about 10^294 years.

If we look at what a 1024-bit prime looks like, we’ll notice that we already know what two of the bits must be. The prime isn’t the number 2, so it must be odd: the lowest-order bit must be a 1. And if the highest-order bit weren’t a 1, then it wouldn’t be a 1024-bit prime — it might be a 1023-bit prime. So we know that the highest-order bit is a 1. This holds for both primes that go into the public key.

We’ve already taken that into account in the above calculations about how long it would take to crack the public key if we know the high- and low-order bits of the primes. For each additional bit of information we know about the primes, we cut the time in half.

Let’s assume for a moment that we are multiplying two numbers, say 11101 and 10011 (29 and 19). What does this look like in long-hand multiplication?

```
          1 1 1 0 1
        × 1 0 0 1 1
  -----------------
          1 1 1 0 1
        1 1 1 0 1
      0 0 0 0 0
    0 0 0 0 0
  1 1 1 0 1
  -----------------
1 0 0 0 1 0 0 1 1 1
```
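The shift-and-add procedure in the table above can be sketched in a few lines of Python (the function name `long_multiply` is my own):

```python
def long_multiply(a, b):
    """Multiply a by b the long-hand way: one shifted partial
    product per set bit of b, then sum the partial products."""
    product = 0
    shift = 0
    while b:
        if b & 1:                  # this bit of b contributes a partial product
            product += a << shift  # the partial product is a, shifted left
        b >>= 1
        shift += 1
    return product

# The worked example: 11101 (29) times 10011 (19).
result = long_multiply(0b11101, 0b10011)
print(bin(result), result)  # 0b1000100111 551
```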

The result, 1000100111, is binary for 551 (29 times 19). Note that there are 10 bits in the result (5 + 5). The highest-order bit is a 1, as is the lowest (the product of two odd numbers is odd). Note also that the next-to-lowest bit is a 1. This is because the next-to-lowest bits of the factors are different (0 and 1). To see this, let’s substitute *x* and *y* for those bits and see how it falls out.
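We can check this claim empirically before working through the algebra: for any two odd numbers, the next-to-lowest bit of the product is the XOR of their next-to-lowest bits. A quick sketch:

```python
def second_bit(n):
    """Return the next-to-lowest bit of n."""
    return (n >> 1) & 1

# For every pair of odd numbers in a small range, the next-to-lowest
# bit of the product equals the XOR of the factors' next-to-lowest bits.
for p in range(3, 100, 2):
    for q in range(3, 100, 2):
        assert second_bit(p * q) == second_bit(p) ^ second_bit(q)

# The worked example: 29 = 11101 (bit 0), 19 = 10011 (bit 1), 551 (bit 1).
print(second_bit(29), second_bit(19), second_bit(29 * 19))  # 0 1 1
```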

```
          1 1 1 x 1
        × 1 0 0 y 1
  -----------------
          1 1 1 x 1
        y y y xy y
      0 0 0 0 0
    0 0 0 0 0
  1 1 1 x 1
  -----------------
… … … … … … … … x⊕y 1
```

(Here *xy* is the product of the bits *x* and *y*, and *x*⊕*y* is their exclusive or: the next-to-lowest column of the sum is *x* + *y*, which leaves the bit *x*⊕*y* in that position.)

Now we know that if the next-to-lowest bit in the product is a 1, the next-to-lowest bits in the primes are different. If the next-to-lowest bit in the product is a 0, then the next-to-lowest bits in the primes are the same. If we know this bit for one prime, then we know it for the other. We’ve reduced the number of unknown bits by one. Depending on how we attack the problem, we may have cut in half the time to crack the product.
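As a toy illustration (my own construction, not a practical attack), here is how that one recovered bit prunes a brute-force search: knowing the product’s next-to-lowest bit, we only test candidate pairs whose next-to-lowest bits have the matching parity, roughly halving the pairs that need a full multiplication:

```python
def factor_with_parity_hint(n, bits):
    """Brute-force an odd semiprime n = p * q of two `bits`-bit odd
    factors, skipping candidates that violate the next-to-lowest-bit
    parity. Returns (p, q, number_of_pairs_actually_multiplied)."""
    target_parity = (n >> 1) & 1           # XOR of the factors' second bits
    lo, hi = 1 << (bits - 1), 1 << bits    # candidates have the top bit set
    tried = 0
    for p in range(lo | 1, hi, 2):         # odd candidates only
        for q in range(p, hi, 2):
            if ((p >> 1) & 1) ^ ((q >> 1) & 1) != target_parity:
                continue                   # parity mismatch: skip this pair
            tried += 1
            if p * q == n:
                return p, q, tried
    return None

print(factor_with_parity_hint(29 * 19, 5))  # finds (19, 29)
```

In hardware the skipped pairs cost nothing at all; in this Python sketch they still cost a comparison, but the number of full multiplications is roughly halved.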

We can continue to decompose the product and generalize it, but the relationships quickly mount and become intractable, especially with traditional computing. We end up with a lot of conditions that the primes must satisfy, but no good way to try them all. This approach is not impossible if we have quantum computers: we just throw all the possibilities at the conditions and see which ones survive. In fact, this kind of entanglement (e.g., the next-to-lowest-order bits being the same or opposite) is what quantum computers should be good at.

Until we have quantum computers, we can turn to other ways of doing massively parallel computation: specialized hardware. Instead of using a general-purpose computer, we can use specially designed circuits that solve the problem using basic laws of physics. These circuits can only work on one candidate at a time, but they can be made small and mass-produced. Even if it takes a million transistors to see if a pair is valid for the key, we can have a few hundred of these per square centimeter of silicon. Assuming a 1 GHz clock (slow these days), we’ll get around a trillion calculations per second per square inch of silicon. Depending on power and cooling requirements, we can get a quadrillion calculations per second per cubic foot.

This might not seem like much of a gain, but keep in mind that what we’re solving with this special hardware is really a linear equation. We have 2*k* + 5 knowns (the product [2*k* knowns], the fact that the primes are odd [2 knowns], the size of the primes [2 knowns], and the parity of the next-to-lowest bits in the primes [1 known]) and 2*k* − 5 unknowns (the two primes [*k* unknowns each], minus the high- and low-order bits [4 unknowns], and minus the next-to-lowest bit of one prime [1 unknown], since its parity is known). There are 10 more knowns than unknowns. The linear system is overdetermined, so we should be able to work our way backwards bit by bit, assuming the right circuits.

Instead of an exponential search (the number of primes to check is O(2^*k*)), we might have a linear search of O(*k*), since we might be able to determine one bit at a time. If that’s the case, then a trillion operations per second would crack a million keys a second.

This might sound amazing. You might wonder why people aren’t doing this if it could work (and it very well might not). One answer is that the hardware has to be built for the size of the key. If the hardware is built for keys of size 2*k*, then it probably won’t work for keys of size 2*k* + 2. Of course, if everyone is using keys of size 2*k* and that’s the stuff you’re interested in, then it might be worth pouring money into developing specialized hardware.

This is the trade-off between specialized solutions and general solutions. We can build brilliant tools that target a specific problem; those tools will break when the problem we tackle is a little different. Or we can build generalized tools that don’t do any one thing as well as the specialized tool but do many different things well enough. Thus the wide availability of CPUs in general devices like laptops and tablets, and the embedded use of FPGAs in specialized devices such as microwaves.

The point of this isn’t that we should abandon specialized tools (though I do think we put too much emphasis on them), but that we shouldn’t discount information that doesn’t quite match what we expect. If we ignored the parity of the next-to-lowest bits because it didn’t give us the real values of those bits, we would not see the larger pattern at play. We’d be stuck with a naive brute-force method of breaking the keys.

It’s easy to say that knowing how clicks and whistles are distributed is useless because it doesn’t tell us what they mean, but knowing that there might be meaning is the first step toward discovering it. Even if a tool such as topic modeling doesn’t give us the kinds of results we might get from a close reading, we shouldn’t dismiss the information it can give us. Knowing a little more about the thing we are studying can help us decide how to approach it next. It might feel like we’re going down a rabbit hole in the process, but rabbit holes can lead to amazing places.

For further reading, I recommend:

*The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet* and *Cryptonomicon*.