Intro: one way functions. ------------------------ Imagine a list of numbers: N = (23, 47, 129, 331, 94, 515, 872, 111, 235, 40021, 7, ...) Now, imagine a bit sequence B: 110100111 B has a value computable from N (let's assume for the moment that there are more elements in N than B). This function F that computes the value works like this: for each bit that is 1 in B, take the corresponding number in N and add it to a total (if the nth bit in B is on, then include nth number in N). Thus, F(B) = 1619 = (+ 23 47 331 872 111 235) However, even given F(B) and N, one cannot determine B in a reasonable amount of time! For small examples such as the above, the exhaustive search is feasible, but when N has 8000 numbers, it is out of the question. This is known as the knapsack problem, the idea being that you have a knapsack of certain size (1619), and a bunch of items (N), and you have to figure out which combination of items fits in the knapsack without leaving any space. Problem is that there might be more than one combo that does the trick, and anyway, anyone trying to decode text that was encrypted using a method based on the knapsack problem would have to solve the problem in order to decode the message. This is unacceptable -- the legal receiver has to be able to decode quickly. So it really has very little to do with public-key encryption, but it's such a neat thing that I had to talk about it anyway. And it is an example of a one-way function, which is related to public-key encryption. Here is a padlock analogy for a public key encryption system. A sends B a box locked with A's padlock. B sends it back to A, locked also now with B's lock (the box has many loops for locks). A sends it back, with A's padlock now removed, and B unlocks B's padlock on receiving the box. At no time in the transmission channel was the box unlocked. The padlocks can be realized with classical (i.e. private key encryption systems), because an encoded message can be further encoded with no trouble. As long as the encoding/decoding is commutative (i.e. the order in which the decodings take place relative to the encodings is irrelevant -- such a system can be designed) this method will work, however it provides no protection against a non-passive eavesdropper who masquerades as A or B in order to get the other side to remove their one padlock. Anyway, now suppose that you have two large primes, p and q. Their product, pq, can be publicized without giving away p or q, because there is no known tractable algorithm for factorization (yet, yet...) [rewrite following] 6.1. What is public-key cryptography? In a classic cryptosystem, we have encryption functions E_K and decryption functions D_K such that D_K(E_K(P)) = P for any plaintext P. In a public-key cryptosystem, E_K can be easily computed from some ``public key'' X which in turn is computed from K. X is published, so that anyone can encrypt messages. If D_K cannot be easily computed from X, then only the person who generated K can decrypt messages. That's the essence of public-key cryptography, published by Diffie and Hellman in 1976. In a classic cryptosystem, if you want your friends to be able to send secret messages to you, you have to make sure nobody other than them sees the key K. In a public-key cryptosystem, you just publish X, and you don't have to worry about spies. This is only the beginning of public-key cryptography. There is an extensive literature on security models for public-key cryptography, applications of public-key cryptography, other applications of the mathematical technology behind public-key cryptography, and so on. 6.2. What's RSA? RSA is a public-key cryptosystem defined by Rivest, Shamir, and Adleman. Here's a small example. See also [FTPDQ]. Plaintexts are positive integers up to 2^{512}. Keys are quadruples (p,q,e,d), with p a 256-bit prime number, q a 258-bit prime number, and d and e large numbers with (de - 1) divisible by (p-1)(q-1). We define E_K(P) = P^e mod pq, D_K(C) = C^d mod pq. Now E_K is easily computed from the pair (pq,e)---but, as far as anyone knows, there is no easy way to compute D_K from the pair (pq,e). So whoever generates K can publish (pq,e). Anyone can send a secret message to him; he is the only one who can read the messages. 6.3. Is RSA secure? Nobody knows. An obvious attack on RSA is to factor pq into p and q. See below for comments on how fast state-of-the-art factorization algorithms run. Unfortunately nobody has the slightest idea how to prove that factorization---or any realistic problem at all, for that matter---is inherently slow. It is easy to formalize what we mean by ``RSA is/isn't strong''; but, as Hendrik W. Lenstra, Jr., says, ``Exact definitions appear to be necessary only when one wishes to prove that algorithms with certain properties do _not_ exist, and theoretical computer science is notoriously lacking in such negative results.'' 6.4. How fast can people factor numbers? It depends on the size of the numbers. In October 1992 Arjen Lenstra and Dan Bernstein factored 2^523 - 1 into primes, using about three weeks of MasPar time. (The MasPar is a 16384-processor SIMD machine; each processor can add about 200000 integers per second.) The algorithm there is called the ``number field sieve''; it is quite a bit faster for special numbers like 2^523 - 1 than for general numbers n, but it takes time only exp(O(log^{1/3} n log^{2/3} log n)) in any case. An older and more popular method for smaller numbers is the ``multiple polynomial quadratic sieve'', which takes time exp(O(log^{1/2} n log^{1/2} log n))---faster than the number field sieve for small n, but slower for large n. The breakeven point is somewhere between 100 and 150 digits, depending on the implementations. Factorization is a fast-moving field---the state of the art just a few years ago was nowhere near as good as it is now. If no new methods are developed, then 2048-bit RSA keys will always be safe from factorization, but one can't predict the future. (Before the number field sieve was found, many people conjectured that the quadratic sieve was asymptotically as fast as any factoring method could be.) 6.5. What about other public-key cryptosystems? We've talked about RSA because it's well known and easy to describe. But there are lots of other public-key systems around, many of which are faster than RSA or depend on problems more widely believed to be difficult. This has been just a brief introduction; if you really want to learn about the many facets of public-key cryptography, consult the books and journal articles listed in part 10.