(Work in Progress -- please feel free to finish this and mail it to me)

ASCII takes up a little more space than it needs to, because every
character is the same "bit"-length, whether it's 1010101 or 0000001.
Huffman encoding makes a prefix-unique character set in which not all
characters are the same length, and makes sure that the most frequently
used characters get the shortest codes.  It's not the absolute
winningest method of compression, but it's sure one of the cutest, and
it's often used in combination with other methods in compression
programs.

Note that one couldn't really use Huffman-encoded character sets as
active data -- you'd have no way to jump to character N, because first
you'd need to know the lengths of all the intermediate characters!
It's useful for saving space in inert files, though.

What does "prefix-unique" mean?  Let's assume a simplified alphabet of
8 characters (so that it could be represented in 3 bits in normal
ASCII).  In order of frequency, they are A, B, C, D, E, F, G, H
(A is the most common, H the least, that is).  Now look at the
following tree:

          _______________________
         |                       |
         0                       1
         |                       |
    -----------             -----------
    |         |             |         |
   0:A       1:B            0         1
                           /|         |\
                          / |         | \
                         /  |         |  \
                       0:C 1:D        0   1
                                     /|   |\
                                    / |   | \
                                   /  |   |  \
                                 0:E 1:F 0:G 1:H

The leaves of the tree are letters in the alphabet, but the tree is not
balanced.  Some characters require only two bits (A and B, for
example), others make do with 3 (C and D), and some actually require 4
bits (E, F, G, and H).  But the system is unambiguous: the string
010100111101101 translates to BBAHBD.  One simply starts at the
beginning of the string and at the root of the tree, follows branches
of the tree according to the bits in the string until reaching a leaf,
writes down that leaf's letter, and starts over at the root of the tree
(while preserving one's place in the bit string), until the end of the
string is reached.

The question is, are we really winning?  Well, the string BBAHBD in a
standard 3-bit code would have required 6 * 3 = 18 bits to represent;
we managed to do it in 2 + 2 + 2 + 4 + 2 + 3 = 15 with Huffman
encoding.  However, the space savings in Huffman encoding is entirely
dependent on statistics -- on the usage-frequency count of the alphabet
being encoded.  If we had to encode this string:

    HHHHHH

we'd need 6 * 4 = 24 bits Huffman-encoded, but only 18 in standard
fixed-length code!  We're counting on the fact that strings like HHHHHH
don't occur as often as ones like BBAHBD, which is more in accord with
our frequency estimate.

There is an algorithm for forming the best Huffman "tree" given an
alphabet and the usage frequencies of its symbols, and in the long run
the Huffman-encoded alphabet will never do worse than the fixed-length
alphabet, and will usually do better.  Here is the algorithm:

[insert 8 oz of pure, godlike, Huffman genius here]
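(Until somebody mails in the genius, here is the standard construction:
start with one free-standing node per character, each weighted by that
character's usage frequency.  Repeatedly take the two nodes with the
smallest weights and hang them under a new parent node whose weight is
the sum of theirs; the parent goes back into the pool in their place.
When only one node is left, it is the root of the tree.  Label every
left branch 0 and every right branch 1, and each character's code is
the path from the root down to its leaf.  Rare characters get merged
early and sink deep into the tree, earning long codes; common
characters survive until late, stay near the root, and earn short
codes.)

Here is that construction sketched in Python.  The usage counts are
made up -- just one set of weights that reproduces the code lengths in
the tree above -- and the exact 0/1 labels that come out depend on how
ties are broken, but the code lengths (and therefore the bit counts)
match the example.

    import heapq

    # Hypothetical usage counts, most common (A) to least common (H).
    # Any counts with roughly this ranking give a tree of the same shape.
    freqs = {'A': 20, 'B': 17, 'C': 8, 'D': 7,
             'E': 3,  'F': 3,  'G': 2, 'H': 2}

    def build_codes(freqs):
        """Huffman's algorithm: repeatedly merge the two lightest nodes."""
        # Heap entries are (weight, tiebreak, tree); a tree is either a
        # single character or a (left, right) pair of subtrees.
        heap = [(w, i, ch) for i, (ch, w) in enumerate(freqs.items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            w1, _, t1 = heapq.heappop(heap)     # lightest node
            w2, _, t2 = heapq.heappop(heap)     # second-lightest node
            heapq.heappush(heap, (w1 + w2, tiebreak, (t1, t2)))
            tiebreak += 1
        codes = {}
        def walk(tree, path):                   # left edge = 0, right = 1
            if isinstance(tree, str):
                codes[tree] = path
            else:
                walk(tree[0], path + '0')
                walk(tree[1], path + '1')
        walk(heap[0][2], '')
        return codes

    codes = build_codes(freqs)

    def encode(text):
        return ''.join(codes[ch] for ch in text)

    def decode(bits):
        # Prefix-uniqueness at work: no code is a prefix of another, so
        # the first complete code we accumulate is always the right one.
        by_code = {code: ch for ch, code in codes.items()}
        out, cur = [], ''
        for bit in bits:
            cur += bit
            if cur in by_code:
                out.append(by_code[cur])
                cur = ''
        return ''.join(out)

    print(codes)                        # exact labels may vary
    print(len(encode('BBAHBD')))        # 15, vs 18 fixed-length
    print(decode(encode('BBAHBD')))     # BBAHBD -- it round-trips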
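As a sanity check on the "long run" claim: with the made-up counts
above, the tree costs (20+17)*2 + (8+7)*3 + (3+3+2+2)*4 = 159 bits for
62 characters, about 2.6 bits per character against a flat 3 for the
fixed-length code.  And the greedy merge really does build the best
possible prefix tree: the two rarest characters may as well be siblings
at the deepest level (swapping anything shallower down into their place
could only lengthen the total), and once they are merged into one node,
the same argument applies to the one-smaller alphabet.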