SOFORTH, a Forth system that will be ready Real Soon Now.
        
        The overall memory arrangement looks like this (NOT shown to
scale, and addresses grow upward):


    ----------------
   |     Text       |
   |     Input      |
   |     Buffer     |
   |________________|
   |                |
   |    _ Return    |  
   |   |  Data _    |
   |   |_ Stack |   |
   |      Stack_|   |
   |                |
   |                |
   | \/ \/ \/ \/ \/ |
   |                |
   |                |
   | /\ /\ /\ /\ /\ |
   |                |
   |   Dictionary   |
   |                |
   |                |
    ----------------

        The data and return stacks grow down toward the dictionary,
which grows up.  There is no particular place to store variables or
constants; they live in the dictionary and share namespace with all
other words.

        The data and return stacks are interleaved, so that they will
both be sensitive only to the dictionary size.

        This implementation assumes 32-bit words.  We'll worry about
portability later; there should be only a few places where it will
matter.


The Dictionary.
---------------

        A Dictionary entry looks like this:


   (        |--------|   =  1 byte     )


    ___________________________________
   |                                   |
   |       end of parameters (0)       |
   |________|________|________|________|
   |                                   |
   |                ...                |
   |________|________|________|________|
   |                                   |
   |        parameter field            |
   |________|________|________|________|
   |                                   |
   |        parameter field            |
   |________|________|________|________|
   |                                   |
   |        parameter field            |
   |________|________|________|________|
   |                                   |
   |        code pointer field         |
   |________|________|________|________|
   |                                   |
   |        L     I      N    K        |
   |________|________|________|________|
       ... six words omitted, names
              are 31 bits long...
    ________ ________ ________ ________
   |        |        |        |        |
   | name   | name   | name   | name   |
   |________|________|________|________|
   | pbit & |        |        |        |
   | length | name   | name   | name   |
   |________|________|________|________|


        (Addresses grow upward in the above diagram.)

        The address of the entry is considered to be the word whose
first byte contains the precedence bit and name length (the first word
of the entry).

        The first bit in that word is the precedence
(IMMEDIATE/NON_IMMEDIATE) bit.  The remaining 7 bits of its byte are
the length of that name.  Following this are the characters of the
name itself (up to 31 bytes).  The characters of the name are in
reverse order, for ease of use by C's string functions.

        Later on, we might have variable length names, but Keep It
Simple for now.

        Next comes the Link to the previous entry (the link is always
on the next word-boundary past the name), then the code-pointer, then
the parameter fields.

        The code pointer could point to one of several different kinds
of functions:

   - address_interpreter(): run down the parameter fields doing
     address interpreter things (more on this later).

   - a primitive (in which case there are no PFAs).

                or maybe one of these:

   - variable_handler(): the first parameter field contains the value
     to be pushed on the stack, all the code needs to do is push it
     there and set some counters correctly (more later).

   - string_handler(): first parameter field contains the length of
     the string, the remaining ones contain the characters in the
     string.  For now, the code will just print the string; later, we
     can have operations that actually use/modify the data.
     This code also has to set some counters correctly.


The Address Interpreter.
------------------------

(PC = "program counter"
 PF = "parameter field")

        The way the address_interpreter() works is this:

        The function it's looking at could be a primitive ("+", for
example, which adds the top two numbers on the stack), or it could be
a word defined in terms of other words.

        In either case, it records the caller's next PF on the return
stack (it obtains this by advancing the PC).  It then *sets* the PC to
point to the first PF of the word currently being executed, and starts
a loop.  The loop runs down the PFs, executing them, until it runs
across one that's zero, at which point the PC's old value is restored
from the return stack and address_interpreter() returns.

        Numbers are pushed on the stack by code which takes the next
PF (obtained by incrementing PC again) and pushes its value, then
increments PC again.


Interpretation and colon compilation.
------------------------------------

        Brodie says on page 284 of Starting Forth:

  "Here is what INTERPRET does, in plain English:

   Begin a loop.  Within the loop, scan the next word form the input
   stream and try to look up.  If it's defined, execute it, then make
   sure the stack has not underflowed.  (If it has, exit the loop and
   display an error message.)  If it's _not_ defined, try to convert it
   to a number, leaving it on the stack.  Then repeat the loop until the
   input stream is exhausted.

   Now let's compare the interpreter's structure with that of the colon
   compiler:

   Begin a loop.  Within the loop, scan the next word from the input
   stream and try to look up.  If it _is_ defined, then treat it as a
   word.  If the word is immediate, then execute it and make sure the
   stack has not underflowed.  If it is _not_ immediate, emplace its
   compilation address.  If the word is _not_ defined, try to convert it
   to a number, compiling it as a literal.  Then repeat the loop until
   the input stream is exhausted, or the system state returns to
   interpreter mode."

        This all sounds just dandy to me, and so it's what I do.


Conditionals.
-------------

        There are some branching primitives: BRANCH, BEQ, BNZ, and
BZE.  `IF' makes use of BRANCH and BZE (it could equally well have
used BNZ, but the code is slightly smaller this way).  An example
definition:

: bar if baz then ;

        The PFs for bar look like (starting from the left):

        _____________
       |             ||
       |             \/
BZE  ADDR1  BAZ  END_OF_PARAMS


        Actually, there's a little bit of fudgery that has to happen,
because the loop in address_interpreter() will automatically increment
the PC (and branching just works by setting the PC, after all).  So,
to compensate for what we know address_interpreter() is going to do to
the PC, the address actually stored would be this:

        ____
       |    ||
       |    \/
BZE  ADDR1  BAZ  END_OF_PARAMS


        So.  `IF' is an immediate word which first compiles in a
branch instruction, then a blank entry (leaving the address of that
entry on the return stack), and compiles in the words up to `THEN'.

        THEN, which really means `ENDIF' in Forth, is also an
immediate word.  It gets the blank PFA off the return stack (you
remember, the one left by IF), and stores the last PFA, where BAZ is
stored in the above example.  It doesn't store the PFA containing
END_OF_PARAMS because address_interpreter() will automatically
increment the PC to that PFA in a moment, and we must compensate for
that fact.

        `ELSE' is just a thief.  It steals the IF's PFA off the return
stack, stores its own address there, and leaves a different address
for THEN.  Neither IF nor THEN ever know the difference; in fact,
there could be a bunch of ELSE-like entities between an IF and its
THEN, usurping addresses heartlessly, and no one would ever be the
wiser.

: bar if baz else boo then ;


        The actual addresses would look like this (i.e.: I'm
incorporating the aforementioned fudgery into this diagram):

        _________________    ____
       |                 || |    ||
       |                 \/ |    \/
BZE  ADDR1  BAZ  BRANCH  ADDR2  BOO  END_OF_PARAMS


        Compare the compiled code for an IF-THEN sequence with the
code for an IF- THEN-ELSE.