This file documents icode, the language understood by the Minor
interpreter.  We interpret Scheme by compiling it to icode, and then
interpreting that.  The icode interpreter supports multiple value
return, exceptions, and source-level debugging.

Icode is supposed to be simple to understand, compile to, and
interpret.  It is not supposed to be efficient.  There's allocation
going on here out the wazoo, and that's fine.  I don't want to spend
time optimizing an interpreter that could be spent optimizing a native
code generator.

(icode->procedure ARGLIST ICODE ENV) builds a Scheme procedure object
that takes the arguments given by ARGLIST, and executes the icode
block ICODE in the environment ENV extended by bindings for the actual
parameters.

ARGLIST is either a list of symbols, or an improper list of symbols
terminated by a symbol; it's interpreted as it would be if it appeared
as the arguments of a lambda form.

ENV represents the environment the procedure has closed over.  It is
an association list mapping symbols to values; assigning to an
identifier does a 'set-cdr!' on the alist element for the symbol that
represents that identifier.  However, ENV may also contain, in
addition to SYMBOL/VALUE pairs, top-level environment objects.  These
are consulted for variable references and assignments in the obvious
way.

ICODE is a list of icode instructions.

The state of the icode interpreter is a quadruplet (I V R K), where:
- I is a list of icode instructions,
- V is a stack of intermediate values,
- R is the current environment of variable bindings, in the form
  given above for ENV, and
- K is the current procedure's continuation.

The following gives the details of how I, V, and K work.  It's a
series of rules, each of which describes one step the interpreter
could take.  Each rule has the form PRE => POST: if the interpreter is
currently in a state matching the pattern PRE, then POST describes the
state it should be in at the next step.  For example, suppose the
current state is (((quote hello) (return)) () R K) for some R and K.
The only rule whose left-hand side matches that state is this one:

    (((quote D) I' ...) (V ...) R K)
      => ((I' ...) (D V ...) R K)

It matches with D == hello, (I' ...) == ((return)), and (V ...) == ().
So substituting those values into the POST part of the rule, we see
that the next state would be (((return)) (hello) R K).


(((quote D) I' ...) (V ...) R K)
  => ((I' ...) (D V ...) R K)
(((ref S) I' ...) (V ...) R K)
  => ((I' ...) (V' V ...) R K)
     where V' is S's value in R.
(((set S) I' ...) (V V' ...) R K)
  => ((I' ...) (V V' ...) R K)
     where the value of S in R has been set to V.
(((if (T ...) (E ...))) (#f V ...) R K)
  => ((E ...) (V ...) R K)
(((if (T ...) (E ...))) (C V ...) R K)
  => ((T ...) (V ...) R K)
(((pop) I' ...) (V V' ...) R K)
  => ((I' ...) (V' ...) R K)
(((unpack) I' ...) ((A ...) V ...) R K)
  => ((I' ...) (A ... V ...) R K)
(((proc L (I'' ...)) I' ...) (V ...) R K)
  => ((I' ...) (P V ...) R K)
     where P is (icode->procedure L (I'' ...) R)
(((tail-call)) (P A ...) R K)
  => ((I' ...) () R' K)
     where P is a procedure,
           I' is P's instruction list,
       and R' is P's environment, extended with bindings for the 
                 arguments (A ...).
(((call N M) I' ...) (P A ... V ...) R (F ...))
  => (((tail-call)) (P A ...) R ((((call N M) I' ...) (V ...) R) F ...))
     where (A ...) is N elements long.
(((return)) (A ...) R ((((call N M) I' ...) (V ...) R') F ...))
  => ((I' ...) (A ... V ...) R' F ...)
     where M is a number and (A ...) is M elements long
(((return)) (A ...) R ((((call N #f) I' ...) () R') F ...))
  => ((I' ...) (A ...) R' (F ...))
(((get-cont) I' ...) (V ...) R K)
  => ((I' ...) (K V ...) R K)
(((set-cont) I' ...) (K' V ...) R K)
  => ((I' ...) (V ...) R K')
(((catch (I'' ...)) I' ...) V R K)
  => ((I' ...) V R K)
(((throw)) (X) R ((((call N M) (catch (I'' ...)) I' ...) (V ...) R') F ...))
  => ((I'' ...) (X) R' (F ...))
(((throw)) (X) R (((I' ...) (V ...) R') F ...))
  => (((throw)) (X) R (F ...))
(((c-call) I' ...) (C A ...) R K)
  => ((I' ...) (A' ...) R K)
     where A' ... are the return values from applying the C procedure C to
              the arguments A ... .
(((c-return)) (A ...) R (C))
  => applies the C continuation object C to the return values A ... .

Here are definitions for the tricky control procedures I can think of,
using icode->procedure:

;;; Simple, two-argument version.
(define apply
  (icode->procedure '(f a)
                    '((ref a) (unpack) (ref f) (tail-call))
                    '()))
(define call/cc
  (icode->procedure '(f)
                    '((get-cont)
                      (proc (k) ((proc a ((ref k)
                                          (set-cont)
                                          (ref a)
                                          (unpack)
                                          (return)))
                                 (ref f)
                                 (tail-call)))
                      (tail-call))
                    '()))
(define call-with-values
  (icode->procedure '(t r)
                    '((ref t) (call 0 #f) (ref r) (tail-call))
                    '()))
(define (values . args)
  (call/cc (lambda (k) (apply k args))))
(define catch
  (icode->procedure '(b h)
                    '((ref b)
                      (call 0 #f)
                      (catch ((ref h)
                              (tail-call)))
                      (return))))
(define throw
  (icode->procedure '(x)
                    '((ref x)
                      (throw))))

It's a pity that catch involves so much allocation: two procedures,
the multiple value return list which is almost always going to be
unnecessary, etc.  But we're allocating a pair for every single
intermediate value, so this may not even be worth worrying about.


Source-level debugging requires us to be able to map between the
current state of the interpreter and a source code location.  Since
each instruction in icode is a proper list, with a fixed number of
operands (if any), we can store source-location information in the
last cdr of the instruction.

Source-level debugging also requires us to be able to find the
variables in scope at any point in the program.  Since the environment
is represented as an alist, the variables currently in scope are the
cars of the pairs in the alist.