This file documents icode, the language understood by the Minor interpreter. We interpret Scheme by compiling it to icode, and then interpreting that. The icode interpreter supports multiple value return, exceptions, and source-level debugging. Icode is supposed to be simple to understand, compile to, and interpret. It is not supposed to be efficient. There's allocation going on here out the wazoo, and that's fine. I don't want to spend time optimizing an interpreter that could be spent optimizing a native code generator. (icode->procedure ARGLIST ICODE ENV) builds a Scheme procedure object that takes the arguments given by ARGLIST, and executes the icode block ICODE in the environment ENV extended by bindings for the actual parameters. ARGLIST is either a list of symbols, or an improper list of symbols terminated by a symbol; it's interpreted as it would be if it appeared as the arguments of a lambda form. ENV represents the environment the procedure has closed over. It is an association list mapping symbols to values; assigning to an identifier does a 'set-cdr!' on the alist element for the symbol that represents that identifier. However, ENV may also contain, in addition to SYMBOL/VALUE pairs, top-level environment objects. These are consulted for variable references and assignments in the obvious way. ICODE is a list of icode instructions. The state of the icode interpreter is a quadruplet (I V R K), where: - I is a list of icode instructions, - V is a stack of intermediate values, - R is the current environment of variable bindings, in the form given above for ENV, and - K is the current procedure's continuation. The following gives the details of how I, V, and K work. It's a series of rules, each of which describes one step the interpreter could take. Each rule has the form PRE => POST: if the interpreter is currently in a state matching the pattern PRE, then POST describes the state it should be in at the next step. For example, suppose the current state is (((quote hello) (return)) () R K) for some R and K. The only rule whose left-hand side matches that state is this one: (((quote D) I' ...) (V ...) R K) => ((I' ...) (D V ...) R K) It matches with D == hello, (I' ...) == ((return)), and (V ...) == (). So substituting those values into the POST part of the rule, we see that the next state would be (((return)) (hello) R K). (((quote D) I' ...) (V ...) R K) => ((I' ...) (D V ...) R K) (((ref S) I' ...) (V ...) R K) => ((I' ...) (V' V ...) R K) where V' is S's value in R. (((set S) I' ...) (V V' ...) R K) => ((I' ...) (V V' ...) R K) where the value of S in R has been set to V. (((if (T ...) (E ...))) (#f V ...) R K) => ((E ...) (V ...) R K) (((if (T ...) (E ...))) (C V ...) R K) => ((T ...) (V ...) R K) (((pop) I' ...) (V V' ...) R K) => ((I' ...) (V' ...) R K) (((unpack) I' ...) ((A ...) V ...) R K) => ((I' ...) (A ... V ...) R K) (((proc L (I'' ...)) I' ...) (V ...) R K) => ((I' ...) (P V ...) R K) where P is (icode->procedure L (I'' ...) R) (((tail-call)) (P A ...) R K) => ((I' ...) () R' K) where P is a procedure, I' is P's instruction list, and R' is P's environment, extended with bindings for the arguments (A ...). (((call N M) I' ...) (P A ... V ...) R (F ...)) => (((tail-call)) (P A ...) R ((((call N M) I' ...) (V ...) R) F ...)) where (A ...) is N elements long. (((return)) (A ...) R ((((call N M) I' ...) (V ...) R') F ...)) => ((I' ...) (A ... V ...) R' F ...) where M is a number and (A ...) is M elements long (((return)) (A ...) R ((((call N #f) I' ...) () R') F ...)) => ((I' ...) (A ...) R' (F ...)) (((get-cont) I' ...) (V ...) R K) => ((I' ...) (K V ...) R K) (((set-cont) I' ...) (K' V ...) R K) => ((I' ...) (V ...) R K') (((catch (I'' ...)) I' ...) V R K) => ((I' ...) V R K) (((throw)) (X) R ((((call N M) (catch (I'' ...)) I' ...) (V ...) R') F ...)) => ((I'' ...) (X) R' (F ...)) (((throw)) (X) R (((I' ...) (V ...) R') F ...)) => (((throw)) (X) R (F ...)) (((c-call) I' ...) (C A ...) R K) => ((I' ...) (A' ...) R K) where A' ... are the return values from applying the C procedure C to the arguments A ... . (((c-return)) (A ...) R (C)) => applies the C continuation object C to the return values A ... . Here are definitions for the tricky control procedures I can think of, using icode->procedure: ;;; Simple, two-argument version. (define apply (icode->procedure '(f a) '((ref a) (unpack) (ref f) (tail-call)) '())) (define call/cc (icode->procedure '(f) '((get-cont) (proc (k) ((proc a ((ref k) (set-cont) (ref a) (unpack) (return))) (ref f) (tail-call))) (tail-call)) '())) (define call-with-values (icode->procedure '(t r) '((ref t) (call 0 #f) (ref r) (tail-call)) '())) (define (values . args) (call/cc (lambda (k) (apply k args)))) (define catch (icode->procedure '(b h) '((ref b) (call 0 #f) (catch ((ref h) (tail-call))) (return)))) (define throw (icode->procedure '(x) '((ref x) (throw)))) It's a pity that catch involves so much allocation: two procedures, the multiple value return list which is almost always going to be unnecessary, etc. But we're allocating a pair for every single intermediate value, so this may not even be worth worrying about. Source-level debugging requires us to be able to map between the current state of the interpreter and a source code location. Since each instruction in icode is a proper list, with a fixed number of operands (if any), we can store source-location information in the last cdr of the instruction. Source-level debugging also requires us to be able to find the variables in scope at any point in the program. Since the environment is represented as an alist, the variables currently in scope are the cars of the pairs in the alist.