The Design of Minor Scheme                                  -*- mode: text -*-
==========================

Minor Scheme is an implementation of the Scheme programming language.  Its features include:

- good performance, via:
  + just-in-time and ahead-of-time compilers generating native machine code, and
  + precise, generational garbage collection,
- full support for native system threads,
- full source-level debugging,
- a C API and a foreign function interface, for interfacing with C code,
- a module system which supports separate compilation and cross-compilation, and
- a hygienic macro system.

Minor currently requires:

- Either:
  + An IA-32 family processor, 80486 or later (the 80386 does not guarantee atomicity of word stores in an SMP configuration), or
  + An AMD64 family processor.
- POSIX (IEEE Std 1003.1, 2003 Edition), with the following optional functionality (marked as indicated in the margins of the spec):
  + X/Open System Interface Extensions ("XSI") (the sigaction interface, in particular)
  + Threads ("THR")
  + Semaphores ("SEM")
- A version of GNU CC implementing ISO C (ISO/IEC 9899:1999 (E), commonly called "C99").
- Versions of GNU CC, the GNU Binutils, and the GNU C library that support the '__thread' thread-local storage class.

I'm certainly interested in patches needed to make Minor work with other versions of this software.  As far as portability is concerned, I'm interested in changing Minor's interfaces to system-specific facilities as needed to make Minor easier to port, but beyond that, I want to invest my own time in supporting the machines I actually develop on.  I'll leave writing code for other systems to interested parties.

Minor is being developed on the following two systems:

- A Pentium II running Fedora Core 4
- An AMD64 running Fedora Core 4

Fedora Core 4 uses:

- Linux 2.6.14
- GNU libc 2.3.3
- GCC 4.0.1
- Binutils 2.15.94.0.2.2

The packages in Fedora Core generally include patches against their upstream sources.


* Minor's Goals

** Basic Requirements

Here are the basic requirements and goals for Minor Scheme.  These are all supposed to be characteristics apparent to the user, not strategies for providing such characteristics (e.g., instead of "generates native code", we would say "provides good performance").

- Minor Scheme should provide hygienic macros that can do arbitrary computation, and a Flatt-style module system that supports predictable separate compilation in the presence of macros.
- Minor Scheme should support the full Scheme numeric tower, including arbitrary-precision integers and rational numbers.
- Minor Scheme should support a full set of continuation operators, including Queinnec's control operators and one-shot continuations.
- Minor Scheme should use modern Unicode (i.e., including those characters with codes above 64k) to represent strings.  It should provide a separate 'byte array' type for working with things other than text.
- Minor Scheme should provide full source-position tracking, both for compile-time and run-time errors.  This should include tracking code introduced by macro expansions.
- Minor Scheme should support source-level debugging.
- Minor Scheme should have a good interactive user interface, with legible, helpful error messages.
- Minor Scheme should be easy to use as a script interpreter.
- Minor Scheme should allow the use of native system threads, on both uniprocessor and multiprocessor systems.
- Minor Scheme should provide a thread-aware C API, to allow C code to call Scheme code.
  The API's design should not preclude the use of copying, generational, incremental, or concurrent garbage collection.
- Minor Scheme should provide a full foreign function interface, to allow Scheme code to call C code.
- Minor Scheme should have a full interface to POSIX system calls, Linux system calls, and other essential system libraries.
- Minor Scheme should fit well into other build systems: the ahead-of-time compiler should produce .o files that the user can link into their programs or shared libraries in the normal way, perhaps with reference to some run-time Minor libraries as well.
- Minor Scheme should facilitate the distribution of Scheme libraries separately from the Minor Scheme distribution itself.
- It should be possible to build Minor Scheme from source with the standard 'configure; make' sequence.  The distribution should include no binaries or other object files; every file should be a source file.
- Minor Scheme should have good performance.  It should compare well with Bigloo, Chez, Chicken, MIT Scheme, MzScheme, RScheme, and SCM --- both in interactive performance, and in the speed of pre-compiled code.

** Secondary Goals

Once the basic requirements have been met, here are other goals that are necessary for a 1.0 release:

- Minor Scheme should have an interface to the Gtk 2.0 graphical user interface toolkit.
- Minor Scheme should have equivalents for 'lex' and 'yacc', for constructing custom lexers and parsers.
- Minor Scheme should have basic support for parsing and generating XML.
- Minor Scheme should have a full, efficient regular expression library, supporting both greedy and non-greedy operators.  Repetition operators should be capable of returning lists of match values, not just a fixed set.

** Basic Non-Requirements

Here are considerations that are not important for Minor Scheme:

- Minor Scheme need not avoid taking advantage of system-specific facilities.  It is not meant to run out-of-the-box on any ISO C, POSIX, etc. system.
- Minor Scheme need not support LinuxThreads.  It may require the new "Native POSIX Threads Library" thread system, which properly implements the POSIX 'pthread' interface.

* Tactics

Here are the specific approaches we'll use to achieve the goals above.

** GC, then interpreter, then compiler

We'll produce Minor in the following steps:

- Write the GC first.  SMP-safe and everything.

- Many of the functions in the C API are just constructors and accessors, and are trivial given a GC.  Those are next.  Obviously, 'mn_eval' is not in this category.

- Implement an interpreter in C for 'core Scheme', the language produced by macro expansion.  Write this on top of the C API; since the API allows you to define Scheme procedures in C, this will be indistinguishable at the user level from the native-code implementation.  (I have this feeling modules are involved here.  What do the free variables in a macro-expanded program refer to?)

- Implement, in core Scheme, the macro expander and module system.  Effectively, this implements full Scheme by translation into core Scheme.  Extend the C runtime as necessary, but try to write as much as possible in Scheme, since the performance of the interpreter is not important, and the Scheme code can be shared with later stages.

- Implement an assembler.  This has two parts:

  - an arch-independent part: just a library for putting together byte strings that contain references to labels (blocks, labels, relocs, annotations, etc.)
  - an arch-dependent part: this emits machine-code representations of some architecture's instructions, using the arch-independent facilities.

  The result of the assembly process is a set of blocks of bytes, each annotated with the labels it defines, the labels it refers to, and relocs saying how to patch the latter's values into the blocks.  This is what the machine code -> procedure API expects; it's also what the ELF reader and writer (see below) operate on.

- Implement machine-code procedures.  This entails designing:

  - representations for machine-code procedures
  - calling/returning conventions for Scheme procedures
  - representations for Scheme continuations
  - representations for C continuations in Scheme
  - code->register use mappings
  - code->heap use mappings
  - code->unwinding info mappings (for backtraces and exceptions)
  - code<->source location mappings (for debugging and error messages)
  - code->environment mappings (for debugging)
  - code->catch block mappings
  - an API to let Scheme code wrap up hunks of machine code as actual procedure objects, annotated as above

  Designing the code annotations will be a challenge: if represented directly, they will be enormous.  Diwan, Moss, and Hudson describe their experiences: they had to go to some length to get the annotations down to 16% of the code size, using a safe-GC-points approach.  How to allow GC to happen at arbitrary instruction boundaries, I don't yet know.

- Implement a JIT compiler for core Scheme in full Scheme.

- Implement separate compilation, so that the standard Unix linker can link separately compiled modules into a running program.  Use ordinary ELF relocatable object files to represent separately compiled modules.

- Build an interpreter-free Scheme executable, by compiling the JIT compiler to ELF files and then linking them in.  Now you have an interactive, native-code, JIT compiler!

- Once the compiler can generate functions directly callable from C, we can re-implement the Minor C API functions in Scheme.  Since code generated by the Minor compiler is annotated code, it won't require synchronization as control enters and leaves incoherent sections, so the small functions (like mn_car and mn_cdr) will probably be faster than the C versions.

- Build all sorts of packages (RPMs, Debian .debs, Windows zip files, and so on) to get a sense of how we can help packaging go smoothly.  However, there should also be a turnkey way to build source Scheme packages --- ./configure && make, perhaps, with additional conventions for where libraries get installed.

** Multi-threaded, generational GC

Minor should use generational, non-incremental garbage collection.  Generational collection reduces both total collection time and typical pause time.  Incremental collection makes the maximum pause time quite short, but typically at the cost of increased overall run time; I don't think it's the right trade for most applications, especially since some non-incremental (but generational) collectors already seem to have almost no noticeable collection pauses.

The collector must support multi-threaded programs.  However, this is _not_ the same as supporting concurrent collection, where one thread can collect while other threads run mutator code.  As far as I can tell, concurrent collection entails the same sorts of performance costs as incremental collection.  Instead, the Minor collector will stop all mutator threads, collect, and then continue.
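As a rough sketch of that stop-the-world discipline (not a committed design; the names mn_gc_safe_point, mn_gc_collect, and the counters below are all hypothetical), the rendezvous between the collecting thread and the registered mutator threads might look something like this:

    /* Hypothetical sketch only: a stop-the-world rendezvous.  Each
       registered mutator thread polls at safe points (allocation is the
       obvious one); whichever thread wants to collect raises a flag,
       waits for the others to park, collects, and releases them.  */

    #include <pthread.h>

    static pthread_mutex_t gc_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t gc_cond = PTHREAD_COND_INITIALIZER;
    static int gc_stop_requested;   /* set while a collection is pending  */
    static int gc_threads_running;  /* registered threads not yet parked  */

    /* Park the calling thread until the pending collection finishes.
       Must be called with gc_lock held.  */
    static void
    gc_park (void)
    {
      gc_threads_running--;
      pthread_cond_broadcast (&gc_cond);
      while (gc_stop_requested)
        pthread_cond_wait (&gc_cond, &gc_lock);
      gc_threads_running++;
    }

    /* Called by mutator threads at each safe point.  */
    void
    mn_gc_safe_point (void)
    {
      pthread_mutex_lock (&gc_lock);
      if (gc_stop_requested)
        gc_park ();
      pthread_mutex_unlock (&gc_lock);
    }

    /* Called by whichever thread decides a collection is needed.  */
    void
    mn_gc_collect (void)
    {
      pthread_mutex_lock (&gc_lock);
      while (gc_stop_requested)     /* someone else is already collecting */
        gc_park ();
      gc_stop_requested = 1;
      gc_threads_running--;         /* this thread is parked, by definition */
      while (gc_threads_running > 0)
        pthread_cond_wait (&gc_cond, &gc_lock);
      /* ... trace the registered threads' roots and collect here ...  */
      gc_threads_running++;
      gc_stop_requested = 0;
      pthread_cond_broadcast (&gc_cond);
      pthread_mutex_unlock (&gc_lock);
    }

A thread would increment gc_threads_running when it registers with the collector and decrement it when it unregisters; the only point of the sketch is that collection begins once every registered thread has parked at a safe point.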
The C API in include/minor/minor.h effectively requires each thread to register itself with the GC --- every function expects an mn_call argument, and call arguments belong only to a specific thread.  This means the collector has the information it needs to find all the threads that might be touching the heap, and stop them if they are.

The first collector will be a copying collector, but on today's machines memory is so slow that non-copying collectors may have the advantage.  The only catch is: how do you represent generations without copying?  If you set aside two bits per object (four generations), that's more than 3% overhead, just for generational information.  Hence the traditional approach of using the area an object lives in to indicate its generation --- which requires copying.

One half-and-half idea would be to have a single bit per object (1.6% overhead) lying around in a table off to the side that could act as the lowest bit of the generation number, so that every even->odd promotion could be done without copying, just by setting the bit.  Every odd->even promotion would actually copy, and clear the bit.  That'd give objects more time to die before we take the time to copy them.

Anyway, once we've got something running, there will be lots of room for improvement.

** Just-in-time compiler

To get competitive performance, Minor Scheme will do just-in-time compilation to native code.  For ideas, we can look at Mono's code, read the papers they refer to, and look at Dybvig's papers about Chez Scheme.

I've heard that Dybvig has the following rule for Chez Scheme: he only includes an optimization if, when the compiler compiles itself, the optimization speeds the compiler up enough to overcome the cost of performing it.  In other words, the compiler generates better code, but never gets slower --- each optimization "pays for itself".  This is a great JIT rule.  However, it does preclude optimizations that would only benefit non-compiler-like code: floating-point unboxing, SIMD detection, and so on.  We'd have to find a different rule to decide whether to include these sorts of things.  The rule only applies to optimizations: I hear that hygienic macros slowed down the compiler a lot --- but they certainly were included.

** Machine code verifier

I don't know whether this is really a good idea.  But, as I've been playing with prologue analyzers in GDB, it seems to me that most compiler-generated machine code is pretty easy to analyze and prove type-correct.  This is especially true for dumb compilers, but it looks to me like it's also true for the output of optimizing compilers.  So it should be possible to have a pass that runs after the compiler and assembler, analyzes the machine code, and verifies that the compiler isn't using uninitialized registers, is untagging and GC-protecting values properly, and so on.  George Necula did something like this for his Proof-Carrying Code system, and he said it was surprisingly helpful at catching compiler bugs: when the compiler generates bad code, it won't type-check.  With my assembler/disassembler stuff (in particular, the insn-case form), I think this might be much easier than one would expect, and very helpful.

Another thought here: rather than propagating detailed GC-tracking information on every node through the compiler, perhaps we could propagate it only on selected nodes (those that become labeled instructions, say), and then re-infer the GC annotations after machine code has been generated.
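To make the annotation problem concrete, here is one possible shape for those per-code GC annotations: a table keyed by instruction-offset ranges, recording which registers and stack slots hold heap references at each point.  This is purely illustrative; none of the type or field names below are settled, and a real table would be packed far more densely.

    /* Hypothetical sketch of per-code GC annotations: for each range of
       instruction offsets within a code block, record which registers
       and stack slots hold live heap references.  A verifier or
       inference pass would reconstruct entries like these from the
       machine code itself.  */

    #include <stddef.h>
    #include <stdint.h>

    struct mn_gc_map_entry
    {
      uint32_t start_offset;        /* half-open range of code offsets      */
      uint32_t end_offset;
      uint32_t live_regs;           /* bitmap: registers holding pointers   */
      uint32_t frame_words;         /* size of the stack frame, in words    */
      const uint8_t *slot_map;      /* bitmap: frame slots holding pointers */
    };

    struct mn_code_annotations
    {
      void *code_start;             /* base address of the code block       */
      size_t code_size;
      size_t n_entries;
      const struct mn_gc_map_entry *entries;  /* sorted by start_offset     */
    };

    /* Find the GC map entry covering a given code address, if any.  */
    const struct mn_gc_map_entry *
    mn_find_gc_map (const struct mn_code_annotations *a, void *pc)
    {
      size_t offset = (size_t) ((char *) pc - (char *) a->code_start);
      size_t i;

      for (i = 0; i < a->n_entries; i++)
        if (a->entries[i].start_offset <= offset
            && offset < a->entries[i].end_offset)
          return &a->entries[i];
      return NULL;
    }

Whether entries like these must cover arbitrary instruction boundaries or only designated safe points is exactly the size trade-off discussed above.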
That GC-annotation inference pass would be the better part of a machine-code type-checker.

** C interface

To make it easier to use Minor Scheme within existing build systems, and with existing tools, the Minor ahead-of-time compiler should produce ordinary ELF relocatable object files (".o files") that the user can link with .o files generated by other compilers to produce working mixed-language executables.  The user will probably need to link against a run-time library, too ("-lminor").  The Minor AOT compiler should also be able to produce header files containing declarations for the functions, variables, and other constructs the Scheme module exports.

For example, the following makefile should work:

    %.o: %.scm
            minor-compile -c $<

    %.h: %.scm
            minor-compile --header $<

    prog: main.o gcd.o
            cc main.o gcd.o -lminor -o prog

    main.o: main.c gcd.h

    gcd.o gcd.h: gcd.scm

Here is main.c:

    #include <stdio.h>
    #include <stdlib.h>
    #include "gcd.h"

    int
    main (int argc, char **argv)
    {
      printf ("%d\n", gcd (atoi (argv[1]), atoi (argv[2])));
      return 0;
    }

Here is gcd.scm:

    (module gcd Minor
      (import c-ffi)

      (define (gcd a b)
        (if (= b 0)
            a
            (gcd b (remainder a b))))

      ;; Export the Scheme function gcd as a C function named "gcd",
      ;; which expects two int arguments, and returns an int.  The
      ;; C function's arguments are passed to the Scheme function in
      ;; the obvious way.
      (export-c-function int "gcd" (int int) (gcd ...)))

The command "minor-compile --header gcd.scm" should produce the file "gcd.h", containing something like the following:

    /* gcd.h --- C functions exported by gcd.scm
       Generated by minor-compile 0.0 as follows:
         minor-compile --header gcd.scm  */
    int gcd (int, int);

The build process should behave like this:

    $ make
    minor-compile --header gcd.scm
    cc -c main.c
    minor-compile -c gcd.scm
    cc main.o gcd.o -lminor -o prog
    $ ./prog 12 16
    4
    $

The 'export-c-function' form takes care of generating code for the C-visible function that converts the incoming arguments to Scheme objects, and the outgoing return value to a C value.  As used here, the wrapper will also make sure Minor has been initialized (i.e., call mn_init), and find an appropriate mn_call object to use for the conversions and the call.  Options to 'export-c-function' not shown here will allow C code to pass an mn_call object explicitly; in that case, the initialization check will be unnecessary.

So, to create a stand-alone executable in Scheme, one could write:

    (module gcd2 minor
      (import c-ffi)

      (define (gcd a b)
        (if (= b 0)
            a
            (gcd b (remainder a b))))

      (define (main argv)
        (display (gcd (string->number (list-ref argv 1))
                      (string->number (list-ref argv 2))))
        (newline))

      ;; Export the Scheme function main as a C function named "main".
      ;; The C function takes two arguments: an int named "argc" and a
      ;; char ** named "argv".  argv is an array whose length is argc,
      ;; where each element is a null-terminated string of chars.
      (export-c-function int "main" ((int argc) ((list c-string argc) argv))
        ;; The C "main" function converts its argv to the natural
        ;; corresponding Scheme type (a list of strings), and passes
        ;; only that list to the Scheme main; argc is unnecessary in
        ;; Scheme.
        (main argv)))

One could compile and run this file as follows:

    $ minor-compile -c gcd2.scm
    $ cc gcd2.o -lminor -o gcd2
    $ ./gcd2 30 40
    10
    $

*** Static Checking / Rechecking

The C interface should be designed to allow drift between underlying interfaces and generated interfaces to be detected at compile-time.
The source code should include a complete description of the generated interface, how its values are produced from the underlying interface's values, etc.  The generation pass should always check this against the underlying code.  On the Scheme side, of course, it can be impossible to tell statically what types are going to be produced, but we should do the best we can with the information we have.

* The Minor ABI

The interface for loading new machine code into a running Minor program, and the conventions for producing ELF object files to be linked against the Minor run-time, are public and documented interfaces.  Minor's Scheme compiler is not privileged code; it is simply another client of those interfaces.  (I'm not sure there will ever be any other clients, but this is still an important interface, and one worth keeping simple and documenting.)

As described above in "C interface", the Minor ahead-of-time compiler translates Scheme modules into ordinary ELF relocatable object files (.o files), which can be linked using the standard system linker to produce a working executable.  The Minor ABI describes the form relocatable object files should have in order to be linked against the Minor runtime, or loaded into an existing Minor session.

* Implementation

** Macros

** Modules

** Compilation

*** Optimizations

*** Code Generation

** Allocation and Garbage Collection

*** Multi-threaded collection

Right now, the Minor garbage collector stops all threads when any thread needs to collect.  However, if each thread allocates in its own nursery, then inter-nursery pointers can only be created by mutations, which we could track.  Then each thread could collect its own nursery without stopping any other threads.

The challenge is allowing other threads to access our nursery while we collect it: normal forwarding pointers, stored in the objects themselves, would look like corruption to other mutator code.  But the nursery can put forwarding pointers outside objects (say, in the word before the object), so the objects remain usable as we scavenge them.  This non-standard representation is easy to deal with:

- Mutators always allocate in the nursery, so mutator code will always construct non-standard objects.
- The collector switches from scanning one generation to another at well-defined points, so we can easily have a custom scanning function for the nursery.

We can initialize the nursery forwarding pointers when the block is allocated, so the mutator can just skip over those words as it allocates: the number of stores needed for an allocation won't increase.  Larger objects will increase the nursery cache footprint, though.

We'll need to put off updating references from other nurseries until after scavenging is complete: until then, the to-space objects aren't ready for use.  And we'll need a write barrier between the completion of scavenging and the first inter-nursery update, so the other mutator threads will see the effects of our scavenging.

This idea seems pretty straightforward; if you know of papers describing it, please let me know.

Actually, this is completely wrong-headed.  But maybe it can be saved, so I'm not going to delete it.

- We can't let other threads use the old copy of our nursery while we collect; what if they mutate the old copy?
- It's not true that inter-nursery references are only created by side effects.  The *first* inter-nursery reference must be created by a side-effect, but once it's there, the other thread can copy the reference all over the place, including to other threads.
Now, the only way a reference to one of thread X's nursery objects can be created is if thread X creates one.  What if we pull other threads into the collection only if they could possibly have seen a reference to one of our nursery objects?

*** Write Barriers

*** Code Annotations

** Object File Handling

*** generic ELF assembler

*** IA-32 assembler

*** DWARF assembler

** Bootstrap interpreter

** Unicode

Using Unicode in strings will require some extra processing time for converting strings to and from Unicode.  However, this seems like a worthwhile price to pay for having a single, powerful string representation within Minor itself.

A more serious drawback is that the conversion into Unicode, and then back to the outside world's representation, may not preserve the string byte-for-byte when the external encoding uses meta-information that isn't retained in Unicode.  Unicode promises a clean round trip for individual characters, not for the meta-information used to represent those characters.  For example, a string using a stateful encoding may include a sequence to switch into one state, but then immediately switch into another state, before producing any actual characters in the first state.  Such a transition has no representation in Unicode; Unicode only represents characters, not state transitions.  So a round trip from a stateful encoding to Unicode and back would drop the first (useless) state transition.  The sequence of actual characters is preserved, but not the exact sequence of bytes used to encode it.

Of course, if the end consumer is only looking at characters, then this loss of information is no problem: Unicode goes to great lengths to preserve them.  But many operating system kernels do not put filenames in any canonical form before looking them up; they simply treat them as opaque byte strings.  Or if the string is to be used as a key in a database, the database might also treat it as an opaque string of bytes.  To preserve strings byte-for-byte in cases like these, Minor developers would need to hold the information in byte vectors, not strings.  But since a developer generally has no idea a priori what character sets his code will be used with, this creates an incentive for developers to simply use byte vectors everywhere, to avoid mangling users' strings.

There are several things to point out about this concern:

- It's a bad idea to use stateful encodings for filenames in the first place: you could end up with any number of filenames in a directory that look identical, differing only in the arrangement of the meta-information in the names.  I suspect that most Unix systems avoid them for this reason; it would be good to ask Handa-san about this.  (For what it's worth, POSIX says that the behavior of the standard utilities (shell commands) and System Interface functions on strings using stateful encodings is implementation-dependent.  The POSIX Character Set Description File format cannot describe such encodings.)

- Any C program which converts 'char' strings to 'wchar_t' strings using the C library's multi-byte string handling functions will have the same behavior: it will preserve the sequence of characters, not the sequence of bytes.

- Java programs will have this problem, too.  People don't mind Java, right?  (It would be good to talk to Tom Tromey about this.)

It's possible to side-step the problem by having two distinct string encodings within Minor: one for strings in the external character set, which are not translated, and one for strings in Unicode.
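Just to make the cost of that side-step concrete, a dual-representation string type might look something like the sketch below (all of the names are illustrative, not a proposed Minor API); every string primitive then has to dispatch on, or convert between, the two representations:

    /* Hypothetical sketch of the "two string representations" idea: a
       string is either untranslated bytes in the external encoding or
       a sequence of Unicode code points, and every primitive has to
       dispatch on which kind it was handed.  Illustrative names only.  */

    #include <stddef.h>
    #include <stdint.h>

    enum mn_string_kind
    {
      MN_STRING_EXTERNAL,       /* raw bytes, never translated */
      MN_STRING_UNICODE         /* Unicode code points         */
    };

    struct mn_string
    {
      enum mn_string_kind kind;
      size_t length;            /* bytes or code points, depending on kind */
      union
      {
        const uint8_t *bytes;   /* MN_STRING_EXTERNAL */
        const uint32_t *chars;  /* MN_STRING_UNICODE  */
      } u;
    };

    /* Even the simplest primitive ends up dispatching on the kind.  */
    uint32_t
    mn_string_ref (const struct mn_string *s, size_t i)
    {
      if (s->kind == MN_STRING_UNICODE)
        return s->u.chars[i];
      /* For external strings, indexing by character may mean decoding
         the bytes, and tracking shift state for stateful encodings;
         treating each byte as a character is only right for
         single-byte encodings.  */
      return s->u.bytes[i];
    }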
But what happens when you append one type of string to the other?  Obviously some conversion happens, but in which direction?  Converting both to Unicode would mean that innocent-looking operations (like appending a "/", say) would result in conversions.  Converting both to the external encoding would require knowing what shift state the external-encoding string ended in, which is a mess.  And library functions like regular expression matchers would need to be duplicated, or make some sort of dynamic decision about how to access string contents.

In this light, I think the benefits of having a single, uniform, powerful internal representation outweigh the drawbacks.

* Infrastructure

** Bug database

** Per-commit regression testing

** Nightly benchmarking

* Readings

Olin Shivers.  Control-Flow Analysis of Higher-Order Languages.  Ph.D. dissertation, Carnegie Mellon University, May 1991.  Technical Report CMU-CS-91-145, School of Computer Science.
ftp://cs.cmu.edu/afs%2Fcs.cmu.edu%2Fuser%2Fshivers%2Flib%2Fpapers/diss.ps.Z

    This one is really essential.  Lambda is so powerful and hairy you can go crazy trying to think of accurate ways to analyze code built from it, but this thesis is a completely legible explanation of an open-ended analysis technique that is clearly correct, because it is simply a conservative approximation of actually interpreting the code.  If you can write an interpreter and be confident it's correct, then you can write a flow analyzer.  Handy caveats included.  It was the inspiration for my new prologue analysis framework in GDB.  I think a lot of different analyses can be subsumed by this one.  I don't see why we really need to use CPS.

Flatt, M.  Composable and compilable macros: You want it when?  In Proceedings of the ACM SIGPLAN International Conference on Functional Programming, 2002.
http://citeseer.nj.nec.com/flatt02composable.html

    This paper is the first one I've seen that really nails why separate compilation can be kind of hairy for languages based on powerful macro systems.  I'm not sure we want to use this macro / module system, but we've got to address the points it raises.

Andre van Tonder.  SRFI-72: Hygienic macros.
http://srfi.schemers.org/srfi-72/srfi-72.html

    Hygienic macros are very hard to explain.  call/cc is too, but after working with it for a while, I feel like I understand it well, and I'm convinced it's the Right Thing.  I've not yet reached the same degree of comfort with hygienic macros.  I see a lot of confusion about them on the mailing lists, so I think I'm not alone.  It all makes me question whether they really are the Right Thing after all --- surely the Right Thing wouldn't be so difficult to think about.  The macro system described in this SRFI is a simplification of the syntax-case-inspired hygienic macro systems popular nowadays, and I think it's worth looking into.

Oscar Waddell.  Extending the Scope of Syntactic Abstraction.  PhD thesis, August 1999.
http://www.cs.indiana.edu/~owaddell/papers/index.html

    This thesis covers a very simple and powerful module system (a variant of which is in Chez Scheme), and an inlining algorithm.  What do these have to do with each other?  The argument is that they complement each other --- you feel more comfortable using powerful macros because you can be confident that you'll still get good code.  It still seems like an odd mix to me.  Be that as it may, the inlining algorithm looks awesome; I definitely want to do this.
    It's very good at deciding what to inline, so that it doesn't bloat up the code much --- actually, it sometimes makes code smaller.  You might think you'd want to do flow analysis before inlining (to see which procedures are being used where), but actually, I think doing an inlining pass *before* flow analysis would be better.  Olin Shivers' thesis actually describes a family of flow analyses that trade accuracy for time / space complexity; family members may differ in the extent to which they distinguish separate calls to the same function.  Now, if you inline a particular call and then do flow analysis, the analysis will obviously treat the inlined call as distinct from other calls to that function.  If you constrain the inliner to expand the code only by a constant factor, then you've told the flow analysis to distinguish certain calls while still running in linear time.  I think.

Olin Shivers' history of T has pointers to a bunch of seminal Scheme papers and dissertations:
http://www.paulgraham.com/thist.html

David F. Bacon, Perry Cheng, and V.T. Rajan.  A Unified Theory of Garbage Collection.
http://www.research.ibm.com/people/d/dfb/publications.html

    Hans Boehm recommended this paper on gclist, expressing some strong opinions about finalization and destruction.  I could really use a unified theory of garbage collection.

Ulrich Drepper.  How to Write Shared Libraries.  2002-11-3, based on a tutorial given at the UKUUG 2002 Conference in Bristol.
http://people.redhat.com/drepper/dsohowto.pdf

    Available from http://people.redhat.com/drepper, which also has slides.

Olin Shivers, James W. Clark, and Roland McGrath.  Atomic heap transactions and fine-grain interrupts.  In Proceedings of the 1999 ACM International Conference on Functional Programming (ICFP), September 1999, Paris, France.
http://www.ai.mit.edu/~shivers/citations.html#heap

Amer Diwan, Eliot Moss, and Richard Hudson.  Compiler support for garbage collection in a statically typed language.

Manuel Serrano and Marc Feeley.  Storage Use Analysis and its Applications.
http://www-sop.inria.fr/mimosa/Manuel.Serrano/publi/sf-icfp96.ps.gz

William D. Clinger.  Proper Tail Recursion and Space Efficiency.
ftp://ftp.ccs.neu.edu/pub/people/will/tail.ps.gz

Germain, Feeley, and Monnier.  Termite: a Lisp for Distributed Computing.
http://lisp-ecoop05.bknr.net/pdf/19654.pdf

    Not sure this is relevant at all, but since we've opened Pandora's little concurrency box already, we might as well let this one out too...

- Mike Ashley's flow analysis paper
- Soft typing
- PLT's successor to soft typing
- SysV ABI (for ELF format)
- Intel manuals
- JNI API
- _Engineering a Simple, Efficient Code Generator Generator_, Fraser, Hanson, and Proebsting
- Mono JIT

* Resources

** benchmarks

http://swissnet.ai.mit.edu/~jaffer/CNS/benchmarks has benchmarks computing the digits of pi and the mean of many pseudo-random numbers.

See also http://mangler.sourceforge.net/benchmarks.html.  The DHist benchmark should be fairly portable, but use the dhist.srfi-0 file from the tarball unless you want to get involved with an S2 installation.  dhist.srfi-0 is a single source that is nearly all R4RS, with just a couple of COND-EXPANDs used to accommodate each individual Scheme system's build/optimization quirks.

** test suites

http://swissnet.ai.mit.edu/ftpdir/scm/r4rstest.scm is an R4RS conformance test for Scheme implementations.

** call/cc

Nils M Holm collected call/cc tests from c.l.s.
** Unicode

*** SRFI-14 support

I don't think any Schemes other than Gauche have full support for SRFI-14's ucs-range->char-set, but those working on Unicode support for new or existing Schemes may want to grab:

    http://synthcode.com/scheme/char-set.scm

which is portable (SRFI-14 only) code defining char-sets for all Unicode scripts and all core Unicode properties.