/* minor.h --- C interface to Minor Scheme */

#ifndef MN_MINOR_H
#define MN_MINOR_H

#include
#include
#include
#include

/* This file defines the Minor C Interface, a C interface to the Minor
   Scheme system. Using this interface, you can:
   - create, access, and change Minor objects,
   - call Minor functions from C,
   - define C functions to be called from Minor, and
   - define new Minor types.

   This interface is thread-safe. Minor Scheme supports multiple
   concurrent threads of control, based on the underlying system's
   native thread support.

   This interface is modeled after the Java Native Interface
   (http://java.sun.com/j2se/1.3/docs/guide/jni/, or search at
   http://java.sun.com for the latest docs), since that nicely solves
   a lot of the same problems we need to. It would be easy to
   implement this interface in terms of the JNI, for a Java-based
   Scheme. */


/* Naming conventions. */

/* This interface defines a lot of functions, many of which fall into
   categories within which there's some regularity. To help make the
   names easier to remember, we try to follow some conventions.

   Type conversions:

   Many of the functions in the API convert C values to Scheme values
   and vice versa, or test whether such a conversion is possible
   without loss of information.

   A function that converts instances of one type A to another type B
   we call 'mn_A_to_B'. A function that checks whether a given value
   of type A can be converted to type B we call 'mn_A_is_B'.

   For example:
   - mn_str_to_string - Convert a null-terminated C string to a Scheme
     string.
   - mn_number_to_int - Convert a Scheme number to a C int.
   - mn_number_is_uint - Return true if a given Scheme number can be
     converted to a C unsigned int without underflow or overflow;
     otherwise, return false.
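
   As a plain-C illustration of the 'is' / 'to' pairing, here is a
   minimal sketch that converts a C long to a C int. The names
   ex_long_is_int and ex_long_to_int are hypothetical and are not part
   of this interface; aborting on a lossy conversion is an assumption
   modeled on how this file treats numeric range errors. The sketch
   uses only '//' comments so that it can sit inside this comment.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical 'A_is_B' check (not a Minor function): return true if
// the long N can be converted to an int without loss of information.
bool
ex_long_is_int (long n)
{
  return n >= INT_MIN && n <= INT_MAX;
}

// Hypothetical 'A_to_B' conversion (not a Minor function): return N
// as an int, aborting if the value does not fit.
int
ex_long_to_int (long n)
{
  if (! ex_long_is_int (n))
    abort ();

  int result = (int) n;

  // Sanity check: the conversion must round-trip exactly.
  assert ((long) result == n);
  return result;
}
```

   A caller that cannot rule out a range error calls the 'is' function
   first, and calls the 'to' function only when the check succeeds.
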
   We use the following lexemes in these functions' names to designate
   the types involved (and we pass and return the values with the
   following C types):

   - Text:
     - 'character', for Scheme characters
     - 'string', for Scheme strings
     - 'char', for C 'char'
     - 'wchar', for C 'wchar_t'
     - 'str', for C null-terminated byte strings (char *)
     - 'mem', for C arrays of arbitrary bytes (a char * and a size_t)
     - 'wcs', for C null-terminated wide character strings (wchar_t *)
     - 'wmem', for arrays of arbitrary wchar_t values (a wchar_t * and
       a size_t)
     And in :
     - 'utf8' for an array of well-formed UTF-8 characters (an
       mn_utf8_t * and a size_t)
     - 'unicode' for a single Unicode code point (mn_unicode_t)

   - Numbers:
     - 'number' for Scheme numbers
     - 'int' and 'uint' for the C types 'int' and 'unsigned int'
     - 'long' and 'ulong' for the C types 'long' and 'unsigned long'
     - 'llong' and 'ullong' for the C types 'long long' and 'unsigned
       long long'
     (The name lexemes for C numeric types are the same as those used
     by the FOO_MIN and FOO_MAX macros defined by <limits.h>, except
     that they are in lower case.)

   C wrappers of Scheme functions:

   Some functions are C wrappers of standard Scheme functions. We give
   these the same name they have in Scheme, with 'mn_' added to the
   front, and with hyphens changed to underscores. If the name ends
   with a question mark, we remove the question mark, and add 'is' in
   the natural place. Exclamation points are dropped. For example:

     Scheme function       C function
     ===================   =================
     car                   mn_car
     vector-length         mn_vector_length
     boolean?              mn_is_boolean
     set-car!              mn_set_car */


/* mn_refs: References to Minor objects. */

/* To support garbage collection, the Minor run-time needs to be able
   to find all the Minor objects C code is examining at any given
   time: if C code has access to an object, then the garbage collector
   will make sure not to free it.

   The functions in this interface never return, or accept, direct
   pointers to Minor objects.
   Instead, we introduce one level of indirection: they return and
   accept 'mn_ref_t' objects, which refer to Minor objects.

     C code:
     mn_ref_t *x;
     +-----+
     |  .  |       the mn_ref_t
     +--|--+       +-----+
        `--------->|  .  |
                   +--|--+      ,--------------.
                      `-------->| minor object |
                                `--------------'

   (The Minor objects themselves, of course, point directly to each
   other; mn_refs are strictly an interface to non-GC'd languages.)

   For example, the functions to construct and access pairs are
   declared like this:

     mn_ref_t *mn_cons (mn_call_t *c, mn_ref_t *car, mn_ref_t *cdr);
     mn_ref_t *mn_car (mn_call_t *c, mn_ref_t *pair);
     mn_ref_t *mn_cdr (mn_call_t *c, mn_ref_t *pair);

   (Ignore the 'mn_call_t' arguments for the moment.)

   The Minor run-time keeps a list of all the mn_refs that have been
   given to C code, and protects all the objects those mn_refs refer
   to from being garbage collected. We try to make mn_refs as
   lightweight as possible.

   C code using this interface is responsible for making sure every
   mn_ref_t it is given gets freed. Since this can be a rather complex
   task to manage, we introduce some wrinkles to make the common cases
   easy.

   Mn_refs come in two kinds: local, and global.

   - A local mn_ref_t is owned by a particular call of a C function by
     Minor code; when that function call returns, all the local refs
     it owns are automatically freed.

     Specifically, when Minor calls a C function, it passes an extra
     argument: a 'mn_call_t *' pointing to a call object created just
     for that Minor->C call. The functions in this interface that
     construct local references all take a 'mn_call_t *' as their
     first argument, and return local mn_refs owned by that call (with
     some exceptions, all marked as such). The call object also owns
     any mn_refs the C function may have received as arguments from
     Minor code. When the C function returns, all the local refs its
     call object owns are automatically freed, along with the call
     object itself.
     This means that, in the common case of Minor calling a C function
     that does some work on its arguments and then returns without
     stashing any Minor objects in global variables or data
     structures, the C code doesn't need to do any extra bookkeeping;
     once it returns, all the mn_refs it accumulated --- all local ---
     will be freed.

     However, the convenience of local mn_refs comes at a price: local
     mn_refs should not be stored in global variables, nor should they
     be shared with other threads. To get around this problem, you can
     promote a local mn_ref_t to a global mn_ref_t:

   - A global mn_ref_t lives until you explicitly free it. Global refs
     can be shared amongst other threads, and stored in global
     variables. However, since global refs are never automatically
     reclaimed, C code must take care of freeing them at the right
     time itself; global refs are more work to manage.

     Like any other kind of object shared between multiple threads,
     it's up to the user of this interface to ensure that one thread
     isn't using a global reference while another thread is freeing
     it.

   This interface provides functions to convert local mn_refs to
   global mn_refs and vice versa, and a function to explicitly free
   refs when necessary.

   There are also some cases where C code will need to explicitly free
   local mn_refs, before the call that owns them returns. For example,
   if a loop traverses a list using mn_car and mn_cdr, each element is
   returned as a local mn_ref_t. To avoid consuming storage
   proportional to the length of the list for all those local mn_refs,
   the loop must free them as it goes. To accommodate cases like this,
   there is also a function that explicitly frees a local mn_ref_t,
   mn_free_local_ref.

   Every function in this interface but one takes a 'mn_call_t *' as
   its first argument; unless stated otherwise, any mn_ref_t values it
   returns are owned by that call. When we use the call in the obvious
   way, we don't bother to give it a name in the prototype, for (a
   tiny bit of) legibility.
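
   The ownership rules above can be pictured with a toy model. This is
   only a sketch under stated assumptions, not Minor's implementation:
   toy_ref_t, toy_call_t, and every toy_ function are hypothetical
   stand-ins for mn_ref_t, mn_call_t, and the run-time's bookkeeping,
   and the doubly linked list is just one simple way to track the refs
   a call owns. It shows that a loop which frees each local ref as it
   goes keeps at most one ref live, while a loop that does not leaves
   N refs for the end of the call to reclaim. The sketch uses only
   '//' comments so that it can sit inside this comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Hypothetical stand-in for mn_ref_t: one node in its owner's list.
typedef struct toy_ref
{
  struct toy_ref *prev;
  struct toy_ref *next;
  int value;
} toy_ref_t;

// Hypothetical stand-in for mn_call_t: owns a list of local refs.
typedef struct toy_call
{
  toy_ref_t *refs;
  size_t live;
} toy_call_t;

// Allocate a new local ref owned by C.
toy_ref_t *
toy_new_local_ref (toy_call_t *c, int value)
{
  toy_ref_t *r = malloc (sizeof *r);
  if (! r)
    abort ();
  r->value = value;
  r->prev = NULL;
  r->next = c->refs;
  if (c->refs)
    c->refs->prev = r;
  c->refs = r;
  c->live++;
  return r;
}

// Unlink R from C's list and free it.
void
toy_free_local_ref (toy_call_t *c, toy_ref_t *r)
{
  assert (c->live > 0);
  if (r->prev)
    r->prev->next = r->next;
  else
    c->refs = r->next;
  if (r->next)
    r->next->prev = r->prev;
  c->live--;
  free (r);
}

// Model the end of a call: free every local ref C still owns.
void
toy_end_call (toy_call_t *c)
{
  while (c->refs)
    toy_free_local_ref (c, c->refs);
}

// Visit N elements, making one local ref per element.  Return the
// peak number of refs that were live at once.
size_t
toy_loop (toy_call_t *c, int n, int free_as_we_go)
{
  size_t peak = 0;
  for (int i = 0; i < n; i++)
    {
      toy_ref_t *elt = toy_new_local_ref (c, i);
      if (c->live > peak)
        peak = c->live;
      if (free_as_we_go)
        toy_free_local_ref (c, elt);
    }
  return peak;
}
```

   With free_as_we_go nonzero, toy_loop never has more than one live
   ref; with it zero, a loop over 1000 elements peaks at 1000 live
   refs, all of which toy_end_call then frees.
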
   The sole function that doesn't take a call argument is ---
   obviously --- the function that gives you your very first call:
   mn_first_call takes no arguments, and returns a call object. Since
   calls can't be shared amongst threads, every thread that wants to
   use this API must call this function. The first time it is called,
   we initialize the Minor library.

   Some subtleties:

   - Except where we state otherwise, all the functions in this
     interface promise to free any local refs they allocate before
     they return (other than the local ref(s) they return). This
     allows these functions to be used within long-running loops
     without accumulating local refs the caller has no way to free.

   - You may notice that even functions that don't need to allocate or
     return local references still expect a call argument --- if
     there's no need to indicate who should own any new local refs,
     why does the function need to know the current call?

     The collector also uses calls internally, as a cheap way to keep
     track of which threads might be accessing heap objects. If you've
     ever touched a heap object, you must have a call. So we can take
     care of adding a thread to our list when we allocate its first
     call --- instead of having every function in this interface check
     to make sure the calling thread is registered.

   - You may notice that this interface doesn't provide any functions
     that change the heap object a reference refers to. This is an
     important property, because it allows this interface to be
     perfectly thread-safe without doing any memory or execution
     synchronization while accessing references. Where the user's code
     shares references between threads (global references only,
     please), it's already the user's responsibility to do the right
     sorts of mutual exclusion to make that sharing kosher --- and
     that takes care of us, as well. The users manage the same
     synchronization burden they've always had; we don't add to it, in
     complexity or run-time overhead.
     (The functions mn_ad_car and mn_ad_cdr look a little like
     side-effecting functions, but the specification actually says
     they destroy the original reference and return you a fresh one.
     And it's always up to the client code to ensure that nobody
     destroys an object while someone else is using it. So the calling
     thread must have the only live pointer to the reference.)

   Functions that free references automatically: 'ad' functions

   Often a reference is meant to be passed to a function, and never
   used again. For example, to produce a reference to the pair
   (1 . 2), one would need to write:

     mn_ref_t *one = mn_int_to_number (c, 1);
     mn_ref_t *two = mn_int_to_number (c, 2);
     mn_ref_t *onetwo = mn_cons (c, one, two);
     mn_free_local_ref (c, one);
     mn_free_local_ref (c, two);

   Since this is a common pattern, the Minor C API offers variants of
   functions that, in addition to whatever else the functions do, also
   free all the references they are passed as arguments. These
   functions have names containing the lexeme 'ad'.

   For example, the 'mn_cons' function takes two references to
   objects, and returns a reference to a pair whose car and cdr are
   the objects passed. The 'mn_ad_cons' function takes two references
   to objects and returns the same, but frees the two references it
   was passed. The above example can be written in a more functional
   style using 'mn_ad_cons':

     mn_ref_t *onetwo = mn_ad_cons (c,
                                    mn_int_to_number (c, 1),
                                    mn_int_to_number (c, 2));

   The 'ad' functions can clean up code for traversing a Scheme list:

     for (l = mn_make_local_ref (c, list);
          mn_is_pair (c, l);
          l = mn_ad_cdr (c, l))
       {
         ...;
       }

   The 'ad' functions are only a convenience; every 'ad' function
   could be implemented simply by calling the non-'ad' function,
   saving the result, and then freeing the argument references.
   (However, in some cases the 'ad' functions allow Minor to use an
   internal optimization, so they may be a bit faster.) */


/* Return a new global mn_ref_t referring to the same object as REF.
   REF itself may be a local or global mn_ref_t. */
mn_ref_t *mn_make_global_ref (mn_call_t *, mn_ref_t *ref);

/* Like mn_make_global_ref, but then free REF as with
   mn_free_local_ref. */
mn_ref_t *mn_ad_global_ref (mn_call_t *, mn_ref_t *ref);

/* Free the global reference REF. */
void mn_free_global_ref (mn_call_t *, mn_ref_t *ref);

/* Return a new local reference, owned by CALL, that refers to the
   same object as REF. REF itself may be a local or global mn_ref_t.

   (I'm not sure what this function is good for. Maybe it would be
   useful for meeting some sort of allocation invariant.) */
mn_ref_t *mn_make_local_ref (mn_call_t *call, mn_ref_t *ref);

/* Free the local mn_ref_t REF. If REF is actually a global mn_ref_t,
   do nothing. (This behavior allows code to use a global mn_ref_t
   where a local mn_ref_t was expected: it's fine to call
   mn_free_local_ref explicitly, and it's also fine to simply return
   to Scheme and let the C interface clean up the local mn_refs.) */
void mn_free_local_ref (mn_call_t *, mn_ref_t *ref);

/* Free the array of local references REFS, containing LEN elements,
   using mn_free_local_ref. (This does not free the array REFS
   itself.) */
void mn_free_local_ref_array (mn_call_t *c,
                              mn_ref_t * const *refs,
                              size_t len);

/* Return the first Minor call object for the calling thread.

   A call object corresponds to a particular Minor->C call, but since
   the main program (and new threads) may begin execution in C code,
   and you need to have a call object before C can call Minor code,
   where do you get that first call?

   You call this function. For each thread we create a special "first"
   call object, that a new thread can use to call Minor functions.

   If there are, in fact, other active Minor->C calls in this thread,
   you should use the most recent of those calls instead. If you call
   this function while other calls are active, it will abort. (Would
   it be better to simply abort whenever this function is called more
   than once from any given thread?
   By not aborting unless there are other active calls, we ensure we
   don't hand out misleading information; isn't that good enough?)

   The first thread to call this function initializes the Minor
   runtime. */
mn_call_t *mn_first_call (void);


/* Subcalls:

   A subcall is a call object you can create and free yourself. You
   can pass it to functions in this API like any other call object,
   and the subcall will own whatever local references they return.
   When you're done, you can free the subcall, which frees all the
   local references it owns.

   Every subcall is a child of some other call; a subcall can have
   children of its own. Freeing a parent call frees all its children
   as well (and their children, recursively). Since you need to have a
   parent call object on hand to produce a subcall, every tree of
   subcalls is rooted at an ordinary call object.

   A subcall is simply a bookkeeping aid: it's equivalent to
   maintaining a list of the local references yourself, except that
   freeing them is somewhat faster. */

/* Produce a subcall that is a child of CALL. */
mn_call_t *mn_subcall (mn_call_t *call);

/* Free SUBCALL, and all its children. This frees all local references
   owned by SUBCALL.

   SUBCALL must be a subcall, produced by a call to mn_subcall, not a
   true call object; if it is a true call object, abort. */
void mn_free_subcall (mn_call_t *subcall);

/* Free SUBCALL, and all its children. Return a new local reference,
   owned by CALL, referring to the same object as REF. If REF is NULL,
   return NULL.

   This is equivalent to a call to mn_make_local_ref to duplicate REF,
   and then a call to mn_free_subcall, to free SUBCALL. But it's a
   common idiom --- cleaning up after a computation that has produced
   a single result --- so we provide a function for it.

   SUBCALL must be a subcall, produced by a call to mn_subcall, not a
   true call object; if it is a true call object, abort. */
mn_ref_t *mn_finish_subcall (mn_call_t *call,
                             mn_call_t *subcall,
                             mn_ref_t *ref);


/* Exceptions.
   */

/* The functions in this interface return exceptions in a way
   resembling the usual C 'errno' style, except that Minor's
   exceptions are Scheme exception objects, rather than integers, and
   the interface is reentrant, without resorting to magical
   definitions for an 'errno'-like variable.

   For each function in this interface that can return an exception,
   we document a distinguished 'exception return value' --- a null
   pointer, for example --- that indicates to the caller that an
   exception has occurred.

   Each thread has a 'pending exception' object, accessed via the
   'mn_get_exception' and 'mn_set_exception' functions. When a
   function returns its exception return value, the caller can use
   'mn_get_exception' to find the exception object describing the
   error.

   To throw an exception, C code can make an exception object, call
   'mn_set_exception' to make that object the thread's pending
   exception, and return its own exception return value.

   Some of these functions take or produce strings; see the comments
   at the top of the "Characters" section for Minor's conventions for
   dealing with text.

   When Minor Functions Abort Instead Of Returning Exceptions, and
   Why:

   By convention, Minor C API functions handle type errors, index
   range errors, and numeric range errors by aborting, instead of
   returning an exception. These sorts of errors typically indicate
   bugs in the user's code itself: correct programs usually never
   encounter them. When this is not the case, the API provides
   functions to check for the conditions that would cause an abort.

   An interface which handles these sorts of errors by returning
   exceptions doesn't work well:

   - Users will often not check for exception return values in these
     cases, since they "know" the errors cannot occur. If the API
     reports them as exceptions which the user's code ignores, then
     the program behaves unpredictably, instead of failing in a
     controlled way.
   - For many functions in this API, these classes of errors are the
     only ones they can ever encounter, so these functions either
     return successfully, or not at all. This lets the user write
     terser, more legible code, without leaving errors unchecked.

   Each function's description details when it will abort. */

/* Return a new local reference to the calling thread's pending
   exception object. If there is no pending exception, return NULL.

   Note that calling this function does not clear the pending
   exception! You must call mn_set_exception yourself if you want
   future callers to see that there is no pending exception. */
mn_ref_t *mn_get_exception (mn_call_t *);

/* Set the calling thread's pending exception object to EX. If you
   want to clear the pending exception, pass NULL as EX. */
void mn_set_exception (mn_call_t *, mn_ref_t *ex);

/* Return a null-terminated string describing the exception EX. (In
   other words, get its error message.) The string is allocated using
   malloc; the caller is responsible for freeing it. If EX is not an
   exception, abort. */
char *mn_exception_string (mn_call_t *, mn_ref_t *ex);

/* Make a generic exception with the error message MSG. MSG is a
   null-terminated string.

   If MSG cannot be converted to a Minor string, return NULL and set
   the pending exception. */
mn_ref_t *mn_make_generic_exception (mn_call_t *, const char *msg);


/* Equality. */

/* Return true if A and B refer to the same object; otherwise, return
   false. */
_Bool mn_eq (mn_call_t *, mn_ref_t *a, mn_ref_t *b);

/* Return true if A and B are equal in the sense of the Scheme
   'equal?' predicate.

   If both A and B are cyclic, this function may not return. (And,
   until we have annotated code running, this could block all other
   threads from performing a GC. This is a lot of trouble to fix in
   the C implementation, but will be a non-issue in annotated code,
   which will run faster anyway, so we're going to leave it as is for
   now.) */
_Bool mn_equal (mn_call_t *, mn_ref_t *a, mn_ref_t *b);


/* Booleans.
   */

/* Return true if REF refers to a boolean value; otherwise, return
   false. */
_Bool mn_is_boolean (mn_call_t *, mn_ref_t *ref);

/* Return a reference to #t / #f. */
mn_ref_t *mn_true (mn_call_t *);
mn_ref_t *mn_false (mn_call_t *);

/* Return true if REF is a true value (i.e., anything but #f);
   otherwise, return false. */
_Bool mn_is_true (mn_call_t *, mn_ref_t *ref);

/* Return true if REF is a true value (i.e., anything but #f);
   otherwise, return false. Either way, free REF. */
_Bool mn_ad_is_true (mn_call_t *c, mn_ref_t *ref);


/* Pairs. */

/* Return true if REF refers to a pair; otherwise, return false. */
_Bool mn_is_pair (mn_call_t *, mn_ref_t *ref);

/* Return a new pair whose car is CAR and whose cdr is CDR. */
mn_ref_t *mn_cons (mn_call_t *, mn_ref_t *car, mn_ref_t *cdr);

/* Like mn_cons, but free the references to CAR and CDR. The effect is
   identical to that of calling mn_cons, and then mn_free_local_ref
   twice. */
mn_ref_t *mn_ad_cons (mn_call_t *, mn_ref_t *car, mn_ref_t *cdr);

/* Return the car/cdr of PAIR. If PAIR is not a pair, abort. */
mn_ref_t *mn_car (mn_call_t *, mn_ref_t *pair);
mn_ref_t *mn_cdr (mn_call_t *, mn_ref_t *pair);

/* Set the car/cdr of PAIR to VALUE. If PAIR is not a pair, abort. */
void mn_set_car (mn_call_t *, mn_ref_t *pair, mn_ref_t *value);
void mn_set_cdr (mn_call_t *, mn_ref_t *pair, mn_ref_t *value);

/* If REF refers to a pair P, free REF and return a new reference to
   (car P) / (cdr P). If REF is not a pair, abort.

   This is no different from calling mn_car / mn_cdr and then calling
   mn_free_local_ref on the ref you passed to it, except that it's a
   little more readable, and the implementation can optimize the
   process. It's just helpful for traversing list structures. */
mn_ref_t *mn_ad_car (mn_call_t *, mn_ref_t *ref);
mn_ref_t *mn_ad_cdr (mn_call_t *, mn_ref_t *ref);


/* Lists. */

/* Return true if REF refers to a proper list; otherwise, return
   false.
   A proper list is either the empty list object (), or a pair whose
   cdr is a list.

   A "cyclic list", or a series of pairs chained through their cdrs
   whose last cdr points to an earlier pair in the series, is not a
   proper list; this function returns false given such a list. */
_Bool mn_is_list (mn_call_t *, mn_ref_t *obj);

/* Return the length of the proper list LIST. If LIST isn't a proper
   list, or if the length doesn't fit in an int, return -1, and set
   the pending exception. This detects cyclic lists. */
int mn_length (mn_call_t *, mn_ref_t *list);

/* Return a copy of LIST: the spine of the list is made from new
   pairs, but the elements are shared with the original. If LIST is
   not a proper list, return NULL, and set the pending exception. This
   detects cyclic lists. */
mn_ref_t *mn_copy_list (mn_call_t *, mn_ref_t *list);

/* Return a reference to the empty list. */
mn_ref_t *mn_null (mn_call_t *);

/* Return true if REF refers to the empty list, false otherwise. */
_Bool mn_is_null (mn_call_t *, mn_ref_t *ref);

/* Allocate a new pair whose car is ELT and whose cdr is LIST. Free
   LIST, and return a reference to the new pair.

   This is no different from calling mn_cons and then
   mn_free_local_ref, except that it's a little more readable, and the
   implementation can optimize the process. */
mn_ref_t *mn_push (mn_call_t *, mn_ref_t *list, mn_ref_t *elt);

/* Thread safety note:

   The functions in Minor that traverse lists are designed to behave
   reasonably on cyclic data structures, but it is still possible to
   confuse them by having other threads mutate the list structure
   while it is being traversed, to the point that they never return.

   Minor should (at least) promise that such infinite loops will not
   take place while holding any important internal locks, so other
   threads will be able to make forward progress. Unfortunately,
   that's very difficult to achieve in a run-time library implemented
   using incoherent sections, and we don't.
   It's our hope that re-implementing the run-time in Scheme will
   remove this problem, by allowing us to use annotated machine code
   instead of incoherent sections. */


/* Numbers. */

/* At the moment, Minor supports only exact integers --- and only
   fixnums at that. But it's obvious how the new functions should be
   added, and the existing functions shouldn't need to change their
   behaviors. */

/* Return true if REF refers to a number; otherwise, return false. */
_Bool mn_is_number (mn_call_t *, mn_ref_t *ref);

/* Return true if REF refers to an exact number; return false
   otherwise. */
_Bool mn_is_exact (mn_call_t *, mn_ref_t *ref);

/* Return true if REF refers to an integer; otherwise, return
   false. */
_Bool mn_is_integer (mn_call_t *, mn_ref_t *ref);

/* Return true if REF refers to an exact integer; return false
   otherwise. This is equivalent to:

     mn_is_integer (call, ref) && mn_is_exact (call, ref) */
_Bool mn_is_exact_integer (mn_call_t *, mn_ref_t *ref);

/* Return true iff N can be represented as an exact Minor integer.

   (Eventually, Minor will support bignums, and these functions will
   always return true. But:
   - the bignum support isn't done yet, and our own rules say we
     provide functions that check for each condition that could cause
     an abort, and
   - if this interface is to serve as a model for other Scheme
     implementations, it needs to support those where exact numbers
     have a limited range.) */
_Bool mn_int_is_number (mn_call_t *, int n);
_Bool mn_uint_is_number (mn_call_t *, unsigned int n);
_Bool mn_long_is_number (mn_call_t *, long n);
_Bool mn_ulong_is_number (mn_call_t *, unsigned long n);
_Bool mn_llong_is_number (mn_call_t *, long long n);
_Bool mn_ullong_is_number (mn_call_t *, unsigned long long n);

/* Return true iff N is an exact integer that can fit in the given
   type, false otherwise.
   */
_Bool mn_number_is_int (mn_call_t *, mn_ref_t *n);
_Bool mn_number_is_uint (mn_call_t *, mn_ref_t *n);
_Bool mn_number_is_long (mn_call_t *, mn_ref_t *n);
_Bool mn_number_is_ulong (mn_call_t *, mn_ref_t *n);
_Bool mn_number_is_llong (mn_call_t *, mn_ref_t *n);
_Bool mn_number_is_ullong (mn_call_t *, mn_ref_t *n);

/* Return a mn_ref_t for an exact integer equal to N. If N is beyond
   the range of integers Minor can represent, abort. */
mn_ref_t *mn_int_to_number (mn_call_t *, int n);
mn_ref_t *mn_uint_to_number (mn_call_t *, unsigned int n);
mn_ref_t *mn_long_to_number (mn_call_t *, long n);
mn_ref_t *mn_ulong_to_number (mn_call_t *, unsigned long n);
mn_ref_t *mn_llong_to_number (mn_call_t *, long long n);
mn_ref_t *mn_ullong_to_number (mn_call_t *, unsigned long long n);

/* If N refers to an exact integer, return its value. If N does not
   refer to an exact integer, or if its value does not fit in the
   given return type, abort. */
int mn_number_to_int (mn_call_t *, mn_ref_t *n);
unsigned int mn_number_to_uint (mn_call_t *, mn_ref_t *n);
long mn_number_to_long (mn_call_t *, mn_ref_t *n);
unsigned long mn_number_to_ulong (mn_call_t *, mn_ref_t *n);
long long mn_number_to_llong (mn_call_t *, mn_ref_t *n);
unsigned long long mn_number_to_ullong (mn_call_t *, mn_ref_t *n);

/* Why not intmax_t? ptrdiff_t? intptr_t? */

/* Arithmetic and comparison functions could go here. */

/* Return true if N is numerically equal to M, false otherwise. (This
   returns true when, in Scheme, (= N M)). If either N or M is not a
   number, abort. */
_Bool mn_numbers_equal (mn_call_t *, mn_ref_t *n, mn_ref_t *m);


/* Characters. */

/* General Conventions For Handling Text and Character Sets:

   Minor uses Unicode to represent characters and strings; C uses a
   representation that varies from one locale to another.
   Where the functions in this API accept or return 'char' or
   'wchar_t' values, or strings made from them, those values use the
   current C execution character set; the API converts to and from
   Minor's internal representation as needed. This means that you can
   use such values with the standard C library functions for working
   with text (getchar, printf, atoi, and so on) in the normal way,
   without worrying about what representation Minor is using.

   Since the encoding of characters in the current C execution
   character set is determined by the current locale, the behavior of
   these functions may depend on the current locale --- specifically,
   that established for the LC_CTYPE category.

   Errors can occur during conversion: byte strings may not be
   well-formed encodings of code points; code points may be
   unassigned; and characters may not exist in the destination
   character set. Minor reports errors that would result in the loss
   of information. However, if a conversion can be performed without
   doing so, Minor may carry it through; for example, if the C
   execution character set is also Unicode, then Minor can convert
   arbitrary code points to characters or store them in strings, even
   if those code points have no character assigned to them.

   ISO C divides the C execution character set into the "basic
   character set" --- the upper- and lower-case letters, the digits,
   the graphic symbols used in C syntax (all the ASCII symbols but
   '$', '@', or '`'), the whitespace characters, and the null
   character --- and "extended characters". Characters in the basic
   character set, and strings containing only such characters, may
   always be converted to and from Minor values without error. */

/* Return true if REF refers to a character; otherwise, return
   false. */
_Bool mn_is_character (mn_call_t *, mn_ref_t *ref);

/* Convert between Minor characters and the C execution character set.
   (Hah! "Minor characters"??? Get it? Pretty funny, huh!)
   */

/* Return true if CHARACTER is a character, and that character can be
   represented as a C char / wchar_t, false otherwise. */
_Bool mn_is_char (mn_call_t *, mn_ref_t *character);
_Bool mn_is_wchar (mn_call_t *, mn_ref_t *character);

/* Return CHARACTER as a C char / wchar_t. If CHARACTER cannot be
   represented in the given type, return EOF / WEOF, and set the
   pending exception. If CHARACTER is not a character, abort. */
int mn_character_to_char (mn_call_t *, mn_ref_t *character);
wint_t mn_character_to_wchar (mn_call_t *, mn_ref_t *character);

/* Return the Minor character corresponding to the 'char' or 'wchar_t'
   value C. If C cannot be converted to a Minor character, return NULL
   and set the pending exception. */
mn_ref_t *mn_char_to_character (mn_call_t *, int c);
mn_ref_t *mn_wchar_to_character (mn_call_t *, wchar_t c);


/* Strings. */

/* A string is an array of characters. Minor strings are immutable.
   (This is a deviation from Scheme.)

   See the comments in the "Characters" section describing the general
   conventions for handling text and dealing with conversion errors.

   The functions here that provide the contents of a string all
   produce copies of the text for the user's use. If it's important to
   avoid this, then we could introduce a lease-based interface here.
   Leases are described in the file doc/leases. */

/* Return true if REF refers to a string; otherwise, return false. */
_Bool mn_is_string (mn_call_t *, mn_ref_t *ref);

/* Return the length of STRING, in characters. If STRING is not a
   string object, abort. */
size_t mn_string_length (mn_call_t *, mn_ref_t *string);

/* Return the i'th character of STRING. If STRING is not a string, or
   doesn't have that many characters, abort. */
mn_ref_t *mn_string_ref (mn_call_t *, mn_ref_t *string, int i);

/* Return the contents of STRING as a null-terminated string. The
   memory for the string returned is allocated using malloc; the
   caller is responsible for freeing it.
   If STRING contains null characters, truncate it just before the
   first one. (Would it be more useful to just return the entire
   string, embedded nulls and all, with an extra null on the end?)

   If STRING cannot be fully and accurately converted to the C
   execution character set, return NULL and set the pending exception.

   If STRING is not a string, abort. */
char *mn_string_to_str (mn_call_t *, mn_ref_t *string);

/* Return the contents of STRING as a block of characters, and set
   *LENGTH to its length in bytes. The memory returned is allocated
   using malloc; the caller is responsible for freeing it.

   If STRING cannot be fully and accurately converted to the C
   execution character set, return NULL and set the pending exception.

   If STRING is not a string, abort. */
char *mn_string_to_mem (mn_call_t *, mn_ref_t *string, size_t *length);

/* Return a Minor string object whose contents are the same as the
   null-terminated string STR. This is a copy of STR; the returned
   string does not refer to STR's memory.

   If STR cannot be fully and accurately converted to a Minor string,
   return NULL and set the pending exception. (For storing arbitrary
   sequences of bytes, use byte vectors; they are described in
   bytevec.h.) */
mn_ref_t *mn_string_from_str (mn_call_t *, const char *str);

/* Return a Minor string object whose contents are the same as the
   LENGTH bytes at MEM. This makes a copy of MEM; the returned string
   does not refer to MEM's memory. MEM need not be null-terminated,
   and may contain embedded null characters.

   If MEM cannot be fully and accurately converted to a Minor string,
   return NULL and set the pending exception. (For storing arbitrary
   sequences of bytes, use byte vectors; they are described in
   bytevec.h.) */
mn_ref_t *mn_string_from_mem (mn_call_t *, const char *mem, size_t length);


/* Symbols. */

/* See the comments in the "Characters" section describing the general
   conventions for handling text and dealing with conversion errors.
*/ /* Return true if REF refers to a symbol; otherwise, return false. */ _Bool mn_is_symbol (mn_call_t *, mn_ref_t *ref); /* Return the symbol whose name is NAME. If NAME is not a string, abort. */ mn_ref_t *mn_string_to_symbol (mn_call_t *, mn_ref_t *name); /* Return the name of the symbol SYMBOL as a Minor string. If SYMBOL is not a symbol, abort. */ mn_ref_t *mn_symbol_name (mn_call_t *, mn_ref_t *symbol); /* Return the symbol whose name is the null-terminated C string NAME. Every symbol's name is a valid string. If NAME cannot be fully and accurately converted to a string, return NULL and set the pending exception. */ mn_ref_t *mn_symbol_from_str (mn_call_t *, const char *name); /* Return the name of the symbol SYMBOL, as a malloc'd block of characters, and set *LENGTH to its length in bytes. The memory for the string returned is allocated using malloc; the caller is responsible for freeing it. If SYMBOL's name cannot be fully and accurately converted to a string, return NULL and set the pending exception. If SYMBOL is not a symbol, abort. */ char *mn_symbol_to_mem (mn_call_t *, mn_ref_t *symbol, size_t *length); /* Procedures. */ /* The Minor run-time keeps track of each time Scheme code calls a C function, and each time C code calls a Scheme function. Every local mn_ref_t is 'owned' by a particular Scheme->C call; when that call returns, all the local mn_refs it owns are freed. All the arguments passed in a given Scheme->C call are owned by that call. Furthermore, all local mn_refs allocated by functions in this API get attributed to the most recent Scheme->C call on the calling thread's stack. When that Scheme->C call returns, the run-time frees all the local mn_ref_t's it owned. */ /* Return true if REF refers to a procedure; otherwise, return false. */ _Bool mn_is_procedure (mn_call_t *, mn_ref_t *ref); /* Apply PROC to the Scheme list of arguments, ARGS. PROC must return exactly one value, to which we return a reference. 
If the application of PROC throws an exception, save that as the pending exception and return zero. If any of the following constraints is not met, set the pending exception and return zero: - PROC must be a procedure. - PROC must take as many arguments as there are elements in ARGS. - PROC must return exactly one value. */ mn_ref_t *mn_apply (mn_call_t *, mn_ref_t *proc, mn_ref_t *args); /* mn_callN applies FUNC to ARG1, ARG2, ... and returns the single result value. */ mn_ref_t *mn_call1 (mn_call_t *, mn_ref_t *func, mn_ref_t *arg); /* Apply PROC to the Scheme list of arguments, ARGS. Return the values PROC returns as a Scheme list. If PROC throws an exception, save that as the pending exception, and return zero. If PROC is not a procedure at all, return zero and set the pending exception. */ mn_ref_t *mn_apply_multi_valued (mn_call_t *, mn_ref_t *proc, mn_ref_t *args); /* Create a new Scheme procedure PROC based on the C function FUNC. Calling PROC calls FUNC with: - a fresh mn_call_t object NEW_CALL as its first argument, - a fresh reference to CLOSURE as its second argument, and - local references to the first N arguments to PROC as subsequent arguments. If PROC was created by one of the mn_make_procedure_N_rest functions, then calling PROC also passes FUNC one more argument at the end, which is a fresh reference to a fresh list of any remaining arguments passed to PROC, beyond the first N. The local references passed to FUNC are all owned by NEW_CALL. FUNC should return a reference to a Scheme object to be the single return value of the Scheme function. If FUNC returns zero, then the pending exception (see mn_get_exception) is thrown. When FUNC returns, all the local refs owned by NEW_CALL are freed. So, you might define a function 'foo' for a Scheme procedure that takes two arguments like this: mn_ref_t * foo (mn_call_t *call, mn_ref_t *closure, mn_ref_t *arg1, mn_ref_t *arg2) { ... } ...
scheme_foo = mn_make_procedure_2 (call, foo, closure, "foo"); If NAME is non-zero, it is taken as a string to use as the procedure's name when the procedure value is printed. The procedure holds its own copy of NAME; it does not refer to the string it is passed. At present, we only provide functions to create procedures taking up to four fixed arguments, but it's easy to add support for more fixed arguments. If the limitations cause you trouble, feel free to extend the interface. CLOSURE may be NULL; in this case, FUNC will be passed NULL as its own CLOSURE argument. It's kind of gross to write out all these declarations like this, but it's worth it to get the static type checking from the C compiler. */ mn_ref_t *mn_make_procedure_0 (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_1 (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *arg1), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_2 (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *arg1, mn_ref_t *arg2), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_3 (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *arg1, mn_ref_t *arg2, mn_ref_t *arg3), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_4 (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *arg1, mn_ref_t *arg2, mn_ref_t *arg3, mn_ref_t *arg4), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_0_rest (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *rest), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_1_rest (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *arg1, mn_ref_t *rest), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_2_rest (mn_call_t *, mn_ref_t *(*func) 
(mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *arg1, mn_ref_t *arg2, mn_ref_t *rest), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_3_rest (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *arg1, mn_ref_t *arg2, mn_ref_t *arg3, mn_ref_t *rest), mn_ref_t *closure, const char *name); mn_ref_t *mn_make_procedure_4_rest (mn_call_t *, mn_ref_t *(*func) (mn_call_t *new_call, mn_ref_t *closure, mn_ref_t *arg1, mn_ref_t *arg2, mn_ref_t *arg3, mn_ref_t *arg4, mn_ref_t *rest), mn_ref_t *closure, const char *name); /* Make a procedure but specify its arity dynamically. The returned procedure expects NARGS fixed arguments, and if REST is true, it expects a rest argument as well. NARGS must be between zero and four; this function doesn't let you get beyond the limitations on the number of fixed arguments. mn_func_t is never the right type for the FUNC argument: vararg function types are not compatible with function types that actually spell out their arguments' types. So you'll always need to cast that argument, and the C compiler won't check that you're passing a function that actually accepts the number of arguments it'll get. Use the mn_make_procedure_N[_rest] functions if you can. */ typedef mn_ref_t *mn_func_t (mn_call_t *, void *, ...); mn_ref_t *mn_make_procedure (mn_call_t *, mn_func_t *func, int nargs, _Bool rest, mn_ref_t *closure, const char *name); /* The maximum number of fixed arguments a procedure created by mn_make_procedure can expect. The only thing this is really good for, as far as I can tell, is for checking consistency between Minor and the test suite. */
#define MN_C_PROC_MAX_FIXED_ARITY (4)
/* Like mn_make_procedure, except that FUNC's return value is a Scheme list of values, to be returned as the values of the function. Exception returns are handled the same way.
*/ mn_ref_t *mn_make_multi_valued_procedure (mn_call_t *, mn_func_t *func, int nargs, _Bool rest, mn_ref_t *closure, const char *name); /* Return the name of the procedure PROC, or NULL if it has none. The return value is allocated using malloc; it is the caller's responsibility to free it. If PROC is not a procedure, abort. */ char *mn_procedure_name (mn_call_t *, mn_ref_t *proc); /* It might be nice to add alternate entry points here that let you say more about the types of arguments functions expect, so more conversions can happen under the hood, and fewer local mn_refs will need to be allocated. */ /* Vectors. */ /* Return true if REF refers to a vector; otherwise, return false. */ _Bool mn_is_vector (mn_call_t *, mn_ref_t *ref); /* Return the length of VECTOR. If VECTOR is not a vector, abort. */ size_t mn_vector_length (mn_call_t *, mn_ref_t *vector); /* Return a new LEN-element vector whose elements are all ELT. */ mn_ref_t *mn_make_vector (mn_call_t *, size_t len, mn_ref_t *elt); /* Return a new LEN-element vector whose i'th element is ELTS[i]. */ mn_ref_t *mn_vector_from_array (mn_call_t *, mn_ref_t * const *elts, size_t len); /* Return an array whose i'th element is a reference to the i'th element of VECTOR. If LEN is non-zero, set *LEN to the length of VECTOR. The memory returned is allocated with malloc; it's the caller's responsibility to free it --- as well as every reference it contains; mn_free_ref_array may be helpful here. If VECTOR is not a vector, abort. */ mn_ref_t **mn_vector_to_array (mn_call_t *, mn_ref_t *vector, size_t *len); /* Return the I'th element of VECTOR. If VECTOR isn't a vector, or hasn't that many elements, abort. */ mn_ref_t *mn_vector_ref (mn_call_t *, mn_ref_t *vector, int i); /* Set the I'th element of VECTOR to OBJ. If VECTOR isn't a vector or hasn't that many elements, abort.
*/ void mn_vector_set (mn_call_t *, mn_ref_t *vector, int i, mn_ref_t *obj); /* Return a vector of the same length as LIST, whose i'th element is LIST's i'th element. If LIST is not a proper list, abort. */ mn_ref_t *mn_list_to_vector (mn_call_t *, mn_ref_t *list); /* Return a list of the same length as VECTOR, whose i'th element is VECTOR's i'th element. If VECTOR is not a vector, abort. */ mn_ref_t *mn_vector_to_list (mn_call_t *, mn_ref_t *vector); /* Input/Output ports. */ /* Naming conventions: The C <stdio.h> or <wchar.h> headers declare input and output functions whose names follow the convention that, if the function takes an explicit stream argument, the name starts with 'f': fprintf, fputs, and so on. The Minor C API function corresponding to a <stdio.h> or <wchar.h> input / output function named fFOO is named mn_FOO. All Minor C API input / output functions take an explicit port argument. The arguments are the same as those to the C function, except that a 'call' argument is added at the front, and the port always comes immediately after the 'call' argument. For example:

     C function                   Minor function
     ==========================   ===============================================
     fputs (char *, FILE *)       mn_puts (mn_call_t *, mn_ref_t *port, char *)
     fputws (wchar_t *, FILE *)   mn_putws (mn_call_t *, mn_ref_t *port, wchar_t *)
     wint_t ungetwc               wint_t mn_ungetwc
       (wint_t, FILE *)             (mn_call_t *, mn_ref_t *port, wchar_t ch)

Returning Exceptions: Like the C functions, the functions in this section use EOF or WEOF as their exception return value, and the input functions also return EOF or WEOF to indicate that they have reached end-of-file. So EOF / WEOF is both a normal return value and an exception return value. To distinguish these two cases, when an input function reaches end-of-file, it will always set its end-of-file indicator. If it encounters an error, it will always set the pending exception (obviously).
Thus, when a function returns EOF or WEOF, the caller should always check the end-of-file indicator, and if it is clear, check the pending exception. For example:

     if ((ch = mn_getc (c, port)) == EOF)
       {
         if (mn_port_at_eof (c, port))
           ... handle EOF ...
         else
           ... handle exception mn_get_exception (c) ...
       }

The End-of-File Indicator: On POSIX systems, it is possible for an input stream to reach end-of-file more than once. For example, when a user types control-D at a terminal, a process reading from that terminal receives an end-of-file indication --- the 'read' system call returns zero, indicating that no bytes were read. However, the user may then continue to enter characters at the terminal, which can include more EOF characters. So from the reading process's point of view, even after receiving an end-of-file, there may be more data left to read. For consistency with the C input functions, Minor input ports include an end-of-file indicator, which records whether end-of-file has been reached. See the individual functions' descriptions for the details of how the end-of-file indicator works. (Note that only the C functions defined in this interface respect a port's end-of-file indicator. Scheme input functions operating on the same port may consume new input even if the end-of-file indicator is set. The end-of-file indicator is just a convenience for C programmers, making Minor ports behave more like C streams.) See the comments at the start of the section on "Characters" for Minor's general conventions for processing text. */ /* Return true if REF refers to a port, input port, or output port; otherwise, return false. */ _Bool mn_is_port (mn_call_t *, mn_ref_t *ref); _Bool mn_is_input_port (mn_call_t *, mn_ref_t *ref); _Bool mn_is_output_port (mn_call_t *, mn_ref_t *ref); /* Return true if REF refers to an open port; otherwise, return false.
*/ _Bool mn_is_open_port (mn_call_t *, mn_ref_t *ref); /* Create a new input or output port based on the Standard I/O file object FILE. The returned port's end-of-file indicator is always equal to that of FILE. Each port returned by these functions may be used either with the byte input/output routines (mn_putc; mn_getc; etc.) or the wide character input/output routines (mn_putwc; mn_getwc; etc.), but not both: you may not mix byte and wide character operations on a given port. The first operation performed determines the port's orientation (byte or wide character), and from that point on only operations of the same orientation may be performed. There is currently no way to arrange for FILE to be freed if the port object becomes garbage. We will eventually fix this, once we implement guardians. */ mn_ref_t *mn_make_stdio_input_port (mn_call_t *, FILE *file); mn_ref_t *mn_make_stdio_output_port (mn_call_t *, FILE *file); /* If PORT was created using mn_make_stdio_input_port or mn_make_stdio_output_port, return the underlying FILE object. If PORT is not a port at all, abort. Otherwise, return NULL. */ FILE *mn_stdio_port_file (mn_call_t *, mn_ref_t *port); /* Close PORT. If PORT is an output port with buffered unwritten data, write it out. If all goes well, return true; if an error occurs while doing the output, set the pending exception and return false. If PORT is already closed, this function has no effect. If PORT is not a port, abort. */ _Bool mn_close_port (mn_call_t *, mn_ref_t *port); /* Print the byte or wide character CH on PORT, and return CH. (For mn_putc, CH is first converted to an unsigned character.) If an error occurs while doing the output, return EOF / WEOF and set the pending exception. If PORT is not an open output port, abort. */ int mn_putc (mn_call_t *, mn_ref_t *port, int ch); wint_t mn_putwc (mn_call_t *, mn_ref_t *port, wchar_t ch); /* Print OBJECT on PORT the way 'write' does, and return true. This is a 'wide character' operation. 
If an error occurs while doing the output, return false and set the pending exception. If PORT is not an open output port, abort. */ _Bool mn_write (mn_call_t *, mn_ref_t *object, mn_ref_t *port); /* Print OBJECT on PORT the way 'display' does, and return true. This is a 'wide character' operation. If an error occurs while doing the output, return false and set the pending exception. If PORT is not an open output port, abort. */ _Bool mn_display (mn_call_t *, mn_ref_t *object, mn_ref_t *port); /* Print the null-terminated string / wide string STRING on PORT, and return a positive value. If an error occurs while doing the output, return EOF and set the pending exception. If PORT is not an open output port, abort. */ int mn_puts (mn_call_t *, mn_ref_t *port, const char *string); int mn_putws (mn_call_t *, mn_ref_t *port, const wchar_t *string); /* Print STRING, containing LENGTH bytes / wchar_t's, on PORT, and return true. If an error occurs while doing the output, return false and set the pending exception. If PORT is not an open output port, abort. */ int mn_put_mem (mn_call_t *, mn_ref_t *port, const char *string, size_t length); int mn_put_wmem (mn_call_t *, mn_ref_t *port, const wchar_t *string, size_t length); /* If PORT's end-of-file indicator is set, return EOF / WEOF. Otherwise, read a single byte / multibyte character from PORT, and return it. If end-of-file is reached, set PORT's end-of-file indicator and return EOF / WEOF. If an error occurs while doing the input, return EOF / WEOF and set the pending exception. If PORT is not an open input port, abort. */ int mn_getc (mn_call_t *, mn_ref_t *port); wint_t mn_getwc (mn_call_t *, mn_ref_t *port); /* Push back the character / wide character CH onto PORT. If the call is successful, clear PORT's end-of-file indicator and return CH. (For mn_ungetc, CH is first converted to an unsigned character.) Only one character of pushback is guaranteed.
If too many calls to mn_ungetc / mn_ungetwc are made without intervening calls to consume the pushed-back characters, they return EOF / WEOF and set the pending exception. If PORT is not an open input port, abort. */ int mn_ungetc (mn_call_t *, mn_ref_t *port, int ch); wint_t mn_ungetwc (mn_call_t *, mn_ref_t *port, wchar_t ch); /* If PORT's end-of-file indicator is set, return the end-of-file object. Otherwise, read a datum from PORT the way 'read' does, and return it. If no datum, complete or partial, is found on PORT before end-of-file is reached, set the end-of-file indicator, and return the end-of-file object. If there is an incomplete or ill-formed datum on PORT, or if an error occurs while doing the input, return NULL and set the pending exception. This is a 'wide character' operation. If PORT is not an open input port, abort. */ mn_ref_t *mn_read (mn_call_t *, mn_ref_t *port); /* Return true if PORT's end-of-file indicator is set, or false otherwise. NOTE: this just tests the end-of-file indicator; it doesn't go and check whether PORT is actually at end-of-file. That is, you must get an EOF / WEOF from some input function before this will return true. If PORT is not an open input port, abort. */ _Bool mn_port_at_eof (mn_call_t *, mn_ref_t *port); /* Clear PORT's end-of-file indicator. If PORT is not an open input port, abort. */ void mn_port_clear_eof (mn_call_t *, mn_ref_t *port); /* Return true if OBJ is the Scheme end-of-file object. */ _Bool mn_is_eof_object (mn_call_t *, mn_ref_t *obj); /* Return a reference to the end-of-file object. */ mn_ref_t *mn_eof_object (mn_call_t *); /* Eventually, I'd like to provide something that lets you do arbitrary computation to provide the stream's contents. It shouldn't require a function call for every character passed, and it should support some sort of buffering. We'll use it for character set conversion, compression, and so on. The C++ arrangement is probably good. 
MzScheme's custom ports meet all those criteria, but they're awfully complicated; is all that really necessary? */ /* Top-Level Environments. */ /* A top-level environment provides bindings for identifiers, either as variables with locations or syntactic keywords with expanders, that you can examine and modify incrementally, and evaluate code in. Identifiers are represented by symbols for now; maybe we'll change that later, for hygiene. Minor's rules for interpreting references to identifiers are as follows: - Code presented to 'eval' (or one of the related functions in this interface) is expanded immediately to core Scheme forms before any evaluation of the program those forms denote takes place. - Whether a particular use of an identifier is interpreted as a variable reference or a syntactic form depends on what sort of binding is in scope for that identifier when the use is expanded. If no binding is in scope, the use is assumed to refer to a variable which has not yet been defined. - When a variable reference is evaluated, whatever binding is present in the environment at that point is the one used. If the identifier is not currently bound to a variable, Minor throws an exception. - If you re-define an identifier, the new definition is visible to all prior variable references enclosed in that environment. For example, suppose we have an environment E which has no binding for the identifier x, and that we evaluate the expression (lambda () x) in E. - Evaluating the lambda expression will not raise an exception, even though x is unbound. Call the resulting procedure P. - Applying P now will throw an exception, complaining that x is unbound. - If we define x in E as a variable with value V1 and apply P again, P will return V1. - If we re-define x in E as a variable with value V2 and apply P again, P will return V2. - If we re-define x in E as a syntactic keyword, then applying P again will throw an exception, complaining that x is not a variable. 
- If we re-define x in E as a variable with value V3 and apply P again, P will return V3. The goal here is to allow you to think of all uses of a particular identifier in a particular environment as references to the same thing, even as identifiers are re-defined. (We don't achieve that, however: we expand syntax once, completely, before evaluation, and don't keep any record of the syntactic keywords we used, so re-defining a syntactic keyword as a variable doesn't cause code that referred to the syntactic keyword to break. This means you can be running code that uses two conflicting definitions of the same variable at once.) Apology: I'm not at all sure how environments should behave. One useful characteristic that some module systems have is the ability to export variables in a way that does not allow importing modules to assign to them. In such systems, the compiler can generate code for the exporting module under the assumption that the assignments it sees to those variables as it compiles the module are all that there will ever be in the whole program. Which is pretty nice. If the exporting module never mutates those variables, the compiler can treat the variable as a constant in importing modules. Which is also pretty nice. You could accomplish this with declarations. You could have a separate form of definition, 'define-constant', that tells the compiler it can assume the variable's value won't be changed. But it seems more graceful to have a way for the compiler to simply see that it is so from the way it is used; the fact of a variable's constantness should just be evident from the code, even in the presence of separate compilation.
Clearly some kind of extra information is needed (because otherwise new assignments to the variable could always show up), but it should be something much more general than 'define-constant' --- for example, if the variable's value is always a list, that would be nice for the compiler to be able to figure out, too; but does that mean we should add 'define-typed'? This can go on and on. Another useful characteristic: the Flatt module system and the proposed R6RS 'library' system impose a separation of phases that seems really valuable. I think something like this is exactly the right way to give some well-defined meaning to separate compilation. But it seems like Scheme should provide primitives that *allow* the construction of facilities like that. Building them directly into the language is an admission, in my eyes, that Scheme isn't really as good a language for playing with these kinds of ideas as it aspires to be. I came across some advice on solving hard problems from someone who was successful at that: Think of a simpler problem that seems similar. If you can't solve that, think of a simpler one. If you can solve it, then SOLVE IT. Don't just say you know how to solve it. After that, you might think of a way to attack the harder problem. You might realize something. --- Richard Feynman According to Don P. Mitchell http://www.mentallandscape.com/Writings_Feynman.htm I have no real clue how to address the sorts of issues I raised above. But then, neither have I actually written a hygienic macro expander for Scheme. So I'm just going to knock out a straightforward macro expander and interpreter based on a one-phase environment model, because I'm sure I can do that. When that's done, maybe I'll have a better perspective. So go ahead and use these functions. They will change, but if I'm doing it right, whatever they change to will probably permit these interfaces to be implemented --- perhaps not optimally, but adequately.
*/ /* Create a new, empty top-level environment. */ mn_ref_t *mn_make_environment (mn_call_t *); /* Return true if OBJ is an environment, false otherwise. */ _Bool mn_is_environment (mn_call_t *, mn_ref_t *obj); /* Return true if (the identifier denoted by) SYMBOL is bound at all in ENV, either as a variable or as syntax. */ _Bool mn_is_bound (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol); /* Return true if (the identifier denoted by) SYMBOL is a variable in ENV. Otherwise, return false. If ENV is not an environment, or if SYMBOL is not a symbol, abort. */ _Bool mn_is_variable (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol); /* Bind (the identifier denoted by) SYMBOL to a location whose value is VALUE in ENV. "Minor's rules for interpreting references to identifiers", above, describe this function's behavior if SYMBOL is already bound. If ENV is not an environment, or SYMBOL is not a symbol, abort. */ void mn_define (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol, mn_ref_t *value); /* Return the value of the variable denoted by SYMBOL in ENV. If SYMBOL does not denote a variable in ENV, return NULL and set the pending exception. (This could occur if SYMBOL is unbound in ENV, or if SYMBOL denotes a syntactic keyword.) If ENV is not an environment or SYMBOL is not a symbol, abort. */ mn_ref_t *mn_variable_value (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol); /* Set the value of the variable denoted by SYMBOL in ENV to VALUE, and return true. If SYMBOL does not denote a variable in ENV, return false and set the pending exception. (This could occur if SYMBOL is unbound in ENV, or if SYMBOL is a syntactic keyword.) If ENV is not an environment or SYMBOL is not a symbol, abort. */ _Bool mn_set_variable (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol, mn_ref_t *value); /* Return true if (the identifier denoted by) SYMBOL is a syntactic keyword in ENV. Otherwise, return false. If ENV is not an environment, or if SYMBOL is not a symbol, abort.
*/ _Bool mn_is_syntax (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol); /* Bind (the identifier denoted by) SYMBOL to the syntactic transformer TRANSFORMER in ENV. TRANSFORMER must be a procedure of one argument, just as in the 'define-syntax' form. "Minor's rules for interpreting references to identifiers", above, describe this function's behavior if SYMBOL is already bound. If ENV is not an environment or SYMBOL is not a symbol, abort. */ void mn_define_syntax (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol, mn_ref_t *transformer); /* Return the transformer for the syntactic keyword SYMBOL in ENV. If SYMBOL is not bound to a syntactic keyword in ENV, return NULL and set the pending exception. (This could occur if SYMBOL is a variable in ENV, or if it is unbound altogether.) If ENV is not an environment or SYMBOL is not a symbol, abort. */ mn_ref_t *mn_syntax_transformer (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol); /* Remove any binding for SYMBOL in ENV. If SYMBOL is not bound in ENV, do nothing. If ENV is not an environment or SYMBOL is not a symbol, abort. */ void mn_undefine (mn_call_t *, mn_ref_t *env, mn_ref_t *symbol); /* Return the environment containing all the usual bindings for Minor Scheme. NOTE: Don't modify this environment, unless you really mean to affect what other code sees as the default Minor environment. For user interaction, running scripts, etc. you should make a fresh environment, and copy this environment into it. */ mn_ref_t *mn_default_environment (mn_call_t *); /* Copy all the bindings in SRC to DEST. If SRC and DEST bind the same identifier, the binding from SRC replaces the binding from DEST. If SRC and DEST are not both environments, abort. */ void mn_environment_merge (mn_call_t *, mn_ref_t *dest, mn_ref_t *src); /* For each bound symbol in ENV, call FUNC, passing it a fresh call object, CLOSURE, and SYMBOL. 
FUNC should return true for the iteration to proceed; if it wants to throw an exception, it should call mn_set_exception and return false. If FUNC returns true for every binding in ENV, return true. SYMBOL is owned by the new call. If FREE_CLOSURE is non-zero, and a continuation is applied that causes this iteration to be abandoned, apply FREE_CLOSURE to a fresh call and CLOSURE. If the iteration returns normally, it will not call FREE_CLOSURE; Minor will not refer to CLOSURE again. If ENV is not an environment, abort. */ _Bool mn_environment_for_each (mn_call_t *, mn_ref_t *env, void *closure, _Bool (*func) (mn_call_t *, void *closure, mn_ref_t *symbol), void (*free_closure) (mn_call_t *, void *)); /* Eval. */ /* This may need to be re-thought in light of Matthew Flatt's phase distinction; compile-time and run-time environments are separate, no? */ /* Expand and then evaluate the expression EXPR in the environment ENV, and return its value. (EXPR must return exactly one value). EXPR must be a valid Minor Scheme expression represented as data. ENV must be an environment. If the evaluation raises an exception, save that as the pending exception, and return zero. If ENV is not an environment, or EXPR is not a valid Minor Scheme expression, return zero and set the pending exception. Expansion is done entirely before any evaluation takes place, when mn_eval is called. Evaluation interprets only fully-expanded expressions. */ mn_ref_t *mn_eval (mn_call_t *, mn_ref_t *expr, mn_ref_t *env); /* Expand and evaluate the expression EXPR in the environment ENV, and return its values as a Scheme list. The arguments, return value, and exception handling are as above. */ mn_ref_t *mn_eval_for_value_list (mn_call_t *, mn_ref_t *expr, mn_ref_t *env); /* Appendix: Async-safety and fork safety. */ /* A function is "async-safe" if it can be used reliably in a signal handler. POSIX specifies a limited set of system functions that are async-safe. 
None of the Minor C API functions are async-safe --- not even the trivial ones like mn_eq. (Doing any sort of operation on Minor heap objects involves coordinating with the garbage collector, and the primitives POSIX offers to do that --- mutexes, semaphores, and so on --- are not async-safe.) Similarly, the functions of this interface can't be used in a child process created by 'fork', when the parent is multi-threaded. (See "WHY NON-ASYNC-SAFE FUNCTIONS CAN'T BE USED IN FORKED CHILDREN", below.) However, when the parent process is single-threaded, and the POSIX system interface functions --- both async-safe and non-async-safe --- are safe to use in forked children, then the Minor C API functions are safe to use, too. On most systems that implement POSIX, that precondition does hold: while multi-threaded programs and 'fork' are just inherently a bad mix, there are too many existing single-threaded programs that assume they have free rein in forked children for it to be acceptable for system libraries to choke on them. The Minor C library takes the appropriate steps to ensure that it, at least, won't be the source of the problem. WHY NON-ASYNC-SAFE FUNCTIONS CAN'T BE USED IN FORKED CHILDREN In general, it's not safe to use non-async-safe functions in a child process created by 'fork'. POSIX specifies that, when a thread calls 'fork', the child process inherits a copy of that thread, and a copy of all the parent's memory, including mutexes --- but none of the parent's other threads. This means that, if one of those other threads was holding a mutex when the fork took place, the child process will inherit locked mutexes, with no threads running to free them. If the child thread tries to acquire any of those mutexes, it will block forever. But mutexes are just one instance of the problem.
In general, if a multi-threaded program calls 'fork', the child process will inherit whatever temporary inconsistent state other threads may have created, and any sort of cleanup those threads would normally be counted on to do isn't going to happen in the child. So, according to POSIX, only async-safe functions may be used in a child created by 'fork'. Async-safe functions are defined to be safe to use in signal handlers, and signal handlers can run at any time, in the midst of whatever inconsistent state and locked mutexes happen to be present. So they will be undisturbed by similar disorder in a child process. To be clear: none of these restrictions apply to a child process created by 'fork' *after it has called exec*. The exec wipes out the process's memory, replacing it with the contents of the executable file, and starts it running with a single thread. So any inconsistent state there might have been is gone after the exec. For details, see the rationale text in the POSIX spec for 'fork' and 'pthread_atfork'. It's the latter that acknowledges that, really, everything ought to continue to work after a fork if the parent is single-threaded. */
#endif /* MN_MINOR_H */