--- user-visible fixes

* Change hash table iteration interface so that the 'next' function
  mutates the iterator, instead of consuming it and returning a new
  one.
* Change hash table lexeme from 'hash_table' to 'hash'.
* Change the environment lexeme from 'environment' to 'env'.
* There are still type conversion functions whose names use _from_,
  contrary to the naming conventions given in minor.h:
  mn_string_from_str, for example.
* In minor/minor.h naming conventions, note 'array' as name lexeme for
  C arrays (typically of references).
* The random number generator's behavior depends on the size of
  'unsigned'. It ought to be portable, to make tests reproducible.
* Make sure all API functions follow abort vs. exception policy.
* Get rid of mn_make_multi_valued_procedure and add a flag to
  mn_make_procedure: just have one universal constructor function, and
  then have whatever common-case functions we need
  (mn_make_multi_valued_procedure is not a common-case function...)
* Perhaps all Minor I/O functions, other than byte and byte vector
  operations, should be wide-oriented operations. It's odd that we
  can't use mn_write on a port, and then use mn_exception_string and
  mn_display_str to print an error message.
* Should mn_get_utf8 and mn_put_utf8 return a count of code units
  consumed, instead of a new pointer? Should they take a pointer by
  reference? (GCC's manual says it will never place a variable in a
  register if its address is taken.) Try changing the callers first,
  and see how they look.
* Should the arguments to mn_push be reversed? They should be the same
  as in CL. The list argument should be passed by reference, to more
  closely resemble CL push. If we're going to name it that.
* Iterator-based interfaces are much more natural in C than
  for_each-like interfaces. Should we replace mn_environment_for_each
  with an iterator interface? That would remove all the dumb unwinding
  hair, and the subcall crap.
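
A minimal sketch of the iterator shape the items above ask for, where
'next' mutates the iterator in place. Everything here is hypothetical:
the toy_* names and the fixed-slot table are made up for illustration
and are not Minor's hash table or environment API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy table, NOT Minor's: a flat array of slots,
   where unused slots have used == false. */
struct toy_entry { bool used; int key; int value; };
struct toy_table { struct toy_entry *slots; size_t n_slots; };

/* The iterator lives in the caller's frame; toy_iter_next updates it
   in place instead of consuming it and returning a new one. */
struct toy_iter { const struct toy_table *table; size_t index; };

static void toy_iter_init(struct toy_iter *it, const struct toy_table *t)
{
  assert(it != NULL && t != NULL);
  it->table = t;
  it->index = 0;
}

/* Advance to the next occupied slot.  On success, store the entry in
   *out and return true; return false once the table is exhausted. */
static bool toy_iter_next(struct toy_iter *it, const struct toy_entry **out)
{
  while (it->index < it->table->n_slots) {
    const struct toy_entry *e = &it->table->slots[it->index++];
    if (e->used) {
      *out = e;
      return true;
    }
  }
  return false;
}

/* Example caller: sum every value.  Leaving the loop early with
   'break' or 'return' needs no unwinding, since no callback is live. */
static int toy_sum_values(const struct toy_table *t)
{
  struct toy_iter it;
  const struct toy_entry *e;
  int sum = 0;

  toy_iter_init(&it, t);
  while (toy_iter_next(&it, &e))
    sum += e->value;
  return sum;
}
```

The point of the sketch is only the control flow: the loop body is
ordinary C in the caller, so there is no closure to free and no
subcall to set up when the caller stops iterating early.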

--- utter trivia

* Rename globalrefs.[ch] to global-refs.[ch]
* Set up process to do nightly check out / make dist / unpack / build
  { in source tree, in separate tree }

--- consistency checks

* Why use 'assert' when we have 'check'? Also check for
  'if (...) abort'. Just print error message with source location.

--- others

* The hash table tests shouldn't dump stats to stdout; they should
  just check the parameters for reasonability, and fail the test if
  they're not. Certainly a badly-performing hash function is something
  we'd want to fix, right?
* Do we need to #define _POSIX_C_SOURCE in files using POSIX headers?
* Use ISO C 'inline', instead of GCC hair.
* numeric conversion functions for size_t, intmax_t, and uintmax_t.
* tests for unicode-case.c
* Implement better Unicode case insensitivity in reader.
* Don't store a terminating null character in strings. More
  sophisticated representations (shared, quick-concat) won't allow
  that, and since the strings aren't in the C execution character set
  anyway, we can't do the trick of passing them directly to system
  calls.
* Abstract out indexing in hash-tables.c.
* Strictly-typed mn__tag_symbol, mn__tag_vector.
* mn_from_char and mn_from_wchar aren't total, but there's no
  predicate the user can call to make sure they'll succeed. Granted,
  Unicode ought to be able to handle whatever's there, but if the C
  execution character set contains undefined code points, then those
  can't be converted to Unicode.
* The hash table should be its own type, and the symbol table just one
  application of it.
* Hash tables should not rely on incoherent sections for their mutual
  exclusion.
* Environments should not rely on incoherent sections for their mutual
  exclusion.
* For tests that don't actually communicate between threads, add
  "torture test" that runs them in many threads over and over. Then
  let the regression tests run single-threaded (and faster!) by
  default.
  It'd even be useful to run tests over and over in a single thread:
  it would show leaks and maybe catch bugs sensitive to when GC
  occurs.

==== prd.txt item 6 ("implement Core Scheme") finished

- Test that mn_environment_for_each really frees its closure when we
  apply a continuation that exits the iteration.

==== prd.txt item 7 ("Macro expander") finished

* Provide inlined, system-specific definitions for the
  mn__begin_coherent and mn__end_coherent functions, and see if that
  speeds things up.
* Eliminate uses of mn__per_thread outside pause-posix-tls.c, and
  maybe trace.c.
* Exceptions should be structures as in MzScheme, not strings.
* Hash tables need an iterator or a for-each function.

==== prd.txt item 8 ("Full R5RS, with modules and macros") finished

* In the Mozilla sources, js/src/jsstr.cpp has a data structure which
  seems to encode almost all the Unicode characteristics we want, in
  much less space than the table gen-cat-table produces for the
  categories alone. We should figure out whether to steal that.
* "Text buffer" and "byte buffer" types that support insertion and
  deletion at arbitrary points (using a gap buffer?), markers, and the
  like.
* text/byte string formatting library based on inserting into buffers,
  with a syntax that hides creating a buffer, passing the buffer to
  all the operators, and turning the result into a string
* Should we have a C-level function to do a longjmp in a way that runs
  unwinds for things like mn_environment_for_each?
* Should we be using pthread cancellation cleanups in functions that
  call back into user code --- mn_environment_for_each and the
  mn_apply family of functions?
* Don't export mn__ functions from libminor.so
* Move all forms of basic character execution set to test-lib.c, along
  with a 'char_name' function; use in c-api-ports, c-api-characters,
  and c-api-strings.
* New test stress-pair.c: build random trees, trade pieces between
  threads, then replay whole process and check that the results are as
  we expected.
  To make inter-thread trades reproducible, have each thread record
  the sequence of other threads it traded with, and then in the replay
  wait only for that thread.
* Need tests for strings with embedded null characters.
* generate per-type test functions in c-api-numbers.c with a shell
  script
* Use linked lists in gc/tests/disjoint-types.c.
* Are new ref groups too large? At the moment, we have chosen the size
  of a reference clump to be small enough that we don't mind
  allocating one for every call, but large enough that we still reduce
  our allocation overhead. But those two needs are clearly in tension,
  and there's no need for them to be. I'll bet small ref groups will
  be more common than large ones. So we could have variable clump
  sizes --- start out with a small clump, with room for thirty refs or
  so, and then each new clump we allocate to the ref would be double
  the size of the previous clump. That way, the number of clumps would
  be logarithmic in the number of refs ("The log of n, where n is the
  size of something, is effectively a constant."), but ref groups with
  few refs would still be small.
* Use fields in call structures in preference to thread-local variable
  references. TLS references could entail function calls.
* Weak references
* eqv object sets? Like hash tables, but you can only associate
  boolean values? Internally, they can be implemented with half the
  space overhead, and there's no way to do as well (that I can see)
  without primitives.
* Weak eqv hash tables.
* The symbol hash should be weak. (Remember that deleting entries from
  a table that rehashes collisions requires care.)
* Guardians
* GC should avoid copying large objects.
* Once GC avoids copying large objects, use malloc / realloc to manage
  eqv hash tables; see comments in hash.c.
* Implement leases for accessing string and byte vector contents. I
  think this will address many people's objections to the JNI-style
  interfaces, by allowing Minor API function calls to get pulled out
  of inner loops.
* There should be immutable pairs. Yes, this means you can't use a
  constant displacement in an addressing mode to access cars and cdrs;
  you'll really have to mask things off. Get over it.
* Labels for built-in types and symbols constructed at initialization
  should be statically allocated and initialized, and placed in the
  immortal generation.
* Mirage GC map nodes should be statically allocated and initialized.
* Should we do a better job at choosing initial block sizes?
* Can we get better locality by doing depth-first traversal?
* heap dumper, for debugging GC problems
* When we actually start generating object files containing heap
  objects, we'll want to actually fix the values in the tag enums,
  since they'll be a matter of public interface.
* The labels for all the labelled types should be statically allocated
  and initialized, and placed in the immortal generation.
* There should be a "debugging" mode, where freeing a reference marks
  it as garbage, but the storage itself is never reused, and every API
  function checks to see if it has been passed a dead reference.
  Similarly for calls. Ideally, one could switch this on and off
  without recompiling.
* A C++ API could be much simpler to use than the C API, because it's
  possible to use C++ features like destructors and copy operators to
  manage the references for you. Dynamic casts could actually make
  Scheme environments (say) implement the C++ STL mapping protocols.
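
The variable clump size idea from the ref group item above ("Are new
ref groups too large?") can be checked with a little arithmetic. This
is only a sketch under assumptions: FIRST_CLUMP_REFS takes the "thirty
refs or so" figure literally, and the function names are made up; none
of this is Minor's allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed starting size, from "room for thirty refs or so". */
enum { FIRST_CLUMP_REFS = 30 };

/* Capacity of the k-th clump of a group (k = 0 is the first one):
   30, 60, 120, ...  Each clump doubles the previous one. */
static size_t clump_capacity(unsigned k)
{
  assert(k < 24);               /* keep the shift well inside 32 bits */
  return (size_t) FIRST_CLUMP_REFS << k;
}

/* Number of clumps a ref group needs in order to hold n_refs refs.
   After k clumps the total capacity is 30 * (2^k - 1), so the result
   grows with the logarithm of n_refs. */
static unsigned clumps_needed(size_t n_refs)
{
  unsigned k = 0;
  size_t total = 0;

  while (total < n_refs) {
    total += clump_capacity(k);
    k++;
  }
  return k;
}
```

Under these assumptions a group with up to 30 refs costs one small
clump, while a group with a million refs needs 16 clumps, which is the
"effectively a constant" behaviour the note is after.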