Newsgroups: oberlin.cs
From: jimb@totoro.cs.oberlin.edu (Jim Blandy)
Subject: Re: Neat schemes for implementing buffers in an editor
In-Reply-To: carlton@husc8.harvard.edu's message of Wed, 16 Dec 1992 22:36:50 GMT
Nntp-Posting-Host: totoro.cs.oberlin.edu
Organization: I see no organization here.
Distribution: oberlin
Date: Thu, 17 Dec 1992 13:24:22 GMT


In article <CARLTON.92Dec16173650@husc8.harvard.edu> carlton@husc8.harvard.edu (david carlton) writes:

>I'm writing an editor.  I need to implement buffers, of course.  What
>are people's favorite tricks for how to store the buffer internally?

You know how Emacs does it, right?  It keeps the entire text of the
buffer in memory, in one big array of characters.  The array is larger
than necessary to hold the text of the buffer; the extra space is
called the "gap".

Suppose your buffer contains the text

	"Beauty sat bathing by a spring"

Emacs might store this 30-character text in a 50-character buffer like
this:

	"Beauty sat bathing by a spring                    "
				       |^^^^ the gap ^^^^^|

Emacs always ignores the characters in the gap - this buffer is only
30 characters long, and the last character in it is a "g".  The
buffer's text is only that which is outside the gap.

Now, suppose your cursor is at the end of the buffer (after the "g" in
spring) and you type the text " where" (six characters).  Emacs would
just stick the characters into the beginning of the gap, and shrink
the gap, resulting in this:

	"Beauty sat bathing by a spring where              "
					     |^^^the gap^^|

If you hit "delete" to delete the "e" in "where", we just expand the
gap to enclose it:

	"Beauty sat bathing by a spring wher               "
					    |^^^the gap^^^|

Now, this is all fine and dandy, but what happens if you want to add
or delete text in the middle?  Well, the gap doesn't have to be at the
end of the buffer; we can move some text to the end, thereby shifting
the gap backwards:

	"Beauty sat bathing by a                spring wher"
				 |^^^the gap^^^|

Once that's done, inserting and deleting just before "spring" is as
cheap as it was at the end of the buffer.  If you type the characters
"bed", we'd just stick them into the front of the gap, as before:

	"Beauty sat bathing by a bed            spring wher"
				    |^^the gap^|

Note that the text of the buffer is now "Beauty sat bathing by a
bedspring wher"; Emacs always skips the gap, so you never have to deal
with it.

If I want to delete the word "spring", I just enlarge the gap to
enclose it:

	"Beauty sat bathing by a bed                   wher"
				    |^^^^^the gap^^^^|

Emacs is perfectly happy to let you move the cursor around without
moving the gap; it only shifts the gap around when there is actual
insertion or deletion to be done.

When you fill up the gap, Emacs just uses realloc to enlarge the
array, and then enlarges the gap to fit.


How efficient is this?

Well, for starters, notice that when you shift the gap back and forth,
you're actually just moving text in the other direction.  So the cost
of moving the gap is proportional to the _distance_ you move it, not
the size of the buffer.  Since the majority of editing consists of
fiddling about in a small area, and then jumping elsewhere and
fiddling about there, the majority of gap shifts are for very small
distances, so they don't cost much.

Note also that once the gap is where you want it, insertion and
deletion are practially free.  Emacs probably didn't need to shift the
gap at all while I typed this sentence, because I was mostly typing
and then deleting the stuff I just typed.  I did jump up a line or two
once or twice and fix something; Emacs had to copy the line's text
back and forth to accomodate me.

A hidden advantage of this is that it's very easy to code with.  Other
systems which break the text up into lines or many separate chunks of
text make it difficult for the programmer to work with.


The worst situation for this representation is when you're making
frequent jumps between the top and bottom of the buffer and editing
text each time.  Then Emacs has to copy the entire text of the buffer
back and forth each time you jump and edit.

This technique is hard on malloc; it's tough to find a contiguous
block of free storage, and you can end up with a lot of
fragmentation.  All versions of Emacs until version 19 had severe
problems with fragmentation; they often occupy much more memory than
they need because of fragmentation due to buffer allocation.  In fact,
it's such a pain that Emacs 19 uses its own custom memory allocator,
which eliminates fragmentation due to buffer allocation.  Plan 9's SAM
editor does almost exactly the same thing.

It's also hard on the pager; you can cause every byte of an Emacs
buffer to be paged in just by editing at the top, and then editing at
the bottom of the buffer.  Emacs 19's new allocator doesn't help this
much.


I've written other buffer managers for fun that paged text out to
disk, didn't require large contiguous blocks of memory, etcetera.  I
had problems keeping the text from breaking up into lots of little
tiny separately-allocate regions.


I look forward to hearing about your solution and your experiences
with it!