Newsgroups: oberlin.cs From: jimb@totoro.cs.oberlin.edu (Jim Blandy) Subject: Re: Neat schemes for implementing buffers in an editor In-Reply-To: carlton@husc8.harvard.edu's message of Wed, 16 Dec 1992 22:36:50 GMT Nntp-Posting-Host: totoro.cs.oberlin.edu Organization: I see no organization here. Distribution: oberlin Date: Thu, 17 Dec 1992 13:24:22 GMT In article carlton@husc8.harvard.edu (david carlton) writes: >I'm writing an editor. I need to implement buffers, of course. What >are people's favorite tricks for how to store the buffer internally? You know how Emacs does it, right? It keeps the entire text of the buffer in memory, in one big array of characters. The array is larger than necessary to hold the text of the buffer; the extra space is called the "gap". Suppose your buffer contains the text "Beauty sat bathing by a spring" Emacs might store this 30-character text in a 50-character buffer like this: "Beauty sat bathing by a spring " |^^^^ the gap ^^^^^| Emacs always ignores the characters in the gap - this buffer is only 30 characters long, and the last character in it is a "g". The buffer's text is only that which is outside the gap. Now, suppose your cursor is at the end of the buffer (after the "g" in spring) and you type the text " where" (six characters). Emacs would just stick the characters into the beginning of the gap, and shrink the gap, resulting in this: "Beauty sat bathing by a spring where " |^^^the gap^^| If you hit "delete" to delete the "e" in "where", we just expand the gap to enclose it: "Beauty sat bathing by a spring wher " |^^^the gap^^^| Now, this is all fine and dandy, but what happens if you want to add or delete text in the middle? Well, the gap doesn't have to be at the end of the buffer; we can move some text to the end, thereby shifting the gap backwards: "Beauty sat bathing by a spring wher" |^^^the gap^^^| Once that's done, inserting and deleting just before "spring" is as cheap as it was at the end of the buffer. If you type the characters "bed", we'd just stick them into the front of the gap, as before: "Beauty sat bathing by a bed spring wher" |^^the gap^| Note that the text of the buffer is now "Beauty sat bathing by a bedspring wher"; Emacs always skips the gap, so you never have to deal with it. If I want to delete the word "spring", I just enlarge the gap to enclose it: "Beauty sat bathing by a bed wher" |^^^^^the gap^^^^| Emacs is perfectly happy to let you move the cursor around without moving the gap; it only shifts the gap around when there is actual insertion or deletion to be done. When you fill up the gap, Emacs just uses realloc to enlarge the array, and then enlarges the gap to fit. How efficient is this? Well, for starters, notice that when you shift the gap back and forth, you're actually just moving text in the other direction. So the cost of moving the gap is proportional to the _distance_ you move it, not the size of the buffer. Since the majority of editing consists of fiddling about in a small area, and then jumping elsewhere and fiddling about there, the majority of gap shifts are for very small distances, so they don't cost much. Note also that once the gap is where you want it, insertion and deletion are practially free. Emacs probably didn't need to shift the gap at all while I typed this sentence, because I was mostly typing and then deleting the stuff I just typed. I did jump up a line or two once or twice and fix something; Emacs had to copy the line's text back and forth to accomodate me. A hidden advantage of this is that it's very easy to code with. Other systems which break the text up into lines or many separate chunks of text make it difficult for the programmer to work with. The worst situation for this representation is when you're making frequent jumps between the top and bottom of the buffer and editing text each time. Then Emacs has to copy the entire text of the buffer back and forth each time you jump and edit. This technique is hard on malloc; it's tough to find a contiguous block of free storage, and you can end up with a lot of fragmentation. All versions of Emacs until version 19 had severe problems with fragmentation; they often occupy much more memory than they need because of fragmentation due to buffer allocation. In fact, it's such a pain that Emacs 19 uses its own custom memory allocator, which eliminates fragmentation due to buffer allocation. Plan 9's SAM editor does almost exactly the same thing. It's also hard on the pager; you can cause every byte of an Emacs buffer to be paged in just by editing at the top, and then editing at the bottom of the buffer. Emacs 19's new allocator doesn't help this much. I've written other buffer managers for fun that paged text out to disk, didn't require large contiguous blocks of memory, etcetera. I had problems keeping the text from breaking up into lots of little tiny separately-allocate regions. I look forward to hearing about your solution and your experiences with it!