From: friedman@cli.com (Noah Friedman) To: kfogel@floss.life.uiuc.edu Cc: jimb@gnu.ai.mit.edu Subject: Re: boyer-moore string searching Date: Wed, 8 Feb 95 23:01:49 CST > Hmm, Knuth doesn't seem to discuss it... is it recent? Yeah, it postdates AOCP, though Knuth commented on an optimization to B-M which the latter acknowledge in the last version of the paper. Here's the reference: Robert S. Boyer, J Strother Moore "A Fast String Searching Algorithm" Communications of the ACM, V.20, No. 10, pp. 762-772; October 1977. I only have a paper copy which is faded somewhat, but I could copy it for you if you can't find it easily. Basically, the idea is that you do your search starting from the end of your pattern and work backwards. If your pattern is patlen chars long, start by looking at the patlen'th char in your search string. If that char isn't in the pattern anywhere, you can skip forward patlen chars without examining the rest, because you know that no part of the string before that can match. (If it is, you have to compute how far forward you can skip, but you build a table of "deltas" before you start). For instance, consider searching for the string "string" in "the substring to be found": string the substring to be found ^ `u' doesn't appear anywhere in "string", so skip forward 6 positions: string the substring to be found ^ This time the char `n' does appear in "string" at the patlen-1 position, so you can only move forward one char. But this time you've found the match, and it only took 2+patlen comparisons instead of 9+patlen for the traditional "search forward in both pattern and search-string" algorithm. There are more details for further optimization that guarantee its worst-case behavior is linear in patlen+i, but I'll let you read the paper for yourself. :-) Basically, it just amazes me how anything so obvious could have been so unapparent before.