Here are two posts I saw about taking square roots quickly. ------------------------------------------------------------------------------ From: Robert W Vogler Newsgroups: comp.lang.asm.x86 Subject: Re: Square root? Date: Mon, 20 Feb 1995 19:40:49 -0700 Organization: Brigham Young University NNTP-Posting-Host: bert.cs.byu.edu Mime-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII In-Reply-To: <3i6en3$7lj@nippur.irb.hr> On 19 Feb 1995, Daniel Mosmondor wrote: > I'm looking for a good and 386 optimized source which could get me a > INT sqare root of 32 bit number. Does anyone has anything available, > or some ideas? My source isn't for 386, but the algorithm is awesome. Its a strange, but powerful algorithm. It will generate the sqrt of a 32bit number in just 5 iterations. Its a O(log n) algorithm, where n is the number of bits in your number. With this consideration, you could claim that it is really a O(log (log n)) algorithm. But I don't have time to explain it right now. Mail me a message around Wednesday, and I'll write it up for you. For the meanwhile here's a simple algorithm. Your answer will always be half the number of bits as the operand (rounded up). If your taking the root of 32bits, you'll end up with 16. Take a 16bit number. Set the 1st bit. Square the number. If it's larger than the operand, clear the bit. If it's not, leave it. Set the next lower bit. Continue the process til you've finished all 16bits. It's fast an simple, but still only a O(n) algorithm. The better one is more complex per iteration, but is more efficient. Write back in a few days for that one. A few months ago, some ideas were brought up. One used lookup tables. Another used the fact that squares are spaced apart at a rate of 2(n)1). sqr(0), sqr(1)=1, sqr(2)=4, sqr(3)=9,sqr(4)=16; +1,+3,+5,+7. Robert Vogler lowlevel@bert.cs.byu.edu ------------------------------------------------------------------------------ From: tut@iai.mv.com (Bill Tuttle) Newsgroups: comp.lang.asm.x86 Subject: Re: Square root? Distribution: world Date: Mon, 20 Feb 95 17:45:15 GMT Organization: Imaging Automation dmosmon@oleh.srce.hr writes in article <3i6en3$7lj@nippur.irb.hr>: > > I'm looking for a good and 386 optimized source which could get me a > INT sqare root of 32 bit number. Does anyone has anything available, > or some ideas? > > This is meant to be used in a font scaller which would take little > bitmapped fonts from VGA ROM and make them big. Idea behind this is > very interesting, so I want you to hurry and give me some good SQRTs. > > Thanks! > I have included both a 'C' version and a assembly version. If the 'C' version is compiled with a 32 bit intel compiler the "r >>= 1" statement can be replaced with a "r >>= 33" statement to save a clock on the shift. A good compiler with save a few clocks by optimizing the first and last iteration. I have not done this in the assembly version because it is not significant in the overall total time. I believe this to be pretty close to optimized as far as time. I have seen the newton method with an optimized initial estimatator do better on average by 10%. However, the benifit of this binary iterative method is the same time for all numbers. unsigned long sqrtul(unsigned long v) { unsigned long r = 0, s; #define STEP(k) s = r + (1L << k * 2); r >>= 1; \ if (s <= v) { v -= s; r |= (1L << k * 2); } STEP(15); STEP(14); STEP(13); STEP(12); STEP(11); STEP(10); STEP(9); STEP(8); STEP(7); STEP(6); STEP(5); STEP(4); STEP(3); STEP(2); STEP(1); STEP(0); return r; } sqrtul PROC square:DWORD mov edx, square xor eax, eax SQRTMASK = 1 SHL 30 REPT 16 lea ecx, [eax+SQRTMASK] DB 0C1H, 0E8h, 01 ; shr eax, 1 -- immediate 2 clocks .IF ecx <= edx add eax, SQRTMASK sub edx, ecx .ENDIF SQRTMASK = SQRTMASK SHR 2 ENDM ret sqrtul ENDP END Hope this helps! Bill Tuttle -- tut@iai.mv.com