T.R | Title | User | Personal Name | Date | Lines |
---|---|---|---|---|---|
509.1 | CROSS REF | WGBH::APPELLOF | Carl J. Appellof | Thu Jun 12 1986 20:01 | 7 |
| Try looking at note 485 (especially .5 and .6). What exactly do
you mean by an "integer square root"? Does it mean "the integer
that, when squared, is closest to the original value"?
Interested in a PDP-10 MACRO version of an integer square root which
uses a binary adaptation of the old-fashioned grade school method?
|
509.2 | Comparison | WBZ::APPELLOF | Carl J. Appellof | Thu Jun 12 1986 20:47 | 6 |
| It looks like your method is proportional to the size of the number.
If you trim off leading zeroes, the old fashioned way is proportional
to the number of digits in the number, which is roughly proportional
to the log of the number. If you don't trim leading zeroes, then
the old-fashioned way is just proportional to the word length.
|
509.3 | replies to .1 and .2; a performance question | EAGLEA::BEST | R. D. Best, 32 bit sys. arch. & A.D., VAXBI | Thu Jun 12 1986 21:52 | 44 |
|
In response to .1 and .2:
Regarding the definition: It seems that both Hemenway's and my
algorithm produce the smallest positive integer >= sqrt( x ). The
'jump points' ( where the function output changes from n to n+1 ) occur when
the domain value changes from a_perfect_square to a_perfect_square+1.
( e.g. sqr3( 25 ) = 5, sqr3( 26 ) = 6 )
If there are methods that produce 'largest positive integer <= sqrt( x )'
or 'closest integer to sqrt( x )' that run faster, I'm interested.
On the PDP-10 macro routine: sure, let's put it in. Is it already in
a file somewhere ? I may need an interpretation though; I don't know the
instruction set of the PDP-10.
I was aware that there are faster algorithms; for example, scanning down
from the MSB two bits at a time until you find a non-zero bit is a good
way to determine the MSB of the square root:
bit_position( MSB( sqrt( X ) ) ) ~= bit_position( MSB( X ) ) div 2.
From there, I suppose that I could use a table to look up the odd integer
to start subtracting with. I didn't bother trying to do that in
the program in .0 because I couldn't think of a real neat way to write
an efficient bit_position function in Pascal ( I guess the scanning technique
IS sort of taking a rough logarithm; well how about a real fast integer LOG
fcn ?)
Hmmm. I think that even if you don't chop leading zeroes, the time
complexity is on the order of sqrt( x ). You effectively count up to
the square root by fixed increments. Is this what you mean by
proportional to the word length ?
Since the expanded recurrence version ( sqr3 ) does not work as well
as sqr1 for small numbers, I would like to find a way to improve it
without hacking up the code with a lot of special case tests.
There is one thing about the algorithm in .0 that puzzles me.
I ran it on both a 785 and an 8600. The crossover value ( i.e.
the value of x at which the naive algorithm and the expanded recurrence version
performed equally well ) was consistently smaller on the 8600 than the 785.
Does anyone know why this happens ?
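For reference, the naive subtract-successive-odd-integers method being discussed can be sketched in C (an illustrative reconstruction, not the Pascal from .0; this version returns the floor rather than the ceiling variant described above):

```c
/* Naive integer square root by subtracting successive odd integers.
 * Since 1 + 3 + 5 + ... + (2n-1) = n*n, the number of odd integers
 * we can subtract before x goes negative is floor(sqrt(x)).  The
 * loop runs about sqrt(x) times, which is the time complexity
 * under discussion. */
unsigned isqrt_naive(unsigned x)
{
    unsigned odd = 1, count = 0;
    while (x >= odd) {
        x -= odd;        /* take away the next odd integer */
        odd += 2;
        count++;         /* one more odd integer fit */
    }
    return count;
}
```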
|
509.4 | Here's one | WGBH::APPELLOF | Carl J. Appellof | Fri Jun 13 1986 18:47 | 49 |
| I was going to put in the MACRO-10 routine, since it was my native
language for a long time, but it would have taken me longer to
explain the instructions than writing it up in MACRO-32, so here's
a FORTRAN-callable MACRO-32 integer square root routine. It works
with 32-bit integers, but there is no check for negative args.
.TITLE ISQRT
;
; Integer square root
; Input: 4(AP) contains address of integer argument
; Output: R0 contains the greatest integer less than or equal
; to the square root of the argument
;
ISQRT:: .word ^M<R2,R3>
CLRL R0 ;zap place for normalized word
MOVL @4(AP),R1 ;Is the argument 0?
BEQL DONE ;Yes, done
CLRQ R2 ;zero R2 and R3 also
1$: ASHQ #-2,R0,R0 ;Shift out 2 bits
INCL R2 ;Count digit pairs
TSTL R1 ;Done all digits yet?
BNEQ 1$ ;No, continue
;
; At this point: R1 contains 0
; R0 contains normalized argument
; R2 contains number of digit pairs
;
LOOP: ASHQ #2,R0,R0 ;Shift 2 bits of argument into R1
ASHL #2,R3,R3 ;One more digit of result
INCL R3 ;and add 1
CMPL R3,R1 ;Will it fit?
BGTR 2$ ;No
SUBL2 R3,R1 ;Yes, it fits: subtract
INCL R3 ; and the next result digit is a 1
2$: ASHL #-1,R3,R3 ;Readjust result
SOBGTR R2,LOOP ;Go do next digit pair
MOVL R3,R0 ;Put answer in the right place
DONE: RET ;DONE
.END
The runtime of this routine is proportional to the number of
bits in the argument, which is proportional to the base-2 log of the number.
That is a better runtime than one which is proportional to the square
root of the argument.
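For readers without MACRO-32 handy, the same digit-pair idea can be sketched in C (an illustrative rendering of the shift-and-subtract scheme, not a literal transcription of the routine above):

```c
/* Binary "grade school" square root: examine the argument two bits
 * (one base-4 digit) at a time from the top, building up the root
 * by shift-and-subtract, much as ISQRT above does in registers.
 * Runs in a number of steps proportional to the bit length. */
unsigned isqrt_bits(unsigned x)
{
    unsigned root = 0;
    unsigned bit = 1u << 30;    /* highest power of 4 in a 32-bit word */

    while (bit > x)             /* skip leading zero digit pairs */
        bit >>= 2;
    while (bit) {
        if (x >= root + bit) {  /* does the trial digit fit? */
            x -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;         /* readjust, digit is 0 */
        }
        bit >>= 2;
    }
    return root;                /* floor(sqrt(x)) */
}
```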
|
509.5 | Newton Rules, OK!? | SHEILA::PUCKETT | Fortran will Never Die | Fri Jun 20 1986 06:45 | 16 |
| I played about with ISQRT (.4) and a Fortran version of SQR3 (.1)
and found that these both rise as LOG(n), as expected. This gets
quite slow as n gets large; Newton's method is much faster for large
n (i.e. multiprecision) PROVIDED THAT you start with a good initial
guess.
My initial guess works as follows: if the number n has k bits and
k is even, the guess is 2**(k/2). (top bit set)
If k is odd the guess is 2**(k/2)+2**(k/2-1) (top 2 bits set)
Typically you only need 4-5 divisions to get from this to an
accurate integer square root on a 50 digit number. The number of
divisions grows much more slowly than LOG(n) but I don't know the
theory behind this. Maybe it is LOG(LOG(n))?
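A sketch of this in C, using a slightly simpler starting guess than the one above (the single bit 2**ceil(k/2), which is always >= sqrt(n)); the function name is illustrative, and a true multiprecision version would of course use bignum arithmetic:

```c
/* Newton's method for floor(sqrt(n)) with a bit-length-based first
 * guess.  Starting above the root, the iterates decrease; the loop
 * stops at the first non-decreasing step, which yields the floor. */
unsigned long isqrt_newton(unsigned long n)
{
    if (n < 2)
        return n;
    int k = 0;
    for (unsigned long t = n; t; t >>= 1)
        k++;                                 /* k = bit length of n */
    unsigned long x = 1ul << ((k + 1) / 2);  /* 2^ceil(k/2) >= sqrt(n) */
    for (;;) {
        unsigned long y = (x + n / x) / 2;   /* one Newton step */
        if (y >= x)
            return x;                        /* converged */
        x = y;
    }
}
```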
- Giles
|
509.6 | Another LOG(n) method | SHEILA::PUCKETT | Fortran will Never Die | Fri Jun 20 1986 06:50 | 4 |
| Oops, I forgot to mention another LOG(n) method I tried: binary
chop, with initial limits 2**(k/2-1) and 2**(k/2+1) for a k-digit
number n. This is very fast for small n (5-7 digits) but the comments
in .-1 apply to it just like ISQRT and SQR3.
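In C the chop might look like this (a sketch with the power-of-two bounds slightly adjusted so they bracket the root for every bit length k; the test mid <= n/mid stands in for mid*mid <= n to avoid overflow):

```c
/* Binary chop for floor(sqrt(n)): bracket the root between two
 * powers of two derived from the bit length of n, then halve the
 * interval, about k/2 times for a k-bit argument. */
unsigned long isqrt_chop(unsigned long n)
{
    if (n < 2)
        return n;
    int k = 0;
    for (unsigned long t = n; t; t >>= 1)
        k++;                                 /* bit length */
    unsigned long lo = 1ul << ((k - 1) / 2); /* <= sqrt(n) */
    unsigned long hi = 1ul << (k / 2 + 1);   /* >  sqrt(n) */
    while (hi - lo > 1) {
        unsigned long mid = (lo + hi) / 2;
        if (mid <= n / mid)                  /* mid*mid <= n? */
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```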
|
509.7 | How does fast division work? | TOOK::APPELLOF | Carl J. Appellof | Fri Jun 20 1986 13:21 | 24 |
| I still believe that SQR3 time rises as the square root of the
argument. This gets painfully slow as N rises. ISQRT goes up as
LOG(n). The easiest way to fake up a Newton's method rooter for
a 32 bit number is the following FORTRAN function:
INTEGER FUNCTION FSQRT(N)
FSQRT=INT(DSQRT(DFLOAT(N)))
RETURN
END
The description of the method used for the initial approximation
is in the VMS RTL manual description of MTH$xSQRT.
This method only requires 3 iterations of the Newton-Raphson procedure,
and therefore the time is constant (if division is cheap). It beats
ISQRT all the time.
On the other hand, we might consider hardware implementations.
The algorithm used in ISQRT is very similar to that used in ordinary
integer division (shift and subtract). I do not know if there is
a better method to do division in hardware. If there is not, then
an ISQRT-like hardware algorithm would beat a Newton's method, since
Newton uses division, which is also proportional to LOG(n) using
the shift-and-subtract method.
|
509.8 | Newtons is LOG(LOG(n)) ?? | SHEILA::PUCKETT | An Indestructible Metallic Alloy | Mon Jun 30 1986 07:12 | 9 |
| RE: .5,.6
It appears using multi-precision routines on 80-120 digit nos.
that Newtons method DOUBLES the precision with each iteration.
This is a whole heck of a lot better than LOG(n), it is in fact
approximately LOG(LOG(n)).
Anyone up on the theory of these optimisation methods?
- Giles
|
509.9 | Let them read Macro-10 | GALLO::AMARTIN | Alan H. Martin | Mon Jul 07 1986 17:19 | 8 |
| Re .7:
Oh, splurge, Carl. Stick the stupid Macro-10 routine in the conference.
Posterity ought to count for something, and for all I know, one version
or the other might get a boost from the difference in architectures.
If someone can't read it, that is their loss. Don't bother prettying
it up - I am not trying to make work for you.
/AHM/THX
|
509.10 | exit | TOOK::APPELLOF | Carl J. Appellof | Wed Jul 16 1986 14:05 | 52 |
| Alright, here is the stupid MACRO-10 routine. The only architectural
advantage the -10 has is the JFFO instruction, which finds the
LEFT-most 1-bit in a word (as opposed to the VAX FFO instruction,
which finds the RIGHT-most 1-bit which is useless for normalizing
numbers.). It IS possible to fake a similar operation on a VAX
by using a CVTLF instruction and extracting the resulting exponent
field. The rest of the algorithm is exactly the same on a VAX or
a -10. Of course, all of this is useless for a multiprecision version.
I'd like to see the FAST multiprecision SQRT which someone mentioned
in another note.
TITLE ISQRT
AP=16 ;ARGUMENT POINTER AC
P=17 ;Stack pointer
ENTRY ISQRT
;
; Call:
; I = ISQRT(N)
;
ISQRT: PUSH P,1 ;Save all ACs we use
PUSH P,2
PUSH P,3
MOVE 0,@(16) ;Get argument from arg list
JFFO 0,.+2 ;Find leftmost 1 bit in word
JRST SQREND ;None found, answer is 0
MOVE 2,0 ;Save the argument
TRZ 1,1 ;Get nearest even number of leading zeroes
LSH 2,(1) ;Normalize the argument
HRROI 3,-^D36(1) ;Calculate Negative number of digits
ASH 3,-1 ;number of pairs of digits
SETZB 0,1 ;ZERO the answer and accumulator
LOOP: LSHC 1,2 ;Slide in the next digit pair
LSH 0,2 ;gives us one more digit of answer
TRO 0,1 ;In this position of the answer
CAMLE 0,1 ;Will it fit?
JRST LOOP1 ;Nope
SUB 1,0 ;Yup, subtract from accumulator
TRO 0,2 ;Set answer digit
LOOP1: LSH 0,-1 ;Readjust accumulator
AOJL 3,LOOP ;Go for next digit pair
SQREND: POP P,3 ;Restore saved ACs
POP P,2
POP P,1
POPJ P, ;Return with answer in AC0
END
|
509.11 | Premature optimization is the root of all evil | NOBUGS::AMARTIN | Alan H. Martin | Thu Oct 30 1986 22:06 | 25 |
| Re .10:
Thanks. The routine even works as written under extended addressing!
(However, you can trash AC1 and AC16 with impunity, thus eliminating two
PUSH/POP pairs, and changing "JRST SQREND" to "POPJ P," in the process!)
By the way,
>(as opposed to the VAX FFO instruction,
> which finds the RIGHT-most 1-bit which is useless for normalizing
> numbers.)
On a 2's complement machine with a "count the number of leading zeros"
instruction, anyone can find the number of trailing zeros with a
(small) constant additional amount of work.
Conversely, on a 2's complement machine with a "count the number of
trailing zeros" instruction, I am aware of no fast method of counting
leading zeros.
I pose the former problem as an exercise to the reader. Can anyone
give a solution to the latter problem?
/AHM/THX
|
509.12 | Is speedup possible? | TOOK::APPELLOF | Carl J. Appellof | Fri Oct 31 1986 12:29 | 9 |
| Thanks, Alan, for pointing out what I had forgotten: that AC1 is
fair game in FORTRAN-10 function routines.
I still have a nagging feeling that there is a version of this
algorithm which would operate on more than 2 bits at a time. After
all, it's merely a binary adaptation of the grade school paper and
pencil method, and that one looks at a pair of decimal digits on
each pass. Can you think of a simple way of examining 4 bits at
a time?
|
509.13 | Who needs divisions? | AKQJ10::YARBROUGH | | Wed Apr 22 1987 18:46 | 4 |
| There is a recent article in IEEE Proceedings, Vol 75 #2, p. 262,
about sqrts with a Digital Signal Processor. It may be worth a look.
Lynn
|
509.14 | use a trick from DEC-10 processor reference | EAGLE1::BEST | R D Best, Systems architecture, I/O | Tue Jun 16 1987 18:03 | 14 |
| re .11:
> On a 2's complement machine with a "count the number of leading zeros"
> instruction, anyone can find the number of trailing zeros with a
> (small) constant additionaly amount of work.
Let the word size of the machine be N. Let the argument be A.
If the "count the number of leading zeros" operator is leading(x), then
the number of trailing zeros is:
trailing( A ) = N - leading( and( A, -A ) ) - 1
This is because and( A, -A ) yields a bit mask with a single 1 in the position
of the rightmost set bit in the argument.
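The identity can be sketched in C, with leading() written out as a portable loop (on a machine with the instruction, e.g. JFFO, it is a single opcode); the names here are illustrative:

```c
#define WORDSIZE 32

/* Count leading zero bits the slow, portable way. */
int leading(unsigned x)
{
    if (x == 0)
        return WORDSIZE;
    int c = 0;
    while (!(x & (1u << (WORDSIZE - 1)))) {
        x <<= 1;
        c++;
    }
    return c;
}

/* The identity from .14: A & -A isolates the lowest set bit, so its
 * position from the right is WORDSIZE - leading(A & -A) - 1.
 * A must be nonzero. */
int trailing(unsigned a)
{
    return WORDSIZE - leading(a & -a) - 1;
}
```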
|