[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference hydra::amiga_v1

Title:AMIGA NOTES
Notice:Join us in the *NEW* conference - HYDRA::AMIGA_V2
Moderator:HYDRA::MOORE
Created:Sat Apr 26 1986
Last Modified:Wed Feb 05 1992
Last Successful Update:Fri Jun 06 1997
Number of topics:5378
Total number of notes:38326

861.0. "bummed" by VIKING::PRAETORIUS (unineted nucular jewlery relator) Thu Oct 29 1987 13:21

     Saw this in a recent SIGARCH publication, hadta be the first to post it
(right, all you knew about this one already, I'm gonna post it anyway).

	; arg in D0, sgn(arg) comes out in D1 (-1,0,1 for D0 <,=,> 0)

	ADD.L	D0,D0	; Set carry if D0 is negative
	SUBX.L	D1,D1	; D1-D1-carry, i.e., spreads carry through D1 & carry
	NEGX.L	D0	; 0-D0-carry, sets carry if result is nonzero
	ADDX.L	D1,D1	; -1 + -1 + 1 if D0 was negative,
			; 0 + 0 + 1 if D0 was positive
			; 0 + 0 + 0 if D0 was zero

I don't have the rag with me, or I'd post the others.  These code sequences were
all found by a program at CMU called a superoptimizer, which does a (pruned)
tree search of random code segments (up to 11 instructions) to see if they
meet a certain goal.

     Cute, huh?
								RP
T.RTitleUserPersonal
Name
DateLines
861.1W O WHYSTER::DEARBORNTrouvez MieuxThu Oct 29 1987 13:5819
    Amazing...
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    What is it?
    
861.2effective usage of the instruction set16BITS::KRUGERThu Oct 29 1987 19:4620
    Very nice -- 
    perhaps we can post a collection of 68000 tricks here? I'll have
    to go back and look in my black bag.... Here's some minor stuff
    
    SUB D0,D0 is faster than CLR D0, by the way. (Aside from it's cute
    application to affecting the flags.)
    
    for (i=9; i>0 && x<>42; i--)
      .....
    
    	BRA 1$
    2$	.....
    1$  CMP x,#42
      	DBNE i,2$
    
    Note that DBcc is far and away superior to the SOB of DECland.
    
    
    
    
861.3STAR::BANKSIn Search of MediocrityThu Oct 29 1987 19:599
    I'm told (but I forgot from where)  that the fastest way to clear
    a D register is
    
    MOVEQ.L	#0,D0
    
    Sorta makes you wonder why they even bothered with CLR.L when
    everything else is faster.  (Then again, I fondly remember a machine
    whose JUMPA was slower than other forms of unconditional branching,
    so I guess this is the same, right?)
861.4more from SIGARCHSTAR::BANKSIn Search of MediocrityFri Oct 30 1987 00:21107
    Other bummages from the same source as .0 (he was reading my copy,
    after all (and don't they always start publishing all this interesting
    stuff right as I cancel my subscription?)):
    
    Not quite as dramatic as the SIGNUM in .0, but copied as close to
    verbatim as I can get:
    
    I.3  Max and Min
    
    This program finds the maximum of the unsigned numbers in D0 and
    D1 and returns the answer in D0.  The comments on the right show
    what's in the various registers during execution and is similar
    to the boolean expression checker's method of analysis:
    
	(D0=X, D1=Y)	|Flag,Reg|If D1>D0   |If C1<=D0
    	SUB.L	D1,D0	|(C,D0) =|(1, X-Y)   |(0, X-Y)
    	SUBX.L	D2,D2	|(C,D2) =|(1,11..11) |(0,0...0)
    	OR.L	D2,D0	|(C,D0) =|(1,11..11) |(0,X-Y)
    	ADDX.L	D1,D0	|D0 =    |Y	     |X
    	(D0 = max(X, Y))
    
    This program finds the minimum of the unsigned numbers in D0 and
    D1 and returns the answer in D0.
    
    	(D0=X, D1=Y)	|Flag,Reg|If D1>D0   |If D1<=D0
    	SUB.L	D1,D0	|(C,D0) =|(1, X-Y)   |(0, X-Y)
    	SUBX.L	D2,D2	|D2 =    |111...111  |000...000
   	AND.L	D2,D0	|D0 =    |X-Y	     |0
    	ADD.L	D1,D0	|D0 =    |X	     |Y
    	(D0 = min(X, Y))
    
    Simultaneous min and max.
    
    	(D0=X, D1=Y)	|Flag,Reg|If D1>D0   |If D1<=D0
    	SUB.L	D1,D0	|(C,D0) =|(1, X-Y)   |(0, X-Y)
    	SUBX.L	D2,D2	|D2 =    |111...111  |000...000
    	AND.L	D0,D2   |D2 =    |X-Y        |0
    	EOR.L	D2,D0	|D0 =    |0          |X-Y
    	ADD.L	D1,D0	|D0 =    |Y	     |X
    	ADD.L	D2,D1	|D1 =	 |X	     |Y
    	(D0 = max(X, Y), D1 = min(X, Y))
    
    I.5 Decimal to Binary
    
    This piece converts a 8 digit BCD number stored in D0, one digit
    to a nibble, to binary with the result also in D0.  It is the longest
    sequence generated by superoptimizer, and was actually done in three
    stages.  This idea that this was even possible came while generating
    sequences to multiply by 10.  At first I had superoptimizer compute
    the 2 digit BCD to binary conversion function '((D0 & 0xF0) >> 4)
    * 10 + (D0 & 0x0F)'.  This came out surprisingly short:
    
    (2 digit BCD number in D0)
    	MOVE.B	D0,D1
    	AND.B	#$F0,D1
    	LSR.B	#3,D1
    	SUB.B	D1,D0
    	SUB.B	D1,D0
    	SUB.B	D1,D0
    (binary equivalent in D0)
    
    What is actually being computed is:
    
    	ans = D0 - 3 * ((D0 & 0xF0) / 8)
    
    Representing the contents of D0 as (H:L) where H is the upper nibble
    and L is the lower nibble, we get:
    
    	D0 = 16 * H + L,	D0 & 0xF0 = 16 * H
    	ans = (16 * H + L) - 3 * (16 * H / 8)
    	    = 16 * H + L - 6 * H
    	    = 10 * H + L
    
    which is the 2 digit BCD to binary function.  Encouraged by this
    result, superoptimizer was put to the task of computing first the
    4 digit BCD to binary function and then the 8 digit BCD to binary
    function.  Here is the 8 digit converter:
    
    	(8 digit BCD number in D0)
    	MOVE.L	D0,D1		*
    	AND.L	#$F0F0F0F0,D1	*
    	LSR.L	#3,D1		*
    	SUB.L	D1,D0		*
    	SUB.L	D1,D0		*
    	SUB.L	D1,D0		*
    	MOVE.L	D0,D1		+
    	AND.L	#$FF00FF00,D1	+
	LSR.L	#1,D1		+
    	SUB.L	D1,D0		+
    	LSR.L	#2,D1		+
    	SUB.L	D1,D0		+
    	LSR.L	#3,D1		+
    	ADD.L	D1,D0		+
    	MOVE.L	D0,D1		-
    	SWAP	D1		-
    	MULU	#$D8F0,D1	-
    	SUB.L	D1,D0		-
    	(binary equivalent in D0)
    
    What is most amazing is the first section (marked by * alongside
    the program).  It looks exactly like the 2 digit BCD to binary
    function.  This section computes 4 simultaneous 2 digit BCD to binary
    functions on adjacent pairs of nibbles and deposits the answer back
    into the byte occupied by those nibbles.  The second part (marked
    by +) computes two simultaneous 2 byte base 100 to binary conversion
    functions.  Finally, the third part computes the function 'high
    word of D0 * 10000 + low word of D0' to complete the conversion.