[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

324.0. "Base conversion algorithms" by WHYVAX::HETRICK () Thu Jul 25 1985 19:39

     A problem arising in file transmission protocols is the
representation of "special" characters, characters which might be
corrupted by the data path in use.  Control characters may interact
with environment in undesired ways (e.g., ^S may cause the receiver to
be unable to acknowledge a transmission), and the eighth bit of
characters may be lost by seven bit wide data paths. 

     Different file transmission protocols deal with this in different
ways.  The KERMIT protocol, for example, has "quoting" features, such
that C0 control characters are represented as a prefix character
followed by G0 character, G1 characters are represented as a prefix
character followed by a G0 character, and C1 characters are
represented as two prefix characters followed by a G0 character. The
G0 quoting characters are also quoted.  Over the entire 256 character
set, an average of 1.766 characters is required to represent a single
byte.  The XMODEM, YMODEM, and uucp protocols deal with the problem by
ignoring it. 

     If the byte stream to be transmitted is viewed as a (very large)
integer expressed in radix 256, then it is possible to convert this
radix 256 representation into a representation whose radix is at most
96.  This would allow the use of printable ASCII characters as the
digits of the representation, avoiding most questions of data path
affects.  However, any such conversion must require knowledge of only
a small number of consecutive digits;  using a traditional base
conversion algorithm on even a small 10 kilobyte file is out of the
question. 

     The public domain program uuencode converts a radix 256 digit
stream into a radix 64 digit stream.  The digits of the radix 64 digit
stream are the 64 ASCII characters starting at space.  The public
domain program uudecode converts the radix 64 representation back into
the radix 256 representation, the original byte stream.  This
conversion requires 1.333 characters to be transmitted for each byte
of data.  The algorithm used is essentially: 

     CONST
	 InputRadix  = 256;  {  64 for uudecode }
	 OutputRadix =  64;  { 256 for uudecode }

     CurrentAccum := 0;
     CurrentRadix := 1;

     WHILE NOT EOF (InputFile)
     DO
	 BEGIN
	 READ (InputFile, InputChar);
	 CurrentAccum := ORD (InputChar) * CurrentRadix + CurrentAccum;
	 CurrentRadix := InputRadix      * CurrentRadix;
	 WHILE CurrentRadix > OutputRadix
	 DO
	     BEGIN
	     WRITE (OutputFile,
		 CHR (ORD (' ') + CurrentAccum MOD OutputRadix);
	     CurrentAccum := CurrentAccum DIV OutputRadix;
	     CurrentRadix := CurrentRadix DIV OutputRadix
	     END
	 END;

     IF CurrentRadix > 1
     THEN
	 WRITE (OutputFile, CHR (ORD ' ') + CurrentAccum);

(Actually, the programs are written in C and think they are doing 
clever bit twiddling.  The above is equivalent and extensible to other
bases.) 

     Unfortunately, the above algorithm works only if InputRadix and
OutputRadix are powers of some other number.  [Actually, that is a
sufficient condition, rather than a necessary condition.  The
necessary condition for conversion both ways is that, for any two
integers k0 and k1, (InputRadix to the k0) is either a multiple or
submultiple of (OutputRadix to the k1).  I think this implies that
InputRadix and OutputRadix be powers of some other number, but I get
lost in the prime factorization exponent expressions.]

     Using 64 as OutputRadix requires 1.333 characters to represent a
single byte.  This is better than KERMIT's 1.766, but using 95 as the
radix in place of 64 would require only 1.218 characters to represent
a single byte, and the digits could be the non-space printable ASCII
characters.  (Adding the space character as a digit would permit radix
96, with 1.215 characters to represent a single byte.  But uucp
mailers like to throw away trailing blanks on a line, so permitting
space as a digit is dangerous.  Of course, the uuencode and uudecode
programs mentioned above use space as a digit, so uuencode'd files
must be further packaged to ensure that trailing blanks are protected
against uucp mailers.  I do love UNIX*, it's so sensible.) 

     Question:  Is there an order N algorithm that will convert an N 
digit number in one base to a number in another base, where
arithemetic on the N digit number requires order N time, without
restriction on the bases chosen? 

_______________
* UNIX is a trademark of AT & T Bell Laboratories.
T.RTitleUserPersonal
Name
DateLines
324.1ALIEN::POSTPISCHILFri Jul 26 1985 15:0532
It is not possible, in general, to convert a number from one base to another
in a serial fashion, looking at only a few digits at a time.

Suppose you are trying to convert 6553n from decimal to hexadecimal.  Depending
on whether the last digit is a 5 or a 6, the result could be FFFF or 10000.
Obviously, you must know the last digit of the number before you can obtain a
single digit of the result.

Similarly, if you were trying to convert from something such as trinary to
decimal, you would need to know the first digit before you could obtain any
digit of the result.

However, there are cases where such conversion may be possible.  In converting
from decimal to hexadecimal, looking at the last 4 decimal digits gives
sufficient information to compute the last hexadecimal digit.  A conversion can
proceed from last digit to first.  (It is not even necessary to read the file
being transmitted backwards; simply consider it as a number written backwards.)

I believe a necessary and sufficient condition for this requirement is that the
destination base must be a power of some factor of the source base.  (16 is the
fourth power of two, and two divides 10.)  96 and 256 do not satisfy this
requirement.  Since all factors of 256 are powers of two, any destination base
must also be a power of two. 

It might not be necessary to convert the file from one base to another.  Such
a scheme gives an optimal compression if every byte occurs with the same
frequency, but this is not always the case.  If the frequency of different
bytes is not always equal, schemes similar to those described in the base note
may be better.


				-- edp
324.2GOLLY::GILBERTFri Jul 26 1985 20:3620
It *is* possible, in general, to convert a number from one base to another in
serial fashion, looking at only a few digits at a time, in O(N) time, where N is
the size (in 'digits') of the number, and calculations on reasonably sized
numbers (no larger that base^2) take unit time. 

Let the input and output numbers be read and written (respectively) serially
from the least-significant digit to the most-significant digit.  The algorithm
is very straight-forward.

If the numbers are accessed from most-significant to least-significant, simply
reverse the input number, convert the number to the new base, reverse it, and
output this reversed number.  Although this requires storage for the entire
number, the reversals are *still* O(N) on the size of the numbers.

If the numbers must be accessed from most- to least-significant digit, AND only
limited storage is available, I believe the conversion can't be done in O(N)
time (in general), *unless* the number of digits in the source string is known
in advance.  With the input size known in advance, the algorithm seems tricky.

					- Gilbert
324.3WHYVAX::HETRICKFri Jul 26 1985 21:046
Well, one response says "it can't be done", and one says "yes it can, and
it's trivial."

I'm dumb today.  What is the trivial algorithm?

				Brian Hetrick
324.4ALIEN::POSTPISCHILFri Jul 26 1985 21:1815
Re .2:

> Let the input and output numbers be read and written (respectively) serially
> from the least-significant digit to the most-significant digit.  The algorithm
> is very straight-forward.

I do not mind being contradicted, but you should support your claim.  Can you
present such an algorithm?

The last 11 digits of a 12 digit number in base three are 00000000000.  Please
tell me any of the six significant digits of the representation of this
number in base 10.


				-- edp
324.5TOOLS::STANFri Jul 26 1985 21:221
I think he's got you there!
324.6BAGELS::ROSENBAUMFri Jul 26 1985 21:3311
{I interested in the answer to .4 as well, but in the mean time..}

Somewhere it was mentioned that all this is useful if the elements of the
character stream are randomly distributed.  This may be the case for binary
files, although even in that case it's unlikely.  In the case of plaintext,
I would think that it would much more fruitful to pursue text compression
(blank compression, huffman encoding, etc.) rather than radix conversion.

Who says we have to use the real world as our model, though?

__Rich
324.7GOLLY::GILBERTFri Jul 26 1985 23:5515
324.8ALIEN::POSTPISCHILSun Jul 28 1985 19:4410
Re .7:

The quoted text does not indicate the numbers are processed in serial fashion.

The last 11 digits of a 12 digit number in base three are 00000000000.  Please
tell me any of the six significant digits of the representation of this
number in base 10.


				-- edp
324.9ALIEN::POSTPISCHILMon Jul 29 1985 14:1625
324.10MARIAH::LARYThu Nov 07 1985 19:435
In a more prosaic vein, 95^16 is a little larger (about a factor of 2) than
256^13, so you can convert the data in 13-byte groups by doing a fairly
simple (VAX instruction set peculiarities aside) set of quadword divides.
The efficiency is 16/13 or about 1.23:1

324.112^n base conversion trickGUMDRP::HAINSWORTHShoes and ships and sealing waxFri Jun 12 1987 22:2537
    Re .0
    
    I thought of base 64 before I finished reading the first page of
    your note.
    
    As somebody mentioned earlier, there IS a trick to converting between
    bases that are powers of the same number.  Let me illustrate with
    hexadecimal (2^4) to octal (2^3) conversion:
    
                                |                       |
    Hexadecimal		|   D   |   9   |   E   |   2   |
    Binary               1 1 0 1 0 1 0 1 1 1 1 0 0 0 1 0|
    Octal	       1  |  5  |  2  |  7  |  4  |  2  |
                                |                       |
  
    Note that these numbers line up every 12 binary digits, and thus
    every 3 hexadecimal digits will turn into 4 octal digits (becoming
    12 binary digits in between).  Thus the process is "encapsulated":
    as soon as you accumulate 3 hexadecimal digits, you can send out
    4 octal digits and clear your buffer.
    
    The repeating period (number of digits) is the least common multiple 
    of the two powers (in this case, 4 and 3) to which the root number
    (in this case, 2) is raised to get the two bases.
    
    So how can we efficiently map a base 256 number into a base that's
    no bigger than 96?  Hmmm...
    
    Well, 256 = 2^8.
    The biggest power of 2 that's less than 96 is 2^6, or 64.
    To map 2^8 into 2^6, I need a buffer for LCM(8,6) = 24 bits.
    That's 3 input characters, or 4 output characters.  
    
    I don't think you can do any better without opening Pandora's box 
    of generalized base conversion.
    
    John