[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

324.0. "Base conversion algorithms" by WHYVAX::HETRICK () Thu Jul 25 1985 19:39

     A problem arising in file transmission protocols is the
representation of "special" characters, characters which might be
corrupted by the data path in use.  Control characters may interact
with environment in undesired ways (e.g., ^S may cause the receiver to
be unable to acknowledge a transmission), and the eighth bit of
characters may be lost by seven bit wide data paths. 

     Different file transmission protocols deal with this in different
ways.  The KERMIT protocol, for example, has "quoting" features, such
that C0 control characters are represented as a prefix character
followed by G0 character, G1 characters are represented as a prefix
character followed by a G0 character, and C1 characters are
represented as two prefix characters followed by a G0 character. The
G0 quoting characters are also quoted.  Over the entire 256 character
set, an average of 1.766 characters is required to represent a single
byte.  The XMODEM, YMODEM, and uucp protocols deal with the problem by
ignoring it. 

     If the byte stream to be transmitted is viewed as a (very large)
integer expressed in radix 256, then it is possible to convert this
radix 256 representation into a representation whose radix is at most
96.  This would allow the use of printable ASCII characters as the
digits of the representation, avoiding most questions of data path
affects.  However, any such conversion must require knowledge of only
a small number of consecutive digits;  using a traditional base
conversion algorithm on even a small 10 kilobyte file is out of the
question. 

     The public domain program uuencode converts a radix 256 digit
stream into a radix 64 digit stream.  The digits of the radix 64 digit
stream are the 64 ASCII characters starting at space.  The public
domain program uudecode converts the radix 64 representation back into
the radix 256 representation, the original byte stream.  This
conversion requires 1.333 characters to be transmitted for each byte
of data.  The algorithm used is essentially: 

     CONST
	 InputRadix  = 256;  {  64 for uudecode }
	 OutputRadix =  64;  { 256 for uudecode }

     CurrentAccum := 0;
     CurrentRadix := 1;

     WHILE NOT EOF (InputFile)
     DO
	 BEGIN
	 READ (InputFile, InputChar);
	 CurrentAccum := ORD (InputChar) * CurrentRadix + CurrentAccum;
	 CurrentRadix := InputRadix      * CurrentRadix;
	 WHILE CurrentRadix > OutputRadix
	 DO
	     BEGIN
	     WRITE (OutputFile,
		 CHR (ORD (' ') + CurrentAccum MOD OutputRadix);
	     CurrentAccum := CurrentAccum DIV OutputRadix;
	     CurrentRadix := CurrentRadix DIV OutputRadix
	     END
	 END;

     IF CurrentRadix > 1
     THEN
	 WRITE (OutputFile, CHR (ORD ' ') + CurrentAccum);

(Actually, the programs are written in C and think they are doing 
clever bit twiddling.  The above is equivalent and extensible to other
bases.) 

     Unfortunately, the above algorithm works only if InputRadix and
OutputRadix are powers of some other number.  [Actually, that is a
sufficient condition, rather than a necessary condition.  The
necessary condition for conversion both ways is that, for any two
integers k0 and k1, (InputRadix to the k0) is either a multiple or
submultiple of (OutputRadix to the k1).  I think this implies that
InputRadix and OutputRadix be powers of some other number, but I get
lost in the prime factorization exponent expressions.]

     Using 64 as OutputRadix requires 1.333 characters to represent a
single byte.  This is better than KERMIT's 1.766, but using 95 as the
radix in place of 64 would require only 1.218 characters to represent
a single byte, and the digits could be the non-space printable ASCII
characters.  (Adding the space character as a digit would permit radix
96, with 1.215 characters to represent a single byte.  But uucp
mailers like to throw away trailing blanks on a line, so permitting
space as a digit is dangerous.  Of course, the uuencode and uudecode
programs mentioned above use space as a digit, so uuencode'd files
must be further packaged to ensure that trailing blanks are protected
against uucp mailers.  I do love UNIX*, it's so sensible.) 

     Question:  Is there an order N algorithm that will convert an N 
digit number in one base to a number in another base, where
arithemetic on the N digit number requires order N time, without
restriction on the bases chosen? 

_______________
* UNIX is a trademark of AT & T Bell Laboratories.

T.R	Title	User	Personal Name	Date	Lines
324.1		ALIEN::POSTPISCHIL		`Fri Jul 26 1985 15:05`	32
	It is not possible, in general, to convert a number from one base to another in a serial fashion, looking at only a few digits at a time. Suppose you are trying to convert 6553n from decimal to hexadecimal. Depending on whether the last digit is a 5 or a 6, the result could be FFFF or 10000. Obviously, you must know the last digit of the number before you can obtain a single digit of the result. Similarly, if you were trying to convert from something such as trinary to decimal, you would need to know the first digit before you could obtain any digit of the result. However, there are cases where such conversion may be possible. In converting from decimal to hexadecimal, looking at the last 4 decimal digits gives sufficient information to compute the last hexadecimal digit. A conversion can proceed from last digit to first. (It is not even necessary to read the file being transmitted backwards; simply consider it as a number written backwards.) I believe a necessary and sufficient condition for this requirement is that the destination base must be a power of some factor of the source base. (16 is the fourth power of two, and two divides 10.) 96 and 256 do not satisfy this requirement. Since all factors of 256 are powers of two, any destination base must also be a power of two. It might not be necessary to convert the file from one base to another. Such a scheme gives an optimal compression if every byte occurs with the same frequency, but this is not always the case. If the frequency of different bytes is not always equal, schemes similar to those described in the base note may be better. -- edp
324.2		GOLLY::GILBERT		`Fri Jul 26 1985 20:36`	20
	It is possible, in general, to convert a number from one base to another in serial fashion, looking at only a few digits at a time, in O(N) time, where N is the size (in 'digits') of the number, and calculations on reasonably sized numbers (no larger that base^2) take unit time. Let the input and output numbers be read and written (respectively) serially from the least-significant digit to the most-significant digit. The algorithm is very straight-forward. If the numbers are accessed from most-significant to least-significant, simply reverse the input number, convert the number to the new base, reverse it, and output this reversed number. Although this requires storage for the entire number, the reversals are still O(N) on the size of the numbers. If the numbers must be accessed from most- to least-significant digit, AND only limited storage is available, I believe the conversion can't be done in O(N) time (in general), unless the number of digits in the source string is known in advance. With the input size known in advance, the algorithm seems tricky. - Gilbert
324.3		WHYVAX::HETRICK		`Fri Jul 26 1985 21:04`	6
	Well, one response says "it can't be done", and one says "yes it can, and it's trivial." I'm dumb today. What is the trivial algorithm? Brian Hetrick
324.4		ALIEN::POSTPISCHIL		`Fri Jul 26 1985 21:18`	15
	Re .2: > Let the input and output numbers be read and written (respectively) serially > from the least-significant digit to the most-significant digit. The algorithm > is very straight-forward. I do not mind being contradicted, but you should support your claim. Can you present such an algorithm? The last 11 digits of a 12 digit number in base three are 00000000000. Please tell me any of the six significant digits of the representation of this number in base 10. -- edp
324.5		TOOLS::STAN		`Fri Jul 26 1985 21:22`	1
	I think he's got you there!
324.6		BAGELS::ROSENBAUM		`Fri Jul 26 1985 21:33`	11
	{I interested in the answer to .4 as well, but in the mean time..} Somewhere it was mentioned that all this is useful if the elements of the character stream are randomly distributed. This may be the case for binary files, although even in that case it's unlikely. In the case of plaintext, I would think that it would much more fruitful to pursue text compression (blank compression, huffman encoding, etc.) rather than radix conversion. Who says we have to use the real world as our model, though? __Rich
324.7		GOLLY::GILBERT		`Fri Jul 26 1985 23:55`	15
324.8		ALIEN::POSTPISCHIL		`Sun Jul 28 1985 19:44`	10
	Re .7: The quoted text does not indicate the numbers are processed in serial fashion. The last 11 digits of a 12 digit number in base three are 00000000000. Please tell me any of the six significant digits of the representation of this number in base 10. -- edp
324.9		ALIEN::POSTPISCHIL		`Mon Jul 29 1985 14:16`	25
324.10		MARIAH::LARY		`Thu Nov 07 1985 19:43`	5
	In a more prosaic vein, 95^16 is a little larger (about a factor of 2) than 256^13, so you can convert the data in 13-byte groups by doing a fairly simple (VAX instruction set peculiarities aside) set of quadword divides. The efficiency is 16/13 or about 1.23:1
324.11	2^n base conversion trick	GUMDRP::HAINSWORTH	Shoes and ships and sealing wax	`Fri Jun 12 1987 22:25`	37
	Re .0 I thought of base 64 before I finished reading the first page of your note. As somebody mentioned earlier, there IS a trick to converting between bases that are powers of the same number. Let me illustrate with hexadecimal (2^4) to octal (2^3) conversion: \| \| Hexadecimal \| D \| 9 \| E \| 2 \| Binary 1 1 0 1 0 1 0 1 1 1 1 0 0 0 1 0\| Octal 1 \| 5 \| 2 \| 7 \| 4 \| 2 \| \| \| Note that these numbers line up every 12 binary digits, and thus every 3 hexadecimal digits will turn into 4 octal digits (becoming 12 binary digits in between). Thus the process is "encapsulated": as soon as you accumulate 3 hexadecimal digits, you can send out 4 octal digits and clear your buffer. The repeating period (number of digits) is the least common multiple of the two powers (in this case, 4 and 3) to which the root number (in this case, 2) is raised to get the two bases. So how can we efficiently map a base 256 number into a base that's no bigger than 96? Hmmm... Well, 256 = 2^8. The biggest power of 2 that's less than 96 is 2^6, or 64. To map 2^8 into 2^6, I need a buffer for LCM(8,6) = 24 bits. That's 3 input characters, or 4 output characters. I don't think you can do any better without opening Pandora's box of generalized base conversion. John