[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

129.0. "Compressing spaces" by TURTLE::GILBERT () Mon Aug 20 1984 02:52

Occasionally, I'd want to convert all occurences of multiple spaces in a
document to a single space.  However, the editor I used couldn't handle
this; instead, it allowed a global subtitution of one string with another.

If the document has strings of 24 spaces or less, this could be accomplished
by the following 5 global substitutions:

	Replace this many	So the maximum string
	spaces with 1 space	of spaces is now
	       12		       12  (= 23 div 12 + 23 mod 12)
		3			5  (= 11 div  3 + 11 mod  3)
		2			3  (=  5 div  2 +  5 mod  2)
		2			2  (=  3 div  2 +  3 mod  2)
		2			1  (=  2 div  2 +  2 mod  2)

However, I discovered that 4 global substitutions suffice!

	Replace this many	So the maximum string
	spaces with 1 space	of spaces is now
		4			8  (= 23 div 4 + 23 mod 4)
		3			4  (=  8 div 3 +  8 mod 3)
		2			2  (=  4 div 2 +  4 mod 2)
		2			1  (=  2 div 2 +  2 mod 2)

Interesting.  Various problems come to mind.
How many substitutions are required to "compress" up to 255 spaces?
Up to how many spaces can be compressed with only 4 substitutions?  

    				- Gilbert
T.RTitleUserPersonal
Name
DateLines
129.1METOO::YARBROUGHTue Aug 21 1984 15:574
For n=255, using 12->1 reduces the maximum string to 21+3 = 24 which reduces
to the 24 case discussed above, so only 5 steps are required. A 3-d contour
plot of number of steps vs starting value and step size should show a trough
of minimum effort. - Lynn Yarbrough
129.2METOO::YARBROUGHThu Aug 23 1984 14:5621
I've done a more careful and thorough analysis with these results:
Notation: (4..8) means use any of the numbers from 4 to 8 in an editing pass;
"-> n" means this operation leaves a string of at most n blanks behind.
For N=255 we have in five steps:
	(13..20) -> 30
	(4..8)   -> 9
	(3..4)   -> 4
	(2..3)   -> 2
	(2)      -> 1
So there are 8*5*2*2 = 160 combinations of steps that will do the job in five
steps with maximum reduction at the first step.
Actually, in four steps one can reduce up through 40 blanks:
	(6..7) -> 10
	(3..4) -> 4
	etc.
So for N=255 there are LOTS of possibilities:
	(8..32+) -> 40 (I didn't look at > 32)
	(6..7)   -> 10 etc.
There are thus at least 200 sequences of five steps that will work.

-LDY-
129.3TURTLE::GILBERTThu Aug 23 1984 17:3317
Let f(s) be the largest number of spaces reducible to s spaces in one step.
Let v(s) be the size of a substitution that yields this reduction.
Let g(n) be the largest number of spaces reducible to 1 space in n steps.
	     n
Then g(n) = f (1).  Also, g(1)=2, g(2)=4, g(3)=10, g(4)=40.
Determine f(s), v(s), and g(n) -- a recurrence for g(n) is acceptable.

I'll try to post my solution next week.

More generally, ...

If, instead of reducing strings to 1 space, consider reducing all strings of
k or more spaces to exactly k spaces (strings of less than k spaces are left
unchanged).  I.e., all substitutions are of the form z -> k.
Determine f (s), v (s), and g (n).
	   k	  k	     k
					- Gilbert
129.4TURTLE::GILBERTWed Aug 29 1984 04:4339
Let <x,y> denote the substitution of x spaces with y spaces.
Then <v,1> reduces x spaces to (x div v + x mod v) == c(v,x) spaces.

Let f(v,s) be the largest number such that x <= f(v,s) implies c(v,x) <= s.

Theorem:
    f(v,s) = v*(s-v+2) + (v-2) = v*(s-v+3) - 2.
Proof:
    If 0 <= x <= v*(s-v+1) + (v-1) then c(v,x) <= (s-v+1) + (v-1) = s.
    If v*(s-v+2) <= x <= v*(s-v+2) + (v-2) then c(v,x) <= (s-v+2) + (v-2) = s.
    Thus, x <= f(v,s) implies c(v,x) <= s.
    If x = v*(s-v+2) + (v-1) then c(v,x) = (s-v+2) + (v-1) = s+1.
    Thus, f(v,s) = v*(s-v+2) + (v-2) is maximal.

						s+3		    s+3
If v = v(s) maximizes f(v,s), then v(s) = floor(---) or v(s) = ceil(---), and
						 2		     2
		       s+3	   s+3
f(v,s) == f(s) = floor(---) * ceil(---) - 2 .
			2	    2

If s is odd, then f(s) = (s+3)^2/4 - 2.
If s is even, then f(s) = ((s/2)+1)*((s/2)+2) - 2, and thus f(s) is also even.


Let g(n) be the maximum spaces reducible to 1 space by n substitutions.
	     n
Then g(n) = f (1) -- the function is applied n times.
So, g(1) = f(1) = 2, g(2) = f(f(1)) = f(2) = 4, g(3) = f(f(f(1))) = f(4) = 10.
Because s is even implies f(s) is even, we have:

				     g(n-1)	g(n-1)
	g(1) = 2, g(n) = f(g(n-1)) = ------ * ( ------ + 3 ) .
					2	   2


The generalization for <v,k> substitutions, k > 1, is still open.

					- Gilbert
129.5RUSURE::EDPAlways mount a scratch monkey.Thu Mar 16 1995 19:5525
    The function g Gilbert gives in .4 (g(i+1) = g(i)^2/4+3g(i)/2) is
    optimal if every string of spaces is replaced by a smaller string of
    spaces.  Can anybody give a closed form for this g?
    
    Without that restriction, g(3) is infinity:  Substitute " " with " x ",
    "x  " with "", and " x " with " ".  This changes any number of spaces
    to a single space.
    
    With the restriction that each string may contain only spaces, then up
    to q^i spaces can be reduced to a single space by these substitutions:
    
    	1 space to q spaces,
    	q+1 spaces to 1 space, repeated i times, and
    	q-1 spaces to 0 spaces.
    
    Thus g(3) is unbounded; it can be made as large as desired by using
    sufficiently large strings.  This can result in larger strings of
    spaces being converted to the null string.
    
    
    				-- edp
    
    
Public key fingerprint:  8e ad 63 61 ba 0c 26 86  32 0a 7d 28 db e7 6f 75
To find PGP, read note 2688.4 in Humane::IBMPC_Shareware.
129.6RUSURE::EDPAlways mount a scratch monkey.Wed Mar 22 1995 19:1660
    For what L can all strings of no more than L spaces be reduced to one
    space by n replacements in which each search and replacement string is
    at most q spaces?  The answer is at least q^(n-2).

    Proof:  Use a->b to denote replacing a spaces with b spaces, and
    consider the sequence of replacements:

	1 -> q-1,
	i repetitions of q -> 1, and
	q-1 -> 1.

    This sequence converts each string of up to q^i spaces to a single
    space.

    To see this, observe that the first replacement converts N spaces to
    N(q-1) spaces, so we have a multiple of q-1 spaces.  Each of the q->1
    replacements reduces the number of spaces by some multiple of q-1, so
    we are always left with a multiple of q-1.  I will prove below that if
    we start with at most q^i spaces, we are left with fewer than 2(q-1)
    spaces after completing the q->1 replacements.  Since we must have a
    positive multiple of q-1 spaces and we have fewer than 2(q-1), we must
    have exactly q-1 spaces.  The final replacement then converts this to a
    single space.

    Consider what happens in each of the q->1 replacements.  Suppose q->1
    is applied to a string of A spaces, where A = pq+r and 0 <= r < q --
    that is, p and r are the quotient and remainder when q divides A. After
    replacement, there are p+r spaces.  p+r = A/q+r-r/q.  Since r is at
    most q-1, p+r = A/q+r-r/q <= A/q+(q-1)-(q-1)/q.  For brevity, let B =
    (q-1)-(q-1)/q = (q-1)(1-1/q).  Thus applying q->1 to A spaces yields at
    most A/q+B spaces.  If we apply q->1 again, we get at most (A/q+B)/q+B
    spaces, which equals A/q^2 + B + B/q.  After i repetitions, the number
    of spaces is at most A/q^i plus a geometric series with largest term B
    and ratio 1/q between terms.  The sum of the series is
    B(1-1/q^i)/(1-1/q), so we have at most A/q^i + B(1-1/q^i)/(1-1/q)
    spaces.

    If we begin with at most q^i spaces, the first replacement, 1->q-1,
    gives us at most (q-1)q^i spaces.  Then the number of spaces after i
    repetitions of q->1 is at most:

	(q-1)q^i/q^i +      B       * (1-1/q^i)/(1-1/q) =
	(q-1)        +      B       * (1-1/q^i)/(1-1/q) =
	(q-1)        + (q-1)(1-1/q) * (1-1/q^i)/(1-1/q) =
	(q-1)        + (q-1)        * (1-1/q^i)         =
	(q-1)        +      (q-1)*1 + (q-1)*(-1/q^i)    =
	          2(q-1)            - (q-1)/q^i.

    This last expression is clearly less than 2(q-1), QED.


    This result is not optimal in all cases, since 6->1, 3->1, 2->1, 2->1
    works for up to 40 spaces, which is better than 1->5, 6->1, 6->1, 5->1.
    
    
    				-- edp
    
    
Public key fingerprint:  8e ad 63 61 ba 0c 26 86  32 0a 7d 28 db e7 6f 75
To find PGP, read note 2688.4 in Humane::IBMPC_Shareware.