[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

360.0. "A,B,C,D with no perm. substrings" by TOOLS::STAN () Sun Oct 20 1985 16:12

The following is an unsolved problem:

Is there a sequence on 4 symbols without consecutive segments which
are permutations of each other?

References:

Richard K. Guy, Unsolved Problems in Number Theory. Springer-Verlag,
	New York: 1981. Page 124. Problem E21.

P. A. B. Pleasants, Non-repetitive sequences. Proc. Cambridge Philos. Soc.
	68(1970)267-274.

T. C. Brown, Is there a sequence on four symbols in which no two adjacent
	segments are permuations of one another? Amer. Math. Monthly
	78(1971)886-888.
T.RTitleUserPersonal
Name
DateLines
360.1BEING::POSTPISCHILTue Oct 22 1985 12:4715
There is a program in BEING""::USER1:[POSTPISCHIL.PUB]NP.C;1 and NP.EXE;1
which tries to find the a prefix of an infinitely-long string (our modified
infinite-type strings which are just sequences of characters).  Just enter
a number after running it.

Be sure to take the ;1 version, later versions may be incomplete as I work on
the program.  The size of backups in finding such a sequence seems to be larger
than for our previous problem; I do not yet know if it has a limit. 

Is there any hope for a solution similar to the set NR (this is NP by the
way, for no permutations)?  That is, can we find four strings to replace
A, B, C, and D with?


				-- edp
360.2BEING::POSTPISCHILTue Oct 22 1985 14:529
As an interesting problem, people might wish to figure out why the "acceptable"
function of the program mentioned in .1 works.  Given a string and a number
indicating a last position in that string, it determines whether or not the
string contains two adjacent substrings which are permutations of each other.
(It assumes the string without the last character removed does not contain
any such substrings.)


				-- edp
360.3SPRITE::OSMANTue Oct 22 1985 17:0316
.0 asks for "a sequence on 4 symbols . . .".  I assume it actually wants
an *infinite* sequence (or construction algorithm).  Without the *infinite*
qualifier, the sequence

	ABCD

fits the requirements.

By the way, edp's NP program modified with "#define NUM  3" claims that with
only three symbols A, B, C allowed, the maximum length without a duplicate
repeated permutative relationship is eight characters, with the first
possible string being

	ABACABAA.

/Eric
360.4KOBAL::GILBERTTue Oct 22 1985 18:5313
re .-1

That should be "ABACAB", of length 7.

	ABACABA
	ABACBAB
	ABACBC
	ABCABA
	ABCAC
	ABCBABC

The six strings above, their prefixes strings, and strings formed by
cyclic substitutions of A,B,C are the only solutions over three characters.
360.5BEING::POSTPISCHILTue Oct 22 1985 19:439
Re .3:

Be careful about using the program in that way.  Currently, when it reaches
the requested character, it prints its buffer, with little regard to how
much further it can be extended.  The maximum backup it found is printed
as an estimate of how many characters cannot be trusted.


				-- edp
360.6R2ME2::GILBERTWed Oct 23 1985 19:5718
Using Eric Postpischil's program, I generated a 10,000 character string, then
broke it into pieces just before all occurences of "abac" (there were 218 such
pieces, with 209 being unique).  Then, I took the 35 shortest of these, and
threw them at a program which takes the strings, four at a time, and checks:

	The strings have uniquely identifying substrings,
	The number of As,Bs,Cs,Ds in the strings are linearly independent,
	There are no duplicated permutations of short length, when the 4
	    strings are substituted for the 4 symbols in a string in NP.

The above conditions would allow a constructive proof that there are arbitrarily
long strings in NP, similar to the proof for NR.  Unfortunately, with the 35
shortest pieces, it was unable to satisfy the conditions.  I'm now running the
program, checking the 209 unique substrings I'd gotten.  Needless to say, this
may take a while before finding 4 that would produce a proof, if, indeed, 4 of
these strings *can* be used for such a proof.

					- Gilbert
360.7BEING::POSTPISCHILThu Oct 24 1985 11:5517
Re .6:

That is similar to the approach I have been using, but one thing bothers
me:  Even if a set of four strings has linearly independent numbers of the
characters (lets put them in an array called Q, with each row being the numbers
of A's, B's, C's, and D's for one of the strings), might it not be possible
to find four numbers [w x y z] = N (not all zero) such that NQ, although
not 0, has small elements, so that a repetition could be completed by taking
some characters from adjacent strings?  It seems to me that this is different
from the problem of ensuring there are no short duplicated permutations;
we must also ensure that it is not possible that NQ + f(x) - f(y) (where
f(t) is a four by one array representing the number of times each character
occurs in t) is never zero for any x and y which are substrings of any string
of the set of four strings.


				-- edp
360.8R2ME2::GILBERTThu Oct 24 1985 14:3840
Yes, it is a somewhat different problem; any solution so generated would
have to be double-checked.  But it's perhaps a moot point....

From the set of 209 unique substrings, I was unable to find 4 that satisfied
the conditions.  I now suspect that finding 4 such strings is impossible; we
might even be able to prove this.  Of course, this doesn't mean the string
can't be arbitrarily long, it just means that the simple substitution method
won't generating arbitrarily long strings; another method might be based on
substitutions based on pairs of adjacent characters.

For what it's worth, I tried generating a string in NP, with the additional
requirement that any "d" must be immediately preceded by an "a", thinking
that this might lead to a workable substitution method (since there'd be
fewer ways the substitution pieces would fit together, and less likely to
find 'short' repetitions).  However, the longest string possible with this
restriction is 39 characters in length.

Then, I tried loosening the restriction; a "d" must be preceded by an "a"
or "c".  After a considerable amount of time, the string eventually grew
to over 500 characters in length.  The string grows quite slowly, and may
or may not be infinite.

Back to the 'original' problem....  Might we be able to find a way to generate
arbitrarily long strings of 5 symbols, and then apply some transformation (say
on pairs of characters) to reduce the number of symbols to 4?

Alternately, could we find a faster way of generating a string in NR?  This
might help find better 'pieces' for a substitution method, but may also help
show (by exhaustion) an upper bound on the length of strings in NR.  The large
backup amounts imply that the search frequently goes down 'the wrong path'.
If some of these 'bad paths' could be tabulated, they could be avoided.  For
example, if we know the string 'abacabcdbad' can only be followed by a "c",
we may avoid considerable work persuing a continuation of "a" or "b" whenever
this substring is encountered in the string being generated.

Also, does anyone care to propose some methods of generating arbitrarily long
strings in NR, even without all the details, even invalid ideas?  Some other
idea may prove more productive here than the simple substitution method.

					- Gilbert
360.9BEING::POSTPISCHILThu Oct 24 1985 15:0617
Re .8:

How about trying other sets of strings?  Would the set of strings containing
only two Ds and beginning with BD and ending with DC be useful?
                                                                       
> Alternately, could we find a faster way of generating a string in NR?  This
> might help find better 'pieces' for a substitution method, but may also help
> show (by exhaustion) an upper bound on the length of strings in NR.  The large
> backup amounts imply that the search frequently goes down 'the wrong path'. If
> some of these 'bad paths' could be tabulated, they could be avoided. 

I'm a skeptic on this point, although you have done some amazing things.
But this method would seem to only limit small backup amounts.  I would expect
the larger backups to depend upon larger parts of the string.


				-- edp
360.10R2ME2::GILBERTThu Oct 24 1985 21:1713
That may be worthwhile...  The file HARE::SYS$PUBLIC:NP.A (while it lasts)
is a backup save set of a program and an executable image that reads a list
of strings from a file (ABC.LOG, one is included, lowercase only), and checks
to see whether some 4 of them can be used for the substitutions.

Re your scepticism.  Out of small backup amounts, big backups grow.  But I'm
a bit skeptical about it, too.  Suppose that substrings of length 20 would
be useful in avoiding the 'bad paths'.  In the NR problem, this would be fine,
because not very many different substrings are generated; but with NP, there
don't seem to be many duplications.  But after watching an algorithm chug
along for ages just trying to get 10,000 characters -- and spending most
of its time going forward a little and backing up -- anything's worth a try.

360.11BEING::POSTPISCHILFri Oct 25 1985 16:4168
A few definitions:

        The gcd of a set of vectors is the vector in which each element
        is the greatest common denominator of the corresponding elements
        in the vectors of the set.
        
        The remainder of two vectors ([...] % [...]) is the vector
        in which each element is the remainder of the corresponding
        elements.  (E.g. [a] % [b] = [n], where n is the remainder
        when a is divided by b.)
        
        For a string, s, in an alphabet of n characters, let f(s) denote
        a vector of length n where the i-th element is the number
        of times the i-th character of the alphabet occurs in s.
        
For what it's worth, I think a sufficient condition for a set of four strings,
S, being suitable for use as "macro" replacements for characters in a proof is:

        f(S) (the image of f -- the set of values it takes on for the
        values in S) is linearly independent. 
        
        For all strings x, y, and z (not all null) ((there exist strings
        a, b, and c such that ax, by, and zc are in S) implies (f(z) +
        2f(y) - f(x)) % gcd(f(S)) <> 0). 

To see why this is, consider a potential repeated permutation.  It must contain
some whole number of macro strings and a few pieces.  Let us consider the
point where the repeated permutations adjoin.  If we add up the characters
in the permutation on the right and subtract those from the left (using
vectors), we should get the zero vector.  Suppose the pieces are null.  Then
the sum just described can be obtained by counting the occurrences of each
macro string, s, (with the ones on the right be counted as positive and the
ones on the left being counted as negative) multiplying that number for each
string by f(s), and adding together the resulting vectors.  But if we get
the zero vector, the number of occurrences for each macro string must be
zero (because f(S) is linearly independent), hence the "source" string into
which the macro strings have been substituted had a repetition.

Now suppose the pieces are not null.  The contribution from the whole
macro strings must be 0 modulo gcd(f(S)).  At the far left, there may be
some characters from the end of a macro string.  These are accounted for in x,
for which f(x) is subtracted in the count.  At the far right, there may be
some characters from the beginning of a macro string.  These are accounted
for in z.  Finally, the meeting point of the repetitions may itself be in the
middle of the string.  This is accounted for in y.  To see this, consider the
macro string in which the repetitions meet.  There is a portion to the left
of the meeting point (b) and a portion to the right of the meeting point (y).
Suppose the macro string itself is s.  What we really want is f(y)-f(b); the
characters to the right of the meeting point minus those to the left.  But
f(s) = f(b)+f(y), so f(b) = f(s)-f(y).  Thus, f(y)-f(b) = f(y)-(f(s)-f(y)) =
2f(y)-f(s).  But f(s) is 0 modulo gcd(f(S)), so we may drop it.

The above covers all cases involving any string that crosses the boundary
of a macro string.  I will show that it also covers the case of a repetition
within a single string.  Suppose s = opqr is in S, with pq being a repeated
permutation.  From the inequality given above in the sufficient condition,
consider the case where x = r, y = qr, and z = o.  If the inequality is
satisfied, f(o)+2f(qr)-f(r) is not zero modulo gcd(f(S)).  Since f(s) is zero
modulo gcd(f(S)), f(o)+2f(qr)-f(r)-f(s) is not zero modulo gcd(f(S)).  Clearly,
this also means it is not zero.  Thus 0 <> f(o)+2f(qr)-f(r)-f(s) =
f(o)+2f(q)+2f(r)-f(r)-f(opqr) = f(o)+2f(q)+f(r)-f(o)-f(p)-f(q)-f(r) = 
f(q)-f(p).  But f(q)-f(p) must be zero if pq is a repeated permutation.  So the
above condition eliminates all possibilities of repetitions in the string
obtained by substituting members of S in a string containing no repetitions.

                                             
        				-- edp
                                        
360.12R2ME2::GILBERTTue Oct 29 1985 20:096
Here's a method that might be used to produce an arbitrarily long string in NP.

Take the counting numbers in base 4 notation (1,2,3,10,11,12,13,20,...),
and take the the sum of the digits, mod 4 (1,2,3,1,2,3,0,2,3,0,1,3,...).
Then, make substitutions on adjacent pairs of digits (either paired like
[1,2],[3,1],[2,3],... or like [1,2],[2,3],[3,1],[1,2],...).
360.13R2ME2::GILBERTWed Oct 30 1985 01:446
re 360.12:

I tried this, using base 3 and base 4.  The generated string wasn't even
cube-free.  I'll try it again, using groups of 3 characters.

Any other ideas?
360.14ALIEN::POSTPISCHILThu Oct 31 1985 10:5823
Re .13:

Here's the glimmering of an idea:  Consider generating all strings in NP of
length 0, then of length 1, then of length 2, and so on.  I used a simple
procedure to generate strings of NR:  To generate all strings of length n+1,
consider each string of length n.  For each such string, take the last n-1
characters and find the previously found such string, complete with links to its
children of length n.  Those links give the possible characters that may follow
the n-1 length substring.  For each of these characters, at most one test is
necessary to determine if it may be appended to the string of length n:  If n+1
is even, are the first (n+1)/2 characters a permutation of the last (n+1)/2
characters? 

Note that if the n-1 length substring has several children, only one may be
disqualified by that test.  If we can show that deficiencies caused by the n-1
length substring having no children or having one child which is disqualified
are made up for by there being other strings of length n which get several
children, we will have shown that the size of the sets of strings of given
lengths increases as the length increases.  This will give us arbitrarily-long
strings, and infinitely-long strings are only a short hop away. 


				-- edp 
360.15R2ME2::GILBERTSat Nov 02 1985 15:227
Can someone find a cube-free omega-string using 5 (or even 6) symbols?

An omega-string is just an arbitrarily/infinitely long string.  By cube-free,
I mean that the string contains no substring xyz such that the characters
in the substrings x, y, and z are permutations of each other.

This would be a step in the right direction.