[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

360.0. "A,B,C,D with no perm. substrings" by TOOLS::STAN () Sun Oct 20 1985 16:12

The following is an unsolved problem:

Is there a sequence on 4 symbols without consecutive segments which
are permutations of each other?

References:

Richard K. Guy, Unsolved Problems in Number Theory. Springer-Verlag,
	New York: 1981. Page 124. Problem E21.

P. A. B. Pleasants, Non-repetitive sequences. Proc. Cambridge Philos. Soc.
	68(1970)267-274.

T. C. Brown, Is there a sequence on four symbols in which no two adjacent
	segments are permuations of one another? Amer. Math. Monthly
	78(1971)886-888.

T.R	Title	User	Date	Lines
360.1		BEING::POSTPISCHIL	`Tue Oct 22 1985 12:47`	15
	There is a program in BEING""::USER1:[POSTPISCHIL.PUB]NP.C;1 and NP.EXE;1 which tries to find the a prefix of an infinitely-long string (our modified infinite-type strings which are just sequences of characters). Just enter a number after running it. Be sure to take the ;1 version, later versions may be incomplete as I work on the program. The size of backups in finding such a sequence seems to be larger than for our previous problem; I do not yet know if it has a limit. Is there any hope for a solution similar to the set NR (this is NP by the way, for no permutations)? That is, can we find four strings to replace A, B, C, and D with? -- edp
360.2		BEING::POSTPISCHIL	`Tue Oct 22 1985 14:52`	9
	As an interesting problem, people might wish to figure out why the "acceptable" function of the program mentioned in .1 works. Given a string and a number indicating a last position in that string, it determines whether or not the string contains two adjacent substrings which are permutations of each other. (It assumes the string without the last character removed does not contain any such substrings.) -- edp
360.3		SPRITE::OSMAN	`Tue Oct 22 1985 17:03`	16
	.0 asks for "a sequence on 4 symbols . . .". I assume it actually wants an infinite sequence (or construction algorithm). Without the infinite qualifier, the sequence ABCD fits the requirements. By the way, edp's NP program modified with "#define NUM 3" claims that with only three symbols A, B, C allowed, the maximum length without a duplicate repeated permutative relationship is eight characters, with the first possible string being ABACABAA. /Eric
360.4		KOBAL::GILBERT	`Tue Oct 22 1985 18:53`	13
	re .-1 That should be "ABACAB", of length 7. ABACABA ABACBAB ABACBC ABCABA ABCAC ABCBABC The six strings above, their prefixes strings, and strings formed by cyclic substitutions of A,B,C are the only solutions over three characters.
360.5		BEING::POSTPISCHIL	`Tue Oct 22 1985 19:43`	9
	Re .3: Be careful about using the program in that way. Currently, when it reaches the requested character, it prints its buffer, with little regard to how much further it can be extended. The maximum backup it found is printed as an estimate of how many characters cannot be trusted. -- edp
360.6		R2ME2::GILBERT	`Wed Oct 23 1985 19:57`	18
	Using Eric Postpischil's program, I generated a 10,000 character string, then broke it into pieces just before all occurences of "abac" (there were 218 such pieces, with 209 being unique). Then, I took the 35 shortest of these, and threw them at a program which takes the strings, four at a time, and checks: The strings have uniquely identifying substrings, The number of As,Bs,Cs,Ds in the strings are linearly independent, There are no duplicated permutations of short length, when the 4 strings are substituted for the 4 symbols in a string in NP. The above conditions would allow a constructive proof that there are arbitrarily long strings in NP, similar to the proof for NR. Unfortunately, with the 35 shortest pieces, it was unable to satisfy the conditions. I'm now running the program, checking the 209 unique substrings I'd gotten. Needless to say, this may take a while before finding 4 that would produce a proof, if, indeed, 4 of these strings can be used for such a proof. - Gilbert
360.7		BEING::POSTPISCHIL	`Thu Oct 24 1985 11:55`	17
	Re .6: That is similar to the approach I have been using, but one thing bothers me: Even if a set of four strings has linearly independent numbers of the characters (lets put them in an array called Q, with each row being the numbers of A's, B's, C's, and D's for one of the strings), might it not be possible to find four numbers [w x y z] = N (not all zero) such that NQ, although not 0, has small elements, so that a repetition could be completed by taking some characters from adjacent strings? It seems to me that this is different from the problem of ensuring there are no short duplicated permutations; we must also ensure that it is not possible that NQ + f(x) - f(y) (where f(t) is a four by one array representing the number of times each character occurs in t) is never zero for any x and y which are substrings of any string of the set of four strings. -- edp
360.8		R2ME2::GILBERT	`Thu Oct 24 1985 14:38`	40
	Yes, it is a somewhat different problem; any solution so generated would have to be double-checked. But it's perhaps a moot point.... From the set of 209 unique substrings, I was unable to find 4 that satisfied the conditions. I now suspect that finding 4 such strings is impossible; we might even be able to prove this. Of course, this doesn't mean the string can't be arbitrarily long, it just means that the simple substitution method won't generating arbitrarily long strings; another method might be based on substitutions based on pairs of adjacent characters. For what it's worth, I tried generating a string in NP, with the additional requirement that any "d" must be immediately preceded by an "a", thinking that this might lead to a workable substitution method (since there'd be fewer ways the substitution pieces would fit together, and less likely to find 'short' repetitions). However, the longest string possible with this restriction is 39 characters in length. Then, I tried loosening the restriction; a "d" must be preceded by an "a" or "c". After a considerable amount of time, the string eventually grew to over 500 characters in length. The string grows quite slowly, and may or may not be infinite. Back to the 'original' problem.... Might we be able to find a way to generate arbitrarily long strings of 5 symbols, and then apply some transformation (say on pairs of characters) to reduce the number of symbols to 4? Alternately, could we find a faster way of generating a string in NR? This might help find better 'pieces' for a substitution method, but may also help show (by exhaustion) an upper bound on the length of strings in NR. The large backup amounts imply that the search frequently goes down 'the wrong path'. If some of these 'bad paths' could be tabulated, they could be avoided. For example, if we know the string 'abacabcdbad' can only be followed by a "c", we may avoid considerable work persuing a continuation of "a" or "b" whenever this substring is encountered in the string being generated. Also, does anyone care to propose some methods of generating arbitrarily long strings in NR, even without all the details, even invalid ideas? Some other idea may prove more productive here than the simple substitution method. - Gilbert
360.9		BEING::POSTPISCHIL	`Thu Oct 24 1985 15:06`	17
	Re .8: How about trying other sets of strings? Would the set of strings containing only two Ds and beginning with BD and ending with DC be useful? > Alternately, could we find a faster way of generating a string in NR? This > might help find better 'pieces' for a substitution method, but may also help > show (by exhaustion) an upper bound on the length of strings in NR. The large > backup amounts imply that the search frequently goes down 'the wrong path'. If > some of these 'bad paths' could be tabulated, they could be avoided. I'm a skeptic on this point, although you have done some amazing things. But this method would seem to only limit small backup amounts. I would expect the larger backups to depend upon larger parts of the string. -- edp
360.10		R2ME2::GILBERT	`Thu Oct 24 1985 21:17`	13
	That may be worthwhile... The file HARE::SYS$PUBLIC:NP.A (while it lasts) is a backup save set of a program and an executable image that reads a list of strings from a file (ABC.LOG, one is included, lowercase only), and checks to see whether some 4 of them can be used for the substitutions. Re your scepticism. Out of small backup amounts, big backups grow. But I'm a bit skeptical about it, too. Suppose that substrings of length 20 would be useful in avoiding the 'bad paths'. In the NR problem, this would be fine, because not very many different substrings are generated; but with NP, there don't seem to be many duplications. But after watching an algorithm chug along for ages just trying to get 10,000 characters -- and spending most of its time going forward a little and backing up -- anything's worth a try.
360.11		BEING::POSTPISCHIL	`Fri Oct 25 1985 16:41`	68
	A few definitions: The gcd of a set of vectors is the vector in which each element is the greatest common denominator of the corresponding elements in the vectors of the set. The remainder of two vectors ([...] % [...]) is the vector in which each element is the remainder of the corresponding elements. (E.g. [a] % [b] = [n], where n is the remainder when a is divided by b.) For a string, s, in an alphabet of n characters, let f(s) denote a vector of length n where the i-th element is the number of times the i-th character of the alphabet occurs in s. For what it's worth, I think a sufficient condition for a set of four strings, S, being suitable for use as "macro" replacements for characters in a proof is: f(S) (the image of f -- the set of values it takes on for the values in S) is linearly independent. For all strings x, y, and z (not all null) ((there exist strings a, b, and c such that ax, by, and zc are in S) implies (f(z) + 2f(y) - f(x)) % gcd(f(S)) <> 0). To see why this is, consider a potential repeated permutation. It must contain some whole number of macro strings and a few pieces. Let us consider the point where the repeated permutations adjoin. If we add up the characters in the permutation on the right and subtract those from the left (using vectors), we should get the zero vector. Suppose the pieces are null. Then the sum just described can be obtained by counting the occurrences of each macro string, s, (with the ones on the right be counted as positive and the ones on the left being counted as negative) multiplying that number for each string by f(s), and adding together the resulting vectors. But if we get the zero vector, the number of occurrences for each macro string must be zero (because f(S) is linearly independent), hence the "source" string into which the macro strings have been substituted had a repetition. Now suppose the pieces are not null. The contribution from the whole macro strings must be 0 modulo gcd(f(S)). At the far left, there may be some characters from the end of a macro string. These are accounted for in x, for which f(x) is subtracted in the count. At the far right, there may be some characters from the beginning of a macro string. These are accounted for in z. Finally, the meeting point of the repetitions may itself be in the middle of the string. This is accounted for in y. To see this, consider the macro string in which the repetitions meet. There is a portion to the left of the meeting point (b) and a portion to the right of the meeting point (y). Suppose the macro string itself is s. What we really want is f(y)-f(b); the characters to the right of the meeting point minus those to the left. But f(s) = f(b)+f(y), so f(b) = f(s)-f(y). Thus, f(y)-f(b) = f(y)-(f(s)-f(y)) = 2f(y)-f(s). But f(s) is 0 modulo gcd(f(S)), so we may drop it. The above covers all cases involving any string that crosses the boundary of a macro string. I will show that it also covers the case of a repetition within a single string. Suppose s = opqr is in S, with pq being a repeated permutation. From the inequality given above in the sufficient condition, consider the case where x = r, y = qr, and z = o. If the inequality is satisfied, f(o)+2f(qr)-f(r) is not zero modulo gcd(f(S)). Since f(s) is zero modulo gcd(f(S)), f(o)+2f(qr)-f(r)-f(s) is not zero modulo gcd(f(S)). Clearly, this also means it is not zero. Thus 0 <> f(o)+2f(qr)-f(r)-f(s) = f(o)+2f(q)+2f(r)-f(r)-f(opqr) = f(o)+2f(q)+f(r)-f(o)-f(p)-f(q)-f(r) = f(q)-f(p). But f(q)-f(p) must be zero if pq is a repeated permutation. So the above condition eliminates all possibilities of repetitions in the string obtained by substituting members of S in a string containing no repetitions. -- edp
360.12		R2ME2::GILBERT	`Tue Oct 29 1985 20:09`	6
	Here's a method that might be used to produce an arbitrarily long string in NP. Take the counting numbers in base 4 notation (1,2,3,10,11,12,13,20,...), and take the the sum of the digits, mod 4 (1,2,3,1,2,3,0,2,3,0,1,3,...). Then, make substitutions on adjacent pairs of digits (either paired like [1,2],[3,1],[2,3],... or like [1,2],[2,3],[3,1],[1,2],...).
360.13		R2ME2::GILBERT	`Wed Oct 30 1985 01:44`	6
	re 360.12: I tried this, using base 3 and base 4. The generated string wasn't even cube-free. I'll try it again, using groups of 3 characters. Any other ideas?
360.14		ALIEN::POSTPISCHIL	`Thu Oct 31 1985 10:58`	23
	Re .13: Here's the glimmering of an idea: Consider generating all strings in NP of length 0, then of length 1, then of length 2, and so on. I used a simple procedure to generate strings of NR: To generate all strings of length n+1, consider each string of length n. For each such string, take the last n-1 characters and find the previously found such string, complete with links to its children of length n. Those links give the possible characters that may follow the n-1 length substring. For each of these characters, at most one test is necessary to determine if it may be appended to the string of length n: If n+1 is even, are the first (n+1)/2 characters a permutation of the last (n+1)/2 characters? Note that if the n-1 length substring has several children, only one may be disqualified by that test. If we can show that deficiencies caused by the n-1 length substring having no children or having one child which is disqualified are made up for by there being other strings of length n which get several children, we will have shown that the size of the sets of strings of given lengths increases as the length increases. This will give us arbitrarily-long strings, and infinitely-long strings are only a short hop away. -- edp
360.15		R2ME2::GILBERT	`Sat Nov 02 1985 15:22`	7
	Can someone find a cube-free omega-string using 5 (or even 6) symbols? An omega-string is just an arbitrarily/infinitely long string. By cube-free, I mean that the string contains no substring xyz such that the characters in the substrings x, y, and z are permutations of each other. This would be a step in the right direction.