[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:	Mathematics at DEC

Moderator:	RUSURE::EDP

Created:	Mon Feb 03 1986
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2083
Total number of notes:	14613

514.0. "find shortest string NOT found in longer one." by SIERRA::OSMAN (and silos to fill before I feep, and silos to fill before I feep) Mon Jun 16 1986 14:47

    Given a string of bytes (characters), what's the most efficient
    way to calculate the shortest string NOT found in the given
    string ?
    
    For example, suppose our alphabet only contained "a", "b", and
    "c".
    
    In this short alphabet, if we start with "a", we could say
    the shortest string NOT to be found therein is "b".  If we
    start with "abc", the shortest not found is "aa".  If we start with
    "aabbcc", the shortest is "ac".
    
    If we start with "aabacbbcca", the shortest is perhaps "aaa".
    
    I can think of algorithms for solving the problem, but there
    are probably much faster ones.  For instance, one could search
    for "a", then "b", then "c" until one is NOT found, trying
    all the characters in the alphabet.  If all are found, one
    could then try "aa", "ab", "ac", going through all two-character
    combos.  This just seems slow though.
    
    /Eric

T.R	Title	User	Personal Name	Date	Lines
514.1	Hint	CLT::GILBERT	Juggler of Noterdom	`Tue Jun 17 1986 01:02`	3
	The shortest substring (in the fixed alphabet) that doesn't occur in the long string can be determined in O(n) time, where n is the length of the long string.
514.2	dumb algorithm to do it	SIERRA::OSMAN	and silos to fill before I feep, and silos to fill before I feep	`Thu Jun 19 1986 17:06`	13
	Here's an algorithm: For every character position k, with k running from 1 through n, where n is the length of the string, form k substrings s, of characters j through k, with j running from 1 through k. Set bit s of a bitvector to 1. When all through, scan the vector for the first zero bit. This is the shortest non-occurring substring. However, this is probably far from the shortest algorithm. Can someone give an example of an o(n) algorithm ? /Eric
514.3		CLT::GILBERT	Juggler of Noterdom	`Fri Jun 20 1986 05:41`	32
	Given a string x of n characters from an alphabet of k characters, the shortest string NOT found in the given string can be found in O(n) time, where the size of the alphabet is fixed. Let w = log n, rounded up to an integer. Now, the length of the shortest k string NOT in x must be <= w, since there are k^w strings of this length, x contains n+1-k substrings of length w, and k^w >= n > n+1-k. Note that k^w is O(n). We can use a bitvector of length k^w to indicate substrings of length w that occur in x. This takes O(n) time, with a fairly straight-forward encoding of the index. Briefly, as we step through the characters of x, the next index into the bitvector is computed from the previous index by k*index mod k^w + nextchar (the alternative of recomputing the index directly from each w-character substring would take O(nlogn) time). Now we can 'compress' the bitvector, producing a one of length k^(w-1) that represents (w-1)-character substrings that occur in x. This can be done in O(n) time. This one can be further reduced (to k^(w-2)) in O(n/k) time, and so on. Summing these times still gives an O(nk/(k-1)), or O(n) running time. The total length of these bitvectors is k^w + k^(w-1) + ... + k + 1, or (k^(w+1)-1)/(k-1), or roughly nk/(k-1). Thus, we can search each of these bitvectors (from smallest to largest) for a bit that's not been set in O(n) time. This bit corresponds to the shortest substring that doesn't occur in x. Note that it takes O(n) time (worst case) to verify that a w-character string does not occur in an n-character string. Thus, the above approach is about as good as can be expected, within a constant of proportionality.