| Here's an algorithm:
For every character position k, with k running from 1 through n, where
n is the length of the string, form k substrings s, of characters
j through k, with j running from 1 through k.
Set bit s of a bitvector to 1. When all through, scan the vector
for the first zero bit. This is the shortest non-occurring substring.
However, this is probably far from the shortest algorithm.
Can someone give an example of an o(n) algorithm ?
/Eric
|
| Given a string x of n characters from an alphabet of k characters, the
shortest string NOT found in the given string can be found in O(n) time,
where the size of the alphabet is fixed.
Let w = log n, rounded up to an integer. Now, the length of the shortest
k
string NOT in x must be <= w, since there are k^w strings of this length,
x contains n+1-k substrings of length w, and k^w >= n > n+1-k. Note that
k^w is O(n).
We can use a bitvector of length k^w to indicate substrings of length w
that occur in x. This takes O(n) time, with a fairly straight-forward
encoding of the index. Briefly, as we step through the characters of x,
the next index into the bitvector is computed from the previous index by
k*index mod k^w + nextchar (the alternative of recomputing the index
directly from each w-character substring would take O(nlogn) time).
Now we can 'compress' the bitvector, producing a one of length k^(w-1)
that represents (w-1)-character substrings that occur in x. This can
be done in O(n) time. This one can be further reduced (to k^(w-2)) in
O(n/k) time, and so on. Summing these times still gives an O(nk/(k-1)),
or O(n) running time.
The total length of these bitvectors is k^w + k^(w-1) + ... + k + 1,
or (k^(w+1)-1)/(k-1), or roughly nk/(k-1). Thus, we can search each
of these bitvectors (from smallest to largest) for a bit that's not
been set in O(n) time. This bit corresponds to the shortest substring
that doesn't occur in x.
Note that it takes O(n) time (worst case) to verify that a w-character
string does not occur in an n-character string. Thus, the above approach
is about as good as can be expected, within a constant of proportionality.
|