[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

803.0. "Discontiguous common substrings" by CLT::GILBERT (Builder) Mon Dec 14 1987 16:06

I liked this problem from USENET, and thought others might, too.

					- Gilbert
 
Here's a probability question whose answer should be known, but I don't
know it.
 
Suppose that A and B are two random binary strings of length n.
What is the expected size of the longest common substring, where the
subtring doesn't have to appear contiguously in A and B?
 
For example, if  A = 001011 and B = 101101, an example of a longest common
substring is 0101, which appears as 0-101-- in A and -01-01 in B.
 
A good asymptotic estimate would do if an exact answer isn't available.
Experiment suggests a value close to  0.82*n  for large n.
 
Please e-mail any suggestions (even if you post them, as news often gets
lost on the way here).   Thanks.
 
Brendan McKay
 
Paper:     Computer Science Department, Australian National University,
           GPO Box 4, Canberra, ACT 2601, Australia.
Telephone: + 61 62 49 3845                Telex: AA 62760 NATUNI
ACSnet:    bdm@anucsd.oz                  ARPA:  bdm%anucsd.oz@uunet.uu.net
CSNET:  bdm@anucsd.oz@csnet-relay.csnet   JANET: anucsd.oz!bdm@ukc
UUCP: {uunet,ubc-vision,ukc,mcvax,prlb2,hplabs,enea,mulga}!munnari!anucsd!bdm
BITNET:  anucsd.oz!bdm@uunet.uu.net (or similar)
T.RTitleUserPersonal
Name
DateLines
803.1what if they must be contiguous ?VIDEO::OSMANtype video::user$7:[osman]eric.vt240Mon Dec 14 1987 18:4513
    What is the largest expected common substring when comparing two
    length-n strings if the substring must be contiguous ?
    
    For instance, consider two random strings of length 8, which I
    make up as we speak, by letting my fingers randomly hit the 0 and
    1 keys:
    
    	01100110
    	10101010
    
    What do we have here ?  Largest is 10, length 2.
    
    /Eric
803.2CLT::GILBERTBuilderWed Dec 16 1987 22:0741
    Let E(a,b) be the expectation of the length of the longest 
    discontiguous match between random binary strings of length
    a and b, respectively.

    Then E(a,b) = E(b,a), E(a,0) = 0, and

	E(1,1) = 1/2
	E(1,2) = 3/4
	E(1,3) = 7/8
	E(1,b) = ( 2^b - 1 )/( 2^b )

	E(2,2) = 9/8
	E(2,3) = 23/16
	E(2,b) = ( 4*2^b - 2*b - 3 )/( 2*2^b )

	E(3,3) = 58/32
	E(3,4) = 137/64
	E(3,5) = 308/128
	E(3,6) = 667/256
	E(3,7) = 1406/512
	E(3,8) = 2909/1024

	E(4,4) = 323/128
	E(4,5) = 732/256
	E(4,6) = 1612/512
	E(4,7) = 3467/1024
	E(4,8) = 7313/2048

	E(5,5) = 1662/512
	E(5,6) = 3677/1024
	E(5,7) = 7975/2048
	E(5,8) = 17019/4096

	E(6,6) = 8151/2048
	E(6,7) = 17739/4096
	E(6,8) = 38049/8192

	E(7,7) = 38678/8192
	E(7,8) = 83186/16384

	E(8,8) = 178212/32768
803.3SSDEVO::LARYThu Dec 17 1987 00:447
	E(3,b) = ( 12*2^b - 2*b^2 - 3*b - 11 )/( 4*2^b )

	E(4,b) = ( 32*2^b - 4*(b^3)/3 - (b^2)/2 - 103*b/6 - 27 ) / ( 8*2^b )

Note: 4*(b^3)/3 + (b^2)/2 + 103*b/6 is always an integer, as it equals
   b^3 + 17*b + b*(b+1)*(2*b+1)/6 and the rightmost term is always an integer.
803.4SSDEVO::LARYThu Dec 17 1987 01:0422
Another way of looking at E(a,b) is E(a,b) = a - Pa(b)/(2^b), where:

P1(b) = 1


	    3
P2(b) = b + -
	    2


	1  2    3       11
P3(b) = - b  +  - b  +  -- 
	2       4        4


	1  3     1  2    103       27
P4(b) = - b  +  -- b  +  --- b  +  --
	6       16        48        8


Do these polynomials look familiar??
(I'd feel a lot better if P3's 11/4 were 9/4.....)
803.5Another look at itSQM::HALLYBKhan or bust!Thu Dec 17 1987 13:5419
N	E(N,N) from Peter		equals:

1    	E(1,1) = 1/2	 		     2 / 4^1
2	E(2,2) = 9/8			    18 / 4^2
3	E(3,3) = 58/32	 		   116 / 4^3
4	E(4,4) = 323/128		   646 / 4^4
5	E(5,5) = 1662/512		  3324 / 4^5
6	E(6,6) = 8151/2048		 16302 / 4^6
7	E(7,7) = 38678/8192		 77356 / 4^7
8	E(8,8) = 178212/32768		356424 / 4^8
    
    Anybody recognize the sequence above ^^^ ?

    Factoring the elements seems to lead nowhere, and after 5 they
    no longer grow with N-factorial.  Maybe some clever reader will
    recognize some direct or recursive relation.  Maybe Peter will
    furnish us with higher order E(N,N).

      John