[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

750.0. "Urn probability problem; with application" by VMSINT::THIEL (Dave Thiel -- VMS Development) Fri Aug 14 1987 23:47

    An urn contains M balls uniquely numbered 1 through M.  A set of
    F balls is drawn at random from the urn.  N <= F, F <= M.
    
    What is the probability that the set of F balls contains at least
    one set of N sequentially numbered balls?
    
    I'm looking for an exact solution, a function of F, M, and N.
    

    This problem has a practical aspect.  Allow me to restate the problem.
    A VAX is configured with M pages of physical memory.  VMS has
    placed F of those pages onto the free list.  Let us assume that the
    pages on the free list are selected at random from the total set of M
    pages.  What is the probability that the free list contains at least
    one set of N physically contiguous pages?

T.RTitleUserPersonal
Name
DateLines
750.1CLT::GILBERTBuilderSun Aug 16 1987 01:3616
>   I'm looking for an exact solution, a function of F, M, and N.

    I think that this has no closed-form solution.  If you have some
    rough idea of the values of F, M, and N, perhaps an approximate
    solution would suffice?

    I did find that

	p(M,F,N) = ((M+1-F) * choose(M-N,F-N)) / choose(M,F), if F < 2*N,

    where choose(N,K) is a binomial coefficient, i.e., N!/( (N-K)! K! ).

    The problem with the above is that if P >= 2*N, it counts some things
    too many times -- the probability is too high.  However, if the
    probability is known to be low, the overcounting can be approximated
    and removed. 
750.2VMSINT::THIELDave Thiel -- VMS DevelopmentMon Aug 17 1987 01:3419
    The limited domain solution of .1 seems to fail a basic sanity test,
    e.g.
    
	P(M,F,1) = 1 for F > 0
    
        
   The offered limted solution is:
    
	((M+1-F) * choose(M-N,F-N)) / choose(M,F), if F < 2*N,

    When N = 1, this is:
    
    	((M+1-F) * choose(M-1,F-1)) / choose(M,F)
    
    =   ((M+1-F) * (M-1)! * F! * (M-F)!) / ((F-1)! * (M-F)! * M!)
    
    =   (M+1-F) * F / M
    
    
750.3BEING::POSTPISCHILAlways mount a scratch monkey.Mon Aug 17 1987 13:2325
    Re .2:
    
    >     =   (M+1-F) * F / M

    Since N <= F < 2N and N=1, then F=1, so the expression is
    
    	= (M+1-1)*1/M = 1, which is expected.
    
    However, I think the expression in .1 needs adjustment for another
    reason.  It basically enumerates the ways to pick a sequence and then
    pick the F-N remaining numbers.  This counts some sequences twice for
    some selections of the remaining numbers.  (E.g., picking 3, 4 and then
    2 is the same as picking 2, 3, and then 4.) 
    
    There are C(M-N,F-N) selections which contain a sequence starting with
    1.  For the other numbers 2 through M+1-F, there are C(M-N-1,F-N)
    selections, obtained by picking a sequence and then selecting the rest
    of the numbers except for the number right before the sequence, which
    would give us a sequence starting with a lower number, which we already
    counted.  So we have:
    
    	p(M,F,N) = (C(M-N,F-N) + (M-F) C(M-N-1,F-N)) / C(M,F) if F <= 2N
    
    
    				-- edp 
750.4CLT::GILBERTBuilderMon Aug 17 1987 21:148
>   For the other numbers 2 through M+1-F, there are [...]

    Don't you mean "For the other numbers 2 through M-N+1, there are ..."?
    You'll find that this simplifies to the expression given in .1.

>    	p(M,F,N) = (C(M-N,F-N) + (M-F) C(M-N-1,F-N)) / C(M,F) if F <= 2N

    Here, you must intend "if F < 2N".
750.6CLT::GILBERTBuilderTue Aug 18 1987 16:3475
    The approach taken in .3 is useful, but in general it overaccounts for
    things.  The following takes the corrections into consideration, and
    yields an answer which, while not a closed expression, is the sum of
    only floor(f/n) terms.


    Let z(m,n,p) be the number of ways to place p distinct runs of length n
    over a range of m.  With a little work we discover that:

	z(m,n,p) = C(m-p*n+1, p)

    Note .3 had

	w(m,n,f,1) = C(m-n,f-n) + (m-n)*C(m-n-1,f-n)

    More generally,

	w(m,n,f,p) = z(m-(n+1),n,p-1) * C(m-p*(n+1)+1,f-p*n)
			+ (z(m,n,p) - z(m-(n+1),n,p-1)) * C(m-p*(n+1),f-p*n)

    Note that this is roughly z(m,n,p) * C(m-p*(n+1),f-p*n); that is,
    the number of ways to place p runs of length n, times the number
    of ways to place the rest of the numbers.  The exact expression
    accounts for combinations that have a run starting at position 1.

    We can tediously expand the equation for w(m,n,f,p), to get

				      (m-n*p)!
	w(m,n,f,p) = (m-f+1) * ----------------------
			       (m-p-f+1)! p! (f-n*p)!
    
    But w is not what we want, because it counts arrangements having multiple
    runs too many times.  For example, w(m,n,f,1) counts arrangements having
    two runs twice, arrangements having three runs three times, and so on.
    More generally, w(m,n,f,p) counts arrangements having k runs C(k,p) times. 

    Letting x(m,n,f,p) be the exact count, we have:

		     inf
    		     ---
	w(m,n,f,p) = >   C(k,p) * x(m,n,f,k)
		     ---
		     k=p

    Of course, the upper bound on the summation need only go to floor(f/n).

    What we want is the exact number arrangements having at least one run:

	inf
	---
	>   x(m,n,f,k)
	---
	k=1

    But this is equal (!) to the following:

	inf              inf
	---              ---     k+1
	>   x(m,n,f,k) = >   (-1)    w(m,n,f,k)
	---              ---
	k=1              k=1


    And so the number of ways to choose f numbers from a set of m such
    that there is at least one run of length n is:

     floor(f/n)
	---     p+1    (m-f+1) (m-n*p)!
	>   (-1)    ----------------------
	---         (m-p-f+1)! p! (f-n*p)!
	p=1

    Of course, the probability is this divided by C(m,f).

					- Gilbert
750.7CLT::GILBERTBuilderTue Aug 18 1987 16:547
Or to make it a tad prettier,

floor(f/n)                              floor(f/n)
   ---     p+1    (m-f+1) (m-n*p)!         ---     p+1
   >   (-1)    ---------------------- =    >   (-1)    C(m-f+1,p) C(m-n*p,m-f)
   ---         (m-p-f+1)! p! (f-n*p)!      ---
   p=1                                     p=1
750.8SSDEVO::LARYThu Aug 20 1987 18:4120
In the summation

floor(f/n)                           
   ---     p+1    (m-f+1) (m-n*p)!   
   >   (-1)    ----------------------
   ---         (m-p-f+1)! p! (f-n*p)!
   p=1                               

The quantity (m-p-f+1) can become negative if f is greater than ~ mn/(n+1).

Computationally, I don't think this is a problem, as by the time this happens
in the summation the terms have become vanishingly small.

I believe the problem arises from the fact that the domain of w(m,n,f,p) in p
is smaller than the domain of the final summation in p; i.e. sometimes there
aren't any "rest of the numbers" to place. It can be fixed by changing the
upper limit of the sum to min(floor(f/n),m-f+1).

Nice derivation! especially the combinatorial sum trick....
750.9CLT::GILBERTBuilderThu Aug 20 1987 21:482
    Thanks for pointing out the problem.  I think your explanation doesn't
    quite suffice, but agree that w(m,n,f,p) is the likely culprit.
750.10VMSINT::THIELDave Thiel -- VMS DevelopmentFri Aug 21 1987 04:0670
    An urn contains M balls uniquely numbered 1 through M.  A set of
    F balls is drawn at random from the urn.  M >= F >= N > 0.
    
    What is the probability that the set of F balls contains at least
    one set of N sequentially numbered balls?
    
    There are C(M,F) ways to choose a set of F balls from the M balls.
    
    Let Q (M,F,N), M >= F >= N > 0, be the number of ways to choose F balls
    from the M balls such that there are at least N consecutively numbered
    balls among the F.

                Q (M,F,N)
    P (M,F,N) = ---------
                 C(M,F)
    
    Determine Q by enumerating all possibilities (stay with me, it's
    not as tedious as it may sound).
    
    Use the notation 1 (2) 3 ... to indicate a set of balls containing
    the balls numbered 1 and 3, omitting the ball numbered 2, and and
    arbitrary collection of balls with numbers greater than 3.
    
    Q (M,F,N) =
    
    	if M = F, then 1	(there is only one way to choose F,
    				 and F must contain N consecutively
    				 numbered balls)
    
    	otherwise, M > F.

	(1) ...  then  Q(M-1,F,N)
    
	1 (2) ...  if N > 1 then  Q(M-2,F-1,N) if F-1 >= N else 0
    
	1 2 (3) ...  if N > 2 then Q(M-3,F-2,N) if F-2 >= N else 0

    	...
    
        1 2 ... N-1 (N)  then Q(M-N,F-N+1,N) if F-N+1 >= N else 0

        1 2 ... N-1 N ...  then C(M-N,F-N)

    Adding all of these up:
    
    Q(M,F,N) (M >= F >= N > 0)  =
    
        if M = F    then   1
    
                                 min(N,F+1-N)
                                      ---
	otherwise    	C(M-N,F-N) +  >    Q(M-i,F-i+1,N)
                                      ---
                                      i=1

    and therefore:
    
    P(M,F,N) (M >= F >= N > 0)  =
    
        if M = F    then   1 / C(M,F)
    

                                 min(N,F+1-N)
                                      ---
	             	C(M-N,F-N) +  >    Q(M-i,F-i+1,N)
                                      ---
                                      i=1
        otherwise      -----------------------------------
                                    C(M,F)

750.11Ayup, that counts all the possibilitiesCLT::GILBERTBuilderFri Aug 21 1987 17:5330
re .10
	How many evaluations of distinct Qs are needed?  Assuming
	you store each computed Q in a two-dimensional array (the
	N parameter is the same for all of them), as reuse these
	saved values as needed, then roughly 2*N*(M-N)*(M-F)/(N-1)
	different values of the Q function must be evaluated.
	And, if F > 2N, most of these Qs require N additions; this
	approach is never much better than that given in .6, and
	may be considerably worse.

	Note that the calculations can be done without recursion:

	f := N;
	for m := f to M - ((F-f)*N) div (N-1) do
	    Q[m,f] := C(m,f);
	for f := N+1 to F do
	    begin
	    ibound := min(N,f+1-N);
	    for m := f to M - ((F-f)*N) div (N-1) do
		begin
		temp := C(m-N,f-N);
		for i := 1 to ibound do
		    temp := temp + Q[m-i,f-i+1];
		Q[m,f] := temp;
		end;
	    end;

Note that m, M, f, and F are distinct variables.  For simplicity, and
only slightly more computation, the expression "M - ((F-f)*N) div (N-1)"
could be replaced with "M - (F-f)".
750.12Demonstration of equivalent resultsCLT::GILBERTBuilderMon Aug 24 1987 00:03130
    The following Pascal program computes Q(m,f,n) by three different
    techniques, compares the results (to ensure they are equal), and
    displays the amount of CPU time consumed by each calculation.

					- Gilbert


program y (input, output);


function C(n,k: integer): integer;
var
    i,t:	integer;
begin
if n < k
then
    t := 0
else
    begin
    t := 1;
    for i := 1 to k do t := (t*(n+1-i)) div i;
    end;
C := t;
end;

function Q1 (m,f,n: integer): integer;
var
    p,t,u:	integer;
begin
t := 0;
for p := 1 to f div n do
    begin
    u := C(m-f+1,p) * C(m-n*p,m-f);
    if not odd(p) then u := -u;
    t := t + u;
    end;
Q1 := t;
end;

{ }
{floor(f/n)                              floor(f/n)
{   ---     p+1    (m-f+1) (m-n*p)!         ---     p+1
{   >   (-1)    ---------------------- =    >   (-1)    C(m-f+1,p) C(m-n*p,m-f)
{   ---         (m-p-f+1)! p! (f-n*p)!      ---
{   p=1                                     p=1
{ }

function Q2 (m,f,n: integer): integer;
var
    i,t:	integer;
begin
if m = f
then
    t := 1
else
    begin
    t := C(m-n,f-n);
    for i := 1 to min(n,f+1-n) do
	t := t + Q2(m-i,f-i+1,n);
    end;
Q2 := t;
end;

function Q3 (M,F,N: integer): integer;
var
    mm,ff,i,ibound,t: integer;
    Q: array [1..100,1..100] of integer;
begin
ff := N;
for mm := ff to M - ((F-ff)*N) div (N-1) do
    Q[mm,ff] := mm - ff + 1;
for ff := N+1 to F do
    begin
    ibound := min(N,ff+1-N);
    for mm := ff to M - ((F-ff)*N) div (N-1) do
	begin
	t := C(mm-N,ff-N);
	if mm > ff then
	for i := 1 to ibound do
	    t := t + Q[mm-i,ff-i+1];
	Q[mm,ff] := t;
	end;
    end;
Q3 := Q[M,F];
end;

procedure check (m,f,n: integer);
var
    E1,E2,E3:	integer;
    t0,t1,t2,t3:    integer;
begin
t0 := CLOCK;
E1 := Q1(m,f,n);
t1 := CLOCK;
E2 := Q2(m,f,n);
t2 := CLOCK;
E3 := Q3(m,f,n);
t3 := CLOCK;
if (E1 = E2) and (E2 = E3) then else write ('*** ');
write ('Q(',m:0,',',f:0,',',n:0,') = ',E1:0);
if (E1 = E2) and (E2 = E3) then else write (' vs ',E2:0,' vs ',E3:0);
writeln;
writeln (t1-t0:0,' ',t2-t1:0,' ',t3-t2:0);
end;

{ }
{    Q(M,F,N) (M >= F >= N > 0)  =
{        if M = F    then   1
{                                 min(N,F+1-N)
{				      ---
{	otherwise    	C(M-N,F-N) +  >    Q(M-i,F-i+1,N)
{				      ---
{                                     i=1
{ }

procedure main;
var
    m,f,n:	integer;
begin
while true do
    begin
    readln (m,f,n);
    check (m,f,n);
    end;
end;

begin
main;
end.