
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

664.0. "Shuffling algorithms" by EAGLE1::DANTOWITZ (David ...) Mon Feb 09 1987 16:56

	Has anyone done analysis on shuffling algorithms?

	For example, given an array of N elements what is an
	efficient and effective method to shuffle the elements?


	Given a function ran(X) that returns an integer in the range
	[0,X) I use the following shuffle:



	Initialize each element in your destination array to a value
	that is not present in the source array (call it sen).

        For each element in the array choose a number in [0,X).  If that
	entry is filled in the destination array then search forward
	for a free entry.  (On each iteration through the loop the search
	direction is reversed.)


        direction := 1;

        Loop: i from 0 to X-1

           index := ran(X);

           while dest[index] <> sen do
             BEGIN
               index := index + direction;

               if index < 0 then index := X-1;   { wrap around }

               if index = X then index := 0;     { wrap around }
             END;

           dest[index] := source[i];

           direction := - direction;

        end loop;
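For concreteness, the sentinel method above can be transcribed into modern Python roughly as follows (a sketch only, not the original code: None stands in for the sentinel "sen", and ran(X) is taken to be random.randrange(X)):

```python
import random

def lookaside_shuffle(source):
    """Shuffle via the sentinel/search-forward method described above.

    A sentinel value (None) marks free slots in the destination; on a
    collision we scan for the next free slot, reversing the scan
    direction on each pass through the outer loop.
    """
    n = len(source)
    dest = [None] * n            # None plays the role of the sentinel "sen"
    direction = 1
    for i in range(n):
        index = random.randrange(n)
        while dest[index] is not None:       # slot taken: scan onward
            index = (index + direction) % n  # wraps around in either direction
        dest[index] = source[i]
        direction = -direction
    return dest
```

Since at most i slots are filled when element i is placed, the scan always finds a free slot and the loop terminates; the scan does grow longer as the array fills, which is where the clumping and bias noted in the later replies creep in.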

664.1. by CLT::GILBERT (eager like a child) Mon Feb 09 1987 17:32 (16 lines)
I've always liked the following approach.

As in .0, let ran(m) return a random integer (with a uniform distribution)
in the range 0..m-1.  Let swap(x,y) exchange the values of x and y.

	! Algorithm to perform an in-place shuffle of an n-element array,
	! x[1..n].
	!
	for i := n downto 1 do
	    swap (x[i], x[ran(i+1)]);

Note that care must be used in creating the ran(m) function -- it's
tempting to use a linear-congruential random number generator, and just
take that integer modulo m.  Instead, you should convert the number to a
floating-point number in the range [0..1), multiply by m, and take the
integer part of the product.  This has been discussed elsewhere.
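A 0-based Python rendering of this swap shuffle (a sketch; with 0-based indexing the random call is simply randrange(i + 1), which sidesteps the 1-based indexing slip corrected later in the topic):

```python
import random

def inplace_shuffle(x):
    """In-place shuffle: for i from n-1 down to 1, swap x[i] with a
    uniformly chosen x[j], 0 <= j <= i (the algorithm of this note,
    0-based)."""
    for i in range(len(x) - 1, 0, -1):
        j = random.randrange(i + 1)   # uniform in 0..i inclusive
        x[i], x[j] = x[j], x[i]
    return x
```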

664.2. "Peter, you're a card!" by SQM::HALLYB (Are all the good ones taken?) Mon Feb 09 1987 18:01 (7 lines)
    Any reason why you use ran(i+1) instead of ran(n+1) in the .1 algorithm?
    It seems to me that restricting the target range would somehow bias
    the deck -- e.g., card 1 has two chances to swap for card 51 but
    only one chance to swap for card 52.  But this is just off the top
    of my head, it may all work out...
    
      "double-down" John

664.3. "Ante Up!" by COMET::ROBERTS (Dwayne Roberts) Mon Feb 09 1987 18:57 (20 lines)
    (To borrow and modify Peter's example in .1:)
    
    	! Algorithm to perform an in-place shuffle of an n-element array,
	! x[1..n].
	!
	for i := 1 to n do
	    swap (x[i], x[1+ran(n)]);

    RE .2: A randomly shuffled deck of cards is not made more random by
    re-shuffling it, so it shouldn't make any difference if a card is
    swapped once or a thousand times.
    
    Note that the APL language has a built-in dyadic operator "?"
    (pronounced "deal") that does a shuffle.  For example, the statement 
    
    DECK _ 52 ? 52
    
    would create a 52-element array named "DECK" with the integers from
    1 to 52 "randomly" distributed within it.

664.4. "Shuffle by sorting" by JON::MORONEY (Legalize Liberty) Mon Feb 09 1987 20:15 (13 lines)
One shuffling algorithm I used follows.  Create 2 arrays, A[N] and B[N].
N is the size of the "deck".

Step 1:  Fill array A with integers from 1 to N.  Doesn't matter if they're
         in order.
Step 2:  Fill array B with random numbers, using your favorite random number
         generator that returns a real x such that 0 <= x < 1.0.
Step 3:  Using your favorite sorting algorithm with 1 modification, sort array
         B.  The modification is, whenever you swap 2 numbers B[I] and B[J],
         you also swap A[I] and A[J].
Step 4:  When the numbers in B are sorted, those in A are shuffled.

-Mike
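In Python the two arrays can be fused into (key, element) pairs, letting a library sort do step 3's paired swaps (a sketch of the sort-based method, not Mike's original code):

```python
import random

def shuffle_by_sorting(deck):
    """Shuffle by pairing each element with a random real key in
    [0, 1) and sorting on the keys -- arrays A and B of the note,
    fused into (key, element) tuples."""
    keyed = [(random.random(), card) for card in deck]
    keyed.sort(key=lambda pair: pair[0])   # sort B; A rides along
    return [card for _, card in keyed]
```

One fine point: if two random keys ever collide, a stable sort leaves those two elements in input order; with 52 cards and double-precision keys that bias is astronomically small.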

664.5. by BEING::POSTPISCHIL (Always mount a scratch monkey.) Mon Feb 09 1987 22:26 (12 lines)
    The method presented in .0 does not give all permutations an equal
    chance of being selected.  For example, the second number you move
    has a 2/n chance of going immediately after the first one, when
    this chance should be 1/(n-1).
    
    The method in .4 requires unnecessary time and space, since the method
    Gilbert presented in .1 achieves the same effect quickly and gives an
    equal probability to each permutation, subject to the precision of the
    random-number operation.
    
    
    				-- edp 

664.6. "Skinning cats" by CLT::GILBERT (eager like a child) Tue Feb 10 1987 00:35 (25 lines)
There was a typo in .1 -- it should read:

	for i := n downto 1 do swap (x[i], x[1+ran(i)]);

I find this easiest to prove correct.  If the loop order is changed:

	for i := 1 to n do swap (x[i], x[1+ran(i)]);

it's still correct, and you need not know n before processing.  Dwayne's
algorithm in .3:

	for i := 1 to n do swap (x[i], x[1+ran(n)]);

may well produce a better shuffle for standard (linear-congruential based)
random number generators.  I've used Mike's algorithm in .4 when shuffling
large data files (prepend random numbers, and feed them to a sort utility).


Mike's note "Shuffling by Sorting" reminds me of a sort algorithm proposed
by Phil Emma -- "Sorting by Shuffling":  Check whether the array is in order,
and if not, swap two randomly selected elements, and repeat (an early version
of the algorithm swapped the two elements only if they were out of order, but
that bordered on the practical).  Out of curiosity (and because we're getting
good at calculating expectations), what's the expected number of swaps to sort
three elements?  How about 4 elements?

664.7. "Looks random, but..." by CHOVAX::YOUNG (Back from the Shadows Again,) Tue Feb 10 1987 00:42 (19 lines)
    It is easy to demonstrate that the method listed in .0 is not 'fair'.
    This kind of 'lookaside' shuffling tends to result in clumps of
    cards early on.  Thus cards that are near each other in the beginning
    and of the same residue modulo 2 tend to be near each other when
    done.
    
    The method mentioned in .1 was for years my preferred method of
    doing shuffles, and all that time I believed it to be fair.  Then
    while trying to solve a related problem, I ran several tests to
    'prove' that it was fair.  Much to my chagrin very clear patterns
    in the redistribution showed up.  I would now say that it is only
    marginally fairer than the .0 method.
    
    The method mentioned in .4 is new to me, and has great appeal. 
    It is clearly a fair shuffle, and I would bet that it has optimal
    performance for any fair shuffle.  Very clever, this is definitely
    the method that I plan to use in the future.
    
    --  Barry

664.8. "Efficiency Counts" by COMET::ROBERTS (Dwayne Roberts) Tue Feb 10 1987 01:01 (6 lines)
    .4 should shuffle very well; however, it is only as efficient as the
    sort used.  What's the fastest sort?  nlogn? 
    
    The methods that Peter and I advance are _linear_.  There could be a
    significant difference with large values of n.
     
664.9. "random thoughts..." by CHOVAX::YOUNG (Back from the Shadows Again,) Tue Feb 10 1987 01:54 (23 lines)
    Re .8:
    
    I would maintain that getting a correct answer was more important
    than getting an incorrect one efficiently.
    
    As for the efficiency of sorting, it is usually considered to be
    O(N*log(N)), though there are some radix sorts that are arguably
    O(N).
    
    It has just occurred to me though that a fair algorithm with almost
    O(N) time could be designed by taking the original algorithm in
    .0 and using the more standard separate lookaside lists, instead
    of the simpler 'in-deck' lookaside algorithm.  This involves building
    a separate list for every 1..N.  Then scan up the main array, adding
    lists with a single element to the output list, skipping those with
    no elements, and sorting (.4) those with more than one element.
    This is a fair algorithm, and runs in average time:
    
    	O(N + N + (N^.5)*log(N^.5))  -->  O(N)
    
    How's that?  (I learn something new every day in this conference  8^)

    --  Barry

664.10. by BEING::POSTPISCHIL (Always mount a scratch monkey.) Tue Feb 10 1987 11:54 (13 lines)
    Re .7:
    
    I would like to know more about the problems you found with the method
    Gilbert presented in .1.  Given a truly random random-number generator,
    Gilbert's procedure gives each possible permutation an exactly equal
    chance of occurring.
    
    The only chance for failure should be in the random-number generator.
    What did you use, and what sensitivities do the various methods
    presented in this topic have to the failings of common generators?
    
    
    				-- edp 

664.11. ".1 visits small subset of permutations" by MATRIX::ROTH Tue Feb 10 1987 15:28 (12 lines)
     I agree with .10 in principle, that each permutation is equally
     likely.  However, in practice there are usually vastly fewer random number
     states available than permutations.  For example, shuffling 52 cards
     there are around 8.052E+67 permutations, but with a 32 bit state
     word there are only 4.295E+09 random numbers.  So you can only
     visit 1.0/(1.875E+58) of the possible permutations.  Given that
     multiplicative congruential generators are notorious for falling
     on hyperplanes and so on, such use of the generator may cause problems.

     Just a guess...

     - Jim
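The counting argument is easy to reproduce exactly, since big integers make 52! directly computable (a sketch; printed figures are rounded):

```python
import math

perms = math.factorial(52)      # exact count of orderings of 52 cards
states = 2 ** 32                # distinct states of a 32-bit generator
print(f"{perms:.3e}")           # prints 8.066e+67
print(f"{states:.3e}")          # prints 4.295e+09
print(f"{states / perms:.3e}")  # about 1 part in 1.9e+58 is reachable
```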

664.12. by MATRIX::ROTH Tue Feb 10 1987 15:31 (5 lines)
    Of course the comments in .11 apply equally well to the idea of
    sorting an array of random numbers, but there the 'bad behaviour'
    of the generator may be less of a problem.

    - Jim

664.13. by CHOVAX::YOUNG (Back from the Shadows Again,) Tue Feb 10 1987 16:26 (19 lines)
    Re .10:
    
    I may be dense but it is not apparent to me that Peter's algorithm
    "gives each possible permutation an exactly equal chance of occurring."
    If you could (or anyone else), would you post an argument, or proof
    of this?
    
    My observations had nothing to do with the validity of the random
    number generators.  I came to my conclusions by running some simple
    tests that measured the redistribution results.  These tests used
    several different random-number generators, and most came from Knuth
    and had very good figures from the spectral test.  I will try to
    recreate these tests, and then submit the results here.
    
    While the end result of Peter's method may indeed be subtle, that
    does not mean that it is fair.  Many subtle random-number generators,
    for instance, turned out to have poor distributions.
    
    --  Barry

664.14. by SQM::HALLYB (Are all the good ones taken?) Tue Feb 10 1987 17:42 (23 lines)
>    I may be dense but it is not apparent to me that Peter's algorithm
>    "gives each possible permutation an exactly equal chance of occurring."
>    If you could (or anyone else), would you post an argument, or proof

    One simple argument goes like this, using a 52-card deck as an example.
    Let J represent any card in the deck.  Clearly card J has a 1/52 chance
    of ending up in position 52, since that would occur only at the outset
    (n=52) and once placed there the card doesn't move.  (Peter's algorithm)
    
    Now we ask ourselves the chance that card J has of ending up in
    position 51.  For this to happen (a) J may not be card 52, and 
    (b) J must be selected for position 51.  Recall from probability that
    P(A & B) = P(A) * P(B|A).  Here, P(a) = 51/52, clearly the probability
    that card J is _not_ card 52 and P(b) =  1/51, since no matter where J
    is in the remainder of the deck, it has an even chance of being
    selected therefrom.  Since there are 51 cards to choose from we get 1/51.
    Multiplying those together gives us the probability 1/52 that J ends up
    as card 51.  Proceed inductively to demonstrate that all positions
    have an even chance of ending up with card J.  This easily generalizes
    to decks of other than 52 cards.  The algorithm appears to be as
    reliable as the random number generator.

      John
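John's induction can also be checked exhaustively for small decks: enumerate every possible sequence of ran() results and count the permutations produced (a sketch, 0-based; fairness means all counts come out equal):

```python
from itertools import product
from collections import Counter

def check_fairness(n):
    """Enumerate every possible sequence of ran() results for the
    downward-swap shuffle and count how often each permutation of
    0..n-1 appears.  Fair <=> all counts are equal."""
    counts = Counter()
    # one random choice in 0..i for each i from n-1 down to 1
    ranges = [range(i + 1) for i in range(n - 1, 0, -1)]
    for choices in product(*ranges):
        x = list(range(n))
        for i, j in zip(range(n - 1, 0, -1), choices):
            x[i], x[j] = x[j], x[i]
        counts[tuple(x)] += 1
    return counts

print(check_fairness(3))   # each of the 3! permutations appears exactly once
```

For n = 3 there are 3*2 = 6 equally likely result sequences, mapping one-to-one onto the 6 permutations; for n = 4, 24 sequences onto 24 permutations.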

664.15. "Just wiggle a little and new ones appear" by MODEL::YARBROUGH Wed Feb 11 1987 12:08 (14 lines)
>                                ... each permutation is equally
>     likely.  However, in practice there are usually vastly fewer random number
>     states available than permutations.  For example, shuffling 52 cards
>     there are around 8.052E+67 permutations, but with a 32 bit state
>     word there are only 4.295E+09 random numbers.  So you can only
>     visit 1.0/(1.875E+58) of the possible permutations.  Given that
>     multiplicative congruential generators are notorious for falling
>     on hyperplanes and so on, such use of the generator may cause problems.
>
You can get more permutations by NOT starting from the same original 'deck'
of numbers each time, but retaining the current permutation as the starter
for the next. There are any number of simple transformations of the
original 'deck' that will produce a different sequence, with a very high
probability that any two such sequences have no common permutations. 

664.16. "Oopps" by CHOVAX::YOUNG (Back from the Shadows Again,) Wed Feb 11 1987 15:51 (19 lines)
    Re .14:
    
    Well John, you have shown me the error of my obstinate ways.  I
    can now see that Peter's method using the expression:
    
    	Swap (Card[i], Card[1 + ran(i)]);
    
    is indeed fair.  I was misled because of its similarity to the method
    that I had been using which was:
    
    	Swap (Card[i], Card[1 + ran(n)]);
    
    which is NOT fair.  Thus Peter's method does seem to be the fastest
    and simplest fair method.
    
    Like I said, I learn something new every day ... 8^)
    
    
    --  Barry

664.17. by CLT::GILBERT (eager like a child) Wed Feb 11 1987 20:54 (8 lines)
Re .16:
    
>                    I was misled because of its similarity to the method
>   that I had been using which was:
>   	Swap (Card[i], Card[1 + ran(n)]);
>   which is NOT fair.

    Huh?  It looks fair to me.
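
A 3-card deck already settles the question: the ran(n) variant makes n independent choices of n values, so there are n^n = 27 equally likely outcome sequences for n = 3, and 27 cannot divide evenly among 3! = 6 permutations.  Exhaustive enumeration (a sketch, 0-based):

```python
from itertools import product
from collections import Counter

def check_naive_shuffle(n):
    """Count outcomes of the biased variant: pass i swaps x[i] with
    x[ran(n)], so all n**n choice sequences are equally likely."""
    counts = Counter()
    for choices in product(range(n), repeat=n):
        x = list(range(n))
        for i, j in enumerate(choices):
            x[i], x[j] = x[j], x[i]
        counts[tuple(x)] += 1
    return counts

counts = check_naive_shuffle(3)
print(sorted(counts.values()))   # prints [4, 4, 4, 5, 5, 5]: not uniform
```

Some permutations arrive 5 ways out of 27, others only 4, so the variant is biased for every n > 2 where n! does not divide n^n.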

664.18. "I want 52! possibilities, not just 2^32" by VIDEO::OSMAN (and silos to fill before I feep, and silos to fill before I feep) Thu Feb 12 1987 02:17 (14 lines)
Dot something or other raised an interesting point, that there are "only"
2^32 maximum random numbers available from MTH$RANDOM (are there 2^32,
or are many unreachable, but that's a different topic), but there are
52! desired permutations.

So, how do we program VMS to "truly" shuffle a deck ?  That is, how
do we give an equal chance to every one of the 52! combinations
coming up.

Perhaps 2^32 is plenty enough shuffles that it's not worth pursuing,
but certainly from an academic perspective it's an interesting
question.

/Eric

664.19. "Carry results forward and repeat" by MODEL::YARBROUGH Thu Feb 12 1987 12:32 (19 lines)
Starting from the sequence 1..52 and a different random seed each time, you
can in principle generate 2^31+1 permutations. (The period of the VAX RNG
is chosen that way because it is prime. It's not guaranteed that these
permutations are all distinct, although it is very probable.) Then it's
time to recycle the random numbers. 

Next time, start each calculation with the last permutation generated by the 
previous cycle. This gives a different set of 2^32 permutations. Again, 
it's not guaranteed that there won't be two alike, but if there are, they 
will have been generated by a different seed.

Repeat this process. It will duplicate a sequence of permutations only when
the last permutation of the cycle is 1..52. 

I don't have the energy to figure out what the period of this process is.
A group theory expert could tell you. However, if the period of the RNG is 
indeed prime as it is supposed to be, then there is a good chance that the 
entire sequence of 52! permutations can be generated this way, and any 
subsequence can be found from the appropriate seed and initial permutation.

664.20. by BEING::POSTPISCHIL (Always mount a scratch monkey.) Thu Feb 12 1987 12:43 (21 lines)
    Re .18:
    
    Find a number, a, relatively prime to 52! such that (a-1) is divisible
    by four and not eight, and by the first powers but not the second
    powers of 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 39, 41, 43, and 47.
    Take another number, c, relatively prime to 52!, and your random number
    generator is Xnext = (a*Xcurrent + c) mod 52!.
    
    The potency of this generator is 25, which is enormous, and I would bet
    it passes the spectral test.  However, since we would be using every
    bit of the generated number, patterns would show up, so it might be
    better to use "mod 52!^n", where n is some integer greater than one.
    
    Use this generator in Gilbert's algorithm:  Given a generated number,
    r[0], let x[i] be the quotient and r[i+1] be the remainder of dividing
    r[i] by 52!^n/P(52,i+1), as i goes from 0 to 51, and where P(m,n) is
    the number of permutations of n objects selected from m objects.
    Return the x's. 
    
    
    				-- edp
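The quotient-and-remainder decoding described here is the factorial number system; its n = 1 case (modulus 52! itself) amounts to the following sketch for a generic deck size (function name hypothetical), mapping each integer in [0, n!) to a distinct permutation:

```python
from math import factorial

def int_to_permutation(r, n):
    """Decode r in [0, n!) into the r-th permutation of 0..n-1 by
    repeated quotient/remainder in a mixed-radix (factorial) system:
    the quotient by (i-1)! selects among the i remaining items."""
    pool = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        q, r = divmod(r, factorial(i - 1))
        perm.append(pool.pop(q))   # q-th of the i items still unplaced
    return perm
```

Fed with uniform integers in [0, 52!), this gives every one of the 52! permutations an equal chance, pushing the whole fairness question onto the big-modulus generator.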

664.21. "A reference seen on this..." by ENGINE::ROTH Wed Feb 25 1987 11:39 (14 lines)
    Re .20 - that's pretty elegant.  However, there are probably cheaper
    ways of getting more 'state' into a random number generator, for
    example an additive generator based on a primitive trinomial will
    have an astronomical period and only requires a few integer adds
    per number.  I know such generators have been proposed for encryption,
    but don't have the list handy (many are based on Mersenne primes).

    What brings this to mind is that I happened to come across this
    paper yesterday while browsing around for something else -

	"Faster Methods for Random Sampling", J. Vitter
	CACM, July, 1984

    - Jim

664.22. "Typo alert!" by ZFC::DERAMO (Daniel V. D'Eramo) Tue Dec 01 1987 15:40 (11 lines)
    Re: .19
    
    2^31 + 1 is not prime; however, 2^31 - 1 is.
    
    Re: .20
    
    You do not need to include 39 in the list of numbers, because 3
    and 13 are already in the list.  (39 = 3 * 13).  Its inclusion
    is not incorrect; it is just redundant.
    
    Dan

664.23. "Shake well before using." by ZFC::DERAMO (Daniel V. D'Eramo) Tue Dec 01 1987 21:51 (29 lines)
     The last swap of X[1] with itself in .1 can be avoided by

          for i := n downto 2
             swap(X[i], X[random integer in {1,...,i}])

     Instead of verifying this by calculating probabilities as in
     .14, I like to think of this algorithm as putting X[1]
     through X[n] into a bag, shaking them up, and WITHOUT
     LOOKING reaching in and taking one.  That goes into X[n].
     Now repeat this, with the bag now holding  i=n-1 elements,
     and put the randomly selected one into X[i] = X[n-1].
     Repeat for i=n-2,...,2 -- that is, until there is just one
     element left in the bag.  Put that one into X[1].  The "bag"
     at any one point in the loop is {X[1],...,X[i]} and the swap
     & decrement of i takes a random element out of the bag and
     puts it into its final resting place in the array.

     If you did this with a real deck of cards and a real paper
     bag, then it seems obvious that all of the 52! possible
     permutations are equally likely [if you shake the bag well
     enough. :^)].

     I would suggest that both this method from .1 [as well as
     the fair method in .4] could be used to test your favorite
     random number generator.  Shuffle <1,2,3,4,5> a few thousand
     times and record how many times each permutation comes up.
     Then run your favorite statistical test on the results.

     Dan
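
Dan's closing suggestion can be sketched with a hand-rolled Pearson chi-squared statistic, no statistics library assumed (all names here are hypothetical):

```python
import random
from collections import Counter
from itertools import permutations
from math import factorial

def shuffle_counts(n, trials):
    """Shuffle 0..n-1 `trials` times with the downward-swap method
    and tally how often each permutation appears."""
    counts = Counter()
    for _ in range(trials):
        x = list(range(n))
        for i in range(n - 1, 0, -1):
            j = random.randrange(i + 1)
            x[i], x[j] = x[j], x[i]
        counts[tuple(x)] += 1
    return counts

def chi_squared(counts, trials, n):
    """Pearson statistic against the uniform hypothesis; for a fair
    shuffle its expected value is n! - 1, the degrees of freedom."""
    expected = trials / factorial(n)
    return sum((counts[p] - expected) ** 2 / expected
               for p in permutations(range(n)))

counts = shuffle_counts(4, 24000)
print(chi_squared(counts, 24000, 4))   # should land near 23 for a fair shuffle
```

Compare the statistic against a chi-squared table with n! - 1 degrees of freedom; persistently large values indict either the shuffle or the underlying generator.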