
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

81.0. "Wanted: test data" by AURORA::HALLYB () Sun Jun 17 1984 00:46

Looking at ways to tweak my factoring program, I was wondering if anybody knows
of lists of large composite numbers with factors.  The idea is to find, say,
100 numbers, each 30 digits long, having exactly two 15-digit prime factors. 
(30 & 15 are just examples; I need many such test cases). 

I'm generating lots of such test data but this will take quite a while to do;
does anybody know if RAND or NSA or somebody already has it off-the-shelf?
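A minimal sketch of one way to generate such test data, in modern Python (an illustration only; the primality test here is Miller-Rabin, an assumption, since the base note doesn't show how the poster's own generator works): draw random odd candidates of the requested size, keep the ones that test prime, and multiply two of them together.

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test (assumed; not from the base note)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(digits):
    """Draw random odd candidates of the given size until one tests prime."""
    lo, hi = 10 ** (digits - 1), 10 ** digits
    while True:
        n = random.randrange(lo, hi) | 1   # force odd; stays within the digit range
        if is_probable_prime(n):
            return n

def test_number(factor_digits=15):
    """A composite with exactly two (probable) prime factors of the requested size."""
    p, q = random_prime(factor_digits), random_prime(factor_digits)
    return p * q, sorted((p, q))
```

With `factor_digits=15` this yields 29- or 30-digit composites of the kind the note asks for; generating 100 of them takes seconds rather than CPU-months on modern hardware.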
T.R   Title   User / Personal Name   Date                    Lines
81.1          HARE::STAN             Sun Jun 17 1984 20:20       1
The book I mention in note 51.5 has lots of such data.
81.2          AURORA::HALLYB         Tue Jun 19 1984 22:00      12
I don't want people to say "sure, but notice that all the numbers he's
testing evenly divide p^k +- 1", which is what Stan is referring to.
I'm looking more for this kind of table:

100000000001000077597523 = 285030087263 * 350840155021
100000000001000446185743 = 235111196773 * 425330657891
100000000001000717777063 = 149080454239 * 670778745017

The above numbers are simply the first entries (above some threshold)
satisfying my criteria, namely that the factors have the same size.
If no such list of test data exists I might make some limited money generating
the stuff myself.  Maybe I oughta get in touch with Digital Press...
81.3          TURTLE::GILBERT        Wed Jun 20 1984 15:34      13
It would probably be easier/better to prove the program correct, rather than
rely on a relatively small sample of test cases.  Anyway, ...

Why not test the factorization program against some known factorizations?

If you're explicitly interested in composite numbers that are the product of
two 15-digit primes, then why not find some 15-digit primes, and multiply them
together?  I'm sure you can find several among the various factorizations in
the text Stan mentioned.  Also, there are some algorithms for finding primes
that are slightly larger than some given N (these are used by some public-key
encryption schemes that sound suspiciously similar to what you're requesting).

					- Gilbert
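Gilbert's "find a prime slightly larger than some given N" idea can be sketched as follows (an illustration, not any particular published algorithm; `is_prime` is a Miller-Rabin variant whose fixed witness set is known to be exact for n below roughly 3.3 * 10^24, which covers the sizes discussed here):

```python
def is_prime(n):
    """Miller-Rabin with 12 fixed witnesses; deterministic for n < ~3.3e24."""
    if n < 2:
        return False
    witnesses = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in witnesses:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in witnesses:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def next_prime_above(n):
    """Smallest prime strictly greater than n (simple incremental search)."""
    c = n + 1
    while not is_prime(c):
        c += 1
    return c
```

For example, `next_prime_above(10**14)` yields a 15-digit prime just above 10^14; prime gaps at that size are small enough that the incremental search is cheap.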
81.4          AURORA::HALLYB         Wed Jun 20 1984 16:45      13
Those are good, valid suggestions, but they don't solve the problem.
I guess I'm not making my point clear enough.  Let me try this:  The end
goal here is to "tweak" the program to perform well.  In order to make a
fair test I need some test data.  If I generate the test data by any of the
suggested methods then my tweaking is subject to the criticism that it's
only good for finding factors that have the same structure as the test data.
Kind of like tuning a VMS system to run realtime data collection, and expecting
the same system to run DBMS-32 optimally.

What I believe is needed is some kind of tabular data where the factors have no
intrinsic structure -- they didn't come out of an algorithmic process that
generated them; they are "random" factors, subject only to the constraint that
they be sufficiently large primes.  (I DO want to tune away from direct search).
81.5          RANI::LEICHTERJ        Sat Jun 23 1984 03:09       4
There is a recent algorithm that will generate random numbers with known
factorizations and a uniform distribution.  Would this help? - I can get
you a reference.  (It's still not clear exactly what you are after.)
						-- Jerry
81.6          AURORA::HALLYB         Sat Jun 23 1984 20:21      52
OK, if this doesn't make sense let's all drop the subject.
I intend to obtain the following tables for various values of k and t: 

	Table 47.  Average time (sec) to factor using k=3, t=7
Number  |
of	|    Size (decimal digits) of number to be factored 
factors |	  16	  20	  ...	 40 (say)
--------+--------------------------------------------------
2	|	   8.4	  23	  ...	3982		I'll also have
3	|	   4.5	  17	  ...	1762		the standard
4	|	   2.0	   9	  ...	 341		deviations.

The table entries are currently keyboard fantasies.  I don't know if I need
to go out to 40 digits or more or less.  I think the lower bound will be 16
or 20, based upon observed speeds.  From a whole series of tables of this form
I intend to deduce optimum k and t values, if they exist, for the algorithm to
use.  My current conjecture is that the optimum value of `t' will depend on the
size of the number, but that `k' won't.  Since nobody really knows the story
here (how to select k,t), I propose to empirically determine what works well,
with what reliability.

So to fill in these entries I need some "good" numbers to factor.  I keep trying
to say that I don't want algorithmically-generated "good" numbers because I want
to end up with results that are "independent" -- they don't depend on the
structure of the factors.  Another restriction is that the factors have to be
reasonably large.  Splitting a 24-digit number into primes
11 * 12345678901234567890123 (pretend the latter number is prime) is not
terribly enlightening and doesn't reveal much about the algorithm.  So a further
restriction on "goodness" is that the factors be fairly large: to be "good", a
number N with K factors has to have all factors larger than the (K+1)st root
of N.  Rather an arbitrary definition at this point, but it keeps the riffraff
out.
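The "goodness" criterion above is easy to state in code. A sketch (the helper names are invented for illustration), using an exact integer root rather than floating point, which would round unpredictably at these magnitudes:

```python
def integer_root(n, k):
    """Largest r with r**k <= n, by binary search on exact integers."""
    lo, hi = 1, 1 << (n.bit_length() // k + 2)   # hi safely exceeds n**(1/k)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def is_good(factors):
    """True iff every factor of N = product(factors) exceeds the (K+1)st root of N."""
    n = 1
    for f in factors:
        n *= f
    bound = integer_root(n, len(factors) + 1)
    return all(f > bound for f in factors)
```

On the examples from this topic: the pair from .2, `is_good([285030087263, 350840155021])`, passes, while the lopsided split `is_good([11, 12345678901234567890123])` fails, since 11 is far below the cube root of the 24-digit product.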

Once I'm able to determine optimum (k,t) values the next step would be to code
up the algorithm to use those values.  But even more interesting is the ability
to (perhaps) be able to project factoring time for the algorithm so that we can
dynamically decide when factoring is computationally infeasible.   (Example:  it
may be that we can determine in O(days) that N probably has only two factors
but finding them may take O(years).  This would be statistically deducible based
upon table entries given mean,std_dev, and a favorable distribution.  It would
not be a certainty, but if you've got to factor 100 numbers this approach could
help decide which to go for.)

There is also the possibility to compare this algorithm with others, such as
Pollard's Rho-method (which I also have coded) and others yet to appear in
print.  Once there's a solid data base it's pretty easy to make reliable
statements about which method is superior to others.  Then we can talk about
parallel-processor versions of algorithms... 
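For reference, the Rho-method mentioned above can be sketched as follows. This is a generic textbook rendering with Floyd cycle-finding, not the poster's actual code; it assumes n is composite (on a prime it would loop forever).

```python
import math
import random

def pollard_rho(n):
    """Return a nontrivial factor of composite n via Pollard's rho method."""
    if n % 2 == 0:
        return 2
    while True:
        c = random.randrange(1, n)      # random polynomial x^2 + c mod n
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n         # tortoise: one step
            y = (y * y + c) % n         # hare: two steps (Floyd cycle-finding)
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:                      # d == n means the walk degenerated; retry
            return d

print(pollard_rho(8051))   # 83 or 97
```

Expected running time is on the order of the square root of the smallest prime factor, which is exactly why the "good" numbers above, whose factors are all large, make a fair benchmark for it.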

But the whole key to this is having a list of "good" numbers to factor.  I can
afford to be so picky about good numbers because they're easily generated in a
matter of CPU-months, and I'm currently doing that.  I was hoping to find a
"standard test case", though, rather than having to generate one myself.