
Conference rusure::math

Title:Mathematics at DEC
Moderator:RUSURE::EDP
Created:Mon Feb 03 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2083
Total number of notes:14613

81.0. "Wanted: test data" by AURORA::HALLYB () Sun Jun 17 1984 00:46

Looking at ways to tweak my factoring program, I was wondering if anybody knows
of lists of large composite numbers with factors.  The idea is to find, say,
100 numbers, each 30 digits long, having exactly two 15-digit prime factors. 
(30 & 15 are just examples; I need many such test cases). 

I'm generating lots of such test data but this will take quite a while to do;
does anybody know if RAND or NSA or somebody already has it off-the-shelf?
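A minimal sketch of one way to generate such test data, in modern Python (an illustration only; the primality test here is Miller-Rabin, an assumption, since the base note doesn't show how the poster's own generator works): draw random odd candidates of the requested size, keep the ones that test prime, and multiply two of them together.

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test (assumed; not from the base note)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def random_prime(digits):
    """Draw random odd candidates of the given size until one tests prime."""
    lo, hi = 10 ** (digits - 1), 10 ** digits
    while True:
        n = random.randrange(lo, hi) | 1   # force odd; stays within the digit range
        if is_probable_prime(n):
            return n

def test_number(factor_digits=15):
    """A composite with exactly two (probable) prime factors of the requested size."""
    p, q = random_prime(factor_digits), random_prime(factor_digits)
    return p * q, sorted((p, q))
```

With `factor_digits=15` this yields 29- or 30-digit composites of the kind the note asks for; generating 100 of them takes seconds rather than CPU-months on modern hardware.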
T.R   Title   User / Personal Name   Date                    Lines
81.1          HARE::STAN             Sun Jun 17 1984 20:20       1
The book I mention in note 51.5 has lots of such data.
81.2          AURORA::HALLYB         Tue Jun 19 1984 22:00      12
I don't want people to say "sure, but notice that all the numbers he's
testing evenly divide p^k +- 1", which is what Stan is referring to.
I'm looking more for this kind of table:

100000000001000077597523 = 285030087263 * 350840155021
100000000001000446185743 = 235111196773 * 425330657891
100000000001000717777063 = 149080454239 * 670778745017

The above numbers are simply the first entries (above some threshold)
satisfying my criteria, namely that the factors have the same size.
If no such list of test data exists I might make some limited money generating
the stuff myself.  Maybe I oughta get in touch with Digital Press...
81.3          TURTLE::GILBERT        Wed Jun 20 1984 15:34      13
It would probably be easier/better to prove the program correct, rather than
rely on a relatively small sample of test cases.  Anyway, ...

Why not test the factorization program against some known factorizations?

If you're explicitly interested in composite numbers that are the product of
two 15-digit primes, then why not find some 15-digit primes, and multiply them
together?  I'm sure you can find several among the various factorizations in
the text Stan mentioned.  Also, there are some algorithms for finding primes
that are slightly larger than some given N (these are used by some public-key
encryption schemes that sound suspiciously similar to what you're requesting).

					- Gilbert
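Gilbert's "find a prime slightly larger than some given N" idea can be sketched as follows (an illustration, not any particular published algorithm; `is_prime` is a Miller-Rabin variant whose fixed witness set is known to be exact for n below roughly 3.3 * 10^24, which covers the sizes discussed here):

```python
def is_prime(n):
    """Miller-Rabin with 12 fixed witnesses; deterministic for n < ~3.3e24."""
    if n < 2:
        return False
    witnesses = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    for p in witnesses:
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for a in witnesses:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def next_prime_above(n):
    """Smallest prime strictly greater than n (simple incremental search)."""
    c = n + 1
    while not is_prime(c):
        c += 1
    return c
```

For example, `next_prime_above(10**14)` yields a 15-digit prime just above 10^14; prime gaps at that size are small enough that the incremental search is cheap.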
81.4          AURORA::HALLYB         Wed Jun 20 1984 16:45      13
Those are good, valid suggestions, but they don't solve the problem.
I guess I'm not making my point clear enough.  Let me try this:  The end
goal here is to "tweak" the program to perform well.  In order to make a
fair test I need some test data.  If I generate the test data by any of the
suggested methods then my tweaking is subject to the criticism that it's
only good for finding factors that have the same structure as the test data.
Kind of like tuning a VMS system to run realtime data collection, and expecting
the same system to run DBMS-32 optimally.

What I believe is needed is some kind of tabular data where the factors have no
intrinsic structure -- they didn't come out of an algorithmic process that
generated them; they are "random" factors, subject only to the constraint that
they be sufficiently large primes.  (I DO want to tune away from direct search).
81.5          RANI::LEICHTERJ        Sat Jun 23 1984 03:09       4
There is a recent algorithm that will generate random numbers with known
factorizations and a uniform distribution.  Would this help? - I can get
you a reference.  (It's still not clear exactly what you are after.)
						-- Jerry
81.6          AURORA::HALLYB         Sat Jun 23 1984 20:21      52
OK, if this doesn't make sense let's all drop the subject.
I intend to obtain the following tables for various values of k and t: 

	Table 47.  Average time (sec) to factor using k=3, t=7
Number  |
of	|    Size (decimal digits) of number to be factored 
factors |	  16	  20	  ...	 40 (say)
--------+--------------------------------------------------
2	|	   8.4	  23	  ...	3982		I'll also have
3	|	   4.5	  17	  ...	1762		the standard
4	|	   2.0	   9	  ...	 341		deviations.

The table entries are currently keyboard fantasies.  I don't know if I need
to go out to 40 digits or more or less.  I think the lower bound will be 16
or 20, based upon observed speeds.  From a whole series of tables of this form
I intend to deduce optimum k and t values, if they exist, for the algorithm to
use.  My current conjecture is that the optimum value of `t' will depend on the
size of the number, but that `k' won't.  Since nobody really knows the story
here (how to select k,t), I propose to empirically determine what works well,
with what reliability.

So to fill in these entries I need some "good" numbers to factor.  I keep trying
to say that I don't want algorithmically-generated "good" numbers because I want
to end up with results that are "independent" -- they don't depend on the
structure of the factors.  Another restriction is that the factors have to be
reasonably large.  Splitting a 24-digit number into primes
11 * 12345678901234567890123 (pretend the latter number is prime) is not
terribly enlightening and doesn't reveal much about the algorithm.  So a further
restriction on "goodness" is that the factors be fairly large: to be "good", a
number N with K factors has to have all factors larger than the (K+1)st root
of N.  Rather an arbitrary definition at this point, but it keeps the riffraff
out.
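The "goodness" criterion above is easy to state in code. A sketch (the helper names are invented for illustration), using an exact integer root rather than floating point, which would round unpredictably at these magnitudes:

```python
def integer_root(n, k):
    """Largest r with r**k <= n, by binary search on exact integers."""
    lo, hi = 1, 1 << (n.bit_length() // k + 2)   # hi safely exceeds n**(1/k)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

def is_good(factors):
    """True iff every factor of N = product(factors) exceeds the (K+1)st root of N."""
    n = 1
    for f in factors:
        n *= f
    bound = integer_root(n, len(factors) + 1)
    return all(f > bound for f in factors)
```

On the examples from this topic: the pair from .2, `is_good([285030087263, 350840155021])`, passes, while the lopsided split `is_good([11, 12345678901234567890123])` fails, since 11 is far below the cube root of the 24-digit product.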

Once I'm able to determine optimum (k,t) values the next step would be to code
up the algorithm to use those values.  But even more interesting is the ability
to (perhaps) be able to project factoring time for the algorithm so that we can
dynamically decide when factoring is computationally infeasible.   (Example:  it
may be that we can determine in O(days) that N probably has only two factors
but finding them may take O(years).  This would be statistically deducible based
upon table entries given mean,std_dev, and a favorable distribution.  It would
not be a certainty, but if you've got to factor 100 numbers this approach could
help decide which to go for.)

There is also the possibility to compare this algorithm with others, such as
Pollard's Rho-method (which I also have coded) and others yet to appear in
print.  Once there's a solid data base it's pretty easy to make reliable
statements about which method is superior to others.  Then we can talk about
parallel-processor versions of algorithms... 
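For reference, the Rho-method mentioned above can be sketched as follows. This is a generic textbook rendering with Floyd cycle-finding, not the poster's actual code; it assumes n is composite (on a prime it would loop forever).

```python
import math
import random

def pollard_rho(n):
    """Return a nontrivial factor of composite n via Pollard's rho method."""
    if n % 2 == 0:
        return 2
    while True:
        c = random.randrange(1, n)      # random polynomial x^2 + c mod n
        x = y = random.randrange(2, n)
        d = 1
        while d == 1:
            x = (x * x + c) % n         # tortoise: one step
            y = (y * y + c) % n         # hare: two steps (Floyd cycle-finding)
            y = (y * y + c) % n
            d = math.gcd(abs(x - y), n)
        if d != n:                      # d == n means the walk degenerated; retry
            return d

print(pollard_rho(8051))   # 83 or 97
```

Expected running time is on the order of the square root of the smallest prime factor, which is exactly why the "good" numbers above, whose factors are all large, make a fair benchmark for it.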

But the whole key to this is having a list of "good" numbers to factor.  I can
afford to be so picky about good numbers because they're easily generated in a
matter of CPU-months, and I'm currently doing that.  I was hoping to find a
"standard test case", though, rather than having to generate one myself.