Factoring 1 [Archive] - thinBasic: Basic Programming Language Community Forum

View Full Version : Factoring 1

danbaron

30-04-2011, 08:29

I'm posting this in the science sub-forum. That way, I may be less vulnerable to attacks for writing the code of this topic in Racket.

This topic is about factoring composite integers. I became interested in it from Johannes' work on big integers, factoring composites, and primes.

Does this topic qualify as science? I think it does. Factoring is part of mathematics, and mathematics is the basis of science. And certainly, factoring done by computer, is part of computer science - and, computer science is part of science.

The code and output below use Racket, which is a descendant of Scheme, which is a dialect of Lisp.

I'm calling this topic, "Factoring 1", because, the algorithm I used is the most basic factoring algorithm.

Every positive integer, n, can be factored into k primes, for some value of k (k >= 1), such that the product of the k primes, equals n.

The algorithm works by dividing the integer (n) to factor, by members of a list containing the first y primes. If the y-th prime is at least as big as sqrt(n), then, the list is guaranteed to be able to factor n.

I arbitrarily decided to make a list of all of the primes less than or equal to the first prime greater than 10,000,000. That list is saved to file, and can be used again and again. Being able to read and write the list to file quickly, depended upon the ability of Racket to "serialize" and "deserialize" lists. The list is read and written as one unit, and the actions are very fast.

(When you use a computer to make a list of the first y primes, for some value of y, you soon realize that each succeeding prime takes longer to calculate than the previous one. The rate of growth of the list is always decreasing. At first, the list seems to grow very fast. But soon you notice that it is growing more slowly. Actually, as y approaches infinity, so does the time to calculate the y-th prime. But, theoretically, it is always possible to find the y+1-th prime, if you have a list of the first y primes. So, every prime that has ever been found, or ever will be found, could be found by constructing a list that begins with 2, and then repeatedly calculates the next prime and appends it to the list.)

So, the first thing I had to do was to make the list of primes. There is a procedure below, "set-initial-primes-file". It serialized and saved to a file called, "prime-list.txt", just the first four primes, 2, 3, 5, 7. I needed that little file, so I had something I could append to.

After that, the procedure, "append-primes-file", was used to increase the size of the list (file). It takes a parameter, "total-time", which determines how long the procedure runs. As written, the units of, "total-time", are hours. I ran the procedure numerous times, and each time it ran, the list (file) grew, until it reached its finished size. I think it took approximately 16 total hours to complete the file.

It turns out that the first prime greater than 10,000,000, is, 10,000,019. The complete list contains the first 664,580 primes, and its associated file size is 6.25 MB.

Once I had the completed file, "primes-list.txt", I could use it to factor composite integers, or to verify that an integer is prime.

The procedure that factors a number is called, "list-factor". I called it that, because, it factors a number by using a list of primes.

Since the list's largest prime, is 10,000,019, "list-factor" can factor any number that contains no factor larger than, (10,000,019)^2, or, 100,000,380,000,361. It means that the procedure can factor any number containing no factor larger than 14 digits.

******************************************************************************************************************************************************

What happens if the procedure encounters a number (or a quotient of factors) that is bigger than the 15 digit number (10000019)^2, i.e., 100000380000361?

For instance, the 15 digit number, 688846502588399 is greater than 100000380000361, and it happens to be prime.

However, what happens if we multiply together the next two primes after, 10000019, i.e., 10000079 * 10000103?

We get another 15 digit number, 100001820008137, which is also greater than 100000380000361, but, it is composite.

Can the procedure determine that 688846502588399 is prime, and that 100001820008137 is composite?

It cannot make the determination.

The procedure works by taking floor(sqrt(n)), where n is the number (or quotient of a partial factorization) being factored.

So, for 688846502588399, it finds, floor(sqrt(688846502588399)), equals 26245885.

And for, 100001820008137, it finds, floor(sqrt(100001820008137)), equals 10000090.

Both "roots" are bigger than the the biggest prime in the list, 10000019.

So, in both cases, the procedure starts at the beginning of the primes list with 2, and divides each number by each prime in the list. If the remainder of a division is 0, then, that prime divisor is one of the number's factors.

The procedure will stop before reaching the end of the primes list if it hasn't found a factor, and a candidate divisor is greater than the number's "root" (floor(sqrt(n))). (It is impossible for the smaller/smallest factor of an integer, n, to be larger than floor(sqrt(n)). For instance, consider n to be 21. floor(sqrt(21)) = 4. Therefore, if 21 is not prime, then, it must have a factor <= 4. And it does, 3. 3 * 7 = 21. You could look at it like this. If n = r * r, where r is the square root of n, and if n = a * b, where a and b are factors of n, then, if b is greater than r, a must be less than r.)

If the procedure tries every prime in the list up to its "root", and doesn't find a factor, then, the number must be prime.

(Incidentally, sqrt(21) = 4.58257569495584.. floor(sqrt(21)) = 4. ceiling(sqrt(21)) = 5. We are working with integers, so we have to use either 4 or 5 as potential factors. And, 5 * 5 = 25, would be greater than 21, so we use 4 as the maximum, i.e., floor(sqrt(n)).)

In the cases of the two numbers above, in both cases, floor(sqrt(n)) is larger than the last prime in the list, 1000019. So, unless the procedure finds a factor, it will try dividing the numbers by every prime in the list. And in both cases, it will never find a factor. For 688846502588399, there are no factors (besides 1 and itself), because it is prime. For 100001820008137, neither of its factors (10000079 and 10000103) is in the list, because they are the next two primes after the list's largest prime, 100000019.

In summary, when the procedure encounters a number (or a quotient of factors) that is larger than (10000019)^2, which it is unable to factor, it cannot say whether it is prime or composite. At that point, the procedure must stop.

----------------------------------------------------------------

What do I mean by a "quotient of factors", or, a "quotient of a partial factorization"?

Consider the integer, 2815417672486545963.

If the procedure factors it, the first factor found will be 3.

So, the procedure will divide the number by 3, getting the "quotient of the first factor", 938472557495515321.

Then, the procedure will attempt to factor 938472557495515321, and it will find the second factor, 29.

The procedure will divide 938472557495515321 by 29, getting the "quotient of the second factor", 32361122672259149.

Now, the procedure will try to factor the "quotient of factors", 32361122672259149.

But here, it has reached a dead end. 32361122672259149 is greater than (10000019)^2, and, it has no factors in the primes list.

So, the procedure can only say that it has partially factored 2815417672486545963. It cannot say whether 32361122672259149 is prime or composite.

******************************************************************************************************************************************************

In order to factor numbers, I run the code from within the DrRacket IDE. Doing that, compiles and loads the procedures. Then, I execute the procedure, "read-primes-file", from DrRacket's REPL (read-eval-print loop). That loads the list of primes. After that, I can factor numbers by executing the procedure, "list-factor".

I made two more procedures, "time-read-primes-file", and, "time-list-factor". They time the "read-primes-file" and the "list-factor" procedures. The displayed times are in seconds.

Below is shown the code, and example interactions using the REPL.

(To factor bigger numbers, I will have to use another method.)

(One more thing. If you want to instantaneously become at least a billionaire, all you need to do is to develop a procedure that will factor an arbitrary 1000 digit decimal integer, on an average computer, in one second.)

:oops: :x :twisted:

; code -------------------------------------------------------------------------------

#lang racket
(require racket/serialize)

(define pl (list))
(define start-time 0)

(define (set-start-time)
(set! start-time (current-milliseconds)))

(define (elapsed-time)
(/ (- (current-milliseconds) start-time) 3600000.0))

(define (set-initial-primes-file)
(define out (open-output-file "primes-list.txt" #:exists 'replace))
(define l (list 2 3 5 7))
(write (serialize l) out)
(close-output-port out))

(define (append-primes-file total-time)
(read-primes-file)
(set-start-time)
(displayln (length pl))
(define last-prime (last pl))
(define (test-next-number n)
(cond
((> (last pl) 10000000) void)
((even? n) (test-next-number (+ n 1)))
((> (elapsed-time) total-time) void)
(else
(define n-root (floor (sqrt n)))
(define result (check-list n n-root pl))
(when result (set! pl (append pl (list n))))
(test-next-number (+ n 1)))))
(define (check-list test test-root l)
(define check (car l))
(cond
((null? l) #t)
((> check test-root) #t)
((= (modulo test check) 0) #f)
(else (check-list test test-root (cdr l)))))
(test-next-number (+ last-prime 1))
(write-primes-file))

(define (write-primes-file)
(define out (open-output-file "primes-list.txt" #:exists 'replace))
(write (serialize pl) out)
(close-output-port out))

(define (read-primes-file)
(define in (open-input-file "primes-list.txt"))
(set! pl (deserialize (read in)))
(close-input-port in))

(define (list-factor n)
(define factors (list))
(define (test-factors n1 l)
(cond
((and (null? l) (null? factors))
(displayln "The procedure cannot factor this number.")
(displayln "Either it is a prime greater than 100,000,380,000,361,")
(displayln "or it is a composite containing factors, all of which are greater than 10,000,019."))
((and (null? l) (not (null? factors)))
(set! factors (append factors (list n1)))
(displayln "The procedure cannot fully factor this number.")
(displayln "The last factor is either a prime greater than 100,000,380,000,361,")
(displayln "or it is a composite containing sub-factors, all of which are greater than 10,000,019.")
factors)
(else
(define test (car l))
(define sr (floor (sqrt n1)))
(cond
((= (modulo n1 test) 0)
(set! factors (append factors (list test)))
(check-duplicates n1 test l))
((> test sr)
(set! factors (append factors (list n1)))
factors)
(else (test-factors n1 (cdr l)))))))
(define (check-duplicates n2 t l1)
(define n3 (/ n2 t))
(cond
((= n3 1) factors)
((= (modulo n3 t) 0)
(set! factors (append factors (list t)))
(check-duplicates n3 t l1))
(else (test-factors n3 (cdr l1)))))
(cond
((= n 0)
(list 0))
((= n 1)
(list 1))
(else (test-factors n pl))))

(define (time-read-primes-file)
(let ((t1 (current-milliseconds)))
(read-primes-file)
(let ((t2 (current-milliseconds)))
(let ((tt (/ (- t2 t1) 1000.0)))
tt))))

(define (time-list-factor n)
(let ((t1 (current-milliseconds)))
(let ((factor-list (list-factor n)))
(let ((t2 (current-milliseconds)))
(let ((tt (/ (- t2 t1) 1000.0)))
(values factor-list tt))))))

; REPL interactions ------------------------------------------------------------------

Welcome to DrRacket, version 5.1 [3m].
Language: racket.
> (time-read-primes-file)
4.009
> (list-factor 0)
'(0)
> (list-factor 1)
'(1)
> (list-factor 2)
'(2)
> (list-factor 3)
'(3)
> (list-factor 4)
'(2 2)
> (list-factor 5)
'(5)
> (list-factor 6)
'(2 3)
> (list-factor 1024)
'(2 2 2 2 2 2 2 2 2 2)
> (time-list-factor 111111111)
'(3 3 37 333667)
0.001
> (time-list-factor 1111111111)
'(11 41 271 9091)
0.001
> (time-list-factor 11111111111)
'(21649 513239)
0.319
> (time-list-factor 111111111111)
'(3 7 11 13 37 101 9901)
0.001
> (time-list-factor 1111111111111)
'(53 79 265371653)
0.002
> (time-list-factor 11111111111111)
'(11 239 4649 909091)
0.001
> (time-list-factor 111111111111111)
'(3 31 37 41 271 2906161)
0.001
> (time-list-factor 1111111111111111)
'(11 17 73 101 137 5882353)
0.001
> (time-list-factor 11111111111111111)
'(2071723 5363222357)
24.243
> (time-list-factor 111111111111111111)
'(3 3 7 11 13 19 37 52579 333667)
0.536
> (time-list-factor 1111111111111111111)
The procedure cannot factor this number.
Either it is a prime greater than 100,000,380,000,361,
or it is a composite containing factors, all of which are greater than 10,000,019.
97.527
> (time-list-factor 11111111111111111111)
'(11 41 101 271 3541 9091 27961)
1.628
> (time-list-factor 111111111111111111111)
'(3 37 43 239 1933 4649 10838689)
1.636
> (time-list-factor 1111111111111111111111)
'(11 11 23 4093 8779 21649 513239)
0.21
> (time-list-factor 11111111111111111111111)
The procedure cannot factor this number.
Either it is a prime greater than 100,000,380,000,361,
or it is a composite containing factors, all of which are greater than 10,000,019.
107.971
> (time-list-factor 111111111111111111111111)
'(3 7 11 13 37 73 101 137 9901 99990001)
0.125
> (time-list-factor 1111111111111111111111111)
'(41 271 21401 25601 182521213001)
6.208
> (time-list-factor 11111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(11 53 79 859 280846283204599997)
91.478
> (time-list-factor 111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(3 3 3 37 757 333667 440334654777631)
87.828
> (time-list-factor 1111111111111111111111111111)
'(11 29 101 239 281 4649 909091 121499449)
10.889
> (time-list-factor 11111111111111111111111111111)
'(3191 16763 43037 62003 77843839397)
2.168
> (time-list-factor 111111111111111111111111111111)
'(3 7 11 13 31 37 41 211 241 271 2161 9091 2906161)
0.14
> (time-list-factor 1111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(2791 6943319 57336415063790604359)
94.458
> (time-list-factor 11111111111111111111111111111111)
'(11 17 73 101 137 353 449 641 1409 69857 5882353)
0.593
> (time-list-factor 111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(3 37 67 21649 513239 1344628210313298373)
91.104
> (time-list-factor 1111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(11 103 4013 2071723 117957818840753430733)
92.212
> (time-list-factor 11111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(41 71 239 271 4649 123551 102598800232111471)
94.068
> (time-list-factor 111111111111111111111111111111111111)
'(3 3 7 11 13 19 37 101 9901 52579 333667 999999000001)
12.855
> (time-list-factor 1111111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(2028119 547853016076034547830334961169)
88.655
> (time-list-factor 11111111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(11 1010101010101010101010101010101010101)
93.319
> (time-list-factor 111111111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(3 37 53 79 239073561261285168617387389778123)
94.863
> (time-list-factor 1111111111111111111111111111111111111111)
'(11 41 73 101 137 271 3541 9091 27961 1676321 5964848081)
15.662
> (time-list-factor 11111111111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(83 1231 538987 201763709900322803748657942361)
94.13
> (time-list-factor 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(11 41 101 251 271 3541 5051 9091 21401 25601 27961 60101 7019801 341233306557836423189042926585457900151074303303755301)
90.121
> (time-list-factor 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111)
The procedure cannot fully factor this number.
The last factor is either a prime greater than 100,000,380,000,361,
or it is a composite containing sub-factors, all of which are greater than 10,000,019.
'(11
41
73
101
137
251
271
401
1201
1601
3541
5051
9091
21401
25601
27961
60101
1676321
7019801
263980647407951828155827620420447036038246324232161580581998828957153987896442603868348285938940685117797680817907519399405310559008181)
97.999
>

Johannes

30-04-2011, 22:32

It turns out that the first prime greater than 10,000,000, is, 10,000,019. The complete list contains the first 664,580 primes, and its associated file size is 6.25 MB.
That's an average of almost 10 bytes per prime. Does Racket store integers (or numbers in general) as decimal ASCII?

I have to say I'm very impressed with Racket's code density. I have a little trouble reading it. (I know you do MODs using the values from prime list pl but I wouldn't want to have to change anything in your program.)

I'm tempted. You know I am. ;)

Johannes

30-04-2011, 22:42

(One more thing. If you want to instantaneously become at least a billionaire, all you need to do is to develop a procedure that will factor an arbitrary 1000 digit decimal integer, on an average computer, in one second.)
As it stands now, the best algorithm to factor large numbers is the General number field sieve (http://en.wikipedia.org/wiki/General_number_field_sieve).

The running time of the number field sieve is super-polynomial but sub-exponential in the size of the input.If you developed an algorithm to do it in polynomial time you'd get the Nobel Prize in mathematics, guaranteed. If you can do it in O(n^2) they'd probably make you king of a country of your choice.

If the CIA or FSB didn't kill you first. ;)

danbaron

01-05-2011, 05:48

Racket is basically Scheme. I didn't know anything about Scheme. jack clued me into it.

http://www.thinbasic.com/community/showthread.php?11081-M.I.T.-Lisp-Videos&highlight=lisp+videos

(I think what they refer to as Lisp in the videos, is actually Scheme.)

I am convinced that Scheme is a very smart language. And, I don't think there is a better implementation of it than Racket. In fact, I have also downloaded and tried both the Allegro and LispWorks implementations of Common Lisp. They are both expensive commercial products. I swear that Racket (absolutely free) is better than both of them. (On the Internet you can quickly learn that, often, how much something costs has no relation to how much it is worth.) (Of course, you are correct about MODs and pl. And, you can change anything you want. What does it hurt? It's free, as in, "free beer!".)

Here's a tip. In Scheme and Common Lisp, "(car l)", means the first element of the list, l. "(cdr l)", means the remainder of the list (with (car l) removed). The names, car and cdr, are artifacts of the distant past, which come from the architecture of the computer that Lisp was first developed on.

http://en.wikipedia.org/wiki/CAR_and_CDR

---------------------------------------------------------------------------------------

Here is the beginning of the 6.25 MB file, "primes-list.txt".

((2) 0 () 0 () () (c 2 c 3 c 5 c 7 c 11 c 13 c 17 c 19 c 23 c 29 c 31 c 37 c 41 c 43 c 47 c 53 c 59 c 61 c 67 c 71 c 73 c 79 c 83 c 89 c 97 c 101 c 103 c 107 c 109 c 113 c 127 c 131 c 137 c 139 c 149 c 151 c 157 c 163 c 167 c 173 c 179 c 181 c 191 c 193 c 197 c 199 c 211 c 223 c 227 c 229 c 233 c 239 c 241 c 251 c 257 c 263 c 269 c 271 c 277 c 281 c 283 c 293 c 307 c 311 c 313 c 317 c 331 c 337 c 347 c 349 c 353 c 359 c 367 c 373 c 379 c 383 c 389 c 397 c 401 c 409 c 419 c 421 c 431 c 433 c 439 c 443 c 449 c 457 c 461 c 463 c 467 c 479 c 487 c 491 c 499 c 503 c 509 c 521 c 523 c 541 c 547 c 557 c 563 c 569 c 571 c 577 c 587 c 593 c 599 c 601 c 607 c 613 c 617 c 619 c 631 c 641 c 643 c 647 c 653 c 659 c 661 c 673 c 677 c 683 c 691 c 701 c 709 c 719 c 727 c 733 c 739 c 743 c 751 c 757 c 761 c 769 c 773 c 787 c 797 c 809 c 811 c 821 c 823 c 827 c 829 c 839 c 853 c 857 c 859 c 863 c 877 c 881 c 883 c 887 c 907 c 911 c 919 c 929 c 937 c 941 c 947 c 953 c 967 c 971 c 977 c 983 c 991 c 997 c 1009 c 1013 c 1019 c 1021 c 1031 c 1033 c 1039 c

And here is the end of it.

9998701 c 9998719 c 9998741 c 9998743 c 9998749 c 9998753 c 9998777 c 9998797 c 9998801 c 9998809 c 9998851 c 9998861 c 9998867 c 9998887 c 9998893 c 9998903 c 9998929 c 9998969 c 9998971 c 9998977 c 9999047 c 9999049 c 9999053 c 9999071 c 9999083 c 9999161 c 9999163 c 9999167 c 9999193 c 9999217 c 9999221 c 9999233 c 9999271 c 9999277 c 9999289 c 9999299 c 9999317 c 9999337 c 9999347 c 9999397 c 9999401 c 9999419 c 9999433 c 9999463 c 9999469 c 9999481 c 9999511 c 9999533 c 9999593 c 9999601 c 9999637 c 9999653 c 9999659 c 9999667 c 9999677 c 9999713 c 9999739 c 9999749 c 9999761 c 9999823 c 9999863 c 9999877 c 9999883 c 9999889 c 9999901 c 9999907 c 9999929 c 9999931 c 9999937 c 9999943 c 9999971 c 9999973 c 9999991 c 10000019))

---------------------------------------------------------------------------------------

There is no Nobel Prize in mathematics.

http://nobelprizes.com/nobel/why_no_math.html

http://en.wikipedia.org/wiki/Fields_Medal

---------------------------------------------------------------------------------------

I looked at the Wikipedia article about the General number field sieve. My first impression is that I am not optimistic about being able to correctly do it. But, there are other methods between what I did and it. Maybe I can do one or more of those.

---------------------------------------------------------------------------------------

As far as me trying to increase the size of my primes list, I think that is a dead end.

To estimate the number of primes less than or equal to x, we use the formula pi(x) ~= x / ln(x).

For x = 10^7, that is 620,421 (but actually there are 664,579, the biggest prime in the list is 10^7 + 19).

For x = 10^8, that is 5,428,681.

So, just to increase my list so that the biggest prime was greater than 10^8, I would have to add approximately 5,428,681 - 664,580 = 4,764,101 primes.

It took approximately 16 hours to make the list as it is, with the first 664,580 primes. Assuming that the rate is the same, it would take approximately 4,764,101 / 664,580 * 16 = 115 hours to get the first prime greater than 10^8. But, of course, even worse is the fact that the rate at which the list grows, continually decreases. (I guess we could figure out the equation for the rate.)

Additionally, the file would at least increase in size to approximately 5,428,681 / 664,580 * 6.25 = 51.05 MB. But, of course, even worse is the fact that on average, each additional prime requires more storage space than the previous one.

And, even if I was able to make the list contain the first prime greater than 10^8, I would only be guaranteed of being able to factor any number with no factor larger than 16 digits. Already, I can factor any number with no factor larger than 14 digits.

How about if we wanted to make the list contain all of the primes up to 10^9?

10^9 / ln(10^9) = 48,254,942

The floor estimate for the size of that file would be 48,254,942 / 664,580 * 6.25 = 453.81 MB.

And then, we would only be guaranteed of being able to factor any number with no factor larger than 18 digits.

I think absolutely, now or in the future, the method of finding the first y primes, will confound any computer. Even if at the start it is spitting out primes at the rate of 10^50 per second, sooner or later it will slow to one prime per year, and then one prime per million years, etc.

---------------------------------------------------------------------------------------

Absolutely, if you made a method to do instantaneous factoring, you would have to proceed very very carefully. You would have to write it down and hide it somewhere, maybe out in the forest under a rock. You would have to destroy all other traces of it. If you did those things, maybe you could walk into the CIA and negotiate. (But, there's no guarantees in a situation like this, right? You might find yourself on the way to Guantanamo. Maybe if you agreed to split the money with a high government official, you would have a protector, who knows?)

:cathug:

danbaron

01-05-2011, 09:52

I changed the code and REPL interactions a little bit (above).

(Better not to be greedy, right?)

:oops: :?: :twisted:

danbaron

01-05-2011, 20:50

If you notice above in the REPL interactions, the code could not even partially factor, two of the integers I tried,

1111111111111111111 (19 ones),

11111111111111111111111 (23 ones).

I went to, http://www.wolframalpha.com/ .

I tried to factor them both using the commands, FactorInteger[1111111111111111111], and, FactorInteger[11111111111111111111111].

"Super-Mathematica" replied, that both of them are primes.

So, both numbers consist of only one factor, themselves (or, if you want, two factors, themselves and 1). In other words, each is already factored.

Since the largest prime in the primes list I made is 10,000,019, the code can make no further progress if it encounters a factor larger than (10,000,019)^2, or, 100,000,380,000,361.

So, basically, if the smallest factor (the code finds the factors in the order of smallest to largest) in an integer is more than 14 digits, the code will respond that it cannot factor it at all.

The code will factor any prime of less than 15 digits. In that case it will return a list containing one member, the prime itself.

:o

Johannes

01-05-2011, 21:37

There is no Nobel Prize in mathematics.
The bastards. :D

Come up with an algorithm to factor a composite integer in linear time and they'll create an award for the occasion.

About storage of primes, there are very efficient ways.

If you assume that 2 is prime then you can ignore all other even primes and you could use bits to indicate primality or not. You'd need one bit for every two natural numbers. If you also assume that 3 is prime then you can also ignore all multiples of 3 and you'd need one bit for every three natural numbers. Add 5 as a prime and the ratio becomes one bit for every 4 out of 15 numbers: 8 bits (also known as one "byte") per 30 numbers. (This third one is a standard method of compact prime storing.)

For the primes up to 10 million you'd need 10^7/30 = 333,334 bytes = 325.52 kBytes (as compared to Racket's 6.25 MB).

You can keep extending the efficiency by assuming the next prime number as a given. Without assuming anything the ratio is of course exactly 1. Assuming 2 is prime it becomes 1:2. Assuming 3 is prime it becomes 1:3 (or rather 2:6, you'll see why in a minute). Assuming 5 is prime it becomes 8:30. The logic behind it is if you assume the next prime p as given then the new ratio becomes the old ratio multiplied by n-1/n. So assuming 7 as prime changes the ratio from 8:30 to 8*(7-1):30*7 = 48:210 or six bytes for every 210 natural numbers. The next prime is 11 and assuming that one would change the ratio to 480:2310.

Storing those primes up to 10 million with 2, 3, 5, 7 and 11 given as prime would now take 259,800 bytes or 253.71 kBytes, 77.94% of the earlier example.

But you're right. At one point either storage or calculation time will become prohibitive. No matter how fast your computer is, at one point the next prime will take a million years to compute.

danbaron

02-05-2011, 00:19

I believe you about the storage.

I changed the code and REPL interactions (above) again.

For some reason my mind thinks of things sequentially, not simultaneously.

:oops: :cry: :evil: :twisted:

Johannes

03-05-2011, 20:15

It took approximately 16 hours to make the list as it is, with the first 664,580 primes.
Six minutes flat. (A lie: it was 6m01s.) And not a single assembler command; it's all in standard thinBasic.

I'm using the storage method as described above, with the first six primes assumed (2 through 13). This has the advantage that only 5760 out of 30,030 numbers have to be checked for primality, a speed gain of a factor of five. (I think you just skipped the even numbers, which is of course a speed gain of a factor of two.)

At first the program would have taken fifty times (!) as long but I was able to speed it up by adding one single standard thinBasic command.

DoEvents(Off)Unbelievable that the Windows events take up so much time.

I'm still tweaking the prime generating program a little, and I'm also writing another program to be able to use that enormous primes file. My goal is to calculate all 32-bit primes...

danbaron

04-05-2011, 09:48

I'll try to make a faster way too.

:idea: :?:

Johannes

04-05-2011, 12:58

I've got it down to 5m49s now (for the primes up to 10 million) and I'm guessing that's it for straight-up thinBasic.

I removed the need for an SQR in the main calculating loop, replacing it by an addition, a multiplication and a compare (the bit with srt and prd). This is also very useful if (when?) I convert the main loop to assembler. After having done some 330 million numbers the checking runs at some 6,000 numbers per second. Almost another 4 billion numbers to do...

The program is more complicated than it needs to be because it works with a variable number of assumed primes (variable nap). Be careful changing that number of assumed primes because if you do it will delete any work done with another number of assumed primes. Decoding the primes file with a higher number of assumed primes is slower, but you win a lot of memory.

I'm working on an include script that can use/process the file generated with this program. Once that's finished I'll put it in a topic in the math subforum. I will definitely complete the entire 32-bits primes file, possibly with the main bit in assembler. I'll make the primes file (102,976,561 bytes, 98.2 MB) available through P2P (.torrent) then.

'----------------------------------------------------------------------------
' !All32BitPrimes
'
' Create a file with all 32-bit prime numbers.
'
' ©2011 Johannes
'----------------------------------------------------------------------------

Uses "Console","File"

' Number of assumed primes. Must be between 3 and 6.
Word nap = 6

' File name for prime number file.
String fn_primes = APP_SourcePath+"Primes.primes"

' Auto-save interval in seconds.
Quad svi = 60

' Some initialisations.
DoEvents(Off)
PrintL "DETERMINE ALL 32-BIT PRIMES"
PrintL "==========================="
DWord i,j,k,n,nbr
Word srt,prd
Quad che,mxi = 4294967296
HiResTimer_Init
nap=Min(Max(nap,3),6)

' Check work done so far.
If Not FILE_Exists(fn_primes) Then FILE_Save(fn_primes,Chr$(nap))
Integer hnd=FILE_Open(fn_primes,"BINARY")
String chk=FILE_Get(hnd,1)
FILE_Close(hnd)
If chk<>Chr$(nap) Then FILE_Save(fn_primes,Chr$(nap))

' Determine all 16-bit primes using Euler's Sieve.
Print "Determining all 16-bit prime numbers: "
Word sieve(65535)
For i=2 To 65535
sieve(i)=i
Next i
n=65534
For i=2 To 255
If sieve(i) Then
For j=Int(65535/i) To i Step -1
If sieve(j) Then sieve(i*j)=0:n-=1
Next j
EndIf
Next i
Word p16b(n)
j=0
For i=2 To 65535
If sieve(i) Then
j+=1
p16b(j)=sieve(i)
EndIf
Next i
ReDim sieve()
PrintL TStr$(n)+" primes found."

' Print assumed primes and prime storage information.
Print "Working with"+Str$(nap)+" assumed primes:"
Ext fract = 100.0
For i=1 To nap
j=p16b(i)
fract*=(j-1)/j
Print Str$(j)
Next i
PrintL "."
PrintL "Fraction of stored numbers is "+Format$(fract,"0.000")+"%."

' Determine sizes of number block and bit mask.
Word blk,bts,msk = 1
If nap Then
For i=1 To nap
blk*=p16b(i)
bts*=p16b(i)-1
Next i
EndIf
msk=bts/8

' Check number of already considered numbers.
Quad hgh = (FILE_Size(fn_primes)-1)*blk/msk
fract=Min(100,hgh/42949672.96)
Print "Every":If msk>1 Then Print Str$(msk)
Print " byte":If msk>1 Then Print "s"
Print " define":If msk=1 Then Print "s"
PrintL Str$(blk)+" numbers."
PrintL "Total file size for all 32-bit primes is"+Str$(msk*Ceil(mxi/blk)+1)+" bytes."
PrintL "Currently"+Str$(hgh)+" numbers have been considered."
PrintL "Overall progress is "+Format$(fract,"0.000")+"%."

' Check if all work has been done.
If hgh>=mxi Then
PrintL "All 32-bit primes have been determined."
WaitKey
Stop
EndIf

' Construct offset table.
Long ofs(bts)
Boolean prm
k=0
PrintL "Creating offset table ("+TStr$(bts)+" entries)."
For i=1 To blk
prm=TRUE
For j=1 To nap
If Mod(i,p16b(j))=0 Then prm=FALSE:Exit For
Next j
If prm Then k+=1:ofs(k)=i
Next i

' Reserve (a whole lotta) memory for primes list and load existing data.
Byte dta(msk*Ceil(mxi/blk))
If FILE_Size(fn_primes)>1 Then Poke$ VarPtr(dta(1)),Mid$(FILE_Load(fn_primes),2)

' Append 16-bit primes if necessary.
Byte byt
If hgh<65536 Then
PrintL "Appending 16-bit primes."
For i=1 To msk*Ceil(65536/blk)
dta(i)=0
Next i
For i=1 To n
j=Array Scan ofs, =Mod(p16b(i),blk)
If j Then
j-=1
k=msk*Int(p16b(i)/blk)+Int(j/8)+1
j=Mod(j,8)
byt=1
SHIFT LEFT byt,j
dta(k)=(dta(k) Or byt)
EndIf
Next i
hgh=blk*Int(65536/blk)
EndIf

' Main loop for checking successive numbers.
PrintL "Autosaving every"+Str$(svi)+" seconds."
svi*=1000000
Print "Determining further primes"
k=msk*hgh/blk+1
byt=1
Quad ctr
srt=Int(Sqr(hgh))
HiResTimer_Get
While hgh<mxi
For i=1 To bts
che=hgh+ofs(i)
If che>=mxi Then Exit For
nbr=che
If srt<65535 Then
prd=srt+1
If prd*prd<=nbr Then srt=prd
EndIf
prm=TRUE
j=1
While p16b(j)<=srt And j<=n
If Mod(nbr,p16b(j))=0 Then prm=FALSE:Exit While
j+=1
Wend
If prm Then dta(k)=(dta(k) Or byt)
SHIFT LEFT byt,1
If byt=0 Then byt=1:k+=1
Next i
hgh+=blk
ctr+=HiResTimer_Delta
If ctr>=svi Then ctr=0:SavePrimes():Print "."
Wend
PrintL

' Save primes list at program end.
PrintL "Saving primes list."
SavePrimes()
PrintL "All 32-bits primes have been determined."

DoEvents(On)
WaitKey

Sub SavePrimes()
FILE_Save(fn_primes,Chr$(nap)+Peek$(VarPtr(dta(1)),msk*Int(hgh/blk)))
End Sub

danbaron

05-05-2011, 10:41

I worked on something, just to see how fast I could find all of the primes through 10,000,019.

I used a vector, called, "primes-vector", with 10,000,020 elements (the extra element is because Racket indexes vectors from 0).

The procedure, "make-primes-vector", sets the value of an index to #t if the index is prime, and to #f if the index is composite.

Below, in the REPL interactions, I timed the procedure (in seconds), and then counted the number of primes, as a check.

(Now am too tired for anything else.)

:idea: :arrow: :?:

; code --------------------------------------------------------------------------------------

#lang racket

(define primes-vector 0)
(define max-val 0)

(define (make-primes-vector n)
(set! max-val n)
(set! primes-vector (make-vector (+ max-val 1) #t))
(vector-set! primes-vector 0 #f)
(vector-set! primes-vector 1 #f)
(define max-prime (floor (sqrt n)))
(define (do-prime a)
(cond
((> a max-prime) void)
(else
(remove-multiples a 2)
(do-prime (next-prime (+ a 1))))))
(define (remove-multiples b c)
(define index (* b c))
(cond
((> index max-val) void)
(else
(vector-set! primes-vector index #f)
(remove-multiples b (+ c 1)))))
(define (next-prime d)
(if (vector-ref primes-vector d) d
(next-prime (+ d 1))))
(do-prime 2))

(define (count-primes)
(define count 0)
(define (iter-count i)
(cond
((> i max-val) count)
(else
(when (vector-ref primes-vector i) (set! count (+ count 1)))
(iter-count (+ i 1)))))
(iter-count 2))

(define (time-make-primes-vector n)
(let ((t1 (current-milliseconds)))
(make-primes-vector n)
(let ((t2 (current-milliseconds)))
(let ((tt (/ (- t2 t1) 1000.0)))
tt))))

; REPL interactions -------------------------------------------------------------------------

Welcome to DrRacket, version 5.1 [3m].
Language: racket.
> (time-make-primes-vector 10000019)
8.861
> (count-primes)
664580
>

Johannes

05-05-2011, 12:19

I worked on something, just to see how fast I could find all of the primes through 10,000,019.
Ah! That's a different kettle of fish. :)

You're doing a sieve, presupposing 10,000,019 as a limit. In thinBasic it took me 11.88 seconds and that includes keeping a tally of the number of primes (also 664,580 for me). Please wait while I do this in PowerBASIC. (Racket also compiles, so it's only fair.)

'----------------------------------------------------------------------------
' !Primes10M
'
' Calculate primes up to 10 million.
'
' ©2011 Johannes
'----------------------------------------------------------------------------

DoEvents(Off)
HiResTimer_Init
HiResTimer_Get

' Determine primes using Euler's Sieve.
Long i,j,n,x=10000019
Byte s(n)=1
s(1)=0:x-=1
For i=2 To Int(Sqr(n))
If s(i) Then
For j=Int(n/i) To i Step -1
If s(j) Then s(i*j)=0:x-=1
Next j
EndIf
Next i

Quad t=HiResTimer_Delta
DoEvents(On)

MsgBox 0,"Time:"+Str$(t/1000000)+$CRLF+"Primes:"+Str$(x)

Johannes

05-05-2011, 12:45

Please wait while I do this in PowerBASIC.
This is ridiculous.

'----------------------------------------------------------------------------
' !Primes10M
'
' Calculate primes up to 10 million.
'
' ©2011 Johannes
'----------------------------------------------------------------------------
#COMPILE EXE
#DIM NONE

FUNCTION PBMAIN()
DIM i AS LONG
DIM j AS LONG
DIM n AS LONG
DIM t1 AS LONG
DIM t2 AS LONG
DIM x AS LONG

t1=TIMER

' Determine primes using Euler's Sieve.
n=10000019
x=10000019
DIM s(n) AS BYTE
FOR i=0 TO n
s(i)=1
NEXT i
s(0)=0
s(1)=0:x=x-1
FOR i=2 TO INT(SQR(n))
IF s(i) THEN
FOR j=INT(n/i) TO i STEP -1
IF s(j) THEN s(i*j)=0:x=x-1
NEXT j
END IF
NEXT i

t2=TIMER

MSGBOX "Time:"+STR$(t2-t1)+$CRLF+"Primes:"+STR$(x)

END FUNCTION
Please note that I do everything between the timing: creating the array, filling it with ones, keeping track of the number of primes.

PowerBASIC times in whole seconds. For those 664,580 primes the time is 0 (zero) seconds. I upped the limit to 100 million and the time was 1 second (5,761,455 primes). Then I upped the limit to 1 billion and I got a significant time: 11 seconds (50,847,534 primes).

ErosOlmi

05-05-2011, 12:52

Just in case, for those lazy like me, thinBasic has native IsPrime (http://www.thinbasic.com/public/products/thinBasic/help/html/isprime.htm) function ;)

HiResTimer_Init
HiResTimer_Get
'---Determine primes
Long x = 10000019
Long Count
Long nPrimes
For Count = 1 To x
If IsPrime(Count) Then Incr nPrimes
Next
Quad t=HiResTimer_Delta
MsgBox 0, "Time:" + Str$(t/1000000) + $CRLF + "Primes:" + Str$(nPrimes)

ErosOlmi

05-05-2011, 12:55

To me, PowerBasic as always been the best native compiler, ever.
Not just because it is fast, creates optimized code and is reliable, but mainly because they introduced, during the years, many new commands always keeping very high compatibility with previous code.

Johannes

05-05-2011, 17:10

After having done some 330 million numbers the checking runs at some 6,000 numbers per second. Almost another 4 billion numbers to do...
By now my program has determined the primes up to 700 million, one-sixth of the entire work load. Total running time up to now somewhere around 20 or 30 hours. The checking slows down proportional to SQR(n) of course.

And then I wrote the sieve program in thinBasic and then PowerBASIC (see above). And I realised that if I used bits instead of bytes I could fit those 4 billion numbers in memory. Which makes a sieve possible.

'----------------------------------------------------------------------------
' !All32BitPrimes
'
' Create a file containing all 32-bit primes using Euler's Sieve.
'
' ©2011 Johannes
'----------------------------------------------------------------------------

#COMPILE EXE
#DIM NONE

FUNCTION PBMAIN()
DIM i AS DWORD
DIM j AS DWORD
DIM k AS DWORD
DIM l AS DWORD
DIM maxb AS DWORD
DIM maxn AS DWORD
DIM q AS QUAD
DIM t1 AS DWORD
DIM t2 AS DWORD
DIM t3 AS DWORD
DIM x AS DWORD

' Upper limit for prime generation.
maxn = 4294967295
' maxn = 1073741823
' maxn = 268435455
' maxn = 67108863
' maxn = 16777215
' maxn = 4194303
' maxn = 1048575
' maxn = 262143
' maxn = 65535

' Define assumed primes.
DIM ap(5) AS DWORD
ARRAY ASSIGN ap() = 2,3,5,7,11,13

' Determine storage parameters.
DIM blk AS DWORD
DIM bts AS DWORD
DIM msk AS DWORD
blk = 1
bts = 1
FOR i=0 TO 5
blk = blk*ap(i)
bts = bts*(ap(i)-1)
NEXT i
msk = bts/8

' Construct offset table.
DIM ofs(bts-1) AS DWORD
k = 0
FOR i=1 TO blk
l = 1
FOR j=0 TO 5
IF (i MOD ap(j))=0 THEN l = 0 : EXIT FOR
NEXT j
IF l<>0 THEN ofs(k) = i : k=k+1
NEXT i

' Create array for sieve.
q = maxn
q = blk*INT((q+blk)/blk)
q = INT((q+7)/8)
maxb = q
DIM sieve(maxb-1) AS BYTE
q = maxn
INCR q
SHIFT RIGHT q,3
FOR i=0 TO q-1
sieve(i) = &hFF
NEXT i

t1 = TIMER

' Zero and one are not prime numbers.
sieve(0)=sieve(0) AND &hFC
x = maxn-1
' Determine primes using Euler's Sieve.
FOR i=0 TO INT(SQR(maxn))
k = i
SHIFT RIGHT k,3
l = 1
SHIFT LEFT l,(i AND 7)
IF (sieve(k) AND l)<>0 THEN
j = INT(maxn/i)
WHILE j>=i
k = j
SHIFT RIGHT k,3
l = 1
SHIFT LEFT l,(j AND 7)
IF (sieve(k) AND l)<>0 THEN
k = i*j
SHIFT RIGHT k,3
l = 1
SHIFT LEFT l,(i*j AND 7)
sieve(k) = sieve(k) XOR l
x = x-1
END IF
j = j-1
WEND
END IF
NEXT i

t2 = TIMER

' Extract relevant bits for output file.
DIM i1 AS QUAD
DIM k1 AS QUAD
DIM m AS DWORD
DIM src AS QUAD
DIM tgt AS DWORD
q = maxn
q = msk*INT((q+blk)/blk)
DIM dta(q-1) AS BYTE
i1 = 0
tgt = 0
m = 1
WHILE i1<maxn
FOR j=0 TO bts-1
src = i1+ofs(j)
k1 = src
SHIFT RIGHT k1,3
l = 1
SHIFT LEFT l,(src AND 7)
IF (sieve(k1) AND l)<>0 THEN
dta(tgt) = dta(tgt) OR m
END IF
SHIFT LEFT m,1
IF m=256 THEN m = 1 : tgt = tgt+1
NEXT j
i1 = i1+blk
WEND

t3 = TIMER

' Display statistics.
MSGBOX "Maximum:"+STR$(maxn)+$CRLF+"Sieve time:"+STR$(t2-t1)+$CRLF+"Primes:"+STR$(x)+$CRLF+$CRLF+"Save time:"+STR$(t3-t2)+$CRLF

' Save file.
OPEN CURDIR$+"/PB_Primes.primes" FOR BINARY ACCESS WRITE AS #13
PUT$ #13,CHR$(6)
PUT$ #13,PEEK$(VARPTR(dta(0)),q)
SETEOF #13
CLOSE #13

END FUNCTIONA little over 5 million numbers per second, and the program is linear. So where the old program takes twice as long to proces a number four times as big, this new program wades through 5 million checks per second if the number is 1 or if it's 1 billion.

Generating the file for 268 million numbers took 52 seconds. The conversion between the two massive arrays took another 3 seconds. That transfer is bit by bit and I didn't really optimise it. Although only set bits (primes) are transferred, we're still talking about 14,630,843 primes (around 4.8 million bit-by-bit transfers per second).

Projected time to determine all 4 billion primes: 14 minutes.

Just in time: my program finished for primes up to one billion-and-a-bit. A total of 54,400,028 primes, determined in 3m34s. Another 12 seconds for convsersion to the output format, resulting file is 24.5 MB. And my (slow) thinBasic program that checks the compressed prime files says it's all in order.

Johannes

05-05-2011, 23:04

7196

And the primes check out. All 203,280,221 of them. Times are in seconds.

I'll whip up a thinBasic include script that can use that 98.2-MB file. IsPrime, NextPrime, PrevPrime, NthPrime, perhaps some other stuff I can think of. When that's finished I'll put that script along with the compiled PowerBASIC program (less than 10 kB in an archive) in the math subforum.

ErosOlmi

05-05-2011, 23:07

Consider that, if you like, I can add those keywords as thinBasic native commands in Core engine.

Thanks a lot
Eros

danbaron

06-05-2011, 01:00

You're doing a sieve, presupposing 10,000,019 as a limit. In thinBasic it took me 11.88 seconds and that includes keeping a tally of the number of primes (also 664,580 for me). Please wait while I do this in PowerBASIC. (Racket also compiles, so it's only fair.)

If you mean "fair" in the sense of a competition, then, I think none of this is fair. Thinbasic is an interpreter, Racket is compiled into byte code, and, as far as I know, Powerbasic is compiled into machine code; and, we didn't create any of them. Additionally, the timings are for different machines. Additionally, we didn't create the algorithms we used. Etc. (My guess is that, "keeping a tally of the number of primes", does not slow down the procedure very much.)

And, I question whether speed itself, is self-evidently an admirable and worthy goal. I can make an analogy to driving a car. What good is it being the first one to reach a destination, if you have nothing to do when you get there? Or, I guess, equivalently, what good is it being the first one, to reach a random destination?

:cry: :p

Johannes

06-05-2011, 09:00

Thinbasic is an interpreter, Racket is compiled into byte code, and, as far as I know, Powerbasic is compiled into machine code; and, we didn't create any of them.
PowerBASIC does generate pure machine code, and very efficient machine code at that. And don't forget that the only reason that I managed to get this speed was to use a gigantic amount of memory. I can write the program in assembler, no (big) problem, but multi-megabyte memory management in a Windows environment?

Racket generates pseudo code? And still is that fast? I'm very impressed.

In the end this became a contest with myself. The earlier thinBasic script mimicked your Racket program in that it continued with previously created data. And in the final version it would have determined all primes in a week or so. Perhaps with some assembler coding I could have gotten that time down to a day. But that PowerBASIC is something else. And I enjoyed the mental challenge of working entirely with bits.

And thank you for giving me this mental challenge. I love to program but sometimes it's difficult to come up with something to program.

Johannes

06-05-2011, 09:14

Consider that, if you like, I can add those keywords as thinBasic native commands in Core engine.
Eros,

For these commands to work you would need that gigantic primes file. For all 32-bit primes it's 98.2 megabytes. Zipped it's still well over 80 megabytes so including it in a thinBasic release is not workable. Supplying the compiled PowerBASIC program (10 kBytes) to generate the primes file is of course possible but that generates its own problems.

Apart from the loading of that file there is also the need to create a 5760-entry table before that file can be processed. That doesn't take too long in compiled form but the overhead is still there.

But if you still think this is interesting for the thinBasic core I will write the necessary code. After I finish my BigInt module, because this was "just" an intermezzo. ;)

ErosOlmi

06-05-2011, 11:45

But if you still think this is interesting for the thinBasic core I will write the necessary code. After I finish my BigInt module, because this was "just" an intermezzo. ;)

For me adding new native commands (compiled in Core or other modules) is always good as far as the commands are generic and usable in different situations.

Of course I will never create a command that has a dependancy file (even if the file is small) but if the only problem is needed memory I do not see any problem: important is to mention in help file what will happen when using that specific keyword so the programmer is informed.

Ciao
Eros