View Full Version : Toom-Cook Multiplication
danbaron
12-07-2011, 07:40
I think that Toom-Cook is still one of the fastest methods for multiplying big integers of less than 10,000 digits (the other being Karatsuba, which I haven't looked at).
http://en.wikipedia.org/wiki/Schönhage–Strassen_algorithm (http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm)
http://en.wikipedia.org/wiki/Toom–Cook_multiplication (http://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication)
http://en.wikipedia.org/wiki/Karatsuba_multiplication
I may be wrong (I often am), but my impression is that Toom-Cook is pretty simple.
I can show my idea of the basic concept by an example.
Say, we want to multiply 23 times 35.
We write,
p(x) = 2x + 3,
q(x) = 3x + 5.
We are using our realization that any integer can be written as a polynomial.
Here, p(x), represents 23, and q(x), represents 35, when x equals 10.
We write,
p(x)q(x) = r(x).
That is, p(x) times q(x), equals r(x).
So,
(2x + 3)(3x + 5) = ax^2 + bx + c = r(x).
Now,
p(0)q(0) = r(0).
So,
(2*0 + 3)(3*0 + 5) = a*0 + b*0 + c.
Therefore,
c = 15.
Now,
p(1)q(1) = r(1).
Therefore, when we do the substitutions (for x and c),
a + b = 25.
Now,
p(-1)q(-1) = r(-1).
Therefore, when we do the substitutions (for x and c),
a - b = -13.
Now, we already know c, and we just need to find a and b.
We have two linear equations and two unknowns,
a + b = *25,
a - b = -13.
We just add the two equations and we get,
2a = 12.
Therefore,
a = 6.
Now, we can substitute 6 for a in,
a + b = 25,
and we get,
b = 19.
So,
r(x) = 6x^2 + 19x + 15.
Now, we substitute 10 for x in r(x), and we are done,
r(10) = 600 + 190 + 15 = 805.
Believe it or not!
:!: :idea: :?:
(We could have just multiplied p(x) times q(x) to find the coefficients of r(x), right? That is, we could have multiplied (2x + 3)(3x + 5); but this is supposed to be fast multiplication, and that is what we want to avoid doing (If we multiply p(x) times q(x), then we have gained nothing, correct?). That is the trick to the method, we don't do those multiplications. In a real case, the coefficients of p(x) and q(x) would typically be huge.
Also, I think that usually p(x) and q(x) are written as quadratic polynomials (both of the numbers to be multiplied are split into 3 parts of approximately equal numbers of digits), and then, the method is called Toom-3 (a quadratic polynomial has 3 coefficients). How, if you wanted to do so, you would determine the polynomial degree which is most efficient for a particular multiplication, I don't know.
Also, when at the end we substituted 10 into r(x) to get the answer, 805, in this case, doing that only required register shifting and additions. We should always be able to choose our base so that this final step only requires register shifting and additions.)
I thought multiplication was not an expensive operation Dan. That division was. So instead of dividing by 2 you would multiply by 0.5
It will take some math experts and low level coders to figure out a timing run of these methods perhaps.
danbaron
14-07-2011, 07:08
Hi Kent.
Every operation becomes expensive if the operands are big enough, right?
We're talking about multiplications of "big" integers, with maybe millions of digits.
In that case, there are methods which are thousands of times faster than the way you would multiply 123 times 456 with a pencil and paper.
I agree, it is more efficient to multiply by the reciprocal of the divisor, than to divide by the divisor.
But, what if the divisor has a million digits?
Then, if you want to get the reciprocal, you are no longer working with integers, now you have to keep track of the decimal point.
An integer can be written as,
an*x^n + a(n-1)*x^(n-1) + a(n-2)*x(n-2) + .. + a1*x^1 + a0*x^0.
For instance,
12345 can be written as,
1*10^4 + 2*10^3 + 3*10^2 + 4*10^1 + 5*10^0.
(10^0 = 1. In fact for any positive integer n, n^0 = 1.)
But, if you are working with floating point numbers, you instead have,
an*x^n + a(n-1)*x^(n-1) + a(n-2)*x(n-2) + .. + a1*x^1 + a0*x^0 + a(-1)*x(-1) + a(-2)*x(-2) + .. (<-- This could go on forever, and for some numbers, like pi, it does.)
For instance,
12345.678 can be written as,
1*10^4 + 2*10^3 + 3*10^2 + 4*10^1 + 5*10^0 + 6*10^(-1) + 7*10^(-2) + 8*10^(-3).
So, when working with big integers, and you want to do division using multiplication by the reciprocal, you introduce another complication - you are now working with big floating point numbers. But, your idea may be the best way anyway, I haven't investigated division.
(When I use the word, "big", I mean unlimited precision.)
(By the way, how would you find the decimal representation of, say, the reciprocal of 123456789, i.e., the decimal representation of 1 / 123456789, without using division? I think it can be done, but, to me, how is not obvious.)
Remember, that even if you use the QUAD integer type, you are limited to integers not greater than 2^63, or approximately, 10^19. It means that you only have 19 digits of precision. Any calculations with integers having more than 19 digits will be approximations if there is a cast to type EXT, otherwise, they will either generate an overflow error, or will be plain wrong.
Racket has built-in big integers. You can see below, that on my machine it took it 30.41 seconds to do the exact calculation of 999999^999999.
You can also see that that number has 5,999,994 digits. If you could fit 50 lines of 100 digits per page, it would take approximately 1200 pages to print that number.
I don't know how Racket does it, but, it performs calculations on big integers really fast.
My guess is that on my machine, by the pencil and paper method, the calculation of 999999^999999 would probably finish - never.
So, if a person wants to do arithmetic on big integers, and he has a choice of using Implementation A or Implementation B, and Implementation B is 10,000 times as fast as Implementation A, then, if the price ($$$) of the software is not a factor, most likely he will use Implementation B.
' code -----------------------------------------------------------------------------------------------------------------
#lang racket
(define big-num 0)
(define (num-digits n)
(add1 (order-of-magnitude n)))
(define (time-function f m n)
(let ((t1 (current-milliseconds)))
(set! big-num (f m n))
(let ((t2 (current-milliseconds)))
(let ((tt (/ (- t2 t1) 1000.0)))
tt))))
' REPL interactions ----------------------------------------------------------------------------------------------------
Welcome to DrRacket, version 5.1 [3m].
Language: racket.
> (time-function expt 999999 999999)
30.41
> (num-digits big-num)
5999994
>
Wow you are talking about big numbers Dan. I can't even begin to wrap my mind around them... but this gave me an idea.
What if they come up with a new base system. Something like base 256. Use symbols from ascii table to represent the numbers in this new base as they do in Hexadecimal-- with 0 - 9 and A - F?
So a quad sized integer would handle quite a larger number as you pointed out. 2^63 = 9,223,372,036,854,775,808 10^19 = 10,000,000,000,000,000,000 256^8 = 18,446,744,073,709,551,616
To simplify these huge numbers... let's divide them equally so we can use 9.2 with 63 expSize 10.0 with 19 expSize and 18.4 with 8 expSize. So for 1/8 of the exponent size compared to base 2, we get a number that is twice the value.
If base 2 is 64 bits as you showed for a quad, then 64 bits of base 256 would be: 5.2,374,249,726,338,269,920,211,035,149,242e+151 which I think is (2^63) x 5.678,427,533,559,428,832,416,592,249,125e+132
Does this make any sense? Would it effect performance using a different base system?
danbaron
14-07-2011, 09:39
To work with big integers, we have to make registers to contain them. A register is a one dimensional array of a particular data type.
Say we have a register consisting of 10,000 BYTEs. Then, that register can contain an integer of 10,000 digits.
We could use base 10, but then we would be wasting most of each BYTE. If our data type is BYTE, then, the natural base to use is 256, because a byte is 8 bits, and 2^8 = 256. We don't need to make up any special symbols, a base 256 digit ranges from 0 to 255.
The larger the base we use, the fewer will be the number of digits for a particular integer. Then, during a multiplication, there are fewer carries, so, it seems to me that generally, the bigger the base, the better. On the other hand, multiplication with the smallest base, 2, is amazingly simple.
On the other thread I tried using base 65536 (2^16). I was using C, and that was the biggest base I could implement, the reason being that for a 32 bit system, an unsigned long int is 4 bytes, so the largest value it can represent is 2^32 - 1, = 4,294,967,295, or 65536 * 65536 - 1. In base 65536, when you multiply two digits, the largest value you can get is 65535 * 65535 = 4,294,836,225. If I used a larger base than 65536, I would sometimes overflow the unsigned long int.
But, of course, if we use a base other than 10, most likely we will be confronted with the problem of doing conversions back and forth to base 10, assuming that the user expects the input and output to be in base 10.
:evil: :twisted:
regarding Bases, look at this comic about the third base (ternary numbers):
http://www.calamitiesofnature.com/archive/?c=548
Brian Hayes said about the third base: They are the Goldilocks choice among numbering systems: When base 2 is too small and base 10 is too big, base 3 is just right.
http://www.americanscientist.org/issues/pub/third-base/
there is a reasoning about that.
That comic was funny :)
Dan, thanks for explaining all of this. Otherwise it would be areas that I never would have thought about.
danbaron
14-07-2011, 23:19
Unfortunately, I have to report zak to the internet police, because of the joke. :twisted:
The article about ternary is good. That guy is smart.
-------------------------------------------------------
"Wow you are talking about big numbers Dan. I can't even begin to wrap my mind around them... but this gave me an idea."
Concerning numbers that are unimaginably big, I always go back to the Ackermann function.
http://en.wikipedia.org/wiki/Ackermann_function
Just above the section titled, "Inverse", you can find this quote, referring to A(4,3).
"Written as a power of 10, this is roughly equivalent to 106.031×1019,727."
(http://en.wikipedia.org/wiki/Ackermann_function)
(A(4,4) is unimaginably bigger than A(4,3), and A(5,*) is unimaginably bigger than A(4,*), but let's just look at A(4,3), as we'll see that it's big enough to amaze.)
Let's forget about the 6.031 (it won't matter for what we will show), and say that A(4,3) equals 10^10^19727.
What does that mean?
We can get an idea by looking at numbers of digits.
10^10^6 has 1,000,001 (one million and one) digits.
10^10^12 has 1,000,000,000,001 (one trillion and one) digits.
Try to imagine how many digits 10^10^19727 has.
If you can conjure any idea at all, you're a lot more imaginative than me.
For the Ackermann function coded according to the recursive definition, I would be surprised if the value of A(4,3) is calculated during our lifetimes. In fact, I now offer the prize of $10 to the first one who does it! (Actually, I would be surprised if during our lifetimes, a computer is just able to count, 1,2,3, .., to 10^10^19727.) If someone did do the calculation for A(4,3), how could you check that it was correct?
:oops: :twisted:
....how could you check that it was correct? That line saved you your $10 Dan :)
I was going to, as a joke post an absurd-idly super long list of numbers and claim the prize, hehehehe.
danbaron
09-12-2011, 00:25
I was able to implement Toom-Cook multiplication.
More specifically, the implementation is of Toom-3, as shown by the example in this link.
http://en.wikipedia.org/wiki/Toom–Cook_multiplication (http://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication)
The good news is, it works correctly.
The bad news is, it runs slower than standard multiplication, and, it should run faster.
By "standard" multiplication, I mean the way you would multiply 123 times 456 with a pencil and a piece of paper.
The code is written in Fortran, using the GNU gfortran compiler, and the Code::Blocks IDE.
http://www.thinbasic.com/community/showthread.php?11410-a-fortran-environment&highlight=fortran
In the code below, only the program file is shown. It uses 4 other module files, "random", "reg", "time", and, "toom".
In the example program below, randomly generated integers of various sizes are multiplied by the two methods, standard multiplication, and, Toom-3. In each case, the times for the two multiplications are shown.
Toom multiplication works by recursively breaking down the two integers to be multiplied, into smaller and smaller pieces. The recursion cannot go on forever, that is where the variable, "toomlimit", shown in the code, is used. When the integer pieces become smaller than the value of toomlimit, the pieces are multiplied by standard multiplication, and then the recursion chain rewinds.
Here, I used 100 for the value of toomlimit. This actually gave the worst times, and a large number of recursion calls (2011-12-09: My calculations for the number of recursion calls were wrong, I fixed them.), as you can see in the output. Seemingly, the bigger toomlimit is set, the faster is the Toom multiplication. But, my experience is that no matter how big you set it, you cannot beat the corresponding standard multiplication. I don't know why I got this result.
I've checked the answers to the two multiplication procedures for various example problems, and, the answers I get are always correct.
In the program below, as you can see from the output, for each multiplication, the answers given by each procedure were checked to see if they were the same, and in each case they were.
Maybe I'll try to implement Schonhage-Strassen multiplication.
http://en.wikipedia.org/wiki/Schönhage–Strassen_algorithm (http://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm)
If I can get it to work, maybe the results will be better.
(I don't expect it, but, if anyone wants to see the 4 other module files, I can post them.)
# code ---------------------------------------------------------------------------------------------------------
program testtoom
use random
use reg
use time
use toom
implicit none
type(regtype)::r1,r2,r3,r4
real::t1,t2
integer::numdigits,i
character(len=10)::s
toomlimit=100
call allocateseeds()
call setdefaultseed()
do i=1,5
toomcount=0
numdigits=10000*i
maxdigits=2*numdigits+1
call allocatereg(r1,numdigits)
call setrandomreg(r1)
call allocatereg(r2,numdigits)
call setrandomreg(r2)
call allocatereg(r3,maxdigits)
call allocatereg(r4,maxdigits)
call startclock()
call standardmult(r1,r2,r3)
call stopclock()
call elapsedtime(t1)
call startclock()
call toommult(r1,r2,r4)
call stopclock()
call elapsedtime(t2)
print'(a,i5,a)','multiplication of two ',numdigits,' digit random integers'
print*
print'(a)','standard multiplication'
call printelapsedtime(t1)
print*
print'(a)', 'Toom multiplication'
call printelapsedtime(t2)
print*
print'(a,l1)','Products are equal? ',regsareequal(r3,r4)
print'(a,i5)','Number of times recursive subroutine toommult called = ',toomcount
print*
print'(a)','--------------------------------------------------------------------------'
print*
call deallocatereg(r1)
call deallocatereg(r2)
call deallocatereg(r3)
call deallocatereg(r4)
enddo
print'(a)', '"Enter" a character to quit.'
read*,s
end program
# output -------------------------------------------------------------------------------------------------------
multiplication of two 10000 digit random integers
standard multiplication
elapsed CPU time in seconds = 2.964019E+0000
Toom multiplication
elapsed CPU time in seconds = 2.151254E+0001
Products are equal? T
Number of times recursive subroutine toommult called = 781
--------------------------------------------------------------------------
multiplication of two 20000 digit random integers
standard multiplication
elapsed CPU time in seconds = 1.201208E+0001
Toom multiplication
elapsed CPU time in seconds = 7.932651E+0001
Products are equal? T
Number of times recursive subroutine toommult called = 781
--------------------------------------------------------------------------
multiplication of two 30000 digit random integers
standard multiplication
elapsed CPU time in seconds = 2.730017E+0001
Toom multiplication
elapsed CPU time in seconds = 3.191937E+0002
Products are equal? T
Number of times recursive subroutine toommult called = 3906
--------------------------------------------------------------------------
multiplication of two 40000 digit random integers
standard multiplication
elapsed CPU time in seconds = 4.867230E+0001
Toom multiplication
elapsed CPU time in seconds = 5.562683E+0002
Products are equal? T
Number of times recursive subroutine toommult called = 3906
--------------------------------------------------------------------------
multiplication of two 50000 digit random integers
standard multiplication
elapsed CPU time in seconds = 8.098010E+0001
Toom multiplication
elapsed CPU time in seconds = 8.536375E+0002
Products are equal? T
Number of times recursive subroutine toommult called = 3906
--------------------------------------------------------------------------
"Enter" a character to quit.