... y no have more than eight registers?
Especially because PowerBASIC won't let me use ebp or esp. I tried storing ebp in memory to free it for processing, but PB wouldn't let me. I know it's "only in my mind", a habit I picked up working with ARM assembler for over 20 years. Having sixteen all-purpose registers, of which only three are reserved (program counter, link register for calls, stack pointer), is so comfortable. But I know the x86 architecture has far superior memory management, so I just have to learn to change my behaviour.
What I had was this.
FUNCTION BII_MulOld(BYVAL a AS STRING,BYVAL b AS STRING) AS STRING
LOCAL c AS STRING
LOCAL i AS DWORD
LOCAL j AS DWORD
LOCAL n AS DWORD
LOCAL k AS QUAD
LOCAL l AS QUAD
b = b+$NUL+$NUL+$NUL+$NUL
c = STRING$(LEN(a)+LEN(b)+4,0)
FOR i=1 TO LEN(b)-4 STEP 3
n = CVDWD(MID$(b,i,3)+$NUL)
IF n=0 THEN ITERATE FOR
k = 0
FOR j=1 TO LEN(a)-3 STEP 4
l = n*CVDWD(MID$(a,j,4))+CVDWD(MID$(c,j+i-1,4))+k
k = INT(l/BII_4g)
l = l MOD BII_4g
MID$(c,j+i-1,4) = MKDWD$(l)
NEXT j
WHILE k>0
l = CVDWD(MID$(c,j+i-1,4))+k
k = 0
IF l>=BII_4g THEN l = l-BII_4g : k = 1
MID$(c,j+i-1,4) = MKDWD$(l)
j = j+4
WEND
NEXT i
TrimHighNull(c)
FUNCTION = c
END FUNCTION
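A side note on the STEP 3 in the outer loop above. Presumably b is read three bytes at a time so that l stays inside a signed 64-bit QUAD; that motive is my assumption, the arithmetic below is not. A small Python check of the worst case (worst_case, MAX_DWORD and MAX_QUAD are names made up for this sketch):

```python
# Largest value that  l = n*a_word + c_word + k  can reach in BII_MulOld.
MAX_DWORD = 2**32 - 1   # one 4-byte word of a or c
MAX_QUAD = 2**63 - 1    # upper limit of a signed 64-bit QUAD

def worst_case(chunk_bytes):
    """Upper bound on l when b is read chunk_bytes bytes at a time."""
    n = 2**(8 * chunk_bytes) - 1   # largest possible chunk of b
    k = n                          # the carry k = l // 2**32 never exceeds n
    return n * MAX_DWORD + MAX_DWORD + k

fits_3 = worst_case(3) <= MAX_QUAD   # 3-byte chunks: bound is 2**56 - 1
fits_4 = worst_case(4) <= MAX_QUAD   # 4-byte chunks: bound is 2**64 - 1
```

With 3-byte chunks the bound fits comfortably; with full 4-byte words it would not. The assembler version does not have this problem because mul leaves the full 64-bit product in edx:eax.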
And after a lot of thinking I ended up with this.
'----------------------------------------------------------------------------
'
' Multiply two unsigned Big Integers.
'
FUNCTION BII_Mul(BYVAL a AS STRING,BYVAL b AS STRING) AS STRING
#REGISTER NONE
LOCAL c AS STRING
LOCAL i AS DWORD
' Each pass of the outer loop has the most overhead, so put the longest number in a (inner loop).
IF LEN(a)<LEN(b) THEN SWAP a,b
' Reserve memory for product.
c = STRING$(LEN(a)+LEN(b),0)
push3
! mov eax,b ; pointer to string b
! mov ebx,[eax-4] ; length of string b
! shr ebx,2 ; number of words in b
! xor edx,edx ; offset for c
asm_mul1:
! mov ecx,[eax] ; word of b
! cmp ecx,0 ; skip if word zero
! je asm_mul5
! push eax ; put registers on stack
! push ebx
! push edx
' Inner loop multiplies a with a word from b and adds this to c.
! mov esi,a ; pointer to string a
! mov ebx,[esi-4] ; length of string a
! shr ebx,2 ; number of words in a
! mov i,ebx
! mov edi,c ; pointer to string c
! add edi,edx ; add offset
! xor ebx,ebx ; carry for first multiplication
asm_mul2:
! mov eax,[esi] ; word of a
! mul ecx ; multiply with word of b
! add eax,[edi] ; add to word of c
! adc edx,0 ; carry for high word
! add eax,ebx ; carry of previous multiplication
! adc edx,0 ; carry for high word
! mov [edi],eax ; word of product
! mov ebx,edx ; save high word as carry for next loop
! add esi,4 ; next word of a
! add edi,4 ; next word of c
! dec dword i ; all words in a done?
! jne asm_mul2 ; until counter is zero
' Process possible carry beyond highest word of a.
! add ebx,[edi] ; add final carry
! mov [edi],ebx ; store word
! jnc asm_mul4 ; no carry means finished
asm_mul3:
! add edi,4 ; next word of c
! inc dword [edi] ; carry set, add zero, so inc
! je asm_mul3 ; wrapped to zero means carry, so repeat
' End inner loop multiplication.
asm_mul4:
! pop edx ; get registers from stack
! pop ebx
! pop eax
asm_mul5:
! add eax,4 ; next word in b
! add edx,4 ; increase offset for c
! dec ebx ; all words in b done?
! jne asm_mul1 ; until counter is zero
pop3
' Remove possible high words zero.
TrimHighNull(c)
FUNCTION = c
END FUNCTION
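For anyone who wants to check the logic without reading the assembler, here is a rough Python model of the same loops. It is only a sketch: it works on lists of 32-bit words (lowest word first) instead of PowerBASIC strings, and to_words, from_words and mul_words are names made up for this example, not part of the include file.

```python
MASK = 0xFFFFFFFF  # one 32-bit word

def to_words(value):
    """Split a non-negative int into 32-bit words, lowest word first."""
    words = []
    while True:
        words.append(value & MASK)
        value >>= 32
        if value == 0:
            return words

def from_words(words):
    """Rebuild the int from its 32-bit words."""
    return sum(w << (32 * i) for i, w in enumerate(words))

def mul_words(a, b):
    """Schoolbook multiply, following the structure of BII_Mul."""
    c = [0] * (len(a) + len(b))           # room for the product
    for offset, bw in enumerate(b):       # outer loop (asm_mul1)
        if bw == 0:
            continue                      # skip if word zero
        carry = 0                         # plays the role of ebx
        for j, aw in enumerate(a):        # inner loop (asm_mul2)
            t = aw * bw + c[offset + j] + carry   # like edx:eax after mul/add/adc
            c[offset + j] = t & MASK      # low word goes into the product
            carry = t >> 32               # high word is the next carry
        k = offset + len(a)
        t = c[k] + carry                  # add final carry
        c[k] = t & MASK
        while t > MASK:                   # ripple a carry upwards (asm_mul3)
            k += 1
            t = c[k] + 1
            c[k] = t & MASK
    return c

product = from_words(mul_words(to_words(3**100), to_words(3**50)))
```

Here product can be compared directly with Python's own 3**150, which makes it easy to test odd cases such as zero words in b.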
It was a lot of work but I'm more than pleased with the result: calculating 3 to the power 32,767 now runs more than 243 times faster. The speed increase compared with the original (interpreted) thinBasic include script is a factor of more than 3200.