Performance Tests



Charles Pegge
21-03-2009, 08:54
' Test of loop optimization with Oxygen
uses "Console", "Oxygen"

dim T1 as quad
dim T2 as quad

dim p0,p1,p2 as dword
dim src as string

' -- Prepare Oxygen script

src = "

function Calculate_basic(c as long) as long at #p1
local ic, jc as double, i as long
'
'
'BASIC VERSION
'
for i=1 to c
ic = sin(i)
jc = cos(i)
next
end function


'


function Calculate_asm(c as long) as long at #p2
local ic, jc as double, i as long
'
'ASSEMBLER VERSION
'
'
dim a=&c
mov eax,a
mov edx,[eax]
fld1
fld1
(
fsincos
fstp qword jc
fstp qword ic
fld1
faddp st(1),st(0)
fld st(0),st(0)
dec edx
jg repeat
)
fcomp st(0),st(0)
'
end function

sub finish() at #p0
terminate
end sub
"

o2_basic src
if len(o2_error) then
msgbox 0, o2_error : stop
end if
o2_exec


dim c as long=1000000

declare function Calculate_o2bas(c as long) as long at p1
declare function Calculate_o2asm(c as long) as long at p2
declare sub finish () at p0

printl "SPEED BENCHMARKS:"
printl ""
printl "Double precision Sin & Cos Test: loops=" & c
printl ""

doevents(off)

hirestimer_init

T1 = hirestimer_get
Counter c
T2 = hirestimer_get

printl "Script: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

T1 = hirestimer_get
Calculate_o2bas c
T2 = hirestimer_get

printl "Compiled: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

T1 = hirestimer_get
Calculate_o2asm c
T2 = hirestimer_get

printl "Assembler: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

finish
waitkey

function Counter( MaxCount as long ) as long
local ic, jc as double,i as long
For i = 1 To MaxCount
ic = sin(i)
jc = cos(i)
Next
end function


Correction to Assembly code

Charles Pegge
21-03-2009, 13:38
Double precision square.
In this case the difference is more extreme!



' Test of loop optimization with Oxygen
uses "Console", "Oxygen"

dim T1 as quad
dim T2 as quad

dim p0,p1,p2 as dword
dim src as string

' -- Prepare Oxygen script

src = "

function Calculate_basic(c as long) as long at #p1
local ic, jc as double, i as long
'
'
'BASIC VERSION
'
for i=1 to c
ic = i*i
next
end function


'


function Calculate_asm(c as long) as long at #p2
local ic, jc as double, i as long
'
'ASSEMBLER VERSION
'
'
dim a=&c
mov eax,a
mov edx,[eax]
fld1
fld1
(
fmul st(0),st(0)
fstp qword ic
fld1
faddp st(1),st(0)
fld st(0),st(0)
dec edx
jg repeat
)
fcomp st(0),st(0)
'
end function

sub finish() at #p0
'print `This neither`
terminate
end sub
"

o2_basic src
if len(o2_error) then
msgbox 0, o2_error : stop
end if
o2_exec


dim c as long=1000000

declare function Calculate_o2bas(c as long) as long at p1
declare function Calculate_o2asm(c as long) as long at p2
declare sub finish () at p0

printl "SPEED BENCHMARKS:"
printl ""
printl "Double precision square Test: loops=" & c
printl ""

doevents(off)

hirestimer_init

T1 = hirestimer_get
Counter c
T2 = hirestimer_get

printl "Script: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

T1 = hirestimer_get
Calculate_o2bas c
T2 = hirestimer_get

printl "Compiled: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

T1 = hirestimer_get
Calculate_o2asm c
T2 = hirestimer_get

printl "Assembler: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

finish
waitkey

function Counter( MaxCount as long ) as long
local ic, jc as double,i as long
For i = 1 To MaxCount
ic = i*i
Next
end function

ErosOlmi
03-04-2009, 15:29
Dear Charles,

I'm working on the next thinBasic, 1.7.8.0, with your latest thinBasic_Oxygen.dll.
I'm getting GPFs when testing your speed test scripts. It seems the function "Calculate_asm" causes a GPF when it reaches "jg repeat".

Do you see any reasons why this can happen?

Thanks a lot
Eros

Charles Pegge
03-04-2009, 23:09
Hi Eros,

Good to have you back :)

I found a flaw in the SinCos test which could have generated the GPF. I used fst instead of fstp to store one of the variables. fst does not pop the FPU stack, so the FPU would accumulate junk and spin its 8 registers round, generally being disruptive. I had it right the first time (a case of doublethink), sorry!

Have you had any problems with the square test? The FPU stacking looks okay there.

Charles.
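
The fst/fstp difference described above can be illustrated with a small model. This is only a sketch in Python, not Oxygen code: it tracks how many of the eight x87 registers are in use while the Sin/Cos loop body runs, and assumes the stack effects of each instruction as listed in the comments. The function name first_overflow is made up for this illustration, and whether an overflow alone explains the GPF is not established here.

```python
# Simplified model: counts occupied x87 registers only, no real arithmetic.
X87_REGISTERS = 8


def first_overflow(iterations, store_pops):
    """Return the 1-based iteration at which the register stack would
    overflow, or None if it never does within the given iterations."""
    depth = 2  # the two fld1 instructions before the loop
    store_ic = -1 if store_pops else 0  # fstp pops, fst does not
    loop_body = [
        ("fsincos", +1),        # replaces st(0) with sin, then pushes cos
        ("fstp jc", -1),        # store cos and pop
        ("fstp/fst ic", store_ic),
        ("fld1", +1),
        ("faddp", -1),          # add and pop
        ("fld st(0)", +1),      # duplicate the counter
    ]
    for n in range(1, iterations + 1):
        for _name, delta in loop_body:
            depth += delta
            if depth > X87_REGISTERS:
                return n
    return None


print(first_overflow(1000, store_pops=True))   # None: the loop is balanced
print(first_overflow(1000, store_pops=False))  # 7: one register leaks per pass
```

In this model the fstp version returns to the same depth after every pass, while the fst version gains one register per pass and runs out of registers on the seventh.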

ErosOlmi
04-04-2009, 07:54
Thanks Charles. I'm back and all seems fine now :)
_________________________________________________

Your examples are ok now. Sin/Cos and square tests are running fine here.
Hey: what a difference in execution time!!!!!

In the Sin/Cos test I get the following results:

Table 1: Sin/Cos test
Type        Time in microseconds
Script      718,323
Compiled    105,460
Assembler   61,385


In the Square test I get the following results:

Table 2: Square test
Type        Time in microseconds
Script      449,875
Compiled    4,531
Assembler   2,203


Oxygen (I prefer the BASIC syntax) is a miracle. What a great job you have done.

Thanks
Eros
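
To put the two tables above in proportion, the timings can be turned into speedup factors relative to the script version. The following is a minimal Python sketch that uses only the figures quoted in this post; the helper name speedups is an assumption made for the example.

```python
# Timings in microseconds, copied from Table 1 and Table 2 above.
timings = {
    "Sin/Cos": {"Script": 718_323, "Compiled": 105_460, "Assembler": 61_385},
    "Square": {"Script": 449_875, "Compiled": 4_531, "Assembler": 2_203},
}


def speedups(row, baseline="Script"):
    """Return how many times faster each entry is than the baseline."""
    base = row[baseline]
    return {name: base / t for name, t in row.items() if name != baseline}


for test, row in timings.items():
    for name, factor in speedups(row).items():
        print(f"{test:8s} {name:10s} {factor:6.1f}x faster than Script")
```

On these figures the Sin/Cos test gains roughly 7x (compiled) and 12x (assembler), while the Square test gains roughly 99x and 204x.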

Charles Pegge
04-04-2009, 10:09
Well Eros, it has taken 6 months of intensive work, beyond the assembler, to get this far. I had not appreciated the full intricacies of putting a Basic compiler together, even a minimalist one, and I can now appreciate the immense amount of work you had to put into thinBasic.

But it should give programmers the option of boosting the performance of any part of their code that requires it. No need to worry about speed any more.

The next stage is to apply it to as many demanding tasks as possible and flush out the remaining bugs and bottlenecks.

Charles

Petr Schreiber
04-04-2009, 18:10
True script acceleration Charles,

here are my results:

Sin/Cos
Type        Time in microseconds
Script      1,170,480
Compiled    121,125
Assembler   78,326

Square
Type        Time in microseconds
Script      636,667
Compiled    3,889
Assembler   2,501

The CPU was an AMD Sempron 64-bit 3400+ (but only 1.8 GHz, and running 32-bit WinXP Home).

Michael Hartlef
05-04-2009, 09:17
Hi Charles,

Wow, that is power. You truly smashed the arguments against using an interpreted language for speed reasons. OK, thinBasic is not so interpreted anymore then, but I have finally studied O2H now, and how you integrated it into thinBasic is just awesome. Code the function in O2H, declare a new function pointer in thinBasic and use it like any other function. Simply amazing!

Here are my results:

Sin/Cos
Type        Time in microseconds
Script      479,201
Compiled    69,522
Assembler   45,696

Square
Type        Time in microseconds
Script      311,688
Compiled    2,409
Assembler   1,417

Michael

Charles Pegge
05-04-2009, 13:09
Thank you all for the figures!

Here is an extension of the square test comparing two additional techniques. The ultimate is using the SIMD registers, which provides an additional 4x over normal assembler :)

I found these tests very sensitive to byte alignment (as Bob Zale recently reminded us in the PB Gazette) so I have put an Align macro in at the head of each loop. With the wrong alignment, I found the performance dropped by as much as 30% on otherwise identical tests.
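
As an illustration of what 16-byte alignment means for a loop entry point, here is a minimal Python sketch of the usual round-up arithmetic. It is not a description of how Oxygen's align macro works internally; the function names and the example address are assumptions for this illustration.

```python
def align_up(address, boundary=16):
    """Round address up to the next multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    return (address + boundary - 1) & ~(boundary - 1)


def padding_needed(address, boundary=16):
    """Number of filler bytes required before address to reach alignment."""
    return align_up(address, boundary) - address


# Hypothetical loop entry address, chosen only for the example.
loop_start = 0x1003
print(hex(align_up(loop_start)), padding_needed(loop_start))
```

An address that is already a multiple of 16 needs no padding, so the cost of aligning is at most 15 filler bytes per loop.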

CAVEAT: the SIMD version only works to single precision.
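
To see what the single-precision caveat means for a squares test, the Python sketch below rounds each square to a 32-bit float and looks for the first one that can no longer be stored exactly. It assumes the SIMD loop stores i*i as IEEE single-precision values; to_single and first_inexact_square are names invented for this example.

```python
import struct


def to_single(x):
    """Round a Python float (double) to the nearest IEEE single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def first_inexact_square(limit):
    """Return the smallest i in 1..limit whose square is not exactly
    representable in single precision, or None if there is none."""
    for i in range(1, limit + 1):
        if to_single(float(i * i)) != i * i:
            return i
    return None


print(first_inexact_square(1_000_000))  # 4097
```

Single precision carries a 24-bit significand, so squares stay exact up to 4096*4096 = 2^24 and begin to be rounded from 4097 onwards. With one million loops, most of the SIMD results are therefore approximations of the double-precision values.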

Square
Type        Time in microseconds
Script      652,597
Compiled    3,509
Mixed       3,186
Assembler   2,255
SIMD Asm    551






' Test of loop optimization with Oxygen

'SQUARES

uses "Console", "Oxygen"

dim T1 as quad
dim T2 as quad

dim p0,p1,p2,p3,p4 as dword
dim src as string

' -- Prepare Oxygen script

src = "

def align o2 !10 'align 16 bytes

function Calculate_basic(c as long) as long at #p1
local as double ic, jc, as long i
'
'
'BASIC VERSION
'
align
for i=1 to c
ic = i*i
next
#endv

end function


function Calculate_asmbas(c as long) as long at #p2
local as double ic, jc, as long i
'
'ASSEMBLER / BASIC MIXED VERSION
'
ecx=c
align
do
dec ecx : jl exit
inc i : ic = i*i
loop
end function


'


function Calculate_asm(c as long) as long at #p3
local ic, jc as double, i as long
'
'ASSEMBLER VERSION
'
'
dim a=&c
mov eax,a
mov edx,[eax]
fld1
fld1
align
(
fmul st(0),st(0)
fstp qword ic
fld1
faddp st(1),st(0)
fld st(0),st(0)
dec edx
jg repeat
)
fcomp st(0),st(0)
'
end function



function Calculate_simd(c as long) as long at #p4
'
'SIMD VERSION
'
'declare xmm vars first for correct alignment
'
local ic,ia as xmm
local i as long
ia.1=>4,4,4,4
movups xmm2,ia
ia.1=>1,2,3,4
movups xmm1,ia 'xmm1 holds the current four values of i
ecx=c
align
do
sub ecx,4 : jl exit
movups xmm0,xmm1
mulps xmm0,xmm0
movups ic,xmm0
addps xmm1,xmm2
loop


'
end function

sub finish() at #p0
'print `This neither`
terminate
end sub


"

o2_basic src

'msgbox 0, o2_len+$cr+o2_view "o2h "+src

if len(o2_error) then
msgbox 0, o2_error : stop
end if

o2_exec


dim c as long=1000000

declare function Calculate_o2bas(c as long) as long at p1
declare function Calculate_o2asmbas(c as long) as long at p2
declare function Calculate_o2asm(c as long) as long at p3
declare function Calculate_o2simd(c as long) as long at p4
declare sub finish () at p0

printl "SPEED BENCHMARKS:"
printl ""
printl "Double precision square Test: loops=" & c
printl ""

doevents(off)

hirestimer_init
T1 = hirestimer_get
Counter c
T2 = hirestimer_get
printl "Script: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

hirestimer_init
T1 = hirestimer_get
Calculate_o2bas c
T2 = hirestimer_get
printl "Compiled: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

hirestimer_init
T1 = hirestimer_get
Calculate_o2asmbas c
T2 = hirestimer_get
printl "Mixed: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")


hirestimer_init
T1 = hirestimer_get
Calculate_o2asm c
T2 = hirestimer_get
printl "Assembler: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")

hirestimer_init
T1 = hirestimer_get
Calculate_o2simd c
T2 = hirestimer_get
printl "SIMD: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")



finish
waitkey

function Counter( MaxCount as long ) as long
local ic, jc as double,i as long
For i = 1 To MaxCount
ic = i*i
Next
end function

Petr Schreiber
05-04-2009, 14:26
Charles,

that is incredible.

Here are my results:

Type        Time (microseconds)
Script      676,625
Compiled    4,087
Mixed       4,015
Assembler   2,521
SIMD        629


For the first time I see how SIMD makes it truly 4x faster.

Michael Hartlef
05-04-2009, 14:35
And here are my results:

Type        Time (microseconds)
Script      297,414
Compiled    2,411
Mixed       2,684
Assembler   1,478
SIMD        544


I guess I have to consult you to speed up some collision routines :)

Michael Hartlef
05-04-2009, 14:37
Which processor generation started to support SIMD register?

ErosOlmi
05-04-2009, 14:40
Type        Time (microseconds)
Script      442,438
Compiled    4,567
Mixed       5,914
Assembler   2,214
SIMD        1,009

Petr Schreiber
05-04-2009, 15:04
I think SSE was supported for the first time on the Pentium III.

Charles Pegge
05-04-2009, 15:15
Yes, Intel introduced SSE, the SIMD instruction set with the XMM registers used here, with the Pentium III in 1999. (My remaining 486 PC died around that time - it knew its time had come :()

PS: The performance figures do fluctuate a bit - so you need to run the test several times to get typical figures.
I've also tried shuffling the instructions around and opening up the loop but I can't boost the performance measurably. Memory loading and storing seem to be the bottlenecks on this test.
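
Because the figures fluctuate from run to run, one common approach is to time the same routine several times and report the minimum and the median. The sketch below shows this in Python rather than thinBasic; time_runs and squares are hypothetical names, and the loop count is kept small only so the example finishes quickly.

```python
import statistics
import time


def time_runs(func, arg, runs=7):
    """Call func(arg) several times and return each duration in microseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        func(arg)
        samples.append((time.perf_counter_ns() - start) / 1000)
    return samples


def squares(count):
    """Stand-in workload: the same i*i loop as the Square test."""
    ic = 0.0
    for i in range(1, count + 1):
        ic = float(i * i)
    return ic


samples = time_runs(squares, 100_000)
print(f"min: {min(samples):,.0f} us  median: {statistics.median(samples):,.0f} us")
```

The minimum is usually the least disturbed run, while the median gives a typical figure that is not skewed by a single slow outlier.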

GSAC3
06-04-2009, 03:11
Eros:

I have been trying to run the various O2 speed test examples in this thread using thinBasic version 1.8.1.0 under XP without success. Please see attached screenshot of the error I am getting when I try to run the first example posted in this thread.

Don

ErosOlmi
06-04-2009, 03:59
Don,

You have a temporary thinBasic version that I released to fix some problems.
Remove it and install this preview version I'm still working on: http://www.thinbasic.biz/projects/thinbasic/thinBasic_1.7.8.0.zip

GSAC3
06-04-2009, 19:48
Eros:

Thanks for your reply -- all speed tests work now.

Here are results on my XP system:

SCRIPT 275,903
COMPILED 2,836
MIXED 3,327
ASSEMBLED 1,648
SIMD 652

Don
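
The Assembler and SIMD timings reported in this thread can be compared directly to see how close each machine gets to the expected 4x. This is a small Python sketch using only the numbers posted above; the dictionary layout and the name simd_gain are choices made for this illustration.

```python
# (Assembler, SIMD) timings in microseconds, as posted in this thread.
results = {
    "Charles Pegge": (2_255, 551),
    "Petr Schreiber": (2_521, 629),
    "Michael Hartlef": (1_478, 544),
    "ErosOlmi": (2_214, 1_009),
    "GSAC3": (1_648, 652),
}


def simd_gain(assembler_us, simd_us):
    """Return how many times faster the SIMD loop ran than the x87 loop."""
    return assembler_us / simd_us


for name, (asm_us, simd_us) in results.items():
    print(f"{name:16s} {simd_gain(asm_us, simd_us):4.1f}x")
```

On these figures the gain ranges from about 2.2x to about 4.1x, so the full 4x shows up on some machines but not on all of them.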

Charles Pegge
18-07-2009, 12:21
Measuring Performance within Oxygen code





'------------------------
'MEASURING EXECUTION TIME
'========================


uses "oxygen"

dim src as string

src = "
#basic

DECLARE FUNCTION QueryPerformanceCounter LIB `KERNEL32.DLL` ALIAS `QueryPerformanceCounter` (lpPerformanceCount AS QUAD) AS LONG
DECLARE FUNCTION QueryPerformanceFrequency LIB `KERNEL32.DLL` ALIAS `QueryPerformanceFrequency` (lpFrequency AS QUAD) AS LONG

dim as quad t1,t2,fr

QueryPerformanceCounter t1

dim i,e

e=1E8 'NUMBER OF LOOPS

'-------------------
'CODE BEING MEASURED
'===================
'
for i=1 to E
next

'===================


QueryPerformanceCounter t2
QueryPerformanceFrequency fr

print `Loops: ` e ` Exec time: ` str((t2-t1)/fr)


terminate
"
'msgbox 0,o2_view src
o2_basic src
if len(o2_error) then
msgbox 0, o2_error : stop
end if
o2_exec
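
The script above prints (t2-t1)/fr, that is, the difference between two counter readings divided by the counter frequency in ticks per second. The arithmetic can be sketched in Python as follows; the tick values and the frequency are made-up numbers for illustration, not measurements.

```python
def elapsed_seconds(t1, t2, frequency):
    """Convert two performance-counter readings to elapsed seconds."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return (t2 - t1) / frequency


def elapsed_microseconds(t1, t2, frequency):
    """Same conversion in whole microseconds, using integer arithmetic."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return (t2 - t1) * 1_000_000 // frequency


# Hypothetical readings: 3,000 ticks at 10,000,000 ticks per second.
print(elapsed_seconds(1_000, 4_000, 10_000_000))
print(elapsed_microseconds(1_000, 4_000, 10_000_000))
```

Multiplying by 1,000,000 before dividing keeps the microsecond version in integers, which matches the units used by the earlier hirestimer-based tests.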