Performance Tests
Charles Pegge
21-03-2009, 08:54
' Test of loop optimization with Oxygen
uses "Console", "Oxygen"
dim T1 as quad
dim T2 as quad
dim p0,p1,p2 as dword
dim src as string
' -- Prepare Oxygen script
src = "
function Calculate_basic(c as long) as long at #p1
local ic, jc as double, i as long
'
'
'BASIC VERSION
'
for i=1 to c
ic = sin(i)
jc = cos(i)
next
end function
'
function Calculate_asm(c as long) as long at #p2
local ic, jc as double, i as long
'
'ASSEMBLER VERSION
'
'
dim a=&c
mov eax,a
mov edx,[eax]
fld1
fld1
(
fsincos
fstp qword jc
fstp qword ic
fld1
faddp st(1),st(0)
fld st(0),st(0)
dec edx
jg repeat
)
fcomp st(0),st(0)
'
end function
sub finish() at #p0
terminate
end sub
"
o2_basic src
if len(o2_error) then
msgbox 0, o2_error : stop
end if
o2_exec
dim c as long=1000000
declare function Calculate_o2bas(c as long) as long at p1
declare function Calculate_o2asm(c as long) as long at p2
declare sub finish () at p0
printl "SPEED BENCHMARKS:"
printl ""
printl "Double precision Sin & Cos Test: loops=" & c
printl ""
doevents(off)
hirestimer_init
T1 = hirestimer_get
Counter c
T2 = hirestimer_get
printl "Script: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
T1 = hirestimer_get
Calculate_o2bas c
T2 = hirestimer_get
printl "Compiled: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
T1 = hirestimer_get
Calculate_o2asm c
T2 = hirestimer_get
printl "Assembler: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
finish
waitkey
function Counter( MaxCount as long ) as long
local ic, jc as double,i as long
For i = 1 To MaxCount
ic = sin(i)
jc = cos(i)
Next
end function
Correction to Assembly code
Charles Pegge
21-03-2009, 13:38
Double precision square.
In this case the difference is more extreme!
' Test of loop optimization with Oxygen
uses "Console", "Oxygen"
dim T1 as quad
dim T2 as quad
dim p0,p1,p2 as dword
dim src as string
' -- Prepare Oxygen script
src = "
function Calculate_basic(c as long) as long at #p1
local ic, jc as double, i as long
'
'
'BASIC VERSION
'
for i=1 to c
ic = i*i
next
end function
'
function Calculate_asm(c as long) as long at #p2
local ic, jc as double, i as long
'
'ASSEMBLER VERSION
'
'
dim a=&c
mov eax,a
mov edx,[eax]
fld1
fld1
(
fmul st(0),st(0)
fstp qword ic
fld1
faddp st(1),st(0)
fld st(0),st(0)
dec edx
jg repeat
)
fcomp st(0),st(0)
'
end function
sub finish() at #p0
'print `This neither`
terminate
end sub
"
o2_basic src
if len(o2_error) then
msgbox 0, o2_error : stop
end if
o2_exec
dim c as long=1000000
declare function Calculate_o2bas(c as long) as long at p1
declare function Calculate_o2asm(c as long) as long at p2
declare sub finish () at p0
printl "SPEED BENCHMARKS:"
printl ""
printl "Double precision square Test: loops=" & c
printl ""
doevents(off)
hirestimer_init
T1 = hirestimer_get
Counter c
T2 = hirestimer_get
printl "Script: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
T1 = hirestimer_get
Calculate_o2bas c
T2 = hirestimer_get
printl "Compiled: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
T1 = hirestimer_get
Calculate_o2asm c
T2 = hirestimer_get
printl "Assembler: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
finish
waitkey
function Counter( MaxCount as long ) as long
local ic, jc as double,i as long
For i = 1 To MaxCount
ic = i*i
Next
end function
ErosOlmi
03-04-2009, 15:29
Dear Charles,
I'm working on the next thinBasic release, 1.7.8.0, with your latest thinBasic_Oxygen.dll.
I'm getting GPFs when testing your speed test scripts. The function "Calculate_asm" seems to GPF when it reaches "jg repeat".
Do you see any reason why this could happen?
Thanks a lot
Eros
Charles Pegge
03-04-2009, 23:09
Hi Eros,
Good to have you back :)
I found a flaw in the SinCos test which could have generated the GPF. I used fst instead of fstp to store one of the variables, which means the FPU would accumulate junk and spin its 8 registers round, generally being disruptive. I had it right the first time - a case of doublethink - sorry!
Have you had any problems with the square test? The FPU stacking looks okay there.
Charles.
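The stack imbalance Charles describes can be illustrated with a simple depth counter. This is only a sketch in Python, not Oxygen code: it tracks nothing but how many of the eight x87 registers are occupied after each instruction of the Sin/Cos loop. The faulty instruction list (one fstp swapped for fst) is an assumption about what the broken version looked like, since that version is not shown in the thread.

```python
# Net change in x87 stack depth for each instruction used in the Sin/Cos loop.
STACK_EFFECT = {
    "fld1": +1,     # push 1.0
    "fld": +1,      # push a copy of a register
    "fsincos": +1,  # replace st(0) with its sine, then push the cosine
    "fstp": -1,     # store and pop
    "fst": 0,       # store without popping
    "faddp": -1,    # add and pop
}

X87_REGISTERS = 8


def iterations_until_overflow(loop_body, start_depth=2, max_iterations=100):
    """Return the iteration in which more than 8 registers are needed, or None."""
    depth = start_depth  # the two fld1 instructions executed before the loop
    for iteration in range(1, max_iterations + 1):
        for instruction in loop_body:
            depth += STACK_EFFECT[instruction]
            if depth > X87_REGISTERS:
                return iteration
    return None


correct_loop = ["fsincos", "fstp", "fstp", "fld1", "faddp", "fld"]
# Assumed faulty variant: the second store does not pop.
faulty_loop = ["fsincos", "fstp", "fst", "fld1", "faddp", "fld"]

print(iterations_until_overflow(correct_loop))  # None: the loop is balanced
print(iterations_until_overflow(faulty_loop))   # 7: one extra register per pass
```

With fstp in both places the depth returns to 2 at the end of every pass. With a single fst the depth grows by one per pass, so the eight registers are exhausted within a handful of iterations.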
ErosOlmi
04-04-2009, 07:54
Thanks Charles. I'm back and all seems fine now :)
_________________________________________________
Your examples are ok now. Sin/Cos and square tests are running fine here.
Hey: what a difference in execution time!!!!!
In Sin/Cos test I get the following results:
Table 1: Sin/Cos test
Type        Time in microseconds
Script      718,323
Compiled    105,460
Assembler   61,385
In Square test I get the following results:
Table 2: Square test
Type        Time in microseconds
Script      449,875
Compiled    4,531
Assembler   2,203
Oxygen (I prefer the BASIC syntax) is a miracle. What a great job you have done.
Thanks
Eros
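To put the difference in Eros's two tables into numbers, the timings can be turned into speedup factors relative to the script version. The following Python sketch only does that arithmetic on the figures quoted in the post; the function and variable names are illustrative.

```python
# Timings in microseconds, copied from Table 1 and Table 2 in Eros's post.
results = {
    "Sin/Cos": {"Script": 718_323, "Compiled": 105_460, "Assembler": 61_385},
    "Square": {"Script": 449_875, "Compiled": 4_531, "Assembler": 2_203},
}


def speedups(timings, baseline="Script"):
    """Return how many times faster each entry is than the baseline entry."""
    base = timings[baseline]
    return {name: base / t for name, t in timings.items() if name != baseline}


for test, timings in results.items():
    for name, factor in speedups(timings).items():
        print(f"{test:8} {name:10} {factor:6.1f}x faster than Script")
```

For these figures the factors come out at roughly 7x and 12x for Sin/Cos, and roughly 99x and 204x for the square test, which is why the square test looks so much more extreme: there is far less work per iteration, so interpreter overhead dominates the script timing.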
Charles Pegge
04-04-2009, 10:09
Well Eros, it has taken 6 months of intensive work, beyond the assembler, to get this far. I had not appreciated the full intricacies of putting a Basic compiler together, even a minimalist one, and I can now appreciate the immense amount of work you had to put into thinBasic.
But it should give programmers the option of boosting the performance of any part of their code that requires it. No need to worry about speed any more.
The next stage is to apply it to as many demanding tasks as possible and flush out the remaining bugs and bottlenecks.
Charles
Petr Schreiber
04-04-2009, 18:10
True script acceleration Charles,
here are my results:
Sin/Cos
Type        Time in microseconds
Script      1,170,480
Compiled    121,125
Assembler   78,326
Square
Type        Time in microseconds
Script      636,667
Compiled    3,889
Assembler   2,501
The CPU was an AMD Sempron 64-bit 3400+ (but clocked at 1.8 GHz and running 32-bit WinXP Home).
Michael Hartlef
05-04-2009, 09:17
Hi Charles,
wow, that is power. You truly smashed the arguments against using an interpreted language for speed reasons. OK, thinBasic is not so interpreted anymore then, but I have finally studied O2H now, and how you integrated it into thinBasic is just awesome. Code the function in O2H, declare a new function pointer in thinBasic and use it like any other function. Simply amazing!
Here are my results:
Sin/Cos
Type        Time in microseconds
Script      479,201
Compiled    69,522
Assembler   45,696
Square
Type        Time in microseconds
Script      311,688
Compiled    2,409
Assembler   1,417
Michael
Charles Pegge
05-04-2009, 13:09
Thank you all for the figures!
Here is an extension of the square test comparing 2 additional techniques. The ultimate is using the SIMD registers, which provides an additional 4x over normal assembler :)
I found these tests very sensitive to byte alignment (as Bob Zale recently reminded us in the PB Gazette) so I have put an Align macro in at the head of each loop. With the wrong alignment, I found the performance dropped by as much as 30% on otherwise identical tests.
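What 16-byte alignment means for the start of a loop can be shown with a little address arithmetic. The Python sketch below is an illustration only: it assumes the `!10` in the align macro denotes hexadecimal 10, as the accompanying "align 16 bytes" comment suggests, and the addresses are hypothetical.

```python
ALIGNMENT = 0x10  # 16 bytes; assumed meaning of `!10` in the align macro


def padding_needed(address, alignment=ALIGNMENT):
    """Bytes to skip so that the next instruction starts on an aligned address."""
    return -address % alignment


def align_up(address, alignment=ALIGNMENT):
    """Round an address up to the next multiple of the alignment."""
    return address + padding_needed(address, alignment)


for address in (0x401000, 0x401007, 0x40100F):  # hypothetical code addresses
    print(hex(address), "->", hex(align_up(address)),
          "padding", padding_needed(address))
```

An already aligned address needs no padding, while the other two are both moved up to 0x401010, with 9 and 1 bytes of padding respectively.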
CAVEAT: the SIMD version only works in single precision.
Square
Type        Time in microseconds
Script      652,597
Compiled    3,509
Mixed       3,186
Assembler   2,255
SIMD Asm    551
' Test of loop optimization with Oxygen
'SQUARES
uses "Console", "Oxygen"
dim T1 as quad
dim T2 as quad
dim p0,p1,p2,p3,p4 as dword
dim src as string
' -- Prepare Oxygen script
src = "
def align o2 !10 'align 16 bytes
function Calculate_basic(c as long) as long at #p1
local as double ic, jc, as long i
'
'
'BASIC VERSION
'
align
for i=1 to c
ic = i*i
next
#endv
end function
function Calculate_asmbas(c as long) as long at #p2
local as double ic, jc, as long i
'
'ASSEMBLER / BASIC MIXED VERSION
'
ecx=c
align
do
dec ecx : jl exit
inc i : ic = i*i
loop
end function
'
function Calculate_asm(c as long) as long at #p3
local ic, jc as double, i as long
'
'ASSEMBLER VERSION
'
'
dim a=&c
mov eax,a
mov edx,[eax]
fld1
fld1
align
(
fmul st(0),st(0)
fstp qword ic
fld1
faddp st(1),st(0)
fld st(0),st(0)
dec edx
jg repeat
)
fcomp st(0),st(0)
'
end function
function Calculate_simd(c as long) as long at #p4
'
'SIMD VERSION
'
'declare xmm vars first for correct alignment
'
local ic,ia as xmm
local i as long
ia.1=>4,4,4,4
movups xmm2,ia
ia.1=>1,2,3,4
movups xmm1,ia
movups xmm0,xmm1
ecx=c
align
do
sub ecx,4 : jl exit
movups xmm0,xmm1
mulps xmm0,xmm0
movups ic,xmm0
addps xmm1,xmm2
loop
'
end function
sub finish() at #p0
'print `This neither`
terminate
end sub
"
o2_basic src
'msgbox 0, o2_len+$cr+o2_view "o2h "+src
if len(o2_error) then
msgbox 0, o2_error : stop
end if
o2_exec
dim c as long=1000000
declare function Calculate_o2bas(c as long) as long at p1
declare function Calculate_o2asmbas(c as long) as long at p2
declare function Calculate_o2asm(c as long) as long at p3
declare function Calculate_o2simd(c as long) as long at p4
declare sub finish () at p0
printl "SPEED BENCHMARKS:"
printl ""
printl "Double precision square Test: loops=" & c
printl ""
doevents(off)
hirestimer_init
T1 = hirestimer_get
Counter c
T2 = hirestimer_get
printl "Script: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
hirestimer_init
T1 = hirestimer_get
Calculate_o2bas c
T2 = hirestimer_get
printl "Compiled: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
hirestimer_init
T1 = hirestimer_get
Calculate_o2asmbas c
T2 = hirestimer_get
printl "Mixed: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
hirestimer_init
T1 = hirestimer_get
Calculate_o2asm c
T2 = hirestimer_get
printl "Assembler: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
hirestimer_init
T1 = hirestimer_get
Calculate_o2simd c
T2 = hirestimer_get
printl "SIMD: microSecs:" & $tab & FORMAT$(T2 - T1, "#,")
finish
waitkey
function Counter( MaxCount as long ) as long
local ic, jc as double,i as long
For i = 1 To MaxCount
ic = i*i
Next
end function
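The structure of the SIMD loop and the effect of the single-precision caveat can be mimicked in a few lines of Python. This is a sketch under assumptions, not a translation of the Oxygen code: it assumes the four lanes hold IEEE 754 single-precision values, and it emulates the rounding with the standard `struct` module. The names `simd_squares` and `to_float32` are made up for the illustration.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def simd_squares(count):
    """Mimic the SIMD loop: four lanes start at 1,2,3,4 and advance by 4 per pass."""
    lanes = [1.0, 2.0, 3.0, 4.0]  # plays the role of xmm1
    step = [4.0, 4.0, 4.0, 4.0]   # plays the role of xmm2
    out = []
    remaining = count
    while True:
        remaining -= 4            # sub ecx,4 : jl exit
        if remaining < 0:
            break
        out.extend(to_float32(v * v) for v in lanes)              # mulps
        lanes = [to_float32(v + s) for v, s in zip(lanes, step)]  # addps
    return out


squares = simd_squares(8192)
first_bad = next(i for i, s in enumerate(squares, start=1) if s != i * i)
print(first_bad, squares[first_bad - 1], first_bad * first_bad)
# prints: 4097 16785408.0 16785409
```

Each pass handles four values, which is where the roughly 4x gain over the scalar assembler loop comes from. The price is precision: single precision has a 24-bit significand, so squares above 2^24 are no longer all exact, and the first mismatch appears at i = 4097.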
Petr Schreiber
05-04-2009, 14:26
Charles,
that is incredible.
Here are my results:
Type        Time (microseconds)
Script      676,625
Compiled    4,087
Mixed       4,015
Assembler   2,521
SIMD        629
For the first time I see how SIMD makes it truly 4x faster.
Michael Hartlef
05-04-2009, 14:35
And here are my results:
Type        Time (microseconds)
Script      297,414
Compiled    2,411
Mixed       2,684
Assembler   1,478
SIMD        544
I guess I have to consult you to speed up some collision routines :)
Michael Hartlef
05-04-2009, 14:37
Which processor generation first supported the SIMD registers?
ErosOlmi
05-04-2009, 14:40
Type        Time (microseconds)
Script      442,438
Compiled    4,567
Mixed       5,914
Assembler   2,214
SIMD        1,009
Petr Schreiber
05-04-2009, 15:04
I think SSE was supported for the first time on the Pentium III.
Charles Pegge
05-04-2009, 15:15
Yes, Intel introduced SSE, with its xmm SIMD registers, on the Pentium III in 1999. (My remaining 486 PC died around that time - it knew its time had come :()
PS: The performance figures do fluctuate a bit - so you need to run the test several times to get typical figures.
I've also tried shuffling the instructions around and opening up the loop but I can't boost the performance measurably. Memory loading and storing seem to be the bottlenecks on this test.
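The advice to run the test several times can be automated: repeat the measurement and report the spread rather than a single figure. The following Python sketch shows the idea with a stand-in workload. It is not thinBasic code, and `measure` and `squares` are hypothetical helper names.

```python
import statistics
import time


def measure(func, *args, runs=7):
    """Time func(*args) `runs` times; return the durations in microseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        func(*args)
        durations.append((time.perf_counter_ns() - start) / 1_000)
    return durations


def squares(count):
    """Stand-in workload, comparable in spirit to the script loop."""
    result = 0.0
    for i in range(1, count + 1):
        result = float(i) * float(i)
    return result


timings = measure(squares, 100_000)
print(f"min {min(timings):,.0f}  "
      f"median {statistics.median(timings):,.0f}  "
      f"max {max(timings):,.0f}")
```

The median is less affected by an occasional slow run than the mean, and the minimum gives an idea of the best case when the machine is otherwise idle.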
Eros:
I have been trying to run the various O2 speed test examples in this thread using thinBasic version 1.8.1.0 under XP without success. Please see attached screenshot of the error I am getting when I try to run the first example posted in this thread.
Don
ErosOlmi
06-04-2009, 03:59
Don,
you have a temp thinBasic version I released to fix some problems.
Remove it and install this preview version I'm still working on: http://www.thinbasic.biz/projects/thinbasic/thinBasic_1.7.8.0.zip
Eros:
Thanks for your reply -- all speed tests work now.
Here are results on my XP system:
SCRIPT 275,903
COMPILED 2,836
MIXED 3,327
ASSEMBLED 1,648
SIMD 652
Don
Charles Pegge
18-07-2009, 12:21
Measuring Performance within Oxygen code
'------------------------
'MEASURING EXECUTION TIME
'========================
uses "oxygen"
dim src as string
src = "
#basic
DECLARE FUNCTION QueryPerformanceCounter LIB `KERNEL32.DLL` ALIAS `QueryPerformanceCounter` (lpPerformanceCount AS QUAD) AS LONG
DECLARE FUNCTION QueryPerformanceFrequency LIB `KERNEL32.DLL` ALIAS `QueryPerformanceFrequency` (lpFrequency AS QUAD) AS LONG
dim as quad t1,t2,fr
QueryPerformanceCounter t1
dim i,e
e=1E8 'NUMBER OF LOOPS
'-------------------
'CODE BEING MEASURED
'===================
'
for i=1 to E
next
'===================
QueryPerformanceCounter t2
QueryPerformanceFrequency fr
print `Loops: ` e ` Exec time: ` str((t2-t1)/fr)
terminate
"
'msgbox 0,o2_view src
o2_basic src
if len(o2_error) then
msgbox 0, o2_error : stop
end if
o2_exec
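The listing above converts two counter readings into seconds by dividing their difference by the counter frequency. The same arithmetic is sketched below in Python with hypothetical readings, since real counter values and frequencies depend on the machine.

```python
def elapsed_seconds(t1, t2, frequency):
    """Convert two counter readings to seconds; `frequency` is ticks per second."""
    return (t2 - t1) / frequency


def elapsed_microseconds(t1, t2, frequency):
    """Same conversion in whole microseconds, using integer arithmetic only."""
    return (t2 - t1) * 1_000_000 // frequency


# Hypothetical readings: a 10 MHz counter that advanced by 2,500,000 ticks.
print(elapsed_seconds(1_000_000, 3_500_000, 10_000_000))       # 0.25
print(elapsed_microseconds(1_000_000, 3_500_000, 10_000_000))  # 250000
```

Multiplying before dividing in the integer version avoids losing the sub-second part, which matters when the measured interval is much shorter than one second.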