PDA

View Full Version : Comparing Mid$ with string pointer



primo
24-12-2017, 12:26
Thanks Eros for the note about using the pointer for a string in the thread Advent of Code, 2017 by Petr, it is much speedier than Mid$
i have tested a timing demo for a text of 270000 digits, when we use pointers its time is about 0.15 seconds but with Mid$ it is around 3.9 , ie 27 times speedier. i choose input String input = Repeat$(30000, "123456789") for reasonable waiting time
if we replace 30000 by 300000, with pointer the time is about 1.6 sec. but with Mid$ it seems like Mid$ lost in the space for the text of 2,700,000 digits, even it should return after about 60 seconds, i have waited 5 minutes and then closed the console. it seems the time for the Mid$ is exponentially larger with the increasing length of the string.
so i deduce Mid$ is suitable for everyday strings while pointer to string is suitable for huge strings

Uses "console"
Long total

Dim T1, T2 As Quad
Extended T3

String input = Repeat$(30000, "123456789")
Long index
Dim TXT(Len(input)) As String * 1 At StrPtr(input)
PrintL "length of the numeric text= " + UBound(TXT)
HiResTimer_Init
T1 = HiResTimer_Delta
For index = 1 To Len(input)

total + TXT(index)
'total + Mid$(input, index,1)
Next
T2 = HiResTimer_Delta
PrintL "sum of digits= " + total

T3 = T2-T1
T3 = T3/1000000 'the one second = million microseconds
PrintL LTrim$(Str$(T3))+ " (seconds)"
WaitKey

Petr Schreiber
26-12-2017, 22:36
Hi Primo,

thanks for sharing this. It is important to keep this in spotlight, as it comes handy in many situations.

Tip #1
LTrim$(Str$(x)) is not needed. Just use Format$(x) for the same result.

Tip #2
HiResTimer_Delta is your friend, it is enough to call:

HiResTimer_Get
delta = HiResTimer_Delta

... to get th Delta from last call. This saves you 2 variables in this case.

Tip #3
You can build your benchmark routine and specific implementations then to functions.
This allows automatical detection of those test functions and automatical benchmarking:


Uses "console"
uses "stat"

function TBMain()
' -- Setup
hiResTimer_Init
string input = repeat$(30000, "123456789")

printL "length of the numeric text = " + len(input)
printl

' -- List all user functions to array
string functions()
function_list(functions, %Function_User)

' -- Benchmark all functions matching signature "APPROACH_*"
for f as long = 1 to countOf(functions)
if isLike(functions(f), "APPROACH_*", false) then
Benchmark(functions(f), input)
end if
next

waitKey
end function

function Benchmark(functionName as string, parameter as string) as long
ext delta
long result

hiResTimer_Get
' -- Call function by name
call functionName(parameter) to result
delta = hiResTimer_Delta ' Delta since last hiResTimer_* call

printl lset$(functionName, 32 using " "), ' -- Pad to left with spaces
format$(delta/1000000, "0.000") + "s" + ' -- Format it like we are interested in zero and 3 decimals
", result = " + result

return result

end function

function Approach_Mid(byRef input as string) as long
long result
for index as long = 1 to len(input)
result + mid$(input, index, 1)
next

return result
end function

function Approach_String1Overlay(byRef input as string) as long
dim TXT(len(input)) as string * 1 at strPtr(input)
long result
for index as long = 1 to len(input)
result + TXT(index)
next

return result
end function

function Approach_Stat(byRef input as string) as long
dim ascii(len(input)) as byte at strPtr(input)
return stat_sum(ascii) - len(input)*48 ' -- Ha ha! ASCII values of numbers start at 48
end function



Petr

ErosOlmi
27-12-2017, 16:37
Interesting:


length of the numeric text = 270000

APPROACH_STRING1OVERLAY 0.293s, result = 1350000
APPROACH_STAT 0.004s, result = 1350000
APPROACH_MID 6.660s, result = 1350000

ErosOlmi
27-12-2017, 20:47
To avoid to use "STAT" module you can substitute the following line

Return stat_sum(ascii) - len(input)*48 ' -- Ha ha! ASCII values of numbers start at 48
with the following line

Return Array Sum Ascii - len(input)*48 ' -- Ha ha! ASCII values of numbers start at 48


Also "Approach_String1Overlay" function can be optimized with the following.
thinBasic will take care to convert string array elements into numbers before summing elements:

function Approach_String1Overlay(byRef input as string) as long
dim TXT(len(input)) as string * 1 at strPtr(input)
Return Array Sum TXT
end function

This will perform 5x faster than original:

length of the numeric text = 270000

APPROACH_STRING1OVERLAY 0.077s, result = 1350000
APPROACH_STAT 0.005s, result = 1350000
APPROACH_MID 6.105s, result = 1350000

ErosOlmi
27-12-2017, 22:11
MID$ function is slow because every time the input string (first parameter of MID$) is parsed as a string expression and copied into another internal temporary string.
If string is quite big it can takes a while.

In the above example string is 270000 bytes and for/next takes place for each byte so 270000 times.
Total number of bytes moved in memory is 270000^2 = 72900000000

I'm trying to see if I found a way to understand if first parameter is not a string expression but a single scalar variable.
In this case I could avoid to copy original string into a temp one but use the original string directly.

ErosOlmi
27-12-2017, 22:56
For the moment I've developed a new function called MIDF$ (fast version of MID$)
Instead of accepting as first parameter a string expression it accept a scalar string variable.
In this way it is able to determine string pointer and get needed bytes without the need to create a local temporary copy of the string expression.

Results are this one:

length of the numeric text = 270000

APPROACH_STRING1OVERLAY 0.243s, result = 1350000
APPROACH_MIDF 0.220s, result = 1350000 <--------FAST version of MID$
APPROACH_STAT 0.003s, result = 1350000
APPROACH_MID 7.070s, result = 1350000
APPROACH_STRING1OVERLAYWITHARRAYSUM 0.038s, result = 1350000


MIDF$ unction will be present in next thinBasic update.

Full code:

Uses "console"uses "stat"

function TBMain()
' -- Setup
hiResTimer_Init
string input = repeat$(30000, "123456789")

printL "length of the numeric text = " + len(input)
printl

' -- List all user functions to array
string functions()
function_list(functions, %Function_User)

' -- Benchmark all functions matching signature "APPROACH_*"
for f as long = 1 to countOf(functions)
if isLike(functions(f), "APPROACH_*", false) then
Benchmark(functions(f), input)
end if
next

waitKey
end function

function Benchmark(functionName as string, parameter as string) as long
ext delta
long result

hiResTimer_Get
' -- Call function by name
call functionName(parameter) to result
delta = hiResTimer_Delta ' Delta since last hiResTimer_* call

printl lset$(functionName, 50 using " "), ' -- Pad to left with spaces
format$(delta/1000000, "0.000") + "s" + ' -- Format it like we are interested in zero and 3 decimals
", result = " + result

return result

end function

function Approach_Mid(byRef input as string) as long
long result
for index as long = 1 to len(input)
result + mid$(input, index, 1)
next

return result
end function


function Approach_MidF(byRef input as string) as long
long result
for index as long = 1 to len(input)
result + midf$(input, index, 1)
next

return result
end function

function Approach_String1Overlay(byRef input as string) as long
dim TXT(len(input)) as string * 1 at strPtr(input)
long result
for index as long = 1 to len(input)
result + TXT(index)
next

return result
end function


function Approach_String1OverlayWITHArraySum(byRef input as string) as long
dim TXT(len(input)) as string * 1 at strPtr(input)
return Array sum TXT
end function

function Approach_Stat(byRef input as string) as long
dim ascii(len(input)) as byte at strPtr(input)
return stat_sum(ascii) - len(input)*48 ' -- Ha ha! ASCII values of numbers start at 48
end function