Results 1 to 8 of 8

Thread: Tail recursion optimization?

  1. #1

    Question Tail recursion optimization?

    I have a tail recursive piece of code that could potentially call itself recursively up to thousands of times, and overall be executed up to millions of times each time the program is run. This raises some concerns regarding performance, and the size of the stack available to thinBasic.
    That's why I have these questions:
    • Does thinBasic optimize/eliminate tail recursive calls?
    • If the answer to the previous is no, do I run the risk of overflowing the stack, if the function I'm talking about takes one potentially 240-character long string, one 6-character long string, and one dword as arguments?
    • Is there any profiling information available regarding the FILE_Exists function from the FILE module? My recursive function involves it, so it could be called millions of times as well. I would like to have even a rough estimate as to how long it takes.



    This is the code:
    uses "FILE"
    $myseparator = "_-" 
    
    Function uniquefilename(unsafe As String) As String
    If Not FILE_Exists(unsafe) Then Return unsafe
    Return sequencefilename(getpathandfilename(unsafe), getdotext(unsafe),0)
    End Function
    
    Function sequencefilename(filename As String, dotext As String,_
                              seqnum As DWord) As String
    If Not FILE_Exists(filename & $myseparator & tob32(seqnum) & 
                       dotext) Then _
          Return filename & $myseparator & tob32(seqnum) & dotext                        
    Return sequencefilename(filename,dotext,seqnum+1)
    End Function
    
    tob32 is a function that encodes a number as a base 32 string. As for the other functions, getpathandfilename and getdotext, well, you can probably imagine what they do.

  2. #2
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,814
    Rep Power
    10
    Ciao,

    recursion is always a possible source of problems and will for sure slow down execution due to the necessity to allocate/deallocate local functions stack in a controlled way.
    thinBasic does not creates real compiled machine code applications so the way it handle local stack is completely different from compiled applications.
    In thinBasic stacks are an array of pointers (one for each stacked call to a function) to hash tables created/destroyed on the fly in heap memory, the same memory used by any other allocated variable.
    Every function call has its own hash table used to keep track of function local variables and parameters.

    That said, you need to make your tests in a test controlled environment and see what happen.
    Factors that can influence thinBasic capability to resists to many recursive calls and speed execution are many.
    On speed execution we can work to try to optimize it in some way (for example developing external compiled DLL to do part of the job)

    To your questions:
    1. no optimization takes place on recursive functions
    2. that risk is always there. Influenced factors are so many that is difficult to me to reply. That's why I said you need to try
    3. I'm not aware File_Exists has some performance issue but if you suspect or have evidences that it can be optimized, I can work on it.
      Internally it is developed like the following (Power Basic code):
        FUNCTION File_Exists(BYVAL FullFileName AS STRING) AS LONG    FUNCTION = %FALSE
          IF DIR$(FullFileName, %NORMAL OR %READONLY OR %HIDDEN OR %SYSTEM) = "" THEN EXIT FUNCTION
          FUNCTION = %TRUE
        END FUNCTION
      
      There are at least two point where I can optimize a bit (passing file name by reference and testing for length instead of string)


    Looking at your functions maybe you can get some heklp on File_PathSplit function: http://www.thinbasic.com/public/prod..._pathsplit.htm
    Calling a native thinBasic function is always hundred of times faster than calling script functions.

    Regarding profiling, if your script does not crash due to recursion ... thinBasic has a rough profiler that can help you.
    Just add the following line to your source code
    #Profile on, %TRUE
    
    and you will have the following: http://www.thinbasic.com/public/prod...ml?profile.htm
    It measure all script function call measuring execution time.
    It also save a .csv file in the same directory of your script.

    Hope this can help.
    Keep me informed, I will be happy to help on this massive task if I can.

    Ciao
    Eros
    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  3. #3
    Thanks, Eros, for the advice about the File_Pathsplit function, and for the insight into how thinBasic operates internally. From your reply, I figure that a typical program would be more likely to run out of heap memory altogether, long before it could overflow the stack.

    I have to suggest, then, a seemingly easy optimization for tail calls (not just tail recursive): destroy the previous "local stack" and its pointer, and replace it with a pointer to the new local stack. This could decrease memory usage dramatically, at perhaps a slightly increased runtime cost (the time thinBasic takes to figure out if a call is a tail call or not). Or perhaps a "TAIL" or "tailcall" keyword could be introduced, to indicate to thinBasic that this optimization can be done on a given function or call.
    Though I don't know if this would be actually easy to implement in thinBasic.

    As for the recursive function I posted, I expect it to typically be executed less than a hundred times each time the program runs, so optimizing it isn't as high a priority as my tone might have implied. But then, you never know when you're going to find the madman who tries to create five thousand files, all named "%HOMEPATH%\a.txt"
    Still, I might run some tests one of these days.

    Also, congratulations and thanks for the #profile directive. That's a big win for thinBasic.

  4. #4
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,814
    Rep Power
    10
    Quote Originally Posted by Vandalf View Post
    I have to suggest, then, a seemingly easy optimization for tail calls (not just tail recursive): destroy the previous "local stack" and its pointer, and replace it with a pointer to the new local stack. This could decrease memory usage dramatically, at perhaps a slightly increased runtime cost (the time thinBasic takes to figure out if a call is a tail call or not). Or perhaps a "TAIL" or "tailcall" keyword could be introduced, to indicate to thinBasic that this optimization can be done on a given function or call.
    Though I don't know if this would be actually easy to implement in thinBasic.
    Yes, not easy at the moment mainly for my time. I want to release new version with new IDE and working on implementing any data type inside TYPE/END TYPE. That's my priority.
    But I will study a solution for tail calls.

    Quote Originally Posted by Vandalf View Post
    As for the recursive function I posted, I expect it to typically be executed less than a hundred times each time the program runs, so optimizing it isn't as high a priority as my tone might have implied. But then, you never know when you're going to find the madman who tries to create five thousand files, all named "%HOMEPATH%\a.txt"
    Still, I might run some tests one of these days.
    Le me know about your tests, I'm curious.

    Quote Originally Posted by Vandalf View Post
    Also, congratulations and thanks for the #profile directive. That's a big win for thinBasic.
    Thanks. It slow down execution a bit but can show interesting information for medium/complex scripts.
    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  5. #5
    For reasons unknown, my copy of thinBasic is unable to generate timing data when profiling. I did try running it as administrator. All it ever shows in the time columns is 0.0000. The rest of the profiling data are shown correctly.
    In the end, I had to use a watch.

    In an empty folder, uniquefilenamev1 generated 201 files in 12.5 seconds.
    In an empty folder, uniquefilenamev2 generated 201 files in 16 seconds.
    This was all with #profile on.
    I did some more tests, but I don't count them because I forgot to note if #profile was on or off during those. Still, they showed the same tendency where v1 is faster than v2.

    The difference between the two variants is that v1 uses the tob32 function, which takes a number and returns a corresponding base 32* string, while v2 uses the b32plusone function, which takes a base 32* string and returns the equivalent of converting it to a number, adding one to it, and then applying the tob32 function to that new number.

    *Custom encoding, not conforming to RFC 4648.

    This is the script:
    Uses "FILE"          
    #PROFILE On                       
    $tob32_charset = "0123456789ABCDEFGHJKMNPQRTUVWXYZ"
    '0123456789ABCDEFGH JK MN PQR TUVWXYZ
    %tob32_charsetptr = StrPtr($tob32_charset) As DWord
    
    $testfilename = "C:\users\usr1\tst\ab.txt"               
    
    For i As Word=0 To 200
      FILE_Append( uniquefilenamev1( $testfilename ) , "hello world" )
    Next      
    
    MsgBox 0, "done"                                                               
    
    Function uniquefilenamev1(unsafe As String) As String
    If Not FILE_Exists(unsafe) Then Return unsafe
    Return sequencefilenamev1( _  
               FILE_PathSplit(unsafe, %PATH_ROOTPATHFILE), _
               "." & FILE_PathSplit(unsafe, %PATH_EXT), _
               0)
    End Function     
    
    Function uniquefilenamev2(unsafe As String) As String
    If Not FILE_Exists(unsafe) Then Return unsafe
    Return sequencefilenamev2( _  
               FILE_PathSplit(unsafe, %PATH_ROOTPATHFILE), _
               "." & FILE_PathSplit(unsafe, %PATH_EXT), _
               "0")
    End Function                              
                                
    Function sequencefilenamev1(filename As String, dotext As String, seqnum As Number) As String
    Dim retfilename As String = filename & "-" & tob32(seqnum) & dotext 
    If Not FILE_Exists(retfilename) Then Return retfilename                              
    Return sequencefilenamev1(filename,dotext,seqnum+1)
    End Function    
      
    Function sequencefilenamev2(filename As String, dotext As String, seqnum As String) As String
    Dim retfilename As String = filename & "-" & seqnum & dotext 
    If Not FILE_Exists(retfilename) Then Return retfilename                              
    Return sequencefilenamev2(filename,dotext,b32plusone(seqnum))
    End Function    
    
    Function tob32(nu As Number) As String
    Static n As DWord 
    Static res As String
    res=""
    Repeat
    n=nu And 31
    res= Memory_Get(%tob32_charsetptr+n,1) & res
    nu=SHIFT RIGHT nu, 5
    Until nu=0        
    Return res
    End Function                            
                                
    Function b32plusone(thestring As String) As String
    'returns a base 32 string with a value of one plus the given string's value 
    'the encoding it assumes is 0123456789ABCDEFGH JK MN PQR TUVWXYZ
    'it is case sensitive
    Static carry,tmp As Byte                
    Static i,j As DWord
    i= Len(thestring)
    j= StrPtr(thestring)-1    
    carry=0
    tmp=Asc(thestring,i)
    '                     HI,KL,NO,RS,9:,Z[ 
    If(In(Poke(j+i,tmp+1),73,76,79,83,58,91)>0) Then _
       If (In(Poke(j+i,tmp+2),59,92)>0) Then _
          If Poke(j+i,tmp+8)<>65 Then _
             carry=Poke(j+i,48)
    For i =i-1 To 1 Step -1 While ((carry>0) And (Asc(thestring,i)=90))
      Poke(j+i, 48)  
    Next          
    
    If (i<=0) Then 
      If (carry>0)Then Return "1" + thestring
    Else    
      tmp=Asc(thestring,i)
      If (carry>0) Then carry = 1
      If(In(Poke(j+i,tmp+carry),73,76,79,83,58)>0) Then _
        If Poke(j+i,tmp+2)=59 Then Poke(j+i,65)
    End If   
             
    Return thestring
    End Function
    
    Last edited by Vandalf; 21-03-2017 at 12:51.

  6. #6
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,814
    Rep Power
    10
    Ciao Vandalf,

    very interesting script and needs.
    I think Profiler is not showing timing due to impossibility (for some reasons) to use high resolution timing functions: https://msdn.microsoft.com/en-us/lib...(v=vs.85).aspx
    For future versions I will add some features allowing to understand if high resolution timing functions are available and if not automatically switch to a less precise timing.

    I've simplified a little your script in order to focus on few aspects.
    On my computer creating 201 files on an empty directory takes about 3.5 seconds.
    I will be into a 2 days meeting at work but I will recap this in the week-end.
    My idea is to try to optimize (if possible) some functions already used here in your script and/or develop new specific functions in Core or File modules.

    I will let you know.

    Thanks
    Eros

    Uses "FILE"
    uses "Console"
    
    
    #PROFILE On
    
    
    $tob32_charset = "0123456789ABCDEFGHJKMNPQRTUVWXYZ"
    '0123456789ABCDEFGH JK MN PQR TUVWXYZ
    %tob32_charsetptr = StrPtr($tob32_charset) As DWord
    
    
    '---Create a path under script directory where to save files 
    $testPath = APP_ScriptPath & "Profile_NoTiming\"
    DIR_MakeAll($testPath)
    
    
    '---Initial file name
    $testfilename = $testPath & "ab.txt"
    
    
    string sFileName
    
    
    double t0, t1, t2, t9
    
    
    t0 = Timer
    printl "Starting time:", Time$
    For i As Word=0 To 200
      t1 = Timer
      sFileName = uniquefilenamev1( $testfilename )
      t2 = timer
      printl i, sFileName, format$(T2 - T1, "#0.0000"), "secs"
      FILE_append(sFileName, "hello world")
    Next
    printl "Ending time:", Time$
    t9 = Timer
    
    
    printl "All done in", format$(T9 - T0, "#0.0000"), "secs"
    WaitKey
    
    
    
    
    Function uniquefilenamev1(unsafe As String) As String
      If Not FILE_Exists(unsafe) Then
        function = unsafe
      Else
        function = sequencefilenamev1( _
                 FILE_PathSplit(unsafe, %PATH_ROOTPATHFILE), _
                 "." & FILE_PathSplit(unsafe, %PATH_EXT), _
                 0)
      end if
    End Function
     
    Function sequencefilenamev1(filename As String, dotext As String, seqnum As long) As String
      string retfilename = filename & "-" & tob32(seqnum) & dotext
    
    
      If Not FILE_Exists(retfilename) Then
        function = retfilename
      Else
        function = sequencefilenamev1(filename, dotext, seqnum + 1)
      end If
    End Function
    
    
    Function tob32(nu As long) As String
      Static n As DWord
      dim res As String
      
      'res = ""
      Repeat
        n = nu And 31
        res = Memory_Get(%tob32_charsetptr + n, 1) & res
        nu = SHIFT RIGHT nu, 5
      Until nu = 0
      
      function = res
    End Function
    
    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  7. #7
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,814
    Rep Power
    10
    Ciao Vandalf,

    the following version of the scripts takes about 0.4 seconds in my computer on an empty directory and 7.2 seconds into a directory already having 201 files.

    Trick is to add
    Static lSequence as Long
    
    inside uniquefilenamev1 function in order to remember the last sequence and avoid to repeat checking all the time from the beginning (zero).

    It is a trick for this situation, maybe it cannot be used in all situations.
    For example if there is a situation where a process creates files and another process remove files, this can create "holes" in the sequence and the below functions would not intercept it during execution. It would fill the gaps during a successive execution.

    Let me know.
    Ciao
    Eros


    Uses "FILE"
    uses "Console"
    
    
    #PROFILE On
    
    
    $tob32_charset = "0123456789ABCDEFGHJKMNPQRTUVWXYZ"
    '0123456789ABCDEFGH JK MN PQR TUVWXYZ
    %tob32_charsetptr = StrPtr($tob32_charset) As DWord
    
    
    '---Create a path under script directory where to save files 
    $testPath = APP_ScriptPath & "Profile_NoTiming\"
    DIR_MakeAll($testPath)
    
    
    '---Initial file name
    $testfilename = $testPath & "ab.txt"
    
    
    string sFileName
    
    
    double t0, t1, t2, t9
    
    
    t0 = Timer
    printl "Starting time:", Time$
    For i As Word=0 To 200
      t1 = Timer
      sFileName = uniquefilenamev1( $testfilename )
      t2 = timer
      printl i, sFileName, format$(T2 - T1, "#0.0000"), "secs"
      FILE_append(sFileName, "hello world")
    Next
    printl "Ending time:", Time$
    t9 = Timer
    
    
    printl "All done in", format$(T9 - T0, "#0.0000"), "secs"
    WaitKey
    
    
    
    
    Function uniquefilenamev1(unsafe As String) As String
      Static lSequence as Long
      
      If Not FILE_Exists(unsafe) Then
        function = unsafe
      Else
        function = sequencefilenamev1( _
                 FILE_PathSplit(unsafe, %PATH_ROOTPATHFILE), _
                 "." & FILE_PathSplit(unsafe, %PATH_EXT), _
                 lSequence)
        lSequence += 1
      end if
    End Function
     
    Function sequencefilenamev1(filename As String, dotext As String, seqnum As long) As String
      string retfilename
    
    
      retfilename = filename & "-" & tob32(seqnum) & dotext
      If Not FILE_Exists(retfilename) Then
        function = retfilename
      Else
        function = sequencefilenamev1(filename, dotext, seqnum + 1)
      end If
    
    '---Non recursive version of the function---
    '  retfilename = filename & "-" & tob32(seqnum) & dotext
    '  While FILE_Exists(retfilename)
    '    seqnum += 1
    '    retfilename = filename & "-" & tob32(seqnum) & dotext
    '  wend
    '  function = retfilename
      
    End Function
    
    
    Function tob32(nu As long) As String
      Static n As DWord
      dim res As String
      
      'res = ""
      Repeat
        n = nu And 31
        res = Memory_Get(%tob32_charsetptr + n, 1) & res
        nu = SHIFT RIGHT nu, 5
      Until nu = 0
      
      function = res
    End Function
    
    Last edited by ErosOlmi; 24-03-2017 at 15:18.
    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  8. #8
    Thank you, Eros. That's a pretty clever solution.
    I was thinking of a more algorithmic strategy: a binary search.
    Using decimal numbers for readability, these could be the filenames checked with such a strategy:
    "ab-1.txt"	'exists
    "ab-2.txt" 	'exists
    "ab-4.txt" 	'exists
    "ab-8.txt" 	'exists
    "ab-16.txt" 'doesn't exist
    "ab-12.txt" 'exists
    "ab-14.txt" 'exists
    "ab-15.txt" 'doesn't exist, and is the "least" filename available
    
    This assumes that there are no gaps in the sequence of files. If gaps existed, this strategy would give erratic results, filling some gaps but not others. Your strategy is better in that regard, in my opinion: I want the numeric part of the filename to reflect (at least partially) the order of creation of the files.
    Your strategy is also much better in that it resolves filenames in constant time, whereas a binary search takes logarithmic time. Both are still a massive improvement over the original, which takes linear time.

    The full background to my initial problem is this:
    I am developing a web scraping program. To paraphrase, a program that parses an HTML file (a webpage) and selectively downloads files from hyperlinks found in the HTML file.
    I quickly discovered that this program would overwrite previously downloaded files that had the same original name. I went through several quick and dirty solutions to that problem, finally settling on what I posted earlier in this thread.
    Then I started to wonder about the worst-case performance: When a malicious webmaster or an oblivious user prepares a webpage linking to many (hundreds or thousands of) different files, all with the same filename (or with a small number of different names). I would try to scrape that page, and my program would work for a while, but it would eventually slow down and practically freeze, simply due to the sheer number of filenames it would have to check to find the first one available.
    This worst-case scenario is not too far-fetched: smartphones, for example, tend to give very simplistic default names to user-created files. I have seen hundreds of different files named "16 - 1.jpg" in Google Plus.

    My next step should be to patch your solution into the web scraper. During one scraping session, when my program detected long sequences of filenames in the downloads folder, it would add their "root filename" to a dictionary. The dictionary would keep track of the length of those sequences. It would be a generalized form of your solution.
    I would eventually want to also devise a strategy to speed up the initial phase of assessing the length of a sequence of filenames. Likely involving the OS_Shell function and the "dir /b" and "find /c" commands.

    So thanks again, Eros. You're being great help.

Similar Threads

  1. Tree Recursion
    By peter in forum Sources, Templates, Code Snippets, Tips and Tricks, Do you know ...
    Replies: 13
    Last Post: 12-05-2017, 08:58
  2. Tree Recursion 2
    By primo in forum TBGL General
    Replies: 1
    Last Post: 21-02-2016, 10:38
  3. A nice (verbal) example of recursion
    By RobbeK in forum General
    Replies: 2
    Last Post: 25-03-2014, 02:05
  4. Various types of recursion
    By Petr Schreiber in forum Sources, Templates, Code Snippets, Tips and Tricks, Do you know ...
    Replies: 6
    Last Post: 25-05-2010, 06:38
  5. Array vs type optimization
    By MouseTrap in forum Power Basic
    Replies: 6
    Last Post: 04-03-2009, 11:05

Members who have read this thread: 0

There are no members to list at the moment.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •