
Thread: ARRAY SCAN results

  1. #1

    ARRAY SCAN results

    Greets,

    I'm just learning thinBasic and hate to ask a question or request help, but after reading the Help file and checking sample scripts I have run into a problem.

    Quick background: I'm working with an elevation file and I have a LOT of elevation files! First six lines are header data and then there are 1201 lines of elevation data, each with 1201 entries. Any geo-referenced point with no data gets assigned -9999. What I'm attempting to do on a batch basis is read the file into an array with Parse(FILE_Load(FileToLoad), MyMatrix(), $CRLF , $SPC). Then I use nRec = Array Scan MyMatrix() , = "-9999" to scan the array and count all instances of -9999.

    The file I'm testing with has 1202 instances of -9999, but my little program tells me there are 2422 occurrences??

    I would be very appreciative if someone could take a quick look and tell me where my beginner skills went awry? Many, many years ago I worked in Microsoft's PDS 7.0, but things have changed from that time and my mind isn't quite as sharp.

    The file location would need to be adjusted in the thinBasic file (line 4) and please excuse some of the other code, as I put things in just to make sure I'm following along OK.

    Thank you for any assistance,
    Lance
    Last edited by ErosOlmi; 17-11-2011 at 19:09. Reason: Removed link to mediafire and attached file here

  2. #2
    thinBasic author ErosOlmi's Avatar
    Join Date
    Sep 2004
    Location
    Milan - Italy
    Age
    57
    Posts
    8,817
    Rep Power
    10
    Dear Lance,

    Sorry for the delay, but your post was automatically held for moderation by the forum's anti-spam software.
    This usually happens until a user has at least 4 published posts.
    Anyway, I have now found it and released it from the vault area.

    OK, the problem is that ARRAY SCAN works on one-dimensional arrays, not on matrices, so when you use it on a matrix you get incorrect results. Maybe I will add a check that raises a run-time error.
    I took your code and amended it a bit so that it counts the missing data:
    uses "file"
    uses "console"
    
    
    Dim FileToLoad  As String Value APP_SourcePath & "cgn21w001.asc.txt"
    dim MyMatrix()  as string
    dim nLines      as long
    dim nCols       as long
    dim T0, T1      as quad
    
    
    Dim sBuffer     As String
    Dim InfoBuffer  As String
    Dim DataBuffer  As String
    Dim lPos        As Long
    Dim nRec        As Long
    
    
      '---Load the file, separate the 6 header lines, then parse the data part into the matrix.
      '------
      '---Load full file into a string buffer
    PrintL "Loading file " & FileToLoad
      sBuffer = FILE_Load(FileToLoad)
      
    PrintL "getting info and data parts ..."
      '---Now we have to remove first 6 lines
        '---First find the 6th occurrence of $CRLF
          lPos = InStr(sBuffer, $CRLF, 6)
        '---Then create two buffers: one for the info part and one for the data part
          InfoBuffer = LEFT$(sBuffer, lPos)
          DataBuffer = Mid$(sBuffer, lPos + 2)
      
    PrintL "Creating Matrix data ..."
      '---Now we parse data buffer
      Parse(DataBuffer, MyMatrix(), $CRLF , $SPC)
    
    
      '--Now get the number of lines and max number of columns parsed
      nLines = ubound(MyMatrix(1))
      nCols  = ubound(MyMatrix(2))
    
    
    PrintL "Lines:", nLines, "Columns:", nCols
      
    '---Write some info
    PrintL "Searching missing data ..."
    
    
    dim CountLine   as long
    dim CountCol    as long
    
    
    For CountLine = 1 To nLines
      For CountCol = 1 To nCols
        If MyMatrix(CountLine, CountCol) = "-9999" Then
          Incr nRec
        End If
      Next
    Next
    PrintL "Missing data found:", nRec
    
    
     
    PrintL Repeat$(79, "-")
    PrintL "Program terminated. Press any key to close."
    WaitKey
    
    Let me know and sorry again for the delay.

    Ciao
    Eros

    PS: a matrix scan can be a nice idea
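
    To illustrate what such a matrix scan would have to do, here is a minimal sketch in Python (not thinBasic, and not how ARRAY SCAN is implemented). It assumes the layout described in the first post: 6 header lines, then space-separated values, with -9999 marking missing data. The function name `count_missing` and the tiny sample are made up for the example.

```python
def count_missing(text, missing="-9999", header_lines=6):
    """Count fields equal to `missing` in the data part of an elevation file."""
    # Drop the header lines and keep only the data rows.
    rows = text.splitlines()[header_lines:]
    # Compare whole fields, so a value such as "-99990" is not counted.
    return sum(row.split().count(missing) for row in rows)


# Made-up sample: 6 header lines followed by a 3 x 3 grid.
sample = "\r\n".join(
    ["h1", "h2", "h3", "h4", "h5", "h6",
     "10 -9999 12",
     "-9999 -9999 15",
     "16 17 -99990"]
)
print(count_missing(sample))  # 3
```

    Note that Python's `split()` treats any run of whitespace as one separator, which may differ from how Parse handles `$SPC`.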
    www.thinbasic.com | www.thinbasic.com/community/ | help.thinbasic.com
    Windows 10 Pro for Workstations 64bit - 32 GB - Intel(R) Xeon(R) W-10855M CPU @ 2.80GHz - NVIDIA Quadro RTX 3000

  3. #3
    thinBasic author ErosOlmi's Avatar
    Here is a version with some timing added:

    uses "file"
    uses "console"
    
    
    Dim FileToLoad  As String Value APP_SourcePath & "cgn21w001.asc.txt"
    dim MyMatrix()  as string
    dim nLines      as long
    dim nCols       as long
    dim T0, T1      as quad
    
    
    Dim sBuffer     As String
    Dim InfoBuffer  As String
    Dim DataBuffer  As String
    Dim lPos        As Long
    Dim nRec        As Long
    Dim MyTimer     As cTimer
    
    
    MyTimer = New cTimer("Timer used to store elapsed time between stages")
    MyTimer.Start
    
    
      '---Load the file, separate the 6 header lines, then parse the data part into the matrix.
      '------
      '---Load full file into a string buffer
    PrintL "Input file: " & FileToLoad
    Print "Loading file ... "
      sBuffer = FILE_Load(FileToLoad)
    PrintL MyTimer.Elapsed
    
    
    Print "Getting info and data parts ..."
      '---Now we have to remove first 6 lines
        '---First find the 6th occurrence of $CRLF
          lPos = InStr(sBuffer, $CRLF, 6)
        '---Then create two buffers: one for the info part and one for the data part
          InfoBuffer = LEFT$(sBuffer, lPos)
          DataBuffer = Mid$(sBuffer, lPos + 2)
    PrintL MyTimer.Elapsed
    PrintL "Info size in bytes: ", Len(InfoBuffer)
    PrintL "Data size in bytes: ", Len(DataBuffer)
    
    
    
    
    Print "Creating Matrix data ... "
      '---Now we parse data buffer
      Parse(DataBuffer, MyMatrix(), $CRLF , $SPC)
    
    
      '--Now get the number of lines and max number of columns parsed
      nLines = ubound(MyMatrix(1))
      nCols  = ubound(MyMatrix(2))
    PrintL MyTimer.Elapsed
    
    
    PrintL "Lines:", nLines, "Columns:", nCols
      
    '---Write some info
    Print "Searching missing data ... "
    
    
    dim CountLine   as long
    dim CountCol    as long
    
    
    For CountLine = 1 To nLines
      For CountCol = 1 To nCols
        If MyMatrix(CountLine, CountCol) = "-9999" Then
          Incr nRec
        End If
      Next
    Next
    PrintL MyTimer.Elapsed
    PrintL "Missing data found:", nRec
    
    
     
    PrintL Repeat$(79, "-")
    PrintL "Total time:", MyTimer.Elapsed
    MyTimer.Stop
    
    
    PrintL "Program terminated. Press any key to close."
    WaitKey
    
    Almost all of the time is taken by the step that builds MyMatrix, that is, the Parse function.
    Maybe I can improve this step in future thinBasic versions.

  4. #4
    Thank you, Eros. Your code works well, based on initial tests. Only 15,000 more files to run it on!

    I guess my confusion was in defining an "Array" versus a "Matrix", but that's clear enough for me now.

    My work in thinBasic has just begun and I anticipate there will be more than enough questions to get past the threshold of four posts. I posted a couple of replies about an inline If...Then...Else If issue, but those don't show on the post count.

    Thanks for your work and assistance. I'll do my best to research before asking. Though I may have a couple of generalized questions in a couple of days.

    Lance

  5. #5
    thinBasic author ErosOlmi's Avatar
    Originally posted by LCSims:
    "I posted a couple of replies about an inline If...Then...Else If issue, but those don't show on the post count."
    You mean this post: http://www.thinbasic.com/community/p...hp?issueid=296
    It was posted in the support area, and unfortunately that area is not counted by the forum's anti-spam system.
    In any case, you are now aboard.

  6. #6
    thinBasic author ErosOlmi's Avatar
    Originally posted by LCSims:
    "Thank you, Eros. Your code works well, based on initial tests. Only 15,000 more files to run it on!"
    A lot of data.
    The example you posted is an 8 MB file; times 15000 files, that is around 120 GB of data.
    The posted file has 1201 rows by 1201 columns, so it has 1442401 items; times 15000 files, that comes to 21636015000 items (more than 21 billion).
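
    The arithmetic can be checked with a few lines of Python (used here only as a calculator; the variable names are arbitrary, and 8 MB per file is the approximate size mentioned above):

```python
rows = cols = 1201
files = 15_000
mb_per_file = 8  # approximate size of the posted file

items_per_file = rows * cols
total_items = items_per_file * files
total_gb = mb_per_file * files / 1000  # decimal units: 1 GB = 1000 MB

print(items_per_file)  # 1442401
print(total_items)     # 21636015000
print(total_gb)        # 120.0
```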

    Depending on what you need to do with such a huge amount of data, maybe consider writing a script that loads the data into a database for further analysis there.
    You can make a table with a few fields:
    1. fileID (used to trace the data back to its source file)
    2. row
    3. column
    4. value
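
    As a rough sketch of that table layout, here is a Python example using the standard `sqlite3` module with an in-memory database. The table name `elevation`, the column names `file_id`, `row_no`, `col_no`, and the helper `load_grid` are all assumptions made for the illustration; the post only names the four fields.

```python
import sqlite3


def load_grid(conn, file_id, grid):
    """Insert one (file_id, row, column, value) record per grid cell."""
    conn.executemany(
        "INSERT INTO elevation (file_id, row_no, col_no, value) "
        "VALUES (?, ?, ?, ?)",
        [(file_id, r, c, v)
         for r, line in enumerate(grid, start=1)
         for c, v in enumerate(line, start=1)],
    )


conn = sqlite3.connect(":memory:")  # in-memory database, for demonstration only
conn.execute(
    """
    CREATE TABLE elevation (
        file_id INTEGER NOT NULL,
        row_no  INTEGER NOT NULL,
        col_no  INTEGER NOT NULL,
        value   INTEGER NOT NULL,
        PRIMARY KEY (file_id, row_no, col_no)
    )
    """
)

# Made-up 2 x 2 grid standing in for one elevation file.
load_grid(conn, 1, [[10, -9999], [-9999, 13]])

missing = conn.execute(
    "SELECT COUNT(*) FROM elevation WHERE value = -9999"
).fetchone()[0]
print(missing)  # 2
```

    Once the data is in such a table, counting missing values per file becomes a single query grouped by `file_id`.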

  7. #7
    thinBasic author ErosOlmi's Avatar
    Dear Lance,

    thanks to your example I was able to improve the execution speed of the PARSE function by almost 10 times when it is used with quadratic string buffers.

    The posted example, which parses an 8 MB file into a 1201 x 1201 string matrix, was taking 24 seconds to execute.
    With the new version of the PARSE function it takes 2.5 seconds.
    I was doing too many string parsing operations inside a loop when I could do them just once.
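
    The thread does not show the internal change to PARSE, so the following is only a generic Python illustration of the kind of fix described: repeating an expensive split inside a loop versus doing it once up front. Both function names are invented for the example.

```python
def build_matrix_slow(buffer):
    """Re-splits the whole buffer on every iteration (wasteful)."""
    n_rows = len(buffer.split("\r\n"))
    matrix = []
    for i in range(n_rows):
        line = buffer.split("\r\n")[i]  # full split repeated for each row
        matrix.append(line.split(" "))
    return matrix


def build_matrix_fast(buffer):
    """Splits the buffer into lines once, then splits each line."""
    lines = buffer.split("\r\n")  # done a single time
    return [line.split(" ") for line in lines]


data = "1 2 3\r\n4 5 6\r\n7 8 9"
print(build_matrix_slow(data) == build_matrix_fast(data))  # True
print(build_matrix_fast(data)[1])  # ['4', '5', '6']
```

    Both versions produce the same matrix; the slow one simply repeats work whose cost grows with the number of rows.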

    I will post a preliminary thinBasic 1.9 version very soon so you will be able to test it.
    It should considerably change the time needed to process 15000 files.

    Ciao
    Eros

  8. #8
    Greets Eros,

    Something good came out of my hacking? I'm pleased!

    My data sets cover the planet, as I make add-ons for a flight simulator. The 1201x1201 sets are very low resolution compared to some other areas. The main body of the U.S. comes in 11801x11801 for one degree of latitude and longitude. Even more precise data gets much, much larger. I found thinBasic because the other variant I was using choked on the 11801x data, literally froze up. I'll be making quarter files in a lot of areas, as adjoining areas may have to be loaded and I'd hate to tax my system with three files loaded into "matrixes" of 11801x.

    Right now I'm preparing to start looping through the 15,000 valid files to see how much invalid data, if any, is in the files. Some is to be expected. Running on a secondary computer should take about 15 hours or so, maybe more? But that's what secondary systems are for!

    Thanks again for your efforts. I'll come back next week with a couple of generalized questions.

    Lance

  9. #9
    thinBasic author ErosOlmi's Avatar
    OK, I've uploaded a preliminary thinBasic 1.9.0.0

    It implements a super fast version of the PARSE function for the case where a "quadratic" string buffer is parsed into a matrix.
    As I said, in my tests using Lance's data (attached to the first post of this thread) the parsing step went from 22 seconds to less than 1 second.

    I've also introduced a new optional parameter that reduces the time spent determining the number of columns.
    To determine the maximum number of columns in the string buffer, PARSE scans all the lines to find which line has the most columns, in order to dimension the matrix.
    If you are sure that the number of columns is the same for all lines, this new parameter can be set to 1 to tell the PARSE function to use just one line to determine the number of columns.
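
    A small Python sketch of that trade-off, assuming space-separated fields. The function `column_count` and its `fixed_columns` flag are hypothetical stand-ins, not the actual PARSE parameter (the post does not give its name or position).

```python
def column_count(lines, fixed_columns=False):
    """Return the number of columns needed to hold every line."""
    if not lines:
        return 0
    if fixed_columns:
        # Trust the caller: inspect a single line only.
        return len(lines[0].split(" "))
    # Default: scan every line and keep the widest one.
    return max(len(line.split(" ")) for line in lines)


uniform = ["1 2 3", "4 5 6"]
ragged = ["1 2", "3 4 5 6", "7"]

print(column_count(uniform, fixed_columns=True))  # 3
print(column_count(ragged))                       # 4
print(column_count(ragged, fixed_columns=True))   # 2, too small for ragged data
```

    The shortcut saves a full pass over the buffer, but as the last call shows it is only safe when every line really has the same number of columns.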

    Url is: http://www.thinbasic.biz/projects/th...ic_1.9.0.0.zip

    Let me know.
    Eros

  10. #10
    Thank you for the update, Eros. I have a sub-set of data files that I use to test my programming on. One folder has 50 files, the other 92. Each file maintains the same structure as the file you looked at, 1201 rows and 1201 columns. I have my program set up to loop through all the files in a folder, with a timer running. I'll refer to 1.8.9 as old and 1.9.0 as new:

    50 files old = 1206.7sec new = 161.2sec
    92 files old = 2221.6sec new = 313.9sec

    A big savings, especially when working on a large scale. I didn't implement the optional parameter setting for columns, but will take a look and learn about that. As the data sets are uniform in structure, that parameter could be worthwhile.
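
    To put those timings in perspective, here is a short Python calculation (the helper `summarize` is made up for the example). The extrapolation to 15,000 files assumes every file takes about as long as the ones in the two test folders, which may not hold for the larger data sets.

```python
def summarize(n_files, old_s, new_s, total_files=15_000):
    """Return (speedup, projected old hours, projected new hours)."""
    speedup = old_s / new_s
    old_hours = old_s / n_files * total_files / 3600
    new_hours = new_s / n_files * total_files / 3600
    return speedup, old_hours, new_hours


for n_files, old_s, new_s in [(50, 1206.7, 161.2), (92, 2221.6, 313.9)]:
    speedup, old_h, new_h = summarize(n_files, old_s, new_s)
    print(f"{n_files} files: {speedup:.1f}x faster, "
          f"15000 files ~ {old_h:.0f} h old vs ~ {new_h:.0f} h new")
```

    Both folders show a speedup of a little over 7x, which would bring a full run from roughly 100 hours down to about 13 to 14 hours under that assumption.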

    Lance
