zak
22-04-2011, 08:56
I want to remind users that regular expressions can be used to search for patterns. Suppose you want to search pi for your birthday year 1902, followed by zero or more digits, followed by your wife's birthday year 1924, without overlapping matches. We will use this example as a template: VBRegExp_Test_MatchesAndCollections.tbasic in C:\thinBasic\SampleScripts\VBRegExp
The pattern is "1902.*?1924" and the text is the attached pi.txt. Beware: it is one continuous run of up to 2 million digits with no newlines, so Notepad or WordPad may hang on Windows XP. I am using the freeware Notepad++ from http://notepad-plus-plus.org/release/5.9, which can display such a file.
The search results are saved to a file, since the console cannot display text that may be this large.
If the string to search is 34190242819244412441902234192456, then applying the regex 1902.*?1924 gives these matches:
19024281924
19022341924
The meaning of .*? in 1902.*?1924: . matches any character except a newline (here, any digit), * means zero or more of the preceding item (the .), and the ? after the * suppresses the engine's greedy behaviour, so it searches for the smallest possible match instead of the widest one.
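To see the difference between the lazy and greedy forms without loading pi.txt, here is a small sketch in Python (not thinBasic) that runs both patterns on the short example string above. It only checks the regex behaviour; the variable names are made up for illustration, and Python reports positions starting from 0.

```python
import re

# Example string from the post above.
text = "34190242819244412441902234192456"

# Lazy quantifier: each match stops at the first "1924" after a "1902".
lazy = [(m.start(), m.group()) for m in re.finditer(r"1902.*?1924", text)]

# Greedy quantifier: the match runs on to the last "1924" in the text.
greedy = [(m.start(), m.group()) for m in re.finditer(r"1902.*1924", text)]

for name, found in (("lazy", lazy), ("greedy", greedy)):
    print(name)
    for start, value in found:
        print(f"  position {start}, length {len(value)}: {value}")
```

The lazy pattern gives the two short matches listed above, while the greedy one swallows both into a single long match. Since pi.txt holds only digits, . and \d behave the same there; on mixed text, 1902\d*?1924 would be the stricter choice.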
Attached are the same VBRegExp_Test_MatchesAndCollections.tbasic, slightly modified, and pi.txt. I attached pi.txt so you can experiment with a huge text, but you can use any text and any regex.
'---The following code illustrates how to obtain a Matches collection from a regular
'---expression search and how to access its individual members.
Uses "VBREGEXP", "file"
dim lpRegExp as dword
dim lpMatches as dword
dim lpMatch as dword
Dim strValue, sPi As String
'---Allocate a new regular expression instance
lpRegExp = VBREGEXP_New
sPi = FILE_Load(APP_SourcePath + "pi.txt")
'---Check if it was possible to allocate and if not stop the script
if isfalse lpRegExp then
  MSGBOX 0, "Unable to create an instance of the RegExp object." & $crlf & "Script terminated"
  stop
end if
'---Set pattern
VBRegExp_SetPattern lpRegExp, "1902.*?1924"
'---Set case insensitivity
VBREGEXP_SetIgnoreCase lpRegExp, -1
'---Set global applicability
VBRegExp_SetGlobal lpRegExp, -1
'---Execute search
lpMatches = VBRegExp_Execute(lpRegExp, sPi)
IF ISFALSE lpMatches THEN
  MSGBOX 0, "1. No match found"
else
  dim nCount as long value VBMatchCollection_GetCount(lpMatches)
  IF nCount = 0 THEN
    MSGBOX 0, "2. No match found"
  else
    '---Iterate the Matches collection
    dim I as long
    strValue += "Total matches found: " & nCount & $CRLF & string$(50, "-") & $crlf
    FOR i = 1 TO nCount
      lpMatch = VBMatchCollection_GetItem(lpMatches, i)
      IF ISFALSE lpMatch THEN EXIT FOR
      strValue += "Match number " & i & " found at position: " & VBMatch_GetFirstIndex(lpMatch) & " length: " & VBMatch_Getlength(lpMatch) & $CRLF
      strValue += "Value is: " & VBMatch_GetValue(lpMatch) & $CRLF
      strValue += "--------------" & $CRLF
      VBREGEXP_Release lpMatch
    NEXT
    'MSGBOX 0, strValue
    'PrintL strValue
    FILE_Save(APP_SourcePath +"results.txt",strValue)
  END IF
END IF
IF istrue lpMatches THEN VBREGEXP_Release(lpMatches)
IF istrue lpRegExp THEN VBREGEXP_Release(lpRegExp)
MsgBox 0,"Results saved to results.txt"