Mixed Case Formatter

kryton9
03-06-2008, 02:45
I ended up with two versions. To avoid confusion, I will post only one here, the one I think is the best way to go.
First the code, followed by usage:

Uses "ui"
Uses "File"
Uses "Console"
Dim Source , List As String
Dim Filter , s , l As String
Dim lcs, lcl As String
Dim SourceNumLines , x , i , ListNumLines As Long

Filter = "ThinBasic Files ( *.tBasic , *.tBasicc ) |*.tBasic;*.tBasicc|"
Filter += "All Files ( *.* ) |*.*"
Source = Dialog_OpenFile ( 0 , "Open a Source File" , Dir_GetCurrent , Filter , "tBasic" , %Ofn_FileMustExist Or %Ofn_HideReadOnly Or %Ofn_EnableSizing )
List = App_SourcePath + "MixedCaseWordListv2.txt"

Dim Sources As String Value File_Load ( Source )
Dim Lists As String Value File_Load ( List )
Dim FileOut As String Value "OutFileMixedCase.txt"
Dim OutPutText As String
Dim SourceLines ( ) , ListLines ( ) As String

Dim Count As Long
Dim pos As Long Value -1
Dim nTimes As dWord = 1

SourceNumLines = ParseCount ( Sources , $crlf )
SourceNumLines = Parse ( Sources , SourceLines , $crlf )

ListNumLines = ParseCount ( Lists , $crlf )
ListNumLines = Parse ( Lists , ListLines , $crlf )

For Count = 1 To SourceNumLines
Console_Cls
Print "Working" + $crlf
Print Count + " / " + SourceNumLines
' Lower-case the whole line once; looping over every character only repeated the same call.
SourceLines ( Count ) = lCase$ ( SourceLines ( Count ) )
s = SourceLines ( Count )
For i = 1 To ListNumLines
l = ListLines(i)
lcs = lCase$(s)
lcl = lCase$(l)
While pos < Len(s)
pos = Instr( lcs , lcl ,nTimes)
If pos > 0 Then
Mid$(s,pos,Len(l)) = ListLines(i)
Incr nTimes
End If
If pos = 0 Then Exit While
Wend
pos = -1
nTimes = 0
Next
OutPutText += s + $crlf
Next
File_Save ( App_SourcePath + FileOut , OutPutText )

This program will ask you to open the script file you want to convert to mixed case.
It will then use "MixedCaseWordListv2.txt" to format your script and
write the result to a new file, "OutFileMixedCase.txt".

The beauty of this workflow is that all the complicated logic is handled by your word list.
I will attach an example list that is by no means complete, but it should give you an idea of how your list should be created and maintained.
Basically, a word overrides earlier words when it sits further down the list.
So short words that appear often and would mess up the look:
as to if on
get overwritten when needed by
was ton gif won
which in turn can be overwritten by words further down the list:
wash button gift wonder
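
To make the override rule concrete, here is a minimal sketch of the same idea in Python rather than thinBasic, so it can be run and checked on its own. It is not the program above: the function name `apply_word_list` and the sample words are made up for illustration, and it assumes plain ASCII text.

```python
def apply_word_list(line, words):
    """Lower-case the line, then stamp each list word's casing over every match."""
    chars = list(line.lower())
    lowered = "".join(chars)          # fixed lower-case copy, used only for searching
    for word in words:                # list order matters: later words overwrite earlier ones
        needle = word.lower()
        if not needle:
            continue
        start = lowered.find(needle)
        while start != -1:
            chars[start:start + len(word)] = word   # overwrite in place, same length
            start = lowered.find(needle, start + 1)
    return "".join(chars)


# Hypothetical list, shortest words first, as described above.
words = ["As", "To", "If", "On", "Was", "Ton", "Gif", "Won",
         "Wash", "Button", "Gift", "Wonder"]
print(apply_word_list("WASH THE BUTTON", words))
```

Along the way "wash" briefly becomes "wAsh" and "button" becomes "butTOn", but the longer entries "Wash" and "Button" come later in the list and overwrite those spans. A word that is not in the list, such as "the", simply stays lower case.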

To make things easy, the attached file includes an MS Works spreadsheet. MS Works comes free on all XP machines as far as I know.
I set up a simple sheet that takes your word list in column B and puts the length of each word in column A.
Just add words to the bottom of your list.
Use fill down to copy the formula into the new rows of column A and then sort by
col a asc
col b asc
Then select all your words in column B and paste them into "MixedCaseWordListv2.txt"
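
If you would rather not use a spreadsheet, the same sort (length ascending, then alphabetical) is a few lines of code. This is only a sketch in Python, not part of the attached files; `sort_word_list` is a hypothetical helper name, and dropping blank and duplicate entries is an extra assumption on my part.

```python
def sort_word_list(words):
    """Sort by word length first, then alphabetically (ignoring case)."""
    unique = {w.strip() for w in words if w.strip()}   # drop blanks and exact duplicates
    # The final `w` in the key keeps the order stable for entries that differ only in case.
    return sorted(unique, key=lambda w: (len(w), w.lower(), w))


sample = ["Wonder", "On", "Button", "As", "Won", "Gift", "If", "On", ""]
for word in sort_word_list(sample):
    print(word)
```

The printed lines could then be pasted into the word list file in that order.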

It probably sounds a lot scarier than it actually is to do.

kryton9
03-06-2008, 03:51
OK, just updated it so it can clean test.tbasic and itself: makeMixedCaseV2.tbasic

If you look at the code, it handles what would otherwise be tough to do, like determining when to make TBGL, TBGL, or tbgl. This 2-pass method really makes it easy.

Remember to check first whether a word or keyword is in the big list. For instance, I had to add Dim, Len, etc.
Then run the program and check what happens to the new words added to the big list.

Then, if the result is not correct, copy the incorrect version and paste it into word list 2 together with the correction, in the form:
incorrect,correct
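
As a rough illustration of that second pass, here is a sketch in Python (not the thinBasic script itself). It assumes each line of word list 2 holds one `incorrect,correct` pair; the name `apply_overrides` and the sample pairs are invented for the example.

```python
def apply_overrides(text, override_lines):
    """Second pass: replace each known-bad casing with its corrected form."""
    for entry in override_lines:
        wrong, sep, right = entry.partition(",")
        wrong, right = wrong.strip(), right.strip()
        if sep and wrong and right:        # skip blank or malformed lines
            text = text.replace(wrong, right)
    return text


# Hypothetical contents of word list 2.
overrides = [
    "cOnSole_wrItelIne,Console_WriteLine",
    "fOlder,Folder",
]
print(apply_overrides("cOnSole_wrItelIne fOlder", overrides))
```

The match is case-sensitive on purpose: only the exact garbled form that the first pass produced gets replaced.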

The latest version will always be in the first post.

kryton9
03-06-2008, 07:07
I got a new idea that kicked in while updating the lists in the other style.

I now take the big list and sort it by word length in ascending order, and then alphabetically.
Take the word begin as an example. The program first sees
Be, then In later, and possibly Gin even later on (don't forget, the list is sorted by length and then by alphabet). So it would go something like this:

begin
Begin
BegIn
BeGin
Begin
Since Begin is 5 characters long, it overrides the first shorter matches.
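
The stages above can be reproduced with a small trace. This is a sketch in Python rather than the thinBasic `instr`/`mid$` code, and `trace_casing` is a hypothetical name; it records the text after every list word that matched.

```python
def trace_casing(text, words):
    """Return the text after each list word that matched, starting from all lower case."""
    chars = list(text.lower())
    lowered = "".join(chars)           # fixed lower-case copy used for searching
    stages = [lowered]
    for word in words:                 # words are assumed sorted by length, then alphabet
        needle = word.lower()
        start = lowered.find(needle) if needle else -1
        matched = False
        while start != -1:
            chars[start:start + len(word)] = word   # overwrite the span with the list casing
            matched = True
            start = lowered.find(needle, start + 1)
        if matched:
            stages.append("".join(chars))
    return stages


for stage in trace_casing("BEGIN", ["Be", "In", "Gin", "Begin"]):
    print(stage)
# prints begin, Begin, BegIn, BeGin, Begin (one per line)
```

Because the longest entry is applied last, it has the final say over every character it covers.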

I can't use replace$ for this so I am using instr and some mid$ trickery.
So far initial tests are good.

Still tinkering with tests and debug sessions tracing it.

But I wanted to let you guys know I may be onto another route. The first one works, but we would have to keep adjusting it for many scripts until we build up a great second override list.

If this second way works, only 1 list is needed.

Petr Schreiber
03-06-2008, 08:44
Hi Kent,

Very interesting proggie.
Sometimes the output is a bit funny, like:


fOlder
cOnSole_wrItelIne


The biggest "danger" is that it affects even string literals. It may be better to use a tokenizer to get the words, then check whether each token is OK to replace or is a string literal, and then safely REPLACE$ just the token you are handling.


Thanks,
Petr
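
A rough sketch of the tokenizer idea, written in Python with a regular expression instead of the thinBasic tokenizer so it can be tested on its own. The names `TOKEN` and `format_tokens` are made up, and it assumes string literals are double-quoted with no embedded quotes; comments are not handled.

```python
import re

# Either a double-quoted string literal, or a run of identifier characters.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_][A-Za-z0-9_$]*')


def format_tokens(line, words):
    """Re-case whole tokens only, leaving string literals untouched."""
    casing = {w.lower(): w for w in words}

    def fix(match):
        token = match.group(0)
        if token.startswith('"'):
            return token                            # string literal: keep as written
        return casing.get(token.lower(), token)     # unknown tokens are left alone

    return TOKEN.sub(fix, line)


words = ["Console_WriteLine", "Console", "On", "Old"]
print(format_tokens('console_writeline("open folder")', words))
```

Since only whole tokens are looked up, short list entries such as "On" or "Old" can no longer turn up inside a longer word like folder.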

kryton9
03-06-2008, 22:33
Petr, that is what I meant by garbled; you can then fix it in word list two.

The second method I am developing is the answer. There is a bug that is hard to trace even with the debugger, because it goes through so many words that stepping through it is too tedious. Hopefully today I can solve the puzzle and have a nice working version.

kryton9
04-06-2008, 00:28
The program is finished. I will change my first post and fill in the details there.