Tokenizer: Configurizable?



Michael Hartlef
29-04-2007, 21:08
Hi folks,

I looked over the tokenizer sample script and I liked what I saw. My question is:

Can the returned token types be configured, that is, can new ones be added or the current ones changed?

Michael

ErosOlmi
01-05-2007, 14:27
Hi Mike,

Please give me more details about your idea. I like it a lot but would like to know more.
I imagine you are talking about passing lists of tokens and assigning them a new token type, or something like that.

Let me know.
Eros

Michael Hartlef
01-05-2007, 15:07
Exactly: passing a list of new tokens and assigning them to a new token type.
Or changing the current token types, that is, changing what they are assigned to.

To learn thinBasic better, I got the idea of porting/recoding my FANTOM project to thinBasic. Nothing to compete with thinBasic, no no. Just to see how far I could get. And the tokenizer is a great feature of TB. I know I could write my own GetToken function that takes the output of the TB command and analyses it. But if there were a way to customize the tokenizer, that would be great. Even as it is now, it is a great feature.

ErosOlmi
01-05-2007, 15:19
OK Mike. I will see what I can do. I like it, and it should not be too difficult.

kryton9
02-05-2007, 09:52
Mike, I looked over your new site a few times in the past few weeks, but I couldn't find the Fantom page. I really loved the logo you had done for it and the colour scheme on the page.

Maybe you can be part of the thinBasicII development team in the future. TB2, with full support for objects, like Smalltalk and Ruby but faster!!

We need to use that fantom logo somewhere in the game. That was sweet looking from what I remember!!

ErosOlmi
02-05-2007, 16:13
Mike,
a possible idea:

Tokenizer_GetNextToken would change from


Tokenizer_GetNextToken(MyBuffer, CurrentPosition, TokenType, Token)

to


Tokenizer_GetNextToken(MyBuffer, CurrentPosition, TokenType, Token [, TokenSubType])

so an optional sub type can be returned.

Then I would create a new module function that allows adding new special keys. Something like:


Tokenizer_KeyAdd(NewKeyString, KeyMainType, KeySubType)

For example you can add keys like:


%Fantom = 100
%Fantom_Var = 1
%Fantom_Int = 2
%Fantom_As = 3

Tokenizer_KeyAdd("var", %Fantom, %Fantom_Var)
Tokenizer_KeyAdd("int", %Fantom, %Fantom_Int)
Tokenizer_KeyAdd("as" , %Fantom, %Fantom_As)


When Tokenizer_GetNextToken finds a token of %TOKENIZER_STRING type, it will make an additional search among the user-added keywords. If the token is found there, the new main type and an optional subtype will be returned.

I will use an internal hash table that will store a special structure for every new key added.
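
The lookup described above can be modelled roughly as follows. This is only an illustrative Python sketch, not thinBasic code and not the module's actual internals: the names `key_add` and `classify`, the numeric value given to `TOKENIZER_STRING`, and the case-insensitive matching are all assumptions made for the example.

```python
# Illustrative model only; not thinBasic and not the module's real internals.
TOKENIZER_STRING = 3  # hypothetical numeric value for the built-in string token type

FANTOM = 100
FANTOM_VAR, FANTOM_INT, FANTOM_AS = 1, 2, 3

user_keys = {}  # plays the role of the internal hash table


def key_add(key, main_type, sub_type=0):
    """Register a user keyword (stored upper-cased; case-insensitivity is assumed)."""
    user_keys[key.upper()] = (main_type, sub_type)


def classify(token, token_type):
    """Return (main_type, sub_type); only string tokens are looked up in the user table."""
    if token_type == TOKENIZER_STRING:
        return user_keys.get(token.upper(), (token_type, 0))
    return (token_type, 0)


key_add("var", FANTOM, FANTOM_VAR)
key_add("int", FANTOM, FANTOM_INT)
key_add("as", FANTOM, FANTOM_AS)

print(classify("VAR", TOKENIZER_STRING))    # (100, 1)
print(classify("hello", TOKENIZER_STRING))  # (3, 0)
```

A token that is not in the table simply keeps its original type, so scripts that never add keys would see no change in behaviour.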

What do you think? Any other idea?
Let me know.
Eros

ErosOlmi
02-05-2007, 17:37
Mike,

Sorry, but I didn't wait for your reply :P
Please find attached the new module and an example. It should give you an idea of what I was talking about.
I'm still open to any request or change.

Let me know.
Eros

Michael Hartlef
02-05-2007, 18:28
Thanks Eros, I will give it a whirl :D

Michael Hartlef
02-05-2007, 18:41
Works like a charm. I like it.

Do you see a possibility to change the type of the predefined ones?
For example, the $ character is a delimiter right now. I would like to include it in the STRING type. Would there be an easy way for you to implement this? Or the ":" character: I would like to change it to be an EOL token.

I hope I made myself clear. If not, please ask again.

ErosOlmi
02-05-2007, 18:46
Let me check.
For now I say "yes, of course", but I need to change a bit of code because delimiters are currently hard-coded.
I will change them from hard-coded to dynamic, with the possibility to change the predefined main type and sub type.

Michael Hartlef
02-05-2007, 19:33
Whatever you think is fine. As I said, even now I can work with it perfectly. Thanks for all the improvements.

ErosOlmi
02-05-2007, 20:57
Mike, I've done it. I hope you will like it.
You will be able to assign any of the 255 ASCII table characters to any of the following groups: NewLine, Space, DQuote, Delim, Numeric, Alpha.
This can be done at any time and anywhere during script execution. You will also be able to set characters as a group, that is, pass a list of characters to be assigned to a group.

An example:


'---Create a new keywords group. Assign a value >= 100
dim MyDictionary as long value 100

'---Add keywords to the group
tokenizer_KeyAdd("DIM" , MyDictionary, 1)
tokenizer_KeyAdd("AS" , MyDictionary, 2)
tokenizer_KeyAdd("CONSOLE_writeline" , MyDictionary, 3)
'...
'---Add as many keywords as you need
'...

'---Now change some default behaviour in char parsing
Tokenizer_Default_Char("$", %TOKENIZER_DEFAULT_Alpha)
Tokenizer_Default_Char("%", %TOKENIZER_DEFAULT_Alpha)
Tokenizer_Default_Char(":", %TOKENIZER_DEFAULT_newline)
Tokenizer_Default_Char(";", %TOKENIZER_DEFAULT_newline)

'---... or even faster
Tokenizer_Default_Set("$%", %TOKENIZER_DEFAULT_Alpha)
Tokenizer_Default_Set(":;", %TOKENIZER_DEFAULT_newline)
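
To illustrate why reassigning a character's group changes the tokens that come out, here is a rough Python model of a table-driven character classifier. It is only a sketch under assumptions: the default group assignments, the function names and the simplified splitting rules (no special handling of quoted strings) are invented for the example and are not taken from the module.

```python
# Illustrative model only; names and defaults are assumptions, not the module's internals.
NEWLINE, SPACE, DQUOTE, DELIM, NUMERIC, ALPHA = range(6)


def default_table():
    """Build a 256-entry table mapping each character code to a group."""
    table = [DELIM] * 256
    for c in "\r\n":
        table[ord(c)] = NEWLINE
    for c in " \t":
        table[ord(c)] = SPACE
    table[ord('"')] = DQUOTE
    for c in "0123456789":
        table[ord(c)] = NUMERIC
    for c in "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_":
        table[ord(c)] = ALPHA
    return table


def default_set(table, chars, group):
    """Reassign every character in `chars` to `group` (the batch form)."""
    for c in chars:
        table[ord(c)] = group


def split_tokens(text, table):
    """Join consecutive ALPHA/NUMERIC chars into words; emit other chars singly; skip spaces."""
    tokens, word = [], ""
    for ch in text:
        group = table[ord(ch)]
        if group in (ALPHA, NUMERIC):
            word += ch
            continue
        if word:
            tokens.append(word)
            word = ""
        if group != SPACE:
            tokens.append(ch)
    if word:
        tokens.append(word)
    return tokens


table = default_table()
print(split_tokens("name$ = 1", table))  # ['name', '$', '=', '1']
default_set(table, "$%", ALPHA)
print(split_tokens("name$ = 1", table))  # ['name$', '=', '1']
```

Because the group is looked up in a table on every character, it can be changed at any point during a run and takes effect from the next character onwards.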


I'm going home now (I'm at work right now).
I need to fix a few points and then post a new update. Maybe tomorrow, sorry.

Ciao
Eros

Michael Hartlef
02-05-2007, 22:20
YES, I like it! :) You give too much! :)