ReneMiner
09-10-2022, 21:58
As a programming language, thinBasic only needs the characters ASCII 0 to 127, right?
We use UTF-8, but except for the content of quoted strings thinBasic does not really require any Unicode characters.
We could just as well use plain 7-bit ASCII (0 to 127) and still put the "useless" BOM in front, though some text editors might display it.
"Useless" because single-byte characters have no byte order to mark. But the editor could instead append 1 byte at the end of a script!
Or 1000 bytes after a script-terminating zero byte.
For example, to hide secret information that is not displayed in the editor but is attached to the script.
The very last byte could be a clue to which encoding is used, 7-bit ASCII or UTF-8.
But before that final byte we could have a whole lot of sections that store internal or other information!
For example:
how the designer wants to create windows or controls, stored in a form the interpreter can read and realize. It needs no human readability as long as the visual designer can reload and modify it again.
The programmer can set points in the designer where a script function is called, and choose which controls should react and call an event procedure. The designer can parse whether these Callback Controlname_OnAction() are present in the script and
whether Uses "UI" is contained. For adding more controls at runtime there is the classic way, as well as the possibility of disabling controls initially to hide them.
Also other kinds of binary resources, miniature DLL modules or config settings, an INI file providing defaults, or XML can be hidden after a script-terminating $NUL char.
- bound and sealed - attached to the code.
Visible only in hex editors or tools that know what to find there.
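To make the idea concrete, here is a minimal sketch in Python (not thinBasic) of attaching hidden bytes after a script-terminating zero byte and reading them back. The function names attach_payload and split_script are made up for this illustration, and nothing here reflects how thinBasic or its editor actually treat a NUL byte; that part is an assumption.

```python
def attach_payload(source: str, payload: bytes) -> bytes:
    """Append a NUL terminator and an arbitrary binary payload to script text."""
    encoded = source.encode("utf-8")
    if b"\x00" in encoded:
        # The visible script must not contain the terminator itself.
        raise ValueError("script text must not contain a NUL character")
    return encoded + b"\x00" + payload


def split_script(data: bytes) -> tuple[str, bytes]:
    """Split file content at the first NUL into (visible script, hidden payload)."""
    head, _sep, tail = data.partition(b"\x00")
    return head.decode("utf-8"), tail


if __name__ == "__main__":
    blob = attach_payload('Uses "UI"\n', b"\x01\x02\x00\x03")
    script, hidden = split_script(blob)
    print(repr(script), hidden)
```

Only the first zero byte counts as the terminator, so the payload itself may contain further zero bytes. A file without any zero byte simply yields an empty payload.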
Maybe with a switch to turn on a professional mode of the editor
or
maybe machine-code instructions?
Some precompiled binaries?
Remember DATA sections and READ keywords?
Just asking to trigger ideas...
However, my point was: the code actually uses only chars in the range 0 to 127.
- zero not really, but it is a signal to terminate!
Why should it terminate only a string when it can terminate a script?
At least the visible code.
"When this were a string to store"
It requires 33 bytes in memory if a delimiter follows or if it is a zero-terminated string.
As a dynamic string it requires 32 bytes (+ 4 bytes StrPtr + 4 bytes dwStrLen + 4 bytes VarPtr, and maybe a lot more to manage this piece of information so it is present the moment we call for it).
I say if we use dynamic strings as a container for another type of string restricted to the ASCII range, they can store much more.
The quoted line above will not require more than 26 bytes with a little trick and a little assembler routine.
Without needing any additional byte, we find a (native) delimiter that our systems provide and that is already in place, which can
terminate all members of an "ASCII-String-Array".
The machine-language routine would read in the string and store it as ASCII; the last char of each token ("verb") would get bit value 128 set, and the following space char would not be stored.
In the other direction, from the stored ASCII string back to the memory we use in a variable or send to output:
if a word char terminated by bit 128 is followed by another word char (not by punctuation or a chr$( <= 32 )), and the data is defined to be an ASCII string, then a space is inserted, except at the last position of the stored memory
(where the space would be CHR$(160)).
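The packing rule can be tried out with a small sketch in Python, standing in for the little assembler routine. The names is_word_char, pack and unpack are invented for this illustration, and "word char" is assumed to mean letters, digits and underscore. A space is dropped only when another word char follows it, so the round trip stays lossless; the CHR$(160) case for a trailing space is left out, and a trailing space is simply stored as it is.

```python
def is_word_char(b: int) -> bool:
    """Assumed definition of a word char: letter, digit or underscore."""
    c = chr(b & 0x7F)  # ignore the marker bit
    return c.isalnum() or c == "_"


def pack(text: str) -> bytes:
    """Set bit 128 on the last char of each word and drop the space after it."""
    raw = text.encode("ascii")  # raises UnicodeEncodeError above code 127
    out = bytearray()
    i, n = 0, len(raw)
    while i < n:
        b = raw[i]
        ends_word = is_word_char(b) and (i + 1 == n or not is_word_char(raw[i + 1]))
        if ends_word:
            out.append(b | 0x80)
            # Drop exactly one space, and only if a word char follows it.
            if i + 2 < n and raw[i + 1] == 0x20 and is_word_char(raw[i + 2]):
                i += 1
        else:
            out.append(b)
        i += 1
    return bytes(out)


def unpack(packed: bytes) -> str:
    """Clear bit 128 again and re-insert the spaces that pack() dropped."""
    out = []
    for i, b in enumerate(packed):
        out.append(chr(b & 0x7F))
        if b & 0x80 and i + 1 < len(packed) and is_word_char(packed[i + 1]):
            out.append(" ")
    return "".join(out)


if __name__ == "__main__":
    line = "When this were a string to store"
    packed = pack(line)
    print(len(line), len(packed), unpack(packed) == line)
```

For the quoted line, the 32 chars consist of 26 letters and 6 spaces, and all 6 spaces sit between two words, so the packed form is 26 bytes long.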