
i18n internationalization functionality in thinbasic / thinair



DirectuX
28-10-2018, 18:27
Hi,

I can't find anything (bundled examples, manual, forum) about the internationalization (https://en.wikipedia.org/wiki/Internationalization_and_localization) capability of thinbasic, maybe (or not) something similar to gettext (https://en.wikipedia.org/wiki/Gettext).



Did I miss something?

If not, what are your thoughts on this (with thinbasic in mind)?

ErosOlmi
28-10-2018, 18:38
Nothing specific is present in thinBasic but if we can discuss a bit about that ... maybe we can produce something

DirectuX
28-10-2018, 20:24
we can discuss a bit about that ... maybe we can produce something
I assume "we" means the community.
For my part: nothing unnecessarily complicated. I think it's more convenient to code with internationalization in mind from the beginning than to rework the code afterwards. Do you agree?
Then, do we want a solution specific to thinbasic, something compatible with existing tools, or a mix of both: inspired by existing solutions and as simple as possible in thinBasic?

ErosOlmi
28-10-2018, 21:01
Something doable with the AppConfig module could look like this.

Create an XML file like the following and call it locale.xml.
The first level holds the locale codes, the second level the keys and their translations.
Keys are repeated in each language.

<?xml version="1.0" encoding="utf-8"?>
<AppConfig>

  <en-GB>
    <Good_Morning>Good Morning</Good_Morning>
    <What_Is_Your_Name>What is your name?</What_Is_Your_Name>
  </en-GB>

  <fr-FR>
    <Good_Morning>Bonjour</Good_Morning>
    <What_Is_Your_Name>Quel est votre nom?</What_Is_Your_Name>
  </fr-FR>

</AppConfig>



In the script do something like the following. Use the .SetSearchKey method to set the first-level search to the locale code; this way the .GetKey method only needs the key itself.


uses "Console"
uses "AppConfig"

function TBMain() as long
  dim AppLocale as new cAppConfig
  dim sLocale as string

  '---Load translations
  AppLocale.Load(APP_SourcePath & "Locale.xml")

  if AppLocale.ErrorPresent Then
    '---Some error occurred
    printl "Error code", AppLocale.ErrorCode
    printl "Error description", AppLocale.ErrorDescription
  Else
    '---Set locale code
    sLocale = "en-GB"
    AppLocale.SetSearchKey(sLocale)
    printl "Setting locale to:", sLocale
    printl AppLocale.GetKey("Good_Morning")
    printl AppLocale.GetKey("What_Is_Your_Name")
    printl ""

    '---Set locale code
    sLocale = "fr-FR"
    AppLocale.SetSearchKey(sLocale)
    printl "Setting locale to:", sLocale
    printl AppLocale.GetKey("Good_Morning")
    printl AppLocale.GetKey("What_Is_Your_Name")

    printl ""
  end If

  printl "Press a key to end" in %CCOLOR_FLIGHTRED
  WaitKey
end function


Attached the files.
Just an idea.

ErosOlmi
28-10-2018, 21:18
In the above example you can have multiple approaches:

single file/all languages
a file for each language

Files can be located anywhere.

DirectuX
29-10-2018, 13:33
Attached the files.
Just an idea.

Edited the files.

Why XML ?

(attachment 9895)

ErosOlmi
29-10-2018, 13:51
It is always interesting to read others' source code.

I think there is an error: you used a variadic parameter in the i18n function. You need to call the function with the variadic params inside () in this way:

Printl i18n("s1_Is_My_Name",("DirectuX"))

See https://www.thinbasic.com/public/products/thinBasic/help/html/index.html?functions_subs.htm and search for "variadic".


Why XML?
Well, because I like it, because it is very flexible, and also because thinBasic already has the AppConfig module, which is based on the XML file format.
I also like JSON, but to me it is more complicated to read when arrays of arrays ... are involved.

But we can find another way, with other file formats, and develop a specific thinBasic module for that if needed.
Just giving ideas.

From the few documents I've seen, personally I do not like gettext's complexity, its binary files and all the stuff needed to convert files from one format to another.
Instead I like the idea of finding a way to read the source code and determine which strings are to be covered by i18n conversion.
But this is only my opinion.

Thanks for this suggestion, I like it.
Hope to find a good way easy and flexible.

DirectuX
29-10-2018, 16:05
You need to call the function with variadic params inside ()
True, missed that part.


develop a specific thinBasic module for that if needed.
It's up to you. Personally I don't need a module right now, as some subs may do the work (as you may have understood, I'm currently just checking whether all the coding bricks are available for my project). However:

code would be shorter with some dedicated properties/methods
it would add a functionality to thinbasic
it's an interesting thing to think about



I do not like gettext complexity and its binary files and all the stuff needed to convert files from one format to the other.
we share this thought


Instead I like the idea to find a way to read source code and determine which strings are to be covered by i18n conversion.
But this is only my opinion.
Thanks for this suggestion, I like it.
Hope to find a good way easy and flexible.
For this: when right-clicking on a tab there is a contextual tool called "Indent Code"; we could add "Generate translation file" and "Merge translation file".
These would parse the code for each i18n("...") call and build/update the file.
The filename can also be retrieved from the source code, as can comments for translators:

'[i] <some comment to translators>

Two related things I'm not pleased with:
the usage of the expand$ function in my code (example1 below)
and
XML's format requirements for keys (example2 below).
See:

(example1) would prefer

s = expand2$("my name is %1, %2",(whaterverItsNameVariable1,whaterverItsNameVariable2))

rather than

Dim stringVar as string
Dim stringVar2 as string
s = expand$ ("my name is stringVar, stringVar2")
Because the variable names don't need to be exposed to the translator.
I expected to find something for this in the StringBuilder module but didn't.


(example2) would prefer

s = expand2$("my name is %1, %2",(whaterverItsNameVariable1,whaterverItsNameVariable2))
with "my name is %1, %2" as a "key" or hash("my name is %1, %2") as key

rather than

s = expand2$("my_name_is_s1,_s2",(whaterverItsNameVariable1,whaterverItsNameVariable2))
with "my_name_is_s1,_s2" as a key, which forces me to code AND edit the XML at the same time.
Moreover, characters such as % and $ are not accepted in XML tags.
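The hash("...") idea from example2 can be sketched as follows. This is a minimal illustration in Python, not thinBasic code; the helper name make_key and the choice of SHA-1 truncated to 12 hex digits are assumptions made for the example. It only shows that any sentence, including one containing % or $, can be mapped to a key that is valid as an XML tag name.

```python
import hashlib

def make_key(sentence: str) -> str:
    """Derive an XML-safe key from an arbitrary sentence (hypothetical helper)."""
    digest = hashlib.sha1(sentence.encode("utf-8")).hexdigest()
    # XML names may not start with a digit, so prefix a letter.
    return "k" + digest[:12]

# The same sentence always yields the same key, so the script and the
# translation file stay in sync without hand-written tag names.
print(make_key("my name is %1, %2"))
```

The trade-off is that the key is no longer readable by a translator, so the original sentence would have to be stored next to it in the file.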

ErosOlmi
29-10-2018, 22:18
While I will think and study more about i18n (new to me and intrigued to find a clever solution), have a look at StrFormat$ to build strings from expressions and markers.
It uses {1}, {2}, ... {n} for placeholders

https://www.thinbasic.com/public/products/thinBasic/help/html/index.html?strformat$.htm

DirectuX
30-10-2018, 10:30
have a look at StrFormat$ to build strings from expressions and markers.
It uses {1}, {2}, ... {n} for placeholders
Yes, this answers the example1 above.

Concerning XML, I am not against it at all. It just implies escaping invalid characters, unlike a plain text list (less error prone: one sentence per line, one separator between index and string, no escaping to manage when translating, and a simple notepad is enough to edit the file).

But let's keep it xml.

XML file updated (attachment 9896)

DirectuX
02-11-2018, 13:26
Hi,

first, SetSearchKey still does not work in the script; has it worked for you?
(attachment 9905)

then, I thought a bit about the source code extraction (with ease of use in mind).
An idea of the logic:
1/ define the default reference language that will be used within the script, ex: "en-GB"
2/ set the language to be displayed, ex: "fr-FR"
3/ localizable text is wrapped in a simple function, ex: i18n("My name is {1}",(param1))
4/ the function processes the string as in the attached script v3

When programming, one does not want to have to edit the translation file at the same time, so the default text must be in the script (where it also serves as a fallback).
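The fallback behaviour described above can be sketched like this. It is a minimal Python illustration, not the attached script v3: the table layout, the variable current_locale and the function body are assumptions made for the example. The reference-language text is used as the lookup key, and it is returned unchanged when no translation exists.

```python
import re

# Hypothetical in-memory table: {locale: {reference text: translation}}
TRANSLATIONS = {
    "fr-FR": {
        "My name is {1}": "Je m'appelle {1}",
    },
}

current_locale = "fr-FR"  # step 2: the language to be displayed

def i18n(text, params=()):
    """Translate text, falling back to the text itself, then fill {n} markers."""
    translated = TRANSLATIONS.get(current_locale, {}).get(text, text)

    def fill(match):
        index = int(match.group(1)) - 1  # placeholders are 1-based
        # Leave the marker untouched when no matching parameter was passed.
        return str(params[index]) if 0 <= index < len(params) else match.group(0)

    return re.sub(r"\{(\d+)\}", fill, translated)

print(i18n("My name is {1}", ("DirectuX",)))  # translated and filled
print(i18n("Good morning"))                   # no entry: reference text is returned
```

Because the reference text doubles as the key, a script keeps working even before any translation file exists.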

For extraction:
1/ Retrieve the XML filename used in the code, make a backup (filename-[date:hour].xml) or create the file if needed.
2/ Add the attribute "editing" to all sentence elements in the XML file.
3/ Get a string after i18n(
4/ Get any comment right above the i18n ('[i] <here comment for translator>)
5/ XMLEscape the comment, the string used to build the attribute, and the string used for the text; consider using CDATA (https://en.wikipedia.org/wiki/CDATA).
6/ In the XML file, check whether the built attribute exists among the elements carrying the reference language attribute.
7/ If it does not exist: add it.
8/ If it exists: remove the "editing" attribute from that element.
9/ Check whether placeholders for the wanted translations exist and add them if needed. Remove the corresponding "editing" attribute.
10/ Add or replace the comment for the translator.
11/ Go back to step 3 unless there are no more strings to process.
12/ Remove the elements that still have the "editing" attribute.
13/ Sort the XML file.
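Steps 3 and 4 of the list above (finding each i18n("...") call and the '[i] comment right above it) can be sketched as follows. This is a minimal Python illustration with assumed names (extract_strings and the two regular expressions); it assumes strings contain no embedded double quotes and does not cover the XML update steps.

```python
import re

# Matches the first string literal after i18n( ; assumes no embedded quotes.
I18N_CALL = re.compile(r'i18n\(\s*"([^"]*)"')
# Matches a translator comment line of the form: '[i] some text
TRANSLATOR_COMMENT = re.compile(r"^\s*'\[i\]\s*(.*)$")

def extract_strings(source):
    """Return {string: translator comment or None} for every i18n call."""
    found = {}
    pending_comment = None
    for line in source.splitlines():
        comment = TRANSLATOR_COMMENT.match(line)
        if comment:
            pending_comment = comment.group(1).strip()
            continue
        for match in I18N_CALL.finditer(line):
            # Keep the first comment seen for a string used several times.
            found.setdefault(match.group(1), pending_comment)
        # A comment only applies to the line directly below it.
        pending_comment = None
    return found

sample = "\n".join([
    "'[i] Greeting shown at startup",
    'printl i18n("Good morning")',
    'printl i18n("My name is {1}", ("DirectuX"))',
])
for text, note in extract_strings(sample).items():
    print(repr(text), "comment:", note)
```

The resulting dictionary is the input the later steps would need in order to add missing elements and drop stale ones.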


finally, regarding the last XML file, though I think not, I ask: are XML attributes retrievable through AppConfig?

ErosOlmi
02-11-2018, 15:46
Replies

first: yes it is. The example from your "Original suggestion by EO" folder is working fine

then: give me some time to study

finally: at the moment not. It is not a matter to read attributes but how to transform them into the hash table I use inside cAppConfig object.
XML nodes tree is transformed into a sequence of keys separated by "\"
So nodes like
<en-GB>
<Good_Morning>
are transformed into the key "en-GB\Good_Morning"
How to transform attributes?

I think the best way is to create a dedicated module so we can define a precise XML format to adhere to.
In that way I can create a specific module class dedicated to internationalization that has as many hash tables as there are languages defined in the XML file.
The <object>.setLocale will set the language, internally setting the active hash table to get keys from.
Or something like that.

...
thinking ...
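The flattening described in this post (a node tree turned into "\"-separated keys) can be sketched as below. This is a Python illustration of the idea only, not how cAppConfig is implemented. The "@" suffix used for attributes is purely an assumption, shown as one possible answer to the open question of how attributes could fit into the same table.

```python
import xml.etree.ElementTree as ET

LOCALE_XML = """<AppConfig>
  <en-GB>
    <Good_Morning>Good Morning</Good_Morning>
  </en-GB>
  <fr-FR>
    <Good_Morning>Bonjour</Good_Morning>
  </fr-FR>
</AppConfig>"""

def flatten(element, prefix=""):
    """Turn nested child nodes into a flat {path: text} table."""
    table = {}
    for child in element:
        key = prefix + child.tag
        # Assumed convention: store each attribute under "path@name".
        for name, value in child.attrib.items():
            table[key + "@" + name] = value
        if len(child):  # has child nodes: recurse with a longer prefix
            table.update(flatten(child, key + "\\"))
        else:           # leaf node: store its text
            table[key] = (child.text or "").strip()
    return table

table = flatten(ET.fromstring(LOCALE_XML))
print(table["en-GB\\Good_Morning"])
print(table["fr-FR\\Good_Morning"])
```

With such a table, setting a search key amounts to fixing the "en-GB\" prefix before every lookup.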

ErosOlmi
02-11-2018, 15:49
Hope to be able to publish a new module to be used as a starting point for more detailed discussions.
So we can see if I'm on the right track.

DirectuX
02-11-2018, 16:33
first: yes it is. The example from your "Original suggestion by EO" folder is working fine
I can't see why; is it the same version, 1.10.5.0, of thinBasic?
Here is the error file (attachment 9906)



finally: at the moment not. [...]
I think the best way is to create a dedicated module so we can define a precise XML format to adhere to.

My concern: if you make it a module, will the source code still be available? I'm not sure how you work on the SDK with the community.




In that way I can create a specific module class dedicated to internationalization that has as many hash tables as there are languages defined in the XML file.

Why not have one module per usage:
an XML module that deals with XML ((un-)escaping, entities, tags and attributes)
an i18n module that uses the XML module and deals with translation



The <object>.setLocale will set the language, internally setting the active hash table to get keys from.
Or something like that.

...
thinking ...
You'll have to manage multiple languages at once when auto-building from code. This is my second argument for two modules rather than one.


It is not a matter to read attributes but how to transform them into the hash table I use inside cAppConfig object.
XML nodes tree is transformed into a sequence of keys separated by "\"
So nodes like
<en-GB>
<Good_Morning>
are transformed into the key "en-GB\Good_Morning"
How to transform attributes?
To my understanding, you transformed a tree structure (XML) into a table.
Attributes look to me like an array of values stored under the key.
Idea: linked lists? Though I'm not sure how they work (I have not studied them yet).

ErosOlmi
02-11-2018, 16:57
Oops, SetSearchKey is not yet published; it is in 1.10.6, which I'm working on:
https://www.thinbasic.com/public/products/thinBasic/help/html/index.html?version_1_10_6_0.htm

I will see if I can publish a preview by this evening so you can test.

For parsing XML there is an include (B_XML60.INC) and a DLL that wraps the standard Microsoft API, but it is general purpose and quite complex.
See the examples in \thinBasic\SampleScripts\XML
I do not want that complexity; that's why I developed the "AppConfig" module, which does with a few lines of code what takes hundreds of lines with classic XML parsing.

ErosOlmi
02-11-2018, 17:13
Preview of thinBasic 1.10.6.0
https://www.thinbasic.biz/projects/thinbasic/thinBasic_1.10.6.0.zip

DirectuX
02-11-2018, 17:45
Preview of thinBasic 1.10.6.0
https://www.thinbasic.biz/projects/thinbasic/thinBasic_1.10.6.0.zip

:) the script works.

DirectuX
02-11-2018, 18:04
I do not want that complexity that's why I developed "AppConfig" module

I understand, I proposed attributes because they are less restrictive on allowed characters.
Did you look at Locale2.xml?

ErosOlmi
02-11-2018, 18:08
I understand, I proposed attributes because they are less restrictive on allowed characters.
Did you look at Locale2.xml?

Yes I saw.
I'm creating a new module based on that example.

As soon as I have something working I will publish it, maybe by tomorrow if family permits :)

DirectuX
02-11-2018, 18:13
maybe by tomorrow if family permits :)
totally agree :D

ErosOlmi
02-11-2018, 18:45
Inside
<Sentence text="Good_Morning" lang="en-GB" >Good Morning</Sentence>
can we assume that
"Good_Morning" text
is unique within a language, whatever node it appears in?
In this way I can have an internal hash table key\data where the key can be LANG + \ + TEXT, like "en-GB\Good_Morning".
Setting the Locale would set the main part of the key.

I mean I'm trying to find a way to define something like the example below.
It would be possible to have up to 3 levels of nodes, but such nodes are only a way to keep translations organized and are not part of the key to search for.
Otherwise things like i18n("Good_Morning") would not be possible.





<?xml version="1.0" encoding="utf-8"?>
<i18n>

  <Main_Script>
    <Globals>
      <Sentence text="Good_Morning" lang="en-GB" >Good Morning</Sentence>
      <Sentence text="Good_Morning" lang="fr-FR" >Bonjour</Sentence>
      <Sentence text="What_Is_Your_Name" lang="en-GB" >What is your name?</Sentence>
      <Sentence text="What_Is_Your_Name" lang="fr-FR" >Quel est votre nom?</Sentence>
    </Globals>
    <!-- <some comment to translators> -->
    <Sentence text="{1}_Is_My_Name" lang="en-GB" >{1} is my name.</Sentence>
    <Sentence text="{1}_Is_My_Name" lang="fr-FR" >Je m'appelle {1}.</Sentence>
  </Main_Script>

  <Include_myIncludedFile>
    <Dialogs>
      <Sentence text="Dialog_Title" lang="en-GB" >Dialog title</Sentence>
      <Sentence text="Dialog_Title" lang="fr-FR" >Titre de la fenêtre</Sentence>
    </Dialogs>
    <Menu>
      <Sentence text="Close_all_files" lang="en-GB" >Close all files.</Sentence>
      <Sentence text="Close_all_files" lang="fr-FR" >Fermer tous les fichiers.</Sentence>
    </Menu>
  </Include_myIncludedFile>

</i18n>
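The LANG + \ + TEXT key scheme can be sketched as follows. This is a Python illustration only; the names build_table, Translator and set_locale are assumptions and do not describe the planned thinBasic module. Sentence elements are collected at any nesting depth, so the organizing nodes never become part of the key, and a repeated lang/text pair is reported as an error, which is the uniqueness assumption asked about above.

```python
import xml.etree.ElementTree as ET

I18N_XML = """<i18n>
  <Main_Script>
    <Globals>
      <Sentence text="Good_Morning" lang="en-GB">Good Morning</Sentence>
      <Sentence text="Good_Morning" lang="fr-FR">Bonjour</Sentence>
    </Globals>
    <Sentence text="{1}_Is_My_Name" lang="fr-FR">Je m'appelle {1}.</Sentence>
  </Main_Script>
</i18n>"""

def build_table(xml_text):
    """Map 'LANG\\TEXT' to the sentence body, ignoring the grouping nodes."""
    table = {}
    for sentence in ET.fromstring(xml_text).iter("Sentence"):
        key = sentence.get("lang") + "\\" + sentence.get("text")
        if key in table:
            raise ValueError("duplicate key: " + key)
        table[key] = sentence.text or ""
    return table

class Translator:
    """Hypothetical lookup object: the locale only selects the key prefix."""
    def __init__(self, table):
        self.table = table
        self.prefix = ""

    def set_locale(self, lang):
        self.prefix = lang + "\\"

    def get(self, text):
        # Fall back to the key itself when no translation is found.
        return self.table.get(self.prefix + text, text)

tr = Translator(build_table(I18N_XML))
tr.set_locale("fr-FR")
print(tr.get("Good_Morning"))
```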






Also trying to figure out how the editor, thinAir, and the translations should interact, so that thinAir is aware that some translations are needed.
Also to have the possibility to create translations automatically.

Will read back all previous suggestions

Thinking ...
Any idea is welcome.

DirectuX
03-11-2018, 09:39
Will read back all previous suggestions
Yep, as some questions were already discussed.



Otherwise things like i18n("Good_Morning") would not be possible.
(as in previous suggestions) Maybe not i18n("Good_Morning") but i18n("Good morning"), as in plain English.
Then the XML building script converts the string to a unique escaped tag (lang/text). This is transparent to the programmer, less prone to human error, and provides a fallback.



can we assume that
"Good_Morning" text
is unique over a language, whatever node it is present?
I think so, and the automated XML building can ensure this is the case.



Thinking ...
Any idea is welcome.

I would like to know whether we go with <tags> and "attributes" or only <tags>. I'm not sure I understand what you have decided so far.

DirectuX
03-11-2018, 10:05
Considering my last question just above, if you choose the <tag>-only version, the XML might look like this:



<?xml version="1.0" encoding="utf-8"?>
<i18n>
  <Main_Script>
    <Globals>
      <Good_Morning>
        <en-GB>Good Morning</en-GB>
        <fr-FR>Bonjour</fr-FR>
      </Good_Morning>
      <What_Is_Your_Name>
        <en-GB>What is your name?</en-GB>
        <fr-FR>Quel est votre nom?</fr-FR>
      </What_Is_Your_Name>
    </Globals>
    <!-- <some comment to translators> -->
    <#1#_Is_My_Name>
      <en-GB>{1} is my name.</en-GB>
      <fr-FR>Je m'appelle {1}.</fr-FR>
    </#1#_Is_My_Name>
  </Main_Script>
  <Include_myIncludedFile>
    <Dialogs>
      <Dialog_Title>
        <en-GB>Dialog title</en-GB>
        <fr-FR>Titre de la fenêtre</fr-FR>
      </Dialog_Title>
    </Dialogs>
    <Menu>
      <Close_all_files>
        <en-GB>Close all files.</en-GB>
        <fr-FR>Fermer tous les fichiers.</fr-FR>
      </Close_all_files>
    </Menu>
  </Include_myIncludedFile>
</i18n>


or, if we keep a symmetry list between the XML tags and the strings passed to i18n, it could be even simpler:


' Symmetry list
' syntax : [tag] [i18n submitted string]
' * tag could be zero padded to keep alignment
'
1 "Good Morning"
2 "What is your name?"
3 "{1} is my name."
4 "Dialog title"
5 "Close all files."


<?xml version="1.0" encoding="utf-8"?>
<i18n>
  <Main_Script>
    <Globals>
      <1>
        <en-GB>Good Morning</en-GB>
        <fr-FR>Bonjour</fr-FR>
      </1>
      <2>
        <en-GB>What is your name?</en-GB>
        <fr-FR>Quel est votre nom?</fr-FR>
      </2>
    </Globals>
    <!-- <some comment to translators> -->
    <3>
      <en-GB>{1} is my name.</en-GB>
      <fr-FR>Je m'appelle {1}.</fr-FR>
    </3>
  </Main_Script>
  <Include_myIncludedFile>
    <Dialogs>
      <4>
        <en-GB>Dialog title</en-GB>
        <fr-FR>Titre de la fenêtre</fr-FR>
      </4>
    </Dialogs>
    <Menu>
      <5>
        <en-GB>Close all files.</en-GB>
        <fr-FR>Fermer tous les fichiers.</fr-FR>
      </5>
    </Menu>
  </Include_myIncludedFile>
</i18n>



XML files attached : 9908
Memo: the built XML can be written to be human readable and may contain information that the script does not use.

DirectuX
18-11-2018, 17:12
or, if we keep a symmetry list between the XML tags and the strings passed to i18n, it could be even simpler:


' Symmetry list
' syntax : [tag] [i18n submitted string]
' * tag could be zero padded to keep alignment
'
1 "Good Morning"
2 "What is your name?"
3 "{1} is my name."
4 "Dialog title"
5 "Close all files."


As a reminder (mostly for myself): I found that the INI module can manage that.
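For illustration, here is how the numbered symmetry list could map onto an INI layout with one section per locale. The sketch uses Python's configparser rather than thinBasic's INI module, whose calls are not shown in this thread; the section layout and the lookup helper are assumptions.

```python
import configparser

# One section per locale; the option names are the tags of the symmetry list.
INI_TEXT = """
[en-GB]
1 = Good Morning
2 = What is your name?
3 = {1} is my name.

[fr-FR]
1 = Bonjour
2 = Quel est votre nom?
3 = Je m'appelle {1}.
"""

# interpolation=None keeps characters such as % literal in the values.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(INI_TEXT)

def lookup(locale, tag, fallback):
    """Return the sentence for tag in locale, or the fallback text."""
    return parser.get(locale, tag, fallback=fallback)

print(lookup("fr-FR", "1", "Good Morning"))
print(lookup("de-DE", "1", "Good Morning"))  # unknown locale: fallback is used
```

Compared with XML, no escaping is needed and the file stays editable in a plain text editor, at the cost of losing grouping nodes and translator comments attached to elements.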