Oxygen O2H Tech Blog
Charles Pegge
28-09-2008, 23:39
O2H
O2H is a Just-In-Time compiler/assembler, deployed in a single DLL, that takes source code strings and returns executable binary. It is an extension of the Oxygen assembler. This implementation can be used with thinBasic to provide high-speed functions for time-critical operations that would otherwise be impractical using interpreted code alone.
I will be posting updates of the Oxygen source code and DLL here as the compiler develops.
This code is in the alpha stage and still undergoing further work and testing, so please approach with caution!
The code is now split into the following parts:
thinBasicOxygen modular interface for thinBasic
thincore header for creating thinBasic modules in FreeBasic
o2glo structures, constants and global variables
o2lex lexing and parsing functions
o2par parsing: lower translation functions
o2sem semantics: higher translation functions
o2fun the original oxygen machine code renderer / linker
o2asm the assembler which translates assembly code into oxygen script
o2prep preprocessor / compiler translating high level code to Assembler and o2 script
o2run runtime procedures used by compiled programs
Coding Strategy
The Oxygen compiler is written in very basic BASIC and assembler. Once a stable kernel is established, Oxygen will extend itself with a library of functions and macros written in Oxygen - and ultimately the kernel itself will be rewritten in standalone Oxygen. :)
Compiling from the FreeBasic console (FreeBasic 0.20.0):
fbc -dylib thinBasic_Oxygen.bas
Both the Module and source code can now be found here:
http://community.thinbasic.com/index.php?topic=2517
Charles Pegge
29-09-2008, 21:28
Adopting Olestrings (BSTR)
This brings h2o into line with thinBasic for handling dynamic strings, ensuring that strings are consistently allocated and deallocated by the OS.
This code exposes the raw mid$() function. A negative index is read as counting from the end of the string backwards instead of counting from the beginning of the string forwards.
uses "OXYGEN"
dim src as string
dim s,t as string
dim v as long
src= "
push ebp
mov ebp,esp
push ebx
mov eax,[ebp+8]
mov eax,[eax] ' deref once
mov ebx,kernel
mid eax,[ebp+12],[ebp+16]
pop ebx
mov esp,ebp
pop ebp
ret 12
"
o2_buf 1 : o2_asmo src : declare function MIDS ( s as string, byval i as long, byval le as long ) as string at o2_buf 1
if len(o2_error) then msgbox 0,o2_error()+o2_view (src) : stop
S="one two three"
t=mids s,5,3
't=mids s,-5,5
msgbox 0,s+$cr+t
Internal: (FreeBasic)
function mids(byval p as BSTR,byval i as long,byval j as long) as BSTR
asm
mov edi,[p] 'pointer
mov edx,[edi-4] 'length
'-------------'neg offsets
mov esi,[i]
cmp esi,0
jg midok2 'skip if positive
add esi,edx 'adjust negs
inc esi
cmp esi,1 'lowest limit=1
jge midok1
mov esi,1 'minimum is 1
midok1:
mov [i],esi
midok2:
mov esi,edi 'pointer
cmp edx,[i]
jge midok3 'past end?
push 0
push 0
call SysAllocStringByteLen
mov esi,eax
jmp midx 'null allocation
midok3:
add edi,[i] 'src base
dec edi 'base adjust
mov eax,edi 'src base
add eax,[j] 'req boundary
add edx,esi 'end boundary
cmp eax,edx
jle midok4 'choose lower length
mov eax,edx
midok4:
sub eax,edi 'set length
push eax 'bytes
push eax '
push 0 'null string
call SysAllocStringByteLen
push edi 'src base
push eax 'new string dest
mov esi,eax
call copyn
midx:
mov [function],esi ' new string
end asm
end function
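For readers who do not follow x86 assembler, the indexing rules of the routine above can be restated in a few lines of Python. This is only a sketch of the behaviour as read from the listing, not part of Oxygen. Clamping a negative length to zero is an assumption, since the assembler does not guard against it.

```python
def mids(s: str, i: int, length: int) -> str:
    """Python restatement of the MIDS indexing rules (1-based start)."""
    n = len(s)
    if i <= 0:
        # Zero or negative start: count from the end, never going below 1.
        i = max(i + n + 1, 1)
    if i > n:
        # Start lies past the end of the string: return an empty string.
        return ""
    start = i - 1                        # convert to a 0-based offset
    length = max(length, 0)              # assumption: negative lengths give ""
    end = min(start + length, n)         # choose the lower of the two bounds
    return s[start:end]


if __name__ == "__main__":
    text = "one two three"
    print(mids(text, 5, 3))
    print(mids(text, -5, 5))
```

With the test string from the post, a start of 5 and length 3 selects "two", and a start of -5 with length 5 selects "three". Note that under these rules a start of 0 is adjusted to one past the end and so yields an empty string.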
Charles Pegge
02-10-2008, 08:02
Declarative Syntax
dim global static local declare function sub byref byval ptr as pointer at =
Dim supports both formats:
dim as long a,b,c
and
dim a,b,c as long
also mixed types
dim a,b,c as long d,e,f as double
and assignments
dim as long a=40, b=50
uses "OXYGEN"
uses "FILE"
dim vv as long
dim src as string
src="
declare function aaa ( byval da as double ) as long export
declare function aab ( byref da() as double ) as long export
global as long ga,gb
global as long gc,gd
function abc ( byref a as double, byval b as double, byval c as double ) as dword export
dim as quad v1 at [ecx], v2 at [edx]
dim as long v3=42,v4=46
dim as double w3=42,w4=46
dim v5,v6 as double as single l1,l2
a=b
end function
"
msgbox 0,o2_error()+o2_view (src) : stop
file_save ("t.txt",o2_error()+o2_view(src)) : stop
o2_asmo src : o2_exec : msgbox 0,"0x"+hex$(vv)
ErosOlmi
02-10-2008, 08:37
That is great Charles.
I need to align the DIM syntax in thinBasic with yours too, in order to stay compatible.
Thanks a lot.
Eros
Charles Pegge
02-10-2008, 10:14
Hi Eros,
I split these compound dim structures into simple ones of single type in pre-def format - then they are all compiled the same way.
' wt w3 variable types
' ws list of variables of the current type
' wva() dim statements accumulator
if wr="as" then
w3=getas(s,i)
if ws="" then wt=w3:continue do ' set predef
if wt="" then wva(n)+="var "+w3+ws+cr:ws="":continue do ' postdef
wva(n)+="var "+wt+ws+cr:wt=w3:ws="":continue do ' predef
end if
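The same splitting rule can be sketched in Python for anyone who wants to experiment with it. This is a hedged illustration rather than the FreeBasic source: the function name split_dim and the "var type names" output shape are assumptions, and assignments such as a=40 are not handled.

```python
def split_dim(statement: str) -> list[str]:
    """Split a compound dim into single-type 'var' lines (pre-def format)."""
    tokens = statement.replace(",", " ").split()
    if tokens and tokens[0].lower() == "dim":
        tokens = tokens[1:]
    out = []          # accumulated single-type statements
    pending = []      # variables waiting for a type
    predef = None     # type given before its variables, if any
    it = iter(tokens)
    for tok in it:
        if tok.lower() != "as":
            pending.append(tok)
            continue
        typ = next(it)                    # the word after 'as' is the type
        if not pending:
            predef = typ                  # 'as long a,b,c' : set predef
        elif predef is None:
            out.append(f"var {typ} {' '.join(pending)}")     # postdef
            pending = []
        else:
            out.append(f"var {predef} {' '.join(pending)}")  # close predef
            predef, pending = typ, []
    if pending:
        if predef is None:
            raise ValueError("variables without a type: " + " ".join(pending))
        out.append(f"var {predef} {' '.join(pending)}")
    return out


if __name__ == "__main__":
    for line in split_dim("dim a,b,c as long d,e,f as double"):
        print(line)
```

Both "dim as long a,b,c" and "dim a,b,c as long" reduce to the same single statement, which is the point of normalising before compilation.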
ErosOlmi
02-10-2008, 13:17
Yes, thinBasic does very similarly. It will be quite easy for me to adapt.
DIM AS <type> <var> [, var [... ]]
is a syntax I wanted already developed.
Thanks
Eros
Michael Hartlef
02-10-2008, 14:08
That is evil. I could understand
TYPE var, var, var
but DIM AS TYPE, VAR VAR makes my toenails bend backwards. Thank god it is optional. :D
ErosOlmi
02-10-2008, 14:38
Yes, me too.
But those are all different ways used by different BASIC dialects to do exactly the same thing. The more we support, the easier it is to migrate other pieces of code or ... users ;)
I also like TYPE var [, var [...]]
but a little more difficult to manage for example when UDT are used.
Charles Pegge
02-10-2008, 16:19
dim as long a,b,c is one of the delights of freebasic.
It does have some advantages once you get accustomed to it: editing is a little easier, and so is translating from C.
I am attempting to provide a very flexible dim statement to accommodate different dialects. On the downside, it's not so good for enforcing code discipline.
How about this: ;D
dim a,b,c as long as double d e f
Petr Schreiber
02-10-2008, 16:52
Charles,
thanks for report on your H2O progress.
If you need easy port from C, then what about using DIM <datatype> instead of DIM AS DATATYPE.
Like:
DIM LONG A, B ,C
But I must say I like the current ThinBasic syntax still a bit more :) - it is more wordy, but also clear.
It is a bit confusing when type specifier is once before variable name, and in other case after variable name - this mix would not be very clear.
Like yours:
dim a,b,c as long as double d e f
;D
I don't know, I would probably use the classic thinBasic syntax I presume.
Petr
Charles Pegge
02-10-2008, 17:45
Ok Petr,
I've made the "as" optional - it's a little extra work for the type identifier. Each word has to be checked against all intrinsic and defined types. But I like the idea of a more intelligent compiler.
dim a,b,c long double d e f
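One way to picture the extra work is as a pass that checks every word against the known type names and puts the missing "as" back in. The Python below is only a sketch of that idea. The INTRINSIC_TYPES set and the function name are assumptions made for the example, not Oxygen's actual tables.

```python
# Assumed list of intrinsic type names, for illustration only.
INTRINSIC_TYPES = {"byte", "long", "dword", "quad", "single", "double", "string"}


def insert_optional_as(statement: str, user_types=()) -> str:
    """Re-insert 'as' in front of any type name that lacks it."""
    known = INTRINSIC_TYPES | {t.lower() for t in user_types}
    out = []
    # Pad commas so they survive the whitespace split as separate tokens.
    for word in statement.replace(",", " , ").split():
        is_type = word.lower() in known
        if is_type and (not out or out[-1].lower() != "as"):
            out.append("as")
        out.append(word)
    return " ".join(out).replace(" , ", ",")


if __name__ == "__main__":
    print(insert_optional_as("dim a,b,c long double d e f"))
```

Every word costs one lookup in the type table, and user-defined types have to be registered before any dim that uses them.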
Charles Pegge
09-10-2008, 21:50
thinBasic Interface
Several related functions with persistent globals and statics can be defined in one assembly string.
To make life simple, the thinBasic declarations and the Oxygen function prototypes are almost identical.
The syntax is not rigorous, but it's worth being able to do this.
uses "OXYGEN"
dim vv as long
dim p0,p1,p2,p3,p4 as long
dim src as string
declare sub finish () at p0
declare function fun1 () as long at p1
declare function fun2 () as long at p2
declare function fun3 () as long at p3
declare function fun4 () as long at p4
src="
h2o
'behind the scenes initialisation here
'...
sub finish() at p0
'free resources here
end sub
function fun1 () as long at p1
'...
end function
function fun2 () as long at p2
'...
end function
function fun3 () as long at p3
'...
end function
function fun4 () as long at p4
'...
end function
;...
"
o2_asmo src
o2_exec ' this call initialises all persistent data
vv=fun1() ' ...
'...
finish() ' releases all resources
Petr Schreiber
09-10-2008, 22:01
Hi Charles,
thanks for the report, I just do not get why the same functions are declared twice at specific address :-[
Petr
ErosOlmi
09-10-2008, 22:07
I think ...
one is in the thinBasic script.
The others are inside the string that Oxygen will interpret and transform into machine code.
Both will point to the same LONG allowing script to directly call on the fly compiled functions.
Petr Schreiber
09-10-2008, 22:12
Yes,
but I wonder how compiler fills those p0 - p4.
Maybe at Oxygen compile time the pointers are supplied to variable after AT?
Petr
ErosOlmi
09-10-2008, 22:23
I do not know really.
thinBasic SDK has functions to test if a variable exists so in theory module can check if P0, P1, ... exists in script and use it.
Maybe we have to wait for Charles's reply or for something to work with.
Anyway this seems exciting.
John Spikowski
09-10-2008, 22:25
FYI:
The H2O trademark is used by Aestiva for the free version of the HTML/OS (Basic like) web scripting solution.
http://h2o.aestiva.com
Using the H2O reference may cause the owner of this company to raise his ugly head like in the past and sic his lawyers on the responsible parties.
I try to stay as far away from this company as I can. (besides the $800 per URL cost of the product) ScriptBasic has a MUCH better web scripting solution and it's FREE and open source.
John
Charles Pegge
09-10-2008, 22:38
Yes Eros, this shows declaration, source code, compilation, initialisation, function-call, and finally release.
Hi Petr,
Due to obfuscation rules there is currently no way for Oxygen to reliably read thinBasic script and vice versa - hence the need to make separate declarations. But the p0..p4 variables are provided with the function locations at o2_exec time.
Thanks for the copyright warning John, h2o is more of a project name at present. We could always call it o2h :)
Block Syntax
These are 3 different ways to define various blocks consistently. (but function and select-blocks have a sort of double structure with slightly different layouts)
Single Line:
defs veg carrot 1 bean 2 cabbage 3 potato 40
Bracketed:
defs veg
(
carrot 1
bean 2
cabbage 3
potato 40
)
Ended:
defs veg
carrot 1
bean 2
cabbage 3
potato 40
end defs
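To show that the three layouts carry the same information, here is a small Python sketch that reads any of them into the same name-to-value table. It is an illustration only: the function parse_defs and its detection rules (one line, a "(" line, or a closing "end defs") are assumptions, not the Oxygen parser.

```python
def parse_defs(text: str):
    """Parse a 'defs' block in single-line, bracketed or ended form."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    head = lines[0]
    if len(head) < 2 or head[0] != "defs":
        raise ValueError("block must start with 'defs <name>'")
    name = head[1]
    if len(lines) == 1:
        body = head[2:]                                   # single line
    elif lines[1] == ["("]:
        if lines[-1] != [")"]:
            raise ValueError("missing closing bracket")
        body = [tok for ln in lines[2:-1] for tok in ln]  # bracketed
    else:
        if lines[-1] != ["end", "defs"]:
            raise ValueError("missing 'end defs'")
        body = [tok for ln in lines[1:-1] for tok in ln]  # ended
    if len(body) % 2:
        raise ValueError("expected name/value pairs")
    pairs = {body[k]: int(body[k + 1]) for k in range(0, len(body), 2)}
    return name, pairs


if __name__ == "__main__":
    print(parse_defs("defs veg carrot 1 bean 2 cabbage 3 potato 40"))
```

Because the format is decided from the first two lines, the body can be handled by the same pair-reading code in all three cases.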
Petr Schreiber
09-10-2008, 22:48
Thanks for explanations,
now I get it ... I think.
The block syntax looks good to me, I think I will go for the multiline versions.
Petr
Charles Pegge
09-10-2008, 23:38
I think having a choice of formats is important - some items are simply easier to read if they are contained on one line - but more complex expressions benefit from the multi line format. A compiler can take a more leisurely approach to parsing and decide which format is being used.
OOP Type
This has really kept me very quiet over the last few days as there is a fair degree of complexity to work through. The format is an elaboration of the o2 OOP format, which can still be used in the h2o environment.
Objects of the class can be defined in exactly the same way as the familiar UDT, but I have yet to work out the best way to handle dynamic objects (C++ new and delete), as these have an extra level of indirection and I do not wish to cause programmers psychiatric trauma by entanglements with virtual objects passed byref or byval.
This piece uses overloaded methods - four varieties of qq() distinguishable by their param signatures. In practice, one would want to keep things as simple and consistent as possible but I need to explore the limits for testing.
type someclass
method qq () as long private
method qq(byval a as double)
method qq(long,long)
method qq(long long long)
method rr(long)
/
pv as dword
r1 as dword
r2 as dword
end type
methods of someclass
method rr(long) as long
'this.r2=eax
end method
method qq()
end method
method qq(byval aa1 as double)
end method
method qq(aa2 as long, b as long )
end method
method qq(aa3 as long,b as long c as long)
end method
end methods
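Telling overloads apart by their parameter signatures amounts to looking methods up by name plus parameter types. The Python sketch below shows that idea in its simplest form. The OverloadTable class and the label strings are hypothetical, and no automatic type conversion is attempted.

```python
class OverloadTable:
    """Maps (method name, parameter types) to a compiled entry label."""

    def __init__(self):
        self._methods = {}

    def declare(self, name, param_types, label):
        key = (name, tuple(param_types))
        if key in self._methods:
            raise ValueError(f"duplicate signature for {name}{key[1]}")
        self._methods[key] = label

    def resolve(self, name, arg_types):
        # Exact match only: a real compiler would also try conversions.
        try:
            return self._methods[(name, tuple(arg_types))]
        except KeyError:
            raise LookupError(f"no overload of {name} takes {tuple(arg_types)}")


if __name__ == "__main__":
    table = OverloadTable()
    table.declare("qq", [], "qq_0")
    table.declare("qq", ["double"], "qq_d")
    table.declare("qq", ["long", "long"], "qq_ll")
    table.declare("qq", ["long", "long", "long"], "qq_lll")
    print(table.resolve("qq", ["long", "long"]))
```

The four qq() variants from the type definition all share one name but occupy four distinct keys, so a call site is resolved once its argument types are known.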
Petr Schreiber
10-10-2008, 08:18
Charles,
pretty incredible :)
Thanks again,
Petr
Charles, thanks. I have been reading but not doing any coding for a while, so I held back from commenting, but you are taking very tough-to-grasp subject matter and really making it accessible. This looks really elegant.
Charles Pegge
13-10-2008, 14:36
Hi Kent,
the most difficult part is walking through all the possible permutations - the program feels like a town with many alley-ways. But it's gradually taking on a clear pattern.
I've spent the last few days working on the header codings for the main program and functions - which is all new territory for me. A minimal function could be a single ret byte, or it could have parameters, local variables, static variables and a thinBasic variable pointer, ending up with a skeleton of more than 100 bytes.
New
The best solution for dynamic objects seems to be something like this:
dim plum as new fruit
then plum exists as a memory pointer like a byref variable
As with strings, deletion of the object can be the responsibility of the system.
Petr Schreiber
13-10-2008, 16:02
Hi Charles,
this syntax is quite ok to me.
In some languages the variable is declared as normal and by assigning NEW FRUIT you allocate object.
But your way is a bit more straightforward, I like it too.
Petr
I like the NEW keyword as I am getting used to it in C++, and I like how you have auto garbage cleaning by the system. That looks like a very nice and easy to understand way of handling it all.
Charles Pegge
13-10-2008, 22:15
How about this:
Like
dim ... like
A way of cloning objects - so they start with identical class and values. It is very simple to implement.
dim apricot, nectarine, peach like new plum
with apricot .skin.texture=furry
with nectarine .size *=2
with peach .size *=2 : .skin.texture=furry
...
Never thought about that. Sure seems logical and fits perfectly!
Charles Pegge
06-11-2008, 09:58
After passing through the long dark tunnel of expression parsing, I reckon we are about half way through this project and on schedule to arrive in early December. The most difficult aspect to deal with is the highly recursive construction of function parameters containing expressions and in turn: expressions containing function calls with parameters containing further expressions .. ad infinitum. The compiler has to cope with extreme coding styles whether they are desirable or not.
In addition there has to be operator precedence and automatic type conversion, and complications like applying bitwise operations to floating point numbers. These are some of the ghosts that leap out and take you by surprise. But I think I've got a stable architecture now that will cope with all of these, and adding the full set of operators and types should be fairly simple.
I have split the files down into more distinctive units as they emerge - A separate file for the globals representing the state machine. Lexing functions are now separate from the semantic functions used to identify and build expressions and turn them into assembler.
String management and garbage collection are next on the todo list.
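The operator-precedence part of the expression parsing described in this post can be illustrated with the generic precedence-climbing technique. The Python sketch below is not taken from the Oxygen source: it evaluates integer expressions directly instead of emitting assembler, and the operator table is an assumption limited to the four arithmetic operators.

```python
import operator
import re

# Higher number binds tighter; all four operators are left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}


def evaluate(text: str):
    """Evaluate an arithmetic expression by precedence climbing."""
    tokens = re.findall(r"\d+|[-+*/()]", text)
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = expression(1)          # recurse for the bracketed part
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing ')'")
            pos += 1
            return value
        return int(tok)

    def expression(min_prec):
        nonlocal pos
        left = primary()
        while (pos < len(tokens) and tokens[pos] in PRECEDENCE
               and PRECEDENCE[tokens[pos]] >= min_prec):
            op = tokens[pos]
            pos += 1
            # Operands of tighter-binding operators are gathered first.
            right = expression(PRECEDENCE[op] + 1)
            left = APPLY[op](left, right)
        return left

    return expression(1)


if __name__ == "__main__":
    print(evaluate("2*(3+4)-5"))
```

Function calls with expression parameters fit the same shape: a call is one more case in primary() that recurses into expression() for each argument.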
Michael Hartlef
06-11-2008, 11:29
Man, I'm so curious what it will look like. December isn't coming fast enough. ;D But take your time. Better well thought out and worked out than rushed.
Charles Pegge
06-11-2008, 13:11
Hi Mike,
It will look very familiar, like a regular BASIC with some extensions of the syntax and a few adaptations for use inside a thinBasic String. And before releasing it I hope to have a full test suite to check all the features. This will be essential for the future course of development, where every enhancement can be automatically checked for unexpected results.
jcfuller
06-11-2008, 21:09
Charles,
Will this be available outside the ThinBasic environment?
James
Charles Pegge
06-11-2008, 22:53
Hi James,
This could be embedded in any software that can use a dll and handle strings - as with the Oxygen assembler itself. There is also the capability of compiling standalone exes and dlls using the TOMDK though this is a bit raw at present and needs further work to support resource files etc.
But my primary target is to give thinBasic the ability to JIT compile time-critical functions and make them run almost as fast as Assembly code.
ErosOlmi
06-11-2008, 22:55
But my primary target is to give thinBasic the ability to JIT compile time-critical functions and make them run almost as fast as Assembly code.
:-* Thanks!
Charles Pegge
22-11-2008, 20:52
Strings, Dynamic Objects and Arrays
At the risk of stretching my schedule, I've been working on enhancements for array/string handling, but I think it's going to be well worth it.
All dynamic objects are contained in Bstrings. This makes memory management very simple. There is no need to specify the size of a dynamic array. When you run out of elements the array will be automatically regenerated with spare buffer space fore and aft.
Where speed is critical, static arrays will still be available.
Dynamic Arrays contained in BSTR
These will automatically redim when necessary and will support LPUSH/LPOP and UPUSH/UPOP, which cause the floating base and ubound to shift. As you can see, the array header and structure are really simple. No other runtime information is needed.
[ -4 Length | 0 FloatingBase Idx | 4 Ubound Idx | LowerBuf | DATA | UpperBuf ]
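The layout above behaves much like a double-ended buffer, and a Python sketch may help to picture it. Everything named here is an assumption for illustration: the class FlexArray, the spare parameter, and the reading of LPUSH/UPUSH as pushes at the lower and upper ends. A Python list stands in for the BSTR body.

```python
class FlexArray:
    """Array with a floating base and ubound inside one buffer."""

    def __init__(self, spare: int = 4):
        self._spare = spare                 # slack kept fore and aft
        self._buf = [None] * (2 * spare)
        self._base = spare                  # index of first live element
        self._ubound = spare - 1            # index of last live element

    def __len__(self):
        return self._ubound - self._base + 1

    def _regenerate(self):
        # Rebuild the buffer with fresh spare space at both ends.
        live = self._buf[self._base:self._ubound + 1]
        self._buf = [None] * self._spare + live + [None] * self._spare
        self._base = self._spare
        self._ubound = self._spare + len(live) - 1

    def upush(self, value):
        if self._ubound + 1 >= len(self._buf):
            self._regenerate()
        self._ubound += 1
        self._buf[self._ubound] = value

    def lpush(self, value):
        if self._base == 0:
            self._regenerate()
        self._base -= 1
        self._buf[self._base] = value

    def upop(self):
        if len(self) == 0:
            raise IndexError("pop from empty array")
        value = self._buf[self._ubound]
        self._buf[self._ubound] = None
        self._ubound -= 1
        return value

    def lpop(self):
        if len(self) == 0:
            raise IndexError("pop from empty array")
        value = self._buf[self._base]
        self._buf[self._base] = None
        self._base += 1
        return value

    def to_list(self):
        return self._buf[self._base:self._ubound + 1]


if __name__ == "__main__":
    a = FlexArray(spare=2)
    for v in range(5):
        a.upush(v)
    a.lpush(-1)
    print(a.to_list())
```

Pushes and pops only move the two indices. The buffer is copied only when one end runs out of spare space.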
Petr Schreiber
22-11-2008, 22:00
I am looking forward to your implementation :)
Charles Pegge
25-11-2008, 08:25
Here are some of the possibilities that follow from implementing stretchable dynamic arrays:
Static Arrays
Subscripts are specified
Dim abc(10,4,2) as long
Dynamic Arrays
No subscript given
Dim abc() as long
UDTs with Dynamic Arrays
Just as UDTs can contain dynamic strings, they can also contain dynamic arrays. Both can be represented as a single pointer, so the structure itself always maintains a fixed length.
The space allocated for the actual array is contained in a dynamic string.
Type Long_Flexi_Array
a() as long
End Type
And this is achieved by holding string pointers inside the array string
Type String_Flexi_Array
a() as string
End Type
Arrays of other UDTs are also possible:
Type Army
troops() as soldier
tanks() as vehicle
End Type
UDTs with Lazy Recursion
Many objects in the real world have recursive structures. For instance, fern fronds are often composed of miniature versions of themselves - like fractals.
Because Dynamic structures are not formed until they are used in an expression, Recursion can be safely specified without bursting the computer :).
Type Fern_Frond
frond() as fern_frond
shape as shape_spec
...
End Type
An object with this structure could be built to any level - the recursion terminates wherever there is a null pointer.
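A short Python sketch shows why such a definition is safe: the child array stays a null reference until something is actually stored in it. The class and method names below are hypothetical and only mirror the Fern_Frond type for illustration.

```python
class FernFrond:
    """Recursive structure whose child array is created on first use."""

    def __init__(self, shape: str):
        self.shape = shape
        self.fronds = None        # null pointer: no child array exists yet

    def add_frond(self, shape: str) -> "FernFrond":
        if self.fronds is None:
            self.fronds = []      # the array is formed only when needed
        child = FernFrond(shape)
        self.fronds.append(child)
        return child

    def depth(self) -> int:
        # Recursion terminates wherever the child pointer is null.
        if self.fronds is None:
            return 1
        return 1 + max(f.depth() for f in self.fronds)


if __name__ == "__main__":
    root = FernFrond("frond")
    root.add_frond("pinna").add_frond("pinnule")
    print(root.depth())
```

Creating the root allocates nothing beyond the root itself, however deep the type definition could in principle go.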
Charles Pegge
26-11-2008, 09:54
Memory Management
Every time the length of a dynamic string is altered, a new version is created and the old one is discarded. So when it comes to copying or deleting objects that contain strings of strings, the internal operations can get quite complex. It is essentially the same task as copying or deleting a folder in Windows - the tree of subdirectories and files has to be processed recursively.
At the end of their life, objects have to be dismantled systematically to release all their component strings.
But for small local objects, we can take advantage of the stack to hold all their components. With a single instruction to adjust the stack pointer, all is instantly recycled.
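The systematic dismantling can be pictured as a depth-first walk that frees the children before their container, much like deleting a folder tree. The Python below is a sketch under stated assumptions: nested lists stand in for strings of strings, and the freed list merely records the order in which allocations would be handed back.

```python
def release(obj, freed):
    """Release obj and everything inside it, children first."""
    if isinstance(obj, list):
        for item in obj:
            release(item, freed)      # dismantle components recursively
        obj.clear()                   # container no longer refers to them
        freed.append("container")
    else:
        freed.append(obj)             # leaf string: one allocation released
    return freed


if __name__ == "__main__":
    tree = ["a", ["b", "c"], "d"]
    print(release(tree, []))
```

A stack-held local object needs none of this walking, which is why adjusting the stack pointer is so much cheaper for small objects.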
Charles Pegge
24-03-2009, 20:33
O2H Postings
As the Oxygen project is in a new phase, I am doing a little house-keeping to make things simpler.
Both the module and the source code will now be posted to the o2h compiler topic in separate zips, along with any other related material. And we can now put this thread to bed :zzz:
http://community.thinbasic.com/index.php?topic=2517
Charles, thanks for an understanding of what is around the corner. Really cool stuff, thanks again!