Oxygen O2H Tech Blog



Charles Pegge
28-09-2008, 23:39
O2H

O2H is a Just-In-Time compiler and assembler, deployed in a single DLL, that takes source code strings and returns executable binary. It is an extension of the Oxygen assembler. This implementation can be used with thinBasic to provide high-speed functions for time-critical operations that would otherwise be impractical using interpreted code alone.

I will be posting updates of the Oxygen source code and DLL here as the compiler develops.

This code is at the alpha stage and still undergoing further work and testing, so please approach with caution!

The code is now split into the following parts:

thinBasicOxygen modular interface for thinBasic
thincore header for creating thinBasic modules in FreeBasic

o2glo structures, constants and global variables
o2lex lexing and parsing functions
o2par parsing: lower translation functions
o2sem semantics: higher translation functions
o2fun the original oxygen machine code renderer / linker
o2asm the assembler which translates assembly code into oxygen script
o2prep preprocessor / compiler translating high level code to Assembler and o2 script
o2run runtime procedures used by compiled programs

Coding Strategy

The Oxygen compiler is written in very basic BASIC and assembler. Once a stable kernel is established, Oxygen will extend itself with a library of functions and macros written in Oxygen - and ultimately the kernel itself will be rewritten in standalone Oxygen. :)



Compiling from the FreeBasic console (FreeBasic 0.20.0):

fbc -dylib thinBasic_Oxygen.bas

Both the Module and source code can now be found here:

http://community.thinbasic.com/index.php?topic=2517

Charles Pegge
29-09-2008, 21:28
Adopting Olestrings (BSTR)

This brings h2o into line with thinBasic for handling dynamic strings, ensuring that strings are consistently allocated and deallocated by the OS.

This code exposes the raw mid$() function. A negative index counts backwards from the end of the string instead of forwards from the beginning.





uses "OXYGEN"

dim src as string
dim s,t as string
dim v as long

src= "
push ebp
mov ebp,esp
push ebx
mov eax,[ebp+8]
mov eax,[eax] ' deref once
mov ebx,kernel
mid eax,[ebp+12],[ebp+16]
pop ebx
mov esp,ebp
pop ebp
ret 12
"

o2_buf 1 : o2_asmo src : declare function MIDS ( s as string, byval i as long, byval le as long ) as string at o2_buf 1



if len(o2_error) then msgbox 0,o2_error()+o2_view (src) : stop


S="one two three"
t=mids s,5,3
't=mids s,-5,5
msgbox 0,s+$cr+t


Internal: (FreeBasic)





function mids(byval p as BSTR,byval i as long,byval j as long) as BSTR

asm
mov edi,[p] 'pointer
mov edx,[edi-4] 'length
'-------------'neg offsets
mov esi,[i]
cmp esi,0
jg midok2 'skip if positive
add esi,edx 'adjust negs
inc esi
cmp esi,1 'lowest limit=1
jge midok1
mov esi,1 'minimum is 1
midok1:
mov [i],esi
midok2:
mov esi,edi 'pointer
cmp edx,[i]
jge midok3 'past end?
push 0
push 0
call SysAllocStringByteLen
mov esi,eax
jmp midx 'null allocation
midok3:
add edi,[i] 'src base
dec edi 'base adjust
mov eax,edi 'src base
add eax,[j] 'req boundary
add edx,esi 'end boundary
cmp eax,edx
jle midok4 'choose lower length
mov eax,edx
midok4:
sub eax,edi 'set length
push eax 'bytes
push eax '
push 0 'null string
call SysAllocStringByteLen
push edi 'src base
push eax 'new string dest
mov esi,eax
call copyn
midx:
mov [function],esi ' new string
end asm
end function
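
For readers who do not follow x86 assembler, the index handling in the routine above can be sketched in Python. This is only an illustration of the logic as I read it from the listing, not the O2H implementation: the name mids_sketch is hypothetical, it works on characters rather than BSTR bytes, and it assumes the requested length is not negative.

```python
def mids_sketch(s, i, n):
    """Illustrative model of the mid$() index rules (1-based, negative = from the end)."""
    length = len(s)
    if i <= 0:
        # Zero or negative offsets are adjusted by the length plus one,
        # so -1 refers to the last character.
        i = i + length + 1
        if i < 1:
            i = 1  # lowest permitted start position
    if length < i:
        return ""  # start lies past the end: empty result
    start = i - 1                   # convert to a 0-based offset
    end = min(start + n, length)    # choose the lower of the two boundaries
    return s[start:end]


print(mids_sketch("one two three", 5, 3))    # two
print(mids_sketch("one two three", -5, 5))   # three
```

Note that under these rules an index of 0 becomes length+1, which lies past the end and gives an empty string.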

Charles Pegge
02-10-2008, 08:02
Declarative Syntax

dim global static local declare function sub byref byval ptr as pointer at =

Dim supports both formats:

dim as long a,b,c
and
dim a,b,c as long

also mixed types

dim a,b,c as long d,e,f as double

and assignments

dim as long a=40, b=50




uses "OXYGEN"
uses "FILE"

dim vv as long
dim src as string

src="
declare function aaa ( byval da as double ) as long export
declare function aab ( byref da() as double ) as long export
global as long ga,gb
global as long gc,gd
function abc ( byref a as double, byval b as double, byval c as double ) as dword export
dim as quad v1 at [ecx], v2 at [edx]
dim as long v3=42,v4=46
dim as double w3=42,w4=46
dim v5,v6 as double as single l1,l2
a=b
end function
"



msgbox 0,o2_error()+o2_view (src) : stop
file_save ("t.txt",o2_error()+o2_view(src)) : stop
o2_asmo src : o2_exec : msgbox 0,"0x"+hex$(vv)

ErosOlmi
02-10-2008, 08:37
That is great Charles.

I need to align thinBasic with your DIM syntax too in order to stay compatible.
Thanks a lot.
Eros

Charles Pegge
02-10-2008, 10:14
Hi Eros,
I split these compound dim structures into simple ones of single type in pre-def format - then they are all compiled the same way.



' wt w3 variable types
' ws list of variables of the current type
' wva() dim statements accumulator
if wr="as" then
w3=getas(s,i)
if ws="" then wt=w3:continue do ' set predef
if wt="" then wva(n)+="var "+w3+ws+cr:ws="":continue do ' postdef
wva(n)+="var "+wt+ws+cr:wt=w3:ws="":continue do ' predef
end if
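
The same splitting idea can be sketched as a small standalone Python function. It is a rough model of the fragment above, not the compiler's code: the function name normalise_dim and the exact "var type names" output layout are assumptions, and initialisers such as a=40 are not treated specially.

```python
def normalise_dim(statement):
    """Split a compound dim into one 'var <type> <names>' line per type."""
    tokens = statement.replace(",", " ").split()
    if not tokens or tokens[0].lower() != "dim":
        raise ValueError("expected a dim statement")
    out = []
    pending_type = ""   # type that was named before its variables (pre-def)
    names = []          # variables collected for the current type
    i = 1
    while i < len(tokens):
        word = tokens[i]
        if word.lower() == "as":
            if i + 1 >= len(tokens):
                raise ValueError("'as' without a type")
            new_type = tokens[i + 1]
            i += 2
            if not names:
                pending_type = new_type              # set pre-def type
            elif not pending_type:
                out.append(f"var {new_type} {','.join(names)}")      # post-def
                names = []
            else:
                out.append(f"var {pending_type} {','.join(names)}")  # pre-def
                pending_type = new_type
                names = []
            continue
        names.append(word)
        i += 1
    if names:
        if not pending_type:
            raise ValueError("variables without a type")
        out.append(f"var {pending_type} {','.join(names)}")
    return out


print(normalise_dim("dim a,b,c as long d,e,f as double"))
print(normalise_dim("dim v5,v6 as double as single l1,l2"))
```

Both the pre-def form (dim as long a,b,c) and the post-def form (dim a,b,c as long) end up as the same list of single-type declarations, which is what lets the later stages compile them identically.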

ErosOlmi
02-10-2008, 13:17
Yes, thinBasic does very similarly. It will be quite easy for me to adapt.
DIM AS <type> <var> [, var [... ]]
is a syntax I had already wanted to support.

Thanks
Eros

Michael Hartlef
02-10-2008, 14:08
That is evil. I could understand

TYPE var, var, var

but DIM AS TYPE VAR, VAR makes my toenails bend backwards. Thank god it is optional. :D

ErosOlmi
02-10-2008, 14:38
Yes, me too.

But those are all different ways used by different BASIC dialects to do exactly the same thing. The more we support, the easier it is to migrate other pieces of code or ... users ;)

I also like TYPE var [, var [...]]
but it is a little more difficult to manage, for example when UDTs are used.

Charles Pegge
02-10-2008, 16:19
dim as long a,b,c is one of the delights of freebasic.

It does have some advantages once you get accustomed to it: editing is a little easier, and so is translating from C.

I am attempting to provide a very flexible dim statement to accommodate different dialects. On the downside, it's not so good for enforcing code discipline.

How about this: ;D

dim a,b,c as long as double d e f

Petr Schreiber
02-10-2008, 16:52
Charles,

thanks for the report on your H2O progress.

If you need an easy port from C, then what about using DIM <datatype> instead of DIM AS <datatype>?
Like:


DIM LONG A, B ,C


But I must say I like the current ThinBasic syntax still a bit more :) - it is more wordy, but also clear.
It is a bit confusing when the type specifier comes before the variable name in one case and after it in another - this mix would not be very clear.

Like yours:


dim a,b,c as long as double d e f

;D

I don't know, I would probably use the classic thinBasic syntax I presume.


Petr

Charles Pegge
02-10-2008, 17:45
Ok Petr,

I've made the "as" optional - it's a little extra work for the type identifier. Each word has to be checked against all intrinsic and defined types. But I like the idea of a more intelligent compiler.

dim a,b,c long double d e f

Charles Pegge
09-10-2008, 21:50
thinBasic Interface

Several related functions with persistent globals and statics can be defined in one assembly string.
To make life simple, the thinBasic declaration and the Oxygen function prototypes are almost identical.
The syntax is not rigorous but it's worth being able to do this.

uses "OXYGEN"

dim vv as long
dim p0,p1,p2,p3,p4 as long
dim src as string

declare sub finish () at p0
declare function fun1 () as long at p1
declare function fun2 () as long at p2
declare function fun3 () as long at p3
declare function fun4 () as long at p4


src="

h2o

'behind the scenes initialisation here
'...

sub finish() at p0
'free resources here
end sub

function fun1 () as long at p1
'...
end function

function fun2 () as long at p2
'...
end function

function fun3 () as long at p3
'...
end function

function fun4 () as long at p4
'...
end function

;...

"
o2_asmo src
o2_exec ' this call initialises all persistent data

' vv is already declared at the top of the script

vv=fun1() ' ...
'...
finish() ' releases all resources

Petr Schreiber
09-10-2008, 22:01
Hi Charles,

thanks for the report, I just do not get why the same functions are declared twice at specific address :-[


Petr

ErosOlmi
09-10-2008, 22:07
I think ...

one is in the thinBasic script.
The others are inside the string that Oxygen will interpret and transform into machine code.

Both will point to the same LONG allowing script to directly call on the fly compiled functions.

Petr Schreiber
09-10-2008, 22:12
Yes,

but I wonder how compiler fills those p0 - p4.
Maybe at Oxygen compile time the pointers are supplied to variable after AT?


Petr

ErosOlmi
09-10-2008, 22:23
I do not know really.
thinBasic SDK has functions to test if a variable exists so in theory module can check if P0, P1, ... exists in script and use it.

Maybe we have to wait for Charles's reply or for something to work with.

Anyway this seems exciting.

John Spikowski
09-10-2008, 22:25
FYI:

The H2O trademark is used by Aestiva for the free version of the HTML/OS (Basic like) web scripting solution.

http://h2o.aestiva.com

Using the H2O reference may cause the owner of this company to rear his ugly head like in the past and sic his lawyers on the responsible parties.

I try to stay as far away from this company as I can. (besides the $800 per URL cost of the product) ScriptBasic has a MUCH better web scripting solution and it's FREE and open source.

John

Charles Pegge
09-10-2008, 22:38
Yes Eros, this shows declaration, source code, compilation, initialisation, function-call, and finally release.

Hi Petr,
Due to obfuscation rules there is currently no way for Oxygen to reliably read thinBasic script and vice versa - hence the need to make separate declarations. But the p0..p4 variables are provided with the function locations at o2_exec time.

Thanks for the copyright warning John, h2o is more of a project name at present. We could always call it o2h :)

Block Syntax

These are 3 different ways to define various blocks consistently. (but function and select-blocks have a sort of double structure with slightly different layouts)

Single Line:

defs veg carrot 1 bean 2 cabbage 3 potato 40


Bracketed:

defs veg
(
carrot 1
bean 2
cabbage 3
potato 40
)

Ended:

defs veg
carrot 1
bean 2
cabbage 3
potato 40
end defs
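
To show that the three layouts carry the same information, here is a minimal Python sketch that reads any of them into one dictionary. It is not how O2H parses blocks; parse_defs is a hypothetical helper that handles a single defs statement and assumes every value is an integer.

```python
def parse_defs(text):
    """Read a single-line, bracketed, or 'end defs' block into (name, {key: value})."""
    tokens = text.split()
    if len(tokens) < 2 or tokens[0] != "defs":
        raise ValueError("expected 'defs <name> ...'")
    name = tokens[1]
    body = tokens[2:]
    if body and body[0] == "(":
        # Bracketed form: the body runs up to the closing bracket.
        if ")" not in body:
            raise ValueError("missing ')'")
        body = body[1:body.index(")")]
    elif body[-2:] == ["end", "defs"]:
        # Ended form: drop the terminator.
        body = body[:-2]
    # Otherwise: single-line form, the body is simply the rest of the line.
    if len(body) % 2:
        raise ValueError("expected name/value pairs")
    return name, {body[k]: int(body[k + 1]) for k in range(0, len(body), 2)}


single = "defs veg carrot 1 bean 2 cabbage 3 potato 40"
bracketed = "defs veg\n(\ncarrot 1\nbean 2\ncabbage 3\npotato 40\n)"
ended = "defs veg\ncarrot 1\nbean 2\ncabbage 3\npotato 40\nend defs"
print(parse_defs(single) == parse_defs(bracketed) == parse_defs(ended))   # True
```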

Petr Schreiber
09-10-2008, 22:48
Thanks for explanations,

now I get it ... I think.
The block syntax looks good to me, I think I will go for the multiline versions.


Petr

Charles Pegge
09-10-2008, 23:38
I think having a choice of formats is important - some items are simply easier to read if they are contained on one line - but more complex expressions benefit from the multi line format. A compiler can take a more leisurely approach to parsing and decide which format is being used.


OOP Type

This has really kept me very quiet over the last few days as there is a fair degree of complexity to work through. The format is an elaboration of the o2 OOP format, which can still be used in the h2o environment.
Objects of the class can be defined in exactly the same way as the familiar UDT, but I have yet to work out the best way to handle dynamic objects (C++ new and delete), as these have an extra level of indirection and I do not wish to cause programmers psychiatric trauma by entanglements with virtual objects passed byref or byval.

This piece uses overloaded methods - four varieties of qq() distinguishable by their param signatures. In practice, one would want to keep things as simple and consistent as possible but I need to explore the limits for testing.


type someclass

method qq () as long private
method qq(byval a as double)
method qq(long,long)
method qq(long long long)
method rr(long)
/
pv as dword
r1 as dword
r2 as dword

end type

methods of someclass

method rr(long) as long
'this.r2=eax
end method

method qq()
end method

method qq(byval aa1 as double)
end method

method qq(aa2 as long, b as long )
end method

method qq(aa3 as long,b as long c as long)
end method

end methods

Petr Schreiber
10-10-2008, 08:18
Charles,

pretty incredible :)

Thanks again,
Petr

kryton9
10-10-2008, 22:40
Charles, thanks. I have been reading but not doing any coding for a while, so I held back from commenting, but you are taking very tough-to-grasp subject matter and really making it accessible. This looks really elegant.

Charles Pegge
13-10-2008, 14:36
Hi Kent,

the most difficult part is walking through all the possible permutations - the program feels like a town with many alley-ways. But it's gradually taking on a clear pattern.

I've spent the last few days working on the header codings for the main program and functions - which is all new territory for me. A minimal function could be a single ret byte, or it could have parameters, local variables, static variables and a thinBasic variable pointer, ending up with a skeleton of more than 100 bytes.

New

The best solution for dynamic objects seems to be something like this:

dim plum as new fruit

then plum exists as a memory pointer like a byref variable

As with strings, deletion of the object can be the responsibility of the system.

Petr Schreiber
13-10-2008, 16:02
Hi Charles,

this syntax is quite ok to me.
In some languages the variable is declared as normal and by assigning NEW FRUIT you allocate object.

But your way is a bit more straightforward, I like it too.


Petr

kryton9
13-10-2008, 21:29
I like the NEW keyword as I am getting used to it in C++, and I like how you have automatic garbage cleaning by the system. That looks like a very nice and easy-to-understand way of handling it all.

Charles Pegge
13-10-2008, 22:15
How about this:

Like

dim ... like

A way of cloning objects - so they start with identical class and values. It is very simple to implement.

dim apricot, nectarine, peach like new plum

with apricot .skin.texture=furry
with nectarine .size *=2
with peach .size *=2 : .skin.texture=furry
...

kryton9
13-10-2008, 22:17
Never thought about that. Sure seems logical and fits perfectly!

Charles Pegge
06-11-2008, 09:58
After passing through the long dark tunnel of expression parsing, I reckon we are about half way through this project and on schedule to arrive in early December. The most difficult aspect to deal with is the highly recursive construction of function parameters containing expressions and in turn: expressions containing function calls with parameters containing further expressions .. ad infinitum. The compiler has to cope with extreme coding styles whether they are desirable or not.

In addition there has to be operator precedence and automatic type conversion, and complications like applying bitwise operations to floating point numbers. These are some of the ghosts that leap out and take you by surprise. But I think I've got a stable architecture now that will cope with all of these, and adding the full set of operators and types should be fairly simple.
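
The recursion described above (expressions inside function parameters inside expressions) combined with operator precedence can be illustrated with a small precedence-climbing evaluator in Python. This is a generic textbook technique and only a sketch: it is not the O2H parser, it evaluates directly instead of emitting assembler, and the tiny FUNCTIONS table is an assumption made for the example.

```python
import re

TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[-+*/(),])")
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
FUNCTIONS = {"max": max, "min": min, "abs": abs}   # illustrative only


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def apply_op(op, a, b):
    return {"+": a + b, "-": a - b, "*": a * b}[op] if op != "/" else a / b


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expression(self, min_prec=1):
        left = self.primary()
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while self.peek() in PRECEDENCE and PRECEDENCE[self.peek()] >= min_prec:
            op = self.take()
            right = self.expression(PRECEDENCE[op] + 1)   # left-associative
            left = apply_op(op, left, right)
        return left

    def primary(self):
        tok = self.take()
        if tok == "(":
            value = self.expression()
            self.take(")")
            return value
        if tok == "-":
            return -self.primary()          # unary minus
        if tok[0].isdigit():
            return float(tok)
        if tok in FUNCTIONS:
            self.take("(")
            args = []
            if self.peek() != ")":
                args.append(self.expression())   # each parameter is a full expression
                while self.peek() == ",":
                    self.take()
                    args.append(self.expression())
            self.take(")")
            return FUNCTIONS[tok](*args)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    parser = Parser(text)
    value = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return value


print(evaluate("max(1, min(4, 2 * 3) + 1) * 2"))   # 10.0
```

Type conversion and bitwise operations on floating point values, mentioned above as further complications, are deliberately left out of this sketch.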

I have split the files down into more distinctive units as they emerge - A separate file for the globals representing the state machine. Lexing functions are now separate from the semantic functions used to identify and build expressions and turn them into assembler.

String management and garbage collection are next on the todo list.

Michael Hartlef
06-11-2008, 11:29
Man, I'm so curious what it will look like. December isn't coming fast enough. ;D But take your time. Better well thought out and worked out than rushed.

Charles Pegge
06-11-2008, 13:11
Hi Mike,

It will look very familiar, like a regular BASIC with some extensions of the syntax and a few adaptations for use inside a thinBasic string. And before releasing it I hope to have a full test suite to check all the features. This will be essential for the future course of development, where every enhancement can be automatically checked for unexpected results.

jcfuller
06-11-2008, 21:09
Charles,
Will this be available outside the ThinBasic environment?

James

Charles Pegge
06-11-2008, 22:53
Hi James,

This could be embedded in any software that can use a dll and handle strings - as with the Oxygen assembler itself. There is also the capability of compiling standalone exes and dlls using the TOMDK though this is a bit raw at present and needs further work to support resource files etc.

But my primary target is to give thinBasic the ability to JIT compile time-critical functions and make them run almost as fast as Assembly code.

ErosOlmi
06-11-2008, 22:55
"But my primary target is to give thinBasic the ability to JIT compile time-critical functions and make them run almost as fast as Assembly code."

:-* Thanks!

Charles Pegge
22-11-2008, 20:52
Strings, Dynamic Objects and Arrays

At the risk of stretching my schedule, I've been working on enhancements for array/string handling, but I think it's going to be well worth it.

All dynamic objects are contained in Bstrings. This makes memory management very simple. There is no need to specify the size of a dynamic array. When you run out of elements the array will be automatically regenerated with spare buffer space fore and aft.

Where speed is critical, static arrays will still be available.


Dynamic Arrays contained in BSTR

These will automatically redim when necessary and will support LPUSH/LPOP and UPUSH/UPOP, which cause the floating base and ubound to shift. As you can see, the array header and structure are really simple. No other runtime information is needed.


[ -4 Length | 0 FloatingBase Idx | 4 Ubound Idx | LowerBuf | DATA | UpperBuf ]
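
As a rough model of the layout above, the following Python sketch keeps a floating base index and a ubound index over a buffer with spare room at both ends, and regenerates the buffer when either end runs out. It is an illustration only: the class name FlexArray, the spare parameter and the use of a Python list in place of a BSTR are all assumptions.

```python
class FlexArray:
    """Toy double-ended array: [lower spare | live data | upper spare]."""

    def __init__(self, spare=4):
        if spare < 1:
            raise ValueError("spare must be at least 1")
        self.spare = spare
        self.buf = [None] * (2 * spare)
        self.base = spare          # index of the first live element
        self.ubound = spare - 1    # index of the last live element (empty when < base)

    def __len__(self):
        return self.ubound - self.base + 1

    def _regenerate(self):
        # Build a new buffer with fresh spare space fore and aft; discard the old one.
        live = self.buf[self.base:self.ubound + 1]
        self.buf = [None] * self.spare + live + [None] * self.spare
        self.base = self.spare
        self.ubound = self.spare + len(live) - 1

    def upush(self, value):
        if self.ubound + 1 >= len(self.buf):
            self._regenerate()
        self.ubound += 1
        self.buf[self.ubound] = value

    def lpush(self, value):
        if self.base == 0:
            self._regenerate()
        self.base -= 1
        self.buf[self.base] = value

    def upop(self):
        if len(self) == 0:
            raise IndexError("pop from empty array")
        value = self.buf[self.ubound]
        self.buf[self.ubound] = None
        self.ubound -= 1
        return value

    def lpop(self):
        if len(self) == 0:
            raise IndexError("pop from empty array")
        value = self.buf[self.base]
        self.buf[self.base] = None
        self.base += 1
        return value

    def to_list(self):
        return self.buf[self.base:self.ubound + 1]


arr = FlexArray(spare=2)
for n in range(5):
    arr.upush(n)
arr.lpush(-1)
print(arr.to_list())   # [-1, 0, 1, 2, 3, 4]
```

Pushing and popping only moves the two indices; copying happens only when a spare region is exhausted.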

Petr Schreiber
22-11-2008, 22:00
I am looking forward to your implementation :)

Charles Pegge
25-11-2008, 08:25
Here are some of the possibilities that follow from implementing stretchable dynamic arrays:

Static Arrays

Subscripts are specified

Dim abc(10,4,2) as long



Dynamic Arrays

No subscript given

Dim abc() as long



UDTs with Dynamic Arrays

Just as UDTs can contain dynamic strings, they can also contain dynamic arrays. Both can be represented as a single pointer, so the structure itself always maintains a fixed length.

The space allocated for the actual array is contained in a dynamic string.


Type Long_Flexi_Array
a() as long
End Type


And this is achieved by holding string pointers inside the array string:


Type String_Flexi_Array
a() as string
End Type


Arrays of other UDTs are also possible:


Type Army
troops() as soldier
tanks() as vehicle
End Type





UDTs with Lazy Recursion

Many objects in the real world have recursive structures. For instance, fern fronds are often composed of miniature versions of themselves - like fractals.

Because Dynamic structures are not formed until they are used in an expression, Recursion can be safely specified without bursting the computer :).


Type Fern_Frond
frond() as fern_frond
shape as shape_spec
...
End Type


An object with this structure could be built to any level - the recursion terminates wherever there is a null pointer.
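
The lazy recursion idea can be modelled in Python, where None stands in for the null pointer. This is only an analogy under that assumption; the class and method names (FernFrond, grow, depth) are made up for the example and the shape field is reduced to a plain string.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FernFrond:
    shape: str
    # None plays the role of the null pointer: no child array exists
    # until something is actually stored in it.
    fronds: Optional[List["FernFrond"]] = None

    def grow(self, shape):
        """Create the child array on first use, then add a sub-frond."""
        if self.fronds is None:
            self.fronds = []
        child = FernFrond(shape)
        self.fronds.append(child)
        return child


def depth(frond):
    """Walk the structure; recursion stops wherever fronds is None."""
    if frond.fronds is None:
        return 1
    return 1 + max((depth(child) for child in frond.fronds), default=0)


root = FernFrond("main stem")
tip = root.grow("pinna").grow("pinnule")
print(depth(root))   # 3
```

Declaring the type costs nothing beyond one empty slot per object, which is why the self-referencing definition is safe.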

Charles Pegge
26-11-2008, 09:54
Memory Management


Every time the length of a dynamic string is altered, a new version is created and the old one is discarded. So when it comes to copying or deleting objects that contain strings of strings, the internal operations can get quite complex. It is essentially the same task as copying or deleting a folder in Windows: the tree of subdirectories and files has to be processed recursively.

At the end of their life, objects have to be dismantled systematically to release all their component strings.

But for small local objects, we can take advantage of the stack to hold all their components. With a single instruction to adjust the stack pointer, all is instantly recycled.

Charles Pegge
24-03-2009, 20:33
O2H Postings

As the Oxygen project is in a new phase, I am doing a little house-keeping to make things simpler.

Both the module and the source code will now be posted to the o2h compiler topic in separate zips, along with any other related material. And we can now put this thread to bed :zzz:

http://community.thinbasic.com/index.php?topic=2517

kryton9
25-03-2009, 05:23
Charles, thanks for an understanding of what is around the corner. Really cool stuff, thanks again!