Oxygen Module for Structured Machine Code Programming



Charles Pegge
14-03-2008, 03:23
Updated 17 Mar 2008 13:20 gmt
New zip:
Oxygen with integrated thinBasic # variables
No longer case sensitive.
New thinCore interface for FreeBasic (work in progress)


This is my first thinBasic module.:)

It is based on my O2 (alias Oxygen) project, which supports block-structured machine code. It is intended to be used in conjunction with MC_Eval and MC_Exec. The O2 notation uses the same quote mark for comments as thinBasic and does not use double quotes to delineate its own string literals. This enables Oxygen scripts to be incorporated, without any modification, into thinBasic multi-line strings. Oxygen strings are 'glued' onto the end of MC_Eval strings to bind to thinBasic variables etc. The example included shows how this is done.


I reproduce it here:


uses "Oxygen"
dim vv as long = 5
dim sMC as string=MC_Eval ("
b9 #vv
8b 09
")+Oxygen_Eval("

'--------------'
' TESTS
'
' 12:10 14 Mar 2008
'
'--------------'
' nested loops
'--------------'

33 d2
(
b8 nl10
(
42
48 7f repeat
)
49
7f repeat
)
eb gEnd ' jump over some string data
[ this is a string ]

.End_of_Prog

8b c2
c3
")
Dim RetVal as long
RetVal=MC_Exec(sMC)
msgbox 0, Hex$(RetVal)



Since Oxygen_Eval returns a BSTR, which PB uses for its dynamic strings, I hope I am right in assuming the returned string will be correctly disposed of by thinBasic when it is no longer needed?

The Zip below includes source code, docs and the dll module which has no other dependencies. But I need to produce lots of examples to fully demonstrate its use.

kryton9
14-03-2008, 05:28
It almost gave the correct answer to the meaning of life (The Hitchhiker's Guide to the Galaxy): 42. I got 32, but if that is in hex I guess it is 50 in decimal.

Michael Hartlef
14-03-2008, 09:09
Hi Charles,

first of all, thank you for your support of thinBasic. Every new module makes the package more powerful and one day it will help someone. I also vote for some examples and docs, as I can't get my head around it right now.

Cheers
Michael

Charles Pegge
14-03-2008, 10:32
Here is the instruction set so far. I will refrain from adding more and see what can be done using the set in its present state.



------------------------------------------------------------------------
INSTRUCTION SET
------------------------------------------------------------------------
all instructions are case sensitive (lowercase) except for hexadecimal
all words are delimited by white space or comment mark

'                    comment to end of line
[..]                 string literal (the square brackets are nestable)
.label               forward labels (only the first 2 letters are significant)
:                    make entry in jump table
$                    make space (value in hexadecimal)
$$                   make space & align to nearest 4 bytes
2 digits 0-9 a-f     hexadecimal byte (these are not case sensitive)
3 digits 0-7         octal byte
g                    short forward relative jump (but not into an inner block)
gl                   long forward relative jump (ditto)
(                    start of block
)                    end of block
x                    short jump exit from block
xl                   long jump exit from block
r                    short jump repeat from start of block
rl                   long jump repeat from start of block
h                    hexadecimal numbers: (not case sensitive)
  hw                 word: 2 byte integer
  hl                 long: 4 byte integer
n                    decimal numbers:
  nb                 byte: 1 byte integer
  nw                 word: 2 byte integer
  nl                 long: 4 byte integer
  nq                 quad: 8 byte integer
  ns                 single: 4 byte floating point
  nd                 double: 8 byte floating point

------------------------------------------------------------------------
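
As a rough illustration of the decimal number prefixes listed above, here is a small Python sketch of how a token such as nl10 could expand into bytes. It assumes x86 little-endian byte order and that the digits follow the prefix directly, as in the nl10 and nl7 tokens used in this thread. The function name encode_decimal and the treatment of negative values are assumptions made for illustration, not part of Oxygen.

```python
import struct

# Assumed sizes for the integer prefixes, taken from the table above.
INT_SIZES = {"nb": 1, "nw": 2, "nl": 4, "nq": 8}
# Assumed struct formats for the floating point prefixes.
FLOAT_FORMATS = {"ns": "<f", "nd": "<d"}


def encode_decimal(token):
    """Encode a token such as 'nl10' as little-endian bytes."""
    prefix, digits = token[:2].lower(), token[2:]
    if prefix in INT_SIZES:
        value = int(digits)
        # Negative values are stored as two's complement (an assumption).
        return value.to_bytes(INT_SIZES[prefix], "little", signed=value < 0)
    if prefix in FLOAT_FORMATS:
        return struct.pack(FLOAT_FORMATS[prefix], float(digits))
    raise ValueError("unknown decimal prefix: " + token)


def as_hex(data):
    """Format bytes the way they appear in an Oxygen script."""
    return " ".join("%02x" % b for b in data)


print(as_hex(encode_decimal("nl10")))  # 0a 00 00 00
```

So "b8 nl10" would stand for the five bytes b8 0a 00 00 00, which is mov eax,10.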

ErosOlmi
14-03-2008, 10:57
Thanks a lot Charles!
Machine code is not my "core business" (I still cannot understand it very well) but I appreciate your efforts a lot. I hope it will stir up some passion.



Since Oxygen_Eval returns a BSTR, which PB uses for its dynamic strings, I hope I am right in assuming the returned string will be correctly disposed of by thinBasic when it is no longer needed ?

Yes, thinCore should dispose of the allocated BSTR.
I will check asap and if not I will fix it.

Ciao
Eros

Petr Schreiber
14-03-2008, 11:20
Thanks Charles,

so far it looks like a very nice addition!


Petr

Charles Pegge
14-03-2008, 11:20
I am delighted to say that creating a thinBasic module was painless.

I followed Eros's sample code very closely, and got the Oxygen module working without any hitches. I wasn't certain how FreeBasic would read a BSTR into its own dynamic string structure, so I wrote a low-level string copy which interprets the BSTR structure precisely rather than casting it as a Zstring (Asciiz in PB terms). It could be done with PEEKs and POKEs but I used a short piece of ASM.

The reason that FreeBasic does not use BSTR for its dynamic strings is that they are Microsoft platform dependent, so it has its own system that will also work under Linux.

Anyway, here is the interface function, most of which deals with the BSTR to FB dynamic string transfer:

FreeBasic

Function Oxygen() AS BSTR

  Dim ParensPresent As Long

  ParensPresent = thinBasic_CheckOpenParens_Optional

  Dim codBSTR As BSTR '---OLE string will be used to return value to thinCore

  dim srcBSTR As BSTR
  dim src as string
  dim cod as string
  dim ert as long
  dim i as long
  dim j as any ptr

  thinBasic_ParseString(srcBSTR)
  asm
    mov eax,[srcBSTR]
    mov ecx,[eax-4]
    mov [i],ecx
  end asm
  if i>0 then
    ' copy to freebasic dynamic string then compile
    src=string$(i,chr$(0))
    j=strptr(src)
    asm
      mov esi,[srcBSTR]
      mov edi,[j]
      mov ecx,[i] 'length of data
      nexch:
      mov al,[esi] ' src
      mov [edi],al ' dest
      inc esi
      inc edi
      dec ecx
      jnz nexch
    end asm
    cod=Hexlink( src,ert,i ) ' the Oxygen compiler linker
  end if

  codBSTR = SysAllocStringByteLen( strptr(cod), Len(cod) )

  If ParensPresent = TB_TRUE Then thinBasic_CheckCloseParens_Mandatory

  Function = codBSTR

End Function
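
For anyone unfamiliar with the BSTR layout that the ASM above relies on, here is a minimal Python sketch that models it with plain byte buffers: a 4-byte length prefix sits just before the address the BSTR pointer holds, which is what "mov ecx,[eax-4]" reads. The helper names make_bstr_buffer and read_bstr are hypothetical, and the buffer only imitates memory; it does not call the real OLE functions.

```python
import struct


def make_bstr_buffer(data):
    """Lay out data the way a BSTR is stored and return (buffer, ptr).

    ptr is the offset a BSTR pointer would hold: the first data byte,
    four bytes past the start of the length prefix.
    """
    prefix = struct.pack("<I", len(data))  # byte count, terminator excluded
    return prefix + data + b"\x00\x00", 4


def read_bstr(buffer, ptr):
    """Copy the data out using the length prefix, as the ASM loop does."""
    (length,) = struct.unpack_from("<I", buffer, ptr - 4)
    return buffer[ptr:ptr + length]


# Machine code can contain zero bytes, so a zero-terminated read would
# stop too early; the length prefix gives the full size.
code = bytes([0xB8, 0x0A, 0x00, 0x00, 0x00, 0xC3])  # mov eax,10 : ret
buffer, ptr = make_bstr_buffer(code)
print(len(read_bstr(buffer, ptr)))  # 6
```

This is the reason for copying by length rather than casting to a Zstring: a null-terminated read of the six bytes above would stop after the second byte.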

RobertoBianchi
14-03-2008, 11:32
Hi Charles,

first of all thank you very much for starting the new thinbasic Oxygen module.
It seems a very interesting module that opens up many other avenues of development and will help thinBasic better meet users' needs.

Regards,
Roberto

Charles Pegge
14-03-2008, 12:52
Thank you all

When there are updates, I will replace the Oxygen zip file attached to the first post on this thread, and I'll post examples with the other MC_Evals on the machine code board:
http://community.thinbasic.com/index.php?board=151.0

thinBasic_Oxygen will be synchronised with its stand-alone counterpart, which works directly from the console and is available here:

http://www.jose.it-berater.org/smfforum/index.php?topic=1618

RobertoBianchi
14-03-2008, 13:40
OK, thanks for the details. It would be really great if you could also add a step up from machine code to assembler code.

Roberto

Petr Schreiber
14-03-2008, 14:03
Roberto,

nice idea! I don't know whether it would be too time-consuming for Charles to develop, but mov, add ... would make the code even clearer to understand.
But the current state is already terrific.


Petr

Charles Pegge
14-03-2008, 14:21
Yes, I am seriously thinking about it, Roberto. Most of it is straightforward, but there are some ambiguities which require interpretation. The 1:1 correspondence with machine code is not perfect. There are also a large number of instructions and their permutations, so I will try to automate the coding by interpreting the data directly from the manuals to avoid human error.

But even an incomplete implementation could be made easier to use than the static inline assemblers in PB and FB.

RobertoBianchi
14-03-2008, 16:05
Great!

Charles, please let me know if you need support.

Roberto

Michael Hartlef
14-03-2008, 17:53
Yes, I am seriously thinking about it, Roberto. Most of it is straightforward, but there are some ambiguities which require interpretation. The 1:1 correspondence with machine code is not perfect. There are also a large number of instructions and their permutations, so I will try to automate the coding by interpreting the data directly from the manuals to avoid human error.

But even an incomplete implementation could be made easier to use than the static inline assemblers in PB and FB.




Yes, that would be awesome.

Petr Schreiber
14-03-2008, 18:14
Always learning something new here,

I thought assembler = opcodes represented by name rather than hex value.
So now I see how high-level assembler is compared to machine code :)


Thanks,
Petr

ErosOlmi
14-03-2008, 23:37
Charles,

do you think you need a FreeBasic SDK function that, given a variable name, will return:

the variable maintype (number, string, udt, variant, ...)
the variable subtype (long, double, single, dynamic string, fixed string, ....)
whether the variable is an array and how many elements it has
the variable data pointer
...


Maybe you can interface script variables with Oxygen scripts on the fly, as I did in MC string eval.

Let me know.

Ciao
Eros

Charles Pegge
15-03-2008, 01:41
Yes I think that is a good direction to go in. There's some extra work involved for the more specialised types like EXT and CURRENCY which are not native to FB but as these are passed by pointer they can be handled easily at low level.

It will be nice to link variables and their attributes directly into Oxygen scripts. I am also looking at the logistics of adding Assembler directly to Oxygen.

RobertoBianchi
15-03-2008, 09:56
I'm thinking about how to activate syntax highlighting for machine code in thinAir.
If Charles adds the assembler interpreter feature, this could be done using a block marker such as ASM ... END ASM or ASM (...), or the character "!" on each line as is the case with PB. But how can we do it now with machine code?
What are your suggestions?

Thanks,
Roberto

ErosOlmi
15-03-2008, 10:06
Roberto,

thanks a lot for your time on this. thinBasic supports multi-line strings, as you have seen in the many examples.
So the opening double quote is on one line, and the closing double quote can be many lines later.
If thinAir can manage this syntax, it could be very nice.
This feature is already there so maybe you can concentrate on this for the moment and see what comes out from machine code handling later.

Just my idea.
Ciao
Eros

RobertoBianchi
15-03-2008, 10:16
You are welcome.

Sorry, but even if I can manage multi-line strings, I'm not able to distinguish a standard multi-line string from one that holds machine code.
An inline marker would be better, but with some additional work a START ... END marker block should also work.

Roberto

ErosOlmi
15-03-2008, 11:28
Yes, I understand, but multi-line strings are already there, whereas ASM is not, and nothing has been decided yet. And it will take some time.
Also, ASM will be specific to a particular module and not to the general thinBasic Core engine, so we will not be able to follow its evolution from inside thinBasic (as far as I can see).

I think thinAir should have something in its config that specifies the beginning and the end of a block, so that it can automatically understand blocks whatever they are. Not specific to any particular syntax, but general, like:
block: [FUNCTION][END FUNCTION]
block: [SUB][END SUB]
block: [BEGIN CONST][END CONST]
...

PS: maybe better to split this part of the topic into thinAir forum.

Charles Pegge
15-03-2008, 13:03
Hi Eros and Roberto

With Asm it is desirable to be able to use double quote marks, e.g.:

mov al,"b" or db "piece of text"


This would require thinBasic to use special markers to delineate a block of data instead of double quotes. I suggest a pair of brackets, either [] or {}, because of nestability. Brackets are also used extensively within Asm itself.

I know this is a fundamental change to thinBasic syntax but I think it will help with syntax highlighting and support many other kinds of script or data passed to modules in general.

MyAsmProc=O2_Asm {
mov eax,#vv
mov [eax], "helo"
..
ret
}

please excuse my colors :)

ErosOlmi
15-03-2008, 13:11
Charles,

who will be responsible for parsing all the code inside {...} of O2_Asm {...}?
If the module function linked to O2_Asm is responsible for parsing the code, I do not see much trouble.
thinCore just needs to parse the first { and the last } and pass all the code inside to the O2_Asm function as a string.

This will also leave thinAir free to remain as it is, because the syntax will be very close to the currently parsed syntax.

Ciao
Eros

Charles Pegge
15-03-2008, 15:07
Yes the entire content, including comments can be passed as a BSTR to the module. To avoid the nested brackets problem (for which you would need to parse the entire content) you could use keywords like this:

dim sMc as string=Oxygen_Eval Script
..
EndScript Oxygen_Eval

In general terms:
[module function name] script.. EndScript [module function name]

This syntax could easily support scripts which may contain other scripts :) without getting too complicated

pseudocode:
read script
do
p=instr next_word_pos script, "endscript "
if p=0 then p= len script+1: exit (error)
w=nextword script
if w=function_name then a=next_word_pos script: exit
next_word_pos=this_word_pos
loop
if noerror then assign script ..
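
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration of the idea, not how thinCore does it: the function name extract_script is hypothetical, and it assumes the end marker is the word EndScript followed by the module function name, matched without regard to case.

```python
import re

# "EndScript" followed by whitespace and a name, in any letter case.
END_MARKER = re.compile(r"\bendscript\s+(\w+)", re.IGNORECASE)


def extract_script(text, function_name):
    """Split text at the first 'EndScript <function_name>' marker.

    Returns (script, rest), where script is the untouched raw text before
    the marker and rest is whatever follows it. Returns None when no
    matching marker is found, which is the error case in the pseudocode.
    """
    wanted = function_name.lower()
    for match in END_MARKER.finditer(text):
        # Markers that belong to other functions are part of the script.
        if match.group(1).lower() == wanted:
            return text[:match.start()], text[match.end():]
    return None


source = """b8 nl10
Other_Eval Script
  inner text
EndScript Other_Eval
c3
EndScript Oxygen_Eval
msgbox 0, "done"
"""
script, rest = extract_script(source, "Oxygen_Eval")
print(script.count("\n"))  # 5
```

Because only the marker carrying the matching function name ends the block, a script can contain scripts for other module functions without any bracket counting. A script nested inside another script of the same function would still need extra handling.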

Charles Pegge
15-03-2008, 15:27
Eros,

A question about the FreeBasic interface: How did you generate libthinCore.dll.a ? Did you use dlltool and a DEF file ?

ErosOlmi
15-03-2008, 15:51
Attached are the .bat and .def files I use.

In the .def file you declare the DLL from which to export, and the functions followed by @ followed by the number of bytes the calling process expects to find on the stack.
The .bat file contains the command I use.
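
To make the "@ followed by the number of bytes" rule concrete, here is a small Python sketch that works out the decorated name for a function whose parameter sizes are known. It assumes a 32-bit target where every argument occupies a multiple of 4 bytes on the stack. The helper name decorated_export is hypothetical, and the four 4-byte parameters used in the example are an assumption based on the thinBasic_VariableGetInfo declaration that appears later in this thread.

```python
def decorated_export(name, param_sizes):
    """Return 'name@N', where N is the total stack space of the arguments.

    Each parameter is rounded up to a whole number of 4-byte stack slots
    (an assumption that holds for 32-bit stdcall style calls).
    """
    stack_bytes = sum((size + 3) // 4 * 4 for size in param_sizes)
    return "%s@%d" % (name, stack_bytes)


# One BSTR handle plus three pointers, each 4 bytes on a 32-bit system.
print(decorated_export("thinBasic_VariableGetInfo", [4, 4, 4, 4]))
# thinBasic_VariableGetInfo@16
```

A parameter passed by value that is wider than 4 bytes, such as a double, would add 8 to the total.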

I remember it took some time for me to understand this process.

Hope this can help.
Eros

ErosOlmi
15-03-2008, 15:54
Yes the entire content, including comments can be passed as a BSTR to the module. To avoid the nested brackets problem (for which you would need to parse the entire content) you could use keywords like this:

dim sMc as string=Oxygen_Eval Script
..
EndScript Oxygen_Eval

In general terms:
[module function name] script.. EndScript [module function name]

This syntax could easily support scripts which may contain other scripts :) without getting too complicated

pseudocode:
read script
do
p=instr next_word_pos script, "endscript "
if p=0 then p= len script+1: exit (error)
w=nextword script
if w=function_name then a=next_word_pos script: exit
next_word_pos=this_word_pos
loop
if noerror then assign script ..



I will study your idea. Very interesting and very easy to manage by thinBasic parser.

Eros

RobertoBianchi
15-03-2008, 20:53
Charles,

if it is difficult for the interpreter to manage a block (either machine code or assembler), it is better to keep things simple (I mean use a string or multi-line string) and try to solve the problem with a marker inside the string.

Ciao,
Roberto

ErosOlmi
15-03-2008, 21:09
Charles,

see the attached file and let me know if it is OK for you.
Mainly I've added (as you suggested) a RAWTEXT/END RAWTEXT block.
In theory there can be anything inside that block. If all is OK, a string containing the block will be returned.

Ciao
Eros

Updated:

SCRIPT/END SCRIPT
substituted with
RAWTEXT/END RAWTEXT
2008.03.16: added interface function thinBasic_VariableGetInfoEX. See thinCore.inc file inside zip file


Attached file removed. Features present in current thinBasic preview version 1.6.0.2

Petr Schreiber
15-03-2008, 21:14
Hi Eros,

fantastic! But as there can be "anything" inside, why not call it RAWTEXT / END RAWTEXT or similar?
Without the magic (Charles's module) it has no script-like functionality.

Just an idea, could be wrong; today I realised my bike has a punctured tyre, so I am in an aggressive mood ;D


Petr

ErosOlmi
15-03-2008, 22:34
OK, you are right.
I will change to something else.

ErosOlmi
15-03-2008, 23:09
Updated above post with a new Core.
Now it implements RAWTEXT/END RAWTEXT.

Charles Pegge
16-03-2008, 00:52
Thanks Eros,

I have just been testing RAWTEXT. It seems to parse the string, taking out comments and also turning the whole string, except for quoted content, into upper case. Would it be possible to capture the unprocessed content before it goes through any of the main parsing procedures?



I am also playing with the thinCore.BI file and the DEF / dlltool batch file you kindly sent earlier. I seem to be calling this function successfully, but it does not want to return a variable pointer (null). Do I need to give the variable name in uppercase? Are there any decorations to add to the varname?

This is my interpretation of thinBasic_VariableGetInfo

PS Corrected BSTR ptr to BSTR: variable attribs now visible.

FreeBasic


' '----------------------------------------------------------------------------
' 'thinBasic_VariableGetInfo
' '----------------------------------------------------------------------------
' ' Returns additional variable info giving variable name.
' ' ATTENTION: pass variable as reference. Function will populate
' ' with relevant information
' ' Function will return a pointer to variable data that can be used in other
' ' situations.
' '----------------------------------------------------------------------------
DECLARE FUNCTION thinBasic_VariableGetInfo _
LIB "thinCore.DLL" _
ALIAS "thinBasic_VariableGetInfo" _
( _
Byval SearchKey AS BSTR , _
Byval pMainType As Long Ptr , _ '---ATTENTION: parameter passed BYREF will return info
Byval pSubType As Long Ptr , _ '---ATTENTION: parameter passed BYREF will return info
Byval pIsArray As Long Ptr _ '---ATTENTION: parameter passed BYREF will return info
) As Long

'---Equates for variable Main Type
Const MainType_IsNumber As Long = 20
Const MainType_String As Long = 30
Const MainType_Variant As Long = 50
Const MainType_UDT As Long = 60
'---Equates for variable Sub Type
Const SubType_Byte As Long = 1
Const SubType_Integer As Long = 2
Const SubType_Word As Long = 3
Const SubType_DWord As Long = 4
Const SubType_Long As Long = 5
Const SubType_Quad As Long = 6
Const SubType_Single As Long = 7
Const SubType_Double As Long = 8
Const SubType_Currency As Long = 9
Const SubType_Ext As Long = 10
Const SubType_AsciiZ As Long = 15

Charles Pegge
16-03-2008, 02:45
Well, I've made a little more progress: I can now read the variable attributes and get a variable pointer, but can't yet see the contents of an initialised variable, e.g. an integer value. Maybe it's bed time..

Charles Pegge
16-03-2008, 11:03
This is the test function I am using to look at the thinBasic LONG variable:
still getting 0 instead of the initialised value 42.

PS: Corrected code with thinBasic_DirectPtrToDataPtr: Now working!



Function GetvarPtr(VarName as string) as long
  dim as PBStringDescriptor pbsd
  dim as long MainType , SubType, IsArray

  pbsd.bstr=SysAllocStringByteLen( strptr(VarName), Len(VarName) )
  'pbsd.other=0

  dim as long i,j
  i=thinBasic_VariableGetInfo (pbsd.bstr, @MainType, @SubType, @IsArray )
  if i then
    j=thinBasic_DirectPtrToDataPtr(i)
    if j then
      'MessageBox 0,str$(peek(long,i))+" "+str$(MainType)+" "+str$(SUbType)+" "+str$(IsArray),"test",0
      MessageBox 0,VarName+" "+str$(peek(long,j))+" "+str$(i)+" "+str$(j),"vars",0
    end if
  end if
  SysFreeString pbsd.bstr
  function=i

end function

ErosOlmi
16-03-2008, 11:07
Hi Charles,

I will work on RAWTEXT in order to fix case and other stuff.

Regarding the thinBasic_VariableGetInfo interface function, the returned pointer is not a pointer to the variable data but a pointer to a structure holding the variable info.
As I said before, I need to develop a new specific function for you.

In the meantime, to get variable data pointer you can use "thinBasic_DirectPtrToDataPtr(BYVAL DirectPtr AS LONG)" function passing the value returned by "thinBasic_VariableGetInfo".

Eros

ErosOlmi
16-03-2008, 11:26
Charles,

I've updated zip file in previous post: http://community.thinbasic.com/index.php?topic=1603.msg11611#msg11611
I've added interface function "thinBasic_VariableGetInfoEX". See thinCore.inc file for declaration. "DataPtr" parameter is the direct pointer to variable data. I can implement more if needed.

Ciao
Eros

Charles Pegge
16-03-2008, 12:14
Thank you Eros, I have incorporated the new function in thinCore.BI and thinCore.DEF and tested out the new thinCore. We are nearly there :)

Can we retain the comments in the RAWTEXT? These are still being stripped out. Some scripts will inevitably want to use the single quote mark for other purposes.

ErosOlmi
16-03-2008, 12:19
OK, perfect.

I will not be so fast in changing RAWTEXT behaviour, because I have to change the script pre-parsing process. I need some more time, sorry.
In the meantime I hope you can continue your O2 dev using double quoted string.

Ciao
Eros

Charles Pegge
16-03-2008, 12:44
That is no problem for Oxygen or the Assembler since they both use BASIC style comments. Though for debugging purposes it is still desirable to have the comments there eventually.

I am making a list of issues relating to the FreeBasic interface. For instance, multidimensional arrays are row major, in contrast to PB's column-major order. Nothing that needs immediate attention.

One thing I spotted: the Asciiz subtype returns 25 but the equate in both the INC and BI files shows it as 15

ErosOlmi
16-03-2008, 12:50
One thing I spotted: the Asciiz subtype returns 25 but the equate in both the INC and BI files shows it as 15

Sorry. 25 is the correct value.
I will fix the INC and BI files in the next release. I also need to work more on the FreeBasic .BI file. If you can send me your version I will align them.

Thanks a lot
Eros

Charles Pegge
16-03-2008, 13:07
If you like I can translate the rest of the BI functions and markup where there might be problems. This will help me to get familiar with the interface. Then I can pass them back to you Eros.

ErosOlmi
16-03-2008, 13:13
... For instance multidimensional arrays are row major in contrast to PB column major order. Nothing that needs immediate attention.


thinBasic variables are not one-to-one with the compiler's (Power Basic) variables; we have built our own "variable system", so I have full control over how data is handled.
Internally we have a function that calculates the exact element position given its coordinates, for up to 3 dimensions. So, again, we have full control over how that position is calculated. Currently, as you stated, thinBasic follows column order, mainly for historical reasons (we wanted thinBasic to be very compatible with our beloved Power Basic), but I can change this or, better, add a new variable-specific characteristic that tells whether a multidimensional variable follows column or row order. So a new syntax like the following:

DIM MyMatrix(10, 20) AS LONG {ROWORDER | COLUMNORDER}
where COLUMNORDER is the default. Internally, the function that calculates the element position will check whether the variable has column or row order and act accordingly.

I will check if this idea is doable but at first sight I would say yes.
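
A small Python sketch of what such a position function might compute for the two orders. It is not the thinCore implementation: the name element_offset is hypothetical, and it uses zero-based indices and counts elements rather than bytes, both of which are simplifying assumptions.

```python
def element_offset(indices, dims, order="column"):
    """Linear element offset of indices in a dense array of size dims.

    order="column": the first index varies fastest (PB style).
    order="row":    the last index varies fastest (FreeBasic/C style).
    """
    if len(indices) != len(dims):
        raise ValueError("need one index per dimension")
    pairs = list(zip(indices, dims))
    if order == "row":
        pairs.reverse()  # walk from the fastest-varying index outward
    elif order != "column":
        raise ValueError("order must be 'row' or 'column'")
    offset, stride = 0, 1
    for index, size in pairs:
        if not 0 <= index < size:
            raise IndexError("index out of range")
        offset += index * stride
        stride *= size
    return offset


# Element (2, 3) of a 10 x 20 array under both layouts.
print(element_offset((2, 3), (10, 20), "column"))  # 32
print(element_offset((2, 3), (10, 20), "row"))     # 43
```

The same element lands at a different offset under each layout, which is why a module reading thinBasic arrays directly needs to know which order is in use.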

Ciao
Eros

ErosOlmi
16-03-2008, 13:15
If you like I can translate the rest of the BI functions and markup where there might be problems. This will help me to get familiar with the interface. Then I can pass them back to you Eros.



A big thanks Charles.
If you have any comments on the thinBasic SDK interface, something you think could improve it or whatever, let me know.

Ciao
Eros

Petr Schreiber
16-03-2008, 15:23
Hi Eros,

roworder, columnorder idea is very nice, it can make life easier when converting code from other languages.


Thanks,
Petr

Charles Pegge
17-03-2008, 20:22
I have updated Oxygen, allowing variables to be bound directly in the script, and removing case sensitivity, so it can be used with RAWTEXT and the latest preview version of thinBasic.

The zip is updated at the beginning of this thread.

I have also included my latest edits of libthinCore.dll.a, the def file and thinCore.bi for FreeBasic development, including the more recent API functions and constants in thinCore.inc. I need to carry out some more checks before handing over to Eros, but it was all fairly straightforward.

RAWTEXT has reverted to uppercase conversion even with yesterday's thinCore. It must be doing the conversion elsewhere. But as well as making Oxygen case insensitive, I envisage the assembler being the same, also using BASIC-style comments.

This is how the new version Oxygen can now be used with RAWTEXT and the variable #vv directly incorporated:



uses "Oxygen"

dim vv as long = 6
dim sMc as string=rawtext
'--------------'
' nested loops
'--------------'

8b 0d #vv                 ' mov ecx,vv
ba NL0                    ' mov edx,0
(                         ' [do]
  b8 nl7                  ' mov eax,7
  (                       ' [do]
    42                    ' inc edx
    48                    ' dec eax
    7f r                  ' jg repeat
  )                       ' [end do]
  49                      ' dec ecx
  7f r                    ' jg repeat
)                         ' [end do]
eb gEnd                   ' jmp End_of_Prog
[ This is a string ]      ' db ...
" This is a string "      ' db ...
                          '
.End_of_Prog              '
                          '
8b c2                     ' mov eax,edx
c3                        ' ret
end rawtext
Dim tMC as string=Oxygen_Eval(sMC)
Dim RetVal as long=MC_Exec(tMC)
Msgbox 0, "The meaning of life is: "+RetVal
'Msgbox 0,sMC
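
To show what the r tokens in the blocks above turn into, here is a simplified Python sketch of a block resolver. It is only a guess at the principle, not the Oxygen linker: it assumes a block starts at the first byte after "(", that r emits a one-byte displacement measured from the byte following it, and it handles only hex bytes, nl numbers, "(", ")" and r. The function name assemble is hypothetical.

```python
def assemble(tokens):
    """Turn a list of simplified Oxygen tokens into machine code bytes."""
    code = bytearray()
    block_starts = []  # stack of offsets where each open block begins
    for token in tokens:
        t = token.lower()
        if t == "(":
            block_starts.append(len(code))
        elif t == ")":
            block_starts.pop()
        elif t == "r":
            # Short jumps are relative to the byte after the displacement.
            displacement = block_starts[-1] - (len(code) + 1)
            code += displacement.to_bytes(1, "little", signed=True)
        elif t.startswith("nl"):
            code += int(t[2:]).to_bytes(4, "little", signed=True)
        else:
            code.append(int(t, 16))  # plain two-digit hex byte
    return bytes(code)


# The nested loops from the script, without the #vv load and string data.
script = "ba nl0 ( b8 nl7 ( 42 48 7f r ) 49 7f r ) 8b c2 c3"
print(" ".join("%02x" % b for b in assemble(script.split())))
# ba 00 00 00 00 b8 07 00 00 00 42 48 7f fc 49 7f f4 8b c2 c3
```

Under these assumptions the inner "7f r" becomes 7f fc (jump back 4 bytes to inc edx) and the outer one becomes 7f f4 (jump back 12 bytes to mov eax,7).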

Petr Schreiber
17-03-2008, 20:44
Hi Charles,

thanks for the update, worked well here!
I think Kent will like the script because of its finally right output ;)

Is the loop performed until EAX = 0 ?


( ' [do]
42 ' inc edx
48 ' dec eax
7f r ' jg repeat
) ' [end do]



Thanks,
Petr

Petr Schreiber
17-03-2008, 21:01
Hmm,

is the following decomposition of your program correct Charles?


dim vv as long = 6
dim eax, ebx, ecx, edx as long

ecx = vv
edx = 0
do
eax = 7
do
edx += 1
eax -= 1
loop until eax <= 0
ecx -= 1
loop until ecx <= 0 ' -- But why ecx ? Because it was last referenced ?

eax = edx

msgbox 0, "The meaning of life is:"+STR$(eax)


I did it to understand it better, still too unsure in machine code and assembly, but want to master it :)


Thanks,
Petr

Corrected according to Charles comments

Charles Pegge
17-03-2008, 21:55
Hi Petr, I agree with your decomposition, except that the meaning of jg would be 'until <=0' instead of 'until =0'.

The register codings work like this
eax 000
ecx 001
edx 010
ebx 011
esp 100
ebp 101
esi 110
edi 111

which are added to these base opcodes:
40 inc
48 dec
b8 mov [immediate data]

then there are conditional short jump codes
74 jz
75 jnz
..
7c jl
7d jge
7e jle
7f jg

So encoding is easy. Reading it back is not quite as easy.
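
A minimal Python sketch of the "base opcode plus register code" rule described above. It covers only the three single-byte forms listed in this post and applies to 32-bit code; the function name encode and the mov_imm key are illustrative choices.

```python
# 3-bit register codes from the table above.
REGISTERS = {
    "eax": 0b000, "ecx": 0b001, "edx": 0b010, "ebx": 0b011,
    "esp": 0b100, "ebp": 0b101, "esi": 0b110, "edi": 0b111,
}

# Base opcodes to which the register code is added.
BASE_OPCODES = {
    "inc": 0x40,
    "dec": 0x48,
    "mov_imm": 0xB8,  # followed by 4 bytes of immediate data
}


def encode(operation, register):
    """Return the opcode byte for a register form of inc, dec or mov imm."""
    return BASE_OPCODES[operation] + REGISTERS[register]


for operation, register in [("inc", "edx"), ("dec", "eax"),
                            ("dec", "ecx"), ("mov_imm", "edx")]:
    print("%-8s %s -> %02x" % (operation, register, encode(operation, register)))
```

These give 42, 48, 49 and ba, the same bytes used in the nested loops script.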

Petr Schreiber
17-03-2008, 22:30
Thanks for explanation,

that is interesting. I think my problem is that in my mind I still want to equate machine code directly with assembler, which is wrong. I need to study the Intel manuals better :)


Thanks,
Petr

Charles Pegge
17-03-2008, 23:10
The ancestors of the x86 are the 4004 (4 bits) and the 8080 (8 bits). Devices of these ancient times often coded their instructions in octal; since bytes were in short supply, they had to be used as efficiently as possible. This is the legacy we have today.

Simplistically, the addressing byte consists of:

mod: 2 bits addressing mode
reg: 3 bits register1 (normally the destination)
r/m: 3 bits register2 (normally the source)

when mod=00 register2 holds the memory address
when mod=01 register2 holds the memory address + offset (next byte)
when mod=10 register2 holds the memory address + offset (next 4 bytes)
when mod=11 register2 is treated as the location of the data (reg to reg transfer)

Example: mov 32 bit
8b c1 ' mov eax,ecx ' going from right to left in Intel syntax
8b c8 ' mov ecx,eax
8b 01 ' mov eax, [ecx]
8b 41 08 ' mov eax,[ecx+8]
8b 81 00 02 00 00 ' mov eax,[ecx+512]

But the x86 has elaborated on this scheme to extend the addressing modes (like irregular verbs in natural languages), so for some combinations it's a bit more complicated.
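
The mod / reg / r/m split can be checked with a short Python sketch that rebuilds the five mov examples above. It handles only the regular cases for opcode 8b; esp and ebp as the address register are rejected because they fall under the irregular combinations just mentioned. The function names modrm and mov_r32 are illustrative.

```python
REGISTERS = {
    "eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
    "esp": 4, "ebp": 5, "esi": 6, "edi": 7,
}


def modrm(mod, reg, rm):
    """Pack the 2-bit mod, 3-bit reg and 3-bit r/m fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def mov_r32(dest, src, mem=False, disp=0):
    """Encode 'mov dest,src' or, with mem=True, 'mov dest,[src+disp]'."""
    reg, rm = REGISTERS[dest], REGISTERS[src]
    if not mem:
        return bytes([0x8B, modrm(0b11, reg, rm)])
    if src in ("esp", "ebp"):
        raise ValueError("esp/ebp addressing needs the irregular encodings")
    if disp == 0:
        return bytes([0x8B, modrm(0b00, reg, rm)])
    if -128 <= disp <= 127:
        tail = disp.to_bytes(1, "little", signed=True)
        return bytes([0x8B, modrm(0b01, reg, rm)]) + tail
    tail = disp.to_bytes(4, "little", signed=True)
    return bytes([0x8B, modrm(0b10, reg, rm)]) + tail


def show(code):
    return " ".join("%02x" % b for b in code)


print(show(mov_r32("eax", "ecx")))                      # 8b c1
print(show(mov_r32("eax", "ecx", mem=True, disp=8)))    # 8b 41 08
print(show(mov_r32("eax", "ecx", mem=True, disp=512)))  # 8b 81 00 02 00 00
```

Writing the addressing byte in octal makes the fields visible at a glance: c1 is 301 octal, i.e. mod 3, reg 0, r/m 1.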

ErosOlmi
17-03-2008, 23:23
Charles,

you are giving us poor mortals a lot of interesting knowledge in a short but effective way that goes directly into our gray matter.
I can see some light at the end of the tunnel.

A big thanks. This is all new to me.
Eros

Charles Pegge
18-03-2008, 00:55
Well this is good mental exercise - devising rules for the assembler. :)

To show how Intel stretch out their addressing modes:

if mod is 00 and r/m is 101 then it uses the offset as an absolute address:

useful example:
8b 05 #vv ' mov eax, [#vv] ' move contents of vv into eax

if mod is 00, 01 or 10 and r/m is 100, then there is an extended address mode byte called a SIB, which stands for Scale Index Base.

This enables the address to be derived from 2 registers, one of which is scaled. A sort of 2 dimensional array. It allows data in these arrays to be referenced in one go. For example:

8b 84 91 #vv ' mov eax,[ ecx + edx *4 + #vv ]

In this case the primary address byte 84 breaks down as follows

mod 10 for a 4 byte long offset (#vv)
reg 000 destination the eax register
r/m 100 a SIB byte follows

the SIB byte 91 breaks down as follows

Scale 10 signifying index * 4 bytes
Index 010 the index is the value in edx
Base 001 the base is the value in ecx

The offset, our absolute address to the start of #vv is then added to these.

There are a few more complications to contend with but this is the most elaborate addressing mode available.
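
Here is a Python sketch that rebuilds the two examples from this post: the absolute address form (mod 00, r/m 101) and the SIB form. Since #vv is bound by thinBasic at run time, the address 0x00403000 below is a made-up stand-in, and the function names are illustrative.

```python
REGISTERS = {
    "eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
    "esp": 4, "ebp": 5, "esi": 6, "edi": 7,
}
SCALES = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def modrm(mod, reg, rm):
    return (mod << 6) | (reg << 3) | rm


def sib(scale, index, base):
    """Pack the scale, index and base fields of the SIB byte."""
    if index == "esp":
        raise ValueError("esp cannot be used as an index register")
    return (SCALES[scale] << 6) | (REGISTERS[index] << 3) | REGISTERS[base]


def mov_absolute(dest, address):
    """mov dest,[address]: mod=00 with r/m=101 means a bare 4-byte address."""
    head = bytes([0x8B, modrm(0b00, REGISTERS[dest], 0b101)])
    return head + address.to_bytes(4, "little")


def mov_scaled(dest, base, index, scale, offset):
    """mov dest,[base + index*scale + offset] with a 4-byte offset (mod=10)."""
    head = bytes([0x8B, modrm(0b10, REGISTERS[dest], 0b100),
                  sib(scale, index, base)])
    return head + offset.to_bytes(4, "little")


VV_ADDRESS = 0x00403000  # hypothetical address standing in for #vv

print(" ".join("%02x" % b for b in mov_absolute("eax", VV_ADDRESS)))
# 8b 05 00 30 40 00
print(" ".join("%02x" % b for b in
               mov_scaled("eax", "ecx", "edx", 4, VV_ADDRESS)))
# 8b 84 91 00 30 40 00
```

The first three bytes of the second result, 8b 84 91, match the breakdown given above; the remaining four are whatever address #vv resolves to.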

Petr Schreiber
18-03-2008, 21:25
Hi Charles,

thanks again, learning bit by bit works very well, at least for me.
So here I attach the cover of a book you might not even know that you are writing here / will write ;D


Thanks a lot,
Petr

kryton9
18-03-2008, 21:53
As all things in this thread, awesome stuff Petr and well deserved for Charles!

Charles Pegge
19-03-2008, 01:11
Thanks for your comments and the book! If it's for kids it will be a scrap book. Coding these rules into an Assembler is hard work. Attempting a regular block structure was pretty hopeless, so I have resorted to the good old-fashioned GOTO. It reduces deep nesting and makes the code much cleaner. If all goes well I should have a small kernel of the instruction set usable in a few days. It will sit in its own layer above Oxygen, which will do all the linking.