CWAD (AKA: Compressed, Where's All the Data) [Archive] - thinBasic: Basic Programming Language Community Forum


ISAWHIM

20-10-2008, 06:39

WARNING: Long winded prelude to the actual content...
Scroll to the second post for the code segment.

What is a CWAD?
C = Compressed
WAD = "Where's All the Data" (Normally uncompressed RAW data used by a CORE program. Popularized by DOOM.)

Actually, a WAD is like a ZIP or RAR file. It is one file that contains many other files. With one small exception, the data is structured with an intent. Images are a block, sounds are a block, animation is a block, levels are a block, objects are a block, monsters are a block, weapons are a block, scripts are a block... etc...

This is stolen from an OLD OLD OLD concept that we normally call a "Database". (Part of the reason ZIP and RAR had a hard time getting patents for the "Stolen Concept". They were limited to "Specific format", and "Processing of data", in the patents.)

Any-who...

I have seen a lot of places where this TYPE/STYLE of structure could play a great role in overall development here with ThinBASIC. Much of the data we use, demands us to organize it, index it, and attempt to ensure it is all unique.

This has led me to come up with this general structure and concept of the CWAD. (The compressed version, which has obvious gains in this power-of-two expanding world.)

I will use "Graphics" as the first example...

Games which once used only a handful of 256x256 graphics now use thousands of 1024x1024+ graphics, and each year they were getting exponentially larger. However, we have reached what is called a "Creative edge". Millions of images cannot be created by one entity, so now we work in reverse. We are heading back towards the 256x256 graphics, which allows the quantity to grow as we grow. EG, what would have been one large image creating the entire side of a ship is now hundreds of images, the same ones we originally used to create the rendered image of the side of the ship: the painted metal, the rusty-holes, the rivet-lines, the stained drip-marks, the dirty scuff-marks of wear, the barnacle patches.

Why are we doing this? Because computers which were once limited to X surfaces (Triangles), are now able to handle 100 times more than they once could. Plus, new "Tricks" have been developed, which reduces the number of actual X surfaces (Triangles), we actually see. This gives us the advantage of using many smaller images to create larger images, as desired. (Like a mini-rendering tool inside of the game.)

For the second example I will use animation...

What was once a simple push, pull, move, and slide is now a complex chain of stacked events. When you type on the keyboard, your finger moves up and down and left and right. That is added to the motion of the other fingers, and your wrist. That is added to the motion of your arm as you reach for the far keys. That is added to your eye-motion as you look from screen to keys. That is added to your facial expressions as you squint your eyes, and raise your brow to insight. That is added to the pulsing vein that rises under your skin as you think. That is added to the chest rise and fall as you breathe. That is added to the head-tilt which you do, as you contemplate how to finish your paragraph.

Imagine all that done as one long line of code... Now you have to do the next animation, where you get-up for coffee. That could take another day to program, and you never even finished the paragraph.

This is why we use editors. They "Script" animation, as independent blocks of data. That data can be compounded to form one split-second animation, which reacts to each other animation's results. You program "Scratch nose", which is an animation of, "Bend finger", and "Raise arm", and "Bend wrist", and "Wiggle finger"... Followed by nothing (Which returns your arm/finger back to the last position, or leads it into a new animation.)

Those individual animations are saved as a script, and those script-animations are saved as a script. And so-on, and so-on... Script on script on script... That is a "Generic" form of manual compression, but the scripts are still long, and can gain from further program compression. One or two scripts is only a couple of bytes, but millions of compounded scripts and motions can easily reach into the MB range.

Here is where it becomes an advantage for us.

The program doesn't care that "Bone45" is called, "Right index finger", nor does the viewer. You have to create all these long unique names, which is a task in a task. Or, you could just throw them away, and save hours of wasted time. Since that is what the game does after you wasted hours creating all of them. (Not to mention, you have to creatively craft many conflicts, which throws many standards out the door, or makes them a total non-advantage.)

Now picture this...

You don't code every door and window and wall by hand. You don't track "Floor_Tile_Green_Dirty_Chinese_01.bmp". You don't even know what that image is, until you look at it! (If you had four tiles, it would be possible, but most games have more than four floor styles, each with four floor tile variations, and possibly separate sizes of each.)

What you do, is this. You create "Ship", the name and object is irrelevant. That "Ship" has 45 images, (paint01, paint02, paint03, paint04, deck1, deck2, door1, door2, door3, door4, window1, window2, window3...)

Once you create it, it gets added to the WAD as, ObjID#7 (I#432,I#434,I#120,I#435,I#436...)

You notice the third image... "paint03" was not a new image, but one that already exists in the WAD (I#120), and it was used, as opposed to creating another new image-block. When programming the game, if not using a script, you open the WAD, and see that ObjID#7 is the object you want. Or you can actually associate a unique name with it, "Ship", which would return the ID# for use in the game. That is what the game code expects, a model ID#.
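To make the re-use of "paint03" concrete, here is a rough sketch in Python (not thinBasic, and not the CWAD code itself). It assumes duplicates are spotted by hashing the image bytes, which is only one way to do it. The class name Wad and the methods add_image, add_object and lookup are made up for the illustration.

```python
import hashlib

class Wad:
    """Toy in-memory WAD: each image is stored once, objects hold lists of image IDs."""

    def __init__(self):
        self.images = []    # image ID -> raw bytes (the ID is the list position)
        self.by_hash = {}   # content hash -> image ID, used to spot duplicates
        self.objects = []   # object ID -> list of image IDs
        self.names = {}     # optional name -> object ID

    def add_image(self, data: bytes) -> int:
        # Re-use the existing ID when the same bytes are already stored.
        key = hashlib.sha256(data).hexdigest()
        if key not in self.by_hash:
            self.images.append(data)
            self.by_hash[key] = len(self.images) - 1
        return self.by_hash[key]

    def add_object(self, image_blobs, name=None) -> int:
        ids = [self.add_image(blob) for blob in image_blobs]
        self.objects.append(ids)
        obj_id = len(self.objects) - 1
        if name is not None:
            self.names[name] = obj_id   # the name is optional sugar over the ID
        return obj_id

    def lookup(self, name: str) -> int:
        return self.names[name]

wad = Wad()
wad.add_object([b"rust", b"rivets"], name="Crate")
ship = wad.add_object([b"paint01", b"paint02", b"rust"], name="Ship")
print(ship, wad.objects[ship])   # the third image re-uses the ID of b"rust"
```

The game side only ever needs the number returned by lookup("Ship"); the name never has to leave the WAD.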

Here is where it can get real interesting.

ObjID#7 can be ObjID#2's Model, with ObjID#2's, ObjID#4's, ObjID#7's Images.

Ok, hard to visualize... Model 7 was the same model ship as Model 2, but you just changed images, some which belonged to the WAD model 2 and 4, plus some new images. So, you just created a "Variation" of Model 2, which is ObjectID#7 in the game.

EG, To change the tires on a car should not be a complex in-game task, but this still allows that. You just have to know how many modelID's there are in the WAD, and all your "Custom" styles would be that number + 1, 2, 3, 4. Instant "Unique" control, without funky stacked names, which the game doesn't care about anyways.
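A small sketch of the "variation" idea, again in Python and with made-up names (Obj, make_variation, wad_objects, custom_objects). It assumes IDs start at 1 and that an object is just a model ID plus an ordered image list, which the posts above suggest but do not spell out.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    model_id: int     # which model geometry to draw
    image_ids: list   # image IDs, in the order the model expects them

# Objects stored in the WAD, keyed by 1-based ID (values are invented).
wad_objects = {
    1: Obj(model_id=1, image_ids=[10, 11]),
    2: Obj(model_id=2, image_ids=[12, 13, 14]),
}
wad_count = len(wad_objects)
custom_objects = {}   # in-game variations live above the WAD's own IDs

def make_variation(base_id, replacements):
    """Copy a WAD object, swap some image slots, return the new custom ID."""
    base = wad_objects[base_id]
    images = list(base.image_ids)           # copy, so the WAD entry is untouched
    for slot, image_id in replacements.items():
        images[slot] = image_id
    new_id = wad_count + len(custom_objects) + 1
    custom_objects[new_id] = Obj(model_id=base.model_id, image_ids=images)
    return new_id

first = make_variation(2, {2: 10})    # same model as #2, third image borrowed from #1
second = make_variation(1, {0: 14})   # same model as #1, first image borrowed from #2
print(first, second)
```

The custom IDs come out as the WAD count + 1 and + 2, so nothing in the WAD is renamed or overwritten.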

You need a crowd with 100 people, all doing separate animations, and having unique bodies, clothing styles, heads, motions... You could set them all up in the WAD, if you want instant random crowd creation. But you would be better off using the code to manage those objects, with the object-parts inside the WAD as your source.

Another advantage is... If you replace/update images or the object or the models... you instantly get notified of a name-conflict, (Which is still irrelevant if you don't use names), and all associated images, objects and models become updated. (Add animations, or user-data to that as well.)

Now... For simple control, and to keep this first release down to a minimum. I am only going to deal with the two item/sections that will possibly be used the most. Since there are only a few "Standard" formats involved, this should be rather easy to launch.

I do expect versions to change, but that should only impact the data-storage, and available output. I do not expect this to be used directly inside of code, as RAW control... Though, it can be, if you reduce it to only the parts you need for your program. (It is my hope, that this eventually turns into a MODULE. For now, it will operate as an INCLUDE or as a separate program.)

ISAWHIM

20-10-2008, 06:41

Here is what I have setup for a generic format, for version 1

Flow goes something like this... (Similar to a folder-crawl)

CWAD_Header
CWAD_Model_Index
- Model_File 1
- - Model_Header 1
- - - Model_RAW 1
- Model_File 2
- - Model_Header 2
- - - Model_RAW 2
CWAD_Image_Index
- Image_File 1
- - Image_Header 1
- - - Image_RAW 1
- Image_File 2
- - Image_Header 2
- - - Image_RAW 2

But the actual data is stored like this...

CWAD_Header
CWAD_Model_Index
CWAD_Image_Index
- Model_File 1
- Model_File 2
- Image_File 1
- Image_File 2
- - Model_Header 1
- - - Model_RAW 1
- - Model_Header 2
- - - Model_RAW 2
- - Image_Header 1
- - - Image_RAW 1
- - Image_Header 2
- - - Image_RAW 2

Code removed, pending format change. (More flexible code.)

ISAWHIM

20-10-2008, 07:09

The data is able to be constructed into two parts, if desired.
DATA-Header: This is user created, formulated from the file, extracted from the file, or just RAW-UNCOMPRESSED data.
RAW-DATA: This should be the original data, which will be compressed. But this can be a formulated portion of the data.

The (Header) will always be uncompressed, but is not required to hold any information.
The (RAW) will always be compressed, but is not required to hold any information.
One or the other MUST be present, or the "File" does not exist, and will be removed from that pack.

The (Header) can contain two parts. The first part must be "FIXED-LENGTH" data. The second part must be "FIXED-SIZE-VALUES". One value in the first part, must indicate the quantity of "FIXED-SIZE-VALUES", and another value in the first part must be named "StepOver", and hold the value of total "Bytes" which must be stepped over, in addition to the header, to get to the beginning of the compressed data. (That will make more sense later.)
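One possible reading of the header rules above, sketched in Python. The exact field layout is an assumption: a fixed part holding a version, the count of fixed-size values, and StepOver, followed by the values as 32-bit integers, followed by the zlib-compressed RAW part. The names FIXED, pack_file and unpack_file are made up.

```python
import struct
import zlib

# Assumed fixed-length part: version (uint16), value count (uint16), StepOver (uint32).
FIXED = struct.Struct("<HHI")

def pack_file(version, values, raw):
    """Header stays uncompressed; only the RAW part is compressed."""
    body = struct.pack(f"<{len(values)}I", *values)     # the fixed-size values
    head = FIXED.pack(version, len(values), len(body))  # StepOver = bytes to skip
    return head + body + zlib.compress(raw)

def unpack_file(blob):
    version, count, step_over = FIXED.unpack_from(blob, 0)
    values = struct.unpack_from(f"<{count}I", blob, FIXED.size)
    # Compressed data begins at the fixed header length plus StepOver.
    raw = zlib.decompress(blob[FIXED.size + step_over:])
    return version, list(values), raw

blob = pack_file(1006, [3, 7, 9], b"X,Y,Z," * 10)
print(unpack_file(blob))
```

A reader that does not care about the values can jump straight to the compressed data using only the fixed part, which is the point of StepOver.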

For a simple example...

(CWAD Head) Has 4 OBJ
(CWAD Model_Index) "House" = #1, "Tree" = #2, {Each at byte position X, and is Y bytes long.}
{"House" @ X to Y}
- (Model_Head #1) Ver:1006, Vert:[3] {"Lawn"}
- - (Model_RAW #1:1) X,Y,Z,...blah blah...
- - (Model_RAW #1:2) X,Y,Z,...blah blah...
- - (Model_RAW #1:3) X,Y,Z,...blah blah...
{"Tree" @ X to Y}
- (Model_Head #2) Ver:1006, Vert:[2] {"Tree"}
- - (Model_RAW #2:1) X,Y,Z,...blah blah...
- - (Model_RAW #2:2) X,Y,Z,...blah blah...

For a more complex example...
(CWAD Head) Has 4 OBJ
(CWAD Model_Index) "House" = #1, "Tree" = #2, {Each at byte position X, and is Y bytes long.}
{"House" @ X to Y}
- (Model_Head #1) Ver:1006, Vert:[3] {"Lawn"}
- - (Model_RAW #1:1) X,Y,Z,...blah blah...
- - (Model_RAW #1:2) X,Y,Z,...blah blah...
- - (Model_RAW #1:3) X,Y,Z,...blah blah...
{"Tree" @ X to Y}
- (Model_Head #2) Ver:1006, Vert:[2] {"Tree"}
- - (Model_RAW #2:1) X,Y,Z,...blah blah...
- - (Model_RAW #2:2) X,Y,Z,...blah blah...
(CWAD Image_Index) "Roof" = #1, "Leaves" = #2, {Each at byte position X, and is Y bytes long.}
{"Roof" @ X to Y}
- (IMG Index #1:1) RAW-ICON BMP 32x32
- - (IMG Index #1:2) FULL RAW FULL IMAGE DATA
{"Leaves" @ X to Y}
- (IMG Index #2:1) RAW-ICON BMP 32x32
- - (IMG Index #2:2) FULL RAW FULL IMAGE DATA
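The "each at byte position X, and is Y bytes long" part of the index can be sketched like this. It is a simplified stand-in in Python, not the CWAD format: the index here is a JSON table of name -> [offset, length], which is an assumption made to keep the example short. build_pack and read_entry are made-up names.

```python
import json
import struct
import zlib

MAGIC = b"CWAD"   # made-up 4-byte marker for this sketch

def build_pack(files):
    """Layout: MAGIC, index length (uint32), index, then all compressed chunks."""
    index, data = {}, b""
    for name, raw in files.items():
        chunk = zlib.compress(raw)
        index[name] = [len(data), len(chunk)]   # byte position X, length Y
        data += chunk
    index_bytes = json.dumps(index).encode("utf-8")
    return MAGIC + struct.pack("<I", len(index_bytes)) + index_bytes + data

def read_entry(pack, name):
    """Pull one entry out using only the index; the other chunks are never touched."""
    if pack[:4] != MAGIC:
        raise ValueError("not a pack produced by build_pack")
    (index_len,) = struct.unpack_from("<I", pack, 4)
    index = json.loads(pack[8:8 + index_len].decode("utf-8"))
    offset, length = index[name]
    start = 8 + index_len + offset
    return zlib.decompress(pack[start:start + length])

pack = build_pack({"House": b"X,Y,Z house data", "Tree": b"X,Y,Z tree data"})
print(read_entry(pack, "Tree"))
```

Because the indexes sit at the front and the data at the back, this matches the "actual data is stored like this" ordering rather than the folder-crawl ordering.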

ISAWHIM

20-10-2008, 08:02

How am I expecting this to function?

Similar to this...

You want your "Tree" loaded... (This is an object)
1. You call the function to loadCWADobj("Tree") or loadCWADobj(2).
2. It returns a value related to the NEW object SlotID where the object was loaded.
3. You can modify that object, by saying swapIMG(SlotID,"Leaves","Grass").
4. It returns (0) if it works. The SlotID will now hold the modified object.

The above actions need to be controlled by the CWAD, due to the fact that the data is in the CWAD. However, individual data can be pulled from the CWAD, and fully managed by you.
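The first flow (load into a slot, then swap an image) could behave roughly like the Python sketch below. The function names mirror loadCWADobj and swapIMG from the steps, but the non-zero error codes and the -1 "not found" result are assumptions, as is the sample data.

```python
# Stand-in for the objects inside a CWAD (names and image lists are invented).
CWAD_OBJECTS = {
    "House": {"id": 1, "images": ["Roof", "Bricks"]},
    "Tree":  {"id": 2, "images": ["Bark", "Leaves"]},
}
slots = []   # loaded copies; the SlotID is the position in this list

def load_cwad_obj(key):
    """Accept a name or an ID#, copy the object into a new slot, return the SlotID."""
    for name, obj in CWAD_OBJECTS.items():
        if key == name or key == obj["id"]:
            slots.append({"name": name, "images": list(obj["images"])})
            return len(slots) - 1
    return -1   # assumed "not found" result

def swap_img(slot_id, old, new):
    """Return 0 on success; the non-zero codes are made up for this sketch."""
    if not 0 <= slot_id < len(slots):
        return 1
    images = slots[slot_id]["images"]
    if old not in images:
        return 2
    images[images.index(old)] = new
    return 0

slot = load_cwad_obj("Tree")
result = swap_img(slot, "Leaves", "Grass")
print(slot, result, slots[slot]["images"])
```

Only the slot copy changes. The entry inside the CWAD still lists "Leaves", so the next load of "Tree" starts from the original.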

You want to create your "Tree" by yourself...
1. You call the function to getCWADobj("Tree") or getCWADobj(2).
2. It returns the DataID and ImageID for further manipulation. DataID is the RAW-M15, ImageID is a CSV image-list.
3. You can alter the image-list to use FILES, and alter the M15 to load as an altered FILE. (Both written to the HD for now.)
4. Alternately, you can manipulate the M15 and save as a FILE, but pass the IMAGE ID's back, or altered, along with the FILE LOCATION... and have the CWAD load the altered DATA.

loadCWADobjFILE(MyFile) {CWAD will expect images contained in the WAD, but will try to load files if no ID#'s are used. That allows you to Mix CWAD with CUSTOM images. Where you may want to impose a screen-shot, or game-altered graphic.}

There would not be a (MyFile,MyImages), because that is the native function of TBGL_M15LoadModel, and has no dependent use of CWAD data.

This is not a game-manager, only a CWAD manager. The only functions "Built-in" would be specific to the types that are "Built-in". You have the ability to manipulate any data inside, but be aware that certain manipulations related to replacement and removal would be permanent to your data.

The "Custom" area can use your own format, except for the "Header", and "Name" area. You can use the Name area optionally, so you may have it contain any data, or no data. It will always be queried by the Name commands. You can technically throw CWADs inside CWADs... But you will lose performance until they are all expanded. One reason you MAY want to do that, would be for "Quality Control", or to get more "Custom Areas", or use them for "Game Progression", to speed-up loading from single large CWADs.

Let me know what you think... The image portion is almost complete, and nearly ready for use. (Working on the M15 will be a little more of a challenge.)

ISAWHIM

20-10-2008, 08:39

I forgot to add that each area is modular, and can be included or removed, and potentially have separate versions related to them. (Where custom creation is desired.)

The OBJ area may later have OBJ2, or you can obviously add your own as OBJx. (That should not be an issue for updated versions, as you would be required to "Update" as well. If Version1.0.0.0 did not have OBJ2, but you added OBJ2, that would not conflict with Version1.0.0.1 if it DID have OBJ2, because the data-version would not match. Your updated code would have to be called something new, but if called FROOBJ, it would still function the same. It would know to look for OBJ2 with an older version, as the "Owned" custom data. Just as the NEW OBJ2 would not attempt to load older version data that it was not designed to read.)

EG, You should ignore any data that is not EXACT, or IN-RANGE for the custom code you create. (Otherwise you will be reading other data that may NOT be yours. Non expected data.)
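A minimal sketch of the "ignore anything that is not EXACT or IN-RANGE" rule, in Python. It assumes each area carries a type name and a four-part version; pick_sections and the sample list are made up for the illustration.

```python
def pick_sections(sections, type_name, min_version, max_version):
    """Keep only the areas this code owns: exact type name, version in range."""
    mine = []
    for sec in sections:
        if sec["type"] != type_name:
            continue   # someone else's area, step past it
        if not (min_version <= sec["version"] <= max_version):
            continue   # same name, but a data-version we were not written for
        mine.append(sec)
    return mine

sections = [
    {"type": "OBJ",  "version": (1, 0, 0, 0), "data": b"models"},
    {"type": "OBJ2", "version": (1, 0, 0, 0), "data": b"my custom area"},
    {"type": "OBJ2", "version": (1, 0, 0, 1), "data": b"official OBJ2"},
]

# Custom code written against Version1.0.0.0 only claims its own OBJ2 data.
owned = pick_sections(sections, "OBJ2", (1, 0, 0, 0), (1, 0, 0, 0))
print([sec["data"] for sec in owned])
```

Versions are compared as tuples, so (1, 0, 0, 1) sorts after (1, 0, 0, 0) without any string parsing.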

Other proposed areas of development may include... (Most of these are script data.)
- Animation-Scripts (For multi-segmented animations and linear flow and forced motion.)
- Camera-Tracks (Variation of an animation-script with non-linear flow and non-forced motion.)
- AI-Paths (For those who use branching and thinking AI on non-linear paths.)
- Scenes (Containing many objects, but not an object itself.)
- Rooms (Large segmented objects, acting as one object.)
- Light-Grids (Segmented lighting control, for deep games, to reduce overhead programming.)
- Sounds (Um, for sounds in the game.)
- Effects (Um, for visual effects in the game.)

Many are specific to games, again, but the uses are endless... Network-logging, Backup-tracking, Update-tracking, Project-development, Presentation-display...

ErosOlmi

20-10-2008, 12:34

Jason,

your idea is very interesting.
There were some discussions a few months ago about the possibility to store data needed by TBGL into a ZIP like file but at that time there was no real need.

Maybe an argument for a specific thinBasic module.

Ciao
Eros

ISAWHIM

20-10-2008, 15:25

Well...

I am thinking for large-scale ability. Since TBGL can handle it.

I just finished reformatting the code for a more modular and custom arrangement. (The format above was sort-of locked-in at the first header.)

Now, it acts like a file-system, at the core. Expansion only demands that the initial "Header" of any "Pack", exist and contain the critical elements. The individual "File-headers" are a free-format, as is the "File-Raw-Data". The only limitation for a header, is that if it is NOT fixed-length... It must include the "Step-Over" type-value, which gets added to the header-length, so it knows where the compressed segment begins. (You don't have to use both. Data can be in one or the other or both, as long as the correct values are set.)

Now all I have to make is a quick GUI to interface the code, and display the results. (Adding items to, and removing them, and displaying the contents of the "Packed CWAD". As well as a demonstration of how to use the pack in an actual program, as the read-only, and using some internal functions.)

I still imagine the greatest need for this will come with larger format creations. For simple use, with only a handful of items, images, and scripts, this will offer no interest. This is why I am allowing such a wide range of potential values.

However, at the moment, the largest single CWAD size can only be about 2GB. This is due to the use of the LoadFile() command. But that should be within everyone's reach.

Personally, I just want it for image-swapping of models. The pre-rendered 32x32 icons make a nice touch to the ImageThumb portion of the CWAD. This also fixes the issue with the SketchUp OBJ conversion issue, where colors and images export as layers, and not image-maps.

Ok, I need some sleep now... I am seeing cross-eyed!

Petr Schreiber

20-10-2008, 19:28

Hi Jason,

thanks for your input.
I will be able to read it in depth during weekend ( that is not bad joke related to textlength, just school crunch time now :) )

Thanks,
Petr

ISAWHIM

20-10-2008, 19:48

I will be able to read it in depth during weekend

Sure... LOL... If I had a Quarter... :P

Just do me a favor... and don't change the format of the M15 for a while... ;D

Though, an A15 or B15 file would go hand in hand with the M15. (Animation and Bones)

I saw there was a way to "Rig" the M15 format. But when you get the collision module completed, this could help link all of them into one complete system. (With the Animation and Bones not being part of the GL collection, because there might eventually be a DX module that could use those same formats, but have no use for the M15. Plus, many 2D items use bones and animation also.)

Ok, back to work, before I drift-off again.

kryton9

20-10-2008, 21:23

Jason's explanation is really something and in reading it I kept thinking of working with the Unreal 2004 editor and it seemed to go hand in hand.
I think if any of you play with any of the modern game editors this sort of paradigm fits right in. It is easier to see and use than to write about and read about.

Jason you could even add custom scripts as you do graphics and models, for behaviors and physics as they do in unreal.

ISAWHIM

21-10-2008, 14:38

Ok, I have the CWAD shell finished and the three primary sets of code which handle the creation of "New" CWAD files. It does not actually create the file yet, as I am still playing in memory.

I have sliced it up into two parts. The shell is as dumb as a rock, and it does whatever you tell it to do. It only checks to see that you are feeding it valid info, and if so, it uses it to create the indexes. (It has no idea what "You" are actually doing on your side of the code.)

For "Your" side, and by that, I mean "My side", for now...

You have to use the index commands to register your data, once you have formatted it to taste. These are the "Type codes" that go with the "Type names" which are used in the index files. Since it is impossible for the shell CWAD to guess how you handle data, it expects that you are giving it the correct data that you want indexed.

Sounds complex, but it is the job of the "Type" to use the data as it needs. It is the CWAD's job to tell the user where the data is located, and to extract it, and nothing more. The CWAD uses a generic extraction if no TYPE functions exist for the TYPE NAME in the CWAD. This allows anyone to use any TYPE code they wish, or use none, and simply ignore that useless data in the CWAD. (The purpose of the index, is to stop the file from breaking if there is one missing TYPE function. This also allows ANY FUNCTION to pull that RAW DATA, for use in code.)

Point being... If you have a CWAD which was created with SOUND and IMAGES and MODELS, but you want to use that CWAD in another program, one without SOUND... The CWAD can still see past the SOUND, and if this is being edited, the SOUND sections can be removed, and the file shortened to only relevant data.
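The "generic extraction when no TYPE function exists" behaviour could look something like this Python sketch. The registry (HANDLERS, register, extract) is made up for the illustration and is not the shell's actual interface; zlib stands in for whatever compression the CWAD uses.

```python
import zlib

HANDLERS = {}   # TYPE NAME -> function that understands that type's raw data

def register(type_name, func):
    HANDLERS[type_name] = func

def extract(type_name, compressed):
    """Always decompress; use the TYPE function if one exists, else hand back RAW."""
    raw = zlib.decompress(compressed)
    handler = HANDLERS.get(type_name)
    if handler is None:
        return "GENERIC", raw          # missing TYPE code never breaks the read
    return type_name, handler(raw)

# This program knows IMAGE but has no SOUND code at all.
register("IMAGE", lambda raw: {"pixels": raw})

image_chunk = zlib.compress(b"\x00\x01\x02")
sound_chunk = zlib.compress(b"beep")
print(extract("IMAGE", image_chunk))
print(extract("SOUND", sound_chunk))
```

The first tuple element tells the caller whether a real TYPE function ran or whether it only received raw bytes.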

I have setup a "Manual" index... now I have to program the part which creates the matching data, that I told the index I had. These "Chunks" do get passed to the CWAD, as they match the index, and can be confirmed. But the CWAD has no idea if the chunk actually has SOUND or valid SOUND or an IMAGE... That is the "You" side. But it will push it and move it and delete it as desired, while you update the index with new info/settings.

Again, I am playing with the ImageThumb version, since that is simple to manage.

I will post the code tonight after work. (Though, it only contains the test creation, and little protection to invalid data input. It will just crash silently if incorrect data is used.)

Is there an image-control, or would I just draw on a window? I suppose I could load a TBGL screen and draw on squares.

ISAWHIM

22-10-2008, 13:39

Ok, here is the "Reduced comment" version... LOL, I had more comments than code.

We need a "Strip Comments" option.

Ok, I included the mini-manager, which doesn't do anything other than test the file. The file is an include, and will not run alone. All "Packs" will be includes also. The CWAD will check if all required "Pack" functions exist... otherwise it will just spit-out the data as RAW data. (It will also send a code to let you know if a "Pack" registration is missing.)

The "Test" is a "Fred" pack test. There is no "Fred" pack... but you can create one if you want!

This checks for "Fred"... Can't find it... So it gives you an alternative. "Use (GENERIC)?"

If you say yes... You now have a GENERIC pack setup in the CWAD. (Code stops there.)

If you say no... It indicates that no action will be taken, because no valid action was supplied. (You have to program a "Fred" pack control, or find the missing "Fred" pack controller, or live with a GENERIC pack setup.)

Forgive the sloppy code... I still have dead commands that I was testing, and variables that will not be used. But I will not have a replacement until I get one of the NON-GENERIC packs developed. (The GENERIC pack is almost complete.)

ISAWHIM

22-10-2008, 15:36

Just finished the ability to add raw data, with the file-name or other name for identification.

Technically, I just completed making a ZIP file format. (Almost a CWAD! I can smell it, it is so close!... Wait, that was my last brain-cell frying.)

I will post that code tonight, after I have had time to humanize it, and reduce it a little more. It is SOOO not standard in any way. Except for the function-names. But now I know which variables I no longer need.

All comments and stupid warnings will be removed also. They will be replaced with some visual "Tree" structure to indicate what is being created. Once the file can be written, it will be time to write the READ code. (That should be a lot easier to handle, since it is all literal, and only reads data as it needs it.)

Seems that "MSGBOX" does not like to display raw compressed text. LOL... Thought my code was broken... Dumped it to a text-file and all is there. Even looks like a ZIP inside.

ISAWHIM

22-10-2008, 17:58

Ok, here is a slightly better example of it in action...

This has a folder with 8 (512x512) full-color BMP images. (About 6.5MB)
Reduces down to about 4.5MB... Using ZIP. (Standard ZIP output.)

Nothing special, but the 8 (2048x2048) images went from 96MB down to about 45MB.

Again, same with a normal ZIP.
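As a sanity check on those sizes, the arithmetic can be worked through in Python. This assumes 24-bit pixels (3 bytes each) and ignores the small BMP file headers. The helper names bmp_pixel_bytes and saved_percent are made up.

```python
MIB = 1024 * 1024

def bmp_pixel_bytes(width, height, count, bytes_per_pixel=3):
    """Pixel data only, for `count` uncompressed images of the same size."""
    return width * height * bytes_per_pixel * count

def saved_percent(before_mb, after_mb):
    """How much smaller the packed result is, as a whole percentage."""
    return round((1 - after_mb / before_mb) * 100)

small = bmp_pixel_bytes(512, 512, 8)
large = bmp_pixel_bytes(2048, 2048, 8)
print(small / MIB, large / MIB)    # prints 6.0 96.0

print(saved_percent(6.5, 4.5))     # the 512x512 set, using the figures quoted above
print(saved_percent(96, 45))       # the 2048x2048 set
```

Under those assumptions the 2048x2048 set comes to exactly 96 MiB, and the quoted results work out to roughly 31% and 53% saved.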

However... When I converted them to 90% quality JPG, then saved them... Turning them back into BMP after unzipping... The reduction with nearly no quality loss was closer to 60% compression.

Anywho... The program has two tests... one with the images, and one with simple text-strings. Feel free to add your own file-paths and names in the manager-setup code. You can compound the calls over and over. (Though the pop-up message-boxes might bug you a bit. The last button will output the final result of all the buttons you press.)

For a speed test... throw in a loop, and comment out all the message warnings. Works quite quick, but it is no speed demon. (Would be faster if I removed all the checking code, and used more GLOBALS.)

ErosOlmi

23-10-2008, 08:25

Jason,

this is a very interesting project and seems perfect for a new thinBasic module where code would be compiled and speed would be 1000 times faster than pure thinBasic code.

Eros

ISAWHIM

23-10-2008, 17:47

Eventually, I hope... but not until the "Generic" part is finished. (The rest is all conditional after that.)

I am having trouble attempting to create PARENT/CHILD windows in Windows... Like a paint-program would have. If I can't figure it out, I will reduce it to panel-views. (This is for the CWAD-Manager part, not the actual CWAD part.)

I am starting to think that a single window, with zones, may be the better alternative. There is sample code for that.