All Basic word count code challenge



John Spikowski
09-05-2009, 22:15
*** deleted ***

dcromley
11-05-2009, 05:19
Well, you certainly got my interest. I'm still learning thinBasic, so I'm sure I'm not using all its power, but I did write a program that does the deed and works surprisingly (to me) fast. So I registered there and posted the fact that I had a program. "Trim" was the adjective I used.

I took care to minimize the moving of strings and used a binary tree to keep the words sorted. B-Trees are better because they are more balanced, but that takes more code. It'll be interesting to follow.
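
The binary-tree idea can be sketched roughly as follows. This is a Python illustration rather than thinBasic, and not dcromley's actual program: the names `Node`, `insert` and `in_order` are made up for the example. Each distinct word gets one node with a counter, and an in-order walk returns the words already sorted. The tree is unbalanced, so input that arrives in sorted order would degrade it to a linked list.

```python
class Node:
    """One distinct word plus how often it has been seen."""
    __slots__ = ("word", "count", "left", "right")

    def __init__(self, word):
        self.word = word
        self.count = 1
        self.left = None
        self.right = None


def insert(root, word):
    """Insert a word into the tree (or bump its counter); return the root."""
    if root is None:
        return Node(word)
    node = root
    while True:
        if word == node.word:
            node.count += 1
            return root
        side = "left" if word < node.word else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, Node(word))
            return root
        node = child


def in_order(root):
    """Yield (word, count) pairs in alphabetical order, without recursion."""
    stack, node = [], root
    while stack or node:
        while node:
            stack.append(node)
            node = node.left
        node = stack.pop()
        yield node.word, node.count
        node = node.right


root = None
for w in "the cat and the hat and the".split():
    root = insert(root, w)
print(list(in_order(root)))
```

Only references are moved around while the tree is built; no word is copied once its node exists.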

Charles Pegge
11-05-2009, 09:46
This code challenge goes to the heart of many programming applications. I'm using it to test Oxygen to the limit. So far, O2 can gather the Bible wordlist in around 4 seconds and sorting it alphabetically will take about 0.1 seconds. But my word gathering technique is fairly crude at present. My ambition is to `do` the Bible in about 1 second eventually, and reuse the best functions in the runtime library.

Charles

kryton9
11-05-2009, 11:12
Good luck entrants - as this is a cool challenge.

zlatkoAB
11-05-2009, 14:54
Everything that you say sounds fine-
but where is your source code? :unsure:

ErosOlmi
11-05-2009, 15:06
http://www.allbasic.info/forum/index.php?topic=688.0

martin
11-05-2009, 19:16
Here is my quickly written simple code. I'm sure it's not perfect and not the fastest, but I guess it's a good start for a beginner like me :)


uses "file", "console", "dt"

function tbmain()
  local woord, tekst as string value = lcase$(FILE_Load("c:\bible.txt"))
  local output as string value=""
  local totaal, T1 as long value = timer

  tekst=trimfull$(replace$(tekst, any "().,&:,+=-*?!%$~' " & $dq & $tab & $crlf, with " ")) & " "
  totaal=TALLY(tekst, " ")-1

  printl "Counting words, please wait a few minutes..." & $crlf

  do while TALLY(tekst, " ")>0
    woord=left$(tekst,instr(tekst, " "))
    output+=woord & ": " & TALLY(tekst,woord) & $crlf
    tekst=replace$(tekst,woord,"")
  loop

  FILE_Save("c:\words.txt",output)

  printl "total words: " & totaal
  printl "output file (list of unique words) saved to c:\words.txt"
  printl "Running time: " & DT_SecToTime(Timer - T1)
  printl "-------------------"
  printl "PRESS A KEY TO QUIT"
  waitkey
end function

dcromley
11-05-2009, 21:25
Very interesting. And didactic.

A couple of anomalies when running against KingLear.txt:


doecontacharacterthintended : 1
dattranscriptierrorinfringement : 1

I think I'll single-cycle through the entire run and learn something. :) Thanks.
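
A likely cause of merged entries like these is that the loop counts and removes each word by plain substring matching, so "he " also matches the tail of "the ". The Python sketch below is an illustration, not martin's thinBasic code; the function names are invented. It mimics that loop on a tiny input and compares it with a count on whole words.

```python
from collections import Counter


def naive_count(text):
    """Mimic the loop: take the first word (with its trailing space),
    count it as a substring, then delete every occurrence of it."""
    results = []
    while " " in text:
        word = text[: text.index(" ") + 1]
        results.append((word.strip(), text.count(word)))
        text = text.replace(word, "")
    return results


def whole_word_count(text):
    """Count whole words only, by splitting on whitespace first."""
    return dict(Counter(text.split()))


sample = "he said the the "
print(naive_count(sample))       # "he" is overcounted and "the" never appears
print(whole_word_count(sample))  # every word counted exactly once per occurrence
```

On this sample the substring version reports "he" three times and leaves the fragment "tt" behind, which is the same kind of glued-together residue as in the KingLear output.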

ErosOlmi
12-05-2009, 00:10
Posted my code: http://www.allbasic.info/forum/index.php?topic=688.msg2220#msg2220

martin
12-05-2009, 06:43
Posted my code: http://www.allbasic.info/forum/index.php?topic=688.msg2220#msg2220


Wow your code does the job in 9 seconds. Amazing! :shock:

ErosOlmi
12-05-2009, 09:18
In this example, timing can vary a great deal depending on CPU memory access time.
A lot of string handling is going on behind the curtains, and this is affected by how the CPU and memory work together.

I think I will be able to get better results because I found that the JOIN$ (http://www.thinbasic.com/public/products/thinBasic/help/html/join%24.htm) function is not very well optimized for speed. But for this I need to release a new thinBasic update.

Ciao
Eros

Charles Pegge
12-05-2009, 23:32
Using OOP and a mix of Assembler and Basic, Oxygen files a wordlist for the Bible in 2.65 seconds (including 0.05 secs compile time). But there's more to do - I'm using an 8 byte hash to check for unique words but no hash table yet - just brute force scanning.

Charles

ErosOlmi
12-05-2009, 23:44
I was able to go down to 6.765 seconds with pure thinBasic code.



Reading input file: C:\thinBasic\SampleScripts\_Test\Challenge_WordCount\bible.txt
Infput file size: 4397206 bytes
Adjusting buffer ...
Removing double nuls from buffer ...
Parsing every word into an array ...
Sorting array asc ...
Counting words ...
Saving output results to: C:\thinBasic\SampleScripts\_Test\Challenge_WordCount\bible.txt.Out.Txt

------ All done ------
Number of words found ....... : 807580
Number of unique words counted: 13262
Total time: 6.765 seconds


Updated file: there was a little error.

Petr Schreiber
13-05-2009, 00:03
Hi Charles, Eros,

those are great results!

I made a minor mod of Eros' version and get a small improvement of around 5% by switching off boundcheck and using $SPC instead of $NUL, which allowed the use of TrimFull$:


USES "Console"
USES "File"

#DEFAULT BOUNDCHECK OFF
dim T1, T2 as double

T1 = timer

'---Load file
dim sInFile as string = app_sourcepath & "bible.txt"
dim sOutFile as string = sInFile & ".Out.Txt"

printl "Reading input file: ", sInFile

dim sInBuffer as string = file_load(sInFile)

printl "Infput file size:", len(sInBuffer), "bytes"

printl "Adjusting buffer ..."
'---Change all delimiters to $SPC chars
dim sSearch as string = chr$(1 to 32)+"0123456789:-|(){}[]';,.?!/\^!@#$%^&*_<>=+"+$DQ+CHR$(123 to 255)
dim sReplace as string = string$(len(sSearch), $spc)
sInBuffer = ucase$(Replace$( sInBuffer, any sSearch, sReplace))

'---Remove all double $SPC to single $SPC in order to have all words separated by a single $SPC
printl "Removing double spaces from buffer ..."
sInBuffer = trimFull$(sInBuffer)

'---Parse the buffer into an array
Dim Words() As string '---All words uncounted
Dim cWords() As string '---All words counted
Dim nWords As Long '---Original number of words
Dim nWordsOk As Long
dim cWord as long
dim Counter as long

printl "Parsing every word into an array ..."
nWords = PARSE(sInBuffer, Words, $spc)

'---Sort ASC
printl "Sorting array asc ..."
array sort Words

'---Scan all words and count
printl "Counting words ..."
redim cWords(nWords)
for Counter = 1 to nWords - 1
  cWord += 1
  if Words(Counter) <> Words(Counter + 1) then
    nWordsOk += 1
    cWords(nWordsOk) = Words(Counter) & " (" & cWord & ")"
    cWord = 0
  end if
next

'---Last compare: check whether the last item equals the previous one or stands alone
if cWord = 0 then
  cWord += 1
  nWordsOk += 1
  cWords(nWordsOk) = Words(Counter) & " (" & cWord & ")"
else
  cWord += 1
  '---The final run has not been stored yet, so it needs its own slot
  '---(without this the previous unique word would be overwritten)
  nWordsOk += 1
  cWords(nWordsOk) = Words(Counter) & " (" & cWord & ")"
end if

'---Redim to the last found word
printl "Saving output results to: ", sOutFile
redim preserve cWords(nWordsOk)
file_save(sOutFile, join$(cWords, $crlf))

T2 = timer

printl
printl "------ All done ------"
printl "Number of words found ....... : ", nWords
printl "Number of unique words counted: ", nWordsOK
printl "Total time:", format$(T2 - T1, "#0.000"), "seconds"

waitkey
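
The approach used by the script above (blank out delimiters, split, sort, then count runs of equal neighbours) can be sketched in a few lines. This is a Python stand-in, not thinBasic, and `count_sorted_runs` is an invented name; the delimiter rule here (anything that is not an ASCII letter) is an assumption that only approximates the mask in the script.

```python
import re


def count_sorted_runs(text):
    """Sort all words, then count each run of identical neighbours."""
    # Replace every non-letter run by one space, upper-case, split into words.
    words = sorted(re.sub(r"[^A-Za-z]+", " ", text).upper().split())
    counts = []
    for word in words:
        if counts and counts[-1][0] == word:
            counts[-1][1] += 1          # same word as the previous one: extend the run
        else:
            counts.append([word, 1])    # a new run starts here
    return [(word, n) for word, n in counts]


print(count_sorted_runs("b a B. a, c a"))
```

Because every run is closed when the next different word arrives or the list ends, no separate "last item" step is needed in this form.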

Charles Pegge
13-05-2009, 02:17
Best O2 result so far: 0.308 seconds with numbers & punctuation filtered or 0.289 seconds unfiltered.

I'm using a very large number of hash buckets: 512 Meg, each 1 bit wide. :diablo:

Charles

PS: Losing a few words - further work needed.
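
A table of 1-bit buckets can only answer "has this hash been seen before", so two different words that land on the same bit look like a repeat and the second one is dropped. That may be what loses words here. The sketch below is in Python, not Oxygen; `zlib.crc32` merely stands in for the unknown hash, and `unique_words_bitset` is an invented name.

```python
import zlib


def unique_words_bitset(words, n_bits):
    """Keep the first word seen for each bucket; one bit per bucket."""
    bits = bytearray((n_bits + 7) // 8)
    unique = []
    for word in words:
        h = zlib.crc32(word.encode("utf-8")) % n_bits   # stand-in hash, an assumption
        byte, mask = h >> 3, 1 << (h & 7)
        if not bits[byte] & mask:
            bits[byte] |= mask
            unique.append(word)
        # else: either a true repeat or a collision; one bit cannot tell which
    return unique


words = ["in", "the", "beginning", "god", "the", "in"]
print(unique_words_bitset(words, 1 << 20))  # large table: collisions are unlikely
print(unique_words_bitset(words, 1))        # one bucket: everything after "in" is lost
```

The larger the table relative to the vocabulary, the rarer the losses, which fits the choice of a very large number of buckets.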

martin
13-05-2009, 11:17
deleted

martin
13-05-2009, 11:43
edit: decided to delete my above reply because the code I posted here was terribly wrong and I don't want to go offtopic

ErosOlmi
13-05-2009, 12:37
Martin,

Oxygen is ... wow, it is a beast: power, full of possibilities, fast and with great support from Charles.

But it is important to remember that it is another programming language and thinBasic keywords cannot be just inserted into an Oxygen script and compiled.
Charles has done a great job to improve and expand Oxygen to an astonishing level, but to be able to use it, one needs to know which keywords are supported by Oxygen, which keywords are 1 to 1 with thinBasic, ... and so on.

So keywords like TALLY, INSTR, REPLACE$ ... I'm not sure if they are supported by Oxygen, and if yes, I'm not sure if the syntax is the same as in thinBasic.

We have to consider that Charles' creature has to compile a source text buffer into machine code on the fly, and a single keyword is then expanded into many, even hundreds of, machine code instructions.
But once you get used to Oxygen way of working you will have a great weapon in your hand that can be used in many situations where speed is important.

Ciao
Eros

martin
13-05-2009, 13:29
But it is important to remember that it is another programming language and thinBasic keywords cannot be just inserted into an Oxygen script and compiled.
Hi Eros, yes that's what I thought for a moment. But now I see it isn't that easy. I need to study more documentation and examples :read:

Lionheart008
13-05-2009, 16:26
hello... :)

my 10 cent for today...

some weeks ago I wrote a textanalyse speed script (http://community.thinbasic.com/index.php?topic=2553.0), but I cannot use it for this exercise, a great pity... I tried to modify it for nearly one hour... without success... so this morning I went back to eros/petr's basic script and modified it in some ways... perhaps it's some milliseconds faster than petr's or eros' script, don't know... ;) my laptop took different times for it... and I still have too little knowledge to adapt it for oxygen... sorry...

charles' oxygen script will beat all others, I am sure... ;)

I have tried to use "+trim$(join$(outString, $CRLF)", but it wasn't very useful...
and... what's the difference between the PARSE$ (didn't run) and PARSE (runs) commands ???

it would be better if somebody else could check my script for correct speed time :) my machine is very, very old and tired...

ciao, Lionheart

Charles Pegge
13-05-2009, 18:16
Hi Everybody,

Here is my current effort, using a proper hash table to check for unique words - which takes up most of the time. I'm using an 8 byte hash of each word. When a word is identified against a hash it is further checked against the original word. On the Bible with an HP Pavilion desktop it takes 0.370 seconds including compile time (0.06 Secs). With an 8 byte hash, the second part of the word check proves to be unnecessary, and omitting it will save another 0.05 seconds. Surprisingly, sorting only takes up 0.015 seconds.
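
The two-step check described here (match on the 8 byte hash first, then compare the word itself) could look like the sketch below. It is a Python illustration, not the Oxygen code: `hashlib.blake2b` with an 8-byte digest stands in for Charles' hash, and the names `hash64` and `unique_words` are invented.

```python
import hashlib


def hash64(word):
    """8-byte hash of a word as an integer (blake2b is a stand-in, an assumption)."""
    digest = hashlib.blake2b(word.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def unique_words(words, verify=True):
    """Collect unique words via a table keyed by their 64-bit hash.

    verify=True  : a hash hit is confirmed by comparing the actual word.
    verify=False : a hash hit alone counts as "already seen".
    """
    table = {}  # hash -> list of distinct words with that hash
    for word in words:
        bucket = table.setdefault(hash64(word), [])
        if verify:
            if word not in bucket:
                bucket.append(word)
        elif not bucket:
            bucket.append(word)
    return sorted(w for bucket in table.values() for w in bucket)


print(unique_words(["god", "the", "in", "the", "god"]))
```

With 64 bits and a vocabulary of roughly 13,000 words, a collision between two different words is extremely improbable, which is consistent with the verification step turning out to be unnecessary in practice.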

Charles

PS I have not altered much recently but to make sure it all works:

Oxygen Update: http://community.thinbasic.com/index.php?topic=2517

Petr Schreiber
13-05-2009, 19:19
Very nice Charles,

on my good old Sempron ticking at 1.8GHz it takes not more than 0.410 seconds.
Do you plan adding word count for each word?


Petr

GSAC3
13-05-2009, 19:40
Charles:

Your version is a real screamer!

I tried it on two different BIBLE.TXT files on my "beast" and got the following results---

                  File 1               File 2
COMPILE:       47.622 microsec      46.630 microsec
RUN:          217.440 microsec     203.313 microsec
TOTAL:        265.062 microsec     249.943 microsec

WORD COUNT:    12,860               13,172

DELL XPS-1710
XP Pro SP-2
Intel Duo T7600 @ 2.33 GHZ
2.00 GB ram

kryton9
13-05-2009, 20:56
Amazing optimizations you guys are coming up with in standard thinBasic and then Charles comes in with an incredible time with Oxygen.
I started to scan through the list generated by the program of sorted words and it is just amazing that the computer can do all of that so quickly.
I guess that is why we love our computers, they sure can do amazing stuff with the right amazing code!!

Charles Pegge
14-05-2009, 00:30
Here is a more streamlined O2 version (mk 8).

I've removed some redundant code and combined one or two of the methods. The verify procedure has been switched out as I am confident it is not needed with 8 byte hashing. This brings the time down from 0.370 to 0.280 seconds.

However there seem to be some positional effects - depending on the size of the source code string - even comments appear to affect the performance :o . I can't pin it down yet.

Charles

PS I'll put the word counts in later, Petr. :)

Lionheart008
14-05-2009, 10:59
hi all bible tester :)

have tuned my version and lost (won!) nearly four seconds in the conventional thinbasic way (meaning: without oxygen). I have made some simple changes and added new stuff :)

1) perhaps anybody can check the speed on a fast machine and test the script, would be nice... my guess is around 4 to 5 seconds on a powerful machine, but I am not sure :)

2) the second version includes oxygen, but only uses it in a tricky way... ;) check it too :)

Ciao, have all a nice and sunny day, Lionheart
ps: it's nearly frustrating to see charles' fantastic result of about 0.37 sec... oops... oxygen is alien and groovy ! :D

Charles Pegge
15-05-2009, 16:29
By changing most variables from local to static, then reworking the hash coder and word reader, further reductions have been achieved. The overall time has come down from 0.270 seconds to 0.222, which means the run time is now 0.165 seconds.

Wherever possible, data is loaded into the CPU in 4-character morsels at a time instead of single bytes, and many words will be processed without code loops.


Charles

This version is included with the latest Oxygen as reading9.tbasic

http://community.thinbasic.com/index.php?topic=2517

Petr Schreiber
15-05-2009, 16:40
Hi Charles,

0.230 total time on my Sempron, amazing result!
Is getFile new Oxygen native function?


Petr

ErosOlmi
15-05-2009, 17:55
Ok, I will never get even close to Charles' code, but here is my last try at the Bible word count.

The attached script takes advantage of the latest thinBasic beta 1.7.8.0, which you can get here: http://community.thinbasic.com/index.php?topic=2588.0
so it is mandatory to download it in order to test this script.

In particular, it uses the new statement ARRAY UNIQUE ...
which in one single line of code does most of the job of finding unique words and counting them:

array unique Words(), cWords(), ascend, lWords()
I think this new feature is general enough to be useful wherever a programmer has to classify or count elements.
For the moment it is limited to dynamic string arrays, but once tested enough it will be easily expanded to work on any kind of array.
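
As a rough picture of what such a statement delivers, judging only from how the scripts in this thread use it (an array of unique words in ascending order plus a parallel array of repetition counts), here is a Python stand-in. It says nothing about how thinBasic implements ARRAY UNIQUE internally, and `array_unique` is an invented name.

```python
from collections import Counter


def array_unique(words):
    """Return (unique words ascending, counts in the same order)."""
    counts = Counter(words)
    unique = sorted(counts)
    return unique, [counts[w] for w in unique]


c_words, l_words = array_unique(["LIGHT", "AND", "LIGHT", "GOD", "AND", "LIGHT"])
for word, n in zip(c_words, l_words):
    print(f"{word} ({n})")
```

Keeping the counts in a second array, index-aligned with the words, is what lets the later loop glue "word (count)" lines together before saving.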

Also some visible improvements in REPLACE$ thanks to the help of Petr who sent me an optimized version.
I think also JOIN$ will have visible improvements.

Anyhow at the end I was able to go down to around 3.1 seconds from the previous 6.5 or so. Hope you like it.

Ciao
Eros

Petr Schreiber
15-05-2009, 19:37
Just installed latest ThinBasic.
Your original code ran 5.8 seconds, new version 2.7 seconds!

ErosOlmi
15-05-2009, 19:49
Hey Petr, your CPU runs the script faster than mine !
This means your CPU is more "string optimized"

Petr Schreiber
15-05-2009, 19:57
I was a bit surprised :)

I tested with all programs closed and waited till Firefox shutdown stops making CPU work at 100%.
But I guess that is typical benchmarking approach to avoid interaction with other processes.

Lionheart008
15-05-2009, 22:59
hi eros, petr, all :)

first of all: the new thinbasic release is fast! faster than ever, I have tested it, I think it is more than 30-40 per cent faster ! :D, very, very good, eros!


Also some visible improvements in REPLACE$ thanks to the help of Petr who sent me an optimized version.
I think also JOIN$ will have visible improvements.

Anyhow at the end I was able to go down to around 3.1 seconds from the previous 6.5 or so. Hope you like it.

@ dear eros, would you like to bring your tuned speed script (one page before) a big step down to under 2.5 seconds or even less ??? It's a serious offer and not a joke, I tested it some minutes ago... ;)

I can fix it for you, if you like, I have made some experiments with your script... (just some values changed/added/deleted...) and after that of course with my last oxygen/thinbasic/lionheart_megamodule script... I love this crazy stuff from charles, and without changing anything after installing the new thinbasic beta release all scripts run about 30-40 per cent faster :) - thank you for your programming work !...

bye and servus, Lionheart

ps: by the way: if you test your script one, two, three times again... the values always change a little, by about 0.2 - 0.4 milliseconds... that's what I have noticed...

Lionheart008
16-05-2009, 01:11
night crawler input :)

tuned version: perhaps anybody can test it for me again... this is my latest test script with my lionmodule included... it takes about half the time of the old one... thank you very much for this new beta release of thinbasic, eros ! great work! 8)

my script should take about 2.3-2.6 seconds for the bible text... or is it faster / slower ... ???

to get a good average of the time results... make three tests ;)
thanks in advance ! :D

good night, best wishes, have a nice week-end, Lionheart
ps: everything is in zip folders: one includes the bible text, the second has the "pure" scripts with the oxygen dll and the lionheart_megamodule dll ;)

kryton9
16-05-2009, 04:42
Here are my times:
Eros WordCount_Simple_2.tbasic 2.062
Frankos_TextSpeedAnalyse_Bible_Lionmodule_Tuned_new.tbasic 1971613
Frankos_TextSpeedAnalyse_Bible_Lionmodule_Tuned_new.tbasic 1968929

Petr Schreiber
16-05-2009, 10:26
Hi Frank,

your script runs a bit faster than Eros' version, but I do not understand what it does, as the output is a little bit confusing - some words are UCASE, some are not, and there are some empty lines.

I guess that could be because of different mask you use:


chr$(1 to 2)+"0123456789:-|(){}[]';,.?!/\^!@#$%^&*_<>=+"+$DQ '+chr$(i-1)

- you do not filter out $CR, $LF, $TAB and other characters.
That makes the replacing faster, but also the output confused ... I think.


Petr

Petr Schreiber
16-05-2009, 10:43
Modifying mask can bring some good results,

this variation to Eros code still counts the same number of words, while using shorter mask.
Time changed from 2.7 to 2.5 on my PC.

But of course, this mask was created based only on characters which really exist in Bible.txt, so on other texts it could fail. That's why Eros' version is still better.


USES "Console"
USES "File"

dim T1, T2 as quad
HiResTimer_Init
T1 = HiResTimer_Get

'---Load file
dim sInFile as string = app_sourcepath & "bible.txt"
dim sOutFile as string = sInFile & ".Out.Txt"

printl "Reading input file: ", sInFile

dim sInBuffer as string = ucase$(file_load(sInFile))

printl "Infput file size:", len(sInBuffer), "bytes"

'---Change all delimiters to SPC chars
printl "Adjusting buffer ..."
dim sSearch as string = CHR$($CRLF+$DQ+"!#&'(),-.0123456789:;<>?[]")
sInBuffer = Replace$(sInBuffer, any sSearch, with $SPC)

'---Remove all double SPC to single $SPC in order to have all words separated by a single SPC
printl "Removing double $SPC from buffer ..."
sInBuffer = TrimFull$(sInBuffer)

'---Parse the buffer into an array
Dim Words() As String '---All words uncounted
Dim cWords() As String '---All words counted
Dim lWords() As long '---All words counted repetition
Dim nWords As Long '---Original number of words
dim Counter as long

printl "Parsing every word into an array ..."
nWords = PARSE(sInBuffer, Words, $SPC)

'---
printl "Calculating unique words ..."
array unique Words(), cWords(), ascend, lWords()

'---
printl "Mixing words and counters ..."
for Counter = 1 to ubound(cWords)
cWords(Counter) = cWords(Counter) & " (" & format$(lWords(Counter)) & ")"
next

'---Redim to the last found word
printl "Saving output results to: ", sOutFile
file_save(sOutFile, join$(cWords, $crlf))

T2 = HiResTimer_Get

printl
printl "------ All done ------"
printl "Number of words found ....... : ", nWords
printl "Number of unique words counted: ", ubound(cWords)
printl "Total time:", format$((T2 - T1)/1000000, "#.000"), "seconds"

waitkey
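
The trade-off of a shortened mask can be shown with a small Python sketch (not thinBasic; `split_with_mask` is an invented name and the sample strings are made up). Every character in the mask becomes a space before splitting, so a mask built only from the delimiters that occur in one text leaves foreign delimiters inside words of another text.

```python
def split_with_mask(text, delimiters):
    """Turn every delimiter character into a space, then split into upper-case words."""
    table = str.maketrans(delimiters, " " * len(delimiters))
    return text.translate(table).upper().split()


short_mask = ",."          # tailored to a text that only contains commas and full stops
print(split_with_mask("In the beginning, God.", short_mask))  # splits cleanly
print(split_with_mask("alpha|beta", short_mask))              # '|' is not masked: one "word"
```

A shorter mask means less work per character, but the result is only correct as long as the input never contains a delimiter the mask does not know about.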

ErosOlmi
16-05-2009, 11:02
Petr,

you touched a good point: correctness of results.
Speed is important but not the most important aspect. You can get more or less speed just by changing a few pieces of hardware.
Correct data is much more important. You cannot turn bad results into correct ones by changing any piece of hardware (well, sometimes yes ;) ).

ErosOlmi
16-05-2009, 11:29
Ok,

I posted my last try in AllBasic.info forum at http://www.allbasic.info/forum/index.php?topic=688.msg2247#msg2247
I made small modifications to the script (attached to this post) in order to follow as fully as possible the challenge rules posted at http://www.allbasic.info/forum/index.php?topic=683.msg2188#msg2188 In my last example I forgot the rule that the output must be all in lower case.

Thanks to this challenge and to the possibility to concentrate on single specific language aspects, I was able to release a faster thinBasic engine.
I'm now going to fix some bugs that Petr sent me and that popped up in the last thinBasic beta, and to expand the ARRAY UNIQUE ... feature so that it works on all kinds of arrays.

Ciao
Eros

Lionheart008
16-05-2009, 11:34
good morning :) !
dear petr, eros...

I must grin a little... ;) as I am not at my desktop pc... (with all my bible testing examples) I will try to explain here more closely what I wanted to do... with my tunings..

and kent has kindly tested a version, without the big tuning and without the new thinbasic release ! ;)

1) ok, eros, you may be right in some respects, the script does not only aim to be fast... I have had three different versions of my script, and every time I got new results and new output strings... that's quite ok for me... I would only like to help or get better results.. the script from petr shows it too... :)


you touched a good point: correctness of results.
Speed is important but not the most important aspect.

2) I haven't changed a lot, only a few things..., because I wanted to speed up eros' script...
I ran this script from the beginning with
a)
dim T1, T2 as quad
HiResTimer_Init
T1 = HiResTimer_Get


b) to change the mask:

dim stringSearch as string = chr$(1 to 32)+"0123456789:-|(){}[]';,.?!/\^!@#$%^&*_<>=+"+$DQ '+chr$(i-1)

changed it into chr$(1 to 4) or chr$(1 to 8) - that increases the speed a lot !

c)
dim inStringBuffer as string = ucase$(file_load(sInFile))
changed it into: file_load(inString) - it's faster for me...

3) ...these are the main changes I have made, you can try them, dear petr and eros ;)
I have spent one or two hours testing various modes... you can believe that ...

part one of the post... more to come... my girlfriend's removal is going on very fast :)

4) rules: the best way would be to have ONE SPEED MACHINE to test all the bible-test-scripts under the same conditions ;)

best regards, thanks, Lionheart

ps: a) last night I also created an oxygen/thinbasic script for the bible test that takes less than 0.004-0.005 seconds :) but that's another planet...

b) but I will check again a new version with pure thinbasic conventional code ;)

I am off now :)

Lionheart008
16-05-2009, 14:32
short input :)

hi all bible fans...I am at home at the moment... and have caught my old notebook, not the desktop pc ! ;)

here I send my OXYGEN-TUNED version of the bible-test-script... I cannot beat charles' script, but you can see it's formula one-like 8)

@eros and petr ...

I have used the same old conditions (as you do in your scripts) as in my former first testscript (the conventional one with only thinbasic code)... I didn't change any mask values ;)

... this new oxygen script is very, very fast and does all the work it should do with sorting and output and collecting all words .... :shock:


'-- speed textanalyse script for bible example by lionheart, 13.-16. may 2009

uses "Console", "File", "Oxygen" '--- , "Lionhearts_Megamodul"
#DEFAULT BOUNDCHECK OFF
dim src as string

' -- Prepare Oxygen script
src = "
o2h
function Calculate_asm(c as long) as long
local ic, jc as double, i as long
'
dim a=&c
mov eax,a
mov edx,[eax]
fld1
fld1
(
fmul st(0),st(0)
fstp qword ic
fld1
faddp st(1),st(0)
fld st(0),st(0)
dec edx
jg repeat
)
fcomp st(0),st(0)
'
end function

doevents(off)
hirestimer_init
T1 = hirestimer_get
o2_asm src
T2 = hirestimer_get


sub finish() at #p0
'print `running.. testing the bible text`
'proc mbox `lionhearts bible script is oxygen tuned !`
terminate
end sub
"
o2_basic " print `Hello and Ciao dear Bible Tester` : terminate "

Console_WriteLine
Console_SetTitle("-- TextSpeedAnalyse Script with bible.txt by lionheart --")

dim inString as string = app_sourcepath & "bible.txt"
dim outString(256) as string = inString & "lion_mysorted_bible.txt"
dim t1, t2 as quad

dim i as long

doevents(off)
hiResTimer_Init
t1 = hiResTimer_Get
o2_asmo src
T2 = hirestimer_get

printl "Reading bible text file: ", inString
dim inStringBuffer as string = lcase$(file_load(inString) )
printl "Input file size:", len(inStringBuffer), "all my hot bytes"


'---Change all delimiters to $SPC chars
dim stringSearch as string = chr$(1 to 32)+"0123456789:-|(){}[]';,.?!/\^!@#$%^&*_<>=+"+$DQ+chr$(i-1)
dim stringReplace as string = string$(len(stringSearch), $spc)

inStringBuffer = Replace$( inStringBuffer, any stringSearch, with $SPC)

'---Remove all double $SPC to single $SPC in order to have all words separated by a single $SPC
printl "Removing double spaces from buffer ..."
inStringBuffer = trimFull$(inStringBuffer)

'---Parse the buffer into an array
Dim Words() As string '---All words uncounted
Dim cWords() As string '---All words counted
Dim nWords As Long '---Original number of words
Dim lWords() As long '---All words counted repetition
dim Counter as long

printl "Parsing every word into an array ..."
nWords = PARSE(inStringBuffer, Words, $spc)

printl "Calculating all unique words ..."
array unique Words(), cWords(), ascend, lWords()

'---
printl "Mixing words and counters ..."
for Counter = 1 to ubound(cWords)
cWords(Counter) = cWords(Counter) & " (" & format$(lWords(Counter)) & ")"
next

'---Redim to the last found word
printl "Saving output results to: ", outString(256)
file_save(outString(256), join$(cWords, $crlf))

dim Characters(256) as long
dim stringLen as long = len(inString)

for i = 1 to stringLen
Characters( asc(inString, i)+1 ) += 1
next

for i = 1 to 256
if Characters(i) > 0 then outString(i) = Characters(i)+$TAB+iif$( i<2, PARSE$( inString,",F",i), chr$(i-1))
next

'--- bonus :)
array sort outString(i), descend
printl
printl "- plus bonus: array descend sorted ! - "
printl

printl "------ oxygen tuned: all work is done ------"
printl
printl "Number of all words found ....... : ", nWords
printl "Number of unique words counted: ", ubound(cWords)
printl "lionhearts total time: " & format$((t2 - t1)/1000000, "#.000,"), " Seconds complete "

o2_exec
waitkey


if you like, test it... my time is around 0.007 - 0.008 seconds, with a faster machine it should take half the time, I guess... see you :)

nice day, best regards from sunny germany :D
back at removal work again... uff...

Petr Schreiber
16-05-2009, 14:53
Hi Frank,

I hate to be the bad guy, but your script does not do "all the work it should do with sorting and output and collecting all words" in 0.007 seconds, simply because the time you measure is here:


t1 = hiResTimer_Get
o2_asmo src
T2 = hirestimer_get


... which is compile time of Oxygen script ( which does some floating point math not related to the words ).
When you convert it to:


t1 = hiResTimer_Get
o2_asmo src
printl o2_error
T2 = hirestimer_get

you will see that there is a compile-time error. That is caused by the fact that you accidentally use ThinBasic keywords in Oxygen BASIC.

I understand this; I am often in such an ecstatic coding state that I forget to check where I benchmark, and then the results presented are ... not precise.


Petr

Lionheart008
16-05-2009, 15:04
;)

hi petr:) no problem... for me, I will check it tomorrow... oh yes, I am a human being, not a machine... I take it with humour, doesn't matter at all...

I have done it without o2_asmo src... and the result is similar for me...


t1 = hiResTimer_Get
'o2_asmo src
T2 = hirestimer_get


and my output.bible.text is just looking very good, isn't it ??? same result and content with 174 kb like eros had! ;) So I have thought everything is going fine with the script... ???

oh yes, quite interesting stuff... you are not the bad guy... I can handle constructive criticism very well :)


I understand this; I am often in such an ecstatic coding state that I forget to check where I benchmark, and then the results presented are ... not precise.


your quote above: good to notice that, I agree with you :D

ciao, best regards, Lionheart :)
really off now... somebody with a loud female voice will beat me from my right side ;)

Petr Schreiber
16-05-2009, 15:10
Hi Frank,

the problem is that the timings other people post include file loading, processing and saving, while you posted the timing for a part of the code which does not affect the solution of the problem.

I think the T1 should be right after module loading, and T2 assignment right after saving the file to disk.
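
The timer placement suggested here, with the start taken before the input is loaded and the stop taken only after the output file has been written, looks like this in a Python sketch. It is an illustration with invented names (`timed_word_count`, the file paths), not a port of any script in the thread.

```python
import time
from collections import Counter


def timed_word_count(in_path, out_path):
    """Return elapsed seconds for the whole job: load, count, sort and save."""
    t1 = time.perf_counter()                      # start before any real work
    with open(in_path, encoding="utf-8") as f:
        text = f.read()
    counts = Counter(text.lower().split())
    with open(out_path, "w", encoding="utf-8") as f:
        for word in sorted(counts):
            f.write(f"{word} ({counts[word]})\n")
    t2 = time.perf_counter()                      # stop only after the file is saved
    return t2 - t1
```

Measured this way, the figure covers the same work as the other posted timings and can be compared with them directly.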

The output.bible.text is ok :occasion:


Petr

Charles Pegge
16-05-2009, 18:44
Further fanatical optimisations: :diablo:

I can get hash encoding almost for free by combining it with the word reading method next. And by using meta commands unnecessary code has been excluded. The time has come down from 0.222 seconds to 0.198 Seconds - Runtime 0.142

Pentium CPU registers can work in parallel when there are no dependencies.

Charles



'
'EMBEDDED HASH ENCODER
'---------------------
'
(
cmp ah,0 : jz nwordscan
rol ecx,5
ror edx,7
xor cl, al
xor dl, al
)
repeat
)
'
nwordscan: 'END OF WORDSCAN LOOP
'
h1=ecx
h2=edx



reading11.tbasic
http://community.thinbasic.com/index.php?topic=2517

@Petr,

Getfile and Putfile are minimalist file i/o built into O2H. No handles. - Not efficient for large strings, since they use the string concatenator, which means double string copies, costing an extra 50 microseconds in this case.

Charles

Petr Schreiber
16-05-2009, 19:00
Further fanatical optimisations


Idea for a new horror movie grows in my head - mad scientist, char[] l'es, works in his laboratory on a heretic optimizing experiment. Once he reaches a timing of zero, the whole Universe will collapse. Based on a real story.

Good job :lol:

Charles Pegge
16-05-2009, 19:36
Hi Petr,

Can you make your video card do this sort of thing with CUDA :) ?

I envisage pattern recognition with parallel processing.

Charles

Petr Schreiber
17-05-2009, 00:04
Hi Charles,

will try, but I am currently still in process of translating headers to make it usable from ThinBasic :shock:

Charles Pegge
17-05-2009, 00:20
Hi Petr,

I was just curious to know whether the new generation of NVIDIA cards is capable of running procedural code, albeit in a parallel way. In other words, are they Turing-complete (capable of emulating any other computer)?

Charles

Charles Pegge
17-05-2009, 14:56
Hi John,

I need to provide word counts and clean up my code a bit, but I have not discovered any more practical opportunities to shorten the time.

When it is ready you will need the latest Oxygen, but not another thinBasic unless Eros releases a new submission.

This has been a very useful exercise - tuning up both the compiler and lexing techniques.

Charles

Charles Pegge
22-05-2009, 12:06
Well this completes my entry for the first wordcount challenge, with the counts included.
The good news is that it does it in 0.24 seconds, and the bad news is that it uses over 1000 lines of script. However, most of this code will eventually find its way into the library, and we are left with a very simple top level script:



dim as string s,n
dim as list lis
'n= `KingLear.txt`
n=`Bible.txt`
s=getfile n
lis.wordlist s
lis.sort
lis.save `s.txt`


I've included this in the Oxygen zip as reading13.tbasic

http://community.thinbasic.com/index.php?topic=2517

Charles Pegge
23-05-2009, 16:59
Hardcore if you please. :blush:

Thanks for testing John.

kryton9
24-05-2009, 05:00
Congrats Charles, that is incredible when you think about all the work it is doing.