Replacing ARRAY SORT

**MikeTrader** · 10-08-2009, 00:32

powerbasics ARRAY SORT is a convenient way sort arrays of any data type. It also allows you to tag a second array of any kind to be sorted. Under the hood there is a lot going on and it took many years to implement.

When converting PB code to another language, this functionality needs to be replaced. This thread accomplishes that task by using a string sorting algorithm posted by Stan Durham for string arrays and Quick sort for integer arrays. Both are as fast or faster than the PB implementation and the code is in a high level language for easy porting.

Under the hood, PB selects sorting algorithms for the data type. Since the data type of an array is held in the array descriptor table (and only available using the keyword ARRAYATTR) a seperate function must be used for each datatype. Most real world implementations will be complete with three, STRING, LONG, QUAD, as most everything can be squeezed into these (floats can be multiplied by 1000 for example and fed through sorting as integers)

Strings are by far the tricky one. The challenge is to not only implement a fast method, but give it traction meaning tag an array, allow case to be ignored and sort ascend/descend with a call that is not overly long and combersome to use. For strings the criteria are:

Handle any length String including Zero length
ASCENDING/DESCENDING order
TAG an ARRAY of pointers
Select FIRST & LAST elements TO SORT
Equivalent speed to PBs SORT ARRAY
Zero or 1+ based Arrays
A generic CALL that is user friendly
High level language compatible (no ASM or PB keywords)

The Algorithm sorts the handles the PB Dynamic BStr Strings.
A Dynamic String array in powerbasic is an array of pointers to BStr Strings.
pArr = VARPTR(b(1)) will return the address of the first element of this array
pStr = @pArr will return a pointer to the actual BStr STRING
@pStr will return the STRING contents.
This is equivelent to the powerbasic reserved word pStr = STRPTR(b(1))

But these two operations are not equivelent.
When a zero length string is assigned with b(x) = "" powerbasic does not request memory
for a non existant string, so the pointer is set to zero (NULL Pointer).
To compensate for this, STRPTR returns a pointer to a real memory location (one address is used for all NULL strings) to avoid a GPF when dereferencing a NULL pointer. Unfortunatly this is not the case when when you do the two step de-referencing.

As this algorithm derives part of its speed by sorting the pointers to the strings and swapping them
(not swapping the actual strings themselves) it uses the two step deferencing to find the character
at each position in each string. When a zero length string is encountered a GPF would occur.
To make this implementation work, a conditional statement must check each pointer.

In testing with 1300 first names, it performed a little slower 5400 Clks than PBs 3400.
This is because there are many slots of two names that are swapped unecessarily. This
is a product of the method the algorithm uses
Additional testing would be required to determine if additional conditional statements
would make a significant difference to this.

But when 2000, random strings up to 32 Chars are used it performed 2x faster 1700 Clks than PBs 3500

Undoubtably PB uses ASM which would speed this algorithm up a little, but the purpose is to use
a high level language to replace the PB ARRAY SORT functionality and thus convert PB code to
another language (with friendly support).

First = First element in the array = LBOUND()
Last = Last element in the array = UBOUND()

In this example code, a UDT is sorted as the TAG array. Notice the Tagged pointer array
is declared AS MyTYPE PTR. This is tricky, because PB does not allow an array of pointers
to be passed BYREF for some reason. To get around this, we use the same trick of passing
a pointer to the array table (an array of DWORDS).

Then we use DIM AT to create and array in the SUB that can be used to access the
pointers as though we had sent them BYREF.

This was found to be 2x faster when sorting 1m LONG values with QuickSort rather than using
pointer offsets @pArr[i].

Thread: Replacing ARRAY SORT

Thread Tools

Search Thread

Display

Threaded View

Replacing ARRAY SORT

Members who have read this thread: 0

Bookmarks

Bookmarks

Posting Permissions