different types of floating point?
I watched this video tonight. http://www.youtube.com/watch?v=5I5An-4sz1s&playnext=1&list=PLB89282B909CC304A&index=14
In it he mentions avoiding floating point math, but then mentions using something called 16.16 fixed point, and later he uses 4.12 fixed point. Does anyone know what these mean?
Charles Pegge
23-01-2011, 09:46
Brilliant video, Kent.
I think 4 12 floating point refers to a 4 bit exponent and 12 bit significand. Same principle for 16 16.
The idea is not to waste processing power on unnecessary precision in mobile devices.
Charles
Thanks Charles for the reply and explanation.
Johannes
28-01-2011, 13:51
I think 4 12 floating point refers to a 4 bit exponent and 12 bit significand. Same principle for 16 16.
No, that would still be floating point.
In floating point you have a sign, exponent and significand. But here they specifically state it's fixed point. The 16.16 and 4.12 denote the bits before and after the binary point.
So the first case has numbers in the form &bxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxx and the second one in the form &bxxxx.xxxxxxxxxxxx. These could both be unsigned numbers with a separate sign bit or two's complement. (Two's complement works very well with fixed point, so my guess is the latter.) But there is no exponent.
Where floating point has a large range of magnitudes (the exponent) at a fixed number of significant bits (the significand), so that its absolute precision varies with the magnitude of the number, fixed point has a strictly defined range of numbers but always the same absolute precision. If we assume two's complement (like thinBasic's integer variable types Integer, Long and Quad) the ranges for the two formats are:
16.16: &h8000.0000 through &hFFFF.FFFF and &h0000.0000 through &h7FFF.FFFF
-32768 through -1/65536 and 0 through 32767 65535/65536. Precision is 1/65536 or 0.0000152587890625.
4.12: &h8.000 through &hF.FFF and &h0.000 through &h7.FFF
-8 through -1/4096 and 0 through 7 4095/4096. Precision is 1/4096 or 0.000244140625.
Precision in decimal would be 4.8 decimals for the first format, 3.6 decimals for the second. (Multiply the number of fraction bits by Log10(2) or 0.30103.)
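To make the formats concrete, here is a minimal C sketch (C is just my language of illustration here, and the function names are invented) that rounds a double into Q16.16 and converts it back; note how the fixed absolute precision of 1/65536 shows up in the round trip:

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Q16.16 in a two's-complement 32-bit integer:
   16 integer bits (including the sign) and 16 fraction bits. */
typedef int32_t q16_16;

static q16_16 q16_16_from_double(double x)
{
    /* Scale by 2^16 and round to the nearest step of 1/65536.
       No overflow check, for brevity: x must lie in [-32768, 32768). */
    return (q16_16)lround(x * 65536.0);
}

static double q16_16_to_double(q16_16 v)
{
    return (double)v / 65536.0;
}

int main(void)
{
    q16_16 a = q16_16_from_double(0.1);
    /* 0.1 is not a multiple of 1/65536, so the round trip
       gives the nearest representable value. */
    printf("%.16f\n", q16_16_to_double(a)); /* 0.1000061035156250 */
    return 0;
}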
Not surprisingly the number formats are 32 bits and 16 bits respectively. Multiplying the latter, for instance, would be a 16x16 bit multiplication (a single instruction in assembler) resulting in a 32-bit result. An arithmetic shift right (which preserves the sign for two's complement) of twelve bits (also a single assembler instruction) would then give a correct result, assuming no overflow occurs.
Although range is limited with fixed point, execution speed is fantastic. Take, for instance, adding two numbers. If the magnitudes/exponents differ in floating point you get a time penalty for shifting the smaller number "down" to the correct magnitude. But with fixed point it's a simple addition: you can basically ignore the binary point and pretend they're integers. Multiplication in floating point has the problem that the product may need renormalising because its top bit can end up one position too high, so you need a check and possibly an internal shift. Don't get me started on division. ;)
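As an illustrative C sketch of how cheap this is (my own construction; it assumes, as mainstream compilers guarantee in practice, that >> on a signed value is an arithmetic shift):

#include <stdio.h>
#include <stdint.h>

/* Q4.12 in a two's-complement 16-bit integer. */
typedef int16_t q4_12;

static q4_12 q4_12_add(q4_12 a, q4_12 b)
{
    return (q4_12)(a + b); /* the binary points line up by themselves; may overflow */
}

static q4_12 q4_12_mul(q4_12 a, q4_12 b)
{
    int32_t wide = (int32_t)a * b; /* 16x16 -> 32-bit product, now Q8.24 */
    return (q4_12)(wide >> 12);    /* arithmetic shift right 12 restores Q4.12 */
}

int main(void)
{
    q4_12 half  = 1 << 11; /* 0.5 in Q4.12 */
    q4_12 three = 3 << 12; /* 3.0 in Q4.12 */
    printf("%f\n", q4_12_add(half, three) / 4096.0); /* 3.500000 */
    printf("%f\n", q4_12_mul(half, three) / 4096.0); /* 1.500000 */
    return 0;
}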
Many moons ago I wrote a 128-bit fixed-point Mandelbrot generator (on my ARM) and it's workable, even on a 32-bit machine. My number format was 4.124 (you don't need a lot of precision before the binary point with Mandelbrot) but I used unsigned integers with an explicit sign bit instead of what I described above: &bSxxx.yyy...yyy.
Charles Pegge
28-01-2011, 18:57
Johannes,
Thanks for correcting.
I see that floating-point hardware is available for the ARM, but we have to assume that not all of the present generation of mobile devices have it.
http://www.arm.com/products/processors/technologies/vector-floating-point.php
As for division -- it is at least 30 times slower than multiplication, even on floating-point hardware. Programmers need to know that eliminating unnecessary divisions is one of the most effective ways of improving execution speed.
Charles
Johannes, thanks for your explanation too.
I do better with real examples than descriptive definitions.
For example, the floating point number 12.73948025784216.
How would that be represented in 16.16 and 4.12?
How would one go about doing something like this, again in 16.16 and 4.12 fixed point math?
11.5432 + 78.6825
11.5432 - 78.6825
11.5432 * 78.6825
11.5432 / 78.6825
Thanks.
Johannes
29-01-2011, 01:16
I see that floating-point hardware is available for the ARM, but we have to assume that not all of the present generation of mobile devices have it.
My Acorn machine is from the early 1990s (still working!) and there wasn't a hardware FPU then. It's all software emulated and it. Is. Slow. Hence me writing the Mandelbrot generator in fixed-point.
As for division -- it is at least 30 times slower than multiplication, even on floating-point hardware. Programmers need to know that eliminating unnecessary divisions is one of the most effective ways of improving execution speed.
Exactly. I have colleagues who obviously think that calculations either take no time at all or all execute at the same speed. If you have to divide by a constant, store the inverse and multiply by that.
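A small C sketch of that last point (illustrative only; it trades a tiny rounding difference versus true division for one multiplication per element):

#include <stddef.h>

/* Scale n samples by 1/k: one division up front, then only
   multiplications inside the loop. */
void scale_by_inverse(double *data, size_t n, double k)
{
    const double inv_k = 1.0 / k; /* the single division, hoisted out of the loop */
    for (size_t i = 0; i < n; i++)
        data[i] *= inv_k;
}

Compilers generally cannot do this rewrite for you, because x / k and x * (1.0 / k) are not bit-identical in floating point, so it has to be a deliberate choice.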
Johannes
29-01-2011, 01:35
Johannes, thanks for your explanation too.
I do better with real examples than descriptive definitions.
For example, the floating point number 12.73948025784216.
How would that be represented in 16.16 and 4.12?
In single precision (binary) floating point (there is also decimal floating point these days) this would be stored as 1.59243503223027 with an exponent of 3 (1.59243503223027 * 2^3 = 12.73948025784216).
The binary representation of the significand would be &b1.10010111101010011101001. (Internally that initial '1' is not stored.)
Now, in fixed point 16.16 (usually written Q16.16) this would be &b0000000000001100.1011110101001111 and in Q4.12 it would be &b1100.101111010101. In both cases the final bit has been rounded up. If you convert the values back to decimal you get the following results:
Single-precision FP: 12.73948001861572265625
Q16.16: 12.7394866943359375
Q4.12: 12.739501953125
The incorrect digits start at the 7th decimal for single precision, the 6th for Q16.16 and the 4th for Q4.12. You supplied 16 digits (all of them, not just the fraction) and to represent them all correctly you need 54 bits of significand. In this case Double Precision might work (53 bits of precision) but Ext (64 bits) would give you a perfect result.
Normally you don't need that kind of precision. If you can make do with almost 5 correct decimals then Q16.16 would work.
Please note that for your example you would really need 5 bits for the integer part, because you always need a bit for the sign, and the 'Q' notation doesn't count that sign bit. You would have to use Q4.11 to be able to store the value, sign included, in 16 bits.
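If you want to reproduce those rounded values, here is a short C sketch (my own construction; the Q4.12 value is kept unsigned precisely because of the sign-bit problem just described):

#include <stdio.h>
#include <stdint.h>
#include <math.h>

int main(void)
{
    double x = 12.73948025784216;

    /* Round to the nearest multiple of 2^-16 and 2^-12 respectively. */
    int32_t  q16 = (int32_t)lround(x * 65536.0); /* Q16.16, signed */
    uint16_t q12 = (uint16_t)lround(x * 4096.0); /* Q4.12, unsigned: a signed
                                                    16-bit version would need
                                                    the Q4.11 layout above */

    printf("Q16.16: %.16f\n", q16 / 65536.0); /* 12.7394866943359375 */
    printf("Q4.12 : %.12f\n", q12 / 4096.0);  /* 12.739501953125 */
    return 0;
}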
Johannes
29-01-2011, 02:15
How would one go about doing something like this, again in 16.16 and 4.12 fixed point math?
11.5432 + 78.6825
11.5432 - 78.6825
11.5432 * 78.6825
11.5432 / 78.6825
First things first: you cannot do any of them in Q4.12 because you need more than 4 bits to store 78, which is &b1001110. So let's just do Q16.16.
The numbers you have chosen in Q16.16 are &b0000000000001011.1000101100001111 and &b0000000001001110.1010111010111000.
Addition:
&b0000000000001011.1000101100001111
&b0000000001001110.1010111010111000 +
&b0000000001011010.0011100111000111
This is &b1011010 + (&b0011100111000111 / 65536) = 90.2256927490234375
Subtraction:
&b0000000000001011.1000101100001111
&b0000000001001110.1010111010111000 -
&b1111111110111100.1101110001010111
This is two's complement and is minus (&b100000000000000000000000000000000 - result) = -&b0000000001000011.0010001110101001 = - (&b1000011 + (&b0010001110101001 / 65536)) = -67.1392974853515625
As you can see the results are correct if you round to four decimal places.
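Both results can be verified with a few lines of C (a sketch using the same Q16.16 bit patterns as above):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int32_t a = 0x000B8B0F; /* 11.5432 in Q16.16, rounded */
    int32_t b = 0x004EAEB8; /* 78.6825 in Q16.16, rounded */

    int32_t sum  = a + b;   /* plain integer addition */
    int32_t diff = a - b;   /* two's complement takes care of the sign */

    printf("sum  = %.16f\n", sum  / 65536.0); /*  90.2256927490234375 */
    printf("diff = %.16f\n", diff / 65536.0); /* -67.1392974853515625 */
    return 0;
}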
Multiplication:
&b0000000000001011.1000101100001111
&b0000000001001110.1010111010111000 *
First I will pretend these are integers:
&b00000000000010111000101100001111
&b00000000010011101010111010111000 *
Since the result will be 64 bits (32+32 bits) I'm going to pad the numbers:
&b0000000000000000000000000000000000000000000010111000101100001111
&b0000000000000000000000000000000000000000010011101010111010111000 *
&b0000000000000000000000000000000000000000010111000101100001111000
&b0000000000000000000000000000000000000000101110001011000011110000
&b0000000000000000000000000000000000000001011100010110000111100000
&b0000000000000000000000000000000000000101110001011000011110000000
&b0000000000000000000000000000000000010111000101100001111000000000
&b0000000000000000000000000000000000101110001011000011110000000000
&b0000000000000000000000000000000001011100010110000111100000000000
&b0000000000000000000000000000000101110001011000011110000000000000
&b0000000000000000000000000000010111000101100001111000000000000000
&b0000000000000000000000000001011100010110000111100000000000000000
&b0000000000000000000000000010111000101100001111000000000000000000
&b0000000000000000000000000101110001011000011110000000000000000000
&b0000000000000000000000101110001011000011110000000000000000000000 +
&b0000000000000000000000111000110000111111011000100010010011001000
Because the factors are both Q16.16 the result is Q32.32 (add the numbers before and after the point). So the result in Q32.32 is:
&b00000000000000000000001110001100.00111111011000100010010011001000
Strip the outer bits from this result to get Q16.16:
&b0000001110001100.0011111101100010
And this is &b0000001110001100 + (&b0011111101100010 / 65536) = 908.247589111328125
Now we see that the result is incorrect in the fourth decimal. That is because only four decimals are correct in the originals and the absolute error is large enough to show up in the result.
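In C the whole long multiplication above collapses into one widening multiply and one shift (a sketch, assuming a 64-bit intermediate is available):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int32_t a = 0x000B8B0F; /* 11.5432 in Q16.16 */
    int32_t b = 0x004EAEB8; /* 78.6825 in Q16.16 */

    int64_t wide = (int64_t)a * b;        /* Q16.16 * Q16.16 = Q32.32 */
    int32_t prod = (int32_t)(wide >> 16); /* strip back down to Q16.16 */

    printf("prod = %.15f\n", prod / 65536.0); /* 908.247589111328125 */
    return 0;
}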
Binary division is exactly the same as long division in decimal. I hope you'll forgive me for not doing that here. ;) The binary result, however, is &b0.0010010110001110, which is 0.146697998046875. The result of the decimal calculation would have been 0.14670606551, so this one is also correct to four rounded decimals.
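Division goes the other way around: pre-shift the dividend up 16 bits so the quotient lands back in Q16.16. A hedged C sketch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int32_t a = 0x000B8B0F; /* 11.5432 in Q16.16 */
    int32_t b = 0x004EAEB8; /* 78.6825 in Q16.16 */

    /* (Q32.32 dividend) / (Q16.16 divisor) = Q16.16 quotient.
       Integer division truncates, which is fine for these positive values. */
    int32_t quot = (int32_t)(((int64_t)a << 16) / b);

    printf("quot = %.15f\n", quot / 65536.0); /* 0.146697998046875 */
    return 0;
}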
Thank you, Johannes. If I had known it was going to be that much work I never would have bothered with so many examples. Thank you again for taking the time, I appreciate it very much.
Johannes
31-01-2011, 15:29
Not at all. Right now I'm fully into this stuff (well, when the people at work don't bother me too much ;)) and it helps me think about how to optimise certain calculations.
I don't know what your background in math is but Wikipedia has some excellent information (math is non-political, so no edit wars) on the most esoteric subjects. And their info on floating-point is all there.
Check out this bit on conversion and rounding (http://en.wikipedia.org/wiki/Floating_point#Representable_numbers.2C_conversion_and_rounding). It looks very accessible to me, but then again, when it comes to binary arithmetic I am, of course, completely insane. :D
Thanks for the link. All of this is over my head, but I keep chipping away at it; one of these days it will sink in. I do find it interesting, much more so than I ever did in school.