Math / Library support

Supports floating and fixed point math, including math libraries
INTRODUCTION

The math support includes integer, fixed point and floating point math, including library functions:

  Integer:        8, 16, 24 and 32 bits, with and without sign
  Fixed point:    20 different formats, with and without sign
  Floating point: 16, 24 and 32 bits

Math support for each compiler edition:

           RED+FREE   STANDARD+EXTENDED
  int      8+16       8+16+24+32
  fixed    -          8+16+24+32
  float    24         16+24+32

The compiler automatically locates the required function for an operation like 'a*b'.

Fixed point requires more worst case analysis to get correct results. This must include calculating the accumulated error and avoiding truncation and loss of significant bits. It is often straightforward to get correct results when using floating point. However, floating point functions require significantly more code. In general, floating point and fixed point are both slow to execute. Note that floating point is FASTER than fixed point on multiplication and division, but slower on most other operations.

Operations not found in the libraries are handled by the built in code generator. The compiler also uses inline code for operations that are most efficiently handled inline.

SAVE CODE AND SAVE RAM:

All libraries are optimized to get compact code. The floating point library is more compact than the Microchip floating point libraries written in assembly. All variables (except for the floating point flags) are allocated on the generated stack to enable efficient RAM reuse with other local variables. A new concept of transparent sharing of parameters in a library is introduced to save code.

CC5X can automatically delete unused library functions. This feature can also be used to delete unused user application functions.

  #pragma library 1
  .. library functions that are deleted if unused
  #pragma library 0
  .. remaining user application

The normal use of '#pragma library' is in separate source library files that are included in the user application.
FLOATING POINT

The compiler supports 16, 24 and 32 bit floating point. The supported 32 bit floating point format can be converted to and from the IEEE754 format by 3 instructions (macro in math32f.h).

  Format   Resolution   Range
  16 bit   2.4 digits   +/- 3.4e38, +/- 1.1e-38
  24 bit   4.8 digits   +/- 3.4e38, +/- 1.1e-38
  32 bit   7.2 digits   +/- 3.4e38, +/- 1.1e-38

Note that 16 bit floating point is intended for special situations where accuracy is less important.

Supported floating point types:

  float16         : 16 bit floating point
  float, float24  : 24 bit floating point
  double, float32 : 32 bit floating point

32 bit floating point format:

  address  ID
  X        a.low8  : LSB, bit 0-7 of mantissa
  X+1      a.midL8 : bit 8-15 of mantissa
  X+2      a.midH8 : bit 16-22 of mantissa, bit 23: sign bit
  X+3      a.high8 : MSB, bit 0-7 of exponent, with bias 0x7F

  bit 23 of mantissa is a hidden bit, always equal to 1
  zero (0.0) : a.high8 = 0 (mantissa & sign ignored)

  MSB      LSB
  7F 00 00 00 : 1.0   =  1.0  * 2**(0x7F-0x7F) =  1.0  * 1
  7F 80 00 00 : -1.0  = -1.0  * 2**(0x7F-0x7F) = -1.0  * 1
  80 00 00 00 : 2.0   =  1.0  * 2**(0x80-0x7F) =  1.0  * 2
  80 40 00 00 : 3.0   =  1.5  * 2**(0x80-0x7F) =  1.5  * 2
  7E 60 00 00 : 0.875 =  1.75 * 2**(0x7E-0x7F) =  1.75 * 0.5
  7F 60 00 00 : 1.75  =  1.75 * 2**(0x7F-0x7F) =  1.75 * 1
  7F 7F FF FF : 1.9999998808
  00 7C E3 5A : 0.0 (mantissa & sign ignored)
  00 00 00 00 : 0.0
  01 00 00 00 : 1.1754943508e-38 : smallest number above zero
  FE 7F FF FF : 3.4028234664e+38 : largest number
  FF 00 00 00 : +INF : positive infinity
  FF 80 00 00 : -INF : negative infinity

24 bit floating point format:

  address  ID
  X        a.low8  : LSB, bit 0-7 of mantissa
  X+1      a.mid8  : bit 8-14 of mantissa, bit 15: sign bit
  X+2      a.high8 : MSB, bit 0-7 of exponent, with bias 0x7F

  bit 15 of mantissa is a hidden bit, always equal to 1
  zero (0.0) : a.high8 = 0 (mantissa & sign ignored)

  MSB   LSB
  7F 00 00 : 1.0   =  1.0  * 2**(0x7F-0x7F) =  1.0  * 1
  7F 80 00 : -1.0  = -1.0  * 2**(0x7F-0x7F) = -1.0  * 1
  80 00 00 : 2.0   =  1.0  * 2**(0x80-0x7F) =  1.0  * 2
  80 40 00 : 3.0   =  1.5  * 2**(0x80-0x7F) =  1.5  * 2
  7E 60 00 : 0.875 =  1.75 * 2**(0x7E-0x7F) =  1.75 * 0.5
  7F 60 00 : 1.75  =  1.75 * 2**(0x7F-0x7F) =  1.75 * 1
  7F 7F FF : 1.999969482
  00 7C 5A : 0.0 (mantissa & sign ignored)
  01 00 00 : 1.17549435e-38 : smallest number above zero
  FE 7F FF : 3.40277175e+38 : largest number
  FF 00 00 : +INF : positive infinity
  FF 80 00 : -INF : negative infinity

16 bit floating point format:

  address  ID
  X        a.low8  : LSB, bit 0-6 of mantissa, bit 7: sign bit
  X+1      a.high8 : MSB, bit 0-7 of exponent, with bias 0x7F

  bit 7 of mantissa is a hidden bit, always equal to 1
  zero (0.0) : a.high8 = 0 (mantissa & sign ignored)

  MSB LSB
  7F 00 : 1.0   =  1.0  * 2**(0x7F-0x7F) =  1.0  * 1
  7F 80 : -1.0  = -1.0  * 2**(0x7F-0x7F) = -1.0  * 1
  80 00 : 2.0   =  1.0  * 2**(0x80-0x7F) =  1.0  * 2
  80 40 : 3.0   =  1.5  * 2**(0x80-0x7F) =  1.5  * 2
  7E 60 : 0.875 =  1.75 * 2**(0x7E-0x7F) =  1.75 * 0.5
  7F 60 : 1.75  =  1.75 * 2**(0x7F-0x7F) =  1.75 * 1
  7F 7F : 1.9921875
  00 7C : 0.0 (mantissa & sign ignored)
  01 00 : 1.175494e-38 : smallest number above zero
  FE 7F : 3.389531e+38 : largest number
  FF 00 : +INF : positive infinity
  FF 80 : -INF : negative infinity

FLOATING POINT EXCEPTION FLAGS

The floating point flags are accessible in the application program. At program startup the flags should be initialized:

  FpFlags = 0;     // reset all flags, disable rounding
  FpRounding = 1;  // enable rounding

After an exception is detected and handled, the exception bit should be cleared so that new exceptions can be detected. Exceptions can be ignored if this is most convenient. New operations are not affected by old exceptions, which also enables delayed handling of exceptions. Only the application program can clear exception flags.
Definitions:

  char FpFlags;                    // contains the floating point flags
  bit FpOverflow    @ FpFlags.1;   // floating point overflow
  bit FpUnderFlow   @ FpFlags.2;   // floating point underflow
  bit FpDiv0        @ FpFlags.3;   // floating point divide by zero
  bit FpDomainError @ FpFlags.5;   // domain error exception
  bit FpRounding    @ FpFlags.6;   // floating point rounding,
                                   // 0 = truncation, 1 = unbiased
                                   // rounding to nearest LSB

IEEE754 INTEROPERABILITY

The floating point format used is not equivalent to the IEEE754 standard, but the difference is very small. The reason for using a different format is code efficiency. Macros for converting to and from IEEE754 are available:

math32f.h:

  // before sending a floating point value out of the controller:
  float32ToIEEE754(floatVar);  // change to IEEE754 (3 instr.)

  // before using a floating point value received from outside:
  IEEE754ToFloat32(floatVar);  // change from IEEE754 (3 instr.)

math24f.h:

  float24ToIEEE754(floatVar);  // change to IEEE754 (3 instr.)
  IEEE754ToFloat24(floatVar);  // change from IEEE754 (3 instr.)

FIXED POINT, INTRODUCTION

Fixed point can be used instead of floating point, mainly to save program space. Fixed point math uses formats where the decimal point is permanently set at byte boundaries. For example, fixed8_8 uses one byte for the integer part and one byte for the decimal part. Fixed point operations map nicely to integer operations except for multiplication and division, which are supported by library functions.
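To make the mapping to integer operations concrete, here is a minimal sketch in portable C for a host computer. It is not CC5X library code. It assumes that a signed 8.8 value can be modelled as an int16_t holding the real value multiplied by 256, and the names fx8_8, fx_from_parts, fx_add, fx_mul and fx_div are hypothetical, chosen only for illustration. Addition is a plain integer add, while multiplication and division need a wider intermediate and a rescaling step, which suggests why those two operations are left to library functions.

```c
#include <stdint.h>

/* Hypothetical model of a signed 8.8 fixed point value:
   the stored integer is the real value multiplied by 256. */
typedef int16_t fx8_8;

/* Build a value from an integer part and a fraction byte (0..255). */
fx8_8 fx_from_parts(int8_t integer_part, uint8_t fraction)
{
    return (fx8_8)((int16_t)integer_part * 256 + fraction);
}

/* Addition maps directly to integer addition (no overflow check). */
fx8_8 fx_add(fx8_8 a, fx8_8 b)
{
    return (fx8_8)(a + b);
}

/* Multiplication: the 32 bit product carries a scale of 256*256,
   so one factor of 256 is divided away again (truncating). */
fx8_8 fx_mul(fx8_8 a, fx8_8 b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (fx8_8)(product / 256);
}

/* Division: the dividend is pre-scaled by 256 so that the quotient
   keeps the 8.8 scale. The caller must ensure that b is not zero. */
fx8_8 fx_div(fx8_8 a, fx8_8 b)
{
    return (fx8_8)(((int32_t)a * 256) / b);
}
```

With this model, fx_from_parts(7, 0x80) gives the byte pair 07 80, which is 7.5, and multiplying 1.5 (01 80) by 2.0 (02 00) gives 03 00. The sketch truncates and does not check for overflow, so it should not be read as a description of how the math16x.h functions behave.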
Example:

  fixed8_8 fx;
  fx.low8  : Least significant byte, decimal part
  fx.high8 : Most significant byte, integer part

  1/256 = 0.00390625

  MSB LSB
  07 01 : 7 + 0x01*0.00390625 = 7.00390625
  07 80 : 7 + 0x80*0.00390625 = 7.5
  07 FF : 7 + 0xFF*0.00390625 = 7.99609375
  00 00 : 0
  FF 00 : -1
  FF FF : -1 + 0xFF*0.00390625 = -0.00390625
  7F 00 : +127
  7F FF : +127 + 0xFF*0.00390625 = 127.99609375
  80 00 : -128

Convention: fixed<S><I>_<D> :

  <S> : 'U'    : unsigned
        <none> : signed
  <I> : number of integer bits
  <D> : number of decimal bits

Thus, fixed16_8 uses 16 bits for the integer part plus 8 bits for the decimals, for a total of 24 bits. The resolution for fixed16_8 is 1/256 = 0.0039, which is the lowest possible increment. This is equivalent to 2 decimal digits (actually 2.4 decimal digits).

New built in fixed point types:

  Type:        #bytes    Range                      Resolution
  fixed8_8     2 (1+1)   -128, +127.996             0.00390625
  fixed8_16    3 (1+2)   -128, +127.99998           0.000015259
  fixed8_24    4 (1+3)   -128, +127.99999994        0.000000059605
  fixed16_8    3 (2+1)   -32768, +32767.996         0.00390625
  fixed16_16   4 (2+2)   -32768, +32767.99998       0.000015259
  fixed24_8    4 (3+1)   -8388608, +8388607.996     0.00390625

  fixedU8_8    2 (1+1)   0, +255.996                0.00390625
  fixedU8_16   3 (1+2)   0, +255.99998              0.000015259
  fixedU8_24   4 (1+3)   0, +255.99999994           0.000000059605
  fixedU16_8   3 (2+1)   0, +65535.996              0.00390625
  fixedU16_16  4 (2+2)   0, +65535.99998            0.000015259
  fixedU24_8   4 (3+1)   0, +16777215.996           0.00390625

  (additional types with decimals only; no integer part)
  fixed_8      1 (0+1)   -0.5, +0.496               0.00390625
  fixed_16     2 (0+2)   -0.5, +0.49998             0.000015259
  fixed_24     3 (0+3)   -0.5, +0.49999994          0.000000059605
  fixed_32     4 (0+4)   -0.5, +0.4999999998        0.0000000002328

  fixedU_8     1 (0+1)   0, +0.996                  0.00390625
  fixedU_16    2 (0+2)   0, +0.99998                0.000015259
  fixedU_24    3 (0+3)   0, +0.99999994             0.000000059605
  fixedU_32    4 (0+4)   0, +0.9999999998           0.0000000002328

To sum up:
CONSTANTS

Floating point constants are allowed, using 32 bit floating point format during compilation and calculation.

  fixed8_8  a = 10.24;
  fixed16_8 a = 8 * 1.23;
  fixed8_16 x = 2.3e-3;
  fixed8_16 x = 23.45e1;
  fixed8_16 x = 23.45e-2;
  fixed8_16 x = 0.;
  fixed8_16 x = -1.23;

Constant rounding error example:

  Constant: 0.036
  Variable type: fixed16_8 (with 1 byte for decimals)
  Error calculation: 0.036*256 = 9.216. The byte values assigned to the variable are simply 0,0,9. The decimals of 9.216 are rounded away. The error is (9.216-9)/9.216 = 0.024.

The compiler prints the normalized error as a warning.

TYPE CONVERSION

The fixed point types are handled as subtypes of float. Type casts are therefore infrequently required.

FIXED POINT INTEROPERABILITY

It is recommended to stick to one fixed point format in a program. The main problem when using mixed types is the enormous number of combinations, which makes library support a challenge. However, many mixed operations are allowed when CC5X can map the types to the built in integer code generator:

  fixed8_16 a, b;
  fixed_16 c;
  a = b + c;      // OK, code is generated directly
  a = b * 10.22;  // OK: library function is supplied
  a = b * c;      // a new user library function is required!
  // A type cast can select an existing library function:
  a = b * (fixed8_16)c;

INTEGER LIBRARIES

The integer math libraries allow selection between different optimizations, speed or size. The libraries contain operations for multiplication, division and division remainder.

  math16.h  : basic library, up to 16 bit, signed and unsigned
  math24.h  : basic library, up to 24 bit, signed and unsigned
  math32.h  : basic library, up to 32 bit, signed and unsigned

  math16m.h : speed & size, 8*8, 16*16
  math24m.h : speed & size, 8*8, 16*16, and 24*8 multiply.
  math32m.h : speed & size, 8*8, 16*16, and 32*8 multiply.

The math??m.h libraries can be used when execution speed is critical.
NOTE 1: the math??m.h libraries must be included first (before math??.h)
NOTE 2: math??.h contains similar functions (which are deleted)

The min and max timing cycles are approximate only. The enhanced 14 bit core will use fewer cycles and less code.

Sign: -: unsigned, S: signed

A:math32.h, B:math24.h, C:math16.h

       Sign  Res=arg1 op arg2   Program   Approx. CYCLES
                                Code      min   aver  max
  ABC  -     16 = 8  * 8        13        83    83    83
  ABC  S     16 = 8  * 8        21        85    85    85
  ABC  S/-   16 = 16 * 16       18        197   222   277
  .B.  S     24 = 16 * 16       35        220   261   334
  A..  S     32 = 16 * 16       42        223   253   313
  A..  -     32 = 16 * 16       22        215   240   295
  AB.  -     24 = 16 * 8        15        198   198   198
  ..C  -     16 = 16 * 8        16        179   179   179
  .B.  -     24 = 24 * 8        16        247   247   247
  A..  -     32 = 32 * 8        17        356   356   356
  .B.  -     24 = 24 * 16       26        217   263   361
  A..  -     32 = 32 * 16       31        239   310   447
  .B.  -     24 = 24 * 24       25        337   410   553
  A..  S/-   32 = 32 * 32       31        513   654   929

  ABC  -     16 = 16 / 8        18        235   235   235
  AB.  -     24 = 24 / 8        19        368   368   368
  A..  -     32 = 32 / 8        20        517   517   517
  ABC  -     16 = 16 / 16       25        287   291   335
  .B.  -     24 = 24 / 16       31        481   512   633
  A..  -     32 = 32 / 16       32        665   718   873
  .B.  -     24 = 24 / 24       36        564   576   732
  A..  -     32 = 32 / 32       47        943   966   1295

  ABC  S     16 = 16 / 8        33        196   201   211
  AB.  S     24 = 24 / 8        37        305   310   326
  A..  S     32 = 32 / 8        41        430   436   457
  ABC  S     16 = 16 / 16       49        296   309   361
  .B.  S     24 = 24 / 16       53        450   473   543
  A..  S     32 = 32 / 16       57        626   660   747
  .B.  S     24 = 24 / 24       66        573   597   762
  A..  S     32 = 32 / 32       83        952   990   1329

  ABC  -     8  = 16 % 8        18        226   226   226
  .B.  -     8  = 24 % 8        19        354   354   354
  A..  -     8  = 32 % 8        20        502   502   502
  ABC  -     16 = 16 % 16       23        280   283   312
  .B.  -     16 = 24 % 16       29        463   497   599
  A..  -     16 = 32 % 16       30        636   698   828
  .B.  -     24 = 24 % 24       34        556   567   700
  A..  -     32 = 32 % 32       45        934   955   1254

  ABC  S     8  = 16 % 8        30        189   190   195
  .B.  S     8  = 24 % 8        35        291   292   300
  A..  S     8  = 32 % 8        39        413   415   425
  ABC  S     16 = 16 % 16       46        290   297   332
  .B.  S     16 = 24 % 16       50        442   455   501
  A..  S     16 = 32 % 16       54        614   634   692
  .B.  S     24 = 24 % 24       66        567   584   725
  A..  S     32 = 32 % 32       86        944   974   1284

A:math32m.h, B:math24m.h, C:math16m.h

       Sign  Res=arg1 op arg2   Code      min   aver  max
  ABC  -     16 = 8  * 8        37        50    50    50
  ABC  S/-   16 = 16 * 16       23+37     74    147   158
  .B.  -     24 = 24 * 8        32+37     124   162   166
  A..  -     32 = 32 * 8        43+37     178   212   222

FIXED POINT LIBRARIES

  math16x.h : 16 bit fixed point, 8_8, signed and unsigned
  math24x.h : 24 bit fixed point, 8_16, 16_8, signed and unsigned
  math32x.h : 32 bit fixed point, 8_24, 16_16, 24_8, signed and unsigned

The libraries can be used separately or combined.

The timing stated is measured in instruction cycles (4*clock) and includes parameter transfer, call, return and assignment of the return value. The min and max timing cycles are approximate only. The enhanced 14 bit core will use fewer cycles and less code.

Sign: -: unsigned, S: signed

  Sign  Res=arg1 op arg2        Program   Approx. CYCLES
                                Code      min   aver  max
  math16x.h:
  S     8_8   = 8_8   * 8_8     47        226   263   339
  -     8_8   = 8_8   * 8_8     23        214   252   326
  S     8_8   = 8_8   / 8_8     51        497   518   584
  -     8_8   = 8_8   / 8_8     35        528   558   680

  math24x.h:
  S     16_8  = 16_8  * 16_8    60        376   450   577
  -     16_8  = 16_8  * 16_8    27        364   437   580
  S     16_8  = 16_8  / 16_8    68        850   893   1093
  -     16_8  = 16_8  / 16_8    46        894   944   1222
  S     8_16  = 8_16  * 8_16    60        354   428   555
  -     8_16  = 8_16  * 8_16    28        342   415   558
  S     8_16  = 8_16  / 8_16    68        1050  1116  1349
  -     8_16  = 8_16  / 8_16    46        1104  1188  1520

  math32x.h:
  S     24_8  = 24_8  * 24_8    77        558   722   983
  -     24_8  = 24_8  * 24_8    35        546   709   1026
  S     24_8  = 24_8  / 24_8    85        1298  1366  1761
  -     24_8  = 24_8  / 24_8    57        1361  1432  1929
  S     16_16 = 16_16 * 16_16   78        561   704   930
  -     16_16 = 16_16 * 16_16   36        549   690   965
  S     16_16 = 16_16 / 16_16   85        1546  1650  2097
  -     16_16 = 16_16 / 16_16   57        1617  1733  2305
  S     8_24  = 8_24  * 8_24    77        529   672   896
  -     8_24  = 8_24  * 8_24    35        517   658   933
  S     8_24  = 8_24  / 8_24    85        1794  1936  2433
  -     8_24  = 8_24  / 8_24    57        1872  2033  2680

FLOATING POINT LIBRARIES

  math16f.h  : 16 bit floating point basic math operations
  math24f.h  : 24 bit floating point basic math operations
  math24lb.h : 24 bit floating point library
  math32f.h  : 32 bit floating point basic math operations
  math32lb.h : 32 bit floating point library

NOTE: The timing values include parameter transfer, call and return, and also assignment of the return value. The min and max timing cycles are approximate only. The enhanced 14 bit core will use fewer cycles and less code.

  Basic 32 bit math:               *** Timing ****
                          Size     min   aver  max
  a * b: multiplication   91       380   468   553
  a / b: division         125      523   610   742
  a + b: addition         182      39    135   225
  a - b: subtraction      add+5    46    142   232
  int32 -> float32        79       45    69    118
  float32 -> int32        86       36    77    143

  Basic 24 bit math:               *** Timing ****
                          Size     min   aver  max
  a * b: multiplication   77       226   261   294
  a / b: division         102      323   359   427
  a + b: addition         152      33    114   173
  a - b: subtraction      add+5    40    121   180
  int24 -> float24        62       36    64    106
  float24 -> int24        74       31    72    117

  Basic 16 bit math:               *** Timing ****
                          Size     min   aver  max
  a * b: multiplication   62       104   107   114
  a / b: division         82       137   154   171
  a + b: addition         118      27    86    130
  a - b: subtraction      add+5    34    93    137
  int16 -> float16        72       40    71    107
  float16 -> int16        53       26    60    98

The following operations are handled by inline code: assignment, comparing with constants, and multiplication and division by a power of 2 (e.g. a*0.5, b * 1024.0, c/4.0).

Floating point library functions:

  float32 sqrt( float32);   // square root
    Input range: positive numbers including zero
    Timing:  min   aver  max
             1174  1303  1415
    Size: 76 words
    * minimum complete program example: 97 words

  float24 sqrt( float24);   // square root
    Input range: positive numbers including zero
    Timing:  min   aver  max
             645   700   758
    Size: 62 words
    * minimum complete program example: 79 words

  float32 log( float32);    // natural log function
    Input range: positive numbers above zero
    Timing:  min   aver  max
             3493  4766  5145
    Size: 265 words + basic 32 bit math library
    * minimum complete program example: 764 words

  float24 log( float24);    // natural log function
    Input range: positive numbers above zero
    Timing:  min   aver  max
             2184  3075  3297
    Size: 214 words + basic 24 bit math library
    * minimum complete program example: 625 words

  float32 log10( float32);  // log10 function
    Input range: positive numbers above zero
    Timing:  min   aver  max
             3935  5229  5586
    Size: 17 words + log()
    * minimum complete program example: 781 words

  float32 exp( float32);    // exponential ( e**x ) function
    Input range: -87.3365447506, +88.7228391117
    Timing:  min   aver  max
             4465  4741  5134
    Size: 322 words + 145(floor32) + basic 32 bit math library
    * minimum complete program example: 843 words

  float24 exp( float24);    // exponential ( e**x ) function
    Input range: -87.3365447506, +88.7228391117
    Timing:  min   aver  max
             1969  3025  3264
    Size: 251 words + 102(floor24) + basic 24 bit math library
    * minimum complete program example: 674 words

  float32 exp10( float32);  // 10**x function
    Input range: -37.9297794537, +38.531839445
    Timing:  min   aver  max
             3638  4721  5045
    Size: 326 words + 145(floor32) + basic 32 bit math library
    * minimum complete program example: 852 words

  float24 exp10( float24);  // 10**x function
    Input range: -37.9297794537, +38.531839445
    Timing:  min   aver  max
             1987  3005  3194
    Size: 256 words + 102(floor24) + basic 24 bit math library
    * minimum complete program example: 679 words

  float32 sin( float32);    // sine function, input in radians
  float32 cos( float32);    // cosine function, input in radians
    Input range: -512.0, +512.0
    * can be used over a much wider range (10**6) if lower accuracy is accepted (degrades gradually to 1 significant decimal digit)
    Timing:  min   aver  max
             543   5220  5855
    Size: 357 words + basic 32 bit math library
    * minimum complete program example: 820 words

  float24 sin( float24);    // sine function, input in radians
  float24 cos( float24);    // cosine function, input in radians
    Input range: -512.0, +512.0
    Timing:  min   aver  max
             396   2492  2746
    Size: 215 words + basic 24 bit math library
    * minimum complete program example: 597 words

The min and max timing cycles are approximate only. The enhanced 14 bit core will use fewer cycles and less code. All timing is measured in instruction cycles. When using a 4 MHz oscillator, one instruction cycle is 1 microsecond.

FAST AND COMPACT INLINE OPERATIONS

The compiler uses inline code for efficiency for some important operations:

Integer:
  - converting to left and right shifts: a*8, a/2
  - selecting high/low bytes/words: a/256, a%256, b%0x10000
  - replacing remainder by AND operation: a%64, a%0x80

Fixed point:
  - converting to left and right shifts: a*8, a/2
  - all operations except multiplication and division are implemented inline

Floating point:
  - add/sub (incr/decr) of exponent: a*128.0, a/2
  - operations == and != : a==b, a!=0.0
  - comparing with constants: a>0, a<=10.0
  - inverting the sign bit: a=-a, b=-a

FIXED POINT EXAMPLE

  #include "16C73.h"
  #include "math24x.h"

  uns16 data;
  fixed16_8 tx, av, mg, a, vx, prev, kp;

  void main(void)
  {
      vx = 3.127;
      tx += data;             // automatic type cast
      data = kp;              // assign integer part
      if ( tx < 0)
          tx = -tx;           // make positive
      av = tx/20.0;
      mg = av * 1.25;
      a = mg * 0.98;          // 0.980469: error on constant: 0.000478
      prev = vx;
      vx = a/5.0 + prev;
      kp = vx * 0.036;        // 0.03515626: error on constant: 0.024
      kp = vx / (1.0/0.036);  // 27.7773437 error on constant: 0.0000156
  }

  // CODE: 274 instructions including library of 130 instructions

FLOATING POINT EXAMPLE

  // CODE: 635 instructions including library of 470 instructions
  // The code is identical to the above fixed point example
  // to enable code size comparison

  #include "16C73.h"
  #include "math24f.h"

  uns16 data;
  float tx, av, mg, a, vx, prev, kp;

  void main(void)
  {
      InitFpFlags();          // enable rounding as default
      vx = 3.127;
      tx += data;             // automatic type cast
      data = kp;              // assign integer part
      if ( tx < 0)
          tx = -tx;           // make positive
      av = tx/20.0;
      mg = av * 1.25;
      a = mg * 0.98;
      prev = vx;
      vx = a/5.0 + prev;
      kp = vx * 0.036;
      kp = vx / (1.0/0.036);
  }

CODE SIZE COMPARISON

The first example is the 32 bit floating point cosine function 'cos()'. This is also found in the Microchip AN660 library. By removing all uncalled functions from the AN660 assembly include files, it was possible to reduce the size of a complete example program to 1170 words. The equivalent CC5X example program is down to 820 words, a saving of 350 code words (-30%).

32 bit floating point math size comparison:

  Operation          CC5X library   Microchip AN669
  32 * 32            91             111 (+22%)
  32 / 32            125            166 (+33%)
  32 + 32            182            270 (+48%)
  int32 -> float32   78             114 (+46%)
  float32 -> int32   90             125 (+39%)
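
As a cross-check of the 32 bit floating point byte patterns listed in the FLOATING POINT section, the following sketch decodes four bytes (given MSB first, as in the example table) on a host computer. It is portable C, not CC5X code, and the function names cc5x32_to_double and cc5x32_to_ieee754_bits are hypothetical. The IEEE754 repacking is an assumption derived only from the layout described in this document (exponent in the top byte, sign in bit 23, same bias 0x7F): it moves the sign to bit 31 and the exponent to bits 30-23. It is not the actual macro from math32f.h. Infinity (exponent 0xFF) is not treated specially by the decoder.

```c
#include <stdint.h>

/* Decode a value in the 32 bit format described above into a double.
   Bytes are passed MSB first: high8, midH8, midL8, low8. */
double cc5x32_to_double(uint8_t high8, uint8_t midH8,
                        uint8_t midL8, uint8_t low8)
{
    uint32_t mantissa;
    double value;
    int shift;

    if (high8 == 0)
        return 0.0;               /* zero: mantissa and sign ignored */

    /* 23 stored mantissa bits plus the hidden bit 23 (always 1) */
    mantissa = 0x800000u
             | ((uint32_t)(midH8 & 0x7F) << 16)
             | ((uint32_t)midL8 << 8)
             | (uint32_t)low8;

    /* value = mantissa * 2**(exponent - bias - 23) */
    value = (double)mantissa;
    shift = (int)high8 - 0x7F - 23;
    while (shift > 0) { value *= 2.0; shift--; }
    while (shift < 0) { value /= 2.0; shift++; }

    /* bit 7 of midH8 is the sign bit */
    return (midH8 & 0x80) ? -value : value;
}

/* Assumed repacking into an IEEE754 single precision bit pattern:
   sign to bit 31, exponent to bits 30-23, fraction unchanged. */
uint32_t cc5x32_to_ieee754_bits(uint8_t high8, uint8_t midH8,
                                uint8_t midL8, uint8_t low8)
{
    uint32_t sign = (uint32_t)(midH8 >> 7);
    uint32_t fraction = ((uint32_t)(midH8 & 0x7F) << 16)
                      | ((uint32_t)midL8 << 8)
                      | (uint32_t)low8;

    if (high8 == 0)
        return 0;                 /* map every zero encoding to +0.0 */

    return (sign << 31) | ((uint32_t)high8 << 23) | fraction;
}
```

For example, the bytes 80 40 00 00 decode to 3.0 and repack to the IEEE754 pattern 0x40400000, and 7E 60 00 00 decodes to 0.875, in agreement with the table rows for those values.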