dsp_math_multiply_sat explanation

Technical questions regarding the XTC tools and programming with XMOS.
User avatar
aclassifier
Respected Member
Posts: 497
Joined: Wed Apr 25, 2012 8:52 pm

dsp_math_multiply_sat explanation

Post by aclassifier »

I don't understand the lib_dsp/dsp_math_multiply_sat explanation. I understand the final result, which is correct, but not the explanation of what's going on under the hood.

Assuming in my head that we have to do with Q8_24.

I understand this:
* This function multiplies two scalar values and produces a result according
* to fixed-point format specified by the ``q_format`` parameter.

But not this:
* The two operands are multiplied to produce a 64-bit result,
* saturated at the minimum/maximum value given the fixed-point format if overflow occurs,
* and finally shifted right by ``q_format`` bits.

Even if it's explained here:
* 1) Y = X1 * X2
* 2) Y = min( max( Q_FORMAT_MIN, Y ), Q_FORMAT_MAX, Y )
* 3) Y = Y >> q_format

It looks here like Q_FORMAT_MIN and Q_FORMAT_MAX in that pseudocode are equal to MIN_Q8_24 (0x80000000 is -128) and MAX_Q8_24 (0x7FFFFFFF is 127.999999940395355224609375) as defined in lib_dsp/dsp_math. Since 32 bits * 32 bits = 64 bits (ah, al) these ranges that they test against must be those masks but seen from MSB (ah)?

Here is the code from lib_dsp, where they also talk about MININT or MAXINT as if they were those valid for a 32 bits value, not 64. I have a feeling that the programmers of all this stuff, and the author of the xTIMEcomposer-User-Guide-14(14.x).pdf know that they are talking about what is happening in ah, while I seem to be grounded on reading their explanation as seen from al, which then at best does not match in my head.

Anyone who could rectify this for me?

Code: Select all

dsp_math.xc

OBS dsp_math_multiply is equal but no lsats, except for en explicit result register

int32_t dsp_math_multiply_sat( int32_t input1_value, int32_t input2_value, int32_t q_format )
{
    int32_t ah; uint32_t al;
    asm("maccs %0,%1,%2,%3":"=r"(ah),"=r"(al):"r"(input1_value),"r"(input2_value),"0"(0),"1"(1<<(q_format-1)) );
    asm("lsats %0,%1,%2":"=r"(ah),"=r"(al):"r"(q_format),"0"(ah),"1"(al));
    asm("lextract %0,%1,%2,%3,32":"=r"(ah):"r"(ah),"r"(al),"r"(q_format));
    return ah;
}

xTIMEcomposer-User-Guide-14(14.x).pdf

maccs: Mulitply and accumulate signed (not the same as macs)

lsats: Perform saturation on a 64-bit value.
  If any arithmetic has overflowed beyond a given bit index, then the value is set to MININT or MAXINT, right shifted by the bit index.

lextract: Extract a bitfield from a 64-bit value.
  value:    The value to extract the bitfield from.
  position: The bit position of the field, which must be a value between 0 and bpw - 1, inclusive.
  length:   The length of the field, one of bpw, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32.
I have some code that tests this here (xTIMEcomposer 14.4.1). q24_signed_to_str is my code, and Q_FORMAT_STR_LEN 12 // "-128.123456" space for \0 needed

Code: Select all

void test_q8_24_x_q8_24_saturated (void)
{
    // From lib_dsp/dsp_math.h
    //     #define MIN_Q8_24 (0x80000000) // -128
    //     #define MAX_Q8_24 (0x7FFFFFFF) // 127.999999940395355224609375

    char q24_1_str [Q_FORMAT_STR_LEN];
    char q24_2_str [Q_FORMAT_STR_LEN];
    int32_t        product_32_sat;
    int32_t        product_32;
    const unsigned q_format = SPECTRUM_Q_FORMAT; // 28
    unsigned       input1_value;
    unsigned       input2_value;

    input1_value   = Q24(2.0);
    input2_value   = Q24(-3.0);
    product_32_sat = dsp_math_multiply_sat (input1_value, input2_value, q_format);
    q24_signed_to_str (q24_1_str, product_32_sat);
    //            A product_32_sat -6.000000 (-100663296, 0xFA000000) 
    debug_print ("A product_32_sat %s (%d, 0x%x)\n", q24_1_str, product_32_sat, product_32_sat); 
    
    input2_value   = Q24(3.0);
    product_32_sat = dsp_math_multiply_sat (input1_value, input2_value, q_format);
    q24_signed_to_str (q24_1_str, product_32_sat);
    //            B product_32_sat 6.000000 (100663296, 0x6000000)
    debug_print ("B product_32_sat %s (%d, 0x%x)\n", q24_1_str, product_32_sat, product_32_sat); 

    input1_value   = Q24(16.0); // Individually these..
    input2_value   = Q24(8.0);  // ..cannot "overflow"
    product_32_sat = dsp_math_multiply_sat (input1_value, input2_value, q_format); // 16*8=128 is "one" more than MAX_Q8_24 127.99.., will be saturated to it
    product_32     = dsp_math_multiply     (input1_value, input2_value, q_format); // 16*8=128 will overflow and change sign!
    q24_signed_to_str (q24_1_str, product_32_sat);
    q24_signed_to_str (q24_2_str, product_32);
    if (product_32_sat == product_32) {
        debug_print ("? product_32_sat %s (%d, 0x%x)\n", q24_1_str, product_32_sat, product_32_sat); 
    } else {
        //            input1_value    0x10000000 * input2_value 0x8000000 =
        //            product_32_sat  127.999999  (2147483647, 0x7FFFFFFF)
        //            product_32     -128.000000 (-2147483648, 0x80000000) SIGN CHANGED!
        debug_print ("input1_value    0x%x * input2_value 0x%x =\nproduct_32_sat %s  (%d, 0x%x)\nproduct_32     %s (%d, 0x%x)\n", 
            input1_value, input2_value,
            q24_1_str, product_32_sat, product_32_sat,
            q24_2_str, product_32,     product_32); 
    }
}


--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
fabriceo
XCore Addict
Posts: 213
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

Hi Øyvind,
this is a difficult topic at first, but here is how to make it easy, hopefully.

take a q8.24 number and multiply it with q8.24
you get a 64 bit value represented in q16.48
at the end you need to retrieve a Q8.24 for you next steps, so you need to saturate and extract so that it fits again in q8.24.
the lsat "24" instruction is same as clipping +1/-1 where +1/-1 would be coded as q8.56 (24+32)
then the extract instruction take the result back to q8.24 by extracting bits 56(24+32 again)..24

hope this helps and not brings more confusion :)
fabrice
User avatar
aclassifier
Respected Member
Posts: 497
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

It was only my positive experience from this forum that helped me dare post that question. Thanks a lot, for an answer!

Helps! Fits exactly to my mental understanding of it since you deal with the data in ah (+32) and not some undefined MININT or MAXINT as the explanation does.

In my opinion the accompanying explanation in dsp_math_multiply_sat should be cleared up, or at least it should define MININT and MAXINT.

I read your
fabriceo wrote: Tue May 07, 2024 3:02 pm the lsat "24" instruction is same as clipping +1/-1 where +1/-1 would be coded as q8.56 (24+32)
as causing this change in point 2 where Q_FORMAT_SHIFT in our case would be 32 for q8.56 (24+32).

* 1) Y = X1 * X2
* 2) Y = min( max( Q_FORMAT_MIN<<Q_FORMAT_SHIFT, Y ), Q_FORMAT_MAX<<Q_FORMAT_SHIFT, Y )
* 3) Y = Y >> q_format

My next daring point is this: The price of using q8.24 is 8 bits in that multiplication. If I had used q1.31 (just int32_t) it would have multiplied into a full 64 bits and then I would have shifted down by 32 bits and lost nothing(?) By lost I mean the need to saturate. 11.0 * 11.0 = 121 ok but 12.0 * 12.0 = 141 saturates to 127.99.. which is needed but not nice. For int32_t I'd only have to keep values out of being INT_MIN (0x80000000) as that would indeed overflow when squared. Whether I need to use q8.24 for my magnitude after the FFT I have started to doubt. But maybe it's because it's needed here: dsp_math_sqrt ((uq8_24)((re*re)+(im*im)))
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
fabriceo
XCore Addict
Posts: 213
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

hi Øyvind

if your number is representing a value which is contained between -1.0 and +0.999, it makes sense to use q1.31 instead.
when multiplying with a signed long multiply like maccs for example, you get a q2.62 where the 2 higher bits are either 00 or 11 depending on the sign result.
so if you want to retrieve your result in q1.31 for the next steps, then you still need to organize a signed shift right by 31bits , not just taking the MSB of ah-al !
Instead of doing a 64 bits shift right, it is faster to use lextract "31" which will retrieve 32 bits as of position 31 in the ah-al

after
int ah; unsigned al;
asm("maccs %0,%1,%2,%3":"=r"(ah),"=r"(al):"r"(input1),"r"(input2),"0"(0),"1"(0)) );
just do
asm("lextract %0,%1,%2,%3,32":"=r"(ah):"r"(ah),"r"(al),"r"(31));
and your ah contains le result in q1.31
no need for lsat in this example as the multiply of to numbers between -1..+1 gives something between -1..+1
hope this helps
User avatar
aclassifier
Respected Member
Posts: 497
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

Great! Thanks a lot!

I did go on with my test code and also saw that I couldn't just take msb and lsb.

With like q8.24 range [-128..127.99..] there is a need to saturate after the mult since we get q16.48. Not for 11.0 * 11.0 = 121 but for 12.0 * 12.0 = 144 -> 127.99...

But with q1_31 the range is as you say [-1..0.99..] multiplying by less than one is never going to overflow, so that we dont' need to saturate. The two upper bits never represent overflow, only sign. Is this the right way to see it?

I also see that the there are three square root functions (which this spectrum of re2*im2 is going to)::

Code: Select all

dsp_math_sqrt       (uq8_24)      in dsp_math.h. Observe that there also are
dsp_math_int_sqrt   (32->16 bits) in dsp_math_int.h and
dsp_math_int_sqrt64 (64-32 bits)  in dsp_math_int.h
I have struggled so much with these "simple" (I thought) things that I made a small chapter a while ago. It's at Signed formats. I hope it's correct. There I "invented"? a term like "s2_30" to inform about the number of sign bits. For the case here it would be "s2_62".
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
aclassifier
Respected Member
Posts: 497
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

Here is "numbered" example of q1.31 (max 0x7fffffff is 0.999..) of the hex value 0x40000 = 0.0001220703125.. squared into 0.0000000149.. or q1.31 0x20 (32) "only".

It looked strange with such a small number so I needed the below code to convince myself.
Yes, the final shift down of 31 bits and fractional decimal representation certainly make sense (numbers less than one, squared, is even smaller)!
And yes, for qn.m (n+m=32) we must shift down by m bits after the multiplication.
QED

Code: Select all

typedef struct {
    uint32_t lsb;
    uint32_t msb;
} parts64_t;

typedef struct {
    union {
        int64_t   int64_val;
        uint64_t  uint64_val;
        parts64_t parts64;
    } u;
} bits_64_t;

int32_t   product_32_sat;
int32_t   product_32;
int32_t   input1_value;
int32_t   input2_value;
bits_64_t bits64_val; 

unsigned q_format = 31;

// == R ==  
// q1.31
// 2exp(31) = pow(2, 31) = 2147483648
// 2exp(16) = pow(2, 16) = 65536 
// *4 = 262144 is then 0x040000 is Q31(0.0001220703125):
//
// 0x40000 on q1.31 = 262144 / 2147483648 = 0.0001220703125 
// Squared = 0.0001220703125 * 0.0001220703125 = 0.0000000149011612 is smaller of course
// 0,0000000149011612 * 2147483648 = 32 is product_32_sat = product_32, as seen below

input1_value   = 4 << 16; // =0x40000 = 2exp(16) * 4 
input2_value   = 4 << 16;
product_32_sat = dsp_math_multiply_sat (input1_value, input2_value, q_format); 
product_32     = dsp_math_multiply     (input1_value, input2_value, q_format);

bits64_val.u.int64_val = (int64_t)input1_value * (int64_t)input2_value;

// "R.31  262144 * 262144 = 0x10:0 (32 0x20) [32 0x20]"

debug_print ("R.%u  %d * %d = 0x%x:%x (%d 0x%x) [%d 0x%x]\n", 
    q_format, input1_value, input2_value, 
    bits64_val.u.parts64.msb, bits64_val.u.parts64.lsb,
    product_32, product_32, product_32_sat, product_32_sat);
By the way, input1_value and input2_value are int32_t, not unsigned as in my first code example.
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
fabriceo
XCore Addict
Posts: 213
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

Hi Øyvind
be careful in a 64 bits representation the msb is signed but the lsb is unsigned!

I ve recently released a DSP solution for a DAC manufacturer, and took the following approach:

the (music) samples are coded q1.31
any gain, factor or biquad coefficient is coded q4.28 which gives a 18db head room as we can represent number -8.0..7.999

when multiplying a sample by a gain, the result is always stored in a working accumulator in 64 bits as a q8.56 which gives a 42db headroom. we keep this format all along the treatment until it is converted back to q1.31 when storing the value in the DACs.

Also during treatment we have to multiply this q8.56 by q4.28 very often and then we always rescale to q8.56 which is a powerful container to keep precision. Like working with floats in fact.

when doing a biquad, the q8.56 is converted as q4.28 and multiplied by all coefs q4.28 so the end result is again q8.56

sometime it is needed to multiply q8.56 by q8.56 (128bits result) and then again reduced as q8.56

I can share the optimized assembly code for these 4 types of multiply if someone is interested
q1.31*q4.28->q8.56
q8.56*q4.28->q8.56
q8.56*q8.56->q8.56
q8.56*q4.28->q1.31

have a nice week end
fabriceo
User avatar
aclassifier
Respected Member
Posts: 497
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

Thanks a lot! I appreciate that, and will delve into it next week. Yes, msb signed and lsb unsigned (or rather, together: signed), but that's why I printed them out hex:-)=
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
User avatar
fabriceo
XCore Addict
Posts: 213
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

Hi, I ve put it quickly in a kind of library.
using the static inline is very cool when compiler is in -O3

include a test example.

have a good time

FILES REMOVED see later post
Last edited by fabriceo on Sat May 18, 2024 1:50 pm, edited 1 time in total.
User avatar
aclassifier
Respected Member
Posts: 497
Joined: Wed Apr 25, 2012 8:52 pm

Post by aclassifier »

Great! Thanks! I have tried to rename some to understand what your are doing here. Is it correct? You flagged this as converting between the resolution I think, but I see no direct such, only as part of some calculations. Probably because direct converts here has more to do with keep the tongue right in the mouth, than doing direct converts? I understand that for like dsp_math_multiply_sat internally that direct converts need to be done. Also, in the .h file I see that you some times do both lextract and shifts. I assume they are not logically equal, or the final shifts are meant to finalize the word format, kind of? (Like in mul64_int64_int64)

Code: Select all

/*
 *  mymath.xc (https://www.xcore.com/viewtopic.php?t=8738)
 *
 *  Created on: 11 mai 2024
 *      Author: Fabrice
 */

#define VER_STR "v101"
// v100 13May2024 Fabriceo orig with -O2
// v101 13May2024 Fabriceo orig with -O3 and name _ends (cannnot see any difference in log)
// v102 ..

#include <platform.h>
#include <stdio.h>

#ifdef XSCOPE
#include <xscope.h>
void xscope_user_init()
{   xscope_register(0, 0, "", 0, "");
    xscope_config_io(XSCOPE_IO_BASIC);  }   // or XSCOPE_IO_TIMED
#endif

#include <math.h>
#include "mymath.h"

#define mant 28

void mymath_test(){

    double sample_f = 1.0/sqrt(3);           //0.5774
    double gain_f   = M_PI;                  //3.141593

    double acc1_f   = sample_f * gain_f;     //1.813799 (compare: mul64_s31_int32)
    double acc2_f   = acc1_f * sqrt(3);      //3.141593 (compare: mul64_int64_int32)
    double acc3_f   = acc1_f * acc2_f;       //5.698219 (compare: mul64_int64_int64)
    double output_f = acc3_f * (1.0/6.0);    //0.949703 (compare: mul31_int64_int32)

    printf("acc1_f\t\t= %f\n", ( acc1_f ) );
    printf("acc2_f\t\t= %f\n", ( acc2_f ) );
    printf("acc3_f\t\t= %f\n", ( acc3_f ) );
    printf("output_f\t= %f\n", ( output_f ));

    int sample_s31 = s31( sample_f );
    int gain_s28   = s28( gain_f );

    long long acc1_s56,acc2_s56,acc3_s56;

    acc1_s56 = mul64_s31_int32( sample_s31 , gain_s28 , mant);
    printf("\nacc1_s56\t= %f\n", f56( acc1_s56 ) );

    acc2_s56 = mul64_int64_int32( acc1_s56, s28( sqrt(3) ) , mant);
    printf("acc2_s56\t= %f\n", f56( acc2_s56 ) );

    acc3_s56 = mul64_int64_int64( acc1_s56, acc2_s56 , mant);
    printf("acc3_s56\t= %f\n", f56( acc3_s56 ) );

    int output_s31 = mul31_int64_int32( acc3_s56, s28( 1.0/6.0 ) , mant);
    printf("output_s31\t= %f\n",f31( output_s31 ));
}


int main(){
    printf("Hello, test with XC compiler (ver %s, mant %u)\n", VER_STR, mant);
    mymath_test();
    printf("Bye\n");

    /*
    Hello, test with XC compiler (ver v101, mant 28)
    acc1_f      = 1.813799
    acc2_f      = 3.141593
    acc3_f      = 5.698219
    output_f    = 0.949703

    acc1_s56    = 1.813799
    acc2_s56    = 3.141593
    acc3_s56    = 5.698219
    output_s31  = 0.949703
    Bye
    */
    return 0;
}
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/