Need help to implement partin() with inline assembly

xchips
Active Member
Posts: 51
Joined: Wed Jun 22, 2016 8:08 am


Post by xchips »

I need to handle a special I2S bus format in which BCLK/LRCK = 50 (i.e. 25 BCLK cycles per channel).
The program below is a simple test for this:

Code:

#include <xs1.h>
#include <platform.h>
#include <stdio.h>
#include <xclib.h>   // bitrev()

// 1-bit ports, each buffered to 32 bits
on tile[0] : out buffered port:32 p_dout[2] = {XS1_PORT_1M, XS1_PORT_1N};
on tile[0] : in buffered port:32 p_din[2] = {XS1_PORT_1I, XS1_PORT_1J};
on tile[0] : port p_mclk = XS1_PORT_1F;
on tile[0] : out buffered port:32 p_bclk = XS1_PORT_1H;
on tile[0] : out buffered port:32 p_lrck = XS1_PORT_1G;

on tile[0] : clock mclk = XS1_CLKBLK_2;
on tile[0] : clock bclk = XS1_CLKBLK_3;

void loopback_test(void)   // function name added for illustration
{
    configure_clock_src(mclk, p_mclk);
    configure_clock_src_divide(bclk, p_mclk, 5);   // BCLK = MCLK / (2 * 5)
    configure_port_clock_output(p_bclk, bclk);     // drive BCLK onto p_bclk
    configure_out_port_no_ready(p_lrck, bclk, 0);
    configure_out_port_no_ready(p_dout[0], bclk, 0);
    configure_in_port_no_ready(p_din[0], bclk);
    start_clock(mclk);
    start_clock(bclk);

    unsigned bclk_lrck_ratio = 25;              // BCLK cycles per channel (50 per LRCK period)
    unsigned Lch_sample = bitrev(0x01AC6B4D);   // ports shift LSB first, so reverse for MSB-first I2S
    unsigned Rch_sample = bitrev(0x01296479);
    unsigned rev_sample = 0;

    while (1)
    {
        // Left channel. LRCK: 24 low bits, then 1 high bit (changes one BCLK early)
        rev_sample = partin(p_din[0], bclk_lrck_ratio);
        partout(p_lrck, bclk_lrck_ratio, 0x01000000);
        partout(p_dout[0], bclk_lrck_ratio, Lch_sample);

        printf("rev sample = 0x%.8X\n", bitrev(rev_sample));

        // Right channel. LRCK: 24 high bits, then 1 low bit
        rev_sample = partin(p_din[0], bclk_lrck_ratio);
        partout(p_lrck, bclk_lrck_ratio, 0x00FFFFFF);
        partout(p_dout[0], bclk_lrck_ratio, Rch_sample);
        printf("rev sample = 0x%.8X\n", bitrev(rev_sample));
    }
}

int main(void)
{
    par
    {
        on tile[0] : loopback_test();
    }
    return 0;
}
p_din[0] is connected to p_dout[0] for a loopback test.
However, the console always shows either 0x01FFFFFF or 0x00000000.
Is this a bug? If so, how can I implement the partin() function with inline assembly?

Any ideas would be appreciated!

Update:
I suspected that printf was disrupting the I2S timing, so the program above could never read back the correct value.
I changed the while(1) loop as follows, and this time I got 0x01AC6B00 back from p_din[0]:

Code:

    while(1)
    {
        rev_sample = partin(p_din[0], bclk_lrck_ratio);
        partout(p_lrck, bclk_lrck_ratio, 0x01000000);
        partout(p_dout[0], bclk_lrck_ratio, Lch_sample);

        rev_sample = partin(p_din[0], bclk_lrck_ratio);
        partout(p_lrck, bclk_lrck_ratio, 0x00FFFFFF);
        partout(p_dout[0], bclk_lrck_ratio, Rch_sample);
        printf("rev sample = 0x%.8X\n", bitrev(rev_sample));

        while(1){}   // halt after one frame so printf cannot disturb later I2S timing
    }
So partin() seems to work correctly; the earlier failure was my mistake. Still, an inline-assembly version of partin() may help me.


peter
XCore Addict
Posts: 230
Joined: Wed Mar 10, 2010 12:46 pm

Post by peter »

Have you tried looking at the VCD waveforms to see what is happening on the pins? You can generate VCDs from the command line or in the GUI.

In terms of the inline assembler for partin, you can do the following:

Code:

        asm("inpw %0, res[%1], 24":"=r"(rev_sample):"r"(p_din[0]));
as the equivalent of:

Code:

        rev_sample = partin(p_din[0], 24);
Note that the widths supported by a single INPW instruction are 1-8, 16, 24 and 32 bits.

However, the hardware also supports a split transaction, where you tell the port how many bits you want in one instruction and then pick up the data later. That is done using the SETPSC (set port shift count) instruction followed by a normal IN:

Code:

        asm("setpsc res[%0], %1"::"r"(p_din[0]), "r"(bclk_lrck_ratio));
        asm("in %0, res[%1]":"=r"(rev_sample):"r"(p_din[0]));
Last edited by peter on Mon Jul 18, 2016 4:57 pm, edited 1 time in total.

Post by xchips »

Thank you, peter, your answer helps me a lot. I am currently using a logic analyzer (LA) to check the timing, because I'm new to the VCD debug tool and to XMOS.

BTW, I found that, unlike with partin(), the value read back using inline assembly doesn't need bitrev().

Post by peter »

Note: I realised that I used the wrong instruction in my first response. I had used the SETTW instruction where it should have been SETPSC.

In summary:
  • SETTW changes the width of the data returned from the port permanently; it must be >= the port width and one of 1, 4, 8 or 32.
  • SETPSC changes the width only for the next data returned; it can be any multiple of the port width that is less than the port's transfer width.

Post by xchips »

Oh! Sorry I didn't catch the mistake; my simple test code didn't raise this error. Thank you so much, peter.

Post by xchips »

I'm currently using the xk-216-mc board for some audio work.
In the reference design, the BCLK to LRCK ratio is fixed at 64.

However, I need to generate some non-standard I2S signals based on this reference design.
Specifically, I need to handle two cases:
1. Case 1: BCLK/LRCK = 96.
2. Case 2: BCLK/LRCK = 100.
Neither is the standard 64x.

First, I didn't change the port definitions (i.e. BCLK, LRCK, p_i2s_dac and p_i2s_adc are all 1-bit ports buffered to 32 bits).
The code in audio.xc -> deliver() handles the 64x case as follows:
{
    while (1)
    {
        // Left channel
        read 32 bits of p_i2s_adc data
        output 32 bits of LRCK
        output 32 bits of p_i2s_dac data

        // Right channel
        read 32 bits of p_i2s_adc data
        output 32 bits of LRCK
        output 32 bits of p_i2s_dac data
    }
}

I modified it as follows for cases 1 and 2:
{
    while (1)
    {
        // Left channel ================================
        read 32 bits of p_i2s_adc data
        output 32 bits of LRCK
        output 32 bits of p_i2s_dac data

        // Handle the extra 16/18 bits of LRCK and data for BCLK/LRCK = 96/100 respectively:
        // (32 + 16) * 2 = 96; (32 + 18) * 2 = 100
        read 16/18 bits of p_i2s_adc data and discard them (only the 32 MSBs are needed)
        output 16/18 bits of LRCK
        output 16/18 bits of p_i2s_dac data (all 0)

        // Right channel ================================
        read 32 bits of p_i2s_adc data
        output 32 bits of LRCK
        output 32 bits of p_i2s_dac data

        // Handle the extra 16/18 bits of LRCK and data for BCLK/LRCK = 96/100 respectively
        read 16/18 bits of p_i2s_adc data and discard them (only the 32 MSBs are needed)
        output 16/18 bits of LRCK
        output 16/18 bits of p_i2s_dac data (all 0)
    }
}

The DAC always works correctly when BCLK/LRCK = 96 or 100 (the on-board codec doesn't work at 100, but I have another codec that does), so what follows concerns only the ADC part. Before I used inline assembly to read the ADC ports, neither the 96x nor the 100x case worked for the ADC: the data sent from the device to the host was all 0s. Now the 96x ADC works,
but 100x still does not work correctly, even though the host now receives non-zero data.


Case 1:
The inline assembly helps when BCLK/LRCK = 96.
Test conditions:
MCLK = 24.576 MHz.
BCLK = 3.072 MHz.
LRCK = 32 kHz (i.e. 3.072 MHz / 96).
p_i2s_dac[0] is connected to p_i2s_adc[0].

The code below handles the extra 16/18 bits of ADC data:

Code:

// bclk_lrck_ratio can be 96 or 100
for (int i = 0; i < I2S_CHANS_ADC; i += I2S_CHANS_PER_FRAME)
{
    unsigned sample;

    /* Note: #1, #2 and #3 do not work (no data from device to PC, all 0s)! */
    // sample = partin(p_i2s_adc[index++], 18);                                          // #1
    // asm volatile ("inpw %0, res[%1], 16" : "=r"(sample) : "r"(p_i2s_adc[index++]));  // #2

    // asm volatile ("setpsc res[%0], %1"::"r"(p_i2s_adc[index]), "r"(bclk_lrck_ratio/2 - 32)); // #3
    // asm volatile ("in %0, res[%1]":"=r"(sample):"r"(p_i2s_adc[index++]));

    // Works: set the shift count to the extra bits per channel
    // (bclk_lrck_ratio/2 - 32), then read them and discard the result.
    asm("setpsc res[%0], %1"::"r"(p_i2s_adc[index]), "r"(bclk_lrck_ratio/2 - 32));
    asm("in %0, res[%1]":"=r"(sample):"r"(p_i2s_adc[index++]));
}

As you can see, partin() did not make this case (the ADC part) work; only the inline assembly did. So 96x now works correctly, but case 2 (BCLK/LRCK = 100) still does not.

Case 2:
Test conditions:
MCLK = 24 MHz.
BCLK = 4.8 MHz.
LRCK = 48 kHz (i.e. 4.8 MHz / 100).
p_i2s_dac[0] is connected to p_i2s_adc[0].

The code that handles the extra 18 bits of ADC data is the same as in case 1.

But it still doesn't work, so I used a test pattern to check it.
In the left-channel step that reads the 32-bit p_i2s_adc word back, I overwrote the sample with a test pattern:

Code:

            for (int i = 0; i < I2S_CHANS_ADC; i += I2S_CHANS_PER_FRAME)
            {
                // p_i2s_adc[index++] :> sample;
                // Manual IN instruction since compiler generates an extra setc per IN (bug #15256)
                unsigned sample;
                asm volatile("in %0, res[%1]" : "=r"(sample) : "r"(p_i2s_adc[index++]));

                // Test pattern (Lch; for Rch: sample = bitrev(0xABCDEF47);)
                sample = bitrev(0x12345678);

                /* Note the use of readBuffNo changes based on frameCount */
                if (buffIndex)
                    samplesIn_1[((frameCount-2)&(I2S_CHANS_PER_FRAME-1))+i] = bitrev(sample); // channels 0, 2, 4.. on each line.
                else
                    samplesIn_0[((frameCount-2)&(I2S_CHANS_PER_FRAME-1))+i] = bitrev(sample);
            }

I recorded it with Audition and checked the data in UltraEdit: it was exactly the test pattern I set (see '100x_adc_test_parrtern.png').

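Since the test pattern reaches the host intact, the path from the sample buffers to the host works, so the problem must be in the code above that reads back the ADC data at 100x.
I have no idea why it fails.

Any ideas would be appreciated.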

Post by peter »

I'm wondering whether you might be right at the edge of what an individual core can do. To check whether this is the case, I would first try one or more of the following:
  1. If possible, temporarily use fewer cores.
  2. Put the core in fast mode (xs1.h: set_core_fast_mode_on()).
  3. Put the core in high-priority mode (xs1.h: set_core_high_priority_on()).

Post by xchips »

Hi Peter, thanks for your response.
Here is the resource usage of my USB Audio project:

Creating app_usb_aud_xk_216_mc_2i10o10xxxxxx.xe
Constraint check for tile[0]:
Cores available: 8, used: 4 . OKAY
Timers available: 10, used: 4 . OKAY
Chanends available: 32, used: 7 . OKAY
Memory available: 262144, used: 30396 . OKAY
(Stack: 2500, Code: 23388, Data: 4508)
Constraints checks PASSED.
Constraint check for tile[1]:
Cores available: 8, used: 4 . OKAY
Timers available: 10, used: 6 . OKAY
Chanends available: 32, used: 24 . OKAY
Memory available: 262144, used: 46940 . OKAY
(Stack: 2956, Code: 26040, Data: 17944)
Constraints checks PASSED.
Build Complete

I had already tried what you suggested, but there was no improvement. I also tried 128x and 256x: the ADC works correctly at 128x but not at 256x (all-zero data, which is worse than 100x, where at least some non-zero data comes back). The DAC works at both 128x and 256x.
I added the calls for fast mode (set_core_fast_mode_on()) and high-priority mode (set_core_high_priority_on()) here, in main.xc -> usb_audio_io():

Code:

        /* Audio I/O Core (pars additional S/PDIF TX Core) */
        {
            thread_speed();
//            set_core_fast_mode_on();
//            set_core_high_priority_on();
#ifdef MIXER
#define AUDIO_CHANNEL c_mix_out
#else
#define AUDIO_CHANNEL c_aud_in
#endif
            audio(AUDIO_CHANNEL,
I hope I didn't put them in the wrong place.