Downsampling DoP 352.8 / 384 > 176.4 / 192

alexjaw · Wed Mar 21, 2018 5:51 pm

What options do I have if I need to downsample DSD from USB that comes as 352.8 / 384 kHz down to 176.4 / 192 kHz?

I will use an xCORE-200 processor, which will only handle USB audio from the host (i.e. input only) and pass I2S data to a separate DSP running at 192/32.

I have looked at lib_src and found that the max input/output rate is 176.4 / 192. So it seems that the only option, if using XMOS for the SRC, is to implement a polyphase filter with lib_dsp. Am I correct? Or would it be feasible to extend the existing lib_src code to allow higher input rates?

Reading further in the lib_src documentation, I found at the end of the document (Appendix A, known issues) that the upper bound on sample rate conversion is set by the 500 MHz core frequency of the xCORE-200. If I understand this correctly, I can drop any idea of polyphase downsampling above 192k. I would appreciate confirmation so that I can focus on an external solution instead.

infiniteimprobability · Fri Mar 23, 2018 8:14 am

By DoP 352.8 do you mean DSD128 with a rate of 5.6448MHz? Do you want to then:

- Extract the DSD from the DoP frame
- Downsample the DSD128 to 176.4kHz PCM

The first part is easy - just extract the middle 2 bytes from a 4 byte frame. The second part is harder, but possible.
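To make the "middle 2 bytes" step concrete, here is a minimal Python sketch. It assumes the 24-bit DoP sample is left-justified in a 32-bit word, so the top byte is the DoP marker (alternating 0x05 / 0xFA), the next two bytes are 16 DSD bits, and the low byte is padding. The function names are illustrative only and are not part of any XMOS library. At a 352.8 kHz frame rate, 16 DSD bits per frame gives 5.6448 Mbit/s per channel, which is DSD128.

```python
DOP_MARKERS = (0x05, 0xFA)  # DoP marker bytes, alternating frame by frame


def extract_dsd_from_dop(word32):
    """Split one 32-bit sample into (marker, dsd16).

    Assumption: byte 3 = marker, bytes 2..1 = 16 DSD bits, byte 0 = padding.
    """
    marker = (word32 >> 24) & 0xFF
    dsd16 = (word32 >> 8) & 0xFFFF
    return marker, dsd16


def is_dop_stream(words):
    """True if every marker is valid and markers alternate between frames."""
    markers = [extract_dsd_from_dop(w)[0] for w in words]
    if len(markers) < 2 or any(m not in DOP_MARKERS for m in markers):
        return False
    return all(a != b for a, b in zip(markers, markers[1:]))


frames = [0x05A5C300, 0xFA3C5A00, 0x05FFFF00]
print([hex(extract_dsd_from_dop(w)[1]) for w in frames])
print(is_dop_stream(frames))
```

A real implementation would also decide what to do when the marker check fails (for example, treat the stream as plain PCM).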
You are right that lib_src will not do this - it takes 32b PCM data from the "normal" set of sample rates. Lib_mic_array is essentially a set of FIRs, the first stage being a decimate by 8 stage which takes you from 3.072MHz to 384kHz PCM. The input rate is very high, but it uses a trick to make this easier: the decimate by 8 polyphase filter takes in a byte of DSD data at a time. A DSD 1 is INT_MAX and a DSD 0 is -INT_MAX, so you can compute the contribution of 8 input samples with a simple lookup. It's a very neat way of doing it and highly efficient, and multiple channels will easily fit in a single logical core. Lib_mic_array then uses standard FIRs for the subsequent stages, because by then the rate is manageable.
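The byte-lookup trick can be checked with a small pure-Python sketch. This is not the lib_mic_array code: it uses +1.0 / -1.0 instead of INT_MAX / -INT_MAX, random coefficients instead of a designed filter, and an LSB-first bit order, all of which are assumptions made for illustration. It shows that summing one table entry per byte gives the same result as the bit-by-bit dot product.

```python
import random

DECIMATION = 8  # one byte = 8 one-bit DSD samples


def build_lookup(taps):
    """table[block][byte] = sum over the 8 bits of (+tap if bit set else -tap)."""
    table = []
    for base in range(0, len(taps), DECIMATION):
        row = []
        for byte in range(1 << DECIMATION):
            acc = 0.0
            for bit in range(DECIMATION):
                sign = 1.0 if (byte >> bit) & 1 else -1.0
                acc += sign * taps[base + bit]
            row.append(acc)
        table.append(row)
    return table


def fir_lookup(table, delay_bytes):
    """One output sample: a single table read per byte of delay line."""
    return sum(table[i][b] for i, b in enumerate(delay_bytes))


def fir_direct(taps, delay_bytes):
    """Reference: expand every bit to +/-1 and do the full dot product."""
    acc = 0.0
    for i, byte in enumerate(delay_bytes):
        for bit in range(DECIMATION):
            x = 1.0 if (byte >> bit) & 1 else -1.0
            acc += x * taps[i * DECIMATION + bit]
    return acc


random.seed(0)
taps = [random.uniform(-1, 1) for _ in range(48)]
delay = [random.randrange(256) for _ in range(len(taps) // DECIMATION)]
table = build_lookup(taps)
print(abs(fir_lookup(table, delay) - fir_direct(taps, delay)) < 1e-9)
```

The lookup version does 6 reads and adds for a 48-tap filter instead of 48 multiply-accumulates, which is why the high input rate stays affordable.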

It's all supplied as source but lib_mic_array is highly optimised both in terms of the ASM and the number of taps for voice. It will probably sound OK for HiFi purposes but I could imagine you would want more taps and improved filter characteristics if you are designing high end equipment.

alexjaw · Fri Mar 23, 2018 1:45 pm

Actually, the requirements are to handle PCM 352/384 at 32 bits, and DSD128. So we are talking about a USB audio to I2S bridge, where the PCM data consumer is a DSP running at 192/32.

If I understand you correctly, it's possible to handle DSD128 to PCM176 by doing a similar implementation as in lib_mic_array. We will definitely need to adjust the filter settings for hifi audio. Will look into the implementation.
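As a quick sanity check on the rates involved, the sketch below walks DSD128 down to 176.4 kHz. The split into stages of 8, 2 and 2 is only one possible choice (the first stage matching the byte-lookup decimate by 8 described above); it is an assumption for illustration, not something taken from lib_mic_array or lib_src.

```python
def decimation_chain(input_hz, factors):
    """Return the sample rate after each integer decimation stage."""
    rates = [input_hz]
    for factor in factors:
        if rates[-1] % factor:
            raise ValueError("rate %d is not divisible by %d" % (rates[-1], factor))
        rates.append(rates[-1] // factor)
    return rates


DSD128_HZ = 128 * 44100  # 5.6448 MHz one-bit stream

# Hypothetical staging: byte-lookup stage (8), then two halving FIR stages.
print(decimation_chain(DSD128_HZ, [8, 2, 2]))
# -> [5644800, 705600, 352800, 176400]
```

The total decimation factor from DSD128 to 176.4 kHz is 32, so whatever staging is chosen, the stage factors have to multiply to that.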

PCM synchronous sample rate conversion: the primary limits are processor usage and the number of cores, where 100 MHz is the maximum for 1 core (I'm looking at "Typical resource usage" in the lib_src doc). From what I can see in the table, a "simple" integer sample rate conversion 384 > 192 should take at most 80 MHz (since 96 > 48 takes 20 MHz and 192 > 96 takes 40 MHz). Does my naive calculation suggest that we could modify the lib_src code for our scenario and enable conversion by a factor of 2 from 352/384 > 176/192?

Many thanks for your input!

infiniteimprobability · Fri Mar 23, 2018 3:40 pm

alexjaw wrote: Actually, the demands are to handle PCM 352/384 32, and DSD128. So, we are talking about a USB audio bridge to I2S, where the PCM data consumer is a DSP running at 192/32.

Got it!

alexjaw wrote: If I understand you correctly, it's possible to handle DSD128 to PCM176 by doing a similar implementation as in lib_mic_array. We will definitely need to adjust the filter settings for hifi audio. Will look into the implementation.

It's quite interesting - you end up with an array of values which is [NTAPS / 8][256]. Each value in the array is the dot product of 8 consecutive coefficients with the 8 bits of one input byte, where each bit contributes +coefficient or -coefficient. Effectively you have pre-computed all of the dot products. The magic is in the coefficient generation. Here are two filter coefficient generator snippets: one for standard FIRs and one for DSF (which is the special DSD 8b input, decimate by 8 filter).

Code:

from scipy.signal import remez, freqz

# plot_response() and plot_impulse() are plotting helpers that are not shown here.
# Note: recent SciPy versions take fs= in remez(); older ones used Hz=.

def make_dsf_filter(fs, decimation_factor, transition_low, transition_high, numtaps, name, weight):
    # DSF: decimate-by-8 filter that consumes one byte (8 DSD bits) per table lookup
    assert decimation_factor == 8
    assert numtaps % decimation_factor == 0
    taps = remez(numtaps, [0, transition_low, transition_high, 0.5*fs], [1, 0], weight=weight, fs=fs)
    w, h = freqz(taps)
    plot_response(fs, w, h, name)
    #plot_impulse(taps)
    print(taps, len(taps))

    uname = name.upper()
    c_source = '#define ' + uname + '_NTAPS\t' + str(numtaps) + '\n'
    c_source += '#define ' + uname + '_NPHASES\t' + str(decimation_factor) + '\n'
    # Table shape is [numtaps / 8][256]; integer division keeps the C array size an int
    c_source += 'double ' + name + '_coeffs[' + str(numtaps // decimation_factor) + '][' + str(2**decimation_factor) + '] = {'
    for tap_idx in range(0, numtaps, decimation_factor):
        c_source += '{'
        for lookup_val in range(2**decimation_factor):
            # Pre-compute the dot product of 8 taps with the 8 bits of this byte value
            sumcoeffs = 0
            for bit in range(decimation_factor):
                #mask = 0b10000000 >> bit
                mask = 0b00000001 << bit
                if (mask & lookup_val):
                    sumcoeffs += taps[tap_idx + bit]
                else:
                    sumcoeffs -= taps[tap_idx + bit]
            c_source += str(sumcoeffs) + ", "
        c_source = c_source[:-2] #remove last comma-space
        c_source += '},\n'
    c_source = c_source[:-2] #remove last comma-newline
    c_source += '};\n\n'
    return c_source

def make_decimation_filter(fs, decimation_factor, transition_low, transition_high, numtaps, name, weight):
    # Standard decimating FIR: emits the taps as a flat C array
    assert numtaps % decimation_factor == 0
    taps = remez(numtaps, [0, transition_low, transition_high, 0.5*fs], [1, 0], weight=weight, fs=fs)
    uname = name.upper()
    c_source = '#define ' + uname + '_NTAPS\t' + str(numtaps) + '\n'
    c_source += '#define ' + uname + '_NPHASES\t' + str(decimation_factor) + '\n'
    c_source += 'double ' + name + '_coeffs[' + str(numtaps) + '] = {'
    w, h = freqz(taps)
    plot_response(fs, w, h, name)
    #plot_impulse(taps)
    #print(taps)
    for tap_idx in range(numtaps):
        c_source += str(taps[tap_idx]) + ", "
    c_source = c_source[:-2] #remove last comma-space
    c_source += '};\n\n'
    return c_source

And here are two example implementations using the coeffs:

Code:

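// Standard FIR: one output sample from n_taps delay line entries.
// The caller maintains the double-length delay line and its index.
static inline double poly_fir(
	unsigned n_taps,
	unsigned decimation_factor,
	double delayline[n_taps * 2],
	unsigned delayline_idx,
	double coeffs[n_taps])
{
	double sample = 0;
	for (unsigned i = 0; i < n_taps; i++) {
		sample += coeffs[i] * delayline[delayline_idx + i];
	}
	return sample;
}

// DSF FIR: each delay line entry is one byte holding 8 DSD bits.
// coeffs is the pre-computed table [n_taps / decimation_factor][256],
// so each byte costs one lookup instead of 8 multiply-accumulates.
// The caller writes `input` into the delay line before calling.
static inline double dsf_fir(
	unsigned n_taps,
	unsigned decimation_factor,
	unsigned char input,
	unsigned char delayline[n_taps / decimation_factor * 2],
	unsigned delayline_idx,
	double coeffs[n_taps / decimation_factor][1 << decimation_factor])
{
	double sample = 0;
	for (unsigned i = 0; i < n_taps; i += decimation_factor) {
		unsigned fir_table_index = i / decimation_factor;
		unsigned delayline_idx_new = fir_table_index + delayline_idx;
		double partial_sum = coeffs[fir_table_index][delayline[delayline_idx_new]];
		sample += partial_sum;
	}
	return sample;
}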

alexjaw wrote: PCM synchronous sample rate conversion. The primary limits are the processor usage and the number of cores, where 100 MHz is max for 1 core (I'm looking at Typical resource usage in the lib_src doc). From what I can see in the table, a "simple" integer sample rate conversion 384 > 192 should take max 80 MHz (since 96 > 48 takes 20 MHz and 192 > 96 takes 40 MHz). Does my naive calculation suggest that we could make some modifications in the lib_src code for our scenario and enable conversion by a factor of 2 from 352/384 > 176/192?

This is exactly right. A 192k->96k filter is exactly the same as a 384k->192k filter, only the first is called twice as often. So 80MHz is fine as an estimate. I actually measured it to be around 75MHz in the past. You could tweak up the block size to save a bit more.
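The scaling argument can be written down as a few lines of Python. The reference point (192k > 96k at roughly 40 MHz) is the figure quoted in this thread from the lib_src resource table, and the linear scaling with input rate is the assumption being made; the function name is illustrative only.

```python
CORE_BUDGET_MHZ = 100.0  # per-core budget quoted in the thread


def estimate_mhz(input_rate_hz, ref_input_rate_hz=192000, ref_mhz=40.0):
    """Same 2:1 filter, called proportionally more often at higher input rates."""
    return ref_mhz * input_rate_hz / ref_input_rate_hz


for rate in (96000, 192000, 352800, 384000):
    mhz = estimate_mhz(rate)
    print(rate, mhz, "fits" if mhz <= CORE_BUDGET_MHZ else "too slow")
```

This is only an estimate; block size and memory layout shift the real number, so it should be measured on the target.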

But in general, it sounds like you have enough oomph on chip to do this. You could reserve 100MHz threads for 2 x 384 -> 192 (or 352 -> 176) and 2 x 176.4 -> 192, plus one quite low performance thread to do the 2 x DSD -> 384/352. That's one tile. USB audio uses less than one tile. You will only be using a fraction of the memory too.

infiniteimprobability · Fri Mar 23, 2018 4:29 pm

Just realised that the 75MHz is for block size = 8. So 80MHz for blocks of 4 sounds right!

alexjaw · Tue Apr 17, 2018 3:22 pm

I am starting with the "simple" task of downsampling PCM using ssrc from lib_src (add to the Makefile: USED_MODULES += lib_src). This could become what was planned as AN00230, i.e. adding ssrc to the USB audio framework.
So, the mission is to downsample 352/384 > 192, but let's start with what should work out of the box: 176/192 > 96.

The code below compiles (using build app_usb_aud_xk_216_mc_2i10o10xxxxxx). It detects different PCM sample rates from USB (checked via LRCLK, running foobar, ASIO, TUSBAudio). However, xDAC_SDn is 0.

* I do not understand how to properly execute ssrc_init(...) within the framework.
* Moreover, I lack a proper understanding of where to intercept the audio stream in order to perform the sample rate conversion. In the code below it is in the same place as was used for the user_dsp example in the thread mentioned below (adding dsp functionality to the framework).

I have commented the code with thoughts and questions, and hope that we can sort this out and get an additional application example for the USB framework.

* I imagine that we need to intercept the audio stream between the decoupler and the audio driver, in a similar fashion as discussed in thread. In audio.xc we declare a global streaming chanend c_src_glob and perform init. However, there is probably more to be done in the init in order to set up ssrc with ssrc_init().

Code:

...
#include "print.h"

// Datapath to SRC tasks. We use a global channel end to avoid plumbing all through the USB audio stack
#include "user_src.h"
unsafe streaming chanend c_src_glob;
void src_init_chanend(streaming chanend c_src) {
    // todo: how/where do the ssrc_init(...)
    // Placed here produces error: ssrc_ctrl undeclared. But user_src.h is included.
    // ssrc_init(FS_CODE_192, FS_CODE_96, ssrc_ctrl, SSRC_CHANNELS_PER_INSTANCE, SSRC_N_IN_SAMPLES, OFF);
    unsafe {
           c_src_glob = (unsafe streaming chanend) c_src;
    }
}

static unsigned samplesOut[NUM_USB_CHAN_OUT];
...

* Also in audio.xc: streaming decoupler > ssrc > audiodriver. However, this part is a big question mark.

Code:

...
#else
        inuint(c_out);
#endif
        unsafe{
            /* SSRC.
             * Is this correct? ssrc needs min 4 samples. We have stereo from usb, i.e. sending 8 samples to
             * the task. Performing decimation by 2, so I expect only 4 samples back. But I don't quite understand how I
             * should express it in the framework in order to correctly match a fixed samplerate of 192k for audiodriver (i2s)
             * and still allow for higher samplerates from the USB host?
             * lib_src user guide page 5 discusses the ratio between streaming in and out of ssrc. In the framework that would
             * correspond to different word clocks for decoupler and audiodriver, respectively. Is it possible? Have I completely
             * misunderstood where ssrc should be placed in the audio path of the framework?!
             * The final code here will only run when samplerate is above the fixed
             * samplerate of the DSP. At all other samplerates, the DSP can perform the conversions
             */
            // if (samplerate > 192000) {
                   // do something with the clock?
                   // set samplerate to 192000 and downsample with ssrc
                   for(int i=0; i<(SSRC_N_IN_SAMPLES*SSRC_CHANNELS_PER_INSTANCE); i++) c_src_glob <: samplesOut[i];   // 4x2=8
                   for(int i=0; i<(SSRC_N_OUT_SAMPLES*SSRC_CHANNELS_PER_INSTANCE); i++) c_src_glob :> samplesOut[i];  // 2*2=4
            // }
        }

#if NUM_USB_CHAN_IN > 0
...

* user_main.h should be the same, it defines the streaming channels and places it on a tile. Since we intend to run in sync we do not need to use fancy buffering as in AN00231 and asrc, or am I wrong?

Code:

#ifndef _USER_MAIN_H_
#define _USER_MAIN_H_
#include "customdefines.h"
#include "user_src.h"

#define USER_MAIN_DECLARATIONS   \
        streaming chan chan_src; \

#define USER_MAIN_CORES \
            on tile[AUDIO_IO_TILE]: user_src(chan_src); \
            on tile[AUDIO_IO_TILE]: src_init_chanend(chan_src);

#endif /* _USER_MAIN_H_ */

* user_src.h and user_src.xc (with user_main.h, placed in /src/extensions):

Code:

#ifndef __USER_SRC_H__
#define __USER_SRC_H__
#ifdef __XC__
#include <xs1.h>
#include <src.h>

#define SSRC_CHANNELS_PER_INSTANCE    2  // USB stereo, output from ssrc is composed of left and right samples, interleaved
#define SSRC_N_IN_SAMPLES             4  // Min value of samples for ssrc
#define SSRC_N_OUT_CHANNELS           SSRC_CHANNELS_PER_INSTANCE  // Still stereo
#define SSRC_N_OUT_SAMPLES            2  // ??? Trying to do decimation by 2, i.e 4/2 = 2...
//SSRC_STACK_LENGTH_MULT is determined in ssrc.h

void user_src(streaming chanend c_src);
void src_init_chanend(streaming chanend c_src);

#endif
#endif /* __USER_SRC_H__ */

Code:

#include <xs1.h>
#include "user_src.h"

// lib_src user guide, XM010383, states that following structures must be used for ssrc
ssrc_state_t ssrc_state[SSRC_CHANNELS_PER_INSTANCE];
// todo: Getting error: size of array not constant
//int ssrc_stack[SSRC_CHANNELS_PER_INSTANCE][SSRC_STACK_LENGTH_MULT * SSRC_N_IN_SAMPLES];
int ssrc_stack[2][2*4*4];
ssrc_ctrl_t ssrc_ctrl[SSRC_CHANNELS_PER_INSTANCE];

// todo: src_init_chanend is performed in audio.xc, but shouldn't it be here?
/*
void src_init_chanend(streaming chanend c_src) {
    ssrc_init(FS_CODE_192, FS_CODE_96, ssrc_ctrl, SSRC_CHANNELS_PER_INSTANCE, SSRC_N_IN_SAMPLES, OFF);
    unsafe {
           c_src_glob = (unsafe streaming chanend) c_src;
    }
}
 */

void user_src(streaming chanend c_src) {
    int samps_src_pre[SSRC_N_IN_SAMPLES*SSRC_CHANNELS_PER_INSTANCE] = {0};
    int samps_src_post[SSRC_N_OUT_SAMPLES*SSRC_CHANNELS_PER_INSTANCE] = {0};
    int n_out_ssrc = 0;  // return value from ssrc_process, expecting 8/2 = 4 for each call
                         // left[0], right[0], left[1], right[1], oldest first

    while(1) {
      //Sample exchange
      for(int i = 0; i < SSRC_N_IN_SAMPLES; i++) c_src :> samps_src_pre[i];
      for(int i = 0; i < SSRC_N_OUT_SAMPLES; i++) c_src <: samps_src_post[i];

      // Process data
      // Sending interleaved data in and getting sample rate converted interleaved data out
      n_out_ssrc = ssrc_process(samps_src_pre, samps_src_post, ssrc_ctrl);
    }
}

infiniteimprobability · Mon Apr 23, 2018 12:15 pm

There is an example of ssrc usage in the tests:

https://github.com/xmos/lib_src/blob/ma ... rc_test.xc

This should hopefully show you what to declare and how to setup/call init etc.

I did have a prototype version of USB audio running with SSRC which was going to be the basis of an appnote (AN00230) which never was. Let me see if I can find it. Binary attached for now which works on the xCore200 multichannel board. Press button 1 to change I2S speed. (Host speed controlled by host as normal). If you run with --xscope you can see reported sample rates.

But basically you'll need to insert a manager task between the USB buffer (decouple) and audio because on one side it talks the host (USB) rate and on the other side it talks the audio (I2S) rate. Also USB audio works sample by sample whereas SRC is block based (minimum size 4).
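The role of such a manager task can be modelled in a few lines of Python. This is a behavioural sketch only: the class and function names are hypothetical, the "SRC" is a placeholder that averages adjacent frames (not a real anti-alias filter and not the lib_src ssrc), and a real XC implementation would use channels and handle underflow properly. It shows the two jobs described above: collecting single USB frames into blocks of 4, and handing results to a consumer that runs at a different rate.

```python
from collections import deque

BLOCK = 4      # minimum SRC block size mentioned above (frames per block)
CHANNELS = 2   # stereo


def decimate_by_2_placeholder(block):
    """Stand-in for the SRC call: averages adjacent frames (illustration only)."""
    out = []
    for i in range(0, len(block), 2):
        a, b = block[i], block[i + 1]
        out.append(tuple((x + y) // 2 for x, y in zip(a, b)))
    return out


class SrcManager:
    """Sits between the USB side (host rate) and the I2S side (audio rate)."""

    def __init__(self):
        self.pending = []        # frames collected at the host rate
        self.out_fifo = deque()  # converted frames waiting for I2S

    def push_from_usb(self, frame):
        """Called once per USB frame; runs the SRC when a block is full."""
        self.pending.append(frame)
        if len(self.pending) == BLOCK:
            self.out_fifo.extend(decimate_by_2_placeholder(self.pending))
            self.pending = []

    def pull_for_i2s(self):
        """Called once per I2S frame; returns silence on underflow."""
        return self.out_fifo.popleft() if self.out_fifo else (0,) * CHANNELS


mgr = SrcManager()
for n in range(8):                      # 8 frames in at the host rate
    mgr.push_from_usb((n * 10, -n * 10))
print([mgr.pull_for_i2s() for _ in range(5)])
# -> [(5, -5), (25, -25), (45, -45), (65, -65), (0, 0)]
```

Eight input frames yield four output frames, and the fifth pull underflows, which is the situation the buffering and feedback logic in the real application has to prevent.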

alexjaw · Thu Apr 26, 2018 5:05 pm

Have tested the binary and it works very well, terrific job! Any luck finding the code for the binary? Will dive into the example you mentioned.

infiniteimprobability · Sat Apr 28, 2018 2:20 pm

Here you go.. I recall now that on top of the buffering work I had to do some modulo maths to handle the explicit feedback calc (in buffer.xc) as well as add signalling from I2S back to the buffer thread so it knew what the MCLK used is.
This was all written before lib_src was a thing, so there may be a few mods to make, but the guts are all there. There is also a risk that the channel protocol differs from the 6.15.2 USB audio release - you'll need to check this. I think you can handle this though. I'm pretty sure everything is guarded by SSRC_DEMO, so the changes should be apparent by searching for that text.

alexjaw · Fri May 25, 2018 11:07 am

Thanks for the code!

I have been occupied with other things, but am now integrating the ssrc_usb_audio code with the latest usb_audio6.15. I think I have made the necessary changes for the dsp_manager (using lib_src from the xmos github repo) and am now merging the code for the usb_buffer.

In the latest usb_buffer.xc there is a larger chunk of code guarded with #if 0, and a comment says that it is the original feedback implementation (as if it has been replaced with something newer). In this code you have an SSRC_DEMO section which recalculates some variable (and handles the LEDs). I am not sure how to merge the code guarded with SSRC_DEMO into the newer usb_buffer.xc, since the relevant section is no longer used (guarded with #if 0). I tested placing the SSRC_DEMO code outside the #if 0 block. It compiles, but I don't get any I2S samples. However, I can change the LRCLK (and LEDs).

The code is on github; the merge result is in branch add_dsp_manager. It is the whole app, using lib_src-master from github. The ssrc_demo code is merged with usb_audio6.15, except for the #if 0 part of usb_buffer.xc.
