Microphone array with more than 16 microphones

inowatom
Member
Posts: 11
Joined: Fri Feb 10, 2017 11:33 am

Post by inowatom

Hi,

I have a problem with my custom board and the AN00218_app_hires_DAS_fixed example.
In my case, 16 microphones are connected to tile 0 and work fine: I can change the delays and listen to the output audio. Now I want to increase the number of microphones to 24, or better, 32.

I have built my own 32-microphone version of the xCORE Microphone Array: 16 mics are connected to tile 0 and 16 mics are connected to tile 1.
I use an XEF216-512-TQ128-C20.

The hardware is OK; the problem is probably in my software. I do not know how to run the decimators and the delay line on tile 1 and send their output data to the hires_DAS_fixed function.

Can anybody help me with this problem?

In my main function on tile 0 I have the following (it works for the 16 mics on tile 0):

par {
    mic_array_pdm_rx(p_pdm_mics, c_pdm_to_hires[0], c_pdm_to_hires[1]);
    mic_array_pdm_rx(p_pdm_mics2, c_pdm_to_hires[2], c_pdm_to_hires[3]);

    mic_array_hires_delay(c_pdm_to_hires, c_hires_to_dec, 4, c_cmd);

    mic_array_decimate_to_pcm_4ch(c_hires_to_dec[0], c_ds_output[0], MIC_ARRAY_NO_INTERNAL_CHANS);
    mic_array_decimate_to_pcm_4ch(c_hires_to_dec[1], c_ds_output[1], MIC_ARRAY_NO_INTERNAL_CHANS);
    mic_array_decimate_to_pcm_4ch(c_hires_to_dec[2], c_ds_output[2], MIC_ARRAY_NO_INTERNAL_CHANS);
    mic_array_decimate_to_pcm_4ch(c_hires_to_dec[3], c_ds_output[3], MIC_ARRAY_NO_INTERNAL_CHANS);

    hires_DAS_fixed(c_ds_output, c_cmd, c_audio);
}

Do I need to run the same code on tile 1? What about the hires_DAS_fixed function? Does it have to run on each tile?

Thank You in advance.

BR
Tom


infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability

Do I need to run the same code on tile 1? What about the hires_DAS_fixed function? Does it have to run on each tile?
Yes, definitely. You can split your par into:

Code:

par{

on tile[0]: par{
  ...
  }

on tile[1]: par{
  ...
  }
}
...with two top-level par blocks, as above.
It's as easy as that.
Post by inowatom

OK, but what about the hires_DAS_fixed function? How do I change it so that the mic data from tile 0 and tile 1 are summed together?
That function contains the decimator configuration:

mic_array_decimator_conf_common_t dcc = {0, 1, 0, 0, DECIMATION_FACTOR,
        g_third_stage_div_2_fir, 0, FIR_COMPENSATOR_DIV_2,
        DECIMATOR_NO_FRAME_OVERLAP, FRAME_BUFFER_COUNT};
mic_array_decimator_config_t dc[4] = {
    {&dcc, data[0],  {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4},
    {&dcc, data[4],  {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4},
    {&dcc, data[8],  {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4},
    {&dcc, data[12], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4}
    //{&dcc, data[16], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4},
    //{&dcc, data[20], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4}
};

and the time-domain frame:
mic_array_frame_time_domain * current =
    mic_array_get_next_time_domain_frame(c_ds_output, DECIMATOR_COUNT, buffer, audio, dc);

Finally, we sum all channels into one:

int output = 0;
for(unsigned i=0;i<MIC_ARRAY_NUM_MICS;i++)
    output += (current->data[i][0]>>3);   // newest sample of mic i, with 3 bits of headroom
output = ((int64_t)output * (int64_t)gain)>>16;   // apply the gain (Q16 fixed point)

c_audio <: output;
c_audio <: output;
How do I pass the data from tile 1 to that function?

BR
TOM
Post by infiniteimprobability

Actually, thinking about this further, you need to consider which parts of the system need to share memory. The decimators (mic_array_decimate_to_pcm_4ch) and the app (hires_DAS_fixed()) that receives the data via mic_array_get_next_time_domain_frame need to be on the same tile, because they exchange pointers to shared memory.

The channels between mic_array_pdm_rx, mic_array_hires_delay and mic_array_decimate_to_pcm_4ch allow the tasks at either end to be on different tiles.

So, in your case, you will need to have at least the 4 decimators and the receiving app on the same tile. Since you need 9 threads in total (which is more than the 8 available per tile):

- 2 x pdm_rx
- 2 x hires delay
- 4 x decimator
- 1 x app

... you will need to put 1 x pdm_rx (and a hires delay too, you might as well) on a different tile.

Your app will set up and receive from two pairs of decimators, so your main inner loop could be something like:

Code:

mic_array_frame_time_domain * current_0 =
    mic_array_get_next_time_domain_frame(c_ds_output_0, DECIMATOR_COUNT_0, buffer_0, audio_0, dc_0);
mic_array_frame_time_domain * current_1 =
    mic_array_get_next_time_domain_frame(c_ds_output_1, DECIMATOR_COUNT_1, buffer_1, audio_1, dc_1);
and then, as you say, something like:

Code:

int output = 0;
for(unsigned i=0;i<MIC_ARRAY_NUM_MICS_0;i++)
    output += (current_0->data[i][0]>>3);
for(unsigned i=0;i<MIC_ARRAY_NUM_MICS_1;i++)
    output += (current_1->data[i][0]>>3);
output = ((int64_t)output * (int64_t)gain)>>16;

to sum the PCM data from each of the mics.
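To check the summing step off-target, here is a minimal C sketch of the same delay-and-sum arithmetic. It assumes each app has already copied the newest, delayed sample of each of its mics (current_0->data[i][0] and current_1->data[i][0] above) into a plain array; das_sum and that array layout are hypothetical, not part of lib_mic_array. One thing worth noting: the >>3 pre-shift leaves 3 bits of headroom, which only guarantees that an int32 sum cannot overflow for up to 8 full-scale channels, so this sketch accumulates in int64_t and saturates at the end instead.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit intermediate to the int32_t range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Hypothetical helper: delay-and-sum of two mic groups (e.g. the mics read
 * via current_0 and via current_1). Each element is the newest, already
 * delayed PCM sample of one mic. gain_q16 is a Q16 gain (65536 == 1.0),
 * assumed to be in [0, 1 << 24] so the product cannot overflow int64_t.
 * Right shifts of negative values are assumed to be arithmetic, as on
 * GCC and Clang. */
int32_t das_sum(const int32_t *mics0, unsigned n0,
                const int32_t *mics1, unsigned n1,
                int32_t gain_q16)
{
    assert(gain_q16 >= 0 && gain_q16 <= (1 << 24));

    int64_t acc = 0;                      /* wide accumulator: no overflow for 32 channels */
    for (unsigned i = 0; i < n0; i++)
        acc += mics0[i] >> 3;             /* same 3-bit pre-shift as the forum code */
    for (unsigned i = 0; i < n1; i++)
        acc += mics1[i] >> 3;

    return sat32((acc * gain_q16) >> 16); /* apply the Q16 gain, then clamp */
}
```

For example, with all 32 inputs equal to 8 and a unity gain of 65536, each input contributes 8 >> 3 = 1 and das_sum returns 32.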
Post by inowatom

Hi,

Your comments are correct, but when I try to implement this I get:

Constraints checks FAILED
Cores available: 8, used: 10, FAILED

I tried to fix this, but with no success. I implemented 1 x pdm_rx on tile 1 and the rest on tile 0.

It looks like it is impossible to have more than 16 microphones on my chip, or at least the library functions would need to be modified.
What do you think?

BR
Tom
Post by infiniteimprobability

What do you think?
I think it is possible (even with the hi-res delay) and you are pretty much there. Just move the delays across and you are done. Something like this:

[Image: suggested placement of the tasks across tile 0 and tile 1]
Post by inowatom

Yes, but even with the decimators on tile 0 and the rest of the additional channel blocks on tile 1, the compiler cannot fit this into 8 cores.

BR

Tom
Post by infiniteimprobability

Hi Tom,
I have read your previous posts a little more carefully and thought further about this. If you need 32 channels of PDM mic and hires delay and audio output, then this is certainly a squeeze. Having 24 is definitely no problem, but 32 may be possible with some optimisation work.

Firstly, because mic_array_get_next_time_domain_frame needs to share memory with the decimators (of which there are 8, which is the number of cores on a tile), you cannot have a single task/app that collects all of the delayed PCM samples: the 8 decimators plus that app would need 9 cores on one tile. There is also a restriction of 4 streaming channels across the tiles. So, when you have:

- 4 x PDM
- 2 x Delay
- 8 x Decimators

that leaves 16 - (4 + 2 + 8) = 2 cores (2 tiles x 8 cores = 16 in total).
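The core count above can be reproduced with a small C sketch. It assumes the ratios used in this thread (one mic_array_pdm_rx per 8 mics on an 8-bit port, one mic_array_decimate_to_pcm_4ch per 4 mics) and takes the number of hires delay tasks as a parameter; budget_for and cores_left are hypothetical helpers, not library functions.

```c
#include <assert.h>

#define CORES_PER_TILE 8u   /* logical cores per xCORE-200 tile */
#define NUM_TILES      2u   /* XEF216: two tiles */

/* Hypothetical task budget using the counts from this thread. */
typedef struct {
    unsigned pdm_rx;       /* one mic_array_pdm_rx per 8 mics */
    unsigned hires_delay;  /* passed in: the thread uses 2 */
    unsigned decimators;   /* one 4-channel decimator per 4 mics */
} mic_task_budget;

static mic_task_budget budget_for(unsigned num_mics, unsigned hires_delays)
{
    mic_task_budget b;
    b.pdm_rx      = (num_mics + 7u) / 8u;   /* round up to whole 8-bit ports */
    b.hires_delay = hires_delays;
    b.decimators  = (num_mics + 3u) / 4u;   /* round up to whole 4-channel decimators */
    return b;
}

/* Cores left on the whole device for the apps, I2S, control, etc. */
static int cores_left(mic_task_budget b)
{
    int total = (int)(CORES_PER_TILE * NUM_TILES);
    return total - (int)(b.pdm_rx + b.hires_delay + b.decimators);
}
```

For 32 mics and 2 delay tasks this gives 4 + 2 + 8 = 14 tasks and 2 spare cores; for 24 mics it gives 3 + 2 + 6 = 11 tasks and 5 spare cores, which is consistent with 24 channels being no problem. Note that it only counts cores for the device as a whole; it does not check the per-tile rule that the decimators must share a tile with the app that reads their frames.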

This does not leave much for the app (collecting and summing + controlling delays) and transport of samples back off the chip (I2S). A scheme which could work is this:
[Image: proposed scheme with two apps, App0 and App1]

You can see you have two apps. Each collects 16 samples from its decimators, and App1 forwards its 16 samples to App0, so App0 has all of them and can sum them.

So somehow you need to fit I2S, the I2S handler and the forwarding logic into two cores. I think this is possible: basically one core (probably App0) would run I2S, with the I2S handler callbacks containing the logic, and the other would be a loop that collects and forwards samples. It may actually make sense to put the delay control logic in App1, to share the load.

The key here is that the I2S handler is "distributable", which means the compiler effectively inlines the handler callbacks into the core running I2S. This means your handlers have to be fairly short (so you don't break timing), but even at 48 kHz you have many microseconds per sample (one sample period is about 20.8 µs) to do the logic in the I2S handler callbacks without missing the next I2S event. This way you have a single core that runs I2S, talks to the decimators and receives the forwarded samples.
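The timing argument can be put into rough numbers with a little C arithmetic. This sketch assumes a 500 MHz tile clock and the xCORE-200 scheduling rule that each active logical core gets clock / max(5, active cores) cycles per second; both are assumptions to check against the XEF216 datasheet, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each active logical core gets tile_clock / max(5, n) cycles/s. */
static uint32_t core_hz(uint32_t tile_clock_hz, unsigned active_cores)
{
    unsigned n = active_cores < 5u ? 5u : active_cores;
    return tile_clock_hz / n;
}

/* Length of one audio sample period in nanoseconds (truncated). */
static uint32_t sample_period_ns(uint32_t fs_hz)
{
    assert(fs_hz > 0);
    return 1000000000u / fs_hz;
}

/* Cycles one core gets per sample period (truncated). */
static uint32_t cycles_per_sample(uint32_t tile_clock_hz, unsigned active_cores,
                                  uint32_t fs_hz)
{
    assert(fs_hz > 0);
    return core_hz(tile_clock_hz, active_cores) / fs_hz;
}
```

Under these assumptions, with all 8 cores on the tile busy each core runs at 62.5 MHz, which gives about 1302 cycles per 48 kHz sample period. The I2S task and its inlined handler callbacks have to share that budget, which is why the handlers need to stay short.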

I'd say there are several days of work to get this going (it's a fairly advanced bit of optimisation), and there is a small risk that it might not work (feasibility hasn't been proven), but my sense from previous projects is that it is technically possible.

I guess it depends on how many units you are producing and whether your development time is worth the BOM saving from squeezing it all into a single device. Laying down two devices (connected via a link) is the alternative.