Hi,
I have a problem with my custom board and the AN00218_app_hires_DAS_fixed example.
In my case, 16 microphones are connected to tile 0 and work fine: I can change the delays and listen to the output audio. However, I want to increase the number of microphones to 24 or, better, 32.
I have built my own version of the xCORE Microphone Array with 32 microphones: 16 mics are connected to tile 0 and 16 mics are connected to tile 1.
I use the XEF216-512-TQ128-C20.
The hardware is OK; the problem is probably in my software. I do not know how to run the decimators and delay line on tile 1 and send the output data to the hires_DAS_fixed function.
Can anybody help me with this problem?
On tile 0 I have the following in the main function (it works for the 16 mics on tile 0):
par {
mic_array_pdm_rx(p_pdm_mics, c_pdm_to_hires[0], c_pdm_to_hires[1]);
mic_array_pdm_rx(p_pdm_mics2, c_pdm_to_hires[2], c_pdm_to_hires[3]);
mic_array_hires_delay(c_pdm_to_hires, c_hires_to_dec, 4, c_cmd);
mic_array_decimate_to_pcm_4ch(c_hires_to_dec[0], c_ds_output[0], MIC_ARRAY_NO_INTERNAL_CHANS);
mic_array_decimate_to_pcm_4ch(c_hires_to_dec[1], c_ds_output[1], MIC_ARRAY_NO_INTERNAL_CHANS);
mic_array_decimate_to_pcm_4ch(c_hires_to_dec[2], c_ds_output[2], MIC_ARRAY_NO_INTERNAL_CHANS);
mic_array_decimate_to_pcm_4ch(c_hires_to_dec[3], c_ds_output[3], MIC_ARRAY_NO_INTERNAL_CHANS);
hires_DAS_fixed(c_ds_output, c_cmd, c_audio);
}
Do I need to run the same code for tile 1? What about the hires_DAS_fixed function? Do I need to have it on each tile?
Thank you in advance.
BR
Tom
Microphone array with more than 16 microphones
-
- Member
- Posts: 11
- Joined: Fri Feb 10, 2017 11:33 am
-
Verified
- XCore Legend
- Posts: 1156
- Joined: Thu May 27, 2010 10:08 am
Yes, definitely. You can split your par into:
Code: Select all
par{
on tile[0]: par{
...
}
on tile[1]: par{
...
}
}
It's as easy as that.
Engineer at XMOS
-
- Member
- Posts: 11
- Joined: Fri Feb 10, 2017 11:33 am
OK, but what about the hires_DAS_fixed function? How should I change it so that the mic data from tile 0 and tile 1 are summed together?
That function contains the decimator configuration:
mic_array_decimator_conf_common_t dcc = {0, 1, 0, 0, DECIMATION_FACTOR,
g_third_stage_div_2_fir, 0, FIR_COMPENSATOR_DIV_2,
DECIMATOR_NO_FRAME_OVERLAP, FRAME_BUFFER_COUNT};
mic_array_decimator_config_t dc[4] = {
{&dcc, data[0], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4},
{&dcc, data[4], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4},
{&dcc, data[8], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4},
{&dcc, data[12], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4}
//{&dcc, data[16], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4},
//{&dcc, data[20], {INT_MAX, INT_MAX, INT_MAX, INT_MAX}, 4}
};
and the time-domain frame:
mic_array_frame_time_domain * current =
mic_array_get_next_time_domain_frame(c_ds_output, DECIMATOR_COUNT, buffer, audio, dc);
Finally, we sum all channels into one:
int output = 0;
for(unsigned i=0;i<MIC_ARRAY_NUM_MICS;i++)
output += (current->data[i][0]>>3);
output = ((int64_t)output * (int64_t)gain)>>16;
c_audio <: output;
c_audio <: output;
What about this part? How do I pass the data from tile 1 to that function?
BR
TOM
-
Verified
- XCore Legend
- Posts: 1156
- Joined: Thu May 27, 2010 10:08 am
Actually, thinking about this further, you need to consider which parts of the system need to share memory. The decimators (mic_array_decimate_to_pcm_4ch) and app (hires_DAS_fixed() ) that receives the data via mic_array_get_next_time_domain_frame need to be on the same tile, because they exchange pointers to shared memory.
The channels between mic_array_pdm_rx, mic_array_hires_delay and mic_array_decimate_to_pcm_4ch allow the end tasks to be on different tiles.
So, in your case, you will need to have at least the 4 decimators and the receiving app on the same tile. Since you need 9 threads in total (which is more than the 8 available per tile):
- 2 x pdm_rx
- 2 x hires delay
- 4 x decimator
- 1 x app
... you will need to put 1 x pdm_rx (and its hires delay, might as well) on a different tile.
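The thread count above can be checked with a few lines of plain C. This is only a sketch of the arithmetic, not XC and not part of lib_mic_array: the `tile_tasks_t` structure and the helper names are hypothetical, and the limit of 8 cores per tile is the figure quoted in this thread.

```c
#include <stdio.h>

#define CORES_PER_TILE 8  /* logical cores per tile, as quoted in this thread */

/* Hypothetical bookkeeping structure: how many tasks of each kind sit on a tile. */
typedef struct {
    unsigned pdm_rx;
    unsigned hires_delay;
    unsigned decimators;
    unsigned app;
} tile_tasks_t;

/* Total number of logical cores the placement would need on this tile. */
static unsigned tasks_used(const tile_tasks_t *t)
{
    return t->pdm_rx + t->hires_delay + t->decimators + t->app;
}

/* Returns 1 if the placement fits on one tile, 0 otherwise. */
static int fits_on_tile(const tile_tasks_t *t)
{
    return tasks_used(t) <= CORES_PER_TILE;
}

/* Prints a one-line report for a tile placement. */
static void report(const char *name, const tile_tasks_t *t)
{
    printf("%s: %u of %u cores, %s\n", name, tasks_used(t),
           (unsigned)CORES_PER_TILE, fits_on_tile(t) ? "OK" : "FAILED");
}
```

Calling `report` with everything on one tile (2, 2, 4, 1) reports 9 cores and fails, while moving one pdm_rx and one hires delay to the other tile leaves 7 tasks on the first tile and 2 on the second.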
Your app will set up and receive two pairs of decimators, and so your main inner loop could be something like:
Code: Select all
mic_array_frame_time_domain * current_0 =
mic_array_get_next_time_domain_frame(c_ds_output_0, DECIMATOR_COUNT_0, buffer_0, audio_0, dc_0);
mic_array_frame_time_domain * current_1 =
mic_array_get_next_time_domain_frame(c_ds_output_1, DECIMATOR_COUNT_1, buffer_1, audio_1, dc_1);
and then, as you say, something like:
Code: Select all
int output = 0;
for(unsigned i=0;i<MIC_ARRAY_NUM_MICS_0;i++)
output += (current_0->data[i][0]>>3);
for(unsigned i=0;i<MIC_ARRAY_NUM_MICS_1;i++)
output += (current_1->data[i][0]>>3);
output = ((int64_t)output * (int64_t)gain)>>16;
to sum the PCM data from each of the mics.
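As a rough illustration of that two-group summation, here is a plain C sketch that can be run on a PC. It is not XC and does not use lib_mic_array: `frame_t` is a hypothetical stand-in for mic_array_frame_time_domain, the mic counts and frame length are assumptions, and the gain is assumed to be a Q16 value (65536 = unity). One deliberate difference from the snippet above: the accumulator is 64 bits wide, because `>>3` only leaves headroom for 8 full-scale channels and 32 full-scale channels could overflow a 32-bit sum.

```c
#include <stdint.h>

#define NUM_MICS_0 16   /* assumed: channels delivered by the first decimator group  */
#define NUM_MICS_1 16   /* assumed: channels delivered by the second decimator group */
#define FRAME_LEN  1    /* assumed: one sample per channel per frame */

/* Hypothetical stand-in for mic_array_frame_time_domain. */
typedef struct {
    int32_t data[16][FRAME_LEN];
} frame_t;

/* Test helper: set every sample of every channel to the same value. */
static void fill_frame(frame_t *f, int32_t value)
{
    for (unsigned ch = 0; ch < 16; ch++)
        for (unsigned s = 0; s < FRAME_LEN; s++)
            f->data[ch][s] = value;
}

/* Sum sample 0 of every channel in both frames, then apply a Q16 gain.
   Each sample is pre-scaled by >>3 as in the original example; the sum
   is kept in 64 bits so that 32 channels cannot overflow it. */
static int64_t sum_frames(const frame_t *f0, const frame_t *f1, int32_t gain_q16)
{
    int64_t output = 0;
    for (unsigned i = 0; i < NUM_MICS_0; i++)
        output += f0->data[i][0] >> 3;
    for (unsigned i = 0; i < NUM_MICS_1; i++)
        output += f1->data[i][0] >> 3;
    return (output * (int64_t)gain_q16) >> 16;
}
```

With both frames filled with the value 8 and a gain of 65536, each channel contributes 1, so `sum_frames` returns 32; halving the gain to 32768 returns 16.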
Engineer at XMOS
-
- Member
- Posts: 11
- Joined: Fri Feb 10, 2017 11:33 am
Hi,
Your comments are correct, but when I try to implement this I get:
Constraints checks FAILED
Cores available: 8, used: 10, FAILED
I tried to fix this, but with no success. I implemented 1 x pdm_rx on tile 1 and the rest on tile 0.
It looks like it is impossible to have more than 16 microphones on my chip, or at least the library functions would need to be modified.
What do you think?
BR
Tom
-
- Member
- Posts: 11
- Joined: Fri Feb 10, 2017 11:33 am
Yes, but even with the decimators on tile 0 and the rest of the additional channel blocks on tile 1, the compiler cannot fit this into 8 cores.
BR
Tom
-
Verified
- XCore Legend
- Posts: 1156
- Joined: Thu May 27, 2010 10:08 am
Hi Tom,
I have read your previous posts a little more carefully and thought further about this. If you need 32 channels of PDM mic and hires delay and audio output, then this is certainly a squeeze. Having 24 is definitely no problem, but 32 may be possible with some optimisation work.
Firstly, the requirement that mic_array_get_next_time_domain_frame shares memory with the decimators (of which there are 8, which is the number of cores on a tile) means that you cannot have a single task/app that collects all of the delayed PCM samples. There is also a restriction of 4 streaming channels across the tiles. So, when you have:
4 x PDM
2 x Delay
8 x Decimators
16 - (4 + 2 + 8) = 2 cores left.
This does not leave much for the app (collecting and summing + controlling delays) and transport of samples back off the chip (I2S). A scheme which could work is this:

You can see that there are two apps. Each collects 16 samples from the decimators, and App1 forwards its 16 samples to App0, so that App0 has all of them and can sum them.
So somehow you need to fit I2S, the I2S handler and the forwarding logic into two cores. I think this is possible: basically one app (probably App0) would be I2S, with the I2S handler callbacks containing the logic, and the other would be a loop which collects and forwards samples. It may actually make sense to put the delay control logic in App1, to share the load.
The key here is that the I2S handler is "distributable", which means the compiler effectively inlines the handler callbacks so that they run on the I2S core. It means you have to keep the handlers shortish (so you don't break timing), but even at 48 kHz you have many microseconds available to do the logic in the I2S handler callbacks without missing the next I2S event. This way you have a single core which runs I2S, talks to the decimators and receives the forwarded samples.
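To put a number on "many microseconds", the time between I2S frames can be worked out as below. This is a plain C sketch of the arithmetic only. The per-core instruction rate is deliberately left as a caller-supplied parameter, because it depends on the device and on how many cores are active; take it from the datasheet rather than from this example.

```c
#include <stdint.h>

/* Time between consecutive I2S frames, in nanoseconds (rounded down). */
static uint32_t sample_period_ns(uint32_t sample_rate_hz)
{
    return 1000000000u / sample_rate_hz;
}

/* Upper bound on the instructions one core can issue per sample period.
   core_mips is an ASSUMPTION supplied by the caller (millions of
   instructions per second available to that core), not a value taken
   from the thread. The handler callbacks must use only part of this. */
static uint32_t instructions_per_sample(uint32_t sample_rate_hz, uint32_t core_mips)
{
    return (uint32_t)(((uint64_t)core_mips * 1000000u) / sample_rate_hz);
}
```

At 48 kHz, `sample_period_ns(48000)` gives 20833 ns, i.e. a little under 21 microseconds per frame. If a core had, say, 60 MIPS available (an illustrative figure only), that would be at most 1250 instructions per frame for collecting, summing and I2S handling together.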
I'd say there are several days' work to get this going (it's a fairly advanced bit of optimisation), and there is a small risk it might not work (feasibility has not been proven), but my sense from previous projects is that it is technically possible.
I guess it depends how many units you are producing and whether your development time is worth the saving in BOM and squeezing it all into a single device. Laying down 2 devices (connected via a link) is the alternative.
Engineer at XMOS