DJ kit @ 384 kHz

Technical discussions related to any XMOS development kit or reference design, e.g. XK-1A, sliceKIT, etc.
jjlm98
Member++
Posts: 31
Joined: Tue Aug 26, 2014 11:00 pm

DJ kit @ 384 kHz

Post by jjlm98 »

Hi there - I'm looking for a little help in extending the DJ kit reference design to 384 kHz. We have replicated the reference design for an application that does not use the 192-kHz DACs on the DJ kit platform.

I've changed MAX_FREQ in customdefines.h to 384000, and I'm operating the XMOS device as a slave (CODEC_MASTER = 1), where I provide the 384-kHz word clock and a 2*32*384k = 24.576-MHz bit clock. I also provide a 24.576-MHz master clock (MCLK_48 in customdefines.h is defined as 512*48000).
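For reference, here is a minimal sketch of those customdefines.h settings (the names and values are exactly as described above; the rest of the file is unchanged):

Code: Select all

/* Sketch of the settings described above */
#define MAX_FREQ        384000           /* new maximum sample rate */
#define CODEC_MASTER    1                /* XMOS is I2S slave; LRCK/BCLK driven externally */
#define MCLK_48         (512 * 48000)    /* 24.576 MHz master clock for the 48k family */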

As an initial test, I master 2-channel 192-kHz audio (bit clock = 12.288 MHz) from the PC to the XMOS device, with the right channel muted. The digital data looks as expected and is shown in 192k_good.png. Next, I master 2-channel 384-kHz audio (bit clock = 24.576 MHz) in the same fashion. The master clock is 24.576 MHz in both the 192-kHz and 384-kHz cases. However, I notice the following problems in the 384-kHz case:

1) The samples appear to be misaligned in time relative to the frame (384k_bad1.png)
2) Zooming out, the XMOS device appears to under-run half of the time (data bursts with a 50/50 "duty cycle"), as if it were only being sourced at 192 kHz (384k_bad2.png)

Next, I reduce the playback sample rate in the OS to 192 kHz, then reduce my clock rates (driven to the XMOS device) back to 192 kHz. Now there is a sample for every frame, but the samples still appear misaligned in time in the same fashion as the 384-kHz case. If I temporarily change the sample rate in the OS to something else (176.4 kHz) and back to 192 kHz, sample alignment is restored as shown in the original scope capture. Resetting the XMOS device via the hardware pin also restores sample alignment. However, if I reverse the order such that the OS sample rate is changed after my clocks are slowed down, sample alignment is immediate (no need to "toggle" sample rate in the OS).

Is there anything in the setup I have described that would prevent the design from working properly at 384 kHz? Are there any implications with both bit and master clocks operating at the same frequency?

Thanks in advance.
Attachments
192k_good.png (85.89 KiB)
384k_bad2.png (91.19 KiB)
384k_bad1.png (93.67 KiB)


jjlm98
Member++
Posts: 31
Joined: Tue Aug 26, 2014 11:00 pm

Post by jjlm98 »

Greetings - just bumping this thread, as we're still struggling.

As an interim step, to see if master vs. slave mode made a difference, I changed the DJ kit maximum sample rate to 384 kHz while leaving the XMOS device as a master (instead of a slave as it is on my custom board). The onboard CODECs obviously won't support this; I'm simply interested in seeing the clocks and data driven out of the XMOS device for comparison. This was also using a 6-in/6-out configuration (S/PDIF and MIDI disabled).

While the behavior is different from what I described in my previous post, the system still appears to fail in the following ways:

1) When the stream is active, the 384-kHz LRCK period is irregular with intermittent low time (384k_slave_bad1.png). The LRCK driven by the XMOS device is a steady 384 kHz when no stream is active.

2) While the sample-alignment problem from my slave-mode plots does not appear to be present, the data as a whole is discontinuous (384k_slave_bad2.png).

Any and all support is greatly appreciated.
Attachments
384k_slave_bad2.png (63.98 KiB)
384k_slave_bad1.png (89.37 KiB)
Ross
XCore Expert
Posts: 962
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

Looks quite strange - I would continue to try and get XMOS master mode working first.

Have you got DSD enabled? I would try disabling it to see if this makes any difference (the DoP detection code could be causing a performance issue at 384 kHz).

Also, lots of samples seem to be missing or zero - what host are you connecting this to? Is the MCLK pin to the XMOS device wired up correctly?
jjlm98
Member++
Posts: 31
Joined: Tue Aug 26, 2014 11:00 pm

Post by jjlm98 »

Hi Ross - to answer your questions:

1) MCLK is passed to pin X0D12, which is mapped to PORT_MCLK_IN in the xp_skc_su1.xn file supplied with the original reference design. It is sourced by the PLL on the slice board and appears to be a steady 24.576 MHz as expected.

2) The host is a MacBook Pro running OS X Mavericks. I am requesting the host to sample at 384 kHz by selecting 384 kHz within the "Audio MIDI Setup" system utility.

3) Because DSD enable/disable does not appear to be something I am responsible for configuring in customdefines.h, I can't say whether or not I am running DSD, beyond seeing DSD advertised as a feature in the software design guide. However, I do see that DSD enable/disable is an input parameter (dsdMode) in calls to AudioHwConfig, but looking at this function in audiohw.xc, I don't see it actually use this parameter, at least for the DJ kit. I therefore have to assume I'm not running DSD, but maybe you can help confirm.
infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

Hi - regarding the DSD enable item, this is controlled by the DSD_CHANS_DAC define. You can see lots of code #if'd in the deliver() function in audio.xc in sc_usb_audio. If this value is set then a lot of extra code gets added to the I2S loop (which is what deliver() implements), including DoP header detection, which might be pushing timing over the edge.

However, if there is no sign of DSD_CHANS_DAC in customdefines.h or the Makefile, then this extra code will be removed by the pre-processor, and we will know that this is not the cause of the problem. You could add #error to some of the code inside the #if DSD_CHANS_DAC > 0 sections to make sure it is being removed.
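For example, something like this (illustrative only - place it inside any of the existing #if DSD_CHANS_DAC > 0 blocks in audio.xc):

Code: Select all

/* Temporary check: if DSD really is compiled out, this build must still succeed */
#if (DSD_CHANS_DAC > 0)
#error DSD/DoP code is being compiled in
#endif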

This could still be a performance issue though - have you tried running just 2 channels to see if everything aligns OK? We don't test 384 kHz above 2 channels, so this is new territory. You only have 2.6 µs for each loop iteration at 384 kHz, which includes sample transfer (including the interrupt delay from decouple) and all waveform generation.

If this is the problem, then it might be necessary to add a buffer task between decouple and audio. That hides the interrupt latency that decouple causes when audio asks for samples (and saves about 400 ns). We do that in the multichannel audio reference design to support 22/24 MHz MCLK - it can be done by enabling MIXER and setting MAX_MIX_COUNT to 0 - see customdefines.h in app_usb_aud_l2.
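To put numbers on that budget, here is a quick back-of-envelope check in plain C using the figures quoted above (illustrative only, not part of the firmware):

Code: Select all

/* Back-of-envelope I2S loop budget at 384 kHz */
#include <stdio.h>

int main(void)
{
    const double fs = 384000.0;            /* sample rate in Hz */
    const double budget_ns = 1e9 / fs;     /* ~2604 ns per loop iteration */
    const double decouple_irq_ns = 400.0;  /* approx. interrupt latency hidden by the buffer task */

    printf("loop budget @ 384 kHz: %.0f ns\n", budget_ns);
    printf("budget with decouple interrupt in the loop: %.0f ns\n",
           budget_ns - decouple_irq_ns);
    return 0;
}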

One other thought: which application are you building? If it is 2ioxs, then try 2ioxx. That will disable S/PDIF and may give a few extra MIPS to the other cores. It's a good test to see if this is performance related.
jjlm98
Member++
Posts: 31
Joined: Tue Aug 26, 2014 11:00 pm

Post by jjlm98 »

Hi infiniteimprobability - thank you for chiming in too. To answer your questions:

1) I do see the DSD-related directives in audio.xc. I grep'd the app_usb_aud_sck_su1 folder and did not see DSD_CHANS_DAC defined anywhere; I also added garbage text to one of the DSD_CHANS_DAC-related #if blocks and it compiled fine. As such, it appears we can be confident that the DJ kit does not enable DSD by default.

2) I am using the 2ioxx configuration such that S/PDIF and MIDI are disabled (which happens to be necessary for my application anyway in order to free as many 1-bit ports as possible).

3) I did some experimentation with various channel counts, and saw the following results:

XMOS as master, 2-in/2-out, without mixer: works
XMOS as master, 4-in/4-out, without mixer: works
XMOS as master, 6-in/6-out, without mixer: does not work
XMOS as master, 6-in/6-out, with mixer: does not work

XMOS as slave, 2-in/2-out, without mixer: works
XMOS as slave, 4-in/4-out, without mixer: does not work
XMOS as slave, 4-in/4-out, with mixer: works
XMOS as slave, 6-in/6-out, with mixer: does not work

Here, "does not work" is defined as the previously seen behavior for master or slave mode. "Works" means that data is continuous and samples appear aligned correctly relative to the word clock - however, I do not yet have an analyzer that can interpret I2S at 384 kHz. For now, I'm just checking that the serial audio stream looks appropriate on an oscilloscope. For the "with" mixer cases, I am adding the following code to customdefines.h:

Code: Select all

/* Enable Mixer Core(s) */
#ifndef MIXER
#define MIXER              1
#endif

/* Disable mixing - mixer core used for volume only */
#ifndef MAX_MIX_COUNT
#define MAX_MIX_COUNT      0
#endif

Slave mode is what our design requires, and 4-in/4-out is acceptable for our purposes - therefore enabling the mixer seems like a viable fix. We can just maintain a custom build for this higher-sample-rate, lower-channel-count combination. Based on that, I think we can close this out if you guys can help answer the following questions:

1) Based on these results, is this indeed a latency issue? Is there a way to quantify the sample rate + channel count limitation deterministically, vs. empirically finding what works? I'd like to make sure we're not operating the design right along the edge of failure - perhaps it works on my machine today, but we'd like to ensure it doesn't break in some other scenario or environment.

2) Similarly, are there any latency concerns for the 192-kHz case at extreme channel count? I've been using 8-in/4-out and 4-in/8-out configurations at 192 kHz and haven't noticed issues, but similar to above, I'd like to make sure I'm not operating the design in some marginal fashion that may not work across the board.

3) Besides added resource usage, what are the design implications to adding the mixer to the design? Is there any reason we shouldn't just keep it enabled for the 12-channel builds that we limit to 192 kHz? I see a summary of its function in the software design guide, but am curious if there's anything I should watch out for or any other functionality whose presence it may impact.

Thank you guys again and Happy Holidays.
infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

Thanks for the feedback - that's great news you have a viable solution. I see this issue has been ongoing for a while so it's good to know you are getting the results you need.
jjlm98 wrote:1) Based on these results, is this indeed a latency issue? Is there a way to quantify the sample rate + channel count limitation deterministically, vs. empirically finding what works? I'd like to make sure we're not operating the design right along the edge of failure - perhaps it works on my machine today, but we'd like to ensure it doesn't break in some other scenario or environment.
Good question. It's not so much latency as the total timing budget within the loop. Every system has limits, and we are clearly on the edge in this case, as an extra 2 channels (two more samples input from decouple and two more I2S data outputs per sample period) is enough to cause a timing failure.
The nature of XMOS means that it should be fairly robust if it appears to work - all instruction timings are completely deterministic - but I share your desire to quantify it. The answer is to use XTA, the timing analyser. There are already a few timing assertion pragmas in there (#pragma xta endpoint), and there is some more background on it here: timing stuff. There's a little work to do here; a good starting point would be to thoroughly analyse the while(1) loop in decouple and understand exactly what happens and when.
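As a sketch of the sort of assertion XTA can check (the endpoint names and port here are hypothetical - the real labels already exist in the sc_usb_audio sources):

Code: Select all

/* Illustrative XC only - names and port are made up for this example */
#include <xs1.h>

out buffered port:32 p_dout = XS1_PORT_1A;  /* hypothetical I2S data-out port */

void i2s_loop(chanend c_audio)
{
    unsigned sample;
    while (1) {
#pragma xta endpoint "i2s_in"
        c_audio :> sample;   /* sample transfer from decouple */
#pragma xta endpoint "i2s_out"
        p_dout <: sample;    /* drive I2S data */
    }
}

/* Ask XTA to prove the in-to-out path fits well inside one 384 kHz period (~2.6 us) */
#pragma xta command "analyze endpoints i2s_in i2s_out"
#pragma xta command "set required - 2.6 us"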
jjlm98 wrote:2) Similarly, are there any latency concerns for the 192-kHz case at extreme channel count? I've been using 8-in/4-out and 4-in/8-out configurations at 192 kHz and haven't noticed issues, but similar to above, I'd like to make sure I'm not operating the design in some marginal fashion that may not work across the board.
We do thorough automated testing at 10 channels in/out at 192 kHz, although this does require the mixer to be enabled in the configuration. So you should be fine. I believe the threshold for needing the mixer enabled is 10 channels (total in/out) using a 24 MHz MCLK.

jjlm98 wrote:3) Besides added resource usage, what are the design implications to adding the mixer to the design? Is there any reason we shouldn't just keep it enabled for the 12-channel builds that we limit to 192 kHz? I see a summary of its function in the software design guide, but am curious if there's anything I should watch out for or any other functionality whose presence it may impact.
When you set the mix count to 0, all the mixer does is volume processing (taking that burden off decouple) and buffering, which hides the interrupt latency (it does a lot more with the mix count set to >0, such as mixing and channel mapping). So there is no negative impact from having it, and I would recommend keeping it on. It's a well-tested part of the design.

One other thought - you are exceeding the limit of what you can push through a single isochronous endpoint: 64 Mbps. I.e. 384000 x 32 x 6 = 73.7 Mbps. So it's not actually going to work from a host perspective! The same obviously applies to 12 channels @ 192 kHz.

I suspect you are still probably hitting an audio loop timing limit too, but you will certainly need to address the bandwidth-to-host issue as well to make 6 channels work if you need it. I have not tried it at this channel count, but you will need to try changing the 24-bit output packing from 4-byte subslots to 3-byte subslots. That takes the bandwidth down to 384000 x 24 x 6 = 55.3 Mbps, which fits OK. Something like the following defines would be needed (please note: not tested, although the 24-bit/3-byte format is used in some MFA build configurations).

Code: Select all

-DOUTPUT_FORMAT_COUNT=2 \
-DSTREAM_FORMAT_OUTPUT_1_RESOLUTION_BITS=16 -DHS_STREAM_FORMAT_OUTPUT_1_SUBSLOT_BYTES=2 \
-DSTREAM_FORMAT_OUTPUT_2_RESOLUTION_BITS=24 -DHS_STREAM_FORMAT_OUTPUT_2_SUBSLOT_BYTES=3
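And to sanity-check the arithmetic in one place (plain C, standalone; the 64 Mbps per-endpoint figure is as quoted above):

Code: Select all

/* Bandwidth check for the figures quoted in this thread */
#include <stdio.h>

static double mbps(double fs_hz, int subslot_bits, int channels)
{
    return fs_hz * subslot_bits * channels / 1e6;
}

int main(void)
{
    printf("6ch @ 384 kHz, 4-byte subslots: %.1f Mbps\n", mbps(384000, 32, 6));   /* 73.7 - too big */
    printf("6ch @ 384 kHz, 3-byte subslots: %.1f Mbps\n", mbps(384000, 24, 6));   /* 55.3 - fits */
    printf("12ch @ 192 kHz, 4-byte subslots: %.1f Mbps\n", mbps(192000, 32, 12)); /* 73.7 - too big */
    return 0;
}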
Hope that helps. Happy xMAS all!
jjlm98
Member++
Posts: 31
Joined: Tue Aug 26, 2014 11:00 pm

Post by jjlm98 »

Thanks infiniteimprobability - sounds like the take-aways are:

1. If the design appears to function, there's a strong chance it meets timing - but tools are available to quantify this

2. There's no harm in leaving the mixer enabled, so if we have some builds that need it, but some that don't, we can just leave it enabled for all builds

I just have a couple more questions with regard to your latest comments:
infiniteimprobability wrote:We do thorough automated testing at 10 channels in/out at 192 kHz, although this does require the mixer to be enabled in the configuration. So you should be fine. I believe the threshold for needing the mixer enabled is 10 channels (total in/out) using a 24 MHz MCLK.
All of the builds I've been working with are 12 channels total (whether 6/6, 4/8, or 8/4), with a 24.576-MHz master clock, so we are above the threshold you describe. Until this discussion we had not enabled the mixer, and we did not see issues up through 192 kHz - does this mean we're simply getting lucky by not having enabled the mixer thus far?
infiniteimprobability wrote:One other thought - you are exceeding the limit of what you can push through a single isochronous endpoint: 64 Mbps. I.e. 384000 x 32 x 6 = 73.7 Mbps. So it's not actually going to work from a host perspective! The same obviously applies to 12 channels @ 192 kHz.
Can you describe this limitation some more - does this refer to a block in the reference design software, or something in the OS itself? Apologies if this was covered in the software design guide - it didn't jump out at me.

As mentioned previously, we've done extensive bit-accurate 192-kHz testing with channel counts of 12. Given that this exceeds the 64-Mbps limitation, why have we not seen issues thus far? In case it matters: we only tested up to 8 channels at a time (for example, if I have a board that supports 4-in/8-out, I only test 4 record channels, then 8 playback channels). We never stream all 12 playback and record channels at the same time, although the board would have reported 12 total channels to the OS.

Thanks for the code snippet for the 24-bit option; I'll keep that in my back pocket.
jjlm98
Member++
Posts: 31
Joined: Tue Aug 26, 2014 11:00 pm

Post by jjlm98 »

Hi folks - just checking in to see if there are any updates to these last couple questions, specifically:

1) Can anyone shed some light on the 64 Mbps limit? Is this a statement about USB throughput, a Windows limitation, etc?

2) More importantly, I'm trying to understand what to set for NUM_USB_CHAN_IN / I2S_CHANS_ADC, NUM_USB_CHAN_OUT / I2S_CHANS_DAC, and MAX_FREQ in customdefines.h. Currently, we are configuring these for 6, 6, and 192000, respectively, which calls for 73.7 Mbps (> 64 Mbps).

My question is: if I compile the firmware with a channel count / sample rate product that exceeds 64 Mbps, will a customer only experience a failure if they exercise the full capabilities of the board? As I mentioned, I haven't seen any issues with 6 channels of playback at 192 kHz (but no record channels), or 6 channels of record at 192 kHz (but no playback) - and these are with a board whose firmware is configured for 6/6/192 kHz.

The reason for asking is that some customers may prefer low channel count at high sample rate (e.g., 4-in/4-out at 192 kHz), while others may prefer high channel count at low sample rate (e.g., 6-in/6-out at 48 kHz), both of which are "legal" and below 64 Mbps in total. However, it's not practical to maintain different build configurations for each case.

Instead, if the throughput is only allocated on an "as needed" basis (e.g., as called for by an application), the 64 Mbps limitation is something that we could simply warn the user about in the documentation.