No simultaneous in & out USB audio with lib_xua (topic is solved)

Jash
Member
Posts: 9
Joined: Mon Apr 07, 2025 9:38 am

Post by Jash »

Hello,

Thank you for this new idea. I checked the MCLK count at each SOF period and it is approximately 3072 (occasionally 3071 or 3073), just like you said.
By the way, I use pin X0D11 as the tile 0 clock input, looped back from the internal PLL on X1D11. (I didn't do this at first a few months ago, and the USB device wasn't even recognized.)
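
For readers wondering where 3072 comes from, here is a minimal sketch of the arithmetic. It assumes a 24.576 MHz master clock and high-speed USB with 8000 SOFs per second; neither value is stated in this thread, and the function name is made up for illustration.

```c
#include <stdint.h>

/* Expected number of MCLK ticks between two USB start-of-frame (SOF) events.
 * Both inputs are assumptions for this sketch, not values from the thread:
 *   mclk_hz     - master clock frequency, e.g. 24576000 for the 48 kHz family
 *   sof_per_sec - 8000 for high-speed USB (125 us microframes),
 *                 1000 for full-speed USB (1 ms frames)
 */
uint32_t expected_mclk_per_sof(uint32_t mclk_hz, uint32_t sof_per_sec)
{
    return mclk_hz / sof_per_sec;
}
```

Under those assumptions, `expected_mclk_per_sof(24576000, 8000)` gives 3072. The host's SOF clock and the device's MCLK are not synchronized, so an occasional reading of 3071 or 3073 is what one would expect.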

I also added some printf calls at ep_buffer.xc line 509 and at decouple.xc line 897 to see when the host requests an input stream: the input stream is started twice (in XUA_Buffer_Ep(), then in XUA_Buffer_Decouple()) when I try to record the mic inputs.

I did the same at ep_buffer.xc line 820 and at decouple.xc line 1091 to see when the device has sent a packet to the host: these points are never reached (in either XUA_Buffer_Ep() or XUA_Buffer_Decouple()), so the mic data from audiohub doesn't reach the buffer, even though the request from the host has been received.

Does this give you an idea of what could be happening?
infiniteimprobability
Verified
XCore Legend
Posts: 1164
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

OK, so you have a good MCLK count, audio is looping and some configs are working. Those are all of the basics. You are also looking in all of the right places and have acquired a good depth of understanding of the firmware.

I am not sure what else to suggest other than continuing to follow the signal path from audiohub to decouple, which fires an ISR, populates the device-to-host buffer, and then tells XUD (USB) that the buffer is ready for the next IN on the ISO data endpoint. EPBuffer just handles the enabled select case and signals back to decouple via shared memory that the buffer has been consumed. So I think decouple is definitely the place to look further.

The buffer (aud_to_host_buffer) consists of a length field followed by the payload. So, looking at the logic, it must be the case that decouple either thinks it's in underflow or the buffer length is 0. Is it possible to stick a print in to check which of these is causing the lack of input data transfer?
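
To make that check concrete, here is a rough sketch of the kind of test a debug print could report. It is not lib_xua code: the enum, the function names and the 32-bit native-endian length word are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reasons why no IN packet gets queued; not lib_xua names. */
enum in_stall_reason { IN_OK = 0, IN_UNDERFLOW = 1, IN_ZERO_LENGTH = 2 };

/* Read the length word at the start of a length-prefixed packet.
 * Assumes a 32-bit length field in native byte order. */
uint32_t packet_length(const uint8_t *pkt)
{
    uint32_t len;
    memcpy(&len, pkt, sizeof len); /* memcpy avoids an unaligned read */
    return len;
}

/* Decide which of the two suspected conditions applies, if any. */
enum in_stall_reason classify_in_stall(int in_underflow, const uint8_t *pkt)
{
    if (in_underflow)
        return IN_UNDERFLOW;
    if (packet_length(pkt) == 0)
        return IN_ZERO_LENGTH;
    return IN_OK;
}
```

Printing the returned reason at the point where the packet would normally be handed over should show which branch is being taken.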


One other thing to explore - I spoke to a knowledgeable colleague and he says that Audacity can be unstable on Windows when recording multi-channel. Can I ask a bit more about the host setup, and whether you have any other options so we can rule the host out? I use Linux and find arecord gives very good control over exactly what gets requested of the device in terms of rate, channel count and format.
Engineer at XMOS

Post by infiniteimprobability »

Another thing to check is the value of g_aud_to_host_flag. This is initialised to 1 in EPBuffer if input is enabled.
Engineer at XMOS

Post by Jash »

Hello,

Thank you for your suggestions, it led me to more test results:

I added some printf in both underflow and no underflow cases (along with min_pkt_size and datalength values).
These points are not reached when I try recording inputs (i.e. g_aud_to_host_flag stays 0), but strangely they are reached from time to time after the recording has stopped (underflow the first few times with min_pkt_size = 84, then no underflow every time after that with datalength = 100). It seemed to occur when I output sound to the speaker, but I am not 100% sure about it since it didn't seem repeatable.

With Audacity I use WASAPI host, 48kHz rate, 32-bit float format, 4 input channels.
I tried with arecord in an Ubuntu virtual machine (S32_LE format, 48kHz rate, 4 channels, 1 second duration): recording is very slow (several minutes for each 1/2 second frame of each channel), and when I try listening to the generated file I only hear a 1-second shrill noise (I suppose it is because of samples having been recorded abnormally spaced apart).
Same printf from firmware as when using Audacity (no packet sent when recording, but sent afterwards).

When I reduce NUM_USB_CHAN_IN to 2, recording with arecord is slow too (less than with 4 channels though), and with Audacity the time cursor moves, but very slowly.
Same behavior of printf but with min_pkt_size = 44 and datalength = 52.
Do you think this latency issue could be related to a resource limitation of the chip? Knowing that all of the processes in my design run on tile 0 apart from the audiohub and I2S output process.
(If it is the case, it is still strange that I can record 8 channels simultaneously without any problem when there is no output channel)

When I reduce NUM_USB_CHAN_IN to 1, recording with both arecord and Audacity is well timed, but samples are always 0 (that was not the case with 2 or 4 channels).
Printf from firmware are still the same though, with min_pkt_size = 24 and datalength = 28.

When testing with the mics only firmware, no matter the channel count, recording is always well timed with both arecord and Audacity, and listening to the generated file or wave gives something coherent with what I record.
The firmware printf show that it continuously sends packets without underflow during the recording, with datalength = 100 with 4 channels, datalength = 52 with 2 channels and datalength = 28 with 1 channel. This at least shows that the packet sizes are correct in the simultaneous in & out firmware, even though these points are not reached when expected.
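
As a cross-check on those numbers, here is a small sketch. The reported values fit the pattern 4 + samples x channels x 4 bytes, with 6 samples per packet for datalength and 5 for min_pkt_size. The interpretation (4-byte length header, 4 bytes per sample, 48 kHz over 8000 packets per second) is inferred from the printed values and is not taken from the lib_xua source; the names are made up.

```c
#include <stdint.h>

/* Assumptions inferred from the printed values, not from lib_xua source. */
enum { HEADER_BYTES = 4, BYTES_PER_SAMPLE = 4 };

/* Nominal packet size: one full microframe's worth of samples. */
uint32_t nominal_len(uint32_t rate_hz, uint32_t pkts_per_sec, uint32_t channels)
{
    uint32_t samples = rate_hz / pkts_per_sec;          /* 6 at 48 kHz / 8000 */
    return HEADER_BYTES + samples * channels * BYTES_PER_SAMPLE;
}

/* Smallest expected packet: one sample per channel fewer than nominal. */
uint32_t min_len(uint32_t rate_hz, uint32_t pkts_per_sec, uint32_t channels)
{
    uint32_t samples = rate_hz / pkts_per_sec - 1;      /* 5 at 48 kHz / 8000 */
    return HEADER_BYTES + samples * channels * BYTES_PER_SAMPLE;
}
```

With 48000 and 8000 as inputs, this gives 100/52/28 for datalength and 84/44/24 for min_pkt_size at 4, 2 and 1 channels, which matches the prints.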

Does it give you more ideas?

Post by infiniteimprobability »

The design itself is good for 10 in / 10 out at 192 kHz and is regression tested at that rate. https://github.com/xmos/sw_usb_audio/bl ... d.cmake#L8
We also test at 32 in / 32 out at 48 kHz - https://github.com/xmos/sw_usb_audio/bl ... .cmake#L29

These are tested on Windows (with the Thesycon driver) and Mac (Intel + ARM). Are you running the Thesycon driver or using the native Windows driver? I believe the native Windows driver should be good for 32ch.

So fundamentally your channel count is well within the capabilities of the chip. There are a LOT of designs based on this SW that have been deployed successfully and many with higher performance demands than yours. But that doesn't solve your problem other than hopefully rule some things out.
"Do you think this latency issue could be related to a resource limitation of the chip? Knowing that all of the processes in my design run on tile 0 apart from the audiohub and I2S output process"
It depends how many tasks/threads you have per tile. Certainly up to 6 or even 7 should be no problem - the main task that likes MIPS is XUD (the USB device task) and that needs over 80. So for example, 600/7 = 85.7MHz which should be sufficient. How many tasks do you have on tile[0]? If 6 or fewer I cannot see this being a performance issue.
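
A minimal sketch of that division, for anyone who wants to try other numbers. The cap at one fifth of the core clock when fewer than five threads are active is my assumption about how the xcore pipeline shares cycles, and the function name is made up.

```c
/* Rough per-thread clock share in MHz for a tile.
 * Assumption for this sketch: a thread never gets more than 1/5 of the
 * core clock, so the divisor is at least 5. */
double per_thread_mhz(double core_mhz, unsigned active_threads)
{
    unsigned divisor = active_threads < 5 ? 5 : active_threads;
    return core_mhz / divisor;
}
```

`per_thread_mhz(600, 7)` gives about 85.7, and `per_thread_mhz(600, 6)` gives 100, both above the roughly 80 that XUD needs.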

However, temporarily boosting the core frequency in the .xn file is a quick way of checking to see if you have a MIPS problem in the device. 800MHz runs fine on the bench normally. For example:

        <Node Id="0" InPackageId="0" Type="XS3-L16A-1024" Oscillator="24MHz" SystemFrequency="800MHz" ReferenceFrequency="100MHz">
One other thought - do you know if you are using JTAG or XSCOPE for printing? Printing over JTAG stops the chip for a while (which will cause a lot of problems), whereas XSCOPE just slows down the thread being printed from. Be sure to include config.xscope in your src directory to ensure the device uses xscope printing - beware that printing from tight loops (e.g. audio loops at 48k) might break things though.

"Does it give you more ideas?"
Honestly, other than looking at a USB trace to see what the host is trying to do vs. reality, I'm running out of ideas. I have some suspicions about the host - trying a Mac or Linux (native) would be what I'd try next. It's interesting that things work as you reduce the channel count. It suggests that the logic is doing everything you should expect but something in the signal chain struggles as the rates go up. You can test that it's not the XMOS device side with the chip speed test above.
Engineer at XMOS

Post by Jash »

Hello,

Thank you for all these suggestions.

I use the native Windows driver for all my tests, and tried on a native Linux computer of a colleague (both with Audacity using ALSA and arecord), but the behavior stays the same.

I have 6 threads running on tile 0, so like you said it should not be a resource limitation. Moreover, all my tests are done with an 800MHz system frequency, which weakens this hypothesis even more.
I also tried setting the buffer thread at high priority and/or fast mode, as well as the XUD thread, without any different result.

I indeed use Xscope for printing (even the mics only firmware doesn't work properly without Xscope), and I even tried removing all the prints but it didn't change anything.

Like you suggested, I will try probing the USB traces and compare them to the ones with the mics only firmware to understand what is going on, and will keep you updated.

Thank you again for your time

Post by Jash »

Hello again,

I finally managed to resolve the issue!!

I am a bit ashamed to say it, but the problem was in the XUA_Buffer() function call in my main(): I had swapped the audio in endpoint and the feedback endpoint (second and third arguments)...

It explains a LOT of things, like why I had the impression that g_aud_to_host_flag was raised when I output sound to speaker, or why the recording delay was shorter as I reduced the channel count (I have 1 output, which means 1 feedback channel, but several inputs so the host most likely received samples slower than expected), and by the way I am sure that when I reduced NUM_USB_CHAN_IN to 1, the recording would have been non zero if I played something on the speaker at the same time.
However, I don't understand why I was able to clearly record the feedback (even without a speaker plugged to the board), as it was supposed to be switched with the audio in.
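
For anyone hitting the same thing, here is a small sketch of why this kind of mistake goes unnoticed at build time. It uses made-up stand-ins (plain integer handles and a hypothetical `wire_buffer()` function), not the real XUA_Buffer() signature.

```c
/* Hypothetical stand-in: endpoint handles are plain integers here. */
typedef unsigned ep_t;

/* Records which handle ended up in which role. */
struct buffer_wiring {
    ep_t audio_out;
    ep_t audio_in;
    ep_t feedback;
};

/* Takes three same-typed handles in a fixed positional order,
 * so the compiler cannot tell a swapped call from a correct one. */
struct buffer_wiring wire_buffer(ep_t audio_out, ep_t audio_in, ep_t feedback)
{
    struct buffer_wiring w = { audio_out, audio_in, feedback };
    return w;
}
```

If the feedback handle is passed second and the audio in handle third, the call compiles without any warning and the two roles are silently exchanged. Comparing the call against the function prototype argument by argument is the only reliable check.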

Anyway, thanks a lot for having taken the time to answer all my messages, and really sorry to have bothered you for an issue that could have been resolved a month and a half ago.