Xcore200 USB channel limitation

MarcAntigny
Active Member
Posts: 61
Joined: Tue Feb 13, 2018 2:50 pm

Post by MarcAntigny »

Hi all,

I am working with the xCore 200 MC-Audio using the USB Audio 2.0 software reference design. As mentioned here, the channel limit for USB Audio 2.0 is 42 channels at 48kHz. The software reference design is presented as working with up to 32 USB channels. I tried with more USB channels (I need about 40 for my application) and got some surprising results.
Everything works fine with 32, 34 or even 36 USB channels (USB recording on a computer). With 38 or 40 channels, the USB stream doesn't work.
The driver is OK (we built it with the Thesycon kit). I already updated the USB descriptors to declare 40 channels, but it doesn't work and, moreover, the code doesn't block.
Is there a hard-coded limit on the number of USB audio channels? Maybe in the packet size or the initialization. I am not very familiar with USB packet handling, so I may have missed something.
Thanks for any idea or comment on the subject.

Marc


infiniteimprobability
XCore Legend
Posts: 1126
Joined: Thu May 27, 2010 10:08 am

Post by infiniteimprobability »

Hmm that's interesting - I wonder if the host is even trying to open the stream? It would be good to see a USB trace to see if the Alt interface changed on stream start. This will tell us if the host is deciding to reject the request or if the firmware is not handling it.

One thing to try is to change the slot size to 3 bytes. This will reduce the required bandwidth by 25%. For example:

42 * 32b * 48000 = 64512000 bit/s
42 * 24b * 48000 = 48384000 bit/s

This will increase the load on the firmware as it has to unpack. That may need profiling. However, both of the above numbers should be supported by a 1024-byte packet size limit at 8kHz (1024 * 8 * 8000 = 65536000 bit/s), so I'm not exactly sure at this point what is not working.
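
The arithmetic above can be reproduced with a short sketch. The function names are hypothetical and only restate the numbers in the post. Note that this is an average-rate check only: it says nothing about how many samples land in any individual packet.

```python
def stream_bitrate(channels, slot_bytes, sample_rate_hz):
    """Average audio payload rate in bits per second."""
    return channels * slot_bytes * 8 * sample_rate_hz


def packet_limit_bitrate(max_packet_bytes=1024, packets_per_second=8000):
    """Payload rate if every 125 us microframe carries one full packet."""
    return max_packet_bytes * 8 * packets_per_second


if __name__ == "__main__":
    limit = packet_limit_bitrate()
    for slot_bytes in (4, 3):
        rate = stream_bitrate(42, slot_bytes, 48000)
        # Report whether the average rate fits under the packet-size ceiling.
        print(slot_bytes, rate, rate <= limit)
```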

Post by MarcAntigny »

Hi infiniteimprobability,
It worked fine with a slot size of 3B (as expected). However, as you mentioned, the load on the firmware is heavy (unpacking etc) and for my application I need to save a lot of time for other operations. So I will need to dig a bit further to make it work with 4B per sample.
And it seems really surprising that it works with 3B and not with 4B, because the bandwidth should be OK for both.
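
For readers unfamiliar with what 3-byte slot packing involves, here is a minimal sketch of the conversion. It is not the reference design's code: the function names are hypothetical, and it assumes little-endian 24-bit slots on the USB side and 32-bit words with the sample left-justified (bottom byte zero) on the audio side.

```python
def unpack_24_to_32(packet):
    """Expand 3-byte little-endian slots into 32-bit words, low byte zeroed."""
    if len(packet) % 3 != 0:
        raise ValueError("packet length must be a multiple of 3 bytes")
    words = []
    for i in range(0, len(packet), 3):
        lo, mid, hi = packet[i], packet[i + 1], packet[i + 2]
        # Assumption: the 24-bit sample occupies the top three bytes.
        words.append((hi << 24) | (mid << 16) | (lo << 8))
    return words


def pack_32_to_24(words):
    """Drop the bottom byte of each 32-bit word and emit 3-byte slots."""
    out = bytearray()
    for w in words:
        out += bytes(((w >> 8) & 0xFF, (w >> 16) & 0xFF, (w >> 24) & 0xFF))
    return bytes(out)
```

The per-sample work is a handful of shifts and masks, which is the extra load being discussed.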

Post by MarcAntigny »

Hi,
In the end, the USB channel limitation turns out to be linked to how the USB packets are filled, not to the bandwidth.
I have 40 channels with 4 bytes per sample. At 48kHz, there are 6 samples per packet + 1 sample for jitter (48000 / 6 = 8000, the USB microframe frequency).
So 40 * 4 * 7 = 1120 bytes of data per microframe. This exceeds the limit of 1024 bytes per microframe. With 36 channels, it comes to 1008 B per microframe (< 1024), so it worked.
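
The per-microframe calculation above can be written as a small sketch. The function name and the `extra_samples` parameter are hypothetical; the one extra sample models the jitter allowance described in the post.

```python
def worst_case_packet_bytes(channels, slot_bytes, sample_rate_hz,
                            microframes_per_s=8000, extra_samples=1):
    """Largest payload a single microframe may need to carry, in bytes."""
    nominal_samples = sample_rate_hz // microframes_per_s  # 6 at 48 kHz
    return channels * slot_bytes * (nominal_samples + extra_samples)


if __name__ == "__main__":
    for channels in (36, 38, 40):
        size = worst_case_packet_bytes(channels, 4, 48000)
        # A single packet per microframe is limited to 1024 bytes.
        print(channels, size, "fits" if size <= 1024 else "too large")
```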

However, I really need the 40 channels for my application. We found that with "multi-packet" (more than 1024 B/microframe) this should work fine, but the USB library doesn't seem to handle multi-packet.
The same thing was highlighted by redfart here: http://www.xcore.com/viewtopic.php?t=5103
So is there now a version of the USB library where multi-packet is supported?
Thanks.

Post by infiniteimprobability »

This is a good point - we occasionally need to allow for 7 samples per SOF period in the case where f > 48000Hz.

So 1024 / 7 / 4 = 36.57 channels

and..

1024 / 7 / 3 = 48.76 channels

Support for multiple packets per microframe is not yet implemented for isochronous endpoints, so it looks like 3-byte slots are the only option. In terms of workload, it's only the decouple task which will see this additional work. The audio side is still dealing with ints, albeit with the bottom byte zero padded. I haven't tested it, but my sense is that it should be possible. 48kHz has a frame period of 20.8us which, at 62.5MHz, is about 1300 cycles. That's a lot compared with unpacking 40 or so values from an array.
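
The two channel limits computed above (36.57 and 48.76) round down to whole channels. A minimal sketch, with a hypothetical function name, assuming one packet of at most 1024 bytes per microframe and 7 samples in the worst case:

```python
def max_channels(slot_bytes, samples_per_packet=7, max_packet_bytes=1024):
    """Whole number of channels that fit in one worst-case packet."""
    return max_packet_bytes // (samples_per_packet * slot_bytes)


if __name__ == "__main__":
    for slot_bytes in (4, 3):
        print(slot_bytes, max_channels(slot_bytes))
```

This matches the earlier observation that 36 channels worked with 4-byte slots while 38 and 40 did not.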

Post by MarcAntigny »

Hi infiniteimprobability,
You have summarized the situation well.
Sorry to read that there is not yet support for multiple frames per microframe. Has anybody worked on it, even as a prototype version? This is on the critical path of our project, which needs to work with 40 USB channels...
And as you wrote, 20.8us is large enough for unpacking 40 values. However, we don't just unpack values during a frame period (there is also mixing, routing and other such fun things).
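
To put the 20.8us figure in perspective, the cycle budget can be sketched as below. The 62.5MHz rate is the one quoted earlier in the thread; the function name is hypothetical, and the per-channel split is only a rough guide because it ignores any fixed per-frame overhead.

```python
def cycles_per_frame(sample_rate_hz, thread_hz=62_500_000):
    """Whole instruction cycles one thread gets per audio frame period."""
    return thread_hz // sample_rate_hz


if __name__ == "__main__":
    budget = cycles_per_frame(48000)
    # Split the budget evenly across 40 channels as a rough per-channel figure.
    print(budget, budget // 40)
```

Any mixing or routing done in the same thread has to fit inside that same budget, which is the concern raised here.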

Post by infiniteimprobability »

MarcAntigny wrote: Sorry to read that there is not yet support of multiple frame per microframe. Has anybody worked on it, even as a prototype version ?
I think there is a very early experiment on a branch, but it was never completed.
There's another thread with more detail about it here: http://www.xcore.com/viewtopic.php?f=37&t=3075

MarcAntigny wrote: This is on the critical path of our project to work with 40 USB channel...
This would be true only if you must support 4-byte words. If you can accept 3-byte words, then there is no barrier?

MarcAntigny wrote: And as you wrote, 20.8us is large enough for unpacking 40 values. However we don't just unpack values during a frame period (mixing, routing and such funny things).
Mixing, routing etc. is normally done between audio (i2s) and decouple (where the packing/unpacking happens). So if you put a task in between those (like we used to with the mixer), then I think you can do all of this. Activity in the processing thread will not affect decouple and vice versa, as long as the total number of active tasks meets the performance requirement.

Post by MarcAntigny »

Hi infiniteimprobability,
I have already read this thread. It gives some hints, but to work with multiple transactions per microframe, the modifications have to be made in the files of the XMOS USB library. That's why kenmac couldn't get the DATA1 and DATA0 recognition working in the device. The library is not complete for this use.

infiniteimprobability wrote: This would be true if you must only support 4byte words. If you can accept 3 byte words, then there is no barrier?
We can't accept 3-byte words because unpacking is not truly parallel to the other processes of the audio stream. For each channel, a sample is received from USB, then unpacked, then sent to the mixer. Then the same happens for the next channel, and so on. So the mixing thread waits for every sample to be sent (and therefore unpacked first) and does nothing during this time. Packing/unpacking 3-byte samples thus adds time that we can't afford to spend on it (because of other tasks such as mixing etc.).
akp
XCore Expert
Posts: 578
Joined: Thu Nov 26, 2015 11:47 pm

Post by akp »

Hi MarcAntigny,
It sounds like you are unpacking in the same task as you are mixing. Is that correct? Or are you operating the mixer in a different task? It seems to me that if you have a mixer in a different task from decouple, and you generate a new frame for i2s after you receive a new frame from decouple, then the worst you add is one frame time of latency (because the mixer operates in one frame time, simultaneously with the i2s task outputting the previous frame and the USB task unpacking the next frame). Is that what you're trying to avoid, or is there a throughput issue? If there's a throughput issue, it must mean the total CPU usage of all your tasks exceeds the tile limits, I suppose.
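
The latency argument can be illustrated with a toy model. This is not XMOS code: it simply assumes each task holds one whole frame and hands it to the next task once per frame period, and the function name is hypothetical.

```python
def run_pipeline(frames, n_stages):
    """Return what leaves the last stage at each frame period.

    Every stage works on one frame per period, and all stages pass their
    frame forward at the same time. None means nothing has arrived yet.
    """
    stages = [None] * n_stages
    output = []
    # Feed the real frames, then n_stages empty periods to flush the pipe.
    for frame in list(frames) + [None] * n_stages:
        output.append(stages[-1])
        stages = [frame] + stages[:-1]
    return output


if __name__ == "__main__":
    # Two stages (decouple, i2s) versus three (decouple, mixer, i2s).
    print(run_pipeline(range(3), 2))
    print(run_pipeline(range(3), 3))
```

In this model, adding the mixer stage delays the first frame by exactly one frame period while throughput stays at one frame per period, provided each stage finishes its work within that period.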

Post by MarcAntigny »

Hi akp,
No, we are not unpacking in the same task as the mixing task. I think there might be a problem because the decouple task gets one sample, unpacks it, then gets another sample, unpacks it, and so on, while the mixing task, as in the reference design, sends all the samples to the decouple task. So the mixing task sends a sample, then waits for the decouple task to get and unpack it, and only after that sends a new sample. On one side, the mixing task wants to send all the samples (the new frame, as you named it), but on the other side, the decouple task gets the samples one by one, unpacking each in between. I think this is not optimized for a large number of USB channels.
As infiniteimprobability indicated, we have no other choice for now than to try with 3 bytes and unpacking. I'm benchmarking it on the Xcore 200 MC board with the software reference design. I hope it won't add much latency. (For now, with 32 channels and 3B/unpacking, it leads to strange results.)

Regards,
Marc