All of the cores run sequentially, true. However, the decouple task exchanges all channels for one sample period (a frame) at a time. Unpacking is only a few cycles per sample, although you do have a lot of samples, so maybe this is an issue in your case.

We can't accept 3-byte words because unpacking is not truly parallel to the other processes of the audio stream. For each channel, a sample is received from USB, then unpacked, then sent to the mixer, and then the same happens for the next channel, and so on. So the mixing thread waits for every sample to be sent (and therefore unpacked first) and does nothing during this time. Unpacking/packing 3-byte words thus adds time that we can't afford to spend there (because of other tasks such as mixing, etc.).
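To put a rough shape on the per-sample unpacking work discussed above, here is a minimal sketch in plain C. It is not the reference design's code: `unpack24` is a hypothetical helper, and it assumes little-endian 24-bit samples that are left-justified into 32-bit words. The point is only that each sample costs three byte loads, a few shifts/ORs and one store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the reference design): unpack n packed
 * 24-bit samples into 32-bit words.
 * Assumptions: samples are little-endian in the packed buffer and are
 * left-justified in the 32-bit word (low 8 bits zero). */
static void unpack24(const uint8_t *src, int32_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        /* Three byte loads, shifted into the top 24 bits of the word. */
        uint32_t w = ((uint32_t)src[3 * i]     << 8)
                   | ((uint32_t)src[3 * i + 1] << 16)
                   | ((uint32_t)src[3 * i + 2] << 24);
        /* Reinterpret as signed; two's complement targets keep the sign. */
        dst[i] = (int32_t)w;
    }
}
```

With many channels this loop runs once per channel per frame, which is where the "few cycles per sample" add up.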
I think it would be orders of magnitude less effort to modify decouple than to implement a new feature in the low-level USB driver (which is pretty hardcore assembler and needs in-depth knowledge of USB).
You could either:
- Add a double buffer and pass the last unpacked samples using straight-line code, then unpack into the other buffer.
- If your mixer is on the same tile, use the "unpack after exchange" code but just pass a pointer across. That will save you 40-odd out instructions plus memory loads.
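The two options above can be combined into one rough sketch in plain C. Everything here is hypothetical (`frame_pingpong_t`, `pingpong_unpack`, `pingpong_frame`, and `NUM_CHANS` are illustration names, not identifiers from decouple), and it assumes little-endian, left-justified 24-bit samples. One buffer always holds the last fully unpacked frame, the next frame is unpacked into the other buffer, and a same-tile mixer would be given just the pointer rather than one word per channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CHANS 2  /* hypothetical channel count, kept small for the sketch */

/* Hypothetical ping-pong frame store: buf[ready] is the last complete frame. */
typedef struct {
    int32_t buf[2][NUM_CHANS];
    int ready;
} frame_pingpong_t;

static void pingpong_init(frame_pingpong_t *pp)
{
    assert(pp != NULL);
    for (int b = 0; b < 2; b++)
        for (int c = 0; c < NUM_CHANS; c++)
            pp->buf[b][c] = 0;
    pp->ready = 0;
}

/* Pointer to the last complete frame. On the same tile, this single pointer
 * is what would be handed to the mixer instead of NUM_CHANS separate words. */
static const int32_t *pingpong_frame(const frame_pingpong_t *pp)
{
    assert(pp != NULL);
    return pp->buf[pp->ready];
}

/* Unpack the next packed frame into the buffer that is NOT marked ready,
 * then flip. The ready buffer is never written while it may be in use. */
static void pingpong_unpack(frame_pingpong_t *pp, const uint8_t *packed)
{
    assert(pp != NULL && packed != NULL);
    int fill = pp->ready ^ 1;
    for (int c = 0; c < NUM_CHANS; c++) {
        uint32_t w = ((uint32_t)packed[3 * c]     << 8)
                   | ((uint32_t)packed[3 * c + 1] << 16)
                   | ((uint32_t)packed[3 * c + 2] << 24);
        pp->buf[fill][c] = (int32_t)w;
    }
    pp->ready = fill;  /* publish only after the whole frame is unpacked */
}
```

In a real design the hand-over would still need a handshake between decouple and the mixer so the pointer is only read once per frame; that synchronization is left out here.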