USB communication fails on ESD test / USB response status == EPROTO (-71) in Wireshark/USBmon

dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

Hello folks,

I have a weird problem to solve. I have developed a USB sound card with my own firmware implementation (inspired by the XMOS UAC project). The firmware works fine under normal conditions. When I play music and subject the sound card to ESD strikes, the output stops and the mixer no longer works. I sniffed the USB communication with Wireshark/usbmon under Linux and was able to identify many communication errors.

The sound card uses multiple endpoints, including endpoints for the audio streams in and out. The first communication error I found occurs on the isochronous IN endpoint. I suspect the first error could just as well have shown up on another endpoint.

Here is the first erroneous USB frame that Wireshark logged:

Frame 193151: 84 bytes on wire (672 bits), 84 bytes captured (672 bits) on interface usbmon2, id 0
USB URB
	[Source: 2.3.1]
	[Destination: host]
	URB id: 0xffff9edcc9372000
	URB type: URB_COMPLETE ('C')
	URB transfer type: URB_ISOCHRONOUS (0x00)
	Endpoint: 0x81, Direction: IN
		1... .... = Direction: IN (1)
		.... 0001 = Endpoint number: 1
	Device: 3
	URB bus id: 2
	Device setup request: not relevant ('-')
	Data: present (0)
	URB sec: 1655901567
	URB usec: 891490
	URB status: Success (0)
	URB length [bytes]: 4
	Data length [bytes]: 20
	[Request in: 193134]
	[Time from request: 0.004017000 seconds]
	[bInterfaceClass: Unknown (0xffff)]
	ISO error count: 0
	Number of ISO descriptors: 1
	Interval: 8
	Start frame: 1008
	Copy of Transfer Flags: 0x00000204, No transfer DMA map, Dir IN
	Number of ISO descriptors: 1
	USB isodesc 0 [Protocol error (-EPROTO)] (4 bytes)
		Status: Protocol error (-EPROTO) (-71)
		Offset [bytes]: 0
		Length [bytes]: 4
		Padding: 0x00000000

According to the Linux kernel's "USB Error codes" documentation, EPROTO means one of:
  1. bitstuff error
  2. no response packet received within the prescribed bus turn-around time
  3. unknown USB error
Presumably the ESD strike causes a bit error. The real problem, however, is that isochronous communication continues afterwards and the host logs the EPROTO error for every subsequent response.
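
For reference, the -71 in the capture is simply the negated Linux errno value of EPROTO. A minimal Python sketch of that mapping, assuming it runs on the Linux host that took the capture (errno numbers are platform specific); `status_name` is a hypothetical helper, not part of Wireshark or usbmon:

```python
import errno


def status_name(status: int) -> str:
    """Translate a URB / ISO descriptor status (0 or a negated errno) into a name."""
    if status == 0:
        return "OK"
    # Statuses are negated errno values; on Linux EPROTO is 71, hence -71.
    return errno.errorcode.get(-status, f"unknown ({status})")


if __name__ == "__main__":
    print(status_name(-errno.EPROTO))
```

The name alone does not tell which of the three causes above applies; that still has to be worked out from the hardware side.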

In my code, only EP0 (control) is ORed with XUD_STATUS_ENABLE.

Section 4.2.11, "Status Reporting", describes the behavior of XUD_STATUS_ENABLE.

My questions:
  1. Should an ESD test be unable to cause a USB protocol error at all? In other words, is the error caused by a bad hardware design?
  2. If a protocol error happens, e.g. within an isochronous stream, should lib_xud recover from it by itself?
  3. If lib_xud doesn't recover by itself, should I OR all my endpoints with XUD_STATUS_ENABLE and implement error handling modeled on the function "USB_StandardRequests()" in "xud_device.xc", calling "XUD_SetStall()" myself in the error case?
  4. Should I do something completely different?

Best regards!
Dieter


dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

I have uploaded the full Wireshark dump, in case you are interested.

Set the filter "usb.device_address == 3" to show only the frames of the sound card.

"usb.device_address == 3 and usb.iso.iso_status == -71" shows all erroneous frames for the isochronous endpoints.

"usb.device_address == 3 and usb.urb_status == -71" shows all erroneous frames for the other endpoints.
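
As an alternative to clicking through the filtered frames, the per-descriptor status lines can be tallied from a plain-text export of the dissection. A rough Python sketch, assuming the export contains lines formatted like the "Status: Protocol error (-EPROTO) (-71)" line in the frame shown earlier; `count_iso_status` is a made-up helper name:

```python
import re
from collections import Counter

# Matches e.g. "Status: Protocol error (-EPROTO) (-71)" and captures the trailing number.
# "URB status: ..." lines are skipped on purpose because they do not start with "Status:".
STATUS_LINE = re.compile(r"^\s*Status:.*\((-?\d+)\)\s*$")


def count_iso_status(text: str) -> Counter:
    """Count ISO descriptor status codes in a Wireshark plain-text export."""
    counts = Counter()
    for line in text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "\tURB status: Success (0)\n"
        "\tUSB isodesc 0 [Protocol error (-EPROTO)] (4 bytes)\n"
        "\t\tStatus: Protocol error (-EPROTO) (-71)\n"
    )
    print(count_iso_status(sample))
```

A large count for -71 compared with 0 after the first strike would match the observation that the errors never stop once they have started.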
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

In the meantime, I have collected some new information. During my tests, I got the following error message only once:

xrun: Program received signal ET_ILLEGAL_RESOURCE, Resource exception.
[Switching to tile[1] core[4] (dual issue)]
Pid_Sof () at ./included/XUD_Token_SOF.S:12

12 ./included/XUD_Token_SOF.S: No such file or directory.
in ./included/XUD_Token_SOF.S
Current language: auto; currently asm

In all other cases, the programming interface also seems to die. Therefore, I have not (yet) had a chance to dig deeper into the cause or to reproduce the exception. I have added an exception handler that resets the device. Now I can see in the operating system log that the USB device is reset during ESD tests and no longer hangs as before. That is an improvement. Still, I would like to understand the possible causes of the exceptions during the ESD tests. Any idea is welcome! I would like to avoid these exceptions.
jseaber
New User
Posts: 3
Joined: Thu Aug 05, 2021 4:35 pm

Post by jseaber »

Sorry to reply so late to this thread, but thought I'd share what we have learned with respect to ESD.

In short, software is not to blame here. Exceptions upon ESD strikes indicate a hardware problem. So without seeing the schematic and board layout, it's difficult to provide clear answers.

It's critical to protect the XMOS reset pin and the USB D+/- pins, and to use USB 5V load switching. ESD protection measures should also be added to any I/O pins in use (especially for I2C, SPI, etc.). The reference designs we began with years ago omitted most of these real-world requirements.

- I/O pins: Add series termination and ESD321DPYR or equivalent
- USB Input: TPD2E2U06DRLR or equivalent. Review best practices of grounding, shielding, and 5V protection for your design

Are you able to share what hardware protection was in place?
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

Hi jseaber,

It's not too late ;-). I used the TPD2E001DRLR, which I found on the explorerKit board. I agree that the TPD2E2U06DRLR is the better choice, because it also provides surge protection.

In the meantime, I think I have solved the problem. I soldered a mini PCB with a different ESD design onto my PCB, and it finally passed the tests. In the final design I no longer use the ESD IC, because I had trouble buying it during the design phase. Instead, I now use single ESD diodes for USB D+/- and USB 5V, respectively. The diodes also provide surge protection, like the ESD321DPYR you proposed. Of course, I used a suitable 5V diode for USB 5V. I also added a common-mode choke for USB D+/-, but that seems to be optional. Additionally, I use the try-catch lib to automatically restart the MCU when an exception occurs.

All other interfaces, such as I2C, are also ESD protected, but there I have not used diodes with surge protection.
Ross
XCore Expert
Posts: 966
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

dsteinwe wrote: Tue Aug 02, 2022 10:42 am In the meantime, I have collected some new information. During my tests, I got the following error message only once:

xrun: Program received signal ET_ILLEGAL_RESOURCE, Resource exception.
[Switching to tile[1] core[4] (dual issue)]
Pid_Sof () at ./included/XUD_Token_SOF.S:12

12 ./included/XUD_Token_SOF.S: No such file or directory.
in ./included/XUD_Token_SOF.S
Current language: auto; currently asm

In all other cases, the programming interface also seems to die. Therefore, I have not (yet) had a chance to dig deeper into the cause or to reproduce the exception. I have added an exception handler that resets the device. Now I can see in the operating system log that the USB device is reset during ESD tests and no longer hangs as before. That is an improvement. Still, I would like to understand the possible causes of the exceptions during the ESD tests. Any idea is welcome! I would like to avoid these exceptions.

A full dump from the command xrun --dumpstate <binary_name>.xe would be good. Thanks
dsteinwe
XCore Addict
Posts: 144
Joined: Wed Jun 29, 2016 8:59 am

Post by dsteinwe »

I have made many attempts to reproduce the exception, unfortunately without success. If I manage to trigger the exception again at some point, I'll get in touch.
fabriceo
XCore Addict
Posts: 183
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

Hmm,
this morning I also got this error message:
Pid_Sof () at ./included/XUD_Token_SOF.S:12
using lib_XUD 2.2.4.
This happened just as I started playing a stream.
Of course, I have a heavily modified application (based on 6.15.2), so there is context that I need to investigate further.
Still, I have the feeling this is related to the newer version of the XUD library, as my application is 3 years old and I didn't notice this before.
It has nothing to do with ESD in this case.

I'll provide more info later, as I'm able to reproduce it.
fabriceo
XCore Addict
Posts: 183
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

From an iMac, I get another error:

xrun: Program received signal ET_LOAD_STORE, Memory access exception.
      [Switching to tile[1] core[6]]
      XUD_TokenOut_Handshake () at ./included/XUD_Token_Out_DI.S:88

      88	./included/XUD_Token_Out_DI.S: No such file or directory.
      	in ./included/XUD_Token_Out_DI.S
or

OKT : audio stream started
xrun: Program received signal ET_LOAD_STORE, Memory access exception.
      [Switching to tile[1] core[6]]
      XUD_IN_TxHandshake () at ./included/XUD_Token_In_DI.S:36

      36	./included/XUD_Token_In_DI.S: No such file or directory.
      	in ./included/XUD_Token_In_DI.S

Here is the dump for both errors. It seems there is a stack traceability issue in XUD.
I cannot yet give more details on the context or on when this happens.
fabriceo
XCore Addict
Posts: 183
Joined: Mon Jan 08, 2018 4:14 pm

Post by fabriceo »

Well, these errors happen when I start sending a stream to the device at a given USB host fs, which is confirmed to the host via the FB_Endpoint (forced), while the I2S audio task is configured for another fs because an SPDIF receiver chip provides the real mclk...
Obviously the buffering, xud_setready and other mechanisms are jeopardized, and this is not normal operation.
So I need to sort this out myself before asking what's wrong in XUD :) . Just sharing so as not to feel lonesome.
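
To put a rough number on why such a rate mismatch upsets the buffering: if the host delivers samples at one rate and the I2S side consumes them at another, the FIFO between them fills or drains at the difference of the two rates. A back-of-the-envelope Python sketch; the rates and the FIFO size are purely illustrative assumptions, not values from the setup described above, and `seconds_until_fifo_fault` is a made-up name:

```python
def seconds_until_fifo_fault(host_fs: int, i2s_fs: int,
                             fifo_samples: int, start_fill: int) -> float:
    """Time until a FIFO overflows or underflows when producer and consumer rates differ.

    host_fs:      rate (samples/s) at which the USB host delivers samples
    i2s_fs:       rate (samples/s) at which the I2S task consumes samples
    fifo_samples: FIFO capacity in samples per channel
    start_fill:   fill level in samples when streaming starts
    """
    drift = host_fs - i2s_fs  # net samples gained (+) or lost (-) per second
    if drift == 0:
        return float("inf")  # rates match, the fill level stays constant
    if drift > 0:
        return (fifo_samples - start_fill) / drift  # time until overflow
    return start_fill / -drift  # time until underflow


if __name__ == "__main__":
    # Hypothetical numbers: host streams at 48 kHz, I2S is clocked for 44.1 kHz,
    # and a 1024-sample FIFO starts half full.
    print(seconds_until_fifo_fault(48_000, 44_100, 1024, 512))
```

With these made-up numbers the FIFO overflows after roughly 0.13 s, which would fit errors that show up right when the stream starts.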