USB and internal xlinks failing over time

Machdisk
Member++
Posts: 20
Joined: Wed Aug 01, 2018 11:17 am


Post by Machdisk »

Hello All, I'm struggling a bit with a multichannel audio design implemented on an XU224-512-FB374. This is the first XMOS design I've done personally.

It's an odd one and I've been digging around for ages trying to diagnose it. The issue goes like this.

1) Build up a new prototype. Check all power rails, clocks and sequencing. All looks good.
2) Plug it in and program it. Everything (usually) works. I program with JTAG, USB connects to PCB and software is running. Channels stream, all is good. I can run it for hours no problems.
3) Power cycle. There is about a 1 in 5 chance that when I load the software again, either USB fails enumeration, or the software runs and USB connects but the software hangs somewhere it didn't before. Investigation shows that a data transfer between tiles fails: the link seems impossible to use in one direction even though the other direction works fine. Functions just sit there waiting for data that never comes.
4) Repeated power cycles will now very rarely result in a fully functional system. Whichever symptom has shown up will be permanent.
5) Increasing the value of the pulls on X2D04, X2D05, X2D06 and X2D07 from 2k2 to 12k brought back a couple of boards for a day or so, but then they failed again and now pretty much never work. Fitting 12k pull resistors from the start does not make much difference.
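
With a fault that only shows up on about 1 power cycle in 5, a few clean boots after a change (such as the pull resistor swap in step 5) prove very little. The sketch below does the arithmetic, assuming each power cycle fails independently with a fixed probability. That is an assumption, and steps 4 and 5 suggest it stops holding once a board has degraded; the function names are illustrative only.

```python
import math

def clean_run_probability(p_fail: float, cycles: int) -> float:
    """Chance of seeing zero failures in `cycles` independent power cycles."""
    return (1.0 - p_fail) ** cycles

def cycles_for_confidence(p_fail: float, confidence: float) -> int:
    """Smallest run of clean cycles that an unfixed board would survive
    with probability no greater than (1 - confidence)."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))

P_FAIL = 0.2  # "about a 1 in 5 chance" from step 3

five_clean = clean_run_probability(P_FAIL, 5)
needed_95 = cycles_for_confidence(P_FAIL, 0.95)

print(f"P(5 clean cycles on an unfixed board) = {five_clean:.3f}")
print(f"Clean cycles needed for 95% confidence = {needed_95}")
```

Under these assumptions an unfixed board still gets through five clean cycles roughly a third of the time, so a change needs well over a dozen consecutive clean power cycles before it can be called an improvement.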

I had a bit of back and forth with John from XMOS support and he had me check the following:

We pass all checks on the schematic/layout checklist points. We are using the latest datasheet.

We are using different power supplies from those in the reference design. Levels are within the recommended operating ones at all times and ramp as specified in the board integration section. Noise is about 15mV on 1V0, 30mV on 3v3.

Regarding the USB VBUS advisory: we currently have the circuit as shown in the datasheet for a self-powered unit, with the 10kOhm series resistor and a 4.7uF cap to ground at the connector.

We have not got a choke fitted to the USB. Only some TVS diodes.

There are some signals on layers beneath D+ and D- but they all cross at right angles and don’t appear to couple into the D+ and D- lines in any measurable fashion.

D+ and D- are trace matched and short.

I do not get the image xcore 0 not enabled issue.

The Xmos boots OK and runs code every time on both tiles. Initially when plugging a new board in everything seems fine. Shortly after however the USB starts to not complete enumeration or the tiles stop exchanging data so code hangs where it’s waiting for that data.

Any ideas anybody?

Ed


CousinItt
Respected Member
Posts: 360
Joined: Wed May 31, 2017 6:55 pm

Post by CousinItt »

Hi Ed,

this is a bit of a long shot, and I don't have any experience with the XU parts, but I've noticed that there seem to be at least two flavours of power sequencing guidelines in the XL/XU/XE data sheets. Some require both VDDIO/OTP_VCC to come up within 50ms of VDD, others require VDDIO and OTP_VCC to reach their target values (presumably meaning arrive within spec) before VDD reaches 0.4V. It's possible for a sequence to satisfy both guidelines, so it might be worth a try.

Even longer shot: the PLL supply filter recommendations also vary - some say 2R2 and others 4R7. If you're using the lower value an increase in R or C might help, providing the voltage is still within spec.
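
To get a feel for what the resistor change does, the PLL supply filter can be treated as a first-order RC low-pass. This is a rough sketch only: the thread does not give the capacitor value, so the 4.7 uF below is a placeholder assumption, as is the 1 MHz ripple frequency, and capacitor ESR/ESL and the PLL's own load current are ignored. Substitute the values from your own schematic.

```python
import math

def rc_corner_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB corner frequency of a series-R, shunt-C low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def attenuation_db(f_hz: float, r_ohms: float, c_farads: float) -> float:
    """Gain in dB of an ideal first-order RC low-pass at f_hz (negative = attenuated)."""
    ratio = f_hz / rc_corner_hz(r_ohms, c_farads)
    return -10.0 * math.log10(1.0 + ratio ** 2)

C_FILTER = 4.7e-6    # ASSUMED capacitor value, not taken from the thread
F_RIPPLE = 1.0e6     # ASSUMED ripple frequency (e.g. a DC-DC switching frequency)

for r in (2.2, 4.7):  # the two resistor values mentioned above
    fc = rc_corner_hz(r, C_FILTER)
    att = attenuation_db(F_RIPPLE, r, C_FILTER)
    print(f"R = {r} ohm: corner = {fc:.0f} Hz, gain at 1 MHz = {att:.1f} dB")
```

Going from 2R2 to 4R7 lowers the corner frequency by the ratio of the resistors and adds a few dB of attenuation well above the corner, at the cost of a larger DC drop across the resistor, which is why the supply still has to be checked against spec.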

I hope you can solve it. Please post the fix here when you've identified it.
mon2
XCore Legend
Posts: 1913
Joined: Thu Jun 10, 2010 11:43 am

Post by mon2 »

Hi Ed, can you post the relevant and partial schematic of your design for a review? Screen grabs of the PCB layout where the USB traces can be seen would also be helpful.

1) Ample current on each of the power supply rails?

2) power rail sequencing is important as defined in the XMOS datasheets

3) details of the reset supervisor? What if you manually apply a reset at the reset pin (rather than the power supply reset)? Is the unit stable upon every manual reset?

4) How many layers is your PCB?

5) Impedance controlled traces on the USB traces?

6) ESD diodes compliant for use with USB 2.0 High Speed? (ie. low capacitance)

7) PCB layout without stubs?

Sounds like a power cycle root issue that should be reviewed. Most reset supervisors have a local loading cap to adjust the delay timing. You could attempt to increase the delay and see if that impacts the results.


Update of additional questions:
"4) Repeated power cycles will now very rarely result in a fully functional system. Whichever symptom has shown up will be permanent."
That is not good. So if the widget dies after xx hours of use then some spec is being violated after xx hours.

On your PCB layout - how many layers? FR4 laminate?

Using an IR thermometer - can you check the temp of the XMOS CPU? Is it normal or out of range? Perhaps after xx hours of operation, the device is over heating and resulting in the failures?

If you are observing that repeated use of the USB interface and/or cable to the host PC is leading to this failure then some transient (ESD or in-rush current) could be the fault. The best solution for such events is to use a proper USB Load Switch which offers even reverse bias protection where the outside world does not power your powered off device (if you are self powered). Inductive kick back is a serious event (length of your cabling causing a voltage rise on VBUS and nuking at least the USB interface).

On the links - what are the details of the devices mated with the links? This is a multi-XMOS CPU design? Can your JTAG (XTAG) tool detect all the XMOS devices in the chain? Can you still perform code runs on the other XMOS devices or do they all fail after xx hours?

How far are these links between devices?

Raw links or buffered with some LVDS transceiver(s)?
"The Xmos boots OK and runs code every time on both tiles."
Even AFTER the failure(s)? Or is it then dead to the world?

Post by Machdisk »

Thanks for responding guys. In order:

PLL supply filter is using the 4R7.

Both power sequencing targets are met.

More than enough current is available on every rail (4 A).

Reset is pretty much per the reference design: an NC7WZ07 that resets on an xsys_reset or a power-good reset. The power-good reset comes up 40ms or so after VDD finishes ramping. And no, it doesn't seem any better with a manual reset.

4 layers, solid ground plane, power supply noise on the VDD and VDDIO vias on the bottom of the board is about 15mV on VDD and 30mV on VDDIO.

Yes, impedance-controlled USB traces, and when USB is working it works reliably. If the unit boots and runs code properly it's pretty bulletproof: it'll go for hours without a dropout. The traces are about 43mm long, very closely length matched and impedance controlled. They have an unbroken ground plane directly beneath them, and while there are signals crossing beneath the ground plane at right angles, there does not appear to be any appreciable crosstalk into the USB lines (running the calculations, crosstalk from those lines at the appropriate edge rates should be less than 30mV even with some pretty pessimistic assumptions).

Very low capacitance ESD diodes on USB: USBLC6-2.

Yeah no stubs on the layout.

Sometimes a board dies the first time you power it up. One of them lasted a few weeks.

4 layers, FR4 laminate.

Temperature is fine, only barely warm to the touch. (although I have seen failures where it has gotten incredibly hot but I've usually been doing some fairly violent poking around when I've caused that) It sometimes fails on the first boot and it makes no difference what temperature it is or how long it has been running.

The failure can occur without the USB even plugged in so I don't believe it is the cable. I usually leave the cable plugged in as I reset things so it's not a transient from plugging and the cable is fairly short.

Sorry, by the links I meant the internal bidirectional channel between tiles. There is only one XMOS chip on the board. It fails to pass data from one tile to the next along a channel in one direction but not in the other.

"Even AFTER the failure(s)? Or is it then dead to the world?"

Even after the failures. The chip will still power up, be detected on the JTAG chain and run code, but one of three things happens: it stalls when it tries to send data over the defunct channel; or it runs code OK but USB fails to enumerate (the device is detected and passes its device ID without issue, but enumeration doesn't complete); or both. On the occasions the board boots properly it runs fine, so it's not software. Using the debugger I can see all the cores are still running code, but the main program hangs waiting for the communication to complete.
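
One way to turn the silent hang into a report is to probe each direction separately with a bounded wait. The block below is only a host-side Python model of that test logic, not xCORE code: `OneWayLink` and `probe` are hypothetical names, and the "dead" direction is simulated by dropping the word. On the device the same idea needs a timed wait around the channel input (in XC this is usually a `select` with a timer case alongside the channel case).

```python
import queue

class OneWayLink:
    """Hypothetical stand-in for one direction of a tile-to-tile channel."""

    def __init__(self, healthy: bool):
        self.healthy = healthy
        self._fifo = queue.Queue()

    def send(self, word: int) -> None:
        # A failed direction is modelled as silently losing the word.
        if self.healthy:
            self._fifo.put(word)

    def receive(self, timeout_s: float) -> int:
        # Raises queue.Empty if nothing arrives within timeout_s.
        return self._fifo.get(timeout=timeout_s)

def probe(link: OneWayLink, token: int = 0xA5, timeout_s: float = 0.05) -> bool:
    """Send one token and report whether it arrived before the timeout."""
    link.send(token)
    try:
        return link.receive(timeout_s) == token
    except queue.Empty:
        return False

# Simulate the symptom described above: one direction works, the other does not.
links = {
    "tile0->tile1": OneWayLink(healthy=True),
    "tile1->tile0": OneWayLink(healthy=False),
}
report = {name: probe(link) for name, link in links.items()}
print(report)
```

Running a probe like this at start-up, before the audio and USB tasks are launched, would record which direction has failed on a given boot instead of leaving a core blocked on the input.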

Usually the problem arises after a power cycle, but on at least one occasion a board has failed between consecutive xruns without the hardware having been touched at all. That was the board that ran for a few weeks. Interestingly, it died a day after we removed a few unnecessary pull-ups on some input pins, and at least a couple of boards recovered for a little while after we adjusted pull-ups on the same 32-bit port. Why this would be the case I have no idea.

John from XMOS has looked at the entire schematic and doesn't see any intrinsic problems with the connectivity.

Really appreciate the help guys.
Attachments: USB_layout.JPG (415.78 KiB), USB.JPG (109.36 KiB), RESET.JPG (99.76 KiB)

Post by mon2 »

Ok. Scary.

What are the full details of the power supply used by this design? Is it a commercially built power adapter / source, or your own design?

If the PCB is getting nuked even without the USB connector being used then the event is not cable related for that case.

This is a good thread to review on the effects of capacitance with respect to VBUS spikes, although the ST ESD device should in theory be shunting the VBUS rail. No direct experience with this component, as we use Socay (same footprint as Littelfuse, etc.) out of Shenzhen, CN.

https://www.xcore.com/viewtopic.php?f=37&t=5808

Proper ESD handling by the PCBA shop? The component itself does not have any assembly faults UNDER the chip? (verified by X-RAY)?
Caleb
Experienced Member
Posts: 82
Joined: Thu Apr 04, 2013 10:14 pm

Post by Caleb »

I haven't used this particular IC but we have experience with 4 different XMOS devices including an XU. I'll first echo that power supply and reset sequencing violations can cause spooky behavior.
But we have had cases with some batches of assembled PCBs where we suspect too much heat was applied in reflow, or there was some other systematic damage to the IC during assembly (soldering the IC to the PCB). We had a case where 12 of 16 PCBs failed. Replacing the failing IC with one from the same tray of ICs resulted in all 12 working. We can't know what went wrong, but we can be certain that either the assemblers did something wrong the first time they assembled the IC to the PCB, or there was some type of damage caused during shipping...
I'm sure it's difficult to replace a BGA IC but it's probably worth the exercise. And be certain that the assembly requirements in the datasheet are met.

Post by Machdisk »

It's an off the shelf 24V power supply with a bunch of DC-DC regulators creating all the downstream rails.

24V ramps up when switched on over 40ms.

No spikes get onto the 1V or 3v3 rails.

3V3 ramps up over 4 ms, then 1V0 ramps 10 ms later, also over about 4 ms.
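
These ramp figures can be checked against the two power-sequencing flavours described earlier in the thread (VDDIO up within 50 ms of VDD, or VDDIO at its target before VDD reaches 0.4 V). The sketch below is a rough check only: it assumes linear ramps, reads "10 ms later" as 1V0 starting 10 ms after 3V3 starts, and takes both rules as paraphrased in this thread rather than from a specific datasheet revision. The `Ramp` class and function names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Ramp:
    start_ms: float     # time at which the rail starts to rise
    duration_ms: float  # time to go from 0 V to target (ASSUMED linear)
    target_v: float     # final rail voltage

    @property
    def done_ms(self) -> float:
        """Time at which the rail reaches its target voltage."""
        return self.start_ms + self.duration_ms

    def time_at(self, volts: float) -> float:
        """Time at which the linear ramp crosses `volts`."""
        return self.start_ms + self.duration_ms * volts / self.target_v

def within_50ms(vddio: Ramp, vdd: Ramp) -> bool:
    """Flavour 1 (as read here): both rails reach target within 50 ms of each other."""
    return abs(vddio.done_ms - vdd.done_ms) <= 50.0

def vddio_before_vdd(vddio: Ramp, vdd: Ramp) -> bool:
    """Flavour 2: VDDIO is at target before VDD reaches 0.4 V."""
    return vddio.done_ms <= vdd.time_at(0.4)

vddio = Ramp(start_ms=0.0, duration_ms=4.0, target_v=3.3)   # 3V3 rail
vdd = Ramp(start_ms=10.0, duration_ms=4.0, target_v=1.0)    # 1V0 rail

print("VDD crosses 0.4 V at", vdd.time_at(0.4), "ms")
print("within 50 ms:", within_50ms(vddio, vdd))
print("VDDIO up before VDD hits 0.4 V:", vddio_before_vdd(vddio, vdd))
```

With these numbers both checks pass, which is consistent with the earlier statement that both sequencing targets are met. Feeding in scope measurements instead of nominal figures would make the check more meaningful.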

Reset comes up about a full second later. It was earlier but I tried keeping it down for a while to see if that helped (it did not). Manually resetting does not help.

Power supply noise is absolutely within operating specs at all times. No unpleasant spikes.

Applying pressure to the top of the chip makes no difference whatsoever.

Three sets of boards assembled at different times have all done the same thing.

Replacing a BGA results in the new BGA failing shortly.

All boards were verified by xray before sending to us.

The ONLY thing that has made any difference to how long it takes to fail is pulling X2D04, 5, 6 and 7 up to 3V3 through a high-value pull-up resistor. Also, for some reason it seems to make no difference at all to boot behaviour what I pull X2D04, 5 and 7 to. The unit always boots in the same way; it just dies sooner if the resistors pull to ground or are below a certain value.
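
For scale, the worst-case DC current through each pull resistor is just Ohm's law with the full 3V3 across it, i.e. the pin held at the opposite rail. The sketch below compares the 2k2 and 12k values mentioned at the start of the thread. It says nothing about why the value affects time to failure; it only shows how different the two cases are electrically.

```python
VDDIO_V = 3.3  # I/O rail voltage used in this design

def pull_current_ma(r_ohms: float, volts: float = VDDIO_V) -> float:
    """Worst-case current in mA through a pull resistor with `volts` across it."""
    return 1000.0 * volts / r_ohms

for r_ohms in (2200.0, 12000.0):
    print(f"{r_ohms / 1000:.1f}k pull: {pull_current_ma(r_ohms):.3f} mA per pin")
```

That is about 1.5 mA per pin at 2k2 against under 0.3 mA at 12k, so four pins on the same port differ by several milliamps in total between the two configurations.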

Post by Machdisk »

Does anyone have a spec for the acceptable amount of ripple on the PLL_AVDD line? I know the cap is meant to be mounted close to the pin, but the datasheet doesn't define how close. Mine is within about 10mm, but I've noticed there is a little more ripple on the back of the via from the pin than at the cap.

Post by mon2 »

Review these articles to be sure you are not facing the same issue:

https://www.analog.com/media/en/technic ... /an88f.pdf

And

https://www.xcore.com/viewtopic.php?f=3 ... rge#p29561

No complaints of similar failures from XMOS with other developers?

Post by mon2 »

To move forward, it may be best to post a (partial) schematic and if possible BOM with details of the parts used in the assembly for at least the components dealing with the XMOS CPU.
"3V3 ramps up over 4 ms, then 1V0 ramps 10 ms later, also over about 4 ms."


Do you have control over the delay between the 3v3 rail and the 1v0 rail sequencing? Can you shorten the 10ms delay to something less? Interested to see this part of the circuit.