mpc8260 fcc enet transmit time out

Fri Jan 27 07:58:25 EST 2006

One day, hubert.loewenguth at thales-bm.com wrote:
> Everything works fine, but, if I do successive plugs/unplugs during 
> important data transfert, The driver enter into an infinite loop:
> ...
> Is there anybody having encounter the same problem?
> Is there anybody having done some test of numerous plug/unplug during
> important data transfert with a half-duplex connection on mpc8260?
> Is there anybody having an idea to help me ?

I have seen many symptoms involving the "NETDEV WATCHDOG: eth0: transmit
timed out" message, but so far I do not have a code fix for any of them.
:(

We (my employer) use an MPC8270 (mask 2K49M) and LXT971A PHY, with Linux
2.4.18.  In our case we do have an MII PHY interrupt.  Like you, when I
get the transmit timeout, it repeats forever.  But I do not see the
problem when doing successive plugs/unplugs of the Ethernet cable.
Instead, I get the timeout during normal board operation, without human
interaction.

At one customer site where our MPC8270 board is used, the customer uses
100 Mb half duplex Ethernet.  Over many weeks of normal operation, the
board experienced the transmit timeout several times.  On one occasion,
this was the output:

<-------- DUMP STARTS HERE ---------->
NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out.
 Ring data dump: cur_tx c01aa380 (full) cur_rx c01aa220.
 Tx @base c01aa308 :
9c00 0051 070f79a2
1c00 0056 070f7da2
1c00 0056 070f7ea2
1c00 0051 070f7ba2
1c80 003f 070f51c2
9c00 0056 070f50c2
9c00 0051 070f52c2
9c00 0056 070f53c2
9c00 0056 070f55c2
9c00 0051 070f54c2
dc00 0038 070f56c2
9c00 0056 070f57c2
9c00 0051 070f58c2
9c00 0056 070f59c2
9c00 0056 070f5ac2
bc00 0056 070f7ca2
 Rx @base c01aa208 :
9c00 0040 0046f000
<--- snip: BD status are all 9c00 -->
9c00 0040 00461000
9c00 0040 00461800
9c00 0040 00460000
bc00 0040 00460800
<---------- DUMP ENDS HERE ---------->

Note that one TxBD has the status 0x1c80, indicating late collision
(BD_ENET_TX_LC).  This is an unusual condition in Ethernet, but recovery
should still be possible.  Like you, I suspect errata CPM 119, but I
have not tried the patch yet.  (Development schedules and all that
jazz.)
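
For anyone else reading these ring dumps, here is a rough sketch of a
helper that turns a TxBD status word into flag names.  The bit values
are the BD_ENET_TX_* definitions as I remember them from the 2.4 CPM2
header, so treat them as an assumption and check them against your own
tree.  The names tx_bd_decode and tx_bd_has_error are made up for this
example; they are not in fcc_enet.c.

```c
#include <stdio.h>
#include <stddef.h>

/* TxBD status bits (assumed values; verify against your kernel headers). */
struct tx_flag {
	unsigned short mask;
	const char *name;
};

static const struct tx_flag tx_flags[] = {
	{ 0x8000, "READY" }, { 0x4000, "PAD"  }, { 0x2000, "WRAP" },
	{ 0x1000, "INTR"  }, { 0x0800, "LAST" }, { 0x0400, "TC"   },
	{ 0x0200, "DEF"   }, { 0x0100, "HB"   }, { 0x0080, "LC"   },
	{ 0x0040, "RL"    }, { 0x0002, "UN"   }, { 0x0001, "CSL"  },
};

/* Error bits: heartbeat, late collision, retry limit, underrun, carrier lost. */
#define TX_BD_ERROR_MASK (0x0100 | 0x0080 | 0x0040 | 0x0002 | 0x0001)

/* Nonzero if any transmit error bit is set in the status word. */
static int tx_bd_has_error(unsigned short status)
{
	return (status & TX_BD_ERROR_MASK) != 0;
}

/* Write the space-separated names of all set flags into buf; returns buf. */
static char *tx_bd_decode(unsigned short status, char *buf, size_t len)
{
	size_t i, used = 0;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(tx_flags) / sizeof(tx_flags[0]); i++) {
		int n;

		if (!(status & tx_flags[i].mask))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", tx_flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated; stop here */
		used += (size_t)n;
	}
	return buf;
}
```

With those values, tx_bd_decode(0x1c80, buf, sizeof(buf)) gives
"INTR LAST TC LC" and tx_bd_has_error(0x1c80) is nonzero, while the
0x9c00 entries decode to "READY INTR LAST TC" with no error bits set.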

As a workaround, we placed a 10/100 Mb hub between the board and the
customer's network, which negotiated the PHY up to 100 Mb full duplex.
The transmit timeout problem has not been seen since (to the best of my
knowledge.)

Back in the lab I have been able to reproduce the transmit timeout on a
100 Mb full duplex network.  Like you, I added printk output where
fcc_enet_interrupt tests each BD_ENET_TX_* flag.  In one case, I saw
this:

<-------- DUMP STARTS HERE ---------->
eth0: BDP=c01aa370: Carrier lost
eth0: BDP=c01aa370: Carrier lost
eth0: BDP=c01aa330: Carrier lost
eth0: BDP=c01aa360: Carrier lost
eth0: BDP=c01aa348: Carrier lost
eth0: BDP=c01aa310: Carrier lost
eth0: BDP=c01aa318: Carrier lost
<---- Carrier lost repeats 61 more times, random BDP ---->
eth0: BDP=c01aa348: Underrun
eth0: Restarting transmitter!!!

NETDEV WATCHDOG: eth0: transmit timed out
eth0: transmit timed out.
<-------- DUMP ENDS HERE ---------->

The Underrun message means TxBD status bit BD_ENET_TX_UN (0x0002) was
set.  The last Tx ring data dump in your post shows the same thing.
That scares me, mainly because I don't know what it means.  Does it mean
the SDMA transfer didn't end on time?  I dunno.  And what the heck is
carrier lost during TX in full duplex mode?  It makes sense for half
duplex mode like your situation, but I can't make sense of it for full
duplex.  Further, the underrun case has only happened once; in most
other cases, I get a transmit timeout with absolutely no TxBD error bits
whatsoever, and no indication that a TX restart was even attempted.
That's even scarier.  I also did try repeated plug/unplug of Ethernet
during peak normal operation (probably 5-10 Mb traffic) on the 100 Mb
full duplex network, but after 11 successive plugs I did not see any
timeouts.

I'm starting to wonder if I have a cache coherency problem.  The buffer
descriptors are in main RAM and the data cache is turned on...  It's just
a thought I picked up reading some prior posts that I can't rightly
recall.

I noted that the MPC8280 manual (online from Freescale) does now detail
the transmitter recovery procedure (section 30.10.1 FCC Transmit
Errors), and it's not nearly as simple as what fcc_enet.c implements in
any kernel version.  Despite CPM37, they don't toggle GFMR[ENT] in
combination with the RESTART_TX command.  Also, in 30.12.1 FCC
Transmitter Full Sequence, a command (either RESTART_TX or INIT_TRX)
must be issued after GFMR[ENT] is cleared but _before_ it is set.  You
might try changing fcc_enet_interrupt to do this:

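	    if (must_restart) {
		volatile cpm8260_t *cp;

		/* Disable the transmitter before issuing the command. */
		cep->fccp->fcc_gfmr &= ~FCC_GFMR_ENT;

		/* Issue RESTART_TX and wait for the CP to accept it. */
		cp = cpmp;
		cp->cp_cpcr =
		    mk_cr_cmd(cep->fip->fc_cpmpage, cep->fip->fc_cpmblock,
				0x0c, CPM_CR_RESTART_TX) | CPM_CR_FLG;
		while (cp->cp_cpcr & CPM_CR_FLG)
			;

		/* Re-enable the transmitter only after the command is done. */
		cep->fccp->fcc_gfmr |= FCC_GFMR_ENT;
	    }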

I've not been able to work on the problem for some time (development
schedules and all that jazz), but I'll post my solution if I find one.

-Dave
