Conference turris::digital_unix

Title: DIGITAL UNIX (FORMERLY KNOWN AS DEC OSF/1)
Notice: Welcome to the Digital UNIX Conference
Moderator: SMURF::DENHAM
Created: Thu Mar 16 1995
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 10068
Total number of notes: 35879

8760.0. "Token ring: ring recovery messages" by INDYX::ram (Ram Rao, PBPGINFWMY) Fri Feb 07 1997 15:59

I have a customer running V4.0A on a 4100 with a PCI Token Ring Adapter
PBXNP-AA.  Their kern.log file is being flooded with messages of the
following type:
	test4100: vmunix: tra0: ring status: ring recovery
(Their machine is named test4100).  Once in a while, connections to the
machine are dropped.  Other notes in this Notesfile seem to suggest
that there are some token-ring patches available, but I have not found
them in the public areas on oskits or guru.

Customer is willing to upgrade to V4.0B if this will solve the problem
or a V4.0B patch exists.  (The V4.0B patches seem to be more up-to-date
than V4.0A).

Thanks,

Ram
T.R  Title  User  Personal Name  Date  Lines
8760.1  Get the patch  SMURF::GILLUM  Kirt Gillum  Mon Feb 10 1997 13:20  10
    
    There is a patch for running on the 4100 (a TI errata implementation). 
    Contact Bob Spear (spear@zk3.dec.com) for a pointer to the patch.
    
    Typically "ring recovery" occurs every time a node enters or exits the
    ring.  Also, if you look at the counters, you'll probably see a lot of
    beaconing (indicative of a bad connection).
    
    However, start with the latest patched driver.
    
8760.2  still errors after patch  DYOSW5::WILDER  Does virtual reality get swapped?  Wed Mar 05 1997 10:09  17
    I am also working with the customer listed in the base note. 
    
    The customer is now at V4.0B. We have applied the patch provided by
    Spear. They are STILL getting a lot of "ring recovery" errors. After
    all this, the system crashed after running for about 9 days. The
    crash-data showed numerous ring recovery errors at the time of the
    crash.
    
    To our knowledge, nodes are not entering/exiting the ring anywhere NEAR
    as often as the number of ring recoveries they get.
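
One way to put a number on that comparison is to count the messages per hour and set the result against the ring insertions the customer knows about. The following is only a minimal sketch: it assumes the message text quoted in the base note, and it assumes kern.log lines may start with a syslog-style timestamp such as "Mar  5 10:09:41", which this thread does not confirm. The function name and the "unknown" bucket are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed line shape: an optional syslog-style timestamp, then any host/tag
# text, then the message quoted in the base note for interface traN.
PATTERN = re.compile(
    r"^(?:(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}):\d{2}:\d{2}\s+)?"
    r".*?\b(?P<ifname>tra\d+): ring status: ring recovery\b"
)


def count_ring_recoveries(lines):
    """Count ring recovery messages per (hour bucket, interface)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if not match:
            continue
        stamp = match.group("stamp")
        # Collapse runs of spaces so "Mar  5 10" and "Mar 5 10" share a bucket.
        hour = " ".join(stamp.split()) if stamp else "unknown"
        counts[(hour, match.group("ifname"))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_recoveries.py /path/to/kern.log
    with open(sys.argv[1], errors="replace") as log:
        for (hour, ifname), n in sorted(count_ring_recoveries(log).items()):
            print(f"{hour}  {ifname}  {n}")
```

If the hourly totals stay high while no station is joining or leaving, that points away from normal insertion traffic and toward the cable, MAU port, or monitor contention.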
    
    Any ideas? Should we ignore this? 
    
    Thanks,
    
    /jim
    
8760.3  SMURF::GILLUM  Kirt Gillum  Thu Mar 06 1997 18:06  8
    
    Ring recoveries are not a big deal...  Perhaps you should try a
    different cable/MAU port on the adapter.  Also, look at the counters
    and see if anything abnormal jumps out at you (netstat -s -Itra0).
    
    Crashing is a big deal.  Why are you crashing?  Is the token ring
    driver on the stack when the system crashes?
    
8760.4  Recovery errors concern them  NETRIX::"nancy@csc.cxo.dec.com"  Nancy Flavell  Fri Mar 21 1997 15:05  34
Kirt,

We appreciate your help with this problem.

Since installing the latest Digital UNIX patches (including the one
you mentioned from Bob Spear), the system has not crashed again.  It
seems that there is not even a core dump or crash-data left to
analyze on the system, which has been running for eight days since
applying the patches.

However, they still get so many of the "ring recovery" messages that
they are quite concerned the Digital equipment or software has some
kind of problem.

Their viewpoint is that there are only three systems on the token
ring (IBM and HP are the other two), none of which is being added to
or removed from the ring, and only one of which reports the ring
recoveries, namely the Alpha.

We are working to get the additional information you requested,
such as the netstat -s output.  Meanwhile, may we confirm whether the
sole cause of the "ring recovery" messages is supposed to be nodes
physically being removed from or added to the ring, please?

I will also mention that the hardware has been replaced, resulting
in no changes to the symptoms.

Nancy Flavell
Digital UNIX Network Support Specialist
Customer Support Center, Colorado



[Posted by WWW Notes gateway]
8760.5  SMURF::GILLUM  Kirt Gillum  Fri Mar 21 1997 17:04  17
    
    The reason that none of the other systems display the message is
    because they probably ignore it.  I think I'll change the driver to 
    ignore the event.  It would save customers from getting concerned.
    
    From the TI TMS380 Second Generation Token Ring User's Guide...
    
    RING_RECOVERY: This bit is set to one when the adapter observes claim
    token MAC frames on the ring.  The adapter may be transmitting the
    claim token frames.  This bit is reset when a ring purge frame is
    received or transmitted.
    
    So the ring monitor probably has detected an error condition and sent
    around a claim frame.  Typically this happens when a node joins or
    exits the ring, but can also occur for several other reasons (like not
    seeing the token within the token rotation time).
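
To illustrate what "ignore the event" could look like, here is a small Python model of ring status handling. It is a sketch under assumptions, not the actual tra driver code: the bit values are the ones commonly listed for the TMS380 ring status word and should be checked against the TI guide, and the function names and the `ignore` parameter are hypothetical.

```python
# Ring status bits as commonly listed for the TMS380 ring status word.
# These values are an assumption here; verify them against the TI guide.
RING_STATUS_BITS = {
    0x8000: "signal loss",
    0x4000: "hard error",
    0x2000: "soft error",
    0x1000: "transmit beacon",
    0x0800: "lobe wire fault",
    0x0400: "auto removal error",
    0x0100: "remove received",
    0x0080: "counter overflow",
    0x0040: "single station",
    0x0020: "ring recovery",
}
RING_RECOVERY = 0x0020


def decode_ring_status(status):
    """Return the names of all bits set in a ring status word."""
    return [name for bit, name in RING_STATUS_BITS.items() if status & bit]


def messages_to_log(ifname, status, ignore=RING_RECOVERY):
    """Build log lines for a status word, masking out the ignored bits."""
    reportable = status & ~ignore
    return [
        f"{ifname}: ring status: {name}"
        for name in decode_ring_status(reportable)
    ]
```

With the default mask, `messages_to_log("tra0", 0x0020)` produces no log lines, while a status word that also carries a soft error still reports the soft error. Passing `ignore=0` restores the current behaviour of logging every ring recovery.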