
Conference netcad::hub_mgnt

Title:DEChub/HUBwatch/PROBEwatch CONFERENCE
Notice:Firmware -2, Doc -3, Power -4, HW kits -5, firm load -6&7
Moderator:NETCAD::COLELLADT
Created:Wed Nov 13 1991
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:4455
Total number of notes:16761

3516.0. "some more errorlog from DS900EF..." by BERFS4::NORD () Tue May 07 1996 18:41


	Good morning, good evening, and everything in between.

	(Remember this topic? Oh yes, the one with the error log entries
	 in the DECswitch 900 is back!)

	I have some new ones for you:

	Berlin, hospital, GIGAswitch, some DM900s with DS900EFs, stackables, a
	lot of copper and fiber!

	I was sitting there with my laptop, giving one of the DM900s an IP
	address, when I saw that the DS900EF in slot 8 of this backplane
	was running selftest, grumble grumble. The selftest didn't finish, but
	the port 2 (AUI) LED was yellow, hmmm. (I had configured nothing on the
	backplane or the DS900EF.) Swapped it out and back in: the same! Swapped
	it out and back in again: the selftest LED was blinking: non-fatal
	error! OK. In the DM900 menu I chose 9 (redirect...) to slot 8: "No
	line-card...", hey! It was simply running the selftest (nobody knows
	why, but...). After the module came up, I was able to "redirect" to
	slot 8 and do a "dump error log". The output is below.

	The first set of entries is from before the non-fatal selftest error,
	the second set is from after it. What amazes me is that the module has
	firmware version 1.5.2 installed, yet the second set of entries claims
	V2.1. Should I be astonished, or is that normal?

	Yes, I'll swap this module, no problem, but I need some information
	about the entries, because this is already the second DS900EF out of 20
	(they are not configured yet).

	Many thanks,

	with regards

	Wolfgang Nord
	MCS at Berlin at Germany




DECswitch 900EF - slot 8
==============================================================================

                                DUMP ERROR LOG
                            Current Reset Count: 9223

==============================================================================


Entry #       = 3
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 9222
Timestamp     =    0    0    0
Write Count   = 2513
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2008 PC=03033234 Error Code=000023C0 ProcCsr=376D
Registers     = D0=00008344 D1=00000001 D2=00000006 D3=00000001
                D4=00000001 D5=00000787 D6=00000000 D7=0000FFFF
                A0=05033010 A1=0004BA04 A2=0004B8B4 A3=05010012
                A4=030020D8 A5=03020000 A6=0004B830 A7=0004B7C8
Dump another entry [Y]/N? 
Entry #       = 2
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 9221
Timestamp     =    0    0    0
Write Count   = 2513
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2008 PC=03033234 Error Code=000023C0 ProcCsr=3F6D
Registers     = D0=00008344 D1=00000001 D2=00000006 D3=00000001
                D4=00000001 D5=00000787 D6=00000000 D7=0000FFFF
                A0=05033010 A1=0004BA04 A2=0004B8B4 A3=05010012
                A4=030020D8 A5=03020000 A6=0004B830 A7=0004B7C8
Dump another entry [Y]/N? 
Entry #       = 1
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 9220
Timestamp     =    0    0    0
Write Count   = 2513
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2008 PC=03033234 Error Code=000023C0 ProcCsr=376D
Registers     = D0=0000C344 D1=00000001 D2=00000006 D3=00000001
                D4=00000000 D5=00000000 D6=00000001 D7=00000000
                A0=05033010 A1=0004BA04 A2=0004B8B4 A3=05010012
                A4=030020D8 A5=03020000 A6=0004B860 A7=0004B7F8
Dump another entry [Y]/N? 
Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 9219
Timestamp     =    0    0    F
Write Count   = 2513
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2008 PC=03044244 Error Code=000023C0 ProcCsr=376D
Registers     = D0=0000C344 D1=00000001 D2=00000006 D3=00000001
                D4=00000000 D5=00000000 D6=00000001 D7=00000000
                A0=05033010 A1=0004BA04 A2=0004B8B4 A3=05010012
                A4=030020D8 A5=03020000 A6=0004B860 A7=0004B834
Dump another entry [Y]/N? 

DECswitch 900EF - slot 8
==============================================================================

                                DUMP ERROR LOG
                            Current Reset Count: 9225

==============================================================================


Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 9224
Timestamp     =    0    0    0
Write Count   = 2515
FRU Mask      = 2
Test ID       = 962
Error Data    = SR=0006 PC=00000000 Error Code=00000000 ProcCsr=0000

                 0:00000006  1:00000000  2:00000000  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? y
Entry #       = 3
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 9224
Timestamp     =    0    0    0
Write Count   = 2514
FRU Mask      = 2
Test ID       = 964
Error Data    = SR=0002 PC=00000002 Error Code=00000004 ProcCsr=0000

                 0:00000002  1:00000002  2:00000004  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 



Entry #       = 2
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 9225
Timestamp     =    0    0    0
Write Count   = 2515
FRU Mask      = 2
Test ID       = 963
Error Data    = SR=0002 PC=00000043 Error Code=00000000 ProcCsr=0000

                 0:00000002  1:00000043  2:00000000  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 


Dump another entry [Y]/N? y
Entry #       = 1
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 9225
Timestamp     =    0    0  269
Write Count   = 2515
FRU Mask      = 0
Test ID       = DEAD
Error Data    = SR=2008 PC=03033234 Error Code=000023C0 ProcCsr=1F6D
Registers     = D0=0000C344 D1=00000001 D2=00000006 D3=00000001
                D4=00000000 D5=00000000 D6=00000001 D7=00000000
                A0=05033010 A1=0004BA04 A2=0004B8B4 A3=05010012
                A4=030020D8 A5=03020000 A6=0004B860 A7=0004B7F8
Dump another entry [Y]/N? 

Dump another entry [Y]/N? y
Entry #       = 0
Entry Status  = 0   [0=valid, 1=write_error, 2=invalid, 3=empty, 4=crc_error
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 9224
Timestamp     =    0    0    0
Write Count   = 2515
FRU Mask      = 2
Test ID       = 962
Error Data    = SR=0006 PC=00000000 Error Code=00000000 ProcCsr=0000

                 0:00000006  1:00000000  2:00000000  3:00000000
                 4:00000000  5:00000000  6:00000000  7:00000000
Dump another entry [Y]/N? 


3516.1. "Answers on your error log entries & other hints...." by NETCAD::BATTERSBY (Don't use time/words carelessly) Wed May 08 1996 14:18

    Wolfgang, I'll try to answer this note.
    
    First off, don't grumble about the selftest. Treat the selftest
    diagnostics as a tool for finding internal hardware problems before
    the user environment finds them, not as a hindrance. :-)
    Now, what the yellow port 2 state LED is telling you is that there is
    a non-fatal problem with one of the two internal modules within the
    900EF box, namely the I/O module. The Test IDs indicating this failure
    are in your "after" dump of the error log: 962, 964, and 963. These are
    internal diagnostic tests specifically telling you that there is a
    hardware problem related to Ethernet port #7. How you got more than 4
    error log entries is beyond me, as the error log table has only 4
    entries; after the 4th entry is written, the first one is overwritten,
    and so on.
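
    As a rough illustration of the wrap-around behaviour just described,
    here is a minimal Python sketch of a 4-slot circular log. The class
    name ErrorLog, its fields, and the modulo indexing are assumptions
    made for illustration only; this note does not describe how the
    DECswitch 900EF firmware actually lays out or indexes its log.

```python
class ErrorLog:
    """Hypothetical 4-slot circular error log (not the real firmware layout)."""

    def __init__(self, slots=4):
        self.slots = [None] * slots   # None marks an empty slot
        self.write_count = 0          # total number of entries ever written

    def write(self, entry):
        # The slot index wraps, so the 5th write lands on slot 0 again
        # and overwrites the oldest entry.
        index = self.write_count % len(self.slots)
        self.slots[index] = entry
        self.write_count += 1
        return index


log = ErrorLog()
used = [log.write(f"error {n}") for n in range(6)]
print(used)        # slot index chosen for each of six consecutive writes
print(log.slots)   # only the four most recent entries survive
```

    With a table like this, a dump can never show more than 4 distinct
    entries, which is why a longer listing most likely contains repeats.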
    
    The earlier error log entries you captured with the "DEAD" Test ID are
    errors reported by the operational firmware. The error code reported is
    23C0 which, as I recall, is probably related to packet memory parity
    errors. These are probably connected to the ultimate failure of port 7.
    What may have happened earlier is that whatever was intermittent (or
    partially failing) on port 7 corrupted packets being received and
    stored in packet memory. BTW, when individual ports fail on the switch
    products, the module is allowed to come up into operational mode so
    that the failure can be diagnosed through one of the other working
    ports.
    
    Now for the "Firmware Rev" discrepancy. The error log entries with
    "Test ID = DEAD" come from the operational firmware, so their
    "Firmware Rev = 1.5" field is the major rev of firmware rev 1.5.2.
    The error log entries with "Test ID = XXX" (XXX being some
    alphanumeric value) come from the internal diagnostics. Their
    "Firmware Rev = 2.1" field is the rev of the diagnostic dispatcher
    code used to report the failure.
    Notice too how the error log entries reported by the operational firmware
    have a more detailed structure than those from the internal diagnostics.
    This was done to make operational firmware error log entries easier
    to debug.
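
    To make the distinction concrete, the following Python sketch splits a
    captured dump into entries and labels each one by its Test ID, using
    the rule given above (DEAD means operational firmware, anything else
    means internal diagnostics). The sample text is abbreviated from the
    dump in .0; the function names parse_entries and source_of are
    hypothetical helpers, not part of any DEC tool.

```python
import re

# Abbreviated from the "after" dump in note .0 (only the fields used here).
SAMPLE = """\
Entry #       = 1
Entry Id      = 10
Firmware Rev  = 1.5
Reset Count   = 9225
Test ID       = DEAD
Dump another entry [Y]/N? y
Entry #       = 0
Entry Id      = 1
Firmware Rev  = 2.1
Reset Count   = 9224
Test ID       = 962
"""

FIELD = re.compile(
    r"^(Entry #|Entry Id|Firmware Rev|Reset Count|Test ID)\s*=\s*(\S+)"
)


def parse_entries(text):
    """Return one dict per entry; a new entry starts at each 'Entry #' line."""
    entries = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # prompts, separators, and register dumps are skipped
        key, value = match.groups()
        if key == "Entry #":
            entries.append({})
        if entries:
            entries[-1][key] = value
    return entries


def source_of(entry):
    """Classify an entry by the Test ID rule described in this reply."""
    if entry.get("Test ID") == "DEAD":
        return "operational firmware"
    return "internal diagnostics"


entries = parse_entries(SAMPLE)
for entry in entries:
    print(entry["Entry #"], entry["Firmware Rev"], source_of(entry))
```

    Read this way, the 1.5 and 2.1 values do not conflict: they are revs
    of two different pieces of code writing into the same log.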
    
    So to summarize, you have a DECswitch which apparently had some sort of
    partial failure on port 7 that likely caused the packet memory parity
    errors. Subsequently, something in the port 7 circuitry has failed
    completely enough to now be reported by the internal diagnostics.
    Replacing the module is the prudent thing to do.
    I hope this information is helpful to you.
    
    -Bob