
Conference ssdevo::hsj40_product

Title: HSJ30/40 Product Conference
Moderator: SSDEVO::EDMONDS
Created: Tue Jul 13 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1264
Total number of notes: 4958

1247.0. "An answer this time please...Lastfail parameter decode" by KERNEL::CLARK (STRUGGLING AGAINST GRAVITY...) Thu May 08 1997 10:03

    This question has been asked many times before and so far as I can
    ascertain, has never been answered satisfactorily:-
    
    HSxxx controllers generate lastfail codes, which in some cases are
    accompanied by lastfail parameters.
    
    There does not appear to be any information published which enables the
    average FE to understand these parameters.
    
    	OK, I can understand where the parameter is a PC in a code stream,
    or an address of a faulting instruction, but where a FAULT TYPE/SUBTYPE
    number is provided, surely it must help the FE to understand what's
    really gone wrong if he can decode this information.
    
    	Not knowing this information is costing DIGITAL lots of '$' in
    unnecessary swaps of the wrong parts.
    
    	In the days of HSCs it was sometimes possible to identify a
    requestor and port from crash information, to isolate a faulty drive.
    
    	Is this information available in HSOF to enable FEs to pin down a
    port/drive which might be causing an HSJ crash?
    
    	Why am I raising this question again?....Because our old friend
    lastfail 01050104 for an HSJ40 running HSOF 2.5j has recurred.
    
    	Previous notes (152,226,239) suggest a selection of causes, which
    favoured a drive with bad metadata, and solutions which included testing
    every drive with DILX, and then re-initialising suspect drives. These
    notes, incidentally, were raised in 1993, for HSOF 1.1j!!!
    
    	In a fully populated SW500 array in a customer production
    environment, this approach is just not feasible. There has to be a more
    efficient way of dealing with this scenario.
    
    	Therefore, is there a number which can be provided, or which can be
    extracted, which when decoded, will identify a port/device/channel active at
    the time of the lastfail event? Maybe a parameter (n+1)?
    
    				Dave Clark
    				UK-CSC
T.R  Title  User  Personal Name  Date  Lines
1247.1. "Run FMU and see what it says" by SSDEVO::RMCLEAN Thu May 08 1997 13:56 (3 lines)
Actually there is a LOT of built-in documentation on these.  What you do
is run FMU on the controller and then do a 'sho last all full'.  This will tell
you what the error code is and give a lot of other info.
1247.2. by GIDDAY::HOBBS (Andy Hobbs. Sydney CSC. -730 5964) Sat May 17 1997 11:13 (9 lines)
    
     I think a 'describe' of the lastfail code from within FMU also
    gives a useful listing of the lastfail parameters which might
    help. I'm miles away from my nearest manual though and the modem
    connection I've got doesn't favour PDF-based alternatives.
    
     Check it out.
    
    Andy/.
1247.3. "Once more with feeling!!!" by KERNEL::CLARK (STRUGGLING AGAINST GRAVITY...) Tue May 27 1997 13:08 (23 lines)
    Re: .1
    Yes!
    
    It gives you the decode of the last fail code, but all it tells you for
    parameter (2) is that it's the "Fault type and subtype values".
    
    My question was :-
    
    What do these values indicate?
    
    Supplementary questions are now....
    
    Are they significant?
    Would they help an FE to make more sense of the problem?
    Would they save DIGITAL money?
    
    What does...  "parameter '2'   00020001" ...mean in English?
    
    Where is the decode for this number?
    
    Can we all have a copy of the list of values please?
    
    				Dave Clark
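
For anyone wanting to at least separate the two values while waiting for a real decode list, here is a minimal Python sketch. It ASSUMES the parameter is a 32-bit longword with the fault type in the upper 16 bits and the subtype in the lower 16 bits. That layout is a guess, not something confirmed in this thread or taken from HSOF documentation, and the helper name split_param is made up for illustration. It does not say what the values mean; only a published list could do that.

```python
from typing import Tuple


def split_param(word_hex: str) -> Tuple[int, int]:
    """Split a 32-bit hex longword into (upper 16 bits, lower 16 bits).

    ASSUMPTION: upper half = fault type, lower half = fault subtype.
    This field layout is hypothetical and not taken from HSOF documentation.
    """
    value = int(word_hex, 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit value: " + word_hex)
    return (value >> 16) & 0xFFFF, value & 0xFFFF


# Parameter (2) as quoted in the note above.
fault_type, fault_subtype = split_param("00020001")
print(f"type=0x{fault_type:04X} subtype=0x{fault_subtype:04X}")
```

With the value quoted above this prints type=0x0002 subtype=0x0001, which under the assumed layout would be fault type 2, subtype 1.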
1247.4. by KERNEL::CLARK (STRUGGLING AGAINST GRAVITY...) Mon Jun 02 1997 07:35 (3 lines)
    Did I say something wrong?
    
    				Dave