
Conference ssag::ask_ssag

Title:Ask the Storage Architecture Group
Notice:Check out our web page at http://www-starch.shr.dec.com
Moderator:SSAG::TERZAN
Created:Wed Oct 15 1986
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6756
Total number of notes:25276

3922.0. "EF51 , VERY SLOW, ERRORLOG DECODING PLEASE" by BACHUS::GOUBERT () Wed Feb 22 1995 12:04

T.R  Title  User  Personal Name  Date  Lines
3922.1. "" by SSAG::LARY (Laughter & hope & a sock in the eye) Wed Feb 22 1995 21:10 (12 lines)
3922.2. "ef51 longwords" by NETRIX::"clark???@beano.uvo.dec.com" (DAVE CLARK) Tue Mar 11 1997 10:13 (79 lines)
I have the following longword information from an EF51 drive detected error
(MSLG$W_EVENT        00EB)
.
. CONTROLLER DEPENDENT INFORMATION

       LONGWORD 1.     00000000
                                       /..../
       LONGWORD 2.     00000000
                                       /..../
       LONGWORD 3.     00003700
                                       /.7../
ANAL/ERR DAVE.BIN/OUT=DAVE.DAT

	I have not been able to find a decoder for the EF51 error information in this
format.

	HISTRY or PARAMS>stat log do not provide any clues.

	The drive seems to log errors at the rate of two per day, and usually
fairly close together.

 DATE/TIME  1-MAR-1997 13:14:37.01                            SYS_TYPE
01370501
 DATE/TIME  1-MAR-1997 13:14:48.16                            SYS_TYPE
01370501
 DATE/TIME  2-MAR-1997 13:14:40.34                            SYS_TYPE
01370501
 DATE/TIME  2-MAR-1997 13:14:52.90                            SYS_TYPE
01370501
 DATE/TIME  3-MAR-1997 14:04:12.57                            SYS_TYPE
01370501
 DATE/TIME  3-MAR-1997 14:04:12.61                            SYS_TYPE
01370501
 DATE/TIME  4-MAR-1997 12:46:15.72                            SYS_TYPE
01370501
 DATE/TIME  4-MAR-1997 12:46:17.09                            SYS_TYPE
01370501
 DATE/TIME  5-MAR-1997 12:46:19.05                            SYS_TYPE
01370501
 DATE/TIME  5-MAR-1997 12:46:21.82                            SYS_TYPE
01370501
 DATE/TIME  6-MAR-1997 12:46:22.37                            SYS_TYPE
01370501
 DATE/TIME  6-MAR-1997 12:46:26.55                            SYS_TYPE
01370501
 DATE/TIME  7-MAR-1997 12:46:25.70                            SYS_TYPE
01370501
 DATE/TIME  7-MAR-1997 12:46:31.28                            SYS_TYPE
01370501
 DATE/TIME  8-MAR-1997 12:46:29.03                            SYS_TYPE
01370501
 DATE/TIME  8-MAR-1997 12:46:36.01                            SYS_TYPE
01370501
 DATE/TIME  9-MAR-1997 12:46:32.36                            SYS_TYPE
01370501
 DATE/TIME  9-MAR-1997 12:46:40.74                            SYS_TYPE
01370501
 DATE/TIME 10-MAR-1997 12:46:35.68                            SYS_TYPE
01370501
 DATE/TIME 10-MAR-1997 12:46:45.47                            SYS_TYPE
01370501
 DATE/TIME 11-MAR-1997 12:46:39.02                            SYS_TYPE
01370501
 DATE/TIME 11-MAR-1997 12:46:50.21                            SYS_TYPE
01370501


	I have checked the ESE50 service guide for longword decoding; however,
it only covers longwords #1 and #2 (both '00000000' in this case) and does
not deal with longword #3, which is the only one I have with any bits set.
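
	One way to see which bytes are set is to split each longword into its
four bytes. The following is only a minimal Python sketch: it assumes the hex
value is printed most-significant byte first while the /..../ column shows the
same bytes in VAX little-endian memory order. The helper name dump_longword is
made up for illustration.

```python
import struct

def dump_longword(value):
    """Return (bytes in memory order, ASCII column) for one 32-bit longword."""
    raw = struct.pack("<I", value)  # little-endian: low-order byte first
    # Printable ASCII is shown as-is, everything else as '.'
    ascii_col = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)
    return list(raw), ascii_col

# The three controller-dependent longwords quoted above.
for n, lw in enumerate((0x00000000, 0x00000000, 0x00003700), start=1):
    raw, ascii_col = dump_longword(lw)
    hex_bytes = " ".join(f"{b:02X}" for b in raw)
    print(f"LONGWORD {n}.  {lw:08X}  bytes: {hex_bytes}  /{ascii_col}/")
```

	Under that assumption longword 3 comes out as bytes 00 37 00 00, which
matches the /.7../ column in the dump, so the only non-zero byte is 37(x) in
byte 1.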


	Any help with either decoding the information, or a pointer to a
resource would be welcome.

				Regards...
				Dave Clark

[Posted by WWW Notes gateway]

3922.3. "you did not list the entire error log but,,," by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Tue Mar 11 1997 10:53 (173 lines)
If the 37 in those longwords is the DER code then 37 = Replace Old Battery.
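
As a sketch of that reading only: the Python below assumes the DER code sits in
bits 8-15 of longword 3 (the only non-zero byte in the dump), and its lookup
table holds just the one code named in this note. DER_CODES and
der_from_longword3 are hypothetical names, not part of any DEC tool.

```python
# Only the code confirmed in this note; other DER codes are not listed here.
DER_CODES = {0x37: "Replace Old Battery"}

def der_from_longword3(longword):
    """Extract the assumed DER byte (bits 8-15) and look up its meaning."""
    code = (longword >> 8) & 0xFF
    return code, DER_CODES.get(code, "unknown DER code")

code, meaning = der_from_longword3(0x00003700)
print(f"DER code {code:02X}(x): {meaning}")
```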

You may be running slow because ALL I/O is going to the retention device and NOT
to RAM.

I've attached a BLITZ I sent out about a year ago and resent recently to the
Field. 

Roger.

From:	SUBSYS::BABAGI::TOCSIN::TIMA_MGR "06-Feb-1997 1653"  6-FEB-1997 16:53:49.27
To:	BABAGI::PATENAUDE
CC:	
Subj:	GRAM: [TD 2021-A] Test/Replace Batteries - EF51R, EF52R, EF54R - BLITZ

Author                    : LINDA  WARREN
User type                 : DBA 
Location                  : USTIMA
Vaxmail address           : BSS::LWARREN        


Copyright (c) Digital Equipment Corporation 1996, 1997. All rights reserved.

NOTE:  This BLITZ supersedes TD 2021.

   +---------------------------+TM
   |   |   |   |   |   |   |   |
   | d | i | g | i | t | a | l |              TIME DEPENDENT BLITZ
   |   |   |   |   |   |   |   |
   +---------------------------+


   BLITZ TITLE: Testing and replacement of batteries in the EF51R, EF52R,
     	        and EF54R.                                  

   PRIORITY LEVEL: 2

   DATE: February 6, 1997
   TD #: 2021-A

   AUTHOR:	Roger Patenaude
   DTN:		237-3705
   EMAIL:       SUBSYS::PATENAUDE or patenaude@subsys.enet.dec.com
   DEPARTMENT:  Storage External Products, Continuation Engineering

   =================================================================

   PRODUCT NAME(S): EF51R, EF52R, and EF54R.                                  

   PRODUCT FAMILY(IES): {Check all that apply}

   Storage         _X_
   Systems/OS      ___
   Networks        ___
   PC/Peripherals  ___   {includes printers, monitors, etc.}
   Software Apps.  ___


   BLITZ TYPE: {Check all that apply}

   Maintenance Tip           _X_  {Info. will assist servicing the product}
   Service Action Requested  ___  {MCS is requested to perform an activity}


   IF SERVICE ACTION IS REQUESTED: (Check all that apply.)

   Labor Support Required     ___  {Requires MCS to provide service labor}
   Material Support Required  ___  {Requires MCS to provide material}


   Estimated time to complete activity (in hours):
   Will this require a change in the field's inventory:  Yes ___  No ___
   Will an FCO be associated with this advisory?  Yes ___  No ___


   DESCRIPTION OF SERVICE ACTIVITY REQUESTED (if applicable):


    **********************************************************************

   SYMPTOM:

      Customer can lose all data contained on Solid State Disks during
      power failure if the retention battery has failed.


   PROBLEM STATEMENT:

      The most common failure mode of NiCad batteries (such as those used
      in the EF5xR products) is that they test as having voltage and
      current while in fact holding next to no reserve. If left undetected
      in an EF5xR, this can leave the drive in a state that can only be
      recovered by reformatting the unit.

      For this reason the EF5xR family of devices has on-board battery
      test diagnostics, and as part of normal service procedure the
      batteries must be tested yearly and replaced every three years, or
      sooner if the diagnostics fail during a yearly test.


   SOLUTION:

   
      Replace any batteries that are 3 years old or fail annual battery
      tests.

      Inspection of the battery manufacture date, located on the battery
      label, is the only true way of finding out the age of the battery.

      Refer to section 4 of EK-EF5XX-UG for proper procedures to access
      and run the BATTST utility.
        
      NOTE: As per section 4 of EK-EF5XX-UG, you can see how many days the
      current battery has left before the 3 year replacement by looking at
      the PARAMS values BSS_MAXR and BSS_REPL. BSS_MAXR is the total
      number of days a battery can live before proactive replacement and
      BSS_REPL is the number of days left on the current battery. Once
      BSS_REPL reaches "0" the unit will issue an errorlog datagram with a
      DER code of 37(x) (Replace Old Battery) once per week.
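
      For planning, the two parameters translate directly into a battery
      age and a due date. The Python below is a small sketch only: it
      assumes the values were read by hand from PARAMS, the BSS_REPL value
      of 120 is invented for the example, and battery_status is a
      hypothetical helper name.

```python
from datetime import date, timedelta

def battery_status(bss_maxr, bss_repl, today):
    """Return (days in service, replacement due date, overdue flag)."""
    days_used = bss_maxr - bss_repl
    due = today + timedelta(days=bss_repl)
    overdue = bss_repl <= 0  # at 0 the drive starts logging DER 37(x)
    return days_used, due, overdue

# Illustrative values only: BSS_MAXR = 1095 (3 years in days), BSS_REPL = 120.
used, due, overdue = battery_status(1095, 120, date(1997, 3, 11))
print(f"in service {used} days, replace by {due}, overdue={overdue}")
```

      The manufacture date on the battery label remains the authoritative
      age; the counter only reflects the time since it was last reset.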

      Refer to section 7 of the EK-EZ5XX-UG for proper procedures to
      replace a battery pack (The EF5XX User Guide omits the actual
      replacement procedure).

      Note: Both of the manuals are available online at:
            SUBSYS::LCA:[SPECS.SOLID_STATE.EZXX] or,
            SUBSYS::LCA:[SPECS.SOLID_STATE.EFXX] or,
            TIMA TOOLS in .PDF and .PS format
                             
      Note: The replacement battery part number has recently been CHANGED! 

      	    OLD battery pack PN# 12-37620-01 
      	    NEW battery pack PN# 29-33445-01
                             
      These batteries may discharge during storage and at times when the
      unit's power is removed for more than a month. Upon initial receipt
      of a new EF5xR or after replacement of the batteries in an existing
      EF5xR, you may find the device is write protected due to insufficient
      battery charge-level for data retention. It is recommended that EF5xR
      devices be powered on for a minimum of four hours before operating.

      EF5xR batteries must be replaced every 3 years as part of normal
      service procedure. The battery is considered a "wearable" item and
      is NOT covered under warranty or field contract. Any replacement is
      the responsibility of the user and should be charged per call/time
      and material.
      
      Once the battery pack has been replaced on an EF5XX, you must reset
      the internal battery 3 year counter saved in the parameters of the
      device. BSS_REPL (as mentioned above) is a read only word, and to
      reset it to the factory default of 1095 (3 years in days) you must
      write a "1" into parameter BSS_REST (this may take a minute to be
      detected by the firmware). Then either power cycle the unit or set
      the bit BSS_UPNV (Update Non-volatile) to force it to take immediate
      effect.


   VERIFICATION:

      After the battery has had sufficient charge time, a successful pass
      of BATTST indicates a good battery.

      It is also a good idea to inspect the EF5X battery pack for visible
      signs of "leakage" whenever service is performed on the unit.

      
   LARS INFORMATION: 

                     *** DIGITAL INTERNAL USE ONLY ***

\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT

3922.4. "decevent's view of things..." by NETRIX::"clark@beano.uvo.dec.com" (dave clark) Tue Mar 11 1997 10:53 (94 lines)
I tried running the binary errorlog information past DECEVENT, but I'm not too
convinced by its interpretation of the third longword:

******************************** ENTRY    6 ********************************


Logging OS                        1. OpenVMS
System Architecture               1. VAX
OS version                           V5.5-2
Event sequence number         18435.
Timestamp of occurrence              03-MAR-1997 14:04:12
Time since reboot                    0 Day(s) 1:07:59
Host name                            MARS01

SID register              x14000006
System type register      x01370501  Unrecognized System Type
Unique CPU ID             x00000000
System Model                         VAX type not decoded yet

Entry type                      100. Logged Message


---- Device Profile ----
Unit                                 DISK2$DIA3
Product Name                         EF51 DSSI Solid State Disk

---- MSCP Logged Msg ----

Logged Message Type Code          1. Disk Message


Command Reference number  x00000000
Unit Number                       3.
MSCP Sequence number              0.
Logged Message Format             4. Small Disk Error
MSCP Flags                      x00  No MSCP Flags indicated

MSCP Unique Controller-ID x0000408332101779
MSCP Controller Model           105. EF5X
MSCP Controller Class             1. Mass Storage Controller class
Controller SW version           x3A
Controller HW version           x01
Unit SW version                 x3A
Unit HW version                 x01

MSCP SDE Event code           x00EB  Drive detected error.
Multiunit code                x0000
Cylinder                          0.
Volume Serial Number              0.
RF Disk DER Code                x00  Undefined DER Code
Servo Event Code                x00  No Servo Error.
Physical Sector                   0.
Head                              0.
Logical Block Number              0.
Bad Block Space left              0.
DDASP Write Fault Reg           x34
 Cancel

MSCP Unique Unit-ID       x0000408332101779
MSCP Unit Model                  51. EF5X
MSCP Unit Class                   2. Disk class - DEC Std 166 disk
Unit SW version                 x3A
Unit HW version                 x01

MSCP SDE Event code           x00EB  Drive detected error.
Multiunit code                x0000				}
Cylinder                          0.				}
Volume Serial Number              0.				} longword#1 ??
RF Disk DER Code                x00  Undefined DER Code		}
Servo Event Code                x00  No Servo Error.	
Physical Sector                   0.			
Head                              0.			
Logical Block Number              0.
Bad Block Space left              0.
DDASP Write Fault Reg           x37  Disable Write Gate Bit Set. } from lw#3 ??
                                     Wrt Lock Fault. Not properly Locked to
                                     Internal Ref Clock.
                                     Write Enabled Fault. Disable Write Gate
                                     Set During Write.
                                     Write Unsafe. Often Result Of Another
                                     Write Fault Condition.
                                     Sector Write Overrun. Attempt to Write
                                     Over Servo Burst.
Servo Status Reg              x0000
Phoenix Data Status Reg       x0000  Cmd Response:  State Machine Idle.



MSCP Unique Unit-ID       x0000408332101779
MSCP Unit Model                  51. EF5X
MSCP Unit Class                   2. Disk class - DEC Std 166 disk

[Posted by WWW Notes gateway]

3922.5. "Pre-emptive strike!" by KERNEL::CLARK (STRUGGLING AGAINST GRAVITY...) Tue Mar 11 1997 10:57 (5 lines)
    Roger...
    	Many thanks...you pre-empted my reply '.4' by seconds!
    
    			Regards...
    			Dave Clark

3922.6. "hmmm..." by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Tue Mar 11 1997 11:20 (5 lines)
You're welcome. That DECevent log looks strange. Can you copy a binary of that
error to BOT000::FIREWALL: so I can bit bust it manually?

roger.

3922.7. "File copied as requested" by KERNEL::CLARK (STRUGGLING AGAINST GRAVITY...) Thu Mar 13 1997 06:18 (8 lines)
    Roger...
    	The file EF51_ERRORS.BIN is now copied as requested. This is the
    cluster-merged binary errorlog for the device since 1st March this
    year.
    
    	Sorry about the delay...I had a day off yesterday (12th)
    
    				Dave

3922.8. "yup battery." by SUBSYS::VIDIOT::PATENAUDE (Ask your boss for ARRAY's...) Thu Mar 13 1997 10:53 (11 lines)
I did not have to break them down. Why? If you notice, the errors happen every
24 hours. 

The drive tests the battery status every 24 hours after being powered up; if it
fails, you get the error message every 24 hours.

Every 7 days after power on, the drive also tests the internal value of
BSS_REPL and, if it is 0, issues the same error packet, except 7 days apart.
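
The cadence can be checked from the DATE/TIME stamps posted in .2. This is a
rough Python sketch using six of those stamps. The grouping rule (entries less
than a minute apart count as one burst) and the function names are assumptions
made for illustration, and the month abbreviations are parsed with the default
C/English locale.

```python
from datetime import datetime, timedelta

# Six of the DATE/TIME stamps posted in .2
STAMPS = [
    "4-MAR-1997 12:46:15.72", "4-MAR-1997 12:46:17.09",
    "5-MAR-1997 12:46:19.05", "5-MAR-1997 12:46:21.82",
    "6-MAR-1997 12:46:22.37", "6-MAR-1997 12:46:26.55",
]

def parse(stamp):
    """Parse a VMS-style errorlog timestamp such as '4-MAR-1997 12:46:15.72'."""
    return datetime.strptime(stamp, "%d-%b-%Y %H:%M:%S.%f")

def burst_intervals(stamps, gap=timedelta(minutes=1)):
    """Collapse entries closer than `gap` into bursts; return hours between bursts."""
    times = sorted(parse(s) for s in stamps)
    starts = [times[0]]
    for prev, cur in zip(times, times[1:]):
        if cur - prev > gap:
            starts.append(cur)  # a new burst begins here
    return [round((b - a).total_seconds() / 3600, 2)
            for a, b in zip(starts, starts[1:])]

print(burst_intervals(STAMPS))
```

Intervals of about 24 hours point at the daily battery test; the weekly BSS_REPL
check would show up as gaps of about 168 hours.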

roger.

3922.9. "Action in hand." by KERNEL::CLARK (STRUGGLING AGAINST GRAVITY...) Tue Mar 18 1997 13:30 (4 lines)
    Roger...
    	Thanks for the feedback...an action plan has been implemented...
    
    				Dave