
Conference ssdevo::hsj40_product

Title: HSJ30/40 Product Conference
Moderator: SSDEVO::EDMONDS
Created: Tue Jul 13 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1264
Total number of notes: 4958

1242.0. "Batteries not failing under V3.1" by GEM::SHERGOLD (We are 100% sure; well almost!!) Fri Apr 25 1997 06:22

    In our lab we had an HSJ40 running V2.7-1 and it repeatedly reported
    that the battery had failed (We just haven't had the time to get it
    replaced - always next on the job list!!). However, one job I did do
    was to install the newly delivered V3.1 card. The strange thing was
    that the next morning the HSJ reported the battery as good with the
    added message of "Cache battery is now sufficiently charged". Can
    someone explain this? Have we been replacing batteries like mad because
    the old battery testing routine was bad (despite the patch)? As there
    is always some delay in getting all our customers up to V3.1, is there a
    better patch we can apply to V2.7 to facilitate the same response? OR
    (heaven forbid :-)  ) has the battery test been fudged to cope with all
    these failing batteries??
    
    Keith
T.R  Title  User  Personal Name  Date  Lines
1242.1  Does 3.1 really help?  PFSVAX::WUENSCHELL  Wed Apr 30 1997 12:15  7
    Keith;
    	We continue to fight battery failures in the field even with
    batteries dated 97.  Are you continuing to have good results with HSOF
    3.1?  If so, this may be a solution for some of our problem customers.
    
    By the way, are there any patches to 3.1 that you know of?  A note in
    this conference says there aren't, but then I saw a reference to 3.1-6.
1242.2  SSDEVO::THOMPSON  Paul Thompson, Colorado Springs  Wed Apr 30 1997 17:35  5
Do the 1997 batteries with which you are having problems have white labels?

If so, the manufacturing date of those batteries pre-dates 1997.  The date
on the white label shows the date that the battery was most recently
re-charged.
1242.3  White label has MAR 1997  PFSVAX::WUENSCHELL  Thu May 01 1997 11:57  6
    Yes, the batteries have a white label with Mar 1997 on it.  They also
    have MAR 97 stamped in black on the edge.
    Are you saying that these batteries may have problems?  How do we
    determine which 1997 batteries are good or bad?
    
    
1242.4  Batteries WITHOUT white labels  SSDEVO::THOMPSON  Paul Thompson, Colorado Springs  Thu May 01 1997 18:32  4
Batteries dated 1997 that do not carry that date on a white label on the face
of the battery are good.  Batteries with the date on a white label on the face
of the battery were originally manufactured in 1996 and are subject to the
problem caused by the vendor's manufacturing defect.
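
The rule given in .2 and .4 can be written down as a small check. This is only
a sketch of that rule as stated in this thread: the function name, its
arguments, and the "unknown" result are illustrative assumptions, and the
thread says nothing about batteries dated in other years.

```python
def classify_battery(label_year: int, date_on_white_label: bool) -> str:
    """Apply the white-label rule from replies .2 and .4 of this topic.

    label_year          -- year shown on the battery (hypothetical input)
    date_on_white_label -- True if that date is printed on a white label
                           on the face of the battery
    """
    if label_year != 1997:
        # Assumption: the thread only covers batteries dated 1997.
        return "unknown"
    if date_on_white_label:
        # Per .2 and .4: the white label shows the last re-charge date;
        # the battery itself was manufactured in 1996 and may be defective.
        return "suspect"
    # Per .4: dated 1997 without a white label carrying the date.
    return "good"


if __name__ == "__main__":
    print(classify_battery(1997, True))
    print(classify_battery(1997, False))
```

Note that a black MAR 97 stamp on the edge, as described in .3, does not change
the result here; by .4 the white label on the face is what matters.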
1242.5  GEM::SHERGOLD  We are 100% sure; well almost!!  Tue May 06 1997 10:28  3
    OK Guys what about an answer to .0??? Any takers?
    
    Keith
1242.6  Answers...  SSDEVO::FAVA  4 Yrs of Eng Sch & Never Saw a Train  Tue May 06 1997 16:07  59
	RE:  .5

	OK, I accept your challenge.

	I presume the questions in .0 that you would like answered are the 
	following:

>>
>>                                                  The strange thing was
>>    that the next morning the HSJ reported the battery as good with the
>>    added message of "Cache battery is now sufficiently charged". Can
>>    someone explain this? 
>>
	Yes, I can.  This entire battery problem has been extremely painful
	for everyone.  It has been caused by several problems, both hardware
	and software.

	Major changes were made to the battery diagnostic in V3.1 which 
	specifically corrected many of the software issues.  We know now that 
	the diagnostic in V2.7 and V3.0 was declaring many batteries "failed" 
	when there was no problem with them at all.  One of our tests here
	in the past few days was with a set of batteries which failed 
	consistently on V3.0 and passed consistently on V3.1.  These
	batteries have a date code of 10/94 and our testing shows that they 
	would still hold up a cache for 70 - 80 hours!!

	Keep in mind, however, some of the failures detected by the software 
	were true battery failures.  The big problem was to eliminate the 
	"false" failures while still detecting true failures.

>>
>>                          Have we been replacing batteries like mad because
>>    the old battery testing routine was bad (despite the patch)? 
>>
	As I mentioned above, some, but not all, of the problems were due 
	to the software falsely declaring some batteries bad.
>>
>>                                                                 As there
>>    is always some delay in getting all our customers up to V3.1, is there a
>>    better patch we can apply to V2.7 to facilitate the same response? 
>>
	NO.
>>
>>                                                                       OR
>>    (heaven forbid :-)  ) has the battery test been fudged to cope with all
>>    these failing batteries??
>>    
	I hope this suggestion was entirely facetious.  But if there is any 
	doubt, NO!!!, the test was NOT fudged simply to pass all batteries, 
	bad ones included.  Many MONTHS of effort by both hardware and 
	software people have been spent trying to resolve this serious
	customer satisfaction problem.  No one here has treated it lightly.
	The changes in V3.1 were a big step.  However, more work is going 
	on now.  This issue is still not closed to our satisfaction.

	Hope this helps.

						Tom Fava
						Colorado Springs
1242.7  Fair's fair!  GEM::SHERGOLD  We are 100% sure; well almost!!  Fri May 09 1997 13:10  11
    Tom,
    
    Thanks for the reply. Not the one I wanted but at least it is an honest
    one and we know where we are.
    
    Oh and by the way the last part was facetious but I didn't know the
    symbol for "tongue in cheek".  [ :-Q  maybe??]
    
    Regards
    	Keith
    
1242.8  Cache Battery Low Messages  BSS::BERGLING  Thu May 22 1997 13:15  11
    I have a new twist on this.
    
    We have installed 3.1 on about 26 HSJ40's. We are getting cache battery
    low notices. When we check the J later, like 8 hours later it says:
    
    "Cache battery is now sufficiently charged"
    
    Any explanation for this?
    
    Thanks,
    Vern Bergling
1242.9  Normal operation from what you describe  SSDEVO::RMCLEAN  Thu May 22 1997 13:49  4
Yup... When you get batteries, or if you have batteries that have been
supporting the cache, there is some chance that they have been discharged
somewhat.  Batteries sitting on the shelf or in an unpowered module discharge
naturally.  Starting out low and later becoming charged is perfectly natural.
1242.10  Installed Batteries  BSS::BERGLING  Thu May 22 1997 20:53  5
    These J's have not reported the batteries being low before. They seem to 
    be running fine and then get a low indication. After 8-12 hours this
    changes back to normal. Is the HSJ recharging the batteries or what?
    
    Thanks,
1242.11  3.1 Crashes on low Batteries????  BSS::BERGLING  Fri May 23 1997 12:42  198
    The following is a console output from one of these "J"s. It seems that
    when we get the first DRAB interrupt the J crashes. It then logs a
    number of failure codes all pointing to the cache batteries. 4 hours
    later the batteries are again sufficiently charged. 
    
    Is this crash a new feature for 3.1?
    
    The batteries will be replaced today.
    
    Vern
    	
    22:01:30 HJ2202> SHOW THIS
    00:01:29 Controller:
    00:01:29         HSJ40    (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
    00:01:29         Configured for dual-redundancy with ZG61013838
    00:01:29             In dual-redundant configuration
    00:01:29         SCSI address 7
    00:01:29         Time: 31-MAR-1997 14:56:04
    00:01:29 Host port:
    00:01:29         Node name: HJ2202, valid CI node 15, 16 max nodes
    00:01:29         System ID 4200100FD4C0
    00:01:29         Path A is ON
    00:01:29         Path B is ON
    00:01:29         MSCP allocation class   30
    00:01:29         TMSCP allocation class  30
    00:01:29         CI_ARBITRATION = ASYNCHRONOUS
    00:01:29         MAXIMUM_HOSTS = 15
    00:01:29 Cache:
    00:01:29         32 megabyte write cache, version 2
    00:01:29         Cache is GOOD
    00:01:29         Battery is GOOD
    00:01:29         No unflushed data in cache
    00:01:29         CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
    00:01:29         CACHE_POLICY = A
    00:01:29         NOCACHE_UPS
    00:01:29 HJ2202> SHOW FAIL
    00:01:34 Name          Storageset                     Uses            Used by
    00:01:34 ----------------------------------------------------------------------
    00:01:34
    00:01:34 FAILEDSET     failedset
    00:01:34         Switches:
    00:01:34           NOAUTOSPARE
    00:01:34 HJ2202>
    01:22:45
    01:22:45 %LFL--HJ2202> --31-MAR-1997 16:17:20-- Last Failure Code: 010B2380
    01:22:55  Occurred on 31-MAR-1997 at 16:17:20
    01:22:55  Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 17. Second
    01:22:55  Controller Model: HSJ40
    01:22:55  Serial Number: ZG61013832 Hardware Version:  H09(4F)
    01:22:55  Controller Identifier:
    01:22:55   Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
    01:22:55  Firmware Version: V31J(31)
    01:22:55  Node Name: "HJ2202" CI Node Number: 15.(0F)
    01:22:55  Instance Code: 01010302
    01:22:55  Last Failure Code: 010B2380 (No Last Failure Parameters)
    01:22:55
    01:22:55  Additional information is available in Last Failure Entry: 4.
    01:23:44
    01:23:44 Copyright Digital Equipment Corporation 1993, 1997. All rights reserve
    01:23:44 HSJ40 Firmware version V31J-0, Hardware version  H09
    01:23:44
    01:23:44 Last fail code: 010B2380
    01:23:44
    01:23:44 Press " ?" at any time for help.
    01:23:44
    01:23:44
    01:23:44 Cache battery charge is low
    01:23:44 Write-back caching is disabled
    01:23:44 HJ2202>
    01:23:44
    01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:09-- Instance Code: 01010302
    01:23:44  Template: 1.(01)
    01:23:44  Occurred on 01-MAR-1997 at 18:13:07
    01:23:44  Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 18. Second
    01:23:44  Controller Model: HSJ40
    01:23:44  Serial Number: ZG61013832 Hardware Version:  H09(4F)
    01:23:44  Controller Identifier:
    01:23:44   Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
    01:23:44  Firmware Version: V31J(31)
    01:23:44  Node Name: "HJ2202" CI Node Number: 15.(0F)
    01:23:44  Command Reference Number: 00000000 Sequence Number: 0001
    01:23:44  Instance Code: 01010302
    01:23:44  Last Failure Code: 010B2380 (No Last Failure Parameters)
    01:23:44
    01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:09-- Instance Code: 02052301
    01:23:44  Template: 18.(12)
    01:23:44  Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 20. Second
    01:23:44  Controller Model: HSJ40
    01:23:44  Serial Number: ZG61013832 Hardware Version:  H09(4F)
    01:23:44  Controller Identifier:
    01:23:44   Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
    01:23:44  Firmware Version: V31J(31)
    01:23:44  Node Name: "HJ2202" CI Node Number: 15.(0F)
    01:23:44  Command Reference Number: 00000000 Sequence Number: 0002
    01:23:44  Memory Address: 00000000
    01:23:44  Instance Code: 02052301
    01:23:44 HJ2202
    01:23:44
    01:23:44 %EVL--HJ2202> --31-MAR-1997 12:46:10-- Instance Code: 024B2401
    01:23:44  Template: 20.(14)
    01:23:44  Power On Time: 0. Years, 302. Days, 0. Hours, 58. Minutes, 20. Second
    01:23:44  Controller Model: HSJ40
    01:23:44  Serial Number: ZG61013832 Hardware Version:  H09(4F)
    01:23:44  Controller Identifier:
    01:23:44   Unique Device Number: 000961013832 Model: 40.(28) Class: 1.(01)
    01:23:44  Firmware Version: V31J(31)
    01:23:44  Node Name: "HJ2202" CI Node Number: 15.(0F)
    01:23:44  Command Reference Number: 00000000 Sequence Number: 0003
    01:23:44  Reported via low level DRAB interrupt
    01:23:44  Memory Address: 40000000
    01:23:44  Byte Count: 0.(00000000)
    01:23:44  DRAB Registers:
    01:23:55   DSR:  00000000  CSR:  00000000 DCSR:  00000000  DER:  00000000  EAR:
    01:23:55   EDR:  00000000  ERR:  00000000  RSR:  00000000  CHC:  00000000  CMC:
    01:23:55  Diagnostic Registers:
    01:23:55   RDR0: 00000000  RDR1: 00000000  WDR0: 00000000  WDR1: 00000000
    01:23:55  Instance Code: 024B2401
    01:23:55 HJ2202> SHOW THIS
    04:01:28 Controller:
    04:01:28         HSJ40    (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
    04:01:28         Configured for dual-redundancy with ZG61013838
    04:01:28             In dual-redundant configuration
    04:01:28         SCSI address 7
    04:01:28         Time: 31-MAR-1997 15:23:54
    04:01:28 Host port:
    04:01:28         Node name: HJ2202, valid CI node 15, 16 max nodes
    04:01:29         System ID 4200100FD4C0
    04:01:29         Path A is ON
    04:01:29         Path B is ON
    04:01:29         MSCP allocation class   30
    04:01:29         TMSCP allocation class  30
    04:01:29         CI_ARBITRATION = ASYNCHRONOUS
    04:01:29         MAXIMUM_HOSTS = 15
    04:01:29 Cache:
    04:01:29         32 megabyte write cache, version 2
    04:01:29         Cache is GOOD
    04:01:29         Battery is LOW
    04:01:29         No unflushed data in cache
    04:01:29         CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
    04:01:29         CACHE_POLICY = A
    04:01:29         NOCACHE_UPS
    04:01:29 Cache battery charge is low
    04:01:29 Write-back caching is disabled
    04:01:29 HJ2202> SHOW FAIL
    04:01:33 Name          Storageset                     Uses            Used by
    04:01:33 ----------------------------------------------------------------------
    04:01:34
    04:01:34 FAILEDSET     failedset
    04:01:34         Switches:
    04:01:34           NOAUTOSPARE
    04:01:34 Cache battery charge is low
    04:01:34 Write-back caching is disabled
    04:01:34 HJ2202> SHOW THIS
    08:01:28 Controller:
    08:01:28         HSJ40    (C) DEC ZG61013832 Firmware V31J-0, Hardware H09
    08:01:28         Configured for dual-redundancy with ZG61013838
    08:01:28             In dual-redundant configuration
    08:01:28         SCSI address 7
    08:01:28         Time: 31-MAR-1997 19:23:54
    08:01:29 Host port:
    08:01:29         Node name: HJ2202, valid CI node 15, 16 max nodes
    08:01:29         System ID 4200100FD4C0
    08:01:29         Path A is ON
    08:01:29         Path B is ON
    08:01:29         MSCP allocation class   30
    08:01:29         TMSCP allocation class  30
    08:01:29         CI_ARBITRATION = ASYNCHRONOUS
    08:01:29         MAXIMUM_HOSTS = 15
    08:01:29 Cache:
    08:01:29         32 megabyte write cache, version 2
    08:01:29         Cache is GOOD
    08:01:29         Battery is GOOD
    08:01:29         No unflushed data in cache
    08:01:29         CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
    08:01:29         CACHE_POLICY = A
    08:01:29         NOCACHE_UPS
    08:01:29 Cache battery is now sufficiently charged
    08:01:29 HJ2202>
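
With about 26 controllers to watch, it may be easier to pull the battery state
out of saved console captures than to read each one by eye. The following is a
minimal sketch, assuming captures look like the one above: a leading hh:mm:ss
column followed by the SHOW THIS output. The function names and the sample
text are illustrative only, not part of any HSJ tool.

```python
import re
from typing import List, Tuple

# Matches lines such as "04:01:29         Battery is LOW".
# Status messages like "Cache battery charge is low" do not match.
BATTERY_LINE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2})\s+Battery is (\w+)\s*$")


def battery_states(capture: str) -> List[Tuple[str, str]]:
    """Return (timestamp, state) for every 'Battery is ...' line."""
    states = []
    for line in capture.splitlines():
        match = BATTERY_LINE.match(line)
        if match:
            states.append((match.group(1), match.group(2)))
    return states


def transitions(states: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, old_state, new_state) wherever the state changes."""
    changes = []
    for (_, prev), (stamp, cur) in zip(states, states[1:]):
        if prev != cur:
            changes.append((stamp, prev, cur))
    return changes


# A few lines copied from the capture in this note.
SAMPLE = """\
    00:01:29         Cache is GOOD
    00:01:29         Battery is GOOD
    04:01:29         Cache is GOOD
    04:01:29         Battery is LOW
    04:01:29 Cache battery charge is low
    08:01:29         Battery is GOOD
    08:01:29 Cache battery is now sufficiently charged
"""

if __name__ == "__main__":
    for stamp, prev, cur in transitions(battery_states(SAMPLE)):
        print(f"{stamp}: {prev} -> {cur}")
```

On the sample lines this reports the GOOD to LOW change at 04:01:29 and the
return to GOOD at 08:01:29, which is the pattern described in .8 and .10.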
1242.12  what is the date?  SSDEVO::RMCLEAN  Fri May 23 1997 13:49  3
The important question here is: what is the date on the batteries?  They may
well be very near failure but they should still be able to hold up the cache
for 100 hours.