
Conference ssdevo::hsz40_product

Title:HSZ40 Product Conference
Moderator:SSDEVO::EDMONDS
Created:Mon Apr 11 1994
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:902
Total number of notes:3319

753.0. "HSZ40 CACHE_POLICY why like it is?" by ATZIS3::PUTZENLECHNE () Thu Jan 30 1997 10:23

    Hello everybody!
    
    Can somebody give me a few explanations on the HSZ40
    with HSOF Version 2.7? There are some things which are not
    clear to me.
    
    When I read the description of the CACHE_POLICY parameter
    in the "User-Guide" (appendix B, page B-79), I am a little
    confused.
    
    
    I DO NOT UNDERSTAND WHY RAID- AND MIRRORSETS MUST BE MADE INOPERATIVE
    IN CASE OF A FAILED BATTERY OR IF THE BATTERIES GO LOW AFTER
    INITIALIZATION.
    
    Please give me feedback to the following questions:
    
    1) Why is it not enough to switch off the write-back cache and let
       the RAID- and mirrorsets remain accessible in write-through mode?
    
    2) What is the difference between the batteries already being low
       during controller initialization and going low during operation?
    
    3) Why "is there some risk" when setting CACHE_POLICY=B? If the units are
       only accessed in write-through mode (after a battery failure), there
       is NO DATA in the cache which must be "maintained if a power failure
       occurs".
    
    4) I noticed that there was a change from HSOF 2.5 to 2.7.
       With HSOF 2.5 it was possible to access units in write-through mode
       even if the battery had failed. This was much better!
       Why is this no longer possible with HSOF 2.7?
    
    5) I know that with HSOF 3.0 this problem no longer exists for
       dual-redundant controllers. But what happens to single-controller
       configurations if a battery fails?
    
    
    I need answers to these questions because no customer
    would understand why such a "simple failure" as a bad battery can
    bring a system down even though he has a dual-redundant controller.
    
    (we had one critical situation)
    
    thanks for any information,
    
    Helmut
    
T.R  Title  User  Personal Name  Date  Lines
753.1  Covering the "write hole"  SSDEVO::JACKSON  Jim Jackson, HSx RAID team  Thu Jan 30 1997 12:11  17
The HSx controllers use the write-back cache to eliminate the "write hole"
that is a characteristic of all RAID levels (except for striping).  In RAID
1 (mirror) sets, the "write hole" causes data to be different on the
different members.  In RAID 5 sets, the "write hole" causes the parity to be
inconsistent, which can cause future data corruption.
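
A minimal sketch of the RAID 5 case described above, assuming a hypothetical stripe of three data blocks and one XOR parity block (the block values and the helper names `xor_blocks` and `reconstruct` are invented for illustration and are not HSx firmware code). It shows that if a data write reaches disk but the matching parity write does not, a later rebuild of a different member returns wrong data:

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as RAID 5 parity does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def reconstruct(data, parity, failed):
    """Rebuild the block of the failed member from the survivors plus parity."""
    survivors = [block for i, block in enumerate(data) if i != failed]
    return xor_blocks(survivors + [parity])

# Hypothetical stripe: three data blocks and their parity.
data = [b"\x11\x11", b"\x22\x22", b"\x44\x44"]
parity = xor_blocks(data)            # parity is consistent here

# Member 0 is updated, but power is lost before parity is rewritten.
original_member1 = data[1]
data[0] = b"\x99\x99"                # data write completed
# (parity write never happened: this is the "write hole")

parity_is_consistent = xor_blocks(data) == parity
rebuilt_member1 = reconstruct(data, parity, failed=1)
rebuild_is_correct = rebuilt_member1 == original_member1
```

Note that member 1 was never written, yet its reconstruction is wrong. This is the "future data corruption" the stale parity can cause.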

We view data corruption seriously, which is why we don't permit a RAID 5 set
to operate if the battery is bad.

When the battery is low and CACHE_POLICY=B, the units are accessed in
"write-through" mode.  In this mode, no user data is kept in the cache.
However, write hole information *is* kept in the cache, so the cache is
still necessary.  If the write hole information is lost, *none* of the
parity on the RAID 5 set can be trusted.  This is why a CLEAR LOST_DATA
command destroys all redundancy on a RAID 5 set.
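
Why losing the write hole information affects the whole set can be sketched as follows. This is only an illustration under stated assumptions: the note does not describe how HSOF actually stores this information, so the `intent_log` (a set of stripe numbers with writes in flight, held in battery-backed cache) and the function name are hypothetical.

```python
def stripes_with_untrusted_parity(all_stripes, intent_log):
    """Return the stripes whose parity cannot be trusted after a power loss.

    intent_log: set of stripe numbers that had writes in flight, or None
    if the cache contents (and so the log) were lost. Hypothetical model.
    """
    if intent_log is None:
        # Nothing records which stripes were mid-write, so any stripe
        # might have a write hole: all parity is suspect.
        return set(all_stripes)
    # Log survived: only the stripes that were mid-write are suspect.
    return set(intent_log)

all_stripes = range(1000)
with_log = stripes_with_untrusted_parity(all_stripes, {7, 42})
without_log = stripes_with_untrusted_parity(all_stripes, None)
```

With the log intact, only two stripes need attention. Without it, every stripe is suspect, which matches the statement that none of the parity can be trusted.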

Does this help?
753.2  UTOPIE::OETTL  hide bug until worst time  Thu Jan 30 1997 16:34  9
753.3  OK - but.....  ATZIS1::PUTZENLECHNE  Fri Jan 31 1997 03:52  10
    reply .1
    
    Thank you, I understand this now. But why was this not already
    implemented the same way in HSOF 2.5? I know that in 2.5 it was
    possible to operate with "failed" batteries. Or were there
    data-loss problems with 2.5, and was it changed in 2.7 because of this?
    
    Thank you so far,
    
    Helmut
753.4  V2.5 vs. V2.7  SSDEVO::THOMPSON  Paul Thompson, Colorado Springs  Mon Feb 03 1997 18:42  9
V2.5 only checked the batteries at boot time.  Therefore, if you had a failed
battery, you would not know until either power failed and you lost the data in
cache, or you rebooted the controller.

V2.7 implemented periodic tests of the battery to proactively identify failed
batteries before the Customer lost data.

Under either version of the firmware, once the battery failure is detected, access
to RAID and Mirror sets is denied.
753.5  Thank You  ATZIS1::PUTZENLECHNE  Wed Feb 05 1997 15:17  11
    
    
    > V2.7 implemented periodic tests of the battery to proactively identify failed
    > batteries before the Customer lost data.
    
    Except when CACHE_POLICY is set to B, because I had a system running
    for more than 2 days with failed batteries.
    
    Thanks for the information,
    
    Helmut
753.6  Re: Cache policy "B"  SSDEVO::THOMPSON  Paul Thompson, Colorado Springs  Fri Feb 07 1997 12:51  14
Regarding the reference to cache policy "B" in the previous note.

Cache Policy "B" means that after a boot or reboot, if the battery is found to be
"low", access to RAID and Mirror sets will be allowed while the battery is
charging.  With cache policy "A", access to the RAID and Mirror sets would be
denied if the battery was low.

If the battery is not found to be fully charged and declared "good" by the
controller within ten (10) hours of a boot or reboot, access to the RAID and
Mirror sets will be denied, regardless of the cache policy.

In other words, ten (10) hours after a boot or reboot, the choice of settings of
the cache policy parameter makes no difference.  If the batteries have not been
declared "good", access to RAID and Mirror sets will be denied.
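
The rules in this note and in .4 can be summarized as a small decision function. This is a sketch of the behavior as described in this thread, not controller code: the function name, the three battery state strings, and the treatment of exactly ten hours as already expired are assumptions.

```python
def raid_access_allowed(battery_state, cache_policy, hours_since_boot):
    """Model of access to RAID and Mirror sets as described in this thread.

    battery_state: "good", "low" (still charging) or "failed" (assumed names).
    cache_policy:  "A" or "B".
    """
    if battery_state == "good":
        return True
    if battery_state == "failed":
        # Per .4: once a battery failure is detected, access is denied.
        return False
    # Battery is "low": policy B allows access while it charges, but only
    # until the ten-hour limit after a boot or reboot (boundary assumed).
    if hours_since_boot >= 10:
        return False
    return cache_policy == "B"

# Policy only matters for a low battery during the first ten hours.
early_a = raid_access_allowed("low", "A", hours_since_boot=2)
early_b = raid_access_allowed("low", "B", hours_since_boot=2)
late_b = raid_access_allowed("low", "B", hours_since_boot=12)
```

Under this model the two policies differ only in the `early_a` and `early_b` cases. After ten hours both give the same answer.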