Conference ssdevo::hsz40_product

Title:	HSZ40 Product Conference

Moderator:	SSDEVO::EDMONDS

Created:	Mon Apr 11 1994
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	902
Total number of notes:	3319
793.0. "Save_config blitz" by SSDEVO::ASTOR (Subsystems Engineering Support) Tue Mar 04 1997 13:27

Copyright (c) Digital Equipment Corporation 1997. All rights reserved.

+---------------------------+TM
|    |   |   |   |   |   |   |
|  d | i | g | i | t | a | l |           TIME   DEPENDENT   BLITZ
|    |   |   |   |   |   |   |
+---------------------------+


   BLITZ TITLE:

   Possible problem with disks initialized with SAVE_CONFIG under HSOF V2.7
   on HSZ40/20/SWXRC

   PRIORITY LEVEL: 1

   DATE:	2/21/97
   TD #:        2241

   AUTHOR:         Kurt Astor, Tom Gonzales
   DTN:            522-2478, 522-6234
   EMAIL:          SSDEVO::ASTOR, SSDEVO::T_GONZALES
   DEPARTMENT:     Subsystem Engineering Support

   =================================================================

   PRODUCT NAME(S): HSZ40, RA410, SWXRC

   PRODUCT FAMILY(IES): 

   Storage         _X_
   Systems/OS      ___
   Networks        ___
   PC/Peripherals  ___   
   Software Apps.  ___


   BLITZ TYPE: 

   Maintenance Tip           _X_  
   Service Action Requested  ___


   IF SERVICE ACTION IS REQUESTED:

   Labor Support Required     ___ 
   Material Support Required  ___ 


   Estimated time to complete activity (in hours):
   Will this require a change in the field's inventory:  Yes ___  No _X_
   Will an FCO be associated with this advisory?  Yes ___  No _X_


   DESCRIPTION OF SERVICE ACTIVITY REQUESTED (if applicable):


    **********************************************************************

   SYMPTOM:

       There is a remote possibility that some disks attached to
       HSZ40/20/SWXRC and the solution products containing them (RA410,
       SC4200/4600, etc.) may have a problem in the structure of the
       on-disk file system.  Systems which may be affected are those
       which:

	   1.  Use disks in "JBOD" configuration (that is, disks which
	       are not members of controller-based storagesets such as
	       RAIDsets and mirrorsets)

	   2.  Initialized disks under HSOF V2.7Z using the SAVE_CONFIG
	       command AND rebooted the controller BEFORE initializing
	       the disk under the operating system

       Note that the problem does not occur if the file system was 
       built on the disk before the controller was rebooted.  Also,
       the problem does not occur when disks are initialized using
       SAVE_CONFIG and the platform operating system under HSOF V3.0Z.  

       Note that all 2GB and 4GB drives on Windows NT platforms are NOT
       exposed to this potential problem.  Drives on other platforms
       meeting the above criteria have a small risk of exposure; see the 
       "How to Detect" section of this Blitz for procedures to determine
       whether a disk is exposed.


   PROBLEM STATEMENT:

       When a disk being used in a JBOD configuration is initialized
       with SAVE_CONFIG, the last 500 blocks on the disk are allocated
       by the controller to store the configuration data.  If the
       controller running HSOF V2.7Z is rebooted BEFORE the disk is
       initialized by the platform operating system, the controller
       fails to remember the reduction in disk size and reports the
       unreduced disk capacity to the operating system.  When the
       operating system subsequently builds the file system, the blocks
       which SAVE_CONFIG will use to update the configuration data are
       also included in the file system disk space, creating a potential
       for both the operating system and the controller to write to the
       last 500 blocks on disk.
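
       The overlap described above can be illustrated with a short
       Python sketch.  It is not part of the original advisory: the
       function names and the 2,000,000-block disk are hypothetical,
       and only the 500-block figure is taken from the text.

```python
SAVE_CONFIG_BLOCKS = 500  # blocks reserved at the end of the disk, per the text


def save_config_region(raw_capacity_blocks):
    """Return (first_block, last_block) of the SAVE_CONFIG area, 0-based inclusive."""
    if raw_capacity_blocks < SAVE_CONFIG_BLOCKS:
        raise ValueError("disk is smaller than the SAVE_CONFIG area")
    first = raw_capacity_blocks - SAVE_CONFIG_BLOCKS
    return first, raw_capacity_blocks - 1


def overlapping_blocks(raw_capacity_blocks, host_visible_blocks):
    """Number of blocks that both the file system and the controller may write."""
    first, _last = save_config_region(raw_capacity_blocks)
    return max(0, min(host_visible_blocks, raw_capacity_blocks) - first)


# Hypothetical disk of 2,000,000 blocks (illustrative figure only).
RAW = 2_000_000
print(overlapping_blocks(RAW, RAW))        # unreduced size reported: prints 500
print(overlapping_blocks(RAW, RAW - 500))  # reduced size reported: prints 0
```

       A nonzero result corresponds to the exposed case, in which the
       file system was built against the unreduced capacity.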

       If the file system subsequently overwrites configuration data,
       the controller recognizes that the data is invalid config data
       and ignores it.  In this case, controller parameters must be
       manually re-entered when SAVE_CONFIG tries to restore the
       configuration (unless another drive contains valid config data). 
       Various configuration events will cause the controller to write
       the config data to the SAVE_CONFIG area.  If the controller
       overwrites file system data, the results vary depending on the
       platform operating system and the application.

       If a controller which has this problem is upgraded to HSOF V3.0Z
       before the differing file system and controller view of the disk
       capacity is resolved and the file system tries to access the
       SAVE_CONFIG area, the controller returns an error to the
       operating system.  The action that the operating system will take
       upon receiving this error will vary depending on the platform,
       but may include rendering the entire file system or database
       inaccessible.


   HOW TO DETECT IF YOU HAVE THIS PROBLEM:

1.  Windows NT platforms

    As previously noted, 2GB and 4GB drives on Windows NT platforms are
    not exposed to the problem described in this blitz.  This problem
    affects 1GB single-disk units in JBOD configuration with SAVE_CONFIG
    data stored on them.  If you are not using 1GB JBOD disk units with
    SAVE_CONFIG data saved on them, do not proceed any further.  Your
    system is NOT at risk.  

    Use the following procedure to check a JBOD 1GB drive with
    SAVE_CONFIG data saved on it to determine whether it is exposed:


        a.  Shut down the host computer and wait until the shutdown is
            complete.
        b.  Restart the HSZ controller(s) by pressing the heartbeat
            button(s) (the green reset button).
        c.  Wait a minute, then start the host computer.
        d.  After the host reboots, start Disk Administrator.
        e.  Determine which drive in Disk Administrator corresponds to
            the 1GB JBOD disk to be checked.
        f.  Check whether the JBOD disk has 1MB or more of unpartitioned
            space at the end of the disk.
        g.  If 'f' is true, the disk does NOT have the problem described
            in this blitz.  Never use the last 1MB of space; leave it
            unpartitioned.
        h.  If 'f' is false (there is no unpartitioned space at the end
            of the disk), the very last 196 blocks (100KB) on the drive
            are at risk for the problem described in this blitz.  See
            the "Solution" section below for the recovery procedure.
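
    As a rough illustration of the check in steps 'f' through 'h', the
    following Python sketch works from figures read off Disk
    Administrator by hand.  The function names and the
    (start, size) partition layout are assumptions made for this
    example; the sketch treats anything under 1MB of trailing free space
    as exposed.

```python
def unpartitioned_tail_mb(disk_size_mb, partitions):
    """Free space (MB) between the end of the last partition and the end of disk.

    partitions is a list of (start_mb, size_mb) tuples noted down by hand.
    """
    end_of_last = max((start + size for start, size in partitions), default=0)
    return disk_size_mb - end_of_last


def nt_disk_exposed(disk_size_mb, partitions):
    """True if less than 1MB is left unpartitioned at the end of the disk."""
    return unpartitioned_tail_mb(disk_size_mb, partitions) < 1


# Hypothetical 1000MB disk, first fully partitioned, then with 2MB left free.
print(nt_disk_exposed(1000, [(0, 1000)]))            # prints True
print(nt_disk_exposed(1000, [(0, 500), (500, 498)])) # prints False
```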

 
2.  Novell NetWare platforms

    The problem described in this blitz affects single-disk units in
    JBOD configuration with SAVE_CONFIG data stored on them.  If you are
    not using JBOD disk units with SAVE_CONFIG data saved on them, do
    not proceed any further.  Your system is NOT at risk.  

    NetWare reserves 2% of the space at the end of each disk for bad
    block replacement.  500 blocks (256KB) at the end of this 2% space
    will be exposed to the problem described in this blitz.  A 2% space
    is larger than is generally needed for replacing bad blocks.  For
    example, reserve space on a 4GB, 2GB, and 1GB disk is 80MB, 40MB,
    and 20MB respectively.  The probability of a bad block being
    replaced in the last 256KB of this reserve space is very small;
    however, it is possible.  Use the following procedure to check a
    disk in JBOD configuration to determine whether it is exposed:

        a. NWSERVER>  load install
        b. Open "disk options"
        c. Open "Modify disk partition and Hot Fix"
        d. Select the disk drive
        e. Choose "Change Hot Fix"
        f. Record the "Redirection Area" value; this is the BadBlock size.
        g. Calculate 2% of the disk capacity.
        h. If the BadBlock size is less than (2% - 256KB), the disk
           is NOT affected.
        i. If the BadBlock size is greater than (2% - 256KB), the
           disk IS at risk.  See the "Solution" section below for the
           recovery procedure.
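
    The arithmetic in steps 'g' through 'i' can be sketched in Python as
    follows.  The function name and the kilobyte units are choices made
    for this example.  The procedure does not say what to do when the
    two values are exactly equal; this sketch assumes the cautious
    reading and reports that case as at risk.

```python
SAVE_CONFIG_KB = 256  # size of the SAVE_CONFIG area as stated in the text


def netware_disk_at_risk(disk_size_kb, redirection_area_kb):
    """Compare the Hot Fix redirection area with (2% of disk - 256KB)."""
    threshold_kb = disk_size_kb * 2 / 100 - SAVE_CONFIG_KB
    # Equality is not covered by the procedure; treated as at risk here (assumption).
    return redirection_area_kb >= threshold_kb


# Hypothetical 4GB disk with two possible redirection area sizes.
DISK_KB = 4 * 1024 * 1024
print(netware_disk_at_risk(DISK_KB, 80_000))  # prints False
print(netware_disk_at_risk(DISK_KB, 83_800))  # prints True
```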


3.  Sun Solaris and SunOS platforms

    The problem described in this blitz affects single-disk units in
    JBOD configuration with SAVE_CONFIG data stored on them.  If you are
    not using JBOD disk units with SAVE_CONFIG data saved on them, do
    not proceed any further.  Your system is NOT at risk.  

    If you followed the installation guide, you are not at risk.  This
    is due to the fact that the default partition layout reserves the
    last two cylinders for diagnostic purposes.  The 500 blocks in
    question will always reside within those two diagnostic cylinders. 
    If you changed the default partition layout, AND allocated the two
    diagnostic cylinders to a partition, you may be at risk.

    If disks in your system are at risk of this problem, use the
    following procedure to check a disk in JBOD configuration to
    determine whether it is exposed:

        a.  Use the GUI to display the number of blocks on the unit.
            Do this by selecting the LUN in question, and then choosing
            LUN parameters from the pull-down menu.  Write down this number.

        b.  Use the tip command (or an RS-232 terminal) to connect to
            the controller CLI.  If you have problems or questions, this
            command is documented in the installation guide.

        c.  Use the CLI command show <unitname>, substituting the actual
            name of the unit in question for <unitname>.

        d.  If the GUI and the CLI report different sizes for the same
            unit, you are at risk for the problem.  See the "Solution"
	    section below for the recovery procedure.


4.  OpenVMS platforms

    The problem described in this blitz affects single-disk units in
    JBOD configuration with SAVE_CONFIG data stored on them.  If you are
    not using JBOD disk units with SAVE_CONFIG data saved on them, do
    not proceed any further.  Your system is NOT at risk.  

    If disks in your system are at risk of this problem, use the
    following procedure to check a disk in JBOD configuration to
    determine whether it is exposed:

        a.  At the controller prompt, type SHOW DISKnnn (where nnn is
	    the JBOD disk in question).

        b.  Look for "Configuration being backed up on this container"
	    message.

        c.  Record the capacity in blocks displayed by the controller.
              	
	d.  From the OpenVMS prompt on one of the hosts, mount the disk
	    in question and type the command:

            $ show device/full dka200:

        e.  Compare the total blocks reported by the "show device"
            command with the capacity in blocks recorded in step 'c.'

        f.  If the reported sizes are different, this disk is at risk for
	    the problem.  See the "Solution" section below for the recovery
	    procedure.
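
    The comparison in steps 'c' through 'f' (and the equivalent
    comparison in the Sun and DIGITAL UNIX procedures) amounts to
    checking two hand-recorded block counts against each other.  A
    minimal Python sketch, with hypothetical unit names and
    illustrative numbers that do not come from any real system:

```python
def compare_capacities(controller_blocks, host_blocks):
    """Compare the capacity shown by the controller with the host's view."""
    difference = host_blocks - controller_blocks
    return {"at_risk": difference != 0, "difference_blocks": difference}


# Hypothetical readings: (controller capacity, host capacity) in blocks.
readings = {
    "DISK100": (1_999_500, 2_000_000),
    "DISK200": (1_999_500, 1_999_500),
}
for unit, (controller_blocks, host_blocks) in readings.items():
    print(unit, compare_capacities(controller_blocks, host_blocks))
```

    Any nonzero difference marks the unit as at risk, matching the rule
    in step 'f'.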

5.  DIGITAL UNIX platforms

    The problem described in this blitz affects single-disk units in
    JBOD configuration with SAVE_CONFIG data stored on them.  If you are
    not using JBOD disk units with SAVE_CONFIG data saved on them, do
    not proceed any further.  Your system is NOT at risk.  

    If disks in your system are at risk of this problem, use the
    following procedure to check a disk in JBOD configuration to
    determine whether it is exposed:

        a.  At the controller prompt, type SHOW DISKnnn (where nnn is
	    the JBOD disk in question).

        b.  Look for "Configuration being backed up on this container"
	    message.

        c.  Record the capacity in blocks displayed by the controller.
              	
        d.  From the DIGITAL UNIX prompt on one of the hosts, type the
            following commands (rrza18c is used in the following example
            as the device in question):

                          # disklabel -rw /dev/rrza18c HSZ40
                          # disklabel -r /dev/rrza18c
                          # /dev/rrza18c:

        e.  Compare the sectors/unit value in the disklabel output with
            the capacity in blocks recorded in step 'c.'

        f.  If the reported sizes are different, this disk is at risk for
	    the problem.  See the "Solution" section below for the recovery
	    procedure.


6.  AIX platforms

    The problem described in this blitz affects single-disk units in
    JBOD configuration with SAVE_CONFIG data stored on them.  If you are
    not using JBOD disk units with SAVE_CONFIG data saved on them, do
    not proceed any further.  Your system is NOT at risk.  

    If disks in your system are at risk of this problem, use the
    following procedure to check a disk in JBOD configuration to
    determine whether it is exposed:

    AIX 4.1.4:

        a.  Sum the raw device as shown in the following command:

		 sum -r /dev/rhdiskN

        b.  If this operation results in a read error as shown below,
            the disk is at risk for the problem.  See the "Solution"
            section below for the recovery procedure.

		 sum: read error on /dev/rhdiskN

    AIX 3.2.5:  Disks on systems which have the risk factors described
                above should be regarded as at risk for the problem
                described in this blitz.
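
    The AIX 4.1.4 check works because reading the raw device to its end
    forces a read of the SAVE_CONFIG area.  The Python sketch below
    mimics that idea under stated assumptions: it is not a replacement
    for the sum command, the function names are hypothetical, and the
    1MB chunk size is an arbitrary choice.

```python
def read_to_end(path, chunk_size=1024 * 1024):
    """Read path sequentially; return (bytes_read, error), error is None on success."""
    bytes_read = 0
    with open(path, "rb", buffering=0) as device:
        while True:
            try:
                chunk = device.read(chunk_size)
            except OSError as exc:
                # A read error before end of device is the symptom of interest.
                return bytes_read, exc
            if not chunk:
                return bytes_read, None
            bytes_read += len(chunk)


def disk_at_risk(path):
    """True if a sequential read of the device fails before reaching the end."""
    _bytes_read, error = read_to_end(path)
    return error is not None
```

    A call such as disk_at_risk("/dev/rhdiskN") would need read access
    to the raw device; the device name here is the same placeholder used
    in the procedure above.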


7.  HP-UX platforms

    The problem described in this blitz affects single-disk units in
    JBOD configuration with SAVE_CONFIG data stored on them.  If you are
    not using JBOD disk units with SAVE_CONFIG data saved on them, do
    not proceed any further.  Your system is NOT at risk.  

    Disks on systems which have the risk factors described above should
    be regarded as at risk for the problem described in this blitz.



   SOLUTION:

    1.  If you are using SAVE_CONFIG to initialize JBOD disks under 
        HSOF V2.7, be sure to initialize the disk with the platform
        file system BEFORE rebooting the controller.

    2.  If a customer has the risk factors for the problem as described
        in the SYMPTOM and HOW TO DETECT sections above, use the steps
        below to resolve the discrepancy in controller/operating system
        views of the disk at the earliest opportunity.  Digital
        recommends that the recovery process described below be
        performed BEFORE upgrading the V2.7Z controller to V3.0Z.  Any
        files which may have been written in the SAVE_CONFIG area will
        be accessible to the operating system after the restore process;
        however, any such files are suspect and should be carefully
        examined to ensure that the data they contain is correct, or
        restored from a previous backup.

     a. Back up the unit that contains SAVE_CONFIG information.
     b. Unmount the file system(s) contained on that unit.
     c. Delete the unit from the configuration in the controller.
     d. Initialize the container from the controller without SAVE_CONFIG.
     e. Add the unit back into the configuration.
     f. Initialize and restore unit from backup.

   VERIFICATION:

   N/A


      LARS INFORMATION: (Supplied by MCS)

       Attention Service Personnel: Begin the comment field of your LARS
       with the word "BLITZ" when you perform an activity associated with a 
       BLITZ Type "Service Action Requested".

                     *** DIGITAL INTERNAL USE ONLY ***

\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT PROD=HSZ40