
Conference ssdevo::hsz40_product

Title:HSZ40 Product Conference
Moderator:SSDEVO::EDMONDS
Created:Mon Apr 11 1994
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:902
Total number of notes:3319

764.0. "backup/verify errors with writeback_cache enabled" by NEWVAX::DISNEY (Jim Disney, phone 410-643-5578) Fri Feb 07 1997 13:23

My customer is getting OVMS BACKUP/IMAGE/VERIFY errors when backing up to an
HSZ40 RAIDset with WRITEBACK_CACHE enabled, but not with it disabled
(NOWRITEBACK_CACHE). The backup data is dormant and about 4GB in size. I would
expect a cache flush to occur prior to the backup verify pass and can't explain
the verify errors. I ran diags (FMU and DECevent) and found no errors. The HSZ40
(32MB) is running HSOF v3.0 (+ 2 recommended patches) under OVMS 6.2-1H3
(+ALPSCSI02_070 patch).

764.1. by SSDEVO::T_GONZALES Fri Feb 07 1997 20:38 (5 lines)

    Do you have a dual controller configuration? If so, try preferring the
    unit id that is getting the errors to the other controller.  Also, what
    is the nature of the error message?
    
    
764.2. "Moved preferred id to other controller" by NEWVAX::DISNEY (Jim Disney, phone 410-643-5578) Mon Feb 10 1997 13:17 (303 lines)

    After several backup/verify tests, it turns out that the customer gets
    "backup verify error on block n in file xyz" regardless of whether
    writeback cache is enabled or disabled, although the error is not as
    consistent with writeback disabled.
    
    I moved the preferred ids to the other controller as requested:
    
    Original config: 	hszB_top> preferred_id=(1,3)
    			hszB_bottom> preferred_id=(2,4)
    
    Disks $1$dkb101 and $1$dkb300 are logging backup/verify errors. Also, just
    had problems on the VMS disk, $1$dkb100: couldn't modify the FIELD
    account's password - got unrecognized qualifier /nopwdexp. Note that the
    problem disks are all served by the hszB_top controller. Moved the
    preferred ids to hszB_bottom (preferred_id=(1,2,3,4)) and now can modify
    the uaf FIELD account o.k. Note the firmware inconsistency messages (both
    controllers are Firmware V30Z-2, Hardware A01). The VMS disk $1$dkb100
    does have inconsistent drive firmware - one RZ29B is 0016, the mirror
    RZ29B is 0014. HSOF v3.0 requires 0007 for RZ28B.
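
The observation that every problem disk sits behind hszB_top can be checked mechanically. The sketch below is illustrative only: it assumes that the hundreds digit of an HSZ unit name (D100, D101, ...) is the SCSI target ID, which matches the DECevent entry in this note (DKB100 reported as SCSI ID x01, LUN x00). The unit list is taken from the "sho unit" output in this note, and the function names are hypothetical.

```python
# Unit names as listed by "sho unit" in this note.
UNITS = ["D100", "D101", "D200", "D201", "D300", "D400"]


def target_of(unit):
    """Return the SCSI target ID encoded in a unit name such as 'D101'.

    Assumption: the hundreds digit of the unit number is the target ID.
    """
    return int(unit[1:]) // 100


def units_served(units, preferred_targets):
    """List the units whose target ID is in a controller's preferred set."""
    return [u for u in units if target_of(u) in preferred_targets]


if __name__ == "__main__":
    # Original configuration from this note: top=(1,3), bottom=(2,4).
    print("hszB_top   :", units_served(UNITS, {1, 3}))
    print("hszB_bottom:", units_served(UNITS, {2, 4}))
```

Under that assumption, the original preferred_id=(1,3) setting puts D100, D101 and D300 on hszB_top, which are the three disks reporting problems.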
    
    $ set host/scsi $1$dkb200
    hszB_bottom> set this time=current_time	!noticed that time wasn't set
    hszB_bottom> set this preferred_id=(1,2,3,4)
    hszB_bottom> exit
    
    $ set host/scsi $1$dkb100
    
    Copyright Digital Equipment Corporation 1993, 1996. All rights reserved.
    HSZ40 Firmware version V30Z-2, Hardware version  A01
    
    Last fail code: 08090010
    
    Press " ?" at any time for help.
    
    
    hszB_bottom> show this full
    
    Controller:
            HSZ40 ZG64507449 Firmware V30Z-2, Hardware  A01
            Configured for dual-redundancy with ZG64507316
                In dual-redundant configuration
            SCSI address 6
            Time: 10-FEB-1997 10:16:29
    Host port:
            SCSI target(s) (1, 2, 3, 4), Preferred target(s) (1, 2, 3, 4)
            TRANSFER_RATE_REQUESTED = 10MHZ
    Cache:
            32 megabyte write cache, version 2
            Cache is GOOD
            Battery is GOOD
            No unflushed data in cache
            CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
            CACHE_POLICY = A
            Host Functionality Mode = A
    Licensing information:
            RAID (RAID Option) is ENABLED, license key is VALID
            WBCA (Writeback Cache Option) is ENABLED, license key is VALID
            MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
    Extended information:
            Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
            Operation control: 00000004  Security state code: 34850
            Configuration backup disabled
    hszB_bottom> show other full
    
    Controller:
            HSZ40 ZG64507316 Firmware V30Z-2, Hardware  A01
            Configured for dual-redundancy with ZG64507449
                In dual-redundant configuration
            SCSI address 7
            Time: 10-FEB-1997 10:16:39
    Host port:
            SCSI target(s) (1, 2, 3, 4), No preferred targets
            TRANSFER_RATE_REQUESTED = 10MHZ
    Cache:
            32 megabyte write cache, version 2
            Cache is GOOD
            Battery is GOOD
            No unflushed data in cache
            CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
            CACHE_POLICY = A
            Host Functionality Mode = A
    Licensing information:
            RAID (RAID Option) is ENABLED, license key is VALID
            WBCA (Writeback Cache Option) is ENABLED, license key is VALID
            MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
    Extended information:
            Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
            Operation control: 00000004  Security state code: 88722
            Configuration backup disabled
    hszB_bottom> sho unit
    
        LUN                                      Uses
    --------------------------------------------------------------
    
      D100                                       VMS
      D101                                       VA3
      D200                                       AIJ
      D201                                       VA4
      D300                                       S1
      D400                                       S2
    hszB_bottom> show fail
    
    Name          Storageset                     Uses             Used by
    ------------------------------------------------------------------------------
    
    FAILEDSET     failedset                                       
            Switches:
              AUTOSPARE
    hszB_bottom> run fmu
    
    
               Fault Management Utility
    
    FMU> show last most
    
    
    Last Failure Entry: 3. Flags: 000FF101
     Template: 1.(01) Description: Last Failure Event
     Occurred on 11-JAN-1997 at 12:33:12
     Power On Time: 0. Years, 9. Days, 1. Hours, 38. Minutes, 21. Seconds
     Controller Model: HSZ40
     Serial Number: ZG64507449 Hardware Version:  A01(01)
     Firmware Version: V30Z(30)
     Informational Report
     Instance Code: 0102030A Description: 
      An unrecoverable firmware inconsistency was detected or an intentional
      restart or shutdown of controller operation was requested.
     Reporting Component: 1.(01) Description: 
      Executive Services
     Reporting component's event number: 2.(02)
     Event Threshold: 10.(0A) Classification:
      SOFT. An unexpected condition detected by a controller firmware component
      (e.g., protocol violations, host buffer access errors, internal
      inconsistencies, uninterpreted device errors, etc.) or an intentional
      restart or shutdown of controller operation is indicated.
     Last Failure Code: 08090010 (No Last Failure Parameters)
     Last Failure Code: 08090010 Description: 
      The other controller requested this controller to shutdown.
     Reporting Component: 8.(08) Description: 
      Nonvolatile Parameter Memory Failover Control
     Reporting component's event number: 9.(09)
     Restart Type: 1.(01) Description: No restart
    FMU> 
    $ 
    
    DECevent after above move preferred to hszB_bottom:
    
    ******************************* ENTRY  322 ********************************
    
    
    Logging OS                        1. OpenVMS 
    System Architecture               2. Alpha 
    OS version                           V6.2-1H3 
    Event sequence number          5062. 
    Timestamp of occurrence              10-FEB-1997 09:57:02   
    Time since reboot                    2 Day(s) 2:50:59 
    Host name                            580A01   
    
    System Model                         AlphaServer 2100A 5/300 
    
    Entry type                        1. Device Error 
    
    
    ---- Device Profile ----               
    Unit                                 580A01$DKB100 
    Product Name                         HSZ40  SCSI to SCSI Ctrl 
    
    -- Driver Supplied Info -              
    Device Firmware Revision             V30Z 
    
    
    Device Firmware Revision             V30Z 
    VMS SCSI Error Type               5. Extended Sense Data from Device 
    SCSI ID                         x01 
    SCSI LUN                        x00 
    SCSI SUBLUN                     x00 
    Port Status               x00000001  NORMAL  -  normal successful completion
    Command Opcode                  x00  Test Unit Ready 
    Command Data                           
                                    x00 
                                    x00 
                                    x00 
                                    x00 
                                    x00 
                                           
    SCSI Status                     x02  Check Condition 
    Remaining Byte Length           160. 
    
    ------- HSZ Data -------               
    Instance Code             x0102030A  An unrecoverable firmware inconsistency
                                         was detected or an intentional restart or
                                         shutdown of controller operation was
                                         requested.
    
    
                                         Component ID =   Executive Services.
                                         Event Number =   x00000002 
                                         Repair Action =   x00000003 
                                         NR Threshold =   x0000000A 
    Template Type                   x01  Last Failure Event. 
    Template Flags                  x00  HCE =   0, Event did not occur during Host
                                                 Command Execution.
    Ctrl Serial #                              ZG64507449 
    Ctrl Software Revision               V30Z 
    RAIDSET State                   x00  NORMAL. All members present and
                                         reconstructed, IF LUN is configured as a
                                         RAIDSET.
    
    Error Code                      x70  Current Error 
    Sense Key                       x06  Unit Attention 
    ASC & ASCQ                    xA000  ASC  =   x00A0 
                                         ASCQ =   x0000 
                                         Last failure event report. 
    
    Last Failure Code         x08090010  Component ID =   Nonvolatile Parameter
                                                          Memory Failover Control.
                                         Event Number =   x00000009 
    
    
                                         Repair Action =   x00000000 
                                         Flag =   0, Firmware Detected 
                                                  Inconsistency. 
                                         Restart Code =   No restart. 
                                         Parameter Count =   0. 
                                              
                                         The other controller requested this
                                         controller to shutdown.
    
    ----- Software Info -----              
    UCB$x_ERTCNT                     16. Retries Remaining    
    UCB$x_ERTMAX                     16. Retries Allowable    
    IRP$Q_IOSB                x0000000000000000 
    UCB$x_STS                 x08025910  Online 
                                         Busy 
                                         Software Valid 
                                         Unload At Dismount 
                                         "Mount Verification" In-Progress 
                                         Volume is Valid on the local node 
                                         Unit supports the Extended Function bit
    IRP$L_PID                 x85941230  Requestor "PID"    
    IRP$x_BOFF                        0. Byte Page Offset    
    
    IRP$x_BCNT                        0. Transfer Size In Byte(s)
    UCB$x_ERRCNT                      3. Errors This Unit    
    UCB$L_OPCNT                  376945. QIO's This Unit    
    ORB$L_OWNER               x00010004  Owners UIC    
    UCB$L_DEVCHAR1            x1C4D4008  Directory Structured 
                                         File Oriented 
                                         Sharable 
                                         Available 
                                         Mounted 
                                         Error Logging 
                                         Capable of Input 
                                         Capable of Output 
                                         Random Access 
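
The DECevent entry above breaks the instance code x0102030A into component ID, event number, repair action and NR threshold. Reading those fields straight off the hex digits suggests a one-byte-per-field layout; the sketch below assumes exactly that layout (inferred from this entry, not taken from HSOF documentation), and the function name is hypothetical.

```python
def decode_instance_code(code):
    """Split a 32-bit instance code into four one-byte fields.

    Assumed layout (inferred from the DECevent entry in this note):
    byte 3 = component ID, byte 2 = event number,
    byte 1 = repair action, byte 0 = notification/recovery threshold.
    """
    return {
        "component_id": (code >> 24) & 0xFF,
        "event_number": (code >> 16) & 0xFF,
        "repair_action": (code >> 8) & 0xFF,
        "nr_threshold": code & 0xFF,
    }


if __name__ == "__main__":
    for name, value in decode_instance_code(0x0102030A).items():
        print(f"{name:14s} x{value:02X}")
```

For x0102030A this gives component x01 (Executive Services), event x02, repair action x03 and threshold x0A, the same values DECevent prints.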
    
    $1$dkb100 VMS disk:
    
    hszB_bottom> show disk100
    Name          Type                      Port Targ  Lun        Used by
    ------------------------------------------------------------------------------
    
    DISK100       disk                         1    0    0        VMS
              DEC      RZ29B    (C) DEC 0016
            Switches:
              NOTRANSPORTABLE       
              TRANSFER_RATE_REQUESTED = 10MHZ (synchronous 10 MHZ negotiated)
            Size: 8378028 blocks
    hszB_bottom> show disk400
    Name          Type                      Port Targ  Lun        Used by
    ------------------------------------------------------------------------------
    
    DISK400       disk                         4    0    0        VMS
              DEC      RZ29B    (C) DEC 0014
            Switches:
              NOTRANSPORTABLE       
              TRANSFER_RATE_REQUESTED = 10MHZ (synchronous 10 MHZ negotiated)
            Size: 8378028 blocks
    hszB_bottom> 
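
The drive firmware mismatch between the two members (0016 vs 0014) can be picked out of "show disk" output with a few lines of script. This is a rough sketch that assumes the two-line layout shown above (a name line followed by a vendor/model/revision line); the regular expressions and function names are illustrative, not part of any HSOF tool.

```python
import re

# Sample text copied from the "show disk" output in this note.
SHOW_DISK = """\
DISK100       disk                         1    0    0        VMS
          DEC      RZ29B    (C) DEC 0016
DISK400       disk                         4    0    0        VMS
          DEC      RZ29B    (C) DEC 0014
"""

NAME_RE = re.compile(r"^\s*(DISK\d+)\s+disk\b")
REV_RE = re.compile(r"^\s*DEC\s+(\S+)\s+\(C\) DEC\s+(\S+)")


def firmware_by_disk(text):
    """Map each disk name to a (model, firmware revision) pair."""
    result = {}
    current = None
    for line in text.splitlines():
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        rev = REV_RE.match(line)
        if rev and current:
            result[current] = (rev.group(1), rev.group(2))
    return result


def revisions_differ(info):
    """Return True when the disks do not all report the same revision."""
    return len({rev for _model, rev in info.values()}) > 1


if __name__ == "__main__":
    info = firmware_by_disk(SHOW_DISK)
    print(info)
    print("mismatch:", revisions_differ(info))
```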
    
764.3. "hsz40 controller corrupting data" by NEWVAX::DISNEY (Jim Disney, phone 410-643-5578) Tue Feb 11 1997 14:50 (211 lines)

It looks like the problem is with one of the hsz40 dual redundant controllers,
hszB_top. There are no hardware warnings other than the firmware inconsistency
mentioned in .2. The only indication of a problem is corrupted data!

hszB_top (preferred_id=1,3) was serving three disks - dkb101, dkb300, and
dkb100 (VMS). Got backup verify errors on dkb101 and dkb300, and uaf
errors on dkb100 (VMS). Had a user that could not log into the system.
Did an anal/rms on the sysuaf file and got a "bucket check byte out of
phase" message.

On hszB_top:

$ backup/image/verify/ignore=interlock $7$DKB200: $1$dkb300: 
%BACKUP-I-STARTVERIFY, starting verification pass
%BACKUP-E-VERIFYERR, verification error for block 47177 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 57892 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 60850 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 70575 of $1$DKB300:[DSMOPER.
$ dir/dat=mod/sinc $7$DKB200:[000000...]
%DIRECT-W-NOFILES, no files found
$

On hszB_bottom (after moving preferred_id):

$ backup/image/verify/ignore=interlock $7$DKB200: $1$dkb300: 
%BACKUP-I-STARTVERIFY, starting verification pass
$ dir/dat=mod/sinc $7$DKB200:[000000...]
%DIRECT-W-NOFILES, no files found
$
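
To compare runs like the two above without reading the logs by eye, the VERIFYERR lines can be pulled out and counted. The sketch below assumes the message format shown in this note; the sample text is copied from the two sessions (file names are left truncated as they appear), and the function name is hypothetical.

```python
import re

# BACKUP output copied from the two runs in this note.
TOP_RUN = """\
%BACKUP-I-STARTVERIFY, starting verification pass
%BACKUP-E-VERIFYERR, verification error for block 47177 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 57892 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 60850 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 70575 of $1$DKB300:[DSMOPER.
"""
BOTTOM_RUN = """\
%BACKUP-I-STARTVERIFY, starting verification pass
"""

VERIFY_RE = re.compile(
    r"%BACKUP-E-VERIFYERR, verification error for block (\d+) of (\S+?):"
)


def verify_errors(log_text):
    """Return (device, block) pairs for each VERIFYERR line in a log."""
    return [(m.group(2), int(m.group(1))) for m in VERIFY_RE.finditer(log_text)]


if __name__ == "__main__":
    for label, log in (("hszB_top", TOP_RUN), ("hszB_bottom", BOTTOM_RUN)):
        errors = verify_errors(log)
        print(f"{label}: {len(errors)} verify error(s)", errors)
```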

hszB_top controller:

         Copyright Digital Equipment Corporation 1994. All rights reserved 


Copyright Digital Equipment Corporation 1993, 1996. All rights reserved.
HSZ40 Firmware version V30Z-2, Hardware version  A01

Last fail code: 20090010

Press " ?" at any time for help.

hszB_top> 
hszB_top> sho this full
Controller:
        HSZ40 ZG64507316 Firmware V30Z-2, Hardware  A01
        Configured for dual-redundancy with ZG64507449
            In dual-redundant configuration
        SCSI address 7
        Time: 11-FEB-1997 11:07:39
Host port:
        SCSI target(s) (1, 2, 3, 4), Preferred target(s) (3)
        TRANSFER_RATE_REQUESTED = 10MHZ
Cache:
        32 megabyte write cache, version 2
        Cache is GOOD
        Battery is GOOD
        No unflushed data in cache
        CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
        CACHE_POLICY = A
        Host Functionality Mode = A
Licensing information:
        RAID (RAID Option) is ENABLED, license key is VALID
        WBCA (Writeback Cache Option) is ENABLED, license key is VALID
        MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
Extended information:
        Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
        Operation control: 00000004  Security state code: 29110
        Configuration backup disabled
hszB_top> 
hszB_top> run fmu

           Fault Management Utility

FMU> sho last most

Last Failure Entry: 1. Flags: 000FF501
 Template: 1.(01) Description: Last Failure Event
 Occurred on 11-JAN-1997 at 12:33:18
 Power On Time: 0. Years, 8. Days, 14. Hours, 40. Minutes, 0. Seconds
 Controller Model: HSZ40
 Serial Number: ZG64507316 Hardware Version:  A01(01)
 Firmware Version: V30Z(30)
 Informational Report
 Instance Code: 0102030A Description: 
  An unrecoverable firmware inconsistency was detected or an intentional
  restart or shutdown of controller operation was requested.
 Reporting Component: 1.(01) Description: 
  Executive Services
 Reporting component's event number: 2.(02)
 Event Threshold: 10.(0A) Classification:
  SOFT. An unexpected condition detected by a controller firmware component
  (e.g., protocol violations, host buffer access errors, internal
  inconsistencies, uninterpreted device errors, etc.) or an intentional
  restart or shutdown of controller operation is indicated.
 Last Failure Code: 20090010 (No Last Failure Parameters)
 Last Failure Code: 20090010 Description: 
  This controller requested this controller to shutdown.
 Reporting Component: 32.(20) Description: 
  Command Line Interpreter
 Reporting component's event number: 9.(09)
 Restart Type: 1.(01) Description: No restart
FMU>  
FMU> show last all

Last Failure Entry: 1. Flags: 000FF501
 Template: 1.(01) Description: Last Failure Event
 Occurred on 11-JAN-1997 at 12:33:18
 Power On Time: 0. Years, 8. Days, 14. Hours, 40. Minutes, 0. Seconds
 Controller Model: HSZ40
 Serial Number: ZG64507316 Hardware Version:  A01(01)
 Firmware Version: V30Z(30)
 Informational Report
 Instance Code: 0102030A Description: 
  An unrecoverable firmware inconsistency was detected or an intentional
  restart or shutdown of controller operation was requested.
 Reporting Component: 1.(01) Description: 
  Executive Services
 Reporting component's event number: 2.(02)
 Event Threshold: 10.(0A) Classification:
  SOFT. An unexpected condition detected by a controller firmware component
  (e.g., protocol violations, host buffer access errors, internal
  inconsistencies, uninterpreted device errors, etc.) or an intentional
  restart or shutdown of controller operation is indicated.
 Last Failure Code: 20090010 (No Last Failure Parameters)
 Last Failure Code: 20090010 Description: 
  This controller requested this controller to shutdown.
 Reporting Component: 32.(20) Description: 
  Command Line Interpreter
 Reporting component's event number: 9.(09)
 Restart Type: 1.(01) Description: No restart

Last Failure Entry: 4. Flags: 000FF181
 Template: 1.(01) Description: Last Failure Event
 Occurred on 08-JAN-1997 at 17:03:08
 Power On Time: 0. Years, 5. Days, 19. Hours, 10. Minutes, 7. Seconds
 Controller Model: HSZ40
 Serial Number: ZG64507316 Hardware Version:  A01(01)
 Firmware Version: V30Z(30)
 Informational Report
 Instance Code: 01010302 Description: 
  An unrecoverable hardware detected fault occurred.
 Reporting Component: 1.(01) Description: 
  Executive Services
 Reporting component's event number: 1.(01)
 Event Threshold: 2.(02) Classification:
  HARD. Failure of a component that affects controller performance or
  precludes access to a device connected to the controller is indicated.
 Last Failure Code: 018700A0 (No Last Failure Parameters)
 Last Failure Code: 018700A0 Description: 
  A processor interrupt was generated with an indication that the (//) RESET
  button on the controller module was depressed.
 Reporting Component: 1.(01) Description: 
  Executive Services
 Reporting component's event number: 135.(87)
 Restart Type: 2.(02) Description: Automatic hardware restart

Last Failure Entry: 3. Flags: 000FF101
 Template: 1.(01) Description: Last Failure Event
 Occurred on 08-JAN-1997 at 14:45:32
 Power On Time: 0. Years, 5. Days, 16. Hours, 52. Minutes, 47. Seconds
 Controller Model: HSZ40
 Serial Number: ZG64507316 Hardware Version:  A01(01)
 Firmware Version: V30Z(30)
 Informational Report
 Instance Code: 0102030A Description: 
  An unrecoverable firmware inconsistency was detected or an intentional
  restart or shutdown of controller operation was requested.
 Reporting Component: 1.(01) Description: 
  Executive Services
 Reporting component's event number: 2.(02)
 Event Threshold: 10.(0A) Classification:
  SOFT. An unexpected condition detected by a controller firmware component
  (e.g., protocol violations, host buffer access errors, internal
  inconsistencies, uninterpreted device errors, etc.) or an intentional
  restart or shutdown of controller operation is indicated.
 Last Failure Code: 20080000 (No Last Failure Parameters)
 Last Failure Code: 20080000 Description: 
  This controller requested this controller to restart.
 Reporting Component: 32.(20) Description: 
  Command Line Interpreter
 Reporting component's event number: 8.(08)
 Restart Type: 0.(00) Description: Full firmware restart

Last Failure Entry: 2. Flags: 000FF180
 Template: 1.(01) Description: Last Failure Event
 Power On Time: 0. Years, 4. Days, 13. Hours, 19. Minutes, 58. Seconds
 Controller Model: HSZ40
 Serial Number: ZG64507316 Hardware Version:  A01(01)
 Firmware Version: V30Z(30)
 Informational Report
 Instance Code: 01010302 Description: 
  An unrecoverable hardware detected fault occurred.
 Reporting Component: 1.(01) Description: 
  Executive Services
 Reporting component's event number: 1.(01)
 Event Threshold: 2.(02) Classification:
  HARD. Failure of a component that affects controller performance or
  precludes access to a device connected to the controller is indicated.
 Last Failure Code: 018800A0 (No Last Failure Parameters)
 Last Failure Code: 018800A0 Description: 
  A processor interrupt was generated with an indication that the program
  card was removed.
 Reporting Component: 1.(01) Description: 
  Executive Services
 Reporting component's event number: 136.(88)
 Restart Type: 2.(02) Description: Automatic hardware restart
FMU> show memory all

 ***No Memory System Failures found; translation terminated***
FMU>
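
The last failure codes in the FMU listings above appear to pack several fields into one longword. The sketch below decodes them under a layout inferred only from the values in this thread (component and event number in the top two bytes, a hardware flag in bit 7, restart code in bits 5-4, parameter count in the low nibble); treat that layout as an assumption rather than the documented HSOF format. The function name is hypothetical.

```python
def decode_last_failure_code(code):
    """Decode a 32-bit last failure code into its apparent fields.

    Assumed layout, inferred from the FMU and DECevent output in this thread:
    bits 31-24 component ID, bits 23-16 event number, bits 15-8 repair
    action, bit 7 hardware flag, bits 5-4 restart code, bits 3-0 parameter
    count.
    """
    return {
        "component_id": (code >> 24) & 0xFF,
        "event_number": (code >> 16) & 0xFF,
        "repair_action": (code >> 8) & 0xFF,
        "hardware_flag": (code >> 7) & 0x1,
        "restart_code": (code >> 4) & 0x3,
        "parameter_count": code & 0xF,
    }


if __name__ == "__main__":
    # Codes reported by the two controllers in this thread.
    for code in (0x08090010, 0x20090010, 0x20080000, 0x018700A0, 0x018800A0):
        print(f"{code:08X}", decode_last_failure_code(code))
```

With this layout, 08090010 decodes to component 8, event 9, restart code 1, and 018700A0 to component 1, event 135, restart code 2 with the hardware flag set, which agrees with the component, event number and restart type lines FMU prints for those entries.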