| After several backup/verify tests, it turns out that the customer gets
"backup verify error on block n in file xyz" regardless of whether writeback
cache is enabled or disabled, although the error is not as consistent with
writeback disabled.
I moved the preferred IDs to the other controller as requested:
Original config: hszB_top> preferred_id=(1,3)
hszB_bottom> preferred_id=(2,4)
Disks $1$dkb101, $1$dkb300 logging backup/verify errors. Also, just had
problems on VMS $1$dkb100 disk. Couldn't modify field's password - got
unrecognized qualifier /nopwdexp. Note that the problem disks are all served
by hszB_top controller. Moved the preferred to hszB_bottom (prefer=(1,2,3,4))
and now can modify uaf field account o.k. Note the firmware inconsistency
messages (both controllers are Firmware V30Z-2, Hardware A01). Disk
$1$dkb100 VMS does have inconsistent firmware - one RZ29B is 0016, mirror
RZ29B is 0014. HSOF v3.0 requires 0007 for RZ28B.
$ set host/scsi $1$dkb200
hszB_bottom> set this time=current_time !noticed that time wasn't set
hszB_bottom> set this preferred_id=(1,2,3,4)
hszB_bottom> exit
$ set host/scsi $1$dkb100
Copyright Digital Equipment Corporation 1993, 1996. All rights reserved.
HSZ40 Firmware version V30Z-2, Hardware version A01
Last fail code: 08090010
Press " ?" at any time for help.
hszB_bottom> show this full
Controller:
HSZ40 ZG64507449 Firmware V30Z-2, Hardware A01
Configured for dual-redundancy with ZG64507316
In dual-redundant configuration
SCSI address 6
Time: 10-FEB-1997 10:16:29
Host port:
SCSI target(s) (1, 2, 3, 4), Preferred target(s) (1, 2, 3, 4)
TRANSFER_RATE_REQUESTED = 10MHZ
Cache:
32 megabyte write cache, version 2
Cache is GOOD
Battery is GOOD
No unflushed data in cache
CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
CACHE_POLICY = A
Host Functionality Mode = A
Licensing information:
RAID (RAID Option) is ENABLED, license key is VALID
WBCA (Writeback Cache Option) is ENABLED, license key is VALID
MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
Extended information:
Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
Operation control: 00000004 Security state code: 34850
Configuration backup disabled
hszB_bottom> show other full
Controller:
HSZ40 ZG64507316 Firmware V30Z-2, Hardware A01
Configured for dual-redundancy with ZG64507449
In dual-redundant configuration
SCSI address 7
Time: 10-FEB-1997 10:16:39
Host port:
SCSI target(s) (1, 2, 3, 4), No preferred targets
TRANSFER_RATE_REQUESTED = 10MHZ
Cache:
32 megabyte write cache, version 2
Cache is GOOD
Battery is GOOD
No unflushed data in cache
CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
CACHE_POLICY = A
Host Functionality Mode = A
Licensing information:
RAID (RAID Option) is ENABLED, license key is VALID
WBCA (Writeback Cache Option) is ENABLED, license key is VALID
MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
Extended information:
Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
Operation control: 00000004 Security state code: 88722
Configuration backup disabled
hszB_bottom> sho unit
LUN Uses
--------------------------------------------------------------
D100 VMS
D101 VA3
D200 AIJ
D201 VA4
D300 S1
D400 S2
hszB_bottom> show fail
Name Storageset Uses Used by
------------------------------------------------------------------------------
FAILEDSET failedset
Switches:
AUTOSPARE
hszB_bottom> run fmu
Fault Management Utility
FMU> show last most
Last Failure Entry: 3. Flags: 000FF101
Template: 1.(01) Description: Last Failure Event
Occurred on 11-JAN-1997 at 12:33:12
Power On Time: 0. Years, 9. Days, 1. Hours, 38. Minutes, 21. Seconds
Controller Model: HSZ40
Serial Number: ZG64507449 Hardware Version: A01(01)
Firmware Version: V30Z(30)
Informational Report
Instance Code: 0102030A Description:
An unrecoverable firmware inconsistency was detected or an intentional
restart or shutdown of controller operation was requested.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 2.(02)
Event Threshold: 10.(0A) Classification:
SOFT. An unexpected condition detected by a controller firmware component
(e.g., protocol violations, host buffer access errors, internal
inconsistencies, uninterpreted device errors, etc.) or an intentional
restart or shutdown of controller operation is indicated.
Last Failure Code: 08090010 (No Last Failure Parameters)
Last Failure Code: 08090010 Description:
The other controller requested this controller to shutdown.
Reporting Component: 8.(08) Description:
Nonvolatile Parameter Memory Failover Control
Reporting component's event number: 9.(09)
Restart Type: 1.(01) Description: No restart
FMU>
$
DECevent entry logged after moving the preferred IDs to hszB_bottom:
******************************* ENTRY 322 ********************************
Logging OS 1. OpenVMS
System Architecture 2. Alpha
OS version V6.2-1H3
Event sequence number 5062.
Timestamp of occurrence 10-FEB-1997 09:57:02
Time since reboot 2 Day(s) 2:50:59
Host name 580A01
System Model AlphaServer 2100A 5/300
Entry type 1. Device Error
---- Device Profile ----
Unit 580A01$DKB100
Product Name HSZ40 SCSI to SCSI Ctrl
-- Driver Supplied Info -
Device Firmware Revision V30Z
Device Firmware Revision V30Z
VMS SCSI Error Type 5. Extended Sense Data from Device
SCSI ID x01
SCSI LUN x00
SCSI SUBLUN x00
Port Status x00000001 NORMAL - normal successful completion
Command Opcode x00 Test Unit Ready
Command Data
x00
x00
x00
x00
x00
SCSI Status x02 Check Condition
Remaining Byte Length 160.
------- HSZ Data -------
Instance Code x0102030A An unrecoverable firmware inconsistency
was detected or an intentional restart or
shutdown of controller operation was
requested.
Component ID = Executive Services.
Event Number = x00000002
Repair Action = x00000003
NR Threshold = x0000000A
Template Type x01 Last Failure Event.
Template Flags x00 HCE = 0, Event did not occur during Host
Command Execution.
Ctrl Serial # ZG64507449
Ctrl Software Revision V30Z
RAIDSET State x00 NORMAL. All members present and
reconstructed, IF LUN is configured as a
RAIDSET.
Error Code x70 Current Error
Sense Key x06 Unit Attention
ASC & ASCQ xA000 ASC = x00A0
ASCQ = x0000
Last failure event report.
Last Failure Code x08090010 Component ID = Nonvolatile Parameter
Memory Failover Control.
Event Number = x00000009
Repair Action = x00000000
Flag = 0, Firmware Detected Inconsistency.
Restart Code = No restart.
Parameter Count = 0.
The other controller requested this
controller to shutdown.
----- Software Info -----
UCB$x_ERTCNT 16. Retries Remaining
UCB$x_ERTMAX 16. Retries Allowable
IRP$Q_IOSB x0000000000000000
UCB$x_STS x08025910 Online
Busy
Software Valid
Unload At Dismount
"Mount Verification" In-Progress
Volume is Valid on the local node
Unit supports the Extended Function bit
IRP$L_PID x85941230 Requestor "PID"
IRP$x_BOFF 0. Byte Page Offset
IRP$x_BCNT 0. Transfer Size In Byte(s)
UCB$x_ERRCNT 3. Errors This Unit
UCB$L_OPCNT 376945. QIO's This Unit
ORB$L_OWNER x00010004 Owners UIC
UCB$L_DEVCHAR1 x1C4D4008 Directory Structured
File Oriented
Sharable
Available
Mounted
Error Logging
Capable of Input
Capable of Output
Random Access
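For anyone reading these codes by hand: the DECevent entry above breaks the instance code and the last failure code into byte fields, and the same split can be sketched in a few lines of Python. The bit layout below is an assumption inferred from the values shown in this note (component ID and event number in the top two bytes, restart code in bits 5:4 of the low byte, hardware flag in bit 7), not something taken from the HSOF manual, and the function names are made up for illustration.

```python
# Restart codes as they appear in the FMU output in this note.
RESTART_CODES = {
    0: "Full firmware restart",
    1: "No restart",
    2: "Automatic hardware restart",
}


def decode_instance_code(code: int) -> dict:
    """Split a 32-bit instance code into four one-byte fields (assumed layout)."""
    return {
        "component_id": (code >> 24) & 0xFF,
        "event_number": (code >> 16) & 0xFF,
        "repair_action": (code >> 8) & 0xFF,
        "nr_threshold": code & 0xFF,
    }


def decode_last_failure_code(code: int) -> dict:
    """Split a 32-bit last failure code into its fields (assumed layout)."""
    low = code & 0xFF
    restart = (low >> 4) & 0x3  # bits 5:4 of the low byte
    return {
        "component_id": (code >> 24) & 0xFF,
        "event_number": (code >> 16) & 0xFF,
        "repair_action": (code >> 8) & 0xFF,
        "hardware_flag": (low >> 7) & 0x1,  # 0 = firmware detected
        "restart_code": restart,
        "restart_text": RESTART_CODES.get(restart, "unknown"),
        "parameter_count": low & 0xF,
    }


if __name__ == "__main__":
    print(decode_instance_code(0x0102030A))
    for raw in (0x08090010, 0x20090010, 0x018700A0):
        print(f"{raw:08X}", decode_last_failure_code(raw))
```

Fed 08090010, this gives component 8, event 9, restart code 1 (No restart), which agrees with the FMU and DECevent text above.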
$1$dkb100 VMS disk:
hszB_bottom> show disk100
Name Type Port Targ Lun Used by
------------------------------------------------------------------------------
DISK100 disk 1 0 0 VMS
DEC RZ29B (C) DEC 0016
Switches:
NOTRANSPORTABLE
TRANSFER_RATE_REQUESTED = 10MHZ (synchronous 10 MHZ negotiated)
Size: 8378028 blocks
hszB_bottom> show disk400
Name Type Port Targ Lun Used by
------------------------------------------------------------------------------
DISK400 disk 4 0 0 VMS
DEC RZ29B (C) DEC 0014
Switches:
NOTRANSPORTABLE
TRANSFER_RATE_REQUESTED = 10MHZ (synchronous 10 MHZ negotiated)
Size: 8378028 blocks
hszB_bottom>
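The firmware mismatch between the two mirror members can also be checked mechanically if there are many disks to go through. A rough sketch, assuming the captured "show disk" text keeps the two-line shape seen above (name line ending in the "Used by" container, followed by a model line ending in the drive firmware revision). The helper names are hypothetical and the parsing is only as good as that assumption.

```python
import re
from collections import defaultdict

# Trimmed copy of the "show disk100" / "show disk400" output from this note.
SHOW_DISK = """\
DISK100 disk 1 0 0 VMS
DEC RZ29B (C) DEC 0016
DISK400 disk 4 0 0 VMS
DEC RZ29B (C) DEC 0014
"""

# Name line: DISKnnn disk <port> <target> <lun> <used by>
NAME_LINE = re.compile(r"(DISK\d+)\s+disk\s+\d+\s+\d+\s+\d+\s+(\S+)")


def firmware_by_container(text: str) -> dict:
    """Map each 'Used by' container to {disk name: firmware revision}."""
    revisions = defaultdict(dict)
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        match = NAME_LINE.match(line.strip())
        if match:
            name, used_by = match.groups()
            # Assumption: the revision is the last token on the next line.
            revisions[used_by][name] = lines[i + 1].split()[-1]
    return revisions


def mismatched(revisions: dict) -> dict:
    """Return only the containers whose members report different revisions."""
    return {
        container: members
        for container, members in revisions.items()
        if len(set(members.values())) > 1
    }


if __name__ == "__main__":
    print(mismatched(firmware_by_container(SHOW_DISK)))
```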
|
| It looks like the problem is with one of the hsz40 dual redundant controllers,
hszB_top. There are no hardware warnings other than the firmware inconsistency
mentioned in .2. The only indication of a problem is corrupted data!
hszB_top (preferred_id=1,3) was serving three disks - dkb101, dkb300,
dkb100 (VMS). Got backup verify errors on dkb101, dkb300, and uaf
errors on dkb100 (VMS). Had a user that could not log into the system.
Did an anal/rms on the sysuaf file and got a "bucket check byte out of
phase" message.
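To make the "which controller was serving which disk" reasoning explicit, here is a small sketch. It assumes the hundreds digit of the unit number is the SCSI target ID (D100 and D101 on target 1, D300 on target 3), which is consistent with the DECevent entry for DKB100 showing SCSI ID x01, LUN x00. The function names are hypothetical.

```python
# Units as listed by "sho unit" earlier in this note.
UNITS = ["D100", "D101", "D200", "D201", "D300", "D400"]


def target_of(unit: str) -> int:
    """Return the SCSI target ID for a unit name, e.g. 'D101' -> 1 (assumed mapping)."""
    return int(unit[1:]) // 100


def units_served(units, preferred_targets):
    """List the units whose target ID is in a controller's preferred_id set."""
    return [u for u in units if target_of(u) in preferred_targets]


if __name__ == "__main__":
    # Original configuration from the first reply.
    print("hszB_top   ", units_served(UNITS, {1, 3}))
    print("hszB_bottom", units_served(UNITS, {2, 4}))
```

With the original preferred_id=(1,3), this puts D100, D101 and D300 on hszB_top, the same three disks that showed the errors.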
On hszB_top:
$ backup/image/verify/ignore=interlock $7$DKB200: $1$dkb300:
%BACKUP-I-STARTVERIFY, starting verification pass
%BACKUP-E-VERIFYERR, verification error for block 47177 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 57892 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 60850 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 70575 of $1$DKB300:[DSMOPER.
$ dir/dat=mod/sinc $7$DKB200:[000000...]
%DIRECT-W-NOFILES, no files found
$
On hszB_bottom (after moving preferred_id):
$ backup/image/verify/ignore=interlock $7$DKB200: $1$dkb300:
%BACKUP-I-STARTVERIFY, starting verification pass
$ dir/dat=mod/sinc $7$DKB200:[000000...]
%DIRECT-W-NOFILES, no files found
$
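If the backup is repeated a number of times, the two runs above can be compared by pulling the block numbers out of the captured BACKUP log rather than eyeballing them. A minimal sketch, assuming the log has been saved as text and that each error line looks like the %BACKUP-E-VERIFYERR lines above. The function name and sample variables are made up for illustration.

```python
import re
from collections import defaultdict

# Matches lines such as:
# %BACKUP-E-VERIFYERR, verification error for block 47177 of $1$DKB300:[DSMOPER.
VERIFYERR = re.compile(
    r"%BACKUP-E-VERIFYERR, verification error for block (\d+) of (\S+?):\["
)

# Logs as captured in this note (file specs are truncated in the original).
TOP_LOG = """\
%BACKUP-I-STARTVERIFY, starting verification pass
%BACKUP-E-VERIFYERR, verification error for block 47177 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 57892 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 60850 of $1$DKB300:[DSMOPER.
%BACKUP-E-VERIFYERR, verification error for block 70575 of $1$DKB300:[DSMOPER.
"""
BOTTOM_LOG = "%BACKUP-I-STARTVERIFY, starting verification pass\n"


def verify_errors(log_text: str) -> dict:
    """Return {device: [block numbers]} for every VERIFYERR line in the log."""
    errors = defaultdict(list)
    for line in log_text.splitlines():
        match = VERIFYERR.search(line)
        if match:
            block, device = match.groups()
            errors[device].append(int(block))
    return dict(errors)


if __name__ == "__main__":
    print("hszB_top   :", verify_errors(TOP_LOG))
    print("hszB_bottom:", verify_errors(BOTTOM_LOG))
```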
hszB_top controller:
Copyright Digital Equipment Corporation 1994. All rights reserved
Copyright Digital Equipment Corporation 1993, 1996. All rights reserved.
HSZ40 Firmware version V30Z-2, Hardware version A01
Last fail code: 20090010
Press " ?" at any time for help.
hszB_top>
hszB_top> sho this full
Controller:
HSZ40 ZG64507316 Firmware V30Z-2, Hardware A01
Configured for dual-redundancy with ZG64507449
In dual-redundant configuration
SCSI address 7
Time: 11-FEB-1997 11:07:39
Host port:
SCSI target(s) (1, 2, 3, 4), Preferred target(s) (3)
TRANSFER_RATE_REQUESTED = 10MHZ
Cache:
32 megabyte write cache, version 2
Cache is GOOD
Battery is GOOD
No unflushed data in cache
CACHE_FLUSH_TIMER = DEFAULT (10 seconds)
CACHE_POLICY = A
Host Functionality Mode = A
Licensing information:
RAID (RAID Option) is ENABLED, license key is VALID
WBCA (Writeback Cache Option) is ENABLED, license key is VALID
MIRR (Disk Mirroring Option) is ENABLED, license key is VALID
Extended information:
Terminal speed 9600 baud, eight bit, no parity, 1 stop bit
Operation control: 00000004 Security state code: 29110
Configuration backup disabled
hszB_top>
hszB_top> run fmu
Fault Management Utility
FMU> sho last most
Last Failure Entry: 1. Flags: 000FF501
Template: 1.(01) Description: Last Failure Event
Occurred on 11-JAN-1997 at 12:33:18
Power On Time: 0. Years, 8. Days, 14. Hours, 40. Minutes, 0. Seconds
Controller Model: HSZ40
Serial Number: ZG64507316 Hardware Version: A01(01)
Firmware Version: V30Z(30)
Informational Report
Instance Code: 0102030A Description:
An unrecoverable firmware inconsistency was detected or an intentional
restart or shutdown of controller operation was requested.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 2.(02)
Event Threshold: 10.(0A) Classification:
SOFT. An unexpected condition detected by a controller firmware component
(e.g., protocol violations, host buffer access errors, internal
inconsistencies, uninterpreted device errors, etc.) or an intentional
restart or shutdown of controller operation is indicated.
Last Failure Code: 20090010 (No Last Failure Parameters)
Last Failure Code: 20090010 Description:
This controller requested this controller to shutdown.
Reporting Component: 32.(20) Description:
Command Line Interpreter
Reporting component's event number: 9.(09)
Restart Type: 1.(01) Description: No restart
FMU>
FMU> show last all
Last Failure Entry: 1. Flags: 000FF501
Template: 1.(01) Description: Last Failure Event
Occurred on 11-JAN-1997 at 12:33:18
Power On Time: 0. Years, 8. Days, 14. Hours, 40. Minutes, 0. Seconds
Controller Model: HSZ40
Serial Number: ZG64507316 Hardware Version: A01(01)
Firmware Version: V30Z(30)
Informational Report
Instance Code: 0102030A Description:
An unrecoverable firmware inconsistency was detected or an intentional
restart or shutdown of controller operation was requested.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 2.(02)
Event Threshold: 10.(0A) Classification:
SOFT. An unexpected condition detected by a controller firmware component
(e.g., protocol violations, host buffer access errors, internal
inconsistencies, uninterpreted device errors, etc.) or an intentional
restart or shutdown of controller operation is indicated.
Last Failure Code: 20090010 (No Last Failure Parameters)
Last Failure Code: 20090010 Description:
This controller requested this controller to shutdown.
Reporting Component: 32.(20) Description:
Command Line Interpreter
Reporting component's event number: 9.(09)
Restart Type: 1.(01) Description: No restart
Last Failure Entry: 4. Flags: 000FF181
Template: 1.(01) Description: Last Failure Event
Occurred on 08-JAN-1997 at 17:03:08
Power On Time: 0. Years, 5. Days, 19. Hours, 10. Minutes, 7. Seconds
Controller Model: HSZ40
Serial Number: ZG64507316 Hardware Version: A01(01)
Firmware Version: V30Z(30)
Informational Report
Instance Code: 01010302 Description:
An unrecoverable hardware detected fault occurred.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 1.(01)
Event Threshold: 2.(02) Classification:
HARD. Failure of a component that affects controller performance or
precludes access to a device connected to the controller is indicated.
Last Failure Code: 018700A0 (No Last Failure Parameters)
Last Failure Code: 018700A0 Description:
A processor interrupt was generated with an indication that the (//) RESET
button on the controller module was depressed.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 135.(87)
Restart Type: 2.(02) Description: Automatic hardware restart
Last Failure Entry: 3. Flags: 000FF101
Template: 1.(01) Description: Last Failure Event
Occurred on 08-JAN-1997 at 14:45:32
Power On Time: 0. Years, 5. Days, 16. Hours, 52. Minutes, 47. Seconds
Controller Model: HSZ40
Serial Number: ZG64507316 Hardware Version: A01(01)
Firmware Version: V30Z(30)
Informational Report
Instance Code: 0102030A Description:
An unrecoverable firmware inconsistency was detected or an intentional
restart or shutdown of controller operation was requested.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 2.(02)
Event Threshold: 10.(0A) Classification:
SOFT. An unexpected condition detected by a controller firmware component
(e.g., protocol violations, host buffer access errors, internal
inconsistencies, uninterpreted device errors, etc.) or an intentional
restart or shutdown of controller operation is indicated.
Last Failure Code: 20080000 (No Last Failure Parameters)
Last Failure Code: 20080000 Description:
This controller requested this controller to restart.
Reporting Component: 32.(20) Description:
Command Line Interpreter
Reporting component's event number: 8.(08)
Restart Type: 0.(00) Description: Full firmware restart
Last Failure Entry: 2. Flags: 000FF180
Template: 1.(01) Description: Last Failure Event
Power On Time: 0. Years, 4. Days, 13. Hours, 19. Minutes, 58. Seconds
Controller Model: HSZ40
Serial Number: ZG64507316 Hardware Version: A01(01)
Firmware Version: V30Z(30)
Informational Report
Instance Code: 01010302 Description:
An unrecoverable hardware detected fault occurred.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 1.(01)
Event Threshold: 2.(02) Classification:
HARD. Failure of a component that affects controller performance or
precludes access to a device connected to the controller is indicated.
Last Failure Code: 018800A0 (No Last Failure Parameters)
Last Failure Code: 018800A0 Description:
A processor interrupt was generated with an indication that the program
card was removed.
Reporting Component: 1.(01) Description:
Executive Services
Reporting component's event number: 136.(88)
Restart Type: 2.(02) Description: Automatic hardware restart
FMU> show memory all
***No Memory System Failures found; translation terminated***
FMU>
|