
Conference cookie::hsm

Title:File Shelving
Moderator:COOKIE::HOLSINGER
Created:Mon Mar 15 1993
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:346
Total number of notes:1204

341.0. "SHELF command getting; shelf-w-cancel with hsm-i-recoverpreshlv in shelf error log" by CX3PST::WSC217::SWANK (David) Fri May 23 1997 13:36

Manual SHELF commands are being cancelled on one specific device, $2$DKF104.
ANALYZE/DISK and ANALYZE/RMS uncover no problems.  The shelf error log contains
entries like this:

********************************************************************************

** 1486 **                      REQUEST ERROR REPORT

   Error detected on request number 1486 on node BLUE
   Entry logged at 22-MAY-1997 06:41:53.28

** Request Information:

   Identifier:  0
   Process:     21A00FC0
   Username:    SLS
   Timestamp:   22-MAY-1997 06:38:04.19
   Client Node: BLUE
   Source:      System
   Type:        File fault
   Flags:       FileID
   State:       Original Validated
   Status:      Error

** Request Parameters:

   File:        $2$DKF104:[BALTIMRE]BDBALTIMRE_70327_345406_351852.9703740_R;1
   Volume:      _$2$DKF104:
   FileID:      (4735,1,0,0)

** Error Information :

   %HSM-E-OFFLINERROR, offline system error, function not performed
   %SYSTEM-S-NORMAL, normal successful completion

** Request Disposition:

   Non-fatal shelf handler error
   Fatal request error
   Operation was rolled back

** Exception Information:

   Exception                          Module                               Line
   SHP_OFFLINE_READ_ERROR             SHP_OFFLINE                          4510

       Platform Status  Message Text
            00000001    %SYSTEM-S-NORMAL, normal successful completion

   Exception                          Module                               Line
   SHP_OFFLINE_READ_ERROR             SHP_OFFLINE_VMS                      1158

       Platform Status  Message Text
            10A38012    %BACKUP-E-OPENOUT, error opening !AS as output

   Exception                          Module                               Line
   SHP_OFFLINE_ERROR                  SHP_OFFLINE_VMS                      1126

       Platform Status  Message Text
            10A38012    %BACKUP-E-OPENOUT, error opening !AS as output

   %HSM-E-OFFREADERR, offline read error on drive _$3$MKA300:
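
For anyone reading the "Platform Status" columns in these reports: the hex
values are ordinary OpenVMS condition values, and the low three bits carry the
severity.  The following is a minimal Python sketch (not part of HSM; the
function name decode_status is made up for illustration) that splits a status
into its fields, assuming the standard condition value layout.  The decoded
severity letters should agree with the -S-, -E- and -W- in the message text.

```python
# Field layout of a 32-bit OpenVMS condition value (low bit first):
#   bits 0-2   severity
#   bits 3-15  message number
#   bits 16-27 facility number
#   bits 28-31 control bits
SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}


def decode_status(hex_text):
    """Split a platform status such as '10A38012' into its fields."""
    value = int(hex_text, 16)
    return {
        "severity": SEVERITY.get(value & 0x7, "?"),
        "message": (value >> 3) & 0x1FFF,
        "facility": (value >> 16) & 0xFFF,
        "control": (value >> 28) & 0xF,
    }


# Status values copied from the error log entries in this topic.
for status in ("00000001", "10A38012", "00000800"):
    print(status, decode_status(status))
```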


The customer is using his entire RW500 as a cache.  From VMS we're able to access
the various optical platters, both those in one of the four drives and those that
were out of a drive.

Here's their HSM configuration:

 Cache device _$2$ODA0: is disabled, Cache flush is held until after
     6-OCT-1996 12:45:33.12, Backup is performed at flush intervals,
    Cached files are not held on delete of online file
    Block size:         0
    Highwater mark:     100%
    Flush interval:     <none>
	.
	.
	.
Cache device _$2$ODA162: is enabled, Cache flush is held until after
     6-OCT-1996 17:15:58.47, Backup is performed at flush intervals,
    Cached files are not held on delete of online file
    Block size:         0
    Highwater mark:     100%
    Flush interval:     <none>

Cache device _$2$ODA163: is enabled, Cache flush is held until after
    12-MAY-1997 15:43:40.33, Backup is performed at flush intervals,
    Cached files are held on delete of online file
    Block size:         0
    Highwater mark:     100%
    Flush interval:     0 06:00:00.00
	.
	.
	.
Cache device _$2$ODA163: is enabled, Cache flush is held until after
    12-MAY-1997 15:43:40.33, Backup is performed at flush intervals,
    Cached files are held on delete of online file
    Block size:         0
    Highwater mark:     100%
    Flush interval:     0 06:00:00.00


  HSM$ARCHIVE01 has been used
    Identifier:     1
    Media type:     TK87K
    Density:        COMP
    Label:          S26959
    Position:       1932
    Device refs:    2
    Shelf refs:     4
    Current pool:   HSM
    Enabled pools:  HSM


HSM drive HSM$DEFAULT_DEVICE is enabled.
  Shared access:        < shelve, unshelve >
  Drive status:         Not configured
  Enabled archives:     <none>

HSM drive _$3$MKA200: is enabled.
  Shared access:        < shelve, unshelve >
  Drive status:         Configured
  Enabled archives:     HSM$ARCHIVE01   id: 1

HSM drive _$3$MKA300: is enabled.
  Shared access:        < shelve, unshelve >
  Drive status:         Configured
  Enabled archives:     HSM$ARCHIVE01   id: 1


Policy AMA_ARCHIVE_OCC_POLICY is enabled for shelving
  Policy History:
    Created:            25-FEB-1997 16:57:35.29
    Revised:            26-FEB-1997 19:56:26.95
  Selection Criteria:
    State:              Enabled
    Action:             Shelving
    File Event:         Modification date
    Elapsed time:       45 00:00:00
    Before time:        <none>
    Since time:         <none>
    Lowwater mark:      77%
    Primary Policy:     Least Recently Used (LRU)
    Secondary Policy:   Space Time Working Set (STWS)
  Verification:
    Mail notification:  <none>
    Output file:        <none>

Policy AMA_ARCHIVE_POLICY is enabled for shelving
  Policy History:
    Created:            25-FEB-1997 16:57:36.53
    Revised:            25-FEB-1997 16:57:36.53
  Selection Criteria:
    State:              Enabled
    Action:             Shelving
    File Event:         Modification date
    Elapsed time:       90 00:00:00
    Before time:        <none>
    Since time:         <none>
    Lowwater mark:      60%
    Primary Policy:     Least Recently Used (LRU)
    Secondary Policy:   Space Time Working Set (STWS)
  Verification:
    Mail notification:  <none>
    Output file:        <none>

Policy HSM$DEFAULT_OCCUPANCY is enabled for shelving
  Policy History:
    Created:            25-FEB-1997 16:57:37.54
    Revised:            25-FEB-1997 16:57:37.54
  Selection Criteria:
    State:              Enabled
    Action:             Shelving
    File Event:         Expiration date
    Elapsed time:       180 00:00:00
    Before time:        <none>
    Since time:         <none>
    Lowwater mark:      80%
    Primary Policy:     Space Time Working Set (STWS)
    Secondary Policy:   Least Recently Used (LRU)
  Verification:
    Mail notification:  <none>
    Output file:        <none>

Policy HSM$DEFAULT_POLICY is enabled for shelving
  Policy History:
    Created:            25-FEB-1997 16:57:38.56
    Revised:            25-FEB-1997 16:57:38.56
  Selection Criteria:
    State:              Enabled
    Action:             Shelving
    File Event:         Expiration date
    Elapsed time:       180 00:00:00
    Before time:        <none>
    Since time:         <none>
    Lowwater mark:      80%
    Primary Policy:     Space Time Working Set (STWS)
    Secondary Policy:   Least Recently Used (LRU)
  Verification:
    Mail notification:  <none>
    Output file:        <none>

Policy HSM$DEFAULT_QUOTA is enabled for shelving
  Policy History:
    Created:            25-FEB-1997 16:57:39.57
    Revised:            25-FEB-1997 16:57:39.57
  Selection Criteria:
    State:              Enabled
    Action:             Shelving
    File Event:         Expiration date
    Elapsed time:       180 00:00:00
    Before time:        <none>
    Since time:         <none>
    Lowwater mark:      80%
    Primary Policy:     Space Time Working Set (STWS)
    Secondary Policy:   Least Recently Used (LRU)
  Verification:
    Mail notification:  <none>
    Output file:        <none>


Shelf HSM$DEFAULT_SHELF is enabled for Shelving and Unshelving
  Catalog File:         DISK$DISK104:[HSM.CATALOG]HSM$CATALOG.SYS
  Shelf History:
    Created:            25-FEB-1997 16:57:29.00
    Revised:             9-MAY-1997 15:04:09.64
  Backup Verification:  Off
  Save Time:            <none>
  Updates Saved:        All
  Archive Classes:
    Archive list:       HSM$ARCHIVE01   id: 1
    Restore list:       HSM$ARCHIVE01   id: 1

Any assistance or thoughts would be greatly appreciated,
David

341.1. "comments..." by COOKIE::HOLSINGER (HSM Engineering, DTN 522-2843) Wed May 28 1997 13:51 (26 lines)

Hello David. 

Thank you for providing detailed information about your HSM configuration. 
Here are a couple of observations: 

    1.	The error entry was logged for a file fault (auto unshelve). The 
	root error was %BACKUP-E-OPENOUT, which usually means that Backup 
	could not locate a specific saveset on the appropriate tape. This 
	means that the corresponding HSM catalog entry and tape contents 
	don't agree. One of the two (tape or catalog) has been modified 
	outside of HSM. The error is not associated with HSM caching. 

	You can use SMU LOCATE/FULL for the file in question, to display the 
	particular tape and saveset that HSM is trying to find. Then, mount 
	the tape /FOREIGN, and use BACKUP $1$MUA0:*.*/SAVE/LIST/OUT=TAPE.LIS 
	to get a list of the saveset files and members on the tape. We can 
	use this info to try and isolate where and how the discrepancy occurred. 

    2.	The MO cache configuration looks OK. However, the default shelf is 
	configured to flush only to Archive class 1. I did not see any other 
	Archive class definitions. HSM should always be configured with 
	multiple redundant Archive classes. This should be corrected ASAP. 
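
The redundancy point in item 2 can be checked mechanically against a saved
copy of the shelf display.  This is only a rough Python sketch: the helper
name archive_ids is hypothetical, the sample text is trimmed from the
configuration posted in .0, and how HSM lays out a list with several archive
classes is an assumption here (the sketch simply collects every "id: N" on the
"Archive list:" line).

```python
import re

# Trimmed from the shelf display posted in .0.
SHELF_TEXT = """\
Shelf HSM$DEFAULT_SHELF is enabled for Shelving and Unshelving
  Archive Classes:
    Archive list:       HSM$ARCHIVE01   id: 1
    Restore list:       HSM$ARCHIVE01   id: 1
"""


def archive_ids(text, label="Archive list:"):
    """Return the set of archive class ids on lines starting with label."""
    ids = set()
    for line in text.splitlines():
        if line.strip().startswith(label):
            ids.update(int(n) for n in re.findall(r"id:\s*(\d+)", line))
    return ids


ids = archive_ids(SHELF_TEXT)
if len(ids) < 2:
    print(f"Only {len(ids)} archive class(es) in the archive list: {sorted(ids)}")
```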


Regards,
/Paul

341.2. "$2$DKF104: is also the catalog disk" by CX3PST::WSC217::SWANK (David) Wed May 28 1997 14:41 (33 lines)

Paul,

>Here are a couple of observations:
>
>    1. The error entry was logged for a file fault (auto unshelve). The
>       root error was %BACKUP-E-OPENOUT, which usually means that Backup
>       could not locate a specific saveset on the appropriate tape. This
>       means that the corresponding HSM catalog entry and tape contents
>       don't agree. One of the two (tape or catalog) has been modified
>       outside of HSM. The error is not associated with HSM caching.

The catalog is on device $2$DKF104: and they're having some problems with
it as well from an SLS backup standpoint.  I'm working with the customer on
that problem too, but suspect the catalog itself could be bad.

>       You can use SMU LOCATE/FULL for the file in question, to display the
>       particular tape and saveset that HSM is trying to find. Then, mount
>       the tape /FOREIGN, and use BACKUP $1$MUA0:*.*/SAVE/LIST/OUT=TAPE.LIS
>       to get a list of the saveset files and members on the tape. We can
>       use this info to try and isolate where and how the discrepancy occured.

I'll recommend the above procedure to the customer.  Is there a catalog
"health check" procedure to verify its internal structure and functionality?

>    2. The MO cache configuration looks OK. However, the default shelf is
>       configured to flush only to Archive class 1. I did not see any other
>       Archive class definitions. HSM should always be configured with
>       multiple redundant Archive classes. This should be corrected ASAP.

I've already noted the lack of redundancy to the customer.  Thanks for your
collaboration.

Regards, David

341.3. "shelf error log entry w/ SYSTEM-W-ACCONFLICT" by CX3PST::WSC217::SWANK (David) Wed May 28 1997 15:02 (53 lines)

Paul,

After my last reply (.2) I went back to the error log that the customer sent,
and I may not have sent the error log entry that corresponds to the shelf
command that fails.  Does the following entry provide any additional insight as
to why the SHELF command would fail with a shelf-w-cancel?

** 1455 **                      REQUEST ERROR REPORT

   Error detected on request number 1455 on node BLUE
   Entry logged at 22-MAY-1997 05:25:16.25

   Identifier:  20316808
   Process:     21A00136
   Username:    HSM$SERVER
   Timestamp:   22-MAY-1997 05:25:11.65
   Client Node: BLUE
   Source:      Application
   Type:        Shelve file
   Flags:       FileID Makespace
   State:       Canceled Original Validated
   Status:      Error

   File:        $2$DKF104:[AURORA]BDAURORA_70325_011259_011591.9703706_R;1
   Volume:      _$2$DKF104:
   FileID:      (4690,1,0,0)

   %HSM-E-FILERROR, file $2$DKF104:[AURORA]BDAURORA_70325_011259_011591.9703706_
   %SYSTEM-W-ACCONFLICT, file access conflict
   %HSM-I-RECOVERPRESHLV, inconsistent state found, file preshelved

   Non-fatal shelf handler error
   Fatal request error
   Operation was rolled back

   Exception                          Module                               Line
   (SHP_ONLINE_READ_ERROR)            SHP_FILE                             4239

       Platform Status  Message Text
            00000800    %SYSTEM-W-ACCONFLICT, file access conflict

   Exception                          Module                               Line
   SHP_ONLINE_ERROR                   SHP_REQUEST                          7672

   Exception                          Module                               Line
   SHP_ONLINE_WRITE_ERROR             SHP_ONLINE                           5567
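
When comparing several of these entries it can help to reduce each one to its
labelled fields and its %FACILITY-S-IDENT messages.  A small Python sketch of
that idea follows.  It is not an HSM tool: the name summarize is made up, and
the sample text is just a few lines copied from request 1455 above.

```python
import re

# A few lines copied from the request 1455 entry.
ENTRY = """\
   Username:    HSM$SERVER
   Type:        Shelve file
   Flags:       FileID Makespace
   State:       Canceled Original Validated
   %SYSTEM-W-ACCONFLICT, file access conflict
   %HSM-I-RECOVERPRESHLV, inconsistent state found, file preshelved
"""

FIELD = re.compile(r"^\s*([A-Za-z ]+?):\s+(.*\S)\s*$")
MESSAGE = re.compile(r"%([A-Z0-9$_]+)-([SIWEF])-([A-Z0-9$_]+)")


def summarize(entry):
    """Return (fields, messages) for one error report entry."""
    fields, messages = {}, []
    for line in entry.splitlines():
        found = MESSAGE.search(line)
        if found:
            # (facility, severity letter, message identifier)
            messages.append(found.groups())
            continue
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields, messages


fields, messages = summarize(ENTRY)
print(fields["Type"], "/", fields["State"])
for facility, severity, ident in messages:
    print(f"{facility}: {ident} (severity {severity})")
```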


It's a %SYSTEM-W-ACCONFLICT error that I'm currently investigating from the SLS
backup side of the house.  I have not yet received the SLS log, so I don't know
exactly what's happening there.

David

341.4. "correction to .1" by COOKIE::HOLSINGER (HSM Engineering, DTN 522-2843) Thu May 29 1997 21:05 (32 lines)

Hello David. 

I believe I was mistaken in my .1 analysis of the backup error. Since it is 
%BACKUP-E-OPENOUT, the error occurred when backup tried to open the temporary 
restore file during the unshelve. The error did not occur because backup could 
not open the file on the archive tape. The error may indicate a problem with 
the HSM$MANAGER device or directory. 

Please find and post the corresponding HSM$LOG:HSM$SHELF_HANDLER.LOG file. 
Note, there is a new file created each time HSM is started. This file should 
contain additional error info from backup. 

WRT the %SYSTEM-W-ACCONFLICT error, this may be the result of a race condition 
within HSM. The error you posted shows a shelve command failing on a preshelved
file, probably during file truncation. Almost all conflicting requests are 
caught by HSM, with the exception of conflicts due to cache flushing. This was 
done for performance reasons. If the scenario is as I suspect, a cache flush 
was in progress during a makespace shelve. In any case, the error is not 
serious, as no data is affected, and the makespace operation will simply 
continue with the next candidate file to shelve. 

Please verify the situation by posting the following: 

    1.  SMU LOCATE/FULL the file in question 
    2.	DIR/FULL the file in question 
    3.	locate a cache flush entry in HSM$LOG:HSM$SHP_AUDIT.LOG which 
	possesses the 22-MAY-1997 05:25:16.25 error timestamp 
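
For item 3, one way to narrow down a large audit log is to pull out every line
whose timestamp falls within a short window of the error time.  The Python
sketch below assumes only that audit log lines carry VMS-style timestamps like
the ones in the error report; the real layout of HSM$SHP_AUDIT.LOG is not shown
in this topic, so the sample lines and the name lines_near are hypothetical.

```python
import re
from datetime import datetime, timedelta

MONTHS = {m: i for i, m in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}
STAMP = re.compile(
    r"(\d{1,2})-([A-Z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{2})")


def parse_stamp(match):
    """Convert a STAMP match (dd-MMM-yyyy hh:mm:ss.cc) to a datetime."""
    day, mon, year, hh, mm, ss, cc = match.groups()
    return datetime(int(year), MONTHS[mon], int(day),
                    int(hh), int(mm), int(ss), int(cc) * 10000)


def lines_near(lines, when, window_seconds=60):
    """Yield lines whose first timestamp is within the window of `when`."""
    target = parse_stamp(STAMP.search(when))
    window = timedelta(seconds=window_seconds)
    for line in lines:
        found = STAMP.search(line)
        if (found and found.group(2) in MONTHS
                and abs(parse_stamp(found) - target) <= window):
            yield line


# Hypothetical audit lines; only the timestamps come from this topic.
sample = [
    "22-MAY-1997 05:25:11.65 hypothetical cache flush entry",
    "22-MAY-1997 06:41:53.28 hypothetical unrelated entry",
]
for hit in lines_near(sample, "22-MAY-1997 05:25:16.25"):
    print(hit)
```

In practice the lines would come from a copy of the log file read with
open(...), and the window can be widened if nothing turns up.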

Also, what version of HSM is the customer running?
Regards,
/Paul

341.5. by CX3PST::WSC217::SWANK (David) Fri May 30 1997 12:52 (11 lines)

Paul,

The customer has deleted the old HSM$LOG:*.LOG files and the shelf command
is now working.  The customer is going to disable the flush on the RW500
devices that had a flush interval.  If the problem recurs they will send the
log files HSM$LOG:HSM$SHP_AUDIT.LOG & HSM$LOG:HSM$SHP_ERROR.LOG that correspond
to the time frame of the incident, as well as the output of SMU LOCATE/FULL
and DIR/FULL for the file in question.

Thanks for your help so far,
David