
Conference cookie::hsm

Title:File Shelving
Moderator:COOKIE::HOLSINGER
Created:Mon Mar 15 1993
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:346
Total number of notes:1204

317.0. "HSM Shelving Requests cancelled, et al." by WOTVAX::SMITHD () Wed Feb 12 1997 10:53

We're continuing to have some problems with HSM. The H/W configuration is dual
VAX 8400's connected to dual TL820s via HSJ controllers. We were experiencing 
"volume not S/W enabled" and "media offline" errors associated with a bug in 
SLS version 2.8. I wasn't sure whether the HSM problems were a side effect of
this or not.

Recently we installed SLS version 2.8A and the "volume not S/W enabled" and
"media offline" errors stopped, but HSM continues to act up.

Here's the way HSM is set up:

Version 2.0A of HSM

Version 6.2 of VMS

$ smu sho cache
Cache device _$4$JBA286: is enabled, Cache Flush is held until after
    19-SEP-1996 07:05:11.22, Backup is performed at shelving time
    Cache files are not held on delte of online file
    Blocksize:         0
    Highwater mark:    100%
    Flush interval:    <none>

Cache device _$4$JBB286: is enabled, Cache Flush is held until after
    19-SEP-1996 07:31:33.32, Backup is performed at shelving time
    Cache files are not held on delte of online file
    Blocksize:         0
    Highwater mark:    100%
    Flush interval:    <none>

Cache device _$4$JBA287: is enabled, Cache Flush is held until after
    19-SEP-1996 07:04:44.84, Backup is performed at shelving time
    Cache files are not held on delte of online file
    Blocksize:         0
    Highwater mark:    100%
    Flush interval:    <none>

Cache device _$4$JBB287: is enabled, Cache Flush is held until after
    17-SEP-1996 07:31:37.88, Backup is performed at shelving time
    Cache files are not held on delte of online file
    Blocksize:         0
    Highwater mark:    100%
    Flush interval:    <none>

$ smu sho archive

 HSM$ARCHIVE01 has not been used
   Identifier:     1
   Media type:     TK87K
   Density:        <none>
   Label:          HS0001
   Position:       0
   Device refs:    0
   Shelf refs:     2
   Current pool:   <none>
   Enabled pools:  <none>

 HSM$ARCHIVE02 has been used
   Identifier:     2
   Media type:     TK87K
   Density:        <none>
   Label:          BEF001
   Position:       9
   Device refs:    1
   Shelf refs:     2
   Current pool:   XHF_ARCH_A
   Enabled pools:  XHF_ARCH_A

 HSM$ARCHIVE03 has been used
   Identifier:     3
   Media type:     TK87K
   Density:        <none>
   Label:          BEF709
   Position:       9
   Device refs:    1
   Shelf refs:     2
   Current pool:   XHF_ARCH_B
   Enabled pools:  XHF_ARCH_B

 HSM$ARCHIVE04 has been used
   Identifier:     4
   Media type:     TK87K
   Density:        <none>
   Label:          BDN300
   Position:       9
   Device refs:    1
   Shelf refs:     2
   Current pool:   XHF_ARCH_A1
   Enabled pools:  XHF_ARCH_A1

 HSM$ARCHIVE05 has been used
   Identifier:     5
   Media type:     TK87K
   Density:        <none>
   Label:          BDN400
   Position:       9
   Device refs:    1
   Shelf refs:     2
   Current pool:   XHF_ARCH_B1
   Enabled pools:  XHF_ARCH_B1

$ SMU SHO DEVICE

HSM drive HSM$DEFAULT_DEVICE is enabled.
  Shared access:       < shelve, unshelve >
  MDMS status:         Not configured
  Enabled archives:    <none>

HSM drive _$3$MUA450: is enabled.
  Dedicated access:    < shelve, unshelve >
  MDMS status:         Configured
  Enabled archives:    HSM$ARCHIVE02   id: 2
		       HSM$ARCHIVE04   id: 4

HSM drive _$3$MUA650: is enabled.
  Dedicated access:    < shelve, unshelve >
  MDMS status:         Configured
  Enabled archives:    HSM$ARCHIVE03   id: 3
		       HSM$ARCHIVE05   id: 5

HSM drive _$3$MUA530: is enabled.
  Shared access:       < shelve, unshelve >
  MDMS status:         Configured
  Enabled archives:    HSM$ARCHIVE03   id: 3
		       HSM$ARCHIVE05   id: 5

HSM drive _$3$MUA330: is enabled.
  Shared access:       < shelve, unshelve >
  MDMS status:         Configured
  Enabled archives:    HSM$ARCHIVE02   id: 2
		       HSM$ARCHIVE04   id: 4

$ smu show facility

Polycenter HSM is enabled for Shelving and Unshelving
  Facility history:
    Created:             2-MAY-1996 17:05:10.04
    Revised:            17-SEP-1996 08:16:06.12
  Designated servers:   HOBBES
                        CALVIN
  Event logging:        Audit
                        Error
                        Exception
  HSM mode:             Plus
  Remaining license:    999 Gigabytes

$ smu show policy

Policy HSM$DEFAULT_OCCUPANCY is enabled for shelving

Policy HSM$DEFAULT_POLICY is enabled for shelving

Policy HSM$DEFAULT_QUOTA is enabled for shelving

$ smu show shelf

Shelf HSM$DEFAULT_SHELF is enabled for Shelving and Unshelving
  Shelf history:
    Created:             2-MAY-1996 17:05:09.36
    Revised:             3-JUL-1996 16:55:46.64
  Backup Verification:   Off
  Archive Classes:
    Archive list:        HSM$ARCHIVE01  id: 1
    Restore list:        HSM$ARCHIVE01  id: 1

Shelf XHF_SHELF_23 is enabled for Shelving and Unshelving
  Shelf history:
    Created:             11-SEP-1996 13:35:02.10
    Revised:             11-SEP-1996 14:11:13.11
  Backup Verification:   Off
  Archive Classes:      
    Archive list:        HSM$ARCHIVE04  id: 4
                         HSM$ARCHIVE05  id: 5
    Restore list:        HSM$ARCHIVE04  id: 4
                         HSM$ARCHIVE02  id: 2
			 HSM$ARCHIVE05  id: 5
			 HSM$ARCHIVE03  id: 3

Shelf XHF_SHELF_32 is enabled for Shelving and Unshelving
  Shelf history:
    Created:             11-SEP-1996 13:35:02.10
    Revised:             11-SEP-1996 14:11:13.11
  Backup Verification:   Off
  Archive Classes:      
    Archive list:        HSM$ARCHIVE04  id: 4
                         HSM$ARCHIVE05  id: 5
    Restore list:        HSM$ARCHIVE05  id: 5
                         HSM$ARCHIVE03  id: 3
                         HSM$ARCHIVE04  id: 4
                         HSM$ARCHIVE02  id: 2

Our application is a data archive for several types of historical
files. Some file types are extremely large (500K-800K blocks). We keep
1-4 days of data on online storage (depending on data types and file
sizes). This typically keeps our 2 Gbyte disks at 60% full or so. Shelved 
data whose file headers still reside on the disk are retained for 90 days.
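For scale, here is a rough sketch of what those block counts mean in bytes. It uses the standard 512-byte VMS disk block. It also makes two assumptions that the post does not state: "K" means 1,000 blocks, and a "2 Gbyte" disk holds roughly 2,000,000,000 bytes.

```python
# Rough size arithmetic for the file sizes quoted above.
# 512 bytes per VMS disk block; "K" = 1,000 blocks and a 2 GB disk of
# 2,000,000,000 bytes are assumptions made for this sketch.
BLOCK_BYTES = 512
DISK_BYTES = 2_000_000_000


def blocks_to_bytes(blocks: int) -> int:
    """Convert a VMS block count to bytes."""
    return blocks * BLOCK_BYTES


def disk_fraction(blocks: int, disk_bytes: int = DISK_BYTES) -> float:
    """Fraction of a disk occupied by a file of the given block count."""
    return blocks_to_bytes(blocks) / disk_bytes


if __name__ == "__main__":
    for blocks in (500_000, 800_000):
        mb = blocks_to_bytes(blocks) / 1_000_000
        print(f"{blocks:>9,} blocks = {mb:6.1f} MB = "
              f"{disk_fraction(blocks):5.1%} of a 2 GB disk")
```

Under these assumptions, a 500K-block file is about 256 MB (12.8% of a disk) and an 800K-block file is about 410 MB (20.5%). A single one of the largest files therefore fills about a fifth of a disk.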

The archive classes and shelves are set up so that at least one copy of
all data is on each TL820. Each TL820 has two tape drives enabled for HSM
use. There are 4 active archive classes because operator errors wiped some
HSM volumes in the original pair of classes (2 and 3). The only recovery
method we could come up with was to create 2 new classes for shelving and
retain the old classes for 90 days, until all their data has aged out.
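The "one copy on each TL820" claim can be checked mechanically against the SMU output above. The sketch below is a hypothetical Python check, not an HSM tool. The drive and shelf data are transcribed from the SMU SHOW DEVICE and SMU SHOW SHELF listings. The sketch makes two assumptions. First, it assumes the usual HSM behavior that shelving writes one copy to each archive class in a shelf's archive list. Second, it assumes that the two drives sharing classes 2/4 sit in one TL820 and the two sharing classes 3/5 sit in the other; the post does not say which drive is in which library. The library names TL820_A and TL820_B are made up.

```python
# Hypothetical check: does each shelf's archive list put a copy in each TL820?
# Drive/archive data transcribed from SMU SHOW DEVICE; shelf data from
# SMU SHOW SHELF. HSM$DEFAULT_DEVICE has no enabled archives and is omitted.
DRIVE_ARCHIVES = {
    "_$3$MUA450:": {2, 4},
    "_$3$MUA330:": {2, 4},
    "_$3$MUA650:": {3, 5},
    "_$3$MUA530:": {3, 5},
}

# ASSUMPTION: drive-to-library assignment; library names are made up.
DRIVE_LIBRARY = {
    "_$3$MUA450:": "TL820_A",
    "_$3$MUA330:": "TL820_A",
    "_$3$MUA650:": "TL820_B",
    "_$3$MUA530:": "TL820_B",
}

SHELF_ARCHIVE_LIST = {
    "XHF_SHELF_23": [4, 5],
    "XHF_SHELF_32": [4, 5],
}


def library_of_archive(archive_id: int) -> str:
    """The single library whose drives serve this archive class."""
    libs = {DRIVE_LIBRARY[d] for d, ids in DRIVE_ARCHIVES.items()
            if archive_id in ids}
    if len(libs) != 1:
        raise ValueError(f"archive {archive_id} maps to {sorted(libs)}")
    return libs.pop()


def libraries_covered(shelf: str) -> set:
    """Libraries that receive a copy when files on this shelf are shelved."""
    return {library_of_archive(a) for a in SHELF_ARCHIVE_LIST[shelf]}


if __name__ == "__main__":
    all_libraries = set(DRIVE_LIBRARY.values())
    for shelf in SHELF_ARCHIVE_LIST:
        missing = all_libraries - libraries_covered(shelf)
        print(shelf, "OK" if not missing else f"missing {sorted(missing)}")
```

The check raises an error if an archive class is enabled on drives in both libraries. In that case, the copy could land in either TL820, and coverage cannot be assumed.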

The problems listed below are roughly in priority order:

1) Shelve requests are often cancelled for no reason I can explain. Our
   computer OPS people try to keep things tidy by re-shelving files
   when the users are done with them, but shelving requests, especially for
   the largest files, often get cancelled by the system. The entry in
   HSM$SHP_ERROR.LOG always looks the same:

** Request Disposition:

   Non-fatal shelf handler error
   Fatal request error
   Operation was rolled back

** Exception information:

   Exception				Module				Line
   (SHP_ONLINE_READ_ERROR)		SHP_FILE			2736

   Exception				Module				Line
   (SHP_ONLINE_READ_ERROR)		SHP_FILE			2677

       Platform Status  Message Text
            00000800    %SYSTEM-W-ACCONFLICT, file access conflict

It looks to me like this might be caused by a second user trying to read
a file while it's being shelved, but I have verified that this isn't happening.
Users MAY be doing DIR or even DIR/FULL on the directories containing the files,
though. One of the two TL820s seems to accumulate errors on its two tape drives
much more quickly than the other. The counts aren't unheard of for tape units
(50-60 on one TL820 and fewer than 10 on the other). Could these errors be
interfering with shelving? I've tried to correlate the system error log
entries with entries in HSM$SHP_ERROR.LOG, without success. Any idea what's
causing this and what could be done about it?

2) Can you describe the algorithm by which cartridges are unloaded from tape
   drives and/or tape drives are released for other use once a request
   completes? We have occurrences where a request completes, the cartridge is
   left in the drive, and when a request needing a different cartridge is
   generated it stalls, sometimes for hours and sometimes indefinitely. OPCOM
   messages are generated indicating that the request for volume "blah" is
   stalled. Shouldn't the tape get removed? Is there any workaround our
   operators could perform when we get into this situation?

3) Similar to #2: can you describe the algorithm for sending a request to the
   second (or subsequent) archive class in the restore list for a shelf? As in
   #2 above, we might have a case where both drives in one TL820 are busy, but
   there's a free one in the other. The request seems to hang rather than get
   passed on to other archive classes. This is the reason for the dual shelf
   definitions above. We've created two separate shelves and split the disk
   volumes used for this activity across them. By manually defining the
   restore lists in a different order on the two shelves, we get some
   splitting of the unshelving activity.

4) Given that some of these files are SOOO big, is there some way we can point
   HSM at a different working area for the HSM$xxxxxxxx.RST files created during
   an unshelve operation?
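For the error-log correlation attempt described in item 1, a simple time-window match may help once the timestamps have been pulled out of the two logs. The sources could be, for example, an ANALYZE/ERROR_LOG report and HSM$SHP_ERROR.LOG. This Python sketch is hypothetical. It assumes the timestamps have already been extracted as absolute VMS time strings, like the ones in the SMU output above. The 5-minute window is an arbitrary starting point, and the sample timestamps are made up.

```python
from datetime import datetime, timedelta

# Absolute VMS time, e.g. "19-SEP-1996 07:05:11.22" (month match is
# case-insensitive in strptime).
VMS_TIME = "%d-%b-%Y %H:%M:%S.%f"


def parse_vms_time(text: str) -> datetime:
    """Parse an absolute VMS time string into a datetime."""
    return datetime.strptime(text.strip(), VMS_TIME)


def correlate(hsm_errors, tape_errors, window=timedelta(minutes=5)):
    """Map each HSM error time to the tape error times within +/- window."""
    tape = sorted(parse_vms_time(t) for t in tape_errors)
    matches = {}
    for h in hsm_errors:
        ht = parse_vms_time(h)
        matches[h] = [t for t in tape if abs(t - ht) <= window]
    return matches


if __name__ == "__main__":
    # Made-up sample timestamps, for illustration only.
    hsm_times = ["12-FEB-1997 10:15:02.10"]
    tape_times = ["12-FEB-1997 10:13:40.00", "12-FEB-1997 14:00:00.00"]
    for when, hits in correlate(hsm_times, tape_times).items():
        print(when, "->", len(hits), "tape error(s) within the window")
```

If most HSM cancellations have no tape error nearby, even with a generous window, that would point away from the drive errors as the cause.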

Thanks for any help or advice you can provide.

-Doug Smith

317.1. "Grade up to HSM 2.1" by VNABRW::KARTNER_M (HOUSTON, we have a problem) Thu Feb 13 1997 03:35

    Hi!
    
    I would recommend upgrading to HSM V2.1, which is already SSB:
    
    COOKIE::AIM$PUBLIC:[HSM.KITS.V21]
    
    This is a bugfix release. There were several problems with HSM 2.0A,
    including the response to OPCOM messages, ...
    
    I hope this helps
    								Michael

317.2. "VAX or ALPHA" by WOTVAX::SMITHD () Fri Feb 14 1997 11:36

>    I would recomend to grade up to HSM V2.1 witch is SSB allready
>    
>    COOKIE::AIM$PUBLIC:[HSM.KITS.V21]
    
Is this the kit? The filenames suggest that this is a VAX-architecture
release rather than an Alpha one.

HSM021.A-DCX_VAXEXE;1
                         929/932      27-JAN-1997 17:17:10.00  (R,RWED,,)
HSM021.B-DCX_VAXEXE;1
                        6135/6136     27-JAN-1997 17:17:11.00  (R,RWED,,)
HSM021.C-DCX_VAXEXE;1
                        9760/9760     27-JAN-1997 17:17:15.00  (R,RWED,,)
HSM021.D-DCX_VAXEXE;1
                        9888/9888     27-JAN-1997 17:17:21.00  (R,RWED,,)

Thanks
Doug

317.3. by COMEUP::SIMMONDS (lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)) Mon Feb 17 1997 00:46

    Re: .2  (VAX kits?)
    
.0> We're continuing to have some problems with HSM. The H/W configuration is dual
.0> VAX 8400's connected to dual TL820s via HSJ controllers. We were experiencing 
    ~~~~~~~~
    Who's confused? :):)
    
    Have you tried decompressing the .DCX_VAXEXE files (on a VAX)
    and installing the resulting savesets on your Alpha?
    (The HSM kits have typically been dual-architecture.)
    
    John.

317.4. "SLS-F-MRD_START_FAIL" by WOTVAX::SMITHD () Tue Feb 25 1997 12:20

|    Have you tried decompressing the .DCX_VAXEXE files? (on a VAX..)


Yep, I installed the 2.1 release on top of SLS 2.8A. This seems to improve
(possibly fix?) the cancel problem, but now SLS is reporting:

	SLS-F-MRD_START_FAIL - media robot driver startup failure

We are attempting to work this ongoing fun with Ted Saul in the CSC.  Any help
would be appreciated.  

Thanks, Doug

317.5. "MRD help in SLS conference" by COOKIE::HOLSINGER (HSM Engineering, DTN 522-2843) Mon Mar 31 1997 14:21

re:                   <<< Note 317.4 by WOTVAX::SMITHD >>>

>	SLS-F-MRD_START_FAIL - media robot driver startup failure

Hello Doug, 

This message indicates a problem (usually a configuration problem) between the
robot device and the lowest-level software (above the SCSI port driver) that
SLS/MDMS uses to manage load/unload operations. HSM is at the top of the
food chain here, so troubleshooting from the HSM side is probably not the most
efficient approach.

If the problem persists, I would recommend you re-post the TL8xx/HSJ config 
details in the COOKIE::SLS conference (KP7 if needed). There are several 
entries in that conference already which discuss this exact error message. 
If this problem has been fixed, please disregard this reply (we can use note 
#329 to pursue the dual TL820 problem). 

Regards,
/Paul