
Conference cookie::hsm

Title:File Shelving
Moderator:COOKIE::HOLSINGER
Created:Mon Mar 15 1993
Last Modified:Thu Jun 05 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:346
Total number of notes:1204

319.0. "All User I/O to disk stalls for 70 seconds" by COMEUP::SIMMONDS (lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)) Sun Feb 16 1997 20:54

    Before sending this into IPMT-land..
    
    On a shared volume (i.e. general user disk) where HSM Unshelving/Shelving
    activity is occurring in one directory, _ALL_ user I/O to the volume
    occasionally gets stalled for 70..80 *Seconds* ?  Surely this is not
    normal ?
    
    [Environment:
    	This is the same node reported in note 318.0 --
    
    	AlphaServer 1000A, 512Mb,  SWXCR -> 5 JBOD drives 4Gb each
        OpenVMS Alpha Version V6.2-1H3
        HSM V2.0A (MUP)]
    
    Thanks!
    John.

319.1. "Before you IPMT, please give us (much) more information" by STOWKS::SLUIS (Hans van Sluis -- Storage Engineering Support Europe- DTN 889 9526) Mon Feb 17 1997 05:04 (21 lines)

John,

This *CAN* be normal if the disk becomes full and HSM has to move files to the
cache or (worse) to tape.

Therefore, can you do a '$SHOW DEVICE D' of your user disks?
Furthermore, can you provide the output of the following:
SMU SHOW FACILITY
SMU SHOW VOLUME
SMU SHOW POLICY
SMU SHOW SHELF
SMU SHOW CACHE
SMU SHOW ARCHIVE
SMU SHOW DEVICE

You also should have the log-files in HSM$LOG ready for inspection.

Please post the output over here, so that we have a better impression of
what we're talking about.

	Hans van Sluis,  StorageWorks Engineering Support Europe

319.2. "details" by COMEUP::SIMMONDS (lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)) Mon Feb 17 1997 20:01 (125 lines)

    Hans, here's the stuff you asked for.. the disk volume in question is
    DRB0: .. the HSM cache is on DRB1: .. at no time do _any disks ever_
    exceed 75% used space!!   .. the Customer is typically running a DCL
    procedure which runs a program which causes a file fault on a
    particular file (using sys$crmpsc()) in a DRB0: directory and then
    performs an explicit  $ SHELVE <file> after the program has exited.
    
    John.
    
    ----------------------------------------------------------------------
    
$smu show facility

Polycenter HSM is enabled for Shelving and Unshelving
  Facility history:
    Created:		10-FEB-1997 12:22:25.42
    Revised:		14-FEB-1997 20:34:59.18
  Designated servers:	Any cluster member
  Current server:	ARCHER
  Catalog server:	Enabled
  Event logging:	Audit
			Error
			Exception
  HSM mode:		Basic
  Remaining license:	20 gigabytes

$smu show volume

Volume HSM$DEFAULT_VOLUME on Shelf HSM$DEFAULT_SHELF, Shelving is enabled, 
    Unshelving is enabled, Highwater mark detection is disabled, Occupancy full 
    detection is disabled, Disk quota exceeded detection is disabled

Volume _ARCHER$DRB0: on Shelf CRISP$ARCHIVE, Shelving is enabled, Unshelving is 
    enabled, Highwater mark detection is disabled, Occupancy full detection 
    is disabled, Disk quota exceeded detection is disabled

$smu show policy

Policy HSM$DEFAULT_OCCUPANCY is enabled for shelving

Policy HSM$DEFAULT_POLICY is enabled for shelving

Policy HSM$DEFAULT_QUOTA is enabled for shelving

$smu show shelf

Shelf CRISP$ARCHIVE is enabled for Shelving and Unshelving
  Catalog File:		DISK$DATA:[HSM$SERVER.CATALOG]HSM$CATALOG.SYS
  Shelf History:
    Created:		10-FEB-1997 12:22:25.22
    Revised:		10-FEB-1997 12:22:25.22
  Backup Verification:	Off
  Save Time:		<none>
  Updates Saved:	All
  Archive Classes:
    Archive list:	HSM$ARCHIVE01	id: 1
    Restore list:	HSM$ARCHIVE01	id: 1

Shelf HSM$DEFAULT_SHELF is enabled for Shelving and Unshelving
  Catalog File:		DISK$DATA:[HSM$SERVER.CATALOG]HSM$CATALOG.SYS
  Shelf History:
    Created:		10-FEB-1997 12:22:25.41
    Revised:		10-FEB-1997 12:22:25.41
  Backup Verification:	Off
  Save Time:		<none>
  Updates Saved:	All
  Archive Classes:
    Archive list:	<none>
    Restore list:	<none>

$smu show cache

Cache device _ARCHER$DRB1: is enabled, Cache flush is held until after
    15-JAN-1997 15:40:10.35, Backup is performed at flush intervals, 
    Cached files are held on delete of online file
    Block size:		500000
    Highwater mark:	80%
    Flush interval:	0 01:00:00.00

$smu show archive

  HSM$ARCHIVE01 has been used
    Identifier:	    1
    Media type:	    CompacTape III
    Label:	    HSM002
    Position:	    96
    Device refs:    1
    Shelf refs:	    2

  HSM$ARCHIVE02 has been used
    Identifier:	    2
    Media type:	    CompacTape III
    Label:	    HSM002
    Position:	    99
    Device refs:    1
    Shelf refs:	    0

$smu show device

HSM drive HSM$DEFAULT_DEVICE is enabled.
  Shared access:	< shelve, unshelve >
  Drive status:		Not configured
  Media type:		Unknown Type
  Robot name:		<none>
  Enabled archives:	<none>

HSM drive _ARCHER$MKA200: is enabled.
  Shared access:	< shelve, unshelve >
  Drive status:		Configured
  Media type:		CompacTape III
  Robot name:		<none>
  Enabled archives:	HSM$ARCHIVE01	id: 1
			HSM$ARCHIVE02	id: 2

$show device d

Device                  Device           Error    Volume         Free  Trans Mnt
 Name                   Status           Count     Label        Blocks Count Cnt
ARCHER$DKB500:          Online wrtlck        0
ARCHER$DRA0:            Mounted              0  OPENVMS        4978287   405   1
ARCHER$DRA1:            Mounted              0  TRAN           6171021     2   1
ARCHER$DRA2:            Mounted              0  SECS           7325100     1   1
ARCHER$DRB0:            Mounted              0  DATA           2562417    11   1
ARCHER$DRB1:            Mounted              0  DEV            5564898     9   1
ARCHER$DVA0:            Online               0
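
As a rough check on the disk-full theory raised in .1, the Free Blocks figures above can be converted to bytes and compared with the 495000-block file described in .3. This is only a sketch: it assumes the standard 512-byte OpenVMS disk block, hard-codes the two figures from the listing, and the helper names are made up for illustration.

```python
# Sketch only: figures copied from the SHOW DEVICE listing above.
BLOCK_BYTES = 512  # assumed: standard OpenVMS disk block size

free_blocks = {"DRB0": 2562417, "DRB1": 5564898}  # Free Blocks column
FILE_BLOCKS = 495000  # size of the faulted file, as reported in .3


def blocks_to_mb(blocks):
    """Convert disk blocks to megabytes (10**6 bytes)."""
    return blocks * BLOCK_BYTES / 1e6


def files_that_fit(free, file_blocks=FILE_BLOCKS):
    """How many whole files of file_blocks fit in the given free space."""
    return free // file_blocks


for dev, free in free_blocks.items():
    print(f"{dev}: {blocks_to_mb(free):.0f} MB free, "
          f"room for {files_that_fit(free)} files of {FILE_BLOCKS} blocks")
```

On these numbers the shelving volume (DRB0:) has free space for several copies of the file, which fits John's statement that the disks never fill up.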

319.3. "IPMT launched" by COMEUP::SIMMONDS (lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)) Mon Feb 24 1997 19:54 (36 lines)

    Ok, IPMT on its way.
    
    The observed stalls are totally unacceptable for Production or even
    Development deployment of HSM.
    
    To summarize:
    
      [ AlphaServer 1000A 5/300 ]=========== [ KZPSA ]
    					       |   |
                          "OPENVMS"    DRA0 ---|   |--- DRB0  "DATA" <----+
                                               |   |                      :
                          "TRAN"       DRA1 ---|   |--- DRB1  "DEV" <-+   :
                                               |   +                  :   :
                          "SECS"       DRA2 ---|                      :   :
                                               +                      :   :
                                                                      :   :
                                                     HSM Cache dev. --+   :
                                                                          :
                                                     HSM Shelv. Vol ------+
    
    1. User runs a batch procedure which first runs a user image which
       file faults a single file on DRB0: (via sys$crmpsc()); file size
       is 495000 blocks.
    
    2. After doing some computations on the file contents, the image exits.
    
    3. The DCL batch procedure next explicitly SHELVEs the file that was
       just used, selects a new (shelved) file to use and loops to (1.)
    
    These are the _only_ user-initiated HSM operations occurring on DRB0:
    
    Now, if someone happens to reference some file on DRB0: (_not_ the files
    touched by the batch procedure!) then, *sometimes* all I/O on that disk
    is totally stalled for about 160..170 seconds.
    
    John.

319.4. "-- this is 319.0 --" by COMEUP::SIMMONDS (lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)) Mon Feb 24 1997 19:57 (16 lines)

    Before sending this into IPMT-land..
    
    On a shared volume (i.e. general user disk) where HSM Unshelving/Shelving
    activity is occurring in one directory, _ALL_ user I/O to the volume
    occasionally gets stalled for 160..170 *Seconds* ?  Surely this is not
    normal ?
    
    [Environment:
    	This is the same node reported in note 318.0 --
    
    	AlphaServer 1000A, 512Mb,  SWXCR -> 5 JBOD drives 4Gb each
        OpenVMS Alpha Version V6.2-1H3
        HSM V2.0A (MUP)]
    
    Thanks!
    John.

319.5. "" by COMEUP::SIMMONDS (lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)) Mon Mar 10 1997 02:04 (18 lines)

    Ooops.. (*blush*)
    
    The .3 configuration is wrong; here is the correct disk config.:
    
    [KA1B05]====PCI==============[KZPSC #0]==========[KZPSC #1]========...
    				    |			|
    				    |			|
    				    |-DRA0		|-DRB0
    				    |-DRA1		|-DRB1
    				    |-DRA2
    (All DR disks are single JBOD, i.e. No RAID)
    
    
    Also, (from a proposed explanation in an IPMT Update mail message)
    the behaviour may be due to file highwater marking enabled for the HSM
    shelving volume.. testing in progress.. will reply with more news.
    
    John.

319.6. "probably FHWM.." by COMEUP::SIMMONDS (lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)) Mon Mar 10 1997 03:11 (7 lines)

    Ok, the stalls have gone since we turned file highwater marking OFF !
    
    Thanks for your forbearance.. I'll check with the Customer whether they
    really _require_ File HWM on this volume in their proposed Production
    environment before I make any further comment..
    
    John.
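
For a sense of scale, the stall durations quoted in this topic can be set against the 495000-block file size. The sketch below assumes 512-byte blocks and, as a pure assumption that nothing in this thread confirms, that a stall lasts about as long as it takes to erase or write every block of the file once. The function name is illustrative.

```python
BLOCK_BYTES = 512     # assumed: standard OpenVMS disk block size
FILE_BLOCKS = 495000  # file size from .3


def implied_rate_mb_s(stall_seconds, blocks=FILE_BLOCKS):
    """MB/s needed to touch every block of the file once within the stall.

    Assumption (not confirmed in the thread): the stall equals one full
    pass over the file's allocation.
    """
    return blocks * BLOCK_BYTES / 1e6 / stall_seconds


for seconds in (70, 80, 160, 170):  # stall durations quoted in .0 and .3
    print(f"{seconds:3d} s -> {implied_rate_mb_s(seconds):.2f} MB/s")
```

The file is about 253 MB, so under that assumption the quoted stalls correspond to roughly 1.5 to 3.6 MB/s of sustained disk traffic.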