
Conference cookie::hsm

Title:	File Shelving

Moderator:	COOKIE::HOLSINGER

Created:	Mon Mar 15 1993
Last Modified:	Thu Jun 05 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	346
Total number of notes:	1204

319.0. "All User I/O to disk stalls for 70 seconds" by COMEUP::SIMMONDS (lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)) Sun Feb 16 1997 20:54

    Before sending this into IPMT-land..
    
    On a shared volume (i.e. a general user disk) where HSM shelving/unshelving
    activity is occurring in one directory, _ALL_ user I/O to the volume
    occasionally stalls for 70..80 *seconds*. Surely this is not
    normal?
    
    [Environment:
        This is the same node reported in note 318.0:

        AlphaServer 1000A, 512 MB, SWXCR -> 5 JBOD drives, 4 GB each
        OpenVMS Alpha Version V6.2-1H3
        HSM V2.0A (MUP)]
    
    Thanks!
    John.

T.R	Title	User	Personal Name	Date	Lines
319.1	Before you IPMT, please give us (much) more information	STOWKS::SLUIS	Hans van Sluis -- Storage Engineering Support Europe- DTN 889 9526	`Mon Feb 17 1997 05:04`	21
	John,

	This CAN be normal if the disk becomes full and HSM has to move files to the cache or (worse) to tape. Therefore, can you do a '$SHOW DEVICE D' of your user disks? Furthermore, can you provide the output of the following:

	    SMU SHOW FACILITY
	    SMU SHOW VOLUME
	    SMU SHOW POLICY
	    SMU SHOW SHELF
	    SMU SHOW CACHE
	    SMU SHOW ARCHIVE
	    SMU SHOW DEVICE

	You should also have the log files in HSM$LOG ready for inspection. Please post the output here, so that we get a better impression of what we're talking about.

	Hans van Sluis, StorageWorks Engineering Support Europe
319.2	details	COMEUP::SIMMONDS	lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)	`Mon Feb 17 1997 20:01`	125
	Hans, here's the stuff you asked for. The disk volume in question is DRB0:, and the HSM cache is on DRB1:. At no time do _any disks ever_ exceed 75% used space! The Customer is typically running a DCL procedure which runs a program that causes a file fault on a particular file (using sys$crmpsc()) in a DRB0: directory, and then performs an explicit $ SHELVE <file> after the program has exited.

	John.
	-------------------------------------------------------------------

	$smu show facility
	Polycenter HSM is enabled for Shelving and Unshelving
	  Facility history:
	    Created: 10-FEB-1997 12:22:25.42
	    Revised: 14-FEB-1997 20:34:59.18
	  Designated servers: Any cluster member
	  Current server: ARCHER
	  Catalog server: Enabled
	  Event logging: Audit Error Exception
	  HSM mode: Basic
	  Remaining license: 20 gigabytes

	$smu show volume
	Volume HSM$DEFAULT_VOLUME on Shelf HSM$DEFAULT_SHELF, Shelving is enabled,
	  Unshelving is enabled, Highwater mark detection is disabled,
	  Occupancy full detection is disabled, Disk quota exceeded detection is disabled
	Volume _ARCHER$DRB0: on Shelf CRISP$ARCHIVE, Shelving is enabled,
	  Unshelving is enabled, Highwater mark detection is disabled,
	  Occupancy full detection is disabled, Disk quota exceeded detection is disabled

	$smu show policy
	Policy HSM$DEFAULT_OCCUPANCY is enabled for shelving
	Policy HSM$DEFAULT_POLICY is enabled for shelving
	Policy HSM$DEFAULT_QUOTA is enabled for shelving

	$smu show shelf
	Shelf CRISP$ARCHIVE is enabled for Shelving and Unshelving
	  Catalog File: DISK$DATA:[HSM$SERVER.CATALOG]HSM$CATALOG.SYS
	  Shelf History:
	    Created: 10-FEB-1997 12:22:25.22
	    Revised: 10-FEB-1997 12:22:25.22
	  Backup Verification: Off
	  Save Time: <none>
	  Updates Saved: All
	  Archive Classes:
	    Archive list: HSM$ARCHIVE01 id: 1
	    Restore list: HSM$ARCHIVE01 id: 1
	Shelf HSM$DEFAULT_SHELF is enabled for Shelving and Unshelving
	  Catalog File: DISK$DATA:[HSM$SERVER.CATALOG]HSM$CATALOG.SYS
	  Shelf History:
	    Created: 10-FEB-1997 12:22:25.41
	    Revised: 10-FEB-1997 12:22:25.41
	  Backup Verification: Off
	  Save Time: <none>
	  Updates Saved: All
	  Archive Classes:
	    Archive list: <none>
	    Restore list: <none>

	$smu show cache
	Cache device _ARCHER$DRB1: is enabled,
	  Cache flush is held until after 15-JAN-1997 15:40:10.35,
	  Backup is performed at flush intervals,
	  Cached files are held on delete of online file
	  Block size: 500000
	  Highwater mark: 80%
	  Flush interval: 0 01:00:00.00

	$smu show archive
	HSM$ARCHIVE01 has been used
	  Identifier: 1
	  Media type: CompacTape III
	  Label: HSM002
	  Position: 96
	  Device refs: 1
	  Shelf refs: 2
	HSM$ARCHIVE02 has been used
	  Identifier: 2
	  Media type: CompacTape III
	  Label: HSM002
	  Position: 99
	  Device refs: 1
	  Shelf refs: 0

	$smu show device
	HSM drive HSM$DEFAULT_DEVICE is enabled.
	  Shared access: < shelve, unshelve >
	  Drive status: Not configured
	  Media type: Unknown Type
	  Robot name: <none>
	  Enabled archives: <none>
	HSM drive _ARCHER$MKA200: is enabled.
	  Shared access: < shelve, unshelve >
	  Drive status: Configured
	  Media type: CompacTape III
	  Robot name: <none>
	  Enabled archives: HSM$ARCHIVE01 id: 1
	                    HSM$ARCHIVE02 id: 2

	$show device d
	Device           Device           Error  Volume        Free  Trans Mnt
	 Name            Status           Count  Label       Blocks  Count Cnt
	ARCHER$DKB500:   Online wrtlck        0
	ARCHER$DRA0:     Mounted              0  OPENVMS    4978287    405   1
	ARCHER$DRA1:     Mounted              0  TRAN       6171021      2   1
	ARCHER$DRA2:     Mounted              0  SECS       7325100      1   1
	ARCHER$DRB0:     Mounted              0  DATA       2562417     11   1
	ARCHER$DRB1:     Mounted              0  DEV        5564898      9   1
	ARCHER$DVA0:     Online               0
319.3	IPMT launched	COMEUP::SIMMONDS	lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)	`Mon Feb 24 1997 19:54`	36
	Ok, IPMT on its way. The observed stalls are totally unacceptable for Production or even Development deployment of HSM.

	To summarize:

	    [ AlphaServer 1000A 5/300 ]===========
	                    [ KZPSA ]
	                      |   |
	    "OPENVMS" DRA0 ---|   |--- DRB0 "DATA" <----+
	                      |   |                     :
	    "TRAN"    DRA1 ---|   |--- DRB1 "DEV"  <-+  :
	                      |   +                  :  :
	    "SECS"    DRA2 ---|                      :  :
	                      +                      :  :
	    HSM Cache dev. --------------------------+  :
	                                                :
	    HSM Shelv. Vol -----------------------------+

	1. The user runs a batch procedure which first runs a user image that file-faults a single file on DRB0: (via sys$crmpsc()); the file size is 495000 blocks.
	2. After doing some computations on the file contents, the image exits.
	3. The DCL batch procedure next explicitly SHELVEs the file that was just used, selects a new (shelved) file to use, and loops back to step 1.

	These are the _only_ user-initiated HSM operations occurring on DRB0:.

	Now, if someone happens to reference some other file on DRB0: (_not_ the files touched by the batch procedure!), then sometimes all I/O on that disk is totally stalled for about 160..170 seconds.

	John.
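	For scale: assuming the standard 512-byte OpenVMS disk block, the 495000-block file described in step 1 is about a quarter of a gigabyte. The second line divides that size by the reported stall length only to give an order of magnitude; nothing in this thread shows that a whole file's worth of I/O actually happens during a stall.

```latex
\begin{aligned}
495000\ \text{blocks} \times 512\ \text{B/block} &= 253\,440\,000\ \text{B} \approx 253.4\ \text{MB} \approx 241.7\ \text{MiB},\\
\frac{253.4\ \text{MB}}{170\ \text{s}} \approx 1.49\ \text{MB/s}, &\qquad \frac{253.4\ \text{MB}}{160\ \text{s}} \approx 1.58\ \text{MB/s}.
\end{aligned}
```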
319.4	-- this is 319.0 --	COMEUP::SIMMONDS	lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)	`Mon Feb 24 1997 19:57`	16
	Before sending this into IPMT-land..

	On a shared volume (i.e. a general user disk) where HSM shelving/unshelving activity is occurring in one directory, _ALL_ user I/O to the volume occasionally stalls for 160..170 seconds. Surely this is not normal?

	[Environment: This is the same node reported in note 318.0:
	    AlphaServer 1000A, 512 MB, SWXCR -> 5 JBOD drives, 4 GB each
	    OpenVMS Alpha Version V6.2-1H3
	    HSM V2.0A (MUP)]

	Thanks!
	John.
319.5		COMEUP::SIMMONDS	lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)	`Mon Mar 10 1997 02:04`	18
	Ooops.. (blush) The .3 configuration is wrong; here is the correct disk configuration:

	    [KA1B05]====PCI==============[KZPSC #0]==========[KZPSC #1]========...
	                                     |                   |
	                                     |                   |
	                                     |-DRA0              |-DRB0
	                                     |-DRA1              |-DRB1
	                                     |-DRA2

	    (All DR disks are single JBOD, i.e. no RAID.)

	Also, a proposed explanation in an IPMT update mail message is that the behaviour may be due to file highwater marking being enabled on the HSM shelving volume. Testing is in progress; I will reply with more news.

	John.
319.6	probably FHWM..	COMEUP::SIMMONDS	lock (M); while (not *SOMETHING) { Wait(C,M); } unlock(M)	`Mon Mar 10 1997 03:11`	7
	Ok, the stalls have gone since we turned file highwater marking OFF! Thanks for your forbearance. I'll check with the Customer whether they really _require_ file HWM on this volume in their proposed Production environment before I make any further comment.

	John.
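	A note for readers: the "Highwater mark detection" and cache "Highwater mark" settings in the SMU output of 319.2 are HSM occupancy triggers, not the volume's file highwater marking (FHWM) attribute that was turned off here. On OpenVMS that attribute is set or cleared with SET VOLUME/[NO]HIGHWATER_MARKING, though the thread does not say which command was used. To illustrate why FHWM can add I/O, here is a toy model in Python. It is a sketch under stated assumptions, not the OpenVMS XQP's implementation: the class ToyFile, its fields, the 0-based block indexes, the rule that a write past the mark first erases the gap, and the zeros returned for reads past the mark are all hypothetical simplifications.

```python
# Toy model of file highwater marking (FHWM). Hypothetical: the names and
# the erase-the-gap rule are illustrative, not the OpenVMS XQP's implementation.

BLOCK_SIZE = 512                   # OpenVMS disk block size in bytes
STALE = b"\xaa" * BLOCK_SIZE       # stands in for a previous owner's leftover data
ZERO = bytes(BLOCK_SIZE)           # an erased block


class ToyFile:
    """A file of `allocated` blocks with an optional highwater mark.

    `hwm` counts the leading blocks that are safe to read because this
    file has written (or erased) them. Block indexes are 0-based here,
    unlike OpenVMS virtual block numbers, which start at 1.
    """

    def __init__(self, allocated: int, fhwm: bool) -> None:
        self.blocks = [STALE] * allocated  # freshly allocated, never cleared
        self.fhwm = fhwm
        self.hwm = 0
        self.blocks_erased = 0             # extra I/O caused by FHWM

    def write(self, index: int, data: bytes) -> None:
        if self.fhwm and index > self.hwm:
            # Non-sequential write past the mark: erase the gap first so the
            # stale blocks in [hwm, index) can never be read back.
            for b in range(self.hwm, index):
                self.blocks[b] = ZERO
            self.blocks_erased += index - self.hwm
        self.blocks[index] = data
        self.hwm = max(self.hwm, index + 1)

    def read(self, index: int) -> bytes:
        if self.fhwm and index >= self.hwm:
            return ZERO                    # toy policy: nothing past the mark is exposed
        return self.blocks[index]


if __name__ == "__main__":
    data = b"\x01" * BLOCK_SIZE

    f = ToyFile(allocated=8, fhwm=True)
    f.write(5, data)                             # jumps past the mark at 0
    print(f.blocks_erased, f.read(2) == ZERO)    # 5 True

    g = ToyFile(allocated=8, fhwm=False)
    g.write(5, data)
    print(g.blocks_erased, g.read(2) == STALE)   # 0 True (stale data visible)

    big = ToyFile(allocated=495000, fhwm=True)   # same size as the file in 319.3
    big.write(494999, data)
    print(big.blocks_erased)                     # 494999
```

	In this model, sequential writes (index 0, then 1, then 2, ...) never jump past the mark, so they cost no extra erases, while a single write near the end of a freshly allocated 495000-block file costs 494999 erases. The model does not explain why unrelated users' I/O on the same volume waited, and the thread does not establish whether the HSM unshelve or the sys$crmpsc() mapping in this workload produces such a write pattern; it only establishes that disabling FHWM made the stalls go away.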