Conference msgaxp::optical

Title: Optical Products
Moderator: TAPE::SENEKER
Created: Wed May 04 1988
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 841
Total number of notes: 3218

770.0. "Jukebox devices MVTIMEOUT" by HGOVC::CSCHAN () Tue Feb 04 1997 04:58

	This note has been cross-posted in the VMScluster notes conference.

	There is a 3-node CI/NI cluster (Alpha 4000, DEC7730 and DEC7750)
	with a directly connected RW534. After a node leaves and rejoins the
	cluster, many jukebox devices go into the mount verification timeout
	state. The customer had to reboot the whole cluster to regain access
	to those jukebox devices.

	According to the customer, one occurrence followed this sequence:
	- The customer shut down and rebooted the Alpha 4000 system.
	- Many disks went into mount verification.
	- Some time later, many mounted jukebox devices (13 of the 20
	  mounted devices) went into mount verification timeout on the
	  Alpha 4000.
	- The same thing happened on the DEC7750, which has the RW534
	  directly connected.
	- The DEC7730 system was different: all of its mounted jukebox
	  devices went into mount verification timeout. This is the busiest
	  system in the cluster.
	- The customer tried to dismount the devices in mount verification
	  timeout, and the process hung.
	- The customer had to reboot the whole cluster.


	VMS6.2, OSMS 3.3-1

	Sysgen parameters:
		MVTIMEOUT:      3600
		MSCP_LOAD:      1
		MSCP_SERVE_ALL: 1
	
	All the jukebox devices are served cluster-wide. There are 88
	cartridges (176 logical units) in the optical disk library, and
	usually only 20 are mounted.
 

	Questions:

	1. Is this normal behavior for a cluster with jukebox devices?

	2. How can we prevent this problem from happening?

	I plan to increase the MVTIMEOUT value, but I am not sure whether
	it will fix the problem or whether it has any side effects.


	
	Regards, C.S.

T.R  Title  User  Personal Name  Date  Lines
770.1  check MSCP_CREDITS/MSCP_BUFFERS  TAPE::SENEKER  Head banging causes brain mush  Tue Feb 04 1997 13:14  11
    The following verbiage is in the OSMS User's Manual:
    
    The value of the sysgen parameters, MSCP_CREDITS and MSCP_BUFFERS,
    should be increased on the node serving optical disks that are mounted
    cluster wide.  The MSCP_CREDITS value should be increased by one for
    each volume being served to the cluster.
    
    The MSCP_BUFFERS value should allow for a minimum of 16 pages of
    buffer area per MSCP_CREDITS value.
    
    Rob
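
    The sizing rule quoted above can be sketched as a small calculation.
    This is only an illustration: the function name is hypothetical, and
    the starting values in the example are placeholders, not the VMS
    defaults. The real current values should be read from SYSGEN on the
    serving node.

```python
def suggest_mscp_settings(current_credits, current_buffers,
                          served_volumes, pages_per_credit=16):
    """Hypothetical helper applying the rule quoted from the OSMS manual.

    - MSCP_CREDITS: add one per volume served to the cluster.
    - MSCP_BUFFERS: at least `pages_per_credit` pages per credit.
    """
    credits = current_credits + served_volumes
    min_buffers = credits * pages_per_credit
    # Never suggest lowering a buffer value that is already large enough.
    buffers = max(current_buffers, min_buffers)
    return credits, buffers


# Placeholder starting values (assumed), with the 20 volumes usually mounted.
print(suggest_mscp_settings(current_credits=8, current_buffers=128,
                            served_volumes=20))  # (28, 448)
```

    The result is a floor taken from the manual's wording, not a tuned
    value; other workload served by the same node may need more.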
770.2    HGOVC::CSCHAN    Wed Feb 05 1997 05:42  10
    Re .-1
    
    We found that both MSCP_CREDITS and MSCP_BUFFERS are set to their
    default values. The field will ask the customer to increase them
    to the correct values. However, these two parameters seem to improve
    I/O throughput only.
    
    Any additional ideas?
    
    Regards, C.S. 
770.3    TAPE::SENEKER  Head banging causes brain mush  Wed Feb 05 1997 13:49  12
    Heavily summarized: VMS (cluster code) uses an MSCP credit to manage
    and track I/O for each MSCP-served device.  If VMS has to share
    MSCP_CREDITS among multiple devices, it may detect an I/O completion
    timeout from some device.
    
    When this happens, it places all devices on that node into mount
    verification.
    
    I don't remember where the associated value of 16 for MSCP_BUFFERS
    came from.
    
    Rob
770.4  Can mount/cluster without DNS/DFS?  HGTIMA::CSCHAN    Sun Feb 09 1997 23:14  57
    
    
    
	I found an article in TIMA titled "[RW5XX] Document...Use of
	Clusters with Optical Software". I think it is a good reference,
	but I have some questions:


          5.1.D  VMS MOUNT and DISMOUNT operations for each optical volume
          will consume time equivalent to the traditional magnetic disk
          volume VMS MOUNT and DISMOUNT, plus a per-volume swap time of
          fifteen seconds plus the specified MINSWAP delay.

>	Based on this information, should I set the sysgen parameter:
>
>	MVTIMEOUT > (15 + MINSWAP) * number of mounted jukebox devices ?
>
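
	As a rough check of the inequality proposed above, the lower bound
	can be worked out for this site. The sketch assumes MINSWAP=5 (the
	minimum mentioned in section 5.2.C of the article; the customer's
	actual setting is not known) and uses a hypothetical function name.

```python
def proposed_mvtimeout_floor(minswap_seconds, mounted_volumes,
                             swap_seconds=15):
    """Lower bound from the proposed rule:
    MVTIMEOUT > (15 + MINSWAP) * number of mounted jukebox devices.
    """
    return (swap_seconds + minswap_seconds) * mounted_volumes


# Assumed MINSWAP of 5 seconds; 20 volumes are usually mounted.
print(proposed_mvtimeout_floor(5, 20))   # 400
# Same assumption, with all 176 logical units mounted.
print(proposed_mvtimeout_floor(5, 176))  # 3520
```

	Under these assumptions both figures fall below the current
	MVTIMEOUT of 3600 seconds.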

 
          5.2  Use of OSMS and OSDS without decDFS and decDNS

          5.2.B.  Platters must not be mounted /SERVED nor /CLUSTER nor
          /SYSTEM (which implies /SERVED).  Served platters trigger
          OpenVMS MSCP services which currently are not compatible with
          removable storage.

>	The cluster does not use decDFS or decDNS. Can the platters
>	be mounted /CLUSTER or /SYSTEM?
	

          5.2.C.  MOUNT and DISMOUNT operations may consume several hours
          on larger autochangers with as many as 288 volumes.  This time
          may be reduced by mounting only as many volumes as are
          absolutely required, by mounting volumes as /NOWRITE (readonly)
          volumes where possible, by using MCR JBUTIL SET PARAMETER
          /MINSWAP=5 to set the platter hold time to a minimum, and by
          keeping as few files open as is necessary.

          5.2.D.  Cluster transitions must be avoided.  Processors joining
          an active cluster do not respect the "special" nature of optical
          disks.  This limitation is particularly true when the node
          rejoining the cluster contains the interface adapter to the
          autochanger and drives.  Any cluster transition will cause all
          disks with outstanding I/O (optical as well as magnetic)
>         clusterwide to begin a mount-verification, which may not
>         complete before an operation timeout occurs which causes another
>         mount verification to begin, ad infinitum.


>	The customer often has to shut down and reboot one of the systems.
>	In that case, how can we prevent the "operation timeout" situation?




    
770.5  update of old information  TAPE::SENEKER  Head banging causes brain mush  Mon Feb 10 1997 13:16  60
    The information in the TIMA article is somewhat out of date. I will
    correct various parts here.
    
    
          5.1.D  VMS MOUNT and DISMOUNT operations for each optical volume
          will consume at least the specified MINSWAP time plus the amount
    	  of time required by the jukebox to physically move the optical
    	  disk to or from its storage slot.
    
    	  When the time required for the VMS MOUNT and DISMOUNT operations
    	  is greater than the specified MINSWAP time then these operations
    	  will consume the time required for the MOUNT or DISMOUNT
          operation plus the amount of time required by the jukebox to
    	  physically move the optical disk to or from its storage slot.
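
    The two cases in the corrected 5.1.D reduce to one expression: the
    larger of MINSWAP and the MOUNT/DISMOUNT operation time, plus the
    jukebox move time. A minimal sketch follows; the function name and
    the sample timings are hypothetical, not measured values.

```python
def estimated_operation_seconds(minswap, mount_op, jukebox_move):
    """Estimate per-volume MOUNT or DISMOUNT time in seconds.

    The operation costs at least MINSWAP; if the VMS operation itself
    takes longer than MINSWAP, that longer time applies instead. The
    jukebox's physical move time is added in both cases.
    """
    return max(minswap, mount_op) + jukebox_move


# Operation shorter than MINSWAP: MINSWAP dominates.
print(estimated_operation_seconds(minswap=5, mount_op=2, jukebox_move=10))   # 15
# Operation longer than MINSWAP: the operation time dominates.
print(estimated_operation_seconds(minswap=5, mount_op=12, jukebox_move=10))  # 22
```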
    
>
>	There is no reason (that I am aware of) to set MVTIMEOUT to any
>	value larger than the default value of 3600 seconds (1 hour).
>
 
          5.2  Use of OSMS and OSDS without decDFS and decDNS

          5.2.B.  Platters must not be mounted /SERVED nor /CLUSTER nor
          /SYSTEM (which implies /SERVED).  Served platters trigger
          OpenVMS MSCP services which currently are not compatible with
          removable storage.
>
>	This information is out of date.  The last two releases of OSMS
>	have allowed rewritable disks to be used in a cluster.  The /SYSTEM
>	and /CLUSTER qualifiers are allowed.
>

          5.2.C.  MOUNT and DISMOUNT operations may consume several hours
          on larger autochangers with as many as 288 volumes.  This time
          may be reduced by mounting only as many volumes as are
          absolutely required, by mounting volumes as /NOWRITE (readonly)
          volumes where possible, by using MCR JBUTIL SET PARAMETER
          /MINSWAP=5 to set the platter hold time to a minimum, and by
          keeping as few files open as is necessary.

>
>	This is still true.
>

          5.2.D.  Cluster transitions must be avoided.  Processors joining
          an active cluster do not respect the "special" nature of optical
          disks.  This limitation is particularly true when the node
          rejoining the cluster contains the interface adapter to the
          autochanger and drives.  Any cluster transition will cause all
          disks with outstanding I/O (optical as well as magnetic)
          clusterwide to begin a mount-verification, which may not
          complete before an operation timeout occurs which causes another
          mount verification to begin, ad infinitum.

>
>	This information is out of date.  The last two releases of OSMS
>	have allowed rewritable disks to be used in a cluster, and a
>	mechanism was added to allow mount verification to complete before
>	it causes another timeout.
>