
Conference noted::sns

Title:POLYCENTER System Watchdog for VMS OSF/1 ULTRIX HP-UX AIX SunOS
Notice:Wishes:406,FAQ:845,Kits-VMS:1000,UNIX:694 VMS ECO01 FT kit: 521
Moderator:AZUR::HUREZZ
Created:Fri May 15 1992
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1033
Total number of notes:4584

1003.0. "Clusterwide check for single process?" by UTRTSC::DORLAND (The Wizard of Odz2) Wed Feb 19 1997 11:42

    Hello,
    
    just a short question.
    Is it possible to do a clusterwide watch for a process?
    E.g. the QUEUE_MANAGER runs on only one node.
    If I add a check for this process on that node I will
    get notified (correctly) that the process is gone when
    it fails over to another node in the cluster.
    
    I tried to enter a 'watch' via the alias cluster node name.
    But this doesn't work.
    
    It seems that the consolidator 'translates' the cluster
    alias to one of the cluster nodes and polls that specific
    node for the process, which may not be running there.
    
    However, looking at the output from SENS WATCH SHOW EVENTS 
    I see several events (such as DISK free space below xxx)
    reported with the cluster alias node. 
    
    Why can't I do this with a process?
    
    Thanks in advance,
    
    Ton Dorland
    (tested with SNS 2.2 ECO3)

1003.1. "Not yet implemented" by AZUR::HUREZ (Connectivity & Computing Services @VBE. DTN 828-5159) Wed Feb 19 1997 13:36, 41 lines

    The feature you're describing is an interesting one, but it is not yet
    implemented in System Watchdog.
    
    If you enter the cluster alias in your profile, then the cluster load
    balancing algorithm decides which cluster member is actually
    connected to... So you may get process missing events somewhat randomly,
    depending upon whether the process is present on the target node, which
    is selected independently of the Consolidator.  This obviously doesn't
    work as expected.
    
    Besides, the Consolidator currently has no means to know - a priori -
    the list of cluster members behind a cluster alias, or even whether a
    given name is a cluster alias, a cluster member name or a standalone
    node name.
    
    Consolidation of cluster-wide events is done a posteriori, once events
    are reported to the Consolidator, as each event packet has a cluster field
    in it.  It consists of merging, for a sublist of event codes, identical
    event messages coming from distinct cluster members with the same cluster
    alias into a single event.  PROcess missing is not considered a
    cluster-wide event...
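
    To illustrate the merging described above, here is a minimal sketch in
    Python.  It is not System Watchdog code: the Event fields, the event
    code names and the consolidate() function are all assumptions made up
    for this example.

```python
from typing import NamedTuple


class Event(NamedTuple):
    node: str      # cluster member (or standalone node) that reported the event
    cluster: str   # cluster alias carried in the event packet, "" if standalone
    code: str      # event code (hypothetical names, not real SNS codes)
    message: str   # event text


# Assumed sublist of event codes treated as cluster-wide.
# PROCESS_MISSING is deliberately absent, as in the reply above.
CLUSTER_WIDE_CODES = {"DISK_FREE"}


def consolidate(events):
    """Merge identical cluster-wide events from members of the same cluster."""
    merged = []
    seen = set()
    for ev in events:
        if ev.cluster and ev.code in CLUSTER_WIDE_CODES:
            key = (ev.cluster, ev.code, ev.message)
            if key in seen:
                continue  # same event already reported by another member
            seen.add(key)
            # Report the merged event under the cluster alias.
            merged.append(ev._replace(node=ev.cluster))
        else:
            merged.append(ev)  # node-level event, passed through unchanged
    return merged


events = [
    Event("NODEA", "CLUSTR", "DISK_FREE", "DISK free space below limit"),
    Event("NODEB", "CLUSTR", "DISK_FREE", "DISK free space below limit"),
    Event("NODEA", "CLUSTR", "PROCESS_MISSING", "QUEUE_MANAGER missing"),
]
result = consolidate(events)
```

    With this input the two disk events collapse into one event reported
    under the alias, while the process missing event stays attached to the
    node that reported it.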
    
    I think the most straightforward way to implement the desired feature
    would be simply to add a parameter to the PROcess missing data
    specification, say using a /CLUSTER_WIDE qualifier, e.g.
    
    	SNS$EDIT> ADD NODE trusted_cluster_member PROCESS proc_name proc_uic -
                   /CLUSTER_WIDE /INTERVAL=...
    
    so that, for processes marked as cluster-wide, the Agent node trusted to
    detect the presence of the process would scan the cluster process table
    instead of only the local node process table.
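
    The difference between the two scans can be sketched as follows, again
    in Python and purely as an illustration.  The process tables are
    modelled as a plain dictionary and process_missing() is a hypothetical
    helper; nothing here reflects how the Agent actually reads process
    information.

```python
def process_missing(process_tables, local_node, proc_name, cluster_wide=False):
    """Return True if proc_name is not found in the scanned process tables.

    process_tables maps a node name to the set of process names running on
    that node (an assumed stand-in for the real process tables).
    """
    if cluster_wide:
        nodes = list(process_tables)   # scan every cluster member
    else:
        nodes = [local_node]           # scan the local node only
    return not any(proc_name in process_tables.get(n, set()) for n in nodes)


# QUEUE_MANAGER has failed over from NODEA to NODEB.
tables = {
    "NODEA": {"SWAPPER"},
    "NODEB": {"SWAPPER", "QUEUE_MANAGER"},
}
local_only = process_missing(tables, "NODEA", "QUEUE_MANAGER")
whole_cluster = process_missing(tables, "NODEA", "QUEUE_MANAGER",
                                cluster_wide=True)
```

    The local-only check on NODEA raises the event after the failover,
    whereas the cluster-wide check does not.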
    
    Of course, this implies a profile structure change, conversion utilities,
    etc., which cannot be included in an ECO kit and would have to go into
    a point release.
    
    What do you think?
    
    Regards,
    
    	-- Olivier.

1003.2. "Sounds good." by UTRTSC::DORLAND (The Wizard of Odz2) Thu Feb 20 1997 05:53, 7 lines

    Sounds good, despite the fact that a CONVERT of the database
    is necessary. Also, I think it can be implemented fairly easily,
    because since the last few VMS versions (V6.2 if I remember
    correctly) it is much easier to do clusterwide checks via
    GETJPI.
    
    Thanks, Ton