
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

2976.0. "Translan alarms exceptions" by GIDDAY::CHONG (Andrew Chong - Sydney CSC ) Tue May 12 1992 05:17

	
	14 Translan alarms are enabled to poll the bridges at 15-second
intervals to detect changes of status on the links.

	A sample alarm looks like this :


Alarm Fired Procedure = DISK_USER:[NETWORKS.MCC.ALARMS]AWB_BRS_LINK.COM;1
Alarm Exception Procedure = SYS$COMMON:[MCC]MCC_ALARMS_LOG_EXCEPTION.COM;3
Description = "AWB-BRS Megalink has RECOVERED"
Category = "Bridge Recovery"
Expression = (CHANGE_OF(TRANSLAN AWBTL3503 line 2 module state,*, FORWARDING), 
					  at every 00:00:15)
Severity = Critical
                               

	The problem is that for the past 2 weeks the alarms have been getting
exceptions randomly on all bridges. The exceptions are logged in the exception
logfile, and some entries are seen in the MCC_ALARMS_date_ERROR.LOG.

Two types of exceptions are seen :

    1.	%MCC-W-TIME_ALREADY_PA, scheduled time has already passed
        	           The rule has been disabled

    2. 	Cannot communicate with target
	
The TIME_ALREADY_PA exception is of particular concern, since the rule
itself does not specify a start or end time for the alarm. Disabling the
rule means that no further polling is done until the alarm is re-enabled
by a com procedure that runs at midnight.


	Any comments on why it would get the TIME_ALREADY_PA exception?
	Is polling the bridges every 15 seconds too frequent?
	
	A second problem, which may or may not be related, is that the above
alarms are kept alive by a detached process. The detached process runs a com
procedure which enables the Translan alarms and then does a show command within
MCC with start="duration", where "duration" is the time remaining until
midnight. This keeps the process alive until midnight. The procedure then exits
from DECmcc, which disables all alarms, and loops back to enable the alarms for
another 24 hours. Over a period of two to three days, the detached process is
observed to gradually increase its CPU usage to over 60%. The process has to be
restarted to get it back to normal (less than 6% CPU utilization).
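	The "duration to midnight" value described above can be illustrated
with a short Python sketch. This is not the com procedure from the post, which
is not shown; the function name duration_to_midnight is hypothetical, and the
sketch only shows the arithmetic the procedure would need.

```python
from datetime import datetime, timedelta


def duration_to_midnight(now):
    """Return the time remaining from `now` until the next midnight.

    Hypothetical helper: the original com procedure is not shown in the
    post, so this only illustrates the calculation it describes.
    """
    # Move to tomorrow's date, then truncate the time of day to 00:00:00.
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return next_midnight - now


if __name__ == "__main__":
    # Example: time remaining when evaluated at the current local time.
    print(duration_to_midnight(datetime.now()))
```

A process started at 05:17 would therefore wait 18 hours and 43 minutes before
exiting and re-enabling the alarms.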

	This process is created using the account that normally manages decmcc 
and is created with /authorized. 

	Any comments on why CPU usage creeps up to such a huge amount?

	Andrew

2976.1. "ease up on polling interval.." by TOOK::MCPHERSON (Life is hard. Play short.) Tue May 12 1992 12:21
Yes, a 15s polling interval is too short.   You're probably digging yourself into
a hole really fast, esp. if the Translans are busy or the lines between them are
congested.
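
One way to picture the "hole" is a toy model of fixed-interval polling. The
Python sketch below is only an illustration under stated assumptions: polls run
one at a time, poll k is scheduled at k times the interval, and a slow response
delays the next poll. It does not reproduce the actual DECmcc alarms scheduler,
whose internals are not described in this thread, and count_missed_slots is a
hypothetical name.

```python
def count_missed_slots(interval_s, response_times_s):
    """Count polls whose scheduled time had already passed before they could start.

    Toy model (an assumption, not DECmcc's real scheduler): poll k is
    scheduled at k * interval_s, and a poll cannot start until the
    previous poll has finished.
    """
    missed = 0
    free_at = 0.0  # time at which the poller is next idle
    for k, response in enumerate(response_times_s):
        scheduled = k * interval_s
        start = max(scheduled, free_at)
        if start > scheduled:
            # The previous poll overran this slot.
            missed += 1
        free_at = start + response
    return missed


if __name__ == "__main__":
    # Made-up response times in seconds; one slow reply of 20 s.
    samples = [5, 20, 5, 5]
    print(count_missed_slots(15, samples))
    print(count_missed_slots(30, samples))
```

In this model a single 20-second response overruns a 15-second slot, while the
same responses fit comfortably inside 30-second slots.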

Suggestion: Using FCL, try the command for each of the Translans in your
network and note the longest response time (probably do this a few times just
to get a 'feel' for their average response times).   Use that value plus maybe a
10% fudge factor to derive your shortest "least common denominator" interval
for polling Translans for that attribute.
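
The arithmetic in that suggestion can be sketched in Python as follows. The
response times are made-up numbers, the second device name is hypothetical
(only AWBTL3503 appears in this topic), and suggested_poll_interval is not a
DECmcc facility; it just applies "longest observed response plus a fudge
factor".

```python
def suggested_poll_interval(samples_by_device, fudge=0.10):
    """Return the longest observed response time plus a fudge factor.

    samples_by_device maps a device name to a list of measured response
    times in seconds, gathered by timing the command by hand.
    """
    all_samples = [t for times in samples_by_device.values() for t in times]
    if not all_samples:
        raise ValueError("need at least one response-time sample")
    # The slowest device sets the floor for every rule polling this attribute.
    return max(all_samples) * (1.0 + fudge)


if __name__ == "__main__":
    # Hypothetical measurements, in seconds.
    measurements = {
        "AWBTL3503": [4.0, 6.5, 20.0],
        "EXAMPLE_TL": [3.0, 12.0],  # made-up device name
    }
    print(suggested_poll_interval(measurements))
```

The result is a lower bound; rounding it up to a convenient interval leaves
extra headroom for congested lines.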

Unfortunately, the CPU usage issue has me boggled...  Maybe someone from the
alarms team can help there.

/doug
2976.2. by GIDDAY::CHONG (Andrew Chong - Sydney CSC ) Tue May 12 1992 23:24
    
    The alarm polling interval will be eased back to 30 seconds to see
    what effect it has. Though I can understand how the rules could be
    disabled with TIME_ALREADY_PA exceptions.
    
    Andrew
    
2976.3. "longer poll interval == problem solved" by GIDDAY::CHONG (Andrew Chong - Sydney CSC ) Fri May 22 1992 05:20
       Increasing the poll interval to 30 seconds has cleared up the problem.
        It has also decreased the CPU usage of the detached process.
     
    Andrew