
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

3814.0. "Enabled/running Alarm not POLLING?" by GWTEAL::WOESTEMEYER (Why??...Why not!!!) Mon Sep 28 1992 16:38

How can an alarms rule be enabled and running, yet have missed several polls?
The 'show' below is typical of several rules in this state.  To try and 
solve this we increased the poll interval to 15 minutes, but still this 
continues to happen.

Steve Woestemeyer
CSC/CS Network Support Team
-------------------------------------------------------------------------
VMS V5.5-1, DECmcc BMS V1.2

MCC 0 ALARMS RULE CISCO_2_STATE_CHANGE
AT 22-SEP-1992 08:22:08 All Attributes

                                   NAME = CISCO_2_STATE_CHANGE
                                  State = Enabled
                               Substate = Running
                Time of Last Evaluation = 21-SEP-1992 17:35:56.42
              Result of Last Evaluation = False
                       Current Severity = Clear
                     Creation Timestamp = 21-SEP-1992 13:21:56.30
                       Evaluation Error = 0
                        Evaluation True = 0
                       Evaluation False = 255
                             Expression = (CHANGE_OF(SNMP ISGRSE2 INTERFACE 2
                                          IFLASTCHANGE, *,*),
                                          AT EVERY 00:01:00)
                            Description = "If the substate is anything other
                                          than NONE, then either there is a
                                          problem with the circuit or the
                                          circuit is in the OFF or SERVICE
                                          state."
                               Category = "Circuit problems"
                              Procedure = SYS$COMMON:[MCC]MCC_ALARMS_MAIL_ALARM.COM;3
                                  Queue = "SYS$BATCH"
                              Parameter = "MCC"
                      Exception Handler = SYS$COMMON:[MCC]MCC_ALARMS_MAIL_EXCEPTION.COM;2
                     Perceived Severity = Critical
                         Probable Cause = Unknown

3814.1. "You mean it just stops ?" by MOLAR::ROBERTS (Keith Roberts - Network Management Applications) Mon Sep 28 1992 17:28, 22 lines
  Steve,

  So I understand .. the rule runs for days (well, in your example 255 
  false evaluations) .. then appears to get stuck.  Neither the counters
  nor the 'time of last evaluation' change.

  (Q) Does this happen on any other entity other than SNMP ?

  If the Rule's background thread dies for some reason, the Rule Status and
  Counter information will just stop changing.  Or, if Alarms is waiting
  on SNMP (in your example) to return with some data .. the counters and
  such will appear stuck.

  (Q) Do you have a lot of SNMP rules running simultaneously ?

  I believe if the SNMP AM runs out of sockets, it keeps trying to get one
  every second till one becomes available.  I don't know what would happen
  if one never became available.

  /keith

  
3814.2. "More Info on Stalling SNMP alarms" by GWTEAL::WOESTEMEYER (Why??...Why not!!!) Thu Oct 01 1992 11:23, 12 lines
    More info and answers to question(.1):
    
    IP transport is not UCX but TGV Multinet V3.1
    
    (A) Only polling alarms set up are SNMP alarms, all others are event
        driven.
    
    (A) There are currently 30 SNMP polling alarms enabled, 2 of which have
        stalled out.
    
    Where do we go from here?
    Steve

3814.3. by TOOK::GUERTIN (It fall down, go boom) Thu Oct 01 1992 14:03, 2 lines
    Does TGV Multinet V3.1 have a restriction on the number of concurrent
    open sockets it supports?

3814.4. "Multinet # of Socket?" by GWTEAL::WOESTEMEYER (Why??...Why not!!!) Fri Oct 02 1992 12:19, 9 lines
    I wish I knew a lot more about TGV's Multinet.  I am beginning to think
    this may be at the core of this problem.  I have asked the customer to
    see if Multinet has a command similar to 'UCX SHOW COMMUNICATION',
    which lists the configured number of sockets, as well as the current
    and peak number used.
    
    Any Multinet users out there have any ideas?
    
    Steve

3814.5. "More Info on Stalling SNMP alarms!" by IOOSRV::HITTENMILLER Mon Oct 26 1992 13:01, 84 lines
 I went on site to review the problem with the customer and found the
following two issues.

   Problem Summary Description:

     Several of the customer's SNMP polling alarms are hanging.  There is
   no indication that there is any problem until the time and date of the
   last evaluation is compared to the current time and date.  When this 
   comparison is done it is seen that there can be a large difference   
   between the current time and the time of last evaluation, many times 
   the polling period.                                                  
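
   The comparison described above can be sketched in a few lines.  This is a
   minimal Python illustration, not part of DECmcc: parse_vms_time, is_stalled,
   and the missed_polls threshold are hypothetical names chosen for the
   example, and the sample timestamps are the ones from the show output in .0.

```python
from datetime import datetime, timedelta

# Month abbreviations as they appear in VMS absolute times.
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}


def parse_vms_time(text):
    """Parse a VMS-style absolute time such as '21-SEP-1992 17:35:56.42'."""
    date_part, time_part = text.split()
    day, month, year = date_part.split("-")
    hours, minutes, seconds = time_part.split(":")
    whole, _, fraction = seconds.partition(".")
    # Pad the fractional part (hundredths in the sample) out to microseconds.
    micro = int(fraction.ljust(6, "0")[:6]) if fraction else 0
    return datetime(int(year), MONTHS[month.upper()], int(day),
                    int(hours), int(minutes), int(whole), micro)


def is_stalled(last_evaluation, now, poll_interval, missed_polls=3):
    """Report a stall when more than missed_polls intervals have elapsed.

    missed_polls is an assumed threshold, not a DECmcc setting.
    """
    return now - last_evaluation > poll_interval * missed_polls


if __name__ == "__main__":
    last = parse_vms_time("21-SEP-1992 17:35:56.42")   # Time of Last Evaluation
    shown = parse_vms_time("22-SEP-1992 08:22:08")     # time of the show command
    print(is_stalled(last, shown, timedelta(minutes=1)))
```

   A check like this only reports the symptom; it says nothing about why the
   rule stopped being evaluated.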

   Hardware configuration: VAXstation 3100, Model 76
   Software configuration: VMS 5.5, NODNS, NORDB,
                           TGV Multinet V3.1 (for the TCP/IP Access Module), and
                           DECmcc BMS V1.2


 1) Polling of ALARMS stalling.

    o   The alarm's status is ENABLED, but the time of last poll was three
        days ago.
    o   I verified the system parameters specified by MCC and TGV; all were
        configured correctly.
    o   Next we rebooted the system to start from a known state.
    o   Entered MCC/interface=windows and enabled the alarms.
    o   Waited until every alarm had been polled at least once; the alarms
        are set for 10 minutes and one hour.
    o   Removed the system from the network and connected a LOOPBACK connector
        to the MCC system.
    o   Alarms started to trigger.
    o   After 45 minutes the system was reconnected to the network, and we
        waited 30 minutes before reviewing the polling status.
    o   Checking when the alarms were last polled, the DECNET alarms were all
        working.  Most of the SNMP alarms had last been polled between 1:15
        and 30 minutes earlier and were NOT working: ENABLED, but with a
        stale time of last update.
    o   Any SNMP alarms that were not polled while the system was disconnected
        from the network were still working (an alarm on a one-hour timer,
        for example).
    o   Looking at the network with a sniffer, there were no packets on the
        wire for the alarms that did not have a correct polling time.  Packets
        for all of the SNMP alarms with a correct polling time were seen on
        the wire.
    o   The mail message from one of the alarms that triggered while the
        system was off the network contained the following exception:
        Internet Communication device error %SYSTEM-W-CANCEL, operation.
        This alarm was one that was NOW stuck.  There was no MCC software
        error log entry for alarms, and Multinet reported no errors; other
        SNMP alarms were still working.
    o   We disabled and re-enabled the alarms that were not being updated, and
        they started to work.

 2) Error in command file to process event alarms.

        During the processing of an event alarm, a data file
                SYS$SCRATCH:MCC_ALARMS_DATA_xxxxxxx.DAR
        is created.  The second record in this file is 1039 bytes long, and
        a record of this size causes the error
                %DCL-W-BUFOVF, command buffer overflow - shorten expression or
                                command line
        The error occurs after the second read of the data file:
	   $ READ_LOOP:
	   $ !
	   $ 	read/end_of_file=endit data_file line
	   $ 	string = f$element(0, " ",line)
           %DCL-W-BUFOVF, command buffer overflow - shorten expression or command line
	   .....
	   ....
	   ...
	   ..
	   .	

        The name of the command file is MCC_ALARMS_MAIL_EXCEPTION.COM.  The
        text of the data record that causes the error is
           "MANAGED_OBJECT: SNMP ISGRSE2 Interface 7"
        followed by 1001 spaces.
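
        To make the padded record concrete, here is a small Python sketch that
        builds the record as described, flags it as oversized, and shows that
        trimming trailing blanks leaves a short record whose first element is
        still intact.  It is an illustration only, not the fix applied to the
        command file.  first_element and oversized_records are hypothetical
        helpers, and the 1024-byte limit is an assumption made for the example,
        not a value taken from this report.

```python
def first_element(record, delimiter=" "):
    """Return the first delimiter-separated element of a record.

    Mirrors what f$element(0, " ", line) extracts, after trailing
    whitespace has been trimmed from the record.
    """
    return record.rstrip().split(delimiter, 1)[0]


def oversized_records(records, limit=1024):
    """Yield (index, raw_length, trimmed_length) for records over the limit.

    The default limit is an assumed buffer size used for illustration.
    """
    for index, record in enumerate(records):
        if len(record) > limit:
            yield index, len(record), len(record.rstrip())


if __name__ == "__main__":
    # The record text from the report, padded with 1001 trailing spaces.
    padded = "MANAGED_OBJECT: SNMP ISGRSE2 Interface 7" + " " * 1001
    for index, raw, trimmed in oversized_records(["header", padded]):
        print(index, raw, trimmed)
    print(first_element(padded))
```

        The point of the sketch is that the padding, not the useful text,
        accounts for nearly all of the record length.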

  I don't know if the two problems are related, but this alarm did get stuck.
Not all of the alarms that got stuck had this problem. The alarm definition
statement looked normal.

Regards,

Bill Hittenmiller				IOOSRV::HITTENMILLER

3814.6. "QAR'd in mcc012_ext" by MCC1::DITMARS (Pete) Wed Oct 28 1992 17:15, 3 lines
QAR #515, against alarms (a guess)

don't let that inhibit discussion here though

3814.7. by TRM::KWAK Thu Dec 17 1992 19:58, 14 lines
    
    RE: .5
    
    The first problem (Polling of ALARMS stalling) seems to be caused
    by TGV Multinet. Multinet's "Select Call" (which makes calls to
    socket routines) is said to be blocking the process.
    This problem is currently being investigated by an MCC developer
    (I was told).
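
    For readers unfamiliar with this failure mode: a select call made with
    no timeout waits indefinitely if the expected reply never arrives, which
    would look like a poll that silently stops.  The sketch below is generic
    Python, not code from the SNMP AM or Multinet, and recv_with_timeout is
    a hypothetical helper.  It only shows how a bounded wait lets the caller
    notice a missing reply instead of blocking forever.

```python
import select
import socket


def recv_with_timeout(sock, timeout_seconds, bufsize=1500):
    """Wait up to timeout_seconds for data on sock.

    Returns the received bytes, or None if nothing arrived in time.
    """
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    if not readable:
        return None          # timed out; the caller can retry or report it
    return sock.recv(bufsize)


if __name__ == "__main__":
    # A connected socket pair stands in for the manager and the agent.
    manager, agent = socket.socketpair()
    print(recv_with_timeout(manager, 0.1))   # no reply was sent yet
    agent.send(b"reply")
    print(recv_with_timeout(manager, 1.0))   # reply is now available
    manager.close()
    agent.close()
```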
    
    The second problem (Error in command file to process event alarms)
    has been reported in #3858.0.
    This problem has been fixed (12/1/92), and the fix will be available in 
    the next release.
    
    William