
Conference azur::mcc

Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359

1226.0. "Alarm notification stops for domain" by MFRNW1::SCHUSTER (Karl Schuster @MFR Network Services) Wed Jul 10 1991 09:48

I run BMS V1.1 and SNMP AM V1.0 on VMS V5.4; the Alarms Patch for BMS is
installed.

I have set up 2 independent domains (with no parent-child relation).
The first domain is for DECnet, with about 40 rules polling at a 5-minute
interval; the second domain is for SNMP, with about 20 rules polling at a
1-minute interval. Polling runs in batch with 2 independent batch
processes (1 for each domain). The processes are restarted every morning.

Alarm notifications on the MAP work fine for about 2-3 days, and then I
get these error messages in the Iconic Maps (which are always active):

	Notification being stopped for Domain ...
	unexpected condition returned to notification FM
	exception encountered - event manager reported a lost event

(The DECnet event process is not active.)

The messages appear again as soon as I restart the Maps.

After a system reboot, alarming works again for about 2-3 days.

Is this a bug, or a problem with quotas, SYSGEN parameters, ...?

Regards,
Karl Schuster

1226.1. "bug in 1.1 kit" by TOOK::CALLANDER (Jill Callander DTN 226-5316) Fri Jul 12 1991 12:11 (12 lines)
We recently found that there is a bug in the notification system that occurs
when the event manager gets an event overflow condition. The AMs and
FMs are using different internal codes for specifying this "condition",
which is causing the PMs problems. The only thing that concerns me
about your seeing it is that you didn't mention anything about getting
a large number of events. Would you have any guesstimate as to how
many alarms have fired (total) since the PM was started? Also, could
you try starting the FCL with the two NOTIFY DOMAIN xxx commands running,
leave that up for a few days as well, and see what happens there
(do they die at the same time if started at the same time)?

Thanks for the input; more when we know something.

1226.2. "more info ....." by MFRNW1::SCHUSTER (Karl Schuster @MFR Network Services) Mon Jul 15 1991 12:42 (14 lines)
    Just some more detailed info about the problem:
    
    1. We don't use DECnet event logging at all - only synchronous polling
       with alarm rules.
    
    2. We get a lot of alarms fired, because some components are often offline
       (in total we have > 10000 alarms fired within 3 days).
    
    3. Today another error occurred:
    	    notification being stopped for domain ...
    	    unknown DNS error
    
    Karl
        
1226.3. "question on batch" by TOOK::CALLANDER (Jill Callander DTN 226-5316) Mon Jul 15 1991 18:36 (32 lines)
When a process makes a request of the event manager, that request remains active
unless it is explicitly terminated. Killing a batch job will leave the request
active with no recipient for the data, causing the manager to finally run out of
memory and terminate. To stop the event manager you must terminate ALL mcc
sessions running, including those in batch.
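
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not DECmcc code, and the names ToyEventManager, run, and the memory_limit parameter are hypothetical. It shows a request that stays registered after its requester has gone away, so undelivered events pile up until the buffer limit is exceeded.

```python
from collections import deque


class ToyEventManager:
    """Toy stand-in for an event manager (hypothetical, not the DECmcc one)."""

    def __init__(self, memory_limit):
        self.memory_limit = memory_limit  # max events buffered across all requests
        self.queues = {}                  # request id -> undelivered events

    def request(self, request_id):
        # Register an event request; it stays active until terminate() is called.
        self.queues[request_id] = deque()

    def terminate(self, request_id):
        # Clean shutdown: release the request and anything buffered for it.
        self.queues.pop(request_id, None)

    def post(self, event):
        # Buffer the event for every active request, delivered or not.
        for queue in self.queues.values():
            queue.append(event)
        if self.buffered() > self.memory_limit:
            raise MemoryError("event manager ran out of buffer space")

    def receive(self, request_id):
        # A live requester drains its queue by calling this.
        queue = self.queues[request_id]
        return queue.popleft() if queue else None

    def buffered(self):
        return sum(len(q) for q in self.queues.values())


def run(clean_shutdown, events=100, limit=50):
    """Return True if the manager survives `events` posts after the requester is gone."""
    manager = ToyEventManager(memory_limit=limit)
    manager.request("alarms-batch")
    if clean_shutdown:
        manager.terminate("alarms-batch")
    # From here on nobody calls receive(): the requester no longer exists.
    try:
        for n in range(events):
            manager.post(n)
    except MemoryError:
        return False
    return True
```

In this model, run(clean_shutdown=True) survives because the request was released before the requester went away, while run(clean_shutdown=False) hits the limit once more events are buffered than the limit allows.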

During regular operation of mcc, please do not use "stop..." or "delete/ent..."
commands to terminate mcc batch jobs, especially if they are event sources
or requesters. I am not certain whether this is what you are doing, but it
will cause the overall results that you are seeing (this has been discussed
at length in a number of other notes in the conference). If you want
to run an alarms batch job and restart it every morning, try something
like

.COM

enable rules
show mcc 0 all char, at start=(when you want to terminate procedure)
disable rules
exit mcc


Then run this .COM from inside a DCL procedure that just keeps looping
around running it (an example of a procedure to do this can, I believe,
be found using keyword TOOLS).

Could you try killing all mcc sessions (including batch jobs), then
restarting your rules and event manager, and see if your problem
goes away?

thanks
jill

1226.4. "more info" by MFRNW1::SCHUSTER (Karl Schuster @MFR Network Services) Thu Jul 18 1991 12:30 (17 lines)
Normally we do not $stop/id or $stop/entry the alarm batch job, but it
might have happened a few times during testing.
The alarm batch job looks like this:
	enable rules
	show mcc 0 alarm rule * all counter, in domain xyz,
		to file=xyz.log, at every 01:00:00 until 23:00:00
The job is restarted for the next day at 05:00:00. The IMPM windows remain
permanently open.
What we did NOT do in the .com file is:
	disable rules

I will change the .com file and add "disable rules", and also terminate
the Map windows before starting and stopping the alarm processes.

More info next week.

Karl

1226.5. "Disable is Automatic with normal shut-down" by TOOK::ORENSTEIN Thu Jul 18 1991 13:28 (12 lines)
    
    Is there a difference between MCC executing the command EXIT and
    MCC running out of commands to execute (the end of the com file)?
    I don't believe so.
    
    If there is no difference then the DISABLE RULES is done
    automatically.  The MCC exit handler will automatically send an Alert
    to all threads: this includes the threads in ALARMS that have the
    outstanding GETEVENT.  Once the thread is notified, it will become
    disabled (and hence the rule will become disabled).
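
    As a rough illustration of that shutdown path, here is a minimal sketch in Python. It is not how MCC is implemented: ToyRule and shut_down are hypothetical names, a threading.Event stands in for the Alert, and a blocking wait stands in for the outstanding GETEVENT.

```python
import threading


class ToyRule:
    """Hypothetical alarm rule whose thread blocks on a pending event request."""

    def __init__(self, name, alert):
        self.name = name
        self.alert = alert      # shared threading.Event, standing in for the Alert
        self.enabled = False
        self.thread = threading.Thread(target=self._wait_for_event)

    def enable(self):
        self.enabled = True
        self.thread.start()

    def _wait_for_event(self):
        # Stand-in for the outstanding GETEVENT: block until the alert arrives.
        self.alert.wait()
        # Being alerted disables the thread, and hence the rule.
        self.enabled = False


def shut_down(rules, alert):
    """Stand-in for the exit handler: alert every thread, then wait for them to finish."""
    alert.set()
    for rule in rules:
        rule.thread.join()
```

    With two rules enabled, calling shut_down(rules, alert) leaves both with enabled set to False, without any explicit per-rule disable step.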
    
    aud...

1226.6. "o.k." by MFRNW1::SCHUSTER (Karl Schuster @MFR Network Services) Mon Jul 29 1991 07:22 (4 lines)
    Now that I DISABLE the rules before the image exits, the error has not
    occurred for a week and a half. This seems to be the solution.
    
    Thanks, Karl

1226.7. "another : Notification stopped ..." by HLRG02::SYSTEM (Incredible but . . . not true .) Wed Jul 22 1992 12:55 (24 lines)
Hi,

A customer has:
- VMS V5.4-3
- DECmcc V1.1.0

He creates some alarm rules and enables them.
When he then enables notification, he gets the following error:

	Notification being stopped for domain ...
	%MCC-E-UNSUPP-OP, unsupported operation.

And in the window where he starts mcc he gets the following:

	%SYSTEM-F-ACCVIO access violation, reason mask=00, virtual
	address=0000000C, PC=000F0980, PSL=03C00004


Can anybody give me some help?


Regards,

/-/	Henk.

1226.8. "any DNA5 entities in the domain" by TOOK::CALLANDER (MCC = My Constant Companion) Fri Jul 24 1992 16:48 (6 lines)
There were a few problems with DNA5 handling... Without getting into the
details, first let's find out whether they were using any DNA5 entities.
If not, I would appreciate a list of the contents of the domain.

thanks
jill