Conference azur::mcc

Title: DECmcc user notes file. Does not replace IPMT.
Notice: Use IPMT for problems. Newsletter location in note 6187
Moderator: TAEC::BEROUD
Created: Mon Aug 21 1989
Last Modified: Wed Jun 04 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 6497
Total number of notes: 27359

1194.0. "alarms on 'recorded' data??" by JETSAM::WOODCOCK () Thu Jun 27 1991 17:28

Hi,

Is there any way I can write an ALARM for *past* time which has been
RECORDED?? I can't find it in the manuals specifically, and if it is
supported I have failed at the syntax! Actually MCC let me create an alarm
using the FOR qualifier in the SOT, but when it was enabled it immediately
disabled with a "scheduled time passed" error. I'm looking for alarms on
stats.

any help most appreciated,
brad...
T.R  Title  User  Personal Name  Date  Lines
1194.1  Try SHOW first. If that works we may have a bug in Alarms!  WAKEME::ANIL  Thu Jun 27 1991 19:54  17
Hi Brad,

As usual you were the first to try Alarms on past data. I don't see 
why Alarms should not be able to handle the case. I did not document 
it just because I felt it would be a very difficult concept for a user to 
grasp and its usefulness was questionable.

Now that you have tried it with not-so-happy results, can I request that you
try it on just pure data and not stats? Also, another question:
were you able to do a "show" on the data that you were trying to
Alarm on?


Let us know your findings. 

- Anil Navkal

1194.2  avoids polling  JETSAM::WOODCOCK  Fri Jun 28 1991 15:57  50
Hi Anil,

Now that I know it *should* work I'll dig in and see what I find. As far
as its usefulness goes, I've got some very real needs for it. One complaint
about MCC that I've heard is its complexity. I'm not sure if this should be a
separate topic or not but I'm hoping MCC managers are listening. Please bear
with me as I get a little long-winded and throw real numbers out on the table.

There are a couple of issues at hand which MCC should try to improve: the
number of rules (i.e. complexity) and the amount of WAN polling necessary to
manage the WAN. I know there have been some procedures to help write the rules
but this should probably be taken a step further to some sort of definable
default services for the different entity classes.

The numbers game...

I've got 50 routers and 80 circuits (reality) to manage. If I compare a
similar setup to what is used today to manage the net I come up with the
following amounts of rules and polling. This is very conservative and
nothing fancy.

A rule for each router (50) dealing with circuit outages (events, no polling).
A rule for each site (12) dealing with node outages (events, no polling).
A rule for each circuit (80) for off hours circuit monitoring each 15 min.
							(320 polls/hr)
Export or Record mainly *just* counters for each circuit (80) and each 
	router (50) only once an hour (130 polls/hr).
A rule for each circuit for utilization and error threshold as a warning.
	One each for inbound, outbound, and errors: 80 circuits, 240 rules, polling
	at hourly intervals. PA polls twice per interval, therefore 480 polls/hr.
A rule for each circuit for utilization and error threshold as a problem.
	One each for inbound, outbound, and errors: 80 circuits, 240 rules, polling
	at hourly intervals. PA polls twice per interval, therefore 480 polls/hr.
A rule for each router (50) for packet throughput each hour. 100 polls/hr.

Grand totals are in the neighborhood of 670 rules and 1510 
polls/hr...conservatively!!! No can do. I may not be able to get the number
of rules down but if I use past time for non real-time needs (ie. all stats)
I can reduce the polling by more than 1000 polls/hr.
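
The totals above can be checked with a quick tally. This is only a sketch in Python: the rows restate the figures quoted in this note, and the `is_stats` flag (which items count as "stats" that could be evaluated from recorded data instead of fresh polls) is an assumption, not something the note spells out.

```python
# Each row: (description, rules, polls_per_hour, is_stats).
# is_stats is an assumed marking of the items that could use past (recorded) data.
budget = [
    ("circuit outages per router (events)",    50,   0, False),
    ("node outages per site (events)",         12,   0, False),
    ("off-hours circuit monitoring, 15 min",   80, 320, False),
    ("export/record counters, hourly",          0, 130, False),
    ("warning thresholds (in, out, errors)",  240, 480, True),
    ("problem thresholds (in, out, errors)",  240, 480, True),
    ("router packet throughput, hourly",       50, 100, True),
]

total_rules = sum(rules for _, rules, _, _ in budget)
total_polls = sum(polls for _, _, polls, _ in budget)
# Polls that would disappear if every "stats" rule read recorded data instead.
stats_polls = sum(polls for _, _, polls, is_stats in budget if is_stats)

print(f"rules: {total_rules}, polls/hr: {total_polls}, stats polls/hr: {stats_polls}")
```

Under that marking the tally comes to 672 rules and 1510 polls/hr, of which 1060 polls/hr belong to the stats rules, which is consistent with "more than 1000 polls/hr".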

While I love the versatility of this product it does come at a price. Other
large companies will also have to grapple with these numbers and look to cut
back on certain non-essentials to make the management manageable.

In any event, I'll let you know of my successes with alarms for past times.

best regards,
brad...


1194.3  prob parsing 'in domain ...'  LUVBOT::MCC  Mon Jul 01 1991 14:01  24
In doing a SHOW command the domain needs to be specified. But if I put the
"in domain" qualifier into the expression ALARMS doesn't appear to parse it
as needed. Bug or unsupported???


thanks,
brad...


create mcc 0 alarms rule past_count -
expression=(node4 bbpk01 cir syn-0 circuit down>0,for start 16:30 -
,in domain .pko-24),-
procedure=mcc_common:mcc_alarms_mail_alarm.com,parameter=mcc,-
in domain .pko-24
!
!MCC 0 ALARMS RULE past_count 
!AT 28-JUN-1991 16:50:43 
!
!Missing right parenthesis in alarm expression.
!
exit
!


1194.4  Both - bug and unsupported  TOOK::ORENSTEIN  Mon Jul 01 1991 17:22  22
>>> In doing a SHOW command the domain needs to be specified. But if I put the
>>> "in domain" qualifier into the expression ALARMS doesn't appear to parse it
>>> as needed. Bug or unsupported???
    
    
    Both.
    
    In investigating this, I found a bug in the parse routine for
    prepositions.  A QAR has been filed.
    
    Also, I discovered that ALARMS does not support the IN DOMAIN
    qualifier in expressions.  Currently there is no support for
    examining data on a domain basis.  A QAR has been filed.
    
    I saw your math that states that polling historical data could
    save you 1000 polls on your network, but I did have some trouble
    understanding that.  How important is this to you?
    
    I will see what can get done for V1.2, but I make NO promises.
    
    aud...
                                                         
1194.5  It IS important!  NSSG::R_SPENCE  Nets don't fail me now...  Mon Jul 01 1991 18:16  32
    The savings in polling is very important. 1000 polls per hour
    translates to an average of more than 16 per minute. That will take a
    very big system to support while leaving some resources to deal with
    any alarms testing true or exception handling, not to mention
    any management actions initiated by people.
    
    Where is the savings? Well, for example...
    To export data on a node4, (router for example), the Ethernet line
    and circuit plus the 4 sync lines and circuits, I have to poll the
    router 15 times (maybe more?).
    
    The same sort of number comes up for the Historical Recording. I don't
    know what happens if you specify several partitions (Brad, you might
    want to make sure you record and then export characteristics too in
    case you want to do any external reporting that needs line speeds).
    
    Then, if we want to have alarm rules for % utilization inbound and
    outbound plus errors, we add another 15 polls.
    
    That adds up to 45 polls per router for each time we want all this
    stuff. If the Historian could record it all with a minimum number
    of polls and then export and alarms use the recorded data the polling
    could perhaps be reduced from 45 to 10 or less.
    
    All the RFIs and RFPs I am seeing these days on Network Management
    are actually asking us how much traffic the management system will
    add to the network. We need to be able to minimize that traffic.
    
    Hope this clears it up.
    
    s/rob

1194.6  suggestions  JETSAM::WOODCOCK  Mon Jul 01 1991 19:08  38
Hi, 
    
>    The same sort of number comes up for the Historical Recording. I don't
>    know what happens if you specify several partitions (Brad, you might
>    want to make sure you record and then export characteristics too in
>    case you want to do any external reporting that needs line speeds).
 
Actually, I was planning on getting LINE characteristics only once a day for
each circuit to handle reports. Like I said, the numbers were conservative.
   
As far as what's needed, I'm going to look for different approaches to get the
job done with V1.1. I'll probably end up scaling back info (one threshold for
errors rather than two) and hack something together that partially uses MCC.
But I'm willing to bet big bucks that if our customers understood the mechanics
they wouldn't be happy.

Suggestions, I've got three:

1. Bring in support for ALARMS handling historical data (in specific
   domains). And fix the parsing bug. VERY IMPORTANT.

2. Change the way PA operates today. This was a previous suggestion but worth
   mentioning several times :-). Rather than having PA poll at the beginning
   and end of each interval, have PA poll once each interval and subtract
   last_counters from present_counters for calculations (also holds true for
   reports). This solves two problems. It effectively reduces the number of
   polls by half. Also, as the polling interval decreases, MCC's accuracy
   becomes more dependent on both system and network performance with today's
   method, because the polling must be accurately timed. If you use the one
   poll method the polls could be off but the stats are always on the money.
   A must in my opinion.

3. As a bonus set up a utility which handles default (user definable) services
   for different entity classes (ie. alarms, stats). This hides some of the
   complexity of the management environment.
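
Suggestion 2 (one poll per interval, subtracting last_counters from present_counters) can be illustrated with a small sketch. This is Python, not anything from PA itself: the class name `DeltaCounter` and its method are hypothetical, and counter wrap-around is deliberately ignored. The point it shows is that dividing by the elapsed time actually measured between polls keeps the rate correct even when a poll arrives late.

```python
class DeltaCounter:
    """Keeps the previous poll so each new poll yields one interval's statistic.

    Hypothetical helper for illustration only; it does not handle counter wrap.
    """

    def __init__(self):
        self._last = None  # (timestamp_seconds, counter_value) of the previous poll

    def update(self, timestamp, value):
        """Feed one poll; return the rate per second since the previous poll.

        Returns None for the first poll, since there is nothing to subtract from.
        """
        previous, self._last = self._last, (timestamp, value)
        if previous is None:
            return None
        elapsed = timestamp - previous[0]
        if elapsed <= 0:
            return None  # out-of-order or duplicate poll; no meaningful rate
        return (value - previous[1]) / elapsed


counter = DeltaCounter()
first = counter.update(0, 1000)      # first poll only primes the state
rate = counter.update(3605, 8210)    # poll arrived 5 seconds late
print(first, rate)
```

Here the second poll is five seconds late, yet the rate is computed over the real 3605 seconds, so one poll per interval is enough and the timing of the poll no longer affects accuracy.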

best regards,
brad...

1194.7  I have a dream...  WAKEME::ANIL  Tue Jul 02 1991 15:40  47
Hi Rob and Brad,

Thanks for the valuable data about the number of rules needed to manage
a reasonably sized network. We will look *very* seriously at providing
the domain support in rule expressions, but we also have to face the
reality of available (or lack thereof!) people power.

Talking along the lines of suggestions, from the user's point of view
the following thought makes a hell of a lot of sense:

	Record the following attributes for entity foo every
	1 hour and,
	by the way, let me know if the attributes cross the thresholds
	indicated below.

	List Attribute partitions to record
	
	o Characteristics
	o Counters
	o Status

	List of attributes    Threshold values              Change
	for thresholds        upper bound   lower bound     from      to

	aaa                   10            20
	bbb                   30            40
	ccc                                                 Enable    Disable
	ddd                                                 router    non router
	eee                   15.5          20.5
	


   Yes, I know I am dreaming for now. But I do want to make two points.
1. There is no reason why we cannot evaluate the data as it is being
   collected. Yes, that does mean Alarms and Historian have to communicate
   a lot! But look at the advantage. We need not poll twice for the
   same data, nor do we have to wait for the data to be in the MIR.

2. A very simplified user interface that does not need 100 rules to
   monitor 100 attributes! Thus saving on the resources.
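
Point 1 (evaluating the data as it is being collected) might look roughly like the sketch below. It is Python for illustration only: `record_and_check`, the `history` list standing in for the Historian's store, and the attribute names and bounds are all hypothetical, not MCC interfaces or values from this note.

```python
def evaluate_sample(sample, thresholds):
    """Return (attribute, value, reason) for every attribute outside its bounds.

    thresholds maps attribute name -> (lower_bound, upper_bound).
    """
    crossings = []
    for name, (lower, upper) in thresholds.items():
        value = sample.get(name)
        if value is None:
            continue  # attribute was not part of this recording
        if value < lower:
            crossings.append((name, value, "below lower bound"))
        elif value > upper:
            crossings.append((name, value, "above upper bound"))
    return crossings


def record_and_check(history, sample, thresholds):
    """Store one polled sample, then evaluate it without polling again."""
    history.append(sample)  # stand-in for the Historian writing the data
    return evaluate_sample(sample, thresholds)


# Hypothetical attributes and bounds, one table instead of one rule per attribute.
thresholds = {"utilization_pct": (0.0, 80.0), "error_count": (0, 15)}
history = []
alerts = record_and_check(
    history, {"utilization_pct": 91.5, "error_count": 3}, thresholds
)
print(alerts)
```

A single poll feeds both the recording and the threshold check, and one table of bounds replaces a separate rule per attribute, which is the saving both points describe.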

  I know all this is hindsight. I only hope it becomes foresight
  for the future!!

	- Anil Navkal   


1194.8  Ain't that the truth  TOOK::ORENSTEIN  Tue Jul 02 1991 16:46  6
    
    Now that sounds like true integration of Network Management
    Products!
    
    aud...
    
1194.9  Yup  NSSG::R_SPENCE  Nets don't fail me now...  Tue Jul 02 1991 16:55  9
    Anil, exactly...
    
    And add to it integration of Export as well.
    
    Seems like there should be a "data gatherer FM" that gets called for
    entity data and by using fuzzy logic it could reduce the network
    traffic needed for management by consolidating requests for data.
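
One way to read the "data gatherer" idea is a shared front end that answers several consumers from a single recent poll. The Python sketch below assumes exactly that: `DataGatherer`, `poll_fn` and `max_age` are hypothetical names, and the "fuzzy logic" is interpreted here as nothing more than a freshness tolerance, which may be narrower than what was intended.

```python
class DataGatherer:
    """Serves entity data to many consumers, polling only when data is stale."""

    def __init__(self, poll_fn, max_age):
        self._poll_fn = poll_fn   # callable(entity, partition) -> data (assumed)
        self._max_age = max_age   # seconds a previous answer may be reused
        self._cache = {}          # (entity, partition) -> (timestamp, data)
        self.polls = 0            # number of real polls sent to the network

    def get(self, entity, partition, now):
        key = (entity, partition)
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] <= self._max_age:
            return cached[1]      # close enough in time: reuse, no new poll
        data = self._poll_fn(entity, partition)
        self.polls += 1
        self._cache[key] = (now, data)
        return data


def fake_poll(entity, partition):
    # Stand-in for a real request to the managed node.
    return {"entity": entity, "partition": partition}


gatherer = DataGatherer(fake_poll, max_age=60)
# Alarms, Export and the Historian all ask for the same counters within a minute.
for request_time in (0, 5, 10):
    gatherer.get("node4 bbpk01", "counters", now=request_time)
print(gatherer.polls)
```

Three requests inside the tolerance window cost one poll on the wire; a request after the window expires triggers a fresh one.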
    
    s/rob

1194.10  Deja vu!  DFLAT::PLOUFFE  Jerry  Tue Jul 02 1991 18:07  18
  > Seems like there should be a "data gatherer FM" that gets called for
  > entity data and by using fuzzy logic it could reduce the network
  > traffic needed for management by consolidating requests for data.

  This is exactly what is needed.  We used to call this a "subscription 
  service" and it was talked about many moons ago.  I'm glad to see it
  brought back to light.  Hopefully Brad's numbers will provide the 
  necessary justification.

  We did not call it an FM, we called it a "service" since we thought of it 
  as being part of the IM.  After all, the IM handles all scheduling of 
  operations (including SHOWs) so it possibly could implement the "fuzzy
  logic" that you mentioned.  

  Whatever the design, it certainly seems to be necessary...

                                                                    - Jerry

1194.11  A couple of solutions  TOOK::ORENSTEIN  Tue Jul 09 1991 16:25  28
    
    Back to the original topic:  Can ALARMS do rules on historical data?
    
    re .3
    
    There are two possibilities for allowing this:
    
    1. Let the user decide:
    
    As you did in your example, we could allow the IN DOMAIN qualifier in
    the expression (the bug you found could be fixed).  But this may be
    confusing because you could be in domain A and have rules on data
    recorded from domain B.
    
    2. Make it transparent:
    
    The domain in which the rules are ENABLED could be used for
    determining the domain of the recorded data.  In this case, the IN
    DOMAIN qualifier will not be allowed in a rule expression; but, when
    using the MAP everything will be transparent since the domain
    is implicit on every command.  This means that you can be in Domain A
    and any rules that you Enable will only watch entities in Domain A.
    
    I prefer possibility 2.
    
    Feedback?
    
    aud...

1194.12  either method ok  JETSAM::WOODCOCK  Tue Jul 09 1991 17:21  6
I think method two would be sufficient for our needs. Although someone down
the road might find uses for the first method depending on the domain structure
and how they intend to use it.

regards,
brad...

1194.13  Clarification on .-2  WAKEME::ANIL  Wed Jul 10 1991 11:29  26
Before anyone jumps at us I would like to clarify the following point:

>    2. Make it transparent:
>    
>    The domain in which the rules are ENABLED could be used for
>    determining the domain of the recorded data.  In this case, the IN
>    DOMAIN qualifier will not be allowed in a rule expression; but, when
>    using the MAP everything will be transparent since the domain
>    is implicit on every command.  This means that you can be in Domain A
>    and any rules that you Enable will only watch entities in Domain A.
                                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

     What Alarms will do is issue a SHOW/GETEVENT directive with
     IN_Q filled in. The MM can then choose to ignore the IN_Q
     qualifier and provide the information for the entity. What
     this means is that if the node foo is *not* a member of DOMAIN
     A, Alarms will still get the data to evaluate the rule
     as long as past time has not been specified.

    If you do specify past time, the Historian will have to have
    collected the data for the node foo, which in turn
    will have to be a member of Domain A! (Boy is it complicated!!)
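
The two cases above can be written down as a small decision function. This is a Python sketch of the logic as described in this note only: the function name, the `members` mapping and the `recorded` set are hypothetical, and the live-data branch assumes the MM does ignore IN_Q, which the note says it may choose to do.

```python
def alarms_can_evaluate(entity, domain, members, recorded, past_time):
    """Return True if a rule enabled in `domain` could get data for `entity`.

    members:  dict of domain name -> set of member entities (hypothetical)
    recorded: set of (domain, entity) pairs the Historian holds data for
              (hypothetical)
    """
    if not past_time:
        # Live data: assumed that the MM ignores IN_Q and answers for the entity.
        return True
    # Past time: data must come from the Historian, so the entity has to be
    # a member of the domain and have been recorded there.
    return entity in members.get(domain, set()) and (domain, entity) in recorded


members = {"A": {"bar"}}
recorded = {("A", "bar")}
print(alarms_can_evaluate("foo", "A", members, recorded, past_time=False))
print(alarms_can_evaluate("foo", "A", members, recorded, past_time=True))
```

Node foo, which is outside Domain A, can still be evaluated on live data but not on past time; node bar, a recorded member of Domain A, works in both cases.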

    Hope this helps. ;)

   - Anil