
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

1108.0. "Events happen too quick maybe!" by SNOC01::MISNETWORK (They call me LAT) Fri Jun 07 1991 05:52

    I have a problem that may be related to what was said in 807, but I am
    not sure. 
    
    I have alarms set up as follows -
    
    create mcc 0  alarms rule SNO-ZKO-DECnet-DOWN -
    expression=(occurs(node4 SNOR04 circ zko-1 circuit down)), -
    procedure=disk$userdisk:[tassone.mcc]sprnet_decnet_alarms_broadcast.com;,-
    parameter="user=(tassone,engineer)",queue="MCC_ALARMS$BATCH",-
    category="SNO-ZKO-DECnet-DOWN",-
    description="SNOR04 Nashua DECnet circuit has gone down",-
    exception handler=disk$userdisk:[tassone.mcc]SPRNET_ALARMS_BROADCAST_EXCEPTION.COM;,-
    perceived severity=critical, in domain nashua
    
    create mcc 0  alarms rule SNO-ZKO-DECnet-UP -
    expression=(occurs(node4 SNOR04 circ zko-1 circuit up)), -
    procedure=disk$userdisk:[tassone.mcc]sprnet_decnet_alarms_broadcast.com;,-
    parameter="user=(tassone,engineer)",queue="MCC_ALARMS$BATCH",-
    category="SNO-ZKO-DECnet-UP",-
    description="SNOR04 Nashua DECnet circuit now back up",-
    exception handler=disk$userdisk:[tassone.mcc]SPRNET_ALARMS_BROADCAST_EXCEPTION.COM;,-
    perceived severity=warning, in domain nashua
    
    Both of these alarms are enabled. Today we had a problem with this
    circuit which saw the thing bounce a number of times as the following
    getevnts showed -
    
    Node4 59.823 Circuit ZKO-1
    AT  7-JUN-1991 16:45:57 Any Event
    
    Successfully received events:
    Circuit down, circuit fault
                                     Reason = circuit synchronization lost
                      Adjacent Node Address = 2.10
    
    Node4 59.823 Circuit ZKO-1
    AT  7-JUN-1991 16:45:57 Any Event
    
    Successfully received events:
    Circuit up
                      Adjacent Node Address = 2.10
    
    and a repl/enab=net showed -
    
    $
    %%%%%%%%%%%  OPCOM   7-JUN-1991 16:45:56.97  %%%%%%%%%%%
    Message from user DECNET on SPRNET
    DECnet event 4.7, circuit down, circuit fault
    From node 59.823 (SNOR04),  7-JUN-1991 16:47:45.14
    Circuit ZKO-1, Line synchronization lost, Adjacent node = 2.10
    
    
    $
    %%%%%%%%%%%  OPCOM   7-JUN-1991 16:45:57.32  %%%%%%%%%%%
    Message from user DECNET on SPRNET
    DECnet event 4.10, circuit up
    From node 59.823 (SNOR04),  7-JUN-1991 16:47:46.12
    Circuit ZKO-1, Adjacent node = 2.10
    
    Unfortunately, only the alarm saying the circuit had come up fired. I
    don't know why the circuit down alarm didn't fire. Normally it works
    fine; this has only happened now that the circuit down and circuit up
    events occur so close together!
    
    Is this related to what is said in the release notes -
    "If too many occurrences are requested at one time it is possible that
    the event manager may not be capable of processing them all."
    
    I don't see the events lost message anywhere!
    
    Any ideas, 
    Cheers,
    Louis
    
    p.s. I also had the problem of no events firing. To fix this I
    disabled/enabled the local sink monitor. All worked fine after that.
    
1108.1. "events aren't lost" by TOOK::CALLANDER (Jill Callander DTN 226-5316) Mon Jun 10 1991 00:31 (18 lines)

    No, it doesn't look like the too many occurrences problem, and you
    are right: you would have seen the events lost message (or some other
    error) telling you that the manager lost events. As to why you didn't
    get the alarm, I don't know. Were you certain that the rule was active?
    Did you check whether the exception handler had already fired, or
    whether there was some other reason?
    
    I don't know, but it sure looks like it should have worked. Maybe Jim
    Carey (PL for decnet AM stuff) could shed some light on how the event
    listener for MCC handles getting these, or Matt on how multiple
    requesters for the same event are handled. FYI, Jim: if he is doing
    what I think, the decnet AM should have a getevent from the FCL and a
    getevent from alarms pending at the same time; could this cause
    problems in the AM?
    
    (just guessing)
    jill
    
1108.2. "That isn't a lost event" by TOOK::GUERTIN (I do this for a living -- really) Mon Jun 10 1991 11:21 (23 lines)

    You only get lost events when the volume of events held in the Event
    Pool fills to capacity.  You never get lost events because events occur
    too fast (the putter just blocks).  The only time I've seen this kind
    of behavior is when the request was for the "Next" event (instead of
    a scope of interest), and the next request was too late for the next
    occurrence.
    
    Something like:
    
          Getter Thread                  Putter Thread
          -------------                  -------------
      1) Request Any Next Event
                                     2) Event Occurs
      3) Return Event to Requestor
    
                                     4) Event Occurs
                                        (No requests match so it is thrown away)
      5) Request Any Next Event
         (Waits for Next Event)
    
    Note that Step 5 has missed the Event in Step 4.
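
    The sequence above can be sketched as a toy model. This is only an
    illustration of the "Next event" race, not the DECmcc event manager
    itself: the class NextEventChannel and its methods are hypothetical
    names, and the model assumes an event is delivered only when a
    request is already pending at the moment it occurs.

```python
class NextEventChannel:
    """Toy model of "request the Next event" semantics (hypothetical)."""

    def __init__(self):
        self.pending_request = False  # True while a getter is waiting
        self.delivered = []           # events handed to the requester
        self.discarded = []           # events that matched no request

    def request_next(self):
        # Getter thread: ask for the next event that occurs.
        self.pending_request = True

    def put(self, event):
        # Putter thread: an event occurs.
        if self.pending_request:
            self.delivered.append(event)
            self.pending_request = False
        else:
            # No request is outstanding, so the event is thrown away.
            self.discarded.append(event)


chan = NextEventChannel()
chan.request_next()       # step 1
chan.put("circuit down")  # steps 2 and 3: delivered to the requester
chan.put("circuit up")    # step 4: no request pending, discarded
chan.request_next()       # step 5: now waits for a later event

print("delivered:", chan.delivered)
print("discarded:", chan.discarded)
```

    In this model the second event is never seen by the getter, which
    is the window described in steps 4 and 5.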
    
    -Matt.

1108.3. "wrong event" by NETCUR::WADE (Bill Wade T&N Course Development) Mon Jun 10 1991 12:41 (7 lines)

    I had the same problem.
    
    If you look at the event it is #4.7 (circuit down, circuit fault)
    but you are checking for event #4.8 (circuit down).  Not sure what
    would trigger a 4.8 event?
    
    bill

1108.4. "That's it - no mystery" by SNOC02::MISNETWORK (Take a byte) Tue Jun 11 1991 01:07 (31 lines)

    You found my problem, Bill. It was pretty obvious in the end; I even
    documented it in a previous note, 1093.
    
    Trouble is, how can I check on DECnet outages without having to fire
    several alarms: one looking for circuit down (4.8), one looking for
    circuit down, circuit fault (4.7), not to mention circuit down,
    operator initiated (4.9)?
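
    One way to picture the grouping being asked for is a small sketch
    that treats all three "circuit down" variants as a single condition.
    This is not a DECmcc rule and does not use any DECmcc interface; it
    is a hypothetical stand-alone script that classifies OPCOM message
    lines like the ones quoted in this topic. The function name classify
    and the two tables are assumptions made for illustration; the event
    codes and names are the ones shown in the OPCOM output here.

```python
import re

# Event codes and names as they appear in the OPCOM output in this topic.
CIRCUIT_DOWN_EVENTS = {
    "4.7": "circuit down, circuit fault",
    "4.8": "circuit down",
    "4.9": "circuit down, operator initiated",
}
CIRCUIT_UP_EVENTS = {"4.10": "circuit up"}

# Matches the event code in lines such as "DECnet event 4.7, circuit down, ..."
EVENT_LINE = re.compile(r"DECnet event (\d+\.\d+),")


def classify(line):
    """Return "down", "up", "other", or None if the line is not an event line."""
    match = EVENT_LINE.search(line)
    if match is None:
        return None
    code = match.group(1)
    if code in CIRCUIT_DOWN_EVENTS:
        return "down"
    if code in CIRCUIT_UP_EVENTS:
        return "up"
    return "other"


for sample in (
    "DECnet event 4.7, circuit down, circuit fault",
    "DECnet event 4.8, circuit down",
    "DECnet event 4.9, circuit down, operator initiated",
    "DECnet event 4.10, circuit up",
):
    print(sample, "->", classify(sample))
```

    The point of the sketch is only that 4.7, 4.8 and 4.9 are distinct
    events, so anything that matches on "circuit down" (4.8) alone will
    not see the other two.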
    
    By the way, to get 4.8 to fire, simply turn a DECnet circuit off. You
    will get event #4.9 (circuit down, operator initiated) on the local
    router as follows -
    
    %%%%%%%%%%%  OPCOM  11-JUN-1991 10:24:26.96  %%%%%%%%%%%
    Message from user DECNET on SPRNET
    DECnet event 4.9, circuit down, operator initiated
    From node 59.857 (SNLR01), 11-JUN-1991 11:19:32.22
    Circuit SNA-1, Line synchronization lost, Adjacent node = 59.430 (SNAR01)
    
    and you will get event #4.8 (circuit down) from the remote router as
    follows -
    
    %%%%%%%%%%%  OPCOM  11-JUN-1991 10:24:52.56  %%%%%%%%%%%
    Message from user DECNET on SPRNET
    DECnet event 4.8, circuit down
    From node 59.430 (SNAR01), 11-JUN-1991 10:25:29.64
    Circuit SNL-2, Adjacent node listener receive timeout
    Adjacent node = 59.857
    
    Cheers,
    Louis