Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

401.0. "Alarms internal error" by BCAT::CSENCSITS () Wed Oct 10 1990 21:00

I am a very new user of DECmcc and ALARMS. Checking the documents and this
notes file, I do not see this mentioned. The log files are being created when
the node is reachable. The rule ran about 10 times, then this error appeared in
MCC_ALARMS_10-OCT-1990_ERROR.LOG;1

 >>> 10-OCT-1990 11:42:30.70   MCC 0 ALARMS RULE ywolh1
     Expression = (NODE4 YWOLH1 STATE = ON, AT EVERY 00:15:00)
     Status     = Alarms internal error
MCC 0 ALARMS RULE ywolh1
AT 10-OCT-1990 11:42:30

When the above appeared, the Exception Handler routine did not get called.
Have not found this error documented. Any ideas??

Here is what the alarm is. I am using it somewhat reversed. The procedure
basically purges the .LOG file on successful connects. The exception handler
sends me mail if the connection fails. Eventually it will page me.

                        
MCC> show   MCC 0 ALARMS RULE YWOLH1 all char
MCC 0 ALARMS RULE YWOLH1
Characteristics
AT 10-OCT-1990 15:51:53


Examination of attributes shows:
                              Procedure = USER:[MCC]MCC_DECALERT_ALERT.COM;1
                      Exception Handler = USER:[MCC]MCC_DECALERT_ALERT_EXCEPTION
                                          .COM;2
                            Description = "Node YWOLH1 appears to be
                                          unreachable, Please investigate"
                               Category = "Node Unreachable"
                              Parameter = "YWOLH1 not reachable"
                             Expression = (NODE4 YWOLH1 STATE = ON, AT EVERY
                                          00:15:00)


T.R  Title  User  Personal Name  Date  Lines
401.1  Can't open data file  TOOK::ORENSTEIN  Thu Oct 11 1990 13:39  26
    Hi,
    
    I regret that this message was not put into the release notes.
    
    When a rule fires or an exception occurs, a data file is created and
    the parameters to the command procedure (P1 through P7) are written
    to this datafile, along with the Severity entered on the command line.
    The name of the datafile is MCC_ALARMS_DATA_<numbers>.DAT       
    
    If any error occurs in opening this datafile, the error ALARMS INTERNAL
    ERROR is reported.  Better error messages are on the way in a future
    release and each will be documented.  
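
A minimal Python sketch of the behaviour described above, for illustration only and not the ALARMS code. The function name, the `P1=`/`SEVERITY=` record layout, and the status strings are assumptions; only the file name pattern and the single generic error come from this note.

```python
import os

INTERNAL_ERROR = "Alarms internal error"

def write_alarm_datafile(directory, serial, params, severity):
    """Write P1..P7 and the severity to MCC_ALARMS_DATA_<serial>.DAT.

    Returns (path, status). Any failure to open or write the file is
    collapsed into one generic status, which is why the message alone
    does not tell you what went wrong.
    """
    path = os.path.join(directory, f"MCC_ALARMS_DATA_{serial}.DAT")
    try:
        with open(path, "w") as f:
            # Hypothetical record layout: one KEY=value pair per line.
            for i, value in enumerate(params[:7], start=1):
                f.write(f"P{i}={value}\n")
            f.write(f"SEVERITY={severity}\n")
    except OSError:
        # Missing directory, no space, quota exceeded, bad file name, ...
        return None, INTERNAL_ERROR
    return path, "OK"
```

Pointing `directory` at a location that does not exist or cannot be written reproduces the generic status without saying which cause applied.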
    
    I suspect (since the rule fired about 10 times before this message
    appeared) that you ran out of disk space or file quota.
    
    Depending on which version of ALARMS you are running, this file will
    be located in MCC_COMMON or SYS$SCRATCH.  Also depending on the version
    the datafile may be automatically purged.  The older version places the
    file in MCC_COMMON and does NOT purge - the new version places the file
    in SYS$SCRATCH and DOES purge.
    
    Hope this helps...
    
    	aud...
      
401.2  Alarms internal error  BCAT::CSENCSITS  Thu Oct 11 1990 18:28  104
>>>
>>>    I suspect (since the rule fired about 10 times before this message
>>>    appeared) that you ran out of disk space or file quota.

Don't think that is the problem since I am the only person on this system 
with 2 RA81's, full priv and plenty of space (547k blocks)

I have seen the files you mentioned. 

In testing right now: WETDRY is my 3100 in normal state

MCC> show MCC 0 ALARMS RULE WETDRY_STATE all char
MCC 0 ALARMS RULE WETDRY_STATE
Characteristics
AT 11-OCT-1990 13:10:06


Examination of attributes shows:
                              Procedure = USER:[MCC]MCC_DECALERT_ALERT.COM;2
                      Exception Handler = USER:[MCC]MCC_DECALERT_ALERT_EXCEPTION
                                          .COM;4
                            Description = "Node WETDRY appears to be
                                          unreachable, Please investigate"
                               Category = "Node Unreachable"
                              Parameter = "WETDRY not reachable"
                             Expression = (NODE4 WETDRY STATE = ON, AT EVERY
                                          00:01:30)
 
MCC> show MCC 0 ALARMS RULE WETDRY_STATE all statu
MCC 0 ALARMS RULE WETDRY_STATE
Status
AT 11-OCT-1990 13:10:30


Examination of attributes shows:
                                  State = Enabled
                               Substate = Running
                Time of Last Evaluation = 11-OCT-1990 13:09:26.58
              Result of Last Evaluation = True
MCC>

In my account is MCC_DECALERT_ALERT.LOG which shows the Procedure being 
called and this statement in it:
   DELETE SYS$SCRATCH:MCC_ALARMS_DATA_13105675.DAT;"

Everything is normal.

Now I RESTRICT WETDRY (ncp set exec state restric)
 
The only change is that Result of Last Evaluation = False. No new files were created.

Now will stop access to WETDRY. (ncp set NML proxy none)

First pass shows: 

MCC> show MCC 0 ALARMS RULE WETDRY_STATE all statu
MCC 0 ALARMS RULE WETDRY_STATE
Status
AT 11-OCT-1990 13:21:20


Examination of attributes shows:
                                  State = Enabled
                               Substate = Running
                Time of Last Evaluation = 11-OCT-1990 13:19:56.24
              Result of Last Evaluation = Error
                        Error Condition = "Access control information invalid
                                          at Node

An additional file appeared in my directory: _Exception.log. It shows that my
exception handler got called.

Then in 1:30 min looked again:

MCC> show MCC 0 ALARMS RULE WETDRY_STATE all statu
MCC 0 ALARMS RULE WETDRY_STATE
Status
AT 11-OCT-1990 13:21:30


Examination of attributes shows:
                                  State = Disabled
                               Substate = Disabled by error condition
                           Disable Time = 11-OCT-1990 13:21:27.91
                Time of Last Evaluation = 11-OCT-1990 13:19:56.24
              Result of Last Evaluation = Error
                        Error Condition = "Access control information invalid
                                          at Node


Now in the MCC_COMMON directory this appeared:

DSNMCC::CSENCSITS$ typ USER:[MCC]MCC_ALARMS_11-OCT-1990_ERROR.LOG;1
 >>> 11-OCT-1990 13:21:26.27   MCC 0 ALARMS RULE WETDRY_STATE
     Expression = (NODE4 WETDRY STATE = ON, AT EVERY 00:01:30)
     Status     = Alarms internal error
MCC 0 ALARMS RULE WETDRY_STATE
AT 11-OCT-1990 13:21:26
                           
I hope this helps and does not confuse. It appears that an access problem to the
node caused the internal error. When this happens, neither my procedure nor my
exception handler is called.

(Normally I would not test a node every 1:30, but it shows errors fast this way.)
401.3  What happens on subsequent polls?  GOSTE::CALLANDER  Mon Oct 15 1990 17:47  9
    Aud,
    
    Could this be due not to the firing of the exception handler but
    to the next poll? When it goes back to the DNA4 AM after the
    exception to show the attributes again, you will get a DNA4
    exception because the node isn't there (something along the
    lines of node does not exist or is not known to local node). In
    this case, what will alarms do?
    
401.4  ALARMS handles multiple exceptions  TOOK::ORENSTEIN  Mon Oct 15 1990 20:08  47
    ALARMS has no knowledge of, and does not care, whether an exception 
    is the first, second or the Nth.  An exception is an exception is an
    exception.
    
    The algorithm ALARMS uses to evaluate rules is the following:
    
    GET DATA:
    
       This polls the entity for its attributes.
    
    CHECK DATA:
    
       If a RESPONSE is returned: 
          (EVALUATE (see below))
    
       If an EXCEPTION is returned: 
          (write to datafile - this is where ALARMS_INTERNAL_ERROR can happen)
          (queue to the batch queue the user's exception handler)
          (If handle on data is MORE (for more polls) go to GET DATA)
          
    EVALUATE:
      
       If more data is needed to evaluate the rule (CHANGE_OF perhaps)
          GET DATA.
       If expression evaluates to true:
          (write to datafile - this is where ALARMS_INTERNAL_ERROR can happen)
          (queue to the batch queue the user's command procedure)
          (If handle on data is MORE (for more polls) go to GET DATA)
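
The steps above can be sketched in Python roughly as follows. This is an illustration only, not the ALARMS source: `evaluate_rule`, the shape of the `polls` tuples, and the two callbacks are hypothetical names, the CHANGE_OF case that needs more data is left out, and skipping the batch job when the data file cannot be written is an assumption based on the behaviour reported in .2.

```python
def evaluate_rule(polls, expression, write_datafile, queue_batch):
    """Walk GET DATA / CHECK DATA / EVALUATE over a sequence of polls.

    polls: iterable of ("response", attrs) or ("exception", message),
           a stand-in for polling the entity.
    expression: callable taking attrs, returning True or False.
    write_datafile: callable returning True on success, False on failure.
    queue_batch: callable given "procedure" or "exception handler".
    Returns a list describing what happened on each poll.
    """
    log = []
    for kind, payload in polls:              # GET DATA
        if kind == "exception":              # CHECK DATA: exception returned
            target = "exception handler"
        elif expression(payload):            # EVALUATE: rule is true
            target = "procedure"
        else:
            log.append("no action")
            continue
        # Both paths write the data file first; this is the only step
        # where the internal error is said to arise.
        if not write_datafile(payload):
            log.append("Alarms internal error")
            continue                         # assumption: nothing is queued
        queue_batch(target)
        log.append(f"queued {target}")
    return log
```

Note that the exception path and the rule-fired path are treated the same way after the decision is made, which matches the point that ALARMS does not care whether an exception is the first or the Nth.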
          
    Yes, ALARMS does keep counters of how many exceptions have occurred,
    but this is only to help the user see what's going on during
    evaluation.
    
    As I have mentioned before, an ALARMS_INTERNAL_ERROR can only be
    produced if the data file can not be opened.  Two things can cause this
    to happen: 
    
    1.  A bug in ALARMS corrupts the file name so that it is not a legal
        VMS file specification.
    
    2. Something in the user's environment is preventing the file from
       being opened.
    
    We are investigating this with the author and will report back with
    the problem (and solution).
    
    aud...
401.5  Alarms Internal Error bug .. Fixed !!  WAKEME::ROBERTS  Keith Roberts - DECmcc Alarms Team  Tue Oct 16 1990 18:08  12
With John's help, we have found and corrected the Alarms Internal Error bug.

The next DECmcc kit will contain the fix - if anyone else experiences the
"Alarms Internal Error" with this kit (x1.0.1) -- then please send me
mail.

Thanks,

Keith Roberts
WAKEME::ROBERTS
(dtn) 226-5394
401.6  EXCEPTION not liked on ALARMS creation  ADO75A::SHARPE  C is bliss?  Wed Nov 07 1990 21:08  13
    What syntax do I use to register the alarm ... The version of DECmcc
    that I am using declares itself to be: DECmcc (X1.1.0).
    
    However, it will not accept an exception statement on a 
    create mcc 0 alarms ... command ...
    
    How do I do it, or what have I done wrong (including installation
    errors)?
    
    I am trying to detect when a node has gone down.
    
    Regards
    Richard Sharpe
401.7  Belay that last request, me hearties!  ADO75A::SHARPE  C is bliss?  Wed Nov 07 1990 21:27  7
    Enter stage left with sheepish look on face.
    
    I looked through the command procedures in mcc_common: and found the
    syntax. Seems like it should be "exception handler", not just exception.
    
    Regards
    Richard Sharpe
401.8  EX is ambiguous -- common error as well  GOSTE::CALLANDER  Mon Nov 26 1990 17:37  5
    Actually, to be really clear, a problem I have seen a few times is
    that people abbreviate the argument to EX, which is ambiguous in
    the alarms syntax because of EXpression and EXception handler.
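
To show why EX fails while a longer abbreviation works, here is a small Python sketch of unique-prefix matching. It is a generic illustration, not the DECmcc parser: `resolve_keyword` is a hypothetical helper, and the argument list is simply the set of names shown by the SHOW output earlier in this topic.

```python
def resolve_keyword(abbrev, keywords):
    """Return the single keyword that starts with abbrev (case-insensitive).

    Raises ValueError if the abbreviation matches no keyword or more
    than one keyword.
    """
    wanted = abbrev.upper()
    matches = [k for k in keywords if k.upper().startswith(wanted)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown keyword: {abbrev}")
    raise ValueError(f"ambiguous keyword {abbrev}: {', '.join(matches)}")

# Argument names taken from the SHOW output earlier in this topic.
ALARM_ARGS = ["Procedure", "Exception Handler", "Description",
              "Category", "Parameter", "Expression"]
```

With this list, `resolve_keyword("EXC", ALARM_ARGS)` and `resolve_keyword("EXP", ALARM_ARGS)` each resolve to one argument, while `resolve_keyword("EX", ALARM_ARGS)` raises because both Exception Handler and Expression match.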