Conference azur::mcc

Title:	DECmcc user notes file. Does not replace IPMT.
Notice:	Use IPMT for problems. Newsletter location in note 6187
Moderator:	TAEC::BEROUD

Created:	Mon Aug 21 1989
Last Modified:	Wed Jun 04 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	6497
Total number of notes:	27359
3438.0. "Success Stories for V1.2" by CTHQ3::WOODCOCK () Mon Jul 27 1992 15:55

Hi there,

Just thought it might be helpful to start a success note to help the sellers
of MCC. I seem to be asked somewhat regularly how we get the job done so I
wrote this up. Anyone with a success story might help the cause with a reply
regardless of whether the technical description is included. At times any 
support conference can get a little 'tense' because the focus is always on 
what's broken rather than what works. For the record:

	**************** WE'VE NEVER HAD IT SO GOOD ****************

        ************************* THANKS ***************************

Sincerely,
brad...
 

............................................................................

ESC DECmcc V1.2 Implementation Overview

Brad Woodcock
ESC Consulting
last revision 7/27/92


INTRODUCTION:

The following describes the DECmcc V1.2 implementation for the
Enterprise Service's Center (ESC) of Digital Equipment Corporation. It is
intended as a learning and marketing aid for others.

DECmcc V1.2 provides an integrated approach to managing objects using the
EMA platform. This approach has led to a very granular product that can manage
multiple objects in similar fashions. Granularity can at times be perceived as
complexity; based on our success with the product, I view it as functional
richness. In the context of this product, granularity means there are multiple
methods of managing objects, and the right one depends on the business needs.

This implies that the ESC solution and methods described below are not the
only available options, but one set that meets specific ESC management needs.
In fact, the ESC's use of any management product is kept simple because of the
size of the overall management environment. As the ESC's overall management
needs change, so does its implementation of DECmcc.


ENVIRONMENT:

DECmcc is used within the ESC mainly in the Data Networks arena. This includes
the direct monitoring and management of DEC's internal network backbone (BB)
structures for DECnet_IV, TCP/IP, and WATN (Wide Area Terminal Network) within
the U.S. Future needs still to be evaluated include OSI routers as the network
transitions, PBXs, and direct X.25 product management. The following is a
breakdown of each protocol's direct management needs from the ESC today:

	DECnet:  ~70   DECnet_IV routers (DECrouter 2000s)
	          24   Load hosts (MicroVAXes)
	         ~80   DECnet circuits (BB->BB & regional connects into BB)

	TCP/IP:  ~10   IP routers (Wellfleet)
	         ~15   IP circuits (BB->BB)

	  WATN: ~100   WATN hosts

The DECmcc platform includes an 8810 VMS V5.4-3 multi-tasked system for the
monitoring of all networks. This system is in use for historical reasons as it
has always been used for network monitoring; the monitoring applications have 
simply been extended/migrated to DECmcc. Graphical workstations are used as 
display devices for managing network entities and maps from DECmcc on the 8810.

Entity registration is accomplished using a private DECdns namespace
implemented directly on the 8810 system. The private namespace is also partly
a legacy decision. The ESC implementation dates back to 1990, when issues of
security, scalability, and administration prevented DECmcc's use in the
corporate namespace. These issues have since been resolved, and evaluation of
and transition to the corporate namespace for the ESC's DECmcc use will be done
as time permits.


DOMAIN and MAP DESCRIPTIONS:

There are two map structures in use today: one for DECnet/IP and one for WATN.
The DECnet/IP structure's top-level map (named .WORLDBB) contains 17
second-level domains (named <site_code>-<decnet_area>). The WORLDBB map has
a backdrop of the US, Western Europe, and a blow-out of New England. The
backdrop was created with AutoCAD and then converted to the DECmcc format
using an internally developed tool found in the NOTED::MCC conference. The
domains are placed geographically on the map, with the lines between them
drawn as NODE4 CIRCUIT child entities. The global node4 entities are also
placed within this domain, but outside the visible area of the map (off to the
side). This enables the lines to change color with alarms and events. All IP
routers are also placed within a section of the map, with lines drawn as SNMP
INTERFACES so that they change color as well. Site domains and IP routers are
depicted with custom icons created in DECpaint.

Twelve of the 17 second-level domains contain a connectivity diagram showing
all DECnet routers and load hosts managed by the ESC at the corresponding US
sites. Viewing the WORLDBB map gives the status of the entire US DECnet and IP
backbones, including all DECnet regional and international connectivity into
the US backbone.

The WATN network is built on the public Tymnet network, using Tymnet nodes and
Xyplex-based hosts. The top-level domain (.WATN.WATN) contains 16 domains
named after Tymnet nodes across the US. Each domain contains a data collector
icon and a series of reference entities representing the logical hosts being
monitored for availability and status.


DECnet/IP MONITORING:

We currently use two methods for DECnet monitoring: events and polling. DECnet
events (4.7 and 4.10) are used only for updating the map; running batch
activity on every event would be too resource-intensive for an environment of
this size. DECnet events are 'sinked' from all routers to the DECmcc node,
which is also set up as a local sink using the MCC_DNA4_EVL process. A NOTIFY
request is issued at the top level, and individual TARGET commands are issued
for each domain, so that lines on the map are updated in real time.

	Notify Command (worldbb only):

	Notify Domain .worldbb Entity List = (node4 * circuit *), 
        Events = (circuit down circuit fault,circuit up)

The above command is saved and recalled by MCC at every map startup. Both 
events are put into the same request so they will CORRELATE and change color
properly to reflect the status of links. The TARGET commands are used to 
define the severity (color) for both events for EACH domain.

	Target Commands (each domain):       

	EVENT SOURCE           EVENT                            TARGET SEVERITY
	-----------------------------------------------------------------------
	node4 * circuit *      circuit up                       clear
	node4 * circuit *      circuit down circuit fault       critical
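As a rough illustration of how the NOTIFY request and the TARGET table combine, here is a minimal Python sketch (not DECmcc code). It assumes the map simply shows the severity of the most recent event per circuit; the color names and the circuit names in the test are hypothetical.

```python
# Hypothetical model of the per-domain TARGET table: event -> severity.
TARGET_SEVERITY = {
    "circuit up": "clear",
    "circuit down circuit fault": "critical",
}

# Illustrative severity-to-color mapping (assumed, not DECmcc's palette).
SEVERITY_COLOR = {"clear": "green", "critical": "red"}


def apply_events(events):
    """Return the display color of each circuit after a stream of events.

    events: iterable of (circuit, event) pairs in arrival order.
    Because both events belong to one NOTIFY request, a later event
    for the same circuit replaces the earlier one (correlation).
    """
    colors = {}
    for circuit, event in events:
        severity = TARGET_SEVERITY.get(event)
        if severity is None:
            continue  # no TARGET entry for this event: ignore it
        colors[circuit] = SEVERITY_COLOR[severity]
    return colors
```

A circuit that goes down and then comes back up ends up "green", because the "circuit up" event supersedes the earlier fault.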


The second method of monitoring is polling alarms. Because events provide
real-time map updates, polling is set to only every 30 minutes as a 'backup'
and off-hours monitor. For DECnet circuits, the following alarm is used for
each domain. Note that each router resides in only ONE second-level domain, so
each router is polled only once. All circuits not in use are set to the OFF
state. When these rules fire, they update log files; mail and DECalert
(paging/voice) calls are issued only once per continuous link outage. The
alarms also change the map color.

	DECnet ALARM rule (12, one per domain):

	MCC> show domain .pko-24 rule * all char

	Domain LUVBOT_NS:.pko-24 Rule poll_PKO-24
	AT  6-JUL-1992 11:38:00 Characteristics

	Examination of attributes shows:
                     Alarm Fired Procedure = DISK$MCC:[MCC.COM]CKT_DOWN.COM;1
                 Alarm Exception Procedure = DISK$MCC:[MCC.COM]NODE_DOWN.COM;1
                               Batch Queue = "mcc$batch"
                                Expression = (node4 * circuit * substate <>
                                             none,at every 0:30:0)
                                  Severity = Critical
                            Probable Cause = Unknown
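The rule's expression can be read as a simple predicate applied at each 30-minute poll. The Python sketch below assumes, as the note about OFF circuits implies, that both a normally running circuit and a circuit in the OFF state report no substate (modeled here as the string "none"), so only circuits in a transitional or failed substate fire. It illustrates the rule's logic only, not how DECmcc evaluates rules internally.

```python
def fired_circuits(substates):
    """Circuits for which 'substate <> none' is true at one poll.

    substates: dict of circuit name -> polled substate. Running and OFF
    circuits are assumed to report no substate, modeled as "none".
    Returns the matching circuit names in sorted order.
    """
    return sorted(c for c, s in substates.items() if s != "none")
```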

The ESC also plans to implement a rule that watches CIRCUIT DOWNS, to better
manage bouncing circuits.

The following rule is used for IP circuits. Wildcarding cannot be used for IP,
because unused backup circuits still report an ifOperStatus of DOWN and would
fire the alarm. Therefore a separate alarm is needed for each of the 15
circuits.

	IP ALARM rule:

	MCC> show domain .worldbb rule PALO_ALTO_W_5 all char

	Domain LUVBOT_NS:.worldbb Rule PALO_ALTO_W_5
	AT  6-JUL-1992 11:40:52 Characteristics

	Examination of attributes shows:
                   Alarm Fired Procedure = DISK$MCC:[MCC.COM]CKT_DOWN.COM;1
               Alarm Exception Procedure = DISK$MCC:[MCC.COM]NODE_DOWN.COM;1
                             Batch Queue = "mcc$batch"
                              Expression = (snmp LUVBOT_NS:.PALO_ALTO_W inter 5
                                           ifoperstatus=down,at every 0:30:0)
                                Severity = Critical
                          Probable Cause = Unknown
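Since each IP circuit needs its own rule, the expressions can be generated from a list of in-use circuits. This Python sketch only builds expression strings in the same format as the displayed rule above, using the <router>_<interface> naming seen in PALO_ALTO_W_5; the function name is hypothetical, and it does not issue any MCC commands.

```python
def ip_rule_expressions(in_use, namespace="LUVBOT_NS"):
    """Build one alarm expression per in-use IP circuit.

    in_use: (router_name, interface_index) pairs for production circuits
        only; unused backup interfaces are left out on purpose because
        they always report ifOperStatus = down.
    Returns a dict of rule name -> expression text.
    """
    rules = {}
    for router, index in in_use:
        name = f"{router}_{index}"
        rules[name] = (
            f"(snmp {namespace}:.{router} inter {index} "
            f"ifoperstatus=down,at every 0:30:0)"
        )
    return rules
```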

The command procedures used by the alarm rules are multi-purpose, as indicated
above. These procedures write to two different log files: one for the current
month and one for the current half-hour. Each has a timestamp built into its
filename. In many instances both ends of a DECnet circuit are polled because of
the wildcarding used in the alarms, so the alarm procedure checks a
configuration file to ensure the circuit is reported through batch from only
one end. Also, if the same circuit entry is present in the last half-hour log
file, mail and DECalert calls are not issued, which avoids repeated mail during
long outages. A menu-driven procedure then uses these log files to give the
status (current, daily errors, monthly errors) of all the networks within
seconds, which is especially useful off-hours. This procedure can also be
'launched' from the map application's pull-down menu.
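The decision logic in the alarm procedures can be sketched as follows. This is a Python illustration, not the actual DCL in CKT_DOWN.COM; the log file names and the shape of the configuration data are assumptions.

```python
from datetime import datetime


def log_file_names(now: datetime):
    """Hypothetical monthly and half-hour log names with built-in timestamps."""
    slot = "00" if now.minute < 30 else "30"
    monthly = f"CKT_{now:%Y%m}.LOG"
    half_hour = f"CKT_{now:%Y%m%d_%H}{slot}.LOG"
    return monthly, half_hour


def decide(circuit, reporting_node, owner_of, previous_half_hour):
    """Return (log_it, notify) for one fired alarm.

    owner_of: circuit -> node chosen (in a config file) to report it,
        so a circuit polled from both ends is reported only once.
    previous_half_hour: circuits found in the last half-hour log file;
        a repeat means the outage is continuing.
    """
    if owner_of.get(circuit) != reporting_node:
        return False, False   # the other end reports this circuit
    if circuit in previous_half_hour:
        return True, False    # continuing outage: log, but no new mail/page
    return True, True         # new outage: log, mail, and call DECalert
```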


WATN MONITORING:

The WATN monitor was built using data collectors. An X.25 connection into a
Tymnet node allows us to retrieve events related to this network. A program
was written that accepts these events and sends them to DECmcc using the data
collector functionality; the MCC_EVC_SINK process is run for this purpose.
A NOTIFY command is issued automatically at each map startup:

	Notify Domain .watn.watn Entity List = (collector *), 
	Events = (any event)

This allows host status to be reflected on the map as hosts become
unavailable or available on the network, and is used for Tymnet vendor
management. An alarm rule was also written for sending mail, calling DECalert,
and updating log files. Network status can also be determined with the same
menu-driven command procedure described above, which provides complete
integration of status for all networks. The alarm rule is as follows:

	MCC> show mcc 0 alarms rule watn_host_status all char

	MCC 0 ALARMS RULE watn_host_status
	AT 24-JUL-1992 16:58:59 Characteristics

	Examination of attributes shows:
                             Procedure = DISK$MCC:[MCC.COM]WATN_HOST_STATUS.COM
                                         ;1
                                 Queue = "mcc$batch"
                            Expression = (occurs(collector * any event))
                    Perceived Severity = Indeterminate
                        Probable Cause = Unknown
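The rule fires on every collector event, so a procedure such as WATN_HOST_STATUS.COM must decide what each event means for host status. The following Python sketch assumes events have already been decoded into (host, state) pairs and that only state changes are of interest; both the event format and the change filter are assumptions.

```python
def apply_host_events(status, events):
    """Update a host status table from decoded collector events.

    status: dict host -> "available" or "unavailable" (updated in place).
    events: iterable of (host, state) pairs; this decoded form of the
        Tymnet events is a hypothetical stand-in.
    Returns the (host, state) changes in order; repeats of an unchanged
    state are filtered out (an assumption, since the rule itself fires
    on every event).
    """
    changes = []
    for host, state in events:
        if status.get(host) != state:
            status[host] = state
            changes.append((host, state))
    return changes
```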

An additional alarm rule verifies the operation of the DECmcc system itself
and is designed to fire every half hour. It updates the map and places an
entry into the current half-hour status log file. Its characteristics are as
follows:

	MCC> show domain .worldbb rule test_poll all char

	Domain LUVBOT_NS:.worldbb Rule test_poll
	AT 27-JUL-1992 11:07:53 Characteristics

	Examination of attributes shows:
                   Alarm Fired Procedure = DISK$MCC:[MCC.COM]CKT_DOWN.COM;1
               Alarm Exception Procedure = DISK$MCC:[MCC.COM]NODE_DOWN.COM;1
                             Batch Queue = "mcc$batch"
                              Expression = (node4 LUVBOT buffer size>500,at
                                           every 0:30:0)
                                Severity = Minor
                          Probable Cause = Unknown
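The expression appears chosen so that it always holds, making the rule fire on every poll as a heartbeat. A consumer of the half-hour log could then treat a missing entry as a sign that DECmcc itself has stopped. Here is a minimal Python sketch of such a staleness check; the function and the 5-minute grace period are hypothetical.

```python
from datetime import datetime, timedelta

POLL = timedelta(minutes=30)   # the rule fires "at every 0:30:0"
GRACE = timedelta(minutes=5)   # hypothetical allowance for late polls


def monitor_alive(last_heartbeat: datetime, now: datetime) -> bool:
    """True if the newest test_poll log entry is recent enough."""
    return now - last_heartbeat <= POLL + GRACE
```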

All the above alarm rules for all networks are enabled within a single batch
process. Rules are enabled domain by domain, with a 90-second wait between
domains to give the wildcarded rules time to execute and to spread the load.
This alarm process shuts down at midnight each night and restarts
automatically.
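The staggered start can be pictured as a simple schedule. The Python sketch below computes when each domain's rules would be enabled with a 90-second wait between domains; the function name and the domain list in the test are hypothetical.

```python
from datetime import datetime, timedelta

WAIT = timedelta(seconds=90)  # "a minute and a half" between domains


def enable_schedule(domains, start: datetime):
    """Return (domain, enable_time) pairs spaced one WAIT apart."""
    return [(domain, start + i * WAIT) for i, domain in enumerate(domains)]
```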


DECnet/IP METRICS:

Availability metrics are derived from the monthly log files created by the
alarms fired during a given month. A procedure determines the number of
circuits, the number of days in the month, and the poll rate, then searches
the log file for the number of errors. From these it calculates availability
for routers and circuits for both DECnet and IP.
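The availability calculation can be written out explicitly. This Python sketch assumes each logged error represents one failed poll of one circuit, so the denominator is circuits x days x polls per day (48 at the 30-minute rate). The exact formula in the ESC procedure is not given here.

```python
def poll_samples(circuits, days, poll_minutes=30):
    """Total circuit polls in the period (48 per circuit per day at 30 min)."""
    return circuits * days * (24 * 60 // poll_minutes)


def availability(errors, circuits, days, poll_minutes=30):
    """Fraction of polls that found the circuit up.

    Assumes each logged error is one failed poll of one circuit.
    """
    return 1.0 - errors / poll_samples(circuits, days, poll_minutes)
```

For example, 80 circuits over a 30-day month give 115,200 polls, so 1,152 logged errors correspond to 99% availability.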

The other major concern for metrics is circuit utilization. These metrics are
used both for current performance issues and for long-term trend analysis in
upgrade assessments. RECORDed information for all DECnet and IP circuits is
set up as follows:

	DECnet circuit counters: hourly
	   DECnet circuit char.: daily
	      DECnet line char.: daily

	  IP interface counters: hourly
	    IP interface status: daily

These are the minimum attributes that must be recorded for DECmcc to calculate
circuit/interface statistics. The hourly statistics available through DECmcc
commands with this setup are sufficient for 99% of the current performance
issues analyzed on these circuits.
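For reference, the usual way to derive utilization from two readings of a byte counter is shown below in Python. This is the generic formula, not necessarily the exact computation DECmcc performs, and it assumes the counter has not latched or wrapped between readings.

```python
def utilization_percent(count_start, count_end, seconds, line_bps):
    """Percent of line capacity used in one direction over an interval.

    count_start, count_end: byte counter readings at the interval ends
        (assumed not to have latched or wrapped in between).
    seconds: interval length; line_bps: line speed in bits per second.
    """
    bits = (count_end - count_start) * 8
    return 100.0 * bits / (seconds * line_bps)
```

For example, 2,016,000 bytes sent in one hour on a 56 kbit/s line works out to 8% utilization.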

Long-term trend and upgrade assessments require a roll-up process that
condenses the data into a practical amount of information. Daily and monthly
averages are required for ESC needs. Over the years, 7x24 data has proven
ineffective, so a 5x8 approach has been taken. Focusing on 5x8 is more
practical because this timeframe is more user-sensitive (read: interactive) to
performance issues. 7x24 tends to 'water down' averages and can hide problems
until it's too late (e.g., the phone rings).
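The 'watering down' effect is easy to demonstrate. The Python sketch below averages a week of hourly utilization samples both ways; the 08:00-16:00 window is an assumed definition of 5x8, and the sample week is synthetic.

```python
from datetime import datetime, timedelta


def in_5x8(ts: datetime, start_hour=8, end_hour=16) -> bool:
    """Monday-Friday samples inside an assumed 08:00-16:00 window."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


def average(samples, keep=lambda ts: True):
    """Mean of the (timestamp, value) samples accepted by keep()."""
    values = [v for ts, v in samples if keep(ts)]
    return sum(values) / len(values) if values else 0.0


def demo_week(start: datetime, busy=60.0, idle=5.0):
    """One synthetic week of hourly samples: busy inside 5x8, idle outside."""
    samples = []
    for h in range(7 * 24):
        ts = start + timedelta(hours=h)
        samples.append((ts, busy if in_5x8(ts) else idle))
    return samples
```

For a circuit running 60% busy during the working window and 5% otherwise, the 5x8 average is 60%, while the 7x24 average is only about 18%.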

Procedures have been developed that automate this process and require as little
interaction as possible. A self-run procedure executes each night, producing
overall utilization and congestion figures for each circuit for the 'working'
hours (M-F). A single entry is made into a file for each circuit each night. A
monthly process then averages these numbers into a single entry for the entire
month, stored in a separate file. The result is two files for each managed
circuit: one holding an entry with each working day's averages, and another
containing the monthly averages of all working days. Graphs can be created
individually for any circuit from either the daily or the monthly data.
However, graph production is not automated for every circuit each month,
because of the scope of the management environment.
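The nightly and monthly roll-up reduces to two averaging steps. Here is a minimal Python sketch, assuming each night's input is the list of hourly working-hour utilization values for one circuit; file handling and the congestion figures are left out, and the function names are hypothetical.

```python
from statistics import mean


def nightly_entry(day, working_hour_utilization):
    """One record per circuit per working day: that day's 5x8 average."""
    return day, mean(working_hour_utilization)


def monthly_entry(month, daily_entries):
    """One record per circuit per month: mean of the working-day records."""
    return month, mean(value for _, value in daily_entries)
```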

Future implementations will most likely be converted to use the EXPORT and Rdb
features.
T.R	Title	User	Personal Name	Date	Lines
3438.1		CSOADM::ROTH	I'm getting closer to my home...	Wed Jul 29 1992 12:59	3
	Thank you for this informative post! Lee