Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

1551.0. "%MCC-E-TRANSMITERROR & %SYSTEM-DEVINACT" by ANTIK::WESTERBERG (Stefan Westerberg CS Stockholm) Thu Sep 26 1991 10:41

Has anybody seen this alarm message:

Requested operation cannot be completed
%MCC-E-TRANSMITERROR, error trying to transmit a packet
%SYSTEM-F-DEVINACT, device inactive

I get this alarm message 1-6 times a day from an alarm rule for one Translan 350
bridge and from one LAN Bridge 150 (it's always the same bridge).
The bridges aren't located on the same network (Two different customers).

Any bright ideas ?
1551.1. "Bright ideas..." by CHRISB::BRIENEN (DECmcc Bridge|Station|SNMP Management.) Thu Sep 26 1991 12:56
The errors you are seeing probably come from the MCC_EA routines (which
are used by Bridge AM, TransLAN AM, Ethernet AM, and Concentrator AM).

The type of error would indicate that the Target entity is not the problem,
but rather the Ethernet Host Port (guess: it's being reset due to some error
condition).

Can you provide more information about the system these errors are
appearing on (e.g., what are the ethernet host ports? DELUAs? DEBNAs?
what type/version of vax system/vms software? Is the ethernet host
port being heavily used?)...

						Chris
1551.2. "They aren't member of any idle club !" by ANTIK::WESTERBERG (Stefan Westerberg CS Stockholm) Thu Sep 26 1991 14:55
I have seen this type of error on a VAXstation 3100 M38 and a VAX 4000-300.
The VMS version on both systems is 5.4-2.

The load on the 3100 is about 3 exports every 120s and 11 alarms every 60s, and
on the 4000 it is 30 exports every 120s and 30 alarms every 30s.
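To compare the two hosts, here is a rough request-rate estimate. It is a sketch under a hypothetical simplification: each export cycle and each alarm evaluation is counted as exactly one management request, although an export of several line attributes probably issues more. The third entry uses the 60s alarm interval that .3 later gives as the correct figure for the 4000-300.

```python
# Rough request-rate estimate for the two MCC hosts described above.
# ASSUMPTION (hypothetical): one management request per export cycle and
# per alarm evaluation; multi-attribute exports probably issue more.

def requests_per_second(jobs):
    """jobs: list of (count, interval_seconds) pairs."""
    return sum(count / interval for count, interval in jobs)

hosts = {
    "VAXstation 3100 M38": [(3, 120), (11, 60)],       # (exports), (alarms)
    "VAX 4000-300": [(30, 120), (30, 30)],
    "VAX 4000-300 (alarms every 60s)": [(30, 120), (30, 60)],
}

for name, jobs in hosts.items():
    rate = requests_per_second(jobs)
    print(f"{name}: {rate:.3f} requests/s, {rate * 3600:.0f} requests/hour")
```

By this crude measure, the 4000-300 with 30s alarms handles six times the request rate of the 3100.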

/Stefan
1551.3. "Still problems" by STKMCC::LUND Fri Oct 18 1991 12:26
Hello

We have patched the EZDRIVER on the 4000-300 (CSCPAT_0252), but the problem still
exists. I have seen this on different sites, and the problem seems to be
related to the load on the MCC host.
Is 30 exports every 120s and 30 alarms every 60s (not 30s) too much for MCC on a
VAX 4000-300?
We are using the Translan AM for almost all alarms and exports, accessing the
bridges via 10 Mbit Ethernet.

If the error message indicates a problem on the MCC host, this should be
investigated.
No problems are seen in the MCC host's Ethernet line counters, and the VMS
version is 5.4-2.

Regards Niklas.
                    

1551.4. "120s" seconds? by TOOK::CALLANDER (MCC = My Constant Companion) Tue Oct 29 1991 18:44
    How much are you set up to record/export: all partitions
    or only some? And by 120s, do you mean every 120 seconds
    (2 minutes)? If so, that's 1 minute between rules and
    2 between exports. If that is the case, the intervals might be
    a bit close together, depending on the load on your net, the load
    on your system, the memory in the system, and what type of
    devices you are looking at.
    
    I will send this off to the Translan guy to see if he knows what the
    characterization of their AM is in regards to overhead/response time to
    show requests.
    
    thanks
    
1551.5. "try longer intervals; less system load." by TOOK::MCPHERSON (i'm only 5 foot one...) Tue Oct 29 1991 19:18
    The briefest exporting interval I've ever been able to make work (with
    the Translan AM) was 00:05:00. Note that this was with *no* alarms
    outstanding and no other exporting going on.   

    I haven't done any sort of workload characterization of the Translan
    AM.  Has anyone else? 


MCC> set person doug opinion_flag = true

PERSON MCDOUG_NS:.Doug
    AT 29-OCT-1991 17:12:31 Characteristics

    Modification completed successfully.
                         opinion_flag = TRUE
MCC>
MCC> show person doug Personal Opinion

PERSON  MCDOUG_NS:.Doug
AT 29-OCT-1991 17:14:31 Characteristics

         "I'm not sure it's even worth it to try to export on that
          brief an interval (for long-term exporting) since you'll
          create Sagans and Sagans of attribute data records...  and
          your reporter (DTR32 or DECdecision) is going to have to
          dutifully plow through it..."

MCC> 
MCC> set person doug opinion_flag = false

BRIDGE KAJUN_NS:.br4
PERSON  MCDOUG_NS:.Doug
AT 29-OCT-1991 17:14:35 Characteristics

    Problem modifying attribute.
                             opinion_flag = TRUE

MCC> exit
1551.6. "get help now ;-)" by MKNME::DANIELE Tue Oct 29 1991 19:29
	You been doin' this too long Doug.
1551.7. "Customer aren't happy !" by ANTIK::WESTERBERG (Stefan Westerberg CS Stockholm) Wed Nov 06 1991 19:19
This is a very irritating exception that constantly lights up the screen,
and the customer isn't happy with that at all.

A load of 25 Translan exports with a 2-minute duration and about 120 alarms
with durations spanning from 1 hour to 15s doesn't sound like an overwhelming
load for an 8-VUPS VAX 4000-300. In fact, it seems that we will have to increase
the number of alarms to close to 700. So this problem has to be solved if we are
going to be able to trust that the alarms triggered are real, not false!

Has anybody else seen this type of behaviour?

We need a fix for this very soon.

/Stefan
1551.8. "Did you try lengthening the EXPORT interval?" by MCDOUG::MCPHERSON (My object paradigm needs integration...) Thu Nov 07 1991 15:09
Did you try lengthening the export interval on the Translans (as I suggested 
earlier)?   

Again, the shortest export interval that *I* was able to make work was 5 min.
That might help lessen the load on the ethernet interface somewhat. 

This is just a guess:

When you do an 
	NCP> sho known line count 
	NCP> show known circuit count
are you seeing a lot of "System buffer unavailable" or similar counters? 

If so, you _may_ be able to alleviate the problem by upping some sysgen
parameters (that elude me right now... Maybe lrpcount? srpcount?  dunno.  Help?)

I know you're looking for a *solution* and not more questions, but please try to 
work with us to isolate the problem.

If anyone else out there has any ideas, please feel free to chime in.

./doug


1551.9. "Some explanations" by STKMCC::LUND (Niklas Lund) Thu Nov 07 1991 17:49
Hello Doug !

Thanks for helping us with this problem.

>>Did you try lengthening the export interval on the Translans (as I suggested 
>>earlier)?   

No, we haven't, and that's because this is a LIVE network monitoring system.
We are managing a big financial Value Added Network with 30+ Translan bridges
in it.
We must be able to detect problems, like broken lines, in the network within
30 seconds.
The utilization graphs that we produce daily for each 64 Kbps line should not
have polling intervals longer than 120s (60s for the most important lines).

We are exporting all line attributes on 5 Translan bridges with 5+ synchronous
ports active.
The exports work well most of the time, and we get an Rdb database of
23000 blocks each day.
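The storage cost of this export load can be worked out from the figures given. This sketch uses the 23000 blocks/day from the note and the 512-byte VMS disk block; the 30-day month is only an illustrative assumption.

```python
# Rough storage and sampling arithmetic for the export database above.
# A VMS disk block is 512 bytes; 23,000 blocks/day is the figure from
# the note. The 30-day month is an illustrative assumption.

BLOCK_BYTES = 512
BLOCKS_PER_DAY = 23_000
SECONDS_PER_DAY = 24 * 60 * 60

def polls_per_day(interval_s):
    """Number of export samples one entity produces per day."""
    return SECONDS_PER_DAY // interval_s

bytes_per_day = BLOCKS_PER_DAY * BLOCK_BYTES
print(f"per day:   {bytes_per_day / 2**20:.1f} MiB")
print(f"per month: {30 * BLOCKS_PER_DAY} blocks "
      f"({30 * bytes_per_day / 2**20:.0f} MiB)")
print(f"samples/day per entity at 120s: {polls_per_day(120)}")
print(f"samples/day per entity at 60s:  {polls_per_day(60)}")
```

Halving the interval from 120s to 60s doubles the sample count per line. That is the trade-off behind Doug's 5-minute suggestion and the 60s/120s requirement stated above.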

The alarms are changed to START with a 2-second "duration".

Like this:

Enable mcc 0 alarms rule bridge1_line2_nofwd, at start (+00:00:02)
Enable mcc 0 alarms rule bridge1_line3_nofwd, at start (+00:00:02)
Enable mcc 0 alarms rule bridge1_line4_nofwd, at start (+00:00:02)
.
.
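With dozens (and eventually hundreds) of rules, the enable commands can be generated instead of typed. This is a minimal sketch that follows the `bridgeN_lineM_nofwd` naming pattern and the command syntax shown above. The bridge and line ranges, and the `enable_commands` helper, are hypothetical. Use the rule names actually defined on your system.

```python
# Generate "Enable ... alarms rule" lines following the naming pattern
# shown above (bridgeN_lineM_nofwd). The bridge and line ranges used
# below are HYPOTHETICAL; substitute the rules defined on your system.

def enable_commands(bridges, lines, start_offset="+00:00:02"):
    """Yield one enable command per (bridge, line) alarm rule."""
    for bridge in bridges:
        for line in lines:
            rule = f"bridge{bridge}_line{line}_nofwd"
            yield f"Enable mcc 0 alarms rule {rule}, at start ({start_offset})"

for cmd in enable_commands(bridges=range(1, 3), lines=range(2, 5)):
    print(cmd)
```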


Remember that we have seen these errors even on systems with maybe 30%
of the load described above, and that the AMs vary from Translan and Bridge to
the Ethernet Station AM.
The problem is seen most often on VAX 4000-300 systems.

I have included two more error messages of the same type that show up
as exceptions, but much less frequently.
 
Exception:     The requested operation cannot be completed
               %MCC-E-TRANSMITERROR, error trying to transmit a packet
               %SYSTEM-F-DEVINACT, device inactive

Exception:     The requested operation cannot be completed
               %MCC-E-RECEIVEERROR, error trying to receive a packet
               %SYSTEM-F-DEVINACT, device inactive

Exception:     The requested operation cannot be completed
               %MCC-F-STRTDEVERROR,  start Ethernet device failed.
               %SYSTEM-F-BADPARAM, bad parameter value
               (This one is only seen when using the Ethernet Station AM)


>>When you do an 
>>	NCP> sho known line count 
>>	NCP> show known circuit count
>>are you seeing a lot of "System buffer unavailable" or similar counters? 
>>
>>If so, you _may_ be able to alleviate the problem by upping some sysgen
>>parameters (that elude me right now... Maybe lrpcount? srpcount?)

No problems are seen in the line and circuit counters, and there is no pool
expansion either.
The load on the customer's Ethernet is 5-10%, with peaks up to 30%.


Regards Niklas
1551.10. "Snake eyes. sorry." by MCDOUG::MCPHERSON (My object paradigm needs integration...) Thu Nov 07 1991 18:06
Whew.  I dunno, Niklas... From your description (and of course the meaning
of the error you're getting), it doesn't sound like there's anything that can 
be done to help you from within the Translan AM. 

Unless you can get some flexibility on the export interval, there's nothing 
further I can think of to help you.

I hope someone else can come up with something.

/doug.

P.S.    You do know that Digital must *purchase* the Translan AM for ALL USE
other than for use within DEC and for demo purposes, yes?   I trust that the
appropriate monies and licenses have changed hands for the usage of the Translan 
AM in this network, or we (Digital) are liable for breach of contract 
(among other things).
1551.11. "This needs to be hidden from the user." by CHRISB::BRIENEN (DECmcc Bridge|Station|SNMP Management.) Thu Nov 07 1991 19:06
This error is relatively common (at least to us) when pounding on the
Ethernet device.

It has nothing to do with CPU utilization, and is not a "code bug" in
the MCC_EA Routines (they're just reporting what happens).

There are two possible solutions to the problem; both involve hiding
the problem from the user (and neither is easy to patch into V1.1):

  (1) Modify the MCC_EA Routines to do retries when encountering
      the device inactive error - this wouldn't be the first time we
      did something like this (e.g., there is special code in
      place which handles the DELUA "differently")

  (2) Tell AM developers to do the retries themselves if they don't
      want to bother the user with this information - the set of
      AMs using the EA routines is still fairly small, so this
      isn't as big of a deal as one would think.
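Either option comes down to wrapping the transmit/receive call in a retry loop that treats only the "device inactive" condition as transient. Below is a minimal Python sketch of that idea. The real MCC_EA routines and AMs are not written in Python, and `DeviceError`, `RETRYABLE`, `with_retries` and the simulated transmit are all hypothetical stand-ins. BADPARAM (seen with STRTDEVERROR in .9) is deliberately not retried.

```python
import time

class DeviceError(Exception):
    """HYPOTHETICAL stand-in for an MCC_EA failure; 'status' holds the
    VMS condition name, e.g. 'SYSTEM-F-DEVINACT'."""
    def __init__(self, status):
        super().__init__(status)
        self.status = status

# Only "device inactive" is treated as transient; anything else is
# reported to the caller (and so to the user) immediately.
RETRYABLE = {"SYSTEM-F-DEVINACT"}

def with_retries(operation, attempts=3, delay_s=0.5, sleep=time.sleep):
    """Call operation(), retrying a retryable DeviceError up to
    `attempts` total tries with a growing delay between tries."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except DeviceError as err:
            if err.status not in RETRYABLE or attempt == attempts:
                raise
            sleep(delay_s * attempt)   # linear backoff before next try

def flaky_transmit_factory(failures):
    """Simulated transmit that fails `failures` times, then succeeds."""
    state = {"left": failures}
    def transmit():
        if state["left"] > 0:
            state["left"] -= 1
            raise DeviceError("SYSTEM-F-DEVINACT")
        return "sent"
    return transmit

print(with_retries(flaky_transmit_factory(2), sleep=lambda s: None))  # prints "sent"
```

The delay between tries matters. If the host port is being reset, an immediate retry would probably hit the same inactive device.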

We will be looking at which makes sense very soon. This decision will be
based partly on how long the fix would take to implement and the risk
associated with making the change (e.g., change in the MCC_EA at this
point is more risky than having the AMs do retries).

						Chris Brienen
1551.12. "Maybe we'll just _hide_ it next time.." by MCDOUG::MCPHERSON (My object paradigm needs integration...) Thu Nov 07 1991 19:29
Thanks for the note, Chris.

That which we cannot fix, we hide.  Fair enough.
I'll add this to Vitalink engineering's "To Do" list for the next 
release of the Translan AM. 

/doug
1551.13. "Filed as MCC_INTERNAL QAR#1335 [Priority 3]" by CHRISB::BRIENEN (DECmcc Bridge|Station|SNMP Management.) Fri Nov 08 1991 17:58
1551.14. "PATCH ?" by ANTIK::WESTERBERG (Stefan Westerberg CS Stockholm) Fri Nov 15 1991 07:48
When could we expect a patch for this problem ? 

At one customer site where we have about 700 alarm rules, we get 10 to 32
%MCC-E-TRANSMITERROR errors per hour!

A patch for this problem is badly needed!
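These figures can be turned into a per-poll failure rate, which suggests how much a simple retry (option 1 or 2 in .11) might hide. This is a back-of-the-envelope sketch under two loudly hypothetical assumptions. First, every one of the 700 rules polls once per 60s (.7 says the real intervals range from 15s to 1 hour). Second, failed attempts are independent, which a device that stays inactive for a while would violate.

```python
# Back-of-the-envelope failure rate for the site quoted above.
# ASSUMPTIONS (hypothetical): all 700 rules poll once per 60s, and
# failed attempts are independent. Real intervals vary (15s to 1h), and a
# device that stays inactive would make back-to-back retries fail together.

RULES = 700
INTERVAL_S = 60
polls_per_hour = RULES * 3600 // INTERVAL_S      # 42,000 under these assumptions

for errors_per_hour in (10, 32):
    p = errors_per_hour / polls_per_hour         # per-attempt failure rate
    # With one retry, a poll is only reported as failed if both tries fail.
    residual = polls_per_hour * p ** 2
    print(f"{errors_per_hour:>2} errors/h -> p = {p:.4%}, "
          f"after one retry ~ {residual:.4f} visible errors/h")
```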

/Stefan
1551.15. "Please re-read .11" by TOOK::MCPHERSON (My object paradigm needs integration...) Fri Nov 15 1991 10:26
I do understand your urgency, but I think Chris made it pretty clear:
The change would need to be made either 
	a) to the mcc_ea routines	or
	b) to the AM(s) that are calling the mcc_ea routines (in this case, 
	   the Translan AM).

"a" is fairly risky, given the amount of work that's focused on getting the
1.2 stuff out the door.  Also, the mcc_ea routines really are working the way
that they're *supposed* to.   The _calling routine_ should really handle the
retries on failure.

"b" is really the _correct_ thing to do, but you'll need to go to Vitalink to
get them to make a patch (or new .exe).   AM maintenance *is* Vitalink's
responsibility (personally, I doubt that they'll be able to get you a
patch for the Translan AM any quicker than we could fix the mcc_ea routines).

I know this is exactly what you _don't_ want to hear, but my advice is to pick one
option and work it through the appropriate escalation mechanism(s); Digital's
for 'a' and Vitalink's for 'b'.

/doug