
Conference azur::mcc

Title:DECmcc user notes file. Does not replace IPMT.
Notice:Use IPMT for problems. Newsletter location in note 6187
Moderator:TAEC::BEROUD
Created:Mon Aug 21 1989
Last Modified:Wed Jun 04 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:6497
Total number of notes:27359

1028.0. "MCC_EVENT_GET Invalid Lock Id problem" by PHONE::LORD () Fri May 17 1991 19:28

	Hello,

	I'm having an intermittent problem with MCC_EVENT_GET and MCC_EVENT_PUT 
	which I'm wondering if anyone else has seen.

	We have one thread running in one process which does nothing but loop
	doing MCC_EVENT_GETs, and when this call is satisfied issuing an event
	report to a remote manager. 

	We have a program running in another process which does a single
	MCC_EVENT_PUT, usually run in response to a DECnet event but for 
	testing I'm just repeatedly calling it from a DCL procedure, every
	1 or 2 seconds.

	They hook up most of the time; every MCC_EVENT_PUT is reflected in a
	completed MCC_EVENT_GET - but after some arbitrary number of events,
	sometimes 3, sometimes 400+, the agent dies. Usually, but not
	always, the MCC_EVENT_PUT takes a long time to complete just before
	the agent departs, upwards of 50 seconds instead of the usual 4 
	seconds, and then it dies with the following (condensed) errors:

	%SYSTEM-F-RPOPRAND, reserved operand fault at PC 15705 PSL 3C00000
	%SYSTEM-F-RPOPRAND, reserved operand fault at PC 0 PSL 452

	The agent actually sees this last event, and the MCC_EVENT_GET completes
	normally - but the next MCC_EVENT_GET fails immediately with an error
	code of MCC_S_EVENTINTERNERR. With DECmcc event tracing enabled
	(I defined the logicals MCC_EVENT_LOG = 1 and MCC_EVENT_TRACE = 180),
	I can see that the agent thread is getting an invalid lock id
	error.

	Here is a trace of the last few seconds of the agent's existence - the
	%MCC-E-NOMOREVT error is, I guess, normal, it shows up for every event,
	but the invalid lock id error is new and causes the MCC_EVENT_GET call
	itself to fail:
...
Event_Get starting
Event_Get processing Partition argument
Event_Get Input Arguments ...
  in_entity =
        entity [0] wild = NOT_WILD class = 12 id = 0 type = 24
        instance = SCCP
        entity [1] wild = INSTANCE_FULL class = 11
  Partition = 0
  Filter = MCC_K_NULL_PTR
  time_spec = MCC_K_NULL_PTR
        Handle context not set up yet
Event_Get handle state = MCC_K_HANDLE_FIRST
Event_Get building a PRMB for this request
Failed to dequeue event:
%MCC-E-NOMOREVT, no more events in event queue
Get Event failed to build PRMB:
%SYSTEM-F-IVLOCKID, invalid lock id
MCC_Event_Get 52880564 (code=0)

	Thanks,

	- Rick Lord

1028.1. "" by TOOK::GUERTIN (I do this for a living -- really) Mon May 20 1991 13:04 (21 lines)
    I have seen the Invalid Lock ID but never the Reserved Operand Fault.
    
    Are you using any other logicals (Besides MCC_EVENT_LOG and
    MCC_EVENT_TRACE)?   Do you invoke MCC with just a Manage command or are
    there other qualifiers?  What kind of machine are you using?  Are there
    multiple CPUs or just a single CPU?  How many processes are using the
    Event Manager?  Do you know of anything unusual about the Version of
    VMS you are running?  What version is it?
    
    I think there may be two bugs here.  There is a bug in the event
    manager where for a small window in time we can get into a dead-lock.
    I have only seen this happen on a MIPS machine (Ultrix version of MCC)
    and on a SMP VAX.
    
    The Reserved Operand Fault appears to be a memory clobber.  I have not
    seen this before.  It may be an MCC bug.
    
    Also, the memory clobber could be clobbering the Event Manager's data
    structures, causing the Invalid Lock-ID error.
    
    -Matt.

1028.2. "More agent specifics" by PHONE::LORD () Mon May 20 1991 17:14 (51 lines)
	
	Hello Matt,

	The way we are using the agent image is as follows:

	1. Create and enable some alarm rules, 4 OCCURs and 2 CHANGE_OFs; the
	   latter are polling every 15 seconds checking for a node's state
 	   going from reachable to unreachable and vice versa.

	2. Start the agent up, then start a manager up, in separate processes.
	   They establish an association with each other and then the agent 
	   just sits there waiting on an MCC_EVENT_GET - the manager just 
	   sits there waiting for a message from the agent (across our CMIP
	   stack).

	3. When a DECnet event occurs, the DCL procedure which gets executed
	   runs the PUT_EVENT.EXE image to generate a MCC_EVENT_PUT, passing 
	   along the parameter specified in the definition of the alarm rule.

	4. The agent's MCC_EVENT_GET completes, it generates an event report
	   to the manager, and then queues up another MCC_EVENT_GET, unless
	   it gets an error on the MCC_EVENT_GET - the reason I exit if I get
	   an EVENTINTERNERR status is that if I don't the program goes into
	   an infinite (I assume) loop, always returning the same error.
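
	The loop in step 4 can be sketched roughly as below. This is only an
	illustration of the control flow, not the real agent: fake_event_get,
	the status values and the count of three events are hypothetical
	stand-ins, and the actual MCC_EVENT_GET argument list is not shown.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes, standing in for the real MCC status values. */
enum status { ST_NORMAL, ST_INTERNAL_ERROR };

/* Hypothetical stub in place of MCC_EVENT_GET: it delivers a fixed number
   of events, then fails on every later call, which is the behaviour the
   agent sees once the internal error appears. */
static int events_left = 3;

static enum status fake_event_get(int *event_id)
{
    static int next_id = 1;

    assert(event_id != NULL);
    if (events_left <= 0)
        return ST_INTERNAL_ERROR;
    events_left--;
    *event_id = next_id++;
    return ST_NORMAL;
}

/* Agent loop: forward each event to the manager and leave the loop on the
   first failure instead of re-issuing a call that keeps failing.
   Returns the number of events forwarded. */
static int run_agent(void)
{
    int forwarded = 0;
    int event_id;

    for (;;) {
        enum status st = fake_event_get(&event_id);
        if (st != ST_NORMAL) {
            fprintf(stderr, "event get failed, leaving loop\n");
            break;
        }
        printf("event report %d sent to manager\n", event_id);
        forwarded++;
    }
    return forwarded;
}
```

	Calling run_agent() from a main routine forwards the stubbed events and
	then returns. Leaving the loop on the first error is what keeps the
	program from spinning on a status that never clears.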

	o The only logicals I've defined are MCC_EVENT_LOG and MCC_EVENT_TRACE
	  (are there any more I should be defining?)

	o Both my PUT_EVENT image and the agent image I'm doing the 	
	  MCC_EVENT_GET from are linked against DECmcc but are not management 
	  modules; they are both just run as images, no MANAGE command 
	  involved at all.

	o I'm running this on a uVax 3500 with 64Mb memory, VMS V5.4-2, 
	  single CPU obviously; I don't know if there are any odd problems 
	  with this version of VMS.

	o Chris Bond is doing the same testing in the UK (this is all in 
	  preparation for the Networks '91 show in June, by the way), and
	  he has noticed that when he pulls the Ethernet cable (to generate
	  an "adjacency down" DECnet event), he actually sees the event 
	  count (from looking at the rule itself) incremented by 2 just
	  before the MCC_EVENT_GET fails - does this suggest anything?

	Is there anything we can do to protect ourselves during the window
	you mentioned in which a deadlock is possible?

	Thanks,

	- Rick

1028.3. "How are you running as a non-Management Module?" by TOOK::GUERTIN (I do this for a living -- really) Mon May 20 1991 19:03 (12 lines)
>>	o Both my PUT_EVENT image and the agent image I'm doing the 	
>>	  MCC_EVENT_GET from are linked against DECmcc but are not management 
>>	  modules; they are both just run as images, no MANAGE command 
>>	  involved at all.

    Everything you said sounds fine except this.  You should not be using
    any MCC logicals other than the MCC event tracing logicals.
    
    Please expand on this statement.  How did you create the images?
    
    -Matt.
    
1028.4. "How my images are built + linked" by PHONE::LORD () Mon May 20 1991 19:15 (17 lines)
	Matt,

	The images are just C files written referencing DECmcc routines, using
	DECmcc .H files, etc. They are compiled like any other C file, but when
	they are linked I use the following options file:

		sys$share:mcc_kernel_init.obj
		sys$share:mcc_kernel_shr/share
		sys$share:mcc_desframe/lib
	
	This is as described in my (probably pretty old) MM programmers guide.
	Things seem to work fine normally; this is an intermittent problem we
	are having - we just don't want to have it in front of the customers!

	- Rick
    
1028.5. "Have you tried running as a PM?" by TOOK::GUERTIN (I do this for a living -- really) Tue May 21 1991 12:06 (23 lines)
    Rick,
    
    I'm starting to run out of ideas.  There may be a bug in the
    mcc_kernel_init.obj module which you used for callable MCC.  My
    problem is that the MCC Kernel does not really control the execution
    environment when you use Callable MCC, you do.  So it's hard to debug.
    
    It may be more helpful to try running your application as a PM.  I can
    send you mail on how to run as a PM if you don't know.

    Also, can you post the output of the following debugger session?

    DBG> set break/except
    DBG> go
    	:
      wait for the Reserved Operand Fault
        :
    DBG> set image mcc_kernel_shr
    DBG> show calls

    My DTN is 226-5974 and my mail address is TOOK::GUERTIN

    -Matt.