
Conference pamsrc::decmessageq

Title:NAS Message Queuing Bus
Notice:KITS/DOC, see 4.*; Entering QARs, see 9.1; Register in 10
Moderator:PAMSRC::MARCUSEN
Created:Wed Feb 27 1991
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2898
Total number of notes:12363

2819.0. "dmqld from Digital UNIX ECO1 rvcs signal 11" by OZROCK::THOMAN (Bring back "Eddie The Eagle Edwards" !!) Tue Mar 25 1997 00:06


	We're getting intermittent "crashes" with the link drivers (again).
	A user had just attempted to establish a connection from a PC
	running the DMQ client

		"DECmessageQ for WindowsNT V3.2"

	to our Digital UNIX server.

	The attempt was made first thing on Monday, after both machines had
	been fairly idle over the weekend.


	UNIX Side Details:


#uname -a
OSF1 mynode.ozy.dec.com V3.2 148 alpha



# su - dmq
% what `which dmqld` | grep DECmessageQ
        DECmessageQ for UNIX, V3.2A (ECO 1), Mon Feb 2 00:00:00 EST 1997
% what `which dmqqe` | grep DECmessageQ
        DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
% what `which dmqbcp` | grep DECmessageQ
        DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996
% what `which dmqgcp` | grep DECmessageQ
        DECmessageQ for UNIX, V3.2A, Sun Nov 24 00:01:39 EST 1996





	It may just be a coincidence that starting the PC caused
	the signal 11s, but the signal was received every 10 seconds,
	each time a retry occurred.

	
	Output from Group Control on the UNIX side is at the end of this note.

	Thanks

	Craig.




************ dmqld (809.0) 24-MAR-1997 10:36:39 ************
ipi, semaphore operation failed

************ dmqld (809.0) 24-MAR-1997 10:36:39 ************
ld, initialization failure

************ dmqld (809.0) 24-MAR-1997 10:36:39 ************
spi, not attached to object

************ dmqld (809.0) 24-MAR-1997 10:36:39 ************
ld, caught signal 11

************ dmqld (809.0) 24-MAR-1997 10:36:39 ************
ld, link receiver for group 2966 from group 3654 is exiting

************ dmqld (809.0) 24-MAR-1997 10:36:49 ************
ipi, semaphore operation failed

************ dmqld (809.0) 24-MAR-1997 10:36:49 ************
ld, initialization failure

************ dmqld (809.0) 24-MAR-1997 10:36:49 ************
spi, not attached to object

************ dmqld (809.0) 24-MAR-1997 10:36:49 ************
ld, caught signal 11

************ dmqld (809.0) 24-MAR-1997 10:36:49 ************
ld, link receiver for group 2966 from group 3654 is exiting

************ dmqld (809.0) 24-MAR-1997 10:36:59 ************
ipi, semaphore operation failed

************ dmqld (809.0) 24-MAR-1997 10:36:59 ************
ld, initialization failure

************ dmqld (809.0) 24-MAR-1997 10:36:59 ************
spi, not attached to object
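The dmqld source isn't shown in this note, so the following is not its code. It is a minimal sketch, assuming a System V semaphore set, of how a failed semop(2) call can be logged with the errno reason instead of a bare "semaphore operation failed": EINVAL or EIDRM would point at a set that was never created or has been removed, EFBIG at a bad semaphore index. The helper names try_semop and semop_failure_reason are hypothetical, and IPC_NOWAIT is used only so the sketch never blocks.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Hypothetical helper: map a semop(2) errno to a short diagnostic. */
const char *semop_failure_reason(int err)
{
    switch (err) {
    case EINVAL: return "invalid semaphore set id (never created or already removed)";
    case EIDRM:  return "semaphore set was removed";
    case EACCES: return "permission denied";
    case EAGAIN: return "would block (IPC_NOWAIT set)";
    case EINTR:  return "interrupted by a signal";
    case EFBIG:  return "semaphore number out of range";
    default:     return strerror(err);
    }
}

/* Hypothetical helper: apply one operation (delta) to semaphore semnum.
   Returns 0 on success; on failure returns -1 and sets *why. */
int try_semop(int semid, unsigned short semnum, short delta, const char **why)
{
    struct sembuf op;

    op.sem_num = semnum;
    op.sem_op = delta;
    op.sem_flg = IPC_NOWAIT;      /* fail with EAGAIN instead of blocking */

    if (semop(semid, &op, 1) == -1) {
        *why = semop_failure_reason(errno);
        return -1;
    }
    *why = NULL;
    return 0;
}
```

A real lock acquisition would normally block (no IPC_NOWAIT); the point of the sketch is only that the errno value distinguishes "the set has gone away" from other failures.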

2819.1. "Any ideas - I'm seeing it when no PC involved also." by OZROCK::THOMAN (Bring back "Eddie The Eagle Edwards" !!) Tue Apr 08 1997 05:02 (56 lines)
	I'm seeing the problems again, this time both groups are
	Digital UNIX hosts.

	Any chance you could give me a dmqld with tracing built in,
	so I can send you some details and a core dump?

	Thx

	Craig.




************ dmqld (723.0) 07-APR-1997 10:09:53 ************
ld, link receiver for group 2966 has lost connection to group 2963

************ dmqld (723.0) 07-APR-1997 10:09:53 ************
ld, link receiver for group 2966 from group 2963 is exiting

************ dmqld (725.0) 07-APR-1997 10:09:53 ************
ld, link sender for group 2966 to group 2963 is exiting

************ dmqld (723.0) 07-APR-1997 10:11:57 ************
ipi, semaphore operation failed

************ dmqld (723.0) 07-APR-1997 10:11:57 ************
ld, initialization failure

************ dmqld (723.0) 07-APR-1997 10:11:57 ************
spi, not attached to object

************ dmqld (723.0) 07-APR-1997 10:11:57 ************
ld, caught signal 11

************ dmqld (723.0) 07-APR-1997 10:11:57 ************
ld, link receiver for group 2966 from group 2963 is exiting

************ dmqld (723.0) 07-APR-1997 10:12:07 ************
ipi, semaphore operation failed

************ dmqld (723.0) 07-APR-1997 10:12:07 ************
ld, initialization failure

************ dmqld (723.0) 07-APR-1997 10:12:07 ************
spi, not attached to object

************ dmqld (723.0) 07-APR-1997 10:12:07 ************
ld, caught signal 11

************ dmqld (723.0) 07-APR-1997 10:12:07 ************
ld, link receiver for group 2966 from group 2963 is exiting

************ dmqld (723.0) 07-APR-1997 10:12:17 ************
ipi, semaphore operation failed

2819.2. "Anyone home ?" by OZROCK::THOMAN (Just reboot & it will work again ;-)) Thu Apr 17 1997 03:16 (12 lines)

	The crash is still happening!

	Why hasn't anyone (from D/B-MQ Engineering) replied?

	Please supply us with a dmqld built with trace and debug flags
	set, so we can get you a core dump!

	Thx

	C.

2819.3. "We'll look at it as soon as we can" by XHOST::MARCUS (DmQ Escalation and Quality Assurance Manager (DTN 320-5003, 860-)) Fri Apr 18 1997 11:45 (17 lines)
    
    No one has responded, probably because the engineering team is very
    busy at the moment, particularly the engineers responsible for the UNIX
    ld's.  As has been stated many times before, the DmQ notesfile is an
    informal support channel.  We do our best to be responsive, but we
    cannot guarantee a response time for problems posted here.
    
    I will ask an engineer to look at this as soon as possible and provide
    information for turning on ld tracing and whatever other information 
    we may need.
    
    If this is a non-critical problem, you can also log a QAR on the PAMSRC
    system, account QAR_INTERNAL.  Or, you can process this through the
    country CSC.
    
    If it is critical, you can log an IPMT case.
    
2819.4. "Twilight Zone Being Entered..." by OZROCK::THOMAN (Just reboot & it will work again ;-)) Tue Apr 29 1997 03:19 (42 lines)
	I see the problem only on Mondays, always around 10 a.m.

	From .0 you'll see it happened at 10:36:39 on Mon March 24.
	From .1 you'll see it happened at 10:09:53 on Mon April 7.
	From memory, it also happened last Mon, April 21.
	It has happened again at 10:41:24 on Mon April 28.
	
	Below is the root crontab file. I don't believe it initiates
	anything shortly before or around 10 a.m. on a Monday.

	
#
# OSF/1 Release 1.2
#
#       root crontab file
#
15 4 * * * find /var/preserve -mtime +7 -type f -exec rm -f {} \;
20 4 * * * find /tmp -type d -name dmq -prune -o -type f -atime +2 -exec rm -f {} \;
30 4 * * * find /var/tmp -type d -name dmq -prune -o -type f -atime +7 -exec rm -f {} \;
40 4 * * * find /var/adm/syslog.dated -depth -type d -ctime +5 -exec rm -rf {} \;
#0 3 * * 4 /usr/sbin/acct/dodisk > /var/adm/diskdiag &
#
# Weekly backups
#
00 8 * * 4 /sysm/rbupscript -l 0 -f flyemu:/dev/nrmt0h / /usr /kits /usr/evaluate /usr/products
#05 11 * * 3 /sysm/rbupscript -l 0 -f flyemu:/dev/nrmt0h / /usr /kits /usr/evaluate /usr/products
#
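The claim that nothing in the root crontab above fires near 10 a.m. on a Monday can be checked mechanically. This is a minimal sketch, assuming standard five-field cron syntax (minute, hour, day of month, month, day of week with 0 = Sunday); the function names field_matches and cron_entry_fires are hypothetical, and only "*", plain numbers and comma lists are understood, which covers every entry in this file. By inspection, the active entries run daily between 04:15 and 04:40 and on Thursdays at 08:00.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: does one cron time field match value?
   Supports "*", single numbers and comma lists; ranges, steps and
   names are treated as non-matching. */
static int field_matches(const char *field, int value)
{
    const char *p = field;

    if (strcmp(field, "*") == 0)
        return 1;
    for (;;) {
        char *end;
        long n = strtol(p, &end, 10);
        if (end == p || (*end != ',' && *end != '\0'))
            return 0;                  /* unsupported syntax */
        if (n == value)
            return 1;
        if (*end == '\0')
            return 0;
        p = end + 1;                   /* skip the comma */
    }
}

/* Hypothetical helper: 1 if the crontab line fires at the given time,
   0 if not, -1 for comment lines or lines without five time fields. */
int cron_entry_fires(const char *line, int min, int hour, int dom, int mon, int dow)
{
    char f[5][64];
    int day_ok;

    if (line[0] == '#')
        return -1;
    if (sscanf(line, "%63s %63s %63s %63s %63s", f[0], f[1], f[2], f[3], f[4]) != 5)
        return -1;

    /* cron ORs day-of-month and day-of-week when both are restricted */
    if (strcmp(f[2], "*") != 0 && strcmp(f[4], "*") != 0)
        day_ok = field_matches(f[2], dom) || field_matches(f[4], dow);
    else
        day_ok = field_matches(f[2], dom) && field_matches(f[4], dow);

    return field_matches(f[0], min) && field_matches(f[1], hour) &&
           field_matches(f[3], mon) && day_ok;
}
```

Passing each uncommented line of the file through cron_entry_fires for every minute from Monday 10:00 to 10:59 should return 0 for all of them, which supports the reading that cron itself is not starting anything at that time.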


	Has anyone seen this scenario before?


	Thanks,

	Craig.

2819.5. "Do the dmqld's perform semctl(...,...,IPC_STAT, ...) calls?" by OZROCK::THOMAN (Just reboot & it will work again ;-)) Wed Apr 30 1997 08:39 (19 lines)

	The errors in .0 revolve around semaphore problems
	and end in a signal 11.

	A test program I wrote gets a signal 11 when calling
	semctl(2) with the IPC_STAT "cmd".

	The crash happens when I try to access the "sem"
	struct.

	I've put a note in the Digital UNIX notes conference
	because it's not obvious what I'm doing wrong. If the
	dmqld process calls semctl, perhaps it is hitting
	the same problem?
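The test program itself isn't shown, so the cause of its crash is unknown. For comparison, here is a minimal sketch of IPC_STAT used as POSIX documents it: the caller declares the union that semctl(2) takes as its fourth argument (POSIX calls it union semun and requires the application to declare it; it is named semctl_arg here only to avoid clashing with platforms that predefine it), points its buf member at a caller-owned struct semid_ds, and then reads per-semaphore values with GETVAL rather than through any pointer inside semid_ds. On some older System V derived systems semid_ds carried a sem_base pointer into kernel memory; if the test program dereferences such a pointer, a signal 11 would be expected, but that is an assumption. The function read_semaphore_set is hypothetical.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Fourth argument to semctl(2); POSIX says the application declares it. */
union semctl_arg {
    int val;                  /* for SETVAL */
    struct semid_ds *buf;     /* for IPC_STAT / IPC_SET */
    unsigned short *array;    /* for GETALL / SETALL */
};

/* Hypothetical helper: store the set size in *nsems and copy up to max
   semaphore values into values[]. Returns 0, or -1 with errno set. */
int read_semaphore_set(int semid, unsigned long *nsems, int *values, unsigned long max)
{
    struct semid_ds ds;
    union semctl_arg arg;
    unsigned long i;

    arg.buf = &ds;            /* IPC_STAT copies into caller-owned memory */
    if (semctl(semid, 0, IPC_STAT, arg) == -1)
        return -1;

    *nsems = (unsigned long)ds.sem_nsems;
    for (i = 0; i < *nsems && i < max; i++) {
        int v = semctl(semid, (int)i, GETVAL);  /* one value per call */
        if (v == -1)
            return -1;
        values[i] = v;
    }
    return 0;
}
```

Only sem_perm, sem_nsems, sem_otime and sem_ctime are guaranteed members of struct semid_ds, so a portable program reads nothing else from it.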

	Thx

	Craig.

2819.6. "At about the 5th attempt..." by OZROCK::THOMAN (Just reboot & it will work again ;-)) Thu May 01 1997 00:00 (5 lines)
	QAR now entered - it is #00670


	C.