
Conference pamsrc::decmessageq

Title:NAS Message Queuing Bus
Notice:KITS/DOC, see 4.*; Entering QARs, see 9.1; Register in 10
Moderator:PAMSRC::MARCUSEN
Created:Wed Feb 27 1991
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2898
Total number of notes:12363

2836.0. "Mysterious crashes (DmQ 3.2 OpenVMS VAX/Alpha)" by PLACEK::STEFANOWICZ () Thu Apr 03 1997 15:28

    We have a serious problem at a customer site. The cause is difficult to
    track down; it is only a guess that DmQ might be involved, because the
    problem appears in all components that use DmQ.
    
    The customer reported a machine crash (VAX 7000, OpenVMS 6.2, DmQ 3.2 RT,
    DECnet/OSI).
    No dump is available, as they have 2 GB of RAM and did not want to lose
    disk space for dump storage ;-) We only know that the crash occurred in
    NETACP and that several of our application processes also crashed.
    
    We started looking in our development environment (AlphaStation
    SCSI cluster, OpenVMS V6.2, DmQ 3.2 Dev, TCP/IP connections) and found
    similar problems (but without a whole-system crash).
    
    We only have error.log and accounting.dat. Analysing them, we
    found that our application processes report error 1036
    (AST fault) in EXECUTIVE mode. We do not explicitly use ASTs and
    do not explicitly change modes. It may be of interest
    that we make extensive use of put_msg until the quota is exceeded; this
    is our simple application flow-control mechanism.
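
    For reference, status 1036 can be split into the standard OpenVMS
    condition-value fields (severity in bits 0-2, message number in
    bits 3-15, facility number in bits 16-27). Below is a minimal Python
    sketch of that decoding; the function name decode_condition is
    hypothetical and used only for illustration. Under that layout,
    1036 (hex 40C) decodes to facility number 0 with severe (fatal)
    severity.

```python
# Hypothetical helper: splits a 32-bit OpenVMS condition value into
# its standard fields. Not part of any DmQ or OpenVMS library.
SEVERITY_NAMES = {
    0: "WARNING",
    1: "SUCCESS",
    2: "ERROR",
    3: "INFORMATIONAL",
    4: "SEVERE",
}


def decode_condition(value):
    """Return the severity, message number and facility of a condition value."""
    severity = value & 0x7                   # bits 0-2
    message_number = (value >> 3) & 0x1FFF   # bits 3-15
    facility = (value >> 16) & 0xFFF         # bits 16-27
    return {
        "severity": severity,
        "severity_name": SEVERITY_NAMES.get(severity, "RESERVED"),
        "message_number": message_number,
        "facility": facility,
    }


if __name__ == "__main__":
    # 1036 is the status reported by the application processes.
    print(hex(1036), decode_condition(1036))
```

    Severities 5 to 7 are reserved, so the sketch reports them as
    RESERVED rather than guessing a name.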
    
    We have two hypotheses:
    (1) The AST fault has the cause suggested in the documentation: the
        stack is too small or corrupted. Being too small is unlikely,
        though, as we use around 8k at most.
    (2) There is a hidden error somewhere in the communication layer. We
        also suspected RMS, since it runs in EXEC mode, but RMS is not used
        in one component that also reports this error.
    
    Folks, we know this is quite a vague description. We just hope one of
    you has already seen a similar situation.
    
    Thanks for any help.
    Artur
2836.1. "I know of only one instance of this..." by KLOVIA::MICHELSEN (BEA/DEC MessageQ Engineering) Thu Apr 03 1997 19:16

...where a process that uses DmQ sets a DmQ timer and then does a ^Y STOP, which prevents
the USER mode exit handlers from running.  When the EXEC mode DmQ timer AST goes off,
it lands off in space, causing ACCVIOs or ASTFAULTs.  This has been fixed with
ECO 3247.  However, since you say that NETACP is the one reporting the problem, I
know of no case in which DmQ corrupted a process that was never known to DmQ.

  I think you are going to have to reconfigure the system to be able to save a crash
dump.



Marty