Conference virke::mrmemo

Title: VAX MAILGATE for MEMO
Moderator: STKHLM::OLSSON
Created: Sat Feb 25 1989
Last Modified: Tue May 14 1996
Last Successful Update: Fri Jun 06 1997
Number of topics: 216
Total number of notes: 933

126.0. "PROBLEM WITH HANGING MRMEMO-SERVER" by MUNICH::ROTHER (Bernd Rother TSSC munich) Wed Mar 11 1992 13:28

Hi,

I have a problem at a customer site with MRMEMO V2.1.
MRMEMO stops delivering messages to IBM. The server status is idle connected,
there are still messages waiting in the MRMEMO mailbox, and the log file shows
that the last successfully delivered message is the same one that sits in the
first position in the mailbox. When I try to stop the server with shut,
the server stays alive; I can only stop it with stop/id.
After stopping the process and starting the server again, the messages get
delivered and the gateway works fine. It seems that the server can't complete
the message transfer and hangs in an undefined state.
The customer has received the patches for MRMEMO V2.1 from us, but we still
have the same problem.

Does anyone have an idea about this problem?

Bernd Rother
DSC-Munich 
T.R  Title  User  Personal Name  Date  Lines
126.1. "Check Server substate to see whether it's a known problem" by STKOFF::SPERSSON (Pas de probleme) Wed Mar 11 1992 17:29 (19 lines)
    
    Hi,
    
    We have a support issue with similar symptoms. If you issue the
    (undocumented) MRMMAN switch
    
    MRMMAN> SHOW/DEBUG
    
    you will also see the server's SUBSTATE. If the substate is 1170 or
    1130 then you are having the same problem. The server should never go
    into this substate while main state is IDLE. The customer who's having
    this problem has agreed to install a command procedure that will check
    the server's status regularly and restart if the above happens. This
    should give us time to track the problem within the MRMEMO server state
    machine. I am writing the command procedure at this moment. Stay tuned.
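
    A minimal sketch of checking this by hand from DCL, for server 1. The
    command-line form of MRMMAN and the "Current state:" / "Current
    substate:" labels are taken from the MRMEMO$CHK_STATE.COM procedure
    posted later in this topic; the scratch file name is made up for
    illustration.

```dcl
$! Sketch only: capture SHOW/DEBUG output for server 1 and show the state lines
$ define/user sys$output sys$scratch:mrm_show.lis
$ mcr mrmman show 1/debug
$ search sys$scratch:mrm_show.lis "Current state:","Current substate:"
$ delete sys$scratch:mrm_show.lis;*
```

    A main state of idle together with a substate of 1130 or 1170 would be
    the combination described above.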
    
    cheers,
    
    	Stefan
126.2. "Server SUBSTATE 1130,1131" by MUNICH::ROTHER (Bernd Rother DSC-Munich) Wed Mar 18 1992 18:45 (14 lines)
Hi Stefan,

I now have the information from my customer.

When the server process hangs we get the SUBSTATE 1130; after shut server 1
we get the SUBSTATE 1131.
Have you finished the command procedure, and where can I get it?

Thanks for your help

Bernd Rother
DSC-Munich
126.3. "See next reply" by STKOFF::SPERSSON (Pas de probleme) Wed Mar 18 1992 19:32 (26 lines)
    
> When the server process hangs we get the SUBSTATE 1130; after shut server 1
> we get the SUBSTATE 1131.
    
    Yep, the SHUT command sets the "Pending Shut" bit, which is the first
    bit in the Substate mask. But since the state machine is looping, the
    server still won't shut down.
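
    A small DCL sketch of how a check could ignore the "Pending Shut" bit,
    so that 1131 is treated like 1130. It assumes that the substate is
    displayed as a hexadecimal mask and that "Pending Shut" is the lowest
    bit (value 1); the 1130 -> 1131 change fits that reading but does not
    prove it. The symbol names are invented for the example.

```dcl
$! Sketch only: substate holds the 4 characters shown by SHOW/DEBUG
$ substate = "1131"
$ value = %X'substate'                  ! read the digits as hex (assumption)
$ masked = value .and. %XFFFE           ! clear the lowest bit, assumed Pending Shut
$ if (masked .eq. %X1130) .or. (masked .eq. %X1170) then -
      write sys$output "Substate ''substate' matches the known hang pattern"
```

    With the mask applied, a server that was hung before someone issued
    SHUT would still be recognised as hung.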
    
> Have you finished the command procedure, and where can I get it?

    Well, I've sent it for testing at that other customer with the same
    problem, but I will post it as the next reply. Please note that there
    is no mechanism for resubmitting the routine, you will have to provide
    that yourself (the logic being that it would be much easier to implement
    this functionality on site rather than provide all the checks/double
    checks/parameterizing that would be required for a generic resubmit
    mechanism. Besides it's very probable that the customer already has a
    periodic MRMEMO check routine, so he would only need to add a call to
    my routine from there)
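
    For sites without such a periodic routine, one possible shape for the
    resubmit step is a small batch wrapper that requeues itself and then
    calls the check. This is only a sketch: the wrapper itself, the
    SYS$BATCH queue, the 15-minute interval and the location of
    MRMEMO$CHK_STATE.COM in MRMEMO$DIR are all assumptions.

```dcl
$! Hypothetical wrapper, meant to run as a batch job
$! Requeue first so the chain survives an error further down
$ this_proc = f$element(0, ";", f$environment("procedure"))  ! drop the version
$ submit/queue=sys$batch/after="+0:15:00" 'this_proc'
$! Check server 1 only, no mail to SYSTEM (P1 = N, P2 = 1)
$ @mrmemo$dir:mrmemo$chk_state.com N 1
$ exit
```

    Submitting the wrapper once by hand would start the chain; deleting
    the pending job entry from the queue would stop it.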
    
    Also note that there is no check for Substate 1131
    
    cheers,
    
    	Stefan
126.4. "MRMEMO$CHK_STATE.COM" by STKOFF::SPERSSON (Pas de probleme) Wed Mar 18 1992 19:35 (113 lines)
$!
$!	MRMEMO$CHK_STATE.COM				12-Mar-1991
$!
$!	Check the status of the MRMEMO server(s). If a server appears to be
$!	broken with no chance of healing itself, stop the server process and
$!	restart it. To stop the server the DCL STOP command is used, since the
$!	MRMMAN SHUT command is not entirely reliable and would mean a time delay.
$!
$!	Parameters
$!
$!		P1 - Mail to SYSTEM account on error? "Y" for yes, "N" for no
$!		      numeric values, 0 for "N" and 1 for "Y" also accepted
$!		     (optional - defaults to "N")
$!
$!		P2 - number of servers to be checked (incremented from 1)
$!		     (optional - defaults to 1)
$!
$! Define temporary filenames 
$ define showout mrmemo$dir:show_state.lis.
$ define stateout mrmemo$dir:state.lis.
$ define substateout mrmemo$dir:substate.lis.
$!
$! determine value of parameters
$!
$ msg_to_sysmgr = 0
$ if "''p1'" then msg_to_sysmgr = 1
$ no_of_servers = 1
$ if "''p2'" .nes. "" then no_of_servers = f$integer(p2)
$!
$! Loop no of servers
$! 
$ cnt = 1
$ loop:
$ create showout
$ define/user sys$output showout
$ mcr mrmman.exe show 'cnt'/debug
$!
$! Get current state and substate. If search fails it indicates server is not 
$! started at all. If that's the case we abandon the check.
$!
$ search showout "Current state:"/out=stateout
$ if $status .nes. "%X00000001" then goto status_ok
$ search showout "Current substate:"/out=substateout
$!
$! Extract and check the status of the server, beginning with the substate.
$! 1130 and 1170 both mean that the bits for "waiting for commit" and
$! "MEMO bid rejected" are set simultaneously. They never should be if the
$! main state is idle.
$!
$ open substatein substateout
$ read substatein substate_str
$ close substatein
$ substate = f$extract (35,4,substate_str)
$ if ("''substate'" .nes. "1130") .and. ("''substate'" .nes. "1170") then -
      goto status_ok
$!
$! hit on substate. check whether main state is "idle"
$!
$ open statein stateout
$ read statein state_str
$ close statein
$ state = f$extract (31,4,state_str)
$ if "''state'" .nes. "idle" then goto status_ok
$!
$! Server is hanging! We must Stop and restart
$!
$! Find pid of MRMEMO Server process
$!
$ context = ""
$ prcloop:
$ pid = f$pid(context)
$ if pid .eqs. "" then goto restart
$ prcnam = f$getjpi (pid, "prcnam")
$ if prcnam .nes. "MRMEMO Server ''cnt'" then goto prcloop
$!
$! Stop Server
$!
$ stop/id='pid'
$!
$! Restart Server
$!
$ restart:
$ mcr mrmman.exe start 'cnt'
$!
$! Announce that we have restarted server 
$!
$ write sys$output "**********************************************************"
$ write sys$output "MRMEMO Server ''cnt' stopped and restarted because Current"
$ write sys$output "Server state indicated hung process"
$ write sys$output "**********************************************************"
$!
$! If required, send a simple VMSmail to System Manager
$!
$ if msg_to_sysmgr then mail nl: system/subject=-
"MRMEMO Server ''cnt' stopped and restarted after hanging state detected"
$!
$ status_ok:
$ cnt = cnt + 1
$!
$! endloop. once more?
$!
$ if cnt .le. no_of_servers then goto loop
$!
$!
$! Delete temp files and get out
$!
$ delete 'f$trnlnm("showout")'*
$ delete 'f$trnlnm("stateout")'*
$ delete 'f$trnlnm("substateout")'*
$ deassign showout
$ deassign stateout
$ deassign substateout
$ exit