|
Hi,
We have a support issue with similar symptoms. If you issue the
(undocumented) MRMMAN command
MRMMAN> SHOW/DEBUG
you will also see the server's SUBSTATE. If the substate is 1170 or
1130, then you have the same problem. The server should never enter
either substate while the main state is IDLE. The customer who has
this problem has agreed to install a command procedure that checks
the server's status regularly and restarts the server if this happens.
This should give us time to track down the problem in the MRMEMO server
state machine. I am writing the command procedure right now. Stay tuned.
cheers,
Stefan
|
|
Hi Stefan,
I now have the information from my customer.
When the server process hangs we get SUBSTATE 1130; after issuing SHUT for
server 1 we get SUBSTATE 1131.
Have you finished the command procedure, and where can I get it?
Thanks for your help
Bernd Rother
DSC-Munich
|
|
> When the server process hangs we get SUBSTATE 1130; after issuing SHUT for
> server 1 we get SUBSTATE 1131.
Yep, the SHUT command sets the "Pending Shut" bit, which is the first
bit in the substate mask. But since the state machine is looping, the
server still won't shut down.
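
To illustrate the 1130 -> 1131 change, here is a minimal Python sketch
of setting and testing the lowest bit of a substate value. It assumes
the substate is displayed in hexadecimal and that "Pending Shut" is bit
0; the names PENDING_SHUT, with_pending_shut and is_pending_shut are
hypothetical and are not part of MRMMAN. No other bit positions are
known from this thread, so none are modelled.

```python
# Assumption: "Pending Shut" is the lowest-order bit of the substate mask.
PENDING_SHUT = 0x1


def parse_substate(text: str) -> int:
    """Parse a substate string as shown by SHOW/DEBUG (assumed hexadecimal)."""
    return int(text, 16)


def with_pending_shut(text: str) -> str:
    """Return the substate string after the Pending Shut bit has been set."""
    return format(parse_substate(text) | PENDING_SHUT, "X")


def is_pending_shut(text: str) -> bool:
    """Report whether the Pending Shut bit is set in the given substate."""
    return bool(parse_substate(text) & PENDING_SHUT)


if __name__ == "__main__":
    for substate in ("1130", "1170"):
        print(substate, "->", with_pending_shut(substate))
```

Because only the lowest bit is involved and both 1130 and 1170 end in 0,
the result would be the same if the display were decimal.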
> Have you finished the command procedure, and where can I get it?
Well, I've sent it to the other customer with the same problem for
testing, but I will post it as the next reply. Please note that there
is no mechanism for resubmitting the routine; you will have to provide
that yourself. The reasoning is that it is much easier to implement
this on site than to provide all the checks, double checks and
parameters that a generic resubmit mechanism would require. Besides,
the customer very probably already has a periodic MRMEMO check routine,
so he would only need to add a call to my routine from there.
Also note that there is no check for substate 1131.
cheers,
Stefan
|
| $!
$! MRMEMO$CHK_STATE.COM 12-Mar-1991
$!
$! Check the status of the MRMEMO server(s). If a server appears to be
$! hung with no chance of recovering by itself, stop the server process
$! and restart it. The DCL STOP command is used to stop the server, since
$! the MRMMAN SHUT command is not entirely reliable and would mean a time
$! delay.
$!
$! Parameters
$!
$! P1 - Mail to SYSTEM account on error? "Y" for yes, "N" for no
$! numeric values, 0 for "N" and 1 for "Y" also accepted
$! (optional - defaults to "N")
$!
$! P2 - number of servers to be checked (incremented from 1)
$! (optional - defaults to 1)
$!
$! Define temporary filenames
$ define showout mrmemo$dir:show_state.lis.
$ define stateout mrmemo$dir:state.lis.
$ define substateout mrmemo$dir:substate.lis.
$!
$! determine value of parameters
$!
$ msg_to_sysmgr = 0
$ if "''p1'" then msg_to_sysmgr = 1
$ no_of_servers = 1
$ if "''p2'" .nes. "" then no_of_servers = f$integer(p2)
$!
$! Loop no of servers
$!
$ cnt = 1
$ loop:
$ create showout
$ define/user sys$output showout
$ mcr mrmman.exe show 'cnt'/debug
$!
$! Get current state and substate. If search fails it indicates server is not
$! started at all. If that's the case we abandon the check.
$!
$ search showout "Current state:"/out=stateout
$ if $status .nes. "%X00000001" then goto status_ok
$ search showout "Current substate:"/out=substateout
$!
$! extract and check status of server, beginning with substate
$! 1130 and 1170 both mean that the bits for "waiting for commit"
$! and "MEMO bid rejected" are set simultaneously. They never should be
$! if main state is idle.
$!
$ open substatein substateout
$ read substatein substate_str
$ close substatein
$ substate = f$extract (35,4,substate_str)
$ if ("''substate'" .nes. "1130") .and. ("''substate'" .nes. "1170") then -
goto status_ok
$!
$! hit on substate. check whether main state is "idle"
$!
$ open statein stateout
$ read statein state_str
$ close statein
$ state = f$extract (31,4,state_str)
$ if f$edit(state,"UPCASE") .nes. "IDLE" then goto status_ok
$!
$! Server is hanging! We must Stop and restart
$!
$! Find pid of MRMEMO Server process
$!
$ context = ""
$ prcloop:
$ pid = f$pid(context)
$ if pid .eqs. "" then goto restart
$ prcnam = f$getjpi (pid, "prcnam")
$ if prcnam .nes. "MRMEMO Server ''cnt'" then goto prcloop
$!
$! Stop Server
$!
$ stop/id='pid'
$!
$! Restart Server
$!
$ restart:
$ mcr mrmman.exe start 'cnt'
$!
$! Announce that we have restarted server
$!
$ write sys$output "**********************************************************"
$ write sys$output "MRMEMO Server ''cnt' stopped and restarted because Current"
$ write sys$output "Server state indicated hung process"
$ write sys$output "**********************************************************"
$!
$! If required, send a simple VMSmail to System Manager
$!
$ if msg_to_sysmgr then mail nl: system/subject=-
"MRMEMO Server ''cnt' stopped and restarted after hanging state detected"
$!
$ status_ok:
$ cnt = cnt + 1
$!
$! endloop. once more?
$!
$ if cnt .le. no_of_servers then goto loop
$!
$!
$! Delete temp files and get out
$!
$ delete 'f$trnlnm("showout")'*
$ delete 'f$trnlnm("stateout")'*
$ delete 'f$trnlnm("substateout")'*
$ deassign showout
$ deassign stateout
$ deassign substateout
$exit
|