| Hi,
I have seen this problem on three different BRS clusters after upgrading them
to BRS V1.4. In fact, it is reproducible on all of them.
Here is what I have figured out so far; it may or may not be 100% correct. The
BRS procedures are quite complex, and it took me a considerable amount of time
to dig in and work around this (and other) bugs.
If you leave the OMS menu in any way other than the 'correct way'
( File -> Exit ), the mailbox assigned to that process remains allocated and
the BRS$SENTRY process keeps writing to it. After some time, depending on the
size of the mailbox, the mailbox is full and the writing process (BRS$SENTRY)
goes into RWMBX.
This scenario happens quite often if, for example, an operator ends his
X-session ( Session -> End session ) without exiting the OMS menu first.
(By the way, there are other possible problems as well, such as huge
DECW$SERVER_0_ERROR.LOG files filling up the system disk, if the OMS menu is
not exited before the session ends.)
A serious side effect of this: while the BRS$SENTRY process hangs, no
OMS-station failover is possible if the primary one fails.
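
To check whether a station is in the state described above, something like
the following could be used. This is only a sketch: it relies on the
BRS$OMSMBX_x logical names and the upper bound of 50 used in the procedure
further down, and it only reports, it changes nothing.

```dcl
$! Sketch only: list the BRS mailbox logicals and the reference count
$! of the mailbox each one points to.
$ i = 0
$ next_mbx:
$ i = i + 1
$ if i .gt. 50 then exit
$ mbx = F$TRNLNM("BRS$OMSMBX_''i'")
$ if mbx .eqs. "" then goto next_mbx
$! Skip logicals whose mailbox device no longer exists
$ if .not. F$GETDVI(mbx, "EXISTS") then goto next_mbx
$ refcnt = F$GETDVI(mbx, "REFCNT")
$ write sys$output "BRS$OMSMBX_''i' -> ''mbx'  REFCNT=''refcnt'"
$ goto next_mbx
```

SHOW SYSTEM on the same node should show BRS$SENTRY in state RWMBX once one of
these mailboxes is full.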
My workaround:
I have a batch process running that checks the OMS station at regular
intervals for processes in RWMBX (and other SUSP) states. If one is found and
the process name is BRS$SENTRY, the appended command procedure is triggered.
The procedure deassigns the BRS$OMSMBX_x logical and cleans up the mailbox.
Afterwards the BRS$SENTRY process continues to work.
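
The check itself could look roughly like the sketch below. It is NOT the
SNS$CHECK_PROCESSES_STATES procedure mentioned in the header of the appended
file, just an illustration of the idea. It assumes the process name is exactly
BRS$SENTRY and that F$GETJPI reports the mailbox wait either as RWMBX or as
the generic MWAIT (which of the two you get may depend on the VMS version, so
both are tested). WORLD privilege is needed to look at other users' processes.

```dcl
$! Sketch only: look for a BRS$SENTRY process stuck in a resource wait
$! and, if one is found, run the clean-up procedure.
$ set noon
$ ctx = ""
$ junk = F$CONTEXT("PROCESS", ctx, "PRCNAM", "BRS$SENTRY", "EQL")
$ scan:
$ pid = F$PID(ctx)
$ if pid .eqs. "" then goto done
$ state = F$EDIT(F$GETJPI(pid, "STATE"), "TRIM,UPCASE")
$! Assumption: a full mailbox shows up as RWMBX or as MWAIT
$ if state .eqs. "RWMBX" .or. state .eqs. "MWAIT" then -
     @SCHEDULER_COM:CHECK_ORPHAN_BRS_MAILBOXES.COM
$ goto scan
$ done:
$! Release the search context if it still exists
$ if F$TYPE(ctx) .eqs. "PROCESS_CONTEXT" then -
     junk = F$CONTEXT("PROCESS", ctx, "CANCEL")
$ exit
```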
Maybe there is another, less complex, way. But that was the only one I found so
far to keep the OMS-stations running without constantly rebooting them.
Maybe you need to IPMT this.
regards,
Bernd
================================================================================
$! File : SCHEDULER_COM:CHECK_ORPHAN_BRS_MAILBOXES.COM
$! Date : 08.10.96 B.Oberle
$! Usage: release BRS connections to mailboxes which are no longer in
$! use (maybe due to improper BRS-menu exit)
$! Note : called by SYS$SYSDEVICE:[SNS$WATCHDOG]SNS$CHECK_PROCESSES_STATES
$!------------------------------------------------------------------------------
$! History:
$! ========
$!
$!------------------------------------------------------------------------------
$!
$ set noon
$ counter = 0
$!
$ NODE = F$GETSYI("NODENAME")
$ if node .NES. F$trnlnm("BRS$PRIMARY_OMS") then goto not_prim_oms
$!
$ loop:
$! -----
$!
$ counter = counter + 1
$ if counter .GT. 50 then goto end_run
$ mba_dev_name = F$trnlnm("BRS$OMSMBX_''counter'")
$ if mba_dev_name .EQS. "" then goto loop
$!
$ if F$getdvi(mba_dev_name,"REFCNT") .NE. 2 then goto loop
$!
$ write sys$output "==> orphan BRS mailbox found (''mba_dev_name') -- deleting logical ..."
$ deassign/system/user BRS$OMSMBX_'counter'
$!
$! #### now empty the mailbox ####
$! This is done in a batch job because the emptying procedure hangs (it waits
$! on the read) once the mailbox is empty; the job is deleted after 10 seconds
$!
$ submit -
        /noident -
        /param=("''mba_dev_name'") -
        /queue=sys$batch -
        /noprint -
        /nolog -
        scheduler_com:SUB_EMPTY_ORPHAN_BRS_MAILBOXES.COM
$!
$ wait ::10
$ delete/entry='$ENTRY'
$!
$ goto loop
$!
$ not_prim_oms:
$! -------------
$!
$ write sys$output ""
$ write sys$output "==> This procedure runs on the primary OMS node only !!!"
$ write sys$output ""
$!
$ end_run:
$! --------
$!
$ exit
$!
================================================================================
$! File : SCHEDULER_COM:SUB_EMPTY_ORPHAN_BRS_MAILBOXES.COM
$! Date : 08.10.96 B.Oberle
$! Usage: drain the mailbox by copying its content to SYS$OUTPUT (the job is
$! submitted /NOLOG, so the output goes nowhere)
$! Note : do not run interactively -- called by CHECK_ORPHAN_BRS_MAILBOXES.COM
$!------------------------------------------------------------------------------
$! History:
$! ========
$!
$!------------------------------------------------------------------------------
$!
$ if p1 .EQS. "" then exit
$!
$ loop:
$! -----
$ copy 'p1' sys$output:
$ goto loop
|