
Conference chefs::mdf

Title:Business Recovery Server
Notice:
Moderator:CHEFS::MCCAUGHAN_S
Created:Tue Nov 19 1991
Last Modified:Fri May 09 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:254
Total number of notes:980

250.0. "brs$sentry processes in rwmbx" by COMICS::GLEDHILL () Tue Mar 04 1997 22:23

I wonder if anyone can help here. I've been asked to look at a problem at a UK
site where the BRS processes have locked up. I don't normally support BRS, so I
have no idea how the processes relate to each other, and in particular how the
OMS and BRS processes communicate. Any pointers to documentation on this would
also be appreciated.

This involves a pair of nodes, each with a brs$sentry process stuck in MBXFUL
(RWMBX). The mailboxes are brs$omsmbx_2 (on vomsah) and brs$omsmbx_1 (on vomsch).
Both are full (731 messages), but no other process on the system has a channel
assigned to them, so I can't be sure what should be reading them.

The question is: which processes are supposed to be reading them? In one of the
dumps, the only other BRS process present is BRS$server_xxx, and there is no
OMS process at all.

On each node, the brs$server_xxxx processes seem to be talking to each other
over DECnet.

I wonder whether the deadlock could be caused by the link being closed in one
direction, i.e.:

links on vomsah (33.720)

 Local ID    XWB     State   Node   Remote ID    PID      Remote User
 --------    ---     -----   ----   ---------    ---      -----------
*  16452   836C5F80   run   33.719     16497   00120083   SYSTEM          
   8583    836C4B80   run   33.11         32   0001000F   BRIAN_T         


on vomsch (33.719)

 Local ID    XWB     State   Node   Remote ID    PID      Remote User
 --------    ---     -----   ----   ---------    ---      -----------
   24617   836A1E80   run   33.720      8197   001F0051   SYSTEM          
*  16497   837360C0   clo   33.720     16452   00010023   BRS$SERVER      
   293     83768600   run   33.719       294   00040089   X$X0            
   294     83768800   run   33.719       293   0001001B   SYSTEM          
   16807   8369F980   run   33.11         31   0001000F   BRIAN_T         

Thanks for any help.

Regards DG.
250.1. "workaround" by TIMABS::OBERLE () Thu Mar 06 1997 11:36 (121 lines)
Hi,

I have seen this problem on three different BRS clusters after upgrading them to
BRS V1.4. In fact, it is reproducible on all of them.

Here is what I have figured out so far; it may or may not be 100% correct. The
BRS procedures are quite complex, and it took me a considerable amount of time
to dig in and work around this (and other) bugs.

If you leave the OMS menu without using the 'correct way' (File -> Exit), the
mailbox assigned to that process remains allocated and continues to be filled
by the BRS$SENTRY process. After some time, depending on the size of the
mailbox, the mailbox becomes full and the writing process (BRS$SENTRY) goes
into RWMBX.

This scenario happens quite often if, for example, an operator ends his X
session (Session -> End session) without exiting the OMS menu first.

(BTW, there are other possible problems too, such as huge DECW$SERVER_0_ERROR.LOG
files filling up the system disk if the OMS menu is not exited before the
session ends.)

A serious side effect of this: if the BRS$SENTRY process hangs, no OMS-station
failover is possible if the primary one fails.

My workaround:
I have a batch process that checks the OMS station at regular intervals for
processes in RWMBX (and other SUSP) states. If one is found and the process
name is BRS$SENTRY, the appended command procedure is triggered. The procedure
deassigns the BRS$OMSMBX_x logical and cleans up the mailbox. Afterwards the
BRS$SENTRY process continues to work.
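Before deassigning or draining anything, it may help to confirm which mailbox
is affected. The following is a minimal, read-only DCL sketch. It is
hypothetical and is not part of BRS or of the procedures appended to this note.
It assumes, as CHECK_ORPHAN_BRS_MAILBOXES.COM does, that the BRS$OMSMBX_n
logicals translate to mailbox device names and that n does not exceed 50. It
only prints each mailbox's REFCNT (the reference count that procedure tests),
so you can compare the value on an affected node with the value on a healthy one.

```dcl
$! SHOW_BRS_MAILBOXES.COM  (hypothetical name; read-only diagnostic sketch)
$! Lists each BRS$OMSMBX_n logical, the mailbox device it translates to,
$! and that device's reference count. Nothing is deassigned or emptied.
$! Assumption: n runs from 1 to 50, as in CHECK_ORPHAN_BRS_MAILBOXES.COM.
$  set noon
$  n = 0
$ next_mbx:
$  n = n + 1
$  if n .GT. 50 then exit
$  mbx = F$TRNLNM("BRS$OMSMBX_''n'")
$  if mbx .EQS. "" then goto next_mbx                  ! logical not defined
$  if .NOT. F$GETDVI(mbx,"EXISTS") then goto next_mbx  ! no such device
$  refcnt = F$GETDVI(mbx,"REFCNT")
$  write sys$output "BRS$OMSMBX_''n' -> ''mbx'  REFCNT = ''refcnt'"
$  goto next_mbx
```

Because the sketch never writes to a logical name table or reads from a mailbox,
it is safe to run interactively on a live OMS node.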

Maybe there is another, less complex way, but that was the only one I have found
so far to keep the OMS stations running without constantly rebooting them.
You may need to IPMT this.

regards,
Bernd

================================================================================


$! File :       SCHEDULER_COM:CHECK_ORPHAN_BRS_MAILBOXES.COM
$! Date :       08.10.96  B.Oberle
$! Usage:       release BRS connections to mailboxes that are no longer in
$!              use (possibly due to an improper BRS-menu exit)
$! Note :       called by SYS$SYSDEVICE:[SNS$WATCHDOG]SNS$CHECK_PROCESSES_STATES
$!------------------------------------------------------------------------------
$! History:
$! ========
$!
$!------------------------------------------------------------------------------
$!
$  set noon
$  counter = 0
$!
$  NODE         = F$GETSYI("NODENAME")
$  if node .NES. F$trnlnm("BRS$PRIMARY_OMS") then goto not_prim_oms
$!
$  loop:
$! -----
$!
$  counter = counter + 1
$  if counter .GT. 50 then goto end_run
$  mba_dev_name = F$trnlnm("BRS$OMSMBX_''counter'")
$  if mba_dev_name .EQS. "" then goto loop
$!
$  if F$getdvi(mba_dev_name,"REFCNT") .NE. 2 then goto loop
$!
$  write sys$output "==> orphan BRS mailbox found (''mba_dev_name') -- deleting logical ..."
$  deassign/system/user BRS$OMSMBX_'counter'
$!
$! ####  now empty the mailbox  ####
$! This is done in batch because COPY hangs once the mailbox is empty;
$! the batch job is therefore deleted again after 10 seconds.
$!
$  submit -
        /noident                        -
        /param=("''mba_dev_name'")      -
        /queue=sys$batch                -
        /noprint                        -
        /nolog -
        scheduler_com:SUB_EMPTY_ORPHAN_BRS_MAILBOXES.COM
$!
$  wait ::10
$  delete/entry='$ENTRY'
$!
$  goto loop
$!
$  not_prim_oms:
$! -------------
$!
$  write sys$output ""
$  write sys$output "==>  Procedure run on primary OMS-node only !!!"
$  write sys$output ""
$!
$  end_run:
$! --------
$!
$  exit
$!


================================================================================


$! File :       SCHEDULER_COM:SUB_EMPTY_ORPHAN_BRS_MAILBOXES.COM
$! Date :       08.10.96  B.Oberle
$! Usage:       write content of mailbox to null-device
$! Note :       do not run interactively -- called by CHECK_ORPHAN_BRS_MAILBOXES.COM
$!------------------------------------------------------------------------------
$! History:
$! ========
$!
$!------------------------------------------------------------------------------
$!
$  if p1 .EQS. "" then exit
$!
$  loop:
$! -----
$  copy 'p1' sys$output:
$  goto loop
250.2. "QAR #272 entered, pls generate an IPMT" by STAR::BOAEN (LANclusters/VMScluster Tech. Office) Thu Mar 06 1997 17:30 (11 lines)
I've entered a QAR against BRS for this problem. I recommend you enter an
IPMT referencing the QAR so that it gets visibility as a field problem.

The QAR is in the BRS database:

QAR # St Sev Pub Cat Maintainer   Component  T Entered-by   Date in
----- -- --- --- --- ------------ ---------- - ------------ -----------
00272 OP  S  Yes     JNEDOROSCIK  BRS        M BOAEN         6-MAR-1997
IMPROPER EXIT FROM OMS MENU RESULTS IN BRS$SENTRY PROCESES IN RWMBX

'Gards, Verell Boaen
250.3. "JNEDEROSCIK no longer on the project." by CHEFS::PADDICK (Michael Paddick - BRS Bristol, UK) Fri Mar 07 1997 14:14 (2 lines)
    Just a quick note to point out that JNEDEROSCIK is no longer working on
    BRS. Hopefully someone else in BRS engineering will take over...
250.4. by SNOFS1::CHURCHILLJ () Thu Mar 13 1997 04:58 (1 line)
    Or perhaps CA will take it over :-0