
Conference smurf::dec_mls_plus

Title: dec_mls_plus
Moderator: SMURF::BAT
Created: Mon Nov 29 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 534
Total number of notes: 2544

527.0. "Oracle: V4.0A porting problem: hung processes" by SMURF::BAT (Segui la tua beatitudine) Mon Jun 02 1997 20:16

    Oracle called.  When they run more than one Oracle database at a
    time, the databases end up hung after several hours.  This doesn't
    happen on MLS+ V3.1A; it happens only on their V4.0A port.
    
    The hung processes all appear to be blocked in thread_block, called
    from msg_dequeue.  I've asked him to send me the stack trace of each one.
    
    Is it possible that something is not thread safe?  How do we go about
    finding it?
527.1. "From Norman. Chris called later -- still have to call back" by SMURF::BAT (Segui la tua beatitudine) Wed Jun 04 1997 00:35 (71 lines)
From:	US2RMC::"NLIANG@us.oracle.com" "NLIANG.US.ORACLE.COM"  2-JUN-1997 16:56:02.03
To:	smurf::bat
CC:	
Subj:	stack of hung problem

 
Barbara, 
 
Here is the stack trace of the hung process: 
 
[2] record output foo.tmp (0 lines) 
(dbx) 
 
(dbx) (dbx) >
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
   1 msg_dequeue(0xfffffc0001dcea80, 0xfffffc0007f9aa48, 0xffffffff80309020, 0xffffffff883738e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
   2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
   3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]

    [ another dbm: ]

(dbx) (dbx) >
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
   1 msg_dequeue(0xfffffc0003232a80, 0xfffffc000423a048, 0xffffffff80309020, 0xffffffff882538e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
   2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
   3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]

    [ yet another dbm hung on the same system: ]

(dbx) (dbx) >
   0 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2063, 0xfffffc00002ac7ac]
   1 msg_dequeue(0xfffffc0005c71500, 0xfffffc000656e4a8, 0xffffffff80309020, 0xffffffff8816f8e0, 0x0) ["../../../../src/kernel/kern/ipc_basics.c":869, 0xfffffc0000299cf0]
   2 msg_receive_trap(0x3ffc0182c50, 0x3ffc0187700, 0x3ffc0082620, 0x1400e5710, 0x400) ["../../../../src/kernel/kern/ipc_basics.c":1235, 0xfffffc000029a3b4]
   3 _Xsyscall(0x8, 0x3ff8053ea44, 0x3ffc017b660, 0x1400e5710, 0x400) ["../../../../src/kernel/arch/alpha/locore.s":1333, 0xfffffc000051adf8]
(dbx) 

527.2. "next step" by SMURF::BAT (Segui la tua beatitudine) Wed Jun 04 1997 00:41 (13 lines)
    I spoke to Kris this morning to get a pointer to the right place to ask
    about how to find this.  She said:
    
    1.	Dave Long is the right guy to ask about thread issues.
    
    2.	Try building them a debug kernel:
    
    Ask them what software options they have in their kernel (ask them for
    their conf file), then build them a genvmunix with those options,
    with CFLAGS=-g3 and no optimizations, and send it to them.
    
    Then, the next time these dbm processes hang, they should force a crash
    and send the crash dump files here.
527.3. "now to build a debug kernel" by SMURF::BAT (Segui la tua beatitudine) Thu Jun 05 1997 16:42 (2 lines)
    Norman sent me the kernel options list; I've archived it in ~ftp/pub/oracle
    
527.4. "what is dlm?" by SMURF::BAT (Segui la tua beatitudine) Thu Jun 05 1997 17:01 (5 lines)
    In further discussion of this hang, Norman said that he had had to
    remove the calls to the lock manager that their code was using, because
    "they couldn't find the 'Digital Lock Manager' code in V4... e.g.,
    /usr/include/sys/dlm.h and the dlm_detach, etc., routines... where are
    they?"
527.5. "I put the files in ~ftp/pub/oracle" by SMURF::BAT (Segui la tua beatitudine) Thu Jun 05 1997 17:30 (21 lines)
From:	US2RMC::"NLIANG@us.oracle.com" "NLIANG.US.ORACLE.COM"  3-JUN-1997 22:23:47.95
To:	smurf::bat
CC:	ASCHEN@us.oracle.com, NLIANG@us.oracle.com
Subj:	Re: RE: multiple dbm hangs on thread_block



The hang still persists, and I cannot relate it to any potential Oracle
problem. The only thing I can think of now is a latch problem, since latches
are implemented in assembly language and the assembler ("as") seems to output
a significantly smaller .o file for me; on 3.1, I got a much bigger object file.
 
Anyway, I've attached the files you need. 
 
(I need more semaphores in order to run several databases together.)
 
Norman Liang 
Oracle Corporation
527.6. "but it is hanging on creat syscall" by SMURF::BAT (Segui la tua beatitudine) Thu Jun 05 1997 17:32 (15 lines)
From:	US2RMC::"NLIANG@us.oracle.com" "NLIANG.US.ORACLE.COM"  3-JUN-1997 22:28:01.30
To:	smurf::bat
CC:	
Subj:	Re: RE: multiple dbm hangs on thread_block



One more thing you need to know: we're using a multiprocessor machine, and
it's more likely to hang when I'm using Oracle's Parallel Query Option.
 
Norman Liang 
Oracle Corporation
527.7. "got to get this to norman" by SMURF::BAT (Segui la tua beatitudine) Fri Jun 06 1997 00:53 (3 lines)
    I just found out that the dlm code is in the kernel, for TruClusters;
    I still don't know where the pool is.