Conference clt::cma

Title:DECthreads Conference
Moderator:PTHRED::MARYSTEON
Created:Mon May 14 1990
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1553
Total number of notes:9541

1528.0. "Thread missing from "ladebug show thread" ?" by MUFFIT::gerry (Gerry Reilly) Thu Apr 17 1997 14:26

I am trying to help a partner debug a deadlock in their application, and I 
want to understand something I currently don't...

This is on Digital UNIX V4.0A with the latest pthread patches applied.

Using ladebug's "show thread" command, I am seeing a list of threads that does
not include thread 7 and shows thread 3 as terminated.  Thread 3 I believe
I understand: it has been cancelled.  However, thread 7 I find puzzling
because 'where thread all' shows a stack for this thread.

My question: should I expect "show thread" to show all the threads?  If not,
under what conditions are threads left out?

Any insight greatly appreciated.

-gerry

(ladebug) show thread
Thread State      Substate        Policy     Priority Name
------ ---------- --------------- ---------- -------- -------------
>*  -3 running                    idle        0       null thread for VP 0x1
     1 blocked    timed cond wait throughput 11       default thread
    -1 blocked    kernel          fifo       32       manager thread
    -2 running                    idle        0       null thread for VP 0x0
     2 blocked    timed cond wait throughput 11       <anonymous>
     4 blocked    kernel          throughput 11       <anonymous>
     5 blocked    cond wait       throughput 11       <anonymous>
     6 blocked    kernel          throughput 11       <anonymous>
     8 blocked    mutex wait      throughput 11       <anonymous>
     3 terminated                 throughput 11       <anonymous>

(ladebug) where thread all
Stack trace for thread -3
>0  0x240d8a8c in nxm_idle(0x1, 0x14004630, 0x63cd4b70, 0x140045c8, 0x240beb74, 0x63cd37f0) DebugInformationStrippedFromFile19:???
#1  0x240c6ba0 in vpIdle(0x240beb74, 0x63cd37f0, 0x240b97e4, 0x140045c8, 0x63cceca8, 0x140045c8) DebugInformationStrippedFromFile110:???
#2  0x240b97e0 in UnknownProcedure7FromFile98(0x0, 0x0, 0x240c07d0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile98:???
#3  0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

Stack trace for thread 1
#0  0x240c8a78 in /usr/shlib/libpthread.so
#1  0x240b2d20 in /usr/shlib/libpthread.so
#2  0x240b0634 in dspDispatch(0x63ccf720, 0x0, 0x63ccf528, 0x0, 0x63ccf720, 0x11fff640) DebugInformationStrippedFromFile89:???
#3  0x240ab478 in cvTimedWait(0x7fff1910, 0x7fff1908, 0x63ccf750, 0x63ccf528, 0x0, 0x1400bb60) DebugInformationStrippedFromFile1:???
#4  0x240a961c in __pthread_delay_np(0x7ff8cd30, 0x63ccc120, 0x100000, 0x63ccf528, 0x3354edb4, 0x12028cc8) DebugInformationStrippedFromFile1:???
#5  0x24099084 in pthread_delay_np(0x100000, 0x63ccf528, 0x3354edb4, 0x12028cc8, 0x6fe7f868, 0x6ff384a0) DebugInformationStrippedFromFile7:???
#6  0x6fe7f864 in /opt/cics/lib/libcicsco.so
#7  0x6ff384a8 in CICSH_Suspend(0x6ff384ac, 0x2710, 0x0, 0x200000, 0x6ffe6cc4, 0x2710) DebugInformationStrippedFromFile2:???
#8  0x6ffe6cc0 in TerSH_Emulation(0xfffffffe, 0x36304100, 0x12003264, 0x0, 0x7ffd2ec0, 0x1) DebugInformationStrippedFromFile10:???
#9  0x1200332c in main(0x0, 0x140009818, 0x1, 0x45586732, 0x3, 0x140023400) DebugInformationStrippedFromFile1:???

Stack trace for thread -1
#0  0x240d8a44 in msg_receive_trap(0x14017680, 0x500, 0x240ba52c, 0x0, 0x63cd37f0, 0x1) DebugInformationStrippedFromFile19:???
#1  0x240cf200 in msg_receive(0x63cd37f0, 0x63cd4c50, 0x7, 0x500, 0x63cd6120, 0x9a97c3354e88f) DebugInformationStrippedFromFile6:???
#2  0x240b8ecc in UnknownProcedure3FromFile98(0x0, 0x0, 0x1, 0x45586732, 0x3, 0x0) DebugInformationStrippedFromFile98:???
#3  0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

Stack trace for thread -2
#0  0x240d8a8c in nxm_idle(0x1, 0x14004098, 0x63cd4b70, 0x14004030, 0x240beb74, 0x63cd37f0) DebugInformationStrippedFromFile19:???
#1  0x240c6ba0 in vpIdle(0x240beb74, 0x63cd37f0, 0x240b97e4, 0x14004030, 0x63cceca8, 0x14004030) DebugInformationStrippedFromFile110:???
#2  0x240b97e0 in UnknownProcedure7FromFile98(0x0, 0x0, 0x240c07d0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile98:???
#3  0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

Stack trace for thread 2
#0  0x240c8a78 in /usr/shlib/libpthread.so
#1  0x240b2d20 in /usr/shlib/libpthread.so
#2  0x240b0634 in dspDispatch(0x1403bb80, 0x0, 0x1403c3b0, 0x0, 0x1403bb80, 0x140479e8) DebugInformationStrippedFromFile89:???
#3  0x240ab478 in cvTimedWait(0x514c4, 0x0, 0x14032730, 0x1403c3b0, 0x0, 0x14037b80) DebugInformationStrippedFromFile1:???
#4  0x240a94bc in __pthread_cond_timedwait(0x14032730, 0x1403c3b0, 0x0, 0x14037b80, 0x240938c0, 0x2) DebugInformationStrippedFromFile1:???
#5  0x240938bc in ptdexc_cond_timedwait(0x240938c0, 0x2, 0x24132c1c, 0x64334050, 0x3354e890, 0x2564a5a8) DebugInformationStrippedFromFile4:???
#6  0x24132c18 in UnknownProcedure0FromFile2(0x1, 0x63ccf490, 0x63cd6120, 0x0, 0x3354e890, 0x2564a5a8) DebugInformationStrippedFromFile2:???
#7  0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

Stack trace for thread 4
#0  0x243ebff0 in /usr/shlib/libc.so
#1  0x24096e48 in __sigwait(0x0, 0x0, 0x0, 0x0, 0x6fe80a50, 0x0) DebugInformationStrippedFromFile6:???
#2  0x6fe80a4c in /opt/cics/lib/libcicsco.so
#3  0x6ff394f0 in TerSH_SignalInit(0x0, 0x0, 0x240c07d0, 0x6ff39448, 0x1407ce30, 0x0) DebugInformationStrippedFromFile3:???
#4  0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

Stack trace for thread 5
#0  0x240c8a78 in /usr/shlib/libpthread.so
#1  0x240b2d20 in /usr/shlib/libpthread.so
#2  0x240b0634 in dspDispatch(0x1403bc40, 0x100000, 0x14080180, 0x1, 0x1403bc40, 0x240abc80) DebugInformationStrippedFromFile89:???
#3  0x240ac288 in cvWait(0x14092910, 0x1, 0x14032ee0, 0x1407f270, 0x14080150, 0x14080180) DebugInformationStrippedFromFile1:???
#4  0x240a9504 in __pthread_cond_wait(0x14032ee0, 0x1407f270, 0x14080150, 0x14080180, 0x2409395c, 0x2) DebugInformationStrippedFromFile1:???
#5  0x24093958 in ptdexc_cond_wait(0x14080150, 0x14080180, 0x2409395c, 0x2, 0x2413c2b0, 0x240ac6d8) DebugInformationStrippedFromFile4:???
#6  0x2413c2ac in UnknownProcedure8FromFile17(0x0, 0x0, 0x642b08d0, 0x7ffd2e80, 0x140e5428, 0x1407f270) DebugInformationStrippedFromFile17:???
#7  0x2413d2c0 in rpc__cthread_stop_all(0x241429f4, 0x140e5428, 0x6430e028, 0x64334698, 0x64333a30, 0x140e5428) DebugInformationStrippedFromFile17:???
#8  0x241429f0 in rpc_server_listen(0x140e52f8, 0x140e52f8, 0x3, 0x45586732, 0x3, 0x11fff328) DebugInformationStrippedFromFile22:???
#9  0x6ff38ee4 in TerSH_RPCInit(0x0, 0x0, 0x1, 0x45586732, 0x3, 0x0) DebugInformationStrippedFromFile3:???
#10 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

Stack trace for thread 6
#0  0x243b1a68 in /usr/shlib/libc.so
#1  0x241486e8 in UnknownProcedure5FromFile25(0x1, 0x63ccf490, 0x63cd6120, 0x0, 0x0, 0x0) DebugInformationStrippedFromFile25:???
#2  0x2414846c in UnknownProcedure4FromFile25(0x0, 0x0, 0x240c07d0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile25:???
#3  0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

Stack trace for thread 8
#0  0x240c8a78 in /usr/shlib/libpthread.so
#1  0x240b2d20 in /usr/shlib/libpthread.so
#2  0x240b0634 in dspDispatch(0x14005d10, 0x14082ee8, 0x14082d70, 0x7ffd2e80, 0x1403b940, 0x0) DebugInformationStrippedFromFile89:???
#3  0x240b4120 in pthread_mutex_block(0x0, 0xb, 0x0, 0x7ffd2e80, 0x1403b940, 0x0) DebugInformationStrippedFromFile96:???
#4  0x240c8870 in __pthread_mutex_lock(0xb, 0x0, 0x7ffd2e80, 0x1403b940, 0x0, 0x24099c00) DebugInformationStrippedFromFile112:???
#5  0x24099bfc in pthread_mutex_lock(0x7ffd2e80, 0x1403b940, 0x0, 0x24099c00, 0x6ff3a070, 0x0) DebugInformationStrippedFromFile7:???
#6  0x6ff3a06c in /opt/cics/lib/librcsco.so
#7  0x6ff39ae0 in TerSH_RSend(0x14086600, 0x48384a0054414843, 0x42495254384a0048, 0x6574786961004848, 0x6d72, 0x742f7665642f0000) DebugInformationStrippedFromFile4:???
#8  0x6ff3f9b0 in UnknownProcedure1FromFile15(0x6430e028, 0x1414d950, 0x1, 0x45586732, 0x3, 0x1) DebugInformationStrippedFromFile15:???
#9  0x24188358 in /usr/shlib/libdce.so
#10 0x2413b4f0 in UnknownProcedure2FromFile17(0x14082d70, 0x14092910, 0x1, 0x63ccf490, 0x63cd6120, 0x0) DebugInformationStrippedFromFile17:???
#11 0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x1, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

Stack trace for thread 7
#0  0x240c8a78 in /usr/shlib/libpthread.so
#1  0x240b2d20 in /usr/shlib/libpthread.so
#2  0x240b0634 in dspDispatch(0x14006290, 0x14082988, 0x14082810, 0x14082810, 0x1403bc40, 0x0) DebugInformationStrippedFromFile89:???
#3  0x240b4120 in pthread_mutex_block(0x1, 0x240aa94c, 0x14082810, 0x1, 0x14082810, 0x63cd37f0) DebugInformationStrippedFromFile96:???
#4  0x240c8870 in __pthread_mutex_lock(0x240aa94c, 0x14082810, 0x1, 0x14082810, 0x63cd37f0, 0x63cd4b70) DebugInformationStrippedFromFile112:???
#5  0x63cd4b6c in ???

Stack trace for thread 3
#0  0x2425fa98 in UnknownProcedure7FromFile331(0x14091a18, 0x14091a18, 0x240c0efc, 0x2425f238, 0x14087950, 0x0) DebugInformationStrippedFromFile331:???
#1  0x240c0b64 in thdBase(0x0, 0x0, 0x0, 0x3, 0x45586732, 0x3) DebugInformationStrippedFromFile102:???

(ladebug) detach
(ladebug) quit

1528.1. "#7 is dead...it's an ex-thread!" by WTFN::SCALES (Despair is appropriate and inevitable.) Thu Apr 17 1997 14:54

I suspect that the problem is not that "show thread" is missing a thread;
rather I'd bet it's that "where thread all" is showing you one that it
shouldn't.

I suspect that thread 7 has terminated; however, unlike thread 3, thread 7
has been reclaimed (because some thread already joined with it or because it
was explicitly detached).  What "where thread all" is showing you for thread
7 is a cached thread "corpse", with an inconsistent execution context (no
surprise, since the thread is dead!).
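To make the difference between thread 3 and thread 7 concrete, here is a minimal POSIX threads sketch (not taken from the partner's code) of the two ways a terminated thread gets reclaimed: by a join, or by having been explicitly detached. The names `worker` and `demo_reclaim` are made up for illustration. Until the join, a terminated thread stays allocated, as thread 3 does; once it has been joined, or has terminated after being detached, its ID is stale, which is the state Webb suspects thread 7 is in.

```c
#include <pthread.h>
#include <stdint.h>

/* Start routine: the thread terminates as soon as it returns. */
static void *worker(void *arg)
{
    return arg;
}

/* Hypothetical demo. Returns the joined thread's result, or -1 on error. */
long demo_reclaim(void)
{
    pthread_t joined, detached;
    void *result = NULL;

    /* Case 1: reclaimed by a join. Between worker() returning and the
       pthread_join() call, the thread is terminated but not yet
       reclaimed (compare thread 3, listed as "terminated"). */
    if (pthread_create(&joined, NULL, worker, (void *)(intptr_t)42) != 0)
        return -1;
    if (pthread_join(joined, &result) != 0)
        return -1;
    /* From here on, 'joined' is a stale ID and must not be used again. */

    /* Case 2: reclaimed because it was explicitly detached. No join is
       needed (or allowed); the thread is freed when it terminates, and
       'detached' becomes stale at that point. */
    if (pthread_create(&detached, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_detach(detached) != 0)
        return -1;

    return (long)(intptr_t)result;
}
```

Any tool that caches a thread ID across either kind of reclamation is holding exactly the sort of "corpse" described above.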


				Webb

1528.2. by DCETHD::BUTENHOF (Dave Butenhof, DECthreads) Thu Apr 17 1997 19:05

Yes. I recently discovered that, prior to 4.0D, some of the pthreaddebug
thread information functions (such as the "get registers" call that ladebug
uses to start a "where") didn't reject a terminated thread ID. As a result,
they'd return misleading data.

On the other hand, ladebug tracks thread activation and termination to keep
its own list of threads up to date -- so I don't know why "where thread
all" thought there was a thread 7.

If you're using a version of ladebug earlier than 4.0-35 (and especially if
you've got one earlier than 4.0-30), you should update. (Of course, if this
is being done at "a partner"'s site, I don't know whether that's necessarily
possible.) There have been a lot of thread-related problems fixed in ladebug,
though I don't know all the details or have any idea whether this might be
one. But in general, threaded debugging will go much more smoothly with the
latest ladebug.

It's also possible that there's a DECthreads problem, and somehow ladebug
isn't getting all the termination events -- if I have some time at some point
I might try to provoke that to check, but, no guarantees. It might well prove
tricky to deliberately catch things in the right state...

	/dave

1528.3. "Hmmm...did you attach to that process at all?" by PTHRED::PORTANTE (Peter Portante, DTN 381-2261, (603)881-2261, MS ZKO2-3/Q18) Thu Apr 17 1997 19:56

Gerry,

Did you have ladebug run the program from the beginning or attach to the program
after it had started to run?

-Peter

1528.4. "Thanks" by MUFFIT::gerry (Gerry Reilly) Fri Apr 18 1997 12:14

Thanks for all the help and hints.

I will upgrade the system to 4.0-35 and see if I get some more useful
information.  

Currently, they are attaching to the process rather than starting the
process under the debugger.  Really can't change this because provoking
the hang in a reasonable time (like 12 hrs) requires 60 instances of
the process to be run; they have no idea which ones of the 60 will
hang, and therefore they would need to start all 60 under ladebug...

-gerry

1528.5. by DCETHD::BUTENHOF (Dave Butenhof, DECthreads) Fri Apr 18 1997 12:55

The reason Pete asked about attaching is that he, a ladebug developer, and I
were talking about this note yesterday in the hall. Our consensus was that
the "missing thread" may be completely innocuous (and irrelevant) if you
attached. ladebug relies on two mechanisms to keep an internal list of
threads up to date: first, it iterates through all "known threads", and then
it keeps the list current by tracking the activation and termination of
threads.

However, there's a small window after the termination event where the
terminating thread is still "known" to our scheduler. Thus, if you attached
AFTER thread 7 "terminated" but before it went away, ladebug would add the
thread to its list, but would not know to remove it. Because of the bug in
the old pthreaddebug library, it didn't get an error when asking for that
thread's registers later, and showed a bogus stack -- but pthreaddebug
ignored the bogus thread ID for "show thread".
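A toy model may help show the window described above. This is a hypothetical sketch, not DECthreads or ladebug code: the arrays, state names, and functions (`debugger_attach`, `thread_terminate`, `thread_free`, and so on) are invented. The debugger seeds its list from whatever the scheduler knows at attach time, then follows only ACTIVATE and TERMINATE events. If the attach lands after a thread's TERMINATE but before the thread is freed, the entry is added and never removed.

```c
#include <stdbool.h>

#define MAX_THREADS 16

/* Hypothetical model only; not DECthreads or ladebug code. */
enum thread_state { T_UNUSED, T_CREATED, T_ACTIVE, T_TERMINATED };

static enum thread_state sched[MAX_THREADS]; /* scheduler's view */
static bool dbg_list[MAX_THREADS];           /* debugger's private list */
static bool attached = false;

/* A thread is "known" from creation until it is freed. */
static bool sched_knows(int id) { return sched[id] != T_UNUSED; }

/* On attach: copy every thread the scheduler currently knows. */
static void debugger_attach(void)
{
    for (int id = 0; id < MAX_THREADS; id++)
        dbg_list[id] = sched_knows(id);
    attached = true;
}

static void thread_activate(int id)   /* emits ACTIVATE */
{
    sched[id] = T_ACTIVE;
    if (attached) dbg_list[id] = true;
}

static void thread_terminate(int id)  /* emits TERMINATE */
{
    sched[id] = T_TERMINATED;          /* still known until join/detach */
    if (attached) dbg_list[id] = false;
}

static void thread_free(int id)       /* joined or detached: reclaimed */
{
    sched[id] = T_UNUSED;              /* no event the debugger follows */
}

/* True if the debugger ends up listing an ID the scheduler has freed. */
bool stale_entry_after_late_attach(void)
{
    sched[7] = T_CREATED;
    thread_activate(7);
    thread_terminate(7);  /* thread 7 finishes...                      */
    debugger_attach();    /* ...the debugger attaches in the window... */
    thread_free(7);       /* ...then thread 7 is joined and freed.     */
    return dbg_list[7] && !sched_knows(7);
}
```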

In 4.0D, ladebug will receive an error when trying to get the registers... if
nothing else changes, it should be prepared to deal with this situation by
updating the thread list.
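Continuing the same kind of toy model (again hypothetical: `get_registers`, `live`, and `where_thread_all` are invented names, not the pthreaddebug interface), the 4.0D behavior amounts to the registers query rejecting a dead ID, with the debugger responding by pruning its list instead of unwinding a bogus stack:

```c
#include <stdbool.h>

#define MAX_THREADS 16

/* Hypothetical stand-ins; not the real pthreaddebug interface. */
static bool live[MAX_THREADS];      /* IDs the thread library considers valid */
static bool dbg_list[MAX_THREADS];  /* the debugger's possibly stale list */

/* Fixed behavior: reject an ID that no longer names a live thread,
   rather than returning leftover register contents. */
static int get_registers(int id, unsigned long *pc)
{
    if (id < 0 || id >= MAX_THREADS || !live[id])
        return -1;
    *pc = 0x1000UL + (unsigned long)id;  /* dummy register value */
    return 0;
}

/* "where thread all": walk the list and drop entries the library rejects.
   Returns the number of stacks that would be shown. */
int where_thread_all(void)
{
    int shown = 0;
    for (int id = 0; id < MAX_THREADS; id++) {
        unsigned long pc;
        if (!dbg_list[id])
            continue;
        if (get_registers(id, &pc) != 0) {
            dbg_list[id] = false;  /* stale entry: prune, print nothing */
            continue;
        }
        shown++;                   /* a real debugger would unwind from pc */
    }
    return shown;
}
```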

However, it occurs to me that ladebug is really tracking the wrong events. I
think it's tracking ACTIVATE and TERMINATE, which would mean it doesn't know
about threads that have been created but haven't yet run, and it has
forgotten about threads that have terminated but haven't yet been joined or
detached. It should probably be tracking CREATE and FREE events, instead. (If
a FREE event has already been issued for thread 7 when you attach, that
thread will not be "known" to our scheduler.)
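For comparison, here is a hypothetical model that tracks CREATE and FREE instead, as suggested above. All names are invented for illustration. Because a thread is on the debugger's list for exactly as long as the scheduler knows it, a late attach can no longer leave a stale entry, and a thread that has been created but has not yet run is listed too:

```c
#include <stdbool.h>

#define MAX_THREADS 16

/* Hypothetical model only; not DECthreads or ladebug code. */
static bool sched_known[MAX_THREADS];  /* known from CREATE until FREE */
static bool dbg_list[MAX_THREADS];
static bool attached = false;

static void debugger_attach(void)
{
    for (int id = 0; id < MAX_THREADS; id++)
        dbg_list[id] = sched_known[id];
    attached = true;
}

static void thread_create(int id)      /* emits CREATE */
{
    sched_known[id] = true;
    if (attached) dbg_list[id] = true;
}

static void thread_free(int id)        /* emits FREE after join or detach */
{
    sched_known[id] = false;
    if (attached) dbg_list[id] = false;
}

/* Activation and termination change thread state but not the list. */

/* True if the lists agree after the same late-attach sequence that left
   a stale thread 7 entry under ACTIVATE/TERMINATE tracking. */
bool lists_agree_after_late_attach(void)
{
    thread_create(7);
    /* thread 7 activates and terminates here: no list change needed */
    debugger_attach();    /* attach while 7 awaits join or detach */
    thread_create(9);     /* created but not yet run: still listed */
    thread_free(7);       /* join or detach: FREE removes it */
    for (int id = 0; id < MAX_THREADS; id++)
        if (dbg_list[id] != sched_known[id])
            return false;
    return dbg_list[9] && !dbg_list[7];
}
```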

In any case, all of this has nothing to do with your partner's deadlock.

	/dave

1528.6. by MUFFIT::gerry (Gerry Reilly) Tue Apr 22 1997 13:38

Well, thanks for all the input.  With metering enabled and the latest
ladebug, we've found the partner's deadlock.  It is occurring while they
are handling an exception.  That problem can now be fixed.

However, while their system is in the 'hung' state with ladebug attached
and the information from metering available, is there any data available
to me regarding where the thread was when the DECthreads exception was
raised?

The output for Thread 1 from 

(ladebug) pthread "threads -af"
main thread 1 (blocked, timed cond wait) "default thread" (0x63ccf528), created
    by pthread
  Waiting on condition variable 4 using mutex 17; timeout at 
    Mon Apr 21 12:09:01 1997
  Scheduling: throughput policy at priority 11
  Masked signals: none
  Pending signals: none
  Object flags: none; self flags: delay; sched flags: none; mutex flags: none;
    atomic flags: none
  Thread specific data: 0=0x63ccf980, 1=0x140dbc20, 4=0x140b3b40, 5=0x1412bac0,
     6=0x1412be80
  Stack: 0x11fff300; base is 0x11fffffff, guard area at 0x4000000
  General cancelability enabled, asynch cancelability disabled
  Current vp is 1, synch port is 14, vp ID is 13
  Join uses mutex 16 and condition variable 3; wait uses mutex 17 and
    condition variable 4
  The thread's start function and argument are unknown
  The thread's latest errno is 22, the last DECthreads exception caught was
    "exception formatting NYI" (status exception 0x16c9a016 [02662320026])
  The thread has mutexes locked: 47, 48, 89

<<info for other threads deleted, as it looks uninteresting...>>

Thanks.

-gerry

1528.7. "The origination point is not recorded." by WTFN::SCALES (Despair is appropriate and inevitable.) Tue Apr 22 1997 14:34

.6> is there any data available to me regarding where the thread was when
.6> the DECthreads exception was raised?

No.  We haven't come up with a feasible way of recording and reporting that
information.

Part of the problem is that the PC at which the exception was originally
raised may not be the one you're interested in, anyway -- what you want is
the deepest PC in _your_code_ which raised or tried to handle the exception! 
:-) And, obviously, there's no way for DECthreads to record that (other than
recording the whole stack, and that's not warranted in a "production"
application).

If you have the luxury of running the application under the debugger (which
you don't in this case), you can set a breakpoint in the "raise" routine in
libexc (exc_raise()?) and check the stack at the point where the exception
originates.  (However, remember that a number of facilities, DECthreads
included, raise exceptions as a part of -normal- operation...)


				Webb

1528.8. "Thanks." by MUFFIT::gerry (Gerry Reilly) Tue Apr 22 1997 15:21

Thought that might be the answer, but it was always worth asking.

Thanks as always.

-gerry