[Search for users] [Overall Top Noters] [List of all Conferences] [Download this site]

Conference turris::digital_unix

Title:DIGITAL UNIX(FORMERLY KNOWN AS DEC OSF/1)
Notice:Welcome to the Digital UNIX Conference
Moderator:SMURF::DENHAM
Created:Thu Mar 16 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:10068
Total number of notes:35879

9511.0. "pthread_create and can't grow stack" by VAXRIO::63197::Meyer () Tue Apr 15 1997 21:38

Customer's program is aborting with "can't grow stack" when it executes a
"pthread_create" on OSF v3.2c.  This same application runs okay on v4.0.

I have applied the last patches OSF on v3.2c but it did not solve the problem.
Library libpthreads.so from the patch's file appears to be equal the original
one.

Any suggestions ?

Thanks.

Carlos Meyer
MCS/CSC - DEC/Brazil


T.RTitleUserPersonal
Name
DateLines
9511.1Try increasing the stack sizeWTFN::SCALESDespair is appropriate and inevitable.Wed Apr 16 1997 03:5313
Try increasing the stack size of the thread which is calling pthread_create()
(by specifying a larger value in the attributes object used to create the
thread which is experiencing the problem).

If that has no apparent effect, or if the thread which is calling
pthread_create() is the initial thread, then you need to look for problems
which would result in recursive signal delivery.  One would be if the
customer's program is trying to handle signals (i.e., SIGSEGV).  Another
would be if there were something wrong with the version of libexc on the
customer's system.


					Webb
9511.2no successVAXRIO::63197::MeyerFri Apr 18 1997 13:1827
      <<< TURRIS::DISK$NOTES_PACK2:[NOTES$LIBRARY]DIGITAL_UNIX.NOTE;1 >>>
Hi Webb,

> Try increasing the stack size of the thread which is calling  
> pthread_create()
> (by specifying a larger value in the attributes object used to create the
> thread which is experiencing the problem).

I raised stack size (pthread_attr_setstacksize) to 64K without success. Should
I try higher values ? 

> If that has no apparent effect, or if the thread which is calling
> pthread_create() is the initial thread, then you need to look for problems
> which would result in recursive signal delivery.  One would be if the
> customer's program is trying to handle signals (i.e., SIGSEGV).  Another
> would be if there were something wrong with the version of libexc on the
> customer's system.

Problem occurs when that pthread_create is called by the fourth time, and
it is easily reproduceable in my system (v3.2d1) too. I checked libexc in my
system and it is okay. A grep in the customer programs did not show the 
delivery of signals.

Do you have any other suggestions ? 

Regards,
-meyer
9511.3SMURF::DENHAMDigital UNIX KernelFri Apr 18 1997 22:133
    I seem to remember this can happen when the application is
    built wrong. It's a WAG, but how was this application compiled/linked?
    (Your usual question, Webb!)
9511.4Running out of ideas...WTFN::SCALESDespair is appropriate and inevitable.Mon Apr 21 1997 18:2119
.2> Problem occurs when that pthread_create is called by the fourth time

If you mean the call is in a loop and fails on the fourth iteration, then I'm
out of suggestions...  :-|

If you mean that the fourth call along some (possibly recursive) code path is
failing, then I'm not sure what to suggest...  :-/

.2> I raised stack size (pthread_attr_setstacksize) to 64K without success. 
.2> Should I try higher values ? 

I would have thought that 64K would have been enough, but feel free to try
larger values (try a meg! :-).  The key question is, does changing the stacksize
(make sure you change it by at least 8K each time) change the behavior at all? 
And, you -are- changing the stack size of the thread calling pthread_create(),
not the stack size used in the call which is failing...right?  :-}


				Webb
9511.5piece of codeVAXRIO::63197::MeyerTue Apr 22 1997 14:5351
    Re .3:

>>  I seem to remember this can happen when the application is
>>  built wrong. It's a WAG, but how was this application compiled/linked?

# cxx -c -g -non_shared -O -I/usr/include -I. -I../Header -I/usr/include/cxx 
`pwd`/DataOutputDriver.cxx

# cxx -o SML -O ... mainsml.o -L/usr/ccs/lib -lm -lrt -threads

   R3 .4:

>> If you mean that the fourth call along some (possibly recursive) code path 
>> is failing, then I'm not sure what to suggest...  :-/

   That's it.

>> And, you -are- changing the stack size of the thread calling
>> pthread_create(), not the stack size used in the call which is failing... 
>> right?  :-}


  pthread_attr_t attr_obj;

    pthread_attr_create(&attr_obj);
    pthread_attr_setinheritsched(&attr_obj, PTHREAD_DEFAULT_SCHED);
    pthread_attr_setprio(&attr_obj, PROCESS_PRIORITY);
    pthread_attr_setsched(&attr_obj, SCHED_FIFO);
/**********************************************************/
    pthread_attr_setstacksize(&attr_obj,64000);
/**********************************************************/ 

    OutputArg.Self = this;
    OutputArg.ip_address = (int)parameters[0];
    OutputArg.port = (int)parameters[1];
    OutputArg.type = (int)parameters[2];
    OutputArg.need_to_convert = (int)parameters[3];
    OutputArg.block_option = NON_BLOCKING;

printf("Before \n");

    status = pthread_create(&ThreadId,attr_obj,
(pthread_startroutine_t)DataOutputDriver_Communication, 
(pthread_addr_t)&(this->OutputArg));


    printf ("Status = %d\n",status);

Rgds,
-meyer.

9511.6SMURF::DENHAMDigital UNIX KernelTue Apr 22 1997 17:1610
    The "sendsig: can't grow stack..." event should leave a core
    file and should tell you what the signal was. Got any information
    like that to share?                                          
    
    It could be that the new thread was given a bad stack pointer.
    
    When it then jumps to its start routine from the kernel,
    the push onto the bogus sp would cause a seg fault. The
    kernel would try to push the signal data onto the bogus
    stack. When that fails, you get the can't grow stack message.
9511.7Wrong thread??WTFN::SCALESDespair is appropriate and inevitable.Tue Apr 22 1997 17:3934
.5> # cxx -c -g -non_shared [...]

I wouldn't think that "-non_shared" did anything on a _compile_ command line
(i.e., given that you're specifying "-c", you should never get to the point of
using the "-non_shared"), but I don't know C++ that well...

.5> # cxx -o SML -O ... mainsml.o -L/usr/ccs/lib -lm -lrt -threads

This, the link command line, is where you'd normally specify
"-non_shared"...only you're not doing it here.  Weird...


But, anyway...

.5> printf("Before \n");
.5> 
.5>     status = pthread_create(&ThreadId,attr_obj,
.5> (pthread_startroutine_t)DataOutputDriver_Communication, 
.5> (pthread_addr_t)&(this->OutputArg));
.5> 
.5>     printf ("Status = %d\n",status);

So, when you run this piece of code, you see "Before" but you don't see
"Status", right?  And therefore, you think the problem is occuring in the thread
which is calling pthread_create().  If this is the case (and not that the thread
created by this call to pthread_create() is the problem), then you are
increasing the wrong stack!  You need to find where the thread which executes
the supplied code is created, and increase ITS stack.

But, I'll second Jeff's suggestion: look in the core file and see what's
actually happening!


				Webb
9511.8SIGSEGVVAXRIO::63197::MeyerWed Apr 23 1997 16:5947
    Re .5:
>>     The "sendsig: can't grow stack..." event should leave a core
>>     file and should tell you what the signal was. Got any information
>>     like that to share?
>>     It could be that the new thread was given a bad stack pointer.
 
    Yes, I agree you. From the core file:
    signal segmentation fault at >*[thread_info ...]
 
    Re .6:

>> .5> # cxx -o SML -O ... mainsml.o -L/usr/ccs/lib -lm -lrt -threads

>> This, the link command line, is where you'd normally specify
>> "-non_shared"...only you're not doing it here.  Weird...

   I tried "-non_shared" with the linker but no success.

>> So, when you run this piece of code, you see "Before" but you don't see
>> "Status", right? And therefore, you think the problem is occuring in the
>> thread which is calling pthread_create().

   Yes, it is rigth !

>> If this is the case (and not that the thread created by this call to
>> pthread_create() is the problem), then you are increasing the wrong stack!
>> You need to find where the thread which executes the supplied code is
>> created, and increase ITS stack.

   What I understood reading customer's applications is that there is no
   creation of threads in the caller. The caller calls a function in the
   usual way, and this function creates the thread. If I understood your
   suggestion correctly I can not follow it in this case.

   Maybe some part of the code was really destroyed and that was the reason
   of the segmentation violation (SIGSEGV), and I will try to discover its
   origin. Patch 105 for v3.2c (DECthreads Memory Handling Correction)
   fits this situation very well but it did not work for v3.2c.

   Fellows, do not leave me alone. New suggestions, please !.
 
   Rgds,
   -meyer.

   BTW: stack size raised to ~128K and ... nothing changed.


9511.9-threads for everythingTUXEDO::CHUBBWed Apr 23 1997 17:085
    FWIW, if you're using -threads on the link line (cxx -o program ...)
    then you should use it for the compile as well.  That way its Draft-4
    threads all the way around.
    
    -- brandon
9511.10SMURF::DENHAMDigital UNIX KernelWed Apr 23 1997 20:422
    Anything more useful from the core file? The thread_info is
    intriguing, if it's not just random garbage.
9511.11I think we're closing in!WTFN::SCALESDespair is appropriate and inevitable.Wed Apr 23 1997 22:4728
.8> signal segmentation fault at >*[thread_info ...]

Ah-hah.  Yes, this function is called during thread creation, and it's a PIG!
(It requires 8100 bytes of stack all by itself!) 

.8> What I understood reading customer's applications is that there is no
.8> creation of threads in the caller. The caller calls a function in the
.8> usual way, and this function creates the thread. 

Yes, but, the _caller_ is a thread, too!  It was created somewhere.  Find out
where.  You should be able to get some good clues from the core file --
what's the stack trace look like at the point when the segv occurs?


.8> I tried "-non_shared" with the linker but no success.

I wouldn't expect it to change anything (on V3), I was just pointing out an
inconsistency.  (Thanks to Brandon for pointing out another one!  Yes, you
need -threads both for compilation and for linking.)

.8> Maybe some part of the code was really destroyed and that was the reason
.8> of the segmentation violation (SIGSEGV), and I will try to discover its
.8> origin.

The stack overflow is detected because it causes a segmentation violation in
a specific area adjacent to the last page of a thread's stack.  (There's no
need to try to discover its origin...)

9511.12Solved !VAXRIO::63197::MeyerThu Apr 24 1997 14:2023
    Okay Guys you were really fantastic !

    The answer to this problem were in one of the previous replies from you
    but I am not used to these "thread" things and I did not do my homework
    very well.

    Pat Lambert from the CSC of Atlanta suggested me to increase the     
    stacksize in all routines with pthread_create and it solved the problem. 
    After that I tried to increase each one of them individually and when I   
    raised the stack size of the "Manager" routine to 64k it worked fine.

    This is a fligth simulator applic and involves lots of routines and I
    just examined the routine which calls DataOutputDriver and it does not 
    have pthread_create, but the routine which calls it does, and that was
    the problem.

    There are two questions not perfectly answered yet to me.

    1) How could the coder foresee correct values for the stacksize ?
    2) Why did DU v4.0 work fine ?
    
    Thanks for everybody !
    -Meyer.
9511.13Stack space is a fixed, scarce resource.WTFN::SCALESDespair is appropriate and inevitable.Thu Apr 24 1997 20:4724
.12> There are two questions not perfectly answered yet to me.

I'll answer your second one first...

.12> Why did DU v4.0 work fine ?

Because it no longer uses the piggy function which was causing the stack
overflow, so it could get by with less stack space.

.12> How could the coder foresee correct values for the stacksize ?

In short, he can't.  With each new version of API products and system run-time
libraries, the amount of stack space a thread requires could change.  As people
gain more experience with threads, perhaps the providers will be sensitive to
stack space requirements (as we now are with interface compatibility) and will
document the requirements and/or ensure that they don't increase radically over
time.  (Or, perhaps, with some magic we can avoid the necessity of having small,
fixed-size stacks!)  For now, the coder must guess how much space a thread
needs; in the not too distant future we expect to have tools which will help
ensure the guess is accurate, but there will be a need to leave some "extra"
space for "future expansion" for some time, yet....


				Webb
9511.14thanks !VAXRIO::63197::MeyerThu Apr 24 1997 21:159
Webb, 

Due to your explanations things are more clear to me and now I fell better to
talk to the customer.

Thank you very much.

Regards,
-Meyer.