First, remember that we're talking about Digital UNIX (and DEC OSF/1) prior
to Digital UNIX 4.0...
> 1, Exactly what happens?
> 2, Exactly what doesn't work?
> 3, Doesn't the .so module automatically map in libpthreads.so and its
> dependencies?
I'll take these together, rather than out of order, since the real answer
comes out of question 3.
Yes, .so files store dependencies, and all libraries on which they're
dependent will be mapped when they're loaded. So if the standard "ls",
completely thread-unaware, ends up dlopen-ing your threaded library, the
loader says "ah ha!" and maps in libpthreads.so, libmach.so, and libc_r.so.
But it maps them AFTER all the libraries that are already loaded, libc included.
Because libc was completely thread-unaware, we had libc_r, which preempts a
bunch of libc entry points to make them thread-safe. For bizarre historical
reasons I couldn't possibly justify since I argued strongly against them,
some thread-safe functions (notably fork() and malloc()/free()/etc.) were not
even in libc_r, but in libpthreads. Because all of these libraries are mapped
"at the end" of the dependency list, they are unable to preempt all of those
unsafe libc entry points. The result is a program that simply isn't
thread-safe, which means that just about anything can happen.
And changing the loader so that it DID preempt the libc entry points wouldn't
be any better. There's already existing static state in libc, and libc_r is
not a clean and perfect replacement (it's more of a blast shield to keep the
worst of the impact of threads away from poor defenseless libc). You'd have
allocated memory (from malloc) that now couldn't be freed, stdio streams that
don't exist, or have different state, and so forth.
Bummer.
> 4, Why does the main program cc command need a -threads if it doesn't
> reference anything thread like?
In case it's not already obvious, having the main program built with -threads
ensures that the libraries are pulled in IN THE RIGHT ORDER, regardless of
any shared library dependencies.
> 5, How in general is a main program meant to know that some .so it uses
> has threads calls in it?
You just gotta know.
Hey, at least we've fixed most of the problem in 4.0, except that exceptions
don't quite work if the main program wasn't built with -lexc (a problem that
would be VERY easy for the development environment people to solve, since
libexc preempts only one libc symbol). And of course code built without
_REENTRANT would use the wrong errno -- which may or may not be a problem
(and we'd like to get this solved with the compiler folks, but we haven't
worked out a good method yet).
Solaris, by the way, has this problem in spades, with no solution in sight
(and no sign that anyone even wants to solve it). They didn't think of
something like TIS, so they made libc thread-safe by making direct calls to
thread synchronization functions. They made it work without the thread
library by putting stubs for all of these functions into libc. So threads
simply don't work, at all, even potentially, if you load the thread library
after libc. Period, end of game.
/dave
.2> So is there any easy way to tell from a core dump or from the behaviour
.2> of a customer program that it was linked properly?
You might try asking the debugger where the fork() function (or any of several
others) is: if it shows up in libc then things are bad; if it shows up in
libpthreads things are, well, worse? ;-)
.2> I presume that not linking it properly leads to flaky problems like
.2> corrupted malloc's etc from the use of non thread safe functions from
.2> libc.
Well, it leaves the application open to reentrancy problems, which typically
result in corruption-type problems.
.2> it seems to me the Solaris approach is preferable to the
.2> pre V4.0 Digital approach because the failure mode is HARD rather than
.2> flaky failures that burn up large amount of support time to find.
No, I think the net result on Solaris is the same -- you've got a threaded
process using unsafe functions.
Webb