| Hi,
Since no one else has replied, I thought I'd better... In an ideal,
bug-less world, no one would ever see any form of "EXE ABORT" message;
however, since we live in the real world, here's my cut:
First, I'm not exactly sure what is meant by 'EXE ABORT', so
I'll take 2 general cases:
1. An executor process dies and the client gets back a -2035 (if
waiting for a response) or a -2003 (if sending a request and the
executor died after successfully returning the last response).
This situation is usually caused by a client executing a request
which causes some software on the back-end to bug-check. It could
be a SQL/Services, SQL, Rdb Dispatch or Rdb Exec component.
However, this can also happen if someone does a STOP/ID or kill -9
on an executor.
2. An executor dies and the failure is recorded in a version-dependent
server log file, but there's no client to which the error can be
returned.
This situation is usually caused by some sort of failure while
SQL/Services is trying to clean up a client connection that
terminated. Remember that SQL/Services gets notified of client
termination in one of only 2 ways:
1. The client calls sqlsrv_release, in which case the executor
receives and processes a SQL/Services Release message; and
2. Everything else. E.g., a call to sqlsrv_abort, user terminates
application, user reboots PC, user trips over power cable or
comms cable, disconnecting one or both. For all these cases,
the SQL/Services dispatcher will eventually get a network
disconnect event of some sort and tell the executor to clean up.
If for any reason the executor fails to clean up, then it will take
itself down and the comm server (<=V6.1) or monitor (>=V7.0) will write
an entry in the log file. There are three reasons why an executor will fail
to clean up:
1. Some software component bugchecks, resulting in the executor dying
involuntarily, or
2. The executor detects that cleanup has failed, so it kills itself.
This might be because it thinks there's a transaction still
active, there are statements it can't free, or it can't detach
from a database in the case of a universal service.
3. The dispatcher has requested that the executor clean up, but it can't
because it's in the middle of a database request which is stalled for
some reason (e.g., an UPDATE request waiting for a lock). In this
case, the dispatcher will nuke the executor process after waiting
30 seconds or so (I don't remember the exact number) for it to clean
up voluntarily.
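To make the three cases above concrete, here is a rough Python sketch of how the outcomes relate. It is only a model of the description in this note, not SQL/Services code: the names CleanupOutcome and classify_cleanup, the flag arguments, and the 30-second constant are all assumptions made for illustration (the real timeout is only known here as "30 seconds or so").

```python
from enum import Enum


class CleanupOutcome(Enum):
    CLEAN = "executor cleaned up and stays alive"
    BUGCHECK = "a software component bugchecked; executor died involuntarily"
    SELF_KILL = "executor detected failed cleanup and took itself down"
    DISPATCHER_KILL = "executor stalled; dispatcher killed it after the timeout"


# Assumption: the note only says "30 seconds or so".
CLEANUP_TIMEOUT_SECONDS = 30.0


def classify_cleanup(bugchecked=False, seconds_stalled=0.0,
                     txn_active=False, unfreed_statements=0,
                     detach_failed=False):
    """Map the state of a cleanup attempt to one of the outcomes in the note."""
    if bugchecked:
        # Case 1: the executor dies before it can decide anything.
        return CleanupOutcome.BUGCHECK
    if seconds_stalled >= CLEANUP_TIMEOUT_SECONDS:
        # Case 3: stuck in a database request (e.g. waiting for a lock).
        return CleanupOutcome.DISPATCHER_KILL
    if txn_active or unfreed_statements > 0 or detach_failed:
        # Case 2: cleanup ran but left something behind.
        return CleanupOutcome.SELF_KILL
    return CleanupOutcome.CLEAN


print(classify_cleanup(seconds_stalled=45.0).name)  # DISPATCHER_KILL
```

The stall check comes before the self-kill check because a stalled executor never gets far enough to inspect its own cleanup state.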
In any event, cases 1 and 2 above should not happen if everything is working
correctly. However, I believe there might still be a problem or 2 in
V6.1 where cleanup doesn't always happen correctly. I don't personally
know of any such problems with 7.0 (at least not when I last worked on
it some months ago now), but there might be something.
If clients are seeing -2035's and/or unexplained executor failures are
being logged in the monitor log file, then you should review the
specified executor log to determine the reason for the failure. Note
that the 7.0 ECO 1 software has a bunch more logging code in the
dispatcher and monitor components to record cases when clients go
away and executor processes have to be cleaned up.
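As a starting point for that triage, the two client-side codes mentioned earlier can be turned into a small lookup. This is a hypothetical helper, not part of the SQL/Services API: the function and constant names are made up, and only the two numeric codes and their meanings come from this note.

```python
# Status values as quoted in this note; the names are invented for the sketch.
EXECUTOR_DIED_WAITING = -2035   # client was waiting for a response
EXECUTOR_DIED_SENDING = -2003   # client was sending a new request

_HINTS = {
    EXECUTOR_DIED_WAITING:
        "executor died while the client was waiting for a response",
    EXECUTOR_DIED_SENDING:
        "executor died after returning its last response; "
        "the client found out when sending the next request",
}


def explain_status(status):
    """Return a one-line hint for a client status code."""
    return _HINTS.get(status, "not one of the two codes discussed in this note")


for code in (-2035, -2003):
    print(code, "->", explain_status(code))
```

Either hint only says that the executor died; the reason still has to come from the executor log.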
Hope that helps,
Si
|