
Conference help::osi_appl_support

Title:HELP::OSI_APPL_SUPPORT
Notice:Please read note 1.0
Moderator:DRAGNS::MILLERCOM::S_WATTUM
Created:Mon Aug 30 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:516
Total number of notes:2729

510.0. "ACCVIO in OSAK DECnet Plus 7.1." by FRAIS::CELLAM () Wed Apr 23 1997 14:23

    Scott,
    
    	I wanted to send you a mail but nothing works at the moment.
    I have a customer running DEComni with OSAK under OpenVMS 7.1 Alpha
    and DECnet Plus 7.1, so I guess this is OSAK 3.0P, image dates are
    21-Oct-96.  The customer says that his application is access violating
    and the image pointed to is osak$share, the rel pc is 16c098.  I'm
    awaiting more details and have suggested some traces etc...  In the
    meantime, are there any fixes for an ACCVIO in this or other areas?
    
    Thanks,
    
    Chris
T.R  Title  User  Personal Name  Date  Lines
510.1  RMULAC.DVO.DEC.COM::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Wed Apr 23 1997 15:03  32 lines
I can be reached via SMTP directly at s_wattum@rmulac.dvo.dec.com -
unfortunately, the move of EDS to the Crosspoint location has managed to
totally disrupt network connectivity for my usual email address.

I'm not aware of any ACCVIO problems within OSAK that match the limited
footprint info you've provided.  Sometimes ACCVIO's in OSAK are an
application issue; OSAK does virtually no checking to ensure that various user
pointers passed in the OSAK PB are valid (no PROBER or PROBEW's) - so if the
application has messed up a PB pointer, or possibly has not correctly
initialized the parameter block, you could easily have an ACCVIO in OSAK which
is not OSAK's fault.  Further, it's entirely possible that an application error
may exist and not manifest on different versions - simply because of differences
in memory allocation or stack contents.
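
As a minimal sketch of the application-side discipline described above, the fragment below zeroes a parameter block before use and checks the pointers it carries before handing it to a library that will not check them itself. The names `demo_pb`, `demo_buffer`, `demo_pb_init` and `demo_pb_check` are hypothetical and do not reflect the real OSAK parameter block layout; only the idea (initialize everything, validate before the call) comes from the note.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a size/pointer pair; NOT the real OSAK layout. */
struct demo_buffer {
    size_t size;
    const unsigned char *pointer;
};

/* Hypothetical stand-in for a parameter block carrying user pointers. */
struct demo_pb {
    size_t pb_length;
    struct demo_buffer *local_aei;           /* caller-supplied pointer */
    struct demo_buffer *transport_template;  /* caller-supplied pointer */
};

/* Zero every field first so stale stack or heap contents never look
 * like valid pointers, then fill in the length. */
void demo_pb_init(struct demo_pb *pb)
{
    memset(pb, 0, sizeof *pb);
    pb->pb_length = sizeof *pb;
}

/* Return 1 if the pointers the callee will dereference look plausible,
 * 0 otherwise. */
int demo_pb_check(const struct demo_pb *pb)
{
    if (pb == NULL || pb->pb_length != sizeof *pb)
        return 0;
    if (pb->local_aei == NULL)
        return 0;
    if (pb->local_aei->size > 0 && pb->local_aei->pointer == NULL)
        return 0;
    return 1;
}
```

A check like this only catches null or obviously inconsistent fields; it cannot detect a pointer that is non-null but stale.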

Information that would be needed to further isolate this is:

1) The actual OSAK call that is failing.
2) a dump of the OSAK PB, along with any of the data structures pointed at
   by the PB - that is, verify all pointers, etc.
3) An OSAK_DIAG trace might also be helpful - if it happens that there is
   a diag nearby, we might be able to further isolate where in OSAK the ACCVIO
   is happening.
4) Since it is possible that OSAK could ACCVIO because of incorrectly encoded
   data, an OSAK trace might also be useful.
5) And, of course, the frequently requested IPMT.

If this doesn't do it, we might be able to get you a version of OSAK linked
DEBUG and that coupled with a process image dump would allow us to isolate
things further.


--Scott
510.2  RMULAC.DVO.DEC.COM::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Thu Apr 24 1997 13:19  6 lines
If/when you escalate the IPMT to us, could you email me directly with the
information?  Because of continuing network problems, I can't reliably get
to IPMT.

Thanks,
--Scott
510.3  "Prepare a Debug OSAK For OpenVMS Alpha 7.1."  EICSMS::CELLAM  Wed May 07 1997 12:59  20 lines
    Scott,
    
    	the customer is placing a support call now; it should be going to
    the DECnet OSI support and they will hopefully open an IPMT.  I've gone
    through the transport trace and see that the accvio occurs when an
    inbound connection request cannot be answered.  Unfortunately the CR
    messages are not being traced, so I don't know how many CRs are
    backlogged.  I see that a DR is returned with Reason 80.  Prior to this the
    last data transfer occurred 4 seconds before the accvio; this was the
    receipt of an MMS request PDU.  It is acked by transport, but I don't see
    any MMS response PDU, which is strange.  Which OSAK call is in use is
    hard to tell.  We can't use the osak_diag tracing: it slows down the
    data transfers too much, and this is bad news since we've had enough
    problems with our wide area routers causing delays and/or throwing away
    packets.  We are exchanging data between 2 railway signalling/control
    stations (one from SEL-Alcatel on an Alpha OpenVMS and the other from
    Siemens on a SCO-UNIX PC); in between the two stations is a high speed
    ICE train.
    The customer is opting for the debug image and process dump route.
    
    Chris
510.4  "SEL/Alcatel Exception Trace Log."  FRSSMS::CELLAM  Wed May 07 1997 15:46  314 lines
510.5  CANTH::WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Wed May 07 1997 16:57  17 lines
    I'm confused, you have the name of the routine which apparently called
    OSAK and the line number - why can't you tell what OSAK routine was
    called?  Do you not have access to the source for the application using
    OSAK?
    
    The virtual address looks like OSAK was called with a null pointer to a
    structure that was expected to be supplied; when OSAK tried to
    dereference the pointer and gain access to one of the members, the
    accvio happened.  Just a guess though.
    
    Valerie suggests turning on all diags *except* routine entry/exit,
    which would be a value 14 - this will allow us to look at error
    handling, and other misc. diagnostics which should not have as much
    impact on the application.
    
    --Scott
    
510.6  "VMS 7.1 OSAK Doesn't Have My IPMT Fixes."  EICSMS::CELLAM  Mon May 12 1997 13:54  25 lines
    Scott,
    
    	we just did an image date check, it appears that the V7.1 OSAK
    sharables are earlier than the one used in V6.3, I received a new
    sharable library for V6.3 in November.
    
    	With regards to the code, it's useless, the code line is for
    sys$hiber, can't tell what was happening.
    
    	I'm getting an access violation here with the image from nov,
    however, I can't see anything useful in the osak diag trace.  I'll
    send you a mail with the details.  
    
    OmniOsakDT_OpenResponder exception caught from osak_open_responder: 12
    
    14:7:20.759 Listen failed for VMD MMS_IS_LCL_103 103... 12 0 0
    %SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000000000, PS=00000000
    
    14:7:20.801 Listening for a connect request from VMD 103...
    
    This may or may not be related to the customer problem.
    
    Chris 
510.7  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Mon May 12 1997 14:24  12 lines
>    	we just did an image date check, it appears that the V7.1 OSAK
>    sharables are earlier than the one used in V6.3, I received a new
>    sharable library for V6.3 in November.

We were code frozen for 7.1 around that time, so it's likely that what we gave
you is later than what shipped with 7.1.    	

>With regards to the code, it's useless, the code line is for
>    sys$hiber, can't tell what was happening.

Which suggests that OSAK was probably executing a completion AST of some sort.

510.8  "New Image Same osak_open_responder accvio.."  EICSMS::CELLAM  Tue May 13 1997 15:15  50 lines
    Scott,
    
    	just for info, my osak_diag log file seems to always end at this
    point:-
    
    
    13-MAY-1997 16:54:00.13/RTN/Entering osak_get_event(). port = 20363864
    13-MAY-1997 16:54:00.13/RTN/Entering async_get_event()
    13-MAY-1997 16:54:00.14/RTN/Leaving async_get_event() #6
    13-MAY-1997 16:54:00.15/RTN/Leaving osak_get_event() #5, status = 44741635
    13-MAY-1997 16:54:00.17/RTN/Entering osak__async_close_ast()
    13-MAY-1997 16:54:00.18/CP/The value of port->encode_q = 0
    13-MAY-1997 16:54:00.19/CP/The value of port->transmit_q = 0
    13-MAY-1997 16:54:00.20/CP/The value of port->transmit_exp_q = 0
    13-MAY-1997 16:54:00.21/CP/The value of port->free_pb_q = 0
    13-MAY-1997 16:54:00.45/RTN/Entering osak_open_responder().
    13-MAY-1997 16:54:00.47/RTN/Entering osak__check_process_priv()
    13-MAY-1997 16:54:00.48/RTN/Leaving osak__check_process_priv() #6
    13-MAY-1997 16:54:00.49/RTN/Entering osak__check_mgmt_availability()
    13-MAY-1997 16:54:00.50/CP/Disabling ASTs. Status = 9
    13-MAY-1997 16:54:00.51/CP/The value of management availability is = 1
    13-MAY-1997 16:54:00.52/RTN/Leaving osak__check_mgmt_availability() #1
    13-MAY-1997 16:54:00.53/CP/Disabling ASTs. Status = 9
    13-MAY-1997 16:54:00.54/RTN/Entering osak__tr_open_responder()
    13-MAY-1997 16:54:00.55/ port = 20566024
    13-MAY-1997 16:54:00.57/ pb = 22835952
    13-MAY-1997 16:54:00.58/RTN/Entering tsel_known()
    
    
    The last application log is the accvio, i.e.
    
    16:53:57.450 Listen failed for VMD MMS_IS_LCL_84 84... 80119874 0 0
    %OMNI-E-LISTEN_ERR, Listen Error
    
    16:53:57.479 Listening for a connect request from VMD 84...
    
    16:54:0.117 Accepting a connect for vmd number 83
    
    OmniOsakDT_OpenResponder exception caught from osak_open_responder: 12
    
    16:54:0.601 Listen failed for VMD MMS_IS_LCL_84 84... 12 0 0
    %SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000000000, PS=00000000
    
    16:54:0.619 Listening for a connect request from VMD 84...
     Interrupt
    
    $ stop
510.9  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Tue May 13 1997 15:50  27 lines
That helps, but the ACCVIO information looks suspect to me.  I mean a PC of 0?
A VA of 0 I could maybe understand (well, almost), but not a PC of 0.  Is it
possible that the signal handler that appears to be in place is corrupting or
not providing any information except the ACCVIO status?

All this routine does is take the tsap that was specified in the PB of the
open_responder() call and compare it with a list of known tsaps to see if OSAK
might already be listening on that tsap.  Is it possible that the tsel.pointer
you are passing in (as part of the osak_paddress/osak_aei structure, the
local_aei field in the pb) has a munged pointer (maybe something got allocated
on the stack of a subroutine and is no longer valid)?

the code basically does:

for( every_known_tsap )
  if( a_known_tsap_length == the_user_tsap_length )
    status = compare_byte_by_byte( a_known_tsap with the_user_tsap )
    if( status == the_same )
       return true;
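
The pseudocode above can be sketched in C roughly as follows. This is an illustration only: `demo_tsel`, `demo_known_tsap` and `demo_tsel_known` are hypothetical names, the list is assumed to be a simple singly linked list, and the extra null checks (returning -1) are additions that the real routine evidently does not make, which is why a bad pointer on either side ends in an ACCVIO.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical size/pointer pair for a transport selector. */
struct demo_tsel {
    size_t size;
    const unsigned char *pointer;
};

/* Hypothetical entry on the list of selectors already being listened on. */
struct demo_known_tsap {
    struct demo_tsel tsel;
    struct demo_known_tsap *next;
};

/* Return 1 if user_tsel matches a known entry, 0 if it does not,
 * and -1 if a null pointer would have been dereferenced. */
int demo_tsel_known(const struct demo_known_tsap *list,
                    const struct demo_tsel *user_tsel)
{
    const struct demo_known_tsap *entry;

    if (user_tsel == NULL ||
        (user_tsel->size > 0 && user_tsel->pointer == NULL))
        return -1;                       /* bad caller data */

    for (entry = list; entry != NULL; entry = entry->next) {
        if (entry->tsel.size != user_tsel->size)
            continue;                    /* lengths differ, cannot match */
        if (user_tsel->size == 0)
            return 1;                    /* two empty selectors are equal */
        if (entry->tsel.pointer == NULL)
            return -1;                   /* corrupted list entry */
        if (memcmp(entry->tsel.pointer, user_tsel->pointer,
                   user_tsel->size) == 0)
            return 1;
    }
    return 0;
}
```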

Once you get this accvio, does every subsequent open_responder() call fail?  If
so then I would begin to suspect that OSAKs list of known tsaps was corrupted.
If not, I would begin to suspect the application using OSAK is passing a munged
pointer.

If needed, we could probably insert some additional diagnostics in this routine
to dump out the pointers being used by both OSAK and the user.
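
To illustrate the "allocated on the stack of a subroutine" case mentioned above, here is a small sketch of the broken pattern (as a comment only) and one safer alternative. The type and function names are hypothetical, and nothing here is taken from the DEComni or OSAK sources.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical selector type for illustration only. */
struct demo_tsel {
    size_t size;
    unsigned char *pointer;
};

/*
 * Broken pattern, shown only as a comment:
 *
 *     void fill(struct demo_tsel *t) {
 *         unsigned char local[2] = { '0', '1' };
 *         t->pointer = local;      // dangles as soon as fill() returns
 *         t->size = sizeof local;
 *     }
 */

/* Safer pattern: copy the bytes into heap storage that outlives the
 * caller's stack frame.  Returns 0 on success, -1 on failure. */
int demo_tsel_set(struct demo_tsel *t, const unsigned char *bytes, size_t size)
{
    unsigned char *copy;

    if (t == NULL || bytes == NULL || size == 0)
        return -1;
    copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, bytes, size);
    t->pointer = copy;
    t->size = size;
    return 0;
}

/* Release the copy once no outstanding call can still refer to it. */
void demo_tsel_clear(struct demo_tsel *t)
{
    free(t->pointer);
    t->pointer = NULL;
    t->size = 0;
}
```

The broken version may appear to work for a long time, because the old stack contents often survive until some later call happens to overwrite them.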
510.10  "Perhaps a Catching Problem, here's another problem."  EICSMS::CELLAM  Wed May 14 1997 10:44  72 lines
    Yep, the catching and raising of the errors in .8 looks suspect; I've
    asked for the code so that I can review it to see if we could better
    TRY and CATCH this error.
    
    Meanwhile I'm doing some other tests and I'm seeing a problem in
    the osak_associate_request processing: there seems to be a problem 
    getting the template; why this should start to fail now I don't
    know.
    
    Here's the OSAK diag
    
14-MAY-1997 11:30:20.56/RTN/Leaving osak_open_initiator() #8, status = NORMAL, port = 15025568
14-MAY-1997 11:30:20.57/RTN/Entering osak_associate_req(). port = 15025568 
14-MAY-1997 11:30:20.58/RTN/Entering osak__find_acse_pctxtid()
14-MAY-1997 11:30:20.59/RTN/Leaving osak__find_acse_pctxtid() #3. Status = 44740609
14-MAY-1997 11:30:20.61/RTN/Entering osak__assoc_req_copy()
14-MAY-1997 11:30:20.62/RTN/Entering osak__create_encode_block()
14-MAY-1997 11:30:20.63/RTN/Leaving osak__create_encode_block() #2
14-MAY-1997 11:30:20.64/RTN/Entering osak__tr_connect_req()
14-MAY-1997 11:30:20.65/RTN/Entering t_connect_get_template()
14-MAY-1997 11:30:20.66/CP/The transport template = IEEE
14-MAY-1997 11:30:20.67/ERR/Transport error
14-MAY-1997 11:30:20.68/RTN/Leaving osak_emaa_call() #13
14-MAY-1997 11:30:20.69/ERR/Error - EMAA call to get template's network type failed
14-MAY-1997 11:30:20.70/RTN/Leaving t_connect_get_template() #9
14-MAY-1997 11:30:20.72/RTN/Leaving osak__tr_connect_req() #7
14-MAY-1997 11:30:20.73/RTN/Entering osak__service_end()
14-MAY-1997 11:30:20.75/ERR/Error returned from the state table
14-MAY-1997 11:30:20.76/RTN/Entering osak__free_encode_block()
14-MAY-1997 11:30:20.77/RTN/Leaving osak__free_encode_block() #514936680
14-MAY-1997 11:30:20.79/RTN/Leaving osak__service_end() #3
14-MAY-1997 11:30:20.82/RTN/Leaving osak_associate_req() #3, status = 44743906

    The application log reports, again the catch has a problem, we should
    be able to continue from here:-
    
    
11:30:19.942 Requesting a connect with VMD 48...
EXC: osak dt: Fatal error associate request. code = 44743906.
        osak_status_1:  44743906	OSAK_S_TRANSERR
        osak_status_2:  0
        transport_status_1:     20	%SYSTEM-F-BADPARAM, bad parameter value
        transport_status_2:     0
Raising exception 44743906 at 
SYS$SYSDEVICE:[OMNI.LIBRARY.OMNIMMS022.TEST]OMNI_OSAK_UTIL.C;25:2200
%CMA-F-EXCCOP, exception raised; VMS condition code follows
-OSAK-E-TRANSERR, there is an error in the Transport provider
%TRACE-F-TRACEBACK, symbolic stack dump follows
  image    module    routine             line      rel PC           abs PC
 PTHREAD$RTL                                0 000000000003D09C 000000000096B09C
 CMA$RTL                                    0 00000000000341E4 00000000008F01E4
 OMNI_AST_BASIC_SHR                         0 00000000000D416C 00000000001E816C
 OMNI_AST_BASIC_SHR                         0 00000000000CB5A4 00000000001DF5A4
 OMNI_AST_BASIC_SHR                         0 00000000000CDC2C 00000000001E1C2C
 OMNI_AST_BASIC_SHR                         0 00000000000CC8AC 00000000001E08AC
 OMNI_AST_BASIC_SHR                         0 000000000021B810 000000000032F810
 OMNI_AST_BASIC_SHR                         0 000000000023CE08 0000000000350E08
 OMNI_AST_BASIC_SHR                         0 00000000000EEF9C 0000000000202F9C
 OMNI_AST_BASIC_SHR                         0 000000000023B0A0 000000000034F0A0
                                            0 FFFFFFFF800BD3A8 FFFFFFFF800BD3A8
                                            0 FFFFFFFF800A4448 FFFFFFFF800A4448
 OMNI_AST_BASIC_SHR                         0 000000000027023C 000000000038423C
 TEST_MMS_CONNECT_A  TEST_MMS_CONNECT_A  connect_a
                                        18798 0000000000000350 0000000000030350
 TEST_MMS_CONNECT_A  TEST_MMS_CONNECT_A  main
                                        19388 0000000000001B2C 0000000000031B2C
 TEST_MMS_CONNECT_A  TEST_MMS_CONNECT_A  __main
                                            0 00000000000000A4 00000000000300A4
 PTHREAD$RTL                                0 000000000004C148 000000000097A148
 PTHREAD$RTL                                0 0000000000030664 000000000095E664
                                            0 FFFFFFFF826FB0D8 FFFFFFFF826FB0D8
$
510.11  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Wed May 14 1997 12:30  16 lines
Well, the first thing to check would be to see if the template IEEE did in fact
exist.  ncl> show osi tran temp ieee all

We've never had any reported problems with the emaa routine.  If the template
does exist via the NCL SHOW command (which is all that our emaa routine is
really doing), then I would maybe check net$acp and make sure it hasn't depleted
its pagefile quota or somesuch (this is a long shot).

One thing which concerns me - it looks like you are using threads.  You do know
that OSAK is not thread safe?  The application must either have only 1 thread
which calls into OSAK, or it needs to put its own mutexes around the OSAK calls
to prevent other threads from executing in OSAK when another thread is already
in OSAK.  Failure to do this will certainly result in strange failures within
OSAK.  Does DEComni do this?
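
A minimal sketch of the "own mutexes around the OSAK calls" approach, using POSIX threads: every call into the library goes through one wrapper that holds a single process-wide lock. `demo_serialized_call`, `demo_call_fn` and `demo_fake_call` are hypothetical names; the real OSAK entry points take different arguments, and the original application used the CMA threads interface rather than this one.

```c
#include <pthread.h>

/* One process-wide lock guarding every call into the non-thread-safe
 * library. */
static pthread_mutex_t demo_osak_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical shape of a library entry point taking a parameter block. */
typedef int (*demo_call_fn)(void *pb);

/* Run one library call while holding the lock, so that no two threads
 * are ever inside the library at the same time.  Returns the callee's
 * status, or -1 if the lock could not be taken. */
int demo_serialized_call(demo_call_fn fn, void *pb)
{
    int status;

    if (pthread_mutex_lock(&demo_osak_lock) != 0)
        return -1;
    status = fn(pb);
    pthread_mutex_unlock(&demo_osak_lock);
    return status;
}

/* Stand-in for a real library call: counts how often it was entered. */
int demo_fake_call(void *pb)
{
    int *counter = pb;

    *counter += 1;
    return 1;
}
```

Because the lock is not recursive, a callback invoked from inside the library must not go back through the same wrapper.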

--Scott
510.12  "Can't See Any difference in our trace."  EICSMS::CELLAM  Wed May 14 1997 13:13  17 lines
    This version of the DEComni sharable library doesn't use threads; it
    does however, use the cma error routines.  We have another sharable
    for threads and the osak calls are mutexed.
    
    I've turned on our debug traces, however, the data that is being passed
    is always identical, with the exception of the TSAP value.  The
    template used is always the same.  I can't see what could be going
    wrong, I presume that OSAK takes our TSAPs, Template/NSAP and builds
    up an NCB Item list, the QIOW is called with IO$_ACCESS and this call
    fails, either in the status or in the IOSB.  I can't see what return 
    status/iosb codes are returned other than 20, which is passed up.
    Any pointers as to what I should look for?  I guess I could dump
    out the pb?
    
    Thanks,
              
    Chris
510.13  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Wed May 14 1997 14:34  18 lines
The BADPARAM error in the last diag trace was actually a result of a call into
SYS$EMAA; if OSAK cannot determine the type of the template (CONS, CLNS or RFC)
then it can't call QIO - so we didn't get that far.

Again, this points to bad data being passed into OSAK and not necessarily a
problem with OSAK.  The data in this case would again be the TSAP (for the
called_aei).

My suggestion for this one would be to dump out the data in the osak_paddress
structure hung off of the osak_aei pointed at by called_aei in the pb; pay
specific attention to the tsap and nsap members (that is, dump everything in
them).  Double check that IEEE does in fact exist as a transport template. 
While NCL doesn't do quite the same thing as OSAK does when talking to EMAA, if
EMAA was having a problem I would expect it to show up in both places (that is,
both with OSAK and with an NCL SHOW).

--Scott

510.14  "Error 20 Smells Like a Quota Problem, but which process."  EICSMS::CELLAM  Wed May 14 1997 14:54  10 lines
    So the template's there, net$acp's ok, process quota's ok, ncl's osi
    transport connects, max nsaps ok.  The pb block appears to be ok,
    I've printed out the sap names etc.. plus the lengths, they are ok
    too.  Changed one catch and at least for osak_associate_req we keep on
    running after an error.  It smells like a quota problem, but sda and
    sho proc say everything is ok for my users processes and net$acp.
    What about the osak process's?
    
    Chris
    
510.15  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Wed May 14 1997 14:57  20 lines
oops.  My mistake.  We aren't dealing with a tsap problem here, but a template
problem.  Sorry about that.

OSAK will return BADPARAM under the following conditions:

If any of the following pointers is null: the pointer to the template, the
pointer to the network type (if the nsap structure was messed up - not
something the user would need to worry about), or the pointer to working
memory provided by the user allocation callback routine (the user alloc
callback will be asked for 2K of memory).  The template name information is
hung off of the pb in the 'transport_template' list.  This info should be
validated.

OSAK will also return BADPARAM if the template name is not in the range of 1-512
in length. 

And we can return BADPARAM if the call to EMAA to request information on the
network type of the template failed.
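
The three conditions above can be restated as a small check. This is a sketch under assumptions, not OSAK's code: the struct and function names are hypothetical, the success value 1 is an assumption, and the EMAA query is reduced to a flag. The value 20 is the BADPARAM status shown in the application log earlier in this topic.

```c
#include <stddef.h>

#define DEMO_OK        1    /* hypothetical success value */
#define DEMO_BADPARAM  20   /* logged earlier as %SYSTEM-F-BADPARAM */

#define DEMO_TEMPLATE_MIN 1
#define DEMO_TEMPLATE_MAX 512

/* Hypothetical view of the inputs described above. */
struct demo_template_request {
    const char *template_name;  /* pointer to the template name */
    size_t template_length;     /* length of the template name */
    int *network_type;          /* where the network type is stored */
    void *work_memory;          /* memory from the user alloc callback */
    int emaa_succeeded;         /* nonzero if the management query worked */
};

/* Return DEMO_BADPARAM for any of the listed conditions, else DEMO_OK. */
int demo_check_template_request(const struct demo_template_request *req)
{
    /* Condition 1: any of the three pointers is null. */
    if (req->template_name == NULL ||
        req->network_type == NULL ||
        req->work_memory == NULL)
        return DEMO_BADPARAM;

    /* Condition 2: template name length outside 1-512. */
    if (req->template_length < DEMO_TEMPLATE_MIN ||
        req->template_length > DEMO_TEMPLATE_MAX)
        return DEMO_BADPARAM;

    /* Condition 3: the query for the template's network type failed. */
    if (!req->emaa_succeeded)
        return DEMO_BADPARAM;

    return DEMO_OK;
}
```

All three conditions collapse into the same status, which is why the caller cannot tell from the value 20 alone which one applied.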

--Scott

510.16  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Wed May 14 1997 15:02  16 lines
>    What about the osak process's?

If you're referring to osak$server or osak$netman they aren't involved at this
point.

The only other thing that comes to mind is if your allocation routine provided
on the osak_open_initiator call is failing to allocate the 2K of memory we need
for this - this would be a process issue for the process doing the
osak_associate_request.

The call to SYS$EMAA can fail which would result in a BADPARAM status - but my
memory says it won't fail because the template doesn't exist.  We may need to
perform some EMAA tracing, and I'll have to check on how to do that (or if it
can be done).

--Scott
510.17  "Pointer Names Of Interest?"  EICSMS::CELLAM  Wed May 14 1997 15:18  6 lines
    What are the names of the pointers that I need to look at i.e. for
    the network type and the working memory?
    
    Thanks,
    
    Chris
510.18  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Wed May 14 1997 15:28  20 lines
Could you try something for me?

Could you copy the file DRAGNS::NET_TEST.EXE over to your AXP system and
run it.  It will query for a template name and then will call the same routine
OSAK normally uses and will print out how we did.

You should see output like what follows - mainly I'm interested in whether the
routine returns 0 for a status (an empty line will exit you).

RMULAC $ run net_test
Enter the template name: default
GetNetworkType returned a status of 1 and a networkType of 1 for template default
Enter the template name: osit$rfc1006
GetNetworkType returned a status of 1 and a networkType of 3 for template osit$rfc1006
Enter the template name: garbage
GetNetworkType returned a status of 1 and a networkType of 0 for template garbage
Enter the template name:
510.19  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Wed May 14 1997 18:50  2 lines
I am told that SYS$EMAA uses BYTLM, so make sure your BYTLM quota is sufficient,
or try raising it.
510.20  "SYS$EMMA"  EICSMS::CELLAM  Thu May 15 1997 10:40  3 lines
    Could we have a guideline for BYTLM?  Where do I get docs on sys$emaa
    calls?  Does this call have only success/fail status codes, or is there
    something else that would be more useful?
510.21  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Thu May 15 1997 11:58  14 lines
I do not have any guidelines to supply on bytlm because it is application
dependent (# of concurrent associations, outstanding I/O requests, etc.).
All I can suggest is to increase it - I would start by doubling what you
currently have.

I'm not aware of any formal documentation on sys$emaa().  I don't know what
granularity of status codes emaa returns - however, OSAK internally maps any
return from emaa into a simple success/failure status.  This mapping is buried
inside of some kernel mode code, so we can't even place any OSAK diagnostic
tracepoints in the code.  Valerie and I were discussing whether we might be able
to return the actual emaa status as the secondary transport status in the status
block, but we aren't sure at this point whether the effort would be justified.
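
A rough sketch of the idea being discussed: keep the underlying failure in the secondary transport status instead of collapsing it. The four field names are the ones printed in the status block dumps in this topic; the struct layout, the helper names and the idea of storing the underlying code in `transport_status_2` are illustrative assumptions. The severity decoding relies on the standard OpenVMS condition value layout, where the low three bits give the severity (0 warning, 1 success, 2 error, 3 informational, 4 severe).

```c
#include <stdint.h>

/* Field names as printed in this topic; the layout is a sketch. */
struct demo_status_block {
    uint32_t osak_status_1;
    uint32_t osak_status_2;
    uint32_t transport_status_1;
    uint32_t transport_status_2;
};

/* The low three bits of an OpenVMS condition value hold its severity. */
unsigned demo_severity(uint32_t condition)
{
    return condition & 7u;
}

/* Map the severity to the letter used in messages such as
 * %SYSTEM-F-ACCVIO; values 5-7 are reserved and shown as '?'. */
char demo_severity_letter(uint32_t condition)
{
    static const char letters[] = { 'W', 'S', 'E', 'I', 'F' };
    unsigned severity = demo_severity(condition);

    return severity < 5 ? letters[severity] : '?';
}

/* Hypothetical helper: record the generic statuses and also keep the
 * underlying code, rather than discarding it. */
void demo_record_transport_error(struct demo_status_block *sb,
                                 uint32_t osak_status,
                                 uint32_t transport_status,
                                 uint32_t underlying_status)
{
    sb->osak_status_1 = osak_status;
    sb->osak_status_2 = 0;
    sb->transport_status_1 = transport_status;
    sb->transport_status_2 = underlying_status;
}
```

With the values logged in .10, 44743906 decodes to severity E and 20 to severity F, matching -OSAK-E-TRANSERR and %SYSTEM-F-BADPARAM.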

--Scott
510.22  "More Status Info Can Only Be Goodness."  EICSMS::CELLAM  Thu May 15 1997 12:57  9 lines
    Yep, it looks like it is/was a bytlm problem; this call really eats
    up bytlm.  As to whether or not it is worthwhile including the sys$emaa
    error details in the osak status codes, my personal opinion is that
    any useful status info is goodness for the product and also for the
    reduction in support costs.  This error 20 problem has eaten up lots of
    my time; now that it's overcome I can look further into the access
    violation I'm getting on the responder side.
    
    Chris
510.23  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Thu May 15 1997 13:24  2 lines
Well, I would tend to agree - it all depends on whether emaa is returning
anything useful.  We'll throw it on the todo list for a future release.
510.24  "Back to my ACCVIO in OSAK"  EICSMS::CELLAM  Thu May 15 1997 13:48  125 lines
    Great.
    
    Ok, now I've trapped an accvio in OSAK$OSAKSHR; the osak diag says
    this was the last sequence.  I'm using the debug image.  Does any of
    this help?
    
    OSAK /15-MAY-1997 15:07:56.83/RTN/Entering osak_open_responder. Parameter block = 26658352
    OSAK /15-MAY-1997 15:07:56.84/RTN/Entering osak__check_process_priv()
    OSAK /15-MAY-1997 15:07:56.85/RTN/Leaving osak__check_process_priv() #6
    OSAK /15-MAY-1997 15:07:56.86/RTN/Entering osak__check_mgmt_availability()
    OSAK /15-MAY-1997 15:07:56.87/CP/Disabling ASTs. Status = 9
    OSAK /15-MAY-1997 15:07:56.89/CP/The value of management availability is = 1
    OSAK /15-MAY-1997 15:07:56.90/RTN/Leaving osak__check_mgmt_availability() #1
    OSAK /15-MAY-1997 15:07:56.91/CP/Disabling ASTs.
    OSAK /15-MAY-1997 15:07:56.92/ Status = 9
    OSAK /15-MAY-1997 15:07:56.93/RTN/Entering osak__tr_open_responder()
    OSAK /15-MAY-1997 15:07:56.94/ port = 20560480
    OSAK /15-MAY-1997 15:07:56.95/ pb = 26658352
    OSAK /15-MAY-1997 15:07:56.97/RTN/Entering tsel_known()
    
    The calls and pb for osak_open_responder are:
    
15: 7:54.560 Listening for a connect request from VMD 8...
%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000EBF134, PS=0000001B
break on exception at SHARE$OSAK$OSAKSHR+1478964
DBG>
DBG> sho calls
 module name     routine name      line           rel PC           abs PC
 SHARE$OSAK$OSAKSHR                           0000000000000000 0000000000EBF134
 SHARE$OSAK$OSAKSHR                           0000000000000000 0000000000EAFAAC
 SHARE$OSAK$OSAKSHR                           0000000000000000 0000000000D87480
 SHARE$OMNI_AST_BASIC_SHR                     0000000000000000 00000000004F7AD0


ex pDTAssocInfo->port
OMNI_OSAK_MAIN\OSAKV30DT_OpenResponder\pDTAssocInfo->port:      0
*OMNI_OSAK_MAIN\OSAKV30DT_OpenResponder\pb
    pb_length:  300
    ws_length:  1024
    func:       1001
    tsdu_ptr:   0
    next_pb:    0
    port_id:    0
    event_type: 0
    more_flag:  0
    data_length:        0
    user_data:  0
    peer_data:  0
    acse_pci_eoc:       0
    pres_pci_eoc:       0
    status_block
        osak_status_1:  44740609
        osak_status_2:  0
        transport_status_1:     0
        transport_status_2:     0
    local_aei:  27071248
    actual_aeiid:       0
    acontext:   0
    calling_aei:        0
    called_aei: 0
    transport_template: 26709992
    protocol_versions:  0
    sconnect_id:        0
    protocol_options:   0
    segmentation:       0
    initial_serial_number:      0
    initial_tokens:     0
    functional_units:   0
    pcontext_list:      0
    pdefault_context:   0
    responding_aei:     0
    pcontext_res_list:  0
    pdefault_context_res:       0
    reject_reason:      0
    request_tokens:     0
    pcontext_del_list:  0
    pcontext_del_res_list:      0
    token_item: 0
    sync_confirm:       0
    sync_point: 0
    resync_type:        0
    pcontext_id_list:   0
    tokens:     0
    exception_reason:   0
    activity_id
        size:   0
        pointer:        0
    old_activity_id
        size:   0
        pointer:        0
    old_sconnection_id: 0
    activity_reason:    0
    abort_reason:       0
    abort_ppdu: 0
    local_abort:        0
    release_reason:     0
    release_resp_reason:        0
    action_result:      0
    redirect_state
        initiator:      0
        pm_state:       0
        filler1:        0
        filler2:        0
    process_id: 0
    process_name:       0
    pcontext_redirect_list:     0
    rcv_data_list:      0
    local_data: 0
    alloc_rtn:  4366400
    dealloc_rtn:        4366336
    alloc_param:        0
    completion_rtn:     4364968
    completion_param:   27070920
    trans_characteristics:      0
    api_version:        3
    user_context:       0
    data_separation:    0
    rfc1006_port:       0
    rfc1006_port:       0
    
510.25  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Thu May 15 1997 13:58  7 lines
Can you format the osakpb member 'local_aei'?

As I mentioned, all this routine does is examine the list of known tsaps to see
if the one you are requesting has already been opened.  Either the local_aei
structure has been corrupted or the internal list that osak maintains has.

Did you get a process dump (SET PROCESS/DUMP) - and can we examine the dump?
510.26  "Pointers look ok."  EICSMS::CELLAM  Thu May 15 1997 15:33  38 lines
    I didn't get a dump, my detached process hangs and stopping it didn't 
    produce a dump file.  However, going into the debugger again I got the
    local_aei details, they are good, I'm always using the same TSAP so
    it's either a timing problem or a corrupt internal list:-
    
*OMNI_OSAK_MAIN\OSAKV30DT_OpenResponder\pb->local_aei
    paddress
        psel
            size:       5
            pointer:    24871000
        ssel
            size:       1
            pointer:    24871104
        tsel
            size:       2
            pointer:    24871208     I looked here and this is my TSAP 01
        nsap
            next:       0
            id
                size:   6
                pointer:        24871312
            type:       1
    aetitle
        aptitle
            size:       0
            pointer:    0
        ae_qualifier
            size:       0
            pointer:    0
    aeiid
        apiid
            size:       0
            pointer:    0
        aeiid
            size:       0
            pointer:    0
DBG>
    
510.27  DRAGNS::MILLER  Valerie Miller  Thu May 15 1997 15:59  15 lines
>DBG> sho calls
>module name     routine name      line           rel PC           abs PC
>SHARE$OSAK$OSAKSHR                           0000000000000000 0000000000EBF134
>SHARE$OSAK$OSAKSHR                           0000000000000000 0000000000EAFAAC
>SHARE$OSAK$OSAKSHR                           0000000000000000 0000000000D87480
>SHARE$OMNI_AST_BASIC_SHR                     0000000000000000 00000000004F7AD0

Does doing
DBG> set image osak$osakshr
DBG> set module/all

before the "show calls" give you any more information here (like module name,
routine name, line number)?

Valerie
510.28  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Thu May 15 1997 15:59  19 lines
We really need a dump to look at.  You should be able to specify /DUMP and
/NODEBUG on the RUN command that detaches the process.  I've been looking at the
code involved some more, and OSAK makes a copy of the tsap information, so if
the info being passed via the PB was munged, it would have shown up sooner than
in the tsel_known routine.

So, we either have a corrupted known tsap list within OSAK (which we then must
ask ourselves how did that list become corrupted), or the user memory that was
allocated for the tsap block that OSAK copied the tsap into became somehow
corrupted.  In either case we need to look at a dump in order to figure out
where to look further.

Also, could you provide more of the diag information leading up to this event. 
If it is a corrupt list, it's most likely happening because of an AST routine
touching the list at the same time a non-ast routine is; and we should be able
to pick this up from the diag info.
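
To make the AST versus non-AST race concrete, here is a portable analogy rather than VMS code: POSIX signal handlers play the role of ASTs, and the non-handler code blocks delivery while it updates a shared list, much as the diag trace shows OSAK disabling ASTs. The list, the choice of SIGALRM and all `demo_` names are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical shared list that both normal code and an asynchronous
 * handler might walk. */
struct demo_node {
    int value;
    struct demo_node *next;
};

static struct demo_node *demo_head = NULL;

/* Push a node while delivery of SIGALRM is blocked, so a handler can
 * never observe a half-updated list.  Returns 0 on success. */
int demo_push(struct demo_node *node)
{
    sigset_t block, previous;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &previous) != 0)
        return -1;

    node->next = demo_head;
    demo_head = node;

    /* Restore the mask that was in effect before, rather than
     * unblocking unconditionally. */
    return sigprocmask(SIG_SETMASK, &previous, NULL);
}

/* Count the nodes currently on the list. */
size_t demo_length(void)
{
    const struct demo_node *n;
    size_t count = 0;

    for (n = demo_head; n != NULL; n = n->next)
        count++;
    return count;
}
```

If any code path updates the list without this protection, an asynchronous routine can interrupt it between the two pointer assignments and leave the list inconsistent.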


--Scott
510.29  DRAGNS::MILLER  Valerie Miller  Thu May 15 1997 16:13  5 lines
Also, are you sure you're using the debug osak$osakshr I just gave you?  I have
changed the format of the osak diagnostics, and what you show in .24 is the
older format.

Valerie
510.30  RMULAC::S_WATTUM  Scott Wattum - FTAM/VT/OSAK Engineering  Thu May 15 1997 16:46  8 lines
Valerie makes a good point.  Since you're using a DEBUG version of OSAK, you
must remove the known system image via the INSTALL utility, otherwise you will
continue to use the non-debug version of OSAK.  And remember that you just want
to remove it as a known system image - you can't make debug images known system
images.



510.31  "tsel_known"  FRSSMS::CELLAM  Tue May 20 1997 09:36  39 lines
    Scott, Valerie,
    
    	you're correct, I'd messed up the copy and installed the wrong image.
    Here's the info:-
    
    10:32:26.474 Accepted connect for vmd number 30 count 116
    %DEBUG-I-DYNIMGSET, setting image OSAK$OSAKSHR
    %DEBUG-I-DYNMODSET, setting module OSAK_TRANSPORT
    %SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000F39070, PS=0000001B
    break on exception at OSAK_TRANSPORT\tsel_known\%LINE 27285+16
    %DEBUG-W-UNAOPNSRC, unable to open source file OSAK2:[OSAK.V30_AXP_BUILDS.NEBRASKA.TEMP.SRC]OSAK_TRANSPORT_VOTS.C;1
    -RMS-F-DEV, error in device name or inappropriate device type for operation
     27285: Source line not available
DBG> sho calls
     module name     routine name      line           rel PC           abs PC
    *OSAK_TRANSPORT  tsel_known       27285       000000000000009C 0000000000F39070
    *OSAK_TRANSPORT  osak__tr_open_responder
                                      19857       0000000000000814 0000000000F23E84
     OSAK_API                                     0000000000000000 0000000000D977E4
     SHARE$OMNI_AST_BASIC_SHR                     0000000000000000 00000000004F7AD0
    
    
    
    As for a process dump, I've set the process to dump, but when I exit
    the debugger no dump file is produced.  I've also tried running
    detached; the problems occur but are caught by the condition
    handler, no dump is produced and the process continues.
    
    Chris
510.32RMULAC::S_WATTUMScott Wattum - FTAM/VT/OSAK EngineeringTue May 20 1997 13:1811
You're going to have to figure out how to get a dump.  Since it sounds like this
runs interactively, just do a RUN/NODEBUG after doing the SET PROCESS/DUMP
command.  Unless the threads stuff is catching and handling the ACCVIO signal,
then this *should* produce a .DMP file.  If it doesn't, then something is
catching the signal and you'll need to figure out how to either stop the catch,
or instruct the catcher to let a dump be produced.

The line number helps, but if this is a result of a corrupted data structure,
we're going to need the dump to try and look at what got corrupted and why.

--Scott
510.33DRAGNS::MILLERValerie MillerTue May 20 1997 17:0837
Chris,

Scott and I have taken a look based on the limited information we have at this
point.  The section of code where the ACCVIO occurs is accessing some memory
that osak allocated by calling the user's allocation routine.  Note that osak
does all of its memory management by calling the allocation and deallocation
routines supplied by the osak user.

Scott and I suspect that the problem lies with the memory management done by
the application.  Either memory has gotten corrupted, or the application has
deallocated memory and continued to access it incorrectly (memory that
subsequently got allocated to osak).

Scott has some suggestions for you to track down the application's memory
management problems.

- If you don't already, use an application-supplied memory allocation and
  deallocation routine throughout the application, including for osak memory
  allocations/deallocations.  For example, re-define malloc and free to point
  to your own routine:

  #define malloc application_malloc
  #define free   application_free

- Fill allocated and deallocated memory with a pattern, such as 0 or -1.  Scott
  especially suggests that filling with -1 on deallocation should help track
  down problems.

- Keep a list of allocated and freed memory.
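
  As a rough illustration of the three suggestions above (this is not code
  from OSAK or DEComni), a wrapper pair might look like the sketch below.  The
  names application_malloc and application_free are taken from the #define
  lines above; MAX_BLOCKS, block_table, the fill values and the -1 return from
  application_free are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BLOCKS 1024   /* assumed table size, for illustration only */
#define FILL_ALLOC 0x00   /* pattern written when a block is handed out */
#define FILL_FREE  0xFF   /* pattern written when a block is released (-1) */

struct block_record {
    void  *addr;     /* address handed to the caller */
    size_t size;     /* size the caller asked for */
    int    in_use;   /* 1 while allocated, 0 once freed */
};

static struct block_record block_table[MAX_BLOCKS];

/* Allocate, fill with FILL_ALLOC and record the block in the table. */
void *application_malloc(size_t size)
{
    int i;

    for (i = 0; i < MAX_BLOCKS; i++) {
        if (!block_table[i].in_use) {
            void *p = malloc(size);
            if (p == NULL)
                return NULL;
            memset(p, FILL_ALLOC, size);
            block_table[i].addr = p;
            block_table[i].size = size;
            block_table[i].in_use = 1;
            return p;
        }
    }
    fprintf(stderr, "application_malloc: tracking table full\n");
    return NULL;
}

/* Returns 0 on success, -1 if the pointer is not a live tracked block
   (never allocated here, or already freed). */
int application_free(void *p)
{
    int i;

    for (i = 0; i < MAX_BLOCKS; i++) {
        if (block_table[i].in_use && block_table[i].addr == p) {
            memset(p, FILL_FREE, block_table[i].size);  /* zap before release */
            block_table[i].in_use = 0;
            free(p);
            return 0;
        }
    }
    fprintf(stderr, "application_free: %p is not a live block\n", p);
    return -1;
}
```

  The malloc/free #defines must not be in effect in the source file that
  implements these wrappers, otherwise the wrappers would call themselves.
  The table is not protected against ASTs; if the application allocates at AST
  level as well, access to it would need to be guarded.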

At this point, we feel that this is most likely an application problem rather
than an osak problem.  Many other customers have been using this code with no
problems.

The process dump will be helpful in confirming this hypothesis.

Valerie
510.34RMULAC::S_WATTUMScott Wattum - FTAM/VT/OSAK EngineeringWed May 21 1997 14:2755
Some additional info...

The code in question basically compares the tsap information specified in the
local_aei field of the PB that was passed into osak_open_responder() with the
list of known tsaps that we're already listening on.  Basically the routine that
calls tsel_known() copies the local_aei information into a different data
structure which was allocated via the user supplied allocation routine.

Now, we feel that this new structure is probably ok, because tsel_known() is
basically just doing a byte by byte comparison of data - if the newly created
structure was corrupt for some reason, we would have failed when we copied the
data into it, rather than when we compared it.  In addition,
osak_open_responder() has disabled AST delivery, so we aren't dealing with an
AST coming in and zapping things.

If tsel_known() returns false, then the new structure is linked into the list of
known tsaps.  If tsel_known() returns true, then the new structure is hung off
the already existing entry (a list of lists):

TSAP1---TSAP1---TSAP1
  |
TSAP2
  |
TSAP3---TSAP3

At this point, we suspect that someone (probably the application) has
overwritten/corrupted the pointer which points to the tsap name in one of the
structures already on the known tsap list (one of the structures in the first
column).  Since the first check being done is against the length, this member
appears to be OK, but the pointer to the actual data has (from all appearances)
been zapped to zero.
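
To make the failure mode concrete, here is a minimal sketch of a list of lists
with a length-then-bytes comparison.  It is an assumption-laden reconstruction
for illustration, not the OSAK source: struct tsap_entry, its field names and
all four functions are hypothetical.  It shows why a zeroed data pointer with
an intact length gets past the first check and then faults in the byte compare,
and how an application-side check could spot such an entry beforehand.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; the real OSAK structures are not shown in this note. */
struct tsap_entry {
    size_t             tsel_len;   /* length of the TSAP selector */
    unsigned char     *tsel;       /* pointer to the selector bytes */
    struct tsap_entry *next_tsap;  /* next distinct TSAP (first column) */
    struct tsap_entry *same_tsap;  /* further entries for the same TSAP */
};

/* Length first, then a byte-by-byte compare.  If tsel is NULL but
   tsel_len still matches, the memcmp reads through address 0. */
int tsel_equal(const struct tsap_entry *a, const struct tsap_entry *b)
{
    if (a->tsel_len != b->tsel_len)
        return 0;
    return memcmp(a->tsel, b->tsel, a->tsel_len) == 0;
}

/* Walk the first column looking for an entry with the same selector. */
struct tsap_entry *tsap_find(struct tsap_entry *head, const struct tsap_entry *e)
{
    for (; head != NULL; head = head->next_tsap)
        if (tsel_equal(head, e))
            return head;
    return NULL;
}

/* Known selector: hang the entry off the existing one.
   Unknown selector: link it into the first column. */
void tsap_add(struct tsap_entry **head, struct tsap_entry *e)
{
    struct tsap_entry *known = tsap_find(*head, e);

    e->next_tsap = NULL;
    e->same_tsap = NULL;
    if (known != NULL) {
        e->same_tsap = known->same_tsap;
        known->same_tsap = e;
    } else {
        e->next_tsap = *head;
        *head = e;
    }
}

/* Return the first first-column entry whose length is non-zero but whose
   data pointer is NULL, or NULL if every entry looks consistent. */
struct tsap_entry *tsap_list_check(struct tsap_entry *head)
{
    for (; head != NULL; head = head->next_tsap)
        if (head->tsel_len != 0 && head->tsel == NULL)
            return head;
    return NULL;
}
```

A check along the lines of tsap_list_check(), run at known points, would
narrow down when the pointer gets overwritten rather than only where it is
later dereferenced.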

As Valerie mentioned, any memory that OSAK dynamically allocates came from the
user application via the callback routine supplied when the port is initially
opened.  This situation looks like the user allocation routine has provided a
block of memory to OSAK that is still in use by some other part of the
application.  This could easily be the situation if the application called
free() at some point and then continued to reference/use the memory which was
free'd, and is why we suggest providing your own alloc/dealloc 'shell' that zaps
any memory free'd to -1.  If you are using the standard malloc() and free()
functions, then this will require that you provide a "jacket" around any
allocated memory so that your free() routine can pick up the length of the
memory being free'd.  If you are using lib$get_vm/lib$free_vm then you only need
to change the zone attributes when you create the zone (btw; malloc and free do
not use lib$get_vm()/lib$free_vm()).
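
For the malloc()/free() case, the "jacket" idea could be sketched roughly as
follows.  This is only an illustration under assumptions: union jacket and the
jacket_* function names are hypothetical, and the union members other than
length exist purely to keep the user block suitably aligned.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Header stored immediately in front of every user block. */
union jacket {
    size_t length;     /* number of user bytes that follow */
    double align_d;    /* alignment padding only */
    long   align_l;    /* alignment padding only */
    void  *align_p;    /* alignment padding only */
};

/* Allocate length user bytes plus the jacket; return the user pointer. */
void *jacket_malloc(size_t length)
{
    union jacket *j;

    if (length > (size_t)-1 - sizeof *j)   /* size would overflow */
        return NULL;
    j = malloc(sizeof *j + length);
    if (j == NULL)
        return NULL;
    j->length = length;
    return j + 1;
}

/* Recover the user length from the jacket in front of the block. */
size_t jacket_length(const void *user)
{
    return ((const union jacket *)user - 1)->length;
}

/* Overwrite the user bytes with 0xFF (all bits set, i.e. -1). */
void jacket_poison(void *user)
{
    memset(user, 0xFF, jacket_length(user));
}

/* Zap the user bytes, then release the whole block including the jacket. */
void jacket_free(void *user)
{
    if (user == NULL)
        return;
    jacket_poison(user);
    free((union jacket *)user - 1);
}
```

Any code that keeps using a block after jacket_free() will then tend to see
-1 values or fault on a pointer of all ones, instead of quietly sharing the
memory with whoever gets it next.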

So at this point, we strongly suspect an application issue, and it's entirely
possible that a problem like this might not show up on one release of OpenVMS
but would on another because of underlying changes in how whatever alloc/dealloc
service you are using performed the actual memory management.  Additionally,
OSAK has not changed significantly between the 6.3 and 7.1 releases, certainly
nothing in the area of how we deal with the known tsap list, and the other
significant change was an OS upgrade, so.....

--Scott
510.35Error 20 occurs here too, then I close a port.FRSSMS::CELLAMThu May 22 1997 11:2549
    Still can't create a dump, but I've got a feeling that this is also
    the result of a sys$emma problem in a previous inbound connect request.
    I see that a previous osak_responder/get_event call is failing; the IOSB
    has a transport error and this nasty error 20 in the transport status.
    This is the responder side to my previous test program that was making
    the connect requests.  I was wondering why the emma call is necessary,
    since this program is always listening on the same tsap and template
    I'd have thought osak would know what to use.
    
    22-MAY-1997 12:09:50.27/RTN/Entering t_connect_ind()
    22-MAY-1997 12:09:50.28/EXT/Making sense mode call
    22-MAY-1997 12:09:50.29/CP/Status of Sense-mode call = 1
    22-MAY-1997 12:09:50.30/ERR/Transport error
    22-MAY-1997 12:09:50.31/RTN/Leaving osak_emaa_call() #13
    22-MAY-1997 12:09:50.33/ERR/Error in osak_emaa_call. Status = 44743906
    22-MAY-1997 12:09:50.34/RTN/Entering async_get_event()
    22-MAY-1997 12:09:50.35/RTN/Leaving async_get_event() #5
    22-MAY-1997 12:09:50.36/RTN/Entering reactivate_tsap()
    22-MAY-1997 12:09:50.37/CP/The value of tsap->connected = 0
    22-MAY-1997 12:09:50.38/CP/The value of tsap->connected = 0
    22-MAY-1997 12:09:50.39/RTN/Leaving reactivate_tsap() #5
    
    our event returns
    
    DT: ProcessIndication 12:09:51.774
    
    OSAK error 44743906 0  Transport 20 0
    
    and we then do an osak_async_close.  Just after this I print out the
    process quotas, which should have been enough.
    
    ASTcnt    =      300 Current =        121 Previous =        122
    BIOcnt    =      300 Current =        128 Previous =        129
    BYTcnt    =   299040 Current =       7520 Previous =       9440
    DIOcnt    =      300 Current =        300 Previous =        300
    ENQcnt    =     2000 Current =       1996 Previous =       1996
    FILcnt    =      300 Current =        208 Previous =        208
    PAGFILcnt =   100000 Current =      65152 Previous =      65152
    TQcnt     =      125 Current =        124 Previous =        124
    
    and my listen ast returns
    
    12: 9:54.386 Listen failed for VMD MMS_IS_LCL_36 36... 80119874 0 0
    %OMNI-E-LISTEN_ERR, Listen Error
    
    I re-issue the listen call, i.e. osak_open_responder, and we're hosed:
    the list is corrupt.  So I'm thinking that it has something to do with
    the closing of the port and maybe the freeing of buffers, either by
    omni or osak.
510.36RMULAC::S_WATTUMScott Wattum - FTAM/VT/OSAK EngineeringThu May 22 1997 12:309
Can you provide a pointer to a file with the OSAK diagnostics so that we can get
a *complete* picture of what is going on; that is, what OSAK is doing leading up
to the ACCVIO.  I'm starting to feel like we're playing 20 questions ;-)


We'll continue to review the OSAK code that manipulates the known tsap list, but
the complete diagnostics would help us a lot in narrowing this down further.

--Scott
510.37DRAGNS::MILLERValerie MillerThu May 22 1997 22:4111
    re: .35
    
>   I was wondering why the emma call is necessary,
>   since this program is always listening on the same tsap and template
>   I'd have thought osak would know what to use.
    
    OSAK is calling EMAA to find out what the transport type is (clns,
    cons, rfc1006) for the actual transport connection, because on VMS one
    listening port can listen on all types of connections.
    
    Valerie
510.38It Appears That BYTLM Is Not Returned After A Connection Abort.EICSMS::CELLAMWed May 28 1997 11:0919
    Valerie, Scott,
    
    	It appears in my test environment that when a connection is aborted
    and the OSAK port closed, the BYTLM eaten by the EMMA call is not
    returned to the process.  After a series of connection aborts and
    connection re-establishments the process runs out of BYTLM and is no
    longer able to accept further connection requests.  This is propagated
    through as a NO_EVENT, TRANS_ERR, 20; DEComni calls the osak async
    close and the access violation takes place.  I'm looking into modifying
    the DEComni state machine to ignore the above event when in a listening
    state and not to subsequently close the osak port; the osak errors will
    then be propagated into the omni_listen iosb.  I did this for the VAX
    version for Volvo and it appears that this change was not put into the
    Alpha kit; I'll have to discuss with DEComni eng why this was not done.
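
    The intended state machine change can be sketched roughly as below.
    This is not DEComni source: the enum, struct and function names are all
    hypothetical, and the sketch only shows the decision described above
    (report the error through the listen IOSB while listening, close the
    port otherwise).

```c
/* Hypothetical port states and resulting actions. */
enum port_state  { PORT_LISTENING, PORT_CONNECTED };
enum port_action { ACTION_CLOSE_PORT, ACTION_REPORT_TO_LISTEN_IOSB };

/* Hypothetical stand-in for the status block returned to the listener. */
struct listen_iosb {
    unsigned int status;            /* OSAK status */
    unsigned int transport_status;  /* transport status, e.g. 20 */
};

/* Decide what to do with a NO_EVENT / TRANS_ERR style failure. */
enum port_action on_transport_error(enum port_state state,
                                    unsigned int osak_status,
                                    unsigned int transport_status,
                                    struct listen_iosb *iosb)
{
    if (state == PORT_LISTENING) {
        /* Keep the OSAK port open; hand the error back to the caller. */
        iosb->status = osak_status;
        iosb->transport_status = transport_status;
        return ACTION_REPORT_TO_LISTEN_IOSB;
    }
    return ACTION_CLOSE_PORT;
}
```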
    
    However, I'm still concerned about the BYTLM not being returned after
    a connection is lost.
    
    Chris
510.39RMULAC::S_WATTUMScott Wattum - FTAM/VT/OSAK EngineeringWed May 28 1997 12:339
It's either EMAA or more likely OSI Transport not returning the BYTLM.

Please escalate this issue as a separate IPMT to DECnet-Plus engineering, and we
will attempt to work with them on further isolation.  Please provide them with
as much information as you can about the circumstances surrounding the bytlm
loss.

Thanks,
--Scott
510.40Ok IPMT for DECnet Plus, Now The Accvio Log.EICSMS::CELLAMWed May 28 1997 14:1228
	Scott,
    
    	I'll follow up with another IPMT to DECnet-Plus eng.  Since they
    were the culprit for the Fillm quota problem I'd say that this one is
    in their code too, perhaps as a result of a previous fix.
    
    	This is the customer accvio with the debug image, the process dump
    will follow.
    
Current PROCESS Oracle Rdb environment is version V7.0-0 (MULTIVERSION)
Current PROCESS SQL environment is version V7.0-0 (MULTIVERSION)
Current PROCESS Rdb/Dispatch environment is version V7.0-0 (MULTIVERSION)
<1> MMS-Server started at 28-MAY-1997 07:47:29.58
%SYSTEM-F-ACCVIO, access violation, reason mask=00,
 virtual address=000000000000000C, PC=0000000000A68C64, PS=0000001B
%TRACE-F-TRACEBACK, symbolic stack dump follows
  image    module    routine             line      rel PC           abs PC      
 OSAK$OSAKSHR  OSAK_TRANSPORT  t_connect_ind
                                        20776 00000000000035F4 0000000000A68C64
                                            0 FFFFFFFF800BD3A8 FFFFFFFF800BD3A8
                                            0 FFFFFFFF80001924 FFFFFFFF80001924
 MMS_SERVER  ALST  alst_bearbeiten      14208 00000000000003F8 0000000000039238
 MMS_SERVER  MMS_SERVER  main           15588 0000000000000298 00000000000307F8
 MMS_SERVER  MMS_SERVER  __main             0 0000000000000058 00000000000305B8
 PTHREAD$RTL                                0 000000000004C148 00000000004F8148
 PTHREAD$RTL                                0 0000000000030664 00000000004DC664
                                            0 FFFFFFFF8173D0D8 FFFFFFFF8173D0D8
<2> MMS-Server started at 28-MAY-1997 14:01:40.75