
Conference help::osi_appl_support

Title:	Please read note 1.0ELP::OSI_APPL_SUPPORT
Notice:	Please read note 1.0
Moderator:	DRAGNS::MILLERCOM::S_WATTUM

Created:	Mon Aug 30 1993
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	516
Total number of notes:	2729

510.0. "ACCVIO in OSAK DECnet Plus 7.1." by FRAIS::CELLAM () Wed Apr 23 1997 14:23

    Scott,
    
    	I wanted to send you a mail but nothing works at the moment.
    I have a customer running DEComni with OSAK under OpenVMS 7.1 Alpha
    and DECnet Plus 7.1, so I guess this is OSAK 3.0P, image dates are
    21-Oct-96.  The customer says that his application is access violating
    and the image pointed to is osak$share, the rel pc is 16c098.  I'm
    awaiting more details and have suggested some traces etc...  In the
    mean time, is there any fixes for an Accvio in this or other areas?
    
    Thanks,
    
    Chris

T.R	Title	User	Personal Name	Date	Lines
510.1		RMULAC.DVO.DEC.COM::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed Apr 23 1997 15:03`	32
	I can be reached via SMTP directly at s_wattum@rmulac.dvo.dec.com - unfortunately, the move of EDS to the Crosspoint location has managed to totally disrupt network connectivity for my usual email address.

	I'm not aware of any ACCVIO problems within OSAK that match the limited footprint info you've provided. Sometimes ACCVIOs in OSAK are an application issue; OSAK does virtually no checking to ensure that the various user pointers passed in the OSAK PB are valid (no PROBERs or PROBEWs) - so if the application has messed up a PB pointer, or possibly has not correctly initialized the parameter block, you could easily have an ACCVIO in OSAK which is not OSAK's fault. Further, it's entirely possible that an application error may exist and not manifest on different versions - simply because of differences in memory allocation or stack contents.

	Information that would be needed to further isolate this:

	1) The actual OSAK call that is failing.
	2) A dump of the OSAK PB, along with any of the data structures pointed at by the PB - that is, verify all pointers, etc.
	3) An OSAK_DIAG trace might also be helpful - if it happens that there is a diag nearby, we might be able to further isolate where in OSAK the ACCVIO is happening.
	4) Since it is possible that OSAK could ACCVIO because of incorrectly encoded data, an OSAK trace might also be useful.
	5) And, of course, the frequently requested IPMT.

	If this doesn't do it, we might be able to get you a version of OSAK linked DEBUG, and that coupled with a process image dump would allow us to isolate things further.

	--Scott
510.2		RMULAC.DVO.DEC.COM::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Thu Apr 24 1997 13:19`	6
	If/when you escalate the IPMT to us, could you email me directly with the information? Because of continuing network problems, I can't reliably get to IPMT. Thanks, --Scott
510.3	Prepare a Debug OSAK For OpenVMS Alpha 7.1.	EICSMS::CELLAM		`Wed May 07 1997 12:59`	20
	Scott, the customer is placing a support call now; it should be going to the DECnet OSI support and they will hopefully open an IPMT.

	I've gone through the transport trace and see that the accvio occurs when an inbound connection request cannot be answered. Unfortunately the CR messages are not being traced, so I don't know how many CRs are backlogged. I see that a DR is returned with Reason 80. Prior to this, the last data transfer occurred 4 seconds before the accvio: this was the receipt of an MMS request PDU, which is ack'd by transport, but I don't see any MMS response PDU, which is strange.

	Which OSAK call is in use is hard to tell. We can't use the osak_diag tracing; it slows down the data transfers too much and this is bad news, as we've had enough problems with our wide area routers causing delays and/or throwing away packets. We are exchanging data between 2 railway signalling/control stations (one from SEL-Alcatel on an Alpha running OpenVMS and the other from Siemens on a SCO-UNIX PC); in between the two stations is a high speed ICE train.

	The customer is opting for the debug image and process dump route.

	Chris
510.4	SEL/Alcatel Exception Trace Log.	FRSSMS::CELLAM		`Wed May 07 1997 15:46`	314
510.5		CANTH::WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 07 1997 16:57`	17
	I'm confused: you have the name of the routine which apparently called OSAK and the line number - why can't you tell what OSAK routine was called? Do you not have access to the source for the application using OSAK?

	The virtual address looks like OSAK was called with a null pointer to a structure that was expected to be supplied; when OSAK tried to dereference the pointer and gain access to one of the members, the accvio happened. Just a guess though.

	Valerie suggests turning on all diags except routine entry/exit, which would be a value of 14 - this will allow us to look at error handling and other misc. diagnostics, which should not have as much impact on the application.

	--Scott
510.6	VMS 7.1 OSAK Doesn't Have My IPMT Fixes.	EICSMS::CELLAM		`Mon May 12 1997 13:54`	25
	Scott, we just did an image date check; it appears that the V7.1 OSAK sharables are earlier than the one used in V6.3. I received a new sharable library for V6.3 in November. With regard to the code, it's useless: the code line is for sys$hiber, so I can't tell what was happening.

	I'm getting an access violation here with the image from November; however, I can't see anything useful in the osak diag trace. I'll send you a mail with the details.

	OmniOsakDT_OpenResponder exception caught from osak_open_responder: 12
	14:7:20.759 Listen failed for VMD MMS_IS_LCL_103 103... 12 0 0
	%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000000000, PS=00000000
	14:7:20.801 Listening for a connect request from VMD 103...

	This may or may not be related to the customer problem.

	Chris
510.7		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Mon May 12 1997 14:24`	12
	> we just did an image date check, it appears that the V7.1 OSAK
	> sharables are earlier than the one used in V6.3, I received a new
	> sharable library for V6.3 in November.

	We were code frozen for 7.1 around that time, so it's likely that what we gave you is later than what shipped with 7.1.

	> With regards to the code, it's useless, the code line is for
	> sys$hiber, can't tell what was happening.

	Which suggests that OSAK was probably executing a completion AST of some sort.
510.8	New Image Same osak_open_responder accvio..	EICSMS::CELLAM		`Tue May 13 1997 15:15`	50
	Scott, just for info, my osak_diag log file seems to always end at this point:

	13-MAY-1997 16:54:00.13/RTN/Entering osak_get_event(). port = 20363864
	13-MAY-1997 16:54:00.13/RTN/Entering async_get_event()
	13-MAY-1997 16:54:00.14/RTN/Leaving async_get_event() #6
	13-MAY-1997 16:54:00.15/RTN/Leaving osak_get_event() #5, status = 44741635
	13-MAY-1997 16:54:00.17/RTN/Entering osak__async_close_ast()
	13-MAY-1997 16:54:00.18/CP/The value of port->encode_q = 0
	13-MAY-1997 16:54:00.19/CP/The value of port->transmit_q = 0
	13-MAY-1997 16:54:00.20/CP/The value of port->transmit_exp_q = 0
	13-MAY-1997 16:54:00.21/CP/The value of port->free_pb_q = 0
	13-MAY-1997 16:54:00.45/RTN/Entering osak_open_responder().
	13-MAY-1997 16:54:00.47/RTN/Entering osak__check_process_priv()
	13-MAY-1997 16:54:00.48/RTN/Leaving osak__check_process_priv() #6
	13-MAY-1997 16:54:00.49/RTN/Entering osak__check_mgmt_availability()
	13-MAY-1997 16:54:00.50/CP/Disabling ASTs. Status = 9
	13-MAY-1997 16:54:00.51/CP/The value of management availability is = 1
	13-MAY-1997 16:54:00.52/RTN/Leaving osak__check_mgmt_availability() #1
	13-MAY-1997 16:54:00.53/CP/Disabling ASTs. Status = 9
	13-MAY-1997 16:54:00.54/RTN/Entering osak__tr_open_responder()
	13-MAY-1997 16:54:00.55/ port = 20566024
	13-MAY-1997 16:54:00.57/ pb = 22835952
	13-MAY-1997 16:54:00.58/RTN/Entering tsel_known()

	The last application log is the accvio, i.e.

	16:53:57.450 Listen failed for VMD MMS_IS_LCL_84 84... 80119874 0 0
	%OMNI-E-LISTEN_ERR, Listen Error
	16:53:57.479 Listening for a connect request from VMD 84...
	16:54:0.117 Accepting a connect for vmd number 83
	OmniOsakDT_OpenResponder exception caught from osak_open_responder: 12
	16:54:0.601 Listen failed for VMD MMS_IS_LCL_84 84... 12 0 0
	%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000000000, PS=00000000
	16:54:0.619 Listening for a connect request from VMD 84...
	Interrupt
	$ stop
510.9		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Tue May 13 1997 15:50`	27
	That helps, but the ACCVIO information looks suspect to me. I mean a PC of 0? A VA of 0 I could maybe understand (well, almost), but not a PC of 0. Is it possible that the signal handler that appears to be in place is corrupting or not providing any information except the ACCVIO status?

	All this routine does is take the tsap that was specified in the PB of the open_responder() call and compare it with a list of known tsaps to see if OSAK might already be listening on that tsap. Is it possible that the tsel.pointer you are passing in (as part of the osak_paddress/osak_aei structure, the local_aei field in the pb) has a munged pointer (maybe something got allocated on the stack of a subroutine and is no longer valid)?

	The code basically does:

	    for( every_known_tsap )
	        if( a_known_tsap_length == the_user_tsap_length )
	            status = compare_byte_by_byte( a_known_tsap with the_user_tsap )
	            if( status == the_same )
	                return true;

	Once you get this accvio, does every subsequent open_responder() call fail? If so, then I would begin to suspect that OSAK's list of known tsaps was corrupted. If not, I would begin to suspect the application using OSAK is passing a munged pointer.

	If needed, we could probably insert some additional diagnostics in this routine to dump out the pointers being used by both OSAK and the user.
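	The comparison described in .9 can be sketched in C as below. This is only an illustration, not OSAK source: the structure layouts, the list linkage and the name tsel_known_checked are assumptions made up for the example. It adds the pointer checks that the reply says OSAK does not do, so that a zeroed data pointer is reported instead of faulting at virtual address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for OSAK's internal types; the real
   layouts are not shown anywhere in this thread. */
struct tsel {
    size_t size;                    /* length of the tsap in bytes */
    const unsigned char *pointer;   /* the tsap bytes */
};

struct known_tsap {
    struct tsel tsel;
    struct known_tsap *next;
};

/* Returns 1 if the user's tsap is already on the list, 0 if it is
   not, and -1 if a pointer that should hold data is null. */
int tsel_known_checked(const struct known_tsap *list,
                       const struct tsel *user)
{
    const struct known_tsap *k;

    assert(user != NULL);           /* caller bug if this fires */
    if (user->size > 0 && user->pointer == NULL)
        return -1;                  /* user side is munged */

    for (k = list; k != NULL; k = k->next) {
        if (k->tsel.size != user->size)
            continue;               /* length check comes first */
        if (k->tsel.size > 0 && k->tsel.pointer == NULL)
            return -1;              /* list entry is munged: an
                                       unchecked compare would read
                                       address 0 here */
        if (user->size == 0 ||
            memcmp(k->tsel.pointer, user->pointer, user->size) == 0)
            return 1;
    }
    return 0;
}
```

	A -1 from a check like this would separate the two cases asked about above (bad user pointer versus corrupted known-tsap list) without needing a process dump.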
510.10	Perhaps a Catching Problem, here's another problem.	EICSMS::CELLAM		`Wed May 14 1997 10:44`	72
	Yeep, the catching and raising of the errors in .8 look suspect. I've asked for the code so that I can review it to see if we could better TRY and CATCH this error.

	Meanwhile I'm doing some other tests and I'm seeing a problem in the osak_associate_request processing: there seems to be a problem getting the template. Why this should start to fail now I don't know. Here's the OSAK diag:

	14-MAY-1997 11:30:20.56/RTN/Leaving osak_open_initiator() #8, status = NORMAL, port = 15025568
	14-MAY-1997 11:30:20.57/RTN/Entering osak_associate_req(). port = 15025568
	14-MAY-1997 11:30:20.58/RTN/Entering osak__find_acse_pctxtid()
	14-MAY-1997 11:30:20.59/RTN/Leaving osak__find_acse_pctxtid() #3. Status = 44740609
	14-MAY-1997 11:30:20.61/RTN/Entering osak__assoc_req_copy()
	14-MAY-1997 11:30:20.62/RTN/Entering osak__create_encode_block()
	14-MAY-1997 11:30:20.63/RTN/Leaving osak__create_encode_block() #2
	14-MAY-1997 11:30:20.64/RTN/Entering osak__tr_connect_req()
	14-MAY-1997 11:30:20.65/RTN/Entering t_connect_get_template()
	14-MAY-1997 11:30:20.66/CP/The transport template = IEEE
	14-MAY-1997 11:30:20.67/ERR/Transport error
	14-MAY-1997 11:30:20.68/RTN/Leaving osak_emaa_call() #13
	14-MAY-1997 11:30:20.69/ERR/Error - EMAA call to get template's network type failed
	14-MAY-1997 11:30:20.70/RTN/Leaving t_connect_get_template() #9
	14-MAY-1997 11:30:20.72/RTN/Leaving osak__tr_connect_req() #7
	14-MAY-1997 11:30:20.73/RTN/Entering osak__service_end()
	14-MAY-1997 11:30:20.75/ERR/Error returned from the state table
	14-MAY-1997 11:30:20.76/RTN/Entering osak__free_encode_block()
	14-MAY-1997 11:30:20.77/RTN/Leaving osak__free_encode_block() #514936680
	14-MAY-1997 11:30:20.79/RTN/Leaving osak__service_end() #3
	14-MAY-1997 11:30:20.82/RTN/Leaving osak_associate_req() #3, status = 44743906

	The application log reports the following. Again the catch has a problem; we should be able to continue from here:

	11:30:19.942 Requesting a connect with VMD 48...
	EXC: osak dt: Fatal error associate request. code = 44743906.
	osak_status_1: 44743906 OSAK_S_TRANSERR
	osak_status_2: 0
	transport_status_1: 20 %SYSTEM-F-BADPARAM, bad parameter value
	transport_status_2: 0
	Raising exception 44743906 at SYS$SYSDEVICE:[OMNI.LIBRARY.OMNIMMS022.TEST]OMNI_OSAK_UTIL.C;25:2200
	%CMA-F-EXCCOP, exception raised; VMS condition code follows
	-OSAK-E-TRANSERR, there is an error in the Transport provider
	%TRACE-F-TRACEBACK, symbolic stack dump follows
	image               module              routine    line   rel PC            abs PC
	PTHREAD$RTL                                            0  000000000003D09C  000000000096B09C
	CMA$RTL                                                0  00000000000341E4  00000000008F01E4
	OMNI_AST_BASIC_SHR                                     0  00000000000D416C  00000000001E816C
	OMNI_AST_BASIC_SHR                                     0  00000000000CB5A4  00000000001DF5A4
	OMNI_AST_BASIC_SHR                                     0  00000000000CDC2C  00000000001E1C2C
	OMNI_AST_BASIC_SHR                                     0  00000000000CC8AC  00000000001E08AC
	OMNI_AST_BASIC_SHR                                     0  000000000021B810  000000000032F810
	OMNI_AST_BASIC_SHR                                     0  000000000023CE08  0000000000350E08
	OMNI_AST_BASIC_SHR                                     0  00000000000EEF9C  0000000000202F9C
	OMNI_AST_BASIC_SHR                                     0  000000000023B0A0  000000000034F0A0
	                                                       0  FFFFFFFF800BD3A8  FFFFFFFF800BD3A8
	                                                       0  FFFFFFFF800A4448  FFFFFFFF800A4448
	OMNI_AST_BASIC_SHR                                     0  000000000027023C  000000000038423C
	TEST_MMS_CONNECT_A  TEST_MMS_CONNECT_A  connect_a  18798  0000000000000350  0000000000030350
	TEST_MMS_CONNECT_A  TEST_MMS_CONNECT_A  main       19388  0000000000001B2C  0000000000031B2C
	TEST_MMS_CONNECT_A  TEST_MMS_CONNECT_A  __main         0  00000000000000A4  00000000000300A4
	PTHREAD$RTL                                            0  000000000004C148  000000000097A148
	PTHREAD$RTL                                            0  0000000000030664  000000000095E664
	                                                       0  FFFFFFFF826FB0D8  FFFFFFFF826FB0D8
	$
510.11		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 14 1997 12:30`	16
	Well, the first thing to check would be to see if the template IEEE did in fact exist.

	ncl> show osi tran temp ieee all

	We've never had any reported problems with the emaa routine. If the template does exist via the NCL SHOW command (which is all that our emaa routine is really doing), then I would maybe check net$acp and make sure it hasn't depleted its pagefile quota or somesuch (this is a long shot).

	One thing which concerns me - it looks like you are using threads. You do know that OSAK is not thread safe? The application must either have only 1 thread which calls into OSAK, or it needs to put its own mutexes around the OSAK calls to prevent other threads from executing in OSAK when another thread is already in OSAK. Failure to do this will certainly result in strange failures within OSAK. Does DEComni do this?

	--Scott
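	For readers wondering what "its own mutexes around the OSAK calls" might look like, here is a minimal POSIX threads sketch. It is an assumption-laden illustration: osak_call_serialized, osak_call_fn and fake_osak_call are invented names, and the real OSAK entry points and their argument lists are not reproduced here. The idea is simply that every thread goes through one lock before entering OSAK.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One process-wide lock guarding every call into OSAK. */
static pthread_mutex_t osak_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical shape for "some function that calls into OSAK";
   a real wrapper would call the actual OSAK routine with its PB. */
typedef unsigned long (*osak_call_fn)(void *arg);

/* Run fn(arg) while holding the lock, so that only one thread at a
   time is executing inside OSAK. */
unsigned long osak_call_serialized(osak_call_fn fn, void *arg)
{
    unsigned long status;
    int rc;

    assert(fn != NULL);
    rc = pthread_mutex_lock(&osak_lock);
    assert(rc == 0);
    status = fn(arg);
    rc = pthread_mutex_unlock(&osak_lock);
    assert(rc == 0);
    return status;
}

/* Stand-in used only to exercise the wrapper: counts invocations
   and returns 1 (an odd value, i.e. "success" in VMS terms). */
unsigned long fake_osak_call(void *arg)
{
    int *count = arg;
    *count += 1;
    return 1;
}
```

	A lock like this only serializes calls made from threads. It does not by itself say anything about work OSAK does at AST level, which is a separate question raised elsewhere in this topic.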
510.12	Can't See Any difference in our trace.	EICSMS::CELLAM		`Wed May 14 1997 13:13`	17
	This version of the DEComni sharable library doesn't use threads; it does, however, use the cma error routines. We have another sharable for threads, and the osak calls are mutexed.

	I've turned on our debug traces; however, the data that is being passed is always identical, with the exception of the TSAP value. The template used is always the same. I can't see what could be going wrong. I presume that OSAK takes our TSAPs and Template/NSAP and builds up an NCB item list, the QIOW is called with IO$_ACCESS and this call fails, either in the status or in the IOSB. I can't see what return status/iosb codes are returned other than 20, which is passed up. Any pointers as to what I should look for? I guess I could dump out the pb?

	Thanks, Chris
510.13		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 14 1997 14:34`	18
	The BADPARAM error in the last diag trace was actually a result of a call into SYS$EMAA; if OSAK cannot determine the type of the template (CONS, CLNS or RFC) then it can't call QIO - so we didn't get that far. Again, this points to bad data being passed into OSAK and not necessarily a problem with OSAK. The data in this case would again be the TSAP (for the called_aei).

	My suggestion for this one would be to dump out the data in the osak_paddress structure hung off of the osak_aei pointed at by called_aei in the pb; pay specific attention to the tsap and nsap members (that is, dump everything in them).

	Double check that IEEE does in fact exist as a transport template. While NCL doesn't do quite the same thing as OSAK does when talking to EMAA, if EMAA was having a problem I would expect it to show up in both places (that is, both with OSAK and with an NCL SHOW).

	--Scott
510.14	Error 20 Smells Like a Quota Problem, but which process.	EICSMS::CELLAM		`Wed May 14 1997 14:54`	10
	So the template's there, net$acp's ok, process quotas are ok, ncl's osi transport connects, max nsaps ok. The pb block appears to be ok; I've printed out the sap names etc. plus the lengths, and they are ok too. I changed one catch, and at least for osak_associate_req we keep on running after an error. It smells like a quota problem, but sda and sho proc say everything is ok for my user processes and net$acp. What about the osak processes? Chris
510.15		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 14 1997 14:57`	20
	Oops. My mistake. We aren't dealing with a tsap problem here, but a template problem. Sorry about that.

	OSAK will return BADPARAM under the following conditions: if the pointer to the template, the pointer to the network type (if the nsap structure was messed up - not something the user would need to worry about) or a pointer to working memory provided by the user allocation callback routine are null (the user alloc callback will be asked for 2K of memory). The template name information is hung off of the pb in the 'transport_template' list. This info should be validated.

	OSAK will also return BADPARAM if the template name is not in the range of 1-512 in length.

	And we can return BADPARAM if the call to EMAA to request information on the network type of the template failed.

	--Scott
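	The user-visible conditions listed in .15 (null template pointer, name length outside 1-512) are easy to check on the application side before the call is made. The sketch below is an assumption, not OSAK's own validation code: struct name_desc and the function names are invented, and the real layout of the 'transport_template' list should be taken from the OSAK headers.

```c
#include <assert.h>
#include <stddef.h>

/* Length limits quoted in the reply above. */
enum { TEMPLATE_NAME_MIN = 1, TEMPLATE_NAME_MAX = 512 };

/* Hypothetical descriptor for a template name; the actual
   'transport_template' list entry may be laid out differently. */
struct name_desc {
    size_t size;
    const char *pointer;
};

enum template_check {
    TEMPLATE_OK,
    TEMPLATE_NULL,          /* descriptor or name pointer is null */
    TEMPLATE_BAD_LENGTH     /* length outside 1-512 */
};

/* Check the conditions an application can verify for itself. */
enum template_check check_template_name(const struct name_desc *t)
{
    if (t == NULL || t->pointer == NULL)
        return TEMPLATE_NULL;
    if (t->size < TEMPLATE_NAME_MIN || t->size > TEMPLATE_NAME_MAX)
        return TEMPLATE_BAD_LENGTH;
    return TEMPLATE_OK;
}

/* Debug-build trap to place just before the association request. */
void require_valid_template(const struct name_desc *t)
{
    assert(check_template_name(t) == TEMPLATE_OK);
}
```

	If a check like this passes and BADPARAM still comes back, the remaining causes named above are the 2K working-memory allocation and the EMAA call itself.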
510.16		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 14 1997 15:02`	16
	> What about the osak process's?

	If you're referring to osak$server or osak$netman, they aren't involved at this point. The only other thing that comes to mind is if your allocation routine provided on the osak_open_initiator call is failing to allocate the 2K of memory we need for this - this would be a process issue for the process doing the osak_associate_request.

	The call to SYS$EMAA can fail, which would result in a BADPARAM status - but my memory says it won't fail because the template doesn't exist. We may need to perform some EMAA tracing, and I'll have to check on how to do that (or if it can be done).

	--Scott
510.17	Pointer Names Of Interest?	EICSMS::CELLAM		`Wed May 14 1997 15:18`	6
	What are the names of the pointers that I need to look at i.e. for the network type and the working memory? Thanks, Chris
510.18		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 14 1997 15:28`	20
	Could you try something for me? Could you copy the file DRAGNS::NET_TEST.EXE over to your AXP system and run it? It will query for a template name and then will call the same routine OSAK normally uses and will print out how we did. You should see output like what follows - mainly I'm interested in whether the routine returns 0 for a status (an empty line will exit you).

	RMULAC $ run net_test
	Enter the template name: default
	GetNetworkType returned a status of 1 and a networkType of 1 for template default
	Enter the template name: osit$rfc1006
	GetNetworkType returned a status of 1 and a networkType of 3 for template osit$rfc1006
	Enter the template name: garbage
	GetNetworkType returned a status of 1 and a networkType of 0 for template garbage
	Enter the template name:
510.19		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 14 1997 18:50`	2
	I am told that SYS$EMAA uses BYTLM, so make sure your BYTLM quota is sufficient, or try raising it.
510.20	SYS$EMMA	EICSMS::CELLAM		`Thu May 15 1997 10:40`	3
	Could we have a guideline for BYTLM? Where do I get docs on sys$emaa calls? Does this call have only success/fail status codes, or is there something else that would be more useful?
510.21		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Thu May 15 1997 11:58`	14
	I do not have any guidelines to supply on bytlm because it is application dependent (# of concurrent associations, outstanding I/O requests, etc.). All I can suggest is to increase it - I would start by doubling what you currently have.

	I'm not aware of any formal documentation on sys$emaa(). I don't know what granularity of status codes emaa returns - however, OSAK internally maps any return from emaa into a simple success/failure status. This mapping is buried inside of some kernel mode code, so we can't even place any OSAK diagnostic tracepoints in the code. Valerie and I were discussing whether we might be able to return the actual emaa status as the secondary transport status in the status block, but we aren't sure at this point whether the effort would be justified.

	--Scott
510.22	More Status Info Can Only Be Goodness.	EICSMS::CELLAM		`Thu May 15 1997 12:57`	9
	Yeep, it looks like it is/was a bytlm problem; this call really eats up bytlm. As to whether or not it is worthwhile including the sys$emaa error details in the osak status codes, my personal opinion is that any useful status info is goodness for the product and also for the reduction in support costs. This error 20 problem has eaten up lots of my time; now that it's overcome I can look further into the access violation I'm getting on the responder side. Chris
510.23		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Thu May 15 1997 13:24`	2
	Well, I would tend to agree - it all depends on whether emaa is returning anything useful. We'll throw it on the todo list for a future release.
510.24	Back to my ACCVIO in OSAK	EICSMS::CELLAM		`Thu May 15 1997 13:48`	125
	Great. Ok, now I've trapped an accvio in OSAK$OSAKSHR; the osak diag says this was the last sequence. I'm using the debug image. Does any of this help?

	OSAK /15-MAY-1997 15:07:56.83/RTN/Entering osak_open_responder. Parameter block = 26658352
	OSAK /15-MAY-1997 15:07:56.84/RTN/Entering osak__check_process_priv()
	OSAK /15-MAY-1997 15:07:56.85/RTN/Leaving osak__check_process_priv() #6
	OSAK /15-MAY-1997 15:07:56.86/RTN/Entering osak__check_mgmt_availability()
	OSAK /15-MAY-1997 15:07:56.87/CP/Disabling ASTs. Status = 9
	OSAK /15-MAY-1997 15:07:56.89/CP/The value of management availability is = 1
	OSAK /15-MAY-1997 15:07:56.90/RTN/Leaving osak__check_mgmt_availability() #1
	OSAK /15-MAY-1997 15:07:56.91/CP/Disabling ASTs.
	OSAK /15-MAY-1997 15:07:56.92/ Status = 9
	OSAK /15-MAY-1997 15:07:56.93/RTN/Entering osak__tr_open_responder()
	OSAK /15-MAY-1997 15:07:56.94/ port = 20560480
	OSAK /15-MAY-1997 15:07:56.95/ pb = 26658352
	OSAK /15-MAY-1997 15:07:56.97/RTN/Entering tsel_known()

	The calls and pb for osak_open_responder are:

	15: 7:54.560 Listening for a connect request from VMD 8...
	%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000EBF134, PS=0000001B
	break on exception at SHARE$OSAK$OSAKSHR+1478964
	DBG>
	DBG> sho calls
	module name               routine name  line  rel PC            abs PC
	SHARE$OSAK$OSAKSHR                            0000000000000000  0000000000EBF134
	SHARE$OSAK$OSAKSHR                            0000000000000000  0000000000EAFAAC
	SHARE$OSAK$OSAKSHR                            0000000000000000  0000000000D87480
	SHARE$OMNI_AST_BASIC_SHR                      0000000000000000  00000000004F7AD0
	ex pDTAssocInfo->port
	OMNI_OSAK_MAIN\OSAKV30DT_OpenResponder\pDTAssocInfo->port: 0
	*OMNI_OSAK_MAIN\OSAKV30DT_OpenResponder\pb
	pb_length: 300
	ws_length: 1024
	func: 1001
	tsdu_ptr: 0
	next_pb: 0
	port_id: 0
	event_type: 0
	more_flag: 0
	data_length: 0
	user_data: 0
	peer_data: 0
	acse_pci_eoc: 0
	pres_pci_eoc: 0
	status_block
	osak_status_1: 44740609
	osak_status_2: 0
	transport_status_1: 0
	transport_status_2: 0
	local_aei: 27071248
	actual_aeiid: 0
	acontext: 0
	calling_aei: 0
	called_aei: 0
	transport_template: 26709992
	protocol_versions: 0
	sconnect_id: 0
	protocol_options: 0
	segmentation: 0
	initial_serial_number: 0
	initial_tokens: 0
	functional_units: 0
	pcontext_list: 0
	pdefault_context: 0
	responding_aei: 0
	pcontext_res_list: 0
	pdefault_context_res: 0
	reject_reason: 0
	request_tokens: 0
	pcontext_del_list: 0
	pcontext_del_res_list: 0
	token_item: 0
	sync_confirm: 0
	sync_point: 0
	resync_type: 0
	pcontext_id_list: 0
	tokens: 0
	exception_reason: 0
	activity_id size: 0 pointer: 0
	old_activity_id size: 0 pointer: 0
	old_sconnection_id: 0
	activity_reason: 0
	abort_reason: 0
	abort_ppdu: 0
	local_abort: 0
	release_reason: 0
	release_resp_reason: 0
	action_result: 0
	redirect_state
	initiator: 0
	pm_state: 0
	filler1: 0
	filler2: 0
	process_id: 0
	process_name: 0
	pcontext_redirect_list: 0
	rcv_data_list: 0
	local_data: 0
	alloc_rtn: 4366400
	dealloc_rtn: 4366336
	alloc_param: 0
	completion_rtn: 4364968
	completion_param: 27070920
	trans_characteristics: 0
	api_version: 3
	user_context: 0
	data_separation: 0
	rfc1006_port: 0
	rfc1006_port: 0
510.25		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Thu May 15 1997 13:58`	7
	Can you format the osakpb member 'local_aei'?

	As I mentioned, all this routine does is examine the list of known tsaps to see if the one you are requesting has already been opened. Either the local_aei structure has been corrupted or the internal list that osak maintains has.

	Did you get a process dump (SET PROCESS/DUMP) - and can we examine the dump?
510.26	Pointers look ok.	EICSMS::CELLAM		`Thu May 15 1997 15:33`	38
	I didn't get a dump; my detached process hangs and stopping it didn't produce a dump file. However, going into the debugger again I got the local_aei details. They are good. I'm always using the same TSAP, so it's either a timing problem or a corrupt internal list:

	*OMNI_OSAK_MAIN\OSAKV30DT_OpenResponder\pb->local_aei
	paddress
	psel size: 5 pointer: 24871000
	ssel size: 1 pointer: 24871104
	tsel size: 2 pointer: 24871208      (I looked here and this is my TSAP 01)
	nsap
	next: 0
	id size: 6 pointer: 24871312
	type: 1
	aetitle
	aptitle size: 0 pointer: 0
	ae_qualifier size: 0 pointer: 0
	aeiid
	apiid size: 0 pointer: 0
	aeiid size: 0 pointer: 0
	DBG>
510.27		DRAGNS::MILLER	Valerie Miller	`Thu May 15 1997 15:59`	15
	>DBG> sho calls
	>module name routine name line rel PC abs PC
	>SHARE$OSAK$OSAKSHR 0000000000000000 0000000000EBF134
	>SHARE$OSAK$OSAKSHR 0000000000000000 0000000000EAFAAC
	>SHARE$OSAK$OSAKSHR 0000000000000000 0000000000D87480
	>SHARE$OMNI_AST_BASIC_SHR 0000000000000000 00000000004F7AD0

	Does doing

	DBG> set image osak$osakshr
	DBG> set module/all

	before the "show calls" give you any more information here (like module name, routine name, line number)?

	Valerie
510.28		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Thu May 15 1997 15:59`	19
	We really need a dump to look at. You should be able to specify /DUMP and /NODEBUG on the RUN command that detaches the process.

	I've been looking at the code involved some more, and OSAK makes a copy of the tsap information, so if the info being passed via the PB was munged, it would have shown up sooner than in the tsel_known routine. So, we either have a corrupted known tsap list within OSAK (which we then must ask ourselves how did that list become corrupted), or the user memory that was allocated for the tsap block that OSAK copied the tsap into became somehow corrupted. In either case we need to look at a dump in order to figure out where to look further.

	Also, could you provide more of the diag information leading up to this event? If it is a corrupt list, it's most likely happening because of an AST routine touching the list at the same time a non-AST routine is, and we should be able to pick this up from the diag info.

	--Scott
510.29		DRAGNS::MILLER	Valerie Miller	`Thu May 15 1997 16:13`	5
	Also, are you sure you're using the debug osak$osakshr I just gave you? I have changed the format of the osak diagnostics, and what you show in .24 is the older format. Valerie
510.30		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Thu May 15 1997 16:46`	8
	Valerie makes a good point. Since you're using a DEBUG version of OSAK, you must remove the known system image via the INSTALL utility, otherwise you will continue to use the non-debug version of OSAK. And remember that you just want to remove it as a known system image - you can't make debug images known system images.
510.31	tsel_known	FRSSMS::CELLAM		`Tue May 20 1997 09:36`	39
	Scott, Valerie, you're correct: I'd messed up the copy and installed the wrong image. Here's the info:

	10:32:26.474 Accepted connect for vmd number 30 count 116
	%DEBUG-I-DYNIMGSET, setting image OSAK$OSAKSHR
	%DEBUG-I-DYNMODSET, setting module OSAK_TRANSPORT
	%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=0000000000000000, PC=0000000000F39070, PS=0000001B
	break on exception at OSAK_TRANSPORT\tsel_known\%LINE 27285+16
	%DEBUG-W-UNAOPNSRC, unable to open source file OSAK2:[OSAK.V30_AXP_BUILDS.NEBRASKA.TEMP.SRC]OSAK_TRANSPORT_VOTS.C;1
	-RMS-F-DEV, error in device name or inappropriate device type for operation
	27285: Source line not available
	DBG> sho calls
	module name               routine name             line   rel PC            abs PC
	OSAK_TRANSPORT            tsel_known               27285  000000000000009C  0000000000F39070
	OSAK_TRANSPORT            osak__tr_open_responder  19857  0000000000000814  0000000000F23E84
	OSAK_API                                                  0000000000000000  0000000000D977E4
	SHARE$OMNI_AST_BASIC_SHR                                  0000000000000000  00000000004F7AD0

	As for a process dump, I've set the process to dump, but when I exit the debugger no dump file is produced. I've also tried running detached; the problems occur but are caught by the condition handler, so no dump is produced and the process continues.

	Chris
510.32		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Tue May 20 1997 13:18`	11
	You're going to have to figure out how to get a dump. Since it sounds like this runs interactively, just do a RUN/NODEBUG after doing the SET PROCESS/DUMP command. Unless the threads stuff is catching and handling the ACCVIO signal, then this should produce a .DMP file. If it doesn't, then something is catching the signal and you'll need to figure out how to either stop the catch, or instruct the catcher to let a dump be produced. The line number helps, but if this is a result of a corrupted data structure, we're going to need the dump to try and look at what got corrupted and why. --Scott
510.33		DRAGNS::MILLER	Valerie Miller	`Tue May 20 1997 17:08`	37
	Chris,

	Scott and I have taken a look based on the limited information we have at this point. The section of code where the ACCVIO occurs is accessing some memory that osak allocated by calling the user's allocation routine. Note that osak does all of its memory management by calling the allocation and deallocation routines supplied by the osak user.

	Scott and I suspect that the problem lies with the memory management done by the application. Either memory has gotten corrupted, or the application has deallocated memory and continued to access it incorrectly (memory that subsequently got allocated to osak).

	Scott has some suggestions for you to track down the application's memory management problems.

	- If you don't already, use an application-supplied memory allocation and deallocation routine throughout the application, including for osak memory allocations/deallocations. For example, re-define malloc and free to point to your own routine:

	    #define malloc application_malloc
	    #define free application_free

	- Fill allocated and deallocated memory with a pattern, such as 0 or -1. Scott especially suggests that filling with -1 on deallocation should help track down problems.

	- Keep a list of allocated and freed memory.

	At this point, we feel that this is most likely an application problem rather than an osak problem. Many other customers have been using this code with no problems. The process dump will be helpful in confirming this hypothesis.

	Valerie
510.34		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 21 1997 14:27`	55
	Some additional info... The code in question basically compares the tsap information specified in the local_aei field of the PB that was passed into osak_open_responder() with the list of known tsaps that we're already listening on. Basically the routine that calls tsel_known() copies the local_aei information into a different data structure which was allocated via the user supplied allocation routine. Now, we feel that this new structure is probably ok, because tsel_known() is basically just doing a byte by byte comparison of data - if the newly created structure was corrupt for some reason, we would have failed when we copied the data into it, rather than when we compared it. In addition, osak_open_responder() has disabled AST delivery, so we aren't dealing with an AST coming in and zapping things. If tsel_known() returns false, then the new structure is linked into the list of known tsaps. If tsel_known() returns true, then the new structure is hung off the already existing entry (a list of lists): TSAP1---TSAP1---TSAP1 \| TSAP2 \| TSAP3---TSAP3 At this point, we suspect that someone (probably the application) has overwritten/corrupted the pointer which points to the tsap name in one of the structures already on the known tsap list (one of the structures in the first column). Since the first check being done is against the length, this member appears to be OK, but the pointer to the actual data has (from all appearances) been zapped to zero. As Valerie mentioned, any memory that OSAK dynamically allocates came from the user application via the callback routine supplied when the port is initially opened. This situation looks like the user allocation routine has provided a block of memory to OSAK that is still in use by some other part of the application. 
	This could easily be the situation if the application called free() at some point and then continued to reference/use the memory which was free'd, and is why we suggest providing your own alloc/dealloc 'shell' that zaps any memory free'd to -1. If you are using the standard malloc() and free() functions, then this will require that you provide a "jacket" around any allocated memory so that your free() routine can pick up the length of the memory being free'd. If you are using lib$get_vm/lib$free_vm then you only need to change the zone attributes when you create the zone (btw; malloc and free do not use lib$get_vm()/lib$free_vm()).

	So at this point, we strongly suspect an application issue, and it's entirely possible that a problem like this might not show up on one release of OpenVMS but would on another because of underlying changes in how whatever alloc/dealloc service you are using performs the actual memory management. Additionally, OSAK has not changed significantly between the 6.3 and 7.1 releases, certainly nothing in the area of how we deal with the known tsap list, and the other significant change was an OS upgrade, so.....

	--Scott
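	To make the failure mode in this note concrete, here is a small sketch of a list of lists in which each entry is compared by length first and by name bytes second. It is not the OSAK source: struct tsap_entry, tsap_matches(), find_known() and add_entry() are hypothetical names, and the explicit MATCH_CORRUPT check is added only so the sketch reports the zeroed pointer instead of faulting on it the way unchecked code would.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry on the known tsap list. */
struct tsap_entry {
    size_t length;                 /* length of the tsap name */
    const unsigned char *name;     /* pointer to the name bytes */
    struct tsap_entry *next_tsap;  /* next distinct tsap (first column) */
    struct tsap_entry *same_tsap;  /* further entries for the same tsap (row) */
};

enum match { MATCH_NO, MATCH_YES, MATCH_CORRUPT };

/* Length is checked first; only then are the name bytes touched. */
static enum match tsap_matches(const struct tsap_entry *e,
                               const unsigned char *name, size_t length)
{
    if (e->length != length)
        return MATCH_NO;
    if (length == 0)
        return MATCH_YES;
    if (e->name == NULL)
        return MATCH_CORRUPT;      /* unchecked code would fault here */
    return memcmp(e->name, name, length) == 0 ? MATCH_YES : MATCH_NO;
}

/* Walk the first column; return the matching entry, or NULL. */
static struct tsap_entry *find_known(struct tsap_entry *head,
                                     const unsigned char *name,
                                     size_t length, int *corrupt)
{
    *corrupt = 0;
    for (struct tsap_entry *e = head; e != NULL; e = e->next_tsap) {
        enum match m = tsap_matches(e, name, length);
        if (m == MATCH_CORRUPT) {
            *corrupt = 1;
            return NULL;
        }
        if (m == MATCH_YES)
            return e;
    }
    return NULL;
}

/* Link a new entry into the first column if its tsap is new, otherwise
   hang it off the existing entry. Returns -1 if corruption was seen. */
static int add_entry(struct tsap_entry **head, struct tsap_entry *n)
{
    int corrupt;
    assert(head != NULL && n != NULL);
    struct tsap_entry *known = find_known(*head, n->name, n->length, &corrupt);
    if (corrupt)
        return -1;
    n->next_tsap = NULL;
    n->same_tsap = NULL;
    if (known == NULL) {
        n->next_tsap = *head;
        *head = n;
    } else {
        n->same_tsap = known->same_tsap;
        known->same_tsap = n;
    }
    return 0;
}
```

	In this sketch an entry whose length is intact but whose name pointer has been zeroed passes the length test and is only caught at the pointer, which matches the pattern described above: the length member looks fine while the data pointer has been zapped.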
510.35	Error 20 occurs here too, then I close a port.	FRSSMS::CELLAM		`Thu May 22 1997 11:25`	49
	Still can't create a dump, but I've got a feeling that this is also the result of a sys$emma problem in a previous inbound connect request. I see that a previous osak_responder/get_event call is failing; the IOSB has a transport error and this nasty error 20 in the transport status. This is the responder side to my previous test program that was making the connect requests. I was wondering why the emma call is necessary, since this program is always listening on the same tsap and template I'd have thought osak would know what to use.

	22-MAY-1997 12:09:50.27/RTN/Entering t_connect_ind()
	22-MAY-1997 12:09:50.28/EXT/Making sense mode call
	22-MAY-1997 12:09:50.29/CP/Status of Sense-mode call = 1
	22-MAY-1997 12:09:50.30/ERR/Transport error
	22-MAY-1997 12:09:50.31/RTN/Leaving osak_emaa_call() #13
	22-MAY-1997 12:09:50.33/ERR/Error in osak_emaa_call. Status = 44743906
	22-MAY-1997 12:09:50.34/RTN/Entering async_get_event()
	22-MAY-1997 12:09:50.35/RTN/Leaving async_get_event() #5
	22-MAY-1997 12:09:50.36/RTN/Entering reactivate_tsap()
	22-MAY-1997 12:09:50.37/CP/The value of tsap->connected = 0
	22-MAY-1997 12:09:50.38/CP/The value of tsap->connected = 0
	22-MAY-1997 12:09:50.39/RTN/Leaving reactivate_tsap() #5

	our event returns

	DT: ProcessIndication 12:09:51.774 OSAK error 44743906 0 Transport 20 0

	and we then do an osak_async_close. Just after this I print out the process quotas; these should have been enough.

	ASTcnt    = 300      Current = 121      Previous = 122
	BIOcnt    = 300      Current = 128      Previous = 129
	BYTcnt    = 299040   Current = 7520     Previous = 9440
	DIOcnt    = 300      Current = 300      Previous = 300
	ENQcnt    = 2000     Current = 1996     Previous = 1996
	FILcnt    = 300      Current = 208      Previous = 208
	PAGFILcnt = 100000   Current = 65152    Previous = 65152
	TQcnt     = 125      Current = 124      Previous = 124

	and my listen ast returns

	12: 9:54.386 Listen failed for VMD MMS_IS_LCL_36 36... 80119874 0 0
	%OMNI-E-LISTEN_ERR, Listen Error

	I re-issue the listen call, i.e. osak_open_responder, and we're hosed: the list is corrupt.
	So I'm thinking that it has something to do with the closing of the port and maybe the freeing of buffers, either by omni or osak.
510.36		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Thu May 22 1997 12:30`	9
	Can you provide a pointer to a file with the OSAK diagnostics so that we can get a complete picture of what is going on; that is, what OSAK is doing leading up to the ACCVIO. I'm starting to feel like we're playing 20 questions ;-) We'll continue to review the OSAK code that manipulates the known tsap list, but the complete diagnostics would help us a lot in narrowing this down further. --Scott
510.37		DRAGNS::MILLER	Valerie Miller	`Thu May 22 1997 22:41`	11
	re: .35

	> I was wondering why the emma call is necessary,
	> since this program is always listening on the same tsap and template
	> I'd have thought osak would know what to use.

	OSAK is calling EMAA to find out what the transport type is (clns, cons, rfc1006) for the actual transport connection, because on VMS one listening port can listen on all types of connections.

	Valerie
510.38	It Appears That BYTLM Is Not Returned After A Connection Abort.	EICSMS::CELLAM		`Wed May 28 1997 11:09`	19
	Valerie, Scott,

	It appears in my test environment that when a connection is aborted and the OSAK port closed, the BYTLM eaten by the EMMA call is not returned to the process. After a series of connection aborts and connection re-establishments the process runs out of BYTLM and is no longer able to accept further connection requests. This is propagated through as a NO_EVENT, TRANS_ERR, 20; DEComni then calls the osak async close and the access violation takes place.

	I'm looking into modifying the DEComni state machine to ignore the above event when in a listening state and not to subsequently close the osak port; the osak errors will then be propagated into the omni_listen iosb. I did this for the VAX version for Volvo and it appears that this change was not put into the Alpha kit. I'll have to discuss with DEComni eng why this was not done.

	However, I'm still concerned about the BYTLM not being returned after a connection is lost.

	Chris
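	The state machine change described in this note could look something like the sketch below. It is not the DEComni state machine: the enum values, struct port, struct port_event and handle_event() are hypothetical names, and the only behaviour taken from the note is that a transport error seen while listening is recorded for the listen iosb instead of triggering a close of the osak port.

```c
#include <assert.h>
#include <stddef.h>

enum port_state  { STATE_LISTENING, STATE_CONNECTED, STATE_CLOSED };
enum port_action { ACTION_NONE, ACTION_REPORT_IN_IOSB, ACTION_CLOSE_PORT };

/* Hypothetical summary of an event returned for a port. */
struct port_event {
    int      is_transport_error;   /* non-zero for the NO_EVENT / TRANS_ERR case */
    unsigned transport_status;     /* e.g. 20 */
};

/* Hypothetical per-port bookkeeping. */
struct port {
    enum port_state state;
    unsigned listen_iosb_transport_status;   /* what the listen iosb would carry */
};

enum port_action handle_event(struct port *p, const struct port_event *ev)
{
    assert(p != NULL && ev != NULL);

    if (!ev->is_transport_error)
        return ACTION_NONE;

    if (p->state == STATE_LISTENING) {
        /* Keep the port open; only pass the error up to the listener. */
        p->listen_iosb_transport_status = ev->transport_status;
        return ACTION_REPORT_IN_IOSB;
    }

    /* In any other state the port is closed as before. */
    p->state = STATE_CLOSED;
    return ACTION_CLOSE_PORT;
}
```

	With this shape, a listening port that sees transport status 20 stays in STATE_LISTENING and the caller gets ACTION_REPORT_IN_IOSB, while the same event on a connected port still results in a close.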
510.39		RMULAC::S_WATTUM	Scott Wattum - FTAM/VT/OSAK Engineering	`Wed May 28 1997 12:33`	9
	It's either EMAA or more likely OSI Transport not returning the BYTLM. Please escalate this issue as a separate IPMT to DECnet-Plus engineering, and we will attempt to work with them on further isolation. Please provide them with as much information as you can about the circumstances surrounding the bytlm loss. Thanks, --Scott
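	One way to document the bytlm loss for the IPMT would be to record the remaining byte quota after each abort/close cycle and check whether it ever recovers. The sketch below is an illustration only: quota_lost() and quota_looks_leaked() are hypothetical helpers, and how the samples are collected (for instance with whatever call the test program already uses to print its quotas) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Total quota lost between the first and the last sample. */
long quota_lost(const long *remaining, size_t n)
{
    assert(remaining != NULL || n == 0);
    return n < 2 ? 0 : remaining[0] - remaining[n - 1];
}

/* Returns 1 when the remaining quota never rises from one sample to the
   next and ends lower than it started, i.e. nothing is ever given back. */
int quota_looks_leaked(const long *remaining, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (remaining[i] > remaining[i - 1])
            return 0;              /* some quota was returned */
    }
    return quota_lost(remaining, n) > 0;
}
```

	For example, samples of 9440, 7520 and 5600 bytes remaining (illustrative figures, not a measured series) would give a total loss of 3840 and be flagged, whereas a series that climbs back to its starting value would not.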
510.40	Ok IPMT for DECnet Plus, Now The Accvio Log.	EICSMS::CELLAM		`Wed May 28 1997 14:12`	28
	Scott,

	I'll follow up with another IPMT to DECnet-Plus eng. Since they were the culprit for the FILLM quota problem, I'd say that this one is in their code too, perhaps as a result of a previous fix.

	This is the customer accvio with the debug image; the process dump will follow.

	Current PROCESS Oracle Rdb environment is version V7.0-0 (MULTIVERSION)
	Current PROCESS SQL environment is version V7.0-0 (MULTIVERSION)
	Current PROCESS Rdb/Dispatch environment is version V7.0-0 (MULTIVERSION)
	<1> MMS-Server started at 28-MAY-1997 07:47:29.58
	%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=000000000000000C, PC=0000000000A68C64, PS=0000001B
	%TRACE-F-TRACEBACK, symbolic stack dump follows
	image         module          routine          line   rel PC            abs PC
	OSAK$OSAKSHR  OSAK_TRANSPORT  t_connect_ind    20776  00000000000035F4  0000000000A68C64
	                                                   0  FFFFFFFF800BD3A8  FFFFFFFF800BD3A8
	                                                   0  FFFFFFFF80001924  FFFFFFFF80001924
	MMS_SERVER    ALST            alst_bearbeiten  14208  00000000000003F8  0000000000039238
	MMS_SERVER    MMS_SERVER      main             15588  0000000000000298  00000000000307F8
	MMS_SERVER    MMS_SERVER      __main               0  0000000000000058  00000000000305B8
	PTHREAD$RTL                                        0  000000000004C148  00000000004F8148
	PTHREAD$RTL                                        0  0000000000030664  00000000004DC664
	                                                   0  FFFFFFFF8173D0D8  FFFFFFFF8173D0D8
	<2> MMS-Server started at 28-MAY-1997 14:01:40.75