
Conference: cookie::sls

Title: Storage Library System
Moderator: COOKIE::REUTER
Created: Sun Oct 13 1991
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2270
Total number of notes: 7850

2250.0. "Fatal Bugcheck SSRVEXCEPT on SLS$SYSBAK 2.8A" by DECPRG::ZVONAR () Wed Apr 30 1997 05:58

Hello,

a fatal bugcheck SSRVEXCEPT crash occurred at a customer site on an AS8400
running OpenVMS 6.2-1H3 and SLS 2.8A. The current image is SLS$SYSBAK.

If anybody has a tip on what I should check, please let me know.

Information about the crash and SLS follows. I have a local copy of the dump.

Thanks,
Karel

Time of system crash: 23-APR-1997 00:11:06.16
CPU 02 reason for Bugcheck: SSRVEXCEPT, Unexpected system service exception
Process currently executing on this CPU: SYSBAK_10F8_1
Current image file: DSA2:[SLS$FILES.][SYSTEM]SLS$SYSBAK.EXE;1
Current IPL: 2  (decimal)

%SYSTEM-F-ACCVIO, access violation, reason mask=00, virtual address=00000068,
PC=80079BA8, PS=00000203

-----------------------------------------------------------------------------
        Image Identification Information

                image name: "SLS$SYSBAK"
                image file identification: "V2.8A"
                image file build identification: ""
                link date/time: 15-JAN-1997 17:04:38.08
                linker identification: "A11-14"

-----------------------------------------------------------------------------
P00022::FIELD> type SLS$ROOT:[SYSBAK.SYSBAK_LOGS]P0002210F8_1.ERR
PID 000010F9
STARTED 23-APR-1997 00:06:05.35
HISTORY SLS$ROOT:[SYSBAK.TEMP_HISTORY]P22_DAY_S_SBKP00022DSA0.HST;1
LISTING DISK$SM:[SLS$FILES]DSA0_0423.DAY;1
%SLS-I-AUTOLOADING, automatically loading volume AHT258 in drive _$1$RDEVA0:
%SLS-I-STARTING, starting volume AHT258 at position 72 on drive _$1$RDEVA0: at
23-APR-1997 00:10:35.60
HDR1DSA0.0423        DSA0  00010025000100 97113 97113 000000DECVMSBACKUP
HDR1DSA0.0423        DSA0  00010025000100 97113 97113 000000DECVMSBACKUP
HDR2F0819208192                     M             00
HDR2F0819208192                     M             00
%BACKUP-W-ACCONFLICT,
DSA0:[SYS0.SYSCOMMON.SYSEXE]SYS$QUEUE_MANAGER.QMAN$JOURNAL;1 is open for write
 by another user
%BACKUP-W-ACCONFLICT, DSA0:[VMS$COMMON.SYSEXE]SYS$QUEUE_MANAGER.QMAN$JOURNAL;1
is open for write by another user
EOF1DSA0.0423        DSA0  00010025000100 97113 97113 000346DECVMSBACKUP
EOF1DSA0.0423        DSA0  00010025000100 97113 97113 000346DECVMSBACKUP
EOF2F0819208192                     M             00
EOF2F0819208192                     M             00
%BACKUP-I-STARTRECORD, starting backup date recording pass
%SLS-I-FINISHED, finished volume AHT258 on drive _$1$RDEVA0: at 23-APR-1997
00:11:06.06
%SLS-I-HSTBUFFS, 2 history block buffers used
------------------------------------------------------------------------------ 


2250.1. "please use CANASTA" by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Wed Apr 30 1997 14:34 (18 lines)

    Karel,
    
    could you please use the CANASTA Mail Server to check your CLUE file
    for a known footprint?
    
    CANASTA is a Digital-internal crash analysis tool. It has a knowledge
    database of known crash footprints and their solutions, and also a
    large database of crash footprints that is searched automatically
    if there is no known solution for your crash.
    
    For details, please read note VAXAXP::VMSNOTES 233.
    
    As a first step during crash analysis, please ALWAYS obtain the CLUE
    file and send it to the CANASTA Mail Server.
    
    Thanks,
    
    Volker.

2250.2. "CANASTA status: UNIDENTIFIED" by DECPRG::ZVONAR () Fri May 02 1997 05:11 (18 lines)

Volker,

I used CANASTA as the first step of problem solving. The result from CANASTA 
was STATUS: UNIDENTIFIED. I am also searching COMET, TIMA, etc. So far I have 
found only NON-FATAL ACCVIO crashes in SLS$SYSBAK. BUGCHECKFATAL on the customer 
system is set to 0. There are no errors in the errorlog (only the FATAL BUGCHECK 
message).

Installed ECOs:
ALPCPUC06_62, ALPLIBR05_70, ALPSYS04_62,  ALPDISM01_62, ALPSCSI02_70,
ALPDRIV04_70, ALPSHAD05_62, ALPINIT01_70, ALPSMUP01_70, ALPMANA02_70

I'm looking for similar SLS 2.8A crashes before I start a deeper analysis of 
the crash dump.

Thanks,
Karel


2250.3. "the problem identified, no solution yet" by DECPRG::ZVONAR () Fri May 02 1997 14:01 (52 lines)

There have been more crashes on the same node at the customer site, and I have 
made some progress in the analysis:

1. System crashed with DOUBLDEALO, Double deallocation of memory block
     I am not sure whether any SLS job was running.

   The CLUE output and dump were not accessible at that time.
   
   On our recommendation, the customer then set POOLCHECK to ON.

2. Crash approx. 1 hour later
     POOLCHECK, Corruption or inconsistency in pool discovered by pool checker

     There were 2 SLS batch jobs in LEF state; one job had device RDEVA0: busy.
     RDEVA0: -> TZ87 drive in TL820, I/O request queue is empty, errorcount = 0

   CANASTA status:  UNIDENTIFIED. I have some similar rules.

   After the reboot the customer set POOLCHECK to off without a further reboot
   (ACTIVE POOLCHECK off, CURRENT POOLCHECK on).

3. Crash approx. 1 hour later
     BADDALRQSZ, Bad memory deallocation request size or address
     There were 2 SLS batch jobs; the current job had RDEVA0: busy.

   CANASTA status: PARTIAL RULE - 009B385C-DD25A4E0-31015A. At first glance
   it looks like the same problem. SLSE02028 is not installed. IPMT CFS_50555
   does not offer a final solution.

     Crash dump information:
CPU 03 reason for Bugcheck: BADDALRQSZ, Bad memory deallocation request size or
address
Process currently executing on this CPU: BATCH_198
Current image file: DSA2:[SLS$FILES.][TTI_RDEV]RDCONTROL_A62.EXE;1
Current IPL: 2  (decimal)

        Image Identification Information

                image name: "RDCDRIVER_A62"
                image file identification: "X-3"
                image file build identification: ""
                link date/time: 11-APR-1997 07:48:54.93
                linker identification: "A11-20"
 
On Monday I will receive the dumps from all crashes.
Both CURRENT and ACTIVE POOLCHECK are now set to off.

So it looks like this is a known problem without a solution.

Karel


2250.4. "me too have two" by MUNICH::REIN (How come holes in SWISS CHEESE??) Mon May 05 1997 06:56 (12 lines)

    Hello Karel,
    
    We also have two customers reporting this type of crash. When we have
    the crash dumps available we will escalate the cases.
    A lot points to the rdclient software.
    
    I found one open IPMT on the subject, cfs.50606, which says that TTI is
    involved.
    
    regards
    
    Volker 

2250.5. "" by COOKIE::MCCLELLAND (Marty, SLS/MDMS Engineering) Fri May 16 1997 16:31 (8 lines)

I agree your crashes are the same as some already reported against
RDF's RDCDRIVER.  Touch Technologies, Inc. (the developer/maintainer
of RDF) is currently working on the crashes.  It is my understanding
they have a fix for each of the reported crash types and they are
doing exhaustive testing to make sure they have covered all bases.

Marty