
Conference vaxaxp::vmsnotes

Title:VAX and Alpha VMS
Notice:This is a new VMSnotes, please read note 2.1
Moderator:VAXAXP::BERNARDO
Created:Thu Jan 23 1997
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:703
Total number of notes:3722

563.0. "Alpha V6.2, OPA0: hangs." by PRSSOS::MAILLARD (Denis MAILLARD) Mon May 05 1997 13:37

	Strange OPA0: hang problem on AlphaServers. One of our customers sells
complete Alpha systems with dedicated applications to his own customers. For
over a year, his customers have been experiencing hangs of the interactive
process connected to OPA0:. The problem is quite frequent: according to the
customer, it occurs up to once a day on some sites, and more typically once or
twice a week. Once the hang occurs, the only way out is a reboot, which is
seldom practical as there are usually other users working on other terminals on
the system (these other terminals never hang). Another characteristic is that
once the process on OPA0: hangs, OPA0: starts experiencing errors at a fast
rate (often over ten per minute). These errors appear in the SHOW ERROR
command, but none of them is ever entered in the errorlog!

I've recently been able to obtain a forced crash dump of an AlphaServer 300
4/266 under V6.2 that was experiencing this problem. The process is in LEF
state, has channels assigned to OPA0: (one is busy), and is apparently at DCL
level (no current image, current stack is Supervisor; the recall buffer is
unfortunately not available). There is one IRP in the I/O request queue of OPA0:
with a rather strange function code (xC000, i.e. IO$_NOP plus x4000 and x8000 as
modifiers). The error count for the device is 2200, but none of these errors is
in the errorlog, except that just before the crash entry, ANAL/ERROR gives the
message
	%ERF-I-UNKENTRY, unknown entry type, 37

	Has anybody any notion of what's happening there? Any hint would be
greatly appreciated.
			Denis.
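
The function code discussed above can be split into its function and modifier
parts with a few lines of Python. This is only a sketch: it assumes the usual
OpenVMS layout of a 16-bit I/O function code (low 6 bits for the function,
upper 10 bits for modifiers), and the names FCODE_MASK, MODIFIER_MASK and
split_func are made up for the illustration. The value used is the FUNC field
(C000) shown for the IRP in the I/O request queue display further down.

```python
# Assumed layout of a 16-bit I/O function code:
# bits 0-5 hold the function, bits 6-15 hold the modifiers.
FCODE_MASK = 0x003F
MODIFIER_MASK = 0xFFC0

def split_func(func):
    """Return (function code, list of individual modifier bit values)."""
    fcode = func & FCODE_MASK
    modifiers = func & MODIFIER_MASK
    bits = [1 << n for n in range(6, 16) if modifiers & (1 << n)]
    return fcode, bits

# FUNC value taken from the IRP in the OPA0: I/O request queue.
fcode, bits = split_func(0xC000)
print("function:", fcode, "modifiers:", [hex(b) for b in bits])
```

Under that assumption the function part is 0, which the note above identifies
as IO$_NOP, and the only modifier bits set are x4000 and x8000.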

EVIDENCE:
Process index: 0011   Name: GUIOT   Extended PID: 00000091
----------------------------------------------------------
Process status:        02040001  RES,PHDRES
Required capabilities: 0000000C  QUORUM,RUN

PCB address              80D789C0    JIB address              80D3B400
PHD address              8115C000    Swapfile disk address    00000000
Master internal PID      00020011    Subprocess count                0
Internal PID             00020011    Creator internal PID     00000000
Extended PID             00000091    Creator extended PID     00000000
State                       LEF      Termination mailbox          0000
Previous CPU Id          00000000    Current CPU Id           00000000
Previous ASNSEQ  0000000000000001    Previous ASN     0000000000000004
Current priority                8    # of threads     0000000000000000
Initial process priority        4    Delete pending count         0
Base priority                   4    AST's active                 NONE
UIC                [00200,000020]    AST's remaining               247
Mutex count                     0    Buffered I/O count/limit      149/150
Waiting EF cluster              0    Direct I/O count/limit        150/150
Abs time of last event   037EB5BC    BUFIO byte count/limit      99424/99808
Event flag wait mask     DFFFFFFF    # open files allowed left     100
Swapped copy of LEFC0    00000000    Timer entries allowed left     10
Swapped copy of LEFC1    00000000    Active page table count         0
Global cluster 2 pointer 00000000    Process WS page count          36
Global cluster 3 pointer 00000000    Global WS page count           17
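
The count/limit pairs in the display above can be turned into "amount in use"
figures with simple subtraction. The sketch below assumes that the first number
of each pair is the quota still available (not the amount consumed); that
reading is an assumption, and the helper name in_use is made up for the
illustration.

```python
def in_use(remaining, limit):
    """Quota currently charged, assuming 'remaining/limit' semantics."""
    return limit - remaining

# Pairs copied from the SHOW PROCESS display: (remaining, limit).
quotas = {
    "buffered I/O": (149, 150),
    "direct I/O": (150, 150),
    "BUFIO bytes": (99424, 99808),
}

usage = {name: in_use(*pair) for name, pair in quotas.items()}
for name, used in usage.items():
    print(name, "in use:", used)
```

Read this way, the process has a single buffered I/O outstanding and no direct
I/O, which would be consistent with the one busy channel on OPA0:.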


Process header
--------------

First free P0 address      00000000    Accumulated CPU time       00000105
Free PTEs between P0/P1        2370    CPU since last quantum         10E2
First free P1 address      7EE82000    Subprocess quota                 10
P0 page table address      81164000    AST's enabled                  KESU
P1 page table address      81070000    ASN sequence #     0000000000000001
Free page file pages           3028    AST limit                       250
Page fault cluster size           4    Process header index           0001
Page table cluster size           1    Backup address vector      00001000
Flags                      00000080    WSL index save area        00001014
Direct I/O count                681    PTs having locked WSLs            2
Buffered I/O count             6249    PTs having valid WSLs             2
Limit on CPU time          00000000    Active page tables                2
Maximum page file count        3125    Maximum active PTs                5
Total page faults              2109    Guaranteed fluid WS pages        20
File limit                      100    Extra dynamic WS entries         92

Process index: 0011   Name: GUIOT   Extended PID: 00000091
----------------------------------------------------------
Timer queue limit                10    Locked WSLE counts array       4078
Current page file template 00000000    Valid WSLE counts array        4090
Local event flag cluster 0 C0000001    Local event flag cluster 1 E0000000


Process page file assignments
-----------------------------

        PROCIDX  SYSIDX    REFCNT

           0         3         46 Current assignment
           1         0          0
           2         0          0
           3         0          0

Remaining reserved pages     114    Total reserved pages         114

Saved process registers
-----------------------
R0   = 00000000 00000001  R1   = 00000000 00000000  R2   = FFFFFFFF 80C66680
R3   = 00000000 7FFBF680  R4   = 00000000 0000001D  R5   = 00000000 7FFBF680
R6   = 00000000 7FFBE4C0  R7   = 00000000 7FF91FC0  R8   = 00000000 7EE85EB8
R9   = 00000000 7FF9C400  R10  = 00000000 7FF9D228  R11  = 00000000 7FFBE3E0
R12  = 00000000 00000000  R13  = 00000000 7EF11DA0  R14  = 00000000 FDD04F5F
R15  = 00000000 7EF11DA0  R16  = FFFFFFFF 80C05528  R17  = FFFFFFFF 80D789C0
R18  = 00000000 00000002  R19  = 00000000 00000001  R20  = 00000000 00018009
R21  = 00000000 00018001  R22  = FFFFFFFF 80C331C0  R23  = FFFFFFFF 80D789C0
R24  = 00000000 00000002  R25  = 00000000 00000005  R26  = 00000000 00000FD2
R27  = FFFFFFFF 80C3BE08  R28  = 00000000 7EF11DA0  FP   = 00000000 7FF9C2E0
PC   = FFFFFFFF 801991E0  PS   = 00000000 00000012
KSP  = 00000000 7FF91EF0  ESP  = 00000000 7FF96000  SSP  = 00000000 7FF9C2E0
USP  = 00000000 7EE7FD40  PTBR = 00000000 00000EEE
AST{SR/EN}    = 0000000F  ASN  = 00000000 00000004


Working set information
-----------------------

First WSL entry          000000BE   Current authorized working set size   250
First locked entry       000000C4   Default (initial) working set size    125
First dynamic entry      000000C6   Maximum working set allowed (quota)   250
Last entry replaced      0000012A
Last entry in list       00000262


Process index: 0011   Name: GUIOT   Extended PID: 00000091
----------------------------------------------------------
Lock data:

Lock id:  33000595   PID:     00020011   Flags:   VALBLK  CONVERT
Par. id:  01000000   SUBLCKs:        0
LKB:      80DB8F40   BLKAST:  00000000
PRIORTY:      0000

Granted at      NL   00000000-FFFFFFFF

Resource:      45504F5F 24464D4C    LMF$_OPE  Status:
 Length   18   504C412D 534D564E    NVMS-ALP
 Exec. mode    00000000 00004148    HA......
 System        00000000 00000000    ........

Local copy


Process index: 0011   Name: GUIOT   Extended PID: 00000091
----------------------------------------------------------

                            Process active channels
                            -----------------------

Channel  Window           Status        Device/file accessed
-------  ------           ------        --------------------
  0010  00000000                        DKA0:
  0040  00000000             Busy       OPA0:
  0060  00000000                        OPA0:
  0090  80D80EC0                        DKA0:(422,1,0) (section file)
  00A0  80D85F40                        DKA0:(3214,2,0) (section file)


                            Process activated images
                            ------------------------

  IMCB    Start     End    Sym Vect    Type      Image Name  Major ID,Minor ID
-------- -------- -------- -------- ------------ -----------------------------

Total images = 0                Pages allocated = 0






OPA0                          VT400_Series                UCB address:  80C23BF8

Device status:   00000113 tim,int,online,bsy
Characteristics: 0C040007 rec,ccl,trm,avl,idv,odv
                 00000200 nnm

Owner UIC [000200,000020]   Operation count      66326   ORB address    80D2AD80
      PID        00020011   Error count           2200   DDB address    80C23A78
Class/Type          42/71   Reference count          2   DDT address    80C23AB8
Def. buf. size         80   BOFF              00000180   CRB address    80C23E00
DEVDEPEND        180891A0   Byte count        00000100   IRP address    80DE4780
DEVDEPND2        F9601400   SVAPTE            80D5DC40   Fork PC        80C5FC20
DEVDEPND3        00000000   DEVSTS            00000001   Fork R3        0000000D
FLCK index             3A   Int. due time     0008F277   I/O wait queue 80C23C64
DLCK address     80C23F00
%SDA-W-NOREAD, unable to access location 00012000
%SDA-W-NOREAD, unable to access location 00902A20
                                I/O request queue
                                -----------------

STATE    IRP      PID   MODE CHAN  FUNC    WCB     EFN    AST     IOSB    STATUS

 C   80DE4780  00020011  E   0040  C000  00000000  29  80C67760  7EFB00E0  8203
        nop bufio,func,termio
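
The device status longword in the OPA0 display above (00000113, shown as
tim,int,online,bsy) can be checked bit by bit. The sketch below only lists
which bit positions are set; pairing them with the names in the order SDA
printed them is an assumption (it presumes SDA lists the names in ascending
bit order), and set_bits is a made-up helper name.

```python
def set_bits(value, width=32):
    """Return the positions of the bits that are set in value."""
    return [n for n in range(width) if value & (1 << n)]

status = 0x00000113
names = ["tim", "int", "online", "bsy"]  # order as printed by SDA (assumed ascending)

positions = set_bits(status)
for bit, name in zip(positions, names):
    print("bit", bit, "->", name)
```

Four bits are set and four names are printed, so the two are at least
consistent with each other. If tim and int carry their usual meaning for a
unit control block (timeout armed, interrupt expected), the device was waiting
on an interrupt with a timer running when the dump was taken.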





 ******************************* ENTRY     306. *******************************
 ERROR SEQUENCE 992.                             LOGGED ON:  CPU_TYPE 00000006
 DATE/TIME 25-APR-1997 10:25:40.72                            SYS_TYPE 0000000D
 SYSTEM UPTIME: 6 DAYS 18:49:54
 SCS NODE: ALPHA                                            OpenVMS AXP V6.2

 HW_MODEL: 00000639 Hardware Model = 1593.

 TIME STAMP AlphaServer 300 4/266
%ERF-I-UNKENTRY, unknown entry type, 37				<<<<<<<<<<<<<<<<
 ******************************* ENTRY     307. *******************************
 ERROR SEQUENCE 993.                             LOGGED ON:  CPU_TYPE 00000006
 DATE/TIME 25-APR-1997 10:28:22.95                            SYS_TYPE 0000000D
 SYSTEM UPTIME: 6 DAYS 18:52:37
 SCS NODE: ALPHA                                            OpenVMS AXP V6.2

 HW_MODEL: 00000000 Hardware Model = 0.

 FATAL BUGCHECK AlphaServer 300 4/266

 OPERATOR, Operator requested system shutdown

       PROCESS NAME    SYSTEM
       PROCESS ID      00020014

       ERROR PC        00000000 000305B4

    Process Status = 38000000 00001F03, SW = 03, Previous Mode = USER
    System State = 00, Current Mode = KERNEL
    VMM = 00 IPL = 31, SP Alignment = 56

 STACK POINTERS

 KSP 00000000 7FF91EF8  ESP 00000000 7FF96000  SSP 00000000 7FF9C100
 USP 00000000 7EE83B80

 GENERAL REGISTERS

 R0  00000000 00000000  R1  FFFFFFFF 80000000  R2  00000000 7FF86040
 R3  FFFFFFFF 80D5E640  R4  00000000 00000001  R5  00000000 00000001
 R6  00000000 0003007C  R7  00000000 7FF91FC0  R8  00000000 7FF9C1F8
 R9  00000000 7FF9C400  R10 00000000 00000000  R11 00000000 7FFBE3E0
 R12 00000000 00000000  R13 00000000 000100F0  R14 00000000 00000000
 R15 00000000 00020000  R16 00000000 00000474  R17 00000000 00004000
 R18 00000000 7FF91E58  R19 00000000 7FF91FC0  R20 FFFFFFFF 8191D97C
 R21 20000000 00000003  R22 00000000 00000000  R23 00000000 00000000
 R24 FFFFFFFF 80000000  R25 00000000 00000000  R26 FFFFFFFF 80C05E90
 R27 00000000 7FF91E2C  R28 00000000 00030334  FP  00000000 7FF91F00
 SP  00000000 7FF91EF8  PC  00000000 000305B4  PS  38000000 00001F03

 SYSTEM REGISTERS

       PTBR            00000000 00000CD2
                                       Page Table Base Register
       PCBB            00000000 01322080
                                       Privileged Context Block Base
       PRBR            FFFFFFFF 80D2A000
                                       Processor Base Register
       VPTB            00000002 00000000
                                       Virtual Page Table Base Register
       SCBB            00000000 000001A2
                                       System Control Block Base
       SISR            00000000 00000000
                                       Software Interrupt Summary Register
       ASN             00000000 00000001
                                       Address Space Number
       ASTSR_ASTEN     00000000 0000000F
                                       AST Summary/AST Enable
       FEN             00000000 00000000
                                       Floating-Point Enable
       IPL             00000000 0000001F
                                       Interrupt Priority Level
       MCES            00000000 00000008
                                       Machine Check Error Summary
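
As a cross-check on the two errorlog entries above, the gap between entry 306
(the unknown entry type) and entry 307 (the forced crash) can be computed from
both the DATE/TIME and the SYSTEM UPTIME fields. This is a minimal sketch; the
helper name hms_to_seconds is made up, and only the time-of-day parts are used
since both entries carry the same date and the same day count.

```python
def hms_to_seconds(text):
    """Convert 'HH:MM:SS' or 'HH:MM:SS.cc' to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# DATE/TIME fields of entries 306 and 307 (both 25-APR-1997).
logged_gap = hms_to_seconds("10:28:22.95") - hms_to_seconds("10:25:40.72")

# SYSTEM UPTIME fields of the same entries (both 6 days plus the time shown).
uptime_gap = hms_to_seconds("18:52:37") - hms_to_seconds("18:49:54")

print("gap from timestamps:", round(logged_gap, 2), "s")
print("gap from uptime:", uptime_gap, "s")
```

Both fields give a gap of a little under three minutes between the unknown
entry and the crash, agreeing to within the one-second resolution of the
uptime field.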

563.1. by STAR::LEWIS Mon May 05 1997 13:48 (8 lines)
    Errors not entered in the errorlog are usually timeouts. We've fixed
    many problems in this area for opdriver. You need to get the latest
    patch kit -- sorry, I don't know the name (and I'm not certain that
    the absolute latest code has been made into a TIMA kit yet). 
    
    I'm not sure I know what an AlphaServer 300 is; there may be
    platform-specific code that would improve the behavior too.
    Sue Lewis
563.2. by PRSSOS::MAILLARD (Denis MAILLARD) Mon May 05 1997 14:25 (6 lines)
    Re .1: Thanks for the info, Sue. Do you have any idea where this
    latest OPDRIVER can be obtained? The latest TIMA kit for V6.2 is
    ALPOPDR02_062 and it is nearly a year old (June 96). Should I raise an
    IPMT to obtain the kit?
    		Thanks,
    			Denis.
563.3. by STAR::LEWIS Mon May 05 1997 14:29 (7 lines)
>>    The latest tima kit for V6.2 is
>>    ALPOPDR02_062 and it is nearly a year old (June 96). Should I raise an
>>    IPMT to obtain the kit?          
    
      That would be a good idea. 
    Thanks
    Sue
563.4. by AUSS::GARSON (DECcharity Program Office) Mon May 05 1997 23:13 (7 lines)
    re .0
    
    Try enabling error logging on OPA0. ($ SET DEV/ERROR OPA0)
    
    I vaguely recall that the func gets changed as it wends its way into
    the IRP. See whether you can locate the code that queued the I/O (could
    be tricky if it's DCL via RMS) and check the caller specified func.
563.5. by PRSSOS::MAILLARD (Denis MAILLARD) Tue May 06 1997 14:30 (6 lines)
    Re .3, .4: Thanks for the tips. Actually SHOW CALL shows that the I/O
    was generated by an RMS SYS$GET, and finding the original function code
    might indeed get a bit tricky. Right now I'm in the process of getting
    a connection to a V6.2 source CD-ROM, and also of writing an IPMT form
    to get the latest version of SYS$OPDRIVER.EXE.
    		Denis.