
Conference mvblab::alphaserver_4100

Title:AlphaServer 4100
Moderator:MOVMON::DAVISS
Created:Tue Apr 16 1996
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:648
Total number of notes:3158

578.0. "system uncorrectable machine check ?" by PANTER::AUBERT () Mon Apr 28 1997 09:32

    A customer of mine has an AlphaServer 4100 5/300 with 4 CPUs that
    crashed 2 days ago with a "System Uncorrectable Machine Check".
    
    Does anybody know of a working tool that decodes machine checks on
    the AS4100?
    
    The configuration and console abstract are included below.
    Any help decoding this machine check would be appreciated.
    
    Thierry Aubert/DEC at CERN
    
    
    Configuration:
    =============
    
    ----- EVENT INFORMATION -----
    
    EVENT CLASS                             OPERATIONAL EVENT
    OS EVENT TYPE                  300.     SYSTEM STARTUP
    SEQUENCE NUMBER                  1.
    OPERATING SYSTEM                        DEC OSF/1
    OCCURRED/LOGGED ON                      Fri Apr 25 18:38:26 1997
    OCCURRED ON SYSTEM                      shd52
    SYSTEM ID                 x00050016
    SYSTYPE                   x00000000
    MESSAGE                                 Alpha boot: available memory from
                                             _0xb18000 to 0xfffe000
                                            Digital UNIX V4.0B  (Rev. 564); Wed
                                             _Apr 16 15:15:14 MET DST 1997
                                            physical memory = 256.00 megabytes.
                                            available memory = 244.89 megabytes.
                                            using 975 buffers containing 7.61
                                             _megabytes of memory
                                            Master cpu at slot 0.
                                            Firmware revision: 3.0
                                            PALcode: Digital-UNIX/OSF version 1.21
                                            AlphaServer 4100 5/300 0MB
                                            pci1 at mcbus0 slot 5
                                            psiop0 at pci1 slot 1
                                            Loading SIOP: script c0000c00, reg
                                             _4444000, data c000cb70
                                            scsi0 at psiop0 slot 0
                                            rz5 at scsi0 target 5 lun 0 (LID=0)
                                             _(DEC     RRD45   (C) DEC  0436)
                                            pza0 at pci1 slot 2
                                            pza0 firmware version: DEC  P01  A10
                                             _
                                            scsi1 at pza0 slot 0
                                            tz8 at scsi1 target 0 lun 0 (LID=1)
                                             _(STK     SD-3             011E)
                                             _(Wide16)
                                            pza1 at pci1 slot 3
                                            pza1 firmware version: DEC  P01  A10
                                             _
                                            scsi2 at pza1 slot 0
                                            tz16 at scsi2 target 0 lun 0 (LID=2)
                                             _(STK     SD-3             011E)
                                             _(Wide16)
                                            pza2 at pci1 slot 4
                                            pza2 firmware version: DEC  P01  A10
                                             _
                                            scsi3 at pza2 slot 0
                                            tz24 at scsi3 target 0 lun 0 (LID=3)
                                             _(STK     SD-3             011E)
                                             _(Wide16)
                                            tz26 at scsi3 target 2 lun 0 (LID=4)
                                             _(QUANTUM DLT7000          101A)
                                             _(Wide16)
                                            pza3 at pci1 slot 5
                                            pza3 firmware version: DEC  P01  A10
                                             _
                                            scsi4 at pza3 slot 0
                                            tz32 at scsi4 target 0 lun 0 (LID=5)
                                             _(STK     SD-3             011E)
                                             _(Wide16)
                                            changer at scsi4 target 1 lun 0
                                             _(LID=6) (STK     9714
                                             _1300)
                                            tz34 at scsi4 target 2 lun 0 (LID=7)
                                             _(QUANTUM DLT7000          101A)
                                             _(Wide16)
                                            gpc0 at eisa0
                                            pci0 at mcbus0 slot 4
                                            eisa0 at pci0
                                            ace0 at eisa0
                                            ace1 at eisa0
                                            lp0 at eisa0
                                            fdi0 at eisa0
                                            fd0 at fdi0 unit 0
                                            pci2000 at pci0 slot 2
                                            isp0 at pci2000 slot 0
                                            isp0: QLOGIC ISP1020A
                                            isp0: Firmware revision 2.10 (loaded
                                             _by console)
                                            scsi5 at isp0 slot 0
                                            rz40 at scsi5 target 0 lun 0 (LID=8)
                                             _(DEC     RZ29B    (C) DEC 0016)
                                             _(Wide16)
                                            rz41 at scsi5 target 1 lun 0 (LID=9)
                                             _(DEC     RZ29B    (C) DEC 0016)
                                             _(Wide16)
                                            rz42 at scsi5 target 2 lun 0 (LID=10)
                                             _(DEC     RZ29B    (C) DEC 0016)
                                             _(Wide16)
                                            rz43 at scsi5 target 3 lun 0 (LID=11)
                                             _(DEC     RZ29B    (C) DEC 0016)
                                             _(Wide16)
                                            rz44 at scsi5 target 4 lun 0 (LID=12)
                                             _(DEC     RZ29B    (C) DEC 0016)
                                             _(Wide16)
                                            rz45 at scsi5 target 5 lun 0 (LID=13)
                                             _(DEC     RZ29B    (C) DEC 0016)
                                             _(Wide16)
                                            rz46 at scsi5 target 6 lun 0 (LID=14)
                                             _(DEC     RZ29B    (C) DEC 0016)
                                             _(Wide16)
                                            tu0: DECchip 21140-AA: Revision: 1.2
                                            tu0 at pci0 slot 3
                                            tu0: DEC Fast Ethernet Interface,
                                             _hardware address: 00-00-F8-31-11-6D
                                            tu0: console mode: selecting 10BaseT
                                             _(UTP) port: half duplex
                                            hip0: Roadrunner version 2 (20000900)
                                            hip0 at pci0 slot 4
                                            hip0 slot 4: PCI/HIPPI interface
                                             _0-a0-88-1-0-88
                                            fta0 DEC DEFPA FDDI Module, Hardware
                                             _Revision 1
                                            fta0 at pci0 slot 5
                                            fta0: DMA Available.
                                            fta0: DEC DEFPA (PDQ) FDDI Interface,
                                             _Hardware address: 08-00-2B-B4-15-75
                                            fta0: Firmware rev: 2.46
                                            Created FRU table configuration binary
                                             _errorlog packet
                                            kernel console: ace0
                                            dli: configured
    
    shd52 [55]
    
    
    
    Console abstract:
    ================
    
shmon2% more 2504
    ca
    hYinoe Cuhrec ks ySsYtSemTE Mh Faast ahla lAbtoerdt
    uMaec htion ea nc hierckr eccoodve e= r0axb2l02e0 0e0r0r
    0       rp.a Rl teecomp[r0d-1 ]         =t h0e0
    00e00r1r4o00rec bh9a8l t0 0c0o0d0e001 1fafnfde 1P1C 0
    d        pcalo ntteampc[t2 -y3]o        u       r=  Dfifgffiftca0l0
    0S0e46r4vc80
    i c0e00s00 0re00p0r0e0s044e0n0
    m       ipavel t.e
    p[I4n- 5]a      d       d=i t00i00o000n1,4 0t2y4pfec6 8I N0F0O0 030
    0an01d4 0I24
    fcN68F
            pa8l  atte mtph[e6 -c7o]        n       = s0o0l0e0 00a0n0d0
    00r00e0c00o
    frfdf ftfhc0e0004646c0
            pal temp[8-9]           = 1f1e171515020100 fffffc00004649f0
            pal temp[10-11]         = 00000001202c3e4c fffffc0000464850
            pal temp[12-13]         = fffffc0000464bf0 0000000000006e80
            pal temp[14-15]         = 0000000000000000 00000000000f0000
            pal temp[16-17]         = 0000020306600001 0000000000000000
            pal temp[18-19]         = 000000011fffe0c0 ffffffff90eaba38
            pal temp[20-21]         = 000000000a074000 fffffc0000464c20
            pal temp[22-23]         = fffffc00005e4530 00000000014d9a38
            shadow[0-1]             = 0000000000000000 0000000000000000
            shadow[2-3]             = 0000000000000000 0000000000000000
            shadow[4-5]             = 0000000000000000 0000000000000000
            shadow[6-7]             = 0000000000000000 0000000000000000
            Addr of excepting instruction   = 00000001202c3e4c
            Summary of arithmetic traps     = 0000000000000000
            Exception mask                  = 0000000000000000
            Base address for PALcode        = 0000000000014000
            Interrupt Status Reg            = 0000000000200000
            CURRENT SETUP OF EV5 IBOX       = 000000c164000000
            I-CACHE Reg Tag parity error    = 0000000000000000
            D-CACHE error Reg               = 000= 000000fbe0000000
             Whami reg.     = 0000023a
             Sys. Env. reg. = 00000000
             PCI Rev. reg.  = 06000032
             CAP_CTL reg.   = 42460ff1
             HAE_MEM reg.   = 00000000
             HAE_IO reg.    = 00000000
             INT_CTL reg.   = 00000003
             INT_REG reg.   = 00800000
             INT_MASK0 reg. = 00c51111
             INT_MASK1 reg. = 00000000
             MC_ERR0 reg.   = 0d124500
             MC_ERR1 reg.   = 800fc800
             CAP_ERR reg.   = c0000000
             PCI_ERR1 reg.  = 00000000
             MDPA_STAT reg. = 00000000
             MDPA_SYN reg.  = 00000000
             MDPB_STAT reg. = 00000000
             MDPB_SYN reg.  = 00000000
    panic (cpu 0): System Uncorrectable Machine Check
    Machine Check SYSTEM Fatal Abort
    Machine check code = 0x2020000
            pal temp[0-1]           = 0000000000000007 fffffc00005e2eb0
            pal temp[2-3]           = fffffc0000464c80 0000000000004400
            pal 164000000
            I-CACHE Reg Tag parity error    = 0000000000000000
            D-CACHE error Reg               = 0000000000000000
            Effective VA                    = ffffffffff8000a0
            Reason for D-stream             = 0000000000014890
            EV5 SCache address              = ffffff000001904f
            EV5 SCache TAG/Data parity      = 0000000000000000
            EV5 BC_TAG_ADDR                 = ffffffffffffffff
            EV5 EI_ADDR: Phys addr of Xfer  = ffffff000802a0bf
            Fill Syndrome           000000
             MC_ERR0 reg.   = 00006e80
             MC_ERR1 reg.   = 800e8a04
             CAP_ERR reg.   = 85000000
             PCI_ERR1 reg.  = 00000000
             MDPA_STAT reg. = 00000000
             MDPA_SYN reg.  = 00000000
             MDPB_STAT reg. = 00000000
             MDPB_SYN reg.  = 00000000
    
    DUMP: 1000000 blocks available for dumping.
    DUMP: 57337 required for a partial dump.
    DUMP: 0x814001 is the primary swap with 999999, start our last 57336
        : of dump at 942663, going to end (real end is one more, for header)
    device string for dump = SCSI 0 2000 0 0 0 0 0.
    DUMP.prom: dev SCSI 0 2000 0 0 0 0 0, block 262144
     results.
    DUMP: Header to 0x814001 at 999999 (0xf423f)
    device string for dump = SCSI 0 2000 0 0 0 0 0.
    DUMP.prom: dev SCSI 0 2000 0 0 0 0 0, block 262144
    DUMP: Header to 0x814001 at 999999 (0xf423f)
    succeeded
    halted CPU 1
    halted CPU 2
    halted CPU 3
    CP - SAVE_TERM routine to be called
    CP - SAVE_TERM exited with hlt_req = 1, r0 = 00000000.00000000
    
    halted CPU 0
    
    halt code = 5
    HALT instruction executed
    PC = fffffc0000465250
    P00>>>
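
For anyone who wants to compare the register values in the console abstract above against the errorlog by script, here is a minimal Python sketch. It only assumes the "NAME reg. = hex" line shape visible in the console output; the function name `parse_console_regs` is hypothetical, and garbled lines are simply skipped.

```python
import re

# Matches console lines shaped like "  MC_ERR0 reg.   = 0d124500".
# The pattern is inferred from the console output above, not from any
# documented console format.
REG_LINE = re.compile(r"^\s*(.+?) reg\.\s*=\s*([0-9a-fA-F]+)\s*$")


def parse_console_regs(text):
    """Return (name, value) pairs in the order they appear.

    A list is used instead of a dict because the console prints the
    same registers more than once (one set per dumped frame).
    """
    regs = []
    for line in text.splitlines():
        m = REG_LINE.match(line)
        if m:
            regs.append((m.group(1), int(m.group(2), 16)))
    return regs


sample = """\
         MC_ERR0 reg.   = 00006e80
         MC_ERR1 reg.   = 800e8a04
         CAP_ERR reg.   = 85000000
         Sys. Env. reg. = 00000000
"""

for name, value in parse_console_regs(sample):
    print(f"{name:10s} 0x{value:08x}")
```

Lines that do not fit the pattern, such as the interleaved output at the top of the console abstract, produce no entry at all, so nothing is guessed.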
T.R  Title  User  Personal Name  Date  Lines
578.1  errorlog?  POBOXB::STEINMAN  Mon Apr 28 1997 22:32  4
    
    Can you get the binary errorlog from the system?
    
    mo
578.2  Here is the binary errorlog.  PANTER::AUBERT  Tue Apr 29 1997 08:58  753
    You will find the binary errorlog on ruxack.geo.dec.com (ftp account),
    file /pub/binary.errlog.shd52
    
    You will find below the "dia -R -f binary.errlog.shd52" output.
    
    Thanks for any diagnosis.
    
    Thierry Aubert/DEC at CERN
    
    % /usr/sbin/dia -R -f binary.errlog.shd52 | more
    
    DECevent V2.3
    
    
    ******************************** ENTRY    1 ********************************
    
    
    Logging OS                        2. Digital UNIX
    System Architecture               2. Alpha
    Event sequence number             1.
    Timestamp of occurrence              25-APR-1997 18:38:26
    Host name                            shd52
    
    System type register      x00000016  AlphaServer 4000 Series
    Number of CPUs (mpnum)    x00000001
    CPU logging event (mperr) x00000000
    
    Event validity                    1. O/S claims event is valid
    Event severity                    5. Low Priority
    Entry type                      300. Start-Up ASCII Message Type
    
    SWI Minor class                   9. ASCII Message
    SWI Minor sub class               3. Startup
    
    ASCII Message
        Alpha boot: available memory from 0xb18000 to 0xfffe000
        Digital UNIX V4.0B  (Rev. 564); Wed Apr 16 15:15:14 MET DST 1997
        physical memory = 256.00 megabytes.
        available memory = 244.89 megabytes.
        using 975 buffers containing 7.61 megabytes of memory
        Master cpu at slot 0.
        Firmware revision: 3.0
        PALcode: Digital-UNIX/OSF version 1.21
        AlphaServer 4100 5/300 0MB
        pci1 at mcbus0 slot 5
        psiop0 at pci1 slot 1
        Loading SIOP: script c0000c00, reg 4444000, data c000cb70
        scsi0 at psiop0 slot 0
        rz5 at scsi0 target 5 lun 0 (LID=0) (DEC     RRD45   (C) DEC  0436)
        pza0 at pci1 slot 2
        pza0 firmware version: DEC  P01  A10
        scsi1 at pza0 slot 0
        tz8 at scsi1 target 0 lun 0 (LID=1) (STK     SD-3             011E)
        (Wide16)
        pza1 at pci1 slot 3
        pza1 firmware version: DEC  P01  A10
        scsi2 at pza1 slot 0
        tz16 at scsi2 target 0 lun 0 (LID=2) (STK     SD-3             011E)
        (Wide16)
        pza2 at pci1 slot 4
        pza2 firmware version: DEC  P01  A10
        scsi3 at pza2 slot 0
        tz24 at scsi3 target 0 lun 0 (LID=3) (STK     SD-3             011E)
        (Wide16)
        tz26 at scsi3 target 2 lun 0 (LID=4) (QUANTUM DLT7000          101A)
        (Wide16)
        pza3 at pci1 slot 5
        pza3 firmware version: DEC  P01  A10
        scsi4 at pza3 slot 0
        tz32 at scsi4 target 0 lun 0 (LID=5) (STK     SD-3             011E)
        (Wide16)
        changer at scsi4 target 1 lun 0 (LID=6) (STK     9714             1300)
        tz34 at scsi4 target 2 lun 0 (LID=7) (QUANTUM DLT7000          101A)
        (Wide16)
        gpc0 at eisa0
        pci0 at mcbus0 slot 4
        eisa0 at pci0
        ace0 at eisa0
        ace1 at eisa0
        lp0 at eisa0
        fdi0 at eisa0
        fd0 at fdi0 unit 0
        pci2000 at pci0 slot 2
        isp0 at pci2000 slot 0
        isp0: QLOGIC ISP1020A
        isp0: Firmware revision 2.10 (loaded by console)
        scsi5 at isp0 slot 0
        rz40 at scsi5 target 0 lun 0 (LID=8) (DEC     RZ29B    (C) DEC 0016)
        (Wide16)
        rz41 at scsi5 target 1 lun 0 (LID=9) (DEC     RZ29B    (C) DEC 0016)
        (Wide16)
        rz42 at scsi5 target 2 lun 0 (LID=10) (DEC     RZ29B    (C) DEC 0016)
        (Wide16)
        rz43 at scsi5 target 3 lun 0 (LID=11) (DEC     RZ29B    (C) DEC 0016)
        (Wide16)
        rz44 at scsi5 target 4 lun 0 (LID=12) (DEC     RZ29B    (C) DEC 0016)
        (Wide16)
        rz45 at scsi5 target 5 lun 0 (LID=13) (DEC     RZ29B    (C) DEC 0016)
        (Wide16)
        rz46 at scsi5 target 6 lun 0 (LID=14) (DEC     RZ29B    (C) DEC 0016)
        (Wide16)
        tu0: DECchip 21140-AA: Revision: 1.2
        tu0 at pci0 slot 3
        tu0: DEC Fast Ethernet Interface, hardware address: 00-00-F8-31-11-6D
        tu0: console mode: selecting 10BaseT (UTP) port: half duplex
        hip0: Roadrunner version 2 (20000900)
        hip0 at pci0 slot 4
        hip0 slot 4: PCI/HIPPI interface 0-a0-88-1-0-88
        fta0 DEC DEFPA FDDI Module, Hardware Revision 1
        fta0 at pci0 slot 5
        fta0: DMA Available.
        fta0: DEC DEFPA (PDQ) FDDI Interface, Hardware address: 08-00-2B-B4-15-75
        fta0: Firmware rev: 2.46
        Created FRU table configuration binary errorlog packet
        kernel console: ace0
        dli: configured
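
The startup message above is effectively a device attachment list ("device at parent ..."). A minimal Python sketch for turning it into a bus tree follows; the function names are hypothetical, and the only assumption about the format is the "X at Y" prefix seen in the lines above.

```python
import re
from collections import defaultdict

# "dev at parent ..." lines; everything after the parent name is ignored.
ATTACH = re.compile(r"^\s*(\w+) at (\w+)\b")


def attachment_tree(lines):
    """Map each parent device to the list of devices attached to it."""
    children = defaultdict(list)
    for line in lines:
        m = ATTACH.match(line)
        if m:
            dev, parent = m.groups()
            children[parent].append(dev)
    return dict(children)


def print_tree(children, root, depth=0):
    """Print the tree below `root`, indenting two spaces per level."""
    print("  " * depth + root)
    for dev in children.get(root, []):
        print_tree(children, dev, depth + 1)


# A few lines copied from the startup message above.
sample = [
    "    pci0 at mcbus0 slot 4",
    "    eisa0 at pci0",
    "    ace0 at eisa0",
    "    pci2000 at pci0 slot 2",
    "    isp0 at pci2000 slot 0",
    "    isp0: QLOGIC ISP1020A",
    "    scsi5 at isp0 slot 0",
    "    rz40 at scsi5 target 0 lun 0 (LID=8) (DEC     RZ29B    (C) DEC 0016)",
]

tree = attachment_tree(sample)
print_tree(tree, "mcbus0")
```

Informational lines such as "isp0: QLOGIC ISP1020A" do not match the pattern and are ignored.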
    
    
    
    ******************************** ENTRY    2 ********************************
    
    
    Logging OS                        2. Digital UNIX
    System Architecture               2. Alpha
    Event sequence number             0.
    Timestamp of occurrence              25-APR-1997 18:38:26
    Host name                            shd52
    
    System type register      x00000016  AlphaServer 4000 Series
    Number of CPUs (mpnum)    x00000001
    CPU logging event (mperr) x00000000
    
    Event validity                    1. O/S claims event is valid
    Event severity                    5. Low Priority
    Entry type                      110. Generalized Machine State Type
    
    SWI Minor class                   3. System configuration
    
    
    
        **********     The Following Revision 4.0 FRU Table     **********
        **********         is NOT supported at this time        **********
    
    **** FRU Table Header ***
    
       Checksum of config pkt x7981B0585077D8F9
       FRU Table length       x00002C4F
       FRU Table Revision     x00000004
       System Serial Number              AY65200674
    
    
    
    ******************************** ENTRY    3 ********************************
    
    
    Logging OS                        2. Digital UNIX
    System Architecture               2. Alpha
    Event sequence number             4.
    Timestamp of occurrence              25-APR-1997 18:10:17
    Host name                            shd52
    
    System type register      x00000016  AlphaServer 4000 Series
    Number of CPUs (mpnum)    x00000004
    CPU logging event (mperr) x00000000
    
    Event validity                    1. O/S claims event is valid
    Event severity                    1. Severe Priority
    Entry type                      100. CPU Machine Check Errors
    
    CPU Minor class                   2. 660 Entry
    
    Software Flags            x0000000300000000
                                         IOD 0 Register Subpkt Pres
                                         IOD 1 Register Subpkt Pres
    Active CPUs               x0000000F
    Hardware Rev              x00000000
    System Serial Number                 AY65200674
    Module Serial Number
    Module Type                   x0000
    System Revision           x00000000
    
    * MCHK 660 Regs *
    Flags:                    x00000000
    PCI Mask                      x0000
    Machine Check Reason          x0202  IOD-Detected Hard Error -OR-
                                         DTag Parity Error (If Cached CPU)
    PAL SHADOW REG 0          x0000000000000000
    PAL SHADOW REG 1          x0000000000000000
    PAL SHADOW REG 2          x0000000000000000
    PAL SHADOW REG 3          x0000000000000000
    PAL SHADOW REG 4          x0000000000000000
    PAL SHADOW REG 5          x0000000000000000
    PAL SHADOW REG 6          x0000000000000000
    PAL SHADOW REG 7          x0000000000000000
    PALTEMP0                  x0000000000000007
    PALTEMP1                  xFFFFFC00005E2EB0
    PALTEMP2                  xFFFFFC0000464C80
    PALTEMP3                  x0000000000004400
    PALTEMP4                  x00000000000F40C2
    PALTEMP5                  x0000F980000003F8
    PALTEMP6                  x0000000000000000
    PALTEMP7                  xFFFFFC00004646C0
    PALTEMP8                  x1F1E171515020100
    PALTEMP9                  xFFFFFC00004649F0
    PALTEMP10                 xFFFFFC000046E444
    PALTEMP11                 xFFFFFC0000464850
    PALTEMP12                 xFFFFFC0000464BF0
    PALTEMP13                 x0000000000006E80
    PALTEMP14                 x0000000000000000
    PALTEMP15                 x00000000000F0000
    PALTEMP16                 x0000020306600001
    PALTEMP17                 x0000000000000000
    PALTEMP18                 x000000011FFFE0C0
    PALTEMP19                 xFFFFFFFF90EAB7D0
    PALTEMP20                 x000000000A074000
    PALTEMP21                 xFFFFFC0000464C20
    PALTEMP22                 xFFFFFC00005E4530
    PALTEMP23                 x00000000014D9A38
    Exception Address Reg     xFFFFFC000046E444
                                         Native-mode Instruction
                                         Exception PC  x3FFFFF000011B911
    Exception Summary Reg     x0000000000000000
    Exception Mask Reg        x0000000000000000
    PAL Base Address Reg      x0000000000014000
                                         Base Addr for PALcode:  x0000000000000005
    Interrupt Summary Reg     x0000000000200000
                                         External HW Interrupt at IPL21
                                         AST Requests 3-0:  x0000000000000000
    IBOX Ctrl and Status Reg  x000000C164000000
                                         Timeout Counter Bit Clear.
                                         IBOX Timeout Counter Enabled.
                                         Floating Point Instr's May be Issued.
                                         PAL Shadow Registers Enabled.
                                         Correctable Error Interrupts Enabled.
                                         ICACHE BIST (Self Test) Was Successful.
                                         TEST_STATUS_H Pin Asserted
                                         TEST_STATUS_H Pin Asserted
    Icache Par Err Stat Reg   x0000000000000000
    Dcache Par Err Stat Reg   x0000000000000000
    Virtual Address Reg       xFFFFFFFFFF8000A0
    Memory Mgmt Flt Sts Reg   x0000000000014890
                                         If Err, Reference Resulted in DTB Miss
                                         Fault Inst RA Field:  x0000000000000002
                                         Fault Inst Opcode:  x0000000000000029
    Scache Address Reg        xFFFFFF000001904F
    Scache Status Reg         x0000000000000000
    Bcache Tag Address Reg    xFFFFFFFFFFFFFFFF
                                         Last Bcache Access Resulted in a Hit.
                                         Value of Parity Bit for Tag Control Status
                                            Bits Dirty, Shared & Valid is Set.
                                         Value of Tag Control Dirty Bit is Set.
                                         Value of Tag Control Shared Bit is Set.
                                         Value of Tag Control Valid Bit is Set.
                                         Value of Parity Bit Covering Tag Store
                                            Address Bits is Set.
                                         Tag Address<38:20> Is:  x000000000007FFFF
    Ext Interface Address Reg xFFFFFF000802A0BF
    Fill Syndrome Reg         x0000000000006900
    Ext Interface Status Reg  xFFFFFFF004FFFFFF
                                         Error Occurred During D-ref Fill
    LD LOCK                   xFFFFFF00005EDEDF
    
    ** IOD SUBPACKET -> **               IOD 0 Register Subpacket
    
    WHOAMI                    x0000023A  Module Revision  1.
                                         CPU = 0
    
    Base Address of Bridge    x000000F9E0000000
    Dev Type & Rev Register   x06008032  CAP Chip Revision:        x00000002
                                         HORSE  Module Revision:   x00000003
                                         SADDLE Module Revision:   x00000000
                                         SADDLE Module Type:        Left Hand
                                         PCI-EISA Bus Bridge Present on PCI Segment
                                         PCI Class Code            x00000600
    MC-PCI Command Register   x42460FF1  Module SelfTest Passed LED on
                                         Delayed PCI Bus Reads Protocol: Enabled
                                         Bridge to PCI Transactions: Enabled
                                         Bridge REQUESTS 64 Bit Data Transactions
                                         Bridge ACCEPTS 64 Bit Data Transactions
                                         PCI Address Parity Check: Enabled
                                         MC Bus CMD/Addr Parity Check: Enabled
                                         MC Bus NXM Check: Enabled
                                         Check ALL Transactions for Errors
                                         Use RD/MOD/WRT for <64 Byte Block Mem Wrt
                                         Wrt PEND_NUM Threshold:  6.
                                         RD_TYPE Memory Prefetch Algorithm: Short
                                         RL_TYPE Mem Rd Line Prefetch Type: Medium
                                         RM_TYPE Mem Rd Multiple Cmd Type:  Long
                                         ARB_MODE PCI Arbitration: Round Robin
    Mem Host Address Ext Reg  x00000000  HAE Sparse Mem Adr<31:27> x00000000
    IO Host Adr Ext Register  x00000000  PCI Upper Adr Bits<31:25> x00000000
    Interrupt Ctrl Register   x00000003  Write Device Interrupt Info Struct:Enabled
    Interrupt Request         x00811011  Interrupts asserted  x00011011
                                         Hard Error
    Interrupt Mask0 Register  x00C51111
    Interrupt Mask1 Register  x00000000
    MC Error Info Register 0  x00006E80
                                         MC Bus Trans Addr<31:4>: 6E80
    MC Error Info Register 1  x800E8A04  MC bus trans addr <39:32> x00000004
                                         MC Command is ReadMod0-Mem
                                         CPU0 Master at Time of Error
                                         Device ID:   x00000002
                                         MC error info valid
    CAP Error Register        x85000000  Error Detected but Not Logged
                                         Non-existant memory
                                         MC error info latched
    PCI Bus Trans Error Adr   x000003FE
    MDPA Status Register      x00000000  MDPA Status Register Data Not Valid
    MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not Valid
    MDPB Status Register      x00000000  MDPB Status Register Data Not Valid
    MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not Valid
    
    ** IOD SUBPACKET -> **               IOD 1 Register Subpacket
    
    WHOAMI                    x0000023A  Module Revision  1.
                                         CPU = 0
    
    Base Address of Bridge    x000000FBE0000000
    Dev Type & Rev Register   x06000032  CAP Chip Revision:        x00000002
                                         HORSE  Module Revision:   x00000003
                                         SADDLE Module Revision:   x00000000
                                         SADDLE Module Type:        Left Hand
                                         Internal CAP Chip Arbiter: Enabled
                                         PCI Class Code            x00000600
    MC-PCI Command Register   x42460FF1  Module SelfTest Passed LED on
                                         Delayed PCI Bus Reads Protocol: Enabled
                                         Bridge to PCI Transactions: Enabled
                                         Bridge REQUESTS 64 Bit Data Transactions
                                         Bridge ACCEPTS 64 Bit Data Transactions
                                         PCI Address Parity Check: Enabled
                                         MC Bus CMD/Addr Parity Check: Enabled
                                         MC Bus NXM Check: Enabled
                                         Check ALL Transactions for Errors
                                         Use RD/MOD/WRT for <64 Byte Block Mem Wrt
                                         Wrt PEND_NUM Threshold:  6.
                                         RD_TYPE Memory Prefetch Algorithm: Short
                                         RL_TYPE Mem Rd Line Prefetch Type: Medium
                                         RM_TYPE Mem Rd Multiple Cmd Type:  Long
                                         ARB_MODE PCI Arbitration: Round Robin
    Mem Host Address Ext Reg  x00000000  HAE Sparse Mem Adr<31:27> x00000000
    IO Host Adr Ext Register  x00000000  PCI Upper Adr Bits<31:25> x00000000
    Interrupt Ctrl Register   x00000003  Write Device Interrupt Info Struct:Enabled
    Interrupt Request         x00800000  Interrupts asserted  x00000000
                                         Hard Error
    Interrupt Mask0 Register  x00C51111
    Interrupt Mask1 Register  x00000000
    MC Error Info Register 0  x00006E80
                                         MC Bus Trans Addr<31:4>: 6E80
    MC Error Info Register 1  x800E8A04  MC bus trans addr <39:32> x00000004
                                         MC Command is ReadMod0-Mem
                                         CPU0 Master at Time of Error
                                         Device ID:   x00000002
                                         MC error info valid
    CAP Error Register        x85000000  Error Detected but Not Logged
                                         Non-existant memory
                                         MC error info latched
    PCI Bus Trans Error Adr   x00000000
    MDPA Status Register      x00000000  MDPA Status Register Data Not
    Valid
    MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not
    Valid
    MDPB Status Register      x00000000  MDPB Status Register Data Not
    Valid
    MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not
    Valid
    
    
    PALcode Revision                     Palcode Rev: 1.21-3
    
    
    ******************************** ENTRY    4 ********************************
    
    
    Logging OS                        2. Digital UNIX
    System Architecture               2. Alpha
    Event sequence number             3.
    Timestamp of occurrence              25-APR-1997 18:10:17
    Host name                            shd52
    
    System type register      x00000016  AlphaServer 4000 Series
    Number of CPUs (mpnum)    x00000004
    CPU logging event (mperr) x00000000
    
    Event validity                    1. O/S claims event is valid
    Event severity                    1. Severe Priority
    Entry type                      302. ASCII Panic Message Type
    
    SWI Minor class                   9. ASCII Message
    SWI Minor sub class               1. Panic
    
    ASCII Message                        panic (cpu 0): System Uncorrectable Machine Check
    
    
    
    ******************************** ENTRY    5 ********************************
    
    
    Logging OS                        2. Digital UNIX
    System Architecture               2. Alpha
    Event sequence number             2.
    Timestamp of occurrence              25-APR-1997 18:10:13
    Host name                            shd52
    
    System type register      x00000016  AlphaServer 4000 Series
    Number of CPUs (mpnum)    x00000004
    CPU logging event (mperr) x00000000
    
    Event validity                    1. O/S claims event is valid
    Event severity                    1. Severe Priority
    Entry type                      100. CPU Machine Check Errors
    
    CPU Minor class                   2. 660 Entry
    
    Software Flags            x0000000300000000
                                         IOD 0 Register Subpkt Pres
                                         IOD 1 Register Subpkt Pres
    Active CPUs               x0000000F
    Hardware Rev              x00000000
    System Serial Number                 AY65200674
    Module Serial Number
    Module Type                   x0000
    System Revision           x00000000
    
    * MCHK 660 Regs *
    Flags:                    x00000000
    PCI Mask                      x0000
    Machine Check Reason          x0202  IOD-Detected Hard Error -OR-
                                         DTag Parity Error (If Cached CPU)
    PAL SHADOW REG 0          x0000000000000000
    PAL SHADOW REG 1          x0000000000000000
    PAL SHADOW REG 2          x0000000000000000
    PAL SHADOW REG 3          x0000000000000000
    PAL SHADOW REG 4          x0000000000000000
    PAL SHADOW REG 5          x0000000000000000
    PAL SHADOW REG 6          x0000000000000000
    PAL SHADOW REG 7          x0000000000000000
    PALTEMP0                  x00000001400ECB98
    PALTEMP1                  x000000011FFFE110
    PALTEMP2                  xFFFFFC0000464C80
    PALTEMP3                  x0000000000004400
    PALTEMP4                  x000000014024FC68
    PALTEMP5                  x000000014024FC68
    PALTEMP6                  x0000000000000000
    PALTEMP7                  xFFFFFC00004646C0
    PALTEMP8                  x1F1E171515020100
    PALTEMP9                  xFFFFFC00004649F0
    PALTEMP10                 x00000001202C3E4C
    PALTEMP11                 xFFFFFC0000464850
    PALTEMP12                 xFFFFFC0000464BF0
    PALTEMP13                 x0000000000006E80
    PALTEMP14                 x0000000000000000
    PALTEMP15                 x00000000000F0000
    PALTEMP16                 x0000020306600001
    PALTEMP17                 x0000000000000000
    PALTEMP18                 x000000011FFFE0C0
    PALTEMP19                 xFFFFFFFF90EABA38
    PALTEMP20                 x000000000A074000
    PALTEMP21                 xFFFFFC0000464C20
    PALTEMP22                 xFFFFFC00005E4530
    PALTEMP23                 x00000000014D9A38
    Exception Address Reg     x00000001202C3E4C
                                         Native-mode Instruction
                                         Exception PC  x00000000480B0F93
    Exception Summary Reg     x0000000000000000
    Exception Mask Reg        x0000000000000000
    PAL Base Address Reg      x0000000000014000
                                         Base Addr for PALcode: 
    x0000000000000005
    Interrupt Summary Reg     x0000000000200000
                                         External HW Interrupt at IPL21
                                         AST Requests 3-0: 
    x0000000000000000
    IBOX Ctrl and Status Reg  x000000C164000000
                                         Timeout Counter Bit Clear.
                                         IBOX Timeout Counter Enabled.
                                         Floating Point Instr's May be
    Issued.
                                         PAL Shadow Registers Enabled.
                                         Correctable Error Interrupts
    Enabled.
                                         ICACHE BIST (Self Test) Was
    Successful.
                                         TEST_STATUS_H Pin Asserted
    Icache Par Err Stat Reg   x0000000000000000
    Dcache Par Err Stat Reg   x0000000000000000
    Virtual Address Reg       xFFFFFFFF90EABA08
    Memory Mgmt Flt Sts Reg   x0000000000016AD1
                                         If Error, Reference Which Caused
    Was Write
                                         If Err, Reference Resulted in DTB
    Miss
                                         Fault Inst RA Field: 
    x000000000000000B
    
                                         Fault Inst Opcode: 
    x000000000000002D
    Scache Address Reg        xFFFFFF000001902F
    Scache Status Reg         x0000000000000000
    Bcache Tag Address Reg    xFFFFFFFFFFFFFFFF
                                         Last Bcache Access Resulted in a
    Hit.
                                         Value of Parity Bit for Tag
    Control Status
                                            Bits Dirty, Shared & Valid is
    Set.
                                         Value of Tag Control Dirty Bit is
    Set.
                                         Value of Tag Control Shared Bit is
    Set.
                                         Value of Tag Control Valid Bit is
    Set.
                                         Value of Parity Bit Covering Tag
    Store
                                            Address Bits is Set.
                                         Tag Address<38:20> Is: 
    x000000000007FFFF
    Ext Interface Address Reg xFFFFFF000802A0BF
    Fill Syndrome Reg         x0000000000006900
    Ext Interface Status Reg  xFFFFFFF004FFFFFF
                                         Error Occurred During D-ref Fill
    LD LOCK                   xFFFFFF0000200A0F
    
    ** IOD SUBPACKET -> **               IOD 0 Register Subpacket
    
    WHOAMI                    x0000023A  Module Revision  1.
                                         CPU = 0
    
    Base Address of Bridge    x000000F9E0000000
    Dev Type & Rev Register   x06008032  CAP Chip Revision:       
    x00000002
                                         HORSE  Module Revision:  
    x00000003
                                         SADDLE Module Revision:  
    x00000000
                                         SADDLE Module Type:        Left
    Hand
                                         PCI-EISA Bus Bridge Present on PCI
    Segment
                                         PCI Class Code           
    x00000600
    MC-PCI Command Register   x42460FF1  Module SelfTest Passed LED on
                                         Delayed PCI Bus Reads Protocol:
    Enabled
                                         Bridge to PCI Transactions:
    Enabled
                                         Bridge REQUESTS 64 Bit Data
    Transactions
                                         Bridge ACCEPTS 64 Bit Data
    Transactions
                                         PCI Address Parity Check: Enabled
                                         MC Bus CMD/Addr Parity Check:
    Enabled
                                         MC Bus NXM Check: Enabled
                                         Check ALL Transactions for Errors
                                         Use RD/MOD/WRT for <64 Byte Block
    Mem Wrt
                                         Wrt PEND_NUM Threshold:  6.
                                         RD_TYPE Memory Prefetch Algorithm:
    Short
                                         RL_TYPE Mem Rd Line Prefetch Type:
    Medium
                                         RM_TYPE Mem Rd Multiple Cmd Type: 
    Long
                                         ARB_MODE PCI Arbitration: Round
    Robin
    Mem Host Address Ext Reg  x00000000  HAE Sparse Mem Adr<31:27>
    x00000000
    IO Host Adr Ext Register  x00000000  PCI Upper Adr Bits<31:25>
    x00000000
    Interrupt Ctrl Register   x00000003  Write Device Interrupt Info
    Struct:Enabled
    Interrupt Request         x00800000  Interrupts asserted  x00000000
                                         Hard Error
    Interrupt Mask0 Register  x00C51111
    Interrupt Mask1 Register  x00000000
    MC Error Info Register 0  x0D124500
                                         MC Bus Trans Addr<31:4>: D124500
    MC Error Info Register 1  x800FC800  MC bus trans addr <39:32>
    x00000000
                                         MC Command is Read0-Mem
                                         CPU3 OR IOD3 Master at Time of
    Error
                                         Device ID:   x00000007
                                         MC error info valid
    CAP Error Register        xC0000000  Uncorrectable ECC err det by MDPB
                                         MC error info latched
    PCI Bus Trans Error Adr   x00000000
    MDPA Status Register      x00000000  MDPA Status Register Data Not
    Valid
    MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not
    Valid
    MDPB Status Register      x00000000  MDPB Status Register Data Not
    Valid
    MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not
    Valid
    
    ** IOD SUBPACKET -> **               IOD 1 Register Subpacket
    
    WHOAMI                    x0000023A  Module Revision  1.
                                         CPU = 0
    
    Base Address of Bridge    x000000FBE0000000
    Dev Type & Rev Register   x06000032  CAP Chip Revision:       
    x00000002
                                         HORSE  Module Revision:  
    x00000003
                                         SADDLE Module Revision:  
    x00000000
                                         SADDLE Module Type:        Left
    Hand
                                         Internal CAP Chip Arbiter: Enabled
                                         PCI Class Code           
    x00000600
    MC-PCI Command Register   x42460FF1  Module SelfTest Passed LED on
                                         Delayed PCI Bus Reads Protocol:
    Enabled
                                         Bridge to PCI Transactions:
    Enabled
                                         Bridge REQUESTS 64 Bit Data
    Transactions
                                         Bridge ACCEPTS 64 Bit Data
    Transactions
                                         PCI Address Parity Check: Enabled
                                         MC Bus CMD/Addr Parity Check:
    Enabled
                                         MC Bus NXM Check: Enabled
                                         Check ALL Transactions for Errors
                                         Use RD/MOD/WRT for <64 Byte Block
    Mem Wrt
                                         Wrt PEND_NUM Threshold:  6.
                                         RD_TYPE Memory Prefetch Algorithm:
    Short
                                         RL_TYPE Mem Rd Line Prefetch Type:
    Medium
                                         RM_TYPE Mem Rd Multiple Cmd Type: 
    Long
                                         ARB_MODE PCI Arbitration: Round
    Robin
    Mem Host Address Ext Reg  x00000000  HAE Sparse Mem Adr<31:27>
    x00000000
    IO Host Adr Ext Register  x00000000  PCI Upper Adr Bits<31:25>
    x00000000
    Interrupt Ctrl Register   x00000003  Write Device Interrupt Info
    Struct:Enabled
    Interrupt Request         x00800000  Interrupts asserted  x00000000
                                         Hard Error
    Interrupt Mask0 Register  x00C51111
    Interrupt Mask1 Register  x00000000
    MC Error Info Register 0  x0D124500
                                         MC Bus Trans Addr<31:4>: D124500
    MC Error Info Register 1  x800FC800  MC bus trans addr <39:32>
    x00000000
                                         MC Command is Read0-Mem
                                         CPU3 OR IOD3 Master at Time of
    Error
                                         Device ID:   x00000007
                                         MC error info valid
    CAP Error Register        xC0000000  Uncorrectable ECC err det by MDPB
                                         MC error info latched
    PCI Bus Trans Error Adr   x00000000
    MDPA Status Register      x00000000  MDPA Status Register Data Not
    Valid
    MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not
    Valid
    MDPB Status Register      x00000000  MDPB Status Register Data Not
    Valid
    MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not
    Valid
    
    
    PALcode Revision                     Palcode Rev: 1.21-3
    
578.3. "Install and use DECevent" by POBOXA::SHEPARD () Tue Apr 29 1997 11:59
    You can install and run DECevent and post the results here.
    
    Gary
578.4. "DECevent result in .2" by PANTER::AUBERT () Tue Apr 29 1997 13:09
    I have already run DECevent and the result is posted in .2
    
    Thierry
578.5. "" by HARMNY::CUMMINS () Tue Apr 29 1997 14:58
    From the DECevent output, this would appear to be the same problem as
    that described in 385.* and the Blitz in 93.26. Can you confirm? I
    couldn't tell for sure from the UNIX start-up audit trail whether each
    KZPSA in your machine had a disk attached.
    
    P.S. I have spoken with my UNIX counterpart. He tells me there is a fix
    in place for this problem and that he intends to post a note in this
    conference once the patch becomes available.
578.6. "CPU #3 defect ?" by PANTER::AUBERT () Tue Apr 29 1997 15:09
    The AlphaServer 4100 5/300 crashed again. The new DECevent result is
    copied below. I have asked the field people to replace CPU #3.
    Could somebody confirm my diagnosis?
    
    Thierry
    
    ******************************** ENTRY    4 ********************************
    
    
    Logging OS                        2. Digital UNIX
    System Architecture               2. Alpha
    Event sequence number             4.
    Timestamp of occurrence              29-APR-1997 12:24:57
    Host name                            shd52
    
    System type register      x00000016  AlphaServer 4000 Series
    Number of CPUs (mpnum)    x00000004
    CPU logging event (mperr) x00000003
    
    Event validity                    1. O/S claims event is valid
    Event severity                    1. Severe Priority
    Entry type                      302. ASCII Panic Message Type
    
    SWI Minor class                   9. ASCII Message
    SWI Minor sub class               1. Panic
    
    ASCII Message                        panic (cpu 3): Processor Machine Check
    
    
    ******************************** ENTRY    5 ********************************
    
    
    Logging OS                        2. Digital UNIX
    System Architecture               2. Alpha
    Event sequence number             3.
    Timestamp of occurrence              29-APR-1997 12:24:57
    Host name                            shd52
    
    System type register      x00000016  AlphaServer 4000 Series
    Number of CPUs (mpnum)    x00000004
    CPU logging event (mperr) x00000003
    
    Event validity                    1. O/S claims event is valid
    Event severity                    1. Severe Priority
    Entry type                      100. CPU Machine Check Errors
    
    CPU Minor class                   1. Machine check (670 entry)
    
    Software Flags            x0000000300000000
                                         IOD 0 Register Subpkt Pres
                                         IOD 1 Register Subpkt Pres
    Active CPUs               x0000000F
    Hardware Rev              x00000000
    System Serial Number                 AY65200674
    Module Serial Number
    Module Type                   x0000
    System Revision           x00000000
    
    * MCHK 670 Regs *
    Flags:                    x00000000
    PCI Mask                      x0000
    Machine Check Reason          x0098  Fatal Alpha Chip Detected Hard Error
    PAL SHADOW REG 0          x0000000000000000
    PAL SHADOW REG 1          x0000000000000000
    PAL SHADOW REG 2          x0000000000000000
    PAL SHADOW REG 3          x0000000000000000
    PAL SHADOW REG 4          x0000000000000000
    PAL SHADOW REG 5          x0000000000000000
    PAL SHADOW REG 6          x0000000000000000
    PAL SHADOW REG 7          x0000000000000000
    PALTEMP0                  x0000000140EFC1F8
    PALTEMP1                  x0000000140EFC1F8
    PALTEMP2                  xFFFFFC0000464C80
    PALTEMP3                  x0000000000005588
    PALTEMP4                  x0000000140D91680
    PALTEMP5                  x0000000140EFAD68
    PALTEMP6                  x0000000140D91680
    PALTEMP7                  xFFFFFC00004646C0
    PALTEMP8                  x1F1E161514020100
    PALTEMP9                  xFFFFFC00004649F0
    PALTEMP10                 x00000001201954F4
    PALTEMP11                 xFFFFFC0000464850
    PALTEMP12                 xFFFFFC0000464BF0
    PALTEMP13                 x0000000000006FC0
    PALTEMP14                 x0000000000000000
    PALTEMP15                 x0000000000004978
    PALTEMP16                 x0000009806700301
    PALTEMP17                 x0000000000000000
    PALTEMP18                 x000000011FFFE280
    PALTEMP19                 xFFFFFFFF90EA3A38
    PALTEMP20                 x000000000FCBE000
    PALTEMP21                 xFFFFFC0000464C20
    PALTEMP22                 xFFFFFC00005E4530
    PALTEMP23                 x0000000003BF7A38
    Exception Address Reg     x00000001201954F4
                                         Native-mode Instruction
                                         Exception PC  x000000004806553D
    Exception Summary Reg     x0000000000000000
    Exception Mask Reg        x0000000000000000
    PAL Base Address Reg      x0000000000014000
                                         Base Addr for PALcode: 
    x0000000000000005
    Interrupt Summary Reg     x0000000000000000
                                         AST Requests 3-0: 
    x0000000000000000
    IBOX Ctrl and Status Reg  x000000C164000000
                                         Timeout Counter Bit Clear.
                                         IBOX Timeout Counter Enabled.
                                         Floating Point Instr's May be
    Issued.
                                         PAL Shadow Registers Enabled.
                                         Correctable Error Interrupts
    Enabled.
                                         ICACHE BIST (Self Test) Was
    Successful.
                                         TEST_STATUS_H Pin Asserted
    Icache Par Err Stat Reg   x0000000000000000
    Dcache Par Err Stat Reg   x0000000000000000
    Virtual Address Reg       x0000000140EFC250
    Memory Mgmt Flt Sts Reg   x0000000000011B10
                                         If Err, Reference Resulted in DTB
    Miss
                                         Fault Inst RA Field: 
    x000000000000000C
    
                                         Fault Inst Opcode: 
    x0000000000000023
    Scache Address Reg        xFFFFFF000001960F
    Scache Status Reg         x0000000000000000
    Bcache Tag Address Reg    xFFFFFFFFFF7FFFFF
                                         Last Bcache Access Resulted in a
    Hit.
                                         Value of Parity Bit for Tag
    Control Status
                                            Bits Dirty, Shared & Valid is
    Set.
                                         Value of Tag Control Dirty Bit is
    Set.
                                         Value of Tag Control Shared Bit is
    Set.
                                         Value of Tag Control Valid Bit is
    Set.
                                         Value of Parity Bit Covering Tag
    Store
                                            Address Bits is Set.
                                         Tag Address<38:20> Is: 
    x000000000007FFF7
    Ext Interface Address Reg xFFFFFF00080C0D0F
    Fill Syndrome Reg         x000000000000491B
    Ext Interface Status Reg  xFFFFFFF904FFFFFF
                                         UNCORRECTABLE ECC ERROR
                                         Error Occurred During D-ref Fill
                                         Second External Interface Hard Error
    LD LOCK                   xFFFFFF00024462CF
    
    ** IOD SUBPACKET -> **               IOD 0 Register Subpacket
    
    WHOAMI                    x0000023F  Module Revision  1.
                                         CPU = 3
    
    Base Address of Bridge    x000000F9E0000000
    Dev Type & Rev Register   x06008032  CAP Chip Revision:       
    x00000002
                                         HORSE  Module Revision:  
    x00000003
                                         SADDLE Module Revision:  
    x00000000
                                         SADDLE Module Type:        Left
    Hand
                                         PCI-EISA Bus Bridge Present on PCI
    Segment
                                         PCI Class Code           
    x00000600
    MC-PCI Command Register   x42460FF1  Module SelfTest Passed LED on
                                         Delayed PCI Bus Reads Protocol:
    Enabled
                                         Bridge to PCI Transactions:
    Enabled
                                         Bridge REQUESTS 64 Bit Data
    Transactions
                                         Bridge ACCEPTS 64 Bit Data
    Transactions
                                         PCI Address Parity Check: Enabled
                                         MC Bus CMD/Addr Parity Check:
    Enabled
                                         MC Bus NXM Check: Enabled
                                         Check ALL Transactions for Errors
                                         Use RD/MOD/WRT for <64 Byte Block
    Mem Wrt
                                         Wrt PEND_NUM Threshold:  6.
                                         RD_TYPE Memory Prefetch Algorithm:
    Short
                                         RL_TYPE Mem Rd Line Prefetch Type:
    Medium
                                         RM_TYPE Mem Rd Multiple Cmd Type: 
    Long
                                         ARB_MODE PCI Arbitration: Round
    Robin
    Mem Host Address Ext Reg  x00000000  HAE Sparse Mem Adr<31:27>
    x00000000
    IO Host Adr Ext Register  x00000000  PCI Upper Adr Bits<31:25>
    x00000000
    Interrupt Ctrl Register   x00000003  Write Device Interrupt Info
    Struct:Enabled
    Interrupt Request         x00800000  Interrupts asserted  x00000000
                                         Hard Error
    Interrupt Mask0 Register  x00C51111
    Interrupt Mask1 Register  x00000000
    MC Error Info Register 0  x080C0D00
                                         MC Bus Trans Addr<31:4>: 80C0D00
    MC Error Info Register 1  x800FD800  MC bus trans addr <39:32>
    x00000000
                                         MC Command is Read0-Mem
                                         CPU3 OR IOD3 Master at Time of Error
                                         Device ID:   x00000007
                                         MC error info valid
    CAP Error Register        xE0000000  Uncorrectable ECC err det by MDPA
                                         Uncorrectable ECC err det by MDPB
                                         MC error info latched
    PCI Bus Trans Error Adr   x00000000
    MDPA Status Register      x00000000  MDPA Status Register Data Not
    Valid
    MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not
    Valid
    MDPB Status Register      x00000000  MDPB Status Register Data Not
    Valid
    MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not
    Valid
    
    ** IOD SUBPACKET -> **               IOD 1 Register Subpacket
    
    WHOAMI                    x0000023F  Module Revision  1.
                                         CPU = 3
    
    Base Address of Bridge    x000000FBE0000000
    Dev Type & Rev Register   x06000032  CAP Chip Revision:       
    x00000002
                                         HORSE  Module Revision:  
    x00000003
                                         SADDLE Module Revision:  
    x00000000
                                         SADDLE Module Type:        Left
    Hand
                                         Internal CAP Chip Arbiter: Enabled
                                         PCI Class Code           
    x00000600
    MC-PCI Command Register   x42460FF1  Module SelfTest Passed LED on
                                         Delayed PCI Bus Reads Protocol:
    Enabled
                                         Bridge to PCI Transactions:
    Enabled
                                         Bridge REQUESTS 64 Bit Data
    Transactions
                                         Bridge ACCEPTS 64 Bit Data
    Transactions
                                         PCI Address Parity Check: Enabled
                                         MC Bus CMD/Addr Parity Check:
    Enabled
                                         MC Bus NXM Check: Enabled
                                         Check ALL Transactions for Errors
                                         Use RD/MOD/WRT for <64 Byte Block
    Mem Wrt
                                         Wrt PEND_NUM Threshold:  6.
                                         RD_TYPE Memory Prefetch Algorithm:
    Short
                                         RL_TYPE Mem Rd Line Prefetch Type:
    Medium
                                         RM_TYPE Mem Rd Multiple Cmd Type: 
    Long
                                         ARB_MODE PCI Arbitration: Round
    Robin
    Mem Host Address Ext Reg  x00000000  HAE Sparse Mem Adr<31:27>
    x00000000
    IO Host Adr Ext Register  x00000000  PCI Upper Adr Bits<31:25>
    x00000000
    Interrupt Ctrl Register   x00000003  Write Device Interrupt Info
    Struct:Enabled
    Interrupt Request         x00800000  Interrupts asserted  x00000000
                                         Hard Error
    Interrupt Mask0 Register  x00C51111
    Interrupt Mask1 Register  x00000000
    MC Error Info Register 0  x080C0D00
                                         MC Bus Trans Addr<31:4>: 80C0D00
    MC Error Info Register 1  x800FD800  MC bus trans addr <39:32>
    x00000000
                                         MC Command is Read0-Mem
                                         CPU3 OR IOD3 Master at Time of
    Error
                                         Device ID:   x00000007
                                         MC error info valid
    CAP Error Register        xE0000000  Uncorrectable ECC err det by MDPA
                                         Uncorrectable ECC err det by MDPB
                                         MC error info latched
    PCI Bus Trans Error Adr   x00000000
    MDPA Status Register      x00000000  MDPA Status Register Data Not
    Valid
    MDPA Error Syndrome Reg   x00000000  MDPA Syndrome Register Data Not
    Valid
    MDPB Status Register      x00000000  MDPB Status Register Data Not
    Valid
    MDPB Error Syndrome Reg   x00000000  MDPB Syndrome Register Data Not
    Valid
    
    
    PALcode Revision                     Palcode Rev: 1.21-3
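
    The "Memory Mgmt Flt Sts Reg" values in the two machine check entries
    (x16AD1 in the 660 entry from .2, x11B10 in the 670 entry above) can be
    cross-checked against the fields DECevent prints. The sketch below is
    only an illustration: the bit positions (bit 0 = write, bit 4 = DTB
    miss, bits <10:6> = RA field, bits <16:11> = opcode) are an assumption
    inferred from the decoded text in these logs, not taken from a register
    specification, and `decode_mm_stat` is a hypothetical helper name.

```python
# Sketch: split a Memory Mgmt Flt Sts Reg value into the fields DECevent shows.
# ASSUMPTION: the bit layout below is inferred from the decoded log text,
# not from an official register description.

def decode_mm_stat(value: int) -> dict:
    """Return the fields of a fault status value as a dictionary."""
    return {
        "write": bool(value & 0x1),            # bit 0: reference was a write
        "dtb_miss": bool((value >> 4) & 0x1),  # bit 4: reference missed the DTB
        "ra": (value >> 6) & 0x1F,             # bits <10:6>: RA field
        "opcode": (value >> 11) & 0x3F,        # bits <16:11>: instruction opcode
    }

for label, raw in (("660 entry", 0x16AD1), ("670 entry", 0x11B10)):
    fields = decode_mm_stat(raw)
    print(label, {k: (hex(v) if isinstance(v, int) and not isinstance(v, bool) else v)
                  for k, v in fields.items()})
```

    With this layout both values reproduce the RA and opcode fields shown
    in the logs (xB / x2D and xC / x23), which suggests the assumed bit
    positions are at least consistent with DECevent's own decoding.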
    
    
578.7. "" by HARMNY::CUMMINS () Tue Apr 29 1997 15:37
    The DECevent output in .2 shows what appears to be a diskless KZPSA
    UNIX panic (see note 93.26, etc.). Until you posted .6 I hadn't looked
    far enough in the DECevent log posted in .2 to see the CPU3 error
    (which is also shown in the log in .6). Could you post a reply
    indicating whether you have any diskless (/tapeless) KZPSAs in said
    machine?
    
    Also, can you reply with info about the system's memory config? I see
    from .2 that you have 256MBs of memory. This is two 128MB SYNC options,
    yes? Or do you have a proto (or internal order) with a single 256MB
    option installed in it? If SYNC memory, is this DIGITAL memory or
    third-party memory?
    
    Finally, I was looking back through other notes in this conference and
    it would appear that the log in .2 may be from the same machine as that
    discussed in 543.*? Is this true?
578.8. "Don't suspect CPU#3" by POBOXB::STEINMAN () Tue Apr 29 1997 15:38
    
    I don't believe CPU#3 is at fault. I suspect memory, or the unterminated
    KZPSA referred to in .-1, since the system bus saw an uncorrectable ECC
    error at the same address at which CPU3 detected it:
    
    CPU3:
        Ext Interface Address Reg xFFFFFF00080C0D0F
        Fill Syndrome Reg         x000000000000491B
        Ext Interface Status Reg  xFFFFFFF904FFFFFF
    
    IOD:
        MC Error Info Register 0  x080C0D00
            MC Bus Trans Addr<31:4>: 80C0D00
        MC Error Info Register 1  x800FD800
            MC bus trans addr <39:32> x00000000
            MC Command is Read0-Mem
            CPU3 OR IOD3 Master at Time of Error
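
    The address comparison above can be sketched in a few lines of Python.
    This is an illustration only. It assumes that just bits <39:4> of the
    Ext Interface Address Reg carry the address (the remaining bits appear
    to read back as ones in these logs), and the helper names
    `cpu_fill_address` and `iod_bus_address` are hypothetical.

```python
# Sketch: compare the address the CPU reports with the one the IOD latched.
# ASSUMPTION: only bits <39:4> of the Ext Interface Address Reg are address
# bits; the other bits are treated as don't-care and masked off.

ADDR_MASK_39_4 = 0xFFFFFFFFF0  # keeps bits 39..4, clears everything else

def cpu_fill_address(ei_addr: int) -> int:
    """Address bits <39:4> from the CPU's Ext Interface Address Reg."""
    return ei_addr & ADDR_MASK_39_4

def iod_bus_address(mc_err_info0: int, addr_39_32: int) -> int:
    """Combine MC Error Info Register 0 (<31:4>) with address bits <39:32>."""
    return ((addr_39_32 & 0xFF) << 32) | (mc_err_info0 & 0xFFFFFFF0)

cpu_addr = cpu_fill_address(0xFFFFFF00080C0D0F)  # value logged for CPU3
iod_addr = iod_bus_address(0x080C0D00, 0x00)     # values logged by the IOD

print(hex(cpu_addr), hex(iod_addr), "match" if cpu_addr == iod_addr else "differ")
```

    A match only says that both observers saw the bad data on the same
    transaction. It does not by itself say which component produced it.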
    
    
    /mo
578.9. "All KZPSAs have devices connected..." by PANTER::AUBERT () Wed Apr 30 1997 08:23
    >> The DECevent output in .2 shows what appears to be a diskless KZPSA
    >> UNIX panic.
    >> Could you post a reply indicating whether you have any diskless
    >> (/tapeless) KZPSAs in said machine?
    
    I have no diskless (/tapeless) KZPSAs in this machine.
    
    >> Also, can you reply with info about the system's memory config?
    
    FRU Location                   0. Slot Name: MEM0L and MEM0H
    Self Test Status       x00000001  FRU passed Self-Test
    Total Memory Size            128. Mega Bytes (2 Modules)
          Module Size             64. Mega Bytes (per Module)
    Memory Base Addr       x0000000000000000
    Memory Module Type     x0000000000000003
                                      Syncronous DRAM
    
    FRU Location                   1. Slot Name: MEM1L and MEM1H
    Self Test Status       x00000001  FRU passed Self-Test
    Total Memory Size            128. Mega Bytes (2 Modules)
          Module Size             64. Mega Bytes (per Module)
    Memory Base Addr       x0000000008000000
    Memory Module Type     x0000000000000003
                                      Syncronous DRAM
    
    This means that we have four 64MB modules (4 x B3020-CA).
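
    As a rough cross-check, the failing addresses from the logs can be
    placed against the two base addresses listed above. The sketch assumes
    the two module pairs are mapped as plain contiguous 128MB ranges with
    no interleaving between them, which the FRU listing does not confirm.
    `BANKS` and `bank_for` are hypothetical names used only here.

```python
# Sketch: find which memory pair covers a given physical address.
# ASSUMPTION: each pair occupies one contiguous 128MB range starting at its
# logged base address, with no interleaving across pairs.

MB = 1024 * 1024

BANKS = [
    ("MEM0L and MEM0H", 0x0000000000000000, 128 * MB),
    ("MEM1L and MEM1H", 0x0000000008000000, 128 * MB),
]

def bank_for(addr: int):
    """Return the slot name whose range contains addr, or None."""
    for name, base, size in BANKS:
        if base <= addr < base + size:
            return name
    return None

# Addresses latched in MC Error Info Register 0 in .2 and .6
for addr in (0x0D124500, 0x080C0D00):
    print(hex(addr), "->", bank_for(addr))
```

    This only narrows down an address range. It does not identify the
    failing part, since the same range is reached through the motherboard
    and the system bus.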
    
    >> Finally, I was looking back through other notes in this conference
    >> and it would appear that the log in .2 may be from the same machine
    >> as that discussed in 543.*? Is this true?
    
    It is not the same machine but a machine with the same configuration.
    
    Thanks for your time in helping to diagnose this urgent problem.
    
    Thierry Aubert/DEC at CERN
578.10. "" by PROXY::ALFORD Wed Apr 30 1997 14:14 (24 lines)
    It's possible that the motherboard could be at fault, if it is a rev
    B06 54-23803-01. I have seen one customer problem with a similar
    configuration that we fixed by swapping in a rev B07 54-23803-01 module.
    These should be available through the P1 process.
    
    Since pulling the motherboard is not a trivial task, I strongly suggest
    first installing two sets of EDO (B3030-EA) memory (if possible).
    If these sets work, then it appears you may have the same problem as
    that other customer. If they don't work, then you have a different
    problem.
    
    History:
    There was a change to a PAL (vendor/code change) on the 54-23803-02 module.
    This change added timing margin for sync memory configurations. Even though
    all lab test results indicated the old PAL met system specification, it
    was decided to change the PAL anyway. As already mentioned, we have seen
    one customer system with 4 CPUs and 4 B3020-CAs fail with 660 MCHKs. EDO
    memories worked fine.
                       
    FYI... there is a rev B08 available too; it is electrically the same
    as a rev B07. The difference is that the B08 has new mounting holes for
    a power resistor support bracket.
    
    bruce
578.11. "motherboard rev B07 solved my problem" by PANTER::AUBERT Fri May 09 1997 08:27 (10 lines)
    I have replaced the motherboard with a revision B07. The system has now
    been working fine for 2 days (no more machine checks). I did not try
    EDO memory first, but I will do so on another system which has the
    same problem.
    
    I would like to thank you for your advice, since it solved my problem.
    
    Regards,
    
    Thierry Aubert/DEC at CERN