Conference kernel::csguk_systems

Title:CSGUK_SYSTEMS
Notice:No restrictions on keyword creation
Moderator:KERNEL::ADAMS
Created:Wed Mar 01 1989
Last Modified:Thu Nov 28 1996
Last Successful Update:Fri Jun 06 1997
Number of topics:242
Total number of notes:1855

18.0. "VENUS INFORMATION" by KERNEL::ADAMS (Venus on Remote Control) Fri Mar 17 1989 23:55

    
    This note is for information/snippets/gotchas etc. for
    8600/8650 Systems.
    
T.R    Title    User    Personal Name    Date    Lines
18.1. "KAF 1F & MCSPE" by KERNEL::ADAMS (Venus on Remote Control) Fri Mar 17 1989 23:56 (36 lines)

    Gentlemen, (and I use the term more loosely now),
    
    	We are seeing a number of instances of KAF 1F failures on Venus
    systems. The first problem is that VSR doesn't know anything about a KAF
    code of 1F; it reports it as an Undefined KAF code. This is NOT helpful!!
    A KAF code of 1F means the following:-
    *************************************************************************
    KAF 1F -- MBOX/SBI Command Error or Non-existent Memory

	This KAF occurs because the Mbox is stalled looping at uPC FF
	due  to  a DMA ERROR (Abus Control or Address PE or NXM). The
	problem  is  caused  by  the microstack being popped too many
	times (Underflow error).
    *************************************************************************
    In plain English, the ABus is probably naffed. Look for error bits set
    in the MSTAT1 or MSTAT2 IPRs.
    
    	Another (more nasty) problem is that the console may have printed
    the following message:-
    ?DCN-E-CSPERR, MCS control store parity error
    ?ECR-E-MSTKER, MCS ustack error caused CSPE interrupt
    	Bad_C6400504AB246FA893E110  Syndrome_80780007

    	This is NOT an MBox control store parity error; it's a microstack
        -----------------------------------------------------------------
    error in the MBox. It has nowt to do with the Control Store. Now, the
    -----------------------------------------------------------
    problem is that the console may create a KAF reason in the snapshot
    of 1F (it looks for the MBox uPC stuck at FF for KAF 1F).
    	So, rule 1, if you get the MCS CSPE message above, it's almost
    GUARANTEED to be something OTHER than an MBox Control Store PE.
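
    Rule 1 can be written down as a small check on captured console
    text. This is only a sketch: the two message codes are copied from
    the output quoted above, but the function name and the wording of
    the hints are made up for illustration and are not part of VSR or
    the console software.

```python
import re

# Message codes copied from the console output quoted in this note.
CSPE_CODE = "DCN-E-CSPERR"
USTACK_CODE = "ECR-E-MSTKER"

# Matches lines of the form "?FAC-S-IDENT, text" and captures FAC-S-IDENT.
MESSAGE_RE = re.compile(r"\s*\?([A-Z0-9]+-[A-Z]-[A-Z0-9]+),")


def classify_mcs_error(console_lines):
    """Return a hint for a block of console output (hypothetical helper)."""
    codes = set()
    for line in console_lines:
        match = MESSAGE_RE.match(line)
        if match:
            codes.add(match.group(1))
    # The ustack message takes priority: per rule 1 it means the CSPE
    # report is almost certainly NOT a control store parity error.
    if USTACK_CODE in codes:
        return "MBox microstack error (not a control store fault)"
    if CSPE_CODE in codes:
        return "MCS control store parity error reported"
    return "no MCS error message found"


captured = [
    "?DCN-E-CSPERR, MCS control store parity error",
    "?ECR-E-MSTKER, MCS ustack error caused CSPE interrupt",
    "\tBad_C6400504AB246FA893E110  Syndrome_80780007",
]
print(classify_mcs_error(captured))
```

    Remember to record the Bad_ and Syndrome_ values as well; the
    sketch ignores them.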


    
18.2. "1B06 & MCSPE & 1C00" by KERNEL::ADAMS (Venus on Remote Control) Fri Mar 17 1989 23:59 (37 lines)
Please be careful if you get MCS parity errors reported.
Along with the text reporting the problem, there will be a line or
two of text giving the cause of the problem and the bad microword
plus the syndrome (really the contents of CSES).

All this information needs to be recorded for fault analysis.
We also need to know the circumstances leading up to the error.
The reason for this is that although the "VAX" CPU may be halted,
the I/O still continues and will most likely still impact the
M-BOX. This can often cause MCS U-Stack overflow, which then gets
fired straight into the console. At this time the console is
probably trying to "save the system state", but the interrupt
kills this and is handled at higher priority. One result of this
CAN be the halting of the T-11, resulting in the ROM> prompt.

As an example, we had the following on a system.

CPU STOP      CPU ERROR HALT CSM CODE=06   < This is the real fault >
Attempting to save machine state.          < Snap should be 1B06    >

MCS CS Parity Error                        < This is from outstanding >
MCS U-Stack Error caused CSPE Interrupt.   < I/O trying to complete   >
Bad = nnnnnnnnnnnnnnnnnnnn Syn=80780007    < It stops the snapshot    >
                                           < & generates a 1C00 instead>
                                           < So we've lost the fault info>

?T-11 Halt                                 < This may not always halt >
Registers nnnn nnnn nnnn nnnn nnnn nnnn    < We may go straight to    >
                                           < trying a Restart         >
ROM>
ROM>B                                      < This reboots the console >

Attempting Warm Restart   etc etc.....
Restart probably fails, resulting in a bugcheck/reboot.


18.3. "Help with those "micros" messages." by KERNEL::ADAMS (Venus on Remote Control) Mon Mar 20 1989 15:37 (122 lines)
          8600 - How to Enable Reporting of Microdiagnostic Problems.     
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

********************   CAUTION:  FOR INTERNAL USE ONLY   *********************
*                                                                            *
*      THIS INFORMATION IS FOR USE BY DIGITAL EQUIPMENT CORP. AND ITS        *
*      EMPLOYEES ONLY.  PLEASE USE EXTREME CARE IF YOU MUST DISCUSS ANY      *
*      PART OF THIS INFORMATION WITH ANYONE WHO IS NOT A DIGITAL EMPLOYEE.   *
*                                                                            *
******************************************************************************

 VIII.0  DEALING WITH STRANGE CONDITIONS

      Strange   conditions   that    may    occur    while    running
      microdiagnostics are:


  o  "?DCP-E-NOANSD, DSM-DC communication failure",

          When you see this  message,  the  Ebox  microsequencer  has
     stopped listening to the console.  It is very important that you
     capture data  so  that  the  cause  of  this  condition  can  be
     investigated.   This could happen because of programming faults,
     because the hardware is not initialized properly or because  the
     hardware  is  broken.  This message can occur at almost any time
     when you are sitting at the console terminal.   No  matter  what
     the reason, we need to know what caused the condition so that we
     can fix it, or write another test to catch the fault earlier  in
     the testing sequence.

          What you should do:

     1.  Enable HARDCOPY if you have a hardcopy terminal available
     2.  Type "STOP CPU"
     3.  Type "MIC"

              This will cause the current Microsequencer  PCs  to  be
         typed on the terminal.


     4.  Type "space bar" 10 more times

              This causes a whole sequence of Microsequencer  PCs  to
         be typed out.  This helps us to find out what the CPU thinks
         it's doing.
     5.  Type "return".  This gets you out of MIC mode.
     6.  Type "reset"
     7.  Type "Start CPU"
     8.  Type "Examine/ESCRATCH 70"
     9.  Type "Examine/ESCRATCH 73"
    10.  Type "Show Data"
    11.  Now re-execute the command file for the Microdiagnostic that
         got the error.  (Type "@EDK--")
    12.  Type "Start", to see if the diagnostic fails consistently.
    13.  If the microdiagnostic hangs again, Type  "DIAG",  and  then
         re-execute the command file one more time.
    14.  Save all of this data and include it with a problem report.


  o  "?DCP-E-UMICTP, unexpected micro trap at vector XX",

          You should get this message only after you have  started  a
     microdiagnostic.   It means that there is something wrong in the
     hardware that is causing Microtraps in the EBOX that the current
     test  has  not  requested  nor  tried to force.  If you see this
     message it means that a fault is in the machine that should have
     been  caught  by  a previous diagnostic, or that the machine has
     not been initialized properly.

          What you should do:
     1.  Enable HARDCOPY if you have a hardcopy terminal available
     2.  Type "SHOW Switches"
     3.  Type "SHOW Data"
     4.  Type "Examine/WBUS 6"
     5.  Type "Examine/WBUS 7"
     6.  Type "Examine/WBUS 9"
     7.  Type "Examine/WBUS 11"
     8.  Type "Examine/WBUS 12"
     9.  Type "Examine/WBUS 13"
    10.  Type "START"
              Typing START and causing the tests  to  be  run  again
         will  tell  us if the problem was a spurious one time event,
         or if we have an initialization or setup problem within  our
         test microcode.

  o  "?DCP-E-ALIVEE, invalid dsm alive byte".

          You should only get this message after you  have  issued  a
     "START"  command  to a diagnostic.  It means that the diagnostic
     should have finished running its current test, but has not.  The
     microcode  may  be  hung,  or  the  test may have gotten into an
     infinite  loop.   In  either  case,  it  is   a   lot   like   a
      "DSM-DC communication failure"  and  needs to have the same sort
     of information collected.  We  need  to  know  what  caused  the
     condition so that we can fix it.

          What you should do:
     1.  Enable HARDCOPY if you have a hardcopy terminal available
     2.  Type "STOP CPU"
     3.  Type "MIC"

              This will cause the current Microsequencer  PCs  to  be
         typed on the terminal.
     4.  Type "space bar" 10 more times

              This causes a whole sequence of Microsequencer  PCs  to
         be typed out.  This helps us to find out what the CPU thinks
         it's doing.
     5.  Type "return".  This gets you out of MIC mode.
     6.  Type "reset"
     7.  Type "Start CPU"
     8.  Type "Examine/ESCRATCH 70"
     9.  Type "Examine/ESCRATCH 73"
    10.  Type "Show Data"
    11.  Now re-execute the command file for the Microdiagnostic that
         got the error.  (Type "@EDK--")
    12.  Type "Start", to see if the diagnostic fails consistently.
    13.  If the microdiagnostic hangs again, Type  "DIAG",  and  then
         re-execute the command file one more time.
    14.  Save all of this data and include it with a problem report.

18.4. "INTSTKINV/MCHK on reboot ???" by KERNEL::ADAMS (Venus on Remote Control) Wed Apr 12 1989 15:26 (20 lines)
    
    Remember the problem of Interrupt Stack Invalid etc on REBOOT ???
    
    Well there was a workaround of INIT/PAMM & INIT/CPU added to
    DEFBOO.COM. This only affected COLD REBOOT problems.
    
    Now we have a rewritten SYSLOA790.EXE on the rev 10 console pack
    (it's called SYSLOA.790 on the RL02 [RT11 only has 6 chars for
    filename]). It is 35 blocks long and the creation date on the 
    RL02 should be 21-Feb-1989. If you "Anal/Image" on the customer
    system, you should see Image File ID of X-1 with link date of
    21-SEP-1988 and NO PATCHES.
    This fixes the problem of INTSTKINV & Machine checks when the
    "Auto-Reboot" option of shutdown is used  -- PROVIDED that the
    L0211 module is at Rev F.
    
    All sites should be running at least Rev 10 consoles and have the
    correct version of SYSLOA. Inform the BRANCH, if this is not the
    case.
    
18.5. "New SYSLOA.EXE Files available." by KERNEL::ADAMS (Venus on Remote Control) Mon Apr 24 1989 20:05 (65 lines)
    The patched version of SYSLOA790.EXE for  version  5.X  is  now 
    available.  It  is available in the public account on COMICS::, 
    along with SYSLOA790.EXE for version 4.7. The two files are :-

    o COMICS::DISK$USERS2:[PUBLIC]SYSLOA790_V50.EXE
    o COMICS::DISK$USERS2:[PUBLIC]SYSLOA790_V47.EXE

    I  shall also be distributing both images with the next release 
    of Console. (The version 4.7 SYSLOA790.EXE was  distributed  on 
    Console  10  as  SYSLOA.790).  Just  to recap I have listed the 
    Symptoms  and  Problems  that   these   patched   versions   of 
    SYSLOA790.EXE fix.

    Symptoms:
    
    When  rebooting VMS 4.7 using shutdown, auto-reboot, or after a 
    bugcheck, the system will sometimes fail with  INTERRUPT  STACK 
    INVALID HALT KAF. VMS 5.0 and 5.1 have a total of four symptoms 
    with the  two  major  ones  being  KERNEL  MODE  HALT  KAF  and 
    INTERRUPT  STACK  INVALID KAF. Either version of VMS may simply 
    show one CPU ERROR after a reboot. This error will be a MACHINE 
    CHECK  LOGOUT  entry  with a DATE/TIMESTAMP of XX-JAN-1978. For 
    the snapshots, there will be machine check stack frames in  the 
    ISP  record. On V4.7, there will be two stackframes on the ISP. 
    Under V5.X, there will be one on the ESC stackframe and one  on 
    the ISP.

    Problems:
    
    The  problem  can be identified by looking at the machine check 
    stack frame in the ISP record of the SNAP.

    The EBCS register will have bit 15 set MBOX FATAL ERROR and bit 
    14 set MBOX INTERRUPT PENDING.

    For  VMS  4.7 IVASAV will contain a virtual address of 80029400 
    which would translate to a physical address of  20000000.  With 
    VMS  5.0  IVASAV  will be different from machine to machine but 
    should still translate to a physical address of 20000000.

    Decoding MSTAT1 should show the MBOX CYCLE TYPE to  be  a  NOP. 
    MSTAT2  should have bit 2 set CP I/O BUFFER ERROR. You may also 
    find multiple machine check entries in the ISP  with  the  same 
    error signature.

    In the SB0 record the SBI Timeout Address Register will have an 
    address of  08000800  (20002000  PHYSICAL)  for  VMS  V4.7  and 
    08000000 (20000000 physical) for VMS 5.0.
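
    Both address pairs quoted above are consistent with the SBI Timeout
    Address Register holding a longword address, i.e. the physical byte
    address shifted right by two. That relationship is inferred from
    the two pairs in this note, not taken from a register description,
    so treat the following as a sketch (the function name is made up):

```python
def sbi_timeout_to_physical(register_value):
    """Convert an SBI Timeout Address Register value to a byte address.

    Assumes the register holds a longword address, so the physical
    byte address is the register value multiplied by 4.
    """
    return (register_value << 2) & 0xFFFFFFFF


for reg in (0x08000800, 0x08000000):
    print(f"{reg:08X} -> {sbi_timeout_to_physical(reg):08X}")
# prints:
# 08000800 -> 20002000
# 08000000 -> 20000000
```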

    
    If  the newly installed version of SYSLOA790.EXE DOES NOT fix the 
    above scenarios please contact me or Chris.

    Also, due to the recent changes in Field Service, I  appreciate 
    that  many  of the 8600 focus Engineers have now moved on, so I 
    have attached  the  distribution list to the end of this mail 
    message.  Could  you  please mail me if you think that I should 
    include other 8600  responsible  Engineers  on  this  list,  or 
    indeed if you wish to be removed.

    Regards
    Brian Lindley


18.6. "Identifying your Sysloa" by KERNEL::ADAMS (Venus on Remote Control) Wed Apr 26 1989 00:50 (17 lines)
    
    There might be some confusion from File-ID versions, if you
    use ANA/IMAGE SYSLOA.EXE to see if the customer is up to date.
    
    Here is the information to look for, regarding the new files:
    
    V4.7 Sysloa.Exe is 35 Blocks long and Link date should be on
    or after 21-Sept-1988
    
    V5   Sysloa.Exe is 39 Blocks long and Link date should be on
    or after 14-Mar-1989.
    
    Unfortunately you cannot just look at File Identification from
    the Ana/Image because in one case it stayed the same, and in the
    other it went back one version, in spite of it being a total
    rewrite of the image.
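
    The two checks above (size in blocks and link date) are easy to
    get wrong by eye, so here is a minimal sketch of them. The block
    counts and dates are the ones given in this note; the dictionary
    keys and the function name are made up, and you still have to read
    the values off the directory listing and the ANA/IMAGE output
    yourself.

```python
from datetime import date

# Values from this note; the "V4.7" / "V5" keys are hypothetical labels.
EXPECTED = {
    "V4.7": {"blocks": 35, "min_link_date": date(1988, 9, 21)},
    "V5": {"blocks": 39, "min_link_date": date(1989, 3, 14)},
}


def sysloa_is_current(vms_version, blocks, link_date):
    """True if size and link date match the new SYSLOA for this VMS version."""
    want = EXPECTED[vms_version]
    return blocks == want["blocks"] and link_date >= want["min_link_date"]


print(sysloa_is_current("V4.7", 35, date(1988, 9, 21)))   # new image
print(sysloa_is_current("V5", 39, date(1988, 11, 2)))     # linked too early
```

    The File Identification is deliberately left out of the check, for
    the reason given above.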
    
18.7. "More on INTSTKINV & SYSLOA" by KERNEL::ADAMS (Venus on Remote Control) Mon May 15 1989 11:39 (87 lines)
             The attached information is from the CSSE HPS Group

    The following information will be available in the CSSE STARS database
================================================================================

                      INTERRUPT STACK INVALID HALTS ON BOOTING
                                  BY:  GARY SHEPARD
                                      HPS CSSE



     SYMPTOM:

     When rebooting VMS 4.7 using shutdown, auto-reboot, or  after  a  bugcheck,
     the  system will sometimes fail with INTERRUPT STACK INVALID HALT KAF.  VMS
     5.0 and 5.1 have a total of four symptoms with the  two  major  ones  being
     KERNEL  MODE  HALT  KAF and INTERRUPT STACK INVALID KAF.  Either version of
     VMS may simply show one CPU ERROR after a reboot.  This  error  will  be  a
     MACHINE  CHECK  LOGOUT entry with a DATE/TIMESTAMP of XX-JAN-1978.  For the
     snapshots, there will be machine check stack frames in the ISP record.   On
     V4.7,  there will be two stackframes on the ISP.  Under V5.X, there will be
     one on the ESC stackframe and one on the ISP.

     CSSE CONTACT:

     Gary Shepard
     DTN 297-5290 or 508-467-5290
     HPSMEG::SHEPARD

     or

     DENNEY ANDREW
     DTN 297-2892 or 508-467-2892
     HPSMEG::ANDREW

     PROBLEM:

     Some problems that caused failures when rebooting VMS 4.7, 5.0 and 5.1
     have recently been discovered and solved with a new SYSLOA790.EXE.
     This problem can be identified by looking at the machine check stack  frame
     in the ISP record of the SNAP.

     The EBCS register will have bit 15 set MBOX FATAL ERROR and bit 14 set MBOX
     INTERRUPT PENDING.

     For VMS 4.7 IVASAV will contain a virtual address of 80029400  which  would
     translate  to  a physical address of 20000000.  With VMS 5.0 IVASAV will be
     different from machine to machine but should still translate to a  physical
     address of 20000000.

     Decoding MSTAT1 should show the MBOX CYCLE TYPE to be a NOP.  MSTAT2 should
     have  bit  2  set  CP I/O BUFFER ERROR.  You MAY also find multiple machine
     check entries in the ISP with the same error signature.
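
     The register part of this signature can be sketched as a bit test.
     Bit numbers are taken from the text above and assumed to count from
     bit 0 as the least significant bit; the constant and function names
     are invented for illustration. MSTAT1 is not decoded because the
     position of the cycle type field is not given here.

```python
# Bit positions quoted in the note (bit 0 assumed to be the LSB).
EBCS_MBOX_FATAL_ERROR = 1 << 15
EBCS_MBOX_INTERRUPT_PENDING = 1 << 14
MSTAT2_CP_IO_BUFFER_ERROR = 1 << 2


def matches_reboot_signature(ebcs, mstat2):
    """True if EBCS bits 15 and 14 and MSTAT2 bit 2 are all set."""
    ebcs_mask = EBCS_MBOX_FATAL_ERROR | EBCS_MBOX_INTERRUPT_PENDING
    ebcs_ok = (ebcs & ebcs_mask) == ebcs_mask
    mstat2_ok = bool(mstat2 & MSTAT2_CP_IO_BUFFER_ERROR)
    return ebcs_ok and mstat2_ok


# Made-up register values, for illustration only.
print(matches_reboot_signature(0x0000C000, 0x00000004))  # True
print(matches_reboot_signature(0x00008000, 0x00000004))  # False
```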

     In the SB0 record the SBI Timeout Address Register will have an address  of
     08000800  (20002000 PHYSICAL) for VMS V4.7 and 08000000 (20000000 physical)
     for VMS 5.0.



     SOLUTION:

     There is a new version of SYSLOA790  which  can  be  obtained  through  the
     CSCs.   If  the  new  version  of  SYSLOA790  does not correct the booting
     problems, ensure that the  following  modules  are  at  these  revisions  or
     higher:  L0211 rev F, L0203 rev C, M8273 rev D.

     WORKAROUND:

     There is a temporary workaround that can be utilized until the new  version
     of  SYSLOA790.EXE  is  obtained.   However, it will disable a BOOT feature.
     Once this workaround is installed, the BOOT/R5:nn command won't work.  This
     is due to the INIT/CPU wiping out the passed value of R5.

     To implement the workaround,  copy  DEFBOO.COM  using  EXCHANGE  into  your
     directory  and edit it.  After the first INIT command, insert the following
     two lines.

     INIT/CPU
     INIT/PAMM

     Then copy it back to the console RL02 using exchange.  This workaround does
     not work on all machines, but does work on most machines.
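
     If several command files need the same edit, the insertion can be
     scripted once the file has been copied off the console RL02 with
     EXCHANGE. The sketch below only handles the text edit. The sample
     file contents are invented (the real DEFBOO.COM is not reproduced in
     this note), and the function name is made up.

```python
WORKAROUND = ["INIT/CPU", "INIT/PAMM"]  # the two lines given in this note


def add_workaround(lines):
    """Insert the workaround lines after the first INIT command."""
    stripped = [ln.strip().upper() for ln in lines]
    if all(w in stripped for w in WORKAROUND):
        return list(lines)  # already patched, leave unchanged
    out = []
    done = False
    for ln in lines:
        out.append(ln)
        words = ln.split()
        # First word with any /qualifier removed, e.g. "INIT/XYZ" -> "INIT".
        verb = words[0].upper().split("/")[0] if words else ""
        if not done and verb.startswith("INIT"):
            out.extend(WORKAROUND)
            done = True
    return out


# Hypothetical stand-in for DEFBOO.COM, NOT the real file contents.
sample = ["! hypothetical sample", "INIT", "! rest of boot sequence"]
for line in add_workaround(sample):
    print(line)
```

     Running it twice on the same text leaves the file unchanged the
     second time, which avoids a doubled workaround.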
    
18.8. "V14 Consol is here." by KERNEL::ADAMS (Venus on Remote Control) Tue Nov 21 1989 00:44 (146 lines)
    Gentlemen

    8600/8650  Console  Pack  Release 14 is now available. To speed 
    up the distribution of this release, I have decided to make  it 
    available publicly on COMICS, in the following directory :-

    COMICS::DISK$TECH:[VENUS]CONSOL14_DIAG.DSK

    If this presents a problem to anyone please mail/phone as I  do 
    not intend to ship this release via magtape as well.
    I  shall  follow  this  mail  with  another   mail   describing 
    enhancements and added features in this release.



          TO:      All 8600 engineers        DATE: 20-November-1989
                                             FROM: Brian Lindley
                                             DEPT: Product & Tech-
                                                   nology Group
                                             EXT:  833-3659
                                             LOC:  UVO
                                             ENET: COMICS::LINDLEY


          cc:      Chris Loane


          SUBJECT: 8600/8650 Console Pack revision 14.0


          The new 8600/8650 console pack revision 14.0 is with us.
          It has some added features over previous console packs. They
          are as follows :-

          o  Improved RDC/RHM Handling :-

          1) Front-panel light anomaly.
             Corrected a problem with the front-panel Remote Enable
             light by resetting a counter in the event the SCP
             terminal control switch is turned to REMOTE and back to
             LOCAL before the 5 second timeout counter had elapsed.

          2) ^P may force the console to enter CIO mode.
             Resolved a problem which causes the console to
             occasionally drop in to CIO mode. The source of the problem
             was a conflict between the updating of the front-panel
             lights and the reading of the front-panel switches.

          o  Cache Sweep During Snapshot Process

             Prior to this release, the cache sweep routine was
             invoked after the snapshot procedure. Unfortunately, this
             did not work. VERIFY/ECS, which is the last action taken
             during the snapshot process, would trash Escratch and
             consequently cause CSM to become unusable. The cache sweep
             would not work since CSM is required to perform this
             function.
             The call to sweep cache is now issued during the
             snapshot procedure just before the call to verify the
             control stores.
             It should be noted that cache sweeps are invoked only
             when the SNAP flag is on. When the SNAP flag is off, cache
             sweeps are NOT performed.

          o  Informational Messages During a KAF

             The console will display an informational message
             after a KAF failure. This has been provided to assist the
             field by describing the sequence of events which occur
             during the snapshot procedure and to prevent the
             possibility of user intervention which may prematurely abort
             the snapshot process.

             The new message is as follows:

             Attempting to save machine state after KAF-(KAF
             failure message) DO NOT STOP THE SNAPSHOT PROCESS UNTIL THE
             SNAP FILE IS WRITTEN. Let the system reinitalize by
             itself. (Approximately 5 minutes)

             1  Stop clock, read and save all upcs via RDREG.
             2  Read and save selected CONSOLE registers.
             3  Read and save EMM status and environment.
             4  Read and save 17 of 24 SDB channels.
             5  Check clock alignment and get 20 cycle upc trace.
             6  Unhang and restart CSM, read and stash:
                All ESCratch locations
                All VENUS processor registers
                All PAMM locations
                Top 64 long_words on the interrupt_stack
                Middle 25 long_words on the interrupt_stack
                Bottom 64 long_words on the interrupt_stack
                All IOA and SBI/NEXUS registers
             7  Sweep Cache
             8  If enabled, verify all Control Store and PAMM.
             9  Write the SNAP buffer to SNAP1.DAT or SNAP2.DAT

          o  Expanded CSPE text message

             Modified MCPECR, which handles Control Store Parity
             Errors, so the XOR result of a CSPE is always printed in
             the console message and can be used to identify the failed
             hardware. Additional documentation will be added to the
             VAX 8600/8650 SYSTEM FAULT ISOLATION MANUAL
             (EK-8600S-MM-002) to assist the field in diagnosing the FRU.


          o  This console pack has VSR (for VMS version 4.X) and
             SNAPBUSTER distributed with it. VSR for VMS version 5 is not
             available on this release. VSR for version 5 is
             available via the SDD Tools kit and has several hooks into
             SDD files which unfortunately cannot be shipped by this
             medium. If this is going to cause problems, please
             contact me.

          o  There are two command files on the Console Pack for
             copying VSR and SNAPBUSTER files from the Console Pack /
             Virtual Disk to a specified account. These command files
             are called VSRCPY.COM and SNPCPY.COM.
             VSRCPY.COM is a command file to copy all the files
             required to run VSR from CSA1: or a virtual disk to
             SYS$ERRORLOG, and if specified set up all the logical
             assignments which VSR requires to run. At the DCL prompt type:
             EXCH COPY CSA1:VSRCPY.COM *.*
             to copy this command file into your default directory.
             @VSRCPY will prompt for options.

             SNPCPY.COM is a command file to copy all the files
             required to run SNAP from CSA1: or a virtual disk to a
             specified account. At the DCL prompt type:
             EXCH COPY CSA1:SNPCPY.COM *.*
             to copy this command file into your default directory.
             @SNAP_SETUP will do all the set ups to enable SNAP to
             run BUT run this command file from the account where the
             SNAP files are located. RUN SNAP will prompt for the name
             of the Snapshot. SNAP.DOC is an ASCII file which gives
             the background information for SNAP.

          o  The CI microcode is rev 8.0

          o  As with all console releases, the diagnostics included
             have much improved isolation added. Read GUIDE.MEM and
             EDKAA.DOC for greater detail.

          Brian Lindley

18.9. "From Venus Notes" by KERNEL::ADAMS (Venus on Remote Control) Thu Jan 11 1990 14:06 (45 lines)
================================================================================
Note 173.0         CPU hangs/KAF-1E with console release 14/15        No replies
MED::PCOTE "Deus ex machina"                         40 lines  10-JAN-1990 08:49
--------------------------------------------------------------------------------



	Yes, there  is  already a note, (167.13...) which does discuss
	this but considering that the first 12 entries are not germane
	to the topic and could possibly confuse readers, I am entering
	a new topic.

	Console release 14 and console release 15, which is just coming
	out of SDC, have an (EBOX) microcode bug which may cause a KAF-1E
	Unknown Machine Hang.

	To make  matters  worse,  there  is  also  a  "special" (EBOX)
	microcode  release  which  was  distributed by CSSE to certain
	sites  which  resolves  the  problem  of erroneous "Write data
	parity   errors"  in  the  error  log  file.  This  particular
	microcode,  EBOX V2.32 also possesses the same bug which could
	cause the Unknown Machine Hangs.

	CSSE has  issued  a  blitz warning the field that this problem
	does  exist  with  console  release  14. The field should also
	understand  that  the  problem will exist with console release
	15 and with the special EBOX microcode release V2.32.

	The error signature in the snapshot has already been discussed
	in  the  other  topic but can be summarized by noting that the
	EBOX hangs at upc 1D08 and the signal FBA FBOX WRITE PROB H is
	asserted.  Note  that  the  upc  is  1D0A if running with EBOX
	microcode V2.32.
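
	For anyone sorting through a pile of snapshots, the signature
	above reduces to a two-part test. This is a sketch only: the
	upc values and signal name are the ones quoted in this note,
	while the function name and dictionary labels are made up and
	the caller has to extract the values from the snapshot.

```python
# EBOX hang upc per microcode, as quoted in this note.
HANG_UPC = {
    "console release 14/15": 0x1D08,
    "EBOX microcode V2.32": 0x1D0A,
}


def matches_kaf_1e_signature(ebox_upc, fba_fbox_write_prob_h):
    """True if the EBOX upc and FBA FBOX WRITE PROB H match the known bug."""
    return bool(fba_fbox_write_prob_h) and ebox_upc in HANG_UPC.values()


print(matches_kaf_1e_signature(0x1D08, True))   # True
print(matches_kaf_1e_signature(0x1D0A, False))  # False
```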

	Engineering has  isolated  the problem and has generated a fix
	but  cannot verify the fix since all efforts to reproduce the
	problem  in-house have failed. If there are any sites that could
	assist  us  in  verifying  the  fix  then  please contact CSSE
	(HSPMEG::SWETT) at your earliest possible convenience.

	     Paul
    
    
    	
	
18.10. "How the new SYSLOA was born." by KERNEL::ADAMS (Venus on Remote Control) Tue Feb 20 1990 13:06 (98 lines)
From:	CSC32::PAULY "16-Oct-1989 1053" 16-OCT-1989 18:36:51.64
To:	BISTRO::BUI,COMICS::LOANE
CC:	CGOFS::MCARA,GIDDAY::PHELPS,MDVAX1::DPROSE,PAULY
Subj:	8600 interrupt stack invalid saga, or how a new SYSLOA came to be!

	Gentlemen,

	Below is a description of the work that was done on the 8600 reboot 
failure.  The paragraph that describes the underlying problem is not entirely
correct.  When I originally wrote this we had not yet learned the failure was
due to the MBOX clocks being stopped for console overlays.  The console
overlays were for the DEFBOO file.  When the MBOX clocks were stopped a DMA read 
request from the SBIA for the DW780 was in progress.  The DW780 would time out 
the read request and then later the MBOX would return the data to the SBIA, which
would in turn return it to the DW780.  Since the DW780 had timed out the 
request an SBI fault occurred with the SBIA being transmitter during fault. 
The SBIA and DW780 error registers were now latched showing the error.  The
system would then continue booting and reach the INIADP790 code.  This code
would generate the address of a device at TR1 resulting in a timeout machine
check.  A machine check recovery block was used to protect against NXM timeouts.
But since the SBIA registers indicated a fault, machine check error recovery
code was entered which contained the programming mistakes listed below.  There
is one very important item to take note of.  A modification was made to the 
INIADP790 code to unlock the SBIA registers (clear any left over errors that 
were caused by VMB or initialization) before going out on the bus for the 
first time.  This modification only cleared out stale data, so if an error 
occurred while configuring the SBI the real errors would be logged.  Additional
improvements were made to the machine check code to capture the nexus registers
if an error did occur before the SBI was completely configured.  You can read
the following for an explanation of the bugs that were found and changes that
were made.
-------------------------------------------------------------------------------
		(My original explanation written March 8, 1989)

From:	NEXUS::PAULY "SECRET OF THE UNIVERSE, ITS NEVER TOO LATE TO HAVE A HAPPY CHILDHOOD  08-Mar-1989 0941"  8-MAR-1989 09:47:29.72
To:	@BELL.DIS,PAULY       
CC:	
Subj:	8600 interrupt stack invalid SAGA!!  Or "HOW A NEW SYSLOA CAME TO BE"


	We have started the reboot testing of the systems with the new
SYSLOA790.EXE and VMB.EXE provided by Brian Porter (VMS eng.).  So that
everyone is current on all of the work done on SYSLOA790 and VMB we are briefly
going to describe each of the changes that was made to the code.

	The underlying problem within the hardware is an intermittent SBI fault 
that occurred while trying to configure the SBI nexuses.  Although the fault
was very intermittent when it did occur it was consistent in the fact that
it happened when the first nonexistent address on the SBI was accessed.  
The type of fault was an unexpected read data being detected by the DW780 at 
TR3, the SBIA was the transmitter during fault.  It was this intermittent 
failure that caused certain parts of the MCHECK790 handler, which contained
software bugs, to be executed.  
 
Within MCHECK790.LIS version X-14 there were two bugs which accounted for the
interrupt stack invalid reboot failures.  The first problem is in the routine
CP_IO_BUF_MCHECK; the READ_SYSTIME macro contained an addressing mode problem
which resulted in an access violation.  The second problem is in the routine 
SETUP_RETRYSCB; the BICL3 #3FF,R1,R2 should have used a 1FF for the bit clear.
This bug resulted in the RETRY_SCB being built over the top of an array in memory
that is used to temporarily hold copies of SBIA, SBI silo, and SBI nexus 
registers.  
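
A small sketch of what the two masks do.  BICL3 computes destination =
source AND NOT mask.  The sketch assumes the operands in the note are hex
and that the intent was to round an address down to a 512-byte VAX page
boundary; the note does not say so explicitly, and the address used is
made up.

```python
PAGE_SIZE = 512  # VAX page size in bytes


def bicl3(mask, src):
    """VAX BICL3 mask, src, dst: returns src AND NOT mask (32 bits)."""
    return src & ~mask & 0xFFFFFFFF


addr = 0x00001634                 # hypothetical address, not from the listing
good = bicl3(0x1FF, addr)         # clears 9 bits: 512-byte boundary
bad = bicl3(0x3FF, addr)          # clears 10 bits: 1024-byte boundary

print(f"1FF mask: {good:08X}")    # 00001600
print(f"3FF mask: {bad:08X}")     # 00001400
print(f"difference: {good - bad} bytes")  # 512, i.e. one page too low
```

With the wider mask the result can land one page below the intended one,
which would fit the description of RETRY_SCB being built over
neighbouring data.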
	
	Based upon what we learned about the error handling and the
initialization of the SBI nexuses several functionality improvements were suggested 
to Brian.  MCHECK790 (X-14) used to capture the SBI nexuses before capturing
the SBIA error registers.  The new MCHECK790 (X-15) was changed to capture
the SBIA error register, SBI nexuses, and SBI silo respectively.  Within an
SBIA errorlog entry the IOA ADDRESS used to be reported as a virtual address.
A change was made to report the IOA ADDRESS as a physical address.  If an 
SBI error occurred during SBI initialization, MCHECK790 (X-14) would not
capture the SBI nexuses if the MMG$GL_SBICONF and EXE$GL_CONFREGL tables were
not built.  Brian added a number of new routines in MCHECK790  which
now capture the SBI nexuses without relying on the tables being built.
Another anomaly of the SBI FAULT is a timeout to address 20002000.
This timeout was stale data left in the SBIA registers by VMB while looking
for a CI780 on the SBI.  VMB.EXE in the routine NXMMCHK_790 does not clear
the timeout.  VMB was changed to begin looking for the CI780 at TR3 instead
of TR1 and to clear timeouts.  The CONFIG_IO routine in INIADP790.LIS does not 
unlock the SBIA (clear errors) before trying to size the SBI; this resulted
in the timeout left by VMB being logged with the FAULT.  INIADP790 was changed
to unlock the SBIA before going out on the SBI and to also begin configuring
at TR3.

	Once Brian created SYSLOA790 (X-15), he then turned it over to us so	
we could do the debugging of it since he didn't have a V4.7 machine to test
his code improvements.  We spent approximately a week debugging the new
SYSLOAs.  During this time frame, we encountered numerous failures but Brian
was always very responsive and would make the appropriate changes.  Once the
original bug fixes and the new functionality were completely tested, we encountered
one last bug.  The code path in routine MCHK_EXIT2 in MCHECK790 had apparently
never executed before.  The bug here was that it would REI to the failing PC
resulting in an infinite loop (machine check within machine check).

Regards,

	Dan Pauly
18.11  Microdiagnostic Info  KERNEL::ADAMS  Venus on Remote Control  Wed Feb 28 1990 13:24  21
	Following a recent problem of F-Box microdiagnostics failing
	because there was no F-Box in the machine being tested, may 
	I recommend the use of the following commands :-

	DC>CONFIGURE 

	Determines which arrays and SBIAs are physically present and 
	checks for presence of an FBOX. It sets software status bits
	for each available unit, to make them automatically selected
	for test.

	DC>SHOW CONFIGURATION

	Displays the current configuration and selected-for-test status.
	The SELECT and DESELECT commands may be used to modify the status.

	For more info, refer to Manual # 8180 in the library, page
	5-33 to 5-42.

 
18.12  Rev 16 Pack coming soon!!  KERNEL::ADAMS  Venus on Remote Control  Thu Apr 05 1990 12:49  11
The Rev 16 console pack is due to hit the field around the middle of
this month. This has the fixes to the bugs in versions 14/15.

Currently all systems should be running at least version 10, although
some systems have "hand built" version 13 packs.

Once the new rev 16 pack is available, we need to encourage the field
to upgrade as soon as possible.


18.13  How the 8600 takes a snap.  KERNEL::ADAMS  Venus on Remote Control  Thu Apr 05 1990 16:56  38

In view of recent calls, a reminder of the process and its requirements.

1. The 8600 stops executing Vax Instructions, for some reason.
2. The console takes a snapshot and writes a message to the console-terminal.
3. The file Snap1/2.Dat gets written to the RL02 (if the customer doesn't 
   stop it).
4. The console writes a success message to the terminal and initiates a boot.
5. In startup, ERRFMT spawns ERRSNAP.EXE to see if we need to copy a snap.
6. Errsnap.Exe calls SYS$SYSROOT:[SYSERR]ERRSNAP.COM to do the copy. The .COM
   file MUST exist in the ROOT, rather than the COMMON area.
7. If the copy is successful, we get ERRSNAP.LOG;n in Sys$errorlog AND we 
   invalidate the SNAPn.DAT file on the RL02.
   If the copy does not succeed for ANY reason, we get neither of the above.

Notes.

1. A new console pack WILL NOT have either Snap1.Dat or Snap2.Dat until a
   snapshot situation arises. The files are then created as required.
2. The ERRSNAP.EXE program will not handle a "Search List", so ERRSNAP.COM
   MUST exist in the ROOT directory. (It can be in Sys$common as well, but 
   this is not essential)
3. Unless the SNAPn.DAT files are INVALID or do not exist on the RL02, we 
   will NOT write any snaps to the RL02.
4. If you have "Valid" files on the RL02, and the process has not worked for
   any of the above reasons, you can copy them manually with EXCHANGE, using
   /Transfer_Mode=Block. You then need to have the system down, to invalidate
   the RL02 files with >>>SET SNAP INVALID console command.
5. ERRSNAP.EXE and ERRSNAP.COM will only succeed if spawned by ERRFMT at 
   startup; you CANNOT run them interactively.
6. If you want to check that a snap can be written to the console, you should
   >>>Set Snap Now      (twice)
   >>>Set Snap Invalid  (invalidates BOTH snaps.)
   >>>Show Snap1.dat    (dumps the file, check for 1st byte=FF20 & 10 blocks)
   >>>Show Snap2.dat    (dumps the file, check for 1st byte=FF20 & 10 Blocks)
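
As a rough model of Note 3 (a snap is only written when a SNAPn.DAT file is
invalid or missing), the check can be sketched in Python as below. The state
names, the function name and the order in which the two files are tried are
assumptions made for illustration; the console's real logic is not shown in
this note.

```python
from typing import Dict, Optional

MISSING, INVALID, VALID = "missing", "invalid", "valid"

def choose_snap_slot(slots: Dict[str, str]) -> Optional[str]:
    """Return the first snap file the console may write, or None if both hold valid snaps.

    Assumption: SNAP1.DAT is tried before SNAP2.DAT; the note does not state the order.
    """
    for name in ("SNAP1.DAT", "SNAP2.DAT"):
        # A file that does not exist yet (new console pack) counts as writable.
        if slots.get(name, MISSING) in (MISSING, INVALID):
            return name
    return None

print(choose_snap_slot({}))                                        # new pack
print(choose_snap_slot({"SNAP1.DAT": VALID, "SNAP2.DAT": VALID}))  # both still valid
```

The second case is the one that catches people out: if ERRSNAP never copied
and invalidated the earlier snaps, no new snap is written.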


18.14  Snap problems ?? What problem ??  KERNEL::ADAMS  Venus on Remote Control  Wed Apr 18 1990 13:25  38
I am getting reports that some engineers are having problems with
VSA, the Venus Snapshot Analyser. The most common problem seems to
be that it "bombs out" before completion.

I have looked at some of these problems and would like to pass on
the following information, which may help.

1. No changes have been made to VSA, for well over a year.

2. From about two years ago, VSA has had the intelligence to look
   at the snap-type, and do just the analysis required, as decided by
   the program designers/experts in USA.

3. VSA has a built in "/AUTO" switch which selects this mode of operation,
   with the sole intention of NOT giving you doubtful information which 
   could cloud the analysis.

4. Because of this (item 3 above) you should NOT specify any parameters
   when you run BVSA, other than the snap file name. Adding the "/ALL" 
   parameter, can cause VSA to exit with the error "Too many rules fired".
   Typically this will be if the I/O world has timed out, due to the rest
   of the CPU having stopped, due to the "real" snap reason.
   You may remember this from my mail messages, from way back, but I repeat
   it here for the engineers new to the group.

5. VSA will only produce a .ANL file for 1E00 "Unknown Hang" snaps.
   IT WILL DO THIS BY DEFAULT. (No parameters required.)
   In these cases, the .VSA file will most often be just a log of VSA
   activity, rather than the file you may be used to.

6. I recommend that you use Chris Loane's "Snapbuster" program for most
   snapshots, as this will give you reliable information, in a fraction
   of the VSA time.

If you still have problems, after the above recommendations, or just need
more information, then please get in touch.


18.15  Unwanted Snapshot ??  KERNEL::ADAMS  Venus on Remote Control  Wed Apr 25 1990 18:49  47
    
    Some more info, to help explain the question,
    
    "Why do I get a snapshot when I C-ontinue an 8600 from >>> ?"
    
    In the cases I have looked at, I have seen the following scenario;
    ******************************************************************
    System running VMS or maybe hung in macro-code.
    Engineer types ^P
    ?MCP-I-CPSRUN, CPU is still running
    >>>
    >>> Some command, eg Show power
    >>> Other command
    >>> C 
    ?MCP-E-CSMLOP, CSM Console loop not running
    ?MCP-W-CPHUNG, CPU is hung
    Attempting to save machine state after KAF-UNKNOWN MACHINE HANG
    Initialising CPU  etc.
    
    *********************************************************************
    
    Now why does it perform like this ??
    
    Well, as you all know, the Console (T11) is checking all the time
    to see that USMI (Start Macro Instruction) is set. It needs to see
    this F/F set at least every 300 ms, otherwise it will declare a KAF.

    Now, remember that ^P does not HALT the 8600. Also when you give
    it a command, eg Show XYZ, it "Stalls" the E-Box, to bring the
    CSM overlay into the ECS, to perform your command.
    The console (T11) still believes the CPU is running, and it gave
    you a message right at the start. So all this time, it is wanting
    to see USMI set. 300 ms has been and gone, so when you go back to
    PIO mode, the decision has already been made and the T11 forces
    the snapshot.
    
    So, what should you do ??
    -------------------------
    
    Once in CIO mode at >>>, HALT the CPU straight away, then both the
    VAX and the T11 know that everything is halted and the T11 will
    not check for USMI. Now you can use console commands as you wish.
    
    The problem is that if the machine is in a CLUSTER, you are liable
    to get CI timeouts and/or a CLUEXIT bugcheck due to the connection-
    manager timing out etc.
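
    The timing argument above can be played through with a small Python
    model. This is only a sketch: the 300 ms limit is from this note, but
    the function, its parameters and the sample timings are invented for
    illustration and say nothing about how the T11 code is really written.

```python
KAF_LIMIT_MS = 300  # from the note: USMI must be seen set at least every 300 ms

def kaf_declared(usmi_times_ms, end_ms, halted_at_ms=None):
    """Return True if the console would declare a KAF in this simplified model.

    usmi_times_ms: sorted times (ms) at which USMI was seen set.
    end_ms:        time at which the observation ends (e.g. when C is typed).
    halted_at_ms:  time a HALT was issued; the check is suspended from then on.
    """
    stop = end_ms if halted_at_ms is None else min(end_ms, halted_at_ms)
    last = 0
    for t in usmi_times_ms:
        if t > stop:
            break
        if t - last > KAF_LIMIT_MS:
            return True
        last = t
    return stop - last > KAF_LIMIT_MS

# Hypothetical session: USMI seen every 100 ms until a console command
# stalls the E-Box at t=1000; the engineer types C at t=3000.
usmi = list(range(100, 1001, 100))
print(kaf_declared(usmi, end_ms=3000))                     # no HALT issued
print(kaf_declared(usmi, end_ms=3000, halted_at_ms=1100))  # HALT straight away
```

    Without the HALT the gap after t=1000 exceeds the limit; with it, the
    check stops before the limit is reached.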
    
18.16  Single-Step in a Cluster - FIX!!  KERNEL::ADAMS  Venus on Remote Control  Wed May 30 1990 12:04  117
    From:STAR::HOLSTEIN "Richie Holstein 381-1513 ZKO3-4/W23" 25-MAY-1990 
    Subj:	Availability of a fix to close out CLD OGO022609


                           INTEROFFICE MEMORANDUM


          DATE: May 25, 1990

          FROM: Richard Holstein
                VMS Development
          DTN:  381-1513
          L/MS: ZKO3-4/W23
          Net:  STAR::HOLSTEIN

          TO:   Pete Lawrence
                Bryan Jones

          SUBJECT:  A fix for the 86xx "footprint" problem, CLD
                  OGO022609

          We have finally coded and, we believe, confirmed, a fix
          for the last problem associated with CLD OGO022609 on
          the VAX 8600/8650.  The particular symptom described
          in the CLD was commonly referenced as the "footprint"
          problem.

          This problem was first reported by customer support
          centers in Europe, but the fix is generally applicable.
          It appears most frequently when a field service
          engineer is engaged in remote diagnosis on a system which
          is a member of an active cluster.    
    
                             If:

          -  the device OPA1 (the remote diagnosis port to the
             console) has been configured;

          -  the front panel "terminal control switch" has been
             set to the remote position at least once since the
             last time VMS booted;

          -  there is continuing output to the operator's
             console;

          -  and the console is put into console mode (that
             is, CTRL/P is issued) long enough to accumulate
             substantial output;

          then VMS will hang at IPL 20 while waiting for the
          console to become ready to accept another character.
          When the console becomes ready, VMS will recover.  In
          the interim however, tasks scheduled to occur at IPL 8
          have not had a chance to occur.  Two such tasks are the
          handshaking needed to assure other cluster members that
          this system has not failed, and the massaging of the
          CI port to keep connections open.  Not servicing those
          requests leads to CLUEXIT system crashes and loss of CI
          connections, respectively.

          The fix for this problem involves a change to the
          source code in OPDRV790.MAR, part of SYSLOA790.EXE.
          Instead of looping while checking for the "ready"
          bit to change from 0 to 1 in the TXCS register, the
          code saves its current task and state, and dismisses
          the original interrupt.  Another interrupt occurs
          when the "ready" bit eventually gets set to 1.  The
          code remembers the saved state and task and does the
          operations it earlier postponed.  The new interrupt is
          dismissed and the system continues normally.

          Both interrupts take the CPU to IPL 20.  By dismissing
          the original interrupt, the CPU gets an opportunity to
          handle lower IPL interrupts, especially those for IPL 8
          where the great majority of system synchronization and
          periodic tasks take place.

          To make the fix for this "day 1" bug available to
          customers as quickly as possible, we need to upgrade
          existing installations.  Fortunately, OPDRV790.MAR has
          not changed since VAX/VMS V5.0, and we can build a
          SYSLOA790.EXE for each of the releases since then.  Such
          releases are best distributed as VMSINSTAL kits handled
          by Bryan Jones's Sustaining Engineering Group.  We
          expect to cooperate with them to make the kits available
          as soon as possible.  We would also appreciate help
          from them in making up and testing the kits.

          Because of the extreme lateness in the release cycle of
          V5.4, we do not expect to be able to include this fix
          in that release.  Close on the heels of V5.4 though,
          will be V5.4-1, a strictly bug-fix release and a more
          realistic goal.  We therefore expect to see this fix
          generally available in V5.4-1 and all future releases,
          including the release now known as "Phoenix."

          It should be noted that this problem was also seen
          during the development of the VAX 9000.  Ward Travis
          designed the fix for that system and deserves the
          credit for what I've adapted for the VAX 8600 and
          8650.  Thanks also to Paul Cote, Charlie Hellen, Pete
          Lawrence and Paul Leveille for their help in diagnosing
          the bug and for testing the long line of attempted
          fixes.

          cc:   Brian Porter
                Ward Travis
                Rod Gamache
                Elliot Drayton
                Tiphany Worley
                Paul Cote
                Charlie Hellen
                Paul Leveille

          [End of 8600-CLD.TXT]
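
The difference between the old loop and the fix described in the memo can be
sketched with a toy tick-based model in Python. Everything here (the tick
units, the function name, the sample numbers) is hypothetical; it only
illustrates the idea of busy-waiting at IPL 20 versus dismissing the
interrupt and resuming on a later one, and is not taken from OPDRV790.MAR.

```python
def longest_ipl8_gap(ready_tick: int, total_ticks: int, busy_wait: bool) -> int:
    """Return the longest run of ticks in which IPL 8 work could not run.

    ready_tick: tick at which the console "ready" bit becomes 1.
    busy_wait:  True models the old loop at IPL 20; False models the fix, which
                saves state, dismisses the interrupt and resumes on a second one.
    """
    longest = current = 0
    for tick in range(total_ticks):
        if busy_wait:
            at_ipl20 = tick < ready_tick        # spinning until the console is ready
        else:
            at_ipl20 = tick in (0, ready_tick)  # only the two short interrupts
        current = current + 1 if at_ipl20 else 0
        longest = max(longest, current)
    return longest

# Hypothetical case: the console stays in console mode for 5000 ticks.
print(longest_ipl8_gap(5000, 6000, busy_wait=True))
print(longest_ipl8_gap(5000, 6000, busy_wait=False))
```

In the busy-wait case the IPL 8 work is shut out for the whole time the
console is not ready, which is what lets the cluster handshake and CI port
servicing time out.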

18.17  You can't win 'em all.  KERNEL::ADAMS  Venus on Remote Control  Fri Jun 01 1990 18:52  25
    
    Remember that around VMS V4.7, the 8600/8650 Machine check handler,
    got smart and into the uptime stakes ?? I.E. If the CPU had corrected
    the error and was able to restart the instruction, then why should
    a Kernel mode machine check crash VMS ?? So it didn't.
    
    Well we had a call today, where we thought things had changed. We
    had a correctable E-Box Control Store Parity error machine check,
    but VMS still crashed.
    
    The reason:
    
    Having determined the rev of console pack and thence E-Box U-Code
    rev, we took the micro-pc from CSES <28:16>. From the fiche, we
    found this to be in the MOVC5/MOVTC routine of DEROSA.MIC. From
    the PSL of the machine check we found FPD set in the PSL <27>.
    This pretty much guarantees, that although EHM will correct the
    CSPE, we cannot restart the instruction, so down we go.
    
    As a side issue on this problem, I have now re-coded the 8600
    machine check analyser in NDT to get you to check CSES for
    the syndrome in bits <15:08>. This will then give you only one
    (but correct) FRU, from the two possibles. The analyser is now
    V6.1.
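
    For anyone pulling these fields out by hand, a minimal Python sketch
    of the bit extraction follows. The field positions (CSES <28:16>,
    CSES <15:08>, PSL <27>) are the ones quoted above; the helper names
    and the register values are made up for illustration.

```python
def field(value: int, hi: int, lo: int) -> int:
    """Extract bits <hi:lo> (inclusive) from value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

PSL_FPD_BIT = 27  # First Part Done, PSL<27>

def summarize(cses: int, psl: int) -> dict:
    """Pick out the fields used in the analysis above."""
    return {
        "micro_pc": field(cses, 28, 16),  # micro-PC of the failing microword
        "syndrome": field(cses, 15, 8),   # syndrome used to choose the FRU
        "fpd": bool(field(psl, PSL_FPD_BIT, PSL_FPD_BIT)),
    }

# Made-up register values, for illustration only.
info = summarize(cses=0x12345678, psl=0x08000000)
print({k: (hex(v) if isinstance(v, int) and not isinstance(v, bool) else v)
       for k, v in info.items()})
```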
    
18.18  Rev 17 = "Old Chestnuts"  KERNEL::ADAMS  Venus on Remote Control  Tue Feb 05 1991 17:47  20
	Having had an enquiry today from an engineer on a site 
	using a Rev 17 8600 console pack, be aware that two "old"
	Warning messages are back with us :-

	1. The SID is once again checked and a warning is printed
	   to the effect that "The hardware on this system is less
	   than the required revision".

	   (SID being used for 3rd party software "licence".)

	2. Memory configuration expects DEC modules.
	   i.e. 4Mb = 1 slot. 16/64 Mb = 2 slots.
	   Warning message is to the effect that "the memory config
	   is not supported."

	   Usually due to use of EMC2 16 Mb modules.



18.19  EMM REGISTER INFORMATION  KERNEL::ADAMS  Venusian turned Aquanaut,-833 3790  Thu May 30 1991 17:34  265
Explanation of EMM registers on an 8600 or 8650


********************   CAUTION:  FOR INTERNAL USE ONLY   *********************
*                                                                            *
*      THIS INFORMATION IS FOR USE BY DIGITAL EQUIPMENT CORP. AND ITS        *
*      EMPLOYEES ONLY.  PLEASE USE EXTREME CARE IF YOU MUST DISCUSS ANY      *
*      PART OF THIS INFORMATION WITH ANYONE WHO IS NOT A DIGITAL EMPLOYEE.   *
*                                                                            *
******************************************************************************


PRODUCT: VENUS

LAST TECHNICAL REVIEW: 09-FEB-1989

SOURCE: Technical Support Services Europe

        by  HARRY VAN DER ZEE  (83413) of  RDC / VALBONNE


SYMPTOMS/PROBLEM:

If we see in the errorlog an EMM entry or a snap due to an EMM problem,
there is hardly any EMM register information available.


register explanation.

	RXDB REG 21

	31             24 23            16 15           11   8 7          0
        |----------------|----------------|------------|------|------------|
        |                |                |            |      |            |
        |    MBZ         |    CARRIER     |    MBZ     |  ID  |   DATA     |
        |                |                |            |      |            |
        --------------------------------------------------------------------
        <====RXDB3======><====RXDB2======><=======RXDB1=======><==RXDB0====>

	IF <11:8>  = 2  THEN THE DATA IN <7:0> concerns EMM data
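
	The layout above can be unpacked with a few shifts and masks. The
	following Python lines are only a sketch; the function name and the
	sample value are invented, the bit positions are those in the diagram.

```python
def decode_rxdb(rxdb: int) -> dict:
    """Split a 32-bit RXDB value into the fields drawn in the diagram."""
    ident = (rxdb >> 8) & 0xF
    return {
        "carrier": (rxdb >> 16) & 0xFF,  # bits <23:16>
        "id": ident,                     # bits <11:8>
        "data": rxdb & 0xFF,             # bits <7:0>
        "is_emm": ident == 2,            # ID 2 means <7:0> concerns EMM data
    }

sample = decode_rxdb(0x00000295)  # made-up value: ID = 2, data = 95(16)
print(sample)
```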


	There are 2 types of opcodes (data in RXDB0):
	1).  exception reports from the EMM
	2).  responses to request made to the EMM

	Data Bytes for the EMM line ( when RXDB ID = 2)

     ******************************************************************	
     *	The format of an opcode for an EMM exception is as follows:   *
     *                                     ---------                  *
     ******************************************************************

         7    6    5   4   3   2   1   0
       |---|-----|---|---|---|---|---|---|
       | 1 | asd | x |    opcode ID      |
       |---|-----|---|---|---|---|---|---|


	Where bit 7 of the opcode byte, when set, indicates that this is an
	EMM EXCEPTION report. If the 'ASD' bit is set it indicates that the
	EMM's automatic shutdown timer is running and that a total system
	power shutdown is pending (within minutes) if the cause of the
	condition is not rectified (RED ZONE temperature faults and AIR FLOW
	faults can cause the ASD timer to begin counting). Bit 5 of the opcode
	byte is reserved for future use (not guaranteed to be 0).

	All exception reports are contained in a single opcode byte followed
	by a single data byte. The 'opcode ID' can be any of the following
	5 bit values. The packet data following the opcode byte is also shown
	here.

	Regulator_A = 0(16)	;status change in regulator A +5v
	Regulator_B = 1(16)     ;status change in regulator B +5v 
        Regulator_C = 2(16)     ;status change in regulator C +5v
	Regulator_D = 3(16)     ;status change in regulator D -2v
	Regulator_E = 4(16)     ;status change in regulator E -2v
	Regulator_F = 5(16)     ;status change in regulator F -5.2v
	Regulator_H = 6(16)     ;status change in regulator H -5.2v
	Regulator_L_pos = 7(16) ;status change in regulator L +12v
	Regulator_L_neg = 8(16) ;status change in regulator L -12v
	Regulator_k_pos = 9(16) ;status change in regulator K +15v
	Regulator_k_neg = A(16) ;status change in regulator K -15v

		On all these regulators the byte values mean:

	Byte 0    Value of 0 -- Voltage now normal
		  Value of 1 -- Voltage now out of spec	       
	
	
        T1_Temp. = B(16)	;status change in T1 Temperature
       	T2_Temp. = C(16)	;status change in T2 Temperature
	T3_Temp. = D(16)	;status change in T3 Temperature
	T4_Temp. = E(16)	;status change in T4 Temperature

		On these sensors the byte values mean:

	Byte 0    Value of 0 -- Temperature now normal
		  Value of 1 -- Temperature now in yellow zone
                  Value of 2 -- Temperature now in Red Zone. This condition
				will cause a total system power-off
				if not corrected
		  Value of 3 -- Temperature is below nominal range.


	T2-T1_Temperature = F(16)  ;status of Delta T2 T1 has changed
	T3-T1_Temperature =10(16)  ;status of Delta T3 T1 has changed
	T4-T1_Temperature =11(16)  ;status of Delta T4 T1 has changed

		On these Deltas the byte values mean:

	Byte 0	  Value of 0 -- Difference now normal
		  Value of 1 -- Difference now in Yellow zone
		  Value of 2 -- Difference now in red Zone. This condition
				will cause a total system power-off
				if not corrected.

	Air_flow1_fault  = 12(16)  ;status of AIR FLOW SENSOR 1 has changed
	Air_flow2_fault  = 13(16)  ;status of AIR FLOW SENSOR 2 has changed

	Byte 0    Value of 0 -- Air Flow Sensor now normal
		  Value of 1 -- Air Flow Sensor now out of spec. This condition
				will cause a total system power off if 
				not corrected.

	BBU_Available    = 14(16)  ; Status of BBU has changed

	Byte 0    Value of 0 -- BBU is now available
		  Value of 1 -- BBU is now not available

	EMM_FAILURE      = 15(16)  ; EMM status has changed

	Byte 0    Value of 0 -- EMM is dead ( failed to restart )
		  Value of 1 -- EMM encountered parity error in its RAM
		  Value of 2 -- EMM encountered an illegal instruction
		  Value of 3 -- EMM encountered an unknown trap to 0
		  Value of 4 -- EMM encountered an unexpected trap intr
		  Value of 5 -- EMM encountered an unexpected 6.5 intr
		  Value of 6 -- Excessive collisions on EMM bus
		  Value of 7 -- No transport acknowledge from EMM
		  Value of 8 -- No response from EMM
                  Value of 9 -- Negative response from EMM
                  Value of A -- EMM insisting it has no buffers available
                  Value of B -- CSL-to-EMM message transmit timeout

	The EMM is rebooted by the console when any of the above
	errors occur , except for the case where the EMM is dead.

	TX_RDY_TIMEOUT = 16(16)   ;The TXCS RDY bit has not been set by the
				  ;console for a full 2 seconds; the TX
				  ;operation that was in progress has been 
				  ;aborted
	Byte 0	  Value of 0 -- Local terminal operation aborted
		  Value of 1 -- Remote services port operation aborted
                  Value of 2 -- EMM operation aborted
                  Value of 3 -- Logical console operation aborted
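
	As an aid to reading an opcode byte by hand, the exception format
	above can be sketched in Python as follows. The table simply repeats
	the opcode IDs listed above; the function name and the sample bytes
	are made up, and the reserved bit 5 is ignored.

```python
EXCEPTION_NAMES = {
    0x00: "Regulator_A", 0x01: "Regulator_B", 0x02: "Regulator_C",
    0x03: "Regulator_D", 0x04: "Regulator_E", 0x05: "Regulator_F",
    0x06: "Regulator_H", 0x07: "Regulator_L_pos", 0x08: "Regulator_L_neg",
    0x09: "Regulator_k_pos", 0x0A: "Regulator_k_neg",
    0x0B: "T1_Temp", 0x0C: "T2_Temp", 0x0D: "T3_Temp", 0x0E: "T4_Temp",
    0x0F: "T2-T1_Temperature", 0x10: "T3-T1_Temperature",
    0x11: "T4-T1_Temperature",
    0x12: "Air_flow1_fault", 0x13: "Air_flow2_fault",
    0x14: "BBU_Available", 0x15: "EMM_FAILURE", 0x16: "TX_RDY_TIMEOUT",
}

def decode_opcode(byte: int) -> dict:
    """Split an EMM opcode byte into its flag bits and 5-bit opcode ID."""
    opcode_id = byte & 0x1F           # bits <4:0>
    exception = bool(byte & 0x80)     # bit 7 set = exception report
    return {
        "exception": exception,
        "asd": bool(byte & 0x40),     # bit 6 = automatic shutdown timer running
        "opcode_id": opcode_id,
        # The name table only applies to exception reports.
        "name": EXCEPTION_NAMES.get(opcode_id) if exception else None,
    }

report = decode_opcode(0xD2)  # made-up byte: exception, ASD running, ID 12(16)
print(report)
```

	The data byte that follows is then looked up under the matching
	entry above (for 12(16), a value of 1 means the sensor is out of spec).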

    **********************************************************************
    * The format of an opcode for an EMM request response is as follows  *
    *                                            ----------              *
    **********************************************************************



         7    6    5   4   3   2   1   0
       |---|-----|---|---|---|---|---|---|
       | 0 | asd | x |    opcode ID      |
       |---|-----|---|---|---|---|---|---|

	
        Where bit 7 of the opcode byte, when clear, indicates that this is a
	RESPONSE to an EMM request. If the 'ASD' bit is set it indicates that
	the EMM's automatic shutdown timer is running and that a total system
	power shutdown is pending (within minutes) if the cause of the
	condition is not rectified (RED ZONE temperature faults and AIR FLOW
	faults can cause the ASD timer to begin counting). Bit 5 of the opcode
	byte is reserved for future use (not guaranteed to be 0).

	Solicited responses are variable length and, thus, begin with the first
	data byte being the byte count. The following 'opcode IDs' indicate
	that the EMM is responding to a request made via the TXDB register.
	There are 2 responses that can occur.

	EMM_Status = 0(16)	;Response to "EMM_status" request

		This operation returns the status of the EMM unit, which
		includes the contents of the status register and its PROM
		revision number

		Byte 0  (Packet size = 8 bytes)

 remark ==>	Byte 1  (Power controller Register)
			Bit 0 - regulator B status (0 = Off) (+5v BBU)
			Bit 1 - regulator C status (0 = Off) (+5v SBIA)
			Bit 2 - regulator D status (0 = Off) (-2v ECL)
			Bit 3 - regulator E status (follows state of bit 2)
			Bit 4 - regulator F status (0 = Off) (-5.2v ECL)
			Bit 5 - regulator H status (follows state of bit 4)
			Bit 6 - regulator J status (unused)
                        Bit 7 - BBU disable status (0 = enabled)

 remark ==>    Byte 2   (Margin Enable Registers)
			Bit 0 - regulator A margin enable status (0 = nominal)  
                        Bit 1 - regulator B margin enable status (0 = nominal)
			Bit 2 - regulator C margin enable status (0 = nominal)
			Bit 3 - regulator D margin enable status (0 = nominal)
			Bit 4 - regulator E margin enable status (follows bit 3)
			Bit 5 - regulator F margin enable status (0 = nominal)
			Bit 6 - regulator H margin enable status (follows bit 5)
			Bit 7 - regulator J margin enable status (0 = nominal)
		       
 remark ==>    Byte 3   (Margin Select Registers)
			Bit 0 - regulator A margin status (0 = low; 1 = high)
			Bit 1 - regulator B margin status (0 = low; 1 = high)
			Bit 2 - regulator C margin status (0 = low; 1 = high)
			Bit 3 - regulator D margin status (0 = low; 1 = high)
			Bit 4 - regulator E margin status (follows bit 3)
			Bit 5 - regulator F margin status (0 = low; 1 = high)
			Bit 6 - regulator H margin status (follows bit 5)
			Bit 7 - regulator J margin status (0 = low; 1 = high)

 remark ==>    Byte 4   (Least-Significant Byte of the 16-bit MODOK Register)
			Bit 0 - regulator A OK status (1 = OK)
			Bit 1 - regulator B OK status (1 = OK)
			Bit 2 - regulator C OK status (1 = OK)
			Bit 3 - regulator D OK status (1 = OK)
			Bit 4 - regulator E OK status (1 = OK)
			Bit 5 - regulator F OK status (1 = OK)
			Bit 6 - regulator H OK status (1 = OK)
			Bit 7 - regulator J OK status (1 = OK)

 remark ==>     Byte 5  (Most-Significant Byte of the 16-bit MODOK Register)
			Bit 0 - regulator K OK status (1 = OK)
			Bit 1 - regulator L OK status (1 = OK)
			Bit 2 - MODULE K AC LO status (1 = OK)
			Bit 3 - MODULE L AC LO status (1 = OK)
			Bit 4 - \
			Bit 5 -  }- EMM unit number (always 0 for Venus)
			Bit 6 - /
			Bit 7 - status of KEY OVERRIDE circuit

 remark ==>     Byte 6  (Miscellaneous Hardware Status Register)
			Bit 0 - status of AIR FLOW1 SENSOR (1 = fault)
                        Bit 1 - status of BBU unit         (1 = failure)
                        Bit 2 - status of MINUS 2V CROWBAR (1 = crowbar)
                        Bit 3 - status of AIR FLOW2 SENSOR (1 = fault)
                        Bit 4 - status of LATCHED AC LO    (1 = ac low)
                        Bit 5 - status of LATCHED DC LO    (1 = dc low)
                        Bit 6 - status of PARITY CHECKER   (1 = always 1)
                        Bit 7 - status of PARITY ERROR     (1 = always 0)

 remark ==>     Byte 7  (Miscellaneous Software Status Register)
			Bit 0 - status of  EXT OUTPUT signal (1= asserted)
                        Bit 1 - status of  DEFAULT MODE ENABLED (1 = ENABLED)
                        Bit 2 - status of  AUTO SHUTDOWN     ( 1 = ACTIVE )
                        Bit 3 - status of  5.5 interrupt DISABLED (1 =DISABLED)
                        Bit 4-7 - unused
 remark ==>     Byte 8  (EMM PROM Version Number)

			Bit 0-7 - integer value of the EMM PROM version
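
	The EMM_Status packet above can be picked apart as in the Python
	sketch below. It assumes the count byte (Byte 0 = 8) does not count
	itself, so 8 data bytes follow; the function name, the result keys
	and the sample packet are invented for illustration.

```python
MODOK_REGULATORS = ["A", "B", "C", "D", "E", "F", "H", "J", "K", "L"]  # MODOK bits 0..9

def decode_emm_status(packet: bytes) -> dict:
    """Decode a few fields of an EMM_Status response (Byte 0 through Byte 8)."""
    if len(packet) != 9 or packet[0] != 8:
        raise ValueError("expected a count byte of 8 followed by 8 data bytes")
    modok = packet[4] | (packet[5] << 8)  # Byte 4 = low half, Byte 5 = high half
    not_ok = [name for bit, name in enumerate(MODOK_REGULATORS)
              if not ((modok >> bit) & 1)]  # 1 = OK, so a clear bit is a fault
    return {
        "modok": modok,
        "not_ok": not_ok,
        "air_flow_fault": bool(packet[6] & 0b00001001),  # Byte 6 bits 0 and 3
        "auto_shutdown": bool(packet[7] & 0b00000100),   # Byte 7 bit 2
        "prom_version": packet[8],
    }

# Made-up packet: regulator C not OK, no air flow fault, PROM version 5.
status = decode_emm_status(bytes([8, 0x3F, 0x00, 0x00, 0xFB, 0x0F, 0x00, 0x00, 0x05]))
print(status)
```

	Note that regulator J is listed as unused in Byte 1, so a clear J bit
	in a real packet may not indicate a fault.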


All the bytes marked with remark ==> can be found in a snapshot analysis 
report file.