
Conference ssdevo::hsd30_product

Title:HSD30 Product Conference
Moderator:SSDEVO::EDMONDSN
Created:Mon Apr 11 1994
Last Modified:Tue Jun 03 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:300
Total number of notes:1008

294.0. "HSD30 and 4000-100 Cluster" by TOSSUB::BRUSA () Wed Apr 09 1997 08:11

I have a problem configuring a cluster with two MVAX 4000-100 systems and an 
HSD30. The situation is as follows:

OpenVMS 5.5-2
HSOF version 2.7

	__________		__________		__________
       |4000-100  |            |HSD30     |            |4000-100  |
       |          |            |single    |            |          |
       |dssi dssi |            |controller|            |dssi dssi |
       | 0     1  |            |          |            | 0     1  |
	__________		__________		__________
	 |    | |		    |			 |    | |
	 |    T T		    _			 |    T T
	 |_________________________| |___________________|
 	   	BC29R			   BC29R

In this configuration, if I boot only one system, VMS boots correctly and there 
is no problem. When I try to boot the second CPU, the system that is already 
booted hangs and displays the following error:

PAA0 PATH # 0 HAS GONE FROM GOOD TO BAD 

The console output of the booting system is:
 
VMS 5.5-2 

CNXMAN using remote acces method to quorum disk
PAA0 CI port timeout
PAA0 port is reinitializing (49 retries left) check error log

After this, both systems hang and no operation is possible.

I tried changing the configuration by moving the DSSI connection from bus #0 
to bus #1 on both systems, but then I am unable to boot either system. 
During the VMS boot, the error "EXECINIT-F-ERROR initializing boot device 
R0 = 0000028C" is displayed on the console and the systems halt.


I also tried crossing the connection between the two systems (bus #0 to bus #1 
and then to the HSD30), but the problem persists: the system that has bus #1 
connected is unable to boot.

This is a big problem because the systems are in a production environment and 
are available for testing only on Sundays, for a short time. The customer is 
upset because he bought a cluster configuration and we are unable to configure 
it.
Any suggestion is appreciated.

Thank you in advance 


Livio Brusa  

294.1. "Get latest patches? Autogen? get a dump" by VMSSPT::JENKINS (Kevin M Jenkins VMS Support Engineering) Wed Apr 09 1997 09:35
    
    Are you running the latest patch kits? Some things were fixed in this
    area; I'm not sure whether the fixes were for VAX or Alpha, but you
    should at least have VAXSHAD09, not for shadowing but for the
    cluster code. Then check for SYS$SCS kits, such as VAXSCSxxx.
    
    Just for starters: when the port reset, some registers should have
    been printed out. Get them decoded to find out which error interrupt
    was set. Another possibility is insufficient nonpaged pool: perhaps
    when the systems try to form the cluster, a system runs out of pool,
    and this could cause a hang.
    
    If all else fails, you'll need to crash both systems at the same time.
    I hope you have separate crash dumps. Halt them both at the same time,
    then force a crash. Someone will need these dumps to try to figure out
    what is going on. It's important to halt them both, because crashing
    one while the other is still running will change the state of things.
    
    Kevin
    
294.2. "patches installed" by TOSSUB::BRUSA () Wed Apr 09 1997 09:52
I forgot to mention that I have installed the following patches, without 
success:

VAXSHAD09_U2055 & VAXDRIV04_070

Livio Brusa

294.3. "DSSI Node IDs?" by BRUNEL::KIRBY () Thu Apr 10 1997 10:10
I hope you have set the 4000-100 systems to different DSSI node IDs. Typically 
I would set one of them to 7, the other to 6, and the HSD to 0. No duplicates 
are allowed on the bus. Don't forget to check any internal RFxx drives as well.
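To make the "no duplicates on the bus" rule concrete, here is a small Python sketch that flags DSSI IDs claimed by more than one node on a single bus. It is only an illustration: the helper name `find_dssi_conflicts` and the node labels are made up, and you still have to collect the IDs yourself from each console and from the HSD. It assumes the usual eight DSSI IDs, 0 through 7.

```python
from collections import Counter


def find_dssi_conflicts(bus_ids: dict[str, int]) -> list[int]:
    """Return the DSSI IDs claimed by more than one node on one bus.

    bus_ids maps a (hypothetical) node label to the DSSI ID it uses.
    A DSSI bus has eight IDs, 0 through 7.
    """
    for node, node_id in bus_ids.items():
        if not 0 <= node_id <= 7:
            raise ValueError(f"{node}: DSSI ID {node_id} is outside 0-7")
    counts = Counter(bus_ids.values())
    return sorted(node_id for node_id, n in counts.items() if n > 1)


# The layout suggested above: one host at 7, the other at 6, the HSD at 0.
print(find_dssi_conflicts({"host_a": 7, "host_b": 6, "hsd30": 0}))  # []
# Both hosts left at the same ID: the duplicate is reported.
print(find_dssi_conflicts({"host_a": 7, "host_b": 7, "hsd30": 0}))  # [7]
```

Remember that each DSSI bus is checked separately; the same ID on two different buses is not a conflict.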


On the later 4100s I think it is a console command (type "help" and look for 
"show" and "set" commands for the DSSI ID); on the earlier units I suspect 
there was a jumper. 

However, your drawing implies a 4000-100A, so as long as the system firmware 
is reasonably up to date, it should be done from the console.


				Steve.

294.4. "Are the machines 4108s ?" by KERNEL::MEGARITY (I remember when Rock was young) Thu Apr 10 1997 15:45
Author                    : MARCI R POTTER
User type                 : DBA 
Location                  : USTIMA
Vaxmail address           : CSC32::POTTER       

Copyright (c) Digital Equipment Corporation 1997. All rights reserved.

+---------------------------+TM
|    |   |   |   |   |   |   |
|  d | i | g | i | t | a | l |           TIME   DEPENDENT   BLITZ
|    |   |   |   |   |   |   |
+---------------------------+


   BLITZ TITLE: Firmware Update for Vax 4000-108

  

   PRIORITY LEVEL: 

   DATE:March 26,1997
   TD #: 2269

   AUTHOR:Heather Kane
   DTN:223-4712
   EMAIL:Kane@proxy.enet.dec.com
   DEPARTMENT:RSE

   =================================================================

   PRODUCT NAMES:Vax 4000 108

   PRODUCT FAMILY: 

   Storage         ___
   Systems/OS      _X_
   Networks        ___
   PC/Peripherals  ___   
   Software Apps.  ___


   BLITZ TYPE: 

   Maintenance Tip           TIMA::INFO_X_  
   Service Action Requested  ___  


   IF SERVICE ACTION IS REQUESTED: (Check all that apply.)

   Labor Support Required     _X_  
   Material Support Required  ___  


   Estimated time to complete activity (in hours):
   Will this require a change in the field's inventory:  Yes __X_  No ___
   Will an FCO be associated with this advisory?  Yes ___  No _X_


   DESCRIPTION OF SERVICE ACTIVITY REQUESTED (if applicable):

	Firmware needs to be updated from V1.0 to V2.0 on VAX 4000-108

    *******************************************************************  


   PROBLEM STATEMENT:  

	When VAX 4000-108s are installed in a DSSI cluster with other
	VAX 4000-108s or any other clusterable system, the cluster may not
	configure properly and may hang when a second system is booted.

   SYMPTOM:  

	No matter what DSSI ID is set at the console, when OVMS boots the ID
	is always 7, so there is a conflict.
     
   SOLUTION:	

	This is caused by OVMS not using the ID configured at the console with
	the SET_DSSI ID command. It is a legacy code issue, whereby OVMS uses
	the console DSSI ID value only if the console firmware version is
	V2.X or later.

	The firmware needs to be updated to V2.0. The new version can be
	copied from:

  	may21::WRK:[MOPLOAD]kacat_v20_1.sys
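The behaviour described in this blitz can be modelled with a short Python sketch. It assumes, from the problem and solution text above, that OVMS honours the console DSSI ID only from console firmware V2.0 onward and otherwise boots with ID 7. The function names and the (major, minor) version tuples are hypothetical, chosen only for the illustration.

```python
Firmware = tuple[int, int]  # (major, minor), e.g. (2, 0) for V2.0


def effective_dssi_id(console_id: int, firmware: Firmware) -> int:
    """DSSI ID OVMS is expected to use, per the blitz description."""
    if firmware >= (2, 0):
        return console_id  # V2.0 or later: the console setting is honoured
    return 7  # older console firmware: the configured ID is ignored


def has_conflict(nodes: dict[str, tuple[int, Firmware]]) -> bool:
    """True if two nodes end up with the same effective DSSI ID."""
    ids = [effective_dssi_id(cid, fw) for cid, fw in nodes.values()]
    return len(ids) != len(set(ids))


# Two 4000-108s set to 7 and 6 at the console:
print(has_conflict({"node_a": (7, (1, 0)), "node_b": (6, (1, 0))}))  # True
print(has_conflict({"node_a": (7, (2, 0)), "node_b": (6, (2, 0))}))  # False
```

With V1.0 firmware both nodes come up as 7 regardless of the console setting, which is why the second system to boot hangs the cluster.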

   UPDATING PROCESS:

	Copy the new firmware file to a mop$load area and then perform the
	update. Most systems have the firmware-enable jumper (W3) installed on
	the CPU module, which allows firmware updates. If there is any problem
	with the update, refer to the VAX 4000-108 On-line Service Guide.
 
***** On Server System *****

$ MCR NCP

NCP>SET CIRCUIT ISA-0 STATE OFF
NCP>SET CIRCUIT ISA-0 SERVICE ENABLED
NCP>SET CIRCUIT ISA-0 STATE ON
NCP>EXIT

$
$ COPY kacat_v20_1.sys MOM$LOAD:*.*
$
***** On Client System *****
>>>b/100 eza0

 (BOOT/R5:100 EZA0)

  2..

Bootfile: kacat_v20_1

-EZA0

 1..0..

FEPROM update program
                        ---CAUTION---
--- Executing this program will change your current FEPROM ---

Do you want to continue [Y/N] ? : y

Blasting in V2.0-1.   The program will take at most several minutes.

DO NOT ATTEMPT TO INTERRUPT PROGRAM EXECUTION

Doing so may result in loss of operable state !!!
+----------------------------------------+

10...9...8...7...6...5...4...3...2...1...0

FEPROM Programming successful
?06 HLT INST
        PC = 00008E24

>>>

cycle power


   VERIFICATION:

	Set the DSSI IDs on the multi-node cluster as required and verify that
	there are no conflicts.


   LARS INFORMATION: (Supplied by MCS)

       Attention Service Personnel: Begin the comment field of your LARS
       with the word "BLITZ" when you perform an activity associated with a
       BLITZ Type "Service Action Requested".



                     *** DIGITAL INTERNAL USE ONLY ***

\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT PROD=DEC4000-XXX PROD=4XXX

294.5. by TOSSUB::BRUSA () Fri Apr 11 1997 10:13
***** Answer .3

The DSSI IDs are different: on the first CPU, DSSI 0 has ID 7 and DSSI 1 
has ID 5; on the second CPU, the DSSI IDs are 6 and 4; the HSD30 has DSSI ID 1.
No internal or external RFxx drives are connected.
The firmware version of the KA52A is 2.4, with VMB version 2.15.


***** Answer .4

The two systems are 4000 Model 100A.

Livio Brusa

294.6. "Later VMS required." by BRUNEL::KIRBY () Fri Apr 11 1997 15:20
I still suspect a DSSI ID problem. From memory, when the 4100 firmware was 
updated to include settable DSSI IDs, there was a blitz of some sort (which I 
can't find) saying that VMS was not yet coded to use the new method.

Note .4 is the same issue in reverse: VMS is now looking for the V2 console.

Perhaps it is VMS V5.5-2 that doesn't support the new method???




So I've just dug the "BA42B Based System DSSI Upgrade Manual" (EK-500AA-UP) 
out of the cupboard and read it. It says that VMS 5.5-2H4 is needed, so I 
would suggest you give that a try.

There is also a STARS article that says the same: a minimum of V5.5-2H4 or 
V6.1 is needed to support the KFDDA-A (the dual-DSSI module).


			Steve.

294.7. "upgraded to vms 5.5.2H4" by TOSSUB::BRUSA () Wed May 14 1997 13:41
    I'm sorry for the late response, but the customer only upgraded to
    VMS 5.5-2H4 last weekend, and the problem is still present.
    Does anyone have any suggestions?
    
    Thank you  
    
    
    Livio Brusa

294.8. "Same root ?" by STOWKS::SLUIS (Hans van Sluis - StorageWorks Engineering Support Europe- DTN 889 9526) Fri May 16 1997 13:04
Livio,

This may sound stupid, but we're not booting from the same system
root, are we?

Did you set up your boot flags on node A as 0,1 and on node B as 1,1
(or as 0,0 and 1,0)? This will cause node A to boot from SYS0 and node B
from SYS1.

Hans

294.9. "different root" by TOSUP1::BRUSA () Mon May 19 1997 05:17
    
    Hi Hans
    
    The two systems boot from two different roots; the R5 registers are set
    to 00000000 and 10000000.
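    When booting a VAX, the system root is taken from the high-order hex
    digit of R5 (bits <31:28>), so 00000000 selects SYS0 and 10000000
    selects SYS1; the lower bits carry the boot flags. The Python sketch
    below only decodes that digit so the two nodes' values can be compared.
    The helper `system_root` is hypothetical, not a VMS utility.

```python
def system_root(r5: int) -> str:
    """Name of the system root (SYSn) selected by a VAX boot R5 value."""
    if not 0 <= r5 <= 0xFFFFFFFF:
        raise ValueError("R5 is a 32-bit register")
    root = (r5 >> 28) & 0xF  # bits <31:28> hold the root number (hex digit)
    return f"SYS{root:X}"


node_a, node_b = 0x00000000, 0x10000000  # the R5 values quoted in this note
print(system_root(node_a), system_root(node_b))  # SYS0 SYS1
assert system_root(node_a) != system_root(node_b), "both nodes share a root"
```

    So the two nodes above are indeed booting from separate roots, which
    rules out the shared-root cause suggested in .8.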
    
    	Livio Brusa

294.10. "help" by SSDEVO::ASTOR (Subsystems Engineering Support) Thu May 22 1997 18:29
    Hi Livio,
    
    I'll just cover the stuff from the controller/DSSI side and leave the
    VMS stuff to Kevin and others.
    
      First, please see the blitz I wrote in note 114. I know your drawing
    shows you are using DSSI bus 0, but it might help anyway.
    
      To make sure you don't have a DSSI conflict, bring up one system,
    do a $ show cluster/cont, then add rport. Shut this system down,
    bring up the other one, and do the same thing. Then you'll know for
    sure that the console/OpenVMS/HSD30 isn't tricking you. Notice I didn't
    single anyone out here :).
    
      While the system is up and running, run VTDPY on the HSD30. You
    should get a few NORs (no response) per second, because the controller
    is polling for DSSI IDs that don't exist on the bus. NAKs should be 0.
    If NAKs are not 0, there is a possible bus integrity problem.
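    The VTDPY rule of thumb above can be written down as a tiny decision
    function. This is a hypothetical Python helper: it takes the NOR and
    NAK rates you read off the VTDPY display by hand and does not talk to
    the controller. Treating zero NORs as suspicious is an assumption,
    based on the controller polling IDs that are unused on this bus.

```python
def assess_dssi_bus(nor_per_sec: float, nak_per_sec: float) -> str:
    """Rough reading of VTDPY DSSI counters (rates entered by hand)."""
    if nak_per_sec > 0:
        # Any NAKs point at the bus itself: cable, terminator, term power.
        return "NAKs seen: possible bus integrity problem"
    if nor_per_sec == 0:
        # Assumption: with unused IDs on the bus, some NORs are expected.
        return "no NORs: unexpected if some DSSI IDs are unused"
    return "NORs only: normal polling of unused IDs"


print(assess_dssi_bus(nor_per_sec=3, nak_per_sec=0))
print(assess_dssi_bus(nor_per_sec=3, nak_per_sec=2))
```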
    
      I would recommend putting the HSD30 at the end of the bus with an
    external terminator. Also, in that configuration, you could try DSSI
    bus 1 (see note 114). The best part is that you can see the term pwr
    light on the terminator to make sure you have termination power.
    
      I know this sounds strange, but I have in my possession a terminator
    that is bad and causes very similar symptoms. I keep it around for
    training purposes :). It could also be a bad cable or missing termination.
    
    DSSI is robust stuff: things can be marginal, and if the length is
    short, it will work anyway.
    
    An error log would be helpful, as would the output of a SHOW THIS
    command from the controller.
    
    Good luck,
    
    Kurt

294.11. "Problem solved" by TOSUP1::BRUSA () Tue Jun 03 1997 05:49
Hi 

I'm sorry for the delay in updating this note, but I have been out of the 
office for some weeks.
For some reason I don't know, after the update from version 5.5-2 to version 
5.5-2H4 the customer did not check whether the configuration worked correctly.
I'm furious with the customer, because when I asked him whether the problem 
had been corrected, the answer was "the problem still exists".

Last week I went to the customer site and booted the two systems in the 
cluster without any problem.

I apologize for the mistake and thank everybody.

Regards to all

Livio Brusa