
Conference ssdevo::hsd30_product

Title:	HSD30 Product Conference

Moderator:	SSDEVO::EDMONDSN

Created:	Mon Apr 11 1994
Last Modified:	Tue Jun 03 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	300
Total number of notes:	1008

294.0. "HSD30 and 4000-100 Cluster" by TOSSUB::BRUSA () Wed Apr 09 1997 08:11

I have a problem configuring a cluster with two MVAX 4000-100 systems and an HSD30. 
The situation is as follows:

OpenVMS 5.5-2
HSOF version 2.7

	__________		__________		__________
       |4000-100  |            |HSD30     |            |4000-100  |
       |          |            |single    |            |          |
       |dssi dssi |            |controller|            |dssi dssi |
       | 0     1  |            |          |            | 0     1  |
	__________		__________		__________
	 |    | |		    |			 |    | |
	 |    T T		    _			 |    T T
	 |_________________________| |___________________|
 	   	BC29R			   BC29R

In this configuration, if I boot only one system, VMS boots correctly and there 
is no problem. When I try to boot the second CPU, the system that is already 
booted hangs and displays the following error:

PAA0 PATH # 0 HAS GONE FROM GOOD TO BAD 

The console screen of the system booting is:
 
VMS 5.5-2 

CNXMAN using remote access method to quorum disk
PAA0 CI port timeout
PAA0 port is reinitializing (49 retries left) check error log

After this, both systems hang and no operation is possible.

I tried changing the configuration by moving the DSSI connection from bus #0 
to bus #1 on both systems, but in that setup I am unable to boot either 
system. 
At VMS boot, the error "EXECINIT-F-ERROR initializing boot device 
R0 = 0000028C" is displayed on the console and the system halts.


I also tried crossing the connection through the two systems (bus #0 to 
bus #1 and then to the HSD30), but the problem is still present: the system 
that has bus #1 connected is unable to boot.

This is a big problem because the systems are in a production environment and 
are available for testing only on Sundays, and only for a short time. The 
customer is upset because he bought a cluster configuration and we are unable 
to configure it.
Any suggestion is appreciated.

Thank you in advance 


Livio Brusa

T.R	Title	User	Personal Name	Date	Lines
294.1	Get latest patches? Autogen? get a dump	VMSSPT::JENKINS	Kevin M Jenkins VMS Support Engineering	`Wed Apr 09 1997 09:35`	21
	Are you running with the latest patch kits? There were some things fixed around this area... I'm not sure if they were VAX or ALPHA, but you should at least have VAXSHAD09, not for shadowing but for the cluster code. Then check for kits for SYS$SCS, like VAXSCSxxx. Just for starters... When the port reset there should have been some registers printed out. Get them decoded to find out what error interrupt was set. Another possibility could be not enough nonpaged pool... perhaps when they try to cluster, the system runs out of pool, and this could cause a hang... If all else fails you'll need to crash both systems at the same time. Hope you have separate crash dumps... You should halt them both at the same time, then force a crash. This would be needed for someone to try and figure out what is going on. It's important to halt them both, as crashing one while the other is still running will change the state of things. Kevin
294.2	patches installed	TOSSUB::BRUSA		`Wed Apr 09 1997 09:52`	7
	I forgot to mention that I have installed the following patches, without success: VAXSHAD09_U2055 and VAXDRIV04_070. Livio Brusa
294.3	DSSI Node IDs?	BRUNEL::KIRBY		`Thu Apr 10 1997 10:10`	14
	I hope you have set the 4000-100 systems to different DSSI Node IDs. Typically I would set one of them to 7, the other to 6, and the HSD to 0. No duplicates allowed on the bus. Don't forget to check any internal RFxx drives also. On the later 4100s I think it is a console command (type "help" and look for "show" and "set" DSSI something) ... the earlier units I suspect there was a jumper. However your drawing implies a 4000-100A, so as long as the system firmware is reasonably up-to-date it should be done from the console. Steve.
294.4	Are the machines 4108s ?	KERNEL::MEGARITY	I remember when Rock was young	`Thu Apr 10 1997 15:45`	163
	Author          : MARCI R POTTER
	User type       : DBA
	Location        : USTIMA
	Vaxmail address : CSC32::POTTER

	Copyright (c) Digital Equipment Corporation 1997. All rights reserved.

	+---------------------------+TM
	|   |   |   |   |   |   |   |
	| d | i | g | i | t | a | l |   TIME DEPENDENT BLITZ
	|   |   |   |   |   |   |   |
	+---------------------------+

	BLITZ TITLE: Firmware Update for Vax 4000-108

	PRIORITY LEVEL:
	DATE: March 26,1997
	TD #: 2269
	AUTHOR: Heather Kane
	DTN: 223-4712
	EMAIL: Kane@proxy.enet.dec.com
	DEPARTMENT: RSE
	=================================================================
	PRODUCT NAMES: Vax 4000 108

	PRODUCT FAMILY:
	Storage ___  Systems/OS _X_  Networks ___  PC/Peripherals ___  Software Apps. ___

	BLITZ TYPE:
	Maintenance Tip TIMA::INFO_X_  Service Action Requested ___

	IF SERVICE ACTION IS REQUESTED: (Check all that apply.)
	Labor Support Required _X_  Material Support Required ___
	Estimated time to complete activity (in hours):
	Will this require a change in the field's inventory: Yes __X_ No ___
	Will an FCO be associated with this advisory? Yes ___ No _X_

	DESCRIPTION OF SERVICE ACTIVITY REQUESTED (if applicable):
	Firmware needs to be updated from V1.0 to V2.0 on VAX 4000-108
	*****************************************************************

	PROBLEM STATEMENT:
	When trying to install VAX 4000-108's in a DSSI cluster with other Vax4000-108's or any other clusterable sytem the cluster may not configure properly and may cause hangs when a second system is booted.

	SYMPTOM:
	No matter what DSSI ID is set at console, when OVMS boots the ID is always 7 so there is a conflict.

	SOLUTION:
	This is caused by OVMS not using the ID configured by the console with the SET_DSSI ID command. This is caused by a legacy code issue, whereby OVMS will only use the DSSI ID value if Console firmware Version is above 2.X. The firmware needs to be updated to V2.0.
	The version is available by copying from : may21::WRK:[MOPLOAD]kacat_v20_1.sys

	UPDATING PROCESS:
	The new firmware file needs to be copied to a mop$load area and then perform the update. Most systems have the firmware enable jumper (W3) on the CPU modules installed, which allows Firmware updates. If there is any problem with updating refer to the Vax4000-108 On-line Service Guide as a reference.

	*** On Server System ***
	$ MCR NCP
	NCP>SET CIRCUIT ISA-0 STATE OFF
	NCP>SET CIRCUIT ISA-0 SERVICE ENABLED
	NCP>SET CIRCUIT ISA-0 STATE ON
	NCP>EXIT
	$
	$ COPY kacat_v20_1.sys MOM$LOAD:.
	$

	*** On Client System ***
	>>>b/100 eza0
	(BOOT/R5:100 EZA0)
	2..
	Bootfile: kacat_v20_1
	-EZA0
	1..0..
	FEPROM update program
	---CAUTION---
	--- Executing this program will change your current FEPROM ---
	Do you want to continue [Y/N] ? : y
	Blasting in V2.0-1. The program will take at most several minutes.
	DO NOT ATTEMPT TO INTERRUPT PROGRAM EXECUTION
	Doing so may result in loss of operable state !!!
	+----------------------------------------+
	10...9...8...7...6...5...4...3...2...1...0
	FEPROM Programming successful
	?06 HLT INST
	PC = 00008E24
	>>> cycle power

	VERIFICATION:
	Set DSSI ID on multi-node cluster as required and verify there are no conflicts.

	LARS INFORMATION: (Supplied by MCS)
	Attention Service Personnel: Begin the comment field of your LARS with the word "BLITZ" when you perform an activity associated with a BLITZ Type "Service Action Requested".

	*** DIGITAL INTERNAL USE ONLY ***
	\\ GRP=TIME_DEPENDENT CAT=HARDWARE DB=CSSE_TIME_CRITICAL
	\\ TYPE=KNOWN_PROBLEM TYPE=BLITZ STATUS=CURRENT PROD=DEC4000-XXX PROD=4XXX
294.5		TOSSUB::BRUSA		`Fri Apr 11 1997 10:13`	14
	*** Answer to .3: The DSSI IDs are different. On the first CPU, DSSI 0 has ID 7 and DSSI 1 has ID 5; on the second CPU, the DSSI IDs are 6 and 4; the HSD30 has DSSI ID 1. No internal or external RFxx drives are connected. The firmware version of the ka52a is 2.4, with vmb version 2.15. *** Answer to .4: The two systems are 4000 model 100a. Livio Brusa
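The "no duplicates on the bus" rule from .3 can be expressed as a small check. This is only an illustrative sketch, not a DEC tool: the function name and the device labels are made up for the example, and the IDs are the ones quoted in .5 (bus #0 cabling) and the symptom described in the blitz in .4. It assumes the usual DSSI node ID range of 0 to 7.

```python
from collections import Counter


def find_dssi_id_conflicts(bus_ids):
    """Return the sorted DSSI node IDs that appear more than once on one bus.

    bus_ids maps a device label (hypothetical names) to its DSSI node ID.
    """
    for label, node_id in bus_ids.items():
        # Assumption: valid DSSI node IDs are 0 through 7.
        if not 0 <= node_id <= 7:
            raise ValueError(f"{label}: DSSI node ID {node_id} is outside 0-7")
    counts = Counter(bus_ids.values())
    return sorted(node_id for node_id, n in counts.items() if n > 1)


# IDs reported in .5 for the devices sharing DSSI bus #0.
reported = {"cpu1_dssi0": 7, "cpu2_dssi0": 6, "hsd30": 1}

# Symptom from the blitz in .4: both hosts come up as ID 7 regardless of
# what was set at the console.
blitz_symptom = {"cpu1_dssi0": 7, "cpu2_dssi0": 7, "hsd30": 1}

print(find_dssi_id_conflicts(reported))       # []
print(find_dssi_id_conflicts(blitz_symptom))  # [7]
```

The check only covers what the console reports. As .6 and .10 point out, the IDs the running system actually uses can differ, so they still need to be confirmed from the booted system.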
294.6	Later VMS required.	BRUNEL::KIRBY		`Fri Apr 11 1997 15:20`	21
	I still suspect a DSSI ID problem. From memory, when the 4100 firmware was updated to include the settable DSSI IDs there was a blitz of some sort (that I can't find) that said that VMS was not yet coded to use the new method. Note .4 is the same issue in reverse .... VMS is now looking for the V2 console. Perhaps it is VMS V5.5-2 that doesn't support the new method??? So I've just dug out of the cupboard and read the "BA42B Based system DSSI Upgrade Manual", EK-500AA-UP. This says that VMS 5.5-2H4 is needed, so I would suggest you give that a try. Also there is a STARS article that says the same, minimum of V5.5-2H4 or V6.1 to support the KFDDA-A (the dual-DSSI module). Steve.
294.7	upgraded to vms 5.5.2H4	TOSSUB::BRUSA		`Wed May 14 1997 13:41`	9
	I'm sorry for the late response, but the customer upgraded to VMS 5.5-2H4 only last weekend, and the problem is still present. Does anyone have any suggestions? Thank you, Livio Brusa
294.8	Same root ?	STOWKS::SLUIS	Hans van Sluis - StorageWorks Engineering Support Europe- DTN 889 9526	`Fri May 16 1997 13:04`	10
	Livio, This may sound stupid, but we're not booting from the same system root, are we? Did you set up your bootflags on node A as 0,1 and on node B as 1,1 (or as 0,0 and 1,0)? This will cause node A to boot from SYS0 and node B from SYS1. Hans
294.9	different root	TOSUP1::BRUSA		`Mon May 19 1997 05:17`	7
	Hi Hans, The two systems boot from two different roots; the R5 registers are set to 00000000 and 10000000. Livio Brusa
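As background to the exchange in .8 and .9: on VAX, the top four bits of the R5 boot flags (bits 31:28) select the system root, so the two values quoted above correspond to SYS0 and SYS1. A minimal sketch of that decoding follows; the function name is hypothetical and the snippet only illustrates the bit arithmetic.

```python
def system_root_from_r5(r5_hex):
    """Return the system root number encoded in bits 31:28 of an R5 value.

    r5_hex is the register value as a hexadecimal string, e.g. "10000000".
    """
    r5 = int(r5_hex, 16)
    if not 0 <= r5 <= 0xFFFFFFFF:
        raise ValueError("R5 must fit in 32 bits")
    return (r5 >> 28) & 0xF


# The two R5 values reported in .9.
for r5 in ("00000000", "10000000"):
    print(r5, "-> SYS%X" % system_root_from_r5(r5))
# 00000000 -> SYS0
# 10000000 -> SYS1
```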
294.10	help	SSDEVO::ASTOR	Subsystems Engineering Support	`Thu May 22 1997 18:29`	37
	Hi Livio, I'll just cover the stuff from the controller/DSSI side and leave the VMS stuff to Kevin and others. First please see the blitz I wrote in note 114. I know that you have drawn that you are using DSSI bus 0, but it might help anyway. To make sure you don't have a DSSI conflict, bring up one system, do a $ show cluster/cont, then add rport. Shut this system down and bring up the other one and do the same thing. Then, you'll know for sure that console/openvms/hsd30 isn't tricking you. Notice I didn't single anyone out here :). While the system is up and running, run vtdpy on the HSD30. You should get a few NOR's (no response) per second, because the controller is polling for DSSI ID's that don't exist on the bus. NAK's should be 0. If NAK's are not 0, then there is a possible bus integrity problem. I would recommend putting the HSD30 on the end of the bus with an external terminator. Also, in that configuration, you could try DSSI bus 1 (see note 114). The best part is you get to see the term pwr light on the terminator to make sure you have termination power. I know this sounds strange, but I have in my possession a terminator that is bad and causes very similar symptoms. I keep it around for training purposes :). Could also be a bad cable or no termination. This DSSI is robust stuff, things can be marginal and if the length is short, it will work anyway. If you could provide an errorlog it would be helpful, as well as a SHOW THIS command from the controller. Good luck, Kurt
294.11	Problem solved	TOSUP1::BRUSA		`Tue Jun 03 1997 05:49`	18
	Hi, I'm sorry for the delay in updating, but I have been out of the office for some weeks. For some reason that I don't know, the customer did not check whether the configuration worked correctly after the update from version 5.5-2 to version 5.5-2H4. I'm very furious with the customer, because when I asked him whether the problem had been corrected, the answer was "the problem still exists". Last week I went to the customer site and booted the two systems in a cluster without any problem. I apologize for the mistake and thank everybody. Regards to all, Livio Brusa