
Conference decwet::winnt-clusters

Title:WinNT-Clusters
Notice:Info directories moved to DECWET::SHARE1$:[NT_CLSTR]
Moderator:DECWET::CAPPELLOF
Created:Thu Oct 19 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:863
Total number of notes:3478

590.0. "Problems with RAIDArray 310" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Thu Jan 30 1997 15:14

    Hi,
    
    After upgrading an AlphaServer 4100 cluster from two simple shared
    disks to a RAIDArray 310 presenting two logical disks (a two-disk
    mirrorset and a four-disk RAID 5/3 set, plus a spare disk), manual
    failover sometimes fails. The error message we get is: 
      Unknown error code 3254846469 (sev=3, fac=0x201, id=0x405)
      CliFmMan TransferGroup=-1040120827
    After this error the failover group (the RAID 5/3 set) cannot be brought
    on-line again from either system (same error message as above). After
    rebooting both systems the cluster looks all right.
    
    NT version:      3.51 with service pack 5
    Digital Clusters for Windows NT 1.0 with service pack 1
    Upgraded SCSI driver (from StorageWorks Nijmegen)
    
    Is this a known problem? Is a solution available?
    
    Cor
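
    [Aside on that error code: if the decimal value is read as a 32-bit
    NTSTATUS-style status (severity in bits 30-31, facility in bits 16-27,
    code in bits 0-15), it decodes exactly to the sev/fac/id fields shown
    above, and the same bit pattern printed as a signed 32-bit integer
    gives the TransferGroup value. The sketch below assumes that layout;
    it is only an illustration, not part of the cluster software.]

        # Sketch: decode the status value quoted in .0, assuming the
        # NTSTATUS-style field layout described above.
        import ctypes

        status = 3254846469                    # "Unknown error code" in .0

        sev    = (status >> 30) & 0x3          # -> 3
        fac    = (status >> 16) & 0xFFF        # -> 0x201
        code   = status & 0xFFFF               # -> 0x405
        signed = ctypes.c_int32(status).value  # -> -1040120827 (TransferGroup)

        print("status=0x%08X sev=%d fac=0x%X id=0x%X signed=%d"
              % (status, sev, fac, code, signed))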
590.1. "disk not found" by DECWET::LEES (Will, NTSG DECwest, Seattle) Thu Jan 30 1997 19:29
Did you remove the old disk objects from the group before doing the upgrade?
Please verify in the Cluster Administrator that the old disk objects are not
still referenced in the group.

Do you still have the problem after the reboot, or has it gone away?

Will

590.3. "interesting!" by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Fri Jan 31 1997 13:41

   Hmmm, I have the exact same symptoms on an HSZ40, and the STEAM service
   was running as well.

   Question: Do you have HSZDISK 2.51 installed? Also, set 
   the FMlog verbosity to 6 to acquire more info in the fmlog
   when the problem occurs (see the admin guide for info on
   how to do this).


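    [The FMlog verbosity mentioned in .3 is described in the admin guide;
    the exact setting is not given in this thread. As a purely illustrative
    sketch, assuming it is a DWORD registry value, it could be changed as
    shown below; the key path and value name here are hypothetical
    placeholders, not the documented ones, so check the admin guide for
    the real location before changing anything.]

        # Hypothetical sketch only: set a log-verbosity DWORD to 6.
        # KEY and VALUE are PLACEHOLDERS, not the product's real names.
        import winreg

        KEY   = r"SOFTWARE\PlaceholderVendor\Cluster\Parameters"
        VALUE = "FmLogVerbosity"

        with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY, 0,
                                winreg.KEY_SET_VALUE) as key:
            winreg.SetValueEx(key, VALUE, 0, winreg.REG_DWORD, 6)
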
590.4. "RAIDArray 310 still fails at manual failover" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Fri Jan 31 1997 14:28
    After stopping the STEAM services the error mentioned in .0 occurred
    again, i.e. the first manual failover works, but the second manual
    failover (failback) fails. When no client is accessing a shared disk
    the problem has not occurred (so far).
    
    We reinstalled the cluster software, but the error is still there.
    
    The RAIDArray disks show the following information at boot time:
    "HSZ20 V30Z" (firmware revision?). Is this supported by the cluster
    software?
    
    Cor
     
590.5. "Yes HSZDISK 2.51 installed" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Fri Jan 31 1997 15:42
    Reply to .3
    
    Yes, HSZDISK V2.51 is installed. The AlphaBIOS version is 5.21.
590.6. by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Fri Jan 31 1997 15:44

    Yeah, V3.0 is supported but you should upgrade to patch level 2.

    
590.7. "please tell me where I can get V3.0 patch level 2" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Fri Jan 31 1997 15:56
    
    Can you please tell me where I can get V3.0 patch level 2?
    
    Cor
    
590.8. "note 495.6" by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Fri Jan 31 1997 16:14
590.9. ">>set this id=(0,..)" by COPCLU::JTHOMSEN Sat Feb 01 1997 12:23
    Hi!
    
    Had a similar problem with an HSZ40 controller where one could do a
    failover but could not fail back again and had to reboot instead.
    In the HSZ40 there were no IDs set, so after
    HSZ40> set this id=(0,1,2,3) the problem seems to be solved. Maybe you
    have to do the same on your controller?
    
    Regards
    
    Jan Thomsen
    MCS Denmark
    
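    [For readers following .9: a sketch of the console sequence, assuming
    the usual StorageWorks HSZ CLI; check the firmware documentation for
    your controller, since target-ID changes normally take effect only
    after the controller is restarted.]

        HSZ> SHOW THIS_CONTROLLER
        HSZ> SET THIS_CONTROLLER ID=(0,1,2,3)
        HSZ> RESTART THIS_CONTROLLER
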
590.10. "Where does that 3rd disk come from????" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Sun Feb 02 1997 19:34
   Hi,
    See the last part of the cluster trace file:
    -----Start
    16:49:04.627 tid=170 Step II: Identifying shared devices by
      probing Dos physical drives
    16:49:04.665 tid=170 Defined Dos Device: 
      PhysicalDrive3 ==> \Device\Harddisk3\Partition0
    16:49:04.710 tid=170 The Cluster Disk Driver is not 
      attached to device PhysicalDrive3.
    This could be because the cluster disk driver, CluDisk, 
      is not installed,
    or because some other driver has already attached to 
      this device.
    Please verify that any disk filter drivers start after 
      the CluDisk driver.
       File: E:\CLUBUILD.351\src\fm\fmdisk\device.c    Line: 321
    16:49:04.800 tid=170 The Failover Manager encountered an 
      error or exception
    while invoking a method function.
    The Online operation on object FMDisk\_disk_0eb1e35b failed.
        File: E:\CLUBUILD.351\src\fm\fmcore\fmgroup.c    Line: 1446
    16:49:04.882 tid=170 No such disk.
        File: E:\CLUBUILD.351\src\fm\fmcore\fmgroup.c    Line: 1447
    16:49:04.965 tid=170 Putting group "Groep2" Offline
    16:49:06.765 tid=170 The cluster manager has put 
      group Groep2 OFF LINE
    on this system. Reason: Administrator request.
        File: E:\CLUBUILD.351\src\fm\fmcore\fmgroup.c    Line: 1701
    ------ end trace
    
    It looks as if a third disk is discovered, while the RAIDArray 310 only
    offers two. This happens when a manually failed-over group is manually
    failed back. 
    Another thing: the clocks on the two systems differ by one hour. Can
    this lead to problems?
    
    Cor
    
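    [On the "Step II" probing in the trace above: the Failover Manager is
    walking the \\.\PhysicalDriveN device names to find shared disks. The
    sketch below only illustrates that kind of probe; it is not the
    Failover Manager's own code, and opening the raw devices needs
    administrator rights.]

        # Sketch: probe \\.\PhysicalDriveN names, as in "Step II" above.
        # Illustration only; not the Failover Manager's code.
        def probe_physical_drives(max_drives=8):
            found = []
            for n in range(max_drives):
                name = r"\\.\PhysicalDrive%d" % n
                try:
                    with open(name, "rb"):      # needs administrator rights
                        found.append(name)
                except OSError:
                    pass                        # no such drive, or inaccessible
            return found

        print(probe_physical_drives())
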
590.11. by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Mon Feb 03 1997 17:23

         <<< Note 590.10 by IJSAPL::ONDERWATER "Cor Onderwater @UTO" >>>
                  -< Where does that 3rd disk come from???? >-

    Read note 390.3 and the release notes for the cause of the 
    'phantom disk'.

    Also, the problem you're seeing (and I'm seeing) with a manual
    failover and open files seems to be the root of the error message
    that you noted in reply .0.

    More later,
590.12. by MPOS01::naiad.mpo.dec.com::mpos01::cerling (I'm@witz.end) Tue Feb 04 1997 13:31
	I would guess our position to be that of waiting to see what 
	Microsoft does.  Since our NT Clusters product only supports
	StorageWorks, and no other vendor, I cannot see a monetary reason
	for Digital to support EMC for Digital's clustering.  Microsoft
	might feel differently for the follow-on Wolfpack.  I doubt that
	it will be there for V1.0 of Wolfpack, either.

	Get your StorageWorks guy in there.  Maybe if they really want
	clusters, they will bite the bullet and give StorageWorks a toe
	in the door in order for them to run clusters.  Then they might
	realize they are paying too much for EMC when they can get what
	they want from StorageWorks.  Pitch the benefits of clusters, but
	allow the StorageWorks guy to counter any perceived shortcomings of
	StorageWorks when compared to EMC.

tgc
590.13. by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Tue Feb 04 1997 13:42

  I have logged the problem referenced by the base note. BTW, this
  has nothing to do with the SW310.

  WRT EMC storage, Microsoft will provide a hardware qualification
  suite via the HCT. Hardware vendors such as EMC will need to pass
  this qualification suite to get the NT cluster (Wolfpack) stamp of
  approval. 

  I'm sure EMC will pursue this. I'm sure Digital will not bother
  to qualify EMC storage for our (short-lived) NT cluster product.

  Paul