
Conference decwet::winnt-clusters

Title:WinNT-Clusters
Notice:Info directories moved to DECWET::SHARE1$:[NT_CLSTR]
Moderator:DECWET::CAPPELLOF
Created:Thu Oct 19 1995
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:863
Total number of notes:3478

590.0. "Problems with RAIDArray 310" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Thu Jan 30 1997 15:14

    Hi,
    
    After upgrading an AlphaServer 4100 cluster from two simple shared
    disks to a RAIDArray 310 presenting two logical disks (a two-disk
    mirrorset and a four-disk RAID 5/3 set, plus a spare disk), manual
    failover sometimes fails. The error message we get is: 
      Unknown error code 3254846469 (sev=3, fac=0x201, id=0x405)
      CliFmMan TransferGroup=-1040120827
    After this error the failover group (the RAID 5/3 set) cannot be brought
    on-line again from either system (same error message as above). After
    rebooting both systems the cluster looks all right.
    
    NT version:      3.51 with service pack 5
    Digital Clusters for Windows NT 1.0 with service pack 1
    Upgraded SCSI driver (from StorageWorks Nijmegen)
    
    Is this a known problem? Is a solution available?
    
    Cor
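
    [Aside on that error code: if the decimal value is read as a 32-bit
    NTSTATUS-style status (severity in bits 30-31, facility in bits 16-27,
    code in bits 0-15), it decodes exactly to the sev/fac/id fields shown
    above, and the same bit pattern printed as a signed 32-bit integer
    gives the TransferGroup value. The sketch below assumes that layout;
    it is only an illustration, not part of the cluster software.]

        # Sketch: decode the status value quoted in .0, assuming the
        # NTSTATUS-style field layout described above.
        import ctypes

        status = 3254846469                    # "Unknown error code" in .0

        sev    = (status >> 30) & 0x3          # -> 3
        fac    = (status >> 16) & 0xFFF        # -> 0x201
        code   = status & 0xFFFF               # -> 0x405
        signed = ctypes.c_int32(status).value  # -> -1040120827 (TransferGroup)

        print("status=0x%08X sev=%d fac=0x%X id=0x%X signed=%d"
              % (status, sev, fac, code, signed))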
590.1. "disk not found" by DECWET::LEES (Will, NTSG DECwest, Seattle) Thu Jan 30 1997 19:29
Did you remove the old disk objects from the group before doing the upgrade?
Please verify in the Cluster Administrator that the old disk objects are not
still referenced in the group.

Do you still have the problem after the reboot, or has it gone away?

Will

590.3. "interesting!" by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Fri Jan 31 1997 13:41

   Hmmm, I have the exact same symptoms on an HSZ40, and the STEAM service
   was running as well.

   Question: Do you have HSZDISK 2.51 installed? Also, set 
   the FMlog verbosity to 6 to acquire more info in the fmlog
   when the problem occurs (see the admin guide for info on
   how to do this).


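    [The FMlog verbosity mentioned in .3 is described in the admin guide;
    the exact setting is not given in this thread. As a purely illustrative
    sketch, assuming it is a DWORD registry value, it could be changed as
    shown below; the key path and value name here are hypothetical
    placeholders, not the documented ones, so check the admin guide for
    the real location before changing anything.]

        # Hypothetical sketch only: set a log-verbosity DWORD to 6.
        # KEY and VALUE are PLACEHOLDERS, not the product's real names.
        import winreg

        KEY   = r"SOFTWARE\PlaceholderVendor\Cluster\Parameters"
        VALUE = "FmLogVerbosity"

        with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY, 0,
                                winreg.KEY_SET_VALUE) as key:
            winreg.SetValueEx(key, VALUE, 0, winreg.REG_DWORD, 6)
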
590.4. "RAIDArray 310 still fails at manual failover" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Fri Jan 31 1997 14:28
    After stopping the STEAM services the error mentioned in .0 occurred
    again, i.e. the first manual failover works, but the second manual
    failover (failback) fails. When no client is accessing a shared disk
    the problem has not occurred (so far).
    
    We reinstalled the cluster software, but the error is still there.
    
    The RAIDArray disks show the following information at boot time:
    "HSZ20 V30Z" (firmware revision?). Is this supported by the cluster
    software?
    
    Cor
     
590.5. "Yes HSZDISK 2.51 installed" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Fri Jan 31 1997 15:42
    Reply to .3
    
    Yes, HSZDISK V2.51 is installed. The AlphaBIOS version is 5.21.
590.6. by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Fri Jan 31 1997 15:44

    Yeah, V3.0 is supported but you should upgrade to patch level 2.

    
590.7. "please tell me where I can get V3.0 patch level 2" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Fri Jan 31 1997 15:56
    
    Can you please tell me where I can get V3.0 patch level 2?
    
    Cor
    
590.8. "note 495.6" by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Fri Jan 31 1997 16:14
590.9. ">>set this id=(0,..)" by COPCLU::JTHOMSEN Sat Feb 01 1997 12:23
    Hi!
    
    Had a similar problem with an HSZ40 controller where one could do a
    failover but could not fail back again and had to reboot instead.
    In the HSZ40 there were no IDs set, so after
    HSZ40> set this id=(0,1,2,3) the problem seems to be solved. Maybe you
    have to do the same on your controller?
    
    Regards
    
    Jan Thomsen
    MCS Denmark
    
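    [For readers following .9: a sketch of the console sequence, assuming
    the usual StorageWorks HSZ CLI; check the firmware documentation for
    your controller, since target-ID changes normally take effect only
    after the controller is restarted.]

        HSZ> SHOW THIS_CONTROLLER
        HSZ> SET THIS_CONTROLLER ID=(0,1,2,3)
        HSZ> RESTART THIS_CONTROLLER
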
590.10. "Where does that 3rd disk come from????" by IJSAPL::ONDERWATER (Cor Onderwater @UTO) Sun Feb 02 1997 19:34
   Hi,
    See the last part of the cluster trace file:
    -----Start
    16:49:04.627 tid=170 Step II: Identifying shared devices by
      probing Dos physical drives
    16:49:04.665 tid=170 Defined Dos Device: 
      PhysicalDrive3 ==> \Device\Harddisk3\Partition0
    16:49:04.710 tid=170 The Cluster Disk Driver is not 
      attached to device PhysicalDrive3.
    This could be because the cluster disk driver, CluDisk, 
      is not installed,
    or because some other driver has already attached to 
      this device.
    Please verify that any disk filter drivers start after 
      the CluDisk driver.
       File: E:\CLUBUILD.351\src\fm\fmdisk\device.c    Line: 321
    16:49:04.800 tid=170 The Failover Manager encountered an 
      error or exception
    while invoking a method function.
    The Online operation on object FMDisk\_disk_0eb1e35b failed.
        File: E:\CLUBUILD.351\src\fm\fmcore\fmgroup.c    Line: 1446
    16:49:04.882 tid=170 No such disk.
        File: E:\CLUBUILD.351\src\fm\fmcore\fmgroup.c    Line: 1447
    16:49:04.965 tid=170 Putting group "Groep2" Offline
    16:49:06.765 tid=170 The cluster manager has put 
      group Groep2 OFF LINE
    on this system. Reason: Administrator request.
        File: E:\CLUBUILD.351\src\fm\fmcore\fmgroup.c    Line: 1701
    ------ end trace
    
    It looks as if a third disk is discovered, while the RAIDArray 310 only
    offers two. This happens when a manually failed-over group is manually
    failed back. 
    Another thing: the clocks on the two systems differ by one hour. Can
    this lead to problems?
    
    Cor
    
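    [On the "Step II" probing in the trace above: the Failover Manager is
    walking the \\.\PhysicalDriveN device names to find shared disks. The
    sketch below only illustrates that kind of probe; it is not the
    Failover Manager's own code, and opening the raw devices needs
    administrator rights.]

        # Sketch: probe \\.\PhysicalDriveN names, as in "Step II" above.
        # Illustration only; not the Failover Manager's code.
        def probe_physical_drives(max_drives=8):
            found = []
            for n in range(max_drives):
                name = r"\\.\PhysicalDrive%d" % n
                try:
                    with open(name, "rb"):      # needs administrator rights
                        found.append(name)
                except OSError:
                    pass                        # no such drive, or inaccessible
            return found

        print(probe_physical_drives())
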
590.11. by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Mon Feb 03 1997 17:23

         <<< Note 590.10 by IJSAPL::ONDERWATER "Cor Onderwater @UTO" >>>
                  -< Where does that 3rd disk come from???? >-

    Read note 390.3 and the release notes for the cause of the 
    'phantom disk'.

    Also, the problem you're seeing (and I'm seeing) with a manual
    failover and open files seems to be the root of the error message
    that you noted in reply .0.

    More later,
590.12. by MPOS01::naiad.mpo.dec.com::mpos01::cerling (I'm@witz.end) Tue Feb 04 1997 13:31
	I would guess our position to be that of waiting to see what 
	Microsoft does.  Since our NT Clusters product only supports
	StorageWorks, and no other vendor, I cannot see a monetary reason
	for Digital to support EMC for Digital's clustering.  Microsoft
	might feel differently for the follow-on Wolfpack.  I doubt that
	it will be there for V1.0 of Wolfpack, either.

	Get your StorageWorks guy in there.  Maybe if they really want
	clusters, they will bite the bullet and give StorageWorks a toe
	in the door in order for them to run clusters.  Then they might
	realize they are paying too much for EMC when they can get what
	they want from StorageWorks.  Pitch the benefits of clusters, but
	allow the StorageWorks guy to counter any perceived shortcomings of
	StorageWorks when compared to EMC.

tgc
590.13. by MSE1::PCOTE (Rebuilt NT: 163, Rebuilt VMS:1) Tue Feb 04 1997 13:42

  I have logged the problem referenced by the base note. BTW, this
  has nothing to do with the SW310.

  WRT EMC storage, Microsoft will provide a hardware qualification
  suite via the HCT. Hardware vendors such as EMC will need to pass
  this qualification suite to get the NT cluster (Wolfpack) stamp of
  approval. 

  I'm sure EMC will pursue this. I'm sure Digital will not bother
  to qualify EMC storage for our (short-lived) NT cluster product.

  Paul