Title: ase
Moderator: SMURF::GROSSO
Created: Thu Jul 29 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2114
Total number of notes: 7347
Hello all, I cross-posted this to the AdvFS conference as well.
Does anyone know of a patch for this?
I just got off the phone with a customer whose entire ASE cluster had crashed.
Each system had crashed with this error:
ftx_bfmeta_rec_redo: got bmt page N1 instead of N2
I found a note in comet that related this to a bug in simport.
As far as I can tell from the note, simport is the driver for the KZPSA.
The note also indicated that there was a patch for this. I found patches
for every version except 4.0B, which is the version we are running.
The comet entry led me to suspect a corrupted domain, and I did find one.
I deleted and recreated the domain and was able to get the cluster back up.
As best I can tell, this is what happened:
Node 1 corrupted the domain and crashed. ASE tried to fail the service
over to node 2. Node 2 tried to mount the domain and crashed.
ASE then switched the service to node 3, which crashed the same way. And there
you have it: three dead cluster members.
I have also noticed a large number of CAM errors in the members' logs
that may or may not be related. I can't tell for sure, because the HSZs
had to be reset after the system crashes before any node could see the
shared disks.
Here is the setup: three 4100s with three KZPSAs connected to three HSZ50s,
one HSZ50 on each KZPSA. TruCluster Available Server 1.4, Digital UNIX 4.0B.
As far as I can tell, all systems are running the current firmware versions.