
Conference smurf::ase

Title: ase
Moderator: SMURF::GROSSO
Created: Thu Jul 29 1993
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 2114
Total number of notes: 7347

1852.0. "Cluster Monitor Not Working" by NCMAIL::DOIRON () Thu Jan 30 1997 21:01

    I am installing and testing the following ASE configuration:
    
    2-AS1000A 5/400, each with 2 KZPSAs, 2 HSZ50s
    Digital UNIX 4.0B
    TruCluster Available Server 1.4
    
    
    Both systems were up and running as members of the cluster. I created
    two nfs services. Cluster monitor was configured and running properly.
    
    To test failover, I shut one of the systems down. The nfs service
    failed over correctly; however, the cluster monitor showed both systems
    as failed. When I clicked on the services icon, I got the following
    message:
    
    	"Have not received any ase reports"
    
    Meanwhile, asemgr showed the remaining system as up and running and I
    could nfs mount both services from a client. I rebooted the 2nd system
    and relocated one of the services back to it. Again, cluster monitor
    showed both systems as failed with the same message.
    
    Asemgr showed both systems as up and running and I could nfs mount both
    services. We relocated services back and forth and tested network
    outages. All worked as expected, but cluster monitor was still not
    working properly.
    
    We finally rebooted both systems. Both members were up and running
    and both nfs services were working properly, but cluster monitor still
    showed both systems as failed with the same message.
    
    Any takers?
    
    -Ron 
T.R     Title  User             Personal Name        Date                   Lines
1852.1         SMURF::MARSHALL  Rob Marshall - USEG  Fri Jan 31 1997 17:35  25
    Hi Ron,
    
    Any takers for what? :-)  If you are looking for someone to fix this
    bug, there are two issues.
    
    One: bugs in tractd that have been fixed, but no official patch has
    been released.
    
    Two: bugs in submon which allow cmon to display that a node is up when
    it has really been down for up to 8 minutes.  This typically happens
    when the node that got shut down was the director.  It takes submon 8+
    minutes to figure out that the director has gone away and start looking
    for the new one.
    
    I can point you to a patch for the tractd problem, but the submon
    problem has not been fixed.  You could open a CLD, let them know that I
    know (roughly) about the problem, and I will get the patch out to you.
    
    I know that saying this in the notes file will tend to raise the
    question in everyone's mind: why hasn't the official patch been
    released if a patch is available?  There are a number of reasons for
    that which I don't want to discuss in here.
    
    Rob Marshall
    USEG