
Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

1887.0. "net failures, effect on ASE" by LEXSS1::GINGER (Ron Ginger) Mon Feb 17 1997 13:13

    I have some questions on network failures and the action of ASE.
    
    Assume a simple 2-node ASE, doing an NFS service. I understand the
    system pings over both the SCSI bus and the net. Suppose the net
    card on the system running a service fails. The net ping will fail.
    
    
    1) Does the system fail over when the net ping fails?
    2) How do we know which net card failed? Maybe the card on the other
    machine failed, and the one running the service is fine and sees the
    net just fine. Won't we fail over and then find out we have really
    killed the service?
    
    I don't recall this being covered in the theory of operation doc. Is it
    covered anywhere?
1887.1. "/var/ase/lib/ni_status_awk script." by BACHUS::DEVOS (Manu Devos DEC/SI Brussels 856-7539) Tue Feb 18 1997 07:19
Hi Ron,

If the system running the service detects that the card itself is broken,
then control is given to the ni_status_awk script; otherwise ASE simply
concludes that the network is partitioned, in which case it kills the
asedirector so that no subsequent failover is possible for any cause.

ASE can tell whether the card is down because it checks the "SEND" hardware
counter every time it sends a packet. If this counter is not changing, the
card is down; otherwise the problem is in the network or the other system.
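A minimal Python sketch of the check described above, assuming a driver that exposes the hardware SEND counter. The hooks `read_send_counter` and `send_packet`, the `FakeCard` test double, and the outcome strings are hypothetical illustrations, not ASE interfaces; the sketch only shows the decision: a counter that does not advance means the local card is down, otherwise the fault lies in the network or the other system.

```python
from typing import Callable


def card_is_down(read_send_counter: Callable[[], int],
                 send_packet: Callable[[], None]) -> bool:
    """True if the interface's hardware SEND counter did not move.

    read_send_counter and send_packet are hypothetical hooks standing in
    for whatever access ASE has to the driver; they are not ASE APIs.
    """
    before = read_send_counter()
    send_packet()                            # e.g. the ASE network ping
    return read_send_counter() == before     # no change => card treated as down


def on_net_ping_failure(local_card_down: bool) -> str:
    """Outcome of a failed network ping, as described in this reply."""
    if local_card_down:
        return "invoke ni_status_awk"        # local card is broken
    # Card is fine, so the fault is in the network or the other system.
    return "partitioned network: kill asedirector (no further failover)"


class FakeCard:
    """Toy interface: a working card bumps its SEND counter on each packet."""

    def __init__(self, working: bool) -> None:
        self.working = working
        self.sent = 0

    def counter(self) -> int:
        return self.sent

    def send(self) -> None:
        if self.working:
            self.sent += 1


if __name__ == "__main__":
    for card in (FakeCard(working=True), FakeCard(working=False)):
        down = card_is_down(card.counter, card.send)
        print(card.working, down, on_net_ping_failure(down))
```

Run as a script, the working card yields the partitioned-network outcome and the dead card hands off to ni_status_awk. A real counter would also advance with unrelated traffic; this sketch ignores that.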

When the ni_status_awk script is invoked on a system, its default behaviour
is to "DISCONNECT" the system from ASE when ALL of the system's "monitored"
networks are DOWN, so that its services fail over.

You can now change this default behaviour to do what you want.
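The real ni_status_awk is an awk script; the Python below only models its default policy and one possible customisation, not the script's actual inputs or interface. The mapping from interface name to up/down, the "STAY" label, treating an empty mapping as "do nothing", and the `strict_ni_status_action` variant are all hypothetical assumptions for illustration.

```python
from typing import Mapping


def default_ni_status_action(monitored_up: Mapping[str, bool]) -> str:
    """Default policy: DISCONNECT only when ALL monitored networks are DOWN.

    monitored_up maps an interface name to True if that network is up.
    Treating an empty mapping as "nothing to act on" is an assumption.
    """
    if monitored_up and not any(monitored_up.values()):
        return "DISCONNECT"   # every monitored network is down: services fail over
    return "STAY"             # hypothetical label: at least one network is still up


def strict_ni_status_action(monitored_up: Mapping[str, bool],
                            required: str) -> str:
    """Hypothetical customisation: disconnect as soon as one chosen network is down."""
    if required in monitored_up and not monitored_up[required]:
        return "DISCONNECT"
    return "STAY"


if __name__ == "__main__":
    # Interface names are examples only.
    status = {"tu0": False, "tu1": True}
    print(default_ni_status_action(status), strict_ni_status_action(status, "tu0"))
```

With one of two monitored networks down, the default policy stays connected while the stricter variant disconnects; which behaviour is right depends on which network carries the clients.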

Hope it helps, Manu.
1887.2. by LEXSS1::GINGER (Ron Ginger) Tue Feb 18 1997 11:47
    Thanks, but I'm still confused. Let's try a diagram.
    
    -----------              -----------
    |         |              |         |
    | host A  |              | host B  |
    -----------              -----------
       |                        |
       -----------|  |----------|
                  |  |
              -----------
              | net hub |
              -----------
                  |------------> to "the world"
    
    Suppose A is serving NFS, and someone trips over the wire between host A
    and the hub, or the hub card for host A fails. ASE will not be able to
    ping between A and B over the network.
    
    I would expect ASE to fail the service over to B; is this correct?
    
    If it was B's wire or hub card that had broken, then this would be bad,
    since A is still connected to the hub and running fine. How does ASE
    handle this?
1887.3. "2 paths..." by TROOA::MSCHNEIDER (martin.schneider@tro.mts.dec.com) Wed Feb 19 1997 12:07
    ASE uses both the network interface and the shared SCSI bus to ping the
    other host. So the failure of one interface should not prevent the
    hosts from determining the state of the other host via the alternate
    path.
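The two-path idea can be sketched as below, assuming each ping reduces to an answered/not-answered flag. `PeerStatus` and `assess_peer` are hypothetical names, not ASE's; the point is only that one answering path is enough to show the other host is alive, so a single failed interface does not by itself make the peer look dead.

```python
from typing import NamedTuple, Tuple


class PeerStatus(NamedTuple):
    alive: bool                     # peer answered on at least one path
    failed_paths: Tuple[str, ...]   # paths on which the ping got no answer


def assess_peer(net_ping_ok: bool, scsi_ping_ok: bool) -> PeerStatus:
    """Combine the network and shared-SCSI pings into one view of the peer."""
    results = (("net", net_ping_ok), ("scsi", scsi_ping_ok))
    failed = tuple(path for path, ok in results if not ok)
    # A single answering path is enough to show the other host is up.
    return PeerStatus(alive=net_ping_ok or scsi_ping_ok, failed_paths=failed)
```

For Ron's scenario, `assess_peer(False, True)` reports the peer alive with only the "net" path failed, which tells ASE the other host is up but not where the network fault is.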
1887.4. by LEXSS1::GINGER (Ron Ginger) Wed Feb 19 1997 12:26
    OK, in my example both hosts are fine, but something in the net is broken
    that prevents B from getting to the net. Net pings will fail. Will ASE
    decide to stop the service on A and fail it over to B, even though A is
    still reaching the rest of the net just fine? 
    
    Has it been considered to have ASE ping some other 'independent' party
    out on the net to decide if it can reach real net clients?
1887.5. "NO FAILOVER in this case !!!" by BRSDVP::DEVOS (Manu Devos DEC/SI Brussels 856-7539) Wed Feb 19 1997 19:44
    No Ron,
    
    As I explained in .1, the network card on A (and on B) is OK, as proved
    by the "SEND" hardware counter changing with each packet. So the
    ASEDIRECTOR is killed and NO failover will occur, either NOW or LATER.
    
    In this particular case, ASE is NOT able to tell whether the ping is
    failing because of a problem with system A's cable, system A's or
    system B's hub card, system B's cable, or system B's network card. So
    ASE simply prevents itself from failing any service over to the other
    host for as long as this problem persists.
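The reasoning in this reply can be modelled as a small fault-localisation check. This is only an illustration of the argument, not ASE code: the component names are taken from the list above, the A/B side assignment comes from the diagram in .2, and `SIDE` and `faulty_side` are hypothetical names.

```python
from typing import Dict, FrozenSet, Optional

# Which host's side of the network each suspect component belongs to
# (component list from this reply, sides from the diagram in .2).
SIDE: Dict[str, str] = {
    "A cable": "A",
    "A hub card": "A",
    "B hub card": "B",
    "B cable": "B",
    "B network card": "B",
}


def faulty_side(suspects: FrozenSet[str]) -> Optional[str]:
    """Return "A" or "B" if every suspect is on one side, else None."""
    sides = {SIDE[s] for s in suspects}
    # More than one side (or none) means the fault cannot be pinned to a host.
    return sides.pop() if len(sides) == 1 else None
```

With all five listed components still suspected, `faulty_side(frozenset(SIDE))` returns `None`: the candidates lie on both A's and B's side, so neither host can be identified as the one that is cut off, and moving the service could just as well move it onto the isolated host.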
    
    Is that clear?
    
    Manu.
1887.6. by USCTR1::ASCHER (Dave Ascher) Thu Feb 20 1997 10:41
    I think it would be useful to directly address Ron's last inquiry.
    There was some ambiguity in the .5 response.
    
                   <<< Note 1887.4 by LEXSS1::GINGER "Ron Ginger" >>>

    OK, in my example both hosts are fine, but something in the net is broken
    that prevents B from getting to the net. Net pings will fail. Will ASE
    decide to stop the service on A and fail it over to B, even though A is
    still reaching the rest of the net just fine? 
    
       To clarify: you are saying that "A" has NO net problems
       (good net card, good cable, can talk to other nodes but not
       B), while B cannot reach the net because either its net card
       is broken or its cable is out.
       
       It would be a definite design bug if ASE decided to fail over
       under this scenario. In the tests that I have run at customer
       sites (only pulling cables out, not failing the net interface
       card), ASE does the right thing. It does NOT fail the service
       over because of a net problem on a node that is NOT running it.
    
        
    Has it been considered to have ASE ping some other 'independent' party
    out on the net to decide if it can reach real net clients?