
Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

1969.0. "local HSM: Error: xid_send: write failed" by BACHUS::DEVOS (Manu Devos DEC/SI Brussels 856-7539) Thu Mar 27 1997 09:50

    Hi,
    
    The above message appears in daemon.log when a monitored network
    goes down. The ni_status_ksh script has been customized to move the
    aliases from the down interface to an UP interface, along with the
    direct network route. Can Engineering explain the causes and
    consequences of this error? It is the primary network; does it
    behave differently on network failure?

    Thanks for shedding some light on this,
    Manu.
    
    The context is a dual 8400 with 4 tulip interfaces (netmask
    255.224.0.0)
    
    gandalf:  10.15.1.21	beorn:  10.15.1.22	netname: Ethernet1
    gandalf2: 10.79.1.21	beorn2: 10.79.1.22	netname: Ethernet2
    gandalf3: 10.143.1.21	beorn3: 10.143.1.22	netname: Ethernet3
    gandalf4: 10.207.1.21	beorn4: 10.207.1.22	netname: Ethernet4
    
    The aliases are:
    
    clinicom:  10.15.1.23
    clinicom2: 10.79.1.23
    clinicom3: 10.143.1.23
    clinicom4: 10.207.1.23
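
    As a quick check of the addressing above, here is a minimal sketch
    using Python's standard ipaddress module. It only uses the netmask
    and addresses listed in this note; the dictionary and function names
    are made up for illustration.

```python
import ipaddress

NETMASK = "255.224.0.0"  # netmask quoted in the note

# One address per network, taken from the gandalf column above.
ADDRESSES = {
    "Ethernet1": "10.15.1.21",
    "Ethernet2": "10.79.1.21",
    "Ethernet3": "10.143.1.21",
    "Ethernet4": "10.207.1.21",
}


def subnet_of(address, netmask=NETMASK):
    """Return the IPv4 network that `address` falls in under `netmask`."""
    return ipaddress.ip_interface(f"{address}/{netmask}").network


subnets = {name: subnet_of(addr) for name, addr in ADDRESSES.items()}

for name, net in subnets.items():
    print(name, net)
```

    With this mask (an 11-bit prefix) the four interfaces land on four
    distinct subnets, and each alias falls in the same subnet as the
    interface it is normally attached to.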
    
    Here is an extract of daemon.log (truncated). All the messages come
    from the local HSM (on the host where the interface is going down).
    ---------------------------------------------------------------------
    Inf: net interface tu0 10.15.1.22 change old = NETWORK_INTERFACE_UP
                                             new = NETWORK_INTERFACE_DOWN
    War: Network interface tu0 10.15.1.22 DOWN
    Inf: exec'ing with pipe: /var/ase/sbin/ase_run_sh 2604  
    Not: /var/.../ase_run_sh: ifconfig tu0 delete clinicom1
    Not: /var/.../ase_run_sh: ifconfig tu1 alias clinicom1
    Not: /var/.../ase_run_sh: Aliasname clinicom1 moved to tu1
    Not: /var/.../ase_run_sh: route delete -net -interface Ethernet1
    Not: /var/.../ase_run_sh: delete net Ethernet1: gateway 7Q
    Not: /var/.../ase_run_sh: route add -net -interface Ethernet1 10.79.1.22
    Not: /var/.../ase_run_sh: add net Ethernet1: gateway 10.79.1.22
    Not: /var/.../ase_run_sh: ifconfig tu0 down
    ALE: HSM_NI_STATUS:10.15.1.22:DOWN:10.79.1.22:UP
                      :10.143.1.22:UP:10.207.1.22:UP
    Inf: net path gandalf (10.15.1.21) state change old = PING_OK 
                               new = PING_NOT_OK_INTERFACE_UNKNOWN
    War: Can't ping gandalf over the network
    Inf: exec'ing with pipe: /var/ase/sbin/ase_run_sh 2741  
    Not: /var/.../ase_run_sh: change host 10.15.1.21: gateway 10.79.1.21
    Not: /var/.../ase_run_sh: change host 10.79.1.21: gateway 10.79.1.21
    Not: /var/.../ase_run_sh: change host 10.143.1.21: gateway 10.143.1.21
    Not: /var/.../ase_run_sh: change host 10.207.1.21: gateway 10.207.1.21
    ALE: HSM_PATH_STATUS:10.15.1.21:DOWN:10.79.1.21:UP
                        :10.143.1.21:UP:10.207.1.21:UP
    Error: xid_send: write failed
    last message repeated 3 times
    local Simulator Notice: snd: exiting...
    Error: xid_send: write failed
    last message repeated 13 times
    last message repeated 6 times
    --------------------------------------------------------------------
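
    For anyone post-processing such a log, here is a rough sketch of
    splitting the ALE status lines into address/state pairs. The field
    layout (a name followed by alternating address and state fields,
    separated by colons) is inferred only from the extract above and is
    an assumption, not a documented format; parse_status_alert is a
    made-up helper name.

```python
def parse_status_alert(alert):
    """Split an HSM_*_STATUS alert into (kind, {address: state}).

    Assumes the "ALE: " prefix is already removed and that a wrapped
    alert has been joined back into a single string.
    """
    fields = [f.strip() for f in alert.split(":") if f.strip()]
    kind, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError("expected alternating address/state fields")
    return kind, dict(zip(rest[0::2], rest[1::2]))


# The HSM_NI_STATUS alert from the extract, with its two lines joined.
alert = ("HSM_NI_STATUS:10.15.1.22:DOWN:10.79.1.22:UP"
         ":10.143.1.22:UP:10.207.1.22:UP")

kind, states = parse_status_alert(alert)
down = [addr for addr, state in states.items() if state == "DOWN"]
print(kind, down)
```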
    
    
T.R      Title      User      Personal Name      Date      Lines
1969.1    dust.zk3.dec.com::Marshall    Rob Marshall USEG    Thu Apr 03 1997 22:20    15
Hi Manu,

xid_send simply sends pings out the interface, using a DLI socket with the
interface's own address as the destination. For the network this is a
no-op, but it is used to see whether the network interface is still able
to send data. If you ifconfig the interface down, you will see this error
message. You will also see it if the send queue for the interface backs
up, or if some other problem occurs.

And, yes, the member name network is special, but you would get this error
message on any monitored interface if you ifconfig it down.

Hope this answers your question,

Rob Marshall
USEG
1969.2    Is the primary network different?    BACHUS::DEVOS    Manu Devos DEC/SI Brussels 856-7539    Fri Apr 04 1997 12:38    30
    Thanks for your reply, Rob.
    
    The problem I am facing is that the Primary network behaves differently
    from the others. My ni_status_ksh script moves the alias names from
    a bad interface to a good one and also changes the "direct" route of
    the network through the same good interface. To avoid a ping-pong effect
    if a cable connector gives intermittently bad contact, I place the
    bad interface "down", as you see. My system has 4 networks (tu0, tu1,
    tu2 and tu3), and when I remove the tu3 cable, tu0 receives the
    aliases previously owned by tu3 and the route is changed accordingly. I
    can then remove tu2 and all is OK. In the script, we defined a minimum
    number of interfaces that should be UP (2); otherwise I send the
    HSM_LOOK_DISCONNECTED message to HSM. So if 3 of the 4 networks are
    down, the host is disconnected and the service(s) are failed over.
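
    The threshold rule just described can be sketched as follows. This is
    only an illustration of the rule in Python, not the actual
    ni_status_ksh script; the function name and the dictionary layout are
    assumptions, and nothing here talks to HSM.

```python
MIN_UP = 2  # minimum number of interfaces that should be UP, per the note


def look_disconnected(states, min_up=MIN_UP):
    """Return True when the script would report the host as disconnected.

    `states` maps an interface name to "UP" or "DOWN".
    """
    up_count = sum(1 for state in states.values() if state == "UP")
    return up_count < min_up


# Two of four networks down: still enough interfaces UP.
two_down = {"tu0": "UP", "tu1": "UP", "tu2": "DOWN", "tu3": "DOWN"}

# Three of four networks down: below the minimum, so disconnected.
three_down = {"tu0": "UP", "tu1": "DOWN", "tu2": "DOWN", "tu3": "DOWN"}

print(look_disconnected(two_down), look_disconnected(three_down))
```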
    
    Now, here is the problem. If I remove the cables of tu1, tu2 and tu3,
    then the disconnection is done. So far so good. If I remove tu0 first
    and then tu1 and tu2, the script reacts as it should, i.e. it sends
    the HSM_LOOK_DISCONNECTED message to HSM, but there is no reply from
    the "snd" command and no failover !!! So it appears that tu0 (primary)
    does not behave like the other networks. Also, I discovered that asemgr
    cannot be invoked from the system where tu0 cable is removed. So, the
    question of the original note: Is the primary network needed for
    certain operations that the route change caused by the path_status_awk 
    script does not cover?
    
    Thanks again, 
    
    Manu.
    
1969.3    dust.zk3.dec.com::Marshall    Rob Marshall USEG    Fri Apr 04 1997 17:03    16
Hi,

I guess I wasn't clear enough.  Yes, the primary, or member, network is
special.  All of the daemon-to-daemon traffic runs over that interface.

Normally the packets get to the output routine for the interface, which
sees that they are destined for one of its own addresses and re-routes
them via the loopback device (lo0).  But, since you have ifconfig'ed the
interface down, the packets are most likely not getting far enough to
get re-routed, and that is why all of the daemons appear to hang.

I guess I'm not clear on where the advantage is when you ifconfig the
interface down.  What does that really buy you?  And can you NOT do
that for the primary network?

Rob
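
To make the suggestion concrete, here is a rough sketch of failover steps
that leave out the "ifconfig ... down" for the primary interface. The
command strings mirror the ones visible in the daemon.log extract in the
base note; the function name, its parameters, and the choice of tu0 as
the primary are illustrative assumptions, and the code only builds the
strings without running anything.

```python
PRIMARY = "tu0"  # assumption: the primary (member) network interface


def failover_commands(bad_if, good_if, alias, netname, good_addr,
                      primary=PRIMARY):
    """Build the command strings for moving an alias and route.

    The bad interface is marked down only when it is not the primary,
    since daemon-to-daemon traffic depends on the primary interface.
    """
    commands = [
        f"ifconfig {bad_if} delete {alias}",
        f"ifconfig {good_if} alias {alias}",
        f"route delete -net -interface {netname}",
        f"route add -net -interface {netname} {good_addr}",
    ]
    if bad_if != primary:
        commands.append(f"ifconfig {bad_if} down")
    return commands


for cmd in failover_commands("tu0", "tu1", "clinicom1",
                             "Ethernet1", "10.79.1.22"):
    print(cmd)
```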
1969.4    understood !    BRSDVP::DEVOS    Manu Devos DEC/SI Brussels 856-7539    Mon Apr 07 1997 09:33    9
    Rob,
    
    Thanks for these detailed explanations. I am going to remove the
    "ifconfig tu0 down" command from my script and will report back with
    the result later this week.
    
    Regards,
    Manu.
    
1969.5    Thanks !!!    BRSDVP::DEVOS    Manu Devos NSIS Brussels 856-7539    Tue Apr 15 1997 06:27    12
    Rob,
    
    I went to the customer site yesterday, and it is OK now; the message
    no longer appears.
    
    I still have some "network route" problems that I have never seen
    before, but I have yet to investigate them.
    
    Thanks again,
    
    Manu.