
Conference smurf::dec_mls_plus

Title:dec_mls_plus
Moderator:SMURF::BAT
Created:Mon Nov 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:534
Total number of notes:2544

484.0. "single-level NFS exported via router failure" by NNTPD::"sowards@mail.dec.com" (Mark Sowards) Mon Apr 21 1997 16:25

Hi All,
    I'm back.  Here's another strange problem that seems to be caused by
routing through an MLS+ machine.  I've got a Sun Solaris (2.5.1) machine
NFS-exporting a directory to the network.  The MLS+ box on the same network
as the Sun has no problem accessing the exported directory.  This same MLS+
box acts as a router to another network, on which there is another MLS+ box
(both are running MLS+ 4.0a beta).  This second MLS+ box can traverse the
exported directory only two, sometimes three, levels before it fails and
this error message is displayed:
  'NFS3 Server Sun1 not responding.  Still trying'
The user cannot go down the tree any farther, although he can go back up
the tree.  Has this been reported before?  Is there some workaround for this?
[Posted by WWW Notes gateway]
484.1. "curious" by SMURF::BAT (Segui la tua beatitudine) Mon Apr 21 1997 17:31
    Nothing I've heard of, but I'll ask around.
    
    By MLS+ V4.0a "Beta" do you mean rev 38 or what?
484.2. "works here but we haven't set up sun: nfsstat say anything?" by SMURF::BAT (Segui la tua beatitudine) Mon Apr 21 1997 22:45
    Lee suggests running tcpdump (see note on tcpdump setup) on the MLS+
    box in between.
    
    Set it up so that it is looking at both interfaces, and then monitor
    the traffic between the far MLS+ box and the Sun box.  You should be
    able to see the traffic from the far box go out the interface on the
    Sun side, and you should see the traffic from the Sun side go to the
    far MLS+ box.
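    A minimal sketch of the capture Lee describes, assuming the in-between
    router has two interfaces and that packet filtering has already been
    enabled for tcpdump (see the note on tcpdump setup).  The interface
    names (ln0, ln1) and host names (far-mls, sun1) are hypothetical
    placeholders.  The script does not start any captures; it only prints
    one tcpdump command per interface so each can be run as root in its
    own window:

```shell
#!/bin/sh
# Sketch only: print (don't run) one tcpdump command per interface on
# the in-between MLS+ router. Host and interface names are hypothetical
# placeholders; override them from the environment.

FAR_HOST=${FAR_HOST:-far-mls}   # hypothetical: far MLS+ box (NFS client)
SUN_HOST=${SUN_HOST:-sun1}      # hypothetical: Sun NFS server
IFACES=${IFACES:-"ln0 ln1"}     # hypothetical: router's two interfaces

tcpdump_cmds() {
    for ifc in $IFACES; do
        # -n: numeric addresses; -i: capture on this interface only.
        # The filter keeps only packets between the two hosts.
        echo "tcpdump -n -i $ifc host $FAR_HOST and host $SUN_HOST"
    done
}

tcpdump_cmds
```

    Set FAR_HOST, SUN_HOST and IFACES to match the real systems.  Running
    one capture per interface lets you compare what arrives on the Sun
    side with what actually leaves on the far side.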
    
    FYI:  Lee set up a set of systems in the lab such that:
    
    [ single level ]   <->     [ MLS+ box A ]  <->   [ MLS+ box B ]
    
    
    Single level is exporting to the world (no restrictions in
    /etc/exports).
    
    MLS+ box B is importing from Single Level.  On MLS+ box B you can
    cd down 6 levels (that's all we tried) and look around.  Works fine.
    
    If we can set up a Sun box we will. 
484.3. "Beta = rev 38" by NNTPD::"sowards@mail.dec.com" (Mark Sowards) Tue Apr 22 1997 21:16
Yes, early in March I ftp'ed, and Barbara FedEx'ed, what was then the
latest pre-release version of MLS+ 4.0a.  I believe it was rev 38.
484.4. "see 446.13: what if they try resetting the Sun MTU_WINDOW?" by SMURF::BAT (Segui la tua beatitudine) Tue Apr 22 1997 22:06
    Hi Mark,
    
    Lee found a Sun system here and we've reproduced the problem.
    tcpdump is telling us that the intermediate MLS+ box is not
    sending any full size frags through to the target MLS+ box.
    
    Once again it appears that somewhere the packet size is not
    getting decremented by the size of the IP options.  Or something
    like that :-)
    
    More later.
484.5. "sorry don't have a fix instantly" by SMURF::BAT (Segui la tua beatitudine) Tue Apr 22 1997 23:09
    It looks as though the NFS code is not reducing the max MTU size by
    the size of the potential IP header options.
    
    But I'm not sure where the adjustment should go; I'll have to consult
    with others.
    
    Notes for us: See submit 181 for the one that fixed the tcp code
    (pmtu.c).  tcp_mss_send subtracts 40 if it is SEC_NET (a subsequent
    code re-basing fixed the IP options header calculation, so I think the 
    #if SEC_NET stuff at line 960 should go away; see submit #530).
    
    But it appears the NFS code is using a different algorithm for
    calculating the size (ku_sendto_mbuf in kudp_fastsend.c)... but is this
    the { only } UDP place that needs adjusting?
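    To make the size adjustment concrete, here is a small arithmetic
    sketch (not the MLS+ kernel code).  It assumes a 20-byte base IP
    header and leaves room for up to `opts` bytes of IP options; 40 bytes
    is the IPv4 maximum, because the header length field caps the whole
    header at 60 bytes.  Fragment data is rounded down to a multiple of 8
    because the fragment offset field counts 8-byte units.

```shell
#!/bin/sh
# Sketch only: arithmetic for the largest data payload one IP fragment
# can carry on a link, after leaving room for IP options. This is not
# the MLS+ kernel code.

IP_HDR=20        # IPv4 header without options
MAX_IP_OPTS=40   # IPv4 header is at most 60 bytes, so options <= 40

max_payload() {
    mtu=$1       # link MTU in bytes, e.g. 1500 for Ethernet
    opts=$2      # bytes of IP options to leave room for
    if [ "$opts" -gt "$MAX_IP_OPTS" ]; then
        echo "max_payload: options cannot exceed $MAX_IP_OPTS bytes" >&2
        return 1
    fi
    avail=$((mtu - IP_HDR - opts))
    # Fragment offsets count 8-byte units, so every fragment except
    # the last must carry a multiple of 8 data bytes.
    echo $((avail / 8 * 8))
}

echo "Ethernet, no options:       $(max_payload 1500 0)"    # 1480
echo "Ethernet, 40 bytes of opts: $(max_payload 1500 40)"   # 1440
```

    With no options this gives the 1480-byte fragments that tcpdump showed
    being dropped in 484.8; reserving the full 40 bytes of options brings
    the limit down to 1440.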
484.6. "Thanks for the quick info" by NNTPD::"sowards@mail.dec.com" (Mark Sowards) Wed Apr 23 1997 00:08
Thanks for the rapid response.  I'm sure my customer will be happy at
finally finding something that wasn't user error.  

I don't follow this though......

>>> Notes for us: See submit 181 for the one that fixed the tcp code
>>> (pmtu.c).  tcp_mss_send subtracts 40 if it is SEC_NET (a subsequent
>>> code re-basing fixed the IP options header calculation, so I think the 
>>> #if SEC_NET stuff at line 960 should go away; see submit #530).

  
   I breathlessly await your contemplations.

                Thanks.....
484.7. by SMURF::BAT (Segui la tua beatitudine) Wed Apr 23 1997 17:48
    > I don't follow this though......
    
    See where it says "Notes for us:"?  I didn't include "You" in "Us" (how
    very rude).  I should have said "Notes to myself" or something.
484.8. "See note 446.13 -- the customer set the MTU_WINDOW size?" by SMURF::BAT (Segui la tua beatitudine) Fri Apr 25 1997 01:13
    Mark, do you happen to know how to tell the Sun system to limit its 
    MTU_WINDOW?  We'd like to try that here and see what effect it has, and
    have no documentation for our system (and no man pages).
    
    tcpdump is telling us that the fragments at the maximum size of 1480
    bytes are being dropped by the routing MLS+ box.  Smaller packets get
    forwarded.
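    A small arithmetic sketch of how one UDP datagram is split into IP
    fragments, using hypothetical sizes.  It may help connect the tcpdump
    observation to the symptom in the base note: if only full-size
    fragments are being lost, anything that fits in a single frame gets
    through, while any reply large enough to need full-size fragments
    never completes.  That link is an assumption, not something the thread
    verified.  The 8200-byte figure is also hypothetical (8 KB of data
    plus an 8-byte UDP header, ignoring RPC headers).

```shell
#!/bin/sh
# Sketch only: list the data size of each IP fragment needed to carry
# a payload (UDP header + data) over a link with the given MTU and IP
# option length. Sizes used in the examples are hypothetical.

fragment_sizes() {
    left=$1          # bytes following the IP header
    mtu=$2           # link MTU, e.g. 1500
    opts=${3:-0}     # IP option bytes per fragment (default none)
    per_frag=$(( (mtu - 20 - opts) / 8 * 8 ))
    sizes=""
    # Emit full fragments while more than one fragment's worth remains.
    while [ "$left" -gt "$per_frag" ]; do
        sizes="$sizes $per_frag"
        left=$((left - per_frag))
    done
    sizes="$sizes $left"   # final (possibly short) fragment
    echo $sizes            # unquoted on purpose: drops the leading space
}

fragment_sizes 200 1500    # small RPC: one unfragmented datagram
fragment_sizes 8200 1500   # 8 KB of data + 8-byte UDP header
```

    The second call yields five 1480-byte fragments and an 800-byte one.
    With 40 bytes of options reserved, the same payload would go as five
    1440-byte fragments plus a 1000-byte one.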
    
    Lee exported a single-level file system from a vanilla DIGITAL UNIX box
    and the MLS+ box in-between had no problem passing the packets through
    to the importing MLS+ box.
    
    So apparently the problem is not reproducible without a Sun system.
484.9. "now we can reproduce their workaround?" by SMURF::BAT (Segui la tua beatitudine) Thu May 15 1997 23:31
    I spoke with Mark this afternoon, and he got the info from David
    Hustler. To set the MTU window size on the Sun,
    
    # ifconfig -a
    	to get the list of config'ed devices
    
    # ifconfig dev mtu size
    
    	where "dev" is the name of the device from the -a list
    	where "size" is the MTU, in bytes, to which you want to set the interface
    
    He said that they are using this as a workaround at the moment to 
    handle this importing problem.
484.10. "and the winner is...." by SMURF::SCHOFIELD (Rick Schofield, DTN 381-0116) Fri May 23 1997 17:04
    We were able not only to reproduce the original problem, but to
    exacerbate it to the point where we couldn't even _mount_ the
    Solaris-exported file system on the MLS client.  What we discovered
    follows:
    
    There were 3 systems directly involved in mounting the Sun filesystem: 
    the MLS+ nfs client (on net 10.50.*.*), the MLS+ system doing the
    routing (routing for 16.141.*.* and 10.50.*.*) and the Sun box (on net
    16.141.*.*).  It looked like this:
                                                MLS+  routed -s
    MLS+   routed -q                         +---------------------+
    +-------------------+                    | chopin:             |
    |                   |                    | ln0 = 16.141.96.243 |>-----+
    | yjustey:          |                    |                     |      |
    | tu0 = 10.50.96.82 |>------------------<| ychopin:            |      |
    +-------------------+                    | ln1 = 10.50.96.243  |      |
                                             +---------------------+      |
                                                                          |
                            Sun   routed -q                               |
                          +---------------------+                         |
                          | stike:              |                         |
                          | le0 = 16.141.96.171 |>------------------------+
                          +---------------------+
    
    
    The problem boils down to the fact that there were multiple systems on
    the 16.141 net which were attempting to route for 10.50 (by running
    routed -s). (Check out the attached output from the ping -R command to
    see how many goofy routes the simple ping was taking).   This caused a
    cyberfight among conflicting routers, all of whom had their own ideas
    about where to route packets between stike and yjustey.  Once these
    systems had their routed daemons switched to the -q flag, the nfs
    problems went away between stike and yjustey.  In fact, not only could the
    exported filesystem on stike be mounted by yjustey, but directories at
    any depth could be examined and all normal nfs operations could be
    performed without error.
    
    So let's have the customer perform the following tests to see if the
    problem at their end is similar.  Capture the output from these
    commands:
    
    1)  On the MLS+ nfs client:
    		ping -R <sun_hostname>
        on the Sun box:
    		ping -R <mls_nfs_client_hostname>
    
    2)  On a third system somewhere on the network that the Sun box is on:
    
    	# cd /dev
    	# ./MAKEDEV pfilt
    	# ifconfig -a (note the name of the network device - usually either
    		ln0 or tu0)
    	# pfconfig +p +c <devicename>
    	# tcpdump -s 256 -N host <sun_system>
    
        Then on the MLS+ nfs client system, reproduce the failure condition
    by traversing the filesystem mounted from the Sun box.
    
    	Rick
    
    yjustey:[ UNCLASSIFIED ]> ping -R stike
    PING stike.zk3.dec.com (16.141.96.171): 56 data bytes
    64 bytes from 16.141.96.171: icmp_seq=0 ttl=248 time=14 ms
    RR:     chopin (16.141.96.243)
            stike.zk3.dec.com (16.141.96.171)
            16.141.96.239
            16.141.96.2
            16.141.96.241
            16.141.96.2
            16.141.96.239
            16.141.96.2
            ychopin.zk3.dec.com (10.50.96.243)
    64 bytes from 16.141.96.171: icmp_seq=1 ttl=248 time=5 ms (same route)
    64 bytes from 16.141.96.171: icmp_seq=2 ttl=248 time=6 ms (same route)
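    The bouncing is easy to spot by eye in this short trace, but a small
    awk filter can flag it in longer captures.  This sketch reads
    `ping -R` output on standard input and prints every recorded-route
    address that appears more than once, with its count.  It assumes the
    layout shown here (an "RR:" line, then one hop per line as either
    "name (a.b.c.d)" or a bare address, ending at the next "bytes from"
    line); other ping implementations may format the list differently.

```shell
#!/bin/sh
# Sketch only: report addresses that appear more than once in the RR
# (record route) list of `ping -R` output read from stdin. Assumes the
# layout shown in this note; other ping versions may differ.

rr_repeats() {
    awk '
        /RR:/        { inrr = 1; sub(/.*RR:/, "") }   # start of RR list
        /bytes from/ { inrr = 0 }                     # next reply ends it
        inrr && NF {
            addr = $NF                  # "(a.b.c.d)" or bare "a.b.c.d"
            gsub(/[()]/, "", addr)
            count[addr]++
        }
        END { for (a in count) if (count[a] > 1) print a, count[a] }
    ' | LC_ALL=C sort
}
```

    Because ping -R runs until interrupted, save its output first (for
    example `ping -R stike > ping.out`, then Ctrl-C) and run
    `rr_repeats < ping.out`.  On the trace above it reports 16.141.96.2
    three times and 16.141.96.239 twice.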
    
484.11. "still need to follow up on this with Lockheed" by SMURF::BAT (Segui la tua beatitudine) Thu May 29 1997 21:19
    Mark said they were worried that the systems wouldn't do the routing if
    they weren't running -s.  I said they should first verify with
    netstat -r and tcpdump whether having multiple systems running
    routed -s was the problem.  It also makes a difference whether or not
    it is a class A network.
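    On the class A point: both nets in Rick's setup (10.50 and 16.141) are
    class A under the old classful scheme.  One plausible reason the class
    matters, and this is an assumption rather than something stated in
    the thread, is that routed speaking RIP version 1 carries no subnet
    masks, so a router has to infer a route's mask from the address class
    and its own interfaces.  This sketch reports the classful class of an
    address from its first octet:

```shell
#!/bin/sh
# Sketch only: classful (pre-CIDR) class of an IPv4 address, taken
# from its first octet. No validation of the address is done.

ip_class() {
    first=${1%%.*}       # first octet, e.g. 10 from 10.50.96.82
    if   [ "$first" -le 127 ]; then echo A
    elif [ "$first" -le 191 ]; then echo B
    elif [ "$first" -le 223 ]; then echo C
    elif [ "$first" -le 239 ]; then echo D    # multicast
    else                             echo E    # reserved
    fi
}

for addr in 10.50.96.82 16.141.96.171; do
    echo "$addr: class $(ip_class "$addr")"
done
```

    Both lab addresses (yjustey's and stike's) come out as class A.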
    
    I said that we were able to get it working without reconfiguring the
    MTU_WINDOW size on the Sun box to a smaller value.  (I hope that is
    correct.)  So their changing it to make it work was a red herring.