Conference decwet::advfs_support

Title:AdvFS Support/Info/Questions Notefile
Notice:note 187 is Freq Asked Questions;note 7 is support policy
Moderator:DECWET::DADDAMIO
Created:Wed Jun 02 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:1077
Total number of notes:4417

1058.0. ""can't migrate hole" error" by RHETT::MOORE () Mon May 12 1997 09:17

    I ran into an odd bug the other day.  We were adding some new storage
    to our crash dump server at the Atlanta CSC.  Several filesystems
    wound up migrating to new homes.  I did the migration with addvol/
    rmvol which worked beautifully in all cases except one.
    
    A domain consisted of three volumes, each an RZ29B.  It was about
    50% full, with the three volumes being about 65%, 55%, and 30% full
    respectively.  I was moving it to a single-volume 12-GB stripe set
    on an HSZ50.  The domain was otherwise inactive at the time.
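
As a quick sanity check on those figures, here is a small Python sketch of the arithmetic. The per-volume capacity of 4.3 GB is an assumption (the nominal size of an RZ29B, not stated in the post), and the volumes are assumed to be equal in size.

```python
# Rough capacity check for the migration described above.
RZ29B_GB = 4.3     # ASSUMED nominal capacity per volume; not given in the post
TARGET_GB = 12.0   # size of the stripe set, as given in the post

fullness = [0.65, 0.55, 0.30]   # per-volume fullness, as given in the post

used_gb = sum(f * RZ29B_GB for f in fullness)
total_gb = RZ29B_GB * len(fullness)
domain_fullness = used_gb / total_gb   # mean of the three when sizes are equal
fits = used_gb < TARGET_GB

print(f"used about {used_gb:.2f} GB of {total_gb:.1f} GB ({domain_fullness:.0%})")
print("fits on target:", fits)
```

With equal-size volumes the domain-wide figure is simply the mean of the three percentages, which agrees with the "about 50% full" quoted above, and the data fits on the 12-GB target with room to spare.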
    
    I addvol'd the stripe set -- no problem.
    
    I rmvol'd the first RZ29B -- no problem.
    
    I rmvol'd the second RZ29B, and got the error message
    E_CANT_MIGRATE_HOLE, complaining about one of the vmcore files I
    was moving.  Since this is a crash dump server, there are lots of
    huge sparse files on it.  However, the file it called out was by
    no means the largest sparse file.
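
For readers unfamiliar with the term: a sparse file is one whose logical size exceeds the storage actually allocated to it, and the unallocated ranges are called holes (presumably the "hole" in the error name). A minimal Python sketch, assuming a POSIX system and a filesystem that supports holes; the file name and size are made up for illustration.

```python
import os
import tempfile

def make_sparse_file(path, logical_size):
    """Create a file of logical_size bytes with data only in its last byte."""
    with open(path, "wb") as f:
        f.seek(logical_size - 1)   # skip ahead without writing anything
        f.write(b"\0")             # one real byte at the very end
    return os.stat(path)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vmcore.example")   # hypothetical file name
    st = make_sparse_file(path, 64 * 1024 * 1024)
    logical = st.st_size
    # st_blocks is counted in 512-byte units on POSIX systems.  On a
    # filesystem without hole support the two numbers will be similar.
    allocated = st.st_blocks * 512
    print("logical size:", logical)
    print("allocated   :", allocated)
```

On a filesystem that supports holes, the allocated figure is typically far smaller than the logical size. A migration tool has to carry those unallocated ranges across as holes rather than as data.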
    
    I tried rmvol'ing the third RZ29B and it complained about the same
    file.  I moved this file to another filesystem.
    
    I tried again to rmvol the second RZ29B and it gave me the same error
    message on another file.  I removed this file and tried again.
    It complained about a third file.  I removed that one too, and the
    rmvol succeeded.
    
    I then tried again to rmvol the third volume and it succeeded.
    The domain was entirely migrated to the new stripe set.
    
    Any ideas on this?  Unfortunately, I had to destroy the evidence
    (this was a production machine and I had to get it back up quickly.)
    
    Martin Moore
    Digital UNIX Support

1058.1. "Try a balance - it seems to work for me" by UNIFIX::HARRIS (Juggling has its ups and downs) Mon May 12 1997 09:46 (16 lines)

    As to why, I don't know, but I've experienced a similar problem in the
    past (in my case it was a domain where I analyzed CLD crash dumps when
    I worked in USEG).  I was constantly adding and removing partitions
    from the domain, as I shifted from needing more space for crash dump to
    needing more space for test file systems and back again.
    
    Anyway, I got around the problem by doing a balance on the domain.  It
    is my guess that the balance moved things around and as a side effect
    defrag'ed a few things.  After that I was able to do my rmvol's
    without a problem.  Maybe the same effect could have occurred if I had
    defragmented the domain.  I don't know.
    
    So maybe this can be a workaround for you.  Someone else will have to
    try to answer the question of why.
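
Put together with the addvol/rmvol sequence from .0, the workaround amounts to running balance before the rmvol steps. The Python sketch below only builds and prints the command lines; it executes nothing. The domain and device names are hypothetical, and the argument order (special device first, then domain) is my understanding of the addvol, rmvol, and balance reference pages, so check it against your own system.

```python
def migration_plan(domain, old_volumes, new_volume, balance_first=False):
    """Return the command lines, in order, as lists of arguments.

    Nothing is executed here; the caller decides whether to run them.
    """
    plan = [["addvol", new_volume, domain]]
    if balance_first:
        # Workaround from this reply: redistribute files before removing.
        plan.append(["balance", domain])
    for vol in old_volumes:
        plan.append(["rmvol", vol, domain])
    return plan

plan = migration_plan(
    "dump_domain",                               # hypothetical domain name
    ["/dev/rz1c", "/dev/rz2c", "/dev/rz3c"],     # hypothetical old volumes
    "/dev/rz16c",                                # hypothetical new volume
    balance_first=True,
)
for cmd in plan:
    print(" ".join(cmd))
```

Whether balance actually avoids the error in every case is not established in this thread; it is one thing to try before moving files off by hand.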
    
    					Bob Harris

1058.2. "" by DECWET::MARTIN () Mon May 12 1997 19:10 (22 lines)

Well, I've poked around and come up with a little bit of info.

First, you should be able to use addvol, rmvol, defragment, and balance on file
systems that are active and busy.  So, in theory, you could have had this
machine up and available to users while doing this transfer.

Note the key words "in theory".

We had a problem with defragment where it would look up the extent table of a
file, then a user would modify the file (truncation triggered the problem most
easily), *then* defragment would attempt to move it, which would fail because
it thought there was data there that was no longer there.  We fixed this by
just ignoring the error, on the assumption that it would be OK on the next pass,
and besides, if the file didn't get defragged, it wasn't that big a deal.
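
The race described above can be illustrated with a toy model in Python. This is not AdvFS code: files are just lists of extent sizes, and every name here (StaleExtentMap, migrate, defragment_pass) is invented for the sketch. It shows the three steps (look up, user modifies, attempt the move) and the "ignore the error and catch it on the next pass" handling.

```python
class StaleExtentMap(Exception):
    """Raised when a file changed after its extent map was read."""

def migrate(files, name, snapshot):
    # The move is only valid if the file still looks like the snapshot.
    if files[name] != snapshot:
        raise StaleExtentMap(name)
    files[name] = [sum(snapshot)]   # coalesce into one contiguous extent

def defragment_pass(files, interfere=None):
    """One pass over all files; returns the names that were skipped."""
    skipped = []
    for name in list(files):
        snapshot = list(files[name])        # 1. look up the extent table
        if interfere:
            interfere(files, name)          # 2. a user may modify the file
        try:
            migrate(files, name, snapshot)  # 3. attempt the move
        except StaleExtentMap:
            skipped.append(name)            # ignore it; a later pass retries
    return skipped

files = {"a": [4, 4, 8], "b": [2, 6]}

def truncate_b(fs, name):
    if name == "b":
        fs[name] = fs[name][:1]   # drop the last extent, like a truncation

first = defragment_pass(files, interfere=truncate_b)
second = defragment_pass(files)
print(first, second, files)
```

In the first pass the truncated file is skipped; in the second pass, with nothing interfering, it is processed normally. Skipping is acceptable for defragment because an unmoved file is harmless there, whereas rmvol cannot finish until every file is off the volume.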

I haven't investigated rmvol to see what its migration scheme is like.  Since
you say the file domain was quiescent while the rmvol was happening, that can't
have been the problem.  I would highly recommend opening a QAR against rmvol, so
we can allocate engineering effort to fixing this.  I'm really curious whether
this might have worked had you just tried to run rmvol a second time.

--Ken

1058.3. "" by RHETT::MOORE () Tue May 13 1997 08:45 (6 lines)

    re .2 --
    
    I actually did try the failing rmvol more than once (though I didn't
    mention that in the original post.)  It acted the same way each time.
    
    Martin

1058.4. "" by DECWET::MARTIN () Tue May 13 1997 16:02 (4 lines)

Interesting.  Definitely a bug in rmvol, then.  (Well, it's a bug either way,
but that's more information to help solve it....)

A QAR would be much appreciated.

1058.5. "" by RHETT::MOORE () Tue May 13 1997 16:43 (3 lines)

    Entered as QAR 52975.
    
    Martin