
Conference decwet::advfs_support

Title: AdvFS Support/Info/Questions Notefile
Notice: note 187 is Freq Asked Questions; note 7 is support policy
Moderator: DECWET::DADDAMIO
Created: Wed Jun 02 1997
Last Modified: Fri Jun 06 1997
Last Successful Update: Fri Jun 06 1997
Number of topics: 1077
Total number of notes: 4417

1011.0. "Berserk advfsd crashing system" by INDYX::ram (Ram Rao, PBPGINFWMY) Mon Mar 03 1997 21:47

I just had an RZ58 housing a single volume domain get terminally sick.
I had /usr/users on the disk.  The system crashed.  In attempting to
recover, I booted to single user, and took the mount of /usr/users out
of fstab, and then tried to go multi-user.  Towards the end of processing
the rc3.d scripts, the system crashed.  This happened repeatedly.  I
traced this to the fact that advfsd was somehow still wanting to access
the corrupt disk, so I disabled the startup of advfsd in /sbin/rc3.d.
Now the system comes all the way up.

The panic string is:
  ftx_bfmeta_rec_redo: bs_pinpg error, page = N2\n N1 = 5, N2 = 339
preceded by some occurrences of:
        AdvFS I/O error:
            Volume: /dev/rz6c
            Tag: 0xfffffff1.0000
            Page: 471
            Block: 1176960
            Block count: 16
            Type of operation: Read
            Error: 5

How do I get advfsd to stop looking for /dev/rz6c (the failed disk),
and crashing the system in the process?  I even tried renaming the
domain directory in /etc/fdmns, but that didn't stop advfsd from doing its
damage.  It has /dev/rz6c squirreled away in some private database.
How do I make it forget /dev/rz6c?

Thanks,

Ram
1011.1. by SMURF::SCOTT Tue Mar 04 1997 11:50 (11 lines)

Ram,

Did you rename the domain directory, or move it out of /etc/fdmns?
Removing or moving the domain directory out of /etc/fdmns should
prevent access to the domain by advfsd.

An undocumented approach is to add an entry for the disk into the file
/var/opt/advfsd/disks.ignore (see note 881).  This approach didn't work
on my system, though.

larry

1011.2. by DECWET::MARTIN Tue Mar 04 1997 14:14 (13 lines)

Moving the domain directory out of /etc/fdmns will prevent advfsd from accessing
the domain.  Renaming the domain directory but leaving it in /etc/fdmns will
still have advfsd try to access the domain.

The disks.ignore file only affects whether or not advfsd will look at a disk
from the standpoint of the disklabel (the "Devices & Volumes" view).  If a disk
is in the disks.ignore file, but still has a link somewhere within the
/etc/fdmns directory hierarchy, advfsd will look at it for AdvFS domain
information.

All of this information should be documented in the release notes.
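
As a rough illustration of the distinction above, here is a minimal shell sketch. It assumes the usual layout in which each domain directory under /etc/fdmns holds symbolic links to its volumes' device files. The helper names (links_to, park_domain), the PARK_DIR location, and the domain name users_domain are hypothetical, chosen only for this example. The first helper lists any links that still point at a given device; the second moves a domain directory out of the hierarchy entirely rather than renaming it in place.

```shell
#!/bin/sh
# Sketch only. FDMNS_DIR, PARK_DIR and both helper names are
# illustrative assumptions, not part of any AdvFS tool.
FDMNS_DIR="${FDMNS_DIR:-/etc/fdmns}"
PARK_DIR="${PARK_DIR:-/etc/fdmns.disabled}"   # must be outside FDMNS_DIR

# Print every symlink under FDMNS_DIR whose target is the device $1.
links_to() {
    find "$FDMNS_DIR" -type l -print | while read -r link; do
        # Portable way to read a link target without readlink(1).
        target=$(ls -l "$link" | sed 's/.* -> //')
        if [ "$target" = "$1" ]; then
            echo "$link"
        fi
    done
}

# Move one domain directory ($1) out of FDMNS_DIR into PARK_DIR.
park_domain() {
    mkdir -p "$PARK_DIR" && mv "$1" "$PARK_DIR"/
}

# Possible use, as root (domain name is hypothetical):
#   links_to /dev/rz6c
#   park_domain /etc/fdmns/users_domain
```

Running links_to again after the move should print nothing if no other domain refers to the failed disk.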

--Ken

1011.3. by INDYX::ram (Ram Rao, PBPGINFWMY) Wed Mar 05 1997 18:00 (5 lines)

I had renamed it and was experiencing problems.  I followed the suggestion
of .1 and .2 and moved the domain directory out of /etc/fdmns, and then
advfsd did not cause any further problems.

Thanks.

1011.4. by KITCHE::schott (Eric R. Schott USG Product Management) Sun Mar 09 1997 08:58 (5 lines)

Hi

  I think this should be ipmt'd, as it should not crash the system
under any scenario...

1011.5. "please send crash-data file to CANASTA" by HAN::HALLE (Volker Halle MCS @HAO DTN 863-5216) Mon Mar 10 1997 01:42 (16 lines)

    Ram,
    
    if you still have the crash-data file available, could you please send
    it to the CANASTA Mail Server using the following command:
    
    # Mail -s "Diagnose Case=ADVFS_1011 Customer=notes_on_decwet" \
    	can_server@xocomp.enet.dec.com < crash-data.n
    
    This will make sure that at least the crash footprint gets documented
    in the CANASTA crash database.
    
    Thanks,
    
    Volker.
    
    PS: To learn more about CANASTA, please read note TURRIS::DIGITAL_UNIX 8919

1011.6. "I think this is what is happening" by CSC32::RUTSCHOW (Jack of all trades, master of none) Fri Mar 14 1997 18:44 (9 lines)

    If it is what I think it is, I have worked on these crashes and
    engineering is aware of it.  What happens is that one process hits
    the damaged domain and panics it, but right on its heels is another
    process trying to do the same thing.  It tries to panic the domain,
    can't because the domain is already panicked, and panics the system
    instead...  I think it won't happen if the first one is all the way
    through the code before the second starts on its journey...
                       
    dale