
Conference smurf::ase

Title:ase
Moderator:SMURF::GROSSO
Created:Thu Jul 29 1993
Last Modified:Fri Jun 06 1997
Last Successful Update:Fri Jun 06 1997
Number of topics:2114
Total number of notes:7347

1875.0. "Several urgent questions about V1.4" by DYOSW5::WILDER (Does virtual reality get swapped?) Tue Feb 11 1997 22:31

    Several questions/problems with TCR 1.4 Production server:
    
    Walked into customer site with a 2 node 8200 with TCR 1.4 Production
    server. They are running UNIX V4.0b with LARGE LSM raw partitions/drd
    services. 
    
    1) On one of the consoles, there were several messages of:
    "fnctl: Local lockmanager not registered"
    
    What is this and what does it mean?
    
    2) Occasionally getting the message "chk_bf_quota: user/group underflow"
    What does this mean and how do I fix it?
    
    3) After applying the bss_rm_iodone_bind patch for remote drd hangs
    (this IS an 8200): when booting one system with the second one down,
    there are NUMEROUS error messages as ASE starts about drd services
    (some have a favored member of the one that is down) shutting down and
    then restarting. After the node is up, running a drd_ivp states that
    the down node has an ASE_ID of -1 and that there is an error
    (obviously). Once I boot the second node and it is completely up, the
    drd_ivp runs fine and the ASE_ID is correct. Could this be due to the
    kdb patch?
    
    4) We are using /dev/rrz136c as a tie breaker. However, for LSM, we
    gave /dev/rrz136 (with no trailing "c"). When booting the cluster, it
    states that /dev/rrz136c is not in an ASE service and that it will not
    use the disk. However, cnxshow shows the disk as a tie breaker. Is this
    okay? Can we/should we use /dev/rrz136 in the cnxset command instead of
    /dev/rrz136c?
    
    Thanks in advance for your help with these questions. The customer is
    asking for explanations and I obviously have none.
    
    /jim
    
T.R  Title  User  Personal Name  Date  Lines
1875.1. by KITCHE::schott (Eric R. Schott USG Product Management) Wed Feb 12 1997 00:02 (12 lines)
>    
>    2) Occasionally getting the message "chk_bf_quota: user/group underflow"
>    What does this mean and how do I fix it?
>    

It means that the AdvFS quota files are not accurate. Running vquotacheck
(when the file systems are quiescent, or at boot time) will generally
resolve it.

sys_check can give you other clues about things to do with AdvFS.

I don't know the other answers...

1875.2. "LSM takes "g" and "h" partition" by ADCA01::BALAJIC Wed Feb 12 1997 02:03 (10 lines)
    Hi,
    
    When you create an LSM disk without the trailing "c", I suppose
    it takes the "g" and "h" partitions by default for making
    LSMpub and LSMpriv. I suppose the disklabel should show you that.
    
    Regards
    Balaji
    
1875.3. "follow-up" by DYOSW5::WILDER (Does virtual reality get swapped?) Wed Feb 12 1997 10:49 (6 lines)
    Well, LSM takes the entire volume. I can check the disklabel, but on a
    4GB disk, I have use of almost the entire disk. My real question is:
    for tie-breakers, can I use rrz136, or must I use an actual partition?
    
    /jim
    
1875.4. "Still need answers for 2 questions" by DYOSW5::WILDER (Does virtual reality get swapped?) Thu Feb 13 1997 10:57 (10 lines)
    Well, we have solved questions 2 and 4. Thanks for the help.
    
    We still need help with questions 1 and 3 in the base note. Has anyone
    seen these, and does anyone have ANY idea what is happening and,
    hopefully, how to solve them?
    
    Thanks,
    
    /jim
    
1875.5. by KITCHE::schott (Eric R. Schott USG Product Management) Thu Feb 13 1997 11:29 (4 lines)
Hi

 I suggest you file an IPMT to get the attention you deserve.

1875.6. "Further info on final question" by DYOSW5::WILDER (Does virtual reality get swapped?) Thu Feb 13 1997 21:54 (30 lines)
    Okay, we seem to have solved question 1. Here is more info on the last
    unanswered question. Before I file an IPMT, maybe someone can tell me what
    is causing this.
    
    2 node Production Server environment: UNIX V4.0B and TCR 1.4. There are
    4 drd services and 2 nfs services. One node boots fine, no problems.
    When the other node boots (this is true whether the node is joining
    the cluster or is the only one coming up), after ASE starts up we get
    the following messages (nodes are mcsteamboat and mctahoe). This ONLY
    happens on mcsteamboat:
    steamboat ASE: mctahoe Agent notice:
    /var/ase/sbin/lsm_dg_action: coldg: Disk group bench01_01_dg: No such
    disk group is imported
    ...voldg deport of disk group bench01_01_dg failed
    ...voldisk: Device rz130: Device is already offline
    
    This repeats for all the disks in the disk group, for all disk groups,
    and for the nfs services. It appears that mcsteamboat THINKS
    it should own all the services (services are preferred, but they are
    split between the 2 nodes). Once the cluster is up, all works fine. All
    services can fail over, and everything that should work in a cluster
    seems to be working. This appears to be only a startup issue.
    
    Are there any ideas as to why one node would do this and the other node
    is fine? Any suggestions as to how to fix this?
    
    Thanks,
    
    /jim
    
1875.7. "This is the expected behaviour..." by BACHUS::DEVOS (Manu Devos DEC/SI Brussels 856-7539) Mon Feb 17 1997 07:14 (24 lines)
Jim,

What you are seeing is normal. When a DECsafe or TruCluster (with MC) system
is booting, the stop script of each service is run (except for the DRD
services, which have no script) and the service is stopped on the booting
system. This is done to allow a clean-up of the application(s): a system
that is booting may have crashed beforehand, in which case a clean-up may
be necessary.

The default stop script lets you distinguish a stop operation on a RUNNING
system from one on a BOOTING system by checking the MEMBER_STATE variable.
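
As a rough sketch of that check (not the shipped default script): this
assumes MEMBER_STATE is visible to the stop script and carries the literal
values BOOTING and RUNNING. Those exact strings, the function name
stop_service, and the echo placeholders are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical stop-script fragment. MEMBER_STATE is the variable named
# above; the values BOOTING and RUNNING are assumed, not confirmed.
stop_service() {
    case "${MEMBER_STATE:-}" in
        BOOTING)
            # Member is booting: only clean up leftovers from a possible crash.
            echo "boot-time cleanup"
            ;;
        RUNNING)
            # Member is up: perform the normal application shutdown.
            echo "normal stop"
            ;;
        *)
            # Unknown or unset state: report it on stderr and fail.
            echo "unknown MEMBER_STATE: ${MEMBER_STATE:-unset}" >&2
            return 1
            ;;
    esac
    return 0
}
```

Calling stop_service from the service's stop action would then take the
clean-up-only branch at boot and the full shutdown branch otherwise; replace
the echo lines with the real application commands.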

>    steamboat ASE: mctahoe Agent notice:
>    /var/ase/sbin/lsm_dg_action: coldg: Disk group bench01_01_dg: No such
>    disk group is imported
>    ...voldg deport of disk group bench01_01_dg failed
>    ...voldisk: Device rz130: Device is already offline

Thus, these messages show that ASE tried to deport the disk group, but it
was not imported; it then tried to place the disk offline, but it was already
offline. These operations are part of the SERVICE STOP operation explained above.

Regards, Manu.
1875.8. "Thanks" by DYOSW5::WILDER (Does virtual reality get swapped?) Mon Feb 17 1997 13:14 (3 lines)
    Thanks,