Conference smurf::ase

Title:	ase

Moderator:	SMURF::GROSSO

Created:	Thu Jul 29 1993
Last Modified:	Fri Jun 06 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	2114
Total number of notes:	7347

1875.0. "Several urgent questions about V1.4" by DYOSW5::WILDER (Does virtual reality get swapped?) Tue Feb 11 1997 22:31

    Several questions/problems with TCR 1.4 Production server:
    
    Walked into customer site with a 2 node 8200 with TCR 1.4 Production
    server. They are running UNIX V4.0b with LARGE LSM raw partitions/drd
    services. 
    
    1) On one of the consoles, there were several messages of:
    "fnctl: Local lockmanager not registered"
    
    What is this and what does it mean?
    
    2) Occasionally getting the message "chk_bf_quota: user/group underflow"
    What does this mean and how do I fix it?
    
    3) After applying the bss_rm_iodone_bind patch for remote drd hangs
    (this IS an 8200): when booting one system with the second one down,
    there are NUMEROUS error messages as ASE starts about drd services
    (some have a favored member of the one that is down) shutting down and
    then restarting. After the node is up, running a drd_ivp states that
    the down node has an ASE_ID of -1 and that there is an error
    (obviously). Once I boot the second node and it is completely up, the
    drd_ivp runs fine and the ASE_ID is correct. Could this be due to the
    kdb patch?
    
    4) We are using /dev/rrz136c as a tie breaker. However, for LSM, we
    gave /dev/rrz136 (with no trailing "c"). When booting the cluster, it
    states that /dev/rrz136c is not in an ASE service and that it will not
    use the disk. However, cnxshow shows the disk as a tie breaker. Is this
    okay? Can we/should we use /dev/rrz136 in the cnxset command instead of
    /dev/rrz136c?
    
    Thanks in advance for your help with these questions. The customer is
    asking for explanations and I obviously have none.
    
    /jim

T.R	Title	User	Personal Name	Date	Lines
1875.1		KITCHE::schott	Eric R. Schott USG Product Management	`Wed Feb 12 1997 00:02`	12
	> 2) Occasionally getting the message "chk_bf_quota: user/group underflow"
	> What does this mean and how do I fix it?

	It means that the AdvFS quota files are not accurate. Running vquotacheck (when the file systems are quiescent, or at boot time) will generally resolve it. sys_check can give you other clues about things to do with AdvFS.

	I don't know the other answers.
1875.2	LSM takes "g" and "h" partition	ADCA01::BALAJIC		`Wed Feb 12 1997 02:03`	10
	Hi,

	When you create an LSM disk without the trailing "c", I suppose it takes the "g" and "h" partitions by default for making LSMpub and LSMpriv. The disklabel should show you that.

	Regards,
	Balaji
1875.3	follow-up	DYOSW5::WILDER	Does virtual reality get swapped?	`Wed Feb 12 1997 10:49`	6
	Well, LSM takes the entire volume. I can check the disklabel, but on a 4GB disk, I have use of almost the entire disk.

	My real question is: for tie-breakers, can I use rrz136, or must I use an actual partition?

	/jim
1875.4	Still need answers for 2 questions	DYOSW5::WILDER	Does virtual reality get swapped?	`Thu Feb 13 1997 10:57`	10
	Well, we have solved questions 2 and 4. Thanks for the help.

	We still need help with questions 1 and 3 in the base note. Has anyone seen these, and does anyone have ANY idea what is happening and, hopefully, how to solve them?

	Thanks,
	/jim
1875.5		KITCHE::schott	Eric R. Schott USG Product Management	`Thu Feb 13 1997 11:29`	4
	Hi, I suggest you file an IPMT to get the attention you deserve.
1875.6	Further info on final question	DYOSW5::WILDER	Does virtual reality get swapped?	`Thu Feb 13 1997 21:54`	30
	Okay, we seem to have solved question 1. Here is more info on the last unanswered question. Before I file an IPMT, maybe someone can tell me what is causing this.

	2 node Production Server environment: UNIX V4.0B and TCR 1.4. There are 4 drd services and 2 nfs services.

	One node boots fine, no problems. When the other node boots (this is true whether the node is joining the cluster or is the only one coming up), after ASE starts up we get the following messages (nodes are mcsteamboat and mctahoe). This ONLY happens on mcsteamboat:

	steamboat ASE: mctahoe Agent notice:
	/var/ase/sbin/lsm_dg_action: coldg: Disk group bench01_01_dg: No such
	disk group is imported
	...voldg deport of disk group bench01_01_dg failed
	...voldisk: Device rz130: Device is already offline

	This repeats for all the disks in the disk group, for all disk groups, and for the nfs services. It appears that mcsteamboat THINKS it should own all the services (services are preferred, but they are split between the 2 nodes).

	Once the cluster is up, all works fine. All services can fail over, and everything that should work in a cluster seems to be working. This appears to be only a startup issue.

	Are there any ideas as to why one node would do this while the other node is fine? Any suggestions as to how to fix this?

	Thanks,
	/jim
1875.7	This is the expected behaviour...	BACHUS::DEVOS	Manu Devos DEC/SI Brussels 856-7539	`Mon Feb 17 1997 07:14`	24
	Jim,

	What you are seeing is normal. When a DECsafe or TRUcluster (with MC) system is booting, the stop script of each service is run (except for the DRD services, which have no script) and the service is stopped on the booting system. This is done to allow a clean-up of the application(s): if the system is booting, it may be because it crashed earlier, so a clean-up may be necessary. The default stop script lets you distinguish a stop on a RUNNING system from a stop on a BOOTING system by checking the MEMBER_STATE variable.

	> steamboat ASE: mctahoe Agent notice:
	> /var/ase/sbin/lsm_dg_action: coldg: Disk group bench01_01_dg: No such
	> disk group is imported
	> ...voldg deport of disk group bench01_01_dg failed
	> ...voldisk: Device rz130: Device is already offline

	Thus, these messages show that ASE is trying to deport the disk group, but it is not imported; it then tries to place the disk offline, but it is already offline. These operations are part of the SERVICE STOP operation as explained.

	Regards,
	Manu.
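	For illustration, here is a minimal shell sketch of how a stop script might branch on MEMBER_STATE, as described in 1875.7. Only the variable name comes from that reply. The function name stop_action, the echoed messages, and the assumption that the variable holds the literal strings BOOTING or RUNNING are hypothetical, and this is not the text of the default ASE stop script.

```shell
#!/bin/sh
# Hypothetical sketch of a service stop script that branches on MEMBER_STATE.
# Assumption: MEMBER_STATE holds the literal string BOOTING or RUNNING.

stop_action() {
    # $1 is the member state; anything other than BOOTING is treated as a
    # stop on a running system.
    case "$1" in
        BOOTING)
            # Stop run during boot: the service was probably never started
            # on this member, so only clean up leftovers and tolerate
            # "already stopped" conditions.
            echo "boot-time cleanup"
            ;;
        *)
            # Stop requested on a running member: shut the application
            # down normally.
            echo "normal stop"
            ;;
    esac
}

# Default to RUNNING when the variable is not set (an assumption made so the
# sketch can be run outside ASE).
stop_action "${MEMBER_STATE:-RUNNING}"
```

	Run by hand with MEMBER_STATE unset, the sketch takes the normal stop branch. A real stop script would replace the echo lines with the application's own shutdown and clean-up commands.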
1875.8	Thanks	DYOSW5::WILDER	Does virtual reality get swapped?	`Mon Feb 17 1997 13:14`	3
	Thanks,