Conference humane::scheduler

Title:	SCHEDULER
Notice:	Welcome to the Scheduler Conference on node HUMANE
Moderator:	RUMOR::FALEK

Created:	Sat Mar 20 1993
Last Modified:	Tue Jun 03 1997
Last Successful Update:	Fri Jun 06 1997
Number of topics:	1240
Total number of notes:	5017

1089.0. "Job remains in a 'running' state" by BACHUS::BANKEN () Tue Apr 30 1996 15:35

Hello,

Configuration : MV3190 as Scheduler Server running OpenVMS 6.2 and Scheduler 2.1b-7
                A cluster of Agents running OpenVMS 5.5-2 and Sched Agent 2.1b-5

Problem: From time to time, a job running on an agent never gets its status updated: the job has
         finished on the agent side, but on the server side its status is still 'running'.
         This happens once or twice a day, on both agents. Other local and remote jobs continue to run
         without problems (if their dependencies allow it).

In the DBC083_REMOTE_EXECUTOR log file we find the following:

entered outer main loop
assign to mailbox failed; trying again  <- could this be a problem ?


Most of the jobs run very often (every 5 or 10 minutes) and take a few seconds to complete.

Here is an extract from the agent log file:

 
539033266:                 receieved job 113 from scheduler node BKS010
539033266:         local username: EXPSYSTEM
539033266:        remote username: EXPSYSTEM
539033266:                command: @[.CHECK_CLUSTER]CHECK_CLUSTER
539033266:            output file: log_EXPSYSTEM_bigsys:check_cluster_bigsys

Job stats:
           cputime: 103 ticks
             maxws: 1219
            faults: 1248
               ios: 191
           elapsed: 1 secs

					<- this job caused the problem:
					   there is no 'Job ends' message!

539033266:                 receieved job 114 from scheduler node BKS010
539033266:         local username: EXPSYSTEM
539033266:        remote username: EXPSYSTEM
539033266:                command: @[.CHECK_DISKS]CHECK_DISKS
539033266:            output file: log_EXPSYSTEM_bigsys:CHECK_DISKS

Job stats:
           cputime: 94 ticks
             maxws: 731
            faults: 608
               ios: 181
           elapsed: 1 secs

Job 114 ends at Tue Apr 30 13:20:41 1996

539033266: final error on connect to socket 4   <- Can this be considered normal?
539033266: send failed
539033266:                 receieved job 120 from scheduler node BKS010
539033266:         local username: EXPSYSTEM
539033266:        remote username: EXPSYSTEM
539033266:                command: @[.CHECK_QUEUE]CHECK_QUEUE


Any info will be welcome.

Best regards,

Alain.

T.R	Title	User	Date	Lines
1089.1		BACHUS::BANKEN	`Thu May 02 1996 08:48`	10
	Hello, What do you think about the error messages in the log files? Most of the jobs are remote jobs; can we have confidence in Scheduler in such a configuration? Does anyone have a site where remote jobs are used intensively? Please respond. Alain.
1089.2		BACHUS::BANKEN	`Fri May 03 1996 09:49`	10
	Hi Scheduler Team, I imagine that you are all busy with very important work, but on the other hand we must keep customers satisfied. This case will be IMPT'ed ... Thanks in advance for your understanding. Alain.