
Conference: noted::hackers_v1

Title: -={ H A C K E R S }=-
Notice: Write locked - see NOTED::HACKERS
Moderator: DIEHRD::MORRIS
Created: Thu Feb 20 1986
Last Modified: Mon Aug 03 1992
Last Successful Update: Fri Jun 06 1997
Number of topics: 680
Total number of notes: 5456

331.0. "Could I use lock manager for this??" by RUMOR::FALEK (The TU58 King) Wed Oct 08 1986 16:28

I have a scheduler program that was originally designed to work on a
single CPU, but now I want to make it usable on a cluster. The scheduler
runs as a detached process. External programs, such as the human interface,
communicate with the scheduler by directly reading or writing the
scheduler's database, and then kicking the scheduler by sending a mailbox
message which basically says "Hey, wake up and look at record number n".

Since the database file is already accessible to all nodes on the cluster,
that part does not present a problem. But there are no cluster-wide mailboxes
so I need to replace that part with another mechanism. I want something
simple and elegant. I'm wondering, is there any way I could use the distributed
lock manager somehow to do what I want?

Restrictions:

I need to pass up to 16 bytes of data to the scheduler and cause one of its
local event flags to get set (as of now, this is done through mailbox I/O
completion - see the sketch below).

I don't want to have special detached processes sitting around on each node
of the cluster.

I don't want the overhead of setting up DECnet links to the node the
scheduler is running on.
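
For reference, the mailbox "kick" I'd be replacing looks roughly like
this (a sketch in C against the system-service bindings; the mailbox
logical name SCHED$MBX and the 16-byte message layout are illustrative,
not the actual program):

#include <descrip.h>
#include <iodef.h>
#include <starlet.h>

/* Illustrative 16-byte kick message: "wake up, look at record n". */
struct kick_msg {
    unsigned int record_number;
    char         filler[12];
};

/* Client side: assign a channel to the scheduler's mailbox (SCHED$MBX
   is an assumed logical name) and write one message.  The scheduler's
   pending mailbox read completes, which sets its local event flag. */
int kick_scheduler(unsigned int recnum)
{
    static $DESCRIPTOR(mbxname, "SCHED$MBX");
    unsigned short chan;
    unsigned short iosb[4];
    struct kick_msg msg = { 0 };
    int status;

    msg.record_number = recnum;

    status = sys$assign(&mbxname, &chan, 0, 0);
    if (!(status & 1)) return status;

    status = sys$qiow(0, chan, IO$_WRITEVBLK, iosb, 0, 0,
                      &msg, sizeof msg, 0, 0, 0, 0);
    sys$dassgn(chan);
    return status;
}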

     Anyone have any ideas?

331.1. "SCS services, maybe?" by FROST::HARRIMAN (DEC 41-BLANK-03, Harriman,Paul J., qty 1) Thu Oct 09 1986 11:53 (1 line)
    

331.2. "Read Chapter 12 of System Services Ref. Man." by QUILL::NELSON (JENelson) Thu Oct 09 1986 15:51 (28 lines)
    Yes, the lock manager will do exactly what you want.
    
    Here's what you do:
    
    	Your scheduler process gets started (on just one node of your
        cluster!) and $ENQs a lock for EXclusive access, then converts
    	the lock to PW specifying a value block (value blocks discussed
    	later), and a blocking AST routine called DOORBELL.  The scheduler
    	$HIBERnates.
    
        When the DOORBELL routine runs, it converts the lock to NL mode,
    	then $ENQWs a request to get the lock back in EX mode.  Once
   	the request is satisfied, you copy the value block to local
    	storage, re-initialize the value block, and convert the lock
    	back to PW, specifying DOORBELL as your blocking AST routine.
    	The scheduler can then act on the contents of the value block.
    
    	Value blocks are used to pass information (up to 16 bytes) between
    	processes.
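
    	In rough C terms, the scheduler's side of this might look like
    	the following (a sketch only, assuming the system-service
    	bindings in <starlet.h> and <lckdef.h>; the resource name
    	SCHED$DOORBELL and all routine names are illustrative):

#include <string.h>
#include <descrip.h>
#include <lckdef.h>
#include <starlet.h>

/* Lock status block with the 16-byte lock value block appended. */
struct lksb {
    unsigned short status;
    unsigned short reserved;
    unsigned int   lock_id;
    char           valblk[16];
};

static struct lksb lksb;
static $DESCRIPTOR(resnam, "SCHED$DOORBELL");

static void doorbell(int unused);
static void got_message(int unused);

/* Convert (back) to PW with DOORBELL armed as the blocking AST; the
   down-conversion also writes our copy of the value block back. */
static void to_pw(void)
{
    sys$enqw(0, LCK$K_PWMODE, &lksb, LCK$M_CONVERT | LCK$M_VALBLK,
             0, 0, 0, 0, doorbell, 0, 0, 0);
}

/* Blocking AST: a writer's EX request is waiting.  Drop to NL so it
   can be granted, then queue a conversion back to EX ($ENQ, not
   $ENQW - see .4).  GOT_MESSAGE runs once the writer has been and
   gone. */
static void doorbell(int unused)
{
    sys$enq(0, LCK$K_NLMODE, &lksb, LCK$M_CONVERT,
            0, 0, 0, 0, 0, 0, 0, 0);
    sys$enq(0, LCK$K_EXMODE, &lksb, LCK$M_CONVERT | LCK$M_VALBLK,
            0, 0, got_message, 0, 0, 0, 0, 0);
}

/* Completion AST for the EX conversion: the value block now holds the
   writer's 16 bytes.  Copy it out, re-initialize it, and re-arm. */
static void got_message(int unused)
{
    char message[16];
    memcpy(message, lksb.valblk, 16);
    memset(lksb.valblk, 0, 16);
    to_pw();
    sys$wake(0, 0);        /* main loop acts on the message */
}

int main(void)
{
    /* New lock: grab the resource EX, then drop to PW, armed. */
    sys$enqw(0, LCK$K_EXMODE, &lksb, LCK$M_VALBLK,
             &resnam, 0, 0, 0, 0, 0, 0, 0);
    to_pw();
    for (;;)
        sys$hiber();
}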
    
    	The program that wants to communicate with the scheduler $ENQWs
    	a request for the lock in EX mode.  Once granted, it fills in
    	the value block, and converts the lock to NL mode.
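
    	And the writer's side, under the same assumptions (.4 below
    	points out a race in this naive version):

#include <string.h>
#include <descrip.h>
#include <lckdef.h>
#include <starlet.h>

struct lksb {
    unsigned short status;
    unsigned short reserved;
    unsigned int   lock_id;
    char           valblk[16];
};

/* Send 16 bytes to the scheduler: take the doorbell resource EX
   (waiting is fine here), fill in the value block, and drop to NL;
   the down-conversion stores the value block for the scheduler to
   read.  The final $DEQ is cleanup, not part of the protocol above. */
int send_to_scheduler(const char msg[16])
{
    static $DESCRIPTOR(resnam, "SCHED$DOORBELL");
    struct lksb lksb;
    int status;

    status = sys$enqw(0, LCK$K_EXMODE, &lksb, LCK$M_VALBLK,
                      &resnam, 0, 0, 0, 0, 0, 0, 0);
    if (!(status & 1)) return status;

    memcpy(lksb.valblk, msg, 16);
    status = sys$enqw(0, LCK$K_NLMODE, &lksb,
                      LCK$M_CONVERT | LCK$M_VALBLK,
                      0, 0, 0, 0, 0, 0, 0, 0);
    sys$deq(lksb.lock_id, 0, 0, 0);
    return status;
}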

    Hope this has been clear.
    
    				JENelson

331.3. "will try it, thanks" by RUMOR::FALEK (The TU58 King) Thu Oct 09 1986 16:27 (4 lines)
    Thanks, I'll write a little test program to try this.
    
    (I also entered this as note 1731 in VMSnotes and got some response
    there; it says essentially the same thing as .2.)

331.4. by CLT::GILBERT (eager like a child) Thu Oct 09 1986 21:37 (31 lines)
Yes, but what if...

	What if two processes simultaneously want to communicate with
	the scheduler?  Processes A and B $ENQW EX mode lock requests.
	The scheduler's DOORBELL routine runs, converts the lock to
	NL mode, and $ENQs a request to get the lock back in EX mode
	(your note had this last one as '$ENQW', which is incorrect).
	Now process A is granted the EX lock, fills the value block,
	and converts to NL mode.  Process B gets the lock, fills the
	value block (thereby trashing what process A wrote there), and
	converts to NL mode.  Then the scheduler is granted its EX lock.

	Note that the scheduler receives only one of the two messages.


	Instead, a process that wants to communicate with the scheduler
	should $ENQ (it's okay to wait here, if you like) an EX mode
	request.  When it's granted, check whether the value block already
	contains a message.  While there's a message in the value block,
	convert to NL mode, and $ENQ another EX mode conversion request.
	When there isn't a message in the value block, fill in the value
	block, and convert the lock to NL mode.

	And before the scheduler's DOORBELL routine converts the lock
	to NL mode, it should clear the value block to indicate that
	it contains no message.


	Instead of clearing the value block, you could use the flags (to
	the $ENQ service) and a lock-status-block status of SS$_VALNOTVALID
	to indicate whether the value block contains a message.
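
	A sketch of the corrected writer, under the same assumptions as
	the sketches in .2 (an all-zero value block is taken to mean
	"no message", per the clear-before-downgrade convention above):

#include <string.h>
#include <descrip.h>
#include <lckdef.h>
#include <starlet.h>

struct lksb {
    unsigned short status;
    unsigned short reserved;
    unsigned int   lock_id;
    char           valblk[16];
};

static int value_block_clear(const char *vb)
{
    int i;
    for (i = 0; i < 16; i++)
        if (vb[i] != 0)
            return 0;
    return 1;
}

/* Writer: once granted EX, check whether the value block still holds
   an undelivered message.  If so, step aside (NL) and get back in
   line for EX; only write when the block is clear. */
int send_to_scheduler(const char msg[16])
{
    static $DESCRIPTOR(resnam, "SCHED$DOORBELL");
    struct lksb lksb;
    int status;

    status = sys$enqw(0, LCK$K_EXMODE, &lksb, LCK$M_VALBLK,
                      &resnam, 0, 0, 0, 0, 0, 0, 0);
    if (!(status & 1)) return status;

    while (!value_block_clear(lksb.valblk)) {
        /* A previous message is still pending; let the scheduler in,
           then request EX again. */
        sys$enqw(0, LCK$K_NLMODE, &lksb, LCK$M_CONVERT | LCK$M_VALBLK,
                 0, 0, 0, 0, 0, 0, 0, 0);
        status = sys$enqw(0, LCK$K_EXMODE, &lksb,
                          LCK$M_CONVERT | LCK$M_VALBLK,
                          0, 0, 0, 0, 0, 0, 0, 0);
        if (!(status & 1)) return status;
    }

    memcpy(lksb.valblk, msg, 16);
    status = sys$enqw(0, LCK$K_NLMODE, &lksb,
                      LCK$M_CONVERT | LCK$M_VALBLK,
                      0, 0, 0, 0, 0, 0, 0, 0);
    sys$deq(lksb.lock_id, 0, 0, 0);
    return status;
}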

331.5. "DECnet is not so bad" by TAV02::NITSAN (Nitsan Duvdevani, Digital Israel) Thu Oct 16 1986 07:29 (8 lines)
re .0

> I don't want the overhead of setting up DECnet links to the node the
> scheduler is running on.

In a small "benchmark" we ran about a year ago (on a small cluster),
DECnet communication (using the CI) was more efficient than the distributed
lock manager.

331.6. "DECnet links don't HAVE to be slow" by CRATE::COBB (Danny Cobb, DSS Eng, LKG) Mon Oct 20 1986 15:18 (6 lines)
    Lou, instead of creating/deleting processes for logical links,
    write your own program that declares itself a DECnet object and
    handles the incoming links.  It's fast, the multithreading isn't
    too tough to handle, and your connects are practically instantaneous.
    
    Danny

331.7. "Lock manager mechanism works great!" by RUMOR::FALEK (ex-TU58 King) Wed Oct 22 1986 16:51 (14 lines)
    Re: .6 (having the server declare itself as a network object and
    handle incoming DECnet connects) - that would certainly work, and has
    the advantage (not relevant for this particular application) of also
    working on a wide-area network. But... I've coded the mechanism
    described in .2 and .4 using $ENQ and it works great!! Connections
    appear to the user as being nearly instantaneous. I don't think
    DECnet connects to a server can work as quickly.

    The distributed lock manager is neat stuff! The first time I read
    the documentation it seemed confusing, but it is simple to use once
    you get the concepts down. Our group is putting all our workstations
    on a LAVC, so maybe I can use some of my new-found knowledge to make
    some useful cluster utilities, like f'rinstance something to cause
    execution of a VMS command on all nodes of a cluster at once.

331.8. "How can I pass a message to all nodes of cluster?" by FALEK::FALEK (ex-TU58 King) Sun Nov 16 1986 02:57 (72 lines)
    I now have my job-creating scheduler program's user-interface working
    cluster-wide, using mechanisms discussed earlier in this note.
    Users can talk to the scheduler from any node, but jobs get created
    only on the node the scheduler is running on. There can be only
    one copy of the scheduler running per cluster.
    
    It would be useful to generalize things further. I'd like to add
    a "NODE" field to the (common diskfile) database and run a scheduler
    on EACH node of the cluster. If the node field in the database
    is blank, the job may be run on any CPU in the cluster, otherwise
    just on the specified CPU.
    
    I came up with a lock-based communications mechanism to help implement
    this, but when I tried it out, I found that my design is wrong. I'm
    going to describe it here (and why it doesn't work) in hopes that
    readers of this note may be able to offer hints that may help me to
    get around the problem.
    
    My design has one of the schedulers be "master", and all others
    slaves. The master is the one that holds the "KO" lock in exclusive
    mode. Since a scheduler never gives this up once it gets it, the
    master is the first scheduler started in the cluster. If the node
    crashes or someone kills the master process,  one of the slaves
    will get the KO_LOCK and the KO_AST that goes along with the lock
    will make it the new master. The KO_AST sets a bit in the scheduler
    that tells it that it is the master, and also sets up the
    user-interface lock mechanism described in earlier responses to
    this note. The user-interface always talks to the master scheduler.
    (This stuff, the passing of master-ship and the user-interface always
    talking to the master from any node, works correctly.)
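
    For concreteness, the mastership piece might look roughly like this
    (same hedged C assumptions as the sketches in .2; only the resource
    name KO_LOCK comes from the design above, the rest is illustrative):

#include <descrip.h>
#include <lckdef.h>
#include <starlet.h>

struct lksb {
    unsigned short status;
    unsigned short reserved;
    unsigned int   lock_id;
    char           valblk[16];
};

static struct lksb ko_lksb;
static int i_am_master = 0;

/* KO_AST: delivered only when our EX request for the KO lock is
   finally granted - i.e. when the previous master's lock has gone
   away because the process was killed or its node crashed. */
static void ko_ast(int unused)
{
    i_am_master = 1;
    /* ... set up the user-interface doorbell lock from .2/.4 ... */
}

/* At scheduler startup on every node: queue for the KO lock in EX
   mode and don't wait.  The first scheduler started in the cluster
   is granted it at once and becomes master; the rest sit in the
   queue until the master goes away. */
void request_mastership(void)
{
    static $DESCRIPTOR(resnam, "KO_LOCK");
    sys$enq(0, LCK$K_EXMODE, &ko_lksb, 0,
            &resnam, 0, ko_ast, 0, 0, 0, 0, 0);
}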
    
    Now let's say that the master wants to send a message to one of the
    slaves. Messages consist of a flag-byte, a destination node, and
    a message.  My original design (which doesn't work) had each scheduler,
    at startup time, enqueue a request for the "Round Robin" (RR) lock
    in exclusive mode, specifying the "RR_AST" to be delivered when
    the EX-mode request is granted.

    When the scheduler acting as master gets the EX-mode RR-lock, it keeps
    it until it has a message to circulate to all the slaves. To circulate
    a message, it puts the message in the value block, downgrades the lock
    to NL-mode, and $ENQs another request to get the lock back in EX-mode.

    When a scheduler acting as slave gets the lock, it reads the message
    in the value block, sets the flag-byte to "H" if the message was
    for it, and then downgrades the lock to NL, thereby letting the next
    guy get it. It then $ENQs a request to get the lock again in EX-mode.

    Eventually the master should get the EX-mode lock again, look at the
    flag byte to see if the message was accepted by anybody, and then keep
    the lock in EX-mode until it has another message to circulate. I
    reasoned that when the master downgrades the lock and then requests
    the upgrade, the conversion request goes at the end of the queue and
    so all slaves should get a crack at it before the master gets it again.
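
    The circulation step of this (as it turns out, flawed) design, as
    another rough C sketch under the same assumptions as before; the
    value-block layout and the node test are illustrative:

#include <string.h>
#include <descrip.h>
#include <lckdef.h>
#include <starlet.h>

struct lksb {
    unsigned short status;
    unsigned short reserved;
    unsigned int   lock_id;
    char           valblk[16];   /* flag byte, dest node, message */
};

static struct lksb rr_lksb;
static $DESCRIPTOR(rr_name, "RR_LOCK");

#define MY_NODE 2                /* illustrative node id */

static void rr_ast(int unused);

/* Every scheduler at startup: queue for the RR lock in EX mode, with
   RR_AST delivered when the request is granted. */
void join_ring(void)
{
    sys$enq(0, LCK$K_EXMODE, &rr_lksb, LCK$M_VALBLK,
            &rr_name, 0, rr_ast, 0, 0, 0, 0, 0);
}

/* Master with a message to circulate: fill the value block, downgrade
   to NL (which stores the value block), and get back in line for EX. */
void circulate(const char msg[16])
{
    memcpy(rr_lksb.valblk, msg, 16);
    sys$enqw(0, LCK$K_NLMODE, &rr_lksb, LCK$M_CONVERT | LCK$M_VALBLK,
             0, 0, 0, 0, 0, 0, 0, 0);
    sys$enq(0, LCK$K_EXMODE, &rr_lksb, LCK$M_CONVERT | LCK$M_VALBLK,
            0, 0, rr_ast, 0, 0, 0, 0, 0);
}

/* A slave's RR_AST: read the message, mark the flag byte if it was
   for us, pass the lock on (NL), and get back in line.  The master's
   version instead checks the flag byte and then holds the lock in EX
   until it has the next message to circulate. */
static void rr_ast(int unused)
{
    if (rr_lksb.valblk[1] == MY_NODE)
        rr_lksb.valblk[0] = 'H';
    sys$enqw(0, LCK$K_NLMODE, &rr_lksb, LCK$M_CONVERT | LCK$M_VALBLK,
             0, 0, 0, 0, 0, 0, 0, 0);
    sys$enq(0, LCK$K_EXMODE, &rr_lksb, LCK$M_CONVERT | LCK$M_VALBLK,
            0, 0, rr_ast, 0, 0, 0, 0, 0);
}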

    Unfortunately, my mechanism only works for up to 1 slave.
    The problem is that new slaves can never get the lock for the first
    time (even if they first request it in NL mode and try to upgrade)
    because apparently the lock CONVERSION queue must be empty before any
    NEW locks can ever be granted. Since in my scheme either the first slave
    or the master will always have an outstanding request for a conversion
    of the RR lock to EX-mode, no new slaves can ever acquire the lock for the
    first time.  

    Is this analysis correct?  I've gotta believe VMS must do this sort of
    thing all the time, so there must be some way to do it!

    Can anybody think of a scheme whereby the master can pass a message to
    the servers on ALL nodes of the cluster?

			lou falek
    

331.9. "see also CSSE32::CLUSTER" by FALEK::FALEK (ex-TU58 King) Fri Nov 21 1986 02:09 (6 lines)
    I posted my question about using the lock manager to send a message
    to all nodes of a cluster to CSSE32::CLUSTER (note 302) and got many
    useful suggestions. There are pitfalls one can fall into, because the
    rules about which conversions can and can't be blocked by what are
    complex and the documentation doesn't make them all that clear.
    But what I want can be made to work.