From: briggs@eisner.decus.org
Sent: Wednesday, February 16, 2000 10:13 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: 2 Node cluster quorum question

Organization: DECUServe
Lines: 79

In article <38A9D704.F34A702A@vl.videotron.ca>, JF Mezei <jfmezei.spamnot@vl.videotron.ca> writes:
> Jan Vorbrueggen wrote:
>> half return later, the connection manager notices that the sequence number
>> doesn't match, so it will tell the half attempting to re-join the cluster
>> to commit suicide. Thus, the potential writers on node B never get a chance
>> to complete their I/O after the state transition has completed.
> 
> So, a satellite node, when it reconnects, will be forced to reboot unless it
> can reconnect quickly enough before it is forgotten by the remaining cluster.
> 
> I find it rather heartless from the VMS engineers that the only solution they
> could find to this issue was suicide induced by the rest of the cluster. How
> would you like it if, after struggling very hard to re-establish a connection
> with your friends, your friends would respond "we don't know you anymore, go
> and kill yourself ?"

Please propose lock manager semantics that allow the system to forcibly
release locks that an application thinks it holds.  How will the
application be notified?  How will the application respond?  Does this
make your life as an application programmer easier or more difficult?
Can the existing lock manager system service calls support the proposed
semantics?  What happens to backwards compatibility if they cannot?

Alternatively, please propose lock manager semantics that will allow
inconsistent locks to be held across nodes in a cluster.  How should
applications deal with the possibility that an exclusive lock may not
guarantee exclusive access?  How can the system make life easier on
these applications?  Will this require changes in the system service
call interface?  What happens to backwards compatibility?

If you bring a satellite node back into a cluster after it has been
removed then you have the potential for incompatible locks being held
by the satellite and the survivors.  Without a way to either release
the incompatible locks or live with them, the satellite node cannot be
allowed back in.
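
To make "incompatible locks" concrete, here is a minimal sketch in
Python (not anything the lock manager actually runs).  The table is the
usual compatibility matrix for the six lock modes; the function name,
the dictionary layout and the resource names are all invented for
illustration.

```python
# Which granted modes may coexist on one resource (six standard modes).
COMPATIBLE = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def merge_conflicts(survivor_locks, satellite_locks):
    """List (resource, survivor_mode, satellite_mode) for clashing locks.

    Each argument maps a resource name to the list of modes granted on
    that side of the partition (hypothetical layout).
    """
    conflicts = []
    for resource, sat_modes in satellite_locks.items():
        for sur_mode in survivor_locks.get(resource, []):
            for sat_mode in sat_modes:
                if sat_mode not in COMPATIBLE[sur_mode]:
                    conflicts.append((resource, sur_mode, sat_mode))
    return conflicts


# Hypothetical state after a partition: both sides granted EX on the
# same resource, which can never happen inside a single cluster.
survivors = {"LEDGER_RECORD_42": ["EX"], "CACHE_BLOCK_17": ["PR"]}
satellite = {"LEDGER_RECORD_42": ["EX"], "CACHE_BLOCK_17": ["PR"]}

print(merge_conflicts(survivors, satellite))
# [('LEDGER_RECORD_42', 'EX', 'EX')]
```

Detecting the clash is the easy part.  The sketch says nothing about
which holder should lose its lock or how it would be told, and those
are exactly the questions asked above.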

A CLUEXIT bugcheck is probably the least intrusive way to deal with
the problem.  It is certainly nicer than any other resolution I can
think of.  At least with a bugcheck, your applications get restarted
automatically.

An inconsistent lock database problem at cluster merge time is only
the tip of the iceberg.  There are other issues.  For example:

Consider the case of a cache consistency lock.  All nodes hold the
lock in protected read mode, guaranteeing that their cache is consistent.
The satellite drops off line and is removed from the cluster.  A writer
on the surviving members acquires the lock in protected write mode,
ringing a doorbell AST on all the readers who then release their lock,
invalidate their cache and reacquire the lock.  The writer updates
backing store and downgrades its lock.  Now the satellite comes on line.
If we allow it to join the cluster it will have an intact protected
read lock and an invalid cache entry whose consistency should have
been assured by that lock.  Note that at cluster merge time the lock
database would have been perfectly consistent.  During the cluster
partition time there was a point at which the partitioned lock manager
database was inconsistent.  But we've got no good way of knowing that.
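
The sequence of events above can be sketched as a toy simulation in
Python.  Everything here (the Node class, the write function, the node
names) is made up for illustration; it models only the bookkeeping, not
real $ENQ calls or ASTs.

```python
class Node:
    """A cluster member caching one block under a PR lock."""

    def __init__(self, name):
        self.name = name
        self.lock_mode = "PR"   # the lock this node believes it holds
        self.cache = None       # cached block, None when invalidated
        self.reachable = True   # False while partitioned away

    def read(self, store):
        # A valid cache entry is trusted because the PR lock is held.
        if self.cache is None:
            self.cache = store["block"]
        return self.cache


def write(store, members, value):
    """Writer takes PW; the doorbell rings only on reachable members."""
    for node in members:
        if node.reachable:
            node.cache = None   # release PR, invalidate, reacquire PR
    store["block"] = value      # update backing store, then downgrade


store = {"block": "old"}
survivor, satellite = Node("SURVIV"), Node("SATELL")
members = [survivor, satellite]
for node in members:
    node.read(store)            # both caches now hold "old"

satellite.reachable = False     # satellite drops off and is removed
write(store, members, "new")    # doorbell never reaches the satellite
satellite.reachable = True      # satellite comes back on line

print(survivor.read(store))     # new
print(satellite.read(store))    # old
print(survivor.lock_mode, satellite.lock_mode)  # PR PR
```

At the end both nodes hold PR, which is a perfectly compatible pair, so
nothing in the lock state reveals that the satellite's cache entry is
stale.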

There are a number of guarantees in the existing cluster model that
are so obvious that we assume them without thinking about them.

Guarantee:  A node either is a cluster member or is not.
Guarantee:  All nodes in the cluster know the complete set of other
	    nodes that comprise the cluster.
Guarantee:  All nodes in the cluster have connectivity with all
	    other nodes in the cluster.
Guarantee:  Locks have cluster scope.

You would have us break one or more of these guarantees.
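
As one illustration, consider the second guarantee.  The Python sketch
below assumes each node carries its own idea of the membership list
(the views dictionary and the node names are hypothetical) and reports
every pair of nodes that disagree.

```python
def membership_conflicts(views):
    """Return pairs of nodes whose ideas of cluster membership differ."""
    names = sorted(views)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if views[a] != views[b]
    ]


# Hypothetical state if a removed satellite were simply let back in:
# the survivors have forgotten it, but it still thinks it is a member.
views = {
    "NODEA": {"NODEA", "NODEB"},
    "NODEB": {"NODEA", "NODEB"},
    "SATELL": {"NODEA", "NODEB", "SATELL"},
}

print(membership_conflicts(views))
# [('NODEA', 'SATELL'), ('NODEB', 'SATELL')]
```

In the existing model this list is always empty, because a node whose
view has diverged is not a member at all.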

> I think that there should be pressure put on the engineers to find a more
> humane solution to this....

Before we pressure the engineers to find a solution, there should be
some indication that a solution exists.  And there should be some
idea of what shape the solution should take.  What penalties are
you willing to accept in order to achieve the benefits you seek?

	John Briggs			briggs@eisner.decus.org