From: briggs@eisner.decus.org
Sent: Wednesday, February 16, 2000 10:13 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: 2 Node cluster quorum question
Organization: DECUServe
Lines: 79

In article <38A9D704.F34A702A@vl.videotron.ca>, JF Mezei writes:
> Jan Vorbrueggen wrote:
>> half return later, the connection manager notices that the sequence number
>> doesn't match, so it will tell the half attempting to re-join the cluster
>> to commit suicide. Thus, the potential writers on node B never get a chance
>> to complete their I/O after the state transition has completed.
>
> So, a satellite node, when it reconnects, will be forced to reboot unless it
> can reconnect quickly enough before it is forgotten by the remaining cluster.
>
> I find it rather heartless from the VMS engineers that the only solution they
> could find to this issue was suicide induced by the rest of the cluster. How
> would you like it if, after struggling very hard to re-establish a connection
> with your friends, your friends would respond "we don't know you anymore, go
> and kill yourself ?"

Please propose lock manager semantics that allow the system to forcibly
release locks that an application thinks it holds. How will the
application be notified? How will the application respond? Does this
make your life as an application programmer easier or more difficult?
Can the existing lock manager system service calls support the proposed
semantics? What happens to backwards compatibility if they cannot?

Alternately, please propose lock manager semantics that will allow
inconsistent locks to be held across nodes in a cluster. How should
applications deal with the possibility that an exclusive lock may not
guarantee exclusive access? How can the system make life easier on
these applications? Will this require changes in the system service
call interface? What happens to backwards compatibility?

If you bring a satellite node back into a cluster after it has been
removed then you have the potential for incompatible locks being held
by the satellite and the survivors. Without a way to either release
the incompatible locks or live with them, the satellite node cannot be
allowed back in.

A CLUEXIT bugcheck is probably the least intrusive way to deal with the
problem. It is certainly nicer than any other resolution I can think
of. At least with a bugcheck, your applications get restarted
automatically.

An inconsistent lock database problem at cluster merge time is only the
tip of the iceberg. There are other issues. For example:

   Consider the case of a cache consistency lock. All nodes hold the
   lock in protected read mode, guaranteeing that their cache is
   consistent. The satellite drops off line and is removed from the
   cluster. A writer on the surviving members acquires the lock in
   protected write mode, ringing a doorbell AST on all the readers who
   then release their lock, invalidate their cache and reacquire the
   lock. The writer updates backing store and downgrades its lock.

   Now the satellite comes on line. If we allow it to join the cluster
   it will have an intact protected read lock and an invalid cache
   entry whose consistency should have been assured by that lock.

   Note that at cluster merge time the lock database would have been
   perfectly consistent. During the cluster partition time there was a
   point at which the partitioned lock manager database was
   inconsistent. But we've got no good way of knowing that.

There are a number of guarantees in the existing cluster model that are
so obvious that we assume them without thinking about it.

   Guarantee: A node either is a cluster member or is not.

   Guarantee: All nodes in the cluster know the complete set of other
              nodes that comprise the cluster.

   Guarantee: All nodes in the cluster have connectivity with all other
              nodes in the cluster.

   Guarantee: Locks have cluster scope.

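The cache consistency example above can be sketched as a toy simulation.
This is only an illustration in Python under stated assumptions: the
Node class, its doorbell method and the write helper are all made up
for this sketch and have nothing to do with the real lock manager
system services. A "lock" here is just a flag each node keeps locally,
which is exactly the state a removed satellite would bring back with it.

```python
# Toy model of the cache consistency scenario. Every name here is
# hypothetical; this is NOT the VMS lock manager interface.

class Node:
    def __init__(self, name, store):
        self.name = name
        self.store = store       # shared backing store (a plain dict)
        self.cache = None        # cached copy of store["value"]
        self.holds_pr = False    # node believes it holds the lock in PR mode

    def acquire_pr_and_fill(self):
        # Take the lock in protected read mode and load the cache.
        self.holds_pr = True
        self.cache = self.store["value"]

    def doorbell(self):
        # Stand-in for the doorbell AST: release the lock, drop the cache.
        self.holds_pr = False
        self.cache = None

    def read(self):
        # The application trusts its cache whenever it holds the PR lock.
        if not self.holds_pr:
            self.acquire_pr_and_fill()
        return self.cache


def write(members, store, value):
    # The writer asks for protected write mode. Doorbells are delivered
    # only to nodes that are cluster members at that moment.
    for node in members:
        node.doorbell()
    store["value"] = value
    # The writer then downgrades; readers reacquire on their next read.


store = {"value": "old"}
a, b, sat = Node("A", store), Node("B", store), Node("SAT", store)
members = [a, b, sat]
for n in members:
    n.acquire_pr_and_fill()

members.remove(sat)           # satellite drops off and is removed
write(members, store, "new")  # survivors are notified, satellite is not

members.append(sat)           # hypothetically allow it back in, state intact
results = {n.name: n.read() for n in members}
print(results)                # {'A': 'new', 'B': 'new', 'SAT': 'old'}
```

The survivors refill their caches from backing store and see the new
value. The satellite still returns the old one while holding what it
believes is a perfectly good protected read lock. That is "locks have
cluster scope" being broken in miniature.
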
You would have us break one or more of these guarantees.

> I think that there should be pressure put on the engineers to find a more
> humane solution to this....

Before we pressure the engineers to find a solution, there should be
some indication that a solution exists. And there should be some idea
of what shape the solution should take. What penalties are you willing
to accept in order to achieve the benefits you seek?

	John Briggs			briggs@eisner.decus.org