From: Keith Parris [keithparris_NOSPAM@yahoo.com]
Sent: Wednesday, February 16, 2005 11:30 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: Data replication / disaster tolerance

JF Mezei wrote:
> Or you can have one node with 2 votes and one with 1.
> 
>  A 50% chance that a failure will result in uninterrupted service (i.e.
> if the earthquake only strikes the side with the 1-vote node). And if the
> 2-vote node goes down, you can just reboot the other node into SYSBOOT>,
> change its votes to 3, and then continue the boot.

The problem with this approach is that it arbitrarily predetermines 
which of the two sites will continue on any failure that affects 
inter-site connectivity, even if that failure adversely affects the 
site with more votes. If the "wrong" site continues automatically, 
transactions it subsequently processes can end up being lost, in a 
scenario known as "Creeping Doom". See my presentations on 
disaster tolerance at http://www2.openvms.org/kparris/ for more details.

For this reason, I recommend that balanced votes be used in 2-site VMS 
DT clusters.
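To make the vote arithmetic concrete, here is a small Python model (a sketch, not VMS code). It assumes the usual OpenVMS rule quorum = (EXPECTED_VOTES + 2) / 2, truncated, and that a partition keeps running only if its votes reach quorum. The helpers `quorum` and `survivors` and the site names are hypothetical, for illustration only.

```python
# Minimal model of OpenVMS-style quorum arithmetic (illustrative sketch).
# Assumption: quorum = (EXPECTED_VOTES + 2) // 2, i.e. a strict majority.


def quorum(expected_votes: int) -> int:
    """Votes a partition needs in order to keep running."""
    return (expected_votes + 2) // 2


def survivors(site_votes: dict) -> list:
    """Hypothetical helper: the sites that still have quorum if the
    inter-site links fail and each site is left on its own."""
    expected = sum(site_votes.values())
    q = quorum(expected)
    return [site for site, votes in site_votes.items() if votes >= q]


# Unbalanced votes: site A always continues, whatever actually failed.
print(survivors({"A": 2, "B": 1}))  # ['A']
# Balanced votes: neither site continues on its own, so a person decides.
print(survivors({"A": 1, "B": 1}))  # []
```

With unbalanced votes the outcome is fixed in advance, which is exactly what allows the "wrong" site to carry on. With balanced votes both sites pause, and the choice of which one resumes is left to someone who knows what actually happened.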

> I think that on VAXes, you can type <CTRL-P> at the console to get a
> chance at readjusting quorum on a node that is currently hung due to
> loss of quorum.  Does this work on Alphas as well ?

It is possible to do this on Alpha systems. It's documented in the HP 
OpenVMS System Manager's Manual, Volume 1: Essentials --> Managing 
Storage Media --> Using Interrupt Priority Level C (IPC), at 
http://h71000.www7.hp.com/doc/732FINAL/aa-pv5mh-tk/aa-pv5mh-tk.HTMl 
The command is >>> D SIRR C, which deposits the value C (hexadecimal, 
i.e. 12 decimal, for IPL 12) into the Software Interrupt Request 
Register; you then enter IPC> Q to adjust quorum. However, due to some 
timeouts, this never worked reliably on SMP systems, so coming in 
through RMDRIVER (using Availability Manager, DECamds, or DTCS) to 
initiate the quorum adjustment is the approach most favored today.
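A rough Python model of what a manual quorum adjustment accomplishes follows. This is a sketch under an assumption: the adjustment (IPC> Q, or the equivalent fix issued through Availability Manager or DECamds) recomputes quorum from the votes currently present, using the same (votes + 2) / 2 rule. The helper names are hypothetical.

```python
# Rough model of a manual quorum adjustment (illustrative sketch).
# Assumption: the adjustment recomputes quorum from the votes present now,
# using the same (votes + 2) // 2 rule as the normal quorum calculation.


def quorum(expected_votes: int) -> int:
    """Votes needed to keep running, given the expected vote total."""
    return (expected_votes + 2) // 2


def has_quorum(votes_present: int, current_quorum: int) -> bool:
    """True if the votes present are enough for the cluster to run."""
    return votes_present >= current_quorum


def adjust_quorum(votes_present: int) -> int:
    """Hypothetical helper: the new quorum after a manual adjustment."""
    return quorum(votes_present)


# Balanced 2-site cluster with 2 votes per site; site B has been lost.
q = quorum(2 + 2)                # quorum is 3
present = 2                      # only site A's votes remain
print(has_quorum(present, q))    # False: site A hangs, waiting for quorum
q = adjust_quorum(present)       # quorum recomputed to 2
print(has_quorum(present, q))    # True: site A resumes processing
```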