From: Keith Parris [keithparris_NOSPAM@yahoo.com]
Sent: Wednesday, February 16, 2005 11:30 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: Data replication / disaster tolerance

JF Mezei wrote:
> Or you can have one node with 2 votes and one with 1.
>
> 50% of chances that failure will result in uninterrupted service. (i.e.
> if the earthquake only strikes the side with the 1 vote node.) And if the
> 2 vote node goes down, you can just reboot the other node into SYSBOOT>
> change its votes to 3 and then continue the boot.

The problem with this approach is that it arbitrarily predetermines which of the two sites will continue after any failure that affects inter-site connectivity, even if that failure also adversely affects the site with more votes. If the "wrong" site continues automatically, transactions it subsequently processes can end up being lost, a scenario known as "Creeping Doom". See my presentations on disaster tolerance at http://www2.openvms.org/kparris/ for more details.

For this reason, I recommend that balanced votes be used in 2-site VMS DT clusters.

> I think that on VAXes, you can type at the console to get a
> chance at readjusting quorum on a node that is currently hung due to
> loss of quorum. Does this work on Alphas as well ?

It is possible to do this on Alpha systems. It's documented in the HP OpenVMS System Manager's Manual, Volume 1: Essentials --> Managing Storage Media --> Using Interrupt Priority Level C (IPC), at
http://h71000.www7.hp.com/doc/732FINAL/aa-pv5mh-tk/aa-pv5mh-tk.HTMl

The command is

    >>> D SIRR C

to deposit a value of 12 (decimal), for IPL 12, into the Software Interrupt Request Register, and then

    IPC> Q

to adjust quorum. But due to some timeouts, it never worked reliably in SMP systems, so coming in through RMDRIVER (using Availability Manager, DECamds, or DTCS) to initiate the quorum adjustment is the approach most favored today.
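
To make the vote arithmetic concrete, here is a minimal Python sketch of which site keeps running when the inter-site link fails. It relies on the standard OpenVMS quorum formula, (EXPECTED_VOTES + 2) / 2 with the fraction truncated. The function names are hypothetical, and the sketch assumes that EXPECTED_VOTES equals the sum of all node votes and that there is no quorum disk; it is an illustration, not a model of the connection manager.

```python
def quorum(expected_votes: int) -> int:
    """OpenVMS quorum formula: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2


def surviving_sites(site_votes: dict[str, int]) -> list[str]:
    """Return the sites that still hold quorum after the inter-site link fails.

    Assumption: EXPECTED_VOTES is the sum of all votes, and each site can
    only see its own votes once the link is down.
    """
    q = quorum(sum(site_votes.values()))
    return [site for site, votes in site_votes.items() if votes >= q]


# Unbalanced votes (2 + 1): site A always continues, whatever the failure was.
print(surviving_sites({"A": 2, "B": 1}))  # ['A']

# Balanced votes (1 + 1): neither site continues automatically.
print(surviving_sites({"A": 1, "B": 1}))  # []
```

With balanced votes both sites hang on loss of quorum, so a person (or tool) gets to decide which site should continue before any further transactions are processed.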