This appendix discusses the following topics:
Configuring a NetRAIN virtual interface for a cluster interconnect (Section D.1)
Tuning the LAN interconnect for optimal performance (Section D.2)
Obtaining network adapter configuration information (Section D.3)
Monitoring activity on the LAN interconnect (Section D.4)
Migrating from Memory Channel to a LAN interconnect (Section D.5)
Migrating from a LAN interconnect to Memory Channel (Section D.6)
Migrating from a Fast Ethernet LAN interconnect to a Gigabit Ethernet LAN interconnect (Section D.7)
Troubleshooting LAN interconnect problems (Section D.8)
D.1 Configuring a NetRAIN Virtual Interface for a Cluster LAN Interconnect
If you do not configure the cluster interconnect from redundant array of independent network adapters (NetRAIN) virtual interfaces during cluster installation, you can do so afterwards. However, the requirements and rules for configuring a NetRAIN virtual interface for use in a cluster interconnect differ from those documented in the Tru64 UNIX Network Administration: Connections manual.
Unlike a typical NetRAIN virtual device, a NetRAIN device for the cluster
interconnect is set up completely within the
ics_ll_tcp
kernel subsystem in
/etc/sysconfigtab
and not in
/etc/rc.config.
This allows the interconnect to be
established very early in the boot path, when it is needed by cluster
components to establish membership and transfer I/O.
Caution
Never change the attributes of a member's cluster interconnect NetRAIN device outside of its
/etc/sysconfigtab file (that is, by using an ifconfig command or the SysMan Station, or by defining it in the /etc/rc.config file and restarting the network). Doing so will put the NetRAIN device outside of cluster control and may cause the member system to be removed from the cluster.
To configure a NetRAIN interface for a cluster interconnect after cluster installation, perform the following steps on each member:
To eliminate the LAN interconnect as a single point of failure, one or more Ethernet switches are required for the cluster interconnect (two are required for a no-single-point-of-failure (NSPOF) LAN interconnect configuration), in addition to redundant Ethernet adapters on the member configured as a NetRAIN set. If you must install additional network hardware, halt and turn off the member system. Install the network cards on the member and cable each to different switches, as recommended in the Cluster Hardware Configuration manual. Turn on the switches and reboot the member. If you do not need to install additional hardware, you can skip this step.
Use the
ifconfig -a
command to determine the names of
the Ethernet adapters to be used in the NetRAIN set.
If you intend to configure an existing NetRAIN set for a cluster interconnect (for example, one previously configured for an external network), you must first undo its current configuration:
Use the
rcmgr delete
command to delete the
NRDEV_x,
NRCONFIG_x,
NETDEV_x,
and
IFCONFIG_x
variables associated with the device from the member's
/etc/rc.config
file.
Use the
rcmgr set
command to decrement the
NR_DEVICES
and
NUM_NETCONFIG
variables.
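For example, if the set being dismantled was defined by NetRAIN entry 0 and network configuration entry 2 in /etc/rc.config, the cleanup might look like the following. The variable indices and the resulting NR_DEVICES and NUM_NETCONFIG values shown here are hypothetical; they must match the contents of your member's actual /etc/rc.config file:
# rcmgr delete NRDEV_0
# rcmgr delete NRCONFIG_0
# rcmgr delete NETDEV_2
# rcmgr delete IFCONFIG_2
# rcmgr set NR_DEVICES 0
# rcmgr set NUM_NETCONFIG 2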
Edit the
/etc/sysconfigtab
file to add the
new adapter.
For example, change:
ics_ll_tcp:
        ics_tcp_adapter0 = alt0
to:
ics_ll_tcp:
        ics_tcp_adapter0 = nr0
        ics_tcp_nr0[0] = alt0
        ics_tcp_nr0[1] = alt1
Reboot the member. The member is now using the NetRAIN virtual interface as its physical cluster interconnect.
Use the
ifconfig
command to show the NetRAIN device defined with the
CLUIF
flag.
For example:
# ifconfig nr0
nr0: flags=1000c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX,CLUIF>
NetRAIN Attached Interfaces: ( alt0 alt1 ) Active Interface: ( alt0 )
inet 10.1.0.2 netmask ffffff00 broadcast 10.1.0.255 ipmtu 1500
Repeat this procedure for each remaining member.
D.2 Tuning the LAN Interconnect
This section provides guidelines for tuning the LAN interconnect.
Caution
Do not tune a NetRAIN virtual interface that is used for a cluster interconnect with the mechanisms used for other NetRAIN devices (including
ifconfig, niffconfig, and niffd command options, or netrain or ics_ll_tcp kernel subsystem attributes). Doing so is likely to disrupt cluster operation. The cluster software ensures that the NetRAIN device for the cluster interconnect is tuned for optimal cluster operation.
D.2.1 Improving Cluster Interconnect Performance by Setting Its ipmtu Value
Some applications may receive some performance benefit if you
set the IP Maximum Transfer Unit
(MTU) for the cluster interconnect virtual
interface (ics0) on each member to the same value
used by its physical interface
(membern-tcp0).
The recommended value depends on the type of cluster interconnect in use.
For Fast Ethernet or Gigabit Ethernet, set the
ipmtu
value to 1500.
Note
The LAN interconnect cannot be configured to take advantage of the performance characteristics of the larger MTU sizes (jumbo frames) offered by Gigabit Ethernet.
For Memory Channel, set the
ipmtu
value to 7000.
To view the current
ipmtu
settings for the virtual and
physical cluster interconnect devices, use the following command:
# ifconfig -a
ee0: flags=1000c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX,CLUIF>
inet 10.1.0.100 netmask ffffff00 broadcast 10.1.0.255 ipmtu 1500
ics0: flags=1100063<UP,BROADCAST,NOTRAILERS,RUNNING,NOCHECKSUM,CLUIF>
inet 10.0.0.1 netmask ffffff00 broadcast 10.0.0.255 ipmtu 7000
Because this cluster member is using the
ee0
Ethernet
device for its physical cluster interconnect device, change the
ipmtu
for its virtual cluster interconnect device
(ics0) from 7000 to 1500.
To set the
ipmtu
value for the
ics0
virtual device, perform the following procedure:
Add the following line to the
/etc/inet.local
file on each member, supplying an
ipmtu
value (a complete example follows this procedure):
ifconfig ics0 ipmtu value
Restart the network on each member using the
rcinet
restart
command.
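For example, on a member whose physical cluster interconnect is Fast Ethernet or Gigabit Ethernet, the /etc/inet.local entry uses the recommended value of 1500:
ifconfig ics0 ipmtu 1500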
D.2.2 Set Ethernet Switch Address Aging to 15 Seconds
Ethernet switches maintain tables that associate MAC addresses (and virtual LAN (VLAN) identifiers) with ports, thus allowing the switches to efficiently forward packets. These forwarding databases (also known as unicast address tables) provide a mechanism for setting the time interval when dynamically learned forwarding information grows stale and is invalidated. This mechanism is sometimes referred to as the aging time.
For any Ethernet switch participating in a LAN interconnect, set its aging time to 15 seconds.
Failure to do so may cause the switch to erroneously continue to route packets for a given MAC address to a port listed in the forwarding table after the MAC address has moved to another port (for example, due to NetRAIN failover). This may disrupt cluster communication and result in one or more nodes being removed from the cluster. The consequence may be that one or more nodes hang due to loss of quorum, but may also result in one of several panic messages. For example:
CNX MGR: this node removed from cluster
CNX QDISK: Yielding to foreign owner
D.3 Obtaining Network Adapter Configuration Information
To display information from the datalink driver for a network adapter,
such as its name, speed, and operating mode, use the SysMan Station
or the
hwmgr
-get attr -cat network
command.
In the following example,
tu2
is the client network adapter running at
10 Mb/s in half-duplex mode and
ee0
and
ee1
are the members of a NetRAIN virtual interface configured as the
LAN interconnect and running at 100 Mb/s in full-duplex mode:
# hwmgr -get attr -cat network | grep -E 'name|speed|duplex'
  name = ee0
  media_speed = 100
  full_duplex = 1
  user_name = (null) (settable)
  name = ee1
  media_speed = 100
  full_duplex = 1
  user_name = (null) (settable)
  name = tu0
  media_speed = 10
  full_duplex = 1
  user_name = (null) (settable)
  name = tu1
  media_speed = 10
  full_duplex = 0
  user_name = (null) (settable)
  name = tu2
  media_speed = 10
  full_duplex = 0
  user_name = (null) (settable)
  name = tu3
  media_speed = 10
  full_duplex = 0
  user_name = (null) (settable)
  name = alt0
  media_speed = 1000
  full_duplex = 1
  user_name = (null) (settable)
D.4 Monitoring LAN Interconnect Activity
Use the
netstat
command to monitor the traffic across
the LAN interconnect.
For example:
# netstat -acdnots -I nr0
nr0 Ethernet counters at Mon Apr 30 14:15:15 2001
65535 seconds since last zeroed
3408205675 bytes received
4050893586 bytes sent
7013551 data blocks received
6926304 data blocks sent
7578066 multicast bytes received
115546 multicast blocks received
3182180 multicast bytes sent
51014 multicast blocks sent
0 blocks sent, initially deferred
0 blocks sent, single collision
0 blocks sent, multiple collisions
0 send failures
0 collision detect check failure
0 receive failures
0 unrecognized frame destination
0 data overruns
0 system buffer unavailable
0 user buffer unavailable
nr0: access filter is disabled
Use the
ifconfig -a
and
niffconfig -v
commands to monitor the status of the active and inactive adapters in a
NetRAIN virtual interface.
# ifconfig -a
ee0: flags=1000c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX,CLUIF>
NetRAIN Virtual Interface: nr0
NetRAIN Attached Interfaces: ( ee1 ee0 ) Active Interface: ( ee1 )
ee1: flags=1000c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX,CLUIF>
NetRAIN Virtual Interface: nr0
NetRAIN Attached Interfaces: ( ee1 ee0 ) Active Interface: ( ee1 )
ics0: flags=1100063<UP,BROADCAST,NOTRAILERS,RUNNING,NOCHECKSUM,CLUIF>
inet 10.0.0.200 netmask ffffff00 broadcast 10.0.0.255 ipmtu 1500
lo0: flags=100c89<UP,LOOPBACK,NOARP,MULTICAST,SIMPLEX,NOCHECKSUM>
inet 127.0.0.1 netmask ff000000 ipmtu 4096
nr0: flags=1000c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX,CLUIF>
NetRAIN Attached Interfaces: ( ee1 ee0 ) Active Interface: ( ee1 )
inet 10.1.0.2 netmask ffffff00 broadcast 10.1.0.255 ipmtu 1500
sl0: flags=10<POINTOPOINT>
tu0: flags=c63<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST,SIMPLEX>
inet 16.140.112.176 netmask ffffff00 broadcast 16.140.112.255 ipmtu 1500
tun0: flags=80<NOARP>
# niffconfig -v
Interface: ee1, description: NetRAIN internal, status: UP, event: ALERT, state: GREEN
t1: 3, dt: 2, t2: 10, time to dead: 3, current_interval: 3, next time: 1
Interface: nr0, description: NetRAIN internal, status: UP, event: ALERT, state: GREEN
t1: 3, dt: 2, t2: 10, time to dead: 3, current_interval: 3, next time: 1
Interface: ee0, description: NetRAIN internal, status: UP, event: ALERT, state: GREEN
t1: 3, dt: 2, t2: 10, time to dead: 3, current_interval: 3, next time: 2
Interface: tu0, description: , status: UP, event: ALERT, state: GREEN
t1: 20, dt: 5, t2: 60, time to dead: 30, current_interval: 20, next time: 20
D.5 Migrating from Memory Channel to LAN
This section discusses how to migrate a cluster that uses Memory Channel as its cluster interconnect to a LAN interconnect.
Replacing a Memory Channel interconnect with a LAN interconnect requires some cluster downtime and interruption of service.
Note
If you are performing a rolling upgrade (as described in the Cluster Installation manual) and intend to replace the Memory Channel with a LAN interconnect, plan on installing the LAN hardware on each member during the roll. Doing so allows you to avoid performing steps 1 through 4 in the following procedure.
To prepare to migrate an existing cluster using the Memory Channel interconnect to using a LAN interconnect, perform the following procedure for each cluster member:
Halt and turn off the cluster member.
Install the network adapters. Configure any required switches or hubs.
Turn on the cluster member.
Boot the member on the
genvmunix
kernel
over Memory Channel into the cluster.
Rebuild the
vmunix
kernel using the
doconfig
command.
Copy the new kernel to the root (/) directory.
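For example, on a member whose kernel configuration file is named PEPICELLI (the configuration name shown here is illustrative; doconfig reports the location of the kernel it builds):
# doconfig -c PEPICELLI
# cp /sys/PEPICELLI/vmunix /vmunix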
At this point, you can configure the newly installed Ethernet
hardware as a private conventional subnet shared by all cluster members.
You can verify that the hardware is configured properly
and operates correctly before setting it up as a LAN interconnect.
Do not use the
rcmgr
command or statically edit
the
/etc/rc.config
file to permanently set up
this network.
Because this test network must not survive the reboot of the
cluster over the LAN interconnect, use
ifconfig
commands on each member to set it up.
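For example, the following commands assign a temporary address from a hypothetical private subnet to one of the new adapters on this member and verify that it can reach the test address configured on another member (press Ctrl/C to stop ping). The adapter name and addresses are illustrative:
# ifconfig ee0 10.1.0.1 netmask 255.255.255.0 up
# ping 10.1.0.2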
To configure the LAN interconnect, perform the following steps.
Note
If you intend to remove the Memory Channel hardware (particularly the hubs) from the cluster, perform the first 7 steps in this procedure. After halting all members, power off each member. You can then remove the Memory Channel hardware. Power on all the members and then boot them all one by one, as described in Step 8.
Shutting off a Memory Channel hub in a running cluster, even one that uses a LAN interconnect, causes the entire cluster to panic. Disconnecting a member's Memory Channel cable from a hub causes that member to panic.
On each member, make backup copies of the member-specific
/etc/sysconfigtab
and
/etc/rc.config
files.
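For example (the backup file names are arbitrary):
# cp -p /etc/sysconfigtab /etc/sysconfigtab.pre-lan
# cp -p /etc/rc.config /etc/rc.config.pre-lan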
On each member, inspect the member-specific
/etc/rc.config
file, paying special attention to
the
NETDEV_x
and
NRDEV_x
configuration variables.
Because the network adapters used for the LAN interconnect
must be configured very early in the boot process, they are defined
in
/etc/sysconfigtab
(see next step) and must not be
defined in
/etc/rc.config.
This applies to NetRAIN
devices also.
Decide whether you are configuring new devices or
reconfiguring old devices for the LAN interconnect.
If the latter, you
must make appropriate edits to the
NRDEV_x,
NRCONFIG_x,
NETDEV_x,
IFCONFIG_x,
NR_DEVICES
and
NUM_NETCONFIG
variables so that the same network device names do not appear both
in the
/etc/rc.config
file and the
ics_ll_tcp
stanza of the
/etc/sysconfigtab
file.
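One quick way to check for overlap is to list the device-related entries in both files and confirm that no adapter or NetRAIN device name appears in both; the grep patterns are only a convenience:
# grep -E 'NRDEV|NRCONFIG|NETDEV|IFCONFIG' /etc/rc.config
# grep ics_tcp /etc/sysconfigtab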
On each member, set the
clubase
kernel
attribute
cluster_interconnect
to
tcp
and the following
ics_ll_tcp
kernel attributes as
appropriate for the member's network configuration.
For example:
clubase:
        cluster_interconnect = tcp

ics_ll_tcp:
        ics_tcp_adapter0 = nr0
        ics_tcp_nr0[0] = ee0
        ics_tcp_nr0[1] = ee1
        ics_tcp_inetaddr0 = 10.1.0.1
        ics_tcp_netmask0 = 255.255.255.0
For a cluster that was rolled from
TruCluster Server Version 5.1, also edit the
cluster_node_inter_name
attribute of the
clubase
kernel subsystem.
For example:
clubase:
        cluster_node_inter_name = pepicelli-ics0
Edit the clusterwide
/etc/hosts
file so that it
contains the IP name and IP address of the cluster interconnect low-level
TCP interfaces.
For example:
127.0.0.1       localhost
16.140.112.238  pepicelli.zk3.dec.com pepicelli
16.120.112.209  deli.zk3.dec.com deli
10.0.0.1        pepicelli-ics0
10.1.0.1        member1-icstcp0
10.0.0.2        pepperoni-ics0
10.1.0.2        member2-icstcp0
16.140.112.176  pepperoni.zk3.dec.com pepperoni
For a cluster that was rolled from
TruCluster Server Version 5.1, edit the clusterwide
/etc/hosts.equiv
file and the clusterwide
/.rhosts
file, changing the
mc0
entries to
ics0
entries.
For example, change:
deli.zk3.dec.com
pepicelli-mc0
pepperoni-mc0
to:
deli.zk3.dec.com
pepicelli-ics0
member1-icstcp0
pepperoni-ics0
member2-icstcp0
For a cluster that was rolled from
TruCluster Server Version 5.1, use the
rcmgr set
command
to change the
CLUSTER_NET
variable in the
/etc/rc.config
file on each member.
For example:
# rcmgr get CLUSTER_NET
pepicelli-mc0
# rcmgr set CLUSTER_NET pepicelli-ics0
Halt all cluster members.
Boot all cluster members, one at a time.
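After the members reboot, you can confirm that the cluster interconnect is now using the TCP transport by querying the running kernel on each member. A minimal check (output abridged) might look like this:
# sysconfig -q clubase cluster_interconnect
clubase:
cluster_interconnect = tcp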
D.6 Migrating from LAN to Memory Channel
This section discusses how to migrate a cluster that uses a LAN interconnect as its cluster interconnect to Memory Channel.
To configure the Memory Channel, perform the following steps:
Power off all members.
Install and configure Memory Channel adapters, cables, and hubs as described in the Cluster Hardware Configuration manual.
Reboot each member on the
genvmunix
kernel
over the LAN interconnect into the cluster.
Rebuild each member's
vmunix
kernel using the
doconfig
command.
Copy the new kernel of each member to the member's
root (/) directory.
On each member, make a backup copy of the member-specific
/etc/sysconfigtab
file.
On each member, set the
clubase
kernel
attribute
cluster_interconnect
to
mct.
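For example, the clubase stanza in each member's /etc/sysconfigtab then contains, at minimum:
clubase:
        cluster_interconnect = mct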
Halt all cluster members.
Reboot all cluster members one at a time.
D.7 Migrating from Fast Ethernet LAN to Gigabit Ethernet LAN
This section discusses how to migrate a cluster that uses a Fast Ethernet LAN interconnect to use a Gigabit Ethernet LAN interconnect.
Replacing a Fast Ethernet LAN interconnect with a Gigabit Ethernet LAN interconnect requires some cluster downtime and interruption of service.
To prepare to migrate an existing cluster using a Fast Ethernet LAN interconnect to a Gigabit LAN interconnect, perform the following procedure for each cluster member:
Halt and turn off the cluster member.
Install the Gigabit Ethernet network adapters. Configure any required switches or hubs.
Turn on the cluster member.
Reboot the member on the
genvmunix
kernel
over the Fast Ethernet LAN interconnect into the cluster.
Rebuild the member's
vmunix
kernel using the
doconfig
command.
Copy the new kernel to the member's
root (/) directory.
At this point, you can configure the newly installed Gigabit Ethernet
hardware as a private conventional subnet shared by all cluster members.
You can verify that the hardware is configured properly
and operates correctly before setting it up as a LAN interconnect.
Do not use the
rcmgr
command or statically edit
the
/etc/rc.config
file to permanently set up
this network.
Because this test network must not survive the reboot of the
cluster over the LAN interconnect, use
ifconfig
commands on each member to set it up.
To configure the Gigabit Ethernet LAN interconnect, perform the following steps:
On each member, make a backup copy of the member-specific
/etc/sysconfigtab
file.
On each member, inspect the member-specific
/etc/rc.config
file, paying special attention to
the
NETDEV_x
and
NRDEV_x
configuration variables.
Because the network adapters used for the LAN interconnect
must be configured very early in the boot process, they are defined
in
/etc/sysconfigtab
(see next step) and must not be
defined in
/etc/rc.config.
This applies to NetRAIN
devices also.
Decide whether you are configuring new devices or
reconfiguring old devices for the LAN interconnect.
If the latter, you
must make appropriate edits to the
NRDEV_x,
NRCONFIG_x,
NETDEV_x,
IFCONFIG_x,
NR_DEVICES
and
NUM_NETCONFIG
variables so that the same network device names do not appear both
in the
/etc/rc.config
file and the
ics_ll_tcp
stanza of the
/etc/sysconfigtab
file.
On each member, set the following
ics_ll_tcp
kernel attributes as
appropriate for the member's network configuration.
For example:
clubase:
        cluster_interconnect = tcp

ics_ll_tcp:
        ics_tcp_adapter0 = nr0
        ics_tcp_nr0[0] = alt0
        ics_tcp_nr0[1] = alt1
        ics_tcp_inetaddr0 = 10.1.0.100
        ics_tcp_netmask0 = 255.255.255.0
Halt all cluster members.
Boot all cluster members, one at a time.
D.8 Troubleshooting LAN Interconnect Problems
This section discusses the following problems that can occur due to a misconfigured LAN interconnect and how you can resolve them:
A booting member displays hundreds of broadcast errors and panics an existing member (Section D.8.1).
An
ifconfig
nrx
switch command
fails with a "No such device nr0" message (Section D.8.2).
An application running in the cluster cannot bind to a well-known port (Section D.8.3).
D.8.1 Many Broadcast Errors on Booting or Booting New Member Panics Existing Member
The Spanning Tree Protocol (STP) must be disabled on all Ethernet switch ports connected to the adapters on cluster members, whether they are single adapters or included in the NetRAIN virtual interfaces. If this is not the case, cluster members may be flooded by broadcast messages that, in effect, create denial-of-service symptoms in the cluster. You may see hundreds of instances of the following message when booting the first and subsequent members:
arp: local IP address 10.1.0.100 in use by hardware address 00-00-00-00-00-00
These messages will be followed by:
CNX MGR: cnx_pinger: broadcast problem: err 35
Booting additional members into this cluster may result in a hang or panic of existing members, especially if a quorum disk is configured. During the boot, you may see the following message:
CNX MGR: cannot form: quorum disk is in use. Unable to establish contact with members using disk.
However, after 30 seconds or so, the member may succeed in discovering the
quorum disk and form its own cluster, while the existing members hang or
panic.
D.8.2 Cannot Manually Fail Over Devices in a NetRAIN Virtual Interface
NetRAIN monitors the health of inactive interfaces by determining whether
they are receiving packets and, if necessary, by sending probe packets from the
active interface.
If an inactive interface becomes disconnected,
NetRAIN may mark it as
DEAD.
If you pull the cables on the active adapter, NetRAIN
attempts to activate
the
DEAD
standby adapter.
Unless there is a real
problem with this adapter, the failover works properly.
However, a manual NetRAIN switch operation (for example,
ifconfig
nr0 switch) behaves in a different way.
In this case, NetRAIN
does not attempt to fail over to a
DEAD
adapter when there are no healthy standby adapters.
The
ifconfig nr0 switch
command returns
a message such as the following:
ifconfig ioctl (SIOCIFSWITCH) No such device nr0
You may see this behavior in a dual-switch configuration if one switch is
power cycled and you immediately try to manually fail over an active
adapter from the other switch.
After the switch that has been powered on
has initialized itself (in a few minutes or so), manual NetRAIN failover
normally behaves properly.
If the failover does not work correctly, examine the cabling of the switches and
adapters and use the
ifconfig
and
niffconfig
commands to determine the state of the interfaces.
D.8.3 Applications Unable to Map to Port
By default, the communications subsystem in a cluster using a LAN interconnect uses port 900 as a rendezvous port for cluster broadcast traffic and reserves ports 901 through 910 and 912 through 917 for nonbroadcast channels. If an application uses a hardcoded reference to one of these ports, it will fail to bind to the port.
To remedy this situation, change the ports used by the LAN interconnect.
Edit the
ics_tcp_rendezvous_port
and
ics_tcp_ports
attributes in the
ics_ll_tcp
subsystem, as described in
sys_attrs_ics_ll_tcp(5).
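For example, to move the rendezvous port off the default of 900, you could add a stanza such as the following to each member's /etc/sysconfigtab. The port number shown is only an illustration, and the ics_tcp_ports attribute (whose value syntax is given in sys_attrs_ics_ll_tcp(5)) may need a corresponding change:
ics_ll_tcp:
        ics_tcp_rendezvous_port = 920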