This appendix describes problems that can occur during installation
and how to deal with them.
B.1 Troubleshooting the LAN Interconnect
This section discusses the problems that can occur due to a
misconfigured LAN interconnect and how you can resolve them.
B.1.1 Conflict with Default Physical Cluster Interconnect IP Name
In clusters with a LAN interconnect, the default physical cluster interconnect IP name has the form membermemberID-icstcp0, where memberID is the new member's member ID (for example, member2-icstcp0 for member 2). The clu_create and clu_add_member commands use ping to determine whether the default name is already in use on the network.
If this check finds a host already using the default IP name, you see the following prompt:
Enter the physical cluster interconnect interface device name []
After displaying this prompt, the command fails. Depending on which command was executing at the time of the failure, you see one of the following messages:
Error: clu_create: Bad configuration
Error: clu_add_member: Bad configuration
If you see either of these messages, look in /cluster/admin/clu_create.log or /cluster/admin/clu_add_member.log, as appropriate, for the following error message:
Error: A system with the name 'membermemberID-icstcp0' is currently running on your network.
If you find this message, contact your network administrator about changing the hostname of the non-cluster system that is already using the default IP name.
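You can verify that the conflicting host is still on the network by pinging the default name from a cluster member or from another system on the same subnet. A minimal check, assuming the new member's ID is 2 and the default name is therefore member2-icstcp0:
# ping -c 2 member2-icstcp0
If the name answers, the conflict is still present and the other system's hostname must be changed before you rerun the command.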
The clu_create and clu_add_member commands do not allow you to change the default physical cluster interconnect IP name.
B.1.2 Booting Member Joins Cluster But Appears to Hang Before Reaching Multi-User Mode
If a new member appears to hang at boot time sometime after joining the cluster, the speed or operational mode of the booting member's LAN interconnect adapter is probably inconsistent with that of the LAN interconnect. This problem can result from the adapter failing to autonegotiate properly, from improper hardware settings, or from faulty Ethernet hardware. To determine whether this problem exists, pay close attention to console messages of the following form on the booting member:
ee0: Parallel Detection, 10 Mbps half duplex
ee0: Autonegotiated, 100 Mbps full duplex
For a cluster interconnect running at 100 Mb/s in full-duplex mode, the first message indicates a likely problem: the adapter has fallen back to 10 Mb/s half-duplex mode. The second message indicates that autonegotiation completed successfully.
The autonegotiation behavior of the Ethernet adapters and switches that are configured in the interconnect may cause unexpected hangs at boot time if you do not take the following considerations into account:
Autonegotiation settings must be the same on both ends of any given cable. That is, if an Ethernet adapter is configured for autonegotiation, the switch port to which it is connected must also be configured for autonegotiation. Similarly, if the adapter is cross-cabled to another member's adapter, the other member's adapter must be set to autonegotiate. If you violate this rule (for example, by setting one end to 100 Mb/s full-duplex, and the other to autonegotiate), the member set to autonegotiate may set itself to half-duplex mode while booting and cluster transactions will experience delays.
Supported 100 Mb/s Ethernet network adapters in AlphaServer systems can use two different drivers: ee and tu. Network adapters in the DE50x family (which have a console name of the form ewx0) are based on the DECchip 21140, 21142, and 21143 chipsets and use the tu driver. If the network adapter uses the tu driver, it may or may not support autonegotiation.
Note
DE500-XA adapters do not support autonegotiation. Autonegotiation is more reliable with DE500-BA and DE504 adapters than with DE500-AA adapters.
To use autonegotiation, set the ewx0_mode console variable to auto and set the port on the switch connected to the network adapter for autonegotiation. With network adapters that use the tu driver, it may be easier to force the adapter to use 100 Mb/s full-duplex mode explicitly. To do so, set the ewx0_mode console variable to FastFD; in this case, you must use a switch that allows autonegotiation to be disabled, and set the switch port connected to the network adapter to 100 Mb/s full-duplex. (A console example follows this list.) See tu(7) for more information.
Network adapters in the DE60x family (which have a console name of the form eix0) use the ee driver. Network adapters that use the ee driver employ IEEE 802.3u autonegotiation by default to determine which speed setting to use. Make sure that the port on the switch to which the network adapter is connected is set for autonegotiation. See ee(7) for more information.
Supported 1000 Mb/s Ethernet network adapters in the DEGPA-xx family use the alt driver. Network adapters that use the alt driver employ IEEE 802.3u autonegotiation by default to determine which speed setting to use. Make sure that the port on the switch to which the network adapter is connected is set for autonegotiation. See alt(7) for more information.
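As a sketch of the console settings described in this list, the following SRM console commands display and set the mode of a DE500-series adapter whose console name is ewa0 (the variable name depends on the adapter's console name; ewa0 is used here as an example):
>>> show ewa0_mode
>>> set ewa0_mode auto
To force 100 Mb/s full-duplex operation instead, set the variable to FastFD (>>> set ewa0_mode FastFD) and configure the switch port for 100 Mb/s full-duplex with autonegotiation disabled. The new setting takes effect when you next boot the member.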
B.1.3 Booting Member Hangs While Trying to Join Cluster
If a new member hangs at boot time while trying to join the cluster, the new member might be disconnected from the cluster interconnect. The following may have caused the disconnect:
A cable is unplugged.
You specified an existing Ethernet adapter as the physical cluster interconnect interface to clu_add_member, but that adapter is not connected to the other members (and perhaps is used for some purpose other than as a LAN interconnect, such as a client network).
You specified an address for the cluster interconnect physical device that is not on the same subnet as those of the other cluster members. For example, you may have specified an address on the cluster interconnect virtual subnet (ics0) for the member's cluster interconnect physical device.
You specified a different interconnect type for this member (for example, the cluster_interconnect attribute in its clubase kernel subsystem is mct, whereas the rest of the cluster specifies tcp).
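To see the interconnect type that the rest of the cluster is using, you can query the clubase subsystem on a member that is already running; the output shown here is representative:
# sysconfig -q clubase cluster_interconnect
clubase:
cluster_interconnect = tcp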
One of the following messages is typically displayed on the console:
CNX MGR: cannot form: quorum disk is in use. Unable to establish contact with members using disk.
CNX MGR: Node pepperoni id 2 incarn 0xa3a71 attempting to form or join cluster deli
Perform the following steps to resolve this problem:
Halt the booting member.
Make sure the adapter is properly connected to the LAN interconnect.
Mount the new member's boot partition on another member. For example:
# mount root2_domain#root /mnt
Examine the /mnt/etc/sysconfigtab file. The attributes listed in Table C-2 must be set correctly to reflect the member's LAN interconnect interface. (See the example after these steps.)
Edit /mnt/etc/sysconfigtab as appropriate.
Unmount the member's boot partition:
# umount /mnt
Reboot the member.
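As a sketch of the examine step, you can list the relevant entries of the mounted sysconfigtab with the sysconfigdb command rather than opening the file in an editor; the exact attribute names to check are the ones listed in Table C-2:
# sysconfigdb -t /mnt/etc/sysconfigtab -l clubase
# sysconfigdb -t /mnt/etc/sysconfigtab -l ics_ll_tcp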
B.1.4 Booting Member Panics with "ics_ll_tcp" Message
If you boot a new member into the cluster and it panics with an
"ics_ll_tcp: Unable to configure cluster interconnect network interface"
message, you may have specified a device that does not exist as
the member's physical cluster interconnect interface to
clu_add_member, or
the booting kernel may not contain the device driver to support
the cluster interconnect device.
Perform the following steps to resolve this problem:
Halt the booting member.
Mount the new member's boot partition on another member. For example:
# mount root2_domain#root /mnt
Examine the /mnt/etc/sysconfigtab file. The ics_ll_tcp attributes listed in Table C-2 must be set to correctly reflect the member's LAN interconnect interface.
If the interface does not exist, do the following:
Edit /mnt/etc/sysconfigtab as appropriate.
Unmount the member's boot partition:
# umount /mnt
Reboot the member.
If the interface name is correct, the vmunix kernel may not contain the device driver for the LAN interconnect device. To rectify this problem, do the following:
Boot the member on the genvmunix kernel.
Edit the /sys/conf/HOSTNAME file and add the missing driver.
Rebuild the vmunix kernel using the doconfig command.
Copy the new kernel to the root (/) directory.
Reboot the member from its vmunix kernel.
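The following sequence sketches these steps; dka0 stands for the member's boot device and HOSTNAME for its kernel configuration file name, both of which vary by system:
>>> boot -file genvmunix dka0
# vi /sys/conf/HOSTNAME
# doconfig -c HOSTNAME
# cp /sys/HOSTNAME/vmunix /vmunix
# shutdown -r now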
B.1.5 Booting Member Displays "ics_ll_tcp: ERROR: Could not create a NetRAIN set with the specified members" Message
If you boot a new member into the cluster and it displays the "ics_ll_tcp: ERROR: Could not create a NetRAIN set with the specified members" message shortly after the installation tasks commence, a NetRAIN virtual interface used for the cluster interconnect may have been misconfigured. You will also see this message if a member of the NetRAIN set has been misconfigured.
Perhaps you edited the /etc/rc.config file to apply traditional NetRAIN administration to the LAN interconnect. In this case, the NetRAIN configuration in the /etc/rc.config file is ignored and the NetRAIN interface defined in /etc/sysconfigtab is used as the cluster interconnect. You must never configure a NetRAIN set that is used for a cluster interconnect in the /etc/rc.config file. A NetRAIN device for the cluster interconnect is set up completely within the ics_ll_tcp kernel subsystem in /etc/sysconfigtab, not in /etc/rc.config.
Perform the following steps to resolve this problem:
Use the rcmgr delete command to edit the newly booted member's /cluster/members/{memb}/etc/rc.config file to remove the NRDEV_x, NRCONFIG_x, NETDEV_x, and IFCONFIG_x variables associated with the device.
Use the rcmgr set command to decrement the NR_DEVICES and NUM_NETCONFIG variables that doubly define the cluster interconnect NetRAIN device. (See the example after these steps.)
Reboot the member.
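A minimal sketch of the first two steps, run on the affected member, assuming the stray NetRAIN set was defined with index 0 and was the only NetRAIN set and the only extra network entry in rc.config (adjust the indexes and the resulting counts to match your file):
# rcmgr delete NRDEV_0
# rcmgr delete NRCONFIG_0
# rcmgr delete NETDEV_0
# rcmgr delete IFCONFIG_0
# rcmgr set NR_DEVICES 0
# rcmgr set NUM_NETCONFIG 0
In a cluster, rcmgr operates on the member-specific rc.config file by default, which is the file you need to change here.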
B.2 Dealing with Other Issues
B.2.1 Booting a New Member Without a Cluster License Displays ATTENTION Message
When you boot a newly added member, the clu_check_config utility performs a series of configuration checks. If you have not yet installed the TruCluster Server license, the TCS-UA product authorization key (PAK), on the member, the boot procedure displays the following messages:
Starting Cluster Configuration Check...
The boottime cluster check found a potential problem.
For details search for !!!!!ATTENTION!!!!! in /cluster/admin/clu_check_log_hostname
check_cdsl_config : Boot Mode : Running /usr/sbin/cdslinvchk in the background
check_cdsl_config : Results can be found in : /var/adm/cdsl_check_list
clu_check_config : no configuration errors or warnings were detected
The following message appears in the /cluster/admin/clu_check_log_hostname file:
/usr/sbin/caad is NOT_RUNNING !!!!!ATTENTION!!!!!
When the TruCluster Server license is not configured on a member, the cluster application availability (CAA) daemon (caad) is not automatically started on that member. This is normal and expected behavior.
If you did not configure the license from within clu_add_member when you added the new member (as discussed in Chapter 5), you can configure it later using the lmf register command. After the license has been installed, you can start the CAA daemon on that member using the /usr/sbin/caad command, as shown in the following example.
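For example, to register the PAK interactively and then start the daemon (lmf register opens an editor in which you enter the PAK data; lmf reset loads the registered license data into the kernel cache):
# lmf register
# lmf reset
# lmf list
# /usr/sbin/caad
The lmf list command is optional; it simply confirms that the TCS-UA license is now active on the member.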