7    Managing Network Services

The TruCluster Server Cluster Installation manual describes how to initially configure network services. We strongly suggest you configure network services before the cluster is created. If you wait until after cluster creation to set up services, the process can be more complicated.

This chapter describes the procedures to set up network services after cluster creation. The chapter discusses configuring DHCP (Section 7.1), NIS (Section 7.2), printing (Section 7.3), DNS/BIND (Section 7.4), time synchronization (Section 7.5), NFS (Section 7.6), inetd configuration (Section 7.7), mail (Section 7.8), RIS (Section 7.9), and remote display of X Window applications (Section 7.10).

7.1    Configuring DHCP

A cluster can be a highly available Dynamic Host Configuration Protocol (DHCP) server. It cannot be a DHCP client. A cluster must use static addressing. On a cluster, DHCP runs as a single-instance application with cluster application availability (CAA) providing failover. At any one time, only one member of the cluster is the DHCP server. If failover occurs, the new DHCP server uses the same common database that was used by the previous server.

The DHCP server attempts to match its host name and IP address with the configuration in the DHCP database. If you configure the database with the host name and IP address of a cluster member, problems can result. If the member goes down, DHCP automatically fails over to another member, but the host name and IP address of this new DHCP server does not match the entry in the database. To avoid this and other problems, follow these steps:

  1. Familiarize yourself with the DHCP server configuration process that is described in the chapter on DHCP in the Tru64 UNIX Network Administration: Connections manual.

  2. On the cluster member that you want to act as the initial DHCP server, run /usr/bin/X11/xjoin and configure DHCP.

  3. Select Server/Security.

  4. From the pulldown menu that currently shows Server/Security Parameters, select IP Ranges.

  5. Set the DHCP Server entry to the IP address of the default cluster alias.

    There can be multiple entries for the DHCP Server IP address in the DHCP database. You might find it more convenient to use the jdbdump command to generate a text file representation of the DHCP database. Then use a text editor to change all the occurrences of the original DHCP server IP address to the cluster alias IP address. Finally, use jdbmod to repopulate the DHCP database from the file you edited. For example:

    # jdbdump > dhcp_db.txt
    # vi dhcp_db.txt
     
    

    Edit dhcp_db.txt and change the owner IP address to the IP address of the default cluster alias.

    Update the database with your changes by entering the following command:

    # jdbmod -e dhcp_db.txt
     
    

  6. When you finish with xjoin, make DHCP a highly available application. DHCP already has an action script and a resource profile, and it is already registered with the CAA daemon. To start DHCP with CAA, enter the following command:

    # caa_start dhcp
     
    

  7. Edit /etc/join/server.pcy and add the following line:

    canonical_name  cluster_alias
     
    

    where cluster_alias is the default cluster alias.

  8. Stop DHCP and then restart it:

    # caa_stop dhcp
    # caa_start dhcp
     
    

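After you restart DHCP under CAA, you can verify which member is currently acting as the DHCP server by checking the status of the CAA resource; the exact output format can vary:

# caa_stat dhcp
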
For information about highly available applications and CAA, see the TruCluster Server Cluster Highly Available Applications manual.

7.2    Configuring NIS

To provide high availability, the Network Information Service (NIS) daemons ypxfrd and rpc.yppasswdd run on every cluster member.

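To confirm that these daemons are running on a given member, you can list them with ps; this is a simple check, and the exact process names in the output may vary:

# ps agx | grep ypxfrd
# ps agx | grep yppasswd
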
As described in Section 3.1, the ports that are used by services that are accessed through a cluster alias are defined as either in_single or in_multi. (These definitions have nothing to do with whether the service can or cannot run on more than one cluster member at the same time.)

ypxfrd runs as an in_multi service, which means that the cluster alias subsystem routes connection requests and packets for that service to all eligible members of the alias.

rpc.yppasswdd runs as an in_single service, which means that only one alias member receives connection requests or packets that are addressed to the service. If that member becomes unavailable, the cluster alias subsystem selects another member of the alias as the recipient for all requests and packets addressed to the service.

NIS parameters are stored in /etc/rc.config.common. The database files are in the /var/yp/src directory. Both rc.config.common and the databases are shared by all cluster members. The cluster is a slave, a master, or a client. The functions of slave, master, and client cannot be mixed among individual cluster members.

If you configured NIS at the time of cluster creation, then as far as NIS is concerned, you need do nothing when adding or removing cluster members.

To configure NIS after the cluster is running, follow these steps:

  1. Run the nissetup command and configure NIS according to the instructions in the chapter on NIS in the Tru64 UNIX Network Administration: Services manual.

    You have to supply the host names that NIS binds to. Include the cluster alias in your list of host names.

  2. On each cluster member, enter the following commands:

    # /sbin/init.d/nis stop
    # /sbin/init.d/nis start
     
    

7.2.1    Configuring an NIS Master in a Cluster with Enhanced Security

You can configure an NIS master to provide extended user profiles and to use the protected password database. For information about NIS and enhanced security features, see the Tru64 UNIX Security Administration manual. For details on configuring NIS with enhanced security, see the appendix on enhanced security in a cluster in the same manual.

7.3    Configuring Printing

With a few exceptions, printer setup on a cluster is the same as printer setup on a standalone Tru64 UNIX system. See the Tru64 UNIX System Administration manual for general information about managing the printer system.

In a cluster, a member can submit a print job to any printer anywhere in the cluster. A printer daemon, lpd, runs on each cluster member. This parent daemon serves both local lpr requests and incoming remote job requests.

The parent printer daemon that runs on each node uses /var/spool/lpd, which is a context-dependent symbolic link (CDSL) to /cluster/members/{memb}/spool/lpd. Do not use /var/spool/lpd for any other purpose.

Each printer that is local to the cluster has its own spooling directory, which is located by convention under /usr/spool. The spooling directory must not be a CDSL.

A new printer characteristic, :on, has been introduced to support printing in clusters. To configure a printer, run either printconfig or lprsetup on any cluster member.

If a printer is a local device that is connected to a member via a COM port (/dev/tty01) or a parallel port (/dev/lp0), set :on to the name of the member to which the printer is connected. For example, :on=memberA indicates that the printer is connected to the member memberA.

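For illustration only, a local printer entry in /etc/printcap that uses this characteristic might look similar to the following; the printer name, device, and spooling directory are placeholders rather than recommended values:

lp0|local printer on memberA:\
        :lp=/dev/lp0:\
        :sd=/usr/spool/lp0:\
        :on=memberA:
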
When configuring a network printer that is connected via TCP/IP, you have two choices for values for the :on characteristic:

Using Advanced Printing Software

For information on installing and using Advanced Printing Software in a cluster, see the Tru64 UNIX Advanced Printing Software System Administration and Operation Guide.

7.4    Configuring DNS/BIND

Configuring a cluster as a Berkeley Internet Name Domain (BIND) server is similar to configuring an individual Tru64 UNIX system as a BIND server. In a cluster, the named daemon runs on a single cluster member, and that system is the actual BIND server. The cluster alias handles queries, so that it appears the entire cluster is the server. Failover is provided by CAA. If the serving member becomes unavailable, CAA starts the named daemon on another member.

The bindconfig command lets you specify a cluster as a client or a server, but not both. This choice is somewhat misleading because a cluster can act as both a client and a server. In particular, when a cluster is configured as a BIND server and no name servers are already specified in /etc/resolv.conf, it is also automatically configured as a BIND client of itself.

In that case, bindconfig automatically adds the cluster alias as the first name server in /etc/resolv.conf. This typically happens when you configure the system as a BIND server during an initial installation. However, if the cluster was initially set up as a BIND client and you then run bindconfig to make it a BIND server, /etc/resolv.conf probably already specifies at least one name server. In this case, bindconfig does not automatically add the cluster alias as the first name server; if you want the cluster to be a client of itself, run bindconfig again and add the cluster alias to the name server list.

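As an illustration, after bindconfig adds the default cluster alias as a name server, /etc/resolv.conf might contain entries similar to the following; the domain name and address here are placeholders:

domain zk3.dec.com
nameserver 16.140.112.209
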
Because BIND environment variables are stored in /etc/rc.config.common, which is a clusterwide file, all cluster members are configured identically at boot time. Likewise, because /etc/resolv.conf is a clusterwide file, all cluster members use the same name servers.

Whether you configure BIND at the time of cluster creation or after the cluster is running, the process is the same.

To configure a cluster as either a BIND server or client, use the command bindconfig or sysman dns.

It does not matter on which member you run the command. If you are configuring a BIND server, CAA determines the member on which the named name server runs. The sysman -focus option does not apply when configuring BIND because you are not configuring a particular member as a client or server; you are configuring the entire cluster as a client or server. That is, named does not necessarily run on the member from which you run the BIND server configuration; CAA starts named on one of the members.

The /etc/resolv.conf and /etc/svc.conf files are clusterwide files.

For details on configuring BIND, see the chapter on the Domain Name System (DNS) in the Tru64 UNIX Network Administration: Services manual.

7.5    Managing Time Synchronization

All cluster members need time synchronization. The Network Time Protocol (NTP) meets this requirement. Because of this, the clu_create command configures NTP on the initial cluster member at the time of cluster creation, and NTP is automatically configured on each member as it is added to the cluster. All members are configured as NTP peers.

If your site chooses not to use NTP, make sure that whatever time service you use meets the granularity specifications that are defined in RFC 1305, Network Time Protocol (Version 3) Specification, Implementation and Analysis.

Because the system times of cluster members should not vary by more than a few seconds, we do not recommend using the timed daemon to synchronize the time.

7.5.1    Configuring NTP

The Cluster Installation manual recommends that you configure NTP on the Tru64 UNIX system before you install the cluster software that makes the system the initial cluster member. If you did not do this, clu_create and clu_add_member configure NTP automatically on each cluster member. In this configuration, the NTP server for each member is localhost. Members are set up as NTP peers of each other, and use the IP address of their cluster interconnect interfaces.

The localhost entry is used only when the member is the only node running. The peer entries act to keep all cluster members synchronized so that the time offset is in microseconds across the cluster. Do not change these initial server and peer entries even if you later change the NTP configuration and add external servers.

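As an illustration only, the member-specific /etc/ntp.conf file on one member of a two-member cluster contains a localhost server entry plus peer entries that reach the other members over the cluster interconnect; the member name below is a placeholder, and the exact entries written by clu_create and clu_add_member may differ in form:

server localhost
peer provolone-ics0
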
To change the NTP configuration after the cluster is running, you must run either ntpconfig or sysman ntp on each cluster member. These commands always act on a single cluster member. You can either log in to each member or you can use the -focus option to sysman in order to designate the member on which you want to configure NTP. Starting and stopping the NTP daemon, xntpd, is potentially disruptive to the operation of the cluster, and should be performed on only one member at a time.

When you use sysman to learn the status of the NTP daemon, you can get the status for either the entire cluster or a single member.

7.5.2    Using the Same External NTP Servers for All Members

You can add an external NTP server to just one member of the cluster. However, this creates a single point of failure. To avoid this, add the same set of external servers to all cluster members.

We strongly recommend that the list of external NTP servers be the same on all members. If you configure differing lists of external servers from member to member, you must ensure that the servers are all at the same stratum level and that the time differential between them is very small.

7.5.2.1    Time Drift

If you notice a time drift among cluster members, resynchronize the members with each other. To do this, log on to each member of the cluster and enter the ntp -s -f command, specifying the cluster interconnect name of a member other than the one where you are logged on. By default, a cluster interconnect name is the short form of the host name with -ics0 appended. For example, if provolone is a cluster member, and you are logged on to a member other than provolone, enter the following command:

# ntp -s -f provolone-ics0
 

You then log on to the other cluster members and repeat this command, in each case using a cluster interconnect name other than the one of the system where you are logged on.

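To check the resulting offsets between a member and its peers, you can query the NTP daemon on that member; this is a standard NTP query and is not specific to clusters:

# ntpq -p
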
7.6    Managing NFS

A cluster can provide highly available network file system (NFS) service. When a cluster acts as an NFS server, client systems that are external to the cluster see it as a single system with the cluster alias as its name. When a cluster acts as an NFS client, an NFS file system that is external to the cluster that is mounted by one cluster member is accessible to all cluster members. File accesses are funneled through the mounting member to the external NFS server. The external NFS server sees the cluster as a set of independent nodes and is not aware that the cluster members are sharing the file system.

7.6.1    Configuring NFS

To configure NFS, use the nfsconfig or sysman nfs command.

Note

Do not use the nfssetup command in a cluster. It is not cluster-aware and will incorrectly configure NFS.

One or more cluster members can run NFS daemons and the mount daemons, as well as client versions of lockd and statd.

With nfsconfig or sysman nfs, you can perform the following tasks:

To configure NFS on a specific member, use the -focus option to sysman.

When you configure NFS without any focus, the configuration applies to the entire cluster and is saved in /etc/rc.config.common. If a focus is specified, then the configuration applies to only the specified cluster member and is saved in the CDSL file /etc/rc.config for that member.

Local NFS configurations override the clusterwide configuration. For example, if you configure member mutt as not being an NFS server, then mutt is not affected when you configure the entire cluster as a server; mutt continues not to be a server.

For a more interesting example, suppose you have a three-member cluster with members alpha, beta, and gamma. Suppose you configure 8 TCP server threads clusterwide. If you then set focus on member alpha and configure 10 TCP server threads, the ps command will show 10 TCP server threads on alpha, but only 8 on members beta and gamma. If you then set focus clusterwide and set the value from 8 TCP server threads to 12, alpha still has 10 TCP server threads, but beta and gamma now each have 12 TCP server threads.

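The preceding paragraph notes that ps shows the per-member thread counts; a quick way to list the NFS daemons on a member is the following (the exact output differs between members and versions):

# ps agx | grep nfsd
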
If a member runs nfsd, it must also run mountd, and vice versa. This behavior occurs automatically when you configure NFS with nfsconfig or sysman nfs.

If locking is enabled on a cluster member, the rpc.lockd and rpc.statd daemons are started on that member. If locking is configured clusterwide, lockd and statd run clusterwide (rpc.lockd -c and rpc.statd -c); these daemons are highly available and are managed by CAA. The server uses the default cluster alias, or an alias that is specified in /etc/exports.aliases, as its address.

When a cluster acts as an NFS server, client systems that are external to the cluster see it as a single system with the cluster alias as its name. Client systems that mount directories with CDSLs in them see only those paths that are on the cluster member that is running the clusterwide statd and lockd pair.

You can start and stop services either on a specific member or on the entire cluster. Typically, you do not need to manage the clusterwide lockd and statd pair. However, if you do need to stop the daemons, enter the following command:

# caa_stop cluster_lockd
 

To start the daemons, enter the following command:

# caa_start cluster_lockd
 

To relocate the server lockd and statd pair to a different member, enter the caa_relocate command as follows:

# caa_relocate cluster_lockd
 

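To see which member is currently serving the clusterwide lockd and statd pair, you can check the status of the CAA resource; the output format may vary:

# caa_stat cluster_lockd
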
For more information about starting and stopping highly available applications, see Chapter 8.

7.6.2    Considerations for Using NFS in a Cluster

This section describes the differences between using NFS in a cluster and in a standalone system.

7.6.2.1    CFS Support of NFS File Systems

CFS supports the Network File System (NFS) client for read/write access. When a file system is NFS-mounted in a cluster, CFS makes it available for read/write access from all cluster members. The member that has actually mounted it serves the file system to other cluster members.

If the member that has mounted the NFS file system shuts down or fails, the file system is automatically unmounted and CFS begins to clean up the mount points. During the cleanup process, members that access these mount points may see various types of behavior, depending upon how far the cleanup has progressed:

Until the CFS cleanup is complete, members may still be able to create new files at the NFS file system's local mount point (or in any directories that were created locally beneath that mount point).

An NFS file system does not automatically fail over to another cluster member unless you are using AutoFS or Automount. Rather, you must manually remount it, on the same mount point or another, from another cluster member to make it available again. Alternatively, booting a cluster member remounts any file systems that are listed in the /etc/fstab file and are not currently mounted and served in the cluster.

7.6.2.2    Clients Must Use a Cluster Alias

When a cluster acts as an NFS server, clients must use the default cluster alias, or an alias that is listed in /etc/exports.aliases, to specify the host when mounting file systems served by the cluster. If a node that is external to the cluster attempts to mount a file system from the cluster and the node does not use the default cluster alias, or an alias that is listed in /etc/exports.aliases, a "connection refused" error is returned to the external node.

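For example, an external client that mounts a file system exported by a cluster whose default alias is deli uses a command similar to the following, where the exported path and mount point are placeholders:

# mount deli:/usr/projects /mnt/projects
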
Other commands that run through mountd, like umount and export, receive a "Program unavailable" error when the commands are sent from external clients and do not use the default cluster alias or an alias listed in /etc/exports.aliases.

Before configuring additional aliases for use as NFS servers, read the sections in the Cluster Technical Overview manual that discuss how NFS and the cluster alias subsystem interact for NFS, TCP, and User Datagram Protocol (UDP) traffic. Also see exports.aliases(4) and the comments at the beginning of the /etc/exports.aliases file.

7.6.2.3    Using CDSLs to Mount NFS File Systems

When a cluster acts as an NFS client, an NFS file system that is mounted by one cluster member is accessible to all cluster members: the cluster file system (CFS) funnels file accesses through the mounting member to the external NFS server. That is, the cluster member performing the mount becomes the CFS server for the NFS file system and is the node that communicates with the external NFS server. By maintaining cache coherency across cluster members, CFS guarantees that all members at all times have the same view of the NFS file system.

However, in the event that the mounting member becomes unavailable, failover does not occur. Access to the NFS file system is lost until another cluster member mounts the NFS file system.

You can address this possible loss of file system availability in several ways. Using AutoFS to provide automatic failover of NFS file systems might be the most robust solution because it allows for both availability and cache coherency across cluster members. Using AutoFS in a cluster environment is described in Section 7.6.3.

As an alternative to using AutoFS, you can use the mkcdsl -a command to convert a mount point into a CDSL. Doing so will copy an existing directory to a member-specific area on all members. You then use the CDSL as the mount point for the NFS file system. In this scenario, only one NFS server exists for the file system, but each cluster member is an NFS client. Cluster members are not dependent on one cluster member functioning as the CFS server of the NFS file system. If one cluster member becomes unavailable, access to the NFS file system by the other cluster members is not affected. However, cache coherency across cluster members is not provided by CFS: the cluster members rely on NFS to maintain the cache coherency using the usual NFS methods, which do not provide single-system semantics.

If relying on NFS to provide the file system integrity is acceptable in your environment, perform the following steps to use a CDSL as the mount point:

  1. Create the mount point if one does not already exist.

    # mkdir /mountpoint
    

  2. Use the mkcdsl -a command to convert the directory into a CDSL. This will copy an existing directory to a member-specific area on all members.

    # mkcdsl -a /mountpoint
    

  3. Mount the NFS file system on each cluster member, using the same NFS server.

    # mount server:/filesystem  /mountpoint
    

We recommend adding the mount information to the /etc/fstab file so that the mount is performed automatically on each cluster member.

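For example, an /etc/fstab entry for such a mount might look like the following sketch; the server name and paths are placeholders:

server:/filesystem  /mountpoint  nfs  rw,bg  0  0
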
7.6.2.4    Loopback Mounts Not Supported

NFS loopback mounts do not work in a cluster. Attempts to NFS-mount a file system that is served by the cluster onto a directory on the cluster fail and return the message, Operation not supported.

7.6.2.5    Do Not Mount Non-NFS File Systems on NFS-Mounted Paths

CFS does not permit non-NFS file systems to be mounted on NFS-mounted paths. This limitation prevents problems with availability of the physical file system in the event that the serving cluster member goes down.

7.6.3    Using AutoFS in a Cluster

If you want automatic mounting of NFS file systems, use AutoFS. AutoFS provides automatic failover of the automounting service by means of CAA. One member acts as the CFS server for automounted file systems, and runs the one active copy of the AutoFS daemon, autofsd. If this member fails, CAA starts autofsd on another member.

For instructions on configuring AutoFS, see the section on automatically mounting a remote file system in the Tru64 UNIX Network Administration: Services manual. After you have configured AutoFS, you must start the daemon as follows:

# caa_start autofs

If you want the autofs resource to start automatically, use the /usr/sbin/caa_profile -update command to set the auto_start profile option to 1.

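The following command is only a sketch of how this might be done; it assumes that the caa_profile -o option accepts as as the abbreviation for the auto_start attribute, so verify the exact option name in caa_profile(8) before using it:

# /usr/sbin/caa_profile -update autofs -o as=1
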
If you use AutoFS, keep in mind the following:

In TruCluster Server Version 5.1A, the value of the SCRIPT_TIMEOUT attribute was increased to 3600 to reduce the possibility of the autofs resource timing out. You can increase this value, but we recommend that you do not decrease it.

In previous versions of TruCluster Server, depending on the number of file systems being imported, the speeds of datalinks, and the distribution of imported file systems among servers, you might see a CAA message like the following:

# CAAD[564686]: RTD #0: Action Script \
/var/cluster/caa/script/autofs.scr(start) timed out! (timeout=180) 

In this situation, you need to increase the value of the SCRIPT_TIMEOUT attribute in the CAA profile for autofs to a value greater than 180. You can do this by editing /var/cluster/caa/profile/autofs.cap, or you can use the caa_profile -update autofs command to update the profile.

For example, to increase SCRIPT_TIMEOUT to 3600 seconds, enter the following command:

# caa_profile -update autofs -o st=3600

For more information about CAA profiles and using the caa_profile command, see caa_profile(8).

7.6.3.1    Forcibly Unmounting File Systems

If AutoFS on a cluster member is stopped or becomes unavailable (for example, if the CAA autofs resource is stopped), the intercept points and the file systems that were auto-mounted by AutoFS continue to be available. However, if AutoFS is stopped on a cluster member while file systems on that member are busy, and is then started on another member, the AutoFS intercept points can continue to recognize the original cluster member as the server. Because an intercept point remains busy as long as the file systems that are mounted under it are busy, these intercept points still claim the original cluster member as the server and do not allow new auto-mounts.

7.6.3.1.1    Determining Whether a Forced Unmount Is Required

You might encounter this problem in the following situations:

In the case where you detect an obvious problem accessing an auto-mounted file system, ensure that the auto-mounted file system is being served as expected. To do this, perform the following steps:

  1. Use the caa_stat autofs command to see where CAA indicates the autofs resource is running.

  2. Use the ps command to verify that the autofsd daemon is running on the member on which CAA expects it to run:

    # ps agx | grep autofsd
     
    

    If it is not running, run it and see whether this fixes the problem.

  3. Determine the auto-mount map entry that is associated with the inaccessible file system. One way to do this is to search the /etc/auto.x files for the entry.

  4. Use the cfsmgr -e command to determine whether the mount point exists and is being served by the expected member.

    If the server is not what CAA expects, the problem exists.

In the case where you move the CAA resource to another member, use the mount -e command to identify AutoFS intercept points and the cfsmgr -e command to show the servers for all mount points. Verify that all AutoFS intercept points and auto-mounted file systems have been unmounted on the member on which AutoFS was stopped.

When you use the mount -e command, search the output for autofs references similar to the following:

# mount -e | grep autofs
/etc/auto.direct on /mnt/mytmp type autofs (rw, nogrpid, direct)
 

When you use the cfsmgr -e command, search the output for map file entries similar to the following. The Server Status field does not indicate whether the file system is actually being served; look in the Server Name field for the name of the member on which AutoFS was stopped.

# cfsmgr -e
Domain or filesystem name = /etc/auto.direct
Mounted On = /mnt/mytmp
Server Name = provolone
Server Status : OK
 

7.6.3.1.2    Correcting the Problem

If you can wait until the busy file systems in question become inactive, do so. Then, run the autofsmount -U command on the former AutoFS server node to unmount them. Although this approach takes more time, it is a less intrusive solution.

If waiting until the busy file systems in question become inactive is not possible, use the cfsmgr -K directory command on the former AutoFS server node to forcibly unmount all AutoFS intercept points and auto-mounted file systems served by that node, even if they are busy.

Note

The cfsmgr -K command makes a best effort to unmount all AutoFS intercept points and auto-mounted file systems served by the node. However, it may not succeed in all cases. For example, the command does not work if an NFS operation is stalled because an NFS server is down or cannot be reached.

The cfsmgr -K command results in applications receiving I/O errors for open files in affected file systems. An application with its current working directory in an affected file system will no longer be able to navigate the file system namespace using relative names.

Perform the following steps to relocate the autofs CAA resource and forcibly unmount the AutoFS intercept points and auto-mounted file systems:

  1. Bring the system to a quiescent state if possible to minimize disruption to users and applications.

  2. Stop the autofs CAA resource by entering the following command:

    # caa_stop autofs
    

    CAA considers the autofs resource to be stopped even if some auto-mounted file systems are still busy.

  3. Enter the following command to verify that all AutoFS intercept points and auto-mounted file systems have been unmounted. Search the output for autofs references.

    # mount -e
    

  4. If they have not all been unmounted, enter the following command to forcibly unmount the AutoFS intercepts and auto-mounted file systems:

    # cfsmgr -K directory
    

    For directory, specify a directory on which an AutoFS intercept point or auto-mounted file system is mounted. You need enter only one mounted-on directory to remove all of the intercepts and auto-mounted file systems served by the same node.

  5. Enter the following command to start the autofs resource:

    # caa_start autofs -c cluster_member_to_be_server
    

7.6.4    Migrating from Automount to AutoFS

This section describes three possible scenarios for migrating from Automount to AutoFS: migrating without a reboot (Section 7.6.4.1), migrating when rebooting a cluster member (Section 7.6.4.2), and migrating when rebooting the entire cluster (Section 7.6.4.3).

7.6.4.1    Migrating Without a Reboot

Migrating without rebooting any cluster member requires the largest number of procedural steps, but provides the highest availability. This procedure requires the most steps because you cannot rely on a reboot to clean up the Automount intercept points and to automatically start AutoFS.

Note

Most Automount environments have a single automount instance for all map files. This procedure describes this common case.

If you have a complex Automount environment with a separate automount instance for each map file, you might have a customized version of the /etc/rc.config.common file, or the ps command might return multiple process identifiers and you must kill them all, or one cluster member might not be the Automount server node for all NFS file systems, and so forth.

As you extrapolate the procedure to fit your Automount environment, kill the "standby" copies of the Automount process first to prevent the Automount service from failing over when you kill the active Automount server process.

Follow these steps to migrate from Automount to AutoFS without rebooting any cluster member:

  1. Change the rc.config.common file.

    1. Determine the arguments to pass to autofsmount. These arguments are typically a subset of those already specified by the AUTOMOUNT_ARGS environment variable. To view the value of that variable, use the rcmgr -c get command, as shown in the following example:

      # /usr/sbin/rcmgr -c get AUTOMOUNT_ARGS 
      -D MACH=alpha -D NET=f /- /etc/auto.direct
       
      

      Environment variables set by using the -D option resolve placeholders in the definition of automount map file entries. For example, the associated NET entry might appear in the map file as follows:

      vsx ${NET}system:/share/hunch/usr/projects2/vsx
       
      

      and would resolve to

      vsx fsystem:/share/hunch/usr/projects2/vsx
       
      

    2. Set the arguments to pass to autofsmount, as determined in the previous step. To do this, use the rcmgr -c set command, as shown in the following example:

      # /usr/sbin/rcmgr -c set AUTOFSMOUNT_ARGS  -D MACH=alpha -D NET=f /- /etc/auto.direct
       
      

    3. Set the arguments to pass to the autofsd daemon, as shown in the following example:

      # /usr/sbin/rcmgr -c set AUTOFSD_ARGS -D MACH=alpha -D NET=f
      

      These arguments must match the environment variables, specified with the -D option, as set for AUTOMOUNT_ARGS.

    4. Use the mount -e command to identify a file system served by automount.

      # mount -e | grep "(pid"
      deli.zk3.dec.com:(pid524825) on /net type nfs (v2, ro, nogrpid, udp, hard, intr, noac, timeo=350, retrans=5)
      

      The automounted file system is indicated by hostname:(pid).

    5. Determine which cluster member is the Automount server node for the NFS file system you identified in the previous step, as shown in the following example:

      # cfsmgr -p /net
      Domain or filesystem name = /net
      Server Name = swiss
      Server Status: OK
      

    6. Stop the Automount service on all cluster members other than the Automount server you identified in the previous step. To do this, use the ps -ef command to display process identifiers, search the output for instances of automount, and then use the kill command (which sends SIGTERM, the default signal) to kill each process.

      # ps -ef | grep automount
      root 1049132 1048577 0.0 May 10 ?? 0:00.00 /usr/sbin/automount -D MACH=alpha -D NET=f /- /etc/auto.direct
      

      # kill 1049132
      

      Starting with Tru64 UNIX Version 5.1A, the kill command is cluster-aware; you can kill a process from any cluster member.

    7. Disable Automount and enable AutoFS in the rc.config.common file, as follows:

      # /usr/sbin/rcmgr -c set AUTOMOUNT 0
      # /usr/sbin/rcmgr -c set AUTOFS 1
      

  2. Wait for all automounted file systems to become quiescent.

  3. Stop the Automount service on the cluster member that is operating as the server. To do this, use the ps -ef command to display process identifiers, search the output for instances of automount, and then use the kill command (which sends SIGTERM, the default signal) to kill each process. Sending the SIGTERM signal to the automount daemon causes it to unmount all file systems that it has mounted, and then exit.

    # ps -ef | grep automount
    root  524825 524289  0.0   May 10 ??  0:00.01 /usr/sbin/automount -D MACH=alpha -D NET=f /- /etc/auto.direct
    

    # kill 524825
    

  4. Use the mount -e command and search the output for tmp_mnt, or the directory specified with the automount -M command, to verify that automounted file systems are no longer mounted.

    # mount -e | grep tmp_mnt
    

    If some mount points still exist, they will no longer be usable via the expected pathnames. However, they are still usable under the full /tmp_mnt/... pathnames. Because AutoFS does not use the /tmp_mnt mount point, there is no conflict and the full automount name space is available for AutoFS. If these tmp_mnt mount points later become idle, you can unmount them by using the -f option of the umount command, which unmounts remote file systems without notifying the server.

  5. Start AutoFS. AutoFS provides automatic failover of the automounting service by means of CAA: one cluster member acts as the CFS server for automounted file systems, and runs the one active copy of the AutoFS daemon. If this cluster member fails, CAA starts the autofs resource on another member.

    If you do not care which node serves AutoFS, use the /usr/sbin/caa_start autofs command without specifying a cluster member; otherwise, use the /usr/sbin/caa_start autofs -c member-name command to specify the cluster member that you want to serve AutoFS.

    # /usr/sbin/caa_start autofs
     
    

    The -c option starts the autofs resource on the specified member, provided that the member is allowed by the placement policy and resource dependencies. If the specified member is not allowed by the placement policy and resource dependencies, or is not available, the caa_start command fails.

    See the discussion of the resource file options in caa_profile(8).

  6. Use the caa_stat autofs command to make sure that the autofs resource started as expected.

    # /usr/bin/caa_stat autofs
    NAME=autofs
    TYPE=application
    TARGET=ONLINE
    STATE=ONLINE on swiss
    

7.6.4.2    Migrating When Rebooting a Cluster Member

Migrating when rebooting a cluster member requires fewer procedural steps than migrating without a reboot, at the expense of availability.

Notes

Before you shut down a cluster member, you need to determine whether the cluster member you are shutting down is a critical voting member, and whether it is the only hosting member for one or more applications with a restricted placement policy. Both of these issues are described in Section 5.5.

Most Automount environments have a single automount instance for all map files. This procedure describes this common case.

If you have a complex Automount environment with a separate automount instance for each map file, you might have a customized version of the /etc/rc.config.common file, or the ps command might return multiple process identifiers and you must kill them all, or one cluster member might not be the Automount server node for all NFS file systems, and so forth.

As you extrapolate the procedure to fit your Automount environment, kill the "standby" copies of the Automount process first to prevent the Automount service from failing over when you kill the active Automount server process.

Follow these steps to migrate from Automount to AutoFS when rebooting a cluster member:

  1. Change the rc.config.common file.

    1. Determine the arguments to pass to autofsmount. These arguments are typically a subset of those already specified by the AUTOMOUNT_ARGS environment variable. To view the value of that variable, use the rcmgr -c get command, as shown in the following example:

      # /usr/sbin/rcmgr -c get AUTOMOUNT_ARGS 
      -m -D MACH=alpha -D NET=f /- /etc/auto.direct
       
      

      Environment variables set by using the -D option resolve placeholders in the definition of automount map file entries. For example, the associated NET entry might appear in the map file as follows:

      vsx ${NET}system:/share/hunch/usr/projects2/vsx
       
      

      The entry resolves to the following:

      vsx fsystem:/share/hunch/usr/projects2/vsx
       
      

    2. Set the arguments to pass to autofsmount, as determined in the previous step. To do this, use the rcmgr -c set command, as shown in the following example:

      # /usr/sbin/rcmgr -c set AUTOFSMOUNT_ARGS -D MACH=alpha -D NET=f /- /etc/auto.direct
       
      

    3. Set the arguments to pass to the autofsd daemon, as shown in the following example:

      # /usr/sbin/rcmgr -c set AUTOFSD_ARGS -D MACH=alpha -D NET=f
      

      These arguments must match the environment variables, specified with the -D option, as set for AUTOMOUNT_ARGS.

    4. Use the mount -e command to identify a file system served by Automount:

      # mount -e | grep "(pid"
      deli.zk3.dec.com:(pid524825) on /net type nfs (v2, ro, nogrpid, udp, hard, intr, noac, timeo=350, retrans=5)
      

      The automounted file system is indicated by hostname:(pid).

    5. Determine which cluster member is the Automount server node for the NFS file system you identified in the previous step.

      # cfsmgr -p /net
      Domain or filesystem name = /net
      Server Name = swiss
      Server Status: OK
      

    6. Stop the Automount service on all cluster members other than the Automount server you identified in the previous step. To do this, use the ps -ef command to display process identifiers, search the output for instances of automount, and then use the kill command (which sends SIGTERM, the default signal) to kill each process.

      # ps -ef | grep automount
      root 1049132 1048577 0.0 May 10 ?? 0:00.00 /usr/sbin/automount -D MACH=alpha -D NET=f /- /etc/auto.direct
      

      # kill 1049132
      

      Starting with Tru64 UNIX Version 5.1A, the kill command is cluster-aware; you can kill a process from any cluster member.

    7. Disable Automount and enable AutoFS in the rc.config.common file, as follows:

      # /usr/sbin/rcmgr -c set AUTOMOUNT 0
      # /usr/sbin/rcmgr -c set AUTOFS 1
      

  2. Optionally, specify the AutoFS server. AutoFS provides automatic failover of the automounting service by means of CAA: one cluster member acts as the CFS server for automounted file systems, and runs the one active copy of the AutoFS daemon. If this cluster member fails, CAA starts the autofs resource on another member.

    You can use the caa_profile autofs -print command to view the CAA hosting and placement policy, if any. The hosting policy specifies an ordered list of members, separated by white space, that can host the application resource. The placement policy specifies how CAA selects the member on which to start or restart the application resource. The autostart policy determines whether the resource is started automatically after a reboot, regardless of whether it had been stopped or running before the reboot; set auto_start=1 if you want the resource started automatically.

    # /usr/sbin/caa_profile autofs -print
    NAME=autofs
    TYPE=application
    ACTION_SCRIPT=autofs.scr
    ACTIVE_PLACEMENT=0
    AUTO_START=0
    CHECK_INTERVAL=0
    DESCRIPTION=Autofs Services
    FAILOVER_DELAY=0
    FAILURE_INTERVAL=0
    FAILURE_THRESHOLD=0
    HOSTING_MEMBERS=
    OPTIONAL_RESOURCES=
    PLACEMENT=balanced
    REQUIRED_RESOURCES=
    RESTART_ATTEMPTS=3
    SCRIPT_TIMEOUT=3600
    

    The default, and recommended, behavior is to run on any cluster member, with a placement policy of balanced. If this policy is not suitable for your environment, use the /usr/sbin/caa_profile -update command to change the autofs resource profile.

    See the discussion of the resource file options in caa_profile(8).

    If you make a change, use the /usr/sbin/caa_register -u autofs command to have the update take effect.

  3. Reboot the cluster member. Before you shut down the cluster member, make sure that it is not a critical voting member or the only hosting member for one or more applications with a restricted placement policy. Both of these issues are described in Section 5.5.

    When it reboots, Automount will no longer be running in the cluster, and AutoFS will start.

    # /sbin/shutdown -r now
    

7.6.4.3    Migrating When Rebooting the Cluster

Migrating when rebooting the entire cluster requires fewer procedural steps than migrating without a reboot or migrating when rebooting a single member, at the expense of cluster availability.

Rebooting the cluster is a drastic measure; this is not the preferred migration method.

Follow these steps to migrate from Automount to AutoFS when rebooting the cluster:

  1. Change the rc.config.common file.

    1. Determine the arguments to pass to autofsmount. These arguments are typically a subset of those already specified by the AUTOMOUNT_ARGS environment variable. To view the value of that variable, use the rcmgr -c get command, as shown in the following example:

      # /usr/sbin/rcmgr -c get AUTOMOUNT_ARGS
      -D MACH=alpha -D NET=f /- /etc/auto.direct
      

      Environment variables set by using the -D option resolve placeholders in the definition of automount map file entries. For example, the associated NET entry might appear in the map file as follows:

      vsx ${NET}system:/share/hunch/usr/projects2/vsx
       
      

      This entry resolves to the following:

      vsx fsystem:/share/hunch/usr/projects2/vsx
       
      

    2. Set the arguments to pass to autofsmount, as determined in the previous step. To do this, use the rcmgr -c set command, as shown in the following example:

      # /usr/sbin/rcmgr -c set AUTOFSMOUNT_ARGS -D MACH=alpha -D NET=f /- /etc/auto.direct
       
      

    3. Set the arguments to pass to the autofsd daemon, as shown in the following example:

      # /usr/sbin/rcmgr -c set AUTOFSD_ARGS -D MACH=alpha -D NET=f
      

      These arguments must match the environment variables, specified with the -D option, as set for AUTOMOUNT_ARGS.

    4. Disable Automount and enable AutoFS in the rc.config.common file, as follows:

      # /usr/sbin/rcmgr -c set AUTOMOUNT 0
      # /usr/sbin/rcmgr -c set AUTOFS 1
      

  2. Optionally, specify the AutoFS server. AutoFS provides automatic failover of the automounting service by means of CAA: one cluster member acts as the CFS server for automounted file systems, and runs the one active copy of the AutoFS daemon. If this cluster member fails, CAA starts the autofs resource on another member.

    You can use the /usr/bin/caa_profile autofs -print command to view the CAA hosting and placement policy, if any. The hosting policy specifies an ordered list of members, separated by white space, that can host the application resource. The placement policy specifies how CAA selects the member on which to start or restart the application resource. The autostart policy determines whether the resource is started automatically after a reboot, regardless of whether it had been stopped or running before the reboot; set auto_start=1 if you want the resource started automatically.

    # /usr/bin/caa_profile autofs -print
    NAME=autofs
    TYPE=application
    ACTION_SCRIPT=autofs.scr
    ACTIVE_PLACEMENT=0
    AUTO_START=0
    CHECK_INTERVAL=0
    DESCRIPTION=Autofs Services
    FAILOVER_DELAY=0
    FAILURE_INTERVAL=0
    FAILURE_THRESHOLD=0
    HOSTING_MEMBERS=
    OPTIONAL_RESOURCES=
    PLACEMENT=balanced
    REQUIRED_RESOURCES=
    RESTART_ATTEMPTS=3
    SCRIPT_TIMEOUT=3600
    

    The default, and recommended, behavior is to run on any cluster member, with a placement policy of balanced. If this policy is not suitable for your environment, use the /usr/bin/caa_profile -update command to change the autofs resource profile.

    See the discussion of the resource file options in caa_profile(8).

    If you make a change, use the /usr/sbin/caa_register -u autofs command to have the update take effect.

  3. Reboot the cluster. When it reboots, Automount will no longer be running in the cluster, and AutoFS will start.

    # /sbin/shutdown -cr now
    

7.7    Managing inetd Configuration

Configuration data for the Internet server daemon (inetd) is kept in the following two files: the clusterwide /etc/inetd.conf file, which applies to all cluster members, and the member-specific /etc/inetd.conf.local file, which holds per-member overrides.

To disable a clusterwide service on a local member, edit /etc/inetd.conf.local for that member, and enter disable in the ServerPath field for the service to be disabled. For example, if finger is enabled clusterwide in inetd.conf and you want to disable it on a member, add a line like the following to that member's inetd.conf.local file:

finger  stream  tcp     nowait  root   disable       fingerd
 

When /etc/inetd.conf.local is not present on a member, the configuration in /etc/inetd.conf is used. When inetd.conf.local is present, its entries take precedence over those in inetd.conf.

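After editing a member's inetd.conf.local file, you can make the change take effect by having that member's inetd reread its configuration. On Tru64 UNIX, as on most UNIX systems, sending the daemon a SIGHUP accomplishes this; pid_of_inetd below is a placeholder for the process identifier you find on the member in question:

# ps agx | grep inetd
# kill -HUP pid_of_inetd
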
7.8    Managing Mail

TruCluster Server supports the following mail protocols: SMTP, DECnet Phase IV, DECnet/OSI, Message Transport System (MTS), UUCP, and X.25.

Of these protocols, only SMTP is cluster-aware and can make use of the cluster alias. The other mail protocols can run in a cluster environment, but they act as though each cluster member is a standalone system.

In a cluster, all members must have the same mail configuration. If DECnet, SMTP, or any other protocol is configured on one cluster member, it must be configured on all members, and it must have the same configuration on each member. You can configure the cluster as a mail server, client, or as a standalone configuration, but the configuration must be clusterwide. For example, you cannot configure one member as a client and another member as a server.

Of the supported protocols, only SMTP is cluster-aware, so only SMTP can make use of the cluster alias. SMTP handles e-mail sent to the cluster alias, and labels outgoing mail with the cluster alias as the return address.

When configured, an instance of sendmail runs on each cluster member. Every member can handle messages waiting for processing because the mail queue file is shared. Every member can handle mail delivered locally because each user's maildrop is shared among all members.

The other mail protocols, DECnet Phase IV, DECnet/OSI, Message Transport System (MTS), UUCP, and X.25, can run in a cluster environment, but they act as though each cluster member is a standalone system. Incoming e-mail using one of these protocols must be addressed to an individual cluster member, not to the cluster alias. Outgoing e-mail using one of these protocols has as its return address the cluster member where the message originated.

Configuring DECnet Phase IV, DECnet/OSI, MTS, UUCP, or X.25 in a cluster is like configuring it in a standalone system. It must be configured on each cluster member, and any hardware that is required by the protocol must be installed on each cluster member.

The following sections describe managing mail in more detail.

7.8.1    Configuring Mail

Configure mail with either the mailsetup or mailconfig command. Whichever command you choose, you have to use it for future mail configuration on the cluster, because each command understands only its own configuration format.

7.8.1.1    Mail Files

The following mail files are all common files shared clusterwide:

The following mail files are member-specific:

Files in /var/adm/sendmail that have hostname as part of the file name use the default cluster alias in place of hostname. For example, if the cluster alias is accounting, /var/adm/sendmail contains files named accounting.m4 and Makefile.cf.accounting.

Because the mail statistics file, /usr/adm/sendmail/sendmail.st, is member-specific, mail statistics are unique to each cluster member. The mailstat command returns statistics only for the member on which the command executed.

When mail protocols other than SMTP are configured, the member-specific /var/adm/sendmail/protocols.map file stores member-specific information about the protocols in use. In addition to a list of protocols, protocols.map lists DECnet Phase IV and DECnet/OSI aliases, when those protocols are configured.

7.8.1.2    The Cw Macro (System Nicknames List)

Whether you configure mail with mailsetup or mailconfig, the configuration process automatically adds the names of all cluster members and the cluster alias to the Cw macro (nicknames list) in the sendmail.cf file. The nicknames list must contain these names. If, during mail configuration, you accidentally delete the cluster alias or a member name from the nicknames list, the configuration program will add it back in.

During configuration you can specify additional nicknames for the cluster. However, if you do a quick setup in mailsetup, you are not prompted to update the nickname list. The cluster members and the cluster alias are still automatically added to the Cw macro.

7.8.1.3    Configuring Mail at Cluster Creation

We recommend that you configure mail on your Tru64 UNIX system before you run the clu_create command. If you run only SMTP, then you do not need to perform further mail configuration when you add new members to the cluster. The clu_add_member command takes care of correctly configuring mail on new members as they are added.

If you configure DECnet Phase IV, DECnet/OSI, MTS, UUCP, or X.25, then each time that you add a new cluster member, you must run mailsetup or mailconfig and configure the protocol on the new member.

7.8.1.4    Configuring Mail After the Cluster Is Running

All members must have the same mail configuration. If you want to run only SMTP, then you need configure mail only once, and you can run mailsetup or mailconfig from any cluster member.

If you want to run a protocol other than SMTP, you must manually run mailsetup or mailconfig on every member and configure the protocols. Each member must also have any hardware required by the protocol. The protocols must be configured for every cluster member, and the configuration of each protocol must be the same on every member.

The mailsetup and mailconfig commands cannot be focused on individual cluster members. In the case of SMTP, the commands configure mail for the entire cluster. For other mail protocols, the commands configure the protocol only for the cluster member on which the command runs.

If you try to run mailsetup with the -focus option, you get the following error message:

Mail can only be configured for the entire cluster.

Whenever you add a new member to the cluster, and you are running any mail protocol other than SMTP, you must run mailconfig or mailsetup and configure the protocol on the new member. If you run only SMTP, then no mail configuration is required when a member is added.

Deleting members from the cluster requires no reconfiguration of mail, regardless of the protocols that you are running.

7.8.2    Distributing Mail Load Among Cluster Members

Mail handled by SMTP can be load balanced by means of the cluster alias selection priority (selp) and selection weight (selw) attributes, which determine how connection requests directed to the cluster alias are distributed among cluster members.

By default, all cluster members have the same selection priority (selp=1) and selection weight (selw=1), as determined by the /etc/clu_alias.config file on each member. (The clu_create command uses a default selection weight of 3, but if you create an alias the default selection weight is 1.) When all members share the same selection priority and the same selection weight, then connection requests are distributed equally among the members. In the case of the default system configuration, each member in turn handles one incoming connection.

If you want all incoming mail (and all other connections) to be handled by a subset of cluster members, set the selection priority for those cluster members to a common value that is higher than the selection priority of the remaining members.

You can also create a mail alias that includes only those cluster members that you want to handle mail, or create a mail alias with all members and use the selection priority to determine the order in which members of the alias receive new connection requests.

Set the selection weight or selection priority for a member by running the cluamgr command on that member. If your cluster members have the default values for selp and selw, and you want all incoming mail (and all other connections) to be handled by a single cluster member, log in to that member and assign it a selp value greater than the default. For example, enter the following command:

# cluamgr -a alias=DEFAULTALIAS,selp=50

Suppose you have an eight-member cluster and you want two of the members, alpha and beta, to handle all incoming connections, with the load split 40/60 between alpha and beta, respectively. Log in to alpha and enter the following command:

# cluamgr -a alias=DEFAULTALIAS,selp=50,selw=2

Then log in to beta and enter the following command:

# cluamgr -a alias=DEFAULTALIAS,selp=50,selw=3

Assuming that the other members have the default selp of 1, beta and alpha will handle all connection requests. beta will take three connections, then alpha will take two, then beta will take the next three, and so on.

Note

Setting selp and selw in this manner affects all connections through the cluster alias, not just the mail traffic.

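To review the selection priority and selection weight currently in effect on a member, you can display the alias status on that member; the exact fields shown depend on your version:

# cluamgr -s all
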
For more information on balancing connection requests, see Section 3.10 and cluamgr(8).

7.9    Configuring a Cluster for RIS

To create a Remote Installation Services (RIS) server in a cluster, perform the following procedure in addition to the procedure that is described in the Tru64 UNIX Sharing Software on a Local Area Network manual:

For information about /etc/bootptab, see bootptab(4).

Note

Depending on your network configuration, you may need to supply a unique, arbitrary hardware address when registering the alias with the RIS server.

To use a cluster as an RIS client, you must do the following:

  1. Register the cluster member from which you will be using the setld command with the RIS server. Do this by registering the member name and the hardware address of that member.

  2. Register the default cluster alias.

    If you are registering for an operating system kit, you will be prompted to enter a hardware address. The cluster alias does not have a physical interface associated with its host name. Instead, use any physical address that does not already appear in either /etc/bootptab or /usr/var/adm/ris/clients/risdb.

    If your cluster uses the cluster alias virtual MAC (vMAC) feature, register that virtual hardware address with the RIS server as the default cluster alias's hardware address. If your cluster does not use the vMAC feature, you can still generate a virtual address by using the algorithm that is described in the virtual MAC (vMAC) section, Section 3.12.

    A virtual MAC address consists of a prefix (the default is AA:01) followed by the IP address of the alias in hexadecimal format. For example, the default vMAC address for the default cluster alias deli whose IP address is 16.140.112.209 is AA:01:10:8C:70:D1. The address is derived in the following manner:

            Default vMAC prefix:       AA:01
            Cluster Alias IP Address:  16.140.112.209
            IP address in hex. format: 10:8C:70:D1
            vMAC for this alias:       AA:01:10:8C:70:D1
     
    

    Therefore, when registering this default cluster alias as a RIS client, the host name is deli and the hardware address is AA:01:10:8C:70:D1.

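If you prefer not to do the hexadecimal conversion by hand, a standard shell printf produces the same result; this is only a convenience, not a required step:

# printf "AA:01:%02X:%02X:%02X:%02X\n" 16 140 112 209
AA:01:10:8C:70:D1
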
If you do not register both the default cluster alias and the member, the setld command will return a message such as one of the following:

# setld -l ris-server:
setld: Error contacting server ris-server: Permission denied.
setld: cannot initialize ris-server:
 

# setld -l ris-server:
setld: ris-server: not in server database
setld: cannot load control information
 

7.10    Displaying X Window Applications Remotely

You can configure the cluster so that a user on a system outside the cluster can run X applications on the cluster and display them on the user's system using the cluster alias.

The following example shows the use of out_alias as a way to apply single-system semantics to X applications that are displayed from cluster members.

In /etc/clua_services, the out_alias attribute is set for the X server port (6000). A user on a system outside the cluster wants to run an X application on a cluster member and display back to the user's system. Because the out_alias attribute is set on port 6000 in the cluster, the user must specify the name of the default cluster alias when running the xhost command to allow X clients access to the user's local system. For example, for a cluster named deli, the user runs the following command on the local system:

# xhost +deli
 

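The user can then log in to any cluster member and run the X application with the DISPLAY environment variable pointing back to the local system; here usersystem is a placeholder for the user's own workstation, and the syntax assumes an sh or ksh shell:

# DISPLAY=usersystem:0.0
# export DISPLAY
# xterm &
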
This use of out_alias allows any X application from any cluster member to be displayed on that user's system. A cluster administrator who wants users to allow access on a per-member basis can either comment out the Xserver line in /etc/clua_services, or remove the out_alias attribute from that line (and then run cluamgr -f on each cluster member to make the change take effect).

For more information on cluster aliases, see Chapter 3.