4    General Application Migration Issues

This chapter describes general migration issues that are relevant to all types of applications. Table 4-1 lists each migration issue and the types of applications that might encounter them, as well as where to find more information.

Table 4-1:  Application Migration Considerations

Issue                                              Application Types Affected                     For More Information
Clusterwide and member-specific files              Single-instance, multi-instance, distributed   Section 4.1
Device naming                                      Single-instance, multi-instance, distributed   Section 4.2
Interprocess communication                         Multi-instance, distributed                    Section 4.3
Synchronized access to shared data                 Multi-instance, distributed                    Section 4.4
Member-specific resources                          Single-instance                                Section 4.5
Expanded process IDs (PIDs)                        Multi-instance, distributed                    Section 4.6
Distributed lock manager (DLM) parameters removed  Multi-instance, distributed                    Section 4.7
Licensing                                          Single-instance, multi-instance, distributed   Section 4.8
Blocking layered products                          Single-instance, multi-instance, distributed   Section 4.9

4.1    Clusterwide and Member-Specific Files

A cluster has two sets of configuration data: clusterwide data, which is shared by all cluster members, and member-specific data, which applies to an individual member.

Because the cluster file system (CFS) makes all files visible to and accessible by all cluster members, those applications that require clusterwide configuration data can easily write to a configuration file that all members can view. However, an application that must use and maintain member-specific configuration information needs to take some additional steps to avoid overwriting files.

To avoid overwriting files, consider using one of the following methods:

Method Advantage Disadvantage
Single file Easy to manage. Application must be aware of how to access member-specific data in the single file.
Multiple files Keeps configuration information in a set of clusterwide files. Multiple copies of files need to be maintained. Application must be aware of how to access member-specific files.
Context-dependent symbolic links (CDSLs) Keeps configuration information in member-specific areas. CDSLs are transparent to the application; they look like symbolic links. Moving or renaming files will break symbolic links. Application must be aware of how to handle CDSLs. Using CDSLs makes it more difficult for an application to find out about other instances of that application in the cluster.

You must decide which method best fits your application's needs. The following sections describe each approach.

4.1.1    Using a Single File

Using a single, uniquely named file keeps application configuration information in one clusterwide file as separate records for each node. The application reads and writes the correct record in the file. Managing a single file is easy because all data is in one central location.

As an example, in a cluster the /etc/printcap file contains entries for specific printers. The following parameter can be specified to indicate which nodes in the cluster can run the spooler for the print queue:

        :on=nodename1,nodename2,nodename3,...:
 

If the first node is up, it will run the spooler. If that node goes down, the next node, if it is up, will run the spooler, and so on.
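The single-file approach can be sketched in the shell as follows. The record format (a node-name prefix followed by the member's settings) and the node names are assumptions for illustration; a real application would key on its own hostname.

```shell
# Hypothetical clusterwide configuration file with one record per member.
# The "nodename:settings" record format is an assumption for illustration.
cat > /tmp/app.conf <<'EOF'
nodeA:spool_dir=/var/spool/a
nodeB:spool_dir=/var/spool/b
EOF

# Each instance extracts only its own record, keyed by node name.
me=nodeB                      # in practice: me=$(hostname -s)
myrec=$(grep "^${me}:" /tmp/app.conf | cut -d: -f2-)
echo "$myrec"
```

Because every member reads the same clusterwide file, adding or changing a member's settings requires editing only one file.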

4.1.2    Using Multiple Files

Using uniquely named multiple files keeps configuration information in a set of clusterwide files. For example, each cluster member has its own member-specific gated configuration file in /etc. Instead of using a context-dependent symbolic link (CDSL) to reference member-specific files through a common file name, the naming convention for these files takes advantage of member IDs to create a unique name for each member's file. For example:

# ls -l /etc/gated.conf.member*
-rw-r--r--   1 root   system    466 Jun 21 17:37 /etc/gated.conf.member1
-rw-r--r--   1 root   system    466 Jun 21 17:37 /etc/gated.conf.member2
-rw-r--r--   1 root   system    466 Jun 21 13:28 /etc/gated.conf.member3
 

This method requires more work to manage because multiple copies of files need to be maintained. For example, if the member ID of a cluster member changes, you must find and rename all member-specific files belonging to that member. Also, if the application is unaware of how to access member-specific files, you must configure it to do so.
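The member-ID naming convention can be sketched as follows. The member ID and the file contents are assumptions for illustration; a real application would obtain the member ID from the cluster (for example, from a kernel attribute) rather than hard-coding it.

```shell
# Hypothetical example of per-member file naming by member ID.
# memberid=2 is an assumption; a real application queries the cluster for it.
memberid=2
conf="/tmp/gated.conf.member${memberid}"   # unique clusterwide name

# Each member writes and reads only its own file.
echo "traceoptions general" > "$conf"
cat "$conf"
```

Because each file name embeds the member ID, all files remain visible clusterwide, yet each member can locate its own copy without a CDSL.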

4.1.3    Using CDSLs

Tru64 UNIX Version 5.0 introduced a special form of symbolic link, called a context-dependent symbolic link (CDSL), that TruCluster Server uses to point to the correct file for each member. CDSLs are useful when running multiple instances of an application on different cluster members, with each instance operating on its own set of data.

Using a CDSL keeps configuration information in member-specific areas. However, the data can be referenced through the CDSL. Each member reads the common file name, but is transparently linked to its copy of the configuration file. CDSLs are an alternative to maintaining member-specific configuration information when an application cannot be easily changed to use multiple files.

The following example shows the CDSL structure for the file /etc/rc.config:

/etc/rc.config -> ../cluster/members/{memb}/etc/rc.config
 

For example, on a cluster member whose member ID is 3, the pathname /cluster/members/{memb}/etc/rc.config resolves to /cluster/members/member3/etc/rc.config.
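The kernel performs this {memb} substitution transparently; the following sketch only emulates the resolution in user space to show how the pathname is formed. The member ID of 3 matches the example above.

```shell
# Emulates CDSL resolution: the kernel replaces {memb} with "memberN",
# where N is the local member ID. Done here with sed for illustration only.
memberid=3
cdsl_target="../cluster/members/{memb}/etc/rc.config"
resolved=$(echo "$cdsl_target" | sed "s/{memb}/member${memberid}/")
echo "$resolved"
```

Each member therefore follows the same link text but arrives at its own member-specific copy of the file.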

Tru64 UNIX provides the mkcdsl command, which lets system administrators create CDSLs and update a CDSL inventory file. For more information on this command, see the TruCluster Server Cluster Administration manual and mkcdsl(8). For more information about CDSLs, see the Tru64 UNIX System Administration manual, hier(5), ln(1), and symlink(2).

4.2    Device Naming

Tru64 UNIX Version 5.0 introduced a new device-naming convention that consists of a descriptive name for the device and an instance number. These two elements form the basename of the device. For example:

Location in /dev Device Name Instance Basename
./disk dsk 0 dsk0
./disk cdrom 1 cdrom1
./tape tape 0 tape0

Moving a disk from one physical connection to another does not change the device name for the disk. For a detailed discussion of this device-naming model, see the Tru64 UNIX System Administration manual.

Although Tru64 UNIX recognizes both the old-style (rz) and new-style (dsk) device names, TruCluster Server recognizes only the new-style device names. Applications that depend on old-style device names or the /dev directory structure must be modified to use the newer device-naming convention.

You can use the hwmgr utility, a generic utility for managing hardware, to help map device names to their bus, target, and LUN position after installing Tru64 UNIX Version 5.1B. For example, enter the following command to view devices:

# hwmgr -view devices
 
  HWID: Device Name         Mfg    Model            Location
  --------------------------------------------------------------------
    45: /dev/disk/floppy0c         3.5in floppy     fdi0-unit-0
    54: /dev/disk/cdrom0c   DEC    RRD47   (C) DEC  bus-0-targ-5-lun-0
    55: /dev/disk/dsk0c     COMPAQ BB00911CA0       bus-1-targ-0-lun-0
    56: /dev/disk/dsk1c     COMPAQ BB00911CA0       bus-1-targ-1-lun-0
    57: /dev/disk/dsk2c     DEC    HSG80            IDENTIFIER=7
    .
    .
    .
 

Use the following command to view devices clusterwide:

# hwmgr -view devices -cluster
 
HWID: Device Name        Mfg     Model           Hostname    Location
-----------------------------------------------------------------------
  45: /dev/disk/floppy0c         3.5in floppy    swiss    fdi0-unit-0
  54: /dev/disk/cdrom0c  DEC     RRD47   (C) DEC swiss    bus-0-targ-5-lun-0
  55: /dev/disk/dsk0c    COMPAQ  BB00911CA0      swiss    bus-1-targ-0-lun-0
  56: /dev/disk/dsk1c    COMPAQ  BB00911CA0      swiss    bus-1-targ-1-lun-0
  57: /dev/disk/dsk2c    DEC     HSG80           swiss    IDENTIFIER=7
  .
  .
  .
 

For more information on using this command, see hwmgr(8) and the Cluster Administration manual.

When modifying applications to use the new-style device-naming convention, look for the following:

Note

If you previously renumbered SCSI buses in your ASE, carefully verify the mapping from physical device to bus number during an upgrade to TruCluster Server. See the Cluster Installation manual for more information.

4.3    Interprocess Communication

The following mechanisms for clusterwide interprocess communication (IPC) are supported:

The following mechanisms are not supported for clusterwide IPC:

If an application uses any of these unsupported IPC mechanisms, it must be restricted to running as a single-instance application.

4.4    Synchronized Access to Shared Data

Multiple instances of an application running within a cluster must synchronize with each other for most of the same reasons that multiprocess and multithreaded applications synchronize on a standalone system. However, memory-based synchronization mechanisms (such as critical sections, mutexes, simple locks, and complex locks) work only on the local system and not clusterwide. Shared file data must be synchronized, or files must be used to synchronize the execution of instances across the cluster.

Because the cluster file system (CFS) is fully POSIX compliant, an application can use flock() system calls to synchronize access to shared files among instances. You can also use the distributed lock manager (DLM) API library functions for more sophisticated locking capabilities (such as additional lock modes, lock conversions, and deadlock detection). Because the DLM API library is supplied only in the TruCluster Server product, make sure that code that uses its functions and is also meant to run on nonclustered systems calls clu_is_member() before making any DLM function calls. The clu_is_member() function verifies that the system is in fact a cluster member. For more information about this function, see clu_is_member(3). For more information about the DLM API, see Chapter 9.
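File-based serialization between instances can be sketched from the shell as follows. This assumes the flock(1) utility (a command-line wrapper around the flock() system call, found in util-linux on Linux systems) is available, which is an assumption; the manual itself refers to the flock() system call from C.

```shell
# Sketch of serializing two instances on a shared lock file.
# Availability of the flock(1) utility is an assumption of this sketch;
# on a cluster, the lock file would live on CFS so all members see it.
lockfile=/tmp/app.lock
touch "$lockfile"

# Each command runs only while holding an exclusive lock on the file,
# so the two critical sections cannot overlap.
flock "$lockfile" sh -c 'echo "instance A holds the lock"'
flock "$lockfile" sh -c 'echo "instance B holds the lock"'
```

In a real application the critical sections would update the shared data; the lock guarantees that only one instance at a time does so.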

4.5    Member-Specific Resources

If multiple instances of an application are started simultaneously on more than one cluster member, some instances may not work properly because they depend on resources that are available only on a specific member, such as substantial CPU capacity or a large amount of physical memory. Such a dependency may restrict the application to running as a single instance in a cluster. Removing the dependency may be enough to allow the application to run as multiple instances; alternatively, if more than one member has the required resources, run the application only on those members.

4.6    Expanded PIDs

In TruCluster Server, process identifiers (PIDs) are expanded to a full 32-bit value. The value of PID_MAX is increased to 2147483647 (0x7fffffff); therefore, any application that compares PIDs against PID_MAX must be recompiled to pick up the new value.

To ensure that PIDs are unique across a cluster, PIDs for each cluster member are based on the member ID and are allocated from a range of numbers unique to that member. The lowest PID available to processes on a member is given by the following formula:

PID = (memberid * (2**19)) + 2
 

Typically, the first two values are reserved for the kernel idle process and /sbin/init. For example, PIDs 524,288 and 524,289 are assigned to kernel idle and init, respectively, on a cluster member whose memberid is 1.
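The arithmetic above can be checked directly. The member ID of 1 matches the text's example; 2**19 is written out as 524288 because basic shell arithmetic has no exponentiation operator.

```shell
# Per-member PID base: memberid * 2**19 (2**19 = 524288).
memberid=1
base=$(( memberid * 524288 ))

# The first two PIDs in the range go to the kernel idle process and
# /sbin/init, so the first PID available to applications is base + 2.
first_app_pid=$(( base + 2 ))
echo "$base $first_app_pid"
```

For member 1 this yields a base of 524288, matching the kernel idle and init PIDs (524288 and 524289) cited above.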

PIDs are commonly used to uniquely identify log and temporary files. If an application stores a PID in a file, make sure that the file is member-specific.

4.7    DLM Parameters Removed

Because the distributed lock manager (DLM) persistent resources, resource groups, and transaction IDs are enabled by default in TruCluster Available Server and TruCluster Production Server Version 1.6 and TruCluster Server Version 5.0 and later, the dlm_disable_rd and dlm_disable_grptx attributes are unneeded and have been removed from the DLM kernel subsystem.

4.8    Licensing

This section discusses licensing constraints and issues.

4.8.1    TruCluster Server Clusterwide Licensing Not Supported

TruCluster Server Version 5.1B does not support clusterwide licensing. Each time you add a member to the cluster, you must register on that member all required licenses for applications that may run there.

4.8.2    Layered Product Licensing and Network Adapter Failover

The Redundant Array of Independent Network Adapters (NetRAIN) and the Network Interface Failure Finder (NIFF) provide mechanisms for facilitating network failover and replace the monitored network interface method that was employed in the TruCluster Available Server and Production Server products.

NetRAIN provides transparent network adapter failover for multiple adapter configurations. NetRAIN monitors the status of its network interfaces with NIFF, which detects and reports possible network failures. You can use NIFF to generate events when network devices, including a composite NetRAIN device, fail. You can monitor these events and take appropriate actions when a failure occurs. For more information about NetRAIN and NIFF, see the Tru64 UNIX Network Administration: Connections manual.

In a cluster, an application can fail over and restart itself on another member. If the application performs a license check when restarting, the check may fail because it looks for a particular member's IP address or its network adapter's media access control (MAC) address.

Licensing schemes that use a network adapter's MAC address to uniquely identify a machine can be affected by how NetRAIN changes the MAC address. All network drivers support the SIOCRPHYSADDR ioctl, which fetches MAC addresses from the interface. This ioctl returns two addresses in an array: the default hardware address, which is the permanent address assigned to the adapter, and the current physical address, which is the address the interface is currently using.

For licensing schemes that are based on MAC addresses, use the default hardware address that is returned by the SIOCRPHYSADDR ioctl; do not use the current physical address, because NetRAIN modifies this address for its own use. See the reference page for your network adapter (for example, tu(7)) for a sample program that uses the SIOCRPHYSADDR ioctl.

4.9    Blocking Layered Products

Check whether an application that you want to migrate is a blocking layered product. A blocking layered product is a product that prevents the installupdate command from completing during an update installation of TruCluster Server Version 5.1B. Blocking layered products must be removed from the cluster before starting a rolling upgrade that will include running the installupdate command.

Unless a layered product's documentation specifically states that you can install a newer version of the product on the first rolled member, and that the layered product knows what actions to take in a mixed-version cluster, we strongly recommend that you do not install either a new layered product or a new version of a currently installed layered product during a rolling upgrade.

The TruCluster Server Cluster Installation manual lists layered products that are known to break an update installation on TruCluster Server Version 5.1B.