You may be able to improve overall Tru64 UNIX performance by improving
application performance.
This chapter describes guidelines for improving application performance, which are summarized in
Table 7-1.
7.1 Improving Application Performance
Well-written applications use CPU, memory, and I/O resources efficiently.
Table 7-1
describes some guidelines to improve application
performance.
Table 7-1: Application Performance Improvement Guidelines
| Guideline | Performance Benefit | Tradeoff |
| --- | --- | --- |
| Install the latest operating system patches (Section 7.1.1) | Provides the latest optimizations | None |
| Use the latest version of the compiler (Section 7.1.2) | Provides the latest optimizations | None |
| Use parallelism (Section 7.1.3) | Improves SMP performance | None |
| Optimize applications (Section 7.1.4) | Generates more efficient code | None |
| Use shared libraries (Section 7.1.5) | Frees memory | May increase execution time |
| Reduce application memory requirements (Section 7.1.6) | Frees memory | Program may not run optimally |
| Use memory locking as part of real-time program initialization (Section 7.1.7) | Allows you to lock and unlock memory as needed | Reduces the memory available to processes and the UBC |
The following sections describe these improvement guidelines in more
detail.
7.1.1 Using the Latest Operating System Patches
Always install the latest operating system patches, which often contain performance enhancements.
Check the /etc/motd file to determine which patches you are running.
Contact your customer service representative for information about installing patches.
7.1.2 Using the Latest Version of the Compiler
Always use the latest version of the compiler to build your application program. Usually, new versions include advanced optimizations.
Check the software on your system to ensure that you are using the latest
version of the compiler.
7.1.3 Using Parallelism
To enhance parallelism, application developers working in Fortran or C should consider using the Kuck & Associates Preprocessor (KAP), which can significantly improve SMP performance.
See the
Programmer's Guide
for details on KAP.
7.1.4 Optimizing Applications
Optimizing an application program can involve modifying the build process or modifying the source code. Various compiler and linker optimization levels can be used to generate more efficient user code. See the Programmer's Guide for more information on optimization.
Whether you are porting an application from a 32-bit system to Tru64 UNIX
or developing a new application, never attempt to optimize an application
until it has been thoroughly debugged and tested.
If you are porting an application written in C, use the lint command with the -Q option, or compile your program with the C compiler's -check option, to identify portability problems that you may need to resolve.
7.1.5 Using Shared Libraries
Using shared libraries reduces the need for memory and disk space. When multiple programs are linked to a single shared library, the amount of physical memory used by each process can be significantly reduced.
However, a program linked with shared libraries may initially execute more slowly than one linked with static libraries.
7.1.6 Reducing Application Memory Requirements
You may be able to reduce an application's use of memory, which provides more memory resources for other processes or for file system caching. Follow these coding considerations to reduce your application's use of memory:
Configure and tune applications according to the guidelines provided by the application's installation procedure. For example, you may be able to reduce an application's anonymous memory requirements, set parallel/concurrent processing attributes, size shared global areas and private caches, and set the maximum number of open/mapped files.
You may want to use the mmap function instead of the read or write function in your applications. The read and write system calls require a page of buffer memory and a page of UBC memory, but mmap requires only one page of memory.
Look for data cache collisions between heavily used data structures, which occur when the distance between two data structures allocated in memory is equal to the size of the primary (internal) data cache. If your data structures are small, you can avoid collisions by allocating them contiguously in memory. To do this, use a single malloc call instead of multiple calls.
If an application uses large amounts of data for a short time, allocate the data dynamically with the malloc function instead of declaring it statically. When you finish using dynamically allocated memory, free it so that the memory can be reused by data structures that occur later in the program. If you have limited memory resources, dynamically allocating data reduces an application's memory usage and can substantially improve performance.
If an application uses the malloc function extensively, you may be able to improve its processing speed or decrease its memory utilization by using the function's control variables to tune memory allocation. See malloc(3) for more information.
If your application fits in a 32-bit address space and allocates large amounts of dynamic memory by using structures that contain many pointers, you may be able to reduce memory usage by using the -xtaso option. The -xtaso option is supported by all versions of the C compiler (-newc, -migrate, and -oldc). To use the -xtaso option, modify your source code with a C-language pragma that controls pointer size allocations. See cc(1) for more information.
See the
Programmer's Guide
for detailed information on process memory allocation.
7.1.7 Controlling Memory Locking
Real-time application developers should consider memory locking as a required part of program initialization. Many real-time applications remain locked for the duration of execution, but some may want to lock and unlock memory as the application runs. Memory-locking functions allow you to lock the entire process at the time of the function call and throughout the life of the application. Locked pages of memory cannot be used for paging and the process cannot be swapped out.
Memory locking applies to a process's address space. Only the pages mapped into a process's address space can be locked into memory. When the process exits, pages are removed from the address space and the locks are removed.
Use the mlockall function to lock all of a process's address space. Locked memory remains locked until either the process exits or the application calls the munlockall function.
Use the ps command to determine whether a process is locked into memory and cannot be swapped out. See Section 12.3.2.
Memory locks are not inherited across a fork, and all memory locks associated with a process are unlocked on a call to the exec function or when the process terminates. See the Guide to Realtime Programming and mlockall(3) for more information.