You may be able to improve overall Tru64 UNIX performance by improving
application performance.
This chapter describes guidelines for improving application performance, which are summarized in
Table 7-1.
7.1 Improving Application Performance
Well-written applications use CPU, memory, and I/O resources efficiently.
Table 7-1
describes some guidelines to improve application
performance.
Table 7-1: Application Performance Improvement Guidelines
| Guideline | Performance Benefit | Tradeoff |
| --- | --- | --- |
| Install the latest operating system patches (Section 7.1.1) | Provides the latest optimizations | None |
| Use the latest version of the compiler (Section 7.1.2) | Provides the latest optimizations | None |
| Use parallelism (Section 7.1.3) | Improves SMP performance | None |
| Optimize applications (Section 7.1.4) | Generates more efficient code | None |
| Use shared libraries (Section 7.1.5) | Frees memory | May increase execution time |
| Reduce application memory requirements (Section 7.1.6) | Frees memory | Program may not run optimally |
| Use memory locking as part of real-time program initialization (Section 7.1.7) | Allows you to lock and unlock memory as needed | Reduces the memory available to processes and the UBC |
The following sections describe these improvement guidelines in more
detail.
7.1.1 Using the Latest Operating System Patches
Always install the latest operating system patches, which often contain performance enhancements.
Check the /etc/motd file to determine which patches you are running.
Contact your customer service representative for information about installing patches.
7.1.2 Using the Latest Version of the Compiler
Always use the latest version of the compiler to build your application program. Usually, new versions include advanced optimizations.
Check the software on your system to ensure that you are using the latest
version of the compiler.
7.1.3 Using Parallelism
To enhance parallelism, application developers working in Fortran or C should consider using the Kuck & Associates Preprocessor (KAP), which can significantly improve SMP performance.
See the
Programmer's Guide
for details on KAP.
7.1.4 Optimizing Applications
Optimizing an application program can involve modifying the build process or modifying the source code. Various compiler and linker optimization levels can be used to generate more efficient user code. See the Programmer's Guide for more information on optimization.
Whether you are porting an application from a 32-bit system to Tru64 UNIX
or developing a new application, never attempt to optimize an application
until it has been thoroughly debugged and tested.
If you are porting an application written in C, use the lint command with the -Q option, or compile your program with the C compiler's -check option, to identify portability problems that you may need to resolve.
7.1.5 Using Shared Libraries
Using shared libraries reduces the need for memory and disk space. When multiple programs are linked to a single shared library, the amount of physical memory used by each process can be significantly reduced.
However, a program linked with shared libraries may initially execute more slowly than one linked with static libraries.
7.1.6 Reducing Application Memory Requirements
You may be able to reduce an application's use of memory, which provides more memory resources for other processes or for file system caching. Follow these coding considerations to reduce your application's use of memory:
Configure and tune applications according to the guidelines provided by the application's installation procedure. For example, you may be able to reduce an application's anonymous memory requirements, set parallel/concurrent processing attributes, size shared global areas and private caches, and set the maximum number of open/mapped files.
You may want to use the mmap function instead of the read or write function in your applications. The read and write system calls require a page of buffer memory and a page of UBC memory, but mmap requires only one page of memory.
Look for data cache collisions between heavily used data structures, which occur when the distance between two data structures allocated in memory is equal to the size of the primary (internal) data cache. If your data structures are small, you can avoid collisions by allocating them contiguously in memory. To do this, use a single malloc call instead of multiple calls.
If an application uses large amounts of data for a short time, allocate the data dynamically with the malloc function instead of declaring it statically. When you finish using dynamically allocated memory, free it so that the memory can be reused by data structures that occur later in the program. If you have limited memory resources, dynamically allocating data reduces an application's memory usage and can substantially improve performance.
If an application uses the malloc function extensively, you may be able to improve its processing speed or decrease its memory utilization by using the function's control variables to tune memory allocation. See malloc(3) for more information.
If your application fits in a 32-bit address space and allocates large amounts of dynamic memory by using structures that contain many pointers, you may be able to reduce memory usage by using the -xtaso option. The -xtaso option is supported by all versions of the C compiler (-newc, -migrate, and -oldc). To use the -xtaso option, modify your source code with a C-language pragma that controls pointer size allocations. See cc(1) for more information.
See the
Programmer's Guide
for detailed information on process memory allocation.
7.1.7 Controlling Memory Locking
Real-time application developers should consider memory locking as a required part of program initialization. Many real-time applications remain locked for the duration of execution, but some may want to lock and unlock memory as the application runs. Memory-locking functions allow you to lock the entire process at the time of the function call and throughout the life of the application. Locked pages of memory cannot be used for paging and the process cannot be swapped out.
Memory locking applies to a process's address space. Only the pages mapped into a process's address space can be locked into memory. When the process exits, pages are removed from the address space and the locks are removed.
Use the mlockall function to lock all of a process's address space. Locked memory remains locked until either the process exits or the application calls the munlockall function.
Use the ps command to determine whether a process is locked into memory and cannot be swapped out. See Section 12.3.2.
Memory locks are not inherited across a fork, and all memory locks associated with a process are unlocked on a call to the exec function or when the process terminates. See the Guide to Realtime Programming and mlockall(3) for more information.