This appendix contains information about performance aspects of the Transmission Control Protocol (TCP).
It discusses how programs can influence TCP throughput by controlling the following via socket options:
TCP window size
TCP error recovery
TCP round-trip time
TCP reliability
D.1 TCP Throughput and Window Size
TCP throughput depends on the transfer rate, which is the rate at which the network can accept packets, and the round-trip time, which is the delay between the time a TCP segment is sent and the time an acknowledgement arrives for that segment. The product of these two factors determines the amount of data that must be buffered (the window) before an acknowledgement is received in order to obtain maximum throughput on a TCP connection.
If the transfer rate, the round-trip time, or both are high, the default window size used by TCP may be insufficient to keep the pipe fully loaded. Under these circumstances, TCP throughput is limited because the sender must stall until acknowledgements for prior data are received.
The receive socket buffer size determines the maximum receive window for a TCP connection. The transfer rate from a sender can also be limited by the send socket buffer size. The default value is 61440 bytes for TCP send and receive buffers.
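As a rough illustration of the relationship described above, the following C sketch estimates the window needed to keep a path full as the transfer rate multiplied by the round-trip time. The function name bdp_bytes and its parameters are hypothetical and are not part of any system interface.

```c
#include <stdint.h>

/*
 * Estimate the number of bytes that must be in flight to keep a path
 * full: window = transfer rate x round-trip time.
 * bdp_bytes, rate_bits_per_sec, and rtt_usec are illustrative names.
 */
uint64_t bdp_bytes(uint64_t rate_bits_per_sec, uint64_t rtt_usec)
{
    /* Convert bits/s to bytes/s, then scale by the RTT in microseconds. */
    return (rate_bits_per_sec / 8) * rtt_usec / 1000000;
}
```

Under these assumptions, a 1 Gb/s path with a 1 ms round-trip time needs about 125000 bytes in flight, roughly twice the 61440-byte default.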
D.1.1 Programming the TCP Socket Buffer Sizes
An application can override the default TCP send and receive socket buffer sizes by using the setsockopt system call and specifying the SO_SNDBUF and SO_RCVBUF options, prior to establishing the connection. The largest size that can be specified with the SO_SNDBUF and SO_RCVBUF options is limited by the kernel variable sb_max. See Section D.1.2.1 for information about increasing this value.
For maximum throughput, send and receive socket buffers on both ends of the connection should be of equal size.
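A minimal sketch of the sequence described above, assuming a TCP socket that has been created but not yet connected. The helper names set_tcp_buffers and get_rcvbuf are hypothetical; setsockopt, getsockopt, SO_SNDBUF, and SO_RCVBUF are the standard socket interfaces.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Request the same size for the send and receive buffers of a socket.
 * Call this after socket() and before connect().
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
int set_tcp_buffers(int sock, int bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}

/*
 * Read back the receive buffer size the system reports for the socket.
 * Returns the size in bytes, or -1 on failure.
 */
int get_rcvbuf(int sock)
{
    int bytes = 0;
    socklen_t len = sizeof(bytes);

    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, &len) < 0)
        return -1;
    return bytes;
}
```

Reading the value back with getsockopt is one way to check what the system granted, since the reported size may differ from the requested size.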
When writing programs that use the setsockopt system call to change a TCP socket buffer size (SO_SNDBUF, SO_RCVBUF), note that the actual socket buffer size used for a TCP connection can be larger than the specified value. This situation occurs when the specified socket buffer size is not a multiple of the TCP Maximum Segment Size (MSS) to be used for the connection. TCP determines the actual size, and the specified size is rounded up to the nearest multiple of the negotiated MSS. For local network connections, the MSS is generally determined by the network interface type and its maximum transmission unit (MTU).
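The rounding described above can be sketched as follows. The function name round_up_to_mss is hypothetical, and the 1460-byte MSS used in the note after the code is an assumed value typical of IPv4 over Ethernet with a 1500-byte MTU. This illustrates the arithmetic only; it is not the system's internal code.

```c
#include <stddef.h>

/*
 * Round a requested socket buffer size up to the nearest multiple of
 * the MSS. A request that is already a multiple is returned unchanged.
 * round_up_to_mss is an illustrative name, not a system interface.
 */
size_t round_up_to_mss(size_t requested, size_t mss)
{
    size_t remainder;

    if (mss == 0)
        return requested;       /* no MSS known: leave the request as is */

    remainder = requested % mss;
    return remainder == 0 ? requested : requested + (mss - remainder);
}
```

Under these assumptions, a request for 61440 bytes with an MSS of 1460 bytes would become 62780 bytes, which is 43 full segments.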
D.1.2 TCP Window Scale Option
Tru64 UNIX implements the TCP window scale option, as defined in RFC 1323: TCP Extensions for High Performance. The TCP window scale option, which allows larger windows to be used, was designed to increase throughput of TCP over high bandwidth, long delay networks. This option may also increase throughput of TCP in local Gigabit Ethernet and FDDI networks.
The window field in the TCP header is 16 bits. Therefore, the largest window that can be used without the window scale option is (2**16)-1 bytes (just under 64KB). When the window scale option is used between cooperating systems, windows up to (2**30)-1 bytes are allowed. The option, transmitted between TCP peers at the time a connection is established, defines a scale factor that is applied to the window size value in each TCP header to obtain the actual window size.
The maximum receive window, and therefore the scale factor offered by TCP during connection establishment, is determined by the maximum receive socket buffer space.
If the receive socket buffer size is greater than 65535 bytes, TCP specifies the Window Scale option during connection establishment, with a scale factor based on the size of the receive socket buffer. Both systems involved in the TCP connection must send the Window Scale option in their SYN segments for window scaling to occur in either direction on the connection. As stated previously, for maximum throughput, send and receive buffers on both ends of the connection should be of equal size.
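The source does not state how the scale factor is derived from the receive buffer size. The following sketch shows one common approach, used in BSD-derived TCP implementations: choose the smallest shift count for which the largest unscaled window (65535 bytes), shifted left, covers the buffer. RFC 1323 limits the shift count to 14. The function name window_scale_for is hypothetical, and the exact calculation on any given system is an assumption here.

```c
#define TCP_UNSCALED_MAX_WINDOW 65535UL  /* largest 16-bit window value */
#define TCP_MAX_SHIFT           14       /* RFC 1323 upper bound on shift */

/*
 * Return the smallest window scale shift count that lets a 16-bit
 * window field describe a receive buffer of rcvbuf_bytes.
 * Illustrative only; a real TCP stack may choose differently.
 */
unsigned int window_scale_for(unsigned long rcvbuf_bytes)
{
    unsigned int shift = 0;

    while (shift < TCP_MAX_SHIFT &&
           (TCP_UNSCALED_MAX_WINDOW << shift) < rcvbuf_bytes)
        shift++;
    return shift;
}
```

With this approach, a 61440-byte buffer needs no scaling (shift 0), while a 1048576-byte buffer yields a shift of 5.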
D.1.2.1 Increasing the System Socket Buffer Size Limit
The sb_max kernel attribute for the Socket kernel subsystem limits the amount of socket buffer space that can be allocated for each send and receive buffer. The current default is 1048576 bytes (1MB), but you can increase it.
For local Gigabit Ethernet connections, the current value is sufficient. For long delay, high bandwidth paths, values greater than 1MB may be required.
To change the sb_max kernel attribute in the kernel currently in memory, use either the dxkerneltuner utility or the sysconfig -r command. See dxkerneltuner(8) and sysconfig(8) for more information.
D.2 TCP Performance and Error Recovery
TCP relies on acknowledgements to determine if packets arrive at their destination. In high-speed connections (for example, Gigabit Ethernet) that use large windows, the default loss-recovery mechanism can seriously reduce throughput.
By default, if a packet is lost, TCP retransmits that packet and all packets after it. An application can override the default by using the setsockopt system call and specifying the TCP_SACKENA option, prior to establishing the connection. After the option is agreed upon, the data receiver can inform the sender about all segments that have arrived successfully. In this way, the sender need retransmit only those segments that have actually been lost. This option is useful in cases where multiple segments are dropped.
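A minimal sketch of enabling the option described above before connecting. It assumes that TCP_SACKENA is defined in <netinet/tcp.h>, is set at the IPPROTO_TCP level, and takes an integer flag; the source names the option but does not confirm these details. The helper name enable_sack is hypothetical. On systems that do not define TCP_SACKENA, the sketch fails with ENOPROTOOPT rather than failing to compile.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Ask TCP to negotiate selective acknowledgements on a socket that has
 * not yet been connected. Returns 0 on success, -1 on failure.
 * Assumption: the option level and the int argument follow the usual
 * convention for TCP socket options.
 */
int enable_sack(int sock)
{
#ifdef TCP_SACKENA
    int on = 1;

    return setsockopt(sock, IPPROTO_TCP, TCP_SACKENA, &on, sizeof(on));
#else
    (void)sock;                 /* option not available on this system */
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```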
D.3 TCP Performance and Round-Trip Measurement
TCP bases its round-trip time measurements on only one packet per window. In high-speed connections (for example, Gigabit Ethernet) that use large windows, it is possible for the round-trip time estimates to be seriously flawed, resulting in many retransmissions.
By default, TCP does not send timestamps in the TCP header. An application can override the default by using the setsockopt system call and specifying the TCP_TSOPTENA option, prior to establishing the connection. After the option is selected, the sender places a timestamp in each data segment. The receiver, if configured to accept them, sends these timestamps back in ACK segments. This provides the sender with a reliable mechanism with which to measure round-trip time.
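Once a timestamp has been echoed, the round-trip measurement described above reduces to a subtraction. The following is a minimal sketch, assuming a 32-bit timestamp clock as defined in RFC 1323. The function name rtt_ticks is hypothetical, and the duration of one tick depends on the sender's timestamp clock, which the source does not specify.

```c
#include <stdint.h>

/*
 * Compute the round-trip time in timestamp-clock ticks from the
 * sender's current clock value and the timestamp echoed in an ACK.
 * Unsigned 32-bit subtraction gives the right answer even when the
 * clock has wrapped between sending and receiving the echo.
 */
uint32_t rtt_ticks(uint32_t now, uint32_t echoed)
{
    return (uint32_t)(now - echoed);
}
```

Because every acknowledged segment can carry an echoed timestamp, the sender can take a sample for each ACK rather than one per window.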
D.4 TCP Reliability and Sequence Numbers
TCP relies on sequence numbers to determine the correct sequencing of packets and to determine if duplicate packets have been received. In high-speed connections (for example, Gigabit Ethernet), it is possible for the sequence numbers to wrap around. This means that two packets could have the same sequence number yet contain different information; they are not duplicates, but TCP will assume that they are.
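To give a sense of scale for the wraparound described above: TCP sequence numbers are 32 bits wide and count bytes, so the sequence space is 2**32 bytes. The following sketch estimates how long a sender at a given sustained rate takes to cycle through it. The function name seq_wrap_seconds is hypothetical.

```c
/*
 * Estimate the time, in seconds, for a sender at a sustained rate to
 * consume the entire 32-bit TCP sequence space (2**32 bytes).
 * seq_wrap_seconds is an illustrative name, not a system interface.
 */
double seq_wrap_seconds(double rate_bits_per_sec)
{
    const double seq_space_bytes = 4294967296.0;    /* 2**32 */

    return seq_space_bytes / (rate_bits_per_sec / 8.0);
}
```

At a sustained 1 Gb/s the result is roughly 34 seconds, so a segment delayed in the network for that long could carry a sequence number that is valid again.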
By default, TCP does not provide a mechanism for rejecting old duplicate packets. An application can override the default by using the setsockopt system call and specifying the TCP_PAWS option, after specifying the TCP_TSOPTENA option, and prior to establishing the connection. When the PAWS (Protect Against Wrapped Sequence numbers) option is enabled, the receiver rejects any old duplicate segments that are received. This option is used on synchronized TCP connections only.
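The ordering described above, timestamps first and then PAWS, both before connecting, can be sketched as follows. As an assumption not confirmed by the source, the sketch treats TCP_TSOPTENA and TCP_PAWS as IPPROTO_TCP-level options defined in <netinet/tcp.h> that take an integer flag. The helper name enable_paws is hypothetical. On systems that do not define these options, the sketch fails with ENOPROTOOPT.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable TCP timestamps and then PAWS on a socket that has not yet
 * been connected. Returns 0 on success, -1 on failure.
 * Assumption: both options take an int flag at the IPPROTO_TCP level.
 */
int enable_paws(int sock)
{
#if defined(TCP_TSOPTENA) && defined(TCP_PAWS)
    int on = 1;

    /* Timestamps must be requested first; PAWS depends on them. */
    if (setsockopt(sock, IPPROTO_TCP, TCP_TSOPTENA, &on, sizeof(on)) < 0)
        return -1;
    return setsockopt(sock, IPPROTO_TCP, TCP_PAWS, &on, sizeof(on));
#else
    (void)sock;                 /* options not available on this system */
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```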