CMU OpenVMS TCP/IP Frequently Asked Questions

Last Update: 7-OCT-1996
FAQ Maintainer: Andy Harper
A.Harper @ kcl.ac.uk

PART 3 OF 4

--------------------------------------------------------------------------------
3.0 KNOWN PROBLEMS
------------------

This section lists known problems with the current base release that are
either outstanding or fixed by one or more of the patch kits.

--------------------------------------------------------------------------------
3.1 >>>> IPACP [13-JUL-1995]
--------------

The IPACP process coordinates all IP traffic. It also includes a built-in
ethernet driver and telnet server.

3.1.1 >>>> IPACP ISSUES STATUS CODES TO OPCOM [11-OCT-1994]

When the IPACP process (which coordinates the IP traffic) has problems, it
can issue system status codes to OPCOM. Here is a typical sequence:

    %%%%%%%%%%% OPCOM 16-AUG-1993 10:49:23.75 %%%%%%%%%%%
    Message from user SYSTEM on XYZZY
    IPACP: XE status error. Status = 00000A00

    %%%%%%%%%%% OPCOM 16-AUG-1993 10:49:23.83 %%%%%%%%%%%
    Message from user SYSTEM on XYZZY
    IPACP: XE retried 5 times.

    %%%%%%%%%%% OPCOM 16-AUG-1993 10:49:23.89 %%%%%%%%%%%
    Message from user SYSTEM on XYZZY
    IPACP: XE $QIO read error (dev_inact), RC=000020D4

To determine the exact problem, it is first necessary to translate the
status codes (00000A00 and 000020D4) into the more usual text form. The DCL
lexical function F$MESSAGE will translate them for you. Here is a little
command file to make it easier:

    $! SHOWMSG.COM
    $! Usage: @SHOWMSG 20D4
    $ WRITE SYS$OUTPUT F$MESSAGE(%X'P1')

Typically, the messages are indicative of a problem with the ethernet
itself or with the ethernet controller; the status messages may help to
determine the root cause.

The message texts from OpenCMU are not part of the standard system message
files. Before F$MESSAGE can translate an OpenCMU error code into its text,
the user must have issued a SET MESSAGE command on the file NETERROR.EXE.
The installation of OpenCMU should have placed this in the SYS$MESSAGE
directory. If not, locate the file called NETERROR.OBJ in the
CMUIP_ROOT:[*...] tree and relink it to form NETERROR.EXE, using this
command:

    $ LINK/SHARE=SYS$COMMON:[SYSMSG]NETERROR NETERROR.OBJ

Following this, the message texts can be made available to F$MESSAGE using:

    $ SET MESSAGE SYS$MESSAGE:NETERROR

[Note: if, for any reason, NETERROR.OBJ does not exist in the directory
tree, it can be found in the second saveset of the OpenCMU kit -
CMUIP066.B]

3.1.2 >>>> IPACP CRASH DUE TO QUOTA EXCEEDED [20-MAR-1995]

For systems with a high IP load, IPACP may occasionally crash with a `quota
exceeded' error. This does not refer to disk quota, but to one of the
process quota limits. Usually, the quota in question is BYTLM.

The default BYTLM provided for IPACP (65536) is sufficient for only about
20 connections. IPACP takes about 32000 bytes for itself and each
connection takes about 1872 bytes. This requirement is NOT currently
documented.

To increase the BYTLM for the IPACP, modify the IP_STARTUP.COM procedure and
change the value of the /BUFFER_LIMIT qualifier on the RUN command that
starts the IPACP process. Then shut down and restart IPACP.

At the current time, there also appears to be a memory leak in IPACP which
has the effect of gradually reducing the available BYTLM over time. When
this gets close to zero, IPACP will hang (as it retries) and then crash soon
afterwards. It is therefore desirable to give IPACP more BYTLM than the
typical load might suggest. If this sort of crash is experienced, increase
the BYTLM by 50% and restart IPACP.

3.1.3 >>>> IPACP CRASHES WITH DIVIDE BY ZERO ERROR [15-AUG-1995]

On some systems, the IPACP supplied in the V6DRIVER.SAVE patch kit can cause
divide by zero problems when running OpenCMU on OpenVMS 6.1. If this
happens, return to the IPACP.EXE image supplied in the TEKIP0665A.SAVE patch
kit. The erroneous version identifies itself as version 6.7.
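
The BYTLM figures quoted in 3.1.2 can be turned into a rough sizing
calculation. The following is only a sketch in Python, not part of OpenCMU:
the function names are hypothetical, the byte counts are the approximate
values given above, and the 50% headroom simply reuses the "increase by 50%"
advice as an assumed safety margin against the leak.

```python
# Rough BYTLM sizing based on the approximate figures in section 3.1.2.
IPACP_BASE = 32000      # bytes IPACP takes for itself (approximate)
PER_CONNECTION = 1872   # bytes per connection (approximate)
DEFAULT_BYTLM = 65536   # default value supplied for IPACP


def max_connections(bytlm):
    """Connections that fit in a given BYTLM, ignoring the memory leak."""
    return max(0, (bytlm - IPACP_BASE) // PER_CONNECTION)


def bytlm_for(connections, headroom=0.5):
    """BYTLM for a target connection count.

    headroom is an assumed fractional safety margin (0.5 = 50% extra).
    """
    needed = IPACP_BASE + connections * PER_CONNECTION
    return int(needed * (1 + headroom))


if __name__ == "__main__":
    print("Connections at default BYTLM:", max_connections(DEFAULT_BYTLM))
    print("Suggested BYTLM for 50 connections:", bytlm_for(50))
```

The result of bytlm_for() is the sort of value that would be given to the
/BUFFER_LIMIT qualifier in IP_STARTUP.COM; treat it as a starting point and
adjust it against the observed load.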
--------------------------------------------------------------------------------
3.2 >>>> NAMRES [13-JUL-1995]
---------------

NAMRES is the DNS Name Resolver, responsible for translating system names
into IP addresses, and vice versa. If not running, use of domain names is
not possible though use of IP addresses ought to continue to work.

3.2.1 >>>> NAMRES GIVES DOMAIN REFERRAL EXCEEDED MESSAGES [11-OCT-1994]

The name resolver can produce the message `Maximum domain referral limit
exceeded' and fail to resolve a name into its address. This is often
indicative of incorrect configuration of the name resolver. Ensure that the
following lines are included in the NAMRES$CONFIG file:

    Variable:TIMEOUT:5
    Variable:REFMAX:10
    Variable:RECURSE:1

You might also want to add:

    Variable:NS_RETRANS:2

(NOTE: in table 3-8 of the last official manual, the last variable, labelled
TIMEOUT, should be labelled RECURSE. TIMEOUT is given correctly as the
second entry in the table).

Restart the name resolver if necessary:

    $ IPNCP
    IPNCP> NAMRES EXIT
    ....
    IPNCP> STARTUP /NAMRES

3.2.2 >>>> NAMRES HANGS IN RWAST [12-JUL-1995]

After some time, the NAMRES process can hang in an RWAST state, preventing
further name resolutions from taking place. This is a bug in the current
version and no fix is currently available.

Processes in an RWAST state cannot be killed so stopping and restarting the
NAMRES process is not possible by standard means. However, a number of
workarounds may be possible:

  * Change the process name and restart NAMRES

        $ SET PROCESS/ID=xxxx/NAME=OLDNAMRES
        $ IPNCP STARTUP/NAMRES

  * Start up NAMRES under a privileged username different from that
    normally used.

  * Reboot the system.

The last is the only recommended way to completely clear a hung 'RWAST'
process.

3.2.3 >>>> NAMRES UNRESPONSIVE [28-NOV-1995]

After some time running, NAMRES can become unresponsive although it appears
to be still running. In this case, the best thing to do is to shut it down
and start it up again.
Use these IPNCP commands:

    $ IPNCP
    IPNCP> namres exit
    IPNCP> startup /namres

On a busy system, NAMRES may well go into this state frequently and it may
be worth setting up an automatic batch job which does this on a regular
basis. For example:

    $ SUBMIT 'f$parse(";",f$environment("PROCEDURE"))' /AFTER="TOMORROW+6-" /NOLOG
    $ IPNCP NAMRES EXIT
    $ WAIT ::10
    $ IPNCP STARTUP /NAMRES

The first line of this procedure causes the job to be resubmitted at 6AM
each morning. The rest simply shuts down and restarts NAMRES. It should be
initially submitted to an appropriate batch queue by SYSTEM.

--------------------------------------------------------------------------------
3.3 >>>> NFS [13-JUL-1995]
------------

The NFS server allows directory hierarchies to be made accessible over the
network, such that they can be 'mounted' as a disk on another system. There
is no NFS client to allow local mounting of remote disks.

3.3.1 >>>> WHY DOESN'T THE NFS SERVER WORK [11-OCT-1994]

The NFS server broke with version 6.6-5 of OpenCMU. At this time, there is
no workable solution. If NFS is a requirement, version 6.6-4 is the last
version in which NFS works.

--------------------------------------------------------------------------------
3.4 >>>> FTP [13-JUL-1995]
------------

FTP provides file transfer capabilities. The FTP server allows remote users
to connect and transfer files. The FTP client allows local users to access
remote systems.

3.4.1 >>>> WHY IS FTP SO SLOW [11-OCT-1994]

The version of FTP supplied with the master 6.6-5 kit suffers from a number
of bugs. One of these causes excessive error rates and retransmissions,
resulting in a low throughput rate. It is STRONGLY recommended that the
6.6-5A patch kit be applied. This greatly improves the performance.

See also the freeware MGFTP software (described in more detail in the
`Software' section elsewhere).
3.4.2 >>>> WHY DOES FTP CRASH WITH `EXCEEDED QUOTA' [11-OCT-1994]

FTP (client or server) can fall over with an `exceeded quota' message if the
SYSGEN parameter `MAXBUF' is not set correctly. The latest recommendation is
for the minimum value to be 8192. Note that transferring files with large
records, exceeding MAXBUF, may still cause problems.

3.4.3 >>>> FTP OF BACKUP SAVESETS GIVES CRC ERRORS [2-OCT-1995]

One major use of FTP is in transferring BACKUP savesets to/from other
systems. Often this leads to the recipient having difficulties unpacking
that saveset; in particular, using BACKUP to list or unpack it results in a
stream of messages similar to `CRC error' to the user's terminal and to
OPCOM. This article summarizes why the error occurs and how to correct it.

When BACKUP creates a saveset, it writes the file with a fixed length record
format - the length being that specified with BACKUP's /BLOCK qualifier. For
example:

    BACKUP/BLOCK=8192 * s.bck/save

This creates a file with fixed length records of 8192 bytes.

When FTP is used, in binary mode, the data is sent correctly but the record
structure changes; typically, the file is created with 512 byte records.
Thus, when BACKUP is used to list or unpack the file contents, it finds that
the record length of the file does not match the size used originally (this
value is stored in the BACKUP saveset header as well as in the file header).

If both ends of the FTP session support the special STRUC VMS mode of
transfer, then it should be used and the file will transfer correctly. If
this structure is not supported, the record structure becomes corrupted and
must be manually `fixed up' before BACKUP can be used. There are several
ways in which this can be done. Note that, in each case, the technique will
work ONLY if the file has been transferred in binary mode ftp. If the file
format has been corrupted by ANY other means (such as kermit, or a pathworks
file copy) then the techniques will need to be modified appropriately.

1. Create an empty file with the correct record format, and then copy the
   saveset into it:

       $ CREATE/FDL=SYS$INPUT newfile.bck
       RECORD
               FORMAT           FIXED
               SIZE             nnnn
               CARRIAGE_CONTROL NONE
       ^Z
       $ COPY/OVERLAY file.bck newfile.bck

   NOTES:
   'nnnn' is the record size used on the original BACKUP command. The
   easiest way to obtain this value is to use BACKUP/LIST on the original
   file; CRC errors WILL be generated but it will display the original block
   size used before then.
   The original file is called 'file.bck'.
   A new copy of the file is made in 'newfile.bck'.

2. An alternative to the above mechanism for creating the 'empty' file in
   the correct format, which is less obvious but quicker to type, is:

       $ BACKUP dummy_name newfile.bck/BLOCK=nnn

   The saveset is then copied into the new file with the same COPY/OVERLAY
   command as in method 1.

   NOTES:
   'nnn' is the record size used on the original BACKUP command. The easiest
   way to obtain this value is to use BACKUP/LIST on the original file; CRC
   errors WILL be generated but it will display the original block size used
   before then.
   'dummy_name' is the name of a non-existent file.
   A new copy of the file is made in 'newfile.bck'.

3. Use the public domain utility called FILE (courtesy of Joe Meadows):

       $ FILE/RECORD_SIZE=nnn file.bck

   Where 'nnn' is the record size used on the original BACKUP command.

   NOTE: this utility does NOT make a copy of the file; instead it patches
   the file header directly. It is wise to make a backup copy before using
   this technique!!!

4. Use the public domain utility called FIX_SAVESET (author unknown):

       $ FIX_SAVESET file.bck

   This utility scans the file, on the assumption that it is a backup
   saveset; picks out the original record length from the backup saveset
   header stored in the file; and finally, patches the file header record
   size back to this length. A new copy of the file is not made.

Summary:

To summarize the correct method of transferring a BACKUP saveset using FTP:

1. If both ends support STRUC VMS, then

       a. ftp> SET STRUC VMS
       b. ftp> GET file

   The file will be stored locally with the correct attributes.

2. If STRUC VMS is not supported by one or both ends, then

       a. ftp> BINARY
       b. ftp> GET file

   Once the file arrives on the VMS system:

       c. FIX_SAVESET file

Availability:

    ftp://ftp2.kcl.ac.uk/vms/default/fix_saveset.*
    ftp://ftp2.kcl.ac.uk/vms/joemeadows/file.*

NOTE: These items are available in source form only and require a C
compiler.

3.4.4 >>>> CANNOT LOGIN TO FTP SERVER AFTER UPGRADE TO OPENVMS 6.0 [20-JAN-1995]

Following an upgrade to OpenVMS 6.0, users cannot log in to the FTP server
once they have changed their password. This is because the password hashing
algorithm is updated in OpenVMS 6.0 and all new passwords use the new
hashing algorithm. The existing FTP_SERVER does not know about the new one
and consequently cannot hash correctly, causing a password mismatch.

An updated FTP_SERVER is available. This will run on all versions of VMS
from 5.4 upwards. Alternatively, install MadGoat FTP (see software section
elsewhere in this document).

--------------------------------------------------------------------------------
3.5 >>>> TELNET [13-JUL-1995]
---------------

TELNET allows interactive use. The telnet server, built into the IPACP,
allows remote users to access the local system. The telnet client allows
local users to access remote systems interactively.

3.5.1 >>>> WHY DOES TELNET SOMETIMES HANG IN `RWAST' [11-OCT-1994]

TELNET clients prior to version 5.0 could, under certain conditions, lock up
a process in an RWAST state. All users are strongly recommended to upgrade
to Version 5.0-1 of TELNET in which this problem, and others, are solved.

3.5.2 >>>> WHY DOES TELNETTING INTO OPENCMU HANG [11-OCT-1994]

When telnetting into an OpenCMU host, the system does not prompt for a
username until an extra carriage return appears. There are three known,
unrelated, causes for this problem.

First, a bug in earlier versions of the OpenCMU telnet software is known to
cause unexpected hangs.
To fix this, all users should install the latest patches to OpenCMU (6.6-5A)
and the telnet client.

Second, some PC telnet clients are known to contain problems that prevent
them successfully interworking with OpenCMU TELNET. PC-NFS telnet versions
4.x and 5.x suffer from this problem. To fix, avoid these clients - there
are plenty of reasonable alternative telnet clients around.

Finally, it may be the case that some PC telnet clients do not correctly
negotiate the telnet options when the call is connected. One or other end
can wait indefinitely for the opposite end to continue. At this time, no
clear solution is known but the problem can sometimes be alleviated by
adding the following to the OpenCMU INET$CONFIG file:

    Variable:TELNET_NEG_TIMEOUT:0

This causes telnet not to wait for negotiations to time out, and can speed
up those logins which appear to hang for a long time.

Note: Under OpenVMS 6.1, the telnet pause bug appears again and there is no
current solution to this.

--------------------------------------------------------------------------------
3.6 >>>> MISCELLANEOUS [22-AUG-1995]
----------------------

This section notes various unrelated items and known bugs that may affect
several applications.

3.6.1 >>>> PORT NUMBER ALLOCATION BUG [22-AUG-1995]

There is a bug in the low level IP software in OpenCMU 6.6-5 and up, that
can result in bad port numbers being returned to an application. The details
are as follows:

  * Client requests that a free port be allocated and set up for listening.

  * OpenCMU allocates and sets up the port correctly but returns ZERO back
    to the caller instead of the port number.

  * Subsequent references to the port then fail.

NOTE: If the client requests an explicit port number, rather than letting
OpenCMU select it, the port number is returned correctly.

Here is an example of the problem with FTP:

  * FTP client has a control connection opened to port 21 on the remote FTP
    server and wants to download a file.
  * FTP client requests OpenCMU to allocate a free port and set it up for
    listening, the idea being that the remote FTP server will make an
    outgoing call to this port and send the file to it. The OpenCMU bug
    causes ZERO to be returned for the randomly allocated port.

  * FTP client sends the returned port number to the FTP server in a PORT
    command, thus:

        PORT 123,45,1,2,0,0

  * FTP server tries to open the data connection back to this port on the
    client system. This fails because port zero does not exist.

This problem is corrected in the forthcoming 6.7 release.

This bug is known to affect WWW clients, such as Lynx and Mosaic, where they
are linked with an OpenCMU compatible socket library. The effect is to cause
FTP transfers to fail unexpectedly while other protocols work fine.

Note that the OpenCMU FTP client gets around the problem by randomly
allocating the port itself, based on some function of the date/time, and
asking OpenCMU to allocate that specific port. This causes the port number
to be returned correctly but risks clashing with a port allocated by another
application on the same system. The risk is small but can cause random
failures of FTP. The same technique can be used by user written
applications.

--------------------------------------------------------------------------------
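
To make the PORT example in 3.6.1 concrete, the sketch below shows how the
six comma-separated fields of an FTP PORT command are formed: four address
octets followed by the high and low bytes of the port number. It is written
in Python purely as an illustration; the function names are hypothetical and
none of this is OpenCMU code. A returned port of zero therefore shows up as
the trailing "0,0" seen above.

```python
def port_command(ip, port):
    """Build an FTP PORT command from a dotted address and a port number."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port out of range")
    fields = ip.split(".") + [str(port >> 8), str(port & 0xFF)]
    return "PORT " + ",".join(fields)


def parse_port(command):
    """Recover (address, port) from a PORT command string."""
    fields = [int(f) for f in command.split()[1].split(",")]
    address = ".".join(str(f) for f in fields[:4])
    return address, fields[4] * 256 + fields[5]


def looks_like_allocation_bug(command):
    """True if the advertised port is zero, as produced by the bug above."""
    return parse_port(command)[1] == 0


if __name__ == "__main__":
    print(port_command("123.45.1.2", 0))
    print(parse_port("PORT 123,45,1,2,16,4"))
    print(looks_like_allocation_bug("PORT 123,45,1,2,0,0"))
```

A user written application could apply a check of this kind to the port
number handed back to it before sending PORT, and fall back to requesting an
explicit port as described above.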