CMU OpenVMS TCP/IP

                     Frequently Asked Questions

                      Last Update:  7-OCT-1996

                    FAQ Maintainer:  Andy Harper
                                     A.Harper @ kcl.ac.uk

PART 3 OF 4

--------------------------------------------------------------------------------


3.0 KNOWN PROBLEMS
------------------

This section lists known problems with the current base release that are either
outstanding or fixed by one or more of the patch kits.

--------------------------------------------------------------------------------

3.1 >>>> IPACP							[13-JUL-1995]
--------------

The IPACP process coordinates all IP traffic. It also includes a built-in
ethernet driver and telnet server.


3.1.1 >>>> IPACP ISSUES STATUS CODES TO OPCOM			[11-OCT-1994]

When the IPACP process (which coordinates the IP traffic) has problems, it can
issue system status codes to OPCOM. Here is a typical sequence:

   %%%%%%%%%%%  OPCOM  16-AUG-1993 10:49:23.75  %%%%%%%%%%%
   Message from user SYSTEM on XYZZY
   IPACP: XE status error.  Status = 00000A00

   %%%%%%%%%%%  OPCOM  16-AUG-1993 10:49:23.83  %%%%%%%%%%%
   Message from user SYSTEM on XYZZY
   IPACP: XE retried 5 times.

   %%%%%%%%%%%  OPCOM  16-AUG-1993 10:49:23.89  %%%%%%%%%%%
   Message from user SYSTEM on XYZZY
   IPACP: XE $QIO read error (dev_inact), RC=000020D4


To determine the exact problem, it is first necessary to translate the status
codes (00000A00  and  000020D4) into the more usual text form. The DCL lexical
function F$MESSAGE will translate them for you.  Here is a little command file
to make it easier:


   $! SHOWMSG.COM
   $! Usage: @SHOWMSG 20D4
   $	WRITE SYS$OUTPUT F$MESSAGE(%X'P1')


Typically, the messages are indicative of a problem with the ethernet itself or
with the ethernet controller; the status messages may help to determine the
root cause.
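
As a rough aid alongside F$MESSAGE, the fields of a status code can be picked
apart by hand. The sketch below is illustrative only: the function name is
hypothetical, and it assumes the standard VMS condition value layout (bits 0-2
severity, bits 3-15 message number, bits 16-27 facility number). It does not
produce the message text; for that, F$MESSAGE and the right message file are
still needed.

```python
# Severity names for the low three bits of a VMS condition value
# (assumed standard layout; values 5-7 are reserved).
SEVERITY = {0: "warning", 1: "success", 2: "error",
            3: "informational", 4: "severe"}


def decode_status(hex_code):
    """Split a hexadecimal status code into its condition value fields."""
    value = int(hex_code, 16)
    return {
        "severity": SEVERITY.get(value & 0x7, "reserved"),
        "message_number": (value >> 3) & 0x1FFF,   # bits 3-15
        "facility": (value >> 16) & 0xFFF,         # bits 16-27
    }


# The two codes from the OPCOM sequence above.
for code in ("00000A00", "000020D4"):
    print(code, decode_status(code))
```

A facility number of zero indicates a system-wide (SS$_) code; a non-zero
facility points at a facility-specific message file such as NETERROR.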

The message texts for OpenCMU are not part of the standard system message
files. Before F$MESSAGE can translate an OpenCMU status code into text, a
SET MESSAGE command must have been issued on the file NETERROR.EXE. The
installation of OpenCMU should have placed this in the SYS$MESSAGE directory.

If not, locate the file called NETERROR.OBJ in the CMUIP_ROOT:[*...] tree and
relink it to form the NETERROR.EXE, using this command:

   $ LINK/SHARE=SYS$COMMON:[SYSMSG]NETERROR NETERROR.OBJ


Following this, the message texts can be made available to F$MESSAGE using:

   $ SET MESSAGE SYS$MESSAGE:NETERROR

[Note: if, for any reason, NETERROR.OBJ does not exist in the directory tree, 
it can be found in the second saveset of the OpenCMU kit - CMUIP066.B]
							<dragon@nscvax.princeton.edu>
							<A.Harper@kcl.ac.uk>


3.1.2 >>>> IPACP CRASH DUE TO QUOTA EXCEEDED			[20-MAR-1995]

For systems with a high IP load, IPACP may occasionally crash with a `quota
exceeded' error. This refers not to disk quota but to one of the process quota
limits. Usually, the quota in question is BYTLM.

The default BYTLM provided for IPACP (65536) is sufficient for only about 20
connections. IPACP takes about 32000 for itself and each connection takes about
1872 bytes. This requirement is NOT currently documented.
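
The arithmetic can be sketched as follows. This is an illustration only, using
the approximate figures quoted above; the function names are hypothetical and
the real consumption will vary from system to system.

```python
IPACP_BASE_BYTES = 32000       # approximate BYTLM used by IPACP itself
BYTES_PER_CONNECTION = 1872    # approximate BYTLM used by each connection


def max_connections(bytlm):
    """Estimate how many connections a given BYTLM can support."""
    return max(0, (bytlm - IPACP_BASE_BYTES) // BYTES_PER_CONNECTION)


def bytlm_for(connections):
    """Estimate the BYTLM needed for a given number of connections."""
    return IPACP_BASE_BYTES + connections * BYTES_PER_CONNECTION


print(max_connections(65536))   # the default BYTLM
print(bytlm_for(100))           # BYTLM estimate for 100 connections
```

With these figures the default works out at 17 connections, in line with the
`about 20' quoted above, and 100 connections would need a BYTLM of around
219200.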

To increase the BYTLM for the IPACP, modify the IP_STARTUP.COM procedure and
change the value of the /BUFFER_LIMIT qualifier on the RUN command that starts
the IPACP process. Then shut down and restart IPACP.

At the current time, there also appears to be a memory leak in IPACP which has
the effect of gradually reducing the available BYTLM over time. When this gets
close to zero, IPACP will hang (as it retries) and then crash soon afterwards.
It is therefore desirable to give IPACP more BYTLM than the typical load might
suggest. If this sort of crash is experienced, increase the BYTLM by 50% and
restart it.
							<A.Harper@kcl.ac.uk>


3.1.3 >>>> IPACP CRASHES WITH DIVIDE BY ZERO ERROR		[15-AUG-1995]

On some systems, the IPACP supplied in the V6DRIVER.SAVE patch kit can cause
divide by zero problems when running OpenCMU on OpenVMS 6.1. If this happens,
return to the IPACP.EXE image supplied in the TEKIP0665A.SAVE patch kit.

The erroneous version identifies itself as version 6.7.
							<A.Harper@kcl.ac.uk>

--------------------------------------------------------------------------------

3.2 >>>> NAMRES							[13-JUL-1995]
---------------

NAMRES is the DNS Name Resolver, responsible for translating system names into
IP addresses, and vice versa. If it is not running, domain names cannot be
used, though IP addresses ought to continue to work.


3.2.1 >>>> NAMRES GIVES DOMAIN REFERRAL EXCEEDED MESSAGES	[11-OCT-1994]

The name resolver can produce the message `Maximum domain referral limit
exceeded' and fail to resolve a name into its address. This is often indicative
of incorrect configuration of the name resolver.  Ensure that the following
lines are included in the NAMRES$CONFIG file:

   Variable:TIMEOUT:5
   Variable:REFMAX:10
   Variable:RECURSE:1

You might also want to add:

   Variable:NS_RETRANS:2


(NOTE: in table 3-8 of the last official manual, the last variable, labelled
TIMEOUT, should be labelled RECURSE. TIMEOUT is given correctly as the second
entry in the table).

Restart the name resolver if necessary:

   $ IPNCP
   IPNCP> NAMRES EXIT
   ....
   IPNCP> STARTUP /NAMRES
							<A.Harper@kcl.ac.uk>


3.2.2 >>>> NAMRES HANGS IN RWAST				[12-JUL-1995]

After some time, the NAMRES process can hang in an RWAST state, preventing
further name resolutions from taking place.  This is a bug in the current
version and no fix is currently available.  Processes in an RWAST state cannot
be killed so stopping and restarting the NAMRES process is not possible by
standard means.

However, a number of workarounds may be possible:

  *  Change the process name and restart NAMRES
        $ SET PROCESS/ID=xxxx/NAME=OLDNAMRES
        $ IPNCP STARTUP/NAMRES

  *  Start up NAMRES under a privileged username different from that normally
     used.

  *  Reboot the system.

The last is the only recommended way to completely clear a hung 'RWAST'
process.
							<A.Harper@kcl.ac.uk>

3.2.3 >>>> NAMRES UNRESPONSIVE					[28-NOV-1995]

After some time running, NAMRES can become unresponsive although it appears to
be still running.  In this case, the best thing to do is to shut it down and
start it up again. Use these IPNCP commands:

  $ IPNCP
  IPNCP> namres exit
  IPNCP> startup /namres

On a busy system, NAMRES may well go into this state frequently and it may be
worth setting up an automatic batch job which does this on a regular basis. For
example:

  $ SUBMIT 'f$parse(";",f$environment("PROCEDURE"))' /AFTER="TOMORROW+6" /NOLOG
  $ IPNCP NAMRES EXIT
  $ WAIT ::10
  $ IPNCP STARTUP /NAMRES

The first line of this procedure causes the job to be resubmitted at 6AM each
morning. The rest simply shuts down and restarts NAMRES. It should be
initially submitted to an appropriate batch queue by SYSTEM.
							<A.Harper@kcl.ac.uk>

--------------------------------------------------------------------------------

3.3 >>>> NFS							[13-JUL-1995]
------------

The NFS server allows directory hierarchies to be made accessible over the
network, such that they can be 'mounted' as a disk on another system. There is
no NFS client to allow local mounting of remote disks.


3.3.1 >>>> WHY DOESN'T THE NFS SERVER WORK			[11-OCT-1994]

The NFS server broke with version 6.6-5 of OpenCMU.  At this time, there is no
workable solution. If NFS is a requirement, version 6.6-4 is the last version
in which NFS works.
							<A.Harper@kcl.ac.uk>

--------------------------------------------------------------------------------

3.4 >>>> FTP							[13-JUL-1995]
------------

FTP provides file transfer capabilities. The FTP server allows remote users to
connect and transfer files. The FTP client allows local users to access remote
systems.


3.4.1 >>>> WHY IS FTP SO SLOW					[11-OCT-1994]

The version of FTP supplied with the master 6.6-5 kit suffers from a number of
bugs. One of these causes excessive error rates and retransmissions resulting
in a low throughput rate.

It is STRONGLY recommended that the 6.6-5A patch kit be applied. This greatly
improves the performance.

See also the freeware MGFTP software (described in more detail in the
`Software' section elsewhere).
							<A.Harper@kcl.ac.uk>


3.4.2 >>>> WHY DOES FTP CRASH WITH `EXCEEDED QUOTA'		[11-OCT-1994]

FTP (client or server) can fall over with an `exceeded quota' message if the
SYSGEN parameter `MAXBUF' is not set correctly.  The latest recommendation is
for the minimum value to be 8192.

Note that transferring files with large records, exceeding MAXBUF, may still
cause problems.
							<A.Harper@kcl.ac.uk>


3.4.3 >>>> FTP OF BACKUP SAVESETS GIVES CRC ERRORS		[2-OCT-1995]

One major use of FTP is in transferring BACKUP savesets to/from other systems.
Often this leads to the recipient having difficulties unpacking that saveset;
in particular, using BACKUP to list or unpack it results in a stream of
messages similar to `CRC error' to the user's terminal and to OPCOM. This
article summarizes why the error occurs and how to correct it.

When BACKUP creates a saveset, it writes the file with a fixed length record
format - the length being that specified with BACKUP's /BLOCK qualifier. For
example:

    BACKUP/BLOCK=8192 * s.bck/save
         Creates a file with fixed length records of 8192.

When FTP is used, in binary mode, the data is sent correctly but the record
structure changes; typically, it is created with 512 byte records. Thus, when
BACKUP is used to list or unpack the file contents, it finds that the record
length of the file does not match the size used originally (this value is
stored in the BACKUP saveset header as well as in the file header).
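
The effect can be pictured with a short sketch. This is an illustration only,
not a model of RMS: it assumes that a fixed length record file is simply the
records laid end to end, so that the same bytes can be cut up with either
record size.

```python
def split_records(data, record_size):
    """Cut a byte string into fixed length records of the given size."""
    return [data[i:i + record_size] for i in range(0, len(data), record_size)]


# 16384 bytes standing in for the contents of a saveset.
saveset = bytes(range(256)) * 64

original = split_records(saveset, 8192)   # as written by BACKUP/BLOCK=8192
received = split_records(saveset, 512)    # as stored after a binary mode FTP

print(len(original), len(received))                 # record counts differ
print(b"".join(original) == b"".join(received))     # the data is unchanged
```

Only the record boundaries differ; the bytes are the same. This is why the
repairs described below need only change the record size attribute rather than
the data itself.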

If both ends of the FTP session support the special STRUC VMS mode of transfer,
then it should be used and the file will transfer correctly. If this structure
is not supported, the record structure becomes corrupted and must be manually
`fixed up' before BACKUP can be used.

There are several ways in which this can be done. Note that, in each case, the
technique will work ONLY  if the file has been transferred in binary mode ftp.
If the file format has been corrupted by ANY other means (such as kermit, or a
pathworks file copy) then the techniques will need to be modified
appropriately.


 1.	Create an empty file with the correct record format, and then copy
        the saveset into it:

	   $ CREATE/FDL=SYS$INPUT  newfile.bck
	   RECORD
	     FORMAT FIXED
             SIZE nnnn
             CARRIAGE_CONTROL NONE
	   ^Z
	   $ COPY/OVERLAY file.bck newfile.bck


	NOTES:
	  'nnnn' is the record size used on the original BACKUP command. The
	  easiest way to obtain this value is to use BACKUP/LIST on the
          original file; CRC errors WILL be generated but it will display the
	  original block size used before then.

	  The original file is called 'file.bck'.

  	  A new copy of the file is made in 'newfile.bck'.


 2.	An alternative to the above mechanism for creating the 'empty' file in
	the correct format, which is less obvious but quicker to type, is:

	  $ BACKUP dummy_name  newfile.bck/SAVE_SET/BLOCK=nnn

	NOTES:
	  'nnn' is the record size used on the original BACKUP command. The
	  easiest way to obtain this value is to use BACKUP/LIST on the
          original file; CRC errors WILL be generated but it will display the
	  original block size used before then.

	  'dummy_name' is the name of a non-existent file.

	  Once the empty file exists, copy the saveset into it with
	  COPY/OVERLAY, as in method 1. A new copy of the file is made in
	  'newfile.bck'.
          
	  
 3.	Use the public domain utility called FILE (courtesy of Joe Meadows):

	   $ FILE/RECORD_SIZE=nnn file.bck

	Where 'nnn' is the record size used on the original BACKUP command.

	NOTE: this utility does NOT make a copy of the file; instead it patches
	the file header directly. It is wise to make a backup copy before using
	this technique!!!


 4.	Use the public domain utility called FIX_SAVESET (author unknown):

	   $ FIX_SAVESET file.bck

	This utility scans the file, on the assumption that it is a backup
	saveset; picks out the original record length from the backup saveset
	header stored in the file; and finally, patches the file header record
	size back to this length. A new copy of the file is not made.


Summary:

To summarize the correct method of transferring a BACKUP saveset using FTP:

  1.	If both ends support STRUC VMS, then
	  a.	ftp> SET STRUC VMS
	  b.	ftp> GET file

	File will be stored locally with the correct attributes.

  2.	If STRUC VMS is not supported by one or both ends, then
	  a.	ftp> BINARY
	  b.	ftp> GET file

	Once file arrives on the VMS system:
	  c.	FIX_SAVESET file
							<A.Harper@kcl.ac.uk>


Availability:
   ftp://ftp2.kcl.ac.uk/vms/default/fix_saveset.*
   ftp://ftp2.kcl.ac.uk/vms/joemeadows/file.*

NOTE:  These items are available in source form only and require a C compiler.


3.4.4 >>>> CANNOT LOGIN TO FTP SERVER AFTER UPGRADE TO OPENVMS 6.0 [20-JAN-1995]

Following an upgrade to OpenVMS 6.0, users cannot log in to the FTP server once
they have changed their password. This is because the password hashing
algorithm is updated in OpenVMS 6.0 and all new passwords use the new hashing
algorithm. The existing FTP_SERVER does not know about the new one and
consequently cannot hash correctly, causing a password mismatch.

An updated FTP_SERVER is available. This will run on all versions of VMS from
5.4 upwards.

Alternatively, install MadGoat FTP (see software section elsewhere in this
document).
							<A.Harper@kcl.ac.uk>

--------------------------------------------------------------------------------

3.5 >>>> TELNET							[13-JUL-1995]
---------------

TELNET allows interactive use. The telnet server, built into the IPACP, allows
remote users to access the local system. The telnet client allows local users
to access remote systems interactively.


3.5.1 >>>> WHY DOES TELNET SOMETIMES HANG IN `RWAST'		[11-OCT-1994]

TELNET clients prior to version 5.0 could, under certain conditions, lock up a
process in an RWAST state. All users are strongly recommended to upgrade to
Version 5.0-1 of TELNET in which this problem, and others, are solved.
							<A.Harper@kcl.ac.uk>


3.5.2 >>>> WHY DOES TELNETTING INTO OPENCMU HANG		[11-OCT-1994]

When telnetting into an OpenCMU host, the system does not prompt for a username
until an extra carriage return is sent. There are three known, unrelated,
causes for this problem.

First, a bug in earlier versions of the OpenCMU telnet software is known to cause
unexpected hangs. To fix this, all users should install the latest patches to
OpenCMU (6.6-5A) and the telnet client.

Second, some PC telnet clients are known to contain problems that prevent them
successfully interworking with OpenCMU TELNET. PC-NFS telnet versions 4.x and 5.x
suffer from this problem.  To fix, avoid these clients - there are plenty of
reasonable alternative telnet clients around.

Finally, some PC telnet clients may not correctly negotiate the telnet options
when the call is connected. One or other end can then wait indefinitely for
the opposite end to continue. At this time, no clear solution is known but the
problem can sometimes be alleviated by adding the following to the OpenCMU
INET$CONFIG file:

      Variable:TELNET_NEG_TIMEOUT:0

This causes telnet not to wait for negotiations to time out, and can speed up
those logins which appear to hang for a long time.

Note: Under OpenVMS 6.1, the telnet pause bug appears again and there is no
current solution to this.
							<A.Harper@kcl.ac.uk>

--------------------------------------------------------------------------------

3.6 >>>> MISCELLANEOUS						[22-AUG-1995]
----------------------

This section notes various unrelated items and known bugs that may affect
several applications.


3.6.1 >>>> PORT NUMBER ALLOCATION BUG				[22-AUG-1995]

There is a bug in the low level IP software in OpenCMU 6.6-5 and up, that can
result in bad port numbers being returned to an application. The details are as
follows:

  *  Client requests that a free port be allocated and set up for listening.

  *  OpenCMU allocates and sets up the port correctly but returns ZERO back to
     the caller instead of the port number.

  *  Subsequent references to the port then fail.

NOTE: If the client requests an explicit port number, rather than letting
OpenCMU select it, the port number is returned correctly.


Here is an example of the problem with FTP:

  * FTP client has a control connection opened to port 21 on the remote FTP
    server and wants to download a file.

  * FTP client requests OpenCMU to allocate a free port and set it up for
    listening, the idea being that the remote FTP server will make an outgoing
    call to this port and send the file to it. The OpenCMU bug causes ZERO to
    be returned for the randomly allocated port.

  * FTP client sends the returned port number to the FTP server in a PORT
    command, thus:
      PORT 123,45,1,2,0,0

  * FTP server tries to open the data connection back to this port on the
    client system. This fails because port zero does not exist.
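
The way the port number ends up in the PORT command can be sketched as
follows. The encoding (four address bytes followed by the high and low bytes
of the port) is that of the FTP protocol; the function names are hypothetical
and the sketch is not taken from any OpenCMU source.

```python
def port_argument(ip_address, port):
    """Build the argument of an FTP PORT command: h1,h2,h3,h4,p1,p2."""
    if not 0 <= port <= 65535:
        raise ValueError("port out of range")
    octets = ip_address.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted IPv4 address")
    return ",".join(octets + [str(port >> 8), str(port & 0xFF)])


def parse_port_argument(argument):
    """Recover (ip_address, port) from a PORT command argument."""
    fields = [int(field) for field in argument.split(",")]
    if len(fields) != 6:
        raise ValueError("expected six comma separated fields")
    ip_address = ".".join(str(field) for field in fields[:4])
    return ip_address, fields[4] * 256 + fields[5]


print("PORT", port_argument("123.45.1.2", 0))      # the faulty case
print("PORT", port_argument("123.45.1.2", 1234))   # a valid port
```

A port of zero encodes as the trailing `0,0' seen in the example above, which
gives a quick way to recognize this bug in an FTP debug trace.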

This problem is corrected in the forthcoming 6.7 release.

This bug is known to affect WWW clients, such as Lynx and Mosaic, where they
are linked with an OpenCMU compatible socket library. The effect is to cause
FTP transfers to fail unexpectedly while other protocols work fine.

Note that the OpenCMU FTP client gets around the problem by randomly allocating
the port itself, based on some function of the date/time, and asking OpenCMU to
allocate that specific port. This causes the port number to be returned
correctly but risks clashing with a port allocated by another application on
the same system. The risk is small but can cause random failures of FTP. The
same technique can be used by user written applications.
							<A.Harper@kcl.ac.uk>
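
For user written applications, the workaround might be sketched along these
lines. This is a minimal illustration using the generic sockets interface
rather than the OpenCMU interface; the port range, the function names and the
retry count are all assumptions. Retrying with the next candidate is one way
to reduce the risk of a clash with another application.

```python
import socket
import time

# Assumed range to pick from; the range OpenCMU itself uses is not known here.
LOW_PORT, HIGH_PORT = 49152, 65535


def candidate_port(now, attempt=0):
    """Derive a port number from the time, offset by the attempt number."""
    span = HIGH_PORT - LOW_PORT + 1
    return LOW_PORT + (int(now * 1000) + attempt) % span


def listen_on_explicit_port(host="127.0.0.1", attempts=10):
    """Bind to an explicitly chosen port, trying the next one on a clash."""
    now = time.time()
    for attempt in range(attempts):
        port = candidate_port(now, attempt)
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()          # port already in use, try the next one
            continue
        sock.listen(1)
        return sock, port
    raise OSError("no free port found after %d attempts" % attempts)
```

The application calls listen_on_explicit_port() and uses the returned port
number directly, since it already knows which port it asked for and does not
depend on the value handed back by the IP software.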

--------------------------------------------------------------------------------