WASD Hypertext Services - Scripting Environment

2 - CGI

2.1 - CGI Environment Variables
2.2 - Script Output
2.2.1 - CGI Compliant Output
2.2.2 - Non-Parsed-Header Output
2.3 - Raw HTTP Input (POST Processing)

The information in this chapter merely outlines the WASD implementation details, which are in general very much vanilla CGI and NCSA CGI (Common Gateway Interface) compliant. The implementation was originally based on the INTERNET-DRAFT authored by D.Robinson (drtr@ast.cam.ac.uk), 8 January 1996, and confirmed against the latest INTERNET-DRAFT authored by Ken A.L.Coar (drtr@etrade.co.uk), 12 June 1999.

2.1 - CGI Environment Variables

With the standard CGI environment, variables are provided to the script via DCL global symbols. Each CGI variable symbol name is prefixed with "WWW_" by default. Although this can be changed using the "/CGI_PREFIX" qualifier and the SET CGIPREFIX mapping rule (see "Technical Overview"), doing so is not recommended if the WASD VMS scripts are to be used, as they expect CGI variable symbols to be prefixed in this manner.

In line with other CGI implementations, additional, non-compliant variables are provided to ease CGI interfacing. These provide the various components of the query string. A keyword query string and a form query string are parsed into separate variables, named

  WWW_KEY_number
  WWW_KEY_COUNT
  WWW_FORM_form-element-name

plus a number of non-"standard" CGI variables to assist in tailoring scripts for the WASD environment. Do not make your scripts dependent on any of these if portability is a goal. See the table below.
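The following is a minimal sketch of reading these variables from a C script, assuming the default "WWW_" prefix is in effect and that the keyword variables are numbered from 1 (an assumption, not stated above). The helper names cgi_var(), cgi_key_count() and cgi_key() are hypothetical and are not part of CGILIB.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: the default prefix has not been changed with /CGI_PREFIX. */
#define CGI_PREFIX "WWW_"

/* Return the value of CGI variable `name` (given without the prefix),
   or `fallback` when the variable is not present for this request. */
static const char *cgi_var(const char *name, const char *fallback)
{
    char full[256];
    const char *value;
    int n;

    assert(name != NULL);
    n = snprintf(full, sizeof full, "%s%s", CGI_PREFIX, name);
    if (n < 0 || (size_t)n >= sizeof full)
        return fallback;            /* name too long to be looked up */
    value = getenv(full);
    return value != NULL ? value : fallback;
}

/* Number of keyword elements, or 0 if KEY_COUNT is absent or invalid. */
static int cgi_key_count(void)
{
    long count = strtol(cgi_var("KEY_COUNT", "0"), NULL, 10);
    return (count > 0 && count < 10000) ? (int)count : 0;
}

/* Keyword element `index`, or NULL if it does not exist.
   Assumption: elements are numbered KEY_1, KEY_2, ... */
static const char *cgi_key(int index)
{
    char name[32];
    snprintf(name, sizeof name, "KEY_%d", index);
    return cgi_var(name, NULL);
}
```

A script would typically loop from 1 to cgi_key_count(), calling cgi_key() for each element, and treat a NULL return as "not present" rather than as an error.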

VMS Apache Compliance

WASD v7.0 had its CGI environment tailored slightly to ease portability between VMS Apache and WASD. This included the provision of an APACHE$INPUT: stream and several Apache-specific CGI variables (see the table below). The CGILIB C function library (see 1.7 - Scripting Function Library) has also been made CSWS V1.0-1 (Apache 1.3.12) compliant.

VMS Limits

CGI variable values are limited to approximately 980 characters (regardless of the version of VMS in use). This should be sufficient for most circumstances (if not, consider using CGIplus or ISAPI, extensions to CGI programming which remove this limitation). Why such an odd number, and why a little rubbery? A DCL command line is limited to 255 characters, so the symbols for larger variables are built up over successive DCL commands. The limit of 980 characters is set by what can actually be achieved this way (still, it's much more useful than a limit of 255 or less!).

The total length of all CGI variable names and values is limited by the value of the [BufferSizeDclCommand] configuration directive, which determines the total buffer space of the mailbox providing the script's SYS$COMMAND. The default value of 4096 bytes will be ample for the typical CGI script request. However, if a request contains an extraordinary number of form fields, etc., it may be possible to exhaust this buffer space.

CGI Variables

Remember, all variables are prefixed by "WWW_", and not all variables will be present for all requests.

Name	Description	"Standard"
AUTH_ACCESS	"READ" or "READ+WRITE"	WASD
AUTH_AGENT	only present to and used by authorization agent	WASD
AUTH_GROUP	authentication group	WASD
AUTH_PASSWORD	plain-text password, only if EXTERNAL realm	WASD
AUTH_REALM	authentication realm	WASD
AUTH_REALM_DESCRIPTION	browser displayed string	WASD
AUTH_TYPE	authentication type (BASIC or DIGEST)	yes
AUTH_USER	details of authenticated user	WASD
CONTENT_LENGTH	"Content-Length:" from request header	yes
CONTENT_TYPE	"Content-Type:" from request header	yes
DOCUMENT_ROOT	always empty for WASD	Apache
FORM_field	query string "&" separated form elements	no
GATEWAY_BG	device name "BGnnnn:" of raw client socket	WASD
GATEWAY_INTERFACE	"CGI/1.1"	yes
GATEWAY_MRS	maximum record size of script SYS$OUTPUT	WASD
HTTP_ACCEPT	any list of browser-accepted content types	optional
HTTP_ACCEPT_CHARSET	any list of browser-accepted character sets	optional
HTTP_ACCEPT_LANGUAGE	any list of browser-accepted languages	optional
HTTP_AUTHORIZATION	any from request header	optional
HTTP_COOKIE	any cookie sent by the client	optional
HTTP_FORWARDED	any proxy/gateway hosts that forwarded the request	optional
HTTP_HOST	host and port request was sent to	optional
HTTP_IF_MODIFIED_SINCE	any last modified GMT time string	optional
HTTP_PRAGMA	any pragma directive of request header	optional
HTTP_REFERER	any source document URL for this request	optional
HTTP_USER_AGENT	client/browser identification string	optional
KEY_n	query string "+" separated elements	no
KEY_COUNT	number of "+" separated elements	no
PATH_INFO	virtual path of data requested in URL	yes
PATH_TRANSLATED	VMS file path of data requested in URL	yes
QUERY_STRING	un-URL-decoded string following "?" in URL	yes
REMOTE_ADDR	IP host address of HTTP client	yes
REMOTE_HOST	IP host name of HTTP client	yes
REMOTE_USER	authenticated remote user name (or empty)	yes
REQUEST_CHARSET	any server-determined request character set	WASD
REQUEST_METHOD	"GET", "PUT", etc.	yes
REQUEST_SCHEME	"http:" or "https:"	WASD
REQUEST_TIME_GMT	GMT time request received	WASD
REQUEST_TIME_LOCAL	Local time request received	WASD
REQUEST_URI	full, unescaped request string	Apache
SCRIPT_FILENAME	script file name (e.g. "CGI-BIN:[000000]QUERY.COM")	Apache
SCRIPT_NAME	script being executed (e.g. "/query")	yes
SERVER_ADDR	IP host address of server system	WASD
SERVER_ADMIN	email address for server administration	Apache
SERVER_CHARSET	server default character set	WASD
SERVER_GMT	offset from GMT (e.g. "+09:30")	WASD
SERVER_NAME	IP host name of server	yes
SERVER_PROTOCOL	HTTP protocol version (always "HTTP/1.0")	yes
SERVER_PORT	IP port request was received on	yes
SERVER_SIGNATURE	server ID, host name and port	Apache
SERVER_SOFTWARE	software ID of HTTP server	yes
UNIQUE_ID	request-unique 19 character string	Apache

If the request path is set to provide them, there will also be variables providing information about the SSL environment of a request transported via Secure Sockets Layer.

CGI Variable Demonstration

The basic CGI symbol names are demonstrated here with a call to a script that simply executes the following DCL code:

  $ SHOW SYMBOL WWW_*
  $ SHOW SYMBOL *

Note how the request components are represented for "ISINDEX"-style searching (third item) and a forms-based query (fourth item).

UNIQUE_ID Note

The UNIQUE_ID variable is a mostly Apache-compliant implementation (the "_" has been substituted for the "@" to allow its use in file names). For each request it generates a globally and temporally unique 19 character string that can be used where such an identifier might be needed. This string contains only "A"-"Z", "a"-"z", "0"-"9", "_" and "-" characters and is generated using a combination of time-stamp, host IP address, server system process identifier and counter, and is "guaranteed" to be unique in (Internet) space and time.
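Because the string is intended for use in places such as file names, a script may wish to check its shape before relying on it. The following is a small sketch of such a check based only on the length and character set described above; the function name unique_id_is_well_formed() is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define UNIQUE_ID_LENGTH 19

/* True for the characters the text above lists as permitted. */
static int is_unique_id_char(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '-';
}

/* Return 1 if `id` is exactly 19 permitted characters, 0 otherwise. */
static int unique_id_is_well_formed(const char *id)
{
    size_t i;

    assert(id != NULL);
    if (strlen(id) != UNIQUE_ID_LENGTH)
        return 0;
    for (i = 0; i < UNIQUE_ID_LENGTH; i++)
        if (!is_unique_id_char(id[i]))
            return 0;
    return 1;
}
```

This only validates the format; it says nothing about whether the value was actually generated by the server.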

2.2 - Script Output

This information applies to all subprocess-based scripting: CGI, CGIplus, RTE and ISAPI. For subprocess scripting WASD uses mailboxes for inter-process communication (IPC). These are efficient, versatile and allow direct output from all VMS environments and utilities. Like many VMS record-oriented devices, however, there are some things to consider when using them.

Record-Oriented
The mailboxes are created record-oriented, not stream-oriented. This means records output by standard VMS means (e.g. DCL, utilities, programming languages) are discretely identified and may be processed appropriately by the server as text or binary depending on the content-type.
Maximum Record Size
Being record oriented there is a maximum record size (MRS) that can be output. Records larger than this result in SYSTEM-F-MBTOOSML errors. The WASD default is 4096 bytes. This may be changed using the [BufferSizeDclOutput] configuration directive. This allocation consumes process BYTLM with each mailbox created so the account must be dimensioned sufficiently to supply demands for this quota. The maximum possible size for this is a VMS-limit of 60,000 bytes.
Buffer Space
When created, the mailbox has its buffer space set. With WASD IPC mailboxes this is the same as the MRS. The total data buffered may not exceed this without the script entering a wait state (for the mailbox contents to be cleared by the server). As mailboxes use a little of the buffer space to delimit the records stored in them, the amount of data that can be buffered is actually less than the total buffer space.

To determine the maximum record size and total capacity of the mailbox buffer between server and script WASD provides a CGI environment variable, GATEWAY_MRS, containing an integer with this value.
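A C script might pick this value up as sketched below, assuming the default "WWW_" prefix and falling back to the documented 4096 byte default when the variable is absent or not a plausible number. The names parse_mrs(), gateway_mrs() and records_needed() are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define DEFAULT_MRS 4096    /* WASD default stated above */
#define MAX_MRS     60000   /* VMS limit stated above */

/* Parse a GATEWAY_MRS value, returning the default when the text is
   missing, not a whole decimal number, or outside the valid range. */
static size_t parse_mrs(const char *text)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return DEFAULT_MRS;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value <= 0 || value > MAX_MRS)
        return DEFAULT_MRS;
    return (size_t)value;
}

/* Maximum record size for this request (assumes the "WWW_" prefix). */
static size_t gateway_mrs(void)
{
    return parse_mrs(getenv("WWW_GATEWAY_MRS"));
}

/* Number of records needed to write `total` bytes without any single
   record exceeding `mrs` bytes. */
static size_t records_needed(size_t total, size_t mrs)
{
    assert(mrs > 0);
    return (total + mrs - 1) / mrs;
}
```

For example, with the default MRS a 10,000 byte body needs three records; a script writing it as one record would exceed the maximum record size.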

2.2.1 - CGI Compliant Output

Script response may be CGI or NPH compliant (see 2.2.2 - Non-Parsed-Header Output). CGI compliance means the script's response must begin with a line containing one of the following fields.

Status: an HTTP status code and associated explanation string
Content-Type: the script body's MIME content-type
Location: a redirection URL

Other HTTP-compliant response fields may follow, with the response header terminated and the response body begun by a single empty line. The following is an example of a CGI-compliant response.

  Content-Type: text/html
  Content-Length: 35

  <HTML>
  <B>Hello world!</B>
  </HTML>

Strict CGI output compliance can be enabled and disabled using the [CgiStrictOutput] configuration directive. With it disabled the server will accept any output from the script; if the output is not CGI or NPH compliant the server automatically generates a plain-text header. With it enabled, if the output does not begin with a CGI or NPH header the server returns a "502 Bad Gateway" error. To debug scripts generating this error, introduce a plain-text debug mode and header, or use the WATCH facility's CGI item (see the Technical Overview).
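A response like the "Hello world!" example earlier in this section can be assembled in C so that the Content-Length always matches the body. This is a sketch only: cgi_response() is a hypothetical helper, and it assumes the body bytes reach the client unchanged (that is, no carriage-control is added by the server).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a CGI-compliant response (header fields, empty line, body)
   into `out`. Returns the number of characters written, excluding the
   terminating NUL, or -1 if the response does not fit in `out_size`. */
static int cgi_response(char *out, size_t out_size,
                        const char *content_type, const char *body)
{
    int n;

    assert(out != NULL && content_type != NULL && body != NULL);
    n = snprintf(out, out_size,
                 "Content-Type: %s\n"
                 "Content-Length: %zu\n"
                 "\n"                    /* empty line ends the header */
                 "%s",
                 content_type, strlen(body), body);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

Computing the length from the body, rather than hard-coding it, avoids a mismatch when the body text is later edited.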

WASD Specifics

This section describes how WASD deals with some particular output issues.

Content-Type: text/...
If the script response content-type is "text/..." (text document) WASD assumes that output will be line-oriented and require HTTP carriage-control (each record/line terminated by a line-feed), and will ensure each record it receives is correctly terminated before passing it to the client. In this way DCL procedure output (and the VMS environment in general) is supported transparently. Any other content-type is assumed to be binary and no carriage-control is enforced. This default behaviour may be modified as described below.
Carriage-Control
Carriage-control behaviour for any content-type may be explicitly set using either of two additional response header fields. The term stream is used to describe the server just transferring records, without additional processing, as they were received from the script. This is obviously necessary for binary/raw content such as images, octet-streams, etc. The term record describes the server ensuring each record it receives has correct carriage-control, a trailing newline. If one is not present it is added. This mode is useful for VMS textual streams (e.g. output from DCL and VMS utilities).
Using the Apache Group's proposed CGI/1.2 "Script-Control:" field, the WASD extension directives X-record-mode and X-stream-mode set the script output into each of the respective modes. See Script-Control:.
Examples of using this field:
```
  Script-Control: X-stream-mode
  Script-Control: X-record-mode
```
Script Output Buffering
By default WASD writes each record received from the script to the client as it is received. This can range from a single byte to a complete mailbox buffer full. WASD leaves it up to the script to determine the rate at which output flows back to the client.
While this allows a certain flexibility it can be inefficient. There will be many instances where a script will be providing just a body of data to the client, and wish to do it as quickly and efficiently as possible. Using the proposed CGI/1.2 "Script-Control:" field with the WASD extension directive X-buffer-records, a script can direct the server to buffer as many script output records as possible before transferring them to the client. The following should be added to the CGI response header.
```
  Script-Control: X-buffer-records
```
While the above offers some significant improvements to efficiency and perceived throughput, the best approach is for the script to provide records the same size as the mailbox (see 2.2 - Script Output for detail on determining this size if required). This can be done explicitly by the script programming or, if using the C language, simply by changing stdout to a binary stream. With this environment the C-RTL will control output, automatically buffering as much as possible before writing it to the server.
```
  if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL)
     exit (vaxc$errno);
```
Also see the section describing NPH C Script.
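The difference between record and stream handling described in this section can be illustrated with a short sketch. This is not WASD's code, only a model of the described behaviour: in record mode a trailing newline is added when one is missing, in stream mode the record is passed through untouched. All names here are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum carriage_control { STREAM_MODE, RECORD_MODE };

/* Copy `record` (of `length` bytes) into `out`, applying the given mode.
   On success stores the resulting length in `*out_length` and returns 1;
   returns 0 if the result would not fit in `out_size` bytes. */
static int apply_carriage_control(enum carriage_control mode,
                                  const char *record, size_t length,
                                  char *out, size_t out_size,
                                  size_t *out_length)
{
    int needs_newline;

    assert(record != NULL && out != NULL && out_length != NULL);
    /* Record mode: an empty record, or one not ending in a newline,
       gets a newline appended. Stream mode: never modified. */
    needs_newline = (mode == RECORD_MODE) &&
                    (length == 0 || record[length - 1] != '\n');
    if (length + (needs_newline ? 1u : 0u) > out_size)
        return 0;
    memcpy(out, record, length);
    if (needs_newline)
        out[length++] = '\n';
    *out_length = length;
    return 1;
}
```

Under this model an empty record written by a DCL script (the line separating header from body) becomes a lone newline in record mode, which is why plain DCL output works for text responses without explicit carriage-control.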

Script-Control:

The Apache Group has proposed a CGI/1.2 that includes a Script-Control: CGI response header field. WASD implements the one proposed directive, along with a number of WASD extensions (those beginning with "X-"). Note that by convention extensions unknown to an agent should be ignored, meaning that they can be freely included, only being meaningful to WASD and not significant to other implementations.

no-abort - The server must not terminate the script during processing for either no output or no-progress timeouts. The script is to be left completely alone to control its own termination. Caution, such scripts if problematic could easily accumulate and "clog up" a server or system.
X-buffer-records - Buffer records written by the script until there is [BufferSizeDclOutput] bytes available then write it as a single block to the client.
X-crlf-mode - The server should always ensure each record has trailing carriage-return then newline characters (0x0d, 0x0a). This is generally what VMS requires for carriage control on terminals, printers, etc.
X-lifetime=value - The number of minutes before the idle script subprocess is deleted by the server. Zero sets it back to the default, "none" disables this functionality.
X-record-mode - The server should always ensure each record has a trailing newline character (0x0a), regardless of whether the response is a text/... content-type or not. This is what is usually required by browsers for carriage-control in text documents.
X-stream-mode - The server is not to adjust the carriage-control of records regardless of whether the response is a text/... content-type or not. What the script writes is exactly what the client is sent.
X-timeout-noprogress=value - The number of minutes allowed where the script does not transfer any data to the server before the server deletes the process. Zero sets it back to the default, "none" disables this functionality.
X-timeout-output=value - The number of minutes allowed before an active script is deleted by the server, regardless of whether it is still processing the request. Zero sets it back to the default, "none" disables this functionality.

The following is a simple example response where the server is instructed not to delete the script process under any circumstances, and that the body does not require any carriage-control changes.

  Content-Type: text/plain
  Script-Control: no-abort; X-stream-mode
 
  long, slowww script-output ...
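A script that chooses its directives at run time might compose the field as sketched below. The "; " separator follows the example above; script_control() is a hypothetical helper, and the directive strings are passed through as given without validation.

```c
#include <assert.h>
#include <stdio.h>

/* Write "Script-Control: d1; d2; ..." into `out` (no line terminator).
   Returns the number of characters written, excluding the NUL, or -1
   if the field does not fit in `out_size` bytes. */
static int script_control(char *out, size_t out_size,
                          const char *const directives[], size_t count)
{
    size_t used, i;
    int n;

    assert(out != NULL && out_size > 0);
    n = snprintf(out, out_size, "Script-Control:");
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    used = (size_t)n;
    for (i = 0; i < count; i++) {
        /* first directive follows a space, later ones follow "; " */
        n = snprintf(out + used, out_size - used, "%s %s",
                     i == 0 ? "" : ";", directives[i]);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Called with the two directives "no-abort" and "X-stream-mode" this produces the field shown in the example response; the caller still writes it as one record of the response header.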

Example DCL Scripts

A simple script to provide the system time might be:

  $ say = "write sys$output"
  $! the next two lines make it CGI-compliant
  $ say "Content-Type: text/plain"
  $ say ""
  $! start of plain-text body
  $ show time

A script to provide the system time more elaborately (using HTML):

  $ say = "write sys$output"
  $! the next two lines make it CGI-compliant
  $ say "Content-Type: text/html"
  $ say ""
  $! start of HTML script output
  $ say "<HTML>"
  $ say "Hello ''WWW_REMOTE_HOST'"  !(CGI variable)
  $ say "<P>"
  $ say "System time on node ''f$getsyi("nodename")' is:"
  $ say "<H1>''f$cvtime()'</H1>"
  $ say "</HTML>"

2.2.2 - Non-Parsed-Header Output

A script does not have to output a CGI-compliant data stream. If it begins with an HTTP header status line WASD assumes it will supply a raw HTTP data stream, containing all the HTTP requirements. This is the same as, or equivalent to, the non-parsed-header, or "nph...", scripts of many environments. This is an example of such a script response.

  HTTP/1.0 200 Success
  Content-Type: text/html
  Content-Length: 35

  <HTML>
  <B>Hello world!</B>
  </HTML>

Any such script must observe the HyperText Transfer Protocol, supplying a full response header and body, including correct carriage-control. Once the server detects the HTTP status header line it pays no more attention to any response header fields or body records, just transferring everything directly to the client. This can be very efficient, the server being just a conduit between script and client, but does transfer the responsibility for a correct HTTP response onto the script.

NPH DCL Script

The following example shows a DCL script. Note the full HTTP header and each line explicitly terminated with a carriage-return and line-feed pair.

  $ lf[0,8] = %x0a
  $! the low-order byte is stored first, so %x0a0d gives carriage-return then line-feed
  $ crlf[0,16] = %x0a0d
  $ say = "write sys$output"
  $! the next line determines that it is raw HTTP stream
  $ say "HTTP/1.0 200 Success" + crlf
  $ say "Content-Type: text/html" + crlf
  $! response header separating blank line
  $ say crlf
  $! start of HTML script output
  $ say "<HTML>" + lf 
  $ say "Hello ''WWW_REMOTE_HOST'" + lf 
  $ say "<P>" + lf 
  $ say "Local time is ''WWW_REQUEST_TIME_LOCAL'" + lf 
  $ say "</HTML>" + lf

CGIUTL Utility

This assists with the generation of HTTP responses, including the transfer of binary content from files (copying a file back to the client as part of the request), and the processing of the contents of POSTed requests from DCL. See 1.6 - DCL Processing of Requests.

NPH C Script

When scripting using the C programming language there can be considerable efficiencies to be gained by providing a binary output stream from the script. This results in the C Run-Time Library (C-RTL) buffering output up to the maximum supported by the IPC mailbox. This may be simply enabled using a code construct similar to the following to reopen stdout in binary mode.

  if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL)
     exit (vaxc$errno);

This is used consistently in WASD scripts. Carriage-control must be supplied as part of the C standard output (no differently to any other C program). Output can be explicitly sent to the client at any stage using the fflush() standard library function. Note that if the fwrite() function is used the current contents of the C-RTL buffer are automatically flushed along with the content of the fwrite().

     fprintf (stdout,
  "HTTP/1.0 200 Success\r\n\
  Content-Type: text/html\r\n\
  \r\n\
  <HTML>\n\
  Hello %s\n\
  <P>\n\
  System time is %s\n\
  </HTML>\n",
     getenv("WWW_REMOTE_HOST"),
     getenv("WWW_REQUEST_TIME_LOCAL"));

CGI Function Library

A source code collection of C language functions useful for processing the more vexing aspects of CGI/CGIplus programming. See 1.7 - Scripting Function Library.

2.3 - Raw HTTP Input (POST Processing)

For POST and PUT HTTP methods (e.g. a POSTed HTML form) the body of the request may be read from the HTTP$INPUT stream. For executable image scripts requiring the body to be present on SYS$INPUT (the C language stdin stream) a user-mode logical may be defined immediately before invoking the image, as in the example.

  $ EGSCRIPT = "$HT_EXE:EGSCRIPT.EXE"
  $ DEFINE /USER SYS$INPUT HTTP$INPUT
  $ EGSCRIPT

The HTTP$INPUT stream may be explicitly opened and read. Note that this is a raw stream, and HTTP lines (carriage-return/line-feed terminated sequences of characters) may have been blocked together for network transport. These would need to be explicitly parsed by the program.
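Because the stream is raw, a script should read as many bytes as the request's Content-Length states rather than assume one read returns the whole body. The following is a minimal sketch of such a loop in standard C; read_body() is a hypothetical helper, and how the stream is opened is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read exactly `length` bytes of request body from `in`.
   Returns a malloc()ed, NUL-terminated buffer the caller must free(),
   or NULL if memory is unavailable or the stream ends early. */
static char *read_body(FILE *in, size_t length)
{
    char *body;
    size_t got = 0;

    assert(in != NULL);
    body = malloc(length + 1);
    if (body == NULL)
        return NULL;
    /* A single read may return less than requested, so loop. */
    while (got < length) {
        size_t n = fread(body + got, 1, length - got, in);
        if (n == 0) {               /* end of stream or read error */
            free(body);
            return NULL;
        }
        got += n;
    }
    body[length] = '\0';
    return body;
}
```

A script would take the length from the CONTENT_LENGTH variable and pass either stdin (with SYS$INPUT defined as in the example above) or a stream it has opened on HTTP$INPUT itself; the returned buffer then needs whatever parsing the content-type calls for.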
