2.1 - CGI Environment Variables
2.2 - Script Output
2.2.1 - CGI Compliant Output
2.2.2 - Non-Parsed-Header Output
2.3 - Raw HTTP Input (POST Processing)
The information in this chapter merely outlines the WASD implementation
details, which are in general very much vanilla CGI and NCSA CGI (Common
Gateway Interface) compliant, originally based on the INTERNET-DRAFT authored by
D.Robinson (drtr@ast.cam.ac.uk), 8 January 1996, and confirmed against the
latest INTERNET-DRAFT authored by Ken A.L.Coar (drtr@etrade.co.uk), 12 June
1999.
2.1 - CGI Environment Variables
With the standard CGI environment, variables are provided to the script via DCL global symbols. Each CGI variable symbol name is prefixed with "WWW_" by default. This can be changed using the "/CGI_PREFIX" qualifier and the SET CGIPREFIX mapping rule (see "Technical Overview"), but doing so is not recommended if the WASD VMS scripts are to be used, as they expect CGI variable symbols to be prefixed in this manner.
There are a number of non-"standard" CGI variables to assist in tailoring scripts for the WASD environment. Do not make your scripts dependent on any of these if portability is a goal.
Never, ever substitute the contents of CGI variables directly into the code stream using interpreters that will allow this (e.g. DCL, Perl). You run a very real risk of having unintended content maliciously change the intended function of the code. For example, never use apostrophe substitution of a CGI variable at the DCL command line, as in
$ COPY 'WWW_FORM_SRC' 'WWW_FORM_DST'
Always pre-process the content of the variable first, ensuring there has been nothing inserted that could subvert the intended purpose (repeated here to emphasize the significance of this rule).
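As a minimal sketch of the pre-processing advised above, a script written in C might accept a CGI variable value only if every character belongs to a small whitelist. The function name cgi_value_is_safe and the particular set of permitted characters are assumptions made for illustration; they are not part of WASD.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if value is non-empty, no longer than max_len, and contains
   only letters, digits, '_', '-', '.' and '$'; otherwise returns 0.
   The permitted set is an illustrative assumption, not a WASD rule. */
int cgi_value_is_safe(const char *value, size_t max_len)
{
    size_t i;

    assert(max_len > 0);
    if (value == NULL || value[0] == '\0') return 0;
    for (i = 0; value[i] != '\0'; i++) {
        unsigned char c = (unsigned char)value[i];
        if (i >= max_len) return 0;               /* too long */
        if (!isalnum(c) && c != '_' && c != '-' && c != '.' && c != '$')
            return 0;                             /* unexpected character */
    }
    return 1;
}
```

A value that fails the check (for example one containing spaces, slashes or apostrophes) should be rejected outright rather than repaired.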
CGI variable capacity now varies significantly with VMS version.
The total size of all CGI variable names and values is determined by the
value of the [BufferSizeDclCommand] configuration directive, which determines the
total buffer space of a mailbox providing the script's SYS$COMMAND. The
default value of 4096 bytes will be ample for the typical CGI script request.
However, if a request contains very large individual variables or a large number of
form fields, etc., it may be possible to exhaust this quantity.
VMS V7.3-2 and later ...
CGI variables may contain up to approximately 8150 characters (the full
8192 symbol capacity cannot be realized due to the way the symbols are created
via the CLI). This is a significant increase on earlier capacities. Mailbox
buffer [BufferSizeDclCommand] may need to be increased if this capacity is to
be fully utilized.
VMS V7.3-1 and earlier ...
Values are limited to approximately 980 characters. This should still be
sufficient for most circumstances (if not, consider using CGIplus or ISAPI,
extensions to CGI programming which remove this limitation). Why such an odd
number and why a little rubbery? A DCL command line with these versions is
limited to 255 characters so the symbols for larger variables are built up over
successive DCL commands with the limit of 980 characters determined by CLI
constraints.
CGI Variables
Remember, by default all variables are prefixed by "WWW_" (though this may be modified using the set CGIprefix= mapping rule), and not all variables will be present for all requests.
If the request path is set to provide them, there will also be variables
providing information about a Secure Sockets Layer transported request's SSL
environment.
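Because variables carry a prefix and not every variable is present for every request, a C script may want a small lookup helper. The following is a sketch only: the helper names cgi_symbol_name and cgi_var are hypothetical, the default "WWW_" prefix is assumed, and getenv() is used in the same way as in the C example later in this chapter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Builds the prefixed symbol name (e.g. "WWW_" + "REMOTE_HOST") in buf.
   Returns 1 on success, 0 if the result would not fit in size bytes. */
int cgi_symbol_name(char *buf, size_t size, const char *prefix, const char *name)
{
    assert(buf != NULL && prefix != NULL && name != NULL);
    if (strlen(prefix) + strlen(name) >= size) return 0;
    strcpy(buf, prefix);
    strcat(buf, name);
    return 1;
}

/* Returns the value of a CGI variable, or fallback when it is absent.
   Assumes the default "WWW_" prefix is in effect. */
const char *cgi_var(const char *name, const char *fallback)
{
    char symbol[256];
    const char *value;

    if (!cgi_symbol_name(symbol, sizeof symbol, "WWW_", name)) return fallback;
    value = getenv(symbol);
    return value != NULL ? value : fallback;
}
```

For example, cgi_var("REMOTE_HOST", "") yields an empty string rather than a null pointer when the variable has not been supplied.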
Query String Variables
In line with other CGI implementations, additional, non-compliant variables
are provided to ease CGI interfacing. These provide the various components of
any query string. A keyword query string and a form
query string are parsed into
WWW_KEY_number
WWW_KEY_COUNT
WWW_FORM_form-element-name
Variables named WWW_KEY_number will be generated if the query string contains one or more plus ("+") and no equate symbols ("=").
Variables named WWW_FORM_form-element-name will be generated if
the query string contains one or more equate symbols. Generally such a query
string is used to carry form-URL-encoded (MIME type
application/x-www-form-urlencoded) requests. By default the server will
report an incorrectly encoded query string with a 400 error response. However some scripts
use malformed encodings and so this behaviour may be suppressed using the
set script=query=relaxed mapping rule.
set /cgi-bin/script-name* script=query=relaxed
To suppress this decoding completely (and save a few CPU cycles) use the
following rule.
set /cgi-bin/script-name* script=query=none
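The distinction between keyword and form query strings described above can be sketched in C as follows. This follows only the rule as stated in the text (any "=" means a form query; otherwise any "+" means a keyword query); the enum and function names are hypothetical and the server's actual parsing may differ in detail.

```c
#include <assert.h>
#include <string.h>

enum query_kind { QUERY_NONE, QUERY_KEYWORD, QUERY_FORM };

/* Classifies a query string according to the rule given in the text. */
enum query_kind classify_query(const char *query)
{
    assert(query != NULL);
    if (strchr(query, '=') != NULL) return QUERY_FORM;     /* WWW_FORM_... */
    if (strchr(query, '+') != NULL) return QUERY_KEYWORD;  /* WWW_KEY_...  */
    return QUERY_NONE;
}
```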
UNIQUE_ID Note
The UNIQUE_ID variable is a mostly Apache-compliant implementation (the
"_" has been substituted for the "@" to allow its use in file
names), for each request generating a globally and temporally unique 19
character string that can be used where such an identifier might be needed.
This string contains only "A"-"Z", "a"-"z",
"0"-"9", "_" and "-" characters and is generated
using a combination of time-stamp, host IP address, server system process
identifier and counter, and is "guaranteed" to be unique in (Internet)
space and time.
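A script that uses UNIQUE_ID in a file name may wish to confirm the value has the documented shape first. The following sketch checks only the length and character set described above; the function name is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if id is exactly 19 characters drawn from
   "A"-"Z", "a"-"z", "0"-"9", "_" and "-"; otherwise returns 0. */
int unique_id_is_well_formed(const char *id)
{
    size_t i;

    assert(id != NULL);
    if (strlen(id) != 19) return 0;
    for (i = 0; i < 19; i++) {
        unsigned char c = (unsigned char)id[i];
        if (!isalnum(c) && c != '_' && c != '-') return 0;
    }
    return 1;
}
```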
VMS Apache (CSWS) Compliance
WASD v7.0 had its CGI environment tailored slightly to ease portability
between VMS Apache (Compaq Secure Web Server) and WASD. This included the
provision of an APACHE$INPUT: stream and several Apache-specific CGI variables
(see the table below). The CGILIB C function library
(1.10 - Scripting Function Library) has also been made CSWS V1.0-1 and later (Apache
1.3.12 and higher) compliant.
CGI Variable Demonstration
The basic CGI symbol names are demonstrated here with a call to a script
that simply executes the following DCL code:
$ SHOW SYMBOL WWW_*
$ SHOW SYMBOL *
Note how the request components are represented for
"ISINDEX"-style searching (third item) and a forms-based query (fourth
item).
Script source:
HT_ROOT:[SRC.OTHER]CGI_SYMBOLS.COM
2.2 - Script Output
This information applies to all non-DECnet based scripting: CGI, CGIplus, RTE and ISAPI. WASD uses mailboxes for script inter-process communication (IPC). These are efficient, versatile and allow direct output from all VMS environments and utilities. Like many VMS record-oriented devices, however, there are some things to consider when using them.
The mailboxes are created record-oriented, not stream-oriented. This means records output by standard VMS means (e.g. DCL, utilities, programming languages) are discretely identified and may be processed appropriately by the server as text or binary depending on the content-type.
Being record oriented there is a maximum record size (MRS) that can be output. Records larger than this result in SYSTEM-F-MBTOOSML errors. The WASD default is 4096 bytes. This may be changed using the [BufferSizeDclOutput] configuration directive. This allocation consumes process BYTLM with each mailbox created so the account must be dimensioned sufficiently to supply demands for this quota. The maximum possible size for this is a VMS-limit of 60,000 bytes.
When created, the mailbox has its buffer space set. With WASD IPC mailboxes this is the same as the MRS. The total data buffered may not exceed this without the script entering a wait state (for the mailbox contents to be cleared by the server). As mailboxes use a little of the buffer space to delimit the records stored in them, the amount of data that can be buffered is actually less than the total buffer space.
To determine the maximum record size and total capacity of the mailbox
buffer between server and script WASD provides a CGI environment variable,
GATEWAY_MRS, containing an integer with this value.
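As a sketch of how a script might use this value, the following C functions convert the GATEWAY_MRS string to an integer and work out how many records a given amount of output would need. The function names are hypothetical, and falling back to the documented 4096 byte default when the value is missing or implausible is an assumption of this example, not WASD behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MRS_DEFAULT 4096L   /* WASD default record size given in the text */
#define MRS_LIMIT   60000L  /* VMS limit given in the text */

/* Converts the GATEWAY_MRS value to a number, returning MRS_DEFAULT
   if the string is absent, not a number, or out of range. */
long parse_gateway_mrs(const char *value)
{
    char *end;
    long mrs;

    if (value == NULL || *value == '\0') return MRS_DEFAULT;
    errno = 0;
    mrs = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || mrs <= 0 || mrs > MRS_LIMIT)
        return MRS_DEFAULT;
    return mrs;
}

/* Number of records needed to send total bytes in records of at most mrs bytes. */
long records_needed(long total, long mrs)
{
    assert(total >= 0 && mrs > 0);
    return (total + mrs - 1) / mrs;
}
```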
2.2.1 - CGI Compliant Output
Script response may be CGI or NPH compliant (2.2.2 - Non-Parsed-Header Output). CGI compliance means the script's response must begin with a line containing one of the following fields.
Other HTTP-compliant response fields may follow, with the response header
terminated and the response body begun by a single empty line. The following
is an example of a CGI-compliant response.
Content-Type: text/html
Content-Length: 35

<HTML>
<B>Hello world!</B>
</HTML>
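A response of this shape can be assembled in C so that the Content-Length always matches the body. This is a sketch under stated assumptions: the function name format_cgi_response is hypothetical, and lines are ended with a single "\n", which is what the length of 35 in the example above implies for its body.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes a CGI-compliant response (header fields, empty line, body) to out.
   Returns the number of characters written, or -1 if it does not fit. */
int format_cgi_response(char *out, size_t size,
                        const char *content_type, const char *body)
{
    int n;

    assert(out != NULL && content_type != NULL && body != NULL);
    n = snprintf(out, size,
                 "Content-Type: %s\n"
                 "Content-Length: %lu\n"
                 "\n"                       /* empty line ends the header */
                 "%s",
                 content_type, (unsigned long)strlen(body), body);
    if (n < 0 || (size_t)n >= size) return -1;
    return n;
}
```

Passing "text/html" and the three-line body shown above produces a header declaring a Content-Length of 35.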
Strict CGI output compliance can be enabled and disabled using the
[CgiStrictOutput] configuration directive. With it disabled the server will
accept any output from the script; if the output is not CGI or NPH compliant the
server automatically generates a plain-text header. With it enabled, if the output
does not begin with a CGI or NPH header the server returns a "502 Bad Gateway"
error. To debug scripts generating this error, introduce a plain-text debug mode and
header, or use the WATCH facility's CGI item (see the Technical Overview).
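When debugging a script that triggers the 502 response it can help to check the first line of output before it is sent. The sketch below classifies a line as NPH (an HTTP status line), CGI, or neither. It assumes the three response fields defined by the CGI/1.1 specification (Content-Type, Location, Status) are the ones that may begin a CGI response; the names are hypothetical and the server's own test may be stricter or looser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

enum response_kind { RESPONSE_CGI, RESPONSE_NPH, RESPONSE_OTHER };

/* Case-insensitive test that line begins with prefix. */
static int starts_with(const char *line, const char *prefix)
{
    for (; *prefix != '\0'; line++, prefix++) {
        if (tolower((unsigned char)*line) != tolower((unsigned char)*prefix))
            return 0;
    }
    return 1;
}

/* Classifies the first line of a script's output. */
enum response_kind classify_first_line(const char *line)
{
    assert(line != NULL);
    if (starts_with(line, "HTTP/")) return RESPONSE_NPH;
    if (starts_with(line, "Content-Type:") ||
        starts_with(line, "Location:") ||
        starts_with(line, "Status:"))
        return RESPONSE_CGI;
    return RESPONSE_OTHER;
}
```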
WASD Specifics
This section describes how WASD deals with some particular output issues.
If the script response content-type is "text/..." (text document) WASD assumes that output will be line-oriented and requiring HTTP carriage-control (each record/line terminated by a line-feed), and will ensure each record it receives is correctly terminated before passing it to the client. In this way DCL procedure output (and the VMS environment in general) is supported transparently. Any other content-type is assumed to be binary and no carriage-control is enforced. This default behaviour may be modified as described below.
Carriage-control behaviour for any content-type may be explicitly set using either of two additional response header fields. The term stream is used to describe the server just transferring records, without additional processing, as they were received from the script. This is obviously necessary for binary/raw content such as images, octet-streams, etc. The term record describes the server ensuring each record it receives has correct carriage-control - a trailing newline. If not present one is added. This mode is useful for VMS textual streams (e.g. output from DCL and VMS utilities).
This can be done using the Apache Group's proposed CGI/1.2 "Script-Control:" field. The WASD extension directives X-record-mode and X-stream-mode set the script output into the respective modes.
Examples of the usage of this field:
Script-Control: X-stream-mode
Script-Control: X-record-mode
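The effect of record mode on a single record can be pictured with the following C sketch, which appends a newline only when one is not already present. It illustrates the behaviour described above and is not the server's code; the function name and the convention of returning 0 when the buffer is full are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Appends '\n' to the record in buf unless it already ends with one.
   len is the current record length, cap the buffer capacity.
   Returns the new length, or 0 if there is no room for the newline. */
size_t ensure_newline(char *buf, size_t len, size_t cap)
{
    assert(buf != NULL && len <= cap);
    if (len > 0 && buf[len - 1] == '\n') return len;  /* already terminated */
    if (len + 1 > cap) return 0;                      /* no space left */
    buf[len] = '\n';
    return len + 1;
}
```

In stream mode no such adjustment is made and the record is passed on exactly as received.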
By default WASD writes each record received from the script to the client as it is received. This can range from a single byte to a complete mailbox buffer full. WASD leaves it up to the script to determine the rate at which output flows back to the client.
While this allows a certain flexibility it can be inefficient. There will
be many instances where a script will be providing just a body of data to the
client, and wishes to do it as quickly and efficiently as possible. Using the
proposed CGI/1.2 "Script-Control:" field with the WASD extension
directive X-buffer-records a script can direct the server to buffer
as many script output records as possible before transferring them to the client.
The following should be added to the CGI response header.
Script-Control: X-buffer-records
While the above offers some significant improvements to efficiency and
perceived throughput, the best approach is for the script to provide records the
same size as the mailbox (see 2.2 - Script Output for detail on
determining this size if required). This can be done explicitly by the script
programming or, if using the C language, simply by changing stdout
to a binary stream. With this environment the C-RTL will control output,
automatically buffering as much as possible before writing it to the server.
if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL)
   exit (vaxc$errno);
Also see the section describing NPH C Script.
The Apache Group has proposed a CGI/1.2 that includes a Script-Control: CGI response header field. WASD implements the one proposed directive, along with a number of WASD extensions (those beginning with "X-"). Note that by convention extensions unknown to an agent should be ignored, meaning that they can be freely included, being meaningful only to WASD and not significant to other implementations.
The following is a simple example response where the server is instructed
not to delete the script process under any circumstances, and that the body
does not require any carriage-control changes.
Content-Type: text/plain
Script-Control: no-abort; X-stream-mode

long, slowww script-output ...
Example DCL Scripts
A simple script to provide the system time might be:
$ say = "write sys$output"
$! the next two lines make it CGI-compliant
$ say "Content-Type: text/plain"
$ say ""
$! start of plain-text body
$ show time
A script to provide the system time more elaborately (using HTML):
$ say = "write sys$output"
$! the next two lines make it CGI-compliant
$ say "Content-Type: text/html"
$ say ""
$! start of HTML script output
$ say "<HTML>"
$ say "Hello ''WWW_REMOTE_HOST'" !(CGI variable)
$ say "<P>"
$ say "System time on node ''f$getsyi("nodename")' is:"
$ say "<H1>''f$cvtime()'</H1>"
$ say "</HTML>"
2.2.2 - Non-Parsed-Header Output
A script does not have to output a CGI-compliant data stream. If it begins
with an HTTP header status line WASD assumes it will supply a
raw HTTP data stream, containing all the HTTP requirements.
This is the same as or equivalent to the non-parsed-header, or
"nph..." scripts of many environments. This is an example of
such a script response.
HTTP/1.0 200 Success
Content-Type: text/html
Content-Length: 35

<HTML>
<B>Hello world!</B>
</HTML>
Any such script must observe the HyperText Transfer Protocol, supplying a
full response header and body, including correct
carriage-control. Once the server detects the HTTP status header line it
pays no more attention to any response header fields or body records, just
transferring everything directly to the client. This can be very efficient, the
server just a conduit between script and client, but does transfer the
responsibility for a correct HTTP response onto the script.
NPH DCL Script
The following example shows a DCL script. Note the full HTTP header and
each line explicitly terminated with a carriage-return and line-feed pair.
$ lf[0,8] = %x0a
$ crlf[0,16] = %x0d0a
$ say = "write sys$output"
$! the next line determines that it is raw HTTP stream
$ say "HTTP/1.0 200 Success" + crlf
$ say "Content-Type: text/html" + crlf
$! response header separating blank line
$ say crlf
$! start of HTML script output
$ say "<HTML>" + lf
$ say "Hello ''WWW_REMOTE_HOST'" + lf
$ say "<P>" + lf
$ say "Local time is ''WWW_REQUEST_TIME_LOCAL'" + lf
$ say "</HTML>" + lf
CGIUTL Utility
This assists with the generation of HTTP responses, including the transfer
of binary content from files (copying a file back to the client as part of the
request), and the processing of the contents of POSTed requests from DCL
(1.9 - DCL Processing of Requests).
NPH C Script
When scripting using the C programming language there can be considerable
efficiencies to be gained by providing a binary output stream from the script.
This results in the C Run-Time Library (C-RTL) buffering output up to the
maximum supported by the IPC mailbox. This may be enabled using a code
construct similar to the following to reopen stdout in binary mode.
if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL)
   exit (vaxc$errno);
This is used consistently in WASD scripts. Carriage-control must be
supplied as part of the C standard output (no differently to any other C
program). Output can be explicitly sent to the client at any stage using
the fflush() standard library function. Note that if the
fwrite() function is used the current contents of the C-RTL buffer
are automatically flushed along with the content of the fwrite().
fprintf (stdout,
         "HTTP/1.0 200 Success\r\n"
         "Content-Type: text/html\r\n"
         "\r\n"
         "<HTML>\n"
         "Hello %s\n"
         "<P>\n"
         "System time is %s\n"
         "</HTML>\n",
         getenv("WWW_REMOTE_HOST"),
         getenv("WWW_REQUEST_TIME_LOCAL"));
CGI Function Library
A source code collection of C language functions useful for processing the
more vexing aspects of CGI/CGIplus programming (1.10 - Scripting Function Library).
2.3 - Raw HTTP Input (POST Processing)
For POST and PUT HTTP methods (e.g. a POSTed HTML form) the body of the
request may be read from the HTTP$INPUT stream. For executable image scripts
requiring the body to be present on SYS$INPUT (the C language
stdin stream) a user-mode logical may be defined immediately
before invoking the image, as in the example.
$ EGSCRIPT = "$HT_EXE:EGSCRIPT.EXE"
$ DEFINE /USER SYS$INPUT HTTP$INPUT
$ EGSCRIPT
The HTTP$INPUT stream may be explicitly opened and read. Note that this is a raw stream, and HTTP lines (carriage-return/line-feed terminated sequences of characters) may have been blocked together for network transport. These would need to be explicitly parsed by the program.
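The explicit parsing mentioned above might look like the following C sketch, which steps through a buffer already read from the stream and yields each carriage-return/line-feed terminated line in turn. The function name next_http_line is hypothetical, and reading the data into the buffer is left to the calling program.

```c
#include <assert.h>
#include <stddef.h>

/* Finds the next CRLF-terminated line in buf, starting at offset *pos.
   On success stores the line start and its length (excluding the CRLF),
   advances *pos past the terminator and returns 1.
   Returns 0 when no complete line remains in the first len bytes. */
int next_http_line(const char *buf, size_t len, size_t *pos,
                   const char **line, size_t *line_len)
{
    size_t i;

    assert(buf != NULL && pos != NULL && line != NULL && line_len != NULL);
    for (i = *pos; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            *line = buf + *pos;
            *line_len = i - *pos;
            *pos = i + 2;
            return 1;
        }
    }
    return 0;
}
```

Any bytes left after the last complete line (from *pos to the end of the buffer) are an unterminated remainder that the caller can keep and prepend to the next read.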