WASD Hypertext Services - Scripting Environment

2 - CGI

2.1 - CGI Environment Variables
2.2 - Script Output
    2.2.1 - CGI Compliant Output
    2.2.2 - Non-Parsed-Header Output
2.3 - Raw HTTP Input (POST Processing)

The information in this chapter merely outlines the WASD implementation details, which are in general very much vanilla CGI and NCSA CGI (Common Gateway Interface) compliant, originally based on the INTERNET-DRAFT authored by D.Robinson (drtr@ast.cam.ac.uk), 8 January 1996, and confirmed against the latest INTERNET-DRAFT authored by Ken A.L.Coar (drtr@etrade.co.uk), 12 June 1999.


2.1 - CGI Environment Variables

With standard CGI, environment variables are provided to the script via DCL global symbols. Each CGI variable symbol name is prefixed with "WWW_" by default. Although this can be changed using the "/CGI_PREFIX" qualifier and the SET CGIPREFIX mapping rule (see "Technical Overview"), doing so is not recommended if the WASD VMS scripts are to be used, as they expect CGI variable symbols to be prefixed in this manner.

There are a number of non-"standard" CGI variables to assist in tailoring scripts for the WASD environment. Do not make your scripts dependent on any of these if portability is a goal.

Never, ever substitute the contents of CGI variables directly into the code stream using interpreters that allow this (e.g. DCL, Perl). You run a very real risk of having unintended content maliciously change the intended function of the code. For example, never use apostrophe substitution of a CGI variable at the DCL command line, as in

  $ COPY 'WWW_FORM_SRC' 'WWW_FORM_DST'

Always pre-process the content of the variable first, ensuring that nothing has been inserted that could subvert the intended purpose (this rule is repeated here to emphasize its significance).
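
One way to pre-process a value in an executable script is an allow-list check made before the value is used for anything else. The following C fragment is only a sketch of that idea. The function name cgi_value_is_safe(), the permitted character set and the 255 character limit are all assumptions chosen for illustration, and are not part of WASD or CGILIB.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a WASD or CGILIB function): returns 1 only if
   every character of value is in a conservative allow-list, otherwise 0. */
int cgi_value_is_safe(const char *value)
{
    /* assumed allow-list: letters, digits and a few file-name characters */
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789_-$.";
    size_t len;

    if (value == NULL) return 0;
    len = strlen(value);
    /* reject empty values and anything longer than an assumed limit */
    if (len == 0 || len > 255) return 0;
    /* strspn() returns the length of the leading run of allowed characters */
    return strspn(value, allowed) == len;
}
```

A script would typically call this on the result of getenv("WWW_FORM_SRC") and refuse the request if it returns 0. Rejecting anything outside a known-good set is generally safer than trying to strip known-bad characters.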

CGI variable capacity now varies significantly with VMS version.

The total size of all CGI variable names and values is limited by the value of the [BufferSizeDclCommand] configuration directive, which determines the total buffer space of a mailbox providing the script's SYS$COMMAND. The default value of 4096 bytes will be ample for the typical CGI script request. However, if a request contains very large individual variables or a large number of form fields, etc., it may be possible to exhaust this quantity.


VMS V7.3-2 and later ...

CGI variables may contain up to approximately 8150 characters (the full 8192 symbol capacity cannot be realized due to the way the symbols are created via the CLI). This is a significant increase on earlier capacities. Mailbox buffer [BufferSizeDclCommand] may need to be increased if this capacity is to be fully utilized.


VMS V7.3-1 and earlier ...

Values are limited to approximately 980 characters. This should still be sufficient for most circumstances (if not, consider using CGIplus or ISAPI, extensions to CGI programming which remove this limitation). Why such an odd number, and why a little rubbery? A DCL command line with these versions is limited to 255 characters, so the symbols for larger variables are built up over successive DCL commands, with the limit of 980 characters determined by CLI constraints.


CGI Variables

Remember, by default all variables are prefixed by "WWW_" (though this may be modified using the set CGIprefix= mapping rule), and not all variables will be present for all requests.

CGI Environment Variables
Name                    Description                                         Origin
----------------------  --------------------------------------------------  ------
AUTH_ACCESS             "READ" or "READ+WRITE"                              WASD
AUTH_AGENT              used by an authorization agent (specialized use)    WASD
AUTH_GROUP              authentication group                                WASD
AUTH_PASSWORD           plain-text password, only if EXTERNAL realm         WASD
AUTH_REALM              authentication realm                                WASD
AUTH_REALM_DESCRIPTION  browser displayed string                            WASD
AUTH_TYPE               authentication type (BASIC or DIGEST)               CGI
AUTH_USER               details of authenticated user                       WASD
CONTENT_LENGTH          "Content-Length:" from request header               CGI
CONTENT_TYPE            "Content-Type:" from request header                 CGI
DOCUMENT_ROOT           generally empty, configurable path setting          Apache
FORM_field-name         query string "&" separated form elements            WASD
GATEWAY_BG              device name of raw client socket (specialized use)  WASD
GATEWAY_EOF             End of request sentinel (specialized use)           WASD
GATEWAY_EOT             End of callout sentinel (specialized use)           WASD
GATEWAY_ESC             Callout escape sentinel (specialized use)           WASD
GATEWAY_INTERFACE       "CGI/1.1"                                           CGI
GATEWAY_MRS             maximum record size of script SYS$OUTPUT            WASD
HTTP_ACCEPT             any list of browser-accepted content types          CGI
HTTP_ACCEPT_CHARSET     any list of browser-accepted character sets         CGI
HTTP_ACCEPT_LANGUAGE    any list of browser-accepted languages              CGI
HTTP_AUTHORIZATION      any from request header (specialized use)           CGI
HTTP_CACHE_CONTROL      HTTP/1.1 cache control directive                    CGI
HTTP_COOKIE             any cookie sent by the client                       CGI
HTTP_FORWARDED          any proxy/gateway hosts that forwarded the request  CGI
HTTP_HOST               host and port request was sent to                   CGI
HTTP_IF_MODIFIED_SINCE  any last modified GMT time string                   CGI
HTTP_PRAGMA             any pragma directive of request header              CGI
HTTP_REFERER            any source document URL for this request            CGI
HTTP_USER_AGENT         client/browser identification string                CGI
HTTP_X_FORWARDED_FOR    proxied client host name or address                 Squid
HTTP_field-name         any other unknown request header field              WASD
KEY_n                   query string "+" separated elements                 WASD
KEY_COUNT               number of "+" separated elements                    WASD
PATH_INFO               virtual path of data requested in URL               CGI
PATH_TRANSLATED         VMS file path of data requested in URL              CGI
QUERY_STRING            un-URL-decoded string following "?" in URL          CGI
REMOTE_ADDR             IP host address of HTTP client                      CGI
REMOTE_HOST             IP host name of HTTP client                         CGI
REMOTE_USER             authenticated remote user name (or empty)           CGI
REQUEST_CHARSET         any server-determined request character set         WASD
REQUEST_METHOD          "GET", "PUT", etc.                                  CGI
REQUEST_SCHEME          "http:" or "https:"                                 WASD
REQUEST_TIME_GMT        GMT time request received                           WASD
REQUEST_TIME_LOCAL      Local time request received                         WASD
REQUEST_URI             full, unescaped request string                      Apache
SCRIPT_DEFAULT          mapped default directory for script                 WASD
SCRIPT_FILENAME         script file name (e.g. CGI-BIN:[000000]QUERY.COM)   Apache
SCRIPT_NAME             script being executed (e.g. "/query")               CGI
SERVER_ADDR             IP host address of server system                    WASD
SERVER_ADMIN            email address for server administration             Apache
SERVER_CHARSET          server default character set                        WASD
SERVER_GMT              offset from GMT (e.g. "+09:30")                     WASD
SERVER_NAME             IP host name of server                              CGI
SERVER_PROTOCOL         HTTP protocol version (always "HTTP/1.0")           CGI
SERVER_PORT             IP port request was received on                     CGI
SERVER_SIGNATURE        server ID, host name and port                       Apache
SERVER_SOFTWARE         software ID of HTTP server                          CGI
UNIQUE_ID               unique 19 character string                          Apache

If the request path is set to provide them, there will also be variables providing information about the SSL environment of a request transported via Secure Sockets Layer.


Query String Variables

In line with other CGI implementations, additional, non-compliant variables are provided to ease CGI interfacing. These provide the various components of any query string. A keyword query string and a form query string are parsed into

  WWW_KEY_number 
  WWW_KEY_COUNT
  WWW_FORM_form-element-name

Variables named WWW_KEY_number will be generated if the query string contains one or more plus ("+") and no equate symbols ("=").

Variables named WWW_FORM_form-element-name will be generated if the query string contains one or more equate symbols. Generally such a query string is used to encode form-URL-encoded (MIME type application/x-www-form-urlencoded) requests. By default the server will report an incorrect encoding with a 400 error response. However, some scripts use malformed encodings, and so this behaviour may be suppressed using the set script=query=relaxed mapping rule.

   set /cgi-bin/script-name* script=query=relaxed

To suppress this decoding completely (and save a few CPU cycles) use the following rule.

   set /cgi-bin/script-name* script=query=none
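
A script that suppresses the server's decoding, or that simply wants to know which family of variables to expect, can apply the same rule itself. The following C fragment is a sketch that follows the wording above literally. The names classify_query() and count_keywords() are hypothetical, and the treatment of a query string containing neither "+" nor "=" (reported here as QUERY_OTHER) is an assumption, as is the counting of empty elements.

```c
#include <string.h>

enum query_kind { QUERY_NONE, QUERY_KEYWORD, QUERY_FORM, QUERY_OTHER };

/* Hypothetical helper: which variable family a query string would produce. */
enum query_kind classify_query(const char *query)
{
    if (query == NULL || *query == '\0') return QUERY_NONE;
    /* any "=" means form elements (WWW_FORM_form-element-name) */
    if (strchr(query, '=') != NULL) return QUERY_FORM;
    /* one or more "+" and no "=" means keywords (WWW_KEY_number) */
    if (strchr(query, '+') != NULL) return QUERY_KEYWORD;
    /* neither character present: not covered by the description above */
    return QUERY_OTHER;
}

/* Number of "+" separated elements in a keyword query string, or 0 if the
   string is not a keyword query. Empty elements are counted (assumption). */
int count_keywords(const char *query)
{
    int count = 1;

    if (classify_query(query) != QUERY_KEYWORD) return 0;
    for (; *query != '\0'; query++)
        if (*query == '+') count++;
    return count;
}
```

For example, the string "vms+apache+wasd" is classified as a keyword query with three elements, while "name=fred&age=30" is classified as a form query.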


UNIQUE_ID Note

The UNIQUE_ID variable is a mostly Apache-compliant implementation (the "_" has been substituted for the "@" to allow its use in file names), for each request generating a globally and temporally unique 19 character string that can be used where such an identifier might be needed. This string contains only "A"-"Z", "a"-"z", "0"-"9", "_" and "-" characters and is generated using a combination of time-stamp, host IP address, server system process identifier and counter, and is "guaranteed" to be unique in (Internet) space and time.


VMS Apache (CSWS) Compliance

WASD v7.0 had its CGI environment tailored slightly to ease portability between VMS Apache (Compaq Secure Web Server) and WASD. This included the provision of an APACHE$INPUT: stream and several Apache-specific CGI variables (see the table above). The CGILIB C function library (1.10 - Scripting Function Library) has also been made CSWS V1.0-1 and later (Apache 1.3.12 and higher) compliant.


CGI Variable Demonstration

The basic CGI symbol names are demonstrated here with a call to a script that simply executes the following DCL code:

  $ SHOW SYMBOL WWW_*
  $ SHOW SYMBOL *

Note how the request components are represented for "ISINDEX"-style searching (third item) and a forms-based query (fourth item).

Script source:

HT_ROOT:[SRC.OTHER]CGI_SYMBOLS.COM


2.2 - Script Output

This information applies to all non-DECnet based scripting, CGI, CGIplus, RTE, ISAPI. WASD uses mailboxes for script inter-process communication (IPC). These are efficient, versatile and allow direct output from all VMS environments and utilities. Like many VMS record-oriented devices however there are some things to consider when using them.

To determine the maximum record size and total capacity of the mailbox buffer between server and script WASD provides a CGI environment variable, GATEWAY_MRS, containing an integer with this value.
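
A C script might read this value defensively, since the variable could be absent or the prefix changed. The following is a minimal sketch. The function name parse_mrs() and the idea of a caller-supplied fallback are assumptions for illustration, not part of WASD or CGILIB.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a GATEWAY_MRS style value, returning fallback
   if the text is missing or is not a positive decimal integer. */
long parse_mrs(const char *text, long fallback)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0') return fallback;
    value = strtol(text, &end, 10);
    /* reject trailing non-digit characters and non-positive values */
    if (*end != '\0' || value <= 0) return fallback;
    return value;
}
```

A script using the default prefix could call parse_mrs(getenv("WWW_GATEWAY_MRS"), 1024), where 1024 is an arbitrary conservative fallback, and then keep each output record no larger than the result.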


2.2.1 - CGI Compliant Output

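A script response may be CGI or NPH compliant (2.2.2 - Non-Parsed-Header Output). CGI compliance means the script's response must begin with a line containing one of the following fields.

  Status:        an HTTP status code and explanatory text
  Content-Type:  the MIME content type of the response body
  Location:      a redirection URL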

Other HTTP-compliant response fields may follow, with the response header terminated and the response body begun by a single empty line. The following is an example of a CGI-compliant response.

  Content-Type: text/html
  Content-Length: 35

  <HTML>
  <B>Hello world!</B>
  </HTML>

Strict CGI output compliance can be enabled and disabled using the [CgiStrictOutput] configuration directive. With it disabled the server will accept any output from the script; if the output is not CGI or NPH compliant, the server automatically generates a plain-text header. With it enabled, if the output does not begin with a CGI or NPH header the server returns a "502 Bad Gateway" error. To debug scripts generating this error, introduce a plain-text debug mode and header, or use the WATCH facility's CGI item (see the Technical Overview).
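
When a script supplies a Content-Length: field, the value has to match the body actually sent. The following C fragment sketches one way to keep the two in step by computing the length from the body string. The function name build_cgi_response() is hypothetical, and the sketch assumes the body is sent exactly as held in the string (for example via a binary-mode stream), with no carriage-control added afterwards.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a CGI-compliant response into buf.
   Returns the number of characters written, or -1 if buf is too small. */
int build_cgi_response(char *buf, size_t size,
                       const char *content_type, const char *body)
{
    int n = snprintf(buf, size,
                     "Content-Type: %s\n"
                     "Content-Length: %lu\n"
                     "\n"              /* empty line ends the header */
                     "%s",
                     content_type, (unsigned long)strlen(body), body);

    /* snprintf() reports the length it needed; detect truncation */
    if (n < 0 || (size_t)n >= size) return -1;
    return n;
}
```

Called with a content type of "text/html" and the three-line HTML body from the example above (each line ending in a line-feed), this produces a Content-Length: of 35.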


WASD Specifics

This section describes how WASD deals with some particular output issues.


Script-Control:

The Apache Group has proposed a CGI/1.2 that includes a Script-Control: CGI response header field. WASD implements the one proposed directive, along with a number of WASD extensions (those beginning with "X-"). Note that by convention extensions unknown to an agent should be ignored, meaning that they can be freely included, being meaningful only to WASD and not significant to other implementations.

The following is a simple example response where the server is instructed not to delete the script process under any circumstances, and that the body does not require any carriage-control changes.

  Content-Type: text/plain
  Script-Control: no-abort; X-stream-mode
 
  long, slowww script-output ...


Example DCL Scripts

A simple script to provide the system time might be:

  $ say = "write sys$output"
  $! the next two lines make it CGI-compliant
  $ say "Content-Type: text/plain"
  $ say ""
  $! start of plain-text body
  $ show time

A script to provide the system time more elaborately (using HTML):

  $ say = "write sys$output"
  $! the next two lines make it CGI-compliant
  $ say "Content-Type: text/html"
  $ say ""
  $! start of HTML script output
  $ say "<HTML>"
  $ say "Hello ''WWW_REMOTE_HOST'"  !(CGI variable)
  $ say "<P>"
  $ say "System time on node ''f$getsyi("nodename")' is:"
  $ say "<H1>''f$cvtime()'</H1>"
  $ say "</HTML>"


2.2.2 - Non-Parsed-Header Output

A script does not have to output a CGI-compliant data stream. If it begins with an HTTP header status line WASD assumes it will supply a raw HTTP data stream, containing all the HTTP requirements. This is the same as or equivalent to the non-parsed-header, or "nph..." scripts of many environments. This is an example of such a script response.

  HTTP/1.0 200 Success
  Content-Type: text/html
  Content-Length: 35

  <HTML>
  <B>Hello world!</B>
  </HTML>

Any such script must observe the HyperText Transfer Protocol, supplying a full response header and body, including correct carriage-control. Once the server detects the HTTP status header line it pays no more attention to any response header fields or body records, just transferring everything directly to the client. This can be very efficient, with the server acting as just a conduit between script and client, but it does transfer the responsibility for a correct HTTP response onto the script.


NPH DCL Script

The following example shows a DCL script. Note the full HTTP header and each line explicitly terminated with a carriage-return and line-feed pair.

  $ lf[0,8] = %x0a
  $ crlf[0,16] = %x0d0a
  $ say = "write sys$output"
  $! the next line determines that it is raw HTTP stream
  $ say "HTTP/1.0 200 Success" + crlf
  $ say "Content-Type: text/html" + crlf
  $! response header separating blank line
  $ say crlf
  $! start of HTML script output
  $ say "<HTML>" + lf 
  $ say "Hello ''WWW_REMOTE_HOST'" + lf 
  $ say "<P>" + lf 
  $ say "Local time is ''WWW_REQUEST_TIME_LOCAL'" + lf 
  $ say "</HTML>" + lf 


CGIUTL Utility

This assists with the generation of HTTP responses, including the transfer of binary content from files (copying a file back to the client as part of the response), and the processing of the contents of POSTed requests from DCL (1.9 - DCL Processing of Requests).


NPH C Script

When scripting using the C programming language there can be considerable efficiencies to be gained by providing a binary output stream from the script. This results in the C Run-Time Library (C-RTL) buffering output up to the maximum supported by the IPC mailbox. This may be enabled using a code construct similar to the following to reopen stdout in binary mode.

  if ((stdout = freopen ("SYS$OUTPUT", "w", stdout, "ctx=bin")) == NULL)
     exit (vaxc$errno);

This is used consistently in WASD scripts. Carriage-control must be supplied as part of the C standard output (no differently to any other C program). Output can be explicitly sent to the client at any stage using the fflush() standard library function. Note that if the fwrite() function is used the current contents of the C-RTL buffer are automatically flushed along with the content of the fwrite().

     fprintf (stdout,
  "HTTP/1.0 200 Success\r\n\
  Content-Type: text/html\r\n\
  \r\n\
  <HTML>\n\
  Hello %s\n\
  <P>\n\
  System time is %s\n\
  </HTML>\n",
     getenv("WWW_REMOTE_HOST"),
     getenv("WWW_REQUEST_TIME_LOCAL"));


CGI Function Library

A source code collection of C language functions useful for processing the more vexing aspects of CGI/CGIplus programming (1.10 - Scripting Function Library).


2.3 - Raw HTTP Input (POST Processing)

For POST and PUT HTTP methods (e.g. a POSTed HTML form) the body of the request may be read from the HTTP$INPUT stream. For executable image scripts requiring the body to be present on SYS$INPUT (the C language stdin stream) a user-mode logical may be defined immediately before invoking the image, as in the example.

  $ EGSCRIPT = "$HT_EXE:EGSCRIPT.EXE"
  $ DEFINE /USER SYS$INPUT HTTP$INPUT
  $ EGSCRIPT

The HTTP$INPUT stream may be explicitly opened and read. Note that this is a raw stream, and HTTP lines (carriage-return/line-feed terminated sequences of characters) may have been blocked together for network transport. These would need to be explicitly parsed by the program.
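
Because the stream is raw, a program generally cannot rely on one read returning one line, or even the whole body. The following C fragment sketches a read loop that keeps reading until the expected number of bytes has arrived. The function name read_request_body() is hypothetical, and the sketch assumes the caller has already opened the stream and obtained the body length (for instance from the CONTENT_LENGTH variable).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read exactly length bytes of request body from in.
   Returns a NUL-terminated buffer the caller must free(), or NULL on a
   short read or allocation failure. */
char *read_request_body(FILE *in, size_t length)
{
    char *body = malloc(length + 1);
    size_t got = 0;

    if (body == NULL) return NULL;
    /* a single fread() may return less than requested, so loop */
    while (got < length) {
        size_t n = fread(body + got, 1, length - got, in);
        if (n == 0) break;      /* end of stream or read error */
        got += n;
    }
    if (got != length) {
        free(body);
        return NULL;
    }
    body[length] = '\0';
    return body;
}
```

The returned buffer holds the body exactly as received, so any carriage-return/line-feed sequences or form-URL-encoding within it still need to be parsed by the caller.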

