WASD Hypertext Services - Technical Overview

10 - Mapping Rules

10.1 - VMS File System Specifications
10.2 - Extended File Specifications (ODS-5)
    10.2.1 - Characters In Request Paths
    10.2.2 - Characters In Server-Generated Paths
10.3 - Rules
    10.3.1 - MAP, PASS, FAIL Rules
    10.3.2 - REDIRECT Rule
    10.3.3 - USER Rule
    10.3.4 - EXEC/UXEC and SCRIPT, Script Mapping Rules
    10.3.5 - SET Rule
10.4 - Rule Interpretation
10.5 - Mapping Examples
10.6 - Virtual Servers
10.7 - Conditional Mapping
10.8 - Mapping User Directories (tilde character ("~"))
    10.8.1 - Using The SYSUAF
    10.8.2 - Without Using The SYSUAF


Mapping rules are used in five primary ways.

- To map a request path onto the VMS file system.
- To process a request path according to specified criteria resulting in an effective path that is different to that supplied with the request.
- To identify requests requiring script activation and to parse the script from the path portion of that request. The path portion is then independently re-mapped.
- To conditionally map to different end-results based on one or more criteria of the request.
- To provide differing virtual sites depending on the actual service accessed by the client.

Mapping is basically for server-internal purposes only. The only time the path information of the request itself is modified is when a script component is removed. At all other times the path information remains unchanged. Path authorization is always applied to the path supplied with the request.

By default, the system-table logical name HTTPD$MAP locates a common mapping rule file, unless an individual rule file is specified using a job-table logical name. Simple editing of the mapping file changes the rules. Comment lines may be included by prefixing them with the hash ("#") character. Although there is no fixed limit on the number of rules, scanning a large, linear database does have processing implications.

Rules are given a basic consistency check when loaded (i.e. at server startup, map reload, etc.). If there is an obvious problem (unknown rule, missing component, path not absolute, etc.) a warning message is generated and the rule is not loaded into the database. This will not cause the server startup to fail. These warning messages may be found in the server process log.

The server administration facility allows realm and arbitrary paths to be checked against the rule database in real-time using the WATCH facility. See 16 - WATCH Facility. In this way the rule database may be checked against test or even live requests.

Any changes to the mapping file may be (re)loaded into the running HTTPd server using the following command on the server system:

  $ HTTPD /DO=MAP

Also see 6 - Server Configuration for daemon configuration.

Server Mapping Rules

A server's currently loaded mapping rules may be interrogated. See 15 - Server Administration for further information.

Mapping Overhead

Naturally, each rule that needs to be processed adds a little to consumed CPU, introduces some latency, and ultimately reduces throughput. The test-bench has shown this to be acceptably small compared to the overall costs of responding to a request. Using the ApacheBench tool on a DEC 3000 Model 300 running OpenVMS V7.2-1 with a simple access to /ht_root/exercise/0k.html showed approximately 60 requests/second throughput using the following mapping file.

  pass /ht_root/exercise/*

After adding various quantities of the same intervening rule

  pass /ht_root/example/* 
  pass /ht_root/example/* 
    .
    .
    .
  pass /ht_root/example/* 
  pass /ht_root/exercise/*

the following results were derived.

Mapping Overhead

Intervening Rules	Requests/S	Throughput
0	60	benchmark
100	57	-5%
200	54	-10%
500	48	-20%

Although this is a fairly contrived set-up and actual real-world rule-sets are more complex than this, even one hundred rules is a very large set, and it does indicate that for all intents and purposes mapping rules can be used to achieve desired objectives without undue concern about impact on server throughput.
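
The percentages in the table can be re-derived from the raw request rates. The following is a minimal sketch of that arithmetic. The per-rule figure it prints is only a rough estimate and rests on an assumption not stated above: that requests were served one after another, so that the reciprocal of the throughput approximates the time spent on each request. The function name `overhead` is hypothetical.

```python
def overhead(baseline_rps, rps, rules):
    """Return (throughput change in percent, estimated extra microseconds per rule).

    The per-rule estimate assumes serial request handling, so that
    1 / throughput approximates the time taken by a single request.
    """
    change_pct = (rps - baseline_rps) / baseline_rps * 100
    extra_us_per_rule = (1 / rps - 1 / baseline_rps) / rules * 1_000_000
    return change_pct, extra_us_per_rule


# Figures taken from the table above (baseline of 60 requests/second).
for rules, rps in [(100, 57), (200, 54), (500, 48)]:
    change, extra = overhead(60, rps, rules)
    print(f"{rules:>3} rules: {change:+.0f}% throughput, "
          f"roughly {extra:.1f} microseconds per intervening rule")
```

Under that assumption each intervening rule costs somewhere under ten microseconds on the test system, which is consistent with the conclusion drawn above.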

10.1 - VMS File System Specifications

The VMS file system in mapping rules is always assumed to begin with a device or concealed device logical. Specifying the Master File Directory (MFD) component, [000000], is completely optional, although it is always implied. The mapping functions will always insert one if required for correct file system syntax. That is, if the VMS file system mapping of a path results in a file in a top-level directory, an MFD is inserted if not explicitly present in the mapping. For example, both of the following paths

  /dka100/example.txt
  /dka100/000000/example.txt

would result in a mapping to

  DKA100:[000000]EXAMPLE.TXT

The MFD is completely optional when both specifying paths in mapping rules and when supplying paths in a request. Similarly, when supplying a path that includes directory components, as in

  /dka100/dir1/dir2/example.txt
  /dka100/000000/dir1/dir2/example.txt

both mapping to

  DKA100:[DIR1.DIR2]EXAMPLE.TXT
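
The MFD handling described above can be illustrated with a short Python sketch. This is not the server's code; it is a simplified model, assuming paths that name a file (not a directory) and ignoring version numbers, invalid-character substitution and ODS-5 escaping. The function name `url_path_to_vms` is hypothetical.

```python
def url_path_to_vms(path: str) -> str:
    """Convert a URL-format path such as /device/dir/file.txt to a VMS file spec.

    Simplified model only: the first component is the device (or concealed
    device logical), the last is the file name, anything between is directory.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    parts = [p for p in path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("need at least a device and a file name")
    device, *dirs, name = parts
    # An explicit MFD is optional, so drop it if it leads the directory list ...
    if dirs and dirs[0] == "000000":
        dirs = dirs[1:]
    # ... and insert one if the file lives in the top-level directory.
    if not dirs:
        dirs = ["000000"]
    return f"{device}:[{'.'.join(dirs)}]{name}".upper()


print(url_path_to_vms("/dka100/example.txt"))
print(url_path_to_vms("/dka100/000000/dir1/dir2/example.txt"))
```

Both forms of each path (with and without the explicit MFD) produce the same specification, which is the behaviour the examples above describe.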

Implication
When using logical names in file system mappings they must be able to be used as concealed devices and cannot be logical equivalents of directory specifications.

Concealed device logicals are created using the following syntax:

  $ DEFINE LOGICAL_NAME device:[dir1.dir2.]
  $ DEFINE LOGICAL_NAME /TRANSLATION=CONCEALED physical_device:[dir1.dir2.]

For ODS-2 volumes (see 10.2 - Extended File Specifications (ODS-5) immediately below), if an RMS-invalid character (e.g. "+") or syntax (e.g. multiple periods) is encountered while mapping a path to a VMS file specification, a dollar symbol is substituted in an attempt to make it acceptable. This functionality is often useful for document collections imported to the local web from, for instance, a Unix site that uses non-RMS file system syntax. The default substitution character may be changed on a per-path basis using the SET rule, 10.3.5 - SET Rule.

10.2 - Extended File Specifications (ODS-5)

OpenVMS Alpha V7.2 introduced a new on-disk file system structure, ODS-5. This brings to VMS in general, and WASD and other Web servers in particular, a number of issues regarding the handling of characters previously not encountered during (ODS-2) file system activities. It is necessary to distinguish paths to ODS-5, extended specification volumes from the default ODS-2 ones. See 10.3.5 - SET Rule.

10.2.1 - Characters In Request Paths

There is a standard for characters used in HTTP request paths and query strings (URLs). This includes conventions for the handling of reserved characters that have specific meanings in a request (for example "?", "+", "&", "="), characters that are completely forbidden (for example white-space and control characters, 0x00 to 0x1f), and others that have usages by convention (for example the "~", commonly used to indicate a username mapping). The request can otherwise contain these characters provided they are URL-encoded (i.e. a percent symbol followed by two hexadecimal digits representing the character value).

There is also an RMS standard for handling characters in extended file specifications, some of which are forbidden in the ODS-2 file naming conventions, and others which have a reserved meaning to either the command-line interpreter (e.g. the space) or the file system structure (e.g. the ":", "[", "]" and "."). Generally the allowed but reserved characters can be used in ODS-5 file names if escaped using the "^" character. For example, the ODS-2 file name "THIS_AND_THAT.TXT" could be named "This^_^&^_That.txt" on an ODS-5 volume. More complex rules control the use of character combinations with significance to RMS, for instance multiple periods. The following file name is allowed on an ODS-5 volume, "A-GNU-zipped-TAR-archive^.tar.gz", where the non-significant period has been escaped making it acceptable to RMS.

The WASD server will accept request paths for file specifications in both formats, URL-encoded and RMS-escaped. Of course characters absolutely forbidden in request paths must still be URL-encoded, the most obvious example is the space. RMS will accept the file name "This^ and^ that.txt" (i.e. containing escaped spaces) but the request path would need to be specified as "This%20and%20that.txt", or possibly "This^%20and^%20that.txt" although the RMS escape character is basically redundant.
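
The URL-encoding side of this can be tried with Python's standard `urllib.parse` module. This is only an illustration of the encoding itself, using the file names from the paragraph above; it does not model how the server interprets RMS escapes.

```python
from urllib.parse import quote, unquote

# A file name containing spaces must have them URL-encoded in a request path.
encoded = quote("This and that.txt")
print(encoded)

# Decoding recovers the original characters.
print(unquote("This%20and%20that.txt"))

# The RMS escape character may be left in place. Passing "^" as a safe
# character stops quote() from encoding the caret itself.
print(quote("This^ and^ that.txt", safe="/^"))
print(unquote("This^%20and^%20that.txt"))
```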

Unlike for ODS-2 volumes, ODS-5 volumes do not have "invalid" characters, so no processing is performed to ensure RMS compliance.

10.2.2 - Characters In Server-Generated Paths

When the server generates a path to be returned to the browser, either in a viewable page such as a directory listing or error message, or as a part of the HTTP transaction such as a redirection, the path will contain the URL-encoded equivalent of the canonical form of an extended file specification escaped character. For example, the file name "This^_and^_that.txt" will be represented by "This%20and%20that.txt".

When presenting a file name in a viewable page the general rule is to also provide this URL-equivalent of the unescaped file name, with a small number of exceptions. The first is a directory listing where VMS format has been requested by including a version component in the request file specification. The second is similar, but uses the tree facility to display a directory tree. The third is in the navigation page of the UPDate menu. In all of these instances the canonical form of the extended file specification is presented (although any actual reference to the file is URL-encoded as described above).

10.3 - Rules

There are nine mapping rules.

Rules that map paths to the file system, and to other paths:
- MAP
- PASS
- FAIL
- REDIRECT
- USER
Rules that provide access to scripting:
- EXEC
- SCRIPT
- UXEC
A rule that sets characteristics against particular paths:
- SET

10.3.1 - MAP, PASS, FAIL Rules

map template result
If the URL path matches the template, substitute the result string for the path and use that for further rule processing. Both template and result paths must be absolute (i.e. begin with "/").
pass template
pass template result
pass template "999 message text"
If the URL path matches the template, substitute the result if present (if not just use the original URL path), processing no further rules.
The result should be either a physical VMS file system specification in URL format or an HTTP status-code message (see below). If there is a direct correspondence between the template and result the result may be omitted.
The PASS directive is also used to reverse-map VMS file specifications to the URL path format equivalent.
An HTTP status-code message can be provided as a result. The server then generates a response corresponding to that status code containing the supplied message. Status-code results should be enclosed in one of single or double quotes, or curly braces. See examples. A 3nn status results in a redirection response with the message text comprising the location. Codes 4nn and 5nn result in an error message. Other code ranges (e.g. 0, 1nn, 2nn, etc.) simply cause the connection to be immediately dropped, and can be used for that purpose (i.e. with no indication of why!).
fail template
If the URL path matches the template, prohibit access, processing no further rules. The template path must be absolute (i.e. begin with "/").

10.3.2 - REDIRECT Rule

redirect template result
If the URL path matches the template, substitute the result string for the path. Process no further rules. Redirection rules can provide result URLs in five formats, each with a slightly different behaviour.
1. The result can be a full URL ("http://host.domain/path/to/whatever"). This is used to redirect requests to a specific service, usually on another host.
2. If the scheme (e.g. "http:") is omitted the scheme of the current request is substituted. This allows HTTP requests to be transparently redirected via HTTP and HTTPS (SSL) requests via HTTPS (e.g. "//host.domain/path/to/whatever", note the leading double-slash).
3. In a similar fashion both the scheme and the host name may be omitted (e.g. "///path/to/whatever", note the leading triple-slash). The server then substitutes the appropriate request scheme and host name before returning the redirection to the client.
4. If the scheme is provided but no host component the current request's host information is substituted and the redirection made using that (e.g. "https:///secure/path/to/whatever"). This effectively allows a request to be redirected from standard to SSL, or from SSL to standard HTTP on the same server.
5. Alternatively, it may be just a path ("/path/to/whatever", a single leading slash), which will cause the server to internally generate an entire new request structure to process the new path (i.e. request redirection is not returned to the client).
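
The different result formats can be told apart by their leading characters. The following Python sketch shows one way to model that. It is not the server's implementation; the function name `redirect_location` and the "external"/"internal" labels are hypothetical, and the `host` argument is assumed to already contain any port that is needed.

```python
def redirect_location(result: str, scheme: str, host: str):
    """Classify a REDIRECT result and fill in omitted scheme/host components.

    Returns ("external", url) for a redirection returned to the client, or
    ("internal", path) when the server would reprocess the path itself.
    """
    if result.startswith("///"):
        # scheme and host omitted: use those of the current request
        return ("external", f"{scheme}://{host}{result[2:]}")
    if result.startswith("//"):
        # scheme omitted: use that of the current request
        return ("external", f"{scheme}:{result}")
    if result.startswith("/"):
        # just a path: handled internally, nothing is returned to the client
        return ("internal", result)
    given_scheme, separator, rest = result.partition(":///")
    if separator:
        # scheme given but host omitted: use the current request's host
        return ("external", f"{given_scheme}://{host}/{rest}")
    # a full URL is used as-is
    return ("external", result)


# An HTTP request to www.example.com sent to the SSL service on the same host.
print(redirect_location("https:///secure/path/to/whatever", "http", "www.example.com"))
```

The order of the tests matters: the triple-slash form must be checked before the double-slash form, and both before the single-slash form.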

10.3.3 - USER Rule

The USER rule maps a VMS user account default device and directory (i.e. home directory) into a request path. That is, the base location for the request is obtained from the VMS system's SYSUAF file. This is usually invoked by a request path in the form "/~username/", see 10.8 - Mapping User Directories (tilde character ("~")) for more detailed information.

user template result
If the path matches the template then the result is substituted, with the following conditions. At least one wildcard must be present. The first wildcard in the result substitutes the username's home directory into the path (in place of the "~username"). Any subsequent wildcard(s) substitute corresponding part(s) of the original path.
If the user DANIEL's default device and directory were
```
  USER$DISK:[DANIEL]
```
the following rule
```
  user /~*/* /*/www/*
```
would result in the following path being mapped and used
```
  /user$disk/daniel/www/
```

NOTE
Accounts that possess SYSPRV, are CAPTIVE, have been DISUSERED or that have expired passwords will not be mapped. A "directory not found" error report is returned.

10.3.4 - EXEC/UXEC and SCRIPT, Script Mapping Rules

Also see "Scripting Environment" document for further information.

The EXEC/UXEC and SCRIPT directives have the variants EXEC+/UXEC+ and SCRIPT+. These behave in exactly the same fashion and simply mark the rule as representing a CGIplus script environment.

The EXEC/UXEC rules map script directories.

The SCRIPT rule maps script file names. It behaves a little differently to the EXEC rule, essentially supplying in a single rule the effect of a MAP then an EXEC rule.

Both rules must have a template and result, and both must end in a wildcard asterisk. The placement of the wildcards and the subsequent functionality is slightly different however. Both template and result paths must be absolute (i.e. begin with "/").

exec template result
The EXEC rule requires the template's asterisk to immediately follow the slash terminating the directory specification containing the scripts. The script name follows immediately as part of the wildcard-matched string. For example:
```
  exec /htbin/* /ht_root/script/*
```
If the URL path matches the template, the result, including the first slash-terminated part of the wildcard-matched section, becomes the URL format physical VMS file specification of the script to be executed. What remains of the original URL path is used to create the path information. Process no further rules.
Hence, the EXEC rule will match multiple script specifications without further rules, the script name being supplied with the URL path. This means any script (i.e. procedure, executable) in the specified directory is accessible, a possible security concern if script management is distributed.
exec template (run-time-environment)result
A variation on the "exec" rules allows a Run-Time Environment (RTE) to be mapped. An RTE is a persistent scripting environment not unlike CGIplus. The essential difference is that an RTE provides an environment in which a variety of scripts can be run. It is often an interpreter, such as Perl, where the advantages of persistence (reduced response latency and system impact) are available. For more information on RTEs and how they operate see the WASD Scripting Environment document.
The RTE executable is specified in parentheses prefixed to the mapping result, as shown in this example:
```
  exec /pl-bin/* (cgi-bin:[000000]perlrte.exe)/ht_root/src/perl/*
```
script template result
The SCRIPT rule requires the template's asterisk to immediately follow the unique string identifying the script in the URL path. The wildcard-matched string is the following path, and supplied to the script. For example:
```
  script /conan* /ht_root/script/conan*
```
If the URL path matches the template, the result becomes the URL format physical VMS file specification for the DCL procedure of the script to be executed (the default file extension of ".COM" is not required). What remains of the original URL path is used to create the path information. Process no further rules.
NOTE
The wildcard asterisk is best located immediately after the unique script identifier. In this way there does not need to be any path supplied with the script. If even a slash follows the script identifier it may be mapped into a file specification that may or may not be meaningful to the script.

Hence, the SCRIPT rule will match only the script specified in the result, making for finely-granular scripting at the expense of a rule for each script thus specified. It also implies that only the script name need precede any other path information.
It may be thought of as a more efficient implementation of the equivalent functionality using two CERN rules, as illustrated in the following example:
```
  map /conan* /script/conan*
  exec /cgi-bin/* /cgi-bin/*
```
uxec template result
The UXEC rule is an analog to the EXEC rule, except it is used to map user scripts. It requires two mapping asterisks, the first for the username, the second for the script name. It must be used in conjunction with a SET script=as=~ rule. For example:
```
  SET   /~*/www/cgi-bin/*  script=as=~
  UXEC  /~*/cgi-bin/*  /*/www/cgi-bin/*
```
For further information see User Account Scripting and the "Scripting Overview, Introduction".

Script Location

It is conventional to locate script images in HT_ROOT:[AXP] or HT_ROOT:[VAX] (depending on the platform), and procedures, etc. in HT_ROOT:[SCRIPT] or HT_ROOT:[SCRIPT_LOCAL]. These multiple directories are accessible via the single search list logical CGI-BIN.

Script files can be located elsewhere, either subdirectories of HT_ROOT:[SCRIPT_LOCAL] or in an area completely outside of the HT_ROOT tree. Two approaches are available.

Modify the search list CGI-BIN to include the additional directories.
Use mapping rules to make the script accessible. This can be done by using the EXEC or SCRIPT rule to specify the directory directly, as in these examples
```
  exec /mycgi-bin/* /site_local_scripts/bin/*
  script /myscript* /web/myscripts/bin/myscript.exe*
```
or by using the MAP rule to make a hierarchy of script locations obvious and accessible, as in this example
```
  map /cgi-bin/myscripts/* /cgi-bin_myscripts/* 
  exec /cgi-bin_myscripts/* /web/myscripts/bin/*
```

10.3.5 - SET Rule

The SET rule does not change the mapping of a path, it just sets one or more characteristics against that path that affect the subsequent processing in some way. It is a general purpose rule that conveniently allows the administrator to tell the server to process requests with particular paths in some ad hoc and generally useful fashion. Most SET parameters are single keywords that act as boolean switches on the request, some require parameter strings. Multiple space-separated parameters may be set against the one path in a single SET statement.

AUTHONCE, NOAUTHONCE - If a request path contains both a script component and a resource component, by default the WASD server makes sure both parts are authorized before allowing access (see 12 - Authentication and Authorization). This can be disabled using the "authONCE" path setting. When this is done only the original request path undergoes authorization.
CACHE, NOCACHE - The default is to cache files (when caching is enabled). NOCACHE sets files in this path as not to be stored in the file cache. A subsequent CACHE will again permit caching, allowing smaller path hierarchies to be cached within larger ones that are not.
CGIPREFIX - CGI environment variable names are by default prefixed with "WWW_". This may be changed on a per-path basis using this SET rule. To remove the prefix altogether for selected scripts use "CGIprefix=".
CHARSET - This setting allows overriding of the server default ([CharsetDefault] configuration parameter) content-type character set (in the response header) for text files (plain and HTML). A string is required as in the following example, "charset=ISO-8859-5".
CONTENT - The content-type of a file is normally determined by the file's type (extension). This setting allows files matching the template to be returned with the specified content-type. The content-type must be specified as a parameter, e.g. "content=application/binary".
EXPIRED, NOEXPIRED - This setting allows files in the specified paths to be sent pre-expired. The browser should always then reload them whenever accessed.
INDEX - This setting provides the "Index of" (directory listing) format string for directory paths matching the template. It uses the same formatting as can be supplied with a URL and overrides any query string passed via any URL.
LOG, NOLOG - When server access logging is enabled the default is to log all requests. The NOLOG setting suppresses logging for requests involving the specified path template.
MAPONCE, NOMAPONCE - Normally, when a script has been identified during mapping, the resultant path information is also mapped in a second pass. This can be suppressed by SETting the path as MAPONCE. The resultant path is then given to the script without further processing.
PROFILE, NOPROFILE - When using the server /PROFILE qualifier enable or disable the authentication profile when assessing access for a specific path.
REPORT - This setting allows error and other server-generated reports for any specified path to be changed between detailed and basic (see 6.4.1 - Basic and Detailed). Use "report=BASIC" and "report=DETAILED".
ODS-5, ODS-2 - The ODS-5 setting is used to indicate that a particular path maps to files on an ODS-5 volume and so the names may comply with extended specifications. This changes the way file names are processed, including for example the replacement of invalid RMS characters (see immediately below). The ODS-2 setting is basically redundant, because if a path is not indicated as being ODS-5 it is assumed to be ODS-2. It can be used for clarity in the mapping rules if required.
RMSCHAR - This setting applies to ODS-2 paths (the default) only. Paths SET as ODS-5 do not have this applied. During rule mapping of a path to a VMS file specification, if an RMS-invalid character (e.g. "+") or syntax (e.g. multiple periods) is encountered a dollar symbol is substituted in an attempt to make it acceptable. This setting provides an alternate substitution character. Any general RMS-valid character may be specified (e.g. alpha-numeric, '$', '-' or '_', although the latter three are probably the only REAL choices). A single character is required as in the following example, "RMSchar=_".
SCRIPT=AS= - For non-server account scripting this rule allows the user account to be either explicitly specified or substituted through the use of the tilde character "~" or the dollar "$". For further detail see the "Scripting Overview, Introduction".
SCRIPT=[FIND|NOFIND] - By default the server always confirms the existence and accessibility of a script file by searching for it before attempting to activate it. If it does not exist it reports an error. A Run-Time Environment (RTE) may need to access its own script file via a mechanism available only to itself. The server script search may be disabled by SETting the path as nofind, for example "script=nofind". The script path and filename are then passed directly to the RTE for it to process and activate.
SSI - SSI documents cannot contain privileged directives (e.g. <!--#exec ... -->) unless owned by SYSTEM ([1,4]) or located in a path set as allowing these directives. Use SSI=priv to enable this, SSI=nopriv to disable. Caution: these SSI directives are quite powerful, use great care when allowing any particular document author or authors to use them.
SSLCGI=string - Enables and sets the type of CGI variables used to represent a Secure Sockets Layer (SSL) connection.
- "SSLCGI=none" disables the facility
- "SSLCGI=Apache_mod_SSL" provides Apache mod_ssl style variables
- "SSLCGI=Purveyor" provides Purveyor style variables
When enabling these variables it is advised to increase the HTTPD$CONFIG [BufferSizeDclCommand] and [BufferSizeCgiPlusIn] directives by approximately 2048.
STMLF, NOSTMLF - Specify files to be automatically converted to Stream-LF format. The default is to ignore conversion. STMLF allows selected paths to be converted. See File Record Format.

Of course, as with all mapping rules, paths containing file types (extensions) may be specified so it is quite easy to apply settings to particular groups of files. Multiple settings may be made against the one path, merely separate set directives from each other with white-space. If a setting string is required to contain white-space enclose the string with single or double quotes, or curly brackets. The following example gives a small selection of potential uses.

  # examples of SET rule usage
  # --------------------------
  # disable caching for selected paths
  set /ht_root/src/* NOcache
  set /sys$common/* NOcache
  # enable stream-LF conversion in selected directory trees
  set /web/* stmlf
  set /ht_root/* stmlf
  # respond with Cyrillic character set(s) from relevant directories
  set /*/8859-5/* charset=ISO-8859-5
  set /*/koi8-r/* charset=KOI8-R
  # the Sun Java tutorial when UNZIPped contains underscores for invalid characters
  set /vms/java/tutorial/* RMSchar=_
  # if a request has "/plain-text/" in its path then ALWAYS return as plain-text!
  set /*/plain-text/* content=text/plain
  map /*/plain-text/* /*/*
  # same for "/binary/"
  set /*/binary/* content=application/binary
  map /*/binary/* /*/*
  # indicate extended file specifications on this path
  set /Documents/* ODS-5
  pass /Documents/* /ods5_device/Documents/*
  # disable server script search for this RTE
  set /onerte/*  script=nofind
  exec /onerte/* (CGI-BIN:[000000]ONERTE.EXE)/ht_root/src/one/*

10.4 - Rule Interpretation

The rules are scanned from first towards last, until a matching PASS, EXEC, SCRIPT, FAIL, REDIRECT, UXEC or USER rule is encountered, when processing ceases and final substitution occurs. MAP rules substitute the template with the result and continue to the next rule.

Use of wildcards in template and result:

The template may contain one or more asterisk ("*") wildcard symbols. These match zero or more characters up until the character following the wildcard (or end-of-string). If no wildcard is present then the path must match the template exactly.
The result may contain one or more asterisk ("*") wildcard symbols. It must not contain more wildcards than the template. The result wildcards are expanded to replace the matching characters of the respective template wildcards. Characters represented by wildcards in the template but not represented by a corresponding wildcard in the result are ignored. Non-wildcard result characters are directly inserted in the reconstructed path. Non-wildcard characters in the template are ignored. If the result contains no wildcards it completely replaces the URL path.
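
The template and result wildcard behaviour can be modelled in a few lines of Python. This is a sketch of the description above, not the server's matching code. It assumes each wildcard matches as few characters as possible and ignores any case-insensitivity the server may apply. The function names `match_template` and `substitute` are hypothetical.

```python
import re


def match_template(template: str, path: str):
    """Return the strings matched by each template wildcard, or None if no match."""
    pattern = "".join("(.*?)" if ch == "*" else re.escape(ch) for ch in template)
    matched = re.fullmatch(pattern, path)
    return None if matched is None else list(matched.groups())


def substitute(result: str, captured: list) -> str:
    """Expand the result wildcards, in order, with the template wildcard matches."""
    if result.count("*") > len(captured):
        raise ValueError("result must not contain more wildcards than the template")
    pieces = result.split("*")
    rebuilt = pieces[0]
    # Matches without a corresponding result wildcard are simply never used.
    for index, piece in enumerate(pieces[1:]):
        rebuilt += captured[index] + piece
    return rebuilt


captured = match_template("/web/unix/*", "/web/unix/shells/c")
print(substitute("/web/software/unix/*", captured))
```

A template with no wildcard only matches the identical path, and a result with no wildcard is returned unchanged, replacing the URL path completely.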

Virtual Servers

As described in 6.3 - Virtual Services, virtual service syntax may be used with mapping rules to selectively apply rules to one specific service. If virtual services are configured, rule interpretation sees only rules common to all services and those specific to its own service (host address and port). In all other aspects rule interpretation applies as described above.

10.5 - Mapping Examples

The example mapping rule file for the WASD HTTP server can be viewed.

Example of Map Rule

The result string of these rules may or may not correspond to a VMS physical file system path. Either way the resulting path is further processed before passing or failing.

The following example shows a path "/web/unix/shells/c" being mapped to "/web/software/unix/shells/c", with this being used to process further rules.
```
  map /web/unix/* /web/software/unix/*
```

Examples of Pass Rule

The result string of these rules should correspond to a VMS physical file path.

This example shows a path "/web/rts/home.html" being mapped to "/user$rts/web/home.html", and this returned as the mapped path.
```
  pass /web/rts/* /user$rts/web/*
```
This maps a path "/icon/bhts/dir.gif" to "/web/icon/bhts/dir.gif", and this returned as the mapped path.
```
  pass /icon/bhts/* /web/icon/bhts/*
```
This example illustrates HTTP status code mapping. Each of these does basically the same thing, just using one of the three possible delimiters according to the characters required in the message. The server generates a 403 response which has the following message as its text. (Also see the conditional mapping examples.)
```
  pass /private/* "403 Can't go in there!"
  pass /private/* '403 "/private/" is off-limits!'
  pass /private/* {403 Can't go into "/private/"}
```

Examples of Fail Rule

If a URL path "/web/private/home.html" is being mapped the path would immediately be failed.
```
  fail /web/private/*
```
To ensure all access fails, other than that explicitly passed, this entry should be included in the rules.
```
  fail /*
```

Examples of Exec and Script Rules

If a URL path "/htbin/ismap/web/example.conf" is being mapped the "/ht_root/script/" must be the URL format equivalent of the physical VMS specification for the directory locating the script DCL procedure. The "/web/example.conf" that followed the "/htbin/ismap" in the original URL becomes the translated path for the script.
```
  exec /htbin/* /ht_root/script/*
```
If a URL path "/pl-bin/example/this/directory/and-file.txt" is being mapped the script name and filename become "/pl-bin/example" and "HT_ROOT:[SRC.PERL]EXAMPLE.PL" respectively, the path information and translated path become "/this/directory/and-file.txt" and "THIS:[DIRECTORY]AND-FILE.TXT", and the interpreter (run-time environment) activated to interpret the script is CGI-BIN:[000000]PERLRTE.EXE.
```
  exec /pl-bin/* (cgi-bin:[000000]perlrte.exe)/ht_root/src/perl/*
```
If a URL path "/conan/web/example.hlb" is being mapped the "/ht_root/script/conan" must be the URL format equivalent of the physical VMS specification for the DCL procedure. The "/web/example.hlb" that followed the "/conan" in the original URL becomes the translated path for the script.
```
  script /conan* /ht_root/script/conan*
```
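
The way EXEC and SCRIPT rules divide a request path into a script and its path information can be sketched in Python. This is a simplified model of the descriptions in 10.3.4, not the server's code. It handles only the plain form of each rule (no run-time environment, no CGIplus variants), and the function names `split_exec` and `split_script` are hypothetical.

```python
def split_exec(template: str, result: str, url_path: str):
    """EXEC: the first slash-terminated part of the wildcard match names the script."""
    if not (template.endswith("/*") and result.endswith("*")):
        raise ValueError("template and result must both end in a wildcard")
    prefix = template[:-1]                       # e.g. "/htbin/"
    if not url_path.startswith(prefix):
        return None
    name, slash, rest = url_path[len(prefix):].partition("/")
    script_file = result[:-1] + name             # URL-format script specification
    return script_file, slash + rest             # (script, path information)


def split_script(template: str, result: str, url_path: str):
    """SCRIPT: the rule itself names the script, the wildcard match is the path."""
    if not (template.endswith("*") and result.endswith("*")):
        raise ValueError("template and result must both end in a wildcard")
    prefix = template[:-1]                       # e.g. "/conan"
    if not url_path.startswith(prefix):
        return None
    return result[:-1], url_path[len(prefix):]   # (script, path information)


print(split_exec("/htbin/*", "/ht_root/script/*", "/htbin/ismap/web/example.conf"))
print(split_script("/conan*", "/ht_root/script/conan*", "/conan/web/example.hlb"))
```

In both cases the second element is what remains of the original URL path, which the server then maps independently to produce the translated path for the script.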

Example of Redirect Rule

If a URL path "/AnotherGroup/this/that/other.html" is being mapped the URL would be redirected to "http://host/group/this/that/other.html".
```
  redirect /AnotherGroup/* http://host/group/*
```

10.6 - Virtual Servers

As described in 6.3 - Virtual Services, virtual service syntax may be used with mapping rules to selectively apply rules to one specific service. This example provides the essentials of using this syntax. Note that service-specific and service-common rules may be mixed in any order allowing common mappings (e.g. for scripting) to be shared.

  # a mapping rule example of virtual servers
  [[alpha.wasd.dsto.defence.gov.au:80]]
  # ALPHA is the only service allowing access to VMS help directory
  pass /sys$common/syshlp/*
  [[beta.wasd.dsto.defence.gov.au:80]]
  # good stuff is only available from BETA
  pass /good-stuff/*
  # BETA has its own error report format, the others share one
  pass /errorreport /ht_root/local/errorreportalpha.shtml
  [[gamma.wasd.dsto.defence.gov.au:80]]
  # gamma responds with documents using the Cyrillic character set
  set /* charset=ISO-8859-5
  [[*]]
  # common file and script mappings
  exec /cgi-bin/* /cgi-bin/*
  exec+ /cgiplus-bin/* /cgi-bin/*
  script+ /help/* /cgiplus-bin/conan/*
  pass /errorreport /ht_root/local/errorreport.shtml  
  # now the base directories for all documents
  [[alpha.wasd.dsto.defence.gov.au:80]]
  pass /* /web/alpha/*
  [[beta.wasd.dsto.defence.gov.au:80]]
  pass /* /web/beta/*
  [[gamma.wasd.dsto.defence.gov.au:80]]
  pass /* /web/gamma/*
  [[*]]
  # catch-all rule (just in case :^)
  pass /* /web/*

The server administration menu WATCH report (see 15.3 - HTTPd Server Reports) provides the capability to view the rule database, as well as rule mapping during actual request processing.
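
How the virtual service sections select rules can be sketched in Python. This is an illustration under stated assumptions, not the server's parser: the function `rules_for_service` is hypothetical, a "[[host:port]]" line is assumed to apply to every rule that follows it until the next "[[...]]" line, and rules under "[[*]]" (or before any section line) are assumed to apply to all services.

```python
def rules_for_service(lines, service):
    """Return, in file order, the rules that apply to the given "host:port"."""
    current = "*"          # assumed: rules before any [[...]] line are common
    selected = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue       # skip blank lines and comments
        if line.startswith("[[") and line.endswith("]]"):
            current = line[2:-2].lower()   # start of a new service section
            continue
        if current in ("*", service.lower()):
            selected.append(line)
    return selected


rules = """
[[alpha.wasd.dsto.defence.gov.au:80]]
pass /sys$common/syshlp/*
[[beta.wasd.dsto.defence.gov.au:80]]
pass /good-stuff/*
[[*]]
exec /cgi-bin/* /cgi-bin/*
[[alpha.wasd.dsto.defence.gov.au:80]]
pass /* /web/alpha/*
[[*]]
pass /* /web/*
""".splitlines()

for rule in rules_for_service(rules, "alpha.wasd.dsto.defence.gov.au:80"):
    print(rule)
```

Because the selected rules keep their file order, the service-specific base directory rule is still seen before the catch-all rule.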

10.7 - Conditional Mapping

The purpose of conditional mapping is to apply rules only after certain criteria other than the initial path match are met. These criteria serve to create conditional mapping rules, and were introduced in version 4.4.

THIS OFFERS A POWERFUL TOOL TO THE SERVER ADMINISTRATOR!

Conditional mapping can be applied on the following criteria:

authenticated remote user
client internet address
browser-accepted languages
browser-accepted character sets
browser-accepted content-types
browser identification string
cookie data
host and port specified in request header
HTTP method (GET, POST, etc.)
proxy/gateway host(s) request forwarded by
referring page
request scheme (protocol ... "http:" or "https:")
query string
server name
server port

Conditionals must follow the rule and are delimited by "[" and "]". Multiple, space-separated conditions may be included within one "[...]". These behave as a logical OR (i.e. only one of the conditions needs to match for the conditional to be true). Multiple "[...]" conditionals may be included against a rule. These act as a logical AND (i.e. every conditional must have at least one condition matched). A "!" at the start of a condition acts as a negation operator (i.e. matched strings result in a false condition, unmatched strings in a true condition). The result of an entire conditional may also be negated by prefixing the "[" with a "!".

If a conditional, or set of conditionals, is not met the rule is completely ignored.

Matching is done by simple, case-insensitive, string comparison, using the wildcards "*", matching one or more characters, and "%", matching any single character.
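
The evaluation described above can be sketched in Python. This is an illustrative model rather than the server's implementation: the function names are hypothetical, the request is represented as a dictionary keyed by the conditional keywords (an assumption made for the example), and "*" is treated as matching one or more characters, following the description in the text.

```python
import re


def wild_match(pattern: str, value: str) -> bool:
    """Case-insensitive match; "*" is one or more characters, "%" exactly one."""
    regex = "".join(
        ".+" if ch == "*" else "." if ch == "%" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def condition_met(condition: str, request: dict) -> bool:
    """Evaluate one condition such as "ho:*.fred.com" or "!ho:you.fred.com"."""
    negated = condition.startswith("!")
    keyword, _, pattern = condition.lstrip("!").partition(":")
    matched = wild_match(pattern, request.get(keyword.lower(), ""))
    return not matched if negated else matched


def rule_applies(conditionals: str, request: dict) -> bool:
    """Evaluate every "[...]" group: OR within a group, AND across groups."""
    for bang, body in re.findall(r"(!?)\[([^\]]*)\]", conditionals):
        group = any(condition_met(c, request) for c in body.split())
        if bang:
            group = not group      # "![...]" negates the whole conditional
        if not group:
            return False           # conditional not met: the rule is ignored
    return True


conditionals = "[ho:*.fred.com ho:*.george.com] [!ho:you.fred.com]"
print(rule_applies(conditionals, {"ho": "me.fred.com"}))   # True
print(rule_applies(conditionals, {"ho": "you.fred.com"}))  # False
```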

White-space (spaces and TABs), wildcards and the delimiting "[" and "]", are forbidden characters and cannot be used within condition matching strings, nor can they be encoded for inclusion in any way (for simplicity and speed of processing). These characters are uncommon in the information being matched against, but if one does occur then "match" it using a single character wildcard ("%").

While conditionals are powerful adjuncts to smart serving they do add significant overhead to rule mapping and should be used with this in mind.

Conditionals

ac: - browser-accepted content types ("Accept:" request header field)
al: - browser-accepted languages ("Accept-Language:" request header field)
as: - browser-accepted character sets ("Accept-Charset:" request header field)
ck: - cookie data ("Cookie:" request header field)
fo: - request forwarded by proxy/gateway host(s) ("Forwarded:" request header field)
ho: - browser host internet name or address
hm: - browser host internet address compared with a dotted-decimal network and mask specification
me: - request HTTP method
qs: - query string
rf: - referring page ("Referer:" request header field)
ru: - authenticated remote user name
sc: - request scheme (protocol), "http", and if SSL is in use "https" (see 14 - Secure Sockets Layer)
sn: - server name
sp: - server port
ua: - browser ("User-Agent:" request header field)
vs: - virtual host and port request directed to ("Host:" request header field)

Examples

NOTE
It is possible to spoof (impersonate) internet host addresses. Therefore any controls applied using host name/address information cannot be used for authorization purposes in the strictest sense of the term.

The following example shows a rule being applied only if the client host is within a particular subnet. This is being used to provide a "private" home page to those in the subnet while others get a "public" page by the second rule.
```
  pass / /web/internal/ [ho:131.185.250.*]
  pass / /web/
```
This is a similar example to the above, but showing multiple host specifications and specifically excluding one particular host using the negation operator "!". This could be read as pass if ((host OR host) AND (not host)).
```
  pass / /web/internal/ [ho:*.fred.com ho:*.george.com] [!ho:you.fred.com]
  pass / /web/
```
The next example shows how to prevent browsing of a particular tree except from specified host addresses.
```
  pass /web/internal/* /web/SorryNoAccess.html [!ho:131.185.250.*]
  pass /web/internal/*
```
This could be used to prevent browsing of the server configuration files (an alternative to this sort of approach is to use the authorization file, see 12 - Authentication and Authorization).
```
  pass /ht_root/local/* /web/SorryNoAccess.html [!ho:131.185.250.201]
```
This example performs much the same task as the previous one, but uses whole conditional negation to prevent browsing of a particular tree except from specified addresses (as well as using the continuation character to provide a more easily comprehended layout ... note the trailing spaces as required). This could be read as pass if not (host OR host OR host).
```
  pass /web/internal/* /web/SorryNoAccess.html \
  ![\
  ho:131.185.250.* \
  ho:131.185.251.* \
  ho:131.185.45.1 \
  ho:ws2.wasd.dsto.gov.au\
  ]
  pass /web/internal/*
```

This example demonstrates mapping pages according to geography or language preference (it's a bit contrived, but ...)

  pass /doc/* /web/doc/french/* [ho:*.fr al:fr]
  pass /doc/* /web/doc/swedish/* [ho:*.se al:sv]
  pass /doc/* /web/doc/english/*

How to exclude specific browsers from your site (how many times have we seen this!)

  # I had to pick on a well-known acronym, no offence Bill!
  pass /* /web/NoThankYou.html [ua:*MSIE*]

This example rejects requests using a particular method unless they come from specified addresses. This could be read as pass if ((method is POST) AND (not host)).
```
  pass /* /web/NotAllowed.html [me:POST] [!ho:*.my.net]
```
The following illustrates using the server name and/or server port to conditionally map servers executing on clustered nodes using the same configuration file, or for multi-homed/multi-ported hosts. Distinct home pages are maintained for each system, and on BETA two servers execute, one on port 8000 that may only be used by those within the specified network address range.
```
  pass / /web/welcome_to_Alpha.html [sn:alpha.*]
  pass / /web/welcome_to_Beta.html [sn:beta.*] [sp:80]
  pass /* /sorry_no_access.html [sn:beta.*] [sp:8000] [!ho:*.my.sub.net]
  pass / /web/welcome_to_Beta_private.html [sn:beta.*] [sp:8000]
```
Each of these three does basically the same thing, just using the three possible delimiters according to the characters required in the message. The server generates a 403 response which has as its text the message following the status code.
```
  pass /private/* "403 Can't go in there!" [!ho:my.host.name]
  pass /private/* '403 "/private/" is off-limits!' [!ho:my.host.name]
  pass /private/* {403 Can't go into "/private/"} [!ho:my.host.name]
```
This example illustrates the use of a host network mask, the "HM:" conditional.
```
  pass /private/* "403 Can't go in there!" [!hm:131.185.250.128/255.255.255.192]
```
The mask is a dotted-decimal network address, a slash, then a dotted-decimal mask. This example shows a subnet with 6 host bits (a 26 bit network mask). Network mask conditionals operate by bitwise-ANDing the client host address with the mask, bitwise-ANDing the network address supplied with the mask, then comparing the two results for equality. Using the above example the host 131.185.250.150 would be accepted (150 AND 192 is 128, matching the network address), but 131.185.250.50 would be rejected (50 AND 192 is 0).
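
The mask comparison can be sketched in Python using the standard `ipaddress` module. The function name `host_in_network` is hypothetical and the host addresses are chosen purely for illustration; the sketch mirrors the steps described above and is not the server's code.

```python
import ipaddress


def host_in_network(host: str, spec: str) -> bool:
    """Apply an "address/mask" specification to a dotted-decimal host address."""
    network_text, mask_text = spec.split("/")
    host_bits = int(ipaddress.IPv4Address(host))
    network_bits = int(ipaddress.IPv4Address(network_text))
    mask_bits = int(ipaddress.IPv4Address(mask_text))
    # AND both addresses with the mask, then compare the two results.
    return (host_bits & mask_bits) == (network_bits & mask_bits)


spec = "131.185.250.128/255.255.255.192"
print(host_in_network("131.185.250.150", spec))  # True
print(host_in_network("131.185.250.50", spec))   # False
```

With the negated "[!hm:...]" conditional in the rule above, a host for which this comparison is true does not receive the 403 response.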

Note that rule processing for any particular path may be checked using the WATCH facility from the server administration menu. See 16 - WATCH Facility for details.

10.8 - Mapping User Directories (tilde character ("~"))

The convention for specifying user web areas is "/~username/". The basic idea is that the user's web-available file-space is mapped into the request in place of the tilde and username.

10.8.1 - Using The SYSUAF

The USER rule maps a VMS user account default device and directory (i.e. home directory) into a request path (see 10.3.3 - USER Rule). That is, the base location for the request is obtained from the VMS system's SYSUAF file. A user's home directory information is cached to reduce load on the authorization databases. As this information is usually quite static there is no timeout period on such information (although it may be flushed to make room for other users'). Cache contents are included in the Mapping Rules Report (see 15.3 - HTTPd Server Reports) and are implicitly flushed when the server's rules are reloaded (see 15.5 - HTTPd Server Action).

The following is a typical usage of the rule.

  USER  /~*/*  /*/www/*

Note the "/www" subdirectory component. It is strongly recommended that users never be mapped into their top-level directory, but into a web-specific subdirectory. This effectively "sandboxes" Web access to that subdirectory hierarchy, allowing the user privacy elsewhere in the home area.
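
The effect of the USER rule shown above can be sketched in Python. The sketch is illustrative only: the `HOME_DIRECTORIES` table stands in for the SYSUAF lookup, its single entry is hypothetical, and the home directory is written as a URL-format path to keep the example short.

```python
# Hypothetical stand-in for the SYSUAF: username -> home directory
# expressed as a URL-format path.
HOME_DIRECTORIES = {
    "daniel": "/user$disk/daniel",
}


def map_user_path(url_path: str):
    """Map "/~username/rest" in the manner of "USER /~*/* /*/www/*"."""
    if not url_path.startswith("/~"):
        return None
    username, slash, rest = url_path[2:].partition("/")
    if not slash:
        return None                    # no delimiter after the username
    home = HOME_DIRECTORIES.get(username.lower())
    if home is None:
        return None                    # no such account: nothing is mapped
    # The "/www" component confines access to the web-specific subdirectory.
    return f"{home}/www/{rest}"


print(map_user_path("/~daniel/index.html"))  # /user$disk/daniel/www/index.html
print(map_user_path("/~nobody/index.html"))  # None
```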

To accommodate request user paths that do not incorporate a trailing delimiter after the username, the following redirect may be used to cause the browser to re-request with a more appropriate path (make sure it follows the USER rule).

  REDIRECT  /~*  ///~*/

WASD also "reverse maps" VMS specifications into paths and so requires additional rules to provide these mappings. (Reverse mapping is required during directory listings and error reporting.) For the continuing example the following rules would be required (and in the stated order).

  USER  /~*/*  /*/www/*
  REDIRECT  /~*  ///~*/
  PASS  /~*/*  /user$disk/*/www/*

Where user home directories are spread over multiple devices (physical or concealed logical) a reverse-mapping rule would be required for each. Consider the following situation, where user directories are distributed across these devices (concealed logicals).

  USER$GROUP1:
  USER$GROUP2:
  USER$GROUP3:
  USER$OTHER:

This would require the following mapping rules (in the stated order).

  USER  /~*/*  /*/www/*
  PASS  /~*/*  /user$group1/*/www/*
  PASS  /~*/*  /user$group2/*/www/*
  PASS  /~*/*  /user$group3/*/www/*
  PASS  /~*/*  /user$other/*/www/*

Accounts with a search list as a default device (e.g. SYS$SYSROOT) present particular complications in this schema and should be avoided.

NOTE
Accounts that possess SYSPRV, are CAPTIVE, have been DISUSERED or that have expired passwords will not be mapped. A "directory not found" error report is returned. This error was chosen to make it more difficult to probe the authorization environment to determine whether accounts exist or not.

Of course, vanilla mapping rules may be used to provide for special cases. For instance, if there is a requirement for a particular, privileged account to have a user mapping, that could be provided as in the following (rather exaggerated) example.

  PASS  /~system/*  /sys$common/sysmgr/www/*
  USER  /~*/*  /*/www/*
  PASS  /~*/*  /user$disk/*/www/*

User Account Scripting

In some situations it may be desirable to allow the average Web user to experiment with or implement scripts. With WASD 7.1 and later, and VMS V6.2 and later, this is possible. Detached scripting must be enabled, the /PERSONA startup qualifier used, and appropriate mapping rules in place. If the SET "script=as=" mapping rule specifies a tilde character then for a user request the mapped SYSUAF username is substituted.

The following example shows the essentials of setting up a user environment where access is provided to a subdirectory in the user's home directory, [.WWW], with scripts located in a subdirectory of that, [.WWW.CGI-BIN].

  SET   /~*/www/cgi-bin/*  script=as=~
  UXEC  /~*/cgi-bin/*  /*/www/cgi-bin/*
  USER  /~*/*  /*/www/*
  REDIRECT  /~*  /~*/
  PASS  /~*/*  /dka0/users/*/*

For more detailed information see the "Scripting Overview, Introduction".

10.8.2 - Without Using The SYSUAF

Deprecated and Discouraged
There are now "better" approaches to achieving the same functionality as described in this section. This documentation is retained only for reference by older site configurations.

The server is also able to map user directories using the same mechanisms as for any other directory. No reference needs to be made to the SYSUAF; user support can be accomplished via a combination of mapping rule and logical name. This approach relies on a correspondence between the username and the home directory name. Hence users are known to the HTTPd by the name of their top-level directory. User scripts can also be supported using WASD's DECnet scripting environment.

The "PASS" rule provides a wildcard representation of users' directory paths. As part of this mapping a subdirectory specifically for the hypertext data should always be included. Never map users' top-level directories. For instance if a user's account home directory was located in the area USER$DISK:[DANIEL] the following rule would potentially allow the user DANIEL to provide web documents from the home subdirectory [.WWW] (if the user has created it) using the accompanying URL:

  pass /~*/* /user$disk/*/www/*
 
  http://host/~daniel/
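
The correspondence between a mapped URL-format path and a VMS file specification can be sketched in Python. This is a simplified illustration: the function name `url_path_to_vms` and the file name "index.html" are hypothetical, and the sketch ignores extended (ODS-5) specifications, character escaping and logical name translation.

```python
def url_path_to_vms(path: str) -> str:
    """Convert "/device/dir/subdir/file" to "DEVICE:[DIR.SUBDIR]FILE"."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    device, *dirs_and_file = path[1:].split("/")
    if not device:
        raise ValueError("path must name a device")
    if dirs_and_file:
        *dirs, filename = dirs_and_file   # last component is the file (may be empty)
    else:
        dirs, filename = [], ""
    # With no directory components the file lives in the top-level directory.
    directory = ".".join(dirs) or "000000"
    return f"{device}:[{directory}]{filename}".upper()


print(url_path_to_vms("/user$disk/daniel/www/index.html"))
# USER$DISK:[DANIEL.WWW]INDEX.HTML
```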

It is recommended that a separate logical name be created for locating user directories. This helps hide the internal organisation of the file system. The following logical name definition and mapping rule illustrate this point.

  $ DEFINE /SYSTEM /EXEC /TRANSLATION=CONCEALED WWW_USER device:[USER.]
 
  pass /~*/* /www_user/*/www/*

Where users are grouped into different areas of the file system a logical search list may be defined.

  $ DEFINE /SYSTEM /EXEC /TRANSLATION=CONCEALED -
           WWW_USER -
           DISK1:[GROUP1.], -
           DISK1:[GROUP2.], -
           DISK2:[GROUP3.], -
           DISK2:[GROUP4.]
    
  pass /~*/* /www_user/*/www/*

As logical search lists have specific uses and some complications (e.g. when creating files) this is the only use for them recommended with this server, although it is specifically coded to allow for search lists in document specifications.

If only a subset of all users is to be provided with WWW publishing access, either their account directories can be individually mapped (best used only with a small number) or a separate area of the file system can be provided for this purpose and specifically mapped as user space.

Of course, user mapping is amenable to all other rule processing so it is a simple matter to redirect or otherwise process user paths. For instance, the published username does not need to, or need to continue to, correspond to any real user area, or the user's actual name or home area:

  redirect /~doej/* http://a.nother.host/~doej/*
  pass /~doej/* /www/messages/deceased.html
  pass /~danielm/* /special$www$area/danielm/*
  pass /~Mark.Daniel/* /user$disk/danielm/www/*
  pass /~*/* /www_user/*/www/*

A user directory is always presented as a top-level directory (i.e. no parent directory is shown), although any subdirectory tree is accessible by default.
