WASD Hypertext Services - Technical Overview

13 - Proxy Services

13.1 - HTTP Proxy Serving
    13.1.1 - Reverse Proxy
    13.1.2 - Enabling A Proxy Service
    13.1.3 - Proxy Chaining
    13.1.4 - Controlling Proxy Serving
13.2 - Caching
    13.2.1 - Cache Device
    13.2.2 - Enabling Caching
    13.2.3 - Cache Management
    13.2.4 - Cache Invalidation
    13.2.5 - Cache Retention
13.3 - FTP Proxy Serving
13.4 - CONNECT Serving
    13.4.1 - Enabling CONNECT Serving
    13.4.2 - Controlling CONNECT Serving
13.5 - Reporting and Maintenance
    13.5.1 - PCACHE Utility
13.6 - Browser Configuration

A proxy server acts as an intermediary between Web clients and Web servers.  It listens for requests from the clients and forwards these to remote servers.  The proxy server then receives the responses from the servers and returns them to the clients.  Why go to this trouble?  There are several reasons, the most common being:


Proxy Serving Quick-Start

No additional software needs to be installed to provide proxy serving.  The following steps provide a brief outline of proxy configuration. 

  1. Enable proxy serving and specify which particular services are to be proxies (13.1.2 - Enabling A Proxy Service). 

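  2. If proxy caching is required (most probably, see 13.2 - Caching) set up a cache device and enable caching on the proxy service (13.2.1 - Cache Device, 13.2.2 - Enabling Caching). 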

  3. If providing SSL tunnelling (proxy of Secure Sockets Layer transactions) add/modify a service for that (13.4 - CONNECT Serving). 

  4. Add HTTPD$MAP mapping rules for controlling this/these services (13.1.4 - Controlling Proxy Serving, 13.4.2 - Controlling CONNECT Serving, and 13.3 - FTP Proxy Serving). 

  5. Restart server (HTTPD/DO=RESTART). 


Configuration Parameter Summary


Error Messages

When proxy processing is enabled and HTTPD$CONFIG directive [ReportBasicOnly] is disabled it is necessary to make adjustments to the contents of the HTTPD$MSG message configuration file [status] item beginning "Additional Information".  Each of the "/httpd/-/statusnxx.html" links

  <A HREF="/httpd/-/status1xx.html">1<I>xx</I></A>
  <A HREF="/httpd/-/status2xx.html">2<I>xx</I></A>
  <A HREF="/httpd/-/status3xx.html">3<I>xx</I></A>
  <A HREF="/httpd/-/status4xx.html">4<I>xx</I></A>
  <A HREF="/httpd/-/status5xx.html">5<I>xx</I></A>
  <A HREF="/httpd/-/statushelp.html">Help</A>
should be changed to include a local host component
  <A HREF="http://local.host.name/httpd/-/status1xx.html">1<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/status2xx.html">2<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/status3xx.html">3<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/status4xx.html">4<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/status5xx.html">5<I>xx</I></A>
  <A HREF="http://local.host.name/httpd/-/statushelp.html">Help</A>

If this is not done the links in any error report will be interpreted by the browser as relative to the server the proxy was attempting to request from, and the error explanation will not be accessible. 
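The link change described above can be sketched as a small text transformation.  This is only an illustration, assuming the [status] item text is available as a string; the function name absolutize_status_links is hypothetical and not part of WASD.

```python
import re

def absolutize_status_links(status_item: str, local_host: str) -> str:
    """Prefix each relative /httpd/-/status*.html HREF with http://<local_host>."""
    # Only links that still begin with "/httpd/-/status" are rewritten,
    # so running this twice leaves already-absolute links alone.
    pattern = re.compile(r'HREF="(/httpd/-/status\w+\.html)"')
    return pattern.sub(
        lambda m: f'HREF="http://{local_host}{m.group(1)}"', status_item
    )

example = '<A HREF="/httpd/-/status4xx.html">4<I>xx</I></A>'
print(absolutize_status_links(example, "local.host.name"))
# <A HREF="http://local.host.name/httpd/-/status4xx.html">4<I>xx</I></A>
```

In practice the HTTPD$MSG file is simply edited by hand; the sketch only shows which part of each link changes.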


13.1 - HTTP Proxy Serving

WASD (currently) provides a proxy service for the HTTP scheme (protocol). 

Proxy serving generally relies on DNS resolution of the requested host name.  DNS lookup can introduce significant latency to transactions.  To help ameliorate this WASD incorporates a host name cache.  To ensure cache consistency the contents are regularly flushed, after which host names must be resolved by DNS lookup again, refreshing the information in the cache.  The period of this cache purge is controlled with the [ProxyHostCachePurgeHours] configuration parameter. 
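The idea of a host name cache with a periodic purge can be sketched as follows.  This is a minimal model, not the server's implementation; the class name HostNameCache and its parameters are assumptions made for illustration, with purge_hours standing in for [ProxyHostCachePurgeHours].

```python
import socket
import time

class HostNameCache:
    """Hypothetical host name cache that is flushed completely every purge_hours."""

    def __init__(self, purge_hours, resolver=socket.gethostbyname, clock=time.time):
        self.purge_seconds = purge_hours * 3600
        self.resolver = resolver   # function: host name -> address
        self.clock = clock         # injectable so the purge can be tested
        self.entries = {}
        self.last_purge = clock()

    def lookup(self, host):
        now = self.clock()
        if now - self.last_purge >= self.purge_seconds:
            # Periodic purge: discard everything so stale addresses cannot linger.
            self.entries.clear()
            self.last_purge = now
        if host not in self.entries:
            # Slow path: a real DNS lookup, whose result is then remembered.
            self.entries[host] = self.resolver(host)
        return self.entries[host]
```

A shorter purge period trades more DNS lookups for fresher addresses; a longer one does the reverse.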

When a request is made by a proxy server it is common for it to add a line to the request header stating that it is a forwarded request and identifying the agent doing the forwarding.  With WASD proxying this line would look something like this:

  Forwarded: by http://host.name.domain (HTTPd-WASD/6.0.0 OpenVMS/AXP Digital-TCPIP SSL)
It is enabled using the [ProxyForwardedBy] configuration parameter. 


13.1.1 - Reverse Proxy

The use of WASD proxy serving as a firewall component assumes two configured network interfaces on the system, one connected to the internal network, the other to the external network.  (Firewalling could also be accomplished using a single network interface with a router blocking external access to all but the server system.)  Outgoing (internal to external) proxying is the most common configuration; however, a proxy server can also be used to provide controlled external access to selected internal resources.  This is sometimes known as reverse proxy.

In this configuration the proxy server is contacted by an external browser with a standard HTTP request.  Proxy server rules map this request onto a proxy-request format result.  For example:

  pass /sales/* http://sales.corporate.server.com/*

The server recognises the result format and performs a proxy request to a system on the internal network.  Note that the required mappings can become quite complex.  See example 7 in 13.1.4 - Controlling Proxy Serving.
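How a rule such as the one above turns a local path into a proxy-format URL can be modelled with a simple wildcard substitution.  This sketch assumes each "*" in the template matches any text and is carried over, in order, to each "*" in the result; map_request is a hypothetical helper, and the server's real matching may differ in detail.

```python
import re

def map_request(path, template, result):
    """Map path onto result if it matches template; each '*' carries over in order."""
    # Turn the template into a regex where every '*' becomes a capture group.
    regex = "^" + "(.*)".join(re.escape(part) for part in template.split("*")) + "$"
    match = re.match(regex, path)
    if match is None:
        return None
    groups = iter(match.groups())
    # Substitute the captured text for each '*' in the result, left to right.
    return re.sub(r"\*", lambda _: next(groups, ""), result)

print(map_request("/sales/q3/report.html",
                  "/sales/*", "http://sales.corporate.server.com/*"))
# http://sales.corporate.server.com/q3/report.html
```

A path that does not match the template yields None here, which corresponds to the server moving on to the next rule.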


13.1.2 - Enabling A Proxy Service

Proxy serving is enabled on a per-server basis using the [ProxyServing] configuration parameter. 

WASD can configure services using the HTTPD$CONFIG [service] directive, the HTTPD$SERVICE configuration file, or even the /SERVICE= qualifier. 


HTTPD$CONFIG [Service]

The actual services providing the proxy serving (i.e. the host and port) are specified on a per-service basis.  This means it is possible to have proxy and non-proxy services deployed on the one server (on different ports of course).  Proxying is enabled by appending the proxy keyword to the particular service specification.  The following example shows a non-proxy and proxy service. 

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:80
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy


HTTPD$SERVICE

Proxy service configuration using the HTTPD$SERVICE configuration is slightly simpler, with a specific configuration directive for each aspect.  See 8 - Service Configuration.  This example illustrates configuring the same services as used in the previous section. 

  [[http://alpha.wasd.dsto.defence.gov.au:80]]
 
  [[http://alpha.wasd.dsto.defence.gov.au:8080]]
  [ServiceProxy]  enabled

Examples in the following sections all show configuration using the HTTPD$CONFIG [Service] directive.  When using the HTTPD$SERVICE configuration file administration menu interface all relevant proxy directives are provided for selection. 


13.1.3 - Proxy Chaining

Some sites may already be firewalled and have corporate proxy servers providing Internet access.  It is quite possible to use WASD proxying in this environment, where the WASD server makes its proxied requests via the next proxy server in the hierarchy.  This is known as proxy chaining.  Using the chain keyword, specify the host name of the next server when enabling the proxy service, as in this example:

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy;chain=next.proxy.host


13.1.4 - Controlling Proxy Serving

Controlling both access-to and access-via proxy serving is possible. 


Proxy Password

Access to the proxy service can be directly controlled through the use of WASD authorization.  Proxy authorization is distinct from general access authorization.  It uses specific proxy authorization fields provided by HTTP, and by this allows a proxied transaction to also supply transaction authorization for the remote server. 

The following example shows a service specification using the "pauth" parameter making the proxy service require authorization for use. 

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy;pauth

In addition to the service being specified as requiring authorization it is also necessary to configure the source of the authentication.  This is done using the HTTPD$AUTH configuration file.  The following example shows all requests for the proxy virtual service must be authorized (GET as well as POST, etc.), although it is possible to restrict access to only read (GET), preventing data being sent out via the server. 

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  ["Proxy Access"=PROXY_ACCESS=id]
  http://* read+write


Local Password

It is also possible to control proxy access via local authorization, although this is less flexible because it removes the ability to then pass authorization information to the remote service.  In other respects it is set up in the same way as proxy authorization, only using the "lauth" parameter. 


Access Filtering

Extensive control of how, by whom and what a proxy service is used for may be exercised using WASD general and conditional mapping, (see 10 - Mapping Rules and 10.7 - Conditional Mapping), possibly in the context of a virtual service specification for the particular connect service host and port (see 10.6 - Virtual Servers).  The following examples provide a small indication of how mapping could be used in a proxy service context. 

  1. It is possible, though more often not practical, to regulate which hosts are connected to via the proxy service.  For example, the following rule forbids accessing any site with the string "hacker" in it (for the proxy service "alpha...:8080").
      [[alpha.wasd.dsto.defence.gov.au:8080]]
      pass http://*hacker*/* "403 Proxy access to this host is forbidden."
      pass http://*
    

  2. Or as in the following example, only allow access to specific sites. 
      [[alpha.wasd.dsto.defence.gov.au:8080]]
      pass http://*.org/*
      pass http://*.digital.com/*
      pass http://* "403 Proxy access to this host is forbidden."
    

  3. It is also possible to restrict access via the proxy service to selected hosts on the internal subnet.  Here only a range of numeric addresses plus a single host in another subnet are allowed access to the service. 
      [[alpha.wasd.dsto.defence.gov.au:8080]]
      pass http://* "403 Restricted access." ![ho:131.185.250.* ho:131.185.200.10]
      pass http://*
    

  4. In the following example POSTing to a particular proxied server is not allowed (why I can't imagine, but hey, this is an example!).
      [[alpha.wasd.dsto.defence.gov.au:8080]]
      pass http://subscribe.sexy.com/* "403 POSTing not allowed." [me:POST]
      pass http://*
    

  5. It is possible to redirect proxied requests to other sites. 
      [[alpha.wasd.dsto.defence.gov.au:8080]]
      redirect http://www.sexy.com/* http://www.disney.com/
      pass http://*
    

  6. A proxy service is just a specialized capability of a general HTTP service.  Therefore it is quite in order for the one service to respond to standard HTTP requests as well as proxy-format HTTP requests.  To enforce the use of a particular service as proxy-only, add a final rule to a virtual service's mapping restricting non-proxy requests. 
      [[alpha.wasd.dsto.defence.gov.au:8080]]
      pass http://*
      pass /* "403 This is a proxy-only service."
    

  7. This example provides the essentials when supporting reverse proxying.  Note that mappings may become quite complex when supporting access to resources across multiple internal systems (e.g. access to directory icons). 
      [[main.corporate.server.com:80]]
      pass /sales/* http://sales.corporate.server.com/*
      pass /shipping/* http://shipping.corporate.server.com/*
      pass /support/* http://support.corporate.server.com/*
      pass * "403 Nothing to access here!"
    
NOTE

To expedite proxy mapping it is recommended to have a final rule for the proxy virtual service that explicitly passes the request.  This would most commonly be a permissive pass as in example 1, could quite easily be a restrictive pass as in example 2, or a combination as in example 6.
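A rough model of how an ordered rule list such as example 2 behaves is sketched here.  It assumes the first matching rule decides the outcome and that "*" is a plain wildcard; RULES and evaluate are illustrative names only, not part of WASD.

```python
from fnmatch import fnmatchcase

# (template, forbidden message or None) pairs, in the order used by example 2.
RULES = [
    ("http://*.org/*", None),
    ("http://*.digital.com/*", None),
    ("http://*", "403 Proxy access to this host is forbidden."),
]

def evaluate(url, rules=RULES):
    """Return ('pass', None) or ('403', message) for the first matching rule."""
    for template, message in rules:
        if fnmatchcase(url, template):
            return ("pass", None) if message is None else ("403", message)
    # No rule applied; a final catch-all rule exists to avoid reaching this point.
    return ("unmatched", None)

print(evaluate("http://www.example.org/index.html"))
print(evaluate("http://www.example.com/"))
```

The final "http://*" entry plays the role of the explicit last rule recommended in the note: every proxy-format request is resolved by the time the list ends.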


13.2 - Caching

Caching involves using the local file-system for storage of responses that can be reused when a request for the same URL is made.  The WASD server does not have to be configured for caching; it will provide proxied access without any caching taking place. 

When a proxied request is processed, and its characteristics would allow the response to be cached, a unique identifier generated from the URL is used to create a corresponding file name.  The response header and any body are stored in this file.  This may be the data of an HTML page, a graphic, etc. 

When a proxied request is being processed, and its characteristics would allow the request to be cached, the unique identifier generated allows a previously created cache file to be checked for.  If it exists, and is current enough, the response is returned from it instead of from the remote server.  If it exists and is no longer current the request is re-made to the remote server, and the response, if still cacheable, is re-cached, keeping the contents current.  If it does not exist the response is delivered from the remote server. 
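The lookup logic in the two paragraphs above can be summarised in a short sketch.  The text does not say how the unique identifier is generated, so the MD5 digest used here is purely an assumption for illustration, as are the function names and the returned action strings.

```python
import hashlib

def cache_file_name(url):
    """Derive a file name from the URL (MD5 is an assumption, not WASD's method)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest() + ".HTC"

def cache_action(cacheable, file_exists, is_current):
    """Return what happens for a request, following the cases described above."""
    if not cacheable:
        return "fetch"                  # the cache is neither consulted nor filled
    if file_exists and is_current:
        return "serve-from-cache"
    if file_exists:
        return "refetch-and-recache"    # stale: re-request, re-cache if still cacheable
    return "fetch-and-cache"            # no cache file yet: go to the remote server

print(cache_file_name("http://www.example.com/index.html"))
print(cache_action(True, True, False))
```

The same URL always yields the same file name, which is what lets a later request find the file written by an earlier one.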


Not all responses can be cached! 

The main criteria are for the response to be successful (200 status), general (i.e. one not in response to a specialized query or action), and not too volatile (i.e. the same page may be expected to be returned more than once, preferably over an extended period). 

Proxied requests can only be cached if ...

Proxied responses will only be cached if ...

The [ProxyCacheFileKbytesMax] configuration parameter sets the maximum size of response that will be cached.  If a "Content-Length:" response header field shows the response exceeds the limit it is proactively not cached; if the size of the file grows beyond the limit during cache load, the load is aborted. 
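The two ways the size limit can take effect are sketched below.  The helper names are hypothetical, and treating one kilobyte as 1024 bytes is an assumption; kbytes_max stands in for the [ProxyCacheFileKbytesMax] value.

```python
def exceeds_limit_up_front(content_length, kbytes_max):
    """True if a declared Content-Length already rules out caching."""
    return content_length is not None and content_length > kbytes_max * 1024

def load_into_cache(chunks, kbytes_max):
    """Accumulate body chunks; return None (load aborted) once the limit is passed."""
    limit = kbytes_max * 1024
    stored = bytearray()
    for chunk in chunks:
        stored.extend(chunk)
        if len(stored) > limit:
            return None   # abort the cache load part-way through
    return bytes(stored)

print(exceeds_limit_up_front(2048, 1))
print(load_into_cache([b"abc"], 1))
```

The second function covers responses that carry no "Content-Length:" field, where the size is only discovered while the body arrives.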


Not all sites may benefit from cache! 

As many transactions on today's Web contain query strings, etc., and therefore cannot be meaningfully cached, it should not be assumed the cost/benefit of having a proxy cache enabled is a foregone conclusion.  Each site should monitor the proxy traffic reports and decide on a local policy. 

The facilities described in 13.5 - Reporting and Maintenance allow a reasonably informed decision to be made.  Items to be considered. 

Last, but by no means least, understanding the characteristics of local usage.  For example, are there a small number of requests generating lots of non-cacheable traffic?  For instance, a few users accessing streaming content. 


13.2.1 - Cache Device

Selection of a disk device for supporting the proxy cache should not be made without careful consideration, doubly so if significant traffic is experienced.  Here are some common-sense suggestions. 

Initially the directory will need to be created.  This can be done manually as described below, or if using the supplied server startup procedures (see STARTUP.COM) it is checked for and, if it does not exist, automatically created during startup.  The directory must be owned by the HTTP$SERVER account and have full read+write+execute+delete access.  The suggested name is [HT_CACHE]; it may be created manually using the following command. 

  $ CREATE /DIR /OWN=HTTP$SERVER /PROT=(O:RWED,W) device:[HT_CACHE.]

It is a relatively simple matter to relocate the cache at any stage.  Simply create the required directory in the new location, modify the startup procedures to reflect this, shut the server down completely then restart it using the procedures (not a /DO=RESTART!).  The contents of the previous location could be transferred to the new using the BACKUP utility if desired. 


HT_CACHE_ROOT Logical

It is required to define the logical name HT_CACHE_ROOT if any proxy services are specified in the server configuration.  The server will not start unless it is correctly defined.  The logical should be a concealed device logical specifying the top level directory of the cache tree.  The following example shows how to define such a logical name. 

  $ DEFINE /SYSTEM /EXEC /TRANSLATION=CONCEALED HT_CACHE_ROOT device:[HT_CACHE.]

If the example startup procedure is in use then it is quite straightforward to have the logical created during server startup (see STARTUP.COM). 


13.2.2 - Enabling Caching

Caching may be enabled on a per-service basis.  This means it is possible to have a caching proxy service and a non-caching service active on the one server.  Caching is enabled by appending the cache keyword to the particular service specification.  The following example shows a non-proxy and a caching proxy service. 

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:80
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy;cache

Proxy caching may be selectively disabled for a particular site, sites or paths within sites using the set nocache mapping rule.  This rule, used to disable caching for local requests, also disables proxy file caching for that subset of requests.  This example shows a couple of variations. 

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  # disable caching for local site's servers that respond fairly quickly
  set http://*.local.domain/* nocache
  # disable caching of log files
  set http://*.log nocache
  pass http://*
NOTE

It is also recommended to place the cache directory under some authorization control to prevent casual browsing and access of the cache contents.  Something local, similar in intention to
  [VMS]
  /ht_cache_root/* ~webadmin,131.185.250.*,r+w ;


13.2.3 - Cache Management

As the proxy cache is implemented using the local file system, management of the cache implies controlling the number of, and exactly which files remain in cache.  Essentially then, management means when and which to delete.  The [ProxyReportLog] configuration parameter enables the server process log reporting of cache management activities. 

Cache file deletion takes two forms. 

  1. ROUTINE

    This ensures files that have not been accessed within specified limits are periodically and regularly deleted.  The [ProxyCacheRoutineHourOfDay] configuration parameter controls this activity. 

    The ROUTINE form occurs once per day at the specified hour.  The cache files are scanned looking for those that exceed the maximum period since last access, which are then deleted (that maximum being the largest number in [ProxyCachePurgeList], as described below). 

  2. REACTIVE

    This is a remedial action, taken when cache device usage is reaching its configuration limit and files need to be deleted to free up space.  The following parameters control this behaviour. 

    The cache device space usage is checked at the specified interval. 

    If the device reaches the specified percentage used, a cache purge is initiated, deleting files until the specified reduction in the total space in use on the disk is attained. 

    The cache files are scanned using the [ProxyCachePurgeList] parameter described below, working from the greatest to least number of hours in the steps provided.  At each scan files not accessed within that period are deleted.  After every few files deleted the device free space is checked to see whether it has reached the lower purge percentage limit, at which point the scan terminates. 

    This parameter takes as its input a series of comma-separated integers, each representing a number of hours since files were last accessed.  In this way the cache can be progressively reduced until percentage usage targets are realized.  Such a parameter would be specified as follows,

      [ProxyCachePurgeList] 168,48,24,8,0
    
    meaning the purge would first delete files not accessed in the last week, then not for the last two days, then the last twenty-four hours, then eight, then finally all files.  The largest of the specified periods (in this case 168) is also used as the limit for the ROUTINE scan and file delete. 

    Once the target reduction percentage is reached the purge stops.  During the purge operation further cache files are not created.  Even when cache files cannot be created for any reason proxy serving still continues transparently to the clients. 

    NOTE

    Cache files can be manually deleted at any time (from the command line) without disturbing the proxy-caching server and without rebuilding any databases.  When deleting, the /BEFORE=date/time qualifier can be used, with /CREATED being the document's last-modified date, /REVISED being the last time it was loaded, and /EXPIRED the last time the file was accessed (used to supply a request).  Be aware that on an active server it is quite possible some files may be locked at time of attempted deletion. 
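The ROUTINE and REACTIVE purges described above can be modelled in a few lines.  This is a simplified sketch rather than the server's code: files is a plain dictionary of hours since last access, percent_used is a hypothetical callable standing in for the device usage check, and usage is checked after every deletion instead of every few.

```python
def routine_purge(files, purge_list):
    """Delete files idle for at least the largest purge-list period."""
    limit = max(purge_list)
    stale = [name for name, idle in files.items() if idle >= limit]
    for name in stale:
        del files[name]
    return stale

def reactive_purge(files, purge_list, percent_used, target_percent):
    """Delete least recently used files first until usage falls to the target."""
    deleted = []
    for hours in sorted(purge_list, reverse=True):    # greatest to least
        for name, idle in list(files.items()):
            if idle >= hours:
                del files[name]
                deleted.append(name)
                if percent_used(files) <= target_percent:
                    return deleted                    # target reached: stop purging
    return deleted

cache = {"a": 200, "b": 50, "c": 10, "d": 1}          # hours since last access
usage = lambda remaining: len(remaining) * 25         # toy usage model: 25% per file
print(reactive_purge(cache, [168, 48, 24, 8, 0], usage, 50))
```

With the toy usage model the purge removes the week-old file at the 168 step and the two-day-old file at the 48 step, then stops because the target has been reached.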


13.2.4 - Cache Invalidation

For the purposes of this document, cache invalidation is defined as the determination when a cache file's data is no longer valid and needs to be reloaded. 

The method used for cache validation is deliberately quite simple in algorithm and implementation.  In this first attempt at a proxy server the overriding criteria have been efficiency, simplicity of implementation, and reliability.  Wishing to avoid complicated revalidation using behind-the-scenes HEAD requests, the basic approach has been to just invalidate the cache item upon expiry of a period related to its "Last-Modified:" age or upon a no-cache request, both described further below. 

The revision count (automatically updated by VMS) tracks the absolute number of accesses since the file was created (actually a maximum of 65535, or an unsigned short, but that should be enough for informational purposes). 


13.2.5 - Cache Retention

The [ProxyCacheReloadList] configuration parameter is used to control when a file being accessed is reloaded from source. 

This parameter supplies a series of integers representing the hours after which an access to a cache file causes the file to be invalidated and reloaded from its source during the proxied request.  Each number in the series represents the lower boundary of the range between it and the next number of hours.  A file with a last-loaded age falling within a range is reloaded at the lower boundary of that particular range.  The following example

  [ProxyCacheReloadList] 1,2,4,8,12,24,48,96,168
would result in a file 1.5 hours old being reloaded every hour, 3.25 hours old every 2 hours, 7 hours old every 4 hours, etc.  Here "old" means since last (or of course first) loaded.  Files not reloaded since the final integer, in this example 168 (one week), are always reloaded. 
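The range lookup just described can be sketched with a binary search over the reload list.  This follows one reading of the text and is not taken from the server source; reload_interval_hours is a hypothetical helper, and returning None for a file younger than the first entry is an assumption.

```python
from bisect import bisect_right

RELOAD_LIST = [1, 2, 4, 8, 12, 24, 48, 96, 168]

def reload_interval_hours(age_hours, reload_list=RELOAD_LIST):
    """Return the lower boundary of the range that age_hours falls within.

    Ages at or beyond the final entry map to that final entry, the point
    from which the text says a file is always reloaded.
    """
    index = bisect_right(reload_list, age_hours) - 1
    if index < 0:
        return None      # younger than the first boundary: no reload yet (assumed)
    return reload_list[index]

for age in (1.5, 3.25, 7):
    print(age, reload_interval_hours(age))
```

The three ages printed are the ones used in the paragraph above, giving intervals of 1, 2 and 4 hours respectively.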


13.3 - FTP Proxy Serving

WASD provides a proxy service for the FTP scheme (protocol).  This is not (as yet) integrated into the HTTPd, or cached, being provided using the proxy agent script HT_ROOT:[SRC.MISC]FETCH.C.

The (probable) file system of the FTP server host is determined by examining the results of an FTP PWD command.  If it returns a current working directory specification containing a "/" then it is assumed to be Unix(-like), if ":[" then VMS, if a "\" then DOS.  Anything else is unknown and the agent tries to do its best with an uninterpreted listing. 
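The PWD heuristic above amounts to a few substring tests applied in order.  The sketch below mirrors the description rather than the FETCH.C source; the function name and the sample reply strings are illustrative assumptions.

```python
def guess_file_system(pwd_reply):
    """Classify an FTP server's file system from the text of its PWD reply."""
    if "/" in pwd_reply:
        return "unix"       # Unix(-like) path separator
    if ":[" in pwd_reply:
        return "vms"        # device:[directory] syntax
    if "\\" in pwd_reply:
        return "dos"        # backslash path separator
    return "unknown"        # fall back to an uninterpreted listing

print(guess_file_system('257 "/pub/files" is current directory.'))
print(guess_file_system('257 "DKA0:[ANONYMOUS]" is current directory.'))
```

Because the tests run in the order given in the text, a reply containing both "/" and ":[" would be treated as Unix(-like).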

Note that the content-type of the transfer is determined by the way the proxy server interprets the FTP request path's "file" extension.  This may or may not correspond with what the remote system might consider the file type to be.  The default content-type for unknown files is "application/octet-stream" (binary).  In addition, a directory listing contains three links indicated by the italicised characters, "aid".  These allow the user to specify the transfer mode, text ("a" for ASCII), binary ("i" for image) and "d" for directory listing, for files with a content-type not correctly interpreted by the agent. 

The agent supports the FTP URL modifiers ";type=a" (return document as plain text), ";type=i" (return document as binary) and ";type=d" (return directory listing).  If a particular site is giving problems then a ";type=debug" may be added, revealing the client-server FTP dialog.  This may provide some insight into the problem. 

Rules required in HTTPD$MAP for acting as an agent of (script for) proxy:

  redirect ftp://* /fetch/ftp://*
  pass /ftp://* 
  script+ /fetch/* /cgi-bin/fetch/*


13.4 - CONNECT Serving

The connect service provides firewall proxying for any connection-oriented TCP/IP access.  Essentially it provides the ability to tunnel any other protocol via a Web proxy server.  In the context of Web services it is most commonly used to provide firewall-transparent access for Secure Sockets Layer (SSL) transactions. 

The WASD connect service implements the de facto standard HTTP CONNECT method, described in a number of Internet Drafts. 


13.4.1 - Enabling CONNECT Serving

As with proxy serving in general, CONNECT serving may be enabled on a per-service basis.  Connect serving is enabled by appending the connect keyword to the particular service specification.  The following example shows a non-proxy, a proxy without connect service, a connect service, and finally a proxy with connect service. 

  [Service]
  http://alpha.wasd.dsto.defence.gov.au:80
  http://alpha.wasd.dsto.defence.gov.au:8080;proxy
  http://alpha.wasd.dsto.defence.gov.au:8081;connect
  http://alpha.wasd.dsto.defence.gov.au:8082;proxy;connect


13.4.2 - Controlling CONNECT Serving

The connect service poses a significant security dilemma when in use in a firewalled environment.  Once a CONNECT service connection has been accepted and established it essentially acts as a relay to whatever data is passed through it.  Therefore any transaction whatsoever can occur via the connect service, which in many environments may be considered undesirable. 

In the context of the Web and the use of the connect service for proxying SSL transactions it may be prudent to restrict possible connections to the well-known SSL port, 443.  This may be done using conditional mapping rules, as in the following example:

  [[alpha.wasd.dsto.defence.gov.au:8080]]
  pass *:443 [me:connect]
  pass * "403 CONNECT only allowed to port 443." [me:connect]
All of the comments on the use of general and conditional mapping made in 13.1.4 - Controlling Proxy Serving can also be applied to the connect service. 
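The effect of the two CONNECT rules above can be expressed as a check on the host:port target of the request.  This is only a model of the policy, assuming the target always takes the form host:port; connect_allowed and its allowed_ports parameter are hypothetical names.

```python
def connect_allowed(target, allowed_ports=(443,)):
    """Decide whether a CONNECT to target ('host:port') should be permitted."""
    host, separator, port = target.rpartition(":")
    if not separator or not host or not port.isdigit():
        return False          # malformed target: refuse rather than guess
    return int(port) in allowed_ports

print(connect_allowed("secure.example.com:443"))
print(connect_allowed("secure.example.com:22"))
```

Anything not explicitly on the allowed list is refused, matching the intent of the final 403 rule in the example.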


13.5 - Reporting and Maintenance

The HTTPDMON utility allows real-time monitoring of proxy serving activity.  See 20.6 - HTTPd Monitor.

Proxy reports and some administrative control may be exercised from the server administration menu, see 15 - Server Administration.  The information reported includes:

The following actions can be initiated from this menu.  Note that three of these relate to the proxy file cache and so may take varying periods to complete, depending on the number of files.  If the cache is particularly large the scan/purge may take some considerable time.

Also available from the administration menu is a dialog allowing the proxy characteristics of the running server to be adjusted on an ad hoc basis.  This only affects the executing server; to make changes permanent the HTTPD$CONFIG configuration file must be changed. 

This dialog can be used to modify the device free space percentages according to recent changes in device usage, alter the reload or purge hour list characteristics, etc.  After making these changes a routine or reactive purge will automatically be initiated to reduce the space in use by the proxy cache if implied by the new settings. 


13.5.1 - PCACHE Utility

It is often useful to be able to list the contents of the proxy cache directory or the characteristics or contents of a particular cache file.  Cache files have a specific internal format and so require a tool capable of dealing with this.  The HT_ROOT:[SRC.UTILS]PCACHE.C program provides a versatile command-line utility as well as CGI(plus) script, making cache file information accessible from a browser.  It also allows cache files to be selected by wildcard filtering on the basis of the contents of the associated URL or response header.  For detailed information on the various command-line options and CGI query-string options see the description at the start of the source code file. 


Command-Line Use

Make the HT_EXE:PCACHE.EXE executable a foreign verb.  It is then possible to


Script Use

To make the PCACHE script available to the server ensure the following line exists in the HTTPD$CONFIG configuration file in the [AddType] section. 

  .HTC  application/x-script  /cgiplus-bin/pcache  WASD proxy cache file

The following rule needs to be in the HTTPD$MAP configuration file. 

  pass /ht_cache_root/*
NOTE

It is also recommended to place the utility and the cache directory under some authorization control to prevent casual browsing and access of the cache contents.  Something local, similar in intention to
  [VMS]
  /pcache/* ~webadmin,131.185.250.*,r+w ;
  /ht_cache_root/* ~webadmin,131.185.250.*,r+w ;

Once available the following is then possible. 

NOTE

Cache directory trees have the potential to become heavily populated, so the use of the script to generate listings of the cache contents could return extremely large listing documents. 


13.6 - Browser Configuration

The browser needs to be configured to access URLs via the proxy server.  This can be done using one of two basic approaches, manual or automatic. 

