
HFRD Hypertext Services - Technical Overview


9 - Brief Introduction to HTTPd Code




This section provides only a broad overview of the basic functionality of the HTTPd server. It does not cover the full suite of HFRD VMS Hypertext Services software. For details, the source code should also be consulted.

Multi-Threaded



The HFRD HTTPd is written to exploit VMS operating system characteristics that allow the straightforward implementation of multi-threaded code. The server is written to be I/O event driven. Asynchronous System Traps (ASTs), or software interrupts, delivered at the conclusion of an I/O (or other) event allow functions to be activated to post-process the event. The event traps are automatically queued on a FIFO basis, allowing a series of events to be processed sequentially. When not responding to an event the process is quiescent, or otherwise occupied, effectively interleaving I/O and processing and allowing sophisticated client multi-threading.
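The FIFO delivery of event traps can be imitated in portable C. The sketch below is only an analogy under stated assumptions: ast_queue, ast_declare() and ast_deliver_all() are hypothetical names standing in for the operating system's AST queue, not VMS system services. Each queued routine runs to completion before the next is delivered, and a routine may queue the routine for its own next event, which is how several request threads end up interleaved.

```c
#include <string.h>

/* A portable stand-in for the VMS AST queue, for illustration only:
   a fixed-size FIFO of (routine, parameter) pairs. None of these names
   are VMS system services. */
typedef void (*ast_routine)(void *param);

#define AST_QUEUE_SIZE 64

static struct {
    ast_routine routine[AST_QUEUE_SIZE];
    void *param[AST_QUEUE_SIZE];
    int head, tail, count;
} ast_queue;

/* Queue a routine for later delivery. Returns 0 if the queue is full. */
static int ast_declare(ast_routine routine, void *param)
{
    if (ast_queue.count == AST_QUEUE_SIZE) return 0;
    ast_queue.routine[ast_queue.tail] = routine;
    ast_queue.param[ast_queue.tail] = param;
    ast_queue.tail = (ast_queue.tail + 1) % AST_QUEUE_SIZE;
    ast_queue.count++;
    return 1;
}

/* Deliver queued routines in FIFO order. Each runs to completion before
   the next starts, and may queue further routines. Returns the number
   of routines delivered. */
static int ast_deliver_all(void)
{
    int delivered = 0;
    while (ast_queue.count > 0) {
        ast_routine routine = ast_queue.routine[ast_queue.head];
        void *param = ast_queue.param[ast_queue.head];
        ast_queue.head = (ast_queue.head + 1) % AST_QUEUE_SIZE;
        ast_queue.count--;
        routine(param);
        delivered++;
    }
    return delivered;
}

/* Demo: a request thread that needs several "I/O completions". */
typedef struct {
    int id;
    int steps_left;
} request_thread;

static char trace[64];      /* order in which thread ids were serviced */
static size_t trace_len;

static void on_io_complete(void *param)
{
    request_thread *rq = param;
    if (trace_len < sizeof trace - 1) {
        trace[trace_len++] = (char)('0' + rq->id);
        trace[trace_len] = '\0';
    }
    if (--rq->steps_left > 0)
        ast_declare(on_io_complete, rq);   /* "queue the next I/O" */
}
```

Queuing two threads of three steps each and then delivering services them in the order 1, 2, 1, 2, 1, 2.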

When VMS supports kernel threads (beginning with v7.0, I believe) this may be enhanced to optionally support multiple threads over multiple CPUs within the one process, further extending server throughput.

Multi-threaded code is inherently more complex than single-threaded code, and there are issues involved in synchronizing some activities in such an environment. Fortunately VMS handles many of these issues internally. After connection acceptance, all of the processing done within HTTPd is at USER mode AST delivery level. Because an AST routine cannot be interrupted by another AST of the same access mode, each runs to completion before the next is delivered, so for all intents and purposes the processing done therein is atomic, implicitly handling its own synchronization issues.

HTTPd is written to make longer-duration activities, such as transferring a file's contents, event-driven. Other, shorter-duration activities, such as accepting a client connection request, are handled synchronously.

It is worth noting that with asynchronous, AST-driven output, the data being written must be guaranteed to exist without modification for the duration of the write (until the completion AST is delivered). This means data written must be static or in buffers that persist with the thread. Function-local (automatic) storage cannot be used, because it ceases to exist when the function returns, which is normally before the write completes. The HTTPd server allocates dynamic storage for general (e.g. output buffering) or specific (e.g. response headers) uses.

Tasks



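Each request can have one or more tasks executed sequentially to fulfil the request. This occurs most obviously with the HTML pre-processor, but also, to a more limited extent, with directory listing and its read-me file inclusion. A task is the work of one of the request-processing modules described in the following sections, such as sending a file, generating a directory listing, or executing DCL.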

Each of these modules executes relatively independently. Before commencing a task, a next-task pointer can be set to the function required to execute at the conclusion of that task. At that conclusion, the next-task functionality checks for a specified task to start or continue. If one has been specified, control is passed to that next-task function via an AST.
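The next-task hand-off can be sketched as follows, assuming a hypothetical request structure with a next_task function pointer; the function names are invented for illustration. For simplicity, control is passed by a direct call where the server would declare an AST.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request structure: only the fields this sketch needs. */
typedef struct request request;
typedef void (*task_fn)(request *rq);

struct request {
    task_fn next_task;    /* run when the current task concludes */
    char output[128];     /* stand-in for the response being built */
};

/* Called by every task when it finishes. If a next task is set, clear
   it (so it runs only once) and pass control to it. The server would
   declare an AST here instead of calling directly. */
static void task_conclude(request *rq)
{
    task_fn next = rq->next_task;
    if (next != NULL) {
        rq->next_task = NULL;
        next(rq);
    }
    /* otherwise the request thread would be concluded here */
}

static void append(request *rq, const char *s)
{
    strncat(rq->output, s, sizeof rq->output - strlen(rq->output) - 1);
}

/* Hypothetical task functions. */
static void file_task(request *rq)
{
    append(rq, "[included file]");
    task_conclude(rq);
}

static void shtml_resume(request *rq)
{
    append(rq, "[rest of line]");
    task_conclude(rq);
}

/* The pre-processor meets an include directive: it names itself as the
   next task, then starts the file transfer task. */
static void shtml_line(request *rq)
{
    append(rq, "[start of line]");
    rq->next_task = shtml_resume;
    file_task(rq);
}
```

Calling shtml_line() leaves the output as [start of line][included file][rest of line], with the next-task pointer cleared again.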

Memory Management




Per-thread memory is managed as two distinct portions.

  1. A fixed-size structure of dynamic memory is used to contain the core request thread data. This is released at thread disposal.

  2. A heap of dynamically allocated memory is maintained during the life of a thread structure.

    When a dynamic structure is required this heap is expanded by calloc()ing memory, placing this in a doubly-linked list structure, and returning a pointer to the usable portion of the newly allocated memory. This list is released in one operation at thread disposal, by traversing the list and free()ing each individual chunk. This makes it easier to avoid the memory leaks associated with making autonomous allocations for each dynamic memory structure required.
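The per-thread heap in item 2 might be implemented along these lines. This is a sketch: thread_heap, heap_alloc() and heap_free_all() are hypothetical names, and each chunk carries its list links in a header placed in front of the memory returned to the caller. Disposal walks the list once, so no individual allocation needs to be tracked elsewhere.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of each chunk. The max_align_t member (C11)
   keeps the memory after the header suitably aligned for any type. */
typedef struct chunk {
    struct chunk *prev, *next;
    max_align_t align_;
} chunk;

typedef struct {
    chunk *head;      /* most recently allocated chunk */
    size_t count;     /* number of live chunks */
} thread_heap;

/* Allocate zeroed memory for the thread and link it into the list.
   Returns a pointer to the usable portion, or NULL on failure. */
static void *heap_alloc(thread_heap *heap, size_t size)
{
    chunk *c = calloc(1, sizeof(chunk) + size);
    if (c == NULL) return NULL;
    c->prev = NULL;
    c->next = heap->head;
    if (heap->head != NULL) heap->head->prev = c;
    heap->head = c;
    heap->count++;
    return c + 1;     /* usable memory starts just after the header */
}

/* Release every chunk in one pass at thread disposal. */
static void heap_free_all(thread_heap *heap)
{
    chunk *c = heap->head;
    while (c != NULL) {
        chunk *next = c->next;
        free(c);
        c = next;
    }
    heap->head = NULL;
    heap->count = 0;
}
```

Only the next link is needed to free the whole list; the prev link would allow a single chunk to be unlinked early if that were ever required.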



Output Buffering




To reduce the number of individual network writes, and thus significantly improve efficiency, output generated by all modules except File() is buffered into larger packets before being sent to the client. All modules, including File(), work together to integrate output seamlessly via this mechanism (best seen in the Shtml() module).

The AST-driven nature of the server makes this functionality moderately complex. A form of double buffering is employed, allowing the buffer to overflow and be flushed to the network asynchronously without overwriting, losing, or needing to re-request data. Two buffer spaces are employed. When one fills it is written to the network and the pointers to the two areas are exchanged. This allows the supplied (and overflowed) data to be buffered immediately, without a synchronous wait for the network write to complete, and an immediate return to further AST-driven processing. The alternate buffer continues to be used until it fills, when the process is repeated.

The possibility of an asynchronous write with every buffered output call introduces complexity. Every buffered output call must be treated as if it were an asynchronous network write, with an AST function address supplied on every call in case (as eventually happens) an actual network write occurs. If a network write does not occur (most of the time), the AST is explicitly declared for delivery. This need to supply an AST function with every buffered write basically means only one buffered write may occur per AST-executed function.
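A simplified sketch of the double buffering follows. It assumes a tiny buffer so the overflow is easy to see, replaces the network with a hypothetical net_write_async() that only records the data, and omits the AST address that each buffered call would carry in the server. When the current buffer fills, it is handed to the write and the two buffer pointers are exchanged, so buffering continues at once in the other area.

```c
#include <string.h>

#define OUT_BUF_SIZE 8    /* deliberately tiny so overflow is easy to see */

typedef struct {
    char area[2][OUT_BUF_SIZE];
    char *current;        /* buffer now being filled */
    char *alternate;      /* buffer last handed to a network write */
    size_t used;          /* bytes used in 'current' */
    char sent[256];       /* what the stand-in "network" received */
} out_buffer;

static void out_init(out_buffer *ob)
{
    ob->current = ob->area[0];
    ob->alternate = ob->area[1];
    ob->used = 0;
    ob->sent[0] = '\0';
}

/* Hypothetical stand-in for an asynchronous network write. It records
   the data and "completes" at once; a real write would deliver an AST
   later, and the buffer must stay untouched until then. */
static void net_write_async(out_buffer *ob, const char *data, size_t len)
{
    size_t used = strlen(ob->sent);
    size_t room = sizeof ob->sent - used - 1;
    if (len > room) len = room;
    memcpy(ob->sent + used, data, len);
    ob->sent[used + len] = '\0';
}

/* Buffer data. When the current buffer fills, swap the buffer pointers
   and hand the full one to the network write, so buffering continues in
   the other area without waiting. Assumes the previous write from the
   alternate buffer has completed before it is reused (the stand-in
   above guarantees this). */
static void out_buffered(out_buffer *ob, const char *data, size_t len)
{
    while (len > 0) {
        size_t room = OUT_BUF_SIZE - ob->used;
        size_t n = len < room ? len : room;
        memcpy(ob->current + ob->used, data, n);
        ob->used += n;
        data += n;
        len -= n;
        if (ob->used == OUT_BUF_SIZE) {
            char *full = ob->current;
            ob->current = ob->alternate;
            ob->alternate = full;
            ob->used = 0;
            net_write_async(ob, full, OUT_BUF_SIZE);
        }
    }
}

/* Send whatever remains at the end of the response. */
static void out_flush(out_buffer *ob)
{
    if (ob->used > 0) {
        net_write_async(ob, ob->current, ob->used);
        ob->used = 0;
    }
}
```

Buffering "Hello, " and then "world!" sends "Hello, w" when the first area fills; the remaining "orld!" is sent by out_flush().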

Rule-Mapping



A fundamental aspect of any HTTPd implementation is the rule mapping used to create a logical structure for the hypertext file system. The HTTPd mapping function is designed to be flexible enough that script programs can also use it. As a result it is text-file based, with the rule file opened and read whenever mapping is performed. This method provides a good deal of flexibility, coupled with acceptable performance. The function has received a high level of attention in an effort to optimize it.
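As an illustration of text-file rule mapping, the sketch below applies rules in order to a path. The rule verbs map, pass and fail, and a trailing "*" wildcard, follow the CERN httpd convention and are assumptions here, not a description of the HTTPd rule file; the rules are read from a string instead of a file for brevity, and map_path() is a hypothetical name. A map rule rewrites the path and continues, pass returns the result, and fail refuses the request.

```c
#include <stdio.h>
#include <string.h>

/* Result codes for the hypothetical mapper. */
enum { MAP_PASS = 1, MAP_FAIL = 0, MAP_NONE = -1 };

/* If 'path' matches 'templ' (optionally ending in '*'), write 'result'
   to 'out', substituting the wildcard part for a trailing '*'. */
static int match_rule(const char *templ, const char *result,
                      const char *path, char *out, size_t outsize)
{
    size_t tlen = strlen(templ);
    const char *tail = "";
    if (tlen > 0 && templ[tlen - 1] == '*') {
        if (strncmp(path, templ, tlen - 1) != 0) return 0;
        tail = path + tlen - 1;               /* text matched by '*' */
    } else if (strcmp(path, templ) != 0) {
        return 0;
    }
    size_t rlen = strlen(result);
    if (rlen > 0 && result[rlen - 1] == '*')
        snprintf(out, outsize, "%.*s%s", (int)(rlen - 1), result, tail);
    else
        snprintf(out, outsize, "%s", result);
    return 1;
}

/* Apply rules in order. 'rules' stands in for the rule file, one
   "verb template [result]" per line. */
static int map_path(const char *rules, const char *path,
                    char *out, size_t outsize)
{
    char current[256], line[256], verb[16], templ[128], result[128];
    snprintf(current, sizeof current, "%s", path);

    while (*rules) {
        size_t len = strcspn(rules, "\n");
        snprintf(line, sizeof line, "%.*s", (int)len, rules);
        rules += len + (rules[len] == '\n');

        result[0] = '\0';
        int fields = sscanf(line, "%15s %127s %127s", verb, templ, result);
        if (fields < 2) continue;             /* blank or malformed line */

        if (strcmp(verb, "fail") == 0) {
            char unused[8];
            if (match_rule(templ, "", current, unused, sizeof unused))
                return MAP_FAIL;
        } else if (fields == 3 && strcmp(verb, "map") == 0) {
            char mapped[256];
            if (match_rule(templ, result, current, mapped, sizeof mapped))
                snprintf(current, sizeof current, "%s", mapped);
        } else if (fields == 3 && strcmp(verb, "pass") == 0) {
            if (match_rule(templ, result, current, out, outsize))
                return MAP_PASS;
        }
    }
    return MAP_NONE;                          /* no rule applied */
}
```

Rules are applied top to bottom, so a fail rule placed after a map rule sees the already-rewritten path.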

Auto-Scripting



The HFRD VMS HTTP server can automatically invoke a script to process a non-HTML document (file). This facility is based on detecting the MIME content type (via the file's extension) and performing a transparent, local redirection that invokes the script as if it had been specified in the original request.
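The decision might be sketched as below, assuming a hypothetical table that associates a file extension with a content type and, optionally, a script. The table contents and the script path are invented. The local redirection is represented by building the path the request would have had if the script had been named in it, with the document path appended as CGI path information (also an assumption).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *extension;      /* e.g. ".PS" */
    const char *content_type;
    const char *script;         /* NULL if no auto-script */
} type_entry;

/* Hypothetical configuration, for illustration only. */
static const type_entry types[] = {
    { ".HTML", "text/html",              NULL },
    { ".TXT",  "text/plain",             NULL },
    { ".PS",   "application/postscript", "/cgi-bin/ps2gif" },
};

/* Case-insensitive comparison (VMS file names are case-blind). */
static int equal_nocase(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* If the file's type has an auto-script, write the redirected path
   (script followed by the original path) to 'out' and return 1. */
static int auto_script(const char *path, char *out, size_t outsize)
{
    const char *base = strrchr(path, '/');          /* last path element */
    const char *ext = strrchr(base != NULL ? base : path, '.');
    if (ext == NULL) return 0;
    for (size_t i = 0; i < sizeof types / sizeof types[0]; i++) {
        if (equal_nocase(ext, types[i].extension) && types[i].script != NULL) {
            snprintf(out, outsize, "%s%s", types[i].script, path);
            return 1;
        }
    }
    return 0;
}
```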

Internal Directives



The HTTPd server detects certain strings as directives about its behaviour. These directives are passed in the query string component of the request and, as reserved sequences, are not expected to occur in normal requests (an unlikely combination of characters has been selected).
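Detecting a directive could look like this. The reserved sequence HTTPd actually uses is not given in this overview, so DIRECTIVE_PREFIX below is a hypothetical placeholder, as is take_directive(). The directive is separated from the query string so that the remainder can be processed normally.

```c
#include <string.h>

/* Hypothetical reserved prefix; the real sequence is not documented here. */
#define DIRECTIVE_PREFIX "$$httpd$$"

/* If 'query' starts with the reserved prefix, copy the directive text
   (up to the next '&' or end of string) into 'directive' and return a
   pointer to the remaining query. Otherwise return 'query' unchanged
   and leave 'directive' empty. 'size' must be at least 1. */
static const char *take_directive(const char *query, char *directive, size_t size)
{
    size_t plen = strlen(DIRECTIVE_PREFIX);
    directive[0] = '\0';
    if (strncmp(query, DIRECTIVE_PREFIX, plen) != 0) return query;

    const char *start = query + plen;
    size_t len = strcspn(start, "&");
    size_t n = len < size - 1 ? len : size - 1;
    memcpy(directive, start, n);
    directive[n] = '\0';
    return start[len] == '&' ? start + len + 1 : start + len;
}
```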



HTTPd Modules




The HTTPd server comprises eight main modules, implementing the obvious functionality of the server, and other, smaller support modules.



9.1 - HTTPD.C


Code for: HTTPd.c

This is the main module of the server. It handles all TCP/IP network activities, from creating the server socket and listening on the port, to reading and writing network I/O. The network read and write functions allow an I/O completion AST function address to be specified. If one is provided, the function is called upon completion of the network I/O; if not, the I/O completes without calling an AST routine.

The server begins by creating a network socket and then binding it to the HTTP port. The server then enters an infinite loop, waiting for IP connections.

When a connection request is received, the remote host is checked as an allowed connection. If allowed, a request data structure is created from dynamic memory, and an asynchronous read is queued from the network client. The pointer to this dynamic data structure becomes the request thread, and is passed from function to function, AST routine to AST routine. The network read specifies a request analysis function as its AST completion routine. The function then returns to the connection acceptance loop.

When the network read completes, an AST completion function in the Request() module is called to process the HTTP request.
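The acceptance step can be outlined as below. This is a sketch only: the socket calls and the asynchronous read are omitted, and the suffix-based allow list, host_allowed() and the request_thread fields are assumptions. The point illustrated is that the thread structure comes from dynamic memory, so its pointer stays valid as it is passed from AST routine to AST routine after the accepting function has returned.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    char client_host[128];
    char *read_buffer;        /* network read buffer for the request header */
    size_t read_size;
} request_thread;

/* Hypothetical allow list: a host is accepted if its name ends with
   one of these domain suffixes. */
static const char *allowed_suffixes[] = { ".example.com", ".example.net" };

static int host_allowed(const char *host)
{
    size_t hlen = strlen(host);
    for (size_t i = 0; i < sizeof allowed_suffixes / sizeof allowed_suffixes[0]; i++) {
        size_t slen = strlen(allowed_suffixes[i]);
        if (hlen >= slen && strcmp(host + hlen - slen, allowed_suffixes[i]) == 0)
            return 1;
    }
    return 0;
}

/* Called for each accepted connection. Returns the new request thread,
   or NULL if the host is refused or memory is exhausted. In the server
   an asynchronous read would now be queued, with the request analysis
   function as its AST completion routine. */
static request_thread *new_request_thread(const char *host, size_t read_size)
{
    if (!host_allowed(host)) return NULL;
    request_thread *rq = calloc(1, sizeof *rq);
    if (rq == NULL) return NULL;
    rq->read_buffer = malloc(read_size);
    if (rq->read_buffer == NULL) {
        free(rq);
        return NULL;
    }
    rq->read_size = read_size;
    /* calloc zeroed the structure, so the name stays NUL-terminated */
    strncpy(rq->client_host, host, sizeof rq->client_host - 1);
    return rq;
}

static void dispose_request_thread(request_thread *rq)
{
    if (rq == NULL) return;
    free(rq->read_buffer);
    free(rq);
}
```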

9.2 - REQUEST.C

Code for: Request.c

This module reads the request header from the client, parses it, and then calls the appropriate task function to execute the request (e.g. send a file, pre-process an HTML file, generate a directory listing, execute a script, etc.).

The request header is contained in the network read buffer. If it cannot be completely read in the first chunk, the read buffer is dynamically expanded so that the header can be read in multiple chunks. The request header is addressed by a specific pointer that allows the parse-and-execute function to process either a genuine, initial client request header, or a pseudo-header generated to effect a redirection.

The method, path information and query string are parsed from the first line of the header. Other specific request header fields are also parsed out and stored for later reference. Once this has been done the header is not used further.
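A minimal sketch of parsing the first header line into method, path and query string follows. The request_line structure and parse_request_line() are hypothetical; fixed-size fields keep the example short, and the HTTP version token is ignored.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    char method[16];
    char path[256];
    char query[256];
} request_line;

/* Parse e.g. "GET /doc/x.html?a=1 HTTP/1.0". Returns 1 on success. */
static int parse_request_line(const char *line, request_line *rl)
{
    char uri[512];
    memset(rl, 0, sizeof *rl);
    if (sscanf(line, "%15s %511s", rl->method, uri) != 2) return 0;

    char *q = strchr(uri, '?');          /* query string follows '?' */
    if (q != NULL) {
        *q = '\0';
        snprintf(rl->query, sizeof rl->query, "%s", q + 1);
    }
    snprintf(rl->path, sizeof rl->path, "%s", uri);
    return 1;
}
```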

Once the relevant information is obtained from the request header, processing commences on implementing the request. This comprises the rule-mapping of any path information, the RMS parsing of any resulting VMS file specification, and the decision on how to execute the request.



This functionality is used to parse and execute both the initial client request and any pseudo-request generated internally to effect a redirection.

9.3 - FILE.C

Code for: File.c

This module implements the file transfer functionality. It obtains the file specification, MIME content type and encoding (binary/text) information from the request data structure. It handles record-oriented (text) files slightly differently from binary (e.g. image) files (specified using the AddType configuration directive, see 4 - HTTPd Configuration). Record-oriented files have multiple records buffered before they are written collectively to the network (improving efficiency). Binary files are read by virtual block, and the blocks are written to the network immediately. The essential behaviour, however, is much the same:

  1. The primary function attempts to open the file. If unsuccessful, it immediately returns the error status to the calling routine for further action (this behaviour is used, for example, to try each of multiple home pages by detecting file-not-found).

  2. After successfully opening the file it generates an HTTP response header if required. It then calls one of two functions to queue the first read from the file: one for text files (record-oriented transfer), the other for binary files (block-oriented transfer). After the read is queued it returns with a success status code.

  3. When the asynchronous file read completes, one of two AST completion functions (one for text, the other for binary) is called to post-process the I/O. The status is checked for success. On error, the status is reported to the client, the file closed, and the request thread concluded.

    If end-of-file, the file is closed and, for record-oriented (text) files, the buffer is checked and flushed if necessary. If an end-task function was specified, control is now passed to it; otherwise the thread is concluded.

    If not end-of-file, for text files multiple records may be buffered before writing to the network. If the buffer is full (the read was unsuccessful due to insufficient space), the contents are asynchronously written to the network, with the network write completion routine specifying a function to re-read the file record that just failed. If there is still space in the buffer, another asynchronous read of the file is queued in an attempt to append the next record to the buffer. After the read is queued the function completes.

    If not end-of-file, for binary files a successful read results in a call to the network write function to send the data to the client. This call specifies the function that reads the next blocks from the file as its AST completion routine. After the asynchronous network write is queued the function completes.



For text files the contents can be encapsulated as plain text. This involves prefixing the file send with a <PRE> HTML tag and postfixing it with a </PRE> tag. The buffer is filled as normal, but when it is ready for output a function is called that first escapes all HTML-forbidden characters (e.g. ``<'', ``>'', ``&'', etc.).
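The escaping step might be written as below. It is a sketch assuming only <, > and & need replacing (the text's ``etc.'' may cover more), and escape_html() is a hypothetical name. It returns the length the escaped text needs, so a caller can detect that the output buffer was too small.

```c
#include <stddef.h>
#include <string.h>

/* Copy 'in' to 'out', replacing HTML-forbidden characters with entities.
   Returns the length the escaped text requires (excluding the NUL);
   the output is truncated if that length is >= 'size'. */
static size_t escape_html(const char *in, char *out, size_t size)
{
    size_t needed = 0;
    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep;
        switch (*in) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:  rep = single;  break;
        }
        for (; *rep != '\0'; rep++, needed++)
            if (needed + 1 < size) out[needed] = *rep;   /* room for NUL */
    }
    if (size > 0) out[needed < size ? needed : size - 1] = '\0';
    return needed;
}
```

The escaped text would then be output between the <PRE> and </PRE> tags.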

9.4 - MENU.C

Code for: Menu.c

This module implements the HFRD menu interpretation functionality. It obtains the file specification from the request data structure. Output from this module is buffered to reduce network writes, improving I/O efficiency.

Essential behaviour:

  1. The primary function attempts to open the file. If unsuccessful, it immediately returns the error status to the calling routine for further action (this behaviour is used to try multiple home pages, for example).

  2. After successfully opening the file it generates an HTTP response header if required. A call is then made to asynchronously read a record from the opened file. This call specifies, as its AST completion routine, a function to count the number of menu sections (blank-line delimited groups of lines) in the file. After the asynchronous file read is queued the function returns with a success status code.

  3. When the asynchronous file read completes, the AST completion function is called to count the sections. The status is checked for success. On error, the status is reported to the client, the file closed, and the processing concluded.

    If end-of-file is reached, or three sections have been counted, the file is repositioned to the start and another asynchronous file read is queued (starting with the first record again), with the menu interpretation function specified as the AST completion routine. If still counting sections, the completion AST routine specified is the same section-counting function. After the asynchronous file read is queued the function completes.

  4. When the asynchronous file read completes, the AST completion function is called to interpret the line according to the section number it occurs in. The status is checked for success. On error, the status is reported to the client, the file closed, and the request concluded. If end-of-file, the file is closed and the processing concluded. For a successful record read the line can be a title, a description or a menu item. When the line has been interpreted and written to the network, another read is queued, with the contents interpretation function again specified as the AST completion routine. The function then completes.
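The two passes over a menu file can be sketched synchronously, with the file replaced by an array of lines. Blank-line delimited sections and the stop at three follow the text; the assignment of section 1 to the title, section 2 to the description and later sections to menu items is an assumption, and the function and type names are hypothetical.

```c
#include <stddef.h>

typedef enum { MENU_SEPARATOR, MENU_TITLE, MENU_DESCRIPTION, MENU_ITEM } menu_kind;

/* A line is blank if it contains only spaces or tabs. */
static int is_blank(const char *line)
{
    for (; *line; line++)
        if (*line != ' ' && *line != '\t') return 0;
    return 1;
}

/* First pass: count blank-line delimited sections, stopping at three. */
static int menu_count_sections(const char *const lines[], size_t n)
{
    int sections = 0, in_section = 0;
    for (size_t i = 0; i < n && sections < 3; i++) {
        if (is_blank(lines[i])) {
            in_section = 0;
        } else if (!in_section) {
            in_section = 1;
            sections++;
        }
    }
    return sections;
}

/* Second pass: classify each line by the section it falls in
   (assumed: 1 = title, 2 = description, 3 and later = menu items). */
static void menu_classify(const char *const lines[], size_t n, menu_kind kinds[])
{
    int section = 0, in_section = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_blank(lines[i])) {
            in_section = 0;
            kinds[i] = MENU_SEPARATOR;
            continue;
        }
        if (!in_section) {
            in_section = 1;
            section++;
        }
        kinds[i] = section == 1 ? MENU_TITLE
                 : section == 2 ? MENU_DESCRIPTION
                 : MENU_ITEM;
    }
}
```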

9.5 - DIR.C


Code for: Dir.c

This module implements the HTTPd directory listing functionality. Directories are listed first, then files. The file detail format is customizable, with the default resembling the default CERN and NCSA server layout. Output from this module is buffered to reduce network writes, improving I/O efficiency. HTML files have the <TITLE></TITLE> element extracted as a ``Description'' item.

Essential behaviour:

  1. The primary function obtains the file specification from the request data structure. Server directives controlling some features of the directory listing behaviour are checked for and parsed out if present. The directory listing layout is initialized. The directory specification (path information) is parsed to obtain the directory and file name/type components. After successfully parsing the specification it generates an HTTP response header if required.

  2. Column headings and (possibly) a parent directory item are buffered in an asynchronous function call. An RMS structure is initialized to allow the asynchronous search for all files in the specified directory ending in ".DIR".

  3. For each directory file found, the directory search AST completion function is called. The status is checked for success. On error, the status is reported to the client and the request processing concluded. If the directory contained no directory files, or the directory files are exhausted, a call is made to a function to begin listing the non-directory files, and the function then completes.

    If a directory file was returned, a synchronous call to list the details of that directory is made, and then another asynchronous search call is made, again with this function as the AST completion routine.

  4. When the directory files are exhausted, the RMS structure is reinitialized to allow the search for all specification-matching, non-directory files in the directory. An asynchronous search call is made.

  5. For each matching file found, the file search AST completion function is called. The status is checked for success. On error, the status is reported to the client and the processing concluded. If the directory contained no matching files, or the files are exhausted, the processing is concluded and the function immediately completes.

    If a file was returned, a call is made to a function to check whether a file description can be obtained (HTML files only). If it can, a function to initiate this is called and the function completes. If no description can be obtained, a synchronous call is made to a function to list the file details. After the file details are listed, another asynchronous search call is made with the same function specified for AST completion. The function then immediately completes.

  6. To asynchronously locate a description in an HTML file, the file is opened and then each record is asynchronously read and examined for the <TITLE> element. Once it is obtained, a synchronous call is made to a function to list the file details. After the file details are listed, another asynchronous search call is made with the file search function specified for AST completion. The function then immediately completes.
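Locating the description might look like the sketch below, which examines one record at a time as the asynchronous reads would deliver them. It assumes the title opens and closes within a single record and that tags may be in any case; find_title() and find_nocase() are hypothetical names. A return of 0 means the next record should be read.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for 'needle' in 'hay'; returns a pointer or NULL. */
static const char *find_nocase(const char *hay, const char *needle)
{
    size_t n = strlen(needle);
    for (; *hay; hay++) {
        size_t i = 0;
        while (i < n && hay[i] &&
               tolower((unsigned char)hay[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n) return hay;
    }
    return NULL;
}

/* Look for <TITLE>...</TITLE> in one record. On success copy the title
   text to 'out' and return 1; otherwise return 0 (read the next record). */
static int find_title(const char *record, char *out, size_t size)
{
    const char *start = find_nocase(record, "<title>");
    if (start == NULL) return 0;
    start += strlen("<title>");
    const char *end = find_nocase(start, "</title>");
    if (end == NULL) return 0;
    size_t len = (size_t)(end - start);
    if (len >= size) len = size - 1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```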

9.6 - DCL.C


Code for: DCL.c

The DCL execution functionality must interface and coordinate with an external subprocess. It too is asynchronously driven by I/O once the subprocess has been created and is executing independently. Communication with the subprocess is via mailboxes.

The DCL facility is used in two modes:

  1. To execute independent DCL commands. This is used to provide DCL output for pre-processed HTML. In this mode multiple DCL commands may be executed within the one request.
  2. To execute CGI scripts. In this mode only one DCL command is executed during the request.

DCL-related structures and devices (e.g. mailboxes) are retained for the life of the request, and may be reused if multiple commands are required. This reduces the overhead of DCL execution.

  1. The primary DCL function ensures any required file specification exists (e.g. a script procedure). The first time it is executed during an individual request it creates two or three mailboxes:

    1. for the subprocess' SYS$INPUT
    2. for the subprocess' SYS$OUTPUT
    3. if CGI-script execution, one that the subprocess can explicitly open, providing direct read access to the HTTP data stream (this could be done with a DEFINE /USER SYS$INPUT HTTPD$INPUT)

    A function writes to the SYS$INPUT mailbox, creating a number of CGI-compliant symbol names and executing the command or invoking the execution of a DCL procedure, etc.

    A subprocess is spawned. Input and output are then driven by I/O AST completion routines, and the primary function returns to the calling routine.

  2. When the subprocess writes to the SYS$OUTPUT stream, the I/O completion AST routine associated with reading that mailbox is called.

    If CGI-script execution, the first I/O from this stream is analyzed for CGI compliance. This determines whether the script will supply a raw HTTP data stream or will be CGI-compliant (requiring the addition of an HTTP header, etc.), and whether HTTP carriage-control needs to be checked/added for each record.

    A CGI local redirection header (partial URL) is a special case. When this is received, all output from the subprocess is suppressed until the script processing is ready to be concluded. At that time the ``Location:'' information of the header is used to reinitiate the request, using the same thread data structure.

    Otherwise, in normal SYS$OUTPUT processing, each record received is handled in one of two ways. If it is raw HTTP it is asynchronously written to the network, and the AST completion routine specified with the network write queues another read from the subprocess' SYS$OUTPUT. If it is record-oriented I/O (e.g. from DCL output), it is output buffered into larger chunks, so that multiple records are written to the network at the one time. This improves I/O efficiency.

    The SYS$OUTPUT stream is a little problematic. At subprocess exit there may be one or more records waiting in the mailbox for reading and subsequent writing to the client over the network, delaying the conclusion of processing. Completion is detected by making each QIO sensitive to mailbox status via the SS$_NOWRITER status, which indicates there is no channel assigned to the mailbox and the mailbox buffer is empty. It then becomes safe to dispose of the client thread without loss of data.

  3. If CGI-script execution, the HTTP data stream made available is also AST-driven. If the subprocess opens the stream and reads from it, the I/O completion routine called queues another asynchronous read from the network client.
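The analysis of the first SYS$OUTPUT record might be sketched as follows. The rules are assumptions drawn from common CGI practice, not from the DCL.c source: a record beginning ``HTTP/'' is treated as a raw HTTP stream, a ``Location:'' header whose value begins with ``/'' as a local redirection (partial URL), and other CGI headers as output needing an HTTP header added. The names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

typedef enum { CGI_RAW_HTTP, CGI_LOCAL_REDIRECT, CGI_COMPLIANT, CGI_UNKNOWN } cgi_kind;

/* Case-insensitive prefix test; stops safely if 's' is shorter. */
static int starts_with_nocase(const char *s, const char *prefix)
{
    for (; *prefix; s++, prefix++)
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
    return 1;
}

/* Classify the first record a CGI script writes to SYS$OUTPUT. */
static cgi_kind cgi_analyze(const char *first)
{
    if (starts_with_nocase(first, "HTTP/"))
        return CGI_RAW_HTTP;                 /* script supplies full response */
    if (starts_with_nocase(first, "Location:")) {
        const char *v = first + 9;           /* skip "Location:" */
        while (*v == ' ' || *v == '\t') v++;
        return *v == '/' ? CGI_LOCAL_REDIRECT : CGI_COMPLIANT;
    }
    if (starts_with_nocase(first, "Content-type:") ||
        starts_with_nocase(first, "Status:"))
        return CGI_COMPLIANT;                /* server adds the HTTP header */
    return CGI_UNKNOWN;
}
```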

9.7 - SHTML.C


Code for: sHTML.c

The HTML pre-processor module provides this functionality as an integrated part of the server. Output from this module is buffered to reduce network writes, improving I/O efficiency.

Essential behaviour:

  1. The primary function attempts to open the file. If unsuccessful, it immediately returns the error status to the calling routine for further action (this behaviour is used to try multiple home pages, for example).

  2. After successfully opening the file it generates an HTTP response header if required. A call is then made to asynchronously read a record from the opened file. The record-read AST function scans the record (line) looking for pre-processor directives embedded in HTML comments. If no directive is found, the record is output buffered and another is queued to be read.

  3. If a directive is detected, any part of the line up to the directive is output buffered and a function is called to parse the directive. This function reports an error if the directive specified is not supported (unknown, etc.). For a supported directive, a specific function is called according to the directive specified. These functions provide the pre-processor information in one of four ways:

    1. Internally

      Information such as the system time, current document information, etc., can be provided from information contained in the request data or, in the case of specified document/file information, obtained via the file system. These directives have the relevant information buffered and then the function returns to the directive parsing function.

    2. Via DCL Execution

      Information that must be obtained through DCL execution is obtained using an asynchronous call to the Dcl() module. The next-task function is specified as the line parsing function. When the DCL module has finished executing the required command, control is passed back to this function.

    3. Sending a File

      If a file is #included, this is provided with an asynchronous call to the File() module. The next-task function is specified as the line parsing function. When the File() module has finished transferring the included file, control is passed back to this function.

    4. Directory Listing

      If a directory listing is requested, this is provided via an asynchronous call to the Dir() module. The next-task function is specified as the line parsing function. When the Dir() module has finished generating the listing, control is passed back to this function.

  4. Directives continue to be parsed, and executed asynchronously if necessary (as just described), from within a line until the end-of-line is reached. Any remaining characters are output buffered. Lines continue to be read from the file using the AST mechanism until end-of-file.
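Scanning a line for directives could be sketched as below. The <!--#name ... --> comment form is an assumption (it is the usual server-side include syntax; the text says only that directives are embedded in HTML comments), and the callback, type and function names are hypothetical. Text before each directive goes to an output callback, the directive body to a handler, and any trailing text is output at end of line. The sketch is synchronous; in the server a handler needing Dcl(), File() or Dir() would return and scanning would resume through the next-task mechanism.

```c
#include <stddef.h>
#include <string.h>

typedef void (*text_fn)(const char *text, size_t len, void *ctx);

/* Scan one line, calling 'output' for plain text and 'directive' for
   the body of each <!--# ... --> comment. Returns the number of
   directives found, or -1 if a directive is not terminated. */
static int shtml_scan_line(const char *line, text_fn output,
                           text_fn directive, void *ctx)
{
    int found = 0;
    for (;;) {
        const char *start = strstr(line, "<!--#");
        if (start == NULL) break;
        const char *end = strstr(start + 5, "-->");
        if (end == NULL) return -1;
        if (start > line) output(line, (size_t)(start - line), ctx);
        directive(start + 5, (size_t)(end - (start + 5)), ctx);
        found++;
        line = end + 3;                       /* continue after "-->" */
    }
    if (*line) output(line, strlen(line), ctx);
    return found;
}

/* Demo callbacks: collect plain text, and show directives as [body]. */
typedef struct { char buf[256]; } sink;

static void sink_append(sink *s, const char *text, size_t len)
{
    size_t used = strlen(s->buf), room = sizeof s->buf - used - 1;
    if (len > room) len = room;
    memcpy(s->buf + used, text, len);
    s->buf[used + len] = '\0';
}

static void demo_output(const char *text, size_t len, void *ctx)
{
    sink_append(ctx, text, len);
}

static void demo_directive(const char *text, size_t len, void *ctx)
{
    sink_append(ctx, "[", 1);
    sink_append(ctx, text, len);
    sink_append(ctx, "]", 1);
}
```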

9.8 - ISMAP.C


Code for: IsMap.c

The clickable-image support module provides this functionality as an integrated part of the server. It supports image configuration file directives in either NCSA or CERN format. Extensive error reporting for the configuration specification has been implemented.

Acknowledgement:



Three coordinate mapping functions have been plagiarized from the NCSA imagemap.c script program. These have been inserted unaltered in the module, and an infrastructure built around the essential processing they provide. Due acknowledgement to the original authors and maintainers of that application. Any copyright over portions of that code is also acknowledged:

  ** mapper 1.26
  ** 7/26/93 Kevin Hughes, kevinh@pulua.hcc.hawaii.edu
  ** "macmartinized" polygon code copyright 1992 by Eric Haines, erich@eye.com


Essential behaviour:

  1. The primary function attempts to open the configuration file. If unsuccessful, it generates an error report and concludes processing.

  2. After successfully opening the configuration file it extracts the client-supplied coordinate from the query string. A call is then made to asynchronously read a record (line) from the configuration file. Configuration file processing is asynchronous from that point.

  3. The record (line) read AST function checks for end-of-file, at which point it returns the default URL (if supplied). After end-of-file the file is closed and the processing is concluded.

    If not end-of-file, a function is called to parse the record for an image mapping directive. When the components have been parsed, the NCSA imagemap.c routines are used to determine whether the click coordinates are within the specified region coordinates.

    If they are within the region, the click has been mapped: the URL is placed in heap memory and the thread's redirection location pointer is set to it. The file is closed and the processing conclusion function called. This function detects the redirection location and, if it is a local URL, generates a new, internal request from the redirection information instead of disposing of the thread. If it is a non-local URL, the client is sent a redirection response and the thread is then concluded.

    If not within the region, a call is made to asynchronously read the next record from the configuration file.
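For illustration, the kind of region tests the imagemap routines perform can be written as below. This is not the NCSA imagemap.c code the module incorporates; it is an independent sketch of standard techniques: a rectangle given by two corners, a circle given by its centre and a point on its edge (an assumption about the region format), and a polygon tested with the even-odd (crossing-number) rule. The function names are hypothetical.

```c
/* Point in the rectangle with corners (x1,y1) and (x2,y2), either order. */
static int in_rect(double px, double py, double x1, double y1, double x2, double y2)
{
    double lox = x1 < x2 ? x1 : x2, hix = x1 < x2 ? x2 : x1;
    double loy = y1 < y2 ? y1 : y2, hiy = y1 < y2 ? y2 : y1;
    return px >= lox && px <= hix && py >= loy && py <= hiy;
}

/* Point in the circle with centre (cx,cy) passing through (ex,ey). */
static int in_circle(double px, double py, double cx, double cy, double ex, double ey)
{
    double r2 = (ex - cx) * (ex - cx) + (ey - cy) * (ey - cy);
    double d2 = (px - cx) * (px - cx) + (py - cy) * (py - cy);
    return d2 <= r2;
}

/* Even-odd rule: count polygon edges crossed by a ray from the point
   towards +x. xs/ys hold n vertices; the polygon is closed implicitly. */
static int in_polygon(double px, double py, const double xs[], const double ys[], int n)
{
    int inside = 0;
    for (int i = 0, j = n - 1; i < n; j = i++) {
        /* the edge straddles the ray's y, so ys[i] != ys[j] here */
        if ((ys[i] > py) != (ys[j] > py)) {
            double x_cross = xs[j] + (py - ys[j]) * (xs[i] - xs[j]) / (ys[i] - ys[j]);
            if (px < x_cross) inside = !inside;
        }
    }
    return inside;
}
```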

9.9 - LOGGING.C


Code for: Logging.c

The logging module provides an access log (server logs, including error messages, are generated by the detached HTTPd process; see sections Server Process Logging Directory and 3.1.2.4 - Logging).

The access log format is the Web-standard ``common'' format, allowing processing by most log-analysis tools. Each entry (record, line) comprises:

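  client_host r_ident auth_user [time] "request" response_status bytes_sent
 where:

  client_host      the client's host name or IP address
  r_ident          the remote (RFC 1413) identity of the client
  auth_user        the authenticated user name
  [time]           the date and time of the request
  "request"        the request line as received from the client
  response_status  the HTTP status code returned to the client
  bytes_sent       the number of bytes sent to the client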

In addition to legitimate request entries, the server adds bogus entries to time-stamp server startup, shutdown, and the log being explicitly opened or closed (see 3.1.2.4 - Logging). These entries are correctly formatted so that a log analysis tool can process them, and are recognisable as being ``POST'' method and coming from user ``HTTPd''. The request path contains the event and a hexadecimal VMS status code, which represents a valid exit status only in ``END'' entries.

Clickable-image requests are logged as ``302'' entries, and the resulting, redirected request is logged as well.

When a log entry is required, the file is opened if it is closed. The file is closed again one minute after the initial request, which flushes the contents of the write-behind buffers.
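Formatting one entry might look like the sketch below. It assumes the time has already been formatted as a common-format time stamp and that missing identity and user fields are written as ``-'', as is usual for this format; format_log_entry() is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Write one common-format access log entry into 'out'. NULL or empty
   identity/user fields are written as "-". Returns the snprintf result. */
static int format_log_entry(char *out, size_t size,
                            const char *client_host, const char *r_ident,
                            const char *auth_user, const char *time_stamp,
                            const char *request, int status, long bytes_sent)
{
    return snprintf(out, size, "%s %s %s [%s] \"%s\" %d %ld",
                    client_host,
                    (r_ident != NULL && *r_ident) ? r_ident : "-",
                    (auth_user != NULL && *auth_user) ? auth_user : "-",
                    time_stamp, request, status, bytes_sent);
}
```

For example, a request from host.example.com for /index.html returning status 200 and 1234 bytes gives:
host.example.com - - [01/Jan/1996:12:00:00 +1030] "GET /index.html HTTP/1.0" 200 1234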

e

