Chapter 5 - Using HTMLDOC on a Web Server

This chapter describes how to interface HTMLDOC to your web server using CGI scripts and programs.

Note:

The free version of HTMLDOC for Windows does not support use from a web server.

The Basics

HTMLDOC can be used in a variety of ways to generate formatted reports on a web server. The most common way is to combine HTMLDOC with a CGI script or program and send the output to the HTTP client.

To make this work the CGI script or program must send the appropriate HTTP attributes, the required empty line to signify the beginning of the document, and then execute the HTMLDOC program to generate the HTML, PostScript, or PDF file as needed.

Another way to generate PDF files from your reports is to use HTMLDOC as a "portal" application. When used as a portal, HTMLDOC automatically retrieves the named document or report from your server and passes a PDF version to the web browser. See the next sections for more information.

WARNING:

Passing information directly from the web browser to HTMLDOC can potentially expose your system to security risks. Always be sure to "sanitize" any input from the web browser so that filenames, URLs, and options passed to HTMLDOC are not acted on by the shell program.

Calling HTMLDOC from a Shell Script

Shell scripts are probably the easiest to work with, but are normally limited to GET type requests. Here is a script called topdf that acts as a portal, converting the named file to PDF:

Users of this CGI would reference the URL "http://www.domain.com/topdf.cgi/index.html" to generate a PDF file of the site's home page.

The options variable in the script can be set to use any supported command-line option for HTMLDOC; for a complete list see Chapter 8 - Command-Line Reference.

Calling HTMLDOC from Perl

Perl scripts offer the ability to generate more complex reports, pull data from databases, etc. The easiest way to interface Perl scripts with HTMLDOC is to write a report to a temporary file and then execute HTMLDOC to generate the PDF file.

Here is a simple Perl subroutine that can be used to write a PDF report to the HTTP client:

Calling HTMLDOC from PHP

PHP is quickly becoming the most popular server-side scripting language available. PHP provides a passthru() function that can be used to run HTMLDOC. This combined with the header() function can be used to provide on-the-fly reports in PDF format.

Here is a simple PHP function that can be used to convert a HTML report to PDF and send it to the HTTP client:

The function accepts a filename and an optional "options" string for specifying the header, footer, fonts, etc.

To prevent malicious users from passing in unauthorized characters into this function, the following function can be used to verify that the URL/filename does not contain any characters that might be interpreted by the shell:

Another method is to use the escapeshellarg() function provided with PHP 4.0.3 and higher to generate a quoted shell argument for HTMLDOC.

To make a "portal" script, add the following code to complete the example:

Calling HTMLDOC from C

C programs offer the best flexibility and easily supports on-the-fly report generation without the need for temporary files.

Here are some simple C functions that can be used to generate a PDF report to the HTTP client from a temporary file or pipe:

Calling HTMLDOC from Java

Java programs are a portable way to add PDF support to your web server. Here is a class called htmldoc that acts as a portal, converting the named file to PDF. It can also be called by your Java servlets to process an HTML file and send the result to the client in PDF format: