Chapter 6 - HTML Reference

This chapter defines all of the HTML elements and attributes that are recognized and supported by HTMLDOC.

General Usage

There are two types of HTML files - structured documents using headings (H1, H2, etc.) which HTMLDOC calls "books", and unstructured documents that do not use headings which HTMLDOC calls "web pages".

A very common mistake is to try converting a web page using:

which will likely produce a PDF file with no pages. To convert web page files you must use the --webpage option at the command-line or choose Web Page in the input tab of the GUI.

HTMLDOC does not support HTML 4.0 elements, attributes, stylesheets, or scripting.

Elements

The following HTML elements are recognized by HTMLDOC:

ElementVersionSupported?Notes
!DOCTYPE3.0YesDTD is ignored
A1.0YesSee Below
ACRONYM2.0YesNo font change
ADDRESS2.0Yes 
AREA2.0No 
B1.0Yes 
BASE2.0No 
BASEFONT1.0No 
BIG2.0Yes 
BLINK2.0No 
BLOCKQUOTE2.0Yes 
BODY1.0Yes 
BR2.0Yes 
CAPTION2.0YesSee Below
CENTER2.0Yes 
CITE2.0YesItalic/Oblique
CODE2.0YesCourier
DD2.0Yes 
DEL2.0YesStrikethrough
DFN2.0YesHelvetica
DIR2.0Yes 
DIV3.2Yes 
DL2.0Yes 
DT2.0YesItalic/Oblique
EM2.0YesItalic/Oblique
EMBED2.0YesHTML Only
FONT2.0YesSee Below
ElementVersionSupported?Notes
FORM2.0No 
FRAME3.2No 
FRAMESET3.2No 
H11.0YesBoldface, See Below
H21.0YesBoldface, See Below
H31.0YesBoldface, See Below
H41.0YesBoldface, See Below
H51.0YesBoldface, See Below
H61.0YesBoldface, See Below
HEAD1.0Yes 
HR1.0YesSee Below
HTML1.0Yes 
I1.0Yes 
IMG1.0YesSee Below
INPUT2.0No 
INS2.0YesUnderline
ISINDEX2.0No 
KBD2.0YesCourier Bold
LI2.0Yes 
LINK2.0No 
MAP2.0No 
MENU2.0Yes 
META2.0YesSee Below
MULTICOLN3.0No 
NOBR1.0No 
NOFRAMES3.2No 
OL2.0Yes 
OPTION2.0No 
P1.0Yes 
PRE1.0Yes 
ElementVersionSupported?Notes
S2.0YesStrikethrough
SAMP2.0YesCourier
SCRIPT2.0No 
SELECT2.0No 
SMALL2.0Yes 
SPACERN3.0Yes 
STRIKE2.0Yes 
STRONG2.0YesBoldface Italic/Oblique
SUB2.0YesReduced Fontsize
SUP2.0YesReduced Fontsize
TABLE2.0YesSee Below
TD2.0Yes 
TEXTAREA2.0No 
TH2.0YesBoldface Center
TITLE2.0Yes 
TR2.0Yes 
TT2.0YesCourier
U1.0Yes 
UL2.0Yes 
VAR2.0YesHelvetica Oblique
WBR1.0No 

Comments

HTMLDOC supports many special HTML comments to initiate page breaks, set the header and footer text, and control the current media options:

<!-- FOOTER LEFT "foo" -->
Sets the left footer text; the test is applied to the current page if empty, or the next page otherwise.
<!-- FOOTER CENTER "foo" -->
Sets the center footer text; the test is applied to the current page if empty, or the next page otherwise.
<!-- FOOTER RIGHT "foo" -->
Sets the right footer text; the test is applied to the current page if empty, or the next page otherwise.
<!-- HALF PAGE -->
Break to the next half page.
<!-- HEADER LEFT "foo" -->
Sets the left header text; the test is applied to the current page if empty, or the next page otherwise.
<!-- HEADER CENTER "foo" -->
Sets the center header text; the test is applied to the current page if empty, or the next page otherwise.
<!-- HEADER RIGHT "foo" -->
Sets the right header text; the test is applied to the current page if empty, or the next page otherwise.
<!-- MEDIA BOTTOM nnn -->
Sets the bottom margin of the page. The "nnn" string can be any standard measurement value, e.g. 0.5in, 36, 12mm, etc. Breaks to a new page if the current page is already marked.
<!-- MEDIA COLOR "foo" -->
Sets the media color attribute for the page. The "foo" string is any color name that is supported by the printer, e.g. "Blue", "White", etc. Breaks to a new page or sheet if the current page is already marked.
<!-- MEDIA DUPLEX NO -->
Chooses single-sided printing for the page; breaks to a new page or sheet if the current page is already marked.
<!-- MEDIA DUPLEX YES -->
Chooses double-sided printing for the page; breaks to a new sheet if the current page is already marked.
<!-- MEDIA LANDSCAPE NO -->
Chooses portrait orientation for the page; breaks to a new page if the current page is already marked.
<!-- MEDIA LANDSCAPE YES -->
Chooses landscape orientation for the page; breaks to a new page if the current page is already marked.
<!-- MEDIA LEFT nnn -->
Sets the left margin of the page. The "nnn" string can be any standard measurement value, e.g. 0.5in, 36, 12mm, etc. Breaks to a new page if the current page is already marked.
<!-- MEDIA POSITION nnn -->
Sets the media position attribute (input tray) for the page. The "nnn" string is an integer that usually specifies the tray number. Breaks to a new page or sheet if the current page is already marked.
<!-- MEDIA RIGHT nnn -->
Sets the right margin of the page. The "nnn" string can be any standard measurement value, e.g. 0.5in, 36, 12mm, etc. Breaks to a new page if the current page is already marked.
<!-- MEDIA SIZE foo -->
Sets the media size to the specified size. The "foo" string can be "Letter", "Legal", "Universal", or "A4" for standard sizes or "WIDTHxHEIGHTunits" for custom sizes, e.g. "8.5x11in"; breaks to a new page or sheet if the current page is already marked.
<!-- MEDIA TOP nnn -->
Sets the top margin of the page. The "nnn" string can be any standard measurement value, e.g. 0.5in, 36, 12mm, etc. Breaks to a new page if the current page is already marked.
<!-- MEDIA TYPE "foo" -->
Sets the media type attribute for the page. The "foo" string is any type name that is supported by the printer, e.g. "Plain", "Glossy", etc. Breaks to a new page or sheet if the current page is already marked.
<!-- NEED length -->
Break if there is less than length units left on the current page. The length value defaults to lines of text but can be suffixed by in, mm, or cm to convert from the corresponding units.
<!-- NEW PAGE -->
Break to the next page.
<!-- NEW SHEET -->
Break to the next sheet.
<!-- NUMBER-UP nn -->
Sets the number of pages that are placed on each output page. Valid values are 1, 2, 4, 6, 9, and 16.
<!-- PAGE BREAK -->
Break to the next page.

Header/Footer Strings

The HEADER and FOOTER comments allow you to set an arbitrary string of text for the left, center, and right headers and footers. Each string consists of plain text; special values or strings can be inserted using the dollar sign ($):

$$
Inserts a single dollar sign in the header.
CHAPTER
Inserts the current chapter heading.
$CHAPTERPAGE
$CHAPTERPAGE(format)
Inserts the current page number within a chapter or file. When a format is specified, uses that numeric format (1 = decimal, i = lowercase roman numerals, I = uppercase roman numerals, a = lowercase ascii, A = uppercase ascii) for the page numbers.
$CHAPTERPAGES
$CHAPTERPAGES(format)
Inserts the total page count within a chapter or file. When a format is specified, uses that numeric format (1 = decimal, i = lowercase roman numerals, I = uppercase roman numerals, a = lowercase ascii, A = uppercase ascii) for the page count.
$DATE
Inserts the current date.
$HEADING
Inserts the current heading.
$LOGOIMAGE
Inserts the logo image; all other text in the string will be ignored.
$PAGE
$PAGE(format)
Inserts the current page number. When a format is specified, uses that numeric format (1 = decimal, i = lowercase roman numerals, I = uppercase roman numerals, a = lowercase ascii, A = uppercase ascii) for the page numbers.
$PAGES
$PAGES(format)
Inserts the total page count. When a format is specified, uses that numeric format (1 = decimal, i = lowercase roman numerals, I = uppercase roman numerals, a = lowercase ascii, A = uppercase ascii) for the page count.
$TIME
Inserts the current time.
$TITLE
Inserts the document title.

FONT Attributes

Limited typeface specification is currently supported to ensure portability across platforms and for older PostScript printers:

Requested FontActual Font
ArialHelvetica
CourierCourier
HelveticaHelvetica
MonospaceCourier
Sans-SerifHelvetica
SerifTimes
SymbolSymbol
TimesTimes

All other unrecognized typefaces are silently ignored.

Headings

Currently HTMLDOC supports a maximum of 1000 chapters (H1 headings). This limit can be increased by changing the MAX_CHAPTERS constant in the config.h file included with the source code.

All chapters start with a top-level heading (H1) markup. Any headings within a chapter must be of a lower level (H2 to H15). Each chapter starts a new page or the next odd-numbered page if duplexing is selected.

Note:

Heading levels 7 to 15 are not standard HTML and will not likely be recognized by most web browsers.

The headings you use within a chapter must start at level 2 (H2). If you skip levels the heading will be shown under the last level that was known. For example, if you use the following hierarchy of headings:

the table-of-contents that is generated will show:

Numbered Headings

When the numbered headings option is enabled, HTMLDOC recognizes the following additional attributes for all heading elements:
VALUE="#"
Specifies the starting value for this heading level (default is "1" for all new levels).
TYPE="1"
Specifies that decimal numbers should be generated for this heading level.
TYPE="a"
Specifies that lowercase letters should be generated for this heading level.
TYPE="A"
Specifies that uppercase letters should be generated for this heading level.
TYPE="i"
Specifies that lowercase roman numerals should be generated for this heading level.
TYPE="I"
Specifies that uppercase roman numerals should be generated for this heading level.

Images

HTMLDOC supports loading of BMP, GIF, JPEG, and PNG image files. EPS and other types of image files are not supported at this time.

Links

External URL and internal (#target and filename.html) links are fully supported for HTML and PDF output.

When generating PDF files, local PDF file links will be converted to external file links for the PDF viewer instead of URL links. That is, you can directly link to another local PDF file from your HTML document with:

META Attributes

HTMLDOC supports the following META attributes for the title page and document information:

<META NAME="AUTHOR" CONTENT="..."
Specifies the document author.
<META NAME="COPYRIGHT" CONTENT="..."
Specifies the document copyright.
<META NAME="DOCNUMBER" CONTENT="..."
Specifies the document number.
<META NAME="GENERATOR" CONTENT="..."
Specifies the application that generated the HTML file.
<META NAME="KEYWORDS" CONTENT="..."
Specifies document search keywords.
<META NAME="SUBJECT" CONTENT="..."
Specifies document subject.

Page Breaks

HTMLDOC supports four new page comments to specify page breaks. In addition, the older BREAK attribute is still supported by the HR element: Support for the BREAK attribute is deprecated and will be removed in a future release of HTMLDOC.

Tables

Currently HTMLDOC supports a maximum of 200 columns within a single table. This limit can be increased by changing the MAX_COLUMNS constant in the config.h file included with the source code. HTMLDOC supports HTML 3.0 tables with the following exceptions:

HTMLDOC does not support HTML 4.0 table elements or attributes, such as TBODY, THEAD, TFOOT, or RULES.