The preprocessor was introduced in version 1.05 to allow users more flexibility in the HTML they generate. As such it moves AscToHTM towards being an HTML authoring tool, as opposed to a simple text conversion or migration tool. Although this wasn't AscToHTM's original intention, it is increasingly the use to which AscToHTM's "power users" are putting it. As such this is a rapidly growing area of functionality within the product.
The preprocessor looks for lines that begin with a special character sequence. Presently this is "$_$_", but this will become configurable in later versions.
Preprocessor lines are not normally output to the HTML generated. Instead they are used to modify AscToHTM's behaviour in a number of ways.
The pre-processor can be used to mark sections in your document so that AscToHTM will process them as you wish.
- Note:
- AscToHTM does attempt to spot much user-formatted text automatically, but this is a difficult area and prone to error. Hence the use of these directives can reduce the error rate on such occasions.
The SECTION directive is used to divide the document up into named section types. Section type names can be repeated through the document, and by default text is assumed to belong to a section called "all", indicating that this text is always copied to the output file.
Section type names must contain no white space, but may contain underscores.
This has no effect unless the user supplies a policy file indicating that they wish to select only certain section types for output.
For example, if the text document looks like this
    Some text that'll always get copied, because it is in an "all"
    section type by default.

    $_$_SECTION Private
    Some text that will be copied either when the preprocessor is
    switched off, or when the user's policy file indicates that
    "private" section types are to be included.

    $_$_SECTION Other
    Likewise, this is an "other" section type.

    $_$_SECTION Private
    And here's some more "private" text.

    $_$_SECTION all
    Some text that will always get copied because it is explicitly
    in an "all" section type.
If the user then supplies a document policy file which includes the lines (see 6.3.5)
    [Preprocessor]
    Use Preprocessor : Yes
then the two section types marked "private" won't be copied into the converted file unless the line
Include document section : Private
is added to the policy file. Similarly with the "other" section.
- Note_1:
- Strictly speaking the "use preprocessor" line above isn't needed as this is set to "yes" by default. This means that any $_$_SECTION lines will cause text to be omitted unless you supply an appropriate policy file.
- Note_2:
- Be aware that any sections omitted are also omitted from the analysis pass. This may have unexpected results as AscToHTM responds only to the input text that is to be included in the output.
The BEGIN_TABLE ... END_TABLE directives are used to bracket a table in the source text. AscToHTM will then attempt to analyse this table as best it can.
This is explained more in the AscToTab documentation.
Inside this section you can use other TABLE pre-processor commands to tailor the HTML generated (see 7.4).
Similarly, the BEGIN_DELIMITED_TABLE ... END_DELIMITED_TABLE directives can be used to delimit a series of tab-delimited data values that should be interpreted as a table (e.g. data originally exported from a spreadsheet such as Excel).
The presence of these directives overrides any value set in the "Attempt table generation" policy.
The BEGIN_CONTENTS ... END_CONTENTS directives are used to bracket a contents list in the source document. AscToHTM will attempt to automatically detect the presence and location of any contents list in the document, but the algorithm can be problematic. Use this markup only when the document contains a contents list that AscToHTM fails to detect correctly.
See the discussion in 5.6.2.
The BEGIN_HTML ... END_HTML directives are used to bracket actual HTML in the source document.
The bracketed HTML will be transcribed to the output file unconverted. This device will allow you to embed images, tables and other HTML constructs not normally generated by AscToHTM.
This is how the image to the right has been added.
If you simply wish to insert a single line of HTML, the HTML_LINE command (see 7.3.2) offers a more compact form.
For in-line HTML, use the HTML in-line tag (see 8.2.7).
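As a minimal sketch, a bracketed HTML block might look like the following. It assumes each directive sits on its own line with the "$_$_" prefix, as in the SECTION examples; the image file name logo.gif is purely hypothetical.

```text
$_$_BEGIN_HTML
<P ALIGN="center">
<IMG SRC="logo.gif" ALT="Company logo">
</P>
$_$_END_HTML
```

Everything between the two directive lines would be passed through to the output file as written.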
The BEGIN_CODE ... END_CODE directives are used to bracket a piece of sample code in the source text.
AscToHTM will either render this in <PRE> ... </PRE> markup or <CODE> ... </CODE> markup (see the discussion about the policy "Use <CODE>..</CODE> markup" to see why the former is used as default).
The BEGIN_DIAGRAM ... END_DIAGRAM directives are used to bracket a piece of ASCII art or text diagram in the source text.
AscToHTM will render this in <PRE> ... </PRE> markup.
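The CODE and DIAGRAM brackets might be used as sketched below. This assumes each directive is written on its own line with the "$_$_" prefix; the code fragment and the diagram are made-up sample content.

```text
$_$_BEGIN_CODE
for (i = 0; i < 3; i++)
    printf("%d\n", i);
$_$_END_CODE

$_$_BEGIN_DIAGRAM
+--------+      +--------+
| source | ---> | output |
+--------+      +--------+
$_$_END_DIAGRAM
```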
The BEGIN_PRE ... END_PRE directives are largely replaced by the TABLE, CODE and DIAGRAM directives. They are maintained for backwards compatibility, and have the same effect as the DIAGRAM commands (see 7.1.6).
New in version 3.2
The BEGIN_IGNORE ... END_IGNORE directives delimit a section of text that should be ignored. This could be used to place comments in the source file, or to mark text that shouldn't be converted when the file is being generated by some third party software package.
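For instance, a comment could be hidden from the conversion roughly as follows. This is a sketch that assumes the directives take the usual "$_$_" prefix and each occupy a line of their own; the surrounding text is invented.

```text
This paragraph is converted as normal.
$_$_BEGIN_IGNORE
Reminder to self: check the figures below before publishing.
$_$_END_IGNORE
This paragraph is converted as normal too.
```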
The TITLE directive allows you to specify the <TITLE>...</TITLE> to be inserted into the <HEAD> section of the output page. This title will appear in the browser's title bar whenever the page is viewed, and will be the text shown in your browser's history.
The presence of a TITLE command overrides any title specified in a policy file (see 6.3.1).
To fully understand how titles are calculated, see the discussion in 5.6.1.
The DESCRIPTION directive allows you to specify a description of your document that is added to a META tag inserted into the <HEAD> section of the output page(s) as follows :-
<META NAME="description" CONTENT="your description">
This tag is often used by search engines (e.g. AltaVista) as a brief description of the contents of your page. If omitted the first few lines may be shown instead, which is often less satisfactory.
The presence of a DESCRIPTION pre-processor command overrides any description specified via a "Document description" policy line.
The KEYWORDS directive allows you to specify keywords that are added to a META tag inserted into the <HEAD> section of the output page(s) as follows :-
<META NAME="keywords" CONTENT="your list of keywords">
This tag is often used by search engines when indexing your HTML page. You should add here any relevant keywords possibly not contained in the text itself.
The presence of a KEYWORDS pre-processor command overrides any keywords specified via a "Document keywords" policy line.
The STYLE_SHEET directive allows you to specify the URL of a style sheet file, usually with a .css extension. Style sheet files are a new HTML feature that allow you to specify fonts and colours to be applied to your document.
The resulting HTML is inserted into the <HEAD> section of the output page(s) as follows :-
<LINK REL="STYLESHEET" HREF="URL" TYPE="text/css">
The presence of a STYLE_SHEET pre-processor command overrides any style sheet specified via a "Document style sheet" policy line.
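Taken together, the TITLE, DESCRIPTION, KEYWORDS and STYLE_SHEET commands might appear in a source file as sketched below. The exact argument format is an assumption here (the "$_$_" prefix, then the command name, then the value on the rest of the line), and all of the values shown are hypothetical.

```text
$_$_TITLE Widget User Guide
$_$_DESCRIPTION Installation and usage notes for the Widget utility
$_$_KEYWORDS widget, installation, usage, troubleshooting
$_$_STYLE_SHEET styles/manual.css
```

Keeping these lines in the source file itself means the document carries its own <HEAD> details, rather than relying on a separate policy file.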
This directive allows you to specify the name of a source file to be included at this point. This is useful if you wish some standard text inserted into many related documents, or into the same documents at many locations.
The included file will be treated as though it were part of the original file during both the analysis and output passes.
The include will fail if the file cannot be found, and a test for recursive include files will be made.
The HTML_LINE directive allows you to embed a single line of HTML in your source file. The rest of the line is copied across faithfully to the output file.
Essentially this offers the same functionality as the HTML section commands (see 7.1.4), but in a more compact form.
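A single-line example might look like this, assuming the command takes the usual "$_$_" prefix; the particular HTML shown is only an illustration.

```text
$_$_HTML_LINE <HR WIDTH="50%" ALIGN="center">
```

Here everything after the command name would be copied to the output file unchanged.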
New in version 3.2
These commands allow you to add navigational aids to your document.
The CONTENTS_LIST command inserts a contents list at the present location. When this is present the normal generation of a contents list at the top of the document is suppressed.
The CONTENTS_LIST directive may also be supplied as an in-line tag (see 8.2.3). The same user arguments apply.
The NAVIGATION_BAR command inserts a navigation bar that takes you to the next, previous and contents files. This will only be generated when you have chosen to split your file by setting the "Split level" policy.
New in version 3.2
The LINERULE directive allows you to insert a horizontal line into your text. It has the syntax:-
LINERULE <length>,<thickness>
where
    <length>     length of line in pixels/pts
    <thickness>  thickness of line in pixels/pts
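As an illustration, and assuming the directive takes the usual "$_$_" prefix, the following line would request a rule of length 300 and thickness 2. The values are arbitrary.

```text
$_$_LINERULE 300,2
```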
New in version 3.2
The TOC directive marks a point in the file that will receive an anchor point, and then be linked to from any generated contents lists.
This can be useful to index non-headings like key diagrams and tables.
The syntax is:
TOC <level>, <link name>, <display text>
where,
- <level>
The level in the TOC, starting with 1 being the most significant, equivalent to "chapter".
- <link name>
The (usually short) name by which this link point may be known. This is the value used to create an ANCHOR point, and which may be referenced in any HYPERLINK tag.
- <display text>
The text to be shown in the TOC. This will also be used to generate an ANCHOR name, and may be used in a TOC type HYPERLINK tag, although this is marginally less portable than referencing the link name. If omitted, defaults to the link name, and only one ANCHOR is created.
See also the section on HYPERLINK tags (8.2.9).
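For example, a table that should appear in the contents list at the second level might be marked as sketched below. The "$_$_" prefix is assumed, and the link name and display text are hypothetical.

```text
$_$_TOC 2, sales_table, Table 3: Annual sales figures
```

Under the syntax above, sales_table is the link name that a HYPERLINK tag could reference, and the remaining text is what a generated contents list would display.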
These directives are used to tailor the HTML generated in any tables AscToHTM creates. They are placed either
- At the top of the file
Directives placed here become defaults for the whole file, and will replace any policies that have been set (see 6.3.7)
- Inside a BEGIN_TABLE ... END_TABLE section
Directives placed here will apply only to the table marked up by these commands (see 7.1.2).
The table commands are described (naturally enough) in the following table.
| Directive | Value | Effect |
|---|---|---|
| TABLE_BGCOLOR | Colour | Colour of background |
| TABLE_BORDER | Number | Size of border. 0 = None |
| TABLE_BORDERCOLOR | Colour | Colour of border |
| TABLE_CAPTION | Text | Table caption. Added centred at the top |
| TABLE_CELL_ALIGN | Align | Specifies the default alignment of cells. Left, right or center |
| TABLE_CELLSPACING | Number | Spacing between cells. |
| TABLE_CELLPADDING | Number | Padding inside each cell |
| TABLE_COLOUR_ROWS or TABLE_COLOR_ROWS | (none) | If present this specifies that the odd and even rows of the table should be coloured differently. See also the "Colour data rows" policy. |
| TABLE_CONVERT_XREFS | (none) | If present, indicates that any section cross-references in the table may be converted to hyperlinks (see also the policy line "Convert TABLE X-refs to links") |
| TABLE_EVEN_COLOUR or TABLE_EVEN_COLOR | Colour | When data rows are to be coloured this specifies the colour of the even numbered rows. |
| TABLE_HEADER_ROWS | Number | Number of header rows. These will be placed in <TH> .. </TH> markup |
| TABLE_HEADER_COLS | Number | Number of header columns. These will be marked up in bold |
| TABLE_MAY_BE_SPARSE | (none) | If present, indicates that the TABLE may be sparse (see also the policy "Expect sparse tables") |
| TABLE_MIN_COLUMN_SEPARATION | Number | Number of spaces to be taken as a column separator when analysing the table (see also the policy "Minimum TABLE column separation"). |
| TABLE_ODD_COLOUR or TABLE_ODD_COLOR | Colour | When data rows are to be coloured this specifies the colour of the odd numbered rows. |
| TABLE_WIDTH | Text | The width of the table (see also the policy "Default TABLE width") |
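As a rough sketch, table directives placed inside a BEGIN_TABLE ... END_TABLE section might look like this. It assumes each directive sits on its own line with the "$_$_" prefix and is followed by its value; the caption and the data are invented.

```text
$_$_BEGIN_TABLE
$_$_TABLE_BORDER 1
$_$_TABLE_HEADER_ROWS 1
$_$_TABLE_CAPTION Quarterly sales
Region     Q1     Q2
North      10     12
South       8      9
$_$_END_TABLE
```

Placed like this, the three TABLE_ directives would apply to this one table only. The same lines placed at the top of the file would instead act as defaults for every table in the document.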
Colours must be HTML-acceptable values, which will be placed in the various attributes of the <BODY> tag and other tags.
You can enter any value acceptable to HTML. Normally a value is expressed as a "#" and a 6-digit hexadecimal value in the range #000000 (black) to #FFFFFF (white), but certain colours such as "white", "blue", "red" etc may also be recognised by HTML. AscToHTM simply transcribes your value into the output file.
A value of "none" signals the defaults are to be used. By default AscToHTM changes the background colour to be white (the true HTML default is a light gray whose value is "#C0C0C0").
- NOTE:
- This feature has the potential to cause mayhem, and as such is offered to users on an "as is" basis. That is, we offer no support for getting this feature to have the effect a user may desire.
The CHANGE_POLICY directive allows you to change a particular policy in part of a document. This is a potentially powerful feature, allowing you to tailor the conversion of your file in different sections of that file, or to embed the policy particular to a file in commands inserted at the top of the file itself.
The syntax of the command line is
$_$_CHANGE_POLICY <Policy Line>
where <Policy Line> is a policy line as it would appear in a policy file, and (usually) as it appears in the Policy manual.
For example the following would all be valid directives
    $_$_CHANGE_POLICY Background Colour : red
    $_$_CHANGE_POLICY Ignore multiple blank lines : Yes

However, how and when they would take effect will depend on the policy.
For example, the background colour would only take effect if splitting the file up, and only on the next file generation. This works, BTW, so if anyone wants to split a file into many pages, all different colours, then be my guest.
There are many caveats to this behaviour :-
- Not all policies are supported
Not all policies may be changed in this way. In particular, policies that open other policy files are not supported. Even if a policy is "changed", it does not follow that changing the policy will have an effect.
- analysis policies
It is unlikely that this feature can be sensibly used to influence the analysis of a file, other than when placed at the top of the file only. Used in such a manner it is simply an alternative to using a separate policy file.
- output policies
Output policies are referenced at different times. Only those that are referenced after the line is read from the source file may be influenced, so changes to things like the output file name may have no effect.
- toggleable policies
Not all policies, once changed, can be changed back. This is particularly true of policies that contain values to be added to a list. This is an issue that may be addressed in later versions.
- unpredictable behaviour
Messing with policies can cause unpredictable behaviour. For example, if you alter the section splitting parameters, then the chances of a section cross-reference elsewhere in the document being calculated as a correct hyperlink diminish.
That's why this feature is offered UNSUPPORTED.
- readahead buffer
To further complicate matters, AscToHTM uses a readahead, write behind buffer which means that you may need to experiment with the placing of your policy change to within 40 lines (the size of the buffer).
This problem is alleviated since version 3.2.
Converted from a single text file by AscToHTM © 1997-99 John A. Fotheringham