version 1.06, 22-DEC-1997
Introduction
Building instructions
Command line usage
Command file usage
Comment lines
Variables
Reserved and special variables
Scope of variables and macros
Command pass through
Functions
f$out
f$in
f$read,f$write
f$exit,f$break
f$date
f$file_info
f$type
f$evaluate,f$<- Math, String, and Logical Operations
Macros
f$macro_record
f$macro_break, f$macro_return
f$macro_repeat
If structures
Loop structures
<<>> embedded substitution tags
Copyright
Reporting bugs, getting more information
Example miniproc script (testfile.mpc)
Introduction
Miniproc is a tiny preprocessor for use in formatting HTML
or other documents, and performing other similar tasks. It
may be used to embed preprocessing information for any
language into the source code, so that platform-specific
versions of the final code are produced when the source code
is processed. (Like the C preprocessor, but it will work
with any language.) Miniproc scripts are case sensitive in
all locations (variable names, function names, macro names).
In order to keep Miniproc very small the script syntax is
extremely rigid. Although this can make Miniproc scripts a
bit ugly, it also eliminates many common coding problems
(for instance, incorrectly nested if/then/else constructs,
or use of = for ==, or operator precedence problems). That
isn't to say it isn't possible to miscode a Miniproc script,
just that miscoded scripts will usually exit with an error
message - which is better than going on to do the wrong
thing.
It is called "miniproc" because it is a mini processor, and
because the term "miniproc" appears not to be in general use
at the time of writing (fewer than 10 hits in AltaVista on
22-NOV-1997).
Building instructions
Miniproc is completely contained in miniproc.c. It is an
ANSI C program, and compiles cleanly on OpenVMS and Irix
with the pickiest ANSI C compiler settings. Just compile it
in ANSI C mode, link it (if that's a separate step on your
OS), and run it.
Command line usage
(The exact syntax varies with the operating system; add quotes,
slashes, etc. as required to pass the double quotes shown
here):
miniproc input.mpc int1=123 s1="a string" s2=&123
That is, the first parameter is the name of the first input
file to open. If it isn't provided, the program prompts
for one. Subsequent parameters are pairs of
"variablename"="variablecontents"; they are equivalent to
input file commands like:
#__int1=123
#__s1="a string"
#__s2=&123
Variable names are case sensitive, and you may need to use
OS specific quoting to get the desired results. Variables
may be defined on the command line, but neither macros nor
functions may be invoked.
Command file usage
Miniproc scripts or command files consist of 4 types of lines:
1. Pass through. A line is read from the current input
file, substitutions are performed, and the line is
written to the current default output file. Pass
through lines may not be continued. The substitution
tag is <<>>: <<name>> is replaced by the contents of the
variable name at that position in the line. These
lines may not begin with "#__", the command line indicator.
2. Commands. These begin with "#__" and may be continued
by placing a final "-" as the last character on the
line, with the final line lacking this character. There
is only one command per command line. The maximum length
of a command line is 2048 characters. There are exactly
6 types of command line:
#__! this is a comment                 Comment lines
#__var1=var2                           Variable Assignment
#__macroname parameter parameter       Macro invocations
#__f$whatever parameter parameter      Function invocations
#__if/elseif/else/endif test           If structures
#__"#__command"                        Command pass through
Commands are read in, continued lines are assembled into
a single command, substitutions are performed, and then
the final command line is passed to the command line
interpreter. Trailing spaces and tabs on command lines
are ignored, as are spaces and tabs between "#__" and
the command. Multiple spaces and tabs will be reduced to a
single space between each token in a command line, and
for assignments, white space around "=" is allowed, but
has no effect.
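The assembly and white-space rules above can be illustrated with a short C sketch. C is the language miniproc itself is written in. The sketch was written from this description rather than taken from miniproc.c, and the names assemble_command, trim_trailing and MAXCMD are hypothetical. It makes three assumptions:

- continued lines are joined with no space added;
- the trailing "-" is discarded;
- double-quoted literals get no special treatment.

```c
#include <string.h>

#define MAXCMD 2048  /* maximum command length given in the text */

/* Remove trailing spaces and tabs from s; return the new length. */
static size_t trim_trailing(char *s)
{
    size_t n = strlen(s);

    while (n > 0 && (s[n - 1] == ' ' || s[n - 1] == '\t'))
        s[--n] = '\0';
    return n;
}

/* Join a command that starts at lines[0] and may continue over the
   following lines, then normalize its white space into out (which
   must hold at least 1 byte). Returns the number of input lines
   consumed, or 0 if the command does not fit. */
static int assemble_command(const char *lines[], int count,
                            char *out, size_t outsize)
{
    char joined[MAXCMD + 1];
    const char *p;
    size_t len = 0, o = 0, start;
    int used = 0, pending = 0;

    joined[0] = '\0';
    while (used < count) {
        size_t n = strlen(lines[used]);

        if (len + n > MAXCMD)
            return 0;                          /* command too long */
        memcpy(joined + len, lines[used], n + 1);
        used++;
        len += trim_trailing(joined + len);    /* trailing blanks ignored */
        if (len == 0 || joined[len - 1] != '-')
            break;                             /* no "-": last line */
        joined[--len] = '\0';                  /* drop "-", keep joining */
    }

    p = joined;
    if (strncmp(p, "#__", 3) == 0) {
        if (outsize < 4)
            return 0;
        memcpy(out, "#__", 3);                 /* keep the indicator */
        o = 3;
        p += 3;
    }
    start = o;
    for (; *p != '\0'; p++) {
        if (*p == ' ' || *p == '\t') {
            pending = 1;                       /* remember the gap */
            continue;
        }
        if (o + 2 >= outsize)
            return 0;
        if (pending && o > start)
            out[o++] = ' ';                    /* one space between tokens */
        pending = 0;
        out[o++] = *p;
    }
    out[o] = '\0';
    return used;
}
```

Given the lines "#__f$out myfile -" and "  1", a tab, "append", this sketch consumes both lines and produces "#__f$out myfile 1 append". Any blanks directly after "#__" are dropped.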
Comment lines
#__! Comment; the rest of the line is ignored
Variables
There is no limit on the number of variables. Variables are
created when they first appear to the left of "=". (There
are also some predefined variables; see Reserved
and special variables, below.) Variables can hold either
strings or integers. For most variables, once the type is
set it cannot be changed; for certain special predefined
variables it can be. String variables can be used
as pointers to other string or integer variables. Multiple
levels of indirection can be obtained by prepending as many
"*" as needed. Variable names may NOT start with a digit; a
leading digit indicates an immediate integer value.
There are two ways to indicate "what follows is a string
literal": enclose it in double quotes, or prefix it with
"&". If the string literal has no spaces or tabs they are
equivalent in all usages. However, if the string literal
contains trailing spaces or tabs, then you must use the ""
form to prevent them from being trimmed off. Furthermore,
except in variable assignment statements, the literal area
for & and " extends only to the next space, tab, or end of
line. Consequently, and this is IMPORTANT: "this is a
string" will only be treated as a single string in a
variable assignment statement - anyplace else it will be
broken up into the separate tokens: ["this] [is] [a]
[string"].
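The two delimiting rules can be shown in a minimal C sketch. It assumes nothing beyond what is described here, is not taken from miniproc.c, and uses the hypothetical names assignment_literal and split_tokens. In an assignment, the value runs from the first double quote on the line to the last one. Anywhere else, a token stops at the next space or tab.

```c
#include <string.h>

/* Assignment rule: copy the text between the first and the last '"'
   on the line into out. Returns 1 on success, 0 if there are fewer
   than two quotes or out is too small. */
static int assignment_literal(const char *line, char *out, size_t outsize)
{
    const char *first = strchr(line, '"');
    const char *last = strrchr(line, '"');
    size_t n;

    if (first == NULL || last == first)
        return 0;
    n = (size_t)(last - first - 1);
    if (n + 1 > outsize)
        return 0;
    memcpy(out, first + 1, n);
    out[n] = '\0';
    return 1;
}

/* Rule everywhere else: split buf (modified in place) at spaces and
   tabs, storing up to maxtok token pointers. Quotes do not group. */
static int split_tokens(char *buf, char *tok[], int maxtok)
{
    int n = 0;
    char *p = buf;

    while (*p != '\0' && n < maxtok) {
        while (*p == ' ' || *p == '\t')
            p++;                          /* skip the gap */
        if (*p == '\0')
            break;
        tok[n++] = p;                     /* token starts here */
        while (*p != '\0' && *p != ' ' && *p != '\t')
            p++;
        if (*p != '\0')
            *p++ = '\0';                  /* terminate the token */
    }
    return n;
}
```

Applied to "this is a string" (quotes included), split_tokens returns the four tokens ["this], [is], [a] and [string"].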
Examples:
#__name="string" put the string literal into name
(without the outermost set of quotes). If name doesn't
exist, create it. The type is determined by the value
it will hold. Double quotes mark off a region from the
first double quote on the line to the last, so that
="foobar" "foobar" "boo" will store the string:
foobar" "foobar" "boo
This is different from most other languages! Even
though Miniproc is written in C, \t, \n and so forth
have no special meaning in strings.
#__name=&string put the string literal into name
#__name="" reset name value to an empty string
#__name=& same as above (empty string)
#__name=name2 copy contents of name2 into name
#__name=&12 put the string "12" into name
#__count=12 put the integer 12 into count
#__name2=&count name2 contains "count"
#__pointme1=&name2 pointme1 contains "name2"
#__pointme2=&count pointme2 contains "count"
#__name3=*pointme1 name3 takes on the value of name2 = "count"
#__count2=*pointme2 count2 takes on the value of count
#__count2=**pointme1 count2 takes on the value of count
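How a chain of "*" resolves can be sketched in C with a hypothetical fixed table that holds the variables from the examples above. This is not miniproc's internal data structure, and lookup and resolve are illustrative names. The sketch only shows that each "*" replaces a string variable by the variable it names.

```c
#include <string.h>

/* Hypothetical variable record: either a string or an integer. */
struct var {
    const char *name;
    int is_string;        /* 1 = string, 0 = integer */
    const char *sval;
    int ival;
};

static const struct var *lookup(const struct var *tab, int n,
                                const char *name)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/* Resolve a reference such as "**pointme1". Returns NULL if a step
   names an undefined variable or tries to follow an integer. */
static const struct var *resolve(const struct var *tab, int n,
                                 const char *ref)
{
    int stars = 0;
    const struct var *v;

    while (ref[stars] == '*')
        stars++;
    v = lookup(tab, n, ref + stars);
    while (v != NULL && stars-- > 0) {
        if (!v->is_string)
            return NULL;          /* only strings can act as pointers */
        v = lookup(tab, n, v->sval);
    }
    return v;
}
```

With count=12, name2="count", pointme1="name2" and pointme2="count" in the table, **pointme1 follows pointme1 to name2 and name2 to count, so it yields the integer 12. *pointme1 stops at name2, which yields the string "count".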
Reserved and special variables
Some variables are reserved and have special meanings
and uses. These are:
STATUS
Integer returned by macros, functions, and command
files.
0 = failed, anything else = ok.
Functions usually return 1 for ok.
To set STATUS on macro exit use f$macro_return or
f$macro_break.
To set STATUS on input file exit use f$exit or f$break.
RESULT
The special variable used by the f$<- function to
return the results of a calculation. It may be
either an integer or a string.
trace
Integer. Set the trace level, for debugging.
This is a bit mask, any bit set causes that
operation to be logged to stdout.
(But the integer must be specified in decimal syntax!)
1 Log command lines
2 Log noncommand lines
4 Log variable creation
8 Log variable setting
16 Log macro invocation
32 Log function invocation
64 Log output lines (to stderr)
128 Log results of substitution passes
Default is 0, nothing is logged.
It is not possible to log command line actions!
subs
Integer
The number of levels of <<>> substitution to perform.
The default is 1, so if a new <<>> is created, it
will not be substituted. If the value is set higher
than 1, then after the first full pass through a
line a second or third pass will be made. Set it to
something very large and it will go until it cannot find
any more <<>>. If subs is set to 0 it will not do
any <<>> substitutions.
macrosubs
Integer
Similar to subs, but controls replacements while
macros are recording. The default level is 0 - no
replacements while macros are recording. It is
important to note that line continuation resolution
occurs during macro execution, so that if a
substituted variable is split across two lines it will not
be substituted during recording no matter what
macrosubs is set to.
safety
Integer
Set on command line ONLY to restrict actions taken by
possibly hostile input files. Bit map. Default is 0.
1 use only the part of a file name to the right of the
last /, \, ], > or : character, disabling paths
(except for the file name passed from the command line)
2 disables f$in
4 disables f$out (all output to stdout)
8 disables f$file_info
P1-P9
Special variables (integer or string)
Used to pass parameters into Macros.
MC1,2,3
MC1,2,3MAX
Integers.
These hold macro repeat count information.
See f$macro_repeat for more information.
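The effect of safety bit 1 can be sketched in C. The sketch was written from the rule above, not from miniproc.c, and strip_path and safe_name are hypothetical names. It keeps only the text after the last "/", "\", "]", ">" or ":". That one rule removes Unix, DOS/Windows and OpenVMS directory parts.

```c
#include <string.h>

/* Return the part of filename after the last path delimiter. */
static const char *strip_path(const char *filename)
{
    const char *base = filename;
    const char *p;

    for (p = filename; *p != '\0'; p++)
        if (strchr("/\\]>:", *p) != NULL)
            base = p + 1;             /* candidate start of bare name */
    return base;
}

/* Apply the rule only when bit 1 of the safety mask is set. */
static const char *safe_name(const char *filename, int safety)
{
    return (safety & 1) ? strip_path(filename) : filename;
}
```

For example, "../etc/passwd" becomes "passwd" and "DISK$USER:[WWW.DOCS]INDEX.HTML" becomes "INDEX.HTML". Because safety is a bit map, any odd value enables the rule, whichever other bits are also set.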
Scope of variables and macros
Variables and macros are global (visible in all modules)
unless they are explicitly created with local scope by
prefixing the name with a colon, as in ":var", in every
location where it is used. Give a macro a global name when
other input files will use the same macro; otherwise, give
it a local name. That way ":calculate" can be a different
macro in each input file.
Note that "var1" and ":var1" are different variables even
when declared in the same module. Local variables may not
be set on the command line. Scope rules are:
Global variable or macro "name":
visible in all input files and macros
internal name = "name"
Local variable or macro declared in "input.mpc",
but not inside a macro:
visible only in that input file
internal name = "^input.mpc^name"
Local variable or macro declared in "input.mpc",
inside a macro "amacro":
visible only in that macro inside that input file
internal name = "^^input.mpc^amacro^name"
(If you are familiar with DCL from OpenVMS, the scoping
rules for global versus local are exactly the same as for =
vs ==.)
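The internal-name scheme listed above can be sketched as a C function. This is not miniproc.c's code: internal_name is a hypothetical name, and macro is passed as NULL when the declaration is not inside a macro. The sketch also shows why "var" and ":var" never collide: they produce different internal names.

```c
#include <string.h>

/* Build the internal name for name as written (with or without the
   leading ':'), declared in input file file and, if macro is not
   NULL, inside that macro. Returns 0 if out is too small. */
static int internal_name(const char *name, const char *file,
                         const char *macro, char *out, size_t outsize)
{
    size_t need;

    if (name[0] != ':') {                 /* global: stored as written */
        need = strlen(name) + 1;
        if (need > outsize)
            return 0;
        strcpy(out, name);
        return 1;
    }
    name++;                               /* drop the ':' marker */
    if (macro == NULL) {                  /* local to an input file */
        need = 1 + strlen(file) + 1 + strlen(name) + 1;
        if (need > outsize)
            return 0;
        strcpy(out, "^");
        strcat(out, file);
        strcat(out, "^");
        strcat(out, name);
    } else {                              /* local to a macro in a file */
        need = 2 + strlen(file) + 1 + strlen(macro) + 1
               + strlen(name) + 1;
        if (need > outsize)
            return 0;
        strcpy(out, "^^");
        strcat(out, file);
        strcat(out, "^");
        strcat(out, macro);
        strcat(out, "^");
        strcat(out, name);
    }
    return 1;
}
```

The file name and the macro name are both part of the key. A ":var" looked up from another module therefore finds that module's own variable, if one exists, which is the hazard described next.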
It is a very bad idea to refer to local variables indirectly
through global variables. That is:
#__aglobal=&*:var In one module
#__whatever=aglobal later, in another module
if there is not a local variable ":var" in the new module
the reference will cause a fatal error. If there is a local
variable ":var" it will be referenced instead of the
original, which is probably not the intended use.
It is ok to pass the values of local variables into a macro.
As in:
#__somemacro :localvar
but local variables should not be passed by reference (by
name) for the same reason as described above.
Command pass through
Command pass through is a special shorthand for handling
lines that would normally be interpreted as commands.
Without the shorthand one of these two forms must be used:
#__var="#__some command line <<name>>"
#__f$write var
or
#__var="#__some command line <<name>>"
<<var>>
But these take two lines and require the creation of a
temporary variable. The shorthand forms are:
#__"#__some command line <<name>>"
#____some command line <<name>>
Functions
Functions cause certain actions to take place and most
change the value of one or more variables. All set the
variable STATUS (uppercase) when they return. For the rest
of this, string means either an explicit string, like
"string" or &string, or a string variable, like name.
Integer is either an explicit integer like 123, or an
integer variable like count.
f$out
f$in
f$read,f$write
f$exit,f$break
f$date
f$file_info
f$type
f$evaluate,f$<- Math, String, and Logical Operations
f$out
#__f$out filename [filenumber [disposition]]
Opens the file "filename" for output. Filename is
a variable name, or "string", or &string. With
no other parameters, it redirects the primary output
stream (filenumber 0) to the new file. Filenumbers
may be in the range 0-9, inclusive.
Disposition is a string variable and may be either
"new" or "append". Default is "new" - that is, the
output file is created when opened. On most
operating systems this will destroy any previous
versions, but if file versions are allowed it will
just create a new version. To use disposition you
must include a filenumber. f$out automatically
closes open files if a filenumber is reused. If
filename is an empty string, it closes that
filenumber.
f$in
#__f$in filename [filenumber]
Opens the file "filename" for input. Filename is
a variable name, or "string", or &string. With
no other parameters, it redirects the primary input
stream (filenumber 10) to the new file. Filenumbers
may be in the range 10-19, inclusive. The primary
input stream may be redirected up to 10 levels deep
with f$in commands. When a redirected stream executes
f$exit or f$break that input file is closed and the
input stream continues from the previous file.
Filenumbers 11-19 are automatically closed if reused.
This does not generate a warning or error. If
filename is an empty string, the file is closed
without opening another file.
Filenumber 10 may only be closed via f$exit or f$break.
f$read,f$write
#__f$read string filenumber
#__f$write string filenumber
Read or write a string variable from/to a filenumber.
Note that it is VERY DANGEROUS to read from
filenumber 10 (the command stream) since any mistakes
will corrupt the logic of the script it contains.
An input string may not be longer than the maximum input
line length, but the output string can be any size that the
operating system supports.
Read returns 1 if the read was normal, and 0
on any error or EOF. If the string is truncated
on read, it is a fatal error.
Write returns 1 for normal operation, 0 for
any error.
f$exit,f$break
#__f$exit integer [bang]
#__f$break integer [bang]
Close current input file and return integer status.
If status isn't specified, defaults to 1 (true).
If input stream has been redirected, return to last
input stream. When all input streams are closed the
program exits.
If the second parameter is present it causes an
immediate exit from the entire program, passing the
status value to the operating system. Either f$exit
or f$break may be used for this function anywhere in
a miniproc script.
Use f$exit to exit unconditionally from an input script.
f$exit checks for dangling bits from if/elseif/else
structures, indicating bad command file syntax. As a
consequence, it may not be used conditionally.
Use f$break to exit from within an if/elseif/else
structure. f$break does not check for dangling
if/elseif/else structures on exit. f$break may not be
used outside of such a structure.
Except when executing an unconditional program exit,
neither of these may be used within a macro.
f$date
#__f$date sets the following variables (implicitly)
day the day (Sun - Sat) (string)
month the month (Jan - Dec) (string)
dd the date (1-31) (integer)
mm the month (1-12) (integer)
wday day of the week (1-7) (integer)
yday day of the year (1-366) (integer)
yyyy the year (4 digit) (integer)
hour the hour (0-23) (integer)
minute the minute (0-59) (integer)
second the second (0-59) (integer)
unixtime the time in Unix format (seconds since 1970-01-01 00:00:00 UTC)
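One way to derive these values is from the standard C <time.h> broken-down time. The C sketch below is an assumption, not code quoted from miniproc.c, and struct mp_date and fill_date are hypothetical names. It also assumes that wday counts Sunday as 1, because the text does not say which day is first.

```c
#include <string.h>
#include <time.h>

/* Hypothetical holder for the f$date results. */
struct mp_date {
    char day[4];      /* "Sun" - "Sat" */
    char month[4];    /* "Jan" - "Dec" */
    int dd, mm, wday, yday, yyyy, hour, minute, second;
    long unixtime;
};

static void fill_date(const struct tm *t, time_t now, struct mp_date *d)
{
    static const char *days[] = {"Sun", "Mon", "Tue", "Wed",
                                 "Thu", "Fri", "Sat"};
    static const char *months[] = {"Jan", "Feb", "Mar", "Apr",
                                   "May", "Jun", "Jul", "Aug",
                                   "Sep", "Oct", "Nov", "Dec"};

    strcpy(d->day, days[t->tm_wday]);
    strcpy(d->month, months[t->tm_mon]);
    d->dd = t->tm_mday;
    d->mm = t->tm_mon + 1;        /* tm_mon counts from 0 */
    d->wday = t->tm_wday + 1;     /* assumed: 1 = Sunday */
    d->yday = t->tm_yday + 1;     /* tm_yday counts from 0 */
    d->yyyy = t->tm_year + 1900;  /* tm_year counts from 1900 */
    d->hour = t->tm_hour;
    d->minute = t->tm_min;
    d->second = t->tm_sec;
    d->unixtime = (long)now;      /* assumes time_t is Unix seconds */
}
```

Typical use is "time_t now = time(NULL);" followed by "fill_date(localtime(&now), now, &d);". Passing gmtime(&now) instead gives UTC values.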
f$file_info
#__f$file_info filename
sets the following variables for the file named in the immediate string
variable filename.
file_exists 1=true, 0=false
file_size In bytes. The size may not be exact on some
operating systems and for some types of files.
file_modified Time file was last modified, in Unix time
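The three values can be obtained from a single POSIX stat() call, as in the C sketch below. This is an assumption: stat() is not part of ANSI C, and this is not known to be how miniproc.c does it. The names struct mp_file_info and file_info are hypothetical, and the sketch simply reports 0 for size and time when the file does not exist.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical holder for the f$file_info results. */
struct mp_file_info {
    int file_exists;      /* 1 = true, 0 = false */
    long file_size;       /* bytes */
    long file_modified;   /* last modification, Unix time */
};

static void file_info(const char *filename, struct mp_file_info *fi)
{
    struct stat sb;

    if (stat(filename, &sb) != 0) {   /* missing or inaccessible */
        fi->file_exists = 0;
        fi->file_size = 0;
        fi->file_modified = 0;
        return;
    }
    fi->file_exists = 1;
    fi->file_size = (long)sb.st_size;
    fi->file_modified = (long)sb.st_mtime;
}
```

On systems without stat(), a portable fallback could open the file and measure it with fseek() and ftell(). That approach cannot report the modification time, though.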
f$type
#__f$type name
Returns the type of the variable named in the
immediate string value.
STATUS Meaning
0 not defined
1 integer
2 string
3 macro
f$evaluate,f$<- Math, String, and Logical Operations
#__f$evaluate result op operand operand operand ...
#__f$<- op operand operand operand ...
Evaluate an expression, which will produce a single
result using a single operator "op", and up to N
operands. The types of the operands and result must
match, and the result variable must already exist.
In general, operands can be either variables or
immediate values, except that strings containing
delimiters may only be used from within a variable.
Operations available are for integer math, boolean
logic, and string manipulation.
f$evaluate and f$<- are equivalent, except that
the result for f$<- is always stored in RESULT. The
f$<- form is primarily for use in if/else/elseif
structures, when the result should be tested and
then not used further.
"result" is buffered internally, so that any
variable may be both the result and an operand,
and the operation will always work as expected.
integer operands, integer result:
add result = op1 + op2 [...+ opN]
subtract result = op1 - op2 [...- opN]
multiply result = op1 * op2 [...* opN]
divide result = op1 / op2 [.../ opN]
power result = op1 ^ op2 (op1 raised to op2 power)
modulo result = op1 modulo op2
integer operands, integer/logical result (1=true, 0=false)
eq,neq,ge,le,lt,gt
result = if op1 (operator) op2 [...AND op1 (operator) opN]
logical operands, logical result
and result = op1 AND op2 [... AND opN]
or result = op1 OR op2 [... OR opN]
xor result = op1 XOR op2 [... AND (op1 XOR opN)]
not result = NOT op1
nand result = NOT (op1 AND op2 [... AND opN])
nor result = NOT (op1 OR op2 [... OR opN])
string operations, string result:
append result = op1 // op2 [... // opN]
uppercase result= uppercase(op1)
lowercase result= lowercase(op1)
element op1 holds index integer (1 is first)
op2 holds delimiter string (any
character from it delimits)
op3 holds delimited string
Result set to indicated token, or ""
if not valid, and STATUS set to false.
Example:
op1=4
op2=","
op3="a,b,c,d"
then result = "d", STATUS=1
But if op1=20, result="", STATUS=0
shortest Result is shortest string in op1,op2...
longest Result is longest string in op1,op2...
in case of a tie, the first one
encountered wins.
lexhigh
lexlow result = op with the highest/lowest lexical
values. operands are compared left to
right and if lengths don't match, the
shorter one is extended with zeroes.
head result = first op1 characters of op2
[//op3//op4...//opN]
if op1 > length of op2, all of op2.
tail result = last op1 characters of op2,
if op1 > length of op2, all of op2.
segment result = starting from position op1,
extract op2 characters from op3
(op4..opN)
If op2 > all string lengths,
then just to the last character in op3.
locate result = position in op2 that
matches the string in op1. If the
string isn't found, result=0.
eliminate result = op1, minus any characters in op2.
retain result = op1, keeping only characters in op2.
stringdel result=op1 minus any patterns that appear in
op2, ... opN, applied sequentially. Example:
op1=foobar
op2=ob
op3=oa (this forms when ob comes out)
result=fr
STATUS is 1 if no changes, 2 if changes.
resize op1 is an integer, changes the size of
the result string's memory area to
op1 characters, and replaces the last
character with a string terminator.
Size must be more than zero.
If the string is truncated, STATUS is
0, otherwise, 1. (op1=1 truncates a
string to the empty string.)
string operands, integer/logical result
compare result = 1 if op1 is exactly the same
as op2.
ccompare result = 1 if op1 differs from op2,
case doesn't count.
length result= length(op1) [ ... + length(opN)]
(does not include final \0 on string)
integer operand, string result
tostring result = string representation of integer op1.
op2 C formatting string, use "%d"
when in doubt. The final formatted
string may not be more than 31 characters in size.
(Use %c to store control characters like
bell (7) or escape (27).)
tointeger result (integer) = op1 (string). That is,
result = 123 when op1 = &123.
tohex like tointeger, but hexadecimal
tooctal like tointeger, but octal
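Two of the string operations, element and stringdel, are sketched in C below. They were written from the descriptions above rather than taken from miniproc.c, and the function signatures are hypothetical. Two behaviors are assumptions:

- For element, two adjacent delimiters enclose an empty token.
- For stringdel, after each removal the search for the same pattern resumes at the point of removal; the string is not rescanned from the start.

```c
#include <string.h>

/* element: copy token number index (1 is first) of str into out,
   where any character of delims ends a token. Returns 1 (STATUS
   true), or 0 with out set to "" if there is no such token. */
static int element(int index, const char *delims, const char *str,
                   char *out, size_t outsize)
{
    int k = 1;
    const char *start = str;
    size_t len;

    out[0] = '\0';
    if (index < 1)
        return 0;
    while (k < index) {                   /* skip index-1 tokens */
        start = strpbrk(start, delims);   /* next delimiter, or NULL */
        if (start == NULL)
            return 0;
        start++;
        k++;
    }
    len = strcspn(start, delims);         /* token runs to next delimiter */
    if (len + 1 > outsize)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}

/* stringdel: remove every occurrence of each pattern from str, one
   pattern after another, in place. Returns 2 if anything changed,
   otherwise 1 (the STATUS values given above). */
static int stringdel(char *str, const char *pats[], int npats)
{
    int i, changed = 0;

    for (i = 0; i < npats; i++) {
        size_t plen = strlen(pats[i]);
        char *hit;

        if (plen == 0)
            continue;                     /* nothing to remove */
        hit = strstr(str, pats[i]);
        while (hit != NULL) {
            memmove(hit, hit + plen, strlen(hit + plen) + 1);
            changed = 1;
            hit = strstr(hit, pats[i]);   /* resume at the cut */
        }
    }
    return changed ? 2 : 1;
}
```

Running stringdel on op1=foobar with op2=ob and op3=oa gives "foar" after the first pattern and "fr" after the second. The returned status is 2.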
Macros
Macros contain a series of lines, command or pass through,
and are permanent: each may be recorded exactly one time.
Macro names must start with a letter, and may not be the
same as a variable name. Macros are invoked by name. If the
name doesn't correspond to a known macro it is assumed to be
a string variable, with the value of that variable being the
macro's name. Examples:
foobar Execute the macro named foobar
string Execute the macro named in the string variable
*string Execute the macro pointed to by the string variable.
Macros accept up to 9 parameters which are passed by value.
(To pass a variable by name just enclose it in double quotes
or precede it with a &).
#__name "foo" &boo name2 1 count
The preceding line says execute the macro "name", and pass
it the string literals foo and boo (which may be the names
of other variables), the value of name2 and count, and the
integer value 1. Parameters show up inside a Macro in
variables P1 - P9. These are special variables, and may
contain either strings or integers. They may not contain a
macro, but may contain the name of a macro. Since P
variables are globals, if a macro will invoke another macro,
it must first save the contents of the P variables in named
variables. To pass empty string literals, use string
variables or bare & operators:
#__null=""
#__name &foo null null null 10
or either of these forms
#__name &foo & & & 10
#__name "foo" "" "" "" 10
but this won't work as expected
#__name "foo" " " " " " " 10
as the use of " " to enclose spaces is only
allowed in a variable assignment statement.
For macros (but not functions) you can use local
variables to pass a string which, in effect, contains
spaces. Pass the values like this:
#__name "foo<<:s>>has<<:s>>spaces"
and inside the macro "name" (but NOT in the calling
routine) make this local variable assignment:
#__ :s = " "
If subs is at least 2, this line in the macro:
P1 is [<<P1>>]
would be substituted out to:
P1 is [foo has spaces]
f$macro_record
#__f$macro_record name [deck]
Create and begin recording a macro. When a macro
is recording it goes in verbatim, with no
substitutions or other expansions performed.
Only one macro may be recording at a time.
The name is a literal string, the only way to
change it during execution is by <<var>>
substitution. deck is also a string literal.
Deck terminates the macro when it appears on a line
like #__deck. If deck is not supplied it
defaults to "f$macro_end".
It is a fatal error to try to rerecord a macro,
so if there is any chance that a file will be
reexecuted during a single run, protect the macros
as you would C header files, like this:
#__ifnot a f$type macroname
#__f$macro_record macroname deck
...(macro contents)...
#__f$macro_return 1
#__deck
#__endif a
f$macro_break, f$macro_return
#__f$macro_break status
#__f$macro_return status
All macros MUST end with an f$macro_return command.
It marks the end of the macro, and handles updating
any counters that are active in that macro, and it
also checks syntax for dangling if/elseif/else/endif
constructs.
Use f$macro_break to immediately terminate a macro
and return to the calling script or macro.
f$macro_break does not check for syntax of
incomplete if structures.
Macros return a status value in the integer
variable STATUS. If it isn't explicitly set it
comes back as 1 (true).
f$macro_repeat
#__f$macro_repeat name [int1 [int2 [int3]]]
Defines up to 3 repeat counters that are initialized
each time the named macro is executed. These are
named MC1, MC2, and MC3, with corresponding range
limits of MC1MAX, MC2MAX, and MC3MAX. These are
read-only integer variables. (Actually, you can
rewrite their values, but they are reset on each
repeat through the macro without regard to your
actions.) The default setting for f$macro_repeat is
that the macro executes once.
#__f$macro_repeat foobar 3 2
Means that the macro command
#__foobar
would execute 6 times, and while it did so the
counter MC1 would count from 1 to 3, and for each
of those, the counter MC2 would count from 1 to 2.
#__f$macro_repeat foobar 0
Disables the macro foobar. The next instance of
#__foobar
would be skipped, without even touching the STATUS
variable.
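The counter order can be modeled as three nested loops, with MC1 outermost and MC3 innermost. The C sketch below was written from this description and is not taken from miniproc.c; repeat_schedule is a hypothetical name. It assumes that any counter not given on the f$macro_repeat line behaves as if it were 1. It records the (MC1, MC2, MC3) values for each execution and returns how many times the macro body runs.

```c
/* Fill mc[] (up to maxrows rows) with the counter values seen on each
   pass and return the number of passes; 0 if any limit is 0. */
static long repeat_schedule(int max1, int max2, int max3,
                            int mc[][3], long maxrows)
{
    long runs = 0;
    int i, j, k;

    for (i = 1; i <= max1; i++)           /* MC1: outermost */
        for (j = 1; j <= max2; j++)       /* MC2 */
            for (k = 1; k <= max3; k++) { /* MC3: innermost */
                if (runs < maxrows) {
                    mc[runs][0] = i;
                    mc[runs][1] = j;
                    mc[runs][2] = k;
                }
                runs++;
            }
    return runs;
}
```

For "#__f$macro_repeat foobar 3 2", the call repeat_schedule(3, 2, 1, ...) returns 6. The recorded pairs run (1,1), (1,2), (2,1), ... (3,2), matching the description above.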
If structures
#__ifnot label test
#__if label test
#__elseif label test
#__elseifnot label test
#__else label
#__endif label
If, elseif, else structure.
Label is an arbitrary immediate string, case sensitive.
If a variable is to be used for the label it must
be substituted all the way to a value, that is
#__if <<alabel>> test
The function of the label is to allow detection of
overlapping if structures at run time.
The "not" forms invert the logic of the test.
Test type interpretation
int 0 = false, anything else is true
string zero length string is false, anything else is true
*string as for string, but indirect reference
macro check STATUS returned, 0 is false, anything else true
Note that a macro which has been set to
loop zero times returns a status of 1
when invoked, so if used in a test in
this state it will always be true.
function check STATUS, false if 0, true if not.
EXCEPTION. If the function was
f$evaluate or f$<- if STATUS is 0 it is
a fatal error, if not, check the value
returned and act on that.
You can use macros and f$evaluate together to construct
arbitrarily complicated tests.
Loop structures
The only loop mechanism in miniproc is to use f$macro_repeat
to set the repeat counters for a macro, and then execute
that macro. There is no way to set an infinite loop
condition since the counter limits are finite. However, if
you do
#__f$macro_repeat foobar 2000000000 2000000000 2000000000
#__foobar
that is effectively the same thing as an infinite loop,
since the macro will take 8 x 10^27 cycles to complete.
Typical loop structures can be implemented within a macro
without much difficulty. For instance:
do 100 times
#__f$macro_record do100 deck
...(operations)...
#__f$macro_return 1
#__deck
#__f$macro_repeat do100 100
#__do100
do while variable is true
#__f$macro_record dowhile deck
#__if a variable
...(operations)...
#__else a
#__ f$macro_break 1
#__endif a
#__f$macro_return 1
#__deck
do until variable is true
#__f$macro_record dountil deck
...(operations)...
#__if a variable
#__ f$macro_break 1
#__endif a
#__f$macro_return 1
#__deck
and so forth.
<<>> embedded substitution tags
The <<>> tag is the only miniproc operation that
can be mixed with other characters in an output line.
<<>> substitutions are done before ANYTHING else
on each line. See above for the action of "subs", which
controls how many times the line is processed to remove
<<>>. The * operator does not work inside
<<>>, that is <<*name>> will not resolve to whatever
name points to. This is not an error, it will leave <<*name>>
as is on the output line.
<<name>> Insert the value of the string variable name into the text.
<<name>> Insert the value of the integer variable name into the text.
Typical usage might be:
#__whichstory=&murderweapon
#__whichpocket="right coat"
#__killer="Robert"
#__! then much, much later...
#__! the next three lines have some single or double
#__! substitutions and then go right to the output
"I have an invitation to dinner," said <<killer>> as he gripped
the <<<<whichstory>>>> in his <<whichpocket>> pocket
ever more tightly.
See testfile.mpc for an example miniproc script.
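Repeated <<>> substitution under the "subs" limit can be sketched in C. The sketch was written from the description above, not from miniproc.c, and binding, find, sub_pass and substitute are hypothetical names. It makes three assumptions:

- a tag is "<<", then one or more letters, digits, "_", "$" or ":", then ">>";
- a tag naming an unknown variable is left as is;
- values come from a small lookup table of string pairs.

```c
#include <string.h>

struct binding { const char *name; const char *value; };

static int is_name_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '$' || c == ':';
}

static const char *find(const struct binding *b, int n,
                        const char *name, size_t len)
{
    int i;

    for (i = 0; i < n; i++)
        if (strlen(b[i].name) == len && strncmp(b[i].name, name, len) == 0)
            return b[i].value;
    return NULL;
}

/* One pass over in, writing to out. Returns the number of tags
   replaced, or -1 if out is too small. */
static int sub_pass(const char *in, char *out, size_t outsize,
                    const struct binding *b, int n)
{
    size_t o = 0;
    int count = 0;

    while (*in != '\0') {
        if (in[0] == '<' && in[1] == '<') {
            size_t len = 0;

            while (is_name_char(in[2 + len]))
                len++;
            if (len > 0 && in[2 + len] == '>' && in[3 + len] == '>') {
                const char *val = find(b, n, in + 2, len);

                if (val != NULL) {
                    size_t vlen = strlen(val);

                    if (o + vlen >= outsize)
                        return -1;
                    memcpy(out + o, val, vlen);
                    o += vlen;
                    in += len + 4;        /* skip "<<name>>" */
                    count++;
                    continue;
                }
            }
        }
        if (o + 1 >= outsize)
            return -1;
        out[o++] = *in++;                 /* ordinary character */
    }
    out[o] = '\0';
    return count;
}

/* Up to subs passes; stops early once a pass replaces nothing.
   buf holds the line and receives the result; tmp is scratch space
   of the same size. Returns 0, or -1 on overflow. */
static int substitute(char *buf, char *tmp, size_t size, int subs,
                      const struct binding *b, int n)
{
    int pass;

    for (pass = 0; pass < subs; pass++) {
        int r = sub_pass(buf, tmp, size, b, n);

        if (r < 0)
            return -1;
        if (r == 0)
            break;                        /* nothing left to replace */
        strcpy(buf, tmp);
    }
    return 0;
}
```

Suppose whichstory="murderweapon" and, as a made-up value, murderweapon="candlestick". With subs=1, the line "the <<<<whichstory>>>>" becomes "the <<murderweapon>>". With subs=2 it becomes "the candlestick". The tag <<*name>> never matches, because "*" is not a name character, so it passes through unchanged.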
Copyright
Copyright 1997 by David Mathog and California Institute of
Technology.
This software may be used freely, but may not be
redistributed. You may modify this software for your own
use, but you may not incorporate any part of the original
code into any other piece of software which will then be
distributed (whether free or commercial) unless prior
written consent is obtained.
Reporting bugs, getting more information
For more information, or to report bugs, contact:
mathog@seqaxp.bio.caltech.edu