Blame | Last modification | View Log | RSS feed
<?xml version="1.0" standalone="no"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<!--
$Id$
-->
<book>
<bookinfo>
<title>Calculus Users' Guide</title>
<corpauthor>The TenDRA Project</corpauthor>
<author>
<firstname>Jeroen</firstname>
<surname>Ruigrok van der Werven</surname>
</author>
<authorinitials>JRvdW</authorinitials>
<pubdate>2004</pubdate>
<copyright>
<year>2004</year>
<year>2005</year>
<holder>The TenDRA Project</holder>
</copyright>
<copyright>
<year>1998</year>
<holder>DERA</holder>
</copyright>
</bookinfo>
<chapter id="introduction">
<title>Introduction</title>
<para>This document describes a tool, <command>calculus</command>, which
allows complex type systems to be described in a simple algebraic
format, and transforms this into a system of C types which implements
this algebra.</para>
<para><command>calculus</command> was initially written as a design tool
for use with the TenDRA C and C++ producers. The producers' internal
datatypes reflect the highly complex interdependencies between the C and
C++ language concepts (types, expressions and so on) which they were
designed to represent. A tool for managing this complexity and allowing
changes to the internal datatypes to be made with confidence was
therefore seen to be an absolute necessity. The tool can also be applied
in similar situations where a complex type system needs to be
described.</para>
<para>The tool also provides for a separation between the specification of
the type system and its implementation in terms of actual C types. This
separation has a number of advantages. The advantages of maintaining a
level of abstraction in the specification are obvious. It is possible to
apply extremely rigorous type checking to any program written using the
tool, thereby allowing many potential errors to be detected at compile
time. It is also possible to restrict access to the types to certain
well-specified constructs, thereby increasing the maintainability of
code written using the tool.</para>
<para>This separation also has beneficial effects on the implementation of
the type system. By leaving all the type checking aspects at the
specification level, it is possible to impose an extremely homogeneous
underlying implementation. This means, for example, that a single memory
management system can be applied to all the types within the system. It
also opens the possibility of writing operations which apply to all
types within the system in a straightforward manner. The
<link linkend="diskoutput">disk reading and writing routines</link>
described below are an example of this.</para>
<sect1 id="options">
<title>Using calculus</title>
<para>The general form for invoking <command>calculus</command> is as
follows:
<command>calculus<optional><option>options</option></optional>
<option>input</option> ....
<optional><option>output</option></optional></command>
where <option>input</option> is a file containing the input type
system description and <option>output</option> is the directory where
the files implementing this system are to be output. This directory
must already exist. If no <option>output</option> argument is given
then the current working directory is assumed. Note that several
<option>input</option> files can be given. Unless
<link linkend="module">otherwise specified</link> it is the last file
which is used to generate the output.</para>
<para>The form in which the
<link linkend="textinput">input type systems</link>
are expressed is described below. The form of the output depends on
<option>options</option>. By default, the C implementation of the type
system is output. If the <option>-t</option> option is passed to
<command>calculus</command> then
<ulink url="tcpplus.html#token"><code>#pragma token</code></ulink>
statements describing the type system specification are output. This
<link linkend="tokenoutput">specification</link> is given below, with
some of the more important
<link linkend="coutput">implementation details</link> being described
in the following section.</para>
<para>Note that it is necessary to check any program written using
<command>calculus</command> against the <code>#pragma token</code>
version of the specification in order to get the full benefits of the
rigorous type checking provided by the tool. Some sections of the
program (if only the
<link linkend="coutputsupport">basic functions</link>) may be
dependent upon the implementation, and so not be suitable for this
form of checking.</para>
</sect1>
<sect1 id="example">
<title>Example program</title>
<para>The program <command>calculus</command> itself was written using a
type system specified using the <command>calculus</command> tool. It
is designed to provide an example of its own application, with some
features not strictly necessary for the functionality of the program
being added for illustrative purposes.</para>
</sect1>
</chapter>
<chapter id="textinput">
<title>Input syntax</title>
<para>The overall input file format is as follows:
<programlisting>
algebra:
ALGEBRA identifier version<subscript>opt</subscript>: item-list<subscript>opt</subscript>
version:
(integer.integer)
item-list:
item
item-list item
item:
primitive
identity
enumeration
structure
union
</programlisting></para>
<para>The initial identifier gives the overall name of the algebra. A
version number may also be associated with the algebra (if this is
omitted the version is assumed to be 1.0). The main body of the
algebra definition consists of a list of items describing the
primitives, the identities, the enumerations, the structures and the
unions comprising the algebra.</para>
<para>Here <literal>identifier</literal> has the same meaning as in C.
The only other significant lexical units are
<literal>integer</literal>, which consists of a sequence of decimal
digits, and <literal>string</literal>, which consists of any number of
characters enclosed in double quotes. There are no escape sequences in
strings. C style comments may be used anywhere in the input. White
space is not significant.</para>
<sect1 id="inputprim">
<title>Primitives</title>
<para>Primitives form the basic components from which the other types in
the algebra are built up. They are described as follows:
<programlisting>
primitive:
object-identifier = quoted-type;
</programlisting>
where the primitive identifier is given by:
<programlisting>
object-identifier:
#<subscript>opt</subscript>:<subscript>opt</subscript> identifier
#<subscript>opt</subscript>:<subscript>opt</subscript> identifier(identifier)
</programlisting>
and the primitive definition is a string which gives the C type
corresponding to this primitive:
<programlisting>
quoted-type:
string
</programlisting></para>
<para>Note that each primitive (and also each identity, each
enumeration, each structure and each union) has two names associated
with it. The second name is optional; if it is not given then it is
assumed to be the same as the first name. The first name is that which
will be given to the corresponding type in the output file. The second
is a short form of this name which will be used in forming constructor
names etc. in the output.</para>
<para>The optional hash and colon which may be used to qualify an object
identifier are provided for backwards compatibility only and are not
used in the output routines.</para>
</sect1>
<sect1 id="inputident">
<title>Identities</title>
<para>Identities are used to associate a name with a particular type in
the algebra. In this they correspond to <code>typedef</code>s in C.
They are described as follows:
<programlisting>
identity:
object-identifier = type;
</programlisting>
where the definition type, <literal>type</literal>, is as described
<link linkend="inputtype">below</link>.</para>
</sect1>
<sect1 id="inputenum">
<title>Enumerations</title>
<para>Enumerations are used to define types which can only take values
from some finite set. They are described as follows:
<programlisting>
enumeration:
enum !<subscript>opt</subscript> object-identifier = { enumerator-list };
enum !<subscript>opt</subscript> object-identifier = base-enumeration + { enumerator-list };
</programlisting>
where:
<programlisting>
base-enumeration:
identifier
</programlisting>
is the name of a previously defined enumeration type. The latter form
is used to express extension enumeration types. An enumeration type may
be qualified by an exclamation mark to indicate that no lists of this
type will be constructed.</para>
<para>The enumeration constants themselves are defined as
follows:
<programlisting>
enumerator:
identifier
identifier = enumerator-value
enumerator-list:
enumerator
enumerator-list, enumerator
</programlisting></para>
<para>Each enumerator is assigned a value in an ascending sequence,
starting at zero. The next value to be assigned can be set using an
<literal>enumerator-value</literal>. This is an expression formed from
<literal>integer</literal>s, <literal>identifier</literal>s
representing previous enumerators from the same enumeration, and the
question mark character which stands for the previous enumeration
value. The normal C arithmetic operations can be applied to build up
more complex <literal>enumerator-value</literal>s. All enumerator
evaluation is done in the <code>unsigned long</code> type of the host
machine. Values containing more than 32 bits are not portable.</para>
<para>Enumerations thus correspond to enumeration types in C, except
that they are genuinely distinct types.</para>
</sect1>
<sect1 id="inputstruct">
<title>Structures</title>
<para>Structures are used to build up composite types from other types
in the algebra. They correspond to structures in C. They are described
as follows:
<programlisting>
structure:
struct object-identifier = component-group;
struct object-identifier = base-structure + component-group;
</programlisting>
where:
<programlisting>
base-structure:
identifier
</programlisting>
is the name of a previously defined structure type. The latter form
is used to express (single) inheritance of structures. All components
of the base structure also become components of the derived
structure.</para>
<para>The structure components themselves are defined as follows:
<programlisting>
component-group:
{ component-list<subscript>opt</subscript> }
component-list:
component-declarations;
component-list component-declarations;
component-declarations:
type component-declarators
component-declarators:
component-declarator
component-declarators, component-declarator
component-declarator:
identifier component-initialiser<subscript>opt</subscript>
component-initialiser:
= string
</programlisting></para>
<para>The optional <link linkend="outputstruct">component initialiser
strings</link> are explained below.</para>
<para>Structures are the only algebra construct which prevent the input
from being a general graph. Unions may be defined in terms of
themselves, but (as in C) pointers must be used to define structures
in terms of themselves.</para>
</sect1>
<sect1 id="inputunion">
<title>Unions</title>
<para>Unions are used to build up types which can hold a variety of
information. They differ from C unions in that they are discriminated.
They are described as follows:
<programlisting>
union:
union object-identifier = component-group + field-group map-group<subscript>opt</subscript>;
union object-identifier = base-union + field-group map-group<subscript>opt</subscript>;
</programlisting>
where:
<programlisting>
base-union:
identifier
</programlisting>
is the name of a previously defined union type. The latter form is
used to express (single) inheritance of unions. All components, fields
and maps of the base union also become components of the derived
union. Note that only new fields and maps can be added in the derived
union.</para>
<para>The <literal>component-group</literal> gives a set of components
which are common to all the different union cases. The cases
themselves are described as follows:
<programlisting>
field-group:
{ field-list }
field:
#<subscript>opt</subscript> #<subscript>opt</subscript> field-identifier-list->component-group
#<subscript>opt</subscript> #<subscript>opt</subscript> field-identifier-list->base-field + component-group
base-field:
identifier
field-list:
field
field-list, field
field-identifier:
identifier
field-identifier-list:
field-identifier
field-identifier-list, field-identifier
</programlisting></para>
<para>The optional one or two hashes which may be used to qualify a list
of field identifiers are used to indicate
<link linkend="diskalias">aliasing</link> in the disk reading and
writing routines. The <literal>base-field</literal> case is a
notational convenience which allows one field in a union to inherit
all the components of another field.</para>
<para>Note that a number of field identifiers may be associated with the
same set of field components. Any such list containing more than one
identifier forms a field identifier set, named after the first field
identifier.</para>
<para>In addition a number of maps may be associated with a union.
These maps correspond to functions which take the union, plus a number
of other map parameter types, and return the map return type. They are
described as follows:
<programlisting>
map-group:
:[map-list<subscript>opt</subscript>]
map:
extended-type #<subscript>opt</subscript> identifier(parameter-list<subscript>opt</subscript>)
map-list:
map
map-list map
</programlisting>
where:
<programlisting>
parameter-list:
parameter-declarations
parameter-list; parameter-declarations
parameter-declarations:
extended-type parameter-declarators
parameter-declarators:
identifier
parameter-declarators, identifier
</programlisting></para>
<para>Note that the map parameter and return types are given by:
<programlisting>
extended-type:
type
quoted-type
</programlisting></para>
<para>In addition to the
<link linkend="inputtype">types derived from the algebra</link> it is
possible to use quoted C types in this context.</para>
<para>A map may be qualified by means of a hash. This means that the
associated function also takes a
<link linkend="outputunionmaps">destructor function</link> as a
parameter.</para>
</sect1>
<sect1 id="inputtype">
<title>Type constructors</title>
<para>The types derived from the algebra may be described as follows:
<programlisting>
type:
identifier
PTR type
LIST type
STACK type
VEC type
VEC_PTR type
</programlisting></para>
<para>The simple types correspond to primitive, identity, enumeration,
structure or union names. It is possible for a type to be used before
it is defined, but it must be defined at some point.</para>
<para>The derived type constructors correspond to pointers, lists,
stacks, vectors and pointers into vectors. They may be used to build
up further types from the basic algebra types.</para>
</sect1>
<sect1 id="module">
<title>Relations between algebras</title>
<para>As <link linkend="options">mentioned above</link>, more than one
input algebra may be specified to <command>calculus</command>. Each is
processed separately, and output is generated for only one. By default
this is the last algebra processed, however a specific algebra can be
specified using the command-line option <option>-A</option>
<option>name</option>, where <option>name</option> is the name of the
algebra to be used for output.</para>
<para>Types may be imported from one algebra to another by means of
commands of the form:
<programlisting>
import:
IMPORT identifier;
IMPORT identifier::identifier;
</programlisting>
which fit into the main syntax as an <literal>item</literal>. The
first form imports all the types from the algebra given by
<literal>identifier</literal> into the current algebra. The second
imports a single type, given by the second
<literal>identifier</literal> from the algebra given by the first
<literal>identifier</literal>.</para>
<para>Note that importing a type in this way also imports all the types
used in its construction. This includes such things as structure
components and union fields and maps. Thus an algebra consisting just
of <literal>import</literal> commands can be used to express
subalgebras in a simple fashion.</para>
</sect1>
</chapter>
<chapter id="tokenoutput">
<title>Output type system specification</title>
<para>In this section we document the basic output of
<command>calculus</command>. Two forms of the output can be generated -
a description of the output specification in terms of the TenDRA
<code>#pragma token</code> constructs, and the actual C code which
implements these tokens.</para>
<para>In this section the description given will be at the level of the
output specification. The more important details of the
<link linkend="coutput">C implementation</link> are given in the
following section.</para>
<para>The output is split among several header files in the specified
output directory. The main output is printed into
<filename>name.h</filename>, where <literal>name</literal> is the
overall algebra name. Unless otherwise stated, all the objects specified
below are to be found in <literal>name</literal><code>.h</code>. However
for each union, <literal>union</literal>, in the algebra certain
information associated with the union is printed into
<literal>union</literal><code>_ops.h</code>. If the union has any maps
associated with it then further output is printed to
<filename>union_map.h</filename> and <filename>union_hdr.h</filename>.
</para>
<sect1 id="outputinfo">
<title>Version information</title>
<para>Certain basic information about the input algebra is included in
<literal>name</literal><code>.h</code>. <literal>name</literal>
<code>_NAME</code> is a string literal giving the overall algebra
name. <literal>name</literal><code>_VERSION</code> is a string
literal giving the algebra version number.
<literal>name</literal><code>_SPECIFICATION</code> and
<literal>name</literal><code>_IMPLEMENTATION</code> are flags which
take the values 0 or 1 depending on whether the specification of the
type system in terms of <code>#pragma token</code> statements or the C
implementation is included.</para>
</sect1>
<sect1 id="outputbasic">
<title>Basic types</title>
<para>Six abstract type operators, each taking a type as argument and
returning a type, are specified as follows:
<programlisting>
TYPE PTR(TYPE t);
TYPE LIST(TYPE t);
TYPE STACK(TYPE t);
TYPE VEC(TYPE t);
TYPE VEC_PTR(TYPE t);
TYPE SIZE(TYPE t);
</programlisting></para>
<para>These represent a pointer to an object of type
<literal>t</literal>, a list of objects of type <literal>t</literal>,
a stack of objects of type <literal>t</literal>, a vector of objects
of type <literal>t</literal>, a pointer into a vector of objects of
type <literal>t</literal>, and an allocation block size for type
<literal>t</literal> respectively. (It is possible to suppress all
constructs involving <code>VEC</code> or <code>VEC_PTR</code> by
passing the <code>-x</code> command-line option to
<command>calculus</command>. Similarly <code>STACK</code> constructs
may be suppressed using <code>-z</code>.)</para>
<para>These constructors can be applied to any type, although in
practice they are only applied to the types specified in the algebra
and those derived from them. For example, we may form the type:
<programlisting>
LIST(PTR(int))
</programlisting>
representing a list of pointers to <code>int</code>.</para>
<para>An integral type:
<programlisting>
INT_TYPE name_dim;
</programlisting>
is specified, where <literal>name</literal> is the overall algebra
name. This type is used to represent the sizes of vectors.</para>
<para>A function pointer type:
<programlisting>
typedef void (*DESTROYER)();
</programlisting>
is specified, which represents a destructor function. Two destructor
functions are specified:
<programlisting>
void destroy_name();
void dummy_destroy_name();
</programlisting>
where <literal>name</literal> is as above.
<code>destroy_</code><literal>name</literal> is the default
destructor, whereas <code>dummy_destroy_</code><literal>name</literal>
is a dummy destructor which has no effect. The details of the
arguments passed to the destructors and so on are implementation
dependent.</para>
</sect1>
<sect1 id="outputsize">
<title>Operations on sizes</title>
<para>The <code>SIZE</code> type constructor is used to represent a
multiple of the size of a type. It is used, for example, in the
<link linkend="outputptr">pointer stepping construct</link>
<code>STEP_ptr</code> to specify the number of units the pointer is to
be increased by. Having a separate abstract type for the size of each
type greatly increases the scope for type checking of memory
allocation, and other, functions.</para>
<para>For each basic type in the algebra (a primitive, a structure or a
union), there is a constant expression:
<programlisting>
SIZE ( t ) SIZE_type;
</programlisting>
where <literal>t</literal> denotes the type itself, and
<literal>type</literal> is the associated type name.</para>
<para>For the five other type constructors
<link linkend="outputbasic">described above</link> there are constant
expressions:
<programlisting>
SIZE(PTR(t)) SIZE_ptr(TYPE t);
SIZE(LIST(t)) SIZE_list(TYPE t);
SIZE(STACK(t)) SIZE_stack(TYPE t);
SIZE(VEC(t)) SIZE_vec(TYPE t);
SIZE(VEC_PTR(t)) SIZE_vec_ptr(TYPE t);
</programlisting>
for any type <literal>t</literal>.</para>
<para>These constructs allow the size of any type derived from the
algebra to be built up. There is also a construct which corresponds to
multiplying the size of a type by a constant. This takes the
form:
<programlisting>
SIZE(t) SCALE(SIZE(t), INT_TYPE itype);
</programlisting>
for any type <literal>t</literal> and any integral type
<literal>itype</literal>.</para>
</sect1>
<sect1 id="outputptr">
<title>Operations on pointers</title>
<para>The <code>PTR</code> type constructor is used to represent a
pointer to an object of type <literal>t</literal>. It should be
emphasised that this construct is not in general implemented by means
of the C pointer construct.</para>
<sect2 id="simple-pointer-constructs">
<title>Simple pointer constructs</title>
<para>There are several simple operations on pointers, given as
follows:
<programlisting>
PTR(t) NULL_ptr(TYPE t);
int IS_NULL_ptr(PTR (t));
int EQ_ptr(PTR(t), PTR(t));
PTR(t) STEP_ptr(PTR(t), SIZE(t));
</programlisting></para>
<para>The construct <code>NULL_ptr</code> is used to form the null
pointer to <literal>t</literal> for a type <literal>t</literal>.
This is a constant expression. <code>IS_NULL_ptr</code> can be used
to test whether a given pointer expression is a null pointer.
Similarly <code>EQ_ptr</code> checks whether two pointers are equal
(note that we can only compare pointers to the same type). Finally
<code>STEP_ptr</code> can be used to add a scalar value to a
pointer.</para>
</sect2>
<sect2 id="unique-pointers">
<title>Unique pointers</title>
<para>There are also constructs for generating and destroying unique
pointers:
<programlisting>
PTR(t) UNIQ_ptr(TYPE t);
void DESTROY_UNIQ_ptr(PTR(t));
</programlisting></para>
<para>A unique pointer is guaranteed to be different from all other
undestroyed pointers. Dereferencing a unique pointer is
undefined.</para>
</sect2>
<sect2 id="pointer-construction-destruction">
<title>Pointer construction and destruction</title>
<para>The constructs:
<programlisting>
PTR(t) MAKE_ptr(SIZE(t));
void DESTROY_ptr(PTR(t), SIZE(t));
</programlisting>
are used to respectively create and destroy a pointer to a given
type.</para>
</sect2>
<sect2 id="construct-assignment-dereference">
<title>Assignment and dereference constructs</title>
<para>The constructs for assigning and dereferencing pointers have one
of two forms depending on the complexity of the type pointed to. For
simple types, including primitive, enumeration and union types, they
have the form:
<programlisting>
void COPY_type(PTR(t), t);
t DEREF_type(PTR(t));
</programlisting>
where <literal>t</literal> is the type involved and
<literal>type</literal> is the associated short name.</para>
<para>For more complex types, including structures, the assignment or
dereference cannot be done in a single expression, so the constructs
are specified to be statements as follows:
<programlisting>
STATEMENT COPY_type(PTR(t), t);
STATEMENT DEREF_type(PTR(t), lvalue t);
</programlisting></para>
<para>Here the <code>lvalue</code> type qualifier is used to indicate
that the statement argument is an assignable <code>lvalue</code>. In
this case it should give the location of an object of type
<literal>t</literal> into which the pointer will be
dereferenced.</para>
<para>The appropriate assignment and dereference constructs are
generated for each of the basic algebra types (primitives,
enumerations, structures and unions). In addition there are generic
assignment and dereference constructs for pointer types, list types,
stack types, vector types and vector pointer types. The first three
are of the first form above, whereas the second two have the second
form, as follows:
<programlisting>
void COPY_ptr(PTR(PTR(t)), PTR(t));
PTR(t) DEREF_ptr(PTR (PTR (t)));
void COPY_list(PTR(LIST(t)), LIST(t));
LIST(t) DEREF_list(PTR(LIST(t)));
void COPY_stack(PTR(STACK(t)), STACK(t));
STACK(t) DEREF_stack(PTR(STACK(t)));
STATEMENT COPY_vec(PTR(VEC(t)), VEC(t));
STATEMENT DEREF_vec(PTR(VEC(t)), lvalue VEC(t));
STATEMENT COPY_vec_ptr(PTR(VEC_PTR(t)), VEC_PTR(t));
STATEMENT DEREF_vec_ptr(PTR(VEC_PTR(t)), lvalue VEC_PTR(t));
</programlisting></para>
</sect2>
</sect1>
<sect1 id="outputlist">
<title>Operations on lists</title>
<para>The <code>LIST</code> type constructor is used to represent a list
of objects of type <literal>t</literal>.</para>
<sect2 id="simple-list-constructs">
<title>Simple list constructs</title>
<para>There are several simple list constructs:
<programlisting>
LIST(t) NULL_list(TYPE t);
int IS_NULL_list(LIST(t));
int EQ_list(LIST(t), LIST(t));
unsigned LENGTH_list(LIST(t));
PTR(t) HEAD_list(LIST(t));
LIST(t) TAIL_list(LIST(t));
PTR(LIST(t)) PTR_TAIL_list(LIST(t));
LIST(t) END_list(LIST(t));
LIST(t) REVERSE_list(LIST(t));
LIST(t) APPEND_list(LST(t), LIST(t));
</programlisting></para>
<para>Empty lists may be constructed or tested for.
<code>NULL_list</code> is a constant expression. Two lists may be
checked for equality (note that this is equality of lists, rather
than element-wise equality). The number of elements in a list can
be found. The head or tail (or <code>car</code> and
<code>cdr</code>) of a list may be formed. The end element of a list
may be selected. The order of the elements in a list can be
reversed. One list can be appended to another.</para>
</sect2>
<sect2 id="unique-lists">
<title>Unique lists</title>
<para>There are also constructs for generating and destroying unique
lists:
<programlisting>
LIST(t) UNIQ_list(TYPE t);
void DESTROY_UNIQ_list(LIST(t));
</programlisting></para>
<para>A unique lists is guaranteed to be different from all other
undestroyed lists. Taking the head or tail of a unique list is
undefined.</para>
</sect2>
<sect2 id="list-construction-destruction">
<title>List construction and destruction</title>
<para>For each type <literal>t</literal> there are operations for
constructing and deconstructing lists. For the basic types
comprising the algebra these take the form:
<programlisting>
STATEMENT CONS_type(t, LIST(t), lvalue LIST(t));
STATEMENT UN_CONS_type(lvalue t, lvalue LIST(t), LIST(t)) ;
STATEMENT DESTROY_CONS_type(DESTROYER, lvalue t, lvalue LIST(t), LIST(t));
</programlisting>
where <literal>type</literal> is the short type name.</para>
<para>The operation <code>CONS_</code><literal>type</literal> is used
to build a list whose head is the first argument and whose tail is
the second argument. This is assigned to the third argument.
<code>UN_CONS_</code><literal>type</literal> reverses this process,
decomposing a list into its head and its tail.
<code>DESTROY_CONS_</code><literal>type</literal> is similar, but it
also applies the given destructor function to the decomposed list
component.</para>
<para>There are also generic list construction and deconstruction
operations which apply to lists of pointers (with a <code>ptr</code>
suffix), lists of lists (with a <code>list</code> suffix), lists of
stacks (with a <code>stack</code> suffix), lists of vectors (with a
<code>vec</code> suffix) and lists of pointers to vectors (with a
<code>vec_ptr</code> suffix). For example, for lists of pointers
these have the form:
<programlisting>
STATEMENT CONS_ptr(PTR(t), LIST(PTR(t)), lvalue LIST(PTR(t))) ;
STATEMENT UN_CONS_ptr(lvalue PTR(t), lvalue LIST(PTR(t)), LIST(PTR(t)));
STATEMENT DESTROY_CONS_ptr(DESTROYER, lvalue PTR(t), lvalue LIST(PTR(t)), LIST(PTR(t)));
</programlisting></para>
<para>There is also a generic list destruction construct:
<programlisting>
STATEMENT DESTROY_list(LIST(t), SIZE(t));
</programlisting>
which may be used to destroy all the elements in a list.</para>
</sect2>
</sect1>
<sect1 id="outputstack">
<title>Operations on stacks</title>
<para>The <code>STACK</code> type constructor is used to represent
stacks of objects of type <literal>t</literal>. Empty stacks can be
created and tested for:
<programlisting>
STACK(t) NULL_stack(TYPE t);
int IS_NULL_stack(STACK(t));
</programlisting></para>
<para>For each type <literal>t</literal> there are operations for
pushing objects onto a stack and for popping elements off. For the
basic types comprising the algebra these take the form:
<programlisting>
STATEMENT PUSH_type(t, lvalue STACK(t));
STATEMENT POP_type(lvalue t, lvalue STACK(t));
</programlisting>
where <literal>type</literal> is the short type name. There are also
generic constructs, such as <code>PUSH_ptr</code> for pushing any
pointer type, similar to the
<link linkend="outputlist">generic list constructors</link>
above.</para>
<para>Stacks are in fact just modified lists with pushing corresponding
adding an element to the start of a list, and popping to removing the
first element. There are constructs:
<programlisting>
LIST(t) LIST_stack(STACK(t));
STACK(t) STACK_list(LIST(t));
</programlisting>
for explicitly converting between lists and stacks.</para>
</sect1>
<sect1 id="outputvec">
<title>Operations on vectors</title>
<para>The <code>VEC</code> type constructor is used to represent vectors
of objects of type <literal>t</literal>. There are a number of simple
operations on vectors:
<programlisting>
VEC(t) NULL_vec(TYPE t);
name_dim DIM_vec(VEC(t));
name_dim DIM_ptr_vec(PTR(VEC(t)));
PTR(t) PTR_ptr_vec(PTR(VEC(t)));
</programlisting></para>
<para>An empty vector can be constructed (note that, unlike null
pointers and null lists, this is not a constant expression). The
number of elements in a vector (or in a vector given by a pointer) can
be determined (note how the type
<literal>name</literal><code>_dim</code> is used to represent vector
sizes). A pointer to a vector can be transformed into a pointer to the
elements of the vector.</para>
<para>In general a vector may be created or destroyed using the
operators:
<programlisting>
STATEMENT MAKE_vec(SIZE(t), name_dim, lvalue VEC(t));
STATEMENT DESTROY_vec(VEC(t), SIZE(t));
</programlisting></para>
<para>Finally a vector can be trimmed using:
<programlisting>
STATEMENT TRIM_vec(VEC(t), SIZE(t), int, int, lvalue VEC(t));
</programlisting>
the two integral arguments giving the lower and upper bounds for the
trimming operation.</para>
</sect1>
<sect1 id="outputvecptr">
<title>Operations on vector pointers</title>
<para>The <code>VEC_PTR</code> type constructor is used to represent a
pointer to an element of a vector of objects of type
<literal>t</literal>. Apart from the basic constructors already
mentioned, there are only two operations on vector pointers:
<programlisting>
VEC_PTR(t) VEC_PTR_vec(VEC(t));
PTR(t) PTR_vec_ptr(VEC_PTR(t));
</programlisting></para>
<para>The first transforms a vector into a vector pointer (pointing to
its first argument). The second transforms a vector pointer into a
normal pointer.</para>
</sect1>
<sect1 id="outputprim">
<title>Operations on primitives</title>
<para>Each primitive specified within the algebra maps onto a
<code>typedef</code> defining the type in terms of its given
definition. The only operations on primitives are those listed above
- the size constructs, the pointer assignment and dereference
operations, the list construction and deconstruction operations and
the stack push and pop operations.</para>
</sect1>
<sect1 id="outputident">
<title>Operations on identities</title>
<para>Each identity specified within the algebra maps onto a
<code>typedef</code> in the output file. There are no operations on
identities.</para>
</sect1>
<sect1 id="outputenum">
<title>Operations on enumerations</title>
<para>Each enumeration specified within the algebra maps onto an
unsigned integral type in the output file. The
<link linkend="outputprim">basic operations</link>
listed above are always generated unless it has been indicated that
<link linkend="inputenum">no lists of this type will be formed</link>,
when <code>CONS_</code><literal>type</literal> and the other list and
stack operators are suppressed. In addition each enumerator which is a
member of this enumeration maps onto a constant expression:
<programlisting>
t enum_member;
</programlisting>
where <literal>t</literal> is the enumeration type,
<literal>enum</literal> is the short type name, and
<literal>member</literal> is the enumerator name. It is guaranteed
that the first enumerator will have value 0, the second 1, and so on,
unless the value of the enumerator is explicitly given (as in C).
There is also a constant expression:
<programlisting>
unsigned long ORDER_enum;
</programlisting>
giving one more than the maximum enumerator in the enumeration.</para>
</sect1>
<sect1 id="outputstruct">
<title>Operations on structures</title>
<para>Each structure specified within the algebra maps onto a
<code>typedef</code> defining the type to be the C structure with the
given components. In addition to the
<link linkend="outputprim">basic operations</link>
listed above there are also field selectors defined.</para>
<para>Suppose that <literal>t</literal> is a structure type with short
name <literal>struct</literal>, and that <literal>comp</literal> is a
component of <literal>t</literal> of type <literal>ctype</literal>.
Then there is an operation:
<programlisting>
PTR(ctype) struct_comp(PTR(t));
</programlisting>
which transforms a pointer to the structure into a pointer to the
component. There is also an operation:
<programlisting>
STATEMENT MAKE_struct(ctype, ...., PTR(t));
</programlisting>
where <literal>ctype</literal> ranges over all the component types
which do not have a component initialiser string in the structure
definition. This is used to assign values to all the components of a
structure. If a component has an initialiser string then this is used
as an expression giving the initial value, otherwise the given
operation argument is used. The initialiser strings are evaluated in
the context of the <code>MAKE_</code><literal>struct</literal>
statement. The parameters to the operation may be referred to by the
corresponding component name followed by <code>_</code>. The partially
initialised object may be referred to by the special character
sequence <code>%0</code>. Any remainder operations should be written
as <code>%%</code> rather than <code>%</code>.</para>
<para>Inheritance in structures is handled as follows: if
<literal>t</literal> is a derived structure with base structure
<literal>btype</literal> then there is an operation:
<programlisting>
PTR(btype) CONVERT_struct_base(PTR(t));
</programlisting>
where <literal>struct</literal> and <literal>base</literal> are the
short names of <literal>t</literal> and <literal>btype</literal>
respectively.</para>
</sect1>
<sect1 id="outputunion">
<title>Operations on unions</title>
<para>Each union specified within the algebra maps onto an opaque type
in the output file. In addition to the
<link linkend="outputprim">basic operations</link> listed above there
are a number of other constructs output into
<literal>name</literal><code>.h</code>, namely:
<programlisting>
unsigned int ORDER_union;
t NULL_union;
int IS_NULL_union(t);
int EQ_union(t, t);
</programlisting>
where <literal>t</literal> denotes the union type, and
<literal>union</literal> is the short type name.
<code>ORDER_</code><literal>union</literal> is a constant value giving
the number of union fields.
<code>NULL_</code><literal>union</literal> is a distinguished constant
value of type <literal>t</literal>. Values of type
<literal>t</literal> may be compared against this distinguished value,
or against each other.</para>
<sect2 id="union-construction-operations">
<title>Union construction operations</title>
<para>Most of the output for the union type <literal>t</literal> is
output into the file <filename>union_ops.h</filename>. This contains
a construct:
<programlisting>
unsigned int TAG_union(t);
</programlisting>
for extracting the discriminant tag from a union.</para>
<para>For each shared component, <literal>comp</literal>, of
<literal>t</literal> of type <literal>ctype</literal>, say, there is
an operator:
<programlisting>
PTR(ctype) union_comp(t);
</programlisting>
which extracts this component from the union.</para>
<para>For each field, <literal>field</literal>, of the union there are
constructs:
<programlisting>
unsigned int union_field_tag;
int IS_union_field(t);
</programlisting>
giving the (constant) discriminant tag associated with this field,
and a test for whether an object of type <literal>t</literal> has
this discriminant.</para>
<para>In addition, for each unshared component,
<literal>comp</literal>, of <literal>field</literal> of type
<literal>ctype</literal>, say, there is an operator:
<programlisting>
PTR(ctype) union_field_comp(t);
</programlisting>
which extracts this component from the union.</para>
<para>There are also operations for constructing and deconstructing
objects of type <literal>t</literal> for field
<literal>field</literal> given as follows:
<programlisting>
STATEMENT MAKE_union_field(ctype, ...., lvalue t);
STATEMENT DECONS_union_field(lvalue ctype, ...., t);
STATEMENT DESTROY_union_field(DESTROYER, lvalue ctype, ...., t);
</programlisting>
The operation <code>MAKE_</code><literal>union_field</literal>
constructs a value of type <literal>t</literal> from the various
components and assigns it into the last argument. The
<literal>ctype</literal> arguments range over all the components of
<literal>field</literal>, both the shared components and the
unshared components, which do not have a component initialiser
string in the definition. The union component are initialised either
using the <link linkend="outputstruct">initialiser string</link> or
the given operation argument.</para>
<para><code>DECONS_</code><literal>union_field</literal> performs the
opposite operation, deconstructing an object of type
<literal>t</literal> into its components for this field.
<code>DESTROY_</code><literal>union_field</literal> is similar,
except that it also applies the given destructor function to the
original value. For these two operations <literal>ctype</literal>
ranges over all the components, including those with
initialiser strings.</para>
</sect2>
<sect2 id="union-field-sets">
<title>Union field sets</title>
<para>Recall that
<link linkend="inputunion">a number of union field identifiers</link>
may be associated with the same set of components. In this case
these fields form a union field set named after the first element of
the set. There are operations:
<programlisting>
int IS_union_field_etc(t);
PTR(ctype) union_field_etc_comp(t);
STATEMENT MAKE_union_field_etc(unsigned int, ctype, ...., lvalue t);
STATEMENT MODIFY_union_field_etc(unsigned int, t);
STATEMENT DECONS_union_field_etc(lvalue ctype, ...., t);
STATEMENT DESTROY_union_field_etc(DESTROYER, lvalue ctype, ...., t);
</programlisting>
defined on these sets. These are exactly analogous to the single
field operations except that they work for any field in the set.
Note that
<code>MAKE_</code><literal>union_field</literal><code>_etc</code>
takes an extra argument, giving the tag associated with the
particular element of the set being constructed. Also the construct
<code>MODIFY_</code><literal>union_field</literal><code>_etc</code>
is introduced to allow the tag of an existing object to be modified
to another value within the same set.</para>
<para>The valid range of union tags for this set can be given by the
two constants:
<programlisting>
unsigned int union_field_tag;
unsigned int union_field_etc_tag;
</programlisting></para>
<para>A union is a member of this set if its tag is greater than or
equal to <literal>union_field</literal><code>_tag</code> and
strictly less than
<literal>union_field</literal><code>_etc_tag</code>.</para>
</sect2>
<sect2 id="inheritance-in-unions">
<title>Inheritance in unions</title>
<para>Inheritance in unions is handled as follows: if
<literal>t</literal> is a derived union with base union
<literal>btype</literal> then there is an operation:
<programlisting>
btype CONVERT_union_base(t);
</programlisting>
where <literal>union</literal> and <literal>base</literal> are the
short names of <literal>t</literal> and <literal>btype</literal>
respectively.</para>
</sect2>
<sect2 id="outputunionmaps">
<title>Union maps</title>
<para>For each map, <literal>map</literal>, on the union
<literal>t</literal>, we have declared in
<literal>union</literal><code>_ops.h</code> an operator:
<programlisting>
map_ret map_union(t, map_par, ....);
</programlisting>
where <literal>map_ret</literal> is the map return type and
<literal>map_par</literal> ranges over the map parameter types. This
is except for maps with destructors (i.e. those qualified by a
<link linkend="inputunion">hash symbol</link>) which have the form:
<programlisting>
map_ret map_union(t, DESTROYER, map_par, ....);
</programlisting></para>
<para>These maps are implemented by having one function per field for
each map. The output file
<literal>union</literal><code>_map.h</code> contains tables of these
functions. These have the form:
<programlisting>
map_ret(*map_union_table[ORDER_union])(t, map_par, ....) = {
....
map_union_field,
....
} ;
</programlisting>
where there is one entry per union field.</para>
<para>In order to aid the construction of these functions a set of
function headers is provided in
<literal>union</literal><code>_hdr.h</code>. These take the form:
<programlisting>
#define HDR_map_union_field\
map_ret map_union_field(name_union, destroyer, par, ....)\
t name_union;\
DESTROYER destroyer; /* if required */\
map_par par;\
....\
{\
ctype comp;\
....\
DECONS_union_field(comp, ...., name_union);
</programlisting></para>
<para>There is also an alternative function header,
<code>HDR_</code><literal>map</literal>_d_<literal>union_field</literal>,
which calls <code>DESTROY_</code><literal>union_field</literal>
rather than <code>DECONS_</code><literal>union_field</literal>. The
destructor function used is <literal>destroyer</literal>, if this
parameter is present, or the default destructor,
<code>destroy_</code><literal>name</literal>, otherwise.</para>
</sect2>
</sect1>
</chapter>
<chapter id="coutput">
<title>Implementation details</title>
<sect1 id="coutputtypes">
<title>Implementation of types</title>
<para>The C implementation of the type system
<link linkend="tokenoutput">specified above</link> is based on a
single type, <literal>name</literal>, with the same name as the input
algebra. This is defined as follows:
<programlisting>
typedef union name_tag {
unsigned int ag_tag;
union name_tag *ag_ptr;
unsigned ag_enum;
unsigned long ag_long_enum;
name_dim ag_dim;
t ag_prim_type;
....
} name;
</programlisting></para>
<para>where <literal>t</literal> runs over all the primitive types. All
of the types in the algebra can be packed into a block of cells of
type <literal>name</literal>. The following paragraphs describe the
implementation of each type, together with how they are packed as
blocks of cells. This is illustrated by the following diagram:</para>
<mediaobject>
<imageobject>
<imagedata fileref="images/calculus.png" format="PNG"/>
</imageobject>
<textobject>
<phrase>Packing of types</phrase>
</textobject>
<caption>
<para>Packing of types in calculus</para>
</caption>
</mediaobject>
<para>Primitive types are implemented by a <code>typedef</code> defining
the type in terms of its
<link linkend="outputprim">given definition</link>. A primitive type
can be packed into a single cell using the appropriate
<code>ag_prim_</code><literal>type</literal> field of the
union.</para>
<para>Identity types are implemented by a <code>typedef</code> defining
the type as being equal to another
<link linkend="outputident">type from the algebra</link>.</para>
<para>Enumeration types are all implemented as <code>unsigned int</code>
if all its values fit into 16 bits, or <code>unsigned long</code>
otherwise. An enumeration type can be packed into a single cell using
the <code>ag_enum</code> or <code>ag_long_enum</code> field of the
union.</para>
<para>Structure types are implemented by a <code>typedef</code> defining
the type to be the C structure with the
<link linkend="outputstruct">given components</link>. A structure type
may be packed into a block of cells by packing each of the components
in turn.</para>
<para>Union types are all implemented by a pointer to
<literal>name</literal>. This pointer references a block of cells.
The null pointer represents
<code>NULL_</code><literal>union</literal>, otherwise the first cell
contains the union discriminant tag (in the <code>ag_tag</code>
field), with the subsequent cells containing the packed field
components (shared components first, then unshared components). If the
union has only one field then the discriminant can be omitted. The
union itself can be packed into a single cell using the
<code>ag_ptr</code> field of the union.</para>
<para>Pointer types are all implemented by a pointer to
<literal>name</literal>. Null pointers represent
<code>NULL_ptr</code>, otherwise the pointer references a single cell.
This cell contains a pointer to the packed version of the object being
pointed to in its <code>ag_ptr</code> field. A pointer type itself can
be packed into a single cell using the <code>ag_ptr</code> field of
the union.</para>
<para>List types are all implemented by a pointer to
<literal>name</literal>. Null pointers represent
<code>NULL_list</code>, otherwise the pointer references a block of
two cells. The first cell gives the tail of the list in its
<code>ag_ptr</code> field; the second cell contains a pointer to the
packed version of the head of the list in its <code>ag_ptr</code>
field. A list type itself can be packed into a single cell using the
<code>ag_ptr</code> field of the union.</para>
<para>Stack types are identical to list types, with the head of the list
corresponding to the top of the stack.</para>
<para>Vector pointer and vector types are all implemented by structures
defined as follows:
<programlisting>
typedef unsigned int name_dim;
typedef struct {
name *vec;
name *ptr;
} name_VEC_PTR;
typedef struct {
name_dim dim;
name_VEC_PTR elems;
} name_VEC;
</programlisting></para>
<para>The <code>vec</code> field of a vector pointer contains a pointer
to a block of cells containing the packed versions of the elements of
the vector. The <code>ptr</code> field is a pointer to the current
element of this block. A vector type also has a field giving the
vector size. A vector pointer type can be packed into a block of two
cells, using the <code>ag_ptr</code> field of each to hold its two
components. A vector type can similarly be packed into a block of
three cells, the first holding the vector size in its
<code>ag_dim</code> field.</para>
<para>All size types are implemented as <code>int</code>.</para>
</sect1>
<sect1 id="coutputsupport">
<title>Support routines</title>
<para>The implementation requires the user to provide certain support
functions. Most fundamental are the routines for allocating and
deallocating the blocks of cells which are used to store the types.
These are specified as follows:
<programlisting>
name *gen_name(unsigned int);
void destroy_name(name *, unsigned int);
void dummy_destroy_name(name *, unsigned int);
</programlisting>
where <code>gen_</code><literal>name</literal> allocates a block of
cells of the given size, <code>destroy_</code><literal>name</literal>
deallocates the given block, and
<code>dummy_destroy_</code><literal>name</literal> has
<link linkend="outputbasic">no effect</link>. The precise details of
how the memory management is to be handled are left to the
user.</para>
<para>Certain generic list functions must also be provided, namely:
<programlisting>
void destroy_name_list(name *, unsigned int);
name *reverse_name_list(name *);
name *append_name_list(name *, name *);
name *end_name_list(name *);
</programlisting>
which implement the constructs <code>DESTROY_list</code>,
<code>REVERSE_list</code>, <code>APPEND_list</code> and
<code>END_list</code> respectively.</para>
<para>Finally a dummy empty vector:
<programlisting>
name_VEC empty_name_vec;
</programlisting>needs to be defined.</para>
<para>Examples of these support routines can be found in the
<link linkend="example"><command>calculus</command> program</link>
itself.</para>
</sect1>
<sect1 id="coutputassert">
<title>Run-time checking</title>
<para>The type checking facilities supported by
<command>calculus</command> allow for compile-time detection of many
potential programming errors, however there are many problems, such as
dereferencing null pointers, deconstructing empty lists and union tag
errors which can only be detected at run-time. For this reason
<command>calculus</command> can be made to add extra assertions to
check for such errors into its output. This is done by specifying the
<code>-a</code> command-line option.</para>
<para>These assertions are implemented by means of macros. If the macro
<code>ASSERTS</code> is defined then these macros expand to give the
run-time checks. If not the output is exactly equivalent to that which
would have been output if the <code>-a</code> option had not been
given.</para>
<para>The assertions require certain support functions which are output
into a separate file, <code>assert_def.h</code>. These functions need
to be compiled into the program when <code>ASSERTS</code> is defined.
Note that the functions are implementation specific.</para>
</sect1>
</chapter>
<chapter id="diskoutput">
<title>Disk reading and writing</title>
<para>One of the facilities which the
<link linkend="coutputtypes">homogeneous implementation</link> of the
type system described above allows for is the addition of persistence.
Persistence in this context means allowing objects to be written to, and
read from, disk. Also discussed in this section is the related topic of
the object printing routines, which allow a human readable
representation of objects of the type system to be output for debugging
or other purposes.</para>
<para>The disk reading and writing routines are output into the files
<filename>read_def.h</filename> and <filename>write_def.h</filename>
respectively if the <option>-d</option> command-line option is passed to
<command>calculus</command>. The object printing routines are output if
the <option>-p</option> option is given, with additional code designed
for use with run-time debuggers being added if the <option>-a</option>
option is also given.</para>
<para>All of these routines use extra constructs output in the main output
files (<filename>name.h</filename> and
<filename>union_ops.h</filename>), but not normally accessible. The
macro <literal>name</literal><code>_IO_ROUTINES</code> should be defined
in order to make these available for the disk reading and writing
routines to use.</para>
<sect1 id="diskwrite">
<title>Disk writing routines</title>
<para>The disk writing routines output in
<filename>write_def.h</filename> consist of a function:
<programlisting>
static void WRITE_type(t);
</programlisting>
for each type <literal>t</literal> mentioned within the algebra
description, which writes an object of that type to disk.</para>
<para>Note that such routines are output not only for the basic types,
such as unions and structures, but also for any composite types, such
as pointers and lists, which are used in their definition. The
<literal>type</literal> component of the name
<code>WRITE_</code><literal>type</literal> is derived from basic types
<literal>t</literal> by using the short type name. For composite types
it is defined recursively as follows:
<programlisting>
LIST(t)->list_type
PTR(t)->ptr_type
STACK(t)->stack_type
VEC(t)->vec_type
VEC_PTR(t)->vptr_type
</programlisting></para>
<para>Such functions are defined for identity types and composite types
involving identity types by means of macros which define them in terms
of the identity definitions.
<code>WRITE_</code><literal>type</literal> functions for the primitive
types should be provided by the user to form a foundation on which all
the other functions may be built.</para>
<para>The user may wish to generate
<code>WRITE_</code><literal>type</literal> (or other disk reading and
writing) functions for types other than those mentioned in the algebra
definition. This can be done by means of a command-line option of the
form <code>-E</code><literal>input</literal> where
<literal>input</literal> is a file containing a list of the extra
types required. In the notation
<link linkend="textinput">used above</link> the syntax for
<literal>input</literal> is given by:
<programlisting>
extra:
type-list<subscript>opt</subscript>
type-list:
type;
type-list type;
</programlisting></para>
<para>The <code>WRITE_</code><literal>type</literal> functions are
defined recursively in an obvious fashion. The user needs to provide
the writing routines for the primitives already mentioned, plus
support routines (or macros):
<programlisting>
void WRITE_BITS(int, unsigned int);
void WRITE_DIM(name_dim);
void WRITE_ALIAS(unsigned int);
</programlisting>
for writing a number of bits to disk, writing a vector dimension and
writing an <link linkend="diskalias">object alias</link>.</para>
<para>Any of the <code>WRITE_</code><literal>type</literal> functions
may be overridden by the user by defining a macro
<code>WRITE_</code><literal>type</literal> with the desired effect.
Note that the <code>WRITE_</code><literal>type</literal> function for
an identity can be overridden independently of the function for the
identity definition. This provides a method for introducing types
which are representationally the same, but which are treated
differently by the disk reading and writing routines.</para>
</sect1>
<sect1 id="diskread">
<title>Disk reading routines</title>
<para>The disk reading routines output in
<filename>read_def.h</filename> are exactly analogous to the disk
writing routines. For each type <literal>t</literal> (except
primitives) there is a function or macro:
<programlisting>
static t READ_type(void);
</programlisting>
which reads an object of that type from disk. The user must provide
the <code>READ_</code><literal>type</literal> functions for the
primitive types, plus support routines:
<programlisting>
unsigned int READ_BITS(int);
name_dim READ_DIM(void);
unsigned int READ_ALIAS(void);
</programlisting>
for reading a number of bits from disk, reading a vector dimension and
reading an <link linkend="diskalias">object alias</link>. The
<code>READ_</code><literal>type</literal> functions may be overridden
by means of macros as before.</para>
</sect1>
<sect1 id="diskprint">
<title>Object printing routines</title>
<para>The object printing routines output in
<filename>print_def.h</filename> consist of a function or macro:
<programlisting>
static void PRINT_type(FILE *, t, char *, int);
</programlisting>
for each type <literal>t</literal>, which prints an object of type
<literal>t</literal> to the given file, using the given object name
and indentation value. The user needs to provide basic output
routines:
<programlisting>
void OUTPUT_type(FILE *, t);
</programlisting>
for each primitive type. The
<code>PRINT_</code><literal>type</literal> functions may be overridden
by means of macros as before.</para>
<para>The printing routines are under the control of three variables
defined as follows:
<programlisting>
static int print_indent_step = 4;
static int print_ptr_depth = 1;
static int print_list_expand = 0;
</programlisting></para>
<para>These determine the indentation to be used in the output, to what
depth pointers are to be dereferenced when printing, and whether lists
and stacks are to be fully expanded into a sequence of elements or
just split into a head and a tail.</para>
<para>One application of these object printing routines is to aid
debugging programs written using the <command>calculus</command> tool.
The form of the type system implementation means that it is not easy
to extract information using run-time debuggers without a detailed
knowledge of the structure of this implementation. As a more
convenient alternative, if both the <code>-p</code> and
<code>-a</code> command-line options are given then
<command>calculus</command> will generate functions:
<programlisting>
void DEBUG_type(t);
</programlisting>
defined in terms of <code>PRINT_</code><literal>type</literal>, for
printing an object of the given type to the standard output. Many
debuggers have problems passing structure arguments, so for structure,
vector and vector pointer types
<code>DEBUG_</code><literal>type</literal> takes the form:
<programlisting>
void DEBUG_type(t *);
</programlisting>
These debugging routines are only defined conditionally, if the macro
<code>DEBUG</code> is defined.</para>
</sect1>
<sect1 id="diskalias">
<title>Aliasing</title>
<para>An important feature of the disk reading and writing routines,
namely aliasing, has been mentioned but not yet described. The problem
to be faced is that many of the objects built up using type systems
defined using <command>calculus</command> will be cyclic - they will
include references to themselves in their own definitions. Aliasing
is a mechanism for breaking such cycles by ensuring that only one copy
of an object is ever written to disk, or that only one copy is created
when reading from disk. This is done by associating a unique number as
an alias for the object.</para>
<para>For example, when writing to disk, the first time the object is
written the alias definition is set up. Consequently the alias number
is written instead of the object it represents. Similarly when reading
from disk, an alias may be associated with an object when it is read.
When this alias is encountered subsequently it will always refer to
this same object.</para>
<para>The objects on which aliasing can be specified are the
<link linkend="inputunion">union fields</link>. A union field may be
qualified by one or two hash symbols to signify that objects of that
type should be aliased.</para>
<para>The two hash case is used to indicate that the user wishes to gain
access to the objects during the aliasing mechanism. In the disk
writing case, the object to be written, <literal>x</literal> say, is
split into its components using the appropriate
<code>DECONS_</code><literal>union_field</literal> construct. Then the
user-defined routine, or macro:
<programlisting>
ALIAS_union_field(comp, ...., x);
</programlisting>
(where <literal>comp</literal> ranges over all the union components)
is called prior to writing the object components to disk.</para>
<para>Similarly in the disk reading case, the object being read,
<literal>x</literal>, is initialised by calling the user-defined
routine:
<programlisting>
UNALIAS_union_field(x);
</programlisting>
prior to reading the object components from disk. Each object
component is then read into a local variable, <literal>comp</literal>.
Finally the user-defined routine:
<programlisting>
UNIFY_union_field(comp, ...., x);
</programlisting>
(where <literal>comp</literal> ranges over all the union components)
is called to assign these values to <literal>x</literal> before
returning.</para>
<para>In the single hash case the object is not processed in this way.
It is just written straight to disk, or has its components immediately
assigned to it when reading from disk.</para>
<para>Note that aliasing is used, not just in the disk reading and
writing routines, but also in the object printing functions. After
calling any such function the user should call the routine:
<programlisting>
void clear_name_alias(void);
</programlisting>
to clear all aliases.</para>
<para>Aliases are implemented by adding an extra field to the objects to
be aliased, which contains the alias number, if this has been
assigned, or zero, otherwise. A list of all these extra fields is
maintained. In addition to the routine
<code>clear_</code><literal>name</literal>_alias mentioned above, the
user should provide support functions and variables:
<programlisting>
unsigned int crt_name_alias;
void set_name_alias(name *, unsigned int);
name *find_name_alias(unsigned int);
</programlisting>
giving the next alias number to be assigned, and routines for adding
an alias to the list of all aliases, and looking up an alias in this
list. Example implementations of these routines are given in the
<link linkend="example"><command>calculus</command> program</link>
itself.</para>
</sect1>
<sect1 id="application">
<title>Application to calculus</title>
<para>As <link linkend="example">mentioned above</link>, the
<command>calculus</command> program itself is an example of its own
application. It therefore contains routines for reading and writing a
representation of an algebra to and from disk, and for pretty-printing
the contents of an algebra. These may be accessed using the
<link linkend="options">command-line options</link> mentioned
above.</para>
<para>If the <option>-w</option> command-line option is specified to
<command>calculus</command> then it reads its input file,
<literal>input</literal>, as normal, but writes a disk representation
of the input algebra to <literal>output</literal>, which in this
instance is an output file, rather than an output directory. An output
file produced in this way can then be specified as an input file to
<command>calculus</command> if the <option>-r</option> option is
given. Finally the input algebra may be pretty-printed to an output
file (or the standard output if no <literal>output</literal> argument
is given) by specifying the <option>-o</option> option.</para>
</sect1>
</chapter>
<chapter id="template-files">
<title>Template files</title>
<para>It is possible to use <command>calculus</command> to generate an
output file from a template input file, <literal>template</literal>,
using the syntax:
<programlisting>
calculus [ options ] input .... -Ttemplate output
</programlisting></para>
<para>The template file consists of a list of either template directives
or source lines containing escape sequences which are expanded by
<command>calculus</command>. Template directive lines are distinguished
by having <code>@</code> as their first character. Escape sequences
consist of <code>%</code> following by one or more characters.</para>
<para>There are two template directives; loops take the form:
<programlisting>
@loop control
....
@end
</programlisting>
and conditionals take the form:
<programlisting>
@if condition
....
@else
....
@endif
</programlisting>
or:
<programlisting>
@if condition
....
@endif
</programlisting>
where <code>....</code> stands for any sequence of template directives
or source lines.</para>
<para>The <literal>control</literal> statements in a loop can be
<code>primitive</code>, <code>identity</code>, <code>enum</code>,
<code>struct</code> or <code>union</code> to loop over all the
primitive, identity, enumeration structure or union types within the
input algebra. Within an <code>enum</code> loop it is possible to use
<code>enum.const</code> to loop over all the enumeration constants of
the current enumeration type. Within a <code>struct</code> loop it is
possible to use <code>struct.comp</code> to loop over all the components
of the current structure. Within a <code>union</code> loop it is
possible to use <code>union.comp</code> to loop over all the shared
components of the current union, <code>union.field</code> to loop over
all the fields of the current union, and <code>union.map</code> to loop
over all the maps of the current union. Within a
<code>union.field</code> loop it is possible to use
<code>union.field.comp</code> to loop over all the components of the
current union field. Within a <code>union.map</code> loop it is possible
to use <code>union.map.arg</code> to loop over all the arguments of the
current union map.</para>
<para>The valid <literal>condition</literal> statements in a conditional
are <code>true</code> and <code>false</code>, plus
<code>comp.complex</code>, which is true if the current structure or
union field component has a complex type (i.e. those for which
<code>COPY_</code><literal>type</literal> and
<code>DEREF_</code><literal>type</literal> require two arguments), and
<code>comp.default</code>, which is true if the current structure or
union field component has a default initialiser value.</para>
<para>A number of escape sequences can be used anywhere. <code>%ZX</code>
and <code>%ZV</code> give the name and version number of the version of
<command>calculus</command> used. <code>%Z</code> and <code>%V</code>
give the name and version number of the input algebra. <code>%%</code>
and <code>%@</code> give <code>%</code> and <code>@</code> respectively,
and <code>%</code> as the last character in a line suppresses the
following newline character.</para>
<para>Within a <code>primitive</code> loop, <code>%PN</code> gives the
primitive name, <code>%PM</code> gives the short primitive name and
<code>%PD</code> gives the primitive definition.</para>
<para>Within an <code>identity</code> loop, <code>%IN</code> gives the
identity name, <code>%IM</code> gives the short identity name and
<code>%IT</code> gives the identity definition.</para>
<para>Within an <code>enum</code> loop, <code>%EN</code> gives the
enumeration name, <code>%EM</code> gives the short enumeration name and
<code>%EO</code> gives the enumeration order,
<code>ORDER_</code><literal>enum</literal>. Within an
<code>enum.const</code> loop, <code>%ES</code> gives the enumeration
constant name and <code>%EV</code> gives its value.</para>
<para>Within a <code>struct</code> loop, <code>%SN</code> gives the
structure name and <code>%SM</code> gives the short structure
name.</para>
<para>Within a <code>union</code> loop, <code>%UN</code> gives the union
name, <code>%UM</code> gives the short union name and <code>%UO</code>
gives the union order, <code>ORDER_</code><literal>union</literal>.
Within a <code>union.field</code> loop, <code>%FN</code> gives the field
name. Within a <code>struct.comp</code>, <code>union.comp</code> or
<code>union.field.comp</code> loop, <code>%CN</code> gives the component
name, <code>%CT</code> gives the component type, <code>%CU</code> gives
the short form of the component type and <code>%CV</code> gives the
default component initialiser value (if <code>comp.default</code> is
true). Within a <code>union.map</code> loop, <code>%MN</code> gives the
map name and <code>%MR</code> gives the map return type. Within a
<code>union.map.arg</code> loop, <code>%AN</code> gives the argument
name and <code>%AT</code> gives the argument type.</para>
<para>As an example, the following template file gives a simple algebra
pretty printer:
<programlisting>
ALGEBRA %X (%V):
/* PRIMITIVE TYPES */
@loop primitive
%PN (%PM) = "%PD";
@end
/* IDENTITY TYPES */
@loop identity
%IN (%IM) = %IT;
@end
/* ENUMERATION TYPES */
@loop enum
enum %EN (%EM) = {
@loop enum.const
%ES = %EV,
@end
};
@end
/* STRUCTURE TYPES */
@loop struct
struct %SN (%SM) = {
@loop struct.comp
%CT %CN;
@end
};
@end
/* UNION TYPES */
@loop union
union %UN (%UM) = {
@loop union.comp
%CT %CN;
@end
} + {
@loop union.field
%FN->{
@loop union.field.comp
%CT %CN;
@end
};
@end
};
@end
</programlisting></para>
</chapter>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
href="../../../common/colophon-dera.xml"/>
</book>