WebSVN – tendra.SVN – /trunk/doc/user/guides/calculus/calculus.xml

<?xml version="1.0" standalone="no"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

<!--
  $Id$
-->
  
<book>
  <bookinfo>
    <title>Calculus Users' Guide</title>

    <corpauthor>The TenDRA Project</corpauthor>

    <author>
      <firstname>Jeroen</firstname>
      <surname>Ruigrok van der Werven</surname>
    </author>
    <authorinitials>JRvdW</authorinitials>
    <pubdate>2004</pubdate>

    <copyright>
      <year>2004</year>
      <year>2005</year>

      <holder>The TenDRA Project</holder>
    </copyright>

    <copyright>
      <year>1998</year>

      <holder>DERA</holder>
    </copyright>
  </bookinfo>

  <chapter id="introduction">
    <title>Introduction</title>

    <para>This document describes a tool, <command>calculus</command>, which
      allows complex type systems to be described in a simple algebraic
      format, and transforms this into a system of C types which implements
      this algebra.</para>

    <para><command>calculus</command> was initially written as a design tool
      for use with the TenDRA C and C++ producers. The producers' internal
      datatypes reflect the highly complex interdependencies between the C and
      C++ language concepts (types, expressions and so on) which they were
      designed to represent. A tool for managing this complexity and allowing
      changes to the internal datatypes to be made with confidence was
      therefore seen to be an absolute necessity. The tool can also be applied
      in similar situations where a complex type system needs to be
      described.</para>

    <para>The tool also provides for a separation between the specification of
      the type system and its implementation in terms of actual C types. This
      separation has a number of advantages.  The advantages of maintaining a
      level of abstraction in the specification are obvious. It is possible to
      apply extremely rigorous type checking to any program written using the
      tool, thereby allowing many potential errors to be detected at compile
      time. It is also possible to restrict access to the types to certain
      well-specified constructs, thereby increasing the maintainability of
      code written using the tool.</para>

    <para>This separation also has beneficial effects on the implementation of
      the type system. By leaving all the type checking aspects at the
      specification level, it is possible to impose an extremely homogeneous
      underlying implementation. This means, for example, that a single memory
      management system can be applied to all the types within the system. It
      also opens the possibility of writing operations which apply to all
      types within the system in a straightforward manner. The
      <link linkend="diskoutput">disk reading and writing routines</link>
      described below are an example of this.</para>

    <sect1 id="options">
      <title>Using calculus</title>

      <para>The general form for invoking <command>calculus</command> is as
        follows:
        <command>calculus<optional><option>options</option></optional>
        <option>input</option> ....
        <optional><option>output</option></optional></command>
        where <option>input</option> is a file containing the input type
        system description and <option>output</option> is the directory where
        the files implementing this system are to be output. This directory
        must already exist. If no <option>output</option> argument is given
        then the current working directory is assumed. Note that several
        <option>input</option> files can be given.  Unless
        <link linkend="module">otherwise specified</link> it is the last file
        which is used to generate the output.</para>

      <para>The form in which the
        <link linkend="textinput">input type systems</link>
        are expressed is described below. The form of the output depends on
        <option>options</option>. By default, the C implementation of the type
        system is output. If the <option>-t</option> option is passed to
        <command>calculus</command> then
        <ulink url="tcpplus.html#token"><code>#pragma token</code></ulink>
        statements describing the type system specification are output.  This
        <link linkend="tokenoutput">specification</link> is given below, with
        some of the more important
        <link linkend="coutput">implementation details</link> being described
        in the following section.</para>

      <para>Note that it is necessary to check any program written using
        <command>calculus</command> against the <code>#pragma token</code>
        version of the specification in order to get the full benefits of the
        rigorous type checking provided by the tool. Some sections of the
        program (if only the
        <link linkend="coutputsupport">basic functions</link>) may be
        dependent upon the implementation, and so not be suitable for this
        form of checking.</para>
    </sect1>

    <sect1 id="example">
      <title>Example program</title>

      <para>The program <command>calculus</command> itself was written using a
        type system specified using the <command>calculus</command> tool. It
        is designed to provide an example of its own application, with some
        features not strictly necessary for the functionality of the program
        being added for illustrative purposes.</para>
    </sect1>
  </chapter>

  <chapter id="textinput">
    <title>Input syntax</title>

    <para>The overall input file format is as follows:

      <programlisting>
algebra:
        ALGEBRA identifier version<subscript>opt</subscript>: item-list<subscript>opt</subscript>

version:
        (integer.integer)

item-list:
        item
        item-list item

item:
        primitive
        identity
        enumeration
        structure
        union
      </programlisting></para>

      <para>The initial identifier gives the overall name of the algebra.  A
        version number may also be associated with the algebra (if this is
        omitted the version is assumed to be 1.0). The main body of the
        algebra definition consists of a list of items describing the
        primitives, the identities, the enumerations, the structures and the
        unions comprising the algebra.</para>

      <para>Here <literal>identifier</literal> has the same meaning as in C.
        The only other significant lexical units are
        <literal>integer</literal>, which consists of a sequence of decimal
        digits, and <literal>string</literal>, which consists of any number of
        characters enclosed in double quotes. There are no escape sequences in
        strings. C style comments may be used anywhere in the input. White
        space is not significant.</para>

    <sect1 id="inputprim">

      <title>Primitives</title>

      <para>Primitives form the basic components from which the other types in
        the algebra are built up. They are described as follows:

        <programlisting>
primitive:
        object-identifier = quoted-type;
        </programlisting>

        where the primitive identifier is given by:

        <programlisting>
object-identifier:
        #<subscript>opt</subscript>:<subscript>opt</subscript> identifier
        #<subscript>opt</subscript>:<subscript>opt</subscript> identifier(identifier)
        </programlisting>

        and the primitive definition is a string which gives the C type
        corresponding to this primitive:

        <programlisting>
quoted-type:
        string
        </programlisting></para>

      <para>Note that each primitive (and also each identity, each
        enumeration, each structure and each union) has two names associated
        with it. The second name is optional; if it is not given then it is
        assumed to be the same as the first name. The first name is that which
        will be given to the corresponding type in the output file. The second
        is a short form of this name which will be used in forming constructor
        names etc. in the output.</para>

      <para>The optional hash and colon which may be used to qualify an object
        identifier are provided for backwards compatibility only and are not
        used in the output routines.</para>
    </sect1>

    <sect1 id="inputident">
      <title>Identities</title>

      <para>Identities are used to associate a name with a particular type in
        the algebra. In this they correspond to <code>typedef</code>s in C.
        They are described as follows:

        <programlisting>
identity:
        object-identifier = type;
        </programlisting>
        
        where the definition type, <literal>type</literal>, is as described
        <link linkend="inputtype">below</link>.</para>
    </sect1>

    <sect1 id="inputenum">
      <title>Enumerations</title>

      <para>Enumerations are used to define types which can only take values
        from some finite set. They are described as follows:

      <programlisting>
enumeration:
        enum !<subscript>opt</subscript> object-identifier = { enumerator-list };
        enum !<subscript>opt</subscript> object-identifier = base-enumeration + { enumerator-list };
      </programlisting>

      where:

      <programlisting>
base-enumeration:
        identifier
      </programlisting>

      is the name of a previously defined enumeration type.  The latter form
      is used to express extension enumeration types.  An enumeration type may
      be qualified by an exclamation mark to indicate that no lists of this
      type will be constructed.</para>

      <para>The enumeration constants themselves are defined as
        follows:

        <programlisting>
enumerator:
        identifier
        identifier = enumerator-value

enumerator-list:
        enumerator
        enumerator-list, enumerator
        </programlisting></para>

      <para>Each enumerator is assigned a value in an ascending sequence,
        starting at zero. The next value to be assigned can be set using an
        <literal>enumerator-value</literal>. This is an expression formed from
        <literal>integer</literal>s, <literal>identifier</literal>s
        representing previous enumerators from the same enumeration, and the
        question mark character which stands for the previous enumeration
        value. The normal C arithmetic operations can be applied to build up
        more complex <literal>enumerator-value</literal>s. All enumerator
        evaluation is done in the <code>unsigned long</code> type of the host
        machine. Values containing more than 32 bits are not portable.</para>

      <para>Enumerations thus correspond to enumeration types in C, except
        that they are genuinely distinct types.</para>
    </sect1>

    <sect1 id="inputstruct">
      <title>Structures</title>

      <para>Structures are used to build up composite types from other types
        in the algebra. They correspond to structures in C. They are described
        as follows:

        <programlisting>
structure:
        struct object-identifier = component-group;
        struct object-identifier = base-structure + component-group;
        </programlisting>

        where:

        <programlisting>
base-structure:
        identifier
        </programlisting>
                
        is the name of a previously defined structure type.  The latter form
        is used to express (single) inheritance of structures.  All components
        of the base structure also become components of the derived
        structure.</para>

      <para>The structure components themselves are defined as follows:

        <programlisting>
component-group:
        { component-list<subscript>opt</subscript> }

component-list:
        component-declarations;
        component-list component-declarations;

component-declarations:
        type component-declarators

component-declarators:
        component-declarator
        component-declarators, component-declarator

component-declarator:
        identifier component-initialiser<subscript>opt</subscript>

component-initialiser:
        = string
        </programlisting></para>
        
      <para>The optional <link linkend="outputstruct">component initialiser
        strings</link> are explained below.</para>

      <para>Structures are the only algebra construct which prevent the input
        from being a general graph. Unions may be defined in terms of
        themselves, but (as in C) pointers must be used to define structures
        in terms of themselves.</para>
    </sect1>

    <sect1 id="inputunion">
      <title>Unions</title>

      <para>Unions are used to build up types which can hold a variety of
        information. They differ from C unions in that they are discriminated.
        They are described as follows:

        <programlisting>
union:
        union object-identifier = component-group + field-group map-group<subscript>opt</subscript>;
        union object-identifier = base-union + field-group map-group<subscript>opt</subscript>;
        </programlisting>

        where:

        <programlisting>
base-union:
        identifier
        </programlisting>
                
        is the name of a previously defined union type. The latter form is
        used to express (single) inheritance of unions. All components, fields
        and maps of the base union also become components of the derived
        union. Note that only new fields and maps can be added in the derived
        union.</para>

      <para>The <literal>component-group</literal> gives a set of components
        which are common to all the different union cases. The cases
        themselves are described as follows:

        <programlisting>
field-group:
        { field-list }

field:
        #<subscript>opt</subscript> #<subscript>opt</subscript> field-identifier-list-&gt;component-group
        #<subscript>opt</subscript> #<subscript>opt</subscript> field-identifier-list-&gt;base-field + component-group

base-field:
        identifier

field-list:
        field
        field-list, field

field-identifier:
        identifier

field-identifier-list:
        field-identifier
        field-identifier-list, field-identifier
        </programlisting></para>

      <para>The optional one or two hashes which may be used to qualify a list
        of field identifiers are used to indicate
        <link linkend="diskalias">aliasing</link> in the disk reading and
        writing routines.  The <literal>base-field</literal> case is a
        notational convenience which allows one field in a union to inherit
        all the components of another field.</para>

      <para>Note that a number of field identifiers may be associated with the
        same set of field components. Any such list containing more than one
        identifier forms a field identifier set, named after the first field
        identifier.</para>

      <para>In addition a number of maps may be associated with a union.
        These maps correspond to functions which take the union, plus a number
        of other map parameter types, and return the map return type. They are
        described as follows:

        <programlisting>
map-group:
        :[map-list<subscript>opt</subscript>]

map:
        extended-type #<subscript>opt</subscript> identifier(parameter-list<subscript>opt</subscript>)

map-list:
        map
        map-list map
        </programlisting>
                
        where:
        
        <programlisting>
parameter-list:
        parameter-declarations
        parameter-list; parameter-declarations

parameter-declarations:
        extended-type parameter-declarators

parameter-declarators:
        identifier
        parameter-declarators, identifier
        </programlisting></para>
        
      <para>Note that the map parameter and return types are given by:

        <programlisting>
          extended-type:
                  type
                  quoted-type
        </programlisting></para>
                  
      <para>In addition to the
        <link linkend="inputtype">types derived from the algebra</link> it is
        possible to use quoted C types in this context.</para>

      <para>A map may be qualified by means of a hash. This means that the
        associated function also takes a
        <link linkend="outputunionmaps">destructor function</link> as a
        parameter.</para>
    </sect1>

    <sect1 id="inputtype">
      <title>Type constructors</title>

      <para>The types derived from the algebra may be described as follows:

        <programlisting>
type:
        identifier
        PTR type
        LIST type
        STACK type
        VEC type
        VEC_PTR type
        </programlisting></para>
                
      <para>The simple types correspond to primitive, identity, enumeration,
        structure or union names. It is possible for a type to be used before
        it is defined, but it must be defined at some point.</para>

      <para>The derived type constructors correspond to pointers, lists,
        stacks, vectors and pointers into vectors. They may be used to build
        up further types from the basic algebra types.</para>
    </sect1>

    <sect1 id="module">
      <title>Relations between algebras</title>

      <para>As <link linkend="options">mentioned above</link>, more than one
        input algebra may be specified to <command>calculus</command>. Each is
        processed separately, and output is generated for only one. By default
        this is the last algebra processed, however a specific algebra can be
        specified using the command-line option <option>-A</option>
        <option>name</option>, where <option>name</option> is the name of the
        algebra to be used for output.</para>

      <para>Types may be imported from one algebra to another by means of
        commands of the form:

        <programlisting>
import:
        IMPORT identifier;
        IMPORT identifier::identifier;
        </programlisting>
                
        which fit into the main syntax as an <literal>item</literal>. The
        first form imports all the types from the algebra given by
        <literal>identifier</literal> into the current algebra. The second
        imports a single type, given by the second
        <literal>identifier</literal> from the algebra given by the first
        <literal>identifier</literal>.</para>

      <para>Note that importing a type in this way also imports all the types
        used in its construction. This includes such things as structure
        components and union fields and maps. Thus an algebra consisting just
        of <literal>import</literal> commands can be used to express
        subalgebras in a simple fashion.</para>
    </sect1>
  </chapter>

  <chapter id="tokenoutput">
    <title>Output type system specification</title>

    <para>In this section we document the basic output of
      <command>calculus</command>. Two forms of the output can be generated -
      a description of the output specification in terms of the TenDRA
      <code>#pragma token</code> constructs, and the actual C code which
      implements these tokens.</para>

    <para>In this section the description given will be at the level of the
      output specification. The more important details of the
      <link linkend="coutput">C implementation</link> are given in the
      following section.</para>

    <para>The output is split among several header files in the specified
      output directory. The main output is printed into
      <filename>name.h</filename>, where <literal>name</literal> is the
      overall algebra name. Unless otherwise stated, all the objects specified
      below are to be found in <literal>name</literal><code>.h</code>. However
      for each union, <literal>union</literal>, in the algebra certain
      information associated with the union is printed into
      <literal>union</literal><code>_ops.h</code>. If the union has any maps
      associated with it then further output is printed to
      <filename>union_map.h</filename> and <filename>union_hdr.h</filename>.
      </para>

    <sect1 id="outputinfo">
      <title>Version information</title>

      <para>Certain basic information about the input algebra is included in
        <literal>name</literal><code>.h</code>. <literal>name</literal>
        <code>_NAME</code> is a string literal giving the overall algebra
        name.  <literal>name</literal><code>_VERSION</code> is a string
        literal giving the algebra version number.
        <literal>name</literal><code>_SPECIFICATION</code> and
        <literal>name</literal><code>_IMPLEMENTATION</code> are flags which
        take the values 0 or 1 depending on whether the specification of the
        type system in terms of <code>#pragma token</code> statements or the C
        implementation is included.</para>
    </sect1>

    <sect1 id="outputbasic">
      <title>Basic types</title>

      <para>Six abstract type operators, each taking a type as argument and
        returning a type, are specified as follows:

        <programlisting>
TYPE PTR(TYPE t);
TYPE LIST(TYPE t);
TYPE STACK(TYPE t);
TYPE VEC(TYPE t);
TYPE VEC_PTR(TYPE t);
TYPE SIZE(TYPE t);
        </programlisting></para>
        
      <para>These represent a pointer to an object of type
        <literal>t</literal>, a list of objects of type <literal>t</literal>,
        a stack of objects of type <literal>t</literal>, a vector of objects
        of type <literal>t</literal>, a pointer into a vector of objects of
        type <literal>t</literal>, and an allocation block size for type
        <literal>t</literal> respectively. (It is possible to suppress all
        constructs involving <code>VEC</code> or <code>VEC_PTR</code> by
        passing the <code>-x</code> command-line option to
        <command>calculus</command>. Similarly <code>STACK</code> constructs
        may be suppressed using <code>-z</code>.)</para>

      <para>These constructors can be applied to any type, although in
        practice they are only applied to the types specified in the algebra
        and those derived from them. For example, we may form the type:

        <programlisting>
LIST(PTR(int))
        </programlisting>
        
        representing a list of pointers to <code>int</code>.</para>

      <para>An integral type:

        <programlisting>
INT_TYPE name_dim;
        </programlisting>
        
        is specified, where <literal>name</literal> is the overall algebra
        name.  This type is used to represent the sizes of vectors.</para>

      <para>A function pointer type:

        <programlisting>
typedef void (*DESTROYER)();
        </programlisting>
        
        is specified, which represents a destructor function.  Two destructor
        functions are specified:

        <programlisting>
void destroy_name();
void dummy_destroy_name();
        </programlisting>
        
        where <literal>name</literal> is as above.
        <code>destroy_</code><literal>name</literal> is the default
        destructor, whereas <code>dummy_destroy_</code><literal>name</literal>
        is a dummy destructor which has no effect. The details of the
        arguments passed to the destructors and so on are implementation
        dependent.</para>
    </sect1>

    <sect1 id="outputsize">
      <title>Operations on sizes</title>

      <para>The <code>SIZE</code> type constructor is used to represent a
        multiple of the size of a type. It is used, for example, in the
        <link linkend="outputptr">pointer stepping construct</link>
        <code>STEP_ptr</code> to specify the number of units the pointer is to
        be increased by.  Having a separate abstract type for the size of each
        type greatly increases the scope for type checking of memory
        allocation, and other, functions.</para>

      <para>For each basic type in the algebra (a primitive, a structure or a
        union), there is a constant expression:

        <programlisting>
SIZE ( t ) SIZE_type;
        </programlisting>
        
        where <literal>t</literal> denotes the type itself, and
        <literal>type</literal> is the associated type name.</para>

      <para>For the five other type constructors
        <link linkend="outputbasic">described above</link> there are constant
        expressions:

        <programlisting>
SIZE(PTR(t))      SIZE_ptr(TYPE t);
SIZE(LIST(t))     SIZE_list(TYPE t);
SIZE(STACK(t))    SIZE_stack(TYPE t);
SIZE(VEC(t))      SIZE_vec(TYPE t);
SIZE(VEC_PTR(t))  SIZE_vec_ptr(TYPE t);
      </programlisting>
      
      for any type <literal>t</literal>.</para>

      <para>These constructs allow the size of any type derived from the
        algebra to be built up. There is also a construct which corresponds to
        multiplying the size of a type by a constant. This takes the
        form:

        <programlisting>
SIZE(t) SCALE(SIZE(t), INT_TYPE itype);
        </programlisting>

        for any type <literal>t</literal> and any integral type
        <literal>itype</literal>.</para>
    </sect1>

    <sect1 id="outputptr">
      <title>Operations on pointers</title>

      <para>The <code>PTR</code> type constructor is used to represent a
        pointer to an object of type <literal>t</literal>. It should be
        emphasised that this construct is not in general implemented by means
        of the C pointer construct.</para>

      <sect2 id="simple-pointer-constructs">
        <title>Simple pointer constructs</title>

        <para>There are several simple operations on pointers, given as
          follows:

          <programlisting>
PTR(t) NULL_ptr(TYPE t);
int IS_NULL_ptr(PTR (t));
int EQ_ptr(PTR(t), PTR(t));
PTR(t) STEP_ptr(PTR(t), SIZE(t));
          </programlisting></para>

        <para>The construct <code>NULL_ptr</code> is used to form the null
          pointer to <literal>t</literal> for a type <literal>t</literal>.
          This is a constant expression.  <code>IS_NULL_ptr</code> can be used
          to test whether a given pointer expression is a null pointer.
          Similarly <code>EQ_ptr</code> checks whether two pointers are equal
          (note that we can only compare pointers to the same type). Finally
          <code>STEP_ptr</code> can be used to add a scalar value to a
          pointer.</para>
      </sect2>

      <sect2 id="unique-pointers">
        <title>Unique pointers</title>

        <para>There are also constructs for generating and destroying unique
          pointers:

          <programlisting>
PTR(t) UNIQ_ptr(TYPE t);
void DESTROY_UNIQ_ptr(PTR(t));
          </programlisting></para>
        
        <para>A unique pointer is guaranteed to be different from all other
          undestroyed pointers. Dereferencing a unique pointer is
          undefined.</para>
      </sect2>

      <sect2 id="pointer-construction-destruction">
        <title>Pointer construction and destruction</title>

        <para>The constructs:

          <programlisting>
PTR(t) MAKE_ptr(SIZE(t));
void DESTROY_ptr(PTR(t), SIZE(t));
          </programlisting>

          are used to respectively create and destroy a pointer to a given
          type.</para>
      </sect2>

      <sect2 id="construct-assignment-dereference">
        <title>Assignment and dereference constructs</title>

        <para>The constructs for assigning and dereferencing pointers have one
          of two forms depending on the complexity of the type pointed to. For
          simple types, including primitive, enumeration and union types, they
          have the form:

          <programlisting>
void COPY_type(PTR(t), t);
t DEREF_type(PTR(t));
          </programlisting>

          where <literal>t</literal> is the type involved and
          <literal>type</literal> is the associated short name.</para>

        <para>For more complex types, including structures, the assignment or
          dereference cannot be done in a single expression, so the constructs
          are specified to be statements as follows:

          <programlisting>
STATEMENT COPY_type(PTR(t), t);
STATEMENT DEREF_type(PTR(t), lvalue t);
          </programlisting></para>

        <para>Here the <code>lvalue</code> type qualifier is used to indicate
          that the statement argument is an assignable <code>lvalue</code>. In
          this case it should give the location of an object of type
          <literal>t</literal> into which the pointer will be
          dereferenced.</para>

        <para>The appropriate assignment and dereference constructs are
          generated for each of the basic algebra types (primitives,
          enumerations, structures and unions). In addition there are generic
          assignment and dereference constructs for pointer types, list types,
          stack types, vector types and vector pointer types.  The first three
          are of the first form above, whereas the second two have the second
          form, as follows:

          <programlisting>
void COPY_ptr(PTR(PTR(t)), PTR(t));
PTR(t) DEREF_ptr(PTR (PTR (t)));
void COPY_list(PTR(LIST(t)), LIST(t));
LIST(t) DEREF_list(PTR(LIST(t)));
void COPY_stack(PTR(STACK(t)), STACK(t));
STACK(t) DEREF_stack(PTR(STACK(t)));
STATEMENT COPY_vec(PTR(VEC(t)), VEC(t));
STATEMENT DEREF_vec(PTR(VEC(t)), lvalue VEC(t));
STATEMENT COPY_vec_ptr(PTR(VEC_PTR(t)), VEC_PTR(t));
STATEMENT DEREF_vec_ptr(PTR(VEC_PTR(t)), lvalue VEC_PTR(t));
          </programlisting></para>
      </sect2>
    </sect1>

    <sect1 id="outputlist">
      <title>Operations on lists</title>

      <para>The <code>LIST</code> type constructor is used to represent a list
        of objects of type <literal>t</literal>.</para>

      <sect2 id="simple-list-constructs">
        <title>Simple list constructs</title>

        <para>There are several simple list constructs:

          <programlisting>
LIST(t) NULL_list(TYPE t);
int IS_NULL_list(LIST(t));
int EQ_list(LIST(t), LIST(t));
unsigned LENGTH_list(LIST(t));
PTR(t) HEAD_list(LIST(t));
LIST(t) TAIL_list(LIST(t));
PTR(LIST(t)) PTR_TAIL_list(LIST(t));
LIST(t) END_list(LIST(t));
LIST(t) REVERSE_list(LIST(t));
LIST(t) APPEND_list(LST(t), LIST(t));
          </programlisting></para>

        <para>Empty lists may be constructed or tested for.
          <code>NULL_list</code> is a constant expression. Two lists may be
          checked for equality (note that this is equality of lists, rather
          than element-wise equality).  The number of elements in a list can
          be found. The head or tail (or <code>car</code> and
          <code>cdr</code>) of a list may be formed. The end element of a list
          may be selected. The order of the elements in a list can be
          reversed. One list can be appended to another.</para>
      </sect2>

      <sect2 id="unique-lists">
        <title>Unique lists</title>

        <para>There are also constructs for generating and destroying unique
          lists:

          <programlisting>
LIST(t) UNIQ_list(TYPE t);
void DESTROY_UNIQ_list(LIST(t));
          </programlisting></para>

        <para>A unique lists is guaranteed to be different from all other
          undestroyed lists. Taking the head or tail of a unique list is
          undefined.</para>
      </sect2>

      <sect2 id="list-construction-destruction">
        <title>List construction and destruction</title>

        <para>For each type <literal>t</literal> there are operations for
          constructing and deconstructing lists. For the basic types
          comprising the algebra these take the form:

          <programlisting>
STATEMENT CONS_type(t, LIST(t), lvalue LIST(t));
STATEMENT UN_CONS_type(lvalue t, lvalue LIST(t), LIST(t)) ;
STATEMENT DESTROY_CONS_type(DESTROYER, lvalue t, lvalue LIST(t), LIST(t));
          </programlisting>

          where <literal>type</literal> is the short type name.</para>

        <para>The operation <code>CONS_</code><literal>type</literal> is used
          to build a list whose head is the first argument and whose tail is
          the second argument. This is assigned to the third argument.
          <code>UN_CONS_</code><literal>type</literal> reverses this process,
          decomposing a list into its head and its tail.
          <code>DESTROY_CONS_</code><literal>type</literal> is similar, but it
          also applies the given destructor function to the decomposed list
          component.</para>

        <para>There are also generic list construction and deconstruction
          operations which apply to lists of pointers (with a <code>ptr</code>
          suffix), lists of lists (with a <code>list</code> suffix), lists of
          stacks (with a <code>stack</code> suffix), lists of vectors (with a
          <code>vec</code> suffix) and lists of pointers to vectors (with a
          <code>vec_ptr</code> suffix). For example, for lists of pointers
          these have the form:

          <programlisting>
STATEMENT CONS_ptr(PTR(t), LIST(PTR(t)), lvalue LIST(PTR(t))) ;
STATEMENT UN_CONS_ptr(lvalue PTR(t), lvalue LIST(PTR(t)), LIST(PTR(t)));
STATEMENT DESTROY_CONS_ptr(DESTROYER, lvalue PTR(t), lvalue LIST(PTR(t)), LIST(PTR(t)));
          </programlisting></para>

        <para>There is also a generic list destruction construct:

          <programlisting>
          STATEMENT DESTROY_list(LIST(t), SIZE(t));
          </programlisting>

          which may be used to destroy all the elements in a list.</para>
      </sect2>
    </sect1>

    <sect1 id="outputstack">
      <title>Operations on stacks</title>

      <para>The <code>STACK</code> type constructor is used to represent
        stacks of objects of type <literal>t</literal>. Empty stacks can be
        created and tested for:

        <programlisting>
STACK(t) NULL_stack(TYPE t);
int IS_NULL_stack(STACK(t));
        </programlisting></para>

      <para>For each type <literal>t</literal> there are operations for
        pushing objects onto a stack and for popping elements off. For the
        basic types comprising the algebra these take the form:

        <programlisting>
STATEMENT PUSH_type(t, lvalue STACK(t));
STATEMENT POP_type(lvalue t, lvalue STACK(t));
        </programlisting>

        where <literal>type</literal> is the short type name. There are also
        generic constructs, such as <code>PUSH_ptr</code> for pushing any
        pointer type, similar to the
        <link linkend="outputlist">generic list constructors</link>
        above.</para>

      <para>Stacks are in fact just modified lists with pushing corresponding
        adding an element to the start of a list, and popping to removing the
        first element. There are constructs:

        <programlisting>
LIST(t) LIST_stack(STACK(t));
STACK(t) STACK_list(LIST(t));
        </programlisting>

        for explicitly converting between lists and stacks.</para>
    </sect1>

    <sect1 id="outputvec">
      <title>Operations on vectors</title>

      <para>The <code>VEC</code> type constructor is used to represent vectors
        of objects of type <literal>t</literal>. There are a number of simple
        operations on vectors:

        <programlisting>
VEC(t) NULL_vec(TYPE t);
name_dim DIM_vec(VEC(t));
name_dim DIM_ptr_vec(PTR(VEC(t)));
PTR(t) PTR_ptr_vec(PTR(VEC(t)));
        </programlisting></para>

      <para>An empty vector can be constructed (note that, unlike null
        pointers and null lists, this is not a constant expression). The
        number of elements in a vector (or in a vector given by a pointer) can
        be determined (note how the type
        <literal>name</literal><code>_dim</code> is used to represent vector
        sizes). A pointer to a vector can be transformed into a pointer to the
        elements of the vector.</para>

      <para>In general a vector may be created or destroyed using the
        operators:

        <programlisting>
STATEMENT MAKE_vec(SIZE(t), name_dim, lvalue VEC(t));
STATEMENT DESTROY_vec(VEC(t), SIZE(t));
        </programlisting></para>

      <para>Finally a vector can be trimmed using:

        <programlisting>
STATEMENT TRIM_vec(VEC(t), SIZE(t), int, int, lvalue VEC(t));
        </programlisting>

        the two integral arguments giving the lower and upper bounds for the
        trimming operation.</para>
    </sect1>

    <sect1 id="outputvecptr">
      <title>Operations on vector pointers</title>

      <para>The <code>VEC_PTR</code> type constructor is used to represent a
        pointer to an element of a vector of objects of type
        <literal>t</literal>.  Apart from the basic constructors already
        mentioned, there are only two operations on vector pointers:

        <programlisting>
VEC_PTR(t) VEC_PTR_vec(VEC(t));
PTR(t) PTR_vec_ptr(VEC_PTR(t));
        </programlisting></para>

      <para>The first transforms a vector into a vector pointer (pointing to
        its first argument). The second transforms a vector pointer into a
        normal pointer.</para>
    </sect1>

    <sect1 id="outputprim">
      <title>Operations on primitives</title>

      <para>Each primitive specified within the algebra maps onto a
        <code>typedef</code> defining the type in terms of its given
        definition.  The only operations on primitives are those listed above
        - the size constructs, the pointer assignment and dereference
        operations, the list construction and deconstruction operations and
        the stack push and pop operations.</para>
    </sect1>

    <sect1 id="outputident">
      <title>Operations on identities</title>

      <para>Each identity specified within the algebra maps onto a
        <code>typedef</code> in the output file. There are no operations on
        identities.</para>
    </sect1>

    <sect1 id="outputenum">
      <title>Operations on enumerations</title>

      <para>Each enumeration specified within the algebra maps onto an
        unsigned integral type in the output file. The 
        <link linkend="outputprim">basic operations</link>
        listed above are always generated unless it has been indicated that
        <link linkend="inputenum">no lists of this type will be formed</link>,
        when <code>CONS_</code><literal>type</literal> and the other list and
        stack operators are suppressed. In addition each enumerator which is a
        member of this enumeration maps onto a constant expression:

        <programlisting>
t enum_member;
        </programlisting>

        where <literal>t</literal> is the enumeration type,
        <literal>enum</literal> is the short type name, and
        <literal>member</literal> is the enumerator name. It is guaranteed
        that the first enumerator will have value 0, the second 1, and so on,
        unless the value of the enumerator is explicitly given (as in C).
        There is also a constant expression:

        <programlisting>
unsigned long ORDER_enum;
        </programlisting>

        giving one more than the maximum enumerator in the enumeration.</para>
    </sect1>

    <sect1 id="outputstruct">
      <title>Operations on structures</title>

      <para>Each structure specified within the algebra maps onto a
        <code>typedef</code> defining the type to be the C structure with the
        given components. In addition to the
        <link linkend="outputprim">basic operations</link>
        listed above there are also field selectors defined.</para>

      <para>Suppose that <literal>t</literal> is a structure type with short
        name <literal>struct</literal>, and that <literal>comp</literal> is a
        component of <literal>t</literal> of type <literal>ctype</literal>.
        Then there is an operation:

        <programlisting>
PTR(ctype) struct_comp(PTR(t));
        </programlisting>

        which transforms a pointer to the structure into a pointer to the
        component. There is also an operation:

        <programlisting>
STATEMENT MAKE_struct(ctype, ...., PTR(t));
        </programlisting>

        where <literal>ctype</literal> ranges over all the component types
        which do not have a component initialiser string in the structure
        definition. This is used to assign values to all the components of a
        structure. If a component has an initialiser string then this is used
        as an expression giving the initial value, otherwise the given
        operation argument is used. The initialiser strings are evaluated in
        the context of the <code>MAKE_</code><literal>struct</literal>
        statement.  The parameters to the operation may be referred to by the
        corresponding component name followed by <code>_</code>. The partially
        initialised object may be referred to by the special character
        sequence <code>%0</code>. Any remainder operations should be written
        as <code>%%</code> rather than <code>%</code>.</para>

      <para>Inheritance in structures is handled as follows: if
        <literal>t</literal> is a derived structure with base structure
        <literal>btype</literal> then there is an operation:

        <programlisting>
PTR(btype) CONVERT_struct_base(PTR(t));
        </programlisting>

        where <literal>struct</literal> and <literal>base</literal> are the
        short names of <literal>t</literal> and <literal>btype</literal>
        respectively.</para>
    </sect1>

    <sect1 id="outputunion">
      <title>Operations on unions</title>

      <para>Each union specified within the algebra maps onto an opaque type
        in the output file. In addition to the
        <link linkend="outputprim">basic operations</link> listed above there
        are a number of other constructs output into
        <literal>name</literal><code>.h</code>, namely:

        <programlisting>
unsigned int ORDER_union;
t NULL_union;
int IS_NULL_union(t);
int EQ_union(t, t);
        </programlisting>

        where <literal>t</literal> denotes the union type, and
        <literal>union</literal> is the short type name.
        <code>ORDER_</code><literal>union</literal> is a constant value giving
        the number of union fields.
        <code>NULL_</code><literal>union</literal> is a distinguished constant
        value of type <literal>t</literal>. Values of type
        <literal>t</literal> may be compared against this distinguished value,
        or against each other.</para>

      <sect2 id="union-construction-operations">
        <title>Union construction operations</title>

        <para>Most of the output for the union type <literal>t</literal> is
          output into the file <filename>union_ops.h</filename>. This contains
          a construct:

          <programlisting>
unsigned int TAG_union(t);
          </programlisting>

          for extracting the discriminant tag from a union.</para>

        <para>For each shared component, <literal>comp</literal>, of
          <literal>t</literal> of type <literal>ctype</literal>, say, there is
          an operator:

          <programlisting>
PTR(ctype) union_comp(t);
          </programlisting>

          which extracts this component from the union.</para>

        <para>For each field, <literal>field</literal>, of the union there are
          constructs:

          <programlisting>
unsigned int union_field_tag;
int IS_union_field(t);
          </programlisting>

          giving the (constant) discriminant tag associated with this field,
          and a test for whether an object of type <literal>t</literal> has
          this discriminant.</para>

        <para>In addition, for each unshared component,
          <literal>comp</literal>, of <literal>field</literal> of type
          <literal>ctype</literal>, say, there is an operator:

          <programlisting>
PTR(ctype) union_field_comp(t);
          </programlisting>

          which extracts this component from the union.</para>

        <para>There are also operations for constructing and deconstructing
          objects of type <literal>t</literal> for field
          <literal>field</literal> given as follows:

          <programlisting>
STATEMENT MAKE_union_field(ctype, ...., lvalue t);
STATEMENT DECONS_union_field(lvalue ctype, ...., t);
STATEMENT DESTROY_union_field(DESTROYER, lvalue ctype, ...., t);
          </programlisting>

          The operation <code>MAKE_</code><literal>union_field</literal>
          constructs a value of type <literal>t</literal> from the various
          components and assigns it into the last argument. The
          <literal>ctype</literal> arguments range over all the components of
          <literal>field</literal>, both the shared components and the
          unshared components, which do not have a component initialiser
          string in the definition. The union component are initialised either
          using the <link linkend="outputstruct">initialiser string</link> or
          the given operation argument.</para>

        <para><code>DECONS_</code><literal>union_field</literal> performs the
          opposite operation, deconstructing an object of type
          <literal>t</literal> into its components for this field.
          <code>DESTROY_</code><literal>union_field</literal> is similar,
          except that it also applies the given destructor function to the
          original value. For these two operations <literal>ctype</literal>
          ranges over all the components, including those with
          initialiser strings.</para>
      </sect2>

      <sect2 id="union-field-sets">
        <title>Union field sets</title>

        <para>Recall that
          <link linkend="inputunion">a number of union field identifiers</link>
          may be associated with the same set of components.  In this case
          these fields form a union field set named after the first element of
          the set. There are operations:

          <programlisting>
int IS_union_field_etc(t);
PTR(ctype) union_field_etc_comp(t);
STATEMENT MAKE_union_field_etc(unsigned int, ctype, ...., lvalue t);
STATEMENT MODIFY_union_field_etc(unsigned int, t);
STATEMENT DECONS_union_field_etc(lvalue ctype, ...., t);
STATEMENT DESTROY_union_field_etc(DESTROYER, lvalue ctype, ...., t);
          </programlisting>

          defined on these sets. These are exactly analogous to the single
          field operations except that they work for any field in the set.
          Note that
          <code>MAKE_</code><literal>union_field</literal><code>_etc</code>
          takes an extra argument, giving the tag associated with the
          particular element of the set being constructed. Also the construct
          <code>MODIFY_</code><literal>union_field</literal><code>_etc</code>
          is introduced to allow the tag of an existing object to be modified
          to another value within the same set.</para>

        <para>The valid range of union tags for this set can be given by the
          two constants:

          <programlisting>
unsigned int union_field_tag;
unsigned int union_field_etc_tag;
          </programlisting></para>

        <para>A union is a member of this set if its tag is greater than or
          equal to <literal>union_field</literal><code>_tag</code> and
          strictly less than
          <literal>union_field</literal><code>_etc_tag</code>.</para>
      </sect2>

      <sect2 id="inheritance-in-unions">
        <title>Inheritance in unions</title>

        <para>Inheritance in unions is handled as follows: if
          <literal>t</literal> is a derived union with base union
          <literal>btype</literal> then there is an operation:

          <programlisting>
btype CONVERT_union_base(t);
          </programlisting>

          where <literal>union</literal> and <literal>base</literal> are the
          short names of <literal>t</literal> and <literal>btype</literal>
          respectively.</para>
      </sect2>

      <sect2 id="outputunionmaps">
        <title>Union maps</title>

        <para>For each map, <literal>map</literal>, on the union
          <literal>t</literal>, we have declared in
          <literal>union</literal><code>_ops.h</code> an operator:

          <programlisting>
map_ret map_union(t, map_par, ....);
          </programlisting>

          where <literal>map_ret</literal> is the map return type and
          <literal>map_par</literal> ranges over the map parameter types. This
          is except for maps with destructors (i.e. those qualified by a
          <link linkend="inputunion">hash symbol</link>) which have the form:

          <programlisting>
map_ret map_union(t, DESTROYER, map_par, ....);
          </programlisting></para>

        <para>These maps are implemented by having one function per field for
          each map. The output file
          <literal>union</literal><code>_map.h</code> contains tables of these
          functions. These have the form:

          <programlisting>
map_ret(*map_union_table[ORDER_union])(t, map_par, ....) = {
        ....
        map_union_field,
        ....
} ;
          </programlisting>

          where there is one entry per union field.</para>

        <para>In order to aid the construction of these functions a set of
          function headers is provided in
          <literal>union</literal><code>_hdr.h</code>.  These take the form:

          <programlisting>
#define HDR_map_union_field\
        map_ret map_union_field(name_union, destroyer, par, ....)\
        t name_union;\
        DESTROYER destroyer; /* if required */\
        map_par par;\
        ....\
        {\
                ctype comp;\
                ....\
                DECONS_union_field(comp, ...., name_union);
          </programlisting></para>

        <para>There is also an alternative function header,
          <code>HDR_</code><literal>map</literal>_d_<literal>union_field</literal>,
          which calls <code>DESTROY_</code><literal>union_field</literal>
          rather than <code>DECONS_</code><literal>union_field</literal>. The
          destructor function used is <literal>destroyer</literal>, if this
          parameter is present, or the default destructor,
          <code>destroy_</code><literal>name</literal>, otherwise.</para>
      </sect2>
    </sect1>
  </chapter>

  <chapter id="coutput">
    <title>Implementation details</title>

    <sect1 id="coutputtypes">
      <title>Implementation of types</title>

      <para>The C implementation of the type system
        <link linkend="tokenoutput">specified above</link> is based on a
        single type, <literal>name</literal>, with the same name as the input
        algebra. This is defined as follows:

      <programlisting>
typedef union name_tag {
        unsigned int ag_tag;
        union name_tag *ag_ptr;
        unsigned ag_enum;
        unsigned long ag_long_enum;
        name_dim ag_dim;
        t ag_prim_type;
        ....
} name;
      </programlisting></para>

      <para>where <literal>t</literal> runs over all the primitive types.  All
        of the types in the algebra can be packed into a block of cells of
        type <literal>name</literal>.  The following paragraphs describe the
        implementation of each type, together with how they are packed as
        blocks of cells.  This is illustrated by the following diagram:</para>

      <mediaobject>
        <imageobject>
          <imagedata fileref="images/calculus.png" format="PNG"/>
        </imageobject>
        <textobject>
          <phrase>Packing of types</phrase>
        </textobject>
        <caption>
          <para>Packing of types in calculus</para>
        </caption>
      </mediaobject>

      <para>Primitive types are implemented by a <code>typedef</code> defining
        the type in terms of its
        <link linkend="outputprim">given definition</link>. A primitive type
        can be packed into a single cell using the appropriate
        <code>ag_prim_</code><literal>type</literal> field of the
        union.</para>

      <para>Identity types are implemented by a <code>typedef</code> defining
        the type as being equal to another
        <link linkend="outputident">type from the algebra</link>.</para>

      <para>Enumeration types are all implemented as <code>unsigned int</code>
        if all its values fit into 16 bits, or <code>unsigned long</code>
        otherwise. An enumeration type can be packed into a single cell using
        the <code>ag_enum</code> or <code>ag_long_enum</code> field of the
        union.</para>

      <para>Structure types are implemented by a <code>typedef</code> defining
        the type to be the C structure with the
        <link linkend="outputstruct">given components</link>. A structure type
        may be packed into a block of cells by packing each of the components
        in turn.</para>

      <para>Union types are all implemented by a pointer to
        <literal>name</literal>.  This pointer references a block of cells.
        The null pointer represents
        <code>NULL_</code><literal>union</literal>, otherwise the first cell
        contains the union discriminant tag (in the <code>ag_tag</code>
        field), with the subsequent cells containing the packed field
        components (shared components first, then unshared components). If the
        union has only one field then the discriminant can be omitted. The
        union itself can be packed into a single cell using the
        <code>ag_ptr</code> field of the union.</para>

      <para>Pointer types are all implemented by a pointer to
        <literal>name</literal>.  Null pointers represent
        <code>NULL_ptr</code>, otherwise the pointer references a single cell.
        This cell contains a pointer to the packed version of the object being
        pointed to in its <code>ag_ptr</code> field. A pointer type itself can
        be packed into a single cell using the <code>ag_ptr</code> field of
        the union.</para>

      <para>List types are all implemented by a pointer to
        <literal>name</literal>.  Null pointers represent
        <code>NULL_list</code>, otherwise the pointer references a block of
        two cells. The first cell gives the tail of the list in its
        <code>ag_ptr</code> field; the second cell contains a pointer to the
        packed version of the head of the list in its <code>ag_ptr</code>
        field. A list type itself can be packed into a single cell using the
        <code>ag_ptr</code> field of the union.</para>

      <para>Stack types are identical to list types, with the head of the list
        corresponding to the top of the stack.</para>

      <para>Vector pointer and vector types are all implemented by structures
        defined as follows:

        <programlisting>
typedef unsigned int name_dim;

typedef struct {
        name *vec;
        name *ptr;
} name_VEC_PTR;

typedef struct {
        name_dim dim;
        name_VEC_PTR elems;
} name_VEC;
        </programlisting></para>

      <para>The <code>vec</code> field of a vector pointer contains a pointer
        to a block of cells containing the packed versions of the elements of
        the vector. The <code>ptr</code> field is a pointer to the current
        element of this block. A vector type also has a field giving the
        vector size.  A vector pointer type can be packed into a block of two
        cells, using the <code>ag_ptr</code> field of each to hold its two
        components. A vector type can similarly be packed into a block of
        three cells, the first holding the vector size in its
        <code>ag_dim</code> field.</para>

      <para>All size types are implemented as <code>int</code>.</para>
    </sect1>

    <sect1 id="coutputsupport">
      <title>Support routines</title>

      <para>The implementation requires the user to provide certain support
        functions. Most fundamental are the routines for allocating and
        deallocating the blocks of cells which are used to store the types.
        These are specified as follows:

        <programlisting>
name *gen_name(unsigned int);
void destroy_name(name *, unsigned int);
void dummy_destroy_name(name *, unsigned int);
        </programlisting>

        where <code>gen_</code><literal>name</literal> allocates a block of
        cells of the given size, <code>destroy_</code><literal>name</literal>
        deallocates the given block, and
        <code>dummy_destroy_</code><literal>name</literal> has
        <link linkend="outputbasic">no effect</link>. The precise details of
        how the memory management is to be handled are left to the
        user.</para>

      <para>Certain generic list functions must also be provided, namely:

        <programlisting>
void destroy_name_list(name *, unsigned int);
name *reverse_name_list(name *);
name *append_name_list(name *, name *);
name *end_name_list(name *);
        </programlisting>

        which implement the constructs <code>DESTROY_list</code>,
        <code>REVERSE_list</code>, <code>APPEND_list</code> and
        <code>END_list</code> respectively.</para>

      <para>Finally a dummy empty vector:
        <programlisting>
name_VEC empty_name_vec;
        </programlisting>needs to be defined.</para>

      <para>Examples of these support routines can be found in the
        <link linkend="example"><command>calculus</command> program</link>
        itself.</para>
    </sect1>

    <sect1 id="coutputassert">
      <title>Run-time checking</title>

      <para>The type checking facilities supported by
        <command>calculus</command> allow for compile-time detection of many
        potential programming errors, however there are many problems, such as
        dereferencing null pointers, deconstructing empty lists and union tag
        errors which can only be detected at run-time. For this reason
        <command>calculus</command> can be made to add extra assertions to
        check for such errors into its output. This is done by specifying the
        <code>-a</code> command-line option.</para>

      <para>These assertions are implemented by means of macros. If the macro
        <code>ASSERTS</code> is defined then these macros expand to give the
        run-time checks. If not the output is exactly equivalent to that which
        would have been output if the <code>-a</code> option had not been
        given.</para>

      <para>The assertions require certain support functions which are output
        into a separate file, <code>assert_def.h</code>. These functions need
        to be compiled into the program when <code>ASSERTS</code> is defined.
        Note that the functions are implementation specific.</para>
    </sect1>
  </chapter>

  <chapter id="diskoutput">
    <title>Disk reading and writing</title>

    <para>One of the facilities which the
      <link linkend="coutputtypes">homogeneous implementation</link> of the
      type system described above allows for is the addition of persistence.
      Persistence in this context means allowing objects to be written to, and
      read from, disk. Also discussed in this section is the related topic of
      the object printing routines, which allow a human readable
      representation of objects of the type system to be output for debugging
      or other purposes.</para>

    <para>The disk reading and writing routines are output into the files
      <filename>read_def.h</filename> and <filename>write_def.h</filename>
      respectively if the <option>-d</option> command-line option is passed to
      <command>calculus</command>. The object printing routines are output if
      the <option>-p</option> option is given, with additional code designed
      for use with run-time debuggers being added if the <option>-a</option>
      option is also given.</para>

    <para>All of these routines use extra constructs output in the main output
      files (<filename>name.h</filename> and
      <filename>union_ops.h</filename>), but not normally accessible.  The
      macro <literal>name</literal><code>_IO_ROUTINES</code> should be defined
      in order to make these available for the disk reading and writing
      routines to use.</para>

    <sect1 id="diskwrite">
      <title>Disk writing routines</title>

      <para>The disk writing routines output in
        <filename>write_def.h</filename> consist of a function:

        <programlisting>
static void WRITE_type(t);
        </programlisting>
        
        for each type <literal>t</literal> mentioned within the algebra
        description, which writes an object of that type to disk.</para>

      <para>Note that such routines are output not only for the basic types,
        such as unions and structures, but also for any composite types, such
        as pointers and lists, which are used in their definition. The
        <literal>type</literal> component of the name
        <code>WRITE_</code><literal>type</literal> is derived from basic types
        <literal>t</literal> by using the short type name. For composite types
        it is defined recursively as follows:

        <programlisting>
LIST(t)-&gt;list_type
PTR(t)-&gt;ptr_type
STACK(t)-&gt;stack_type
VEC(t)-&gt;vec_type
VEC_PTR(t)-&gt;vptr_type
        </programlisting></para>

      <para>Such functions are defined for identity types and composite types
        involving identity types by means of macros which define them in terms
        of the identity definitions.
        <code>WRITE_</code><literal>type</literal> functions for the primitive
        types should be provided by the user to form a foundation on which all
        the other functions may be built.</para>

      <para>The user may wish to generate
        <code>WRITE_</code><literal>type</literal> (or other disk reading and
        writing) functions for types other than those mentioned in the algebra
        definition. This can be done by means of a command-line option of the
        form <code>-E</code><literal>input</literal> where
        <literal>input</literal> is a file containing a list of the extra
        types required.  In the notation
        <link linkend="textinput">used above</link> the syntax for
        <literal>input</literal> is given by:

        <programlisting>
extra:
        type-list<subscript>opt</subscript>

type-list:
        type;
        type-list type;
        </programlisting></para>

      <para>The <code>WRITE_</code><literal>type</literal> functions are
        defined recursively in an obvious fashion. The user needs to provide
        the writing routines for the primitives already mentioned, plus
        support routines (or macros):

        <programlisting>
void WRITE_BITS(int, unsigned int);
void WRITE_DIM(name_dim);
void WRITE_ALIAS(unsigned int);
        </programlisting>

        for writing a number of bits to disk, writing a vector dimension and
        writing an <link linkend="diskalias">object alias</link>.</para>

      <para>Any of the <code>WRITE_</code><literal>type</literal> functions
        may be overridden by the user by defining a macro
        <code>WRITE_</code><literal>type</literal> with the desired effect.
        Note that the <code>WRITE_</code><literal>type</literal> function for
        an identity can be overridden independently of the function for the
        identity definition.  This provides a method for introducing types
        which are representationally the same, but which are treated
        differently by the disk reading and writing routines.</para>
    </sect1>

    <sect1 id="diskread">
      <title>Disk reading routines</title>

      <para>The disk reading routines output in
        <filename>read_def.h</filename> are exactly analogous to the disk
        writing routines. For each type <literal>t</literal> (except
        primitives) there is a function or macro:

        <programlisting>
static t READ_type(void);
        </programlisting>

        which reads an object of that type from disk. The user must provide
        the <code>READ_</code><literal>type</literal> functions for the
        primitive types, plus support routines:

        <programlisting>
unsigned int READ_BITS(int);
name_dim READ_DIM(void);
unsigned int READ_ALIAS(void);
        </programlisting>

        for reading a number of bits from disk, reading a vector dimension and
        reading an <link linkend="diskalias">object alias</link>. The
        <code>READ_</code><literal>type</literal> functions may be overridden
        by means of macros as before.</para>
    </sect1>

    <sect1 id="diskprint">
      <title>Object printing routines</title>

      <para>The object printing routines output in
        <filename>print_def.h</filename> consist of a function or macro:

        <programlisting>
static void PRINT_type(FILE *, t, char *, int);
        </programlisting>

        for each type <literal>t</literal>, which prints an object of type
        <literal>t</literal> to the given file, using the given object name
        and indentation value. The user needs to provide basic output
        routines:

        <programlisting>
void OUTPUT_type(FILE *, t);
        </programlisting>
        
        for each primitive type. The
        <code>PRINT_</code><literal>type</literal> functions may be overridden
        by means of macros as before.</para>

      <para>The printing routines are under the control of three variables
        defined as follows:

        <programlisting>
static int print_indent_step = 4;
static int print_ptr_depth = 1;
static int print_list_expand = 0;
        </programlisting></para>

      <para>These determine the indentation to be used in the output, to what
        depth pointers are to be dereferenced when printing, and whether lists
        and stacks are to be fully expanded into a sequence of elements or
        just split into a head and a tail.</para>

      <para>One application of these object printing routines is to aid
        debugging programs written using the <command>calculus</command> tool.
        The form of the type system implementation means that it is not easy
        to extract information using run-time debuggers without a detailed
        knowledge of the structure of this implementation. As a more
        convenient alternative, if both the <code>-p</code> and
        <code>-a</code> command-line options are given then
        <command>calculus</command> will generate functions:
        
        <programlisting>
void DEBUG_type(t);
        </programlisting>
        
        defined in terms of <code>PRINT_</code><literal>type</literal>, for
        printing an object of the given type to the standard output. Many
        debuggers have problems passing structure arguments, so for structure,
        vector and vector pointer types
        <code>DEBUG_</code><literal>type</literal> takes the form:
        
        <programlisting>
void DEBUG_type(t *);
        </programlisting>
        
        These debugging routines are only defined conditionally, if the macro
        <code>DEBUG</code> is defined.</para>
    </sect1>

    <sect1 id="diskalias">
      <title>Aliasing</title>

      <para>An important feature of the disk reading and writing routines,
        namely aliasing, has been mentioned but not yet described. The problem
        to be faced is that many of the objects built up using type systems
        defined using <command>calculus</command> will be cyclic - they will
        include references to themselves in their own definitions.  Aliasing
        is a mechanism for breaking such cycles by ensuring that only one copy
        of an object is ever written to disk, or that only one copy is created
        when reading from disk. This is done by associating a unique number as
        an alias for the object.</para>

      <para>For example, when writing to disk, the first time the object is
        written the alias definition is set up. Consequently the alias number
        is written instead of the object it represents. Similarly when reading
        from disk, an alias may be associated with an object when it is read.
        When this alias is encountered subsequently it will always refer to
        this same object.</para>

      <para>The objects on which aliasing can be specified are the
        <link linkend="inputunion">union fields</link>. A union field may be
        qualified by one or two hash symbols to signify that objects of that
        type should be aliased.</para>

      <para>The two hash case is used to indicate that the user wishes to gain
        access to the objects during the aliasing mechanism. In the disk
        writing case, the object to be written, <literal>x</literal> say, is
        split into its components using the appropriate
        <code>DECONS_</code><literal>union_field</literal> construct. Then the
        user-defined routine, or macro:

        <programlisting>
ALIAS_union_field(comp, ...., x);
        </programlisting>

        (where <literal>comp</literal> ranges over all the union components)
        is called prior to writing the object components to disk.</para>

      <para>Similarly in the disk reading case, the object being read,
        <literal>x</literal>, is initialised by calling the user-defined
        routine:

        <programlisting>
UNALIAS_union_field(x);
        </programlisting>

        prior to reading the object components from disk. Each object
        component is then read into a local variable, <literal>comp</literal>.
        Finally the user-defined routine:

        <programlisting>
UNIFY_union_field(comp, ...., x);
        </programlisting>

        (where <literal>comp</literal> ranges over all the union components)
        is called to assign these values to <literal>x</literal> before
        returning.</para>

      <para>In the single hash case the object is not processed in this way.
        It is just written straight to disk, or has its components immediately
        assigned to it when reading from disk.</para>

      <para>Note that aliasing is used, not just in the disk reading and
        writing routines, but also in the object printing functions.  After
        calling any such function the user should call the routine:

        <programlisting>
void clear_name_alias(void);
        </programlisting>

        to clear all aliases.</para>

      <para>Aliases are implemented by adding an extra field to the objects to
        be aliased, which contains the alias number, if this has been
        assigned, or zero, otherwise. A list of all these extra fields is
        maintained. In addition to the routine
        <code>clear_</code><literal>name</literal>_alias mentioned above, the
        user should provide support functions and variables:

        <programlisting>
unsigned int crt_name_alias;
void set_name_alias(name *, unsigned int);
name *find_name_alias(unsigned int);
        </programlisting>

        giving the next alias number to be assigned, and routines for adding
        an alias to the list of all aliases, and looking up an alias in this
        list. Example implementations of these routines are given in the
        <link linkend="example"><command>calculus</command> program</link>
        itself.</para>
    </sect1>

    <sect1 id="application">
      <title>Application to calculus</title>

      <para>As <link linkend="example">mentioned above</link>, the
        <command>calculus</command> program itself is an example of its own
        application. It therefore contains routines for reading and writing a
        representation of an algebra to and from disk, and for pretty-printing
        the contents of an algebra. These may be accessed using the
        <link linkend="options">command-line options</link> mentioned
        above.</para>

      <para>If the <option>-w</option> command-line option is specified to
        <command>calculus</command> then it reads its input file,
        <literal>input</literal>, as normal, but writes a disk representation
        of the input algebra to <literal>output</literal>, which in this
        instance is an output file, rather than an output directory. An output
        file produced in this way can then be specified as an input file to
        <command>calculus</command> if the <option>-r</option> option is
        given.  Finally the input algebra may be pretty-printed to an output
        file (or the standard output if no <literal>output</literal> argument
        is given) by specifying the <option>-o</option> option.</para>
    </sect1>
  </chapter>

  <chapter id="template-files">
    <title>Template files</title>

    <para>It is possible to use <command>calculus</command> to generate an
      output file from a template input file, <literal>template</literal>,
      using the syntax:
      
      <programlisting>
calculus [ options ] input .... -Ttemplate output
      </programlisting></para>
      
    <para>The template file consists of a list of either template directives
      or source lines containing escape sequences which are expanded by
      <command>calculus</command>. Template directive lines are distinguished
      by having <code>@</code> as their first character.  Escape sequences
      consist of <code>%</code> following by one or more characters.</para>

    <para>There are two template directives; loops take the form:

      <programlisting>
@loop control
....
@end
      </programlisting>

      and conditionals take the form:

      <programlisting>
@if condition
....
@else
....
@endif
      </programlisting>

      or:

      <programlisting>
@if condition
....
@endif
      </programlisting>

      where <code>....</code> stands for any sequence of template directives
      or source lines.</para>

    <para>The <literal>control</literal> statements in a loop can be
      <code>primitive</code>, <code>identity</code>, <code>enum</code>,
      <code>struct</code> or <code>union</code> to loop over all the
      primitive, identity, enumeration structure or union types within the
      input algebra. Within an <code>enum</code> loop it is possible to use
      <code>enum.const</code> to loop over all the enumeration constants of
      the current enumeration type. Within a <code>struct</code> loop it is
      possible to use <code>struct.comp</code> to loop over all the components
      of the current structure. Within a <code>union</code> loop it is
      possible to use <code>union.comp</code> to loop over all the shared
      components of the current union, <code>union.field</code> to loop over
      all the fields of the current union, and <code>union.map</code> to loop
      over all the maps of the current union. Within a
      <code>union.field</code> loop it is possible to use
      <code>union.field.comp</code> to loop over all the components of the
      current union field. Within a <code>union.map</code> loop it is possible
      to use <code>union.map.arg</code> to loop over all the arguments of the
      current union map.</para>

    <para>The valid <literal>condition</literal> statements in a conditional
      are <code>true</code> and <code>false</code>, plus
      <code>comp.complex</code>, which is true if the current structure or
      union field component has a complex type (i.e. those for which
      <code>COPY_</code><literal>type</literal> and
      <code>DEREF_</code><literal>type</literal> require two arguments), and
      <code>comp.default</code>, which is true if the current structure or
      union field component has a default initialiser value.</para>

    <para>A number of escape sequences can be used anywhere.  <code>%ZX</code>
      and <code>%ZV</code> give the name and version number of the version of
      <command>calculus</command> used.  <code>%Z</code> and <code>%V</code>
      give the name and version number of the input algebra.  <code>%%</code>
      and <code>%@</code> give <code>%</code> and <code>@</code> respectively,
      and <code>%</code> as the last character in a line suppresses the
      following newline character.</para>

    <para>Within a <code>primitive</code> loop, <code>%PN</code> gives the
      primitive name, <code>%PM</code> gives the short primitive name and
      <code>%PD</code> gives the primitive definition.</para>

    <para>Within an <code>identity</code> loop, <code>%IN</code> gives the
      identity name, <code>%IM</code> gives the short identity name and
      <code>%IT</code> gives the identity definition.</para>

    <para>Within an <code>enum</code> loop, <code>%EN</code> gives the
      enumeration name, <code>%EM</code> gives the short enumeration name and
      <code>%EO</code> gives the enumeration order,
      <code>ORDER_</code><literal>enum</literal>. Within an
      <code>enum.const</code> loop, <code>%ES</code> gives the enumeration
      constant name and <code>%EV</code> gives its value.</para>

    <para>Within a <code>struct</code> loop, <code>%SN</code> gives the
      structure name and <code>%SM</code> gives the short structure
      name.</para>

    <para>Within a <code>union</code> loop, <code>%UN</code> gives the union
      name, <code>%UM</code> gives the short union name and <code>%UO</code>
      gives the union order, <code>ORDER_</code><literal>union</literal>.
      Within a <code>union.field</code> loop, <code>%FN</code> gives the field
      name.  Within a <code>struct.comp</code>, <code>union.comp</code> or
      <code>union.field.comp</code> loop, <code>%CN</code> gives the component
      name, <code>%CT</code> gives the component type, <code>%CU</code> gives
      the short form of the component type and <code>%CV</code> gives the
      default component initialiser value (if <code>comp.default</code> is
      true). Within a <code>union.map</code> loop, <code>%MN</code> gives the
      map name and <code>%MR</code> gives the map return type. Within a
      <code>union.map.arg</code> loop, <code>%AN</code> gives the argument
      name and <code>%AT</code> gives the argument type.</para>

    <para>As an example, the following template file gives a simple algebra
      pretty printer:

    <programlisting>
ALGEBRA %X (%V):

/* PRIMITIVE TYPES */
@loop primitive
%PN (%PM) = "%PD";
@end

/* IDENTITY TYPES */
@loop identity
%IN (%IM) = %IT;
@end

/* ENUMERATION TYPES */
@loop enum

enum %EN (%EM) = {
@loop enum.const
        %ES = %EV,
@end
};
@end

/* STRUCTURE TYPES */
@loop struct

struct %SN (%SM) = {
@loop struct.comp
        %CT %CN;
@end
};
@end

/* UNION TYPES */
@loop union

union %UN (%UM) = {
@loop union.comp
        %CT %CN;
@end
} + {
@loop union.field
        %FN-&gt;{
@loop union.field.comp
                %CT %CN;
@end
        };
@end
};
@end
    </programlisting></para>
  </chapter>

  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
    href="../../../common/colophon-dera.xml"/>
</book>
Subversion Repositories tendra.SVN

(root)/trunk/doc/user/guides/calculus/calculus.xml – Rev 7