<?xml version="1.0" standalone="no"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
  "http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">

<!--
  $Id$
-->

<book>
  <bookinfo>
    <title>C++ Producer Guide</title>

    <corpauthor>The TenDRA Project</corpauthor>

    <author>
      <firstname>Jeroen</firstname>
      <surname>Ruigrok van der Werven</surname>
    </author>
    <authorinitials>JRvdW</authorinitials>
    <pubdate>2004</pubdate>

    <copyright>
      <year>2004</year>
      <year>2005</year>

      <holder>The TenDRA Project</holder>
    </copyright>

    <copyright>
      <year>1998</year>

      <holder>DERA</holder>
    </copyright>
  </bookinfo>
  
  <chapter>
  <sect1 id="intro">
    <title>Introduction</title>

    <para>This document is designed as a technical overview of the TenDRA C++
      to TDF/ANDF producer.  It is divided into two broad areas: descriptions
      of the <A HREF="#interface">public interfaces</A> of the producer, and
      an overview of the producer <A HREF="#program">source code</A>.</para>

    <para>Whereas the interface description contains most of the information
      which would be required in a users' guide, it is not necessarily in a
      readily digestible form.  The C++ producer is designed to complement the
      existing TenDRA C to TDF producer; although they are completely distinct
      programs, the same design philosophy underlies both and they share a
      number of common interfaces.  There are no radical differences between
      the two producers, besides the fact that the C++ producer covers a
      vastly larger and more complex language.  This means that much of the
      <A HREF="#tdfc">existing documentation on the C producer</A> can be
      taken as also applying to the C++ producer.  This document tries to make
      clear where the C++ producer extends the C producer's interfaces, and
      those portions of these interfaces which are not directly applicable to
      C++.</para>

    <para>
    A familiarity with both C++ and TDF is assumed. The version of C++
    implemented is that given by the <A HREF="#cplusplus">draft ISO C++
    standard</A>.  All references to &quot;ISO C++&quot; within the document
    should strictly be qualified using the word &quot;draft&quot;, but
    for convenience this has been left implicit.  The C++ producer has
    a number of switches which allow it to be configured for older dialects
    of C++.  In particular, the version of C++ described in the <A HREF="#arm">ARM
    (Annotated Reference Manual)</A> is fully supported. 
    </para>

    <para>The <A HREF="#tdf">TDF specification</A> (version 4.0) may be consulted
    for a description of the compiler intermediate language used.  The
    paper 
    <A HREF="#port"><I>TDF and Portability</I></A> provides a useful (if
    slightly old) introduction to some of the ideas relating to static
    program analysis and interface checking which underlie the whole TenDRA
    compilation system. 
    </para>

    <para>
    The warning sign: 
    
    <IMG SRC="../images/warn.gif" ALT="warning"/>
    
    is used within the document to indicate areas where the implementation
    is currently incomplete or incorrect. 
    </para>

    <sect2 id="update">
      <title>1.1. Updated introduction</title>

      <para>Since this document was originally written, the old C producer,
        <I>tdfc</I>, has been replaced by a new C producer, <I>tdfc2</I>,
        which is just a modified version of the C++ producer, <I>tcpplus</I>.
        All C producer documentation continues to apply to the new C producer,
        but the new C producer also has many of the features described in this
        document as only applying to the C++ producer.</para>
    </sect2>
  </sect1>

  <sect1 id="interface">
    <title>Interface descriptions</title>
  <para>
  The most important public interfaces of the C++ producer are the ISO
  C++ standard and the TDF 4.0 specification; however there are other
  interfaces, mostly common to both the C and C++ producers, which are
  described in this section. 
  </para>
  <para>
  An important design criterion of the C++ producer was that it should
  be strictly ISO conformant by default, but have a method whereby dialect
  features and extra static program analysis can be enabled. This compiler
  configuration is controlled by the 
  <A HREF="pragma.html"><code>#pragma TenDRA</code> directives</A>
  described in the first section. 
  </para>
  <para>
  The requirement that the C and C++ producers should be able to translate
  portable C or C++ programs into target independent TDF requires a
  mechanism whereby the target dependent implementations of APIs can
  be represented.  This mechanism, the <A HREF="token.html"><code>#pragma
  token</code> syntax</A>, is described in the following section.  Note
  that at present this mechanism only contains support for C APIs; it
  is considered that the C++ language itself contains sufficient interface
  mechanisms for C++ APIs to be described. 
  </para>
  <para>
  The C and C++ producers provide two mechanisms whereby type and declaration
  information derived from a translation unit can be stored to a file
  for post-processing by other tools.  The first is the 
  <A HREF="dump.html">symbol table dump</A>, which is a public interface
  designed for use by third party tools.  The second is the 
  <A HREF="link.html">C/C++ spec file</A>, which is designed for ease
  of reading and writing by the producers themselves, and is used for
  intermodule analysis. 
  </para>
  <para>
  The mapping from C++ to TDF implemented by the C++ producer is largely
  straightforward.  There are however target dependencies arising within
  the language itself which require special handling.  These are represented
  by certain <A HREF="lib.html">standard tokens</A> which the producer
  requires to be defined on the target machine.  These tokens are also
  used to describe the interface between the producer and the run-time
  system.  Note that the C++ producer is primarily concerned with the
  C++ language, not with the standard C++ library. An example implementation
  of those library components which are required as an integral part
  of the language (memory allocation, exception handling, run-time type
  information etc.) is provided. Otherwise, libraries should be obtained
  from third parties.  A number of hints on <A HREF="std.html">integrating
  such libraries</A> with the C++ producer are given. 
  </para>
  </sect1>
  
  <sect1 id="program">
    <title>Program overview</title>
  <para>
  The C++ producer is a large program (over 200000 lines, including
  automatically generated code) written in C.  The
  <A HREF="style.html#language">coding conventions</A> used, the 
  <A HREF="style.html#api">API</A> observed and the basic organisation
  of the <A HREF="style.html#src">source code</A> are described in the
  first section. 
  </para>
  <para>
  One of the design methods used in the C++ producer is the extensive
  use of automatic code generation tools.  The type system is based
  around the <code>calculus</code> tool, which allows complex type systems
  to be described in a simple format.  The interface generated by
  <code>calculus</code> allows for rigorous static type checking, generic type constructors
  for lists, stacks etc., encapsulation of the operations on the types
  within the system, and optional run-time checking for null pointers
  and discriminated union tags.  An overview is given of the <A HREF="alg.html">type
  system</A> used as the basis of the C++ producer design.  Also see
  the 
  <A HREF="../utilities/calc.html"><code>calculus</code> users' guide</A>.
  </para>
  <para>
  The other general purpose code generation tool used in the C++ producer
  is the parser generator, <code>sid</code>.  A brief description of
  the problems in writing a <A HREF="parse.html">C++ parser</A> is given.
  Also see the <A HREF="../utilities/sid.html"><code>sid</code> users'
  guide</A>. 
  </para>
  <para>
  The other code generation tools used were written specifically for
  the C++ producer.  The error reporting routines within the producer
  are based on an <A HREF="error.html">error catalogue</A>, from which
  code for constructing and printing errors is generated.  The 
  <A HREF="tdf.html">TDF output routines</A> are based on primitives
  automatically generated from a standard database describing the TDF
  specification. 
  </para>
  <para>
  The program itself is well commented, so no lower level program documentation
  has been provided.  When performing development work the producer
  should be compiled with the <code>DEBUG</code> macro defined. This
  enables the <code>calculus</code> run-time checks, along with other
  assertions, and makes available the debugging routines, 
  <code>DEBUG_</code><I>type</I>, which can be used to print an object
  from the internal type system. 
  </para>
  </sect1>
  
  <sect1 id="reference">
    <title>References</title>
    <itemizedlist>
      <listitem><A id="cplusplus"><B>Working paper for Draft Proposed
      International Standard for Information Systems - Programming Language
      C++</B></A>, X3J16/96-0225, December 1996:     
      <A HREF="http://www.cygnus.com/misc/wp/dec96pub/">
      <code>http://www.cygnus.com/misc/wp/dec96pub/</code></A> or     
      <A HREF="http://www.maths.warwick.ac.uk/c++/pub/wp/html/cd2/">
      <code>http://www.maths.warwick.ac.uk/c++/pub/wp/html/cd2/</code></A>.
      </listitem>
      <listitem><A id="arm"><B>The Annotated C++ Reference Manual</B></A>,
      Margaret Ellis and Bjarne Stroustrup, ISBN 0-201-51459-1,
      Addison-Wesley, 1990:     
      <A HREF="http://heg-school.aw.com/cseng/authors/ellis/annocpp/annocpp.html">
      <code>http://heg-school.aw.com/cseng/authors/ellis/annocpp/annocpp.html</code>
      </A>
      </listitem>
      <listitem><A id="tdf"><B>TDF Specification, Issue 4.0</B></A>: 
      <A HREF="../tdf/spec1.html">attached</A>. 
      </listitem>
      <listitem><A id="tdfc"><B>C Checker Reference Manual</B></A>: 
      <A HREF="../tdfc/tdfc1.html">attached</A>. 
      </listitem>
      <listitem><A id="port"><B>TDF and Portability</B></A>: 
      <A HREF="../port/port1.html">attached</A>. 
      </listitem>
      <listitem><A id="cstyle"><B>C Coding Standards</B></A>,
      DRA/CIS(SE2)/WI/94/57/2.0 (OSSG internal document). 
      </listitem>
    </itemizedlist>
  </sect1>
  
  <sect1>
  <title>
  C++ Producer Guide: Invocation 
  </title>
  
  <sect2>
    <title>2.1. Invocation</title>
  <para>
  This section describes how the C++ to TDF producer, 
  <code>tcpplus</code>, fits into the overall compilation scheme controlled
  by the TenDRA compiler front-end, <code>tcc</code>, or the TenDRA
  checker front-end, <code>tchk</code>.  While it is possible to use
  <code>tcpplus</code> as a stand-alone program, it is recommended that
  it should be invoked via <code>tcc</code> or <code>tchk</code>. The
  <code>tcc</code> users' guide should be consulted for more details.
  </para>
  <para>
  <code>tcc</code> and <code>tchk</code> require the <code>-Yc++</code>
  command-line option in order to enable their C++ capabilities.  Files
  with a <code>.C</code> suffix are recognised as C++ source files and
  passed to <code>tcpplus</code> for processing (see 
  <A HREF="#compile">below</A>).  It is possible to change the suffix
  used for C++ source files; for example <code>-sC:cc</code> causes
  <code>.cc</code> files to be recognised as C++ source files.  An interesting
  variation is <code>-sC:c</code> which causes C source files to be
  processed by the C++ producer.  Similarly <code>.I</code> files are
  recognised as preprocessed C++ source files and <code>.K</code>
  files are recognised as C++ spec files. 
  </para>
  <para>
  Most of the command-line option handling for <code>tcpplus</code>
  is done by <code>tcc</code> and <code>tchk</code>; however, it is possible
  to pass the option <I>opt</I> directly to <code>tcpplus</code> using
  the option <code>-Wx,</code><I>opt</I> to <code>tcc</code> or <code>tchk</code>.
  Similarly <code>-Wg,</code><I>opt</I> and <code>-WS,</code><I>opt</I>
  can be used to pass options to the C++ preprocessor and the C++ spec
  linker (both of which are actually <code>tcpplus</code> invoked with
  different options) respectively. 
  </para>
  
  
  <sect3 id="compile">
    <title>2.1.1. Compilation scheme</title>
  <para>
  The overall compilation scheme controlled by <code>tcc</code>, as
  it relates to the C++ producer, can be represented as follows: 
  
  <IMG SRC="../images/compile.gif" ALT="compilation scheme"/>
  
  Each C++ source file, <code>a.C</code> say, is processed using 
  <code>tcpplus</code> to give an output TDF capsule, <code>a.j</code>,
  which is passed to the installer phase of <code>tcc</code>.  The capsule
  is linked with any target dependent token definition libraries, translated
  to assembler and assembled to give a binary object file, 
  <code>a.o</code>.  The various object files comprising the program
  are then linked with the system libraries to give a final executable,
  <code>a.out</code>. 
  </para>
  <para>
  In addition to this main compilation scheme, <code>tcpplus</code>
  can be made to output a <A HREF="link.html">C++ spec
  file</A>
  for each C++ source file, <code>a.K</code> say.  These C++ spec files
  can be linked, using <code>tcpplus</code> in its spec linker mode,
  to give an additional TDF capsule, <code>x.j</code> say, and a combined
  C++ spec file, <code>x.K</code>.  The main purpose of this C++ spec
  linking is to perform intermodule checks on the program; however, in
  the course of this checking, exported templates which are defined in
  one module and used in another are instantiated.  This extra code
  is output to <code>x.j</code>, which is then installed and linked
  in the normal way. 
  </para>
  <para>
  Note that intermodule checks, and hence intermodule template instantiations,
  are only performed if the <code>-im</code> option is passed to <code>tcc</code>.
  </para>
  <para>
  The TenDRA checker, <code>tchk</code>, is similar to <code>tcc</code>
  except that it disables TDF output and has intermodule analysis enabled
  by default. 
  </para>
  </sect3>
  
  <sect3 id="option">
    <title>2.1.2. Producer options</title>
  <para>
  The general form for the invocation of <code>tcpplus</code> is as
  follows: 
  <programlisting>
        tcpplus [ <I>options</I> ] [ <I>input-file</I> ] .... [ <I>output-file</I> ]
  </programlisting>
  The output file can alternatively be specified using the 
  <A HREF="#output"><code>-o</code> option</A>.  If no output file is
  given, or the output file is <code>-</code>, the standard output is
  used.  In general there can be any number of input files.  If no input
  file is given, or the input file is <code>-</code>, the standard input
  is used. 
  </para>
  <para>
  <code>tcpplus</code> has three modes which determine the form of its
  input and output files.  The default mode is compilation, in which
  a single input C++ source file is translated into an output TDF capsule.
  In preprocessing mode, specified using the 
  <A HREF="#preproc"><code>-E</code> option</A>, a single input C++
  source file is preprocessed into an output C++ source file.  Note
  that the preprocessor is built into <code>tcpplus</code>, rather than,
  as with most other compilers, being a separate program.  The final
  mode is 
  <A HREF="link.html">C++ spec linking</A>, specified using the 
  <A HREF="#linker"><code>-S</code> option</A>.  Any number of C++ spec
  input files are linked and any code generated as a result (for example,
  template instantiations) is written to the output TDF capsule. 
  </para>
  <para>
  In either compilation or spec linking mode, a C++ spec output file
  can be generated, in addition to the TDF capsule, using the 
  <A HREF="#spec"><code>-s</code> option</A>.  In any mode a symbol
  table dump output file can be generated using the <A HREF="#dump"><code>-d</code>
  option</A>. 
  </para>
  <para>
  Command-line options can appear in any order and can be interspersed
  with the input and output files, except following a <code>--</code>
  option.  All the multi-part options can be given either as one or
  two command-line arguments, so that <code>-I</code><I>directory</I>
  and 
  <code>-I</code> <I>directory</I> are equivalent.  The recognised options
  are as follows: 
  
  <itemizedlist>
  
  <listitem><B>-A<I>predicate</I>(<I>tokens</I>)</B>
  Asserts that the given predicate is true, that is to say: 
  <programlisting>
        #assert <I>predicate</I> ( <I>tokens</I> )
  </programlisting>
  The special case <code>-A-</code> undefines all the built-in predicates
  (of which there are none).  Use of this option automatically enables
  support for the <A HREF="pragma.html#ppdir"><code>#assert</code> and
  <code>#unassert</code> directives</A>. 
  </listitem>
  
  <listitem><B>-D<I>macro</I></B>
  <B>-D<I>macro</I>=<I>tokens</I></B>
  Defines the given macro to be 1 in the first case, or the given sequence
  of preprocessing tokens in the second case, that is to say: 
  <programlisting>
        #define <I>macro</I> 1
        #define <I>macro tokens</I>
  </programlisting>
  respectively.  In fact <code>-D</code> and <code>-U</code> options
  to 
  <code>tcc</code> are not passed as <code>-D</code> and <code>-U</code>
  options to <code>tcpplus</code>.  Instead a 
  <A HREF="#start-up">start-up</A> file containing the equivalent 
  <code>#define</code> and <code>#undef</code> directives is used. 
  </listitem>
  
  <listitem><A id="preproc"><B>-E</B></A>
  Enables preprocessing mode in which the input C++ source file is preprocessed
  into the output file. 
  </listitem>
  
  <listitem><B>-F<I>file</I></B>
  Causes a list of command-line options to be read from <I>file</I>.
  Other than empty lines and lines beginning with <code>#</code>, each
  line in the file is treated as if it had been specified as a separate
  command-line option. 
  </listitem>
  
  <listitem><B>-H</B>
  Enables verbose inclusion mode in which warnings are printed at the
  start and end of each included source file. 
  </listitem>
  
  <listitem><B>-I<I>directory</I></B>
  Adds the given directory to the list searched for included source
  files. No such directories are built into the producer by default.
  </listitem>
  
  <listitem><A id="directory"><B>-N<I>name</I>:<I>directory</I></B></A>
  This is identical to <code>-I</code><I>directory</I> except that it
  also associates the given identifier with the directory.  The directory
  name can be used to specify a <A HREF="pragma.html#scope">compilation
  profile</A> to be used on files included from this directory. 
  </listitem>
  
  <listitem><A id="linker"><B>-S</B></A>
  Enables C++ spec linker mode, in which any number of C++ spec input
  files are linked together. 
  </listitem>
  
  <listitem><B>-U<I>macro</I></B>
  Undefines the given macro, that is to say: 
  <programlisting>
        #undef <I>macro</I>
  </programlisting>
  The special case <code>-U-</code> undefines all the built-in macros.
  These may be described as follows: 
  <programlisting>
        #define __FILE__                <I>(current file)</I>
        #define __LINE__                <I>(current line)</I>
        #define __TIME__                <I>(current time)</I>
        #define __DATE__                <I>(current date)</I>
        #define __STDC__                1
        #define __STDC_VERSION__        199409L
        #define __cplusplus             199711L
  </programlisting>
  The actual value of <code>__cplusplus</code> gives the date of the
  draft ISO C++ standard on which the current version of the producer
  is based. The value given above gives the expected date of the final
  C++ standard. 
  </listitem>
  
  <listitem><B>-V</B>
  Causes the name of each function to be printed to the standard output
  as it is compiled. 
  </listitem>
  
  <listitem><B>-W<I>option</I></B>
  Sets the given <A HREF="pragma.html#low">compiler option</A> to give
  a warning, that is to say: 
  <programlisting>
        #pragma TenDRA option &quot;<I>option</I>&quot; warning
  </programlisting>
  The special case <code>-Wall</code> enables a wide range of warnings.
  </listitem>
  
  <listitem><B>-X</B>
  Disables exception handling.  The <A HREF="lib.html#except">current
  implementation</A> can impose a large run-time overhead if not required.
  The effect of linking any module compiled with this option with a
  module which throws an exception is undefined.  This is equivalent
  to <A HREF="#output"><code>-j-e</code></A>. 
  </listitem>
  
  <listitem><B>-a</B>
  Causes complete program analysis to be applied.  That is to say it
  is assumed that no other translation units need to be linked in order
  for the program to execute. 
  </listitem>
  
  <listitem><B>-c</B>
  Disables TDF output.  The output file will still be a valid TDF capsule,
  but it will contain no information.  This is equivalent to 
  <A HREF="#output"><code>-j-c</code></A>. 
  </listitem>
  
  <listitem><para><A id="dump"><B>-d<I>opt</I>=<I>dump-file</I></B></A>
  Specifies the given file as a <A HREF="dump.html">symbol table dump</A>
  output file.  <I>opt</I> will be a series of characters describing
  the information to be dumped, as follows: 
  
  <table>
  <tr><th>Key</th>
  <th>Description</th>
  </tr>
  <tr><td><code>a</code></td>
  <td>equivalent to <code>ehlmu</code></td>
  </tr>
  <tr><td><code>c</code></td>
  <td>dump string literals</td>
  </tr>
  <tr><td><code>e</code></td>
  <td>dump error messages</td>
  </tr>
  <tr><td><code>h</code></td>
  <td>dump header information</td>
  </tr>
  <tr><td><code>k</code></td>
  <td>dump keyword identifiers</td>
  </tr>
  <tr><td><code>l</code></td>
  <td>dump local variables</td>
  </tr>
  <tr><td><code>m</code></td>
  <td>dump macro identifiers</td>
  </tr>
  <tr><td><code>s</code></td>
  <td>dump scope information</td>
  </tr>
  <tr><td><code>u</code></td>
  <td>dump identifier usage information</td>
  </tr>
  </table>
  
  </para>
  <para>
  Note that these correspond to the <code>tcc -sym</code> options. 
  </para>
  </listitem>
  
  <listitem><A id="end-up"><B>-e<I>file</I></B></A>
  Specifies the given file as an end-up file.  This is equivalent to
  adding: 
  <programlisting>
        #include &quot;<I>file</I>&quot;
  </programlisting>
  at the end of the input source file.  More than one end-up file may
  be given; they are processed in the order given. 
  </listitem>
  
  <listitem><A id="start-up"><B>-f<I>file</I></B></A>
  Specifies the given file as a start-up file.  This is equivalent to
  adding: 
  <programlisting>
        #include &quot;<I>file</I>&quot;
  </programlisting>
  at the start of the input source file.  More than one start-up file
  may be given; they are processed in the order given. 
  </listitem>
  
  <listitem><B>-g</B>
  Specifies that the output TDF capsule should also contain information
  to allow for the generation of run-time debugging directives.  This
  is equivalent to <A HREF="#output"><code>-jg</code></A>. 
  </listitem>
  
  <listitem><B>-h</B>
  Causes a full list of command-line options to be printed.  This includes
  a number not documented here which are unlikely to prove useful to
  the normal user. 
  </listitem>
  
  <listitem><A id="output"><B>-j<I>opt</I></B></A>
  Sets the TDF output options given by <I>opt</I>.  This consists of
  a sequence of characters describing the options to be enabled or disabled.
  By default, or following a <code>+</code>, the options are enabled;
  following a <code>-</code> they are disabled.  The available options
  are as follows: 
  
  <table>
  <tr><th>Key</th>
  <th>Default</th>
  <th>Description</th>
  </tr>
  <tr><td><code>a</code></td>
  <td>off</td>
  <td>output external names for local objects</td>
  </tr>
  <tr><td><code>b</code></td>
  <td>off</td>
  <td>work round old installer bugs</td>
  </tr>
  <tr><td><code>c</code></td>
  <td>on</td>
  <td>output TDF capsule</td>
  </tr>
  <tr><td><code>d</code></td>
  <td>off</td>
  <td>output termination function</td>
  </tr>
  <tr><td><code>e</code></td>
  <td>on</td>
  <td>output exceptions</td>
  </tr>
  <tr><td><code>f</code></td>
  <td>on</td>
  <td>mangle template function signatures</td>
  </tr>
  <tr><td><code>g</code></td>
  <td>off</td>
  <td>output debugging information</td>
  </tr>
  <tr><td><code>i</code></td>
  <td>off</td>
  <td>output dynamic initialisers as a function</td>
  </tr>
  <tr><td><code>n</code></td>
  <td>on</td>
  <td>mangle object names</td>
  </tr>
  <tr><td><code>o</code></td>
  <td>off</td>
  <td>order class data members by access</td>
  </tr>
  <tr><td><code>p</code></td>
  <td>on</td>
  <td>output partial destructors</td>
  </tr>
  <tr><td><code>r</code></td>
  <td>on</td>
  <td>output run-time type information</td>
  </tr>
  <tr><td><code>s</code></td>
  <td>on</td>
  <td>output shared string literals</td>
  </tr>
  <tr><td><code>t</code></td>
  <td>off</td>
  <td>output token declarations</td>
  </tr>
  <tr><td><code>u</code></td>
  <td>on</td>
  <td>output unused static variables</td>
  </tr>
  <tr><td><code>v</code></td>
  <td>off</td>
  <td>output local virtual function tables</td>
  </tr>
  </table>
  </listitem>
  
  <listitem><A id="error"><B>-m<I>opt</I></B></A>
  Sets the error formatting options given by <I>opt</I>.  This consists
  of a sequence of characters describing the options to be enabled or
  disabled. By default, or following a <code>+</code>, the options are
  enabled; following a <code>-</code> they are disabled.  The available
  options are as follows: 
  
  <table>
  <tr><th>Key</th>
  <th>Default</th>
  <th>Description</th>
  </tr>
  <tr><td><code>c</code></td>
  <td>off</td>
  <td>show source code with error</td>
  </tr>
  <tr><td><code>e</code></td>
  <td>off</td>
  <td>show error name</td>
  </tr>
  <tr><td><code>f</code></td>
  <td>on</td>
  <td>reliable <code>fseek</code> function</td>
  </tr>
  <tr><td><code>g</code></td>
  <td>off</td>
  <td>record statement locations</td>
  </tr>
  <tr><td><code>i</code></td>
  <td>on</td>
  <td>reliable <code>stat</code> function</td>
  </tr>
  <tr><td><code>k</code></td>
  <td>off</td>
  <td>enable C++ spec output</td>
  </tr>
  <tr><td><code>l</code></td>
  <td>off</td>
  <td>output full error location</td>
  </tr>
  <tr><td><code>s</code></td>
  <td>on</td>
  <td>output ISO section number</td>
  </tr>
  <tr><td><code>t</code></td>
  <td>off</td>
  <td>use <code>typedef</code> names in errors</td>
  </tr>
  <tr><td><code>w</code></td>
  <td>off</td>
  <td>disable warnings</td>
  </tr>
  <tr><td><code>z</code></td>
  <td>off</td>
  <td>continue after error</td>
  </tr>
  </table>
  
  </listitem>
  
  <listitem><A id="table"><B>-n<I>port-table</I></B></A>
  Specifies that the given <A HREF="pragma.html#table">portability table</A>
  should be used to specify the basic configuration parameters. 
  </listitem>
  
  <listitem><A id="output"><B>-o<I>output-file</I></B></A>
  Gives an alternative method of specifying the output file. 
  </listitem>
  
  <listitem><B>-q</B>
  Causes the program to quit immediately without processing its input
  files. This is useful primarily in version and command-line option
  queries. 
  </listitem>
  
  <listitem><A id="spec"><B>-s<I>spec-file</I></B></A>
  Specifies the given file as a C++ spec output file. 
  </listitem>
  
  <listitem><B>-t</B>
  Specifies that token declarations should be included in the output
  TDF capsule.  While these are strictly unnecessary, they help when
  pretty-printing the output.  This is equivalent to 
  <A HREF="#output"><code>-jt</code></A>. 
  </listitem>
  
  <listitem><A id="unmangle"><B>-u</B></A>
  The form: 
  <programlisting>
        tcpplus -u <I>name</I> .... <I>name</I>
  </programlisting>
  can be used to print the unmangled forms of a list of 
  <A HREF="lib.html#mangle">mangled identifier names</A> to the standard
  output. 
  </listitem>
  
  <listitem><B>-v</B>
  Causes the C++ producer version number, plus information on the versions
  of C++ and TDF supported, to be printed to the standard error. 
  </listitem>
  
  <listitem><B>-w</B>
  Disables all warning messages.  This is equivalent to 
  <A HREF="#error"><code>-mw</code></A>. 
  </listitem>
  
  <listitem><B>-z</B>
  Forces an output file to be created even if compilation errors occur.
  The effect of installing a TDF capsule produced using this option
  is undefined.  This is equivalent to <A HREF="#error"><code>-mz</code></A>.
  </listitem>
  
  <listitem><B>--</B>
  Marks the last option.  Any subsequent arguments are interpreted as
  input and output files even if they resemble command-line options.
  </listitem>
  
  </itemizedlist>
  </para>
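  <para>
  The <code>+</code> and <code>-</code> convention shared by the
  <code>-j</code> and <code>-m</code> options can be illustrated with a
  short C++ sketch.  This is not the producer's own option handling code:
  the function name <code>apply_flags</code> and the use of a
  <code>std::map</code> are assumptions made purely for illustration.  The
  sketch also assumes that a <code>+</code> or <code>-</code> stays in
  force until the next sign character, which the description above does
  not state explicitly.
  <programlisting><![CDATA[
#include <map>
#include <string>

// Hypothetical helper (not part of tcpplus): applies an option string such
// as "g-e" to a table of on/off flags and returns the updated table.
std::map<char, bool> apply_flags(std::map<char, bool> flags,
                                 const std::string &opt)
{
    bool enable = true;             // options are enabled by default
    for (char c : opt) {
        if (c == '+') {
            enable = true;          // following '+', options are enabled
        } else if (c == '-') {
            enable = false;         // following '-', options are disabled
        } else {
            flags[c] = enable;      // any other character is an option key
        }
    }
    return flags;
}
]]></programlisting>
  Starting from a table in which <code>e</code> is on and <code>g</code>
  is off, <code>apply_flags(table, "g-e")</code> returns a table with
  <code>g</code> on and <code>e</code> off, mirroring the effect the text
  gives for <code>-jg</code> and <code>-j-e</code>.
  </para>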
  </sect3>
  </sect2>
  
  <sect2>
    <title>2.2. Compiler configuration</title>
  <para>
  This section describes how the C++ producer can be configured to apply
  extra static checks or to support various dialects of C++.  In all
  cases the default behaviour is precisely that specified in the ISO
  C++ standard with no extra checks. 
  </para>
  <para>
  Certain very basic configuration information is specified using a
  <A HREF="#table">portability table</A>; however, the primary method
  of configuration is by means of <code>#pragma</code> directives. 
  These directives may be placed within the program itself; however,
  it is generally more convenient to group them into a 
  <A HREF="man.html#start-up">start-up file</A> in order to create a
  <A id="usr">user-defined compilation profile</A>.  The 
  <code>#pragma</code> directives recognised by the C++ producer have
  one of the equivalent forms: 
  <programlisting>
        #pragma TenDRA ....
        #pragma TenDRA++ ....
  </programlisting>
  Some of these are common to the C and C++ producers (although often
  with differing default behaviour).  The C producer will ignore any
  <code>TenDRA++</code> directives, so these may be used in compilation
  profiles which are to be used by both producers.  In the descriptions
  below, the presence of a <code>++</code> is used to indicate a directive
  which is C++ specific; the other directives are common to both producers.
  </para>
  <para>
  Within the description of the <code>#pragma</code> syntax, <I>on</I>
  stands for <code>on</code>, <code>off</code> or <code>warning</code>,
  <I>allow</I> stands for <code>allow</code>, <code>disallow</code>
  or 
  <code>warning</code>, <I>string-literal</I> is any string literal,
  <I>integer-literal</I> is any integer literal, <I>identifier</I> is
  any simple, unqualified identifier name, and <I>type-id</I> is any
  type identifier.  Other syntactic items are described in the text.
  A 
  <A HREF="pragma1.html">complete grammar</A> for the <code>#pragma</code>
  directives accepted by the C++ producer is given as an annex. 
  </para>
  
  
  <sect3 id="table">
    <title>2.2.1. Portability tables</title>
  <para>
  Certain very basic configuration information is read from a file called
  a portability table, which may be specified to the producer using
  a 
  <A HREF="man.html#table"><code>-n</code> option</A>.  This information
  includes the minimum sizes of the basic integral types, the 
  <A HREF="#char">sign of plain <code>char</code></A>, and whether signed
  types can be assumed to be symmetric (for example, [-127,127]) or
  maximum (for example, [-128,127]). 
  </para>
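  <para>
  The difference between the symmetric and maximum conventions is a
  matter of simple arithmetic on the bit width, as the following C++
  sketch shows.  The function name <code>signed_range</code> is
  hypothetical and the code is not taken from the producer; it only
  reproduces the two example ranges quoted above for an 8 bit type.
  <programlisting><![CDATA[
#include <cstdint>
#include <utility>

// Hypothetical helper: smallest and largest values of a signed type of the
// given width under the two range conventions.  Valid for 2 to 63 bits.
std::pair<std::int64_t, std::int64_t> signed_range(int bits, bool symmetric)
{
    // Largest value: 2^(bits-1) - 1 under both conventions.
    const std::int64_t max = (std::int64_t{1} << (bits - 1)) - 1;
    // Symmetric: the minimum is -max.  Maximum: one further value below.
    const std::int64_t min = symmetric ? -max : -max - 1;
    return {min, max};
}
]]></programlisting>
  With <code>bits</code> equal to 8, the symmetric convention gives
  [-127,127] and the maximum convention gives [-128,127].
  </para>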
  <para>
  The default portability table values, which are built into the producer,
  can be expressed in the form: 
  <programlisting>
        char_bits                       8
        short_bits                      16
        int_bits                        16
        long_bits                       32
        signed_range                    symmetric
        char_type                       either
        ptr_int                         none
        ptr_fn                          no
        non_prototype_checks            yes
        multibyte                       1
  </programlisting>
  This illustrates the syntax for the portability table; note that all
  ten entries are required, even though the last four are ignored. 
  </para>
  </sect3>  
  
  <sect3 id="low">
    <title>2.2.2. Low level configuration</title>
  <para>
  The simplest level of configuration is to reset the severity level
  of a particular error message using: 
  <programlisting>
        #pragma TenDRA++ error <I>string-literal on</I>
        #pragma TenDRA++ error <I>string-literal allow</I>
  </programlisting>
  The given <I>string-literal</I> should name an error from the 
  <A HREF="error.html">error catalogue</A>.  A severity of <code>on</code>
  or <code>disallow</code> indicates that the associated diagnostic
  message should be an error, which causes the compilation to fail.
  A severity of 
  <code>warning</code> indicates that the associated diagnostic message
  should be a warning, which is printed but allows the compilation to
  continue.  A severity of <code>off</code> or <code>allow</code>
  indicates that the associated error should be ignored.  Reducing the
  severity of any error from its default value, other than via one of
  the dialect directives described in this section, results in undefined
  behaviour. 
  </para>
  <para>
  The next level of configuration is to reset the severity level of
  a particular compiler option using: 
  <programlisting>
        #pragma TenDRA++ option <I>string-literal on</I>
        #pragma TenDRA++ option <I>string-literal allow</I>
  </programlisting>
  The given <I>string-literal</I> should name an option from the option
  catalogue.  The simplest form of compiler option just sets the severity
  level of one or more error messages.  Some of these options may require
  additional processing to be applied.</para>
  <para>
  It is possible to link a particular error message to a particular
  compiler option using: 
  <programlisting>
        #pragma TenDRA++ error <I>string-literal</I> as option <I>string-literal</I>
  </programlisting>
  </para>
  <para>
  Note that the directive: 
  <programlisting>
        #pragma TenDRA++ use error <I>string-literal</I> 
  </programlisting>
  can be used to raise a given error at any point in a translation unit
  in a similar fashion to the <code>#error</code> directive.  The values
  of any parameters for this error are unspecified. 
  </para>
  <para>
  The directives just described give the primitive operations on error
  messages and compiler options.  Many of the remaining directives in
  this section are merely higher level ways of expressing these primitives.
  </para>
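  <para>
  As a sketch of how these primitives combine, the fragment below makes
  one diagnostic a hard error, switches on one option, and links a second
  error to that option.  The names <code>"example_error"</code>,
  <code>"other_error"</code> and <code>"example_option"</code> are
  hypothetical placeholders, not real catalogue entries; names taken from
  the error and option catalogues should be substituted for them.
  <programlisting>
        #pragma TenDRA++ error "example_error" on
        #pragma TenDRA++ option "example_option" on
        #pragma TenDRA++ error "other_error" as option "example_option"
  </programlisting>
  </para>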
  </sect3>  
  
  <sect3 id="scope">
    <title>2.2.3. Checking scopes</title>
  <para>
  Most compiler options are scoped.  A checking scope may be defined
  by enclosing a list of declarations within: 
  <programlisting>
        #pragma TenDRA begin
        ....
        #pragma TenDRA end
  </programlisting>
  If the final <code>end</code> directive is omitted then the scope
  ends at the end of the translation unit.  Checking scopes may be nested
  in the obvious way.  A checking scope inherits its initial set of
  checks from its enclosing scope (this includes the implicit main checking
  scope consisting of the entire input file).  Any checks switched on
  or off within a scope apply only to the remainder of that scope and
  any scope it contains.  A particular check can only be set once in
  a given scope. The set of applied checks reverts to its previous state
  at the end of the scope.</para>
  <para>
  A checking scope can be named using the directives: 
  <programlisting>
        #pragma TenDRA begin name environment <I>identifier</I>
        ....
        #pragma TenDRA end
  </programlisting>
  Checking scope names occupy a namespace distinct from any other namespace
  within the translation unit.  A named scope defines a set of modifications
  to the current checking scope.  These modifications may be reapplied
  within a different scope using: 
  <programlisting>
        #pragma TenDRA use environment <I>identifier</I>
  </programlisting>
  The default behaviour is not to allow checks set in the named checking
  scope to be reset in the current scope.  This can however be modified
  using: 
  <programlisting>
        #pragma TenDRA use environment <I>identifier</I> reset <I>allow</I>
  </programlisting>
  </para>
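  <para>
  For illustration, the following sketch defines a named scope and then
  reapplies its modifications inside a separate, anonymous scope.  The
  environment name <code>relaxed</code> is arbitrary, and the two checks
  it sets are simply directives taken from elsewhere in this section:
  <programlisting>
        #pragma TenDRA begin name environment relaxed
        #pragma TenDRA unknown escape allow
        #pragma TenDRA text after directive allow
        #pragma TenDRA end

        #pragma TenDRA begin
        #pragma TenDRA use environment relaxed
        ....
        #pragma TenDRA end
  </programlisting>
  After the final <code>end</code> directive the set of applied checks
  reverts to its state before the second <code>begin</code>.
  </para>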
  <para>
  Another use of a named checking scope is to associate a checking scope
  with a named include file directory.  This is done using: 
  <programlisting>
        #pragma TenDRA directory <I>identifier</I> use environment <I>identifier</I>
  </programlisting>
  where the directory name is one introduced via a 
  <A HREF="man.html#directory"><code>-N</code> command-line option</A>.
  Whenever a <code>#include</code> directive resolves to a file from
  the given directory, the effect is as if the file were enclosed in
  directives of the form: 
  <programlisting>
        #pragma TenDRA begin
        #pragma TenDRA use environment <I>identifier</I> reset allow
        ....
        #pragma TenDRA end
  </programlisting>
  </para>
  <para>
  The checks applied to the expansion of a macro definition are those
  from the scope in which the macro was defined, not that in which it
  was expanded. The macro arguments are checked in the scope in which
  they are specified, that is to say, the scope in which the macro is
  expanded.  This enables macro definitions to remain localised with
  respect to checking scopes. 
  </para>
  </sect3>  
  
  <sect3 id="limits">
    <title>2.2.4. Implementation limits</title>
  <para>
  This table gives the default implementation limits imposed by the
  C++ producer for the various implementation quantities listed in Annex
  B of the ISO C++ standard, together with the minimum limits allowed
  in ISO C and C++.  A default limit of <I>none</I> means that the quantity
  is limited only by the capacity of the host machine (either <code>ULONG_MAX</code>
  or the available memory).  A limit of <I>target</I> means that
  while no limit is imposed by the C++ front-end, particular target
  machines may impose such limits. 
  </para>
  
  <table>
  <tr><th>Quantity identifier</th>
  <th>Min C limit</th>  <th>Min C++ limit</th>
  <th>Default limit</th>
  </tr>
  <tr><td>statement_depth</td>
  <td>15</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>hash_if_depth</td>
  <td>8</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>declarator_max</td>
  <td>12</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>paren_depth</td>
  <td>32</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>name_limit</td>
  <td>31</td>  <td>1024</td>
  <td>none</td>
  </tr>
  <tr><td>extern_name_limit</td>
  <td>6</td>  <td>1024</td>
  <td>target</td>
  </tr>
  <tr><td>external_ids</td>
  <td>511</td>  <td>65536</td>
  <td>target</td>
  </tr>
  <tr><td>block_ids</td>
  <td>127</td>  <td>1024</td>
  <td>none</td>
  </tr>
  <tr><td>macro_ids</td>
  <td>1024</td>  <td>65536</td>
  <td>none</td>
  </tr>
  <tr><td>func_pars</td>
  <td>31</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>func_args</td>
  <td>31</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>macro_pars</td>
  <td>31</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>macro_args</td>
  <td>31</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>line_length</td>
  <td>509</td>  <td>65536</td>
  <td>none</td>
  </tr>
  <tr><td>string_length</td>
  <td>509</td>  <td>65536</td>
  <td>none</td>
  </tr>
  <tr><td>sizeof_object</td>
  <td>32767</td>  <td>262144</td>
  <td>target</td>
  </tr>
  <tr><td>include_depth</td>
  <td>8</td>  <td>256</td>
  <td>256</td>
  </tr>
  <tr><td>switch_cases</td>
  <td>257</td>  <td>16384</td>
  <td>none</td>
  </tr>
  <tr><td>data_members</td>
  <td>127</td>  <td>16384</td>
  <td>none</td>
  </tr>
  <tr><td>enum_consts</td>
  <td>127</td>  <td>4096</td>
  <td>none</td>
  </tr>
  <tr><td>nested_class</td>
  <td>15</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>atexit_funcs</td>
  <td>32</td>  <td>32</td>
  <td>target</td>
  </tr>
  <tr><td>base_classes</td>
  <td>N/A</td>  <td>16384</td>
  <td>none</td>
  </tr>
  <tr><td>direct_bases</td>
  <td>N/A</td>  <td>1024</td>
  <td>none</td>
  </tr>
  <tr><td>class_members</td>
  <td>N/A</td>  <td>4096</td>
  <td>none</td>
  </tr>
  <tr><td>virtual_funcs</td>
  <td>N/A</td>  <td>16384</td>
  <td>none</td>
  </tr>
  <tr><td>virtual_bases</td>
  <td>N/A</td>  <td>1024</td>
  <td>none</td>
  </tr>
  <tr><td>static_members</td>
  <td>N/A</td>  <td>1024</td>
  <td>none</td>
  </tr>
  <tr><td>friends</td>
  <td>N/A</td>  <td>4096</td>
  <td>none</td>
  </tr>
  <tr><td>access_declarations</td>
  <td>N/A</td>  <td>4096</td>
  <td>none</td>
  </tr>
  <tr><td>ctor_initializers</td>
  <td>N/A</td>  <td>6144</td>
  <td>none</td>
  </tr>
  <tr><td>scope_qualifiers</td>
  <td>N/A</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>external_specs</td>
  <td>N/A</td>  <td>1024</td>
  <td>none</td>
  </tr>
  <tr><td>template_pars</td>
  <td>N/A</td>  <td>1024</td>
  <td>none</td>
  </tr>
  <tr><td>instance_depth</td>
  <td>N/A</td>  <td>17</td>
  <td>17</td>
  </tr>
  <tr><td>exception_handlers</td>
  <td>N/A</td>  <td>256</td>
  <td>none</td>
  </tr>
  <tr><td>exception_specs</td>
  <td>N/A</td>  <td>256</td>
  <td>none</td>
  </tr>
  </table>
  
  <para>
  It is possible to impose lower limits on most of the quantities listed
  above by means of the directive: 
  <programlisting>
        #pragma TenDRA++ option value <I>string-literal integer-literal</I>
  </programlisting>
  where <I>string-literal</I> gives one of the quantity identifiers
  listed above and <I>integer-literal</I> gives the limit to be imposed.
  An error is reported if the quantity exceeds this limit (note however
  that checks have not yet been implemented for all of the quantities
  listed).  Note that the <A HREF="#identifier"><code>name_limit</code></A>
  and 
  <A HREF="#include"><code>include_depth</code></A> implementation limits
  can be set using dedicated directives. 
  </para>
  <para>
  The maximum number of errors allowed before the producer bails out
  can be set using the directive:
  <programlisting>
        #pragma TenDRA++ set error limit <I>integer-literal</I>
  </programlisting>
  The default value is 32.
  </para>
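  <para>
  As a brief example, the following restricts statement nesting to the
  ISO C minimum from the table above and sets the error limit to ten.
  The values are arbitrary, and, as noted above, the check for a given
  quantity may not have been implemented:
  <programlisting>
        #pragma TenDRA++ option value "statement_depth" 15
        #pragma TenDRA++ set error limit 10
  </programlisting>
  </para>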
  </sect3>  
  
  <sect3 id="lex">
    <title>2.2.5. Lexical analysis</title>
  <para>
  During lexical analysis, a source file which is not empty should end
  in a newline character.  It is possible to relax this constraint using
  the directive: 
  <programlisting>
        #pragma TenDRA no nline after file end <I>allow</I>
  </programlisting>
  </para>
  </sect3>  
  
  <sect3 id="keyword">
    <title>2.2.6. Keywords</title>
  <para>
  Several parts of this section describe how to introduce keywords
  for TenDRA language extensions.  By default, no such extra
  keywords are defined.  There are also low-level directives for defining
  and undefining keywords.  The directive: 
  <programlisting>
        #pragma TenDRA++ keyword <I>identifier</I> for keyword <I>identifier</I> 
  </programlisting>
  can be used to introduce a keyword (the first identifier) standing
  for the standard C++ keyword given by the second identifier.  The
  directive: 
  <programlisting>
        #pragma TenDRA++ keyword <I>identifier</I> for operator <I>operator</I> 
  </programlisting>
  can similarly be used to introduce a keyword giving an alternative
  representation for the given operator or punctuator, as, for example,
  in: 
  <programlisting>
        #pragma TenDRA++ keyword and for operator &amp;&amp;
  </programlisting>
  Finally the directive: 
  <programlisting>
        #pragma TenDRA++ undef keyword <I>identifier</I> 
  </programlisting>
  can be used to undefine a keyword. 
  </para>
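  <para>
  For instance, the following sketch introduces a keyword standing for
  <code>inline</code> and a keyword spelling of the <code>!=</code>
  operator, uses both, and then undefines the first.  The names
  <code>my_inline</code> and <code>is_not</code> are invented for this
  example:
  <programlisting>
        #pragma TenDRA++ keyword my_inline for keyword inline
        #pragma TenDRA++ keyword is_not for operator !=

        my_inline int differ ( int a, int b ) { return ( a is_not b ) ; }

        #pragma TenDRA++ undef keyword my_inline
  </programlisting>
  </para>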
  </sect3>  
  
  <sect3 id="comment">
    <title>2.2.7. Comments</title>
  <para>
  C-style comments do not nest.  The directive: 
  <programlisting>
        #pragma TenDRA nested comment analysis <I>on</I>
  </programlisting>
  enables a check for the characters <code>/*</code> within C-style
  comments. 
  </para>
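  <para>
  A minimal illustration, assuming the check is set to
  <code>warning</code>: the comment below contains a second
  <code>/*</code>, which is what the check looks for, and it still ends
  at the first <code>*/</code>:
  <programlisting>
        #pragma TenDRA nested comment analysis warning

        /* outer comment /* inner comment */
        int x ;
  </programlisting>
  </para>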
  </sect3>  
  
  <sect3 id="identifier-names">
    <title>2.2.8. Identifier names</title>
  <para>
  During lexical analysis, each character in the source file has an
  associated look-up value which is used to determine whether the character
  can be used in an identifier name, is a white space character etc.
  These values are stored in a simple look-up table.  It is possible
  to set the look-up value using: 
  <programlisting>
        #pragma TenDRA++ character <I>character-literal</I> as <I>character-literal</I> allow 
  </programlisting>
  which sets the look-up for the first character to be the default look-up
  for the second character.  The form: 
  <programlisting>
        #pragma TenDRA++ character <I>character-literal</I> disallow 
  </programlisting>
  sets the look-up of the character to be that of an invalid character.
  The forms: 
  <programlisting>
        #pragma TenDRA++ character <I>string-literal</I> as <I>character-literal</I> allow 
        #pragma TenDRA++ character <I>string-literal</I> disallow 
  </programlisting>
  can be used to modify the look-up values for the set of characters
  given by the string literal.  For example: 
  <programlisting>
        #pragma TenDRA++ character '$' as 'a' allow
        #pragma TenDRA++ character '\r' as ' ' allow
  </programlisting>
  allows <code>$</code> to be used in identifier names (like <code>a</code>)
  and carriage return to be a white space character.  The former is
  a common dialect feature and can also be controlled by the directive:
  <programlisting>
        #pragma TenDRA dollar as ident <I>allow</I>
  </programlisting>
  </para>
  <para>
  The maximum number of characters allowed in an identifier name can
  be set using the directives: 
  <programlisting>
        #pragma TenDRA set name limit <I>integer-literal</I>
        #pragma TenDRA++ set name limit <I>integer-literal</I> warning 
  </programlisting>
  This length is given by the <code>name_limit</code> implementation
  quantity  
  <A HREF="#limits">mentioned above</A>.  Identifiers which exceed this
  length raise an error or a warning, but are not truncated. 
  </para>
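  <para>
  For example, assuming a limit of 8 characters is wanted purely as a
  portability warning (the limit and the identifier names are arbitrary):
  <programlisting>
        #pragma TenDRA++ set name limit 8 warning

        int counter ;                   // 7 characters: within the limit
        int total_counter ;             // 13 characters: warning, name not truncated
  </programlisting>
  </para>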
  </sect3>  
  
  <sect3 id="int">
    <title>2.2.9. Integer literals</title>
  <para>
  The rules for finding the type of an integer literal can be described
  using directives of the form: 
  <programlisting>
        #pragma TenDRA integer literal <I>literal-spec</I>
  </programlisting>
  where: 
  <programlisting>
        <I>literal-spec</I> :
                <I>literal-base literal-suffix<SUB>opt</SUB> literal-type-list</I>
  
        <I>literal-base</I> :
                octal
                decimal
                hexadecimal
  
        <I>literal-suffix</I> :
                unsigned
                long
                unsigned long
                long long
                unsigned long long
  
        <I>literal-type-list</I> :
                * <I>literal-type-spec</I>
                <I>integer-literal literal-type-spec</I> | <I>literal-type-list</I>
                ? <I>literal-type-spec</I> | <I>literal-type-list</I>
  
        <I>literal-type-spec</I> :
                : <I>type-id</I>
                * <I>allow<SUB>opt</SUB></I> : <I>identifier</I>
                * * <I>allow<SUB>opt</SUB></I> :
  </programlisting>
  Each directive gives a literal base and suffix, describing the form
  of an integer literal, and a list of possible types for literals of
  this form. This list gives a mapping from the value of the literal
  to the type to be used to represent the literal.  There are three
  cases for the literal type; it may be a given integral type, it may
  be calculated using a given <A HREF="lib.html#literal">literal type
  token</A>, or it may cause an error to be raised.  There are also
  three cases for describing a literal range; it may be given by values
  less than or equal to a given integer literal, it may be given by
  values which are guaranteed to fit into a given integral type, or
  it may match any value.  For example: 
  <programlisting>
        #pragma token PROC ( VARIETY c ) VARIETY l_i # ~lit_int
        #pragma TenDRA integer literal decimal 32767 : int | ** : l_i
  </programlisting>
  describes how to find the type of a decimal literal with no suffix.
  Values less than or equal to 32767 have type <code>int</code>; larger
  values have target dependent type calculated using the token 
  <code>~lit_int</code>.  Introducing a <code>warning</code> into the
  directive will cause a warning to be printed if the token is used
  to calculate the value. 
  </para>
  <para>
  Note that this scheme extends that implemented by the C producer,
  because of the need for more accurate information in the C++ producer.
  For example, the specification above does not fully express the ISO
  rule that the type of a decimal integer is the first of the types
  <code>int</code>, <code>long</code> and <code>unsigned long</code>
  which it fits into (it only expresses the first step).  However with
  the C++ extensions it is possible to write: 
  <programlisting>
        #pragma token PROC ( VARIETY c ) VARIETY l_i # ~lit_int
        #pragma TenDRA integer literal decimal ? : int | ? : long |\
            ? : unsigned long | ** : l_i
  </programlisting>
  </para>
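  <para>
  The explicit-bound form of the grammar can also be used on its own.
  The following sketch assumes a target on which <code>int</code> has
  exactly 16 bits and <code>long</code> exactly 32 bits, so that 32767
  and 2147483647 are their largest values; unlike the <code>?</code>
  form it is therefore not target independent:
  <programlisting>
        #pragma TenDRA integer literal decimal 32767 : int |\
            2147483647 : long | * : unsigned long
  </programlisting>
  Here the final <code>*</code> matches any remaining value and maps it
  to <code>unsigned long</code>.
  </para>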
  </sect3>  
  
  <sect3 id="char">
    <title>2.2.10. Character literals and built-in types</title>
  <para>
  By default, a simple character literal has type <code>int</code> in
  C and type <code>char</code> in C++.  The type of such literals can
  be controlled using the directive: 
  <programlisting>
        #pragma TenDRA++ set character literal : <I>type-id</I> 
  </programlisting>
  The type of a wide character literal is given by the implementation
  defined type <code>wchar_t</code>.  By default, the definition of
  this type is taken from the target machine's <code>&lt;stddef.h&gt;</code>
  C header (note that in ISO C++, <code>wchar_t</code> is actually a
  keyword, but its underlying representation must be the same as in
  C). This definition can be overridden in the producer by means of
  the directive: 
  <programlisting>
        #pragma TenDRA set wchar_t : <I>type-id</I>
  </programlisting>
  for an integral type <I>type-id</I>.  Similarly, the definitions of
  the other implementation dependent integral types which arise naturally
  within the language - the type of the difference of two pointers,
  <code>ptrdiff_t</code>, and the type of the <code>sizeof</code>
  operator, <code>size_t</code> - given in the <code>&lt;stddef.h&gt;</code>
  header can be overridden using the directives: 
  <programlisting>
        #pragma TenDRA set ptrdiff_t : <I>type-id</I>
        #pragma TenDRA set size_t : <I>type-id</I>
  </programlisting>
  These directives are useful when targeting a specific machine on which
  the definitions of these types are known; while they may not affect
  the code generated they can cut down on spurious conversion warnings.
  Note that although these types are built into the producer, they are
  not visible to the user unless an appropriate header is included (with
  the exception of the keyword <code>wchar_t</code> in ISO C++).  However,
  directives of the form: 
  <programlisting>
        #pragma TenDRA++ type <I>identifier</I> for <I>type-name</I> 
  </programlisting>
  can be used to make these types visible.  They are equivalent to a
  <code>typedef</code> declaration of <I>identifier</I> as the given
  built-in type, <code>ptrdiff_t</code>, <code>size_t</code> or 
  <code>wchar_t</code>. 
  </para>
  <para>
  Whether plain <code>char</code> is signed or unsigned is implementation
  dependent.  By default the implementation is determined by the definition
  of the <A HREF="lib.html#arith"><code>~char</code> token</A>, however
  this can be overridden in the producer either by means of the 
  <A HREF="#table">portability table</A> or by the directive: 
  <programlisting>
        #pragma TenDRA character <I>character-sign</I>
  </programlisting>
  where <I>character-sign</I> can be <code>signed</code>, 
  <code>unsigned</code> or <code>either</code> (the default).  Again
  this directive is useful primarily when targeting a specific machine
  on which the signedness of <code>char</code> is known. 
  </para>
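  <para>
  As an illustration, a profile for a hypothetical target on which
  <code>size_t</code> is <code>unsigned long</code>,
  <code>ptrdiff_t</code> is <code>long</code> and plain
  <code>char</code> is unsigned might contain the following.  The type
  choices are assumptions made for the example, and
  <code>my_size_t</code> is an invented name:
  <programlisting>
        #pragma TenDRA set size_t : unsigned long
        #pragma TenDRA set ptrdiff_t : long
        #pragma TenDRA character unsigned
        #pragma TenDRA++ type my_size_t for size_t
  </programlisting>
  </para>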
  </sect3>  
  
  <sect3 id="string">
    <title>2.2.11. String literals</title>
  <para>
  By default, character string literals have type <code>char [n]</code>
  in C and older dialects of C++, but type <code>const char [n]</code>
  in ISO C++.  Similarly wide string literals have type <code>wchar_t
  [n]</code>
  or <code>const wchar_t [n]</code>.  Whether string literals are 
  <code>const</code> or not can be controlled using the two directives:
  <programlisting>
        #pragma TenDRA++ set string literal : const 
        #pragma TenDRA++ set string literal : no const 
  </programlisting>
  In the case where literals are <code>const</code>, the array-to-pointer
  conversion is allowed to cast away the <code>const</code> to allow
  for a degree of backwards compatibility.  The status of this deprecated
  conversion can be controlled using the directive: 
  <programlisting>
        #pragma TenDRA writeable string literal <I>allow</I>
  </programlisting>
  (yes, I know that that should be <code>writable</code>).  Note that
  this directive has a slightly different meaning in the C producer.
  </para>
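  <para>
  A short sketch of how the two settings interact, assuming string
  literals are <code>const</code> and the deprecated conversion is set
  to <code>warning</code>:
  <programlisting>
        #pragma TenDRA++ set string literal : const
        #pragma TenDRA writeable string literal warning

        const char *p = "abc" ;         // no const is cast away
        char *q = "abc" ;               // casts away const: expected to give a warning
  </programlisting>
  </para>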
  <para>
  Adjacent string literal tokens of similar types (either both character
  string literals or both wide string literals) are concatenated at
  an early stage in the parser; however, it is unspecified what happens if
  a character string literal token is adjacent to a wide string literal
  token.  By default this gives an error, but the directive: 
  <programlisting>
        #pragma TenDRA unify incompatible string literal <I>allow</I>
  </programlisting>
  can be used to enable the strings to be concatenated to give a wide
  string literal. 
  </para>
  <para>
  If a <code>'</code> or <code>&quot;</code> character does not have
  a matching closing quote on the same line then it is undefined whether
  an implementation should report an unterminated string or treat the
  quote as a single unknown character.  By default, the C++ producer
  treats this as an unterminated string, but this behaviour can be controlled
  using the directive: 
  <programlisting>
        #pragma TenDRA unmatched quote <I>allow</I>
  </programlisting>
  </para>
  </sect3>  
  
  <sect3 id="escape">
    <title>2.2.12. Escape sequences</title>
  <para>
  By default, if the character following the <code>\</code> in an escape
  sequence is not one of those listed in the ISO C or C++ standards
  then an error is given.  This behaviour, which is left unspecified
  by the standards, can be controlled by the directive: 
  <programlisting>
        #pragma TenDRA unknown escape <I>allow</I>
  </programlisting>
  The result is that the <code>\</code> in unknown escape sequences
  is ignored, so that <code>\z</code> is interpreted as <code>z</code>,
  for example.  Individual escape sequences can be enabled or disabled
  using the directives: 
  <programlisting>
        #pragma TenDRA++ escape <I>character-literal</I> as <I>character-literal</I> allow 
        #pragma TenDRA++ escape <I>character-literal</I> disallow 
  </programlisting>
  so that, for example: 
  <programlisting>
        #pragma TenDRA++ escape 'e' as '\033' allow 
        #pragma TenDRA++ escape 'a' disallow 
  </programlisting>
  sets <code>\e</code> to be the ASCII escape character and disables
  the alert character <code>\a</code>. 
  </para>
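  <para>
  For example, with unknown escapes set to <code>warning</code> and
  <code>\e</code> defined as above, one would expect the following
  treatment (the variable names and strings are arbitrary):
  <programlisting>
        #pragma TenDRA unknown escape warning
        #pragma TenDRA++ escape 'e' as '\033' allow

        const char *a = "\e[0m" ;       // \e is the ASCII escape character
        const char *b = "\z" ;          // unknown escape: warning, read as "z"
  </programlisting>
  </para>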
  <para>
  By default, if the value of a character, given for example by a 
  <code>\x</code> escape sequence, does not fit into its type then an
  error is given.  This implementation dependent behaviour can however
  be controlled by the directive: 
  <programlisting>
        #pragma TenDRA character escape overflow <I>allow</I>
  </programlisting>
  the value being converted to its type in the normal way. 
  </para>
  </sect3>  
  
  <sect3 id="ppdir">
    <title>2.2.13. Preprocessing directives</title>
  <para>
  Non-standard preprocessing directives can be controlled using the
  directives: 
  <programlisting>
        #pragma TenDRA directive <I>ppdir allow</I>
        #pragma TenDRA directive <I>ppdir</I> (ignore) <I>allow</I>
  </programlisting>
  where <I>ppdir</I> can be <code>assert</code>, <code>file</code>,
  <code>ident</code>, <code>import</code> (C++ only), 
  <code>include_next</code> (C++ only), <code>unassert</code>,
  <code>warning</code> (C++ only) or <code>weak</code>.  The second form
  causes the directive to be processed but ignored (note that there is no
  <code>(ignore) disallow</code> form).  The treatment of other unknown
  preprocessing directives can be controlled using: 
  <programlisting>
        #pragma TenDRA unknown directive <I>allow</I>
  </programlisting>
  Cases where the token following the <code>#</code> in a preprocessing
  directive is not an identifier can be controlled using: 
  <programlisting>
        #pragma TenDRA no directive/nline after ident <I>allow</I>
  </programlisting>
  When permitted, unknown preprocessing directives are ignored. 
  </para>
  <para>
  By default, unknown <code>#pragma</code> directives are ignored without
  comment; this behaviour can however be modified using the directive:
  <programlisting>
        #pragma TenDRA unknown pragma <I>allow</I>
  </programlisting>
  Note that any unknown <code>#pragma TenDRA</code> directives always
  give an error. 
  </para>
  <para>
  Older preprocessors allowed text after <code>#else</code> and 
  <code>#endif</code> directives.  The following directive can be used
  to enable such behaviour: 
  <programlisting>
        #pragma TenDRA text after directive <I>allow</I>
  </programlisting>
  Such text after a directive is ignored. 
  </para>
  <para>
  Some older preprocessors have problems with white space in preprocessing
  directives - whether at the start of the line, before the initial
  <code>#</code>, or between the <code>#</code> and the directive identifier.
  Such white space can be detected using the directives: 
  <programlisting>
        #pragma TenDRA indented # directive <I>allow</I>
        #pragma TenDRA indented directive after # <I>allow</I>
  </programlisting>
  respectively. 
  </para>
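  <para>
  As a sketch, the following accepts <code>#ident</code>, processes but
  ignores <code>#assert</code>, and tolerates trailing text after
  <code>#endif</code>.  The macro name <code>DEBUG</code>, the assertion
  and the string are arbitrary:
  <programlisting>
        #pragma TenDRA directive ident allow
        #pragma TenDRA directive assert (ignore) allow
        #pragma TenDRA text after directive allow

        #ident "example 1.0"
        #assert machine ( example )
        #ifdef DEBUG
        ....
        #endif DEBUG
  </programlisting>
  </para>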
  </sect3>  
  
  <sect3 id="target-if">
    <title>2.2.14. Target dependent conditional inclusion</title>
  <para>
  One of the effects of trying to compile code in a target independent
  manner is that it is not always possible to completely evaluate the
  condition in a <code>#if</code> directive.  Thus the conditional inclusion
  needs to be preserved until the installer phase.  This can only be
  done if the target dependent <code>#if</code> is more structured than
  is normally required for preprocessing directives. There are two cases.
  In the first, where the <code>#if</code> appears in a statement, it
  is treated as if it were an <code>if</code> statement with braces enclosing
  its branches; that is: 
  <programlisting>
        #if cond
            true_statements
        #else
            false_statements
        #endif
  </programlisting>
  maps to: 
  <programlisting>
        if ( cond ) {
            true_statements
        } else {
            false_statements
        }
  </programlisting>
  The second case, where the <code>#if</code> appears in a list of
  declarations, normally gives an error.  This can however be overridden
  by the directive: 
  <programlisting>
        #pragma TenDRA++ conditional declaration <I>allow</I>
  </programlisting>
  which causes both branches of the <code>#if</code> to be analysed.
  </para>
  </sect3>  
  
  <sect3 id="include">
    <title>2.2.15. File inclusion directives</title>
  <para>
  There is a maximum depth of nested <code>#include</code>
  directives allowed by the C++ producer. This depth is given by the
  <code>include_depth</code> implementation quantity  
  <A HREF="#limits">mentioned above</A>.  Its value is fairly small
  in order to detect recursive inclusions.  The maximum depth can be
  set using: 
  <programlisting>
        #pragma TenDRA includes depth <I>integer-literal</I>
  </programlisting>
  </para>
  <para>
  A further check, for full pathnames in <code>#include</code> directives
  (which may not be portable), can be enabled using the directive: 
  <programlisting>
        #pragma TenDRA++ complete file includes <I>allow</I> 
  </programlisting>
  </para>
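  <para>
  For example, the following sketch lowers the inclusion depth to 32 and
  sets the pathname check to <code>warning</code>.  The depth value and
  the included path are arbitrary:
  <programlisting>
        #pragma TenDRA includes depth 32
        #pragma TenDRA++ complete file includes warning

        #include "/usr/include/stdio.h"         // full pathname: expected to give a warning
  </programlisting>
  </para>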
  </sect3>  
  
  <sect3 id="macro">
    <title>2.2.16. Macro definitions</title>
  <para>
  By default, multiple consistent definitions of a macro are allowed.
  This behaviour can be controlled using the directive: 
  <programlisting>
        #pragma TenDRA extra macro definition <I>allow</I>
  </programlisting>
  The ISO C/C++ rules for determining whether two macro definitions
  are consistent are fairly restrictive.  A more relaxed rule allowing
  for consistent renaming of macro parameters can be enabled using:
  <programlisting>
        #pragma TenDRA weak macro equality <I>allow</I>
  </programlisting>
  </para>
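  <para>
  As an illustration, the two definitions below differ only in the name
  of the macro parameter, so they are not consistent under the strict
  ISO rule but would be accepted under the relaxed rule.  The macro name
  <code>SQUARE</code> is arbitrary:
  <programlisting>
        #pragma TenDRA extra macro definition allow
        #pragma TenDRA weak macro equality allow

        #define SQUARE( x )     ( ( x ) * ( x ) )
        #define SQUARE( y )     ( ( y ) * ( y ) )
  </programlisting>
  </para>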
  <para>
  In the definition of macros with parameters, a <code>#</code> in the
  replacement list must be followed by a parameter name, indicating
  the stringising operation.  This behaviour can be controlled by the
  directive: 
  <programlisting>
        #pragma TenDRA no ident after # <I>allow</I>
  </programlisting>
  which allows a <code>#</code> which is not followed by a parameter
  name to be treated as a normal preprocessing token. 
  </para>
  <para>
  In a list of macro arguments, the effect of a sequence of preprocessing
  tokens which otherwise resembles a preprocessing directive is undefined.
  The C++ producer treats such directives as normal sequences of preprocessing
  tokens, but can be made to report such behaviour using: 
  <programlisting>
        #pragma TenDRA directive as macro argument <I>allow</I>
  </programlisting>
  </para>
  </sect3>  
  
  <sect3 id="empty">
    <title>2.2.17. Empty source files</title>
  <para>
  ISO C requires that a translation unit should contain at least one
  declaration.  C++ and older dialects of C allow translation units
  which contain no declarations.  This behaviour can be controlled using
  the directive: 
  <programlisting>
        #pragma TenDRA no external declaration <I>allow</I>
  </programlisting>
  </para>
  </sect3>  
  
  <sect3 id="std">
    <title>2.2.18. The <code>std</code> namespace</title>
  <para>
  Several classes declared in the <code>std</code> namespace arise naturally
  as part of the C++ language specification.  These are as follows:
  <programlisting>
        std::type_info          // type of typeid construct
        std::bad_cast           // thrown by dynamic_cast construct
        std::bad_typeid         // thrown by typeid construct
        std::bad_alloc          // thrown by new construct
        std::bad_exception      // used in exception specifications
  </programlisting>
  The definitions of these classes are found, when needed, by looking
  up the appropriate class name in the <code>std</code> namespace. 
  Depending on the context, an error may be reported if the class is
  not found. It is possible to modify the namespace which is searched
  for these classes using the directive: 
  <programlisting>
        #pragma TenDRA++ set std namespace : <I>scope-name</I>
  </programlisting>
  where <I>scope-name</I> can be an identifier giving a namespace name
  or <code>::</code>, indicating the global namespace. 
  </para>
  </sect3>  
  
  <sect3 id="linkage">
    <title>2.2.19. Object linkage</title>
  <para>
  If an object is declared with both external and internal linkage in
  the same translation unit then, by default, an error is given.  This
  behaviour can be changed using the directive: 
  <programlisting>
        #pragma TenDRA incompatible linkage <I>allow</I>
  </programlisting>
  When incompatible linkages are allowed, whether the resultant identifier
  has external or internal linkage can be set using one of the directives:
  <programlisting>
        #pragma TenDRA linkage resolution : off
        #pragma TenDRA linkage resolution : (external) <I>on</I>
        #pragma TenDRA linkage resolution : (internal) <I>on</I>
  </programlisting>
  </para>
  <para>
  It is possible to declare objects with external linkage in a block.
  C leaves it undefined whether declarations of the same object in different
  blocks, such as: 
  <programlisting>
        void f ()
        {
            extern int a ;
            ....
        }
  
        void g ()
        {
            extern double a ;
            ....
        }
  </programlisting>
  are checked for compatibility.  However, in C++ the one definition
  rule implies that such declarations are indeed checked for compatibility.
  The status of this check can be set using the directive: 
  <programlisting>
        #pragma TenDRA unify external linkage <I>on</I>
  </programlisting>
  Note that it is not possible in ISO C or C++ to declare objects or
  functions with internal linkage in a block.  While <code>static</code>
  object definitions in a block have a specific meaning, there is no
  real reason why <code>static</code> functions should not be declared
  in a block.  This behaviour can be enabled using the directive: 
  <programlisting>
        #pragma TenDRA block function static <I>allow</I>
  </programlisting>
  </para>
  <para>
  Inline functions have external linkage by default in ISO C++, but
  internal linkage in older dialects.  The default linkage can be set
  using the directive: 
  <programlisting>
        #pragma TenDRA++ inline linkage <I>linkage-spec</I> 
  </programlisting>
  where <I>linkage-spec</I> can be <code>external</code> or 
  <code>internal</code>.  Similarly <code>const</code> objects have
  internal linkage by default in C++, but external linkage in C.  The
  default linkage can be set using the directive: 
  <programlisting>
        #pragma TenDRA++ const linkage <I>linkage-spec</I> 
  </programlisting>
  </para>
  <para>
  Older dialects of C treated all identifiers with external linkage
  as if they had been declared <code>volatile</code> (i.e. by being
  conservative in optimising such values).  This behaviour can be enabled
  using the directive: 
  <programlisting>
        #pragma TenDRA external volatile_t
  </programlisting>
  </para>
  <para>
  It is possible to set the default language linkage using the directive:
  <programlisting>
        #pragma TenDRA++ external linkage <I>string-literal</I> 
  </programlisting>
  This is equivalent to enclosing the rest of the current checking scope
  in: 
  <programlisting>
        extern <I>string-literal</I> {
            ....
        }
  </programlisting>
  It is unspecified what happens if such a directive is used within
  an explicit linkage specification and does not nest correctly.  This
  directive is particularly useful when used in a <A HREF="#scope">named
  environment</A> associated with an include directory.  For example,
  it can be used to express the fact that all the objects declared in
  headers included from that directory have C linkage. 
  </para>
  <para>
  A change in ISO C++ relative to older dialects is that the language
  linkage of a function now forms part of the function type.  For example:
  <programlisting>
        extern &quot;C&quot; int f ( int ) ;
        int ( *pf ) ( int ) = f ;               // error
  </programlisting>
  The directive: 
  <programlisting>
        #pragma TenDRA++ external function linkage <I>on</I> 
  </programlisting>
  can be used to control whether function types with differing language
  linkages, but which are otherwise compatible, are considered compatible
  or not. 
  </para>
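  <para>
  As a minimal sketch, the example above can be written so that the
  language linkages agree, by declaring the function type through a
  <code>typedef</code> inside a linkage specification.  The names
  <code>c_func</code>, <code>f</code> and <code>pf</code> are purely
  illustrative:
  <programlisting>
        extern &quot;C&quot; {
            typedef int c_func ( int ) ;        // function type with C linkage
            int f ( int x ) { return x + 1 ; }  // function with C linkage
        }

        c_func *pf = f ;                        // linkages agree: no error
  </programlisting>
  Written this way, the initialisation does not depend on the setting
  of the directive.
  </para>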
  </sect3>  
  
  <sect3 id="static">
    <title>2.2.20. Static identifiers</title>
  <para>
  By default, objects and functions with internal linkage are mapped
  to tags without external names in the output TDF capsule.  Thus such
  names are not available to the installer and it needs to make up internal
  names to represent such objects in its output.  This is not desirable
  in such operations as profiling, where a meaningful internal name
  is needed to make sense of the output.  The directive: 
  <programlisting>
        #pragma TenDRA preserve <I>identifier-list</I>
  </programlisting>
  can be used to preserve the names of the given list of identifiers
  with internal linkage.  This is done using the <code>static_name_def</code>
  TDF construct.  The form: 
  <programlisting>
        #pragma TenDRA preserve *
  </programlisting>
  will preserve the names of all identifiers with internal linkage in
  this way. 
  </para>
  </sect3>  
  
  <sect3 id="decl_none">
    <title>2.2.21. Empty declarations</title>
  <para>
  ISO C++ requires every declaration or member declaration to introduce
  one or more names into the program.  The directive: 
  <programlisting>
        #pragma TenDRA unknown struct/union <I>allow</I>
  </programlisting>
  can be used to relax one particular instance of this rule, by allowing
  anonymous class definitions (recall that anonymous unions are objects,
  not types, in C++ and so are not covered by this rule).  The C++ grammar
  also allows a solitary semicolon as a declaration or member declaration;
  however such a declaration does not introduce a name and so contravenes
  the rule above.  The rule can be relaxed in this case using the directive:
  <programlisting>
        #pragma TenDRA extra ; <I>allow</I>
  </programlisting>
  Note that the C++ grammar explicitly allows for an extra semicolon
  following an inline member function definition, but that semicolons
  following other function definitions are actually empty declarations
  of the form above.  A solitary semicolon in a statement is interpreted
  as an empty expression statement rather than an empty declaration
  statement. 
  </para>
  </sect3>  
  
  <sect3 id="implicit">
    <title>2.2.22. Implicit <code>int</code></title>
  <para>
  The C &quot;implicit <code>int</code>&quot; rule, whereby a type of
  <code>int</code>
  is inferred in a list of type or declaration specifiers which does
  not contain a type name, has been removed in ISO C++, although it
  was supported in older dialects of C++.  This check is controlled
  by the directive: 
  <programlisting>
        #pragma TenDRA++ implicit int type <I>allow</I> 
  </programlisting>
  Partial relaxations of this rule are allowed.  The directive: 
  <programlisting>
        #pragma TenDRA++ implicit int type for const/volatile <I>allow</I> 
  </programlisting>
  will allow for implicit <code>int</code> when the list of type specifiers
  contains a cv-qualifier.  Similarly the directive: 
  <programlisting>
        #pragma TenDRA implicit int type for function return <I>allow</I>
  </programlisting>
  will allow for implicit <code>int</code> in the return type of a function
  definition (this excludes constructors, destructors and conversion
  functions, where special rules apply).  A function definition is the
  only kind of declaration in ISO C where a declaration specifier is
  not required. Older dialects of C allowed declaration specifiers to
  be omitted in other cases.  Support for this behaviour can be enabled
  using: 
  <programlisting>
        #pragma TenDRA implicit int type for external declaration <I>allow</I>
  </programlisting>
  The four cases can be demonstrated in the following example: 
  <programlisting>
        extern a ;              // implicit int
        const b = 1 ;           // implicit const int
  
        f ()                    // implicit function return
        {
            return 2 ;
        }
  
        c = 3 ;                 // error: not allowed in C++
  </programlisting>
  </para>
  </sect3>  
  
  <sect3 id="longlong">
    <title>2.2.23. Extended integral types</title>
  <para>
  The <code>long long</code> integral types are not part of ISO C or
  C++ by default; however, support for them can be enabled using the
  directive: 
  <programlisting>
        #pragma TenDRA longlong type <I>allow</I>
  </programlisting>
  This support includes allowing <code>long long</code> in type specifiers
  and allowing <code>LL</code> and <code>ll</code> as integer literal
  suffixes. 
  </para>
  <para>
  There is a further directive given by the two cases: 
  <programlisting>
        #pragma TenDRA set longlong type : long long
        #pragma TenDRA set longlong type : long
  </programlisting>
  which can be used to control the implementation of the <code>long
  long</code> types.  Either they can be mapped to the 
  <A HREF="lib.html#arith">default representation</A>, which is guaranteed
  to contain at least 64 bits, or they can be mapped to the corresponding
  <code>long</code> types. 
  </para>
  <para>
  Because these <code>long long</code> types are not an intrinsic part
  of C++ the implementation does not integrate them into the language
  as fully as is possible.  This is to prevent the presence or otherwise
  of 
  <code>long long</code> types affecting the semantics of code which
  does not use them.  For example, it would be possible to extend the
  rules for the types of integer literals, integer promotion types and
  arithmetic types to say that if the given value does not fit into
  the standard integral types then the extended types are tried.  This
  has not been done, although these rules could be implemented by changing
  the definitions of the <A HREF="lib.html#arith">standard tokens</A>
  used to determine these types.  By default, only the rules for arithmetic
  types involving a <code>long long</code> operand and for <code>LL</code>
  integer literals mention <code>long long</code> types. 
  </para>
  </sect3>  
  
  <sect3 id="bitfield-types">
    <title>2.2.24. Bitfield types</title>
  <para>
  The C++ rules on bitfield types differ slightly from the C rules.
  Firstly any integral or enumeration type is allowed in a bitfield,
  and secondly the bitfield width may exceed the underlying type size
  (the extra bits being treated as padding).  These properties can be
  controlled using the directives: 
  <programlisting>
        #pragma TenDRA extra bitfield int type <I>allow</I>
        #pragma TenDRA bitfield overflow <I>allow</I>
  </programlisting>
  respectively. 
  </para>
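  <para>
  The two C++ properties can be illustrated by the following sketch,
  which assumes a target where <code>unsigned char</code> has 8 bits.
  The structure and member names are hypothetical:
  <programlisting>
        struct flags {
            enum colour { red, green, blue } ;
            colour c : 4 ;              // enumeration type used as a bitfield type
            unsigned char w : 12 ;      // 12 bits declared, 8 hold the value,
                                        // the remaining bits are padding
        } ;
  </programlisting>
  Both members are acceptable in C++, whereas the C rules described
  above would reject them unless the corresponding directives are used.
  </para>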
  </sect3>  
  
  <sect3 id="elab">
    <title>2.2.25. Elaborated type specifiers</title>
  <para>
  In elaborated type specifiers, the class key (<code>class</code>,
  <code>struct</code>, <code>union</code> or <code>enum</code>) should
  agree with any previous declaration of the type (except that <code>class</code>
  and <code>struct</code> are interchangeable).  This requirement can
  be relaxed using the directive: 
  <programlisting>
        #pragma TenDRA ignore struct/union/enum tag <I>on</I>
  </programlisting>
  </para>
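  <para>
  As a short illustration of the default rule (the type name
  <code>point</code> is hypothetical):
  <programlisting>
        struct point ;                  // first declared using struct
        class point {                   // class and struct are interchangeable
        public :
            int x, y ;
        } ;

        // union point *p ;             // class key disagrees: rejected by default
  </programlisting>
  The commented-out declaration is the kind of mismatch which the
  directive above relaxes.
  </para>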
  <para>
  In ISO C and C++ it is not possible to give a forward declaration
  of an enumeration type.  This constraint can be relaxed using the
  directive: 
  <programlisting>
        #pragma TenDRA forward enum declaration <I>allow</I>
  </programlisting>
  Until the end of its definition, an enumeration type is treated as
  an incomplete type (as with class types).  In enumeration definitions,
  and a couple of other contexts where comma-separated lists are required,
  the directive: 
  <programlisting>
        #pragma TenDRA extra , <I>allow</I>
  </programlisting>
  can be used to allow a trailing comma at the end of the list. 
  </para>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA complete struct/union analysis <I>on</I>
  </programlisting>
  can be used to enable a check that every class or union has been completed
  within each translation unit in which it is declared. 
  </para>
  </sect3>  
  
  <sect3 id="impl_func">
    <title>2.2.26. Implicit function declarations</title>
  <para>
  C, but not C++, allows calls to undeclared functions, the function
  being declared implicitly.  It is possible to enable support for implicit
  function declarations using the directive: 
  <programlisting>
        #pragma TenDRA implicit function declaration <I>on</I>
  </programlisting>
  Such implicitly declared functions have C linkage and type 
  <code>int ( ... )</code>. 
  </para>
  </sect3>  
  
  <sect3 id="weak">
    <title>2.2.27. Weak function prototypes</title>
  <para>
  The C producer supports a concept, weak prototypes, whereby type checking
  can be applied to the arguments of a non-prototype function.  This
  checking can be enabled using the directive: 
  <programlisting>
        #pragma TenDRA weak prototype analysis <I>on</I>
  </programlisting>
  The concept of weak prototypes is not applicable to C++, where all
  functions are prototyped.  The C++ producer does allow the syntax
  for explicit weak prototype declarations, but treats them as if they
  were normal prototypes.  These declarations are denoted by means of
  a keyword, 
  <code>WEAK</code> say, introduced by the directive: 
  <programlisting>
        #pragma TenDRA keyword <I>identifier</I> for weak
  </programlisting>
  preceding the <code>(</code> of the function declarator.  The directives:
  <programlisting>
        #pragma TenDRA prototype <I>allow</I>
        #pragma TenDRA prototype (weak) <I>allow</I>
  </programlisting>
  which can be used in the C producer to warn of prototype or weak prototype
  declarations, are similarly ignored by the C++ producer. 
  </para>
  <para>
  The C producer also allows the directives: 
  <programlisting>
        #pragma TenDRA argument <I>type-id</I> as <I>type-id</I>
        #pragma TenDRA argument <I>type-id</I> as ...
        #pragma TenDRA extra ... <I>allow</I>
        #pragma TenDRA incompatible promoted function argument <I>allow</I>
  </programlisting>
  which control the compatibility of function types.  These directives
  are ignored by the C++ producer (some of them would make sense in
  the context of C++ but would over-complicate function overloading).
  </para>
  </sect3>  
  
  <sect3 id="printf">
    <title>2.2.28. <code>printf</code> and <code>scanf</code>
  argument checking</title>
  <para>
  The C producer includes a number of checks that the arguments in a
  call to a function in the <code>printf</code> or <code>scanf</code>
  families match the given format string.  The check is implemented
  by using the directives: 
  <programlisting>
        #pragma TenDRA type <I>identifier</I> for ... printf
        #pragma TenDRA type <I>identifier</I> for ... scanf
  </programlisting>
  to introduce a type representing a <code>printf</code> or <code>scanf</code>
  format string.  For most purposes this type is treated as <code>const
  char *</code>, but when it appears in a function declaration it alerts
  the producer that any extra arguments passed to that function should
  match the format string passed as the corresponding argument.  The
  TenDRA API headers conditionally declare <code>printf</code>, 
  <code>scanf</code> and similar functions in something like the form:
  <programlisting>
        #ifdef __NO_PRINTF_CHECKS
        typedef const char *__printf_string ;
        #else
        #pragma TenDRA type __printf_string for ... printf
        #endif
  
        int printf ( __printf_string, ... ) ;
        int fprintf ( FILE *, __printf_string, ... ) ;
        int sprintf ( char *, __printf_string, ... ) ;
  </programlisting>
  These declarations can be skipped, effectively disabling this check,
  by defining the <code>__NO_PRINTF_CHECKS</code> macro. 
  </para>
  <para>
  <IMG SRC="../images/warn.gif" ALT="warning"/>
  These <code>printf</code> and <code>scanf</code> format string checks
  have not yet been implemented in the C++ producer due to the presence
  of an alternative, type checked, I/O package - namely 
  <code>&lt;iostream&gt;</code>.  The format string types are simply
  treated as <code>const char *</code>. 
  </para>
  </sect3>  
  
  <sect3 id="typedef">
    <title>2.2.29. Type declarations</title>
  <para>
  C does not allow multiple definitions of a <code>typedef</code> name,
  whereas C++ allows multiple consistent definitions.  This behaviour
  can be controlled using the directive: 
  <programlisting>
        #pragma TenDRA extra type definition <I>allow</I>
  </programlisting>
  </para>
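  <para>
  For example, in the following sketch (the name <code>word</code> is
  illustrative only) the second definition is consistent with the first
  and so is accepted in C++, while it falls foul of the C rule described
  above:
  <programlisting>
        typedef unsigned long word ;
        typedef unsigned long word ;    // consistent redefinition
        // typedef int word ;           // inconsistent: always an error
  </programlisting>
  </para>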
  </sect3>  
  
  <sect3 id="compatible">
    <title>2.2.30. Type compatibility</title>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA incompatible type qualifier <I>allow</I>
  </programlisting>
  allows objects to be redeclared with different cv-qualifiers (normally
  such redeclarations would be incompatible).  The composite type is
  qualified using the join of the cv-qualifiers in the various redeclarations.
  </para>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA compatible type : <I>type-id</I> == <I>type-id</I> : <I>allow</I>
  </programlisting>
  asserts that the two given types are compatible.  Currently the only
  implemented version is <code>char * == void *</code> which enables
  <code>char *</code> to be used as a generic pointer as it was in older
  dialects of C. 
  </para>
  </sect3>  
  
  <sect3 id="complete">
    <title>2.2.31. Incomplete types</title>
  <para>
  Some dialects of C allow incomplete arrays as member types.  These
  are generally used as a place-holder at the end of a structure to
  allow for the allocation of an arbitrarily sized array.  Support for
  this feature can be enabled using the directive: 
  <programlisting>
        #pragma TenDRA incomplete type as object type <I>allow</I>
  </programlisting>
  </para>
  </sect3>  
  
  <sect3 id="type-conversions">
    <title>2.2.32. Type conversions</title>
  <para>
  There are a number of directives which allow various classes of type
  conversion to be checked.  The directives: 
  <programlisting>
        #pragma TenDRA conversion analysis (int-int explicit) <I>on</I>
        #pragma TenDRA conversion analysis (int-int implicit) <I>on</I>
  </programlisting>
  will check for unsafe explicit or implicit conversions between arithmetic
  types.  Similarly conversions between pointers and arithmetic types
  can be checked using: 
  <programlisting>
        #pragma TenDRA conversion analysis (int-pointer explicit) <I>on</I>
        #pragma TenDRA conversion analysis (int-pointer implicit) <I>on</I>
  </programlisting>
  or equivalently: 
  <programlisting>
        #pragma TenDRA conversion analysis (pointer-int explicit) <I>on</I>
        #pragma TenDRA conversion analysis (pointer-int implicit) <I>on</I>
  </programlisting>
  Conversions between pointer types can be checked using: 
  <programlisting>
        #pragma TenDRA conversion analysis (pointer-pointer explicit) <I>on</I>
        #pragma TenDRA conversion analysis (pointer-pointer implicit) <I>on</I>
  </programlisting>
  </para>
  <para>
  There are some further variants which can be used to enable useful
  sets of conversion checks.  For example: 
  <programlisting>
        #pragma TenDRA conversion analysis (int-int) <I>on</I>
  </programlisting>
  enables both implicit and explicit arithmetic conversion checks. 
  The directives: 
  <programlisting>
        #pragma TenDRA conversion analysis (int-pointer) <I>on</I>
        #pragma TenDRA conversion analysis (pointer-int) <I>on</I>
        #pragma TenDRA conversion analysis (pointer-pointer) <I>on</I>
  </programlisting>
  are equivalent to their corresponding explicit forms (because the
  implicit forms are illegal by default).  The directive: 
  <programlisting>
        #pragma TenDRA conversion analysis <I>on</I>
  </programlisting>
  is equivalent to the four directives just given.  It enables checks
  on implicit and explicit arithmetic conversions, explicit arithmetic
  to pointer conversions and explicit pointer conversions. 
  </para>
  <para>
  The default settings for these checks are determined by the implicit
  and explicit conversions allowed in C++.  Note that there are differences
  between the conversions allowed in C and C++.  For example, an arithmetic
  type can be converted implicitly to an enumeration type in C, but
  not in C++.  The directive: 
  <programlisting>
        #pragma TenDRA conversion analysis (int-enum implicit) <I>on</I> 
  </programlisting>
  can be used to control the status of this conversion.  The level of
  severity for an error message arising from such a conversion is the
  maximum of the severity set by this directive and that set by the
  <code>int-int implicit</code> directive above. 
  </para>
  <para>
  The implicit pointer conversions described above do not include conversions
  to and from the generic pointer <code>void *</code>, which have their
  own controlling directives.  A pointer of type <code>void *</code>
  can be converted implicitly to another pointer type in C but not in
  C++; this is controlled by the directive: 
  <programlisting>
        #pragma TenDRA++ conversion analysis (void*-pointer implicit) <I>on</I> 
  </programlisting>
  The reverse conversion, from a pointer type to <code>void *</code>
  is allowed in both C and C++, and has a controlling directive: 
  <programlisting>
        #pragma TenDRA++ conversion analysis (pointer-void* implicit) <I>on</I> 
  </programlisting>
  </para>
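  <para>
  The asymmetry between the two directions can be sketched as follows
  (the variable names are hypothetical):
  <programlisting>
        int value = 42 ;
        void *generic = &amp;value ;       // pointer to void *: implicit in C and C++
        int *back = static_cast &lt; int * &gt; ( generic ) ;
                                        // void * to pointer: needs a cast in C++
  </programlisting>
  In C the final initialiser could be written without the cast; it is
  this implicit form which the first directive controls.
  </para>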
  <para>
  In ISO C and C++, a function pointer can only be cast to other function
  pointers, not to object pointers or <code>void *</code>.  Many dialects
  however allow function pointers to be cast to and from other pointers.
  This behaviour can be controlled using the directive: 
  <programlisting>
        #pragma TenDRA function pointer as pointer <I>allow</I>
  </programlisting>
  which causes function pointers to be treated in the same way as all
  other pointers. 
  </para>
  <para>
  The integer conversion checks described above only apply to unsafe
  conversions.  A simple-minded check for shortening conversions is
  not adequate, as is shown by the following example: 
  <programlisting>
        char a = 1, b = 2 ;
        char c = a + b ;
  </programlisting>
  Here the sum <code>a + b</code> is evaluated as an <code>int</code>, which
  is then shortened to a <code>char</code>.  Any check which does not
  distinguish this sort of &quot;safe&quot; shortening conversion from
  unsafe shortening conversions such as: 
  <programlisting>
        int a = 1, b = 2 ;
        char c = a + b ;
  </programlisting>
  is not likely to be very useful.  The producer therefore associates
  two types with each integral expression; the first is the normal,
  representation type and the second is the underlying, semantic type.
  Thus in the first example, the representation type of <code>a + b</code>
  is <code>int</code>, but semantically it is still a <code>char</code>.
  The conversion analysis is based on the semantic types. 
  </para>
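  <para>
  The representation type of such a sum can be observed directly in
  standard C++, as in the following sketch (the name
  <code>sum_is_int</code> is illustrative).  The semantic type has no
  such visible counterpart; it is internal to the producer's analysis:
  <programlisting>
        char a = 1, b = 2 ;
        char c = a + b ;        // &quot;safe&quot; shortening: both operands are char

        // the operands are promoted, so the sum occupies an int
        const bool sum_is_int = sizeof ( a + b ) == sizeof ( int ) ;
  </programlisting>
  </para>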
  <para>
  <IMG SRC="../images/warn.gif" ALT="warning"/>
  The C producer supports a directive: 
  <programlisting>
        #pragma TenDRA keyword <I>identifier</I> for type representation
  </programlisting>
  whereby a keyword can be introduced which can be used to explicitly
  declare a type with given representation and semantic components.
  Unfortunately this makes the <A HREF="parse.html">C++ grammar</A>
  ambiguous, so it has not yet been implemented in the C++ producer.
  </para>
  <para>
  It is possible to allow individual conversions by means of conversion
  tokens.  A <A HREF="token.html">procedure token</A> which takes one
  rvalue expression program parameter and returns an rvalue expression,
  such as: 
  <programlisting>
        #pragma token PROC ( EXP : t : ) EXP : s : conv #
  </programlisting>
  can be regarded as mapping expressions of type <code>t</code> to expressions
  of type <code>s</code>.  The directive: 
  <programlisting>
        #pragma TenDRA conversion <I>identifier-list</I> allow
  </programlisting>
  can be used to nominate such a token as a conversion token.  That
  is to say, if the conversion, whether explicit or implicit, from <code>t</code>
  to <code>s</code> cannot be done by other means, it is done by applying
  the token <code>conv</code>, so: 
  <programlisting>
        t a ;
        s b = a ;               // maps to conv ( a )
  </programlisting>
  Note that, unlike conversion functions, conversion tokens can be applied
  to any types. 
  </para>
  </sect3>  
  
  <sect3 id="cast">
    <title>2.2.33. Cast expressions</title>
  <para>
  ISO C++ introduces the constructs <code>static_cast</code>, 
  <code>const_cast</code> and <code>reinterpret_cast</code>, which can
  be used in various contexts where an old style explicit cast would
  previously have been used.  By default, an explicit cast can perform
  any combination of the conversions performed by these three constructs.
  To aid migration to the new style casts the directives: 
  <programlisting>
        #pragma TenDRA++ explicit cast as <I>cast-state allow</I> 
        #pragma TenDRA++ explicit cast <I>allow</I> 
  </programlisting>
  where <I>cast-state</I> is defined as follows: 
  <programlisting>
        <I>cast-state</I> :
                static_cast
                const_cast
                reinterpret_cast
                static_cast | <I>cast-state</I>
                const_cast | <I>cast-state</I>
                reinterpret_cast | <I>cast-state</I>
  </programlisting>
  can be used to restrict the conversions which can be performed using
  explicit casts.  The first form sets the interpretation of explicit
  cast to be combinations of the given constructs; the second resets
  the interpretation to the default.  For example: 
  <programlisting>
        #pragma TenDRA++ explicit cast as static_cast | const_cast allow
  </programlisting>
  means that conversions requiring <code>reinterpret_cast</code> (the
  most unportable conversions) will not be allowed to be performed using
  explicit casts, but will have to be given as a <code>reinterpret_cast</code>
  construct.  Changing <code>allow</code> to <code>warning</code> will
  also cause a warning to be issued for every explicit cast expression.
  </para>
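  <para>
  As a sketch of the correspondence between the two styles (all names
  are hypothetical), the first two old style casts below each perform a
  conversion which a single new style construct can express:
  <programlisting>
        const char *text = &quot;abc&quot; ;
        double d = 3.7 ;

        int i1 = ( int ) d ;                            // old style
        int i2 = static_cast &lt; int &gt; ( d ) ;            // same conversion

        char *p1 = ( char * ) text ;                    // old style
        char *p2 = const_cast &lt; char * &gt; ( text ) ;     // same conversion

        // conversion between unrelated pointer types
        const unsigned char *u =
            reinterpret_cast &lt; const unsigned char * &gt; ( text ) ;
  </programlisting>
  Following the description above, with explicit casts restricted to
  <code>static_cast | const_cast</code> the first two old style casts
  would still be accepted, but the last conversion could only be
  written using <code>reinterpret_cast</code> as shown.
  </para>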
  </sect3>  
  
  <sect3 id="ellipsis">
    <title>2.2.34. Ellipsis functions</title>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA ident ... <I>allow</I>
  </programlisting>
  may be used to enable or disable the use of <code>...</code> as a
  primary expression in a function defined with ellipsis.  The type
  of such an expression is implementation defined.  This expression
  is used in the definition of the <A HREF="lib.html#ellipsis"><code>va_start</code>
  macro</A> in the <code>&lt;stdarg.h&gt;</code> header.  This header
  automatically enables this switch. 
  </para>
  </sect3>  
  
  <sect3 id="overload">
    <title>2.2.35. Overloaded functions</title>
  <para>
  Older dialects of C++ did not report ambiguous overloaded function
  resolutions, but instead resolved the call to the first of the most
  viable candidates to be declared.  This behaviour can be controlled
  using the directive: 
  <programlisting>
        #pragma TenDRA++ ambiguous overload resolution <I>allow</I> 
  </programlisting>
  There are occasions when the resolution of an overloaded function
  call is not clear.  The directive: 
  <programlisting>
        #pragma TenDRA++ overload resolution <I>allow</I> 
  </programlisting>
  can be used to report the resolution of any such call (whether explicit
  or implicit) where there is more than one viable candidate. 
  </para>
  <para>
  An interesting consequence of compiling C++ in a target independent
  manner is that certain overload resolutions can only be determined
  at install-time. For example, in: 
  <programlisting>
        int f ( int ) ;
        int f ( unsigned int ) ;
        int f ( long ) ;
        int f ( unsigned long ) ;
  
        int a = f ( sizeof ( int ) ) ;  // which f?
  </programlisting>
  the type of the <code>sizeof</code> operator, <code>size_t</code>,
  is target dependent, but its promotion must be one of the types 
  <code>int</code>, <code>unsigned int</code>, <code>long</code> or
  <code>unsigned long</code>.  Thus the call to <code>f</code> always
  has a unique resolution, but which function is selected is target dependent.  The
  equivalent directives: 
  <programlisting>
        #pragma TenDRA++ conditional overload resolution <I>allow</I> 
        #pragma TenDRA++ conditional overload resolution (complete) <I>allow</I> 
  </programlisting>
  can be used to warn about such target dependent overload resolutions.
  By default, such resolutions are only allowed if there is a unique
  resolution for each possible implementation of the argument types
  (note that, for simplicity, the possibility of <code>long long</code>
  implementation types is ignored).  The directive: 
  <programlisting>
        #pragma TenDRA++ conditional overload resolution (incomplete) <I>allow</I> 
  </programlisting>
  can be used to allow target dependent overload resolutions which only
  have resolutions for some of the possible implementation types (if
  one of the <code>f</code> declarations above was removed, for example).
  If the implementation does not match one of these types then an install-time
  error is given. 
  </para>
  <para>
  There are restrictions on the set of candidate functions involved
  in a target dependent overload resolution.  Most importantly, it should
  be possible to bring their return types to a common type, as if by
  a series of <code>?:</code> operations.  This common type is the type
  of the target dependent call.  By this means, target dependent types
  are prevented from propagating further out into the program.  Note
  that since sets of overloaded functions usually have the same semantics,
  this does not usually present a problem. 
  </para>
  </sect3>  
  
  <sect3 id="expressions">
    <title>2.2.36. Expressions</title>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA operator precedence analysis <I>on</I> 
  </programlisting>
  can be used to enable a check for expressions where the operator precedence
  is not necessarily what might be expected.  The intended precedence
  can be clarified by means of explicit parentheses.  The precedence
  levels checked are as follows: 
  <itemizedlist>
  <listitem><code>&amp;&amp;</code> versus <code>||</code>. 
  </listitem>
  <listitem><code>&lt;&lt;</code> and <code>&gt;&gt;</code> versus binary
  <code>+</code> and <code>-</code>. 
  </listitem>
  <listitem>Binary <code>&amp;</code> versus binary <code>+</code>, <code>-</code>,
  <code>==</code>, <code>!=</code>, <code>&gt;</code>, <code>&gt;=</code>,
  <code>&lt;</code> and <code>&lt;=</code>. 
  </listitem>
  <listitem><code>^</code> versus binary <code>&amp;</code>, <code>+</code>,
  <code>-</code>, <code>==</code>, <code>!=</code>, <code>&gt;</code>,
  <code>&gt;=</code>, <code>&lt;</code> and <code>&lt;=</code>. 
  </listitem>
  <listitem><code>|</code> versus binary <code>^</code>, <code>&amp;</code>,
  <code>+</code>, <code>-</code>, <code>==</code>, <code>!=</code>,
  <code>&gt;</code>, <code>&gt;=</code>, <code>&lt;</code> and <code>&lt;=</code>. 
  </listitem>
  </itemizedlist>
  Also checked are expressions such as <code>a &lt; b &lt; c</code>
  which do not have their normal mathematical meaning.  For example,
  in: 
  <programlisting>
        d = a &lt;&lt; b + c ;  // precedence is a &lt;&lt; ( b + c )
  </programlisting>
  the precedence is counter-intuitive, although strangely enough, it
  isn't in: 
  <programlisting>
        cout &lt;&lt; b + c ;           // precedence is cout &lt;&lt; ( b + c )
  </programlisting>
  </para>
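  <para>
  As a minimal sketch of the kind of expression this check is aimed at,
  the following fragment shows how two of the cases above are actually
  grouped.  The function names are purely illustrative and are not part
  of the producer.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Hypothetical helpers, named only for this illustration.
int shift_then_add ( int a, int b, int c )
{
    return a << b + c ;         // groups as a << ( b + c )
}

int shift_then_add_bracketed ( int a, int b, int c )
{
    return ( a << b ) + c ;     // the grouping the name suggests
}

bool chained_less ( int a, int b, int c )
{
    return a < b < c ;          // groups as ( a < b ) < c
}

// The claims made in the comments above, stated as assertions.
void check_grouping ()
{
    assert ( shift_then_add ( 1, 2, 3 ) == ( 1 << 5 ) ) ;
    assert ( shift_then_add_bracketed ( 1, 2, 3 ) == 7 ) ;
    assert ( chained_less ( 3, 2, 1 ) ) ;   // ( 3 < 2 ) is 0, and 0 < 1
}
```
]]></programlisting>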
  <para>
  Other dubious arithmetic operations can be checked for using the directive:
  <programlisting>
        #pragma TenDRA integer operator analysis <I>on</I>
  </programlisting>
  This includes checks for operations, such as division by a negative
  value, which are implementation dependent, and those such as testing
  whether an unsigned value is less than zero, which serve no purpose.
  Similarly the directive: 
  <programlisting>
        #pragma TenDRA++ pointer operator analysis <I>on</I> 
  </programlisting>
  checks for dubious pointer operations.  This includes very simple
  bounds checking for arrays and checking that only the simple literal
  <code>0</code>
  is used in null pointer constants: 
  <programlisting>
        char *p = 1 - 1 ;       // valid, but weird
  </programlisting>
  </para>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA integer overflow analysis <I>on</I>
  </programlisting>
  is used to control the treatment of overflows in the evaluation of
  integer constant expressions.  This includes the detection of division
  by zero. 
  </para>
  </sect3>  
  
  <sect3 id="initialiser-expressions">
    <title>2.2.37. Initialiser expressions</title>
  <para>
  C, but not C++, only allows constant expressions in static initialisers.
  The directive: 
  <programlisting>
        #pragma TenDRA variable initialization <I>allow</I>
  </programlisting>
  can be used to enable support for C++-style dynamic initialisers in C.
  Conversely, it can be used in C++ to detect such dynamic initialisers. 
  </para>
  <para>
  In older dialects of C it was not possible to initialise an automatic
  variable of structure or union type.  This can be checked for using
  the directive: 
  <programlisting>
        #pragma TenDRA initialization of struct/union (auto) <I>allow</I>
  </programlisting>
  </para>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA++ complete initialization analysis <I>on</I> 
  </programlisting>
  can be used to check aggregate initialisers.  The initialiser should
  be fully bracketed (i.e. with no elision of braces), and should have
  an entry for each member of the structure or array. 
  </para>
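  <para>
  The difference between a fully bracketed initialiser and one which
  relies on brace elision and implicit zero-initialisation can be
  sketched as follows.  The type and variable names are invented for
  the example.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

struct point { int x ; int y ; } ;
struct segment { point from ; point to ; } ;

// Fully bracketed, with an entry for every member.
static const segment full = { { 0, 1 }, { 2, 3 } } ;

// Valid, but the inner braces are elided and the last entry is missing;
// to.y is implicitly zero-initialised.
static const segment terse = { 0, 1, 2 } ;

void check_initialisers ()
{
    assert ( full.from.x == 0 && full.to.y == 3 ) ;
    assert ( terse.from.y == 1 && terse.to.x == 2 && terse.to.y == 0 ) ;
}
```
]]></programlisting>
  <para>
  Only the second initialiser is of the form that the check is intended
  to report.
  </para>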
  </sect3>  
  
  <sect3 id="lvalue">
    <title>2.2.38. Lvalue expressions</title>
  <para>
  C++ defines the results of several operations to be lvalues, whereas
  they are rvalues in C.  The directive: 
  <programlisting>
        #pragma TenDRA conditional lvalue <I>allow</I>
  </programlisting>
  is used to apply the C++ rules for lvalues in conditional (<code>?:</code>)
  expressions. 
  </para>
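  <para>
  A short sketch of a conditional expression used as an lvalue, which is
  well-formed under the C++ rules but not under the C rules.  The helper
  name is hypothetical.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Hypothetical helper: assigns through a conditional expression.
int store_ten ( bool first )
{
    int a = 1 ;
    int b = 2 ;
    ( first ? a : b ) = 10 ;    // an lvalue in C++, an rvalue in C
    return a * 100 + b ;
}

void check_store_ten ()
{
    assert ( store_ten ( true ) == 1002 ) ;     // a was assigned
    assert ( store_ten ( false ) == 110 ) ;     // b was assigned
}
```
]]></programlisting>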
  <para>
  Older dialects of C++ allowed <code>this</code> to be treated as an
  lvalue. It is possible to enable support for this dialect feature
  using the directive: 
  <programlisting>
        #pragma TenDRA++ this lvalue <I>allow</I> 
  </programlisting>
  however it is recommended that programs using this feature should
  be modified. 
  </para>
  </sect3>  
  
  <sect3 id="discard">
    <title>2.2.39. Discarded expressions</title>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA discard analysis <I>on</I>
  </programlisting>
  can be used to enable a check for values which are calculated but
  not used.  There are three checks controlled by this directive, each
  of which can be controlled independently.  The directive: 
  <programlisting>
        #pragma TenDRA discard analysis (function return) <I>on</I>
  </programlisting>
  checks for functions which return a value which is not used.  The
  check needs to be enabled for both the declaration and the call of
  the function in order for a discarded function return to be reported.
  Discarded returns for overloaded operator functions are never reported.
  The directive: 
  <programlisting>
        #pragma TenDRA discard analysis (value) <I>on</I>
  </programlisting>
  checks for other expressions which are not used.  Finally, the directive:
  <programlisting>
        #pragma TenDRA discard analysis (static) <I>on</I>
  </programlisting>
  checks for variables with internal linkage which are defined but not
  used. 
  </para>
  <para>
  An unused function return or other expression can be asserted to be
  deliberately discarded by explicitly casting it to <code>void</code>
  or, equivalently, preceding it by a keyword introduced using the directive:
  <programlisting>
        #pragma TenDRA keyword <I>identifier</I> for discard value
  </programlisting>
  A static variable can be asserted to be deliberately unused by including
  it in the list of identifiers in a directive of the form: 
  <programlisting>
        #pragma TenDRA suspend static <I>identifier-list</I>
  </programlisting>
  </para>
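  <para>
  One possible way of using a discard keyword in code which must also
  build with other compilers is sketched below.  It assumes that the
  macro <code>__TenDRA__</code> is predefined by the TenDRA producers;
  the keyword name <code>IGNORE</code> and the function names are
  illustrative only.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Assumption: __TenDRA__ is predefined by the TenDRA producers.
#ifdef __TenDRA__
#pragma TenDRA discard analysis on
#pragma TenDRA keyword IGNORE for discard value
#else
#define IGNORE ( void )         // a plain cast for other compilers
#endif

int bump ( int *counter )
{
    return ++*counter ;
}

int bump_twice ()
{
    int n = 0 ;
    IGNORE bump ( &n ) ;        // result discarded using the keyword
    ( void ) bump ( &n ) ;      // result discarded using an explicit cast
    return n ;
}

void check_bump_twice ()
{
    assert ( bump_twice () == 2 ) ;
}
```
]]></programlisting>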
  </sect3>  
  
  <sect3 id="if">
    <title>2.2.40. Conditional and iteration statements</title>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA const conditional <I>allow</I> 
  </programlisting>
  can be used to enable a check for constant expressions used in conditional
  contexts.  A literal constant is allowed in the condition of a
  <code>while</code>, <code>for</code> or <code>do</code> statement to allow for
  such common constructs as: 
  <programlisting>
        while ( true ) {
            // while statement body
        }
  </programlisting>
  and target dependent constant expressions are allowed in the condition
  of an <code>if</code> statement, but otherwise constant conditions
  are reported according to the status of this check. 
  </para>
  <para>
  The common error of writing <code>=</code> rather than <code>==</code>
  in conditions can be detected using the directive: 
  <programlisting>
        #pragma TenDRA assignment as bool <I>allow</I>
  </programlisting>
  which can be used to disallow such assignment expressions in contexts
  where a boolean is expected.  The error message can be suppressed
  by enclosing the assignment within parentheses. 
  </para>
  <para>
  Another common error associated with iteration statements, particularly
  with certain <A HREF="style.html">heretical</A> brace styles, is the
  accidental insertion of an extra semicolon as in: 
  <programlisting>
        for ( init ; cond ; step ) ;
        {
            // for statement body
        }
  </programlisting>
  The directive: 
  <programlisting>
        #pragma TenDRA extra ; after conditional <I>allow</I>
  </programlisting>
  can be used to enable a check for such suspicious empty iteration
  statement bodies (it actually checks for <code>;{</code>). 
  </para>
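  <para>
  The run-time effect of the two mistakes described above can be seen in
  the following sketch, in which the function names are hypothetical.
  Both functions are valid C++, which is why a separate check is useful.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

int body_runs_after_stray_semicolon ()
{
    int runs = 0 ;
    for ( int i = 0 ; i < 3 ; i++ ) ;   // the ';' is the entire loop body
    {
        runs++ ;                        // an ordinary block, executed once
    }
    return runs ;
}

bool assignment_used_as_test ( int x )
{
    if ( x = 0 ) return true ;          // assigns 0, so never taken
    return false ;
}

void check_conditions ()
{
    assert ( body_runs_after_stray_semicolon () == 1 ) ;
    assert ( !assignment_used_as_test ( 5 ) ) ;
}
```
]]></programlisting>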
  </sect3>  
  
  <sect3 id="switch">
    <title>2.2.41. Switch statements</title>
  <para>
  A <code>switch</code> statement is said to be exhaustive if its controlling
  expression is guaranteed to take one of the values of its 
  <code>case</code> labels, or if it has a <code>default</code> label.
  The TenDRA C and C++ producers allow a <code>switch</code> statement
  to be asserted to be exhaustive using the syntax: 
  <programlisting>
        switch ( cond ) EXHAUSTIVE {
            // switch statement body
        }
  </programlisting>
  where <code>EXHAUSTIVE</code> is either the directive: 
  <programlisting>
        #pragma TenDRA exhaustive
  </programlisting>
  or a keyword introduced using: 
  <programlisting>
        #pragma TenDRA keyword <I>identifier</I> for exhaustive
  </programlisting>
  Knowing whether a <code>switch</code> statement is exhaustive or not
  means that checks relying on flow analysis (including variable usage
  checks) can be applied more precisely. 
  </para>
  <para>
  In certain circumstances it is possible to deduce whether a 
  <code>switch</code> statement is exhaustive or not.  For example,
  the directive: 
  <programlisting>
        #pragma TenDRA enum switch analysis <I>on</I> 
  </programlisting>
  enables a check on <code>switch</code> statements on values of enumeration
  type.  Such statements should be exhaustive, either explicitly by
  using the <code>EXHAUSTIVE</code> keyword or declaring a 
  <code>default</code> label, or implicitly by having a <code>case</code>
  label for each enumerator.  Conversely, the value of each <code>case</code>
  label should equal the value of an enumerator.  For the purposes of
  this check, boolean values are treated as if they were declared using
  an enumeration type of the form: 
  <programlisting>
        enum bool { false = 0, true = 1 } ;
  </programlisting>
  </para>
  <para>
  A common source of errors in <code>switch</code> statements is the
  fall-through from one <code>case</code> or <code>default</code>
  statement to the next.  A check for this can be enabled using: 
  <programlisting>
        #pragma TenDRA fall into case <I>allow</I>
  </programlisting>
  <code>case</code> or <code>default</code> labels where fall-through
  from the previous statement is intentional can be marked by preceding
  them by a keyword, <code>FALL_THRU</code> say, introduced using the
  directive: 
  <programlisting>
        #pragma TenDRA keyword <I>identifier</I> for fall into case
  </programlisting>
  </para>
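  <para>
  The keywords described in this section might be combined as in the
  sketch below.  It assumes that <code>__TenDRA__</code> is predefined by
  the TenDRA producers, and defines the keywords as empty macros
  elsewhere; the enumeration and function are invented for the example.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Assumption: __TenDRA__ is predefined by the TenDRA producers.
#ifdef __TenDRA__
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
#pragma TenDRA keyword FALL_THRU for fall into case
#else
#define EXHAUSTIVE
#define FALL_THRU
#endif

enum colour { red, green, blue } ;

int weight ( colour c )
{
    int w = 0 ;
    switch ( c ) EXHAUSTIVE {
        case red :
            w += 4 ;
        FALL_THRU case green :      // fall-through from red is intended
            w += 2 ;
            break ;
        case blue :
            w += 1 ;
            break ;
    }
    return w ;
}

void check_weight ()
{
    assert ( weight ( red ) == 6 ) ;
    assert ( weight ( green ) == 2 ) ;
    assert ( weight ( blue ) == 1 ) ;
}
```
]]></programlisting>
  <para>
  Since there is a <code>case</code> label for each enumerator, this
  <code>switch</code> statement would also be implicitly exhaustive for
  the purposes of the enumeration check.
  </para>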
  </sect3>  
  
  <sect3 id="for">
    <title>2.2.42. For statements</title>
  <para>
  In ISO C++ the scope of a variable declared in a for-init-statement
  is the body of the <code>for</code> statement; in older dialects it
  extended to the end of the enclosing block.  So: 
  <programlisting>
        for ( int i = 0 ; i &lt; 10 ; i++ ) {
            // for statement body
        }
        return i ;      // OK in older dialects, error in ISO C++
  </programlisting>
  This behaviour is controlled by the directive: 
  <programlisting>
        #pragma TenDRA++ for initialization block <I>on</I> 
  </programlisting>
  a state of <code>on</code> corresponding to the ISO rules and 
  <code>off</code> to the older rules.  Perhaps most useful is the 
  <code>warning</code> state which implements the old rules but gives
  a warning if a variable declared in a for-init-statement is used outside
  the corresponding <code>for</code> statement body.  A program which
  does not give such warnings should compile correctly under either
  set of rules. 
  </para>
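  <para>
  A loop which needs its control variable after the loop has finished
  can be written so that it is accepted under either set of rules, by
  declaring the variable in the enclosing block.  A minimal sketch, with
  a hypothetical function name:
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

int first_multiple_of_seven ( int limit )
{
    int i ;                     // declared outside the for-init-statement
    for ( i = 1 ; i < limit ; i++ ) {
        if ( i % 7 == 0 ) break ;
    }
    return i ;                  // well-formed under both scoping rules
}

void check_first_multiple ()
{
    assert ( first_multiple_of_seven ( 100 ) == 7 ) ;
    assert ( first_multiple_of_seven ( 5 ) == 5 ) ;     // no multiple found
}
```
]]></programlisting>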
  </sect3>  
  
  <sect3 id="return">
    <title>2.2.43. Return statements</title>
  <para>
  In C, but not in C++, it is possible to have a <code>return</code>
  statement without an expression in a function which does not return
  <code>void</code>.  It is possible to enable this behaviour using
  the directive: 
  <programlisting>
        #pragma TenDRA incompatible void return <I>allow</I>
  </programlisting>
  Note that this check includes the implicit <code>return</code> caused
  by falling off the end of a function.  The effect of such a 
  <code>return</code> statement is undefined.  The C++ rule that falling
  off the end of <code>main</code> is equivalent to returning a value
  of 0 overrides this check. 
  </para>
  </sect3>  
  
  <sect3 id="reach">
    <title>2.2.44. Unreached code analysis</title>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA unreachable code <I>allow</I>
  </programlisting>
  enables a flow analysis check to detect unreachable code.  It is possible
  to assert that a statement is reached or not reached by preceding
  it by a keyword introduced by one of the directives: 
  <programlisting>
        #pragma TenDRA keyword <I>identifier</I> for set reachable
        #pragma TenDRA keyword <I>identifier</I> for set unreachable
  </programlisting>
  </para>
  <para>
  The fact that certain functions, such as <code>exit</code>, never return
  to their caller can be exploited in the flow analysis routines.  The
  equivalent directives: 
  <programlisting>
        #pragma TenDRA bottom <I>identifier</I>
        #pragma TenDRA++ type <I>identifier</I> for bottom
  </programlisting>
  can be used to introduce a <code>typedef</code> declaration for the
  type, bottom, returned by such functions.  The TenDRA API headers
  declare 
  <code>exit</code> and similar functions in this way, for example:
  <programlisting>
        #pragma TenDRA bottom __bottom
        __bottom exit ( int ) ;
        __bottom abort ( void ) ;
  </programlisting>
  The bottom type is compatible with <code>void</code> in function declarations
  to allow such functions to be redeclared in their conventional form.
  </para>
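  <para>
  A user-defined error routine might be given the bottom type in the
  same way.  The sketch below assumes that <code>__TenDRA__</code> is
  predefined by the TenDRA producers and falls back to <code>void</code>
  elsewhere; the names <code>never_returns</code>, <code>fail</code> and
  <code>checked_divide</code> are invented for the example.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Assumption: __TenDRA__ is predefined by the TenDRA producers.
#ifdef __TenDRA__
#pragma TenDRA bottom never_returns
#else
typedef void never_returns ;    // bottom is compatible with void
#endif

never_returns fail ( const char * ) ;

int checked_divide ( int num, int den )
{
    if ( den == 0 ) {
        fail ( "division by zero" ) ;   // control does not continue past here
    }
    return num / den ;
}

// Redeclared, and defined, in its conventional form.
void fail ( const char *msg )
{
    std::fputs ( msg, stderr ) ;
    std::abort () ;
}

void check_divide ()
{
    assert ( checked_divide ( 7, 2 ) == 3 ) ;
}
```
]]></programlisting>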
  </sect3>  
  
  <sect3 id="variable">
    <title>2.2.45. Variable flow analysis</title>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA variable analysis <I>on</I>
  </programlisting>
  enables checks on the uses of automatic variables and function parameters.
  These checks detect: 
  <itemizedlist>
  <listitem>If a variable is not used in its scope. 
  </listitem>
  <listitem>If the value of a variable is used before it has been assigned
  to. 
  </listitem>
  <listitem>If a variable is assigned to twice without an intervening use.
  </listitem>
  <listitem>If a variable is assigned to twice without an intervening sequence
  point. 
  </listitem>
  </itemizedlist>
  as illustrated by the variables <code>a</code>, <code>b</code>, 
  <code>c</code> and <code>d</code> respectively in: 
  <programlisting>
        void f ()
        {
            int a ;                     // a never used
            int b ;
            int c = b ;                 // b not initialised
            c = 0 ;                     // c assigned to twice
            int d = 0 ;
            d = ++d ;                   // d assigned to twice
        }
  </programlisting>
  The second, and more particularly the third, of these checks requires
  some fairly sophisticated flow analysis, so any hints which can be
  picked up from <A HREF="#switch">exhaustive <code>switch</code>
  statements</A> etc. are likely to increase the accuracy of the errors
  detected. 
  </para>
  <para>
  In a non-static member function the various non-static data members
  are analysed as if they were automatic variables.  It is checked that
  each member is initialised in a constructor.  A common source of initialisation
  problems in a constructor is that the base classes and members are
  initialised in the canonical order of virtual bases, non-virtual direct
  bases and members in the order of their declaration, rather than in
  the order in which their initialisers appear in the constructor definition.
  Therefore a check that the initialisers appear in the canonical order
  is also applied. 
  </para>
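  <para>
  The difference between the written order and the canonical order of
  initialisation can be observed directly.  In the following sketch the
  helper <code>note</code> and the other names are hypothetical; the
  helper simply records the order in which the initialisers are
  evaluated.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

static std::string order ;      // records the order of initialisation

static int note ( char tag, int value )
{
    order += tag ;
    return value ;
}

struct pair_of_ints
{
    int first ;
    int second ;

    // Written second-then-first, but executed first-then-second.
    pair_of_ints () : second ( note ( 's', 2 ) ), first ( note ( 'f', 1 ) ) {}
} ;

std::string initialisation_order ()
{
    order.clear () ;
    pair_of_ints p ;
    assert ( p.first == 1 && p.second == 2 ) ;
    return order ;
}
```
]]></programlisting>
  <para>
  Here <code>initialisation_order ()</code> returns <code>"fs"</code>,
  the declaration order of the members, rather than the order in which
  the initialisers are written.
  </para>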
  <para>
  It is possible to change the state of a variable during the variable
  analysis using the directives: 
  <programlisting>
        #pragma TenDRA set <I>expression</I>
        #pragma TenDRA discard <I>expression</I>
  </programlisting>
  The first asserts that the variable given by the <I>expression</I>
  has been assigned to; the second asserts that the variable is not
  used.  An alternative way of expressing this is by means of keywords:
  <programlisting>
        SET ( <I>expression</I> )
        DISCARD ( <I>expression</I> )
  </programlisting>
  introduced using the directives: 
  <programlisting>
        #pragma TenDRA keyword <I>identifier</I> for set
        #pragma TenDRA keyword <I>identifier</I> for discard variable
  </programlisting>
  respectively.  These expressions can appear in expression statements
  and as the first argument of a comma expression. 
  </para>
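  <para>
  A sketch of how these keywords might be used follows.  It assumes that
  <code>__TenDRA__</code> is predefined by the TenDRA producers and
  supplies harmless macro definitions elsewhere.  Whether the analysis
  needs the <code>SET</code> hint in this particular case is not
  specified here; the example only shows where such a hint would go.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Assumption: __TenDRA__ is predefined by the TenDRA producers.
#ifdef __TenDRA__
#pragma TenDRA keyword SET for set
#pragma TenDRA keyword DISCARD for discard variable
#else
#define SET( var )      ( ( void ) 0 )
#define DISCARD( var )  ( ( void ) ( var ) )
#endif

void fill ( int *out )
{
    *out = 42 ;
}

int read_filled ( int reserved )
{
    DISCARD ( reserved ) ;      // the parameter is deliberately unused
    int value ;
    fill ( &value ) ;
    SET ( value ) ;             // value was assigned through the pointer
    return value ;
}

void check_read_filled ()
{
    assert ( read_filled ( 0 ) == 42 ) ;
}
```
]]></programlisting>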
  <para>
  <IMG SRC="../images/warn.gif" ALT="warning"/>
  The variable flow analysis checks have not yet been completely implemented.
  They may not detect errors in certain circumstances and for extremely
  convoluted code may occasionally report spurious errors. 
  </para>
  </sect3>  
  
  <sect3 id="hide">
    <title>2.2.46. Variable hiding</title>
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA variable hiding analysis <I>on</I>
  </programlisting>
  can be used to enable a check for hiding of other variables and, in
  member functions, data members, by local variable declarations. 
  </para>
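  <para>
  The following sketch, using an invented class, shows why hiding a data
  member is worth reporting: the first member function silently updates
  a local variable and leaves the member unchanged.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

struct counter
{
    int total ;

    counter () : total ( 0 ) {}

    void add_hidden ( int n )
    {
        int total = n ;         // hides the data member counter::total
        total += n ;            // updates the local, not the member
    }

    void add ( int n )
    {
        total += n ;            // no local declaration, so the member is used
    }
} ;

void check_hiding ()
{
    counter a ;
    a.add_hidden ( 5 ) ;
    assert ( a.total == 0 ) ;   // the member was never touched

    counter b ;
    b.add ( 5 ) ;
    assert ( b.total == 5 ) ;
}
```
]]></programlisting>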
  </sect3>  
  
  <sect3 id="exception">
    <title>2.2.47. Exception analysis</title>
  <para>
  The ISO C++ rules do not require exception specifications to be checked
  statically.  This is to facilitate the integration of large systems
  where a single change in an exception specification could have ramifications
  throughout the system.  However it is often useful to apply such checks,
  which can be enabled using the directive: 
  <programlisting>
        #pragma TenDRA++ throw analysis <I>on</I>
  </programlisting>
  This detects any potentially uncaught exceptions and other exception
  problems.  In the error messages arising from this check, an uncaught
  exception of type <code>...</code> means that an uncaught exception
  of an unknown type (arising, for example, from a function without
  an exception specification) may be thrown.  For example: 
  <programlisting>
        void f ( int ) throw ( int ) ;
        void g ( int ) throw ( long ) ;
        void h ( int ) ;
  
        void e () throw ( int )
        {
            f ( 1 ) ;                   // OK
            g ( 2 ) ;                   // uncaught 'long' exception
            h ( 3 ) ;                   // uncaught '...' exception
        }
  </programlisting>
  </para>
  </sect3>  
  
  <sect3 id="template">
    <title>2.2.48. Template compilation</title>
  <para>
  The C++ producer makes the distinction between exported templates,
  which may be used in one module and defined in another, and non-exported
  templates, which must be defined in every module in which they are
  used. As in the ISO C++ standard, the <code>export</code> keyword
  is used to distinguish between the two cases.  In the past, different
  compilers have had different template compilation models; either all
  templates were exported or no templates were exported.  The latter
  is easily emulated - if the <code>export</code> keyword is not used
  then no templates will be exported.  To emulate the former behaviour
  the directive: 
  <programlisting>
        #pragma TenDRA++ implicit export template <I>on</I>
  </programlisting>
  can be used to treat all templates as if they had been declared using
  the <code>export</code> keyword. 
  </para>
  <para>
  <IMG SRC="../images/warn.gif" ALT="warning"/>
  The automatic instantiation of exported templates has not yet been
  implemented correctly.  It is intended that such instantiations will
  be generated during <A HREF="link.html">intermodule analysis</A>
  (where they conceptually belong).  At present it is necessary to work
  round this using explicit instantiations. 
  </para>
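  <para>
  The work-round mentioned above uses the standard explicit instantiation
  syntax.  As a minimal sketch, a module defining a template (here the
  hypothetical <code>twice</code>) can instantiate it explicitly for each
  type that other modules are expected to need:
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

template < class T >
T twice ( T value )
{
    return value + value ;
}

// Explicit instantiations for the types required elsewhere.
template int twice < int > ( int ) ;
template double twice < double > ( double ) ;

void check_twice ()
{
    assert ( twice ( 21 ) == 42 ) ;
    assert ( twice ( 1.5 ) == 3.0 ) ;
}
```
]]></programlisting>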
  </sect3>  
  
  <sect3 id="catch_all">
    <title>2.2.49. Other checks</title>
  <para>
  Several checks of varying utility have been implemented in the C++
  producer but do not as yet have individual directives controlling
  their use.  These can be enabled <I>en masse</I> using the directive:
  <programlisting>
        #pragma TenDRA++ catch all <I>allow</I> 
  </programlisting>
  It is intended that this directive will be phased out as these checks
  are assigned controlling directives.  It is possible to achieve finer
  control over these checks by enabling their individual error messages
  <A HREF="#low">as described above</A>. 
  </para>
  </sect3>
  </sect2>
  
  <sect2 id="token">
    <title>2.3. Token syntax</title>
  <para>
  The C and C++ producers allow place-holders for various categories
  of syntactic classes to be expressed using directives of the form:
  <programlisting>
        #pragma TenDRA token <I>token-spec</I>
  </programlisting>
  or simply: 
  <programlisting>
        #pragma token <I>token-spec</I>
  </programlisting>
  These place-holders are represented as TDF tokens and hence are called
  tokens.  Each token stands for a type, expression or other construct
  which is to be represented by a named TDF token in the producer
  output.  This mechanism is used, for example, to allow C API specifications
  to be represented target independently.  The types, functions and
  expressions comprising the API can be described using <code>#pragma
  token</code> directives and the target dependent definitions of these
  tokens, representing the implementation of the API on a particular
  machine, can be linked in later.  This mechanism is described in detail
  elsewhere. 
  </para>
  <para>
  A <A HREF="pragma1.html#token">summary of the grammar</A> for the
  <code>#pragma token</code> directives accepted by the C++ producer
  is given as an annex. 
  </para>
  
  
  <sect3 id="spec">
    <title>2.3.1. Token specifications</title>
  <para>
  A token specification is divided into two components, a 
  <I>token-introduction</I> giving the token sort, and a 
  <I>token-identification</I> giving the internal and external token
  names: 
  <programlisting>
        <I>token-spec</I> :
                <I>token-introduction token-identification</I>
  
        <I>token-introduction</I> :
                <I>exp-token</I>
                <I>statement-token</I>
                <I>type-token</I>
                <I>member-token</I>
                <I>procedure-token</I>
  
        <I>token-identification</I> :
                <I>token-namespace<SUB>opt</SUB> identifier</I> # <I>external-identifier<SUB>opt</SUB></I>
  
        <I>token-namespace</I> :
                TAG
  
        <I>external-identifier</I> :
                -
                <I>preproc-token-list</I>
  </programlisting>
  The <code>TAG</code> qualifier is used to indicate that the internal
  name lies in the C tag namespace.  This only makes sense for structure
  and union types.  The external token name can be given by any sequence
  of preprocessing tokens.  These tokens are not macro expanded.  If
  no external name is given then the internal name is used.  The special
  external name <code>-</code> is used to indicate that the token does
  not have an associated external name, and hence is local to the current
  translation unit.  Such a local token must be defined.  White space
  in the external name (other than at the start or end) is used to indicate
  that a TDF unique name should be used.  The white space serves as
  a separator for the unique name components. 
  </para>
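  <para>
  The forms of <I>token-identification</I> can be sketched as follows,
  using token sorts which are described in the remainder of this section.
  The directives are written directly from the grammar above and have
  not been checked against a producer; the external names are invented,
  and <code>__TenDRA__</code> is assumed to be predefined by the TenDRA
  producers.  Stand-in definitions are given for other compilers.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Assumption: __TenDRA__ is predefined by the TenDRA producers.
#ifdef __TenDRA__
// Internal name handle_t, external name example.handle_t.
#pragma token VARIETY handle_t # example.handle_t
// The internal name record lies in the tag namespace.
#pragma token STRUCT TAG record # example.record
// No external name is given, so the internal name no_handle is used.
#pragma token EXP rvalue : handle_t : no_handle #
#else
// Stand-in definitions so that the fragment also builds elsewhere.
typedef int handle_t ;
struct record ;
static const handle_t no_handle = -1 ;
#endif

bool is_open ( handle_t h )
{
    return h != no_handle ;
}

bool same_record ( const struct record *a, const struct record *b )
{
    return a == b ;
}

void check_tokens ()
{
    assert ( is_open ( 3 ) ) ;
    assert ( !is_open ( no_handle ) ) ;
    assert ( same_record ( 0, 0 ) ) ;
}
```
]]></programlisting>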
  
  <H4><A id="expression-tokens">Expression tokens</A></H4>
  <para>
  Expression tokens are specified as follows: 
  <programlisting>
        <I>exp-token</I> :
                EXP <I>exp-storage<SUB>opt</SUB></I> : <I>type-id</I> :
                NAT
                INTEGER
  </programlisting>
  representing an expression of the given type, a non-negative integer
  constant and a general integer constant, respectively.  Each expression
  has an associated storage class: 
  <programlisting>
        <I>exp-storage</I> :
                lvalue
                rvalue
                const
  </programlisting>
  indicating whether it is an lvalue, an rvalue or a compile-time constant
  expression.  An absent <I>exp-storage</I> is equivalent to 
  <code>rvalue</code>.  All expression tokens lie in the macro namespace;
  that is, they may potentially be defined as macros. 
  </para>
  <para>
  For backwards compatibility with the C producer, the directive:
  <programlisting>
        #pragma TenDRA++ rvalue token as const <I>allow</I>
  </programlisting>
  causes <code>rvalue</code> tokens to be treated as <code>const</code>
  tokens.</para>
  
  <H4>Statement tokens</H4>
  <para>
  Statement tokens are specified as follows: 
  <programlisting>
        <I>statement-token</I> :
                STATEMENT
  </programlisting>
  All statement tokens lie in the macro namespace. 
  </para>
  
  <H4>Type tokens</H4>
  <para>
  Type tokens are specified as follows: 
  <programlisting>
        <I>type-token</I> :
                TYPE
                VARIETY
                VARIETY signed
                VARIETY unsigned
                FLOAT
                ARITHMETIC
                SCALAR
                CLASS
                STRUCT
                UNION
  </programlisting>
  representing a generic type, an integral type, a signed integral type,
  an unsigned integral type, a floating point type, an arithmetic (integral
  or floating point) type, a scalar (arithmetic or pointer) type, a
  class type, a structure type and a union type respectively. 
  </para>
  <para>
  <IMG SRC="../images/warn.gif" ALT="warning"/>
  Floating-point, arithmetic and scalar token types have not yet been
  implemented correctly in either the C or C++ producers. 
  </para>
  
  <H4><A id="member">Member tokens</A></H4>
  <para>
  Member tokens are specified as follows: 
  <programlisting>
        <I>member-token</I> :
                MEMBER <I>access-specifier<SUB>opt</SUB> member-type-id</I> : <I>type-id</I> :
  </programlisting>
  where an <I>access-specifier</I> of <code>public</code> is assumed
  if none is given.  The member type is given by: 
  <programlisting>
        <I>member-type-id</I> :
                <I>type-id</I>
                <I>type-id</I> % <I>constant-expression</I>
  </programlisting>
  where <code>%</code> is used to denote bitfield members (since 
  <code>:</code> is used as a separator).  The second type denotes the
  structure or union the given member belongs to.  Different types can
  have members with the same internal name, but the external token name
  must be unique.  Note that only non-static data members can be represented
  in this form. 
  </para>
  <para>
  Two declarations for the same <code>MEMBER</code> token (including token
  definitions) should have the same type; however, the directive:
  <programlisting>
        #pragma TenDRA++ incompatible member declaration <I>allow</I>
  </programlisting>
  allows declarations with different types, provided these types have the
  same size and alignment requirements.
  </para>
  
  <H4>Procedure tokens</H4>
  <para>
  Procedure, or high-level, tokens are specified in one of three ways:
  <programlisting>
        <I>procedure-token</I> :
                <I>general-procedure</I>
                <I>simple-procedure</I>
                <I>function-procedure</I>
  </programlisting>
  All procedure tokens (except ellipsis functions - see below) lie in
  the macro namespace.  The most general form of procedure token specifies
  two sets of parameters.  The bound parameters are those which are
  used in encoding the actual TDF output, and the program parameters
  are those which are <A HREF="#args">specified in the program</A>.
  The program parameters are expressed in terms of the bound parameters.
  A program parameter can be an expression token parameter, a statement
  token parameter, a member token parameter, a procedure token parameter
  or any type.  The bound parameters are deduced from the program parameters
  by a similar process to that used in template argument deduction.
  <programlisting>
        <I>general-procedure</I> :
                PROC { <I>bound-toks<SUB>opt</SUB></I> | <I>prog-pars<SUB>opt</SUB></I> } <I>token-introduction</I>
  
        <I>bound-toks</I> :
                <I>bound-token</I>
                <I>bound-token</I> , <I>bound-toks</I>
  
        <I>bound-token</I> :
                <I>token-introduction token-namespace<SUB>opt</SUB> identifier</I>
  
        <I>prog-pars</I> :
                <I>program-parameter</I>
                <I>program-parameter</I> , <I>prog-pars</I>
  
        <I>program-parameter</I> :
                EXP <I>identifier</I>
                STATEMENT <I>identifier</I>
                TYPE <I>type-id</I>
                MEMBER <I>type-id</I> : <I>identifier</I>
                PROC <I>identifier</I>
  </programlisting>
  </para>
  <para>
  The simplest form of a <I>general-procedure</I> is one in which the
  <I>prog-pars</I> correspond precisely to the <I>bound-toks</I>.  In
  this case the syntax: 
  <programlisting>
        <I>simple-procedure</I> :
                PROC ( <I>simple-toks<SUB>opt</SUB></I> ) <I>token-introduction</I>
  
        <I>simple-toks</I> :
                <I>simple-token</I>
                <I>simple-token</I> , <I>simple-toks</I>
  
        <I>simple-token</I> :
                <I>token-introduction token-namespace<SUB>opt</SUB> identifier<SUB>opt</SUB></I>
  </programlisting>
  may be used.  Note that the parameter names are optional. 
  </para>
  <para>
  A function token is specified as follows: 
  <programlisting>
        <I>function-procedure</I> :
                FUNC <I>type-id</I> :
  </programlisting>
  where the given type is a function type.  This has two effects: firstly
  a function with the given type is declared; secondly, if the function
  type has the form: 
  <programlisting>
        r ( p1, ...., pn )
  </programlisting>
  a procedure token with sort: 
  <programlisting>
        PROC ( EXP rvalue : p1 :, ...., EXP rvalue : pn : ) EXP rvalue : r :
  </programlisting>
  is declared.  For ellipsis function types only the function, not the
  token, is declared.  Note that the token behaves like a macro definition
  of the corresponding function.  Unless explicitly enclosed in a linkage
  specification, a function declared using a <code>FUNC</code>
  token has C linkage.  Note that it is possible for two <code>FUNC</code>
  tokens to have the same internal name, because of function overloading,
  however external names must be unique. 
  </para>
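  <para>
  A <code>FUNC</code> token for a hypothetical function
  <code>triple</code> might be written as in the sketch below.  The
  directive follows the grammar above and has not been checked against a
  producer; <code>__TenDRA__</code> is assumed to be predefined by the
  TenDRA producers, and a stand-in definition with C linkage is supplied
  for other compilers.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Assumption: __TenDRA__ is predefined by the TenDRA producers.
#ifdef __TenDRA__
// Declares the function int triple ( int ) and a token of sort
//     PROC ( EXP rvalue : int : ) EXP rvalue : int :
#pragma token FUNC int ( int ) : triple #
#else
// Stand-in definition; a FUNC token function has C linkage by default.
extern "C" int triple ( int x )
{
    return 3 * x ;
}
#endif

int nine_times ( int x )
{
    return triple ( triple ( x ) ) ;
}

void check_nine_times ()
{
    assert ( nine_times ( 2 ) == 18 ) ;
}
```
]]></programlisting>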
  <para>
  The directive: 
  <programlisting>
        #pragma TenDRA incompatible interface declaration <I>allow</I>
  </programlisting>
  can be used to allow incompatible redeclarations of functions declared
  using <code>FUNC</code> tokens.  The token declaration takes precedence.
  </para>
  <para>
  <IMG SRC="../images/warn.gif" ALT="warning"/>
  Certain of the more complex examples of <code>PROC</code> tokens such
  as, for example, tokens with <code>PROC</code> parameters, have not
  been implemented in either the C or C++ producers. 
  </para>
  </sect3>
  
  <sect3 id="token-arguments">
    <title>2.3.2. Token arguments</title>
  <para>
  As mentioned above, the program parameters for a <code>PROC</code>
  token are those specified in the program itself.  These arguments
  are expressed as a comma-separated list enclosed in brackets, the
  form of each argument being determined by the corresponding program
  parameter. 
  </para>
  <para>
  An <code>EXP</code> argument is an assignment expression.  This must
  be an lvalue for <code>lvalue</code> tokens and a constant expression
  for 
  <code>const</code> tokens.  The argument is converted to the token
  type (for <code>lvalue</code> tokens this is essentially a conversion
  between the corresponding reference types).  A <code>NAT</code> or
  <code>INTEGER</code> argument is an integer constant expression. 
  In the former case this must be non-negative. 
  </para>
  <para>
  A <code>STATEMENT</code> argument is a statement.  This statement
  should not contain any labels or any <code>goto</code> or <code>return</code>
  statements. 
  </para>
  <para>
  A type argument is a type identifier.  This must name a type of the
  correct category for the corresponding token.  For example, a 
  <code>VARIETY</code> token requires an integral type. 
  </para>
  <para>
  <A id="offset">A member argument must describe the offset of a member
  or nested member of the given structure or union type</A>.  The type
  of the member should agree with that of the <code>MEMBER</code> token.
  The general form of a member offset can be described in terms of member
  selectors and array indexes as follows: 
  <programlisting>
        <I>member-offset</I> :
                ::<I><SUB>opt</SUB> id-expression</I>
                <I>member-offset</I> . ::<I><SUB>opt</SUB> id-expression</I>
                <I>member-offset</I> [ <I>constant-expression</I> ]
  </programlisting>
  </para>
  <para>
  A <code>PROC</code> argument is an identifier.  This identifier must
  name a <code>PROC</code> token of the appropriate sort. 
  </para>
  </sect3>  
  
  <sect3 id="tokdef">
    <title>2.3.3. Defining tokens</title>
  <para>
  Given a token specification of a syntactic object and a normal language
  definition of the same object (including macro definitions if the
  token lies in the macro namespace), the producers attempt to unify
  the two by defining the TDF token in terms of the given definition.
  Whether the token specification occurs before or after the language
  definition is immaterial.  Unification also takes place in situations
  where, for example, two types are known to be compatible.  Multiple
  consistent explicit token definitions are allowed by default when
  allowed by the language; this is controlled by the directive: 
  <programlisting>
        #pragma TenDRA compatible token <I>allow</I>
  </programlisting>
  The default unification behaviour may be modified using the directives:
  <programlisting>
        #pragma TenDRA no_def <I>token-list</I>
        #pragma TenDRA define <I>token-list</I>
        #pragma TenDRA reject <I>token-list</I>
  </programlisting>
  or equivalently: 
  <programlisting>
        #pragma no_def <I>token-list</I>
        #pragma define <I>token-list</I>
        #pragma ignore <I>token-list</I>
  </programlisting>
  which set the state of the tokens given in <I>token-list</I>.  A state
  of <code>no_def</code> means that no unification is attempted and
  that any attempt to explicitly define the token results in an error.
  A state of <code>define</code> means that unification takes place
  and that the token must be defined somewhere in the translation unit.
  A state of <code>reject</code> means that unification takes place as
  normal, but any resulting token definition is discarded and not output
  to the TDF capsule. 
  </para>
  <para>
  If a token with the state <code>define</code> is not defined, then the
  behaviour depends on the sort of the token.  A <code>FUNC</code> token
  is implicitly defined in terms of its underlying function, such as:
  <programlisting>
        #define f( a1, ...., an )       ( f ) ( a1, ...., an )
  </programlisting>
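  Other undefined tokens cause an error.  These two behaviours (the implicit
  definition of <code>FUNC</code> tokens, and the error for other undefined
  tokens) can be modified using the directives: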
  <programlisting>
        #pragma TenDRA++ implicit token definition <I>allow</I>
        #pragma TenDRA++ no token definition <I>allow</I>
  </programlisting>
  respectively.</para>
  <para>
  The primitive operations, <code>no_def</code>, <code>define</code> and
  <code>reject</code>, can also be expressed using the context-sensitive
  directive: 
  <programlisting>
        #pragma TenDRA interface <I>token-list</I>
  </programlisting>
  or equivalently: 
  <programlisting>
        #pragma interface <I>token-list</I>
  </programlisting>
  By default this is equivalent to <code>no_def</code>, but may be modified
  by inclusion using one of the directives: 
  <programlisting>
        #pragma TenDRA extend <I>header-name</I>
        #pragma TenDRA implement <I>header-name</I>
  </programlisting>
  or equivalently: 
  <programlisting>
        #pragma extend interface <I>header-name</I>
        #pragma implement interface <I>header-name</I>
  </programlisting>
  These are equivalent to: 
  <programlisting>
        #include <I>header-name</I>
  </programlisting>
  except that the form <code>[....]</code> is allowed as a header name.
  This is equivalent to <code>&lt;....&gt;</code> except that it starts
  the directory search after the point at which the including file was
  found, rather than at the start of the path (i.e. it is equivalent
  to the 
  <code>#include_next</code> directive found in some preprocessors).
  The effect of the <code>extend</code> directive on the state of the
  <code>interface</code> directive is as follows: 
  <programlisting>
        no_def -&gt; no_def
        define -&gt; reject
        reject -&gt; reject
  </programlisting>
  The effect of the <code>implement</code> directive is as follows:
  <programlisting>
        no_def -&gt; define
        define -&gt; define
        reject -&gt; reject
  </programlisting>
  That is to say, an <code>implement</code> directive will cause all
  the tokens in the given header to be defined and their definitions
  output. Any tokens included in this header by <code>extend</code>
  may be defined, but their definitions will not be output.  This is
  precisely the behaviour which is required to ensure that each token
  is defined exactly once in an API library build. 
  </para>
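  <para>
  The two transition tables can be read as a small state machine.  The
  following Python sketch is illustrative only and is not part of the
  producer: the name <code>interface_state</code> is hypothetical, and the
  sketch assumes that nested inclusions are applied outermost first, starting
  from the default <code>no_def</code> state.
  </para>
  <programlisting><![CDATA[
```python
# Transition tables copied from the text above.
EXTEND = {"no_def": "no_def", "define": "reject", "reject": "reject"}
IMPLEMENT = {"no_def": "define", "define": "define", "reject": "reject"}
TABLES = {"extend": EXTEND, "implement": IMPLEMENT}


def interface_state(inclusion_chain):
    """Return the state an 'interface' directive takes on.

    inclusion_chain lists the 'extend'/'implement' directives through which
    the current header was reached, outermost first (an assumption of this
    sketch).  An empty chain gives the default state.
    """
    state = "no_def"
    for directive in inclusion_chain:
        state = TABLES[directive][state]
    return state
```
]]></programlisting>
  <para>
  Under these assumptions a header reached through <code>implement</code>
  and then <code>extend</code> ends in the <code>reject</code> state, which
  matches the description above: the tokens may be defined, but their
  definitions are not output.
  </para>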
  <para>
  The lists of tokens in the directives above are expressed in the form:
  <programlisting>
        <I>token-list</I> :
                <I>token-id token-list<SUB>opt</SUB></I>
                # <I>preproc-token-list</I>
  </programlisting>
  where a <I>token-id</I> represents an internal token name: 
  <programlisting>
        <I>token-id</I> :
                <I>token-namespace<SUB>opt</SUB> identifier</I>
                <I>type-id</I> . <I>identifier</I>
  </programlisting>
  Note that member tokens are specified by means of both the member
  name and its parent type.  In this type specifier, <code>TAG</code>,
  rather than 
  <code>class</code>, <code>struct</code> or <code>union</code>, may
  be used in elaborated type specifiers for structure and union tokens.
  If the 
  <I>token-id</I> names an overloaded function then the directive is
  applied to all <code>FUNC</code> tokens of that name.  It is possible
  to be more selective using the <code>#</code> form, which allows the
  external token name to be specified.  Such an entry must be the last
  in a <I>token-list</I>. 
  </para>
  <para>
  A related directive has the form: 
  <programlisting>
        #pragma TenDRA++ undef token <I>token-list</I>
  </programlisting>
  which undefines all the given tokens so that they are no longer visible.
  </para>
  <para>
  As noted above, a macro is only considered as a token definition if
  the token lies in the macro namespace.  Tokens which are not in the
  macro namespace, such as types and members, cannot be defined using
  macros. Occasionally API implementations do define member selectors
  as macros in terms of other member selectors.  Such a token needs
  to be explicitly defined using a directive of the form: 
  <programlisting>
        #pragma TenDRA member definition <I>type-id</I> : <I>identifier member-offset</I>
  </programlisting>
  where <I>member-offset</I> is <A HREF="#offset">as above</A>. 
  </para>
  </sect3>
  </sect2>

  <sect2>
    <title>2.4. Symbol table dump</title>
  <para>
  The symbol table dump provides a method whereby third party tools
  can interface with the C and C++ producers.  The producer outputs
  information on the identifiers declared within a source file, their
  uses etc. into a file which can then be post-processed by a separate
  tool. Any error messages and warnings can also be included in this
  file, allowing more sophisticated error presentation tools to be written.
  </para>
  <para>
  The file to be used as the symbol table output file, plus details
  of what information is to be included in the dump file can be specified
  using the <A HREF="man.html#dump"><code>-d</code> command-line option</A>.
  The format of the dump file is described below; a 
  <A HREF="dump1.html">summary of the syntax</A> is given as an annex.
  </para>
  
  
  <sect3 id="lexical-elements">
    <title>2.4.1. Lexical elements</title>
  <para>
  A symbol table dump file consists of a sequence of characters giving
  information on identifiers, errors etc. arising from a translation
  unit. The fundamental lexical tokens are a <I>number</I>, consisting
  of a sequence of decimal digits, and a <I>string</I>, consisting of
  a sequence of characters enclosed in angle brackets.  A <I>string</I>
  can have one of two forms: 
  <programlisting>
        <I>string</I> :
                &lt;<I>characters</I>&gt;
                &amp;<I>number</I>&lt;<I>characters</I>&gt;
  </programlisting>
  In the first form, the <I>characters</I> are terminated by the first
  <code>&gt;</code> character encountered.  In the second form, the
  number of characters is given by the preceding <I>number</I>.  No
  white space is allowed either before or after the <I>number</I>. 
  To aid parsers, the C++ producer always uses the second form for strings
  containing more than 100 characters.  There are no escape characters
  in strings; the 
  <I>characters</I> can contain any characters, including newlines and
  <code>#</code>, except that the first form cannot contain a 
  <code>&gt;</code> character. 
  </para>
  <para>
  Space, tab and newline characters are white space.  Comments begin
  with 
  <code>#</code> and run to the end of the line.  Comments are treated
  as white space.  All other characters are treated as distinct lexical
  tokens. 
  </para>
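  <para>
  The lexical rules above are simple enough to be handled by a hand-written
  scanner.  The following Python sketch shows one possible reading of them;
  the function name <code>tokenize</code> and the
  <code>(kind, value)</code> token shape are assumptions made for
  illustration and are not defined by the dump format.
  </para>
  <programlisting><![CDATA[
```python
DIGITS = "0123456789"


def tokenize(text):
    """Split dump-file text into (kind, value) pairs.

    kind is "number", "string" or "char"; this tuple shape is an
    assumption of the sketch, not something the dump format defines.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c in " \t\n":
            i += 1                                # white space
        elif c == "#":
            while i < n and text[i] != "\n":      # comment runs to end of line
                i += 1
        elif c == "&" and i + 1 < n and text[i + 1] in DIGITS:
            # Second string form: the length is given explicitly.
            j = i + 1
            while j < n and text[j] in DIGITS:
                j += 1
            count = int(text[i + 1:j])
            start, end = j + 1, j + 1 + count
            if j >= n or text[j] != "<" or end >= n or text[end] != ">":
                raise ValueError("malformed counted string at offset %d" % i)
            tokens.append(("string", text[start:end]))
            i = end + 1
        elif c == "<":
            # First string form: terminated by the first '>' encountered.
            end = text.find(">", i + 1)
            if end < 0:
                raise ValueError("unterminated string at offset %d" % i)
            tokens.append(("string", text[i + 1:end]))
            i = end + 1
        elif c in DIGITS:
            j = i
            while j < n and text[j] in DIGITS:
                j += 1
            tokens.append(("number", int(text[i:j])))
            i = j
        else:
            tokens.append(("char", c))            # any other character
            i += 1
    return tokens
```
]]></programlisting>
  <para>
  Strings are recognised before comments, so a <code>#</code> inside a
  string is kept as part of the string rather than starting a comment.
  For example, the input <code>V 1 1 &lt;C++&gt;</code> yields one
  <code>char</code> token, two <code>number</code> tokens and one
  <code>string</code> token.
  </para>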
  </sect3>  
  
  <sect3 id="main">
    <title>2.4.2. Overall syntax</title>
  <para>
  A symbol table dump file takes the form of a list of commands of various
  kinds conveying information on the analysed file.  This can be represented
  as follows: 
  <programlisting>
        <I>dump-file</I> :
                <I>command-list<SUB>opt</SUB></I>
  
        <I>command-list</I> :
                <I>command command-list<SUB>opt</SUB></I>
  
        <I>command</I> :
                <I>version-command</I>
                <I>identifier-command</I>
                <I>scope-command</I>
                <I>override-command</I>
                <I>base-command</I>
                <I>api-command</I>
                <I>template-command</I>
                <I>promotion-command</I>
                <I>error-command</I>
                <I>path-command</I>
                <I>file-command</I>
                <I>include-command</I>
                <I>string-command</I>
  </programlisting>
  The various kinds of command are discussed below.  The first command
  in the dump file should be of the form: 
  <programlisting>
        <I>version-command</I> :
                V <I>number number string</I>
  </programlisting>
  where the two numbers give the version of the dump file format (the
  version described here is 1.1 so both numbers should be 1) and the
  string gives the language being represented, for example, 
  <code>&lt;C++&gt;</code>. 
  </para>
  </sect3>  
  
  <sect3 id="file-locations">
    <title>2.4.3. File locations</title>
  <para>
  A location within a source file can be specified using three 
  <I>number</I>s and two <I>string</I>s.  These give, respectively, the
  column number, the line number taking <code>#line</code> directives
  into account, the line number not taking <code>#line</code> directives
  into account, the file name taking <code>#line</code> directives into
  account, and the file name not taking <code>#line</code> directives
  into account.  Any or all of the trailing elements can be replaced
  by 
  <code>*</code> to indicate that they have not changed relative to
  the last <I>location</I> given.  Note that for the two line numbers,
  unchanged means that the difference of the line numbers, taking 
  <code>#line</code> directives into account or not, is unchanged. 
  Thus: 
  <programlisting>
        <I>location</I> :
                <I>number number number string string</I>
                <I>number number number string</I> *
                <I>number number number</I> *
                <I>number number</I> *
                <I>number</I> *
                *
  </programlisting>
  Note that there is a concept of the <A id="crt_loc">current file
  location</A>, relative to which other locations are given.  The initial
  value of the current file location is undefined.  Unless otherwise
  stated, all <I>location</I> elements update the current file location.
  </para>
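  <para>
  A reader of the dump file has to expand each abbreviated
  <I>location</I> against the current file location.  The following Python
  sketch shows one way this might be done.  The names
  <code>Location</code>, <code>resolve</code> and <code>STAR</code> are
  hypothetical, the input is assumed to be already lexed, and the handling
  of an elided second line number (keeping the difference between the two
  line numbers) is an interpretation of the note above.
  </para>
  <programlisting><![CDATA[
```python
from typing import NamedTuple

STAR = object()   # stands for the '*' token; a hypothetical sentinel


class Location(NamedTuple):
    column: int
    line: int        # taking #line directives into account
    raw_line: int    # not taking #line directives into account
    file: str        # taking #line directives into account
    raw_file: str    # not taking #line directives into account


def resolve(fields, current):
    """Expand one lexed location against the current file location.

    fields holds up to five values, optionally ending in STAR for the
    elided trailing elements.  current may be None before the first
    complete location has been seen.
    """
    elided = len(fields) > 0 and fields[-1] is STAR
    given = list(fields[:-1]) if elided else list(fields)
    if elided:
        if len(given) > 4:
            raise ValueError("too many elements before '*'")
        if current is None:
            raise ValueError("'*' used before any complete location")
    elif len(given) != 5:
        raise ValueError("a location without '*' needs five elements")
    if len(given) == 5:
        return Location(*given)
    column = given[0] if len(given) > 0 else current.column
    line = given[1] if len(given) > 1 else current.line
    if len(given) > 2:
        raw_line = given[2]
    else:
        # "Unchanged" means the difference between the two line numbers is kept.
        raw_line = line - (current.line - current.raw_line)
    file = given[3] if len(given) > 3 else current.file
    return Location(column, line, raw_line, file, current.raw_file)
```
]]></programlisting>
  <para>
  The caller is expected to store the returned value as the new current
  file location, except for those <I>location</I> elements which are
  stated not to update it.
  </para>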
  </sect3>  
  
  <sect3 id="identifiers">
    <title>2.4.4. Identifiers</title>
  <para>
  Each identifier is represented in the symbol table dump by a unique
  number.  The same number always represents the same identifier. 
  </para>
  
  <H4><A id="hashid">Identifier names</A></H4>
  <para>
  The number representing an identifier is introduced in the first declaration
  or use of that identifier and thereafter the number alone is used
  to denote the identifier: 
  <programlisting>
        <I>identifier</I> :
                <I>number</I> = <I>identifier-name access<SUB>opt</SUB> scope-identifier</I>
                <I>number</I>
  </programlisting>
  </para>
  <para>
  The identifier name is given by: 
  <programlisting>
        <I>identifier-name</I> :
                <I>string</I>
                C <I>type</I>
                D <I>type</I>
                O <I>string</I>
                T <I>type</I>
  </programlisting>
  denoting, respectively, a simple identifier name, a constructor for
  a type, a destructor for a type, an overloaded operator function name,
  and a conversion function name.  The empty string is used for anonymous
  identifiers. 
  </para>
  <para>
  The optional identifier access is given by: 
  <programlisting>
        <I>access</I> :
                N
                B
                P
  </programlisting>
  denoting <code>public</code>, <code>protected</code> and 
  <code>private</code> respectively.  An absent <I>access</I> is equivalent
  to <code>public</code>.  Note that all identifiers, not just class
  members, can have access specifiers; however the access of a non-member
  is always <code>public</code>. 
  </para>
  <para>
  The <A HREF="#scope">scope</A> (i.e. class, namespace, block etc.)
  in which an identifier is declared is given by: 
  <programlisting>
        <I>scope-identifier</I> :
                <I>identifier</I>
                *
  </programlisting>
  denoting either a named or an unnamed scope. 
  </para>
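  <para>
  Because each identifier is introduced once and then referred to by number,
  a tool reading the dump needs a table from numbers to names and parent
  scopes.  The following Python sketch is one minimal way to keep such a
  table.  The class <code>IdentifierTable</code> and its methods are
  hypothetical; the sketch handles only simple <I>string</I> names and
  treats a <code>*</code> scope as the end of the parent chain, which is an
  assumption.
  </para>
  <programlisting><![CDATA[
```python
class IdentifierTable:
    """Maps identifier numbers to (name, access, parent scope number)."""

    def __init__(self):
        self._entries = {}

    def introduce(self, number, name, scope=None, access="N"):
        """Record 'number = name access scope'.

        scope is the number of the parent scope identifier, or None for
        '*'.  access defaults to "N" because an absent access is public.
        """
        if number in self._entries:
            raise ValueError("identifier %d introduced twice" % number)
        self._entries[number] = (name, access, scope)

    def qualified_name(self, number):
        """Join the names along the parent chain with '::'."""
        parts = []
        while number is not None:
            name, _access, number = self._entries[number]
            parts.append(name or "(anonymous)")   # empty string: anonymous
        return "::".join(reversed(parts))
```
]]></programlisting>
  <para>
  The qualified name is purely a presentation aid here; the dump format
  itself identifies an identifier only by its number.
  </para>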
  
  <H4><A id="use">Identifier uses</A></H4>
  <para>
  Each declaration or use of an identifier is represented by a command
  of the form: 
  <programlisting>
        <I>identifier-command</I> :
                D <I>identifier-info type-info</I>
                M <I>identifier-info type-info</I>
                T <I>identifier-info type-info</I>
                Q <I>identifier-info</I>
                U <I>identifier-info</I>
                L <I>identifier-info</I>
                C <I>identifier-info</I>
                W <I>identifier-info type-info</I>
  </programlisting>
  where: 
  <programlisting>
        <I>identifier-info</I> :
                <I>identifier-key location identifier</I>
  </programlisting>
  gives the kind of identifier being declared or used, the location
  of the declaration or use, and the number associated with the identifier.
  Each declaration may, depending on the <I>identifier-key</I>, associate
  various <I>type-info</I> with the identifier, giving its type etc.
  </para>
  <para>
  The various kinds of <I>identifier-command</I> are described below.
  Any can be preceded by <code>I</code> to indicate an implicit declaration
  or use.  <code>D</code> denotes a definition.  <code>M</code> (make)
  denotes a declaration.  <code>T</code> denotes a tentative definition
  (C only).  <code>Q</code> denotes the end of a definition, for those
  identifiers such as classes and functions whose definitions may be
  spread over several lines.  <code>U</code> denotes an undefine operation
  (such as <code>#undef</code> for macro identifiers).  <code>C</code>
  denotes a call to a function identifier; <code>L</code> (load) denotes
  other identifier uses.  Finally <code>W</code> denotes implicit type
  information such as the C producer gleans from its 
  <A HREF="pragma.html#weak">weak prototype analysis</A>. 
  </para>
  <para>
  The various <I>identifier-key</I>s and their associated <I>type-info</I>
  fields are given by the following table: 
  </para>
  
  <table>
  <tr><th>Key</th>
  <th>Type information</th>
  <th>Description</th>
  </tr>
  <tr><td><code>K</code></td>
  <td><code>*</code></td>
  <td>keyword</td>
  </tr>
  <tr><td><code>MO</code></td>
  <td><I>sort</I></td>
  <td>object macro</td>
  </tr>
  <tr><td><code>MF</code></td>
  <td><I>sort</I></td>
  <td>function macro</td>
  </tr>
  <tr><td><code>MB</code></td>
  <td><I>sort</I></td>
  <td>built-in macro</td>
  </tr>
  <tr><td><code>TC</code></td>
  <td><I>type</I></td>
  <td>class tag</td>
  </tr>
  <tr><td><code>TS</code></td>
  <td><I>type</I></td>
  <td>structure tag</td>
  </tr>
  <tr><td><code>TU</code></td>
  <td><I>type</I></td>
  <td>union tag</td>
  </tr>
  <tr><td><code>TE</code></td>
  <td><I>type</I></td>
  <td>enumeration tag</td>
  </tr>
  <tr><td><code>TA</code></td>
  <td><I>type</I></td>
  <td><code>typedef</code> name</td>
  </tr>
  <tr><td><code>NN</code></td>
  <td><code>*</code></td>
  <td>namespace name</td>
  </tr>
  <tr><td><code>NA</code></td>
  <td><I>scope-identifier</I></td>
  <td>namespace alias</td>
  </tr>
  <tr><td><code>VA</code></td>
  <td><I>type</I></td>
  <td>automatic variable</td>
  </tr>
  <tr><td><code>VP</code></td>
  <td><I>type</I></td>
  <td>function parameter</td>
  </tr>
  <tr><td><code>VE</code></td>
  <td><I>type</I></td>
  <td><code>extern</code> variable</td>
  </tr>
  <tr><td><code>VS</code></td>
  <td><I>type</I></td>
  <td><code>static</code> variable</td>
  </tr>
  <tr><td><code>FE</code></td>
  <td><I>type identifier<SUB>opt</SUB></I></td>
  <td><code>extern</code> function</td>
  </tr>
  <tr><td><code>FS</code></td>
  <td><I>type identifier<SUB>opt</SUB></I></td>
  <td><code>static</code> function</td>
  </tr>
  <tr><td><code>FB</code></td>
  <td><I>type identifier<SUB>opt</SUB></I></td>
  <td>built-in operator function</td>
  </tr>
  <tr><td><code>CF</code></td>
  <td><I>type identifier<SUB>opt</SUB></I></td>
  <td>member function</td>
  </tr>
  <tr><td><code>CS</code></td>
  <td><I>type identifier<SUB>opt</SUB></I></td>
  <td><code>static</code> member function</td>
  </tr>
  <tr><td><code>CV</code></td>
  <td><I>type identifier<SUB>opt</SUB></I></td>
  <td>virtual member function</td>
  </tr>
  <tr><td><code>CM</code></td>
  <td><I>type</I></td>
  <td>data member</td>
  </tr>
  <tr><td><code>CD</code></td>
  <td><I>type</I></td>
  <td><code>static</code> data member</td>
  </tr>
  <tr><td><code>E</code></td>
  <td><I>type</I></td>
  <td>enumerator</td>
  </tr>
  <tr><td><code>L</code></td>
  <td><code>*</code></td>
  <td>label</td>
  </tr>
  <tr><td><code>XO</code></td>
  <td><I>sort</I></td>
  <td>object token</td>
  </tr>
  <tr><td><code>XF</code></td>
  <td><I>sort</I></td>
  <td>procedure token</td>
  </tr>
  <tr><td><code>XP</code></td>
  <td><I>sort</I></td>
  <td>token parameter</td>
  </tr>
  <tr><td><code>XT</code></td>
  <td><I>sort</I></td>
  <td>template parameter</td>
  </tr>
  </table>
  
  <para>
  The function identifier keys can optionally be followed by 
  <code>C</code> indicating that the function has C linkage, and 
  <code>I</code> indicating that the function is inline.  By default,
  functions declared in a C++ dump file have C++ linkage and functions
  declared in a C dump file have C linkage.  The optional 
  <I>identifier</I> which forms part of the <I>type-info</I> of these
  functions is used to form linked lists of overloaded functions. 
  </para>
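  <para>
  The optional linkage and inline markers can be separated from the function
  identifier key itself.  The following Python sketch is illustrative; the
  function name <code>split_function_key</code> is hypothetical, and it
  assumes that the markers are written directly after the two-letter key,
  with <code>C</code> before <code>I</code> when both are present.
  </para>
  <programlisting><![CDATA[
```python
# Function identifier keys, taken from the table above.
FUNCTION_KEYS = {
    "FE": "extern function",
    "FS": "static function",
    "FB": "built-in operator function",
    "CF": "member function",
    "CS": "static member function",
    "CV": "virtual member function",
}


def split_function_key(key, dump_language="C++"):
    """Return (description, has_c_linkage, is_inline) for a function key.

    Assumes the optional markers follow the key as "", "C", "I" or "CI".
    dump_language is the language string from the version command.
    """
    base, flags = key[:2], key[2:]
    if base not in FUNCTION_KEYS:
        raise ValueError("not a function identifier key: %r" % key)
    if flags not in ("", "C", "I", "CI"):
        raise ValueError("unexpected markers: %r" % flags)
    # Functions in a C dump file have C linkage by default.
    c_linkage = "C" in flags or dump_language == "C"
    return FUNCTION_KEYS[base], c_linkage, "I" in flags
```
]]></programlisting>
  <para>
  Under these assumptions <code>FECI</code> describes an inline
  <code>extern</code> function with C linkage.
  </para>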
  
  <H4><A id="scope">Identifier scopes</A></H4>
  <para>
  Each identifier belongs to a scope, called its parent scope, in which
  it is declared.  For example, the parent of a member of a class is
  the class itself.  This information is expressed in an identifier
  declaration using a <I>scope-identifier</I>.  In addition to the obvious
  scopes such as classes and namespaces, there are other scopes such
  as blocks in function definitions.  It is possible to introduce dummy
  identifiers to name such scopes.  The parent of such a dummy identifier
  will be the enclosing scope identifier, so these dummy identifiers
  naturally represent the block structure.  The parent of the top-level
  block in a function definition can be considered to be the function
  itself. 
  </para>
  <para>
  Information on the start and end of such scopes is given by: 
  <programlisting>
        <I>scope-command</I> :
                SS <I>scope-key location identifier</I>
                SE <I>scope-key location identifier</I>
  </programlisting>
  where: 
  <programlisting>
        <I>scope-key</I> :
                N
                S
                B
                D
                H
                CT
                CF
                CC
  </programlisting>
  gives the kind of scope involved: a namespace, a class, a block, some
  other declarative scope, a declaration block (see below), a true conditional
  scope, a false conditional scope or a target dependent conditional
  scope. 
  </para>
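  <para>
  The <code>SS</code> and <code>SE</code> commands can be followed with a
  simple stack.  The following Python sketch is illustrative only: the
  function name <code>scope_depths</code> is hypothetical, the
  <I>location</I> field is omitted, and the sketch assumes that scope
  commands are properly nested, which the text above does not state
  explicitly.
  </para>
  <programlisting><![CDATA[
```python
# Scope keys, in the order in which the text above describes them.
SCOPE_KINDS = {
    "N": "namespace",
    "S": "class",
    "B": "block",
    "D": "other declarative scope",
    "H": "declaration block",
    "CT": "true conditional scope",
    "CF": "false conditional scope",
    "CC": "target dependent conditional scope",
}


def scope_depths(commands):
    """Return (depth, kind, identifier) for each scope that is started.

    commands is a sequence of (op, scope_key, identifier) triples with op
    being "SS" or "SE".  Nesting is assumed to be balanced.
    """
    stack = []
    started = []
    for op, key, ident in commands:
        if op == "SS":
            started.append((len(stack), SCOPE_KINDS[key], ident))
            stack.append((key, ident))
        elif op == "SE":
            if not stack or stack[-1] != (key, ident):
                raise ValueError("scope end does not match the open scope")
            stack.pop()
        else:
            raise ValueError("not a scope command: %r" % op)
    if stack:
        raise ValueError("scope left open at end of input")
    return started
```
]]></programlisting>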
  <para>
  A declaration block is a sequence of declarations enclosed in directives
  of the form: 
  <programlisting>
        #pragma TenDRA declaration block <I>identifier</I> begin
        ....
        #pragma TenDRA declaration block end
  </programlisting>
  This allows the sequence of declarations to be associated with the
  given 
  <I>identifier</I> in the symbol dump file.  This technique is used
  in the API description files to aid analysis tools in determining
  which declarations are part of the API. 
  </para>
  
  <H4><A id="scope">Other identifier information</A></H4>
  <para>
  Other information associated with an identifier may be expressed using
  other dump commands.  For example: 
  <programlisting>
        <I>override-command</I> :
                O <I>identifier identifier</I>
  </programlisting>
  is used to express the fact that the two <I>identifier</I>s are virtual
  member functions, the first of which overrides the second. 
  </para>
  <para>
  The command: 
  <programlisting>
        <I>base-command</I> :
                B <I>identifier-key identifier base-graph</I>
  
        <I>base-graph</I> :
                <I>base-class</I>
                <I>base-class</I> ( <I>base-list</I> )
  
        <I>base-class</I> :
                <I>number</I> = V<I><SUB>opt</SUB> access<SUB>opt</SUB> type-name</I>
                <I>number</I> :
  
        <I>base-list</I> :
                <I>base-graph base-list<SUB>opt</SUB></I>
  
  </programlisting>
  associates a base class graph with a class identifier.  Any class
  which does not have an associated <I>base-command</I> can be assumed
  to have no base classes.  Each node in the graph is a <I>type-name</I>
  with an associated list of base classes.  A <code>V</code> is used
  to indicate a virtual base class.  Each node is numbered; duplicate
  numbers are used to indicate bases identified via the virtual base
  class structure.  Any base class can then be referred to as: 
  <programlisting>
        <I>base-number</I> :
                <I>number</I> : <I>type-name</I>
  </programlisting>
  indicating the base class with the given number in the given class.
  </para>
  <para>
  The command: 
  <programlisting>
        <I>api-command</I> :
                X <I>identifier-key identifier string</I>
  </programlisting>
  associates the external token name given by the <I>string</I> with
  the given tokenised identifier. 
  </para>
  <para>
  The command: 
  <programlisting>
        <I>template-command</I> :
                Z <I>identifier-key identifier token-application specialise-info</I>
  </programlisting>
  is used to introduce an identifier corresponding to an instance of
  a template, <I>token-application</I>.  This instance may correspond
  to a specialisation of the primary template; this information is represented
  by: 
  <programlisting>
        <I>specialise-info</I> :
                <I>identifier</I>
                <I>token-application</I>
                *
  </programlisting>
  where <code>*</code> indicates a non-specialised instance. 
  </para>
  </sect3>  
  
  <sect3 id="types">
    <title>2.4.5. Types</title>
  <para>
  The <A id="built-in">built-in types</A> are represented in the symbol
  table dump as follows: 
  </para>
  
  <table>
  <tr><th>Type</th>
  <th>Encoding</th>
  <th>Type</th>
  <th>Encoding</th>
  </tr>
  <tr><td>char</td>
  <td><code>c</code></td>
  <td>float</td>
  <td><code>f</code></td>
  </tr>
  <tr><td>signed char</td>
  <td><code>Sc</code></td>
  <td>double</td>
  <td><code>d</code></td>
  </tr>
  <tr><td>unsigned char</td>
  <td><code>Uc</code></td>
  <td>long double</td>
  <td><code>r</code></td>
  </tr>
  <tr><td>signed short</td>
  <td><code>s</code></td>
  <td>void</td>
  <td><code>v</code></td>
  </tr>
  <tr><td>unsigned short</td>
  <td><code>Us</code></td>
  <td>(bottom)</td>
  <td><code>u</code></td>
  </tr>
  <tr><td>signed int</td>
  <td><code>i</code></td>
  <td>bool</td>
  <td><code>b</code></td>
  </tr>
  <tr><td>unsigned int</td>
  <td><code>Ui</code></td>
  <td>ptrdiff_t</td>
  <td><code>y</code></td>
  </tr>
  <tr><td>signed long</td>
  <td><code>l</code></td>
  <td>size_t</td>
  <td><code>z</code></td>
  </tr>
  <tr><td>unsigned long</td>
  <td><code>Ul</code></td>
  <td>wchar_t</td>
  <td><code>w</code></td>
  </tr>
  <tr><td>signed long long</td>
  <td><code>x</code></td>
  <td>-</td>
  <td>-</td>
  </tr>
  <tr><td>unsigned long long</td>
  <td><code>Ux</code></td>
  <td>-</td>
  <td>-</td>
  </tr>
  </table>
  
  <para>
  Named types (classes, enumeration types etc.) can be represented by
  the corresponding identifier or token application: 
  <programlisting>
        <I>type-name</I> :
                <I>identifier</I>
                <I>token-application</I>
  </programlisting>
  <A id="composite">Composite and qualified types</A> are represented
  in terms of their subtypes as follows: 
  </para>
  
  <table>
  <tr><th>Type</th>
  <th>Encoding</th>
  </tr>
  <tr><td><code>const</code> type</td>
  <td><code>C</code> <I>type</I></td>
  </tr>
  <tr><td><code>volatile</code> type</td>
  <td><code>V</code> <I>type</I></td>
  </tr>
  <tr><td>pointer type</td>
  <td><code>P</code> <I>type</I></td>
  </tr>
  <tr><td>reference type</td>
  <td><code>R</code> <I>type</I></td>
  </tr>
  <tr><td>pointer to member type</td>
  <td><code>M</code> <I>type-name</I> <code>:</code> <I>type</I></td>
  </tr>
  <tr><td>function type</td>
  <td><code>F</code> <I>type parameter-types</I></td>
  </tr>
  <tr><td>array type</td>
  <td><code>A</code> <I>nat<SUB>opt</SUB></I> <code>:</code> <I>type</I></td>
  </tr>
  <tr><td>bitfield type</td>
  <td><code>B</code> <I>nat</I> <code>:</code> <I>type</I></td>
  </tr>
  <tr><td>template type</td>
  <td><code>t</code> <I>parameter-list<SUB>opt</SUB></I> <code>:</code> <I>type</I></td>
  </tr>
  <tr><td>promotion type</td>
  <td><code>p</code> <I>type</I></td>
  </tr>
  <tr><td>arithmetic type</td>
  <td><code>a</code> <I>type</I> <code>:</code> <I>type</I></td>
  </tr>
  <tr><td>integer literal type</td>
  <td><code>n</code> <I>lit-base<SUB>opt</SUB> lit-suffix<SUB>opt</SUB></I></td>
  </tr>
  <tr><td>weak function prototype (C only)</td>
  <td><code>W</code> <I>type parameter-types</I></td>
  </tr>
  <tr><td>weak parameter type (C only)</td>
  <td><code>q</code> <I>type</I></td>
  </tr>
  </table>
  
  <para>
  Other types can be represented by their textual representation using
  the form <code>Q</code> <I>string</I>, or by <code>*</code>, indicating
  an unknown type. 
  </para>
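  <para>
  The prefix encodings lend themselves to a recursive decoder.  The following
  Python sketch covers only the built-in types and the single-operand
  prefixes <code>C</code>, <code>V</code>, <code>P</code>, <code>R</code>
  and <code>p</code>; named types, function, array and the other composite
  forms are deliberately left out.  The function name
  <code>describe</code> is hypothetical, and the input is assumed to be
  written without white space, as in <code>pUs</code>.
  </para>
  <programlisting><![CDATA[
```python
# Built-in type encodings, taken from the first table in this section.
BUILTIN = {
    "c": "char", "Sc": "signed char", "Uc": "unsigned char",
    "s": "signed short", "Us": "unsigned short",
    "i": "signed int", "Ui": "unsigned int",
    "l": "signed long", "Ul": "unsigned long",
    "x": "signed long long", "Ux": "unsigned long long",
    "f": "float", "d": "double", "r": "long double",
    "v": "void", "u": "bottom", "b": "bool",
    "y": "ptrdiff_t", "z": "size_t", "w": "wchar_t",
}
# Try two-letter encodings before one-letter ones.
_BUILTIN_CODES = sorted(BUILTIN, key=len, reverse=True)

# Subset of the composite encodings that take a single type operand.
PREFIX = {
    "C": "const ", "V": "volatile ", "P": "pointer to ",
    "R": "reference to ", "p": "promotion of ",
}


def _decode(code, pos):
    if pos >= len(code):
        raise ValueError("truncated type encoding")
    if code[pos] in PREFIX:
        inner, end = _decode(code, pos + 1)
        return PREFIX[code[pos]] + inner, end
    for key in _BUILTIN_CODES:
        if code.startswith(key, pos):
            return BUILTIN[key], pos + len(key)
    raise ValueError("unsupported type encoding at offset %d" % pos)


def describe(code):
    """Return an English description of a supported type encoding."""
    text, end = _decode(code, 0)
    if end != len(code):
        raise ValueError("unexpected text after type encoding")
    return text
```
]]></programlisting>
  <para>
  The encodings are case-sensitive, which is why <code>C</code>
  (<code>const</code>) and <code>c</code> (<code>char</code>) can be told
  apart without lookahead.
  </para>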
  <para>
  The parameter types for a function type are represented as follows:
  <programlisting>
        <I>parameter-types</I> :
                : <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> :
                . <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> :
                . <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> .
                , <I>type parameter-types</I>
  </programlisting>
  where the <code>::</code> form indicates that there are no further
  parameters, the <code>.:</code> form indicates that the parameters
  are terminated by an ellipsis, and the <code>..</code> form indicates
  that no information is available on the further parameters (this can
  only happen with non-prototyped functions in C).  The function qualifiers
  are given by: 
  <programlisting>
        <I>func-qualifier</I> :
                C <I>func-qualifier<SUB>opt</SUB></I>
                V <I>func-qualifier<SUB>opt</SUB></I>
  </programlisting>
  representing <code>const</code> and <code>volatile</code> member functions.
  The function exception specifier is given by: 
  <programlisting>
        <I>exception-spec</I> :
                ( <I>exception-list<SUB>opt</SUB></I> )
  
        <I>exception-list</I> :
                <I>type</I>
                <I>type</I> , <I>exception-list</I>
  </programlisting>
  with an absent exception specifier, as in C++, indicating that any
  exception may be thrown. 
  </para>
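  <para>
  The following Python sketch assembles a function type encoding from
  already encoded subtypes, as a way of seeing how the three terminator
  forms fit together.  The function name <code>encode_function</code> and
  its parameters are hypothetical, and the result is written without white
  space, which is an assumption about layout rather than a requirement of
  the format.
  </para>
  <programlisting><![CDATA[
```python
# (opening, closing) punctuation for the three terminator forms.
TERMINATORS = {
    "none": (":", ":"),       # no further parameters
    "ellipsis": (".", ":"),   # parameters are terminated by an ellipsis
    "unknown": (".", "."),    # no information (non-prototyped C functions)
}


def encode_function(ret, params, tail="none", exceptions=None, qualifiers=""):
    """Build 'F type parameter-types' from encoded subtypes.

    ret and params are type encodings such as "i" or "PCc".  exceptions is
    None for an absent exception specifier, or a list of type encodings.
    qualifiers is a string made of "C" and "V".
    """
    if any(q not in "CV" for q in qualifiers):
        raise ValueError("qualifiers may only contain C and V")
    opener, closer = TERMINATORS[tail]
    spec = "" if exceptions is None else "(" + ",".join(exceptions) + ")"
    body = "".join("," + p for p in params)
    return "F" + ret + body + opener + spec + qualifiers + closer
```
]]></programlisting>
  <para>
  An empty list for <code>exceptions</code> produces <code>()</code>, an
  empty exception specifier, whereas <code>None</code> omits the specifier
  altogether so that any exception may be thrown.
  </para>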
  <para>
  Array and bitfield sizes are represented as follows: 
  <programlisting>
        <I>nat</I> :
                + <I>number</I>
                - <I>number</I>
                <I>identifier</I>
                <I>token-application</I>
                <I>string</I>
  </programlisting>
  where a <I>string</I> is used to hold a textual representation of
  complex values. 
  </para>
  <para>
  Template types are represented by a list of template parameters, which
  will have previously been declared using the <code>XT</code> identifier
  key, followed by the underlying type expressed in terms of these parameters.
  The parameters are represented as follows: 
  <programlisting>
        <I>parameter-list</I> :
                <I>identifier</I>
                <I>identifier</I> , <I>parameter-list</I>
  </programlisting>
  </para>
  <para>
  Integer literal types are represented by the value of the literal
  followed by a representation of the literal base and suffix.  These
  are given by: 
  <programlisting>
        <I>lit-base</I> :
                O
                X
  </programlisting>
  representing octal and hexadecimal literals respectively (decimal
  is the default), and: 
  <programlisting>
        <I>lit-suffix</I> :
                U
                l
                Ul
                x
                Ux
  </programlisting>
  representing the <code>U</code>, <code>L</code>, <code>UL</code>,
  <code>LL</code> and <code>ULL</code> suffixes respectively. 
  </para>
  <para>
  Target dependent integral promotion types are represented using 
  <code>p</code>, so for example the promotion of <code>unsigned short</code>
  is represented as <code>pUs</code>.  Information on the other cases,
  where the promotion type is known, can be given in a command of the
  form: 
  <programlisting>
        <I>promotion-command</I> :
                P <I>type</I> : <I>type</I>
  </programlisting>
  Thus the fact that the promotion of <code>short</code> is <code>int</code>
  would be expressed by the command <code>Ps:i</code>. 
  </para>
  </sect3>  
  
  <sect3 id="sort">
    <title>2.4.6. Sorts</title>
  <para>
  A <I>sort</I> in the symbol table dump corresponds to the sort of
  a token declared in the <A HREF="token.html#spec"><code>#pragma token</code>
  syntax</A>.  Expression tokens are represented as follows: 
  <programlisting>
        <I>expression-sort</I> :
                ZEL <I>type</I>
                ZER <I>type</I>
                ZEC <I>type</I>
                ZN
  </programlisting>
  corresponding to <code>lvalue</code>, <code>rvalue</code> and 
  <code>const</code> <code>EXP</code> tokens of the given type, and
  <code>NAT</code> or <code>INTEGER</code> tokens, respectively. Statement
  tokens are represented by: 
  <programlisting>
        <I>statement-sort</I> :
                ZS
  </programlisting>
  </para>
  <para>
  Type tokens are represented as follows: 
  <programlisting>
        <I>type-sort</I> :
                ZTO
                ZTI
                ZTF
                ZTA
                ZTP
                ZTS
                ZTU
  </programlisting>
  corresponding to <code>TYPE</code>, <code>VARIETY</code>, <code>FLOAT</code>,
  <code>ARITHMETIC</code>, <code>SCALAR</code>, <code>STRUCT</code> or
  <code>CLASS</code>, and <code>UNION</code> tokens respectively.  There
  are corresponding <code>TAG</code> forms: 
  <programlisting>
        <I>tag-type-sort</I> :
                ZTTS
                ZTTU
  </programlisting>
  </para>
  <para>
  Member tokens are represented using: 
  <programlisting>
        <I>member-sort</I> :
                ZM <I>type</I> : <I>type-name</I>
  </programlisting>
  where the first type gives the member type and the second gives the
  parent structure or union type. 
  </para>
  <para>
  Procedure tokens can be represented using: 
  <programlisting>
        <I>proc-sort</I> :
                ZPG <I>parameter-list<SUB>opt</SUB></I> ; <I>parameter-list<SUB>opt</SUB></I> : <I>sort</I>
                ZPS <I>parameter-list<SUB>opt</SUB></I> : <I>sort</I>
  </programlisting>
  The first form corresponds to the more general form of <code>PROC</code>
  token, that expressed using <code>{ .... | .... }</code>, which has
  separate lists of bound and program parameters.  These token parameters
  will have previously been declared using the <code>XP</code> identifier
  key.  The second form corresponds to the case where the bound and
  program parameter lists are equal, that expressed as a <code>PROC</code>
  token using <code>( .... )</code>.  A more specialised version of
  this second form is a <code>FUNC</code> token, which is represented
  as: 
  <programlisting>
        <I>func-sort</I> :
                ZF <I>type</I>
  </programlisting>
  </para>
  <para>
  As noted above, template parameters are represented by a <I>sort</I>.
  Template type parameters are represented by <code>ZTO</code>, while
  template expression parameters are represented by <code>ZEC</code>
  (recall that such parameters are always constant expressions).  The
  remaining case, template template parameters, can be represented as:
  <programlisting>
        <I>template-sort</I> :
                ZTt <I>parameter-list<SUB>opt</SUB></I> :
  </programlisting>
  </para>
  <para>
  Finally, the number of parameters in a macro definition is represented
  by a <I>sort</I> of the form: 
  <programlisting>
        <I>macro-sort</I> :
                ZUO
                ZUF <I>number</I>
  </programlisting>
  corresponding to an object-like macro and a function-like macro with
  the given number of parameters, respectively. 
  </para>
  </sect3>  
  
  <sect3 id="token-applications">
    <title>2.4.7. Token applications</title>
  <para>
  Given an identifier representing a <code>PROC</code> token or a template,
  an application of that token or an instance of that template can be
  represented using: 
  <programlisting>
        <I>token-application</I> :
                T <I>identifier</I> , <I>token-argument-list</I> :
  </programlisting>
  where the token or template arguments are given by: 
  <programlisting>
        <I>token-argument-list</I> :
                <I>token-argument</I>
                <I>token-argument</I> , <I>token-argument-list</I>
  </programlisting>
  Note that the case where there are no arguments is generally just
  represented by <I>identifier</I>; this case is specified separately
  in the rest of the grammar. 
  </para>
  <para>
  A <I>token-argument</I> can represent a value of any of the sorts
  listed above: expressions, integer constants, statements, types, members,
  functions and templates.  These are given respectively by: 
  <programlisting>
        <I>token-argument</I> :
                E <I>expression</I>
                N <I>nat</I>
                S <I>statement</I>
                T <I>type</I>
                M <I>member</I>
                F <I>identifier</I>
                C <I>identifier</I>
  </programlisting>
  where: 
  <programlisting>
        <I>expression</I> :
                <I>nat</I>
  
        <I>statement</I> :
                <I>expression</I>
  
        <I>member</I> :
                <I>identifier</I>
                <I>string</I>
  </programlisting>
  </para>
  </sect3>  
  
  <sect3 id="error">
    <title>2.4.8. Errors</title>
  <para>
  Each error in the C++ <A HREF="error.html">error catalogue</A> is
  represented by a number.  These numbers happen to correspond to the
  position of the error within the catalogue, but in general this need
  not be the case.  The first use of each error introduces the error
  number by associating it with a <I>string</I> giving the error name.
  This has the form <code>cpp.</code><I>error</I> where <I>error</I>
  gives an error name from the C++ (<code>cpp</code>) error catalogue.
  Thus: 
  <programlisting>
        <I>error-name</I> :
                <I>number</I> = <I>string</I>
                <I>number</I>
  </programlisting>
  </para>
  <para>
  Each error message written to the symbol table dump has the form:
  <programlisting>
        <I>error-command</I> :
                ES <I>location error-info</I>
                EW <I>location error-info</I>
                EI <I>location error-info</I>
                EF <I>location error-info</I>
                EC <I>error-info</I>
                EA <I>error-argument</I>
  </programlisting>
  denoting constraint errors, warnings, internal errors, fatal errors,
  continuation errors and error arguments respectively.  Note that an
  error message may consist of several components; the initial error
  plus a number of continuation errors.  Each error message may also
  have a number of error arguments associated with it.  This error information
  is given by: 
  <programlisting>
        <I>error-info</I> :
                <I>error-name number number</I>
  </programlisting>
  where the first <I>number</I> gives the number of error arguments
  which should be read, and the second is nonzero to indicate that a
  continuation error should be read. 
  </para>
  <para>
  Each error argument has one of the forms: 
  <programlisting>
        <I>error-argument</I> :
                B <I>base-number</I>
                C <I>scope-identifier</I>
                E <I>expression</I>
                H <I>identifier-name</I>
                I <I>identifier</I>
                L <I>location</I>
                N <I>nat</I>
                S <I>string</I>
                T <I>type</I>
                V <I>number</I>
                V - <I>number</I>
  </programlisting>
  corresponding to the various syntactic categories described above.
  Note that a <I>location</I> error argument, while expressed relative
  to the 
  <A HREF="#crt_loc">current file location</A>, does not change this
  location. 
  </para>
  </sect3>  
  
  <sect3 id="file">
    <title>2.4.9. File inclusions</title>
  <para>
  It is possible to include information on header files within the symbol
  table dump.  Firstly a number is associated with each directory on
  the <code>#include</code> search path: 
  <programlisting>
        <I>path-command</I> :
                FD <I>number</I> = <I>string string<SUB>opt</SUB></I>
  </programlisting>
  The first <I>string</I> gives the directory pathname; the second,
  if present, gives the associated directory name as specified in the
  <A HREF="man.html#directory"><code>-N</code> command-line option</A>.
  </para>
  <para>
  Now the start and end of each file are marked using: 
  <programlisting>
        <I>file-command</I> :
                FS <I>location directory</I>
                FE <I>location</I>
  </programlisting>
  where <I>directory</I> gives the number of the directory in the search
  path where the file was found, or <code>*</code> if the file was found
  by other means.  It is worth noting that if, for example, a function
  definition is the last item in a file, the <code>FE</code> command
  will appear in the symbol table dump before the <code>QFE</code> command
  for the end of the function definition.  This is because lexical analysis,
  where the end of file is detected, takes place before parsing, where
  the end of function is detected. 
  </para>
  <para>
  A <code>#include</code> directive, whether explicit or implicit, can
  be represented using: 
  <programlisting>
        <I>include-command</I> :
                FIA <I>location string</I>
                FIQ <I>location string</I>
                FIN <I>location string</I>
                FIS <I>location string</I>
                FIE <I>location string</I>
                FIR <I>location</I>
  </programlisting>
  the first three corresponding to header names of the forms 
  <code>&lt;....&gt;</code>, <code>&quot;....&quot;</code> and <code>[....]</code>
  respectively, the next two corresponding to
  <A HREF="man.html#start-up">start-up</A> and
  <A HREF="man.html#end-up">end-up</A> files, and the final form
  being used to resume the original file after the <code>#include</code>
  directive has been processed. 
  </para>
  </sect3>  
  
  <sect3 id="string-literals">
    <title>2.4.10. String literals</title>
  <para>
  It is possible to dump information on string literals to the symbol
  table dump file using the commands: 
  <programlisting>
        <I>string-command</I> :
                A <I>location string</I>
                AC <I>location string</I>
                AL <I>location string</I>
                ACL <I>location string</I>
  </programlisting>
  representing string literals, character literals, wide string literals
  and wide character literals respectively.  The <I>string</I> gives
  the text of the literal. 
  </para>
  </sect3>
  </sect2>
  
  <sect2>
    <title>2.5. Intermodule analysis</title>
  <para>
  <IMG SRC="../images/warn.gif" ALT="warning"/>
  The C++ spec linking routines have not yet been completely implemented,
  and so are disabled in the current version of the C++ producer. 
  </para>
  <para>
  A C++ spec file is a dump of the C++ producer's <A HREF="alg.html">internal
  representation</A> of a translation unit.  Such files can be written
  to, and read from, disk to perform such operations as intermodule
  analysis. 
  </para>
  <para>
  Note that the format of a C++ spec file is specific to the C++ producer
  and may change between releases to reflect modifications in the internal
  type system.  The C producer has a similar dump format, called a C
  spec file, but the two are incompatible.  If intermodule analysis
  between C and C++ source files is required then the <A HREF="dump.html">symbol
  table dump</A> format should be used. 
  </para>
  </sect2>
  
  <sect2>
    <title>2.6. Implementation details</title>
  <para>
  This section describes various implementation details of the
  C++ producer TDF output.  In particular it describes the standard
  TDF tokens used to represent the target dependent aspects of the language
  and to provide links into the run-time system.  Many of these tokens
  are common to the C and C++ producers.  Those which are unique to
  the C++ producer have names of the form <code>~cpp.*</code>.  Note
  that the description is in terms of TDF tokens, not the internal tokens
  introduced by the 
  <A HREF="token.html"><code>#pragma token</code> syntax</A>. 
  </para>
  <para>
  There are two levels of implementation in the run-time system.  The
  actual interface between the producer and the run-time system is given
  by the standard tokens.  The provided implementation defines these
  tokens in a way appropriate to itself.  An alternative implementation
  would have to define the tokens differently.  It is intended that
  the standard tokens are sufficiently generic to allow a variety of
  implementations to hook into the producer output in the manner they
  require. 
  </para>
  
  
  <sect3 id="arith">
    <title>2.6.1. Arithmetic types</title>
  <para>
  The representations of the basic arithmetic types are target dependent,
  so, for example, an <code>int</code> may contain 16, 32, 64 or some
  other number of bits.  Thus it is necessary to introduce a token to
  stand for each of the built-in arithmetic types (including the 
  <A HREF="pragma.html#longlong"><code>long long</code> types</A>).
  Each integral type is represented by a <code>VARIETY</code> token
  as follows: </para>
  
  <table>
  <tr><th>Type</th>
  <th>Token</th>
  <th>Encoding</th>
  </tr>
  <tr><td>char</td>
  <td>~char</td>
  <td>0</td>
  </tr>
  <tr><td>signed char</td>
  <td>~signed_char</td>
  <td>0 | 4 = 4</td>
  </tr>
  <tr><td>unsigned char</td>
  <td>~unsigned_char</td>
  <td>0 | 8 = 8</td>
  </tr>
  <tr><td>signed short</td>
  <td>~signed_short</td>
  <td>1 | 4 = 5</td>
  </tr>
  <tr><td>unsigned short</td>
  <td>~unsigned_short</td>
  <td>1 | 8 = 9</td>
  </tr>
  <tr><td>signed int</td>
  <td>~signed_int</td>
  <td>2 | 4 = 6</td>
  </tr>
  <tr><td>unsigned int</td>
  <td>~unsigned_int</td>
  <td>2 | 8 = 10</td>
  </tr>
  <tr><td>signed long</td>
  <td>~signed_long</td>
  <td>3 | 4 = 7</td>
  </tr>
  <tr><td>unsigned long</td>
  <td>~unsigned_long</td>
  <td>3 | 8 = 11</td>
  </tr>
  <tr><td>signed long long</td>
  <td>~signed_longlong</td>
  <td>3 | 4 | 16 = 23 </td>
  </tr>
  <tr><td>unsigned long long</td>
  <td>~unsigned_longlong</td>
  <td>3 | 8 | 16 = 27</td>
  </tr>
  </table>
  
  <para>
  Similarly each floating point type is represented by a 
  <code>FLOATING_VARIETY</code> token: 
  </para>
  
  <table>
  <tr><th>Type</th>   <th>Token</th>
  </tr>
  <tr><td>float</td>  <td>~float</td>
  </tr>
  <tr><td>double</td> <td>~double</td>
  </tr>
  <tr><td>long double</td> <td>~long_double</td>
  </tr>
  </table>
  
  <para>
  Each integral type also has an encoding as a <code>SIGNED_NAT</code>
  as shown above.  This number is a bit pattern built up from the following
  values: 
  </para>
  
  <table>
  <tr><th>Type</th>   <th>Encoding</th>
  </tr>
  <tr><td>char</td>  <td>0</td>
  </tr>
  <tr><td>short</td>  <td>1</td>
  </tr>
  <tr><td>int</td>  <td>2</td>
  </tr>
  <tr><td>long</td>  <td>3</td>
  </tr>
  <tr><td>signed</td> <td>4</td>
  </tr>
  <tr><td>unsigned</td> <td>8</td>
  </tr>
  <tr><td>long long</td> <td>16</td>
  </tr>
  </table>
  
  <para>
  Any target dependent integral type can be represented by a 
  <code>SIGNED_NAT</code> token using this encoding.  This representation,
  rather than one based on <code>VARIETY</code>s, is used for ease of
  manipulation.  The token: 
  <programlisting>
        ~convert : ( SIGNED_NAT ) -&gt; VARIETY
  </programlisting>
  gives the mapping from the integral encoding to the representing variety.
  For example, it will map <code>6</code> to <code>~signed_int</code>.
  </para>
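  <para>
  The bit pattern can be illustrated with a small C++ sketch.  The constant
  and function names below are hypothetical and are not part of the producer;
  only the numeric values come from the tables above.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Component values taken from the encoding table.
const unsigned ENC_CHAR = 0, ENC_SHORT = 1, ENC_INT = 2, ENC_LONG = 3;
const unsigned ENC_SIGNED = 4, ENC_UNSIGNED = 8, ENC_LONG_LONG = 16;

// Hypothetical helper: OR a size code with a sign flag and, optionally,
// the long long flag (which the table combines with the long size code).
unsigned encode_integral(unsigned size, unsigned sign, bool long_long = false)
{
    return size | sign | (long_long ? ENC_LONG_LONG : 0u);
}

// Reproduces a few rows of the table of VARIETY tokens.
void check_encodings()
{
    assert(encode_integral(ENC_CHAR, 0) == 0);                 // char
    assert(encode_integral(ENC_SHORT, ENC_UNSIGNED) == 9);     // unsigned short
    assert(encode_integral(ENC_INT, ENC_SIGNED) == 6);         // signed int
    assert(encode_integral(ENC_LONG, ENC_SIGNED, true) == 23); // signed long long
}
```
]]></programlisting>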
  <para>
  The token: 
  <programlisting>
        ~promote : ( SIGNED_NAT ) -&gt; SIGNED_NAT
  </programlisting>
  describes how to form the promotion of an integral type according
  to the ISO C/C++ value preserving rules, and is used by the producer
  to represent target dependent promotion types.  For example, the promotion
  of <code>unsigned short</code> may be <code>int</code> or <code>unsigned
  int</code> depending on the representation of these types; that is
  to say, <code>~promote ( 9 )</code> will be <code>6</code> on some
  machines and <code>10</code> on others.  Although <code>~promote</code>
  is used by default, a program may specify another token with the same
  sort signature to be used in its place by means of the directive:
  <programlisting>
        #pragma TenDRA compute promote <I>identifier</I>
  </programlisting>
  For example, a standard token <code>~sign_promote</code> is defined
  which gives the older C sign preserving promotion rules.  In addition,
  the promotion of an individual type can be specified using: 
  <programlisting>
        #pragma TenDRA promoted <I>type-id</I> : <I>promoted-type-id</I>
  </programlisting>
  </para>
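  <para>
  As an illustration only, the value preserving rule can be modelled in C++
  on the integral encodings.  The <code>Target</code> structure and the
  function name are assumptions made for this sketch; the real
  <code>~promote</code> token is defined in TDF for each target, not by
  this code.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Hypothetical description of a target machine (widths in bits).
struct Target {
    int char_bits;
    int short_bits;
    int int_bits;
    bool char_is_signed;   // signedness of plain char
};

// Sketch of value preserving promotion on the integral encodings:
// a type narrower than int becomes signed int (6) if int can hold all
// of its values, and unsigned int (10) otherwise.
unsigned promote(unsigned enc, const Target &t)
{
    unsigned size = enc & 3u;                  // 0 char, 1 short, 2 int, 3 long
    if ((enc & 16u) != 0 || size >= 2u) {
        return enc;                            // int and wider are unchanged
    }
    bool is_unsigned = (enc & 8u) != 0 || (enc == 0u && !t.char_is_signed);
    int bits = (size == 0u) ? t.char_bits : t.short_bits;
    if (!is_unsigned || bits < t.int_bits) {
        return 6u;                             // signed int
    }
    return 10u;                                // unsigned int
}

void check_promote()
{
    Target wide = {8, 16, 32, true};    // short narrower than int
    Target narrow = {8, 16, 16, true};  // short and int the same width
    assert(promote(9u, wide) == 6u);
    assert(promote(9u, narrow) == 10u);
}
```
]]></programlisting>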
  <para>
  The token: 
  <programlisting>
        ~arith_type : ( SIGNED_NAT, SIGNED_NAT ) -&gt; SIGNED_NAT
  </programlisting>
  similarly describes how to form the usual arithmetic result type from
  two promoted integral operand types.  For example, the arithmetic
  type of <code>long</code> and <code>unsigned int</code> may be 
  <code>long</code> or <code>unsigned long</code> depending on the representation
  of these types; that is to say, 
  <code>~arith_type ( 7, 10 )</code> will be <code>7</code> on some
  machines and <code>11</code> on others. 
  </para>
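  <para>
  The following C++ sketch models this rule for the promoted types
  <code>int</code>, <code>unsigned int</code>, <code>long</code> and
  <code>unsigned long</code> only.  The function name and the explicit width
  parameters are assumptions made for illustration; <code>long long</code>
  operands are deliberately left out.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Sketch of the usual arithmetic conversions on integral encodings,
// restricted to signed int (6), unsigned int (10), signed long (7)
// and unsigned long (11).  Widths are given in bits.
unsigned arith_type(unsigned a, unsigned b, int int_bits, int long_bits)
{
    const unsigned S_INT = 6, U_INT = 10, S_LONG = 7, U_LONG = 11;
    if (a == U_LONG || b == U_LONG) {
        return U_LONG;
    }
    if ((a == S_LONG && b == U_INT) || (a == U_INT && b == S_LONG)) {
        // long wins only if it can represent every unsigned int value
        return (long_bits > int_bits) ? S_LONG : U_LONG;
    }
    if (a == S_LONG || b == S_LONG) {
        return S_LONG;
    }
    if (a == U_INT || b == U_INT) {
        return U_INT;
    }
    return S_INT;
}

void check_arith_type()
{
    assert(arith_type(7, 10, 32, 64) == 7);   // long is wider than int
    assert(arith_type(7, 10, 32, 32) == 11);  // long and int the same width
}
```
]]></programlisting>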
  <para>
  Any tokenised type declared using: 
  <programlisting>
        #pragma token VARIETY v # tv
  </programlisting>
  will be represented by a <code>SIGNED_NAT</code> token with external
  name 
  <code>tv</code> corresponding to the encoding of <code>v</code>. 
  Special cases of this are the implementation dependent integral types
  which arise naturally within the language.  The external token names
  for these types are given below: 
  </para>
  
  <table>
  <tr><th>Type</th>   <th>Token</th>
  </tr>
  <tr><td>bool</td>  <td>~cpp.bool</td>
  </tr>
  <tr><td>ptrdiff_t</td> <td>ptrdiff_t</td>
  </tr>
  <tr><td>size_t</td> <td>size_t</td>
  </tr>
  <tr><td>wchar_t</td> <td>wchar_t</td>
  </tr>
  </table>
  
  <para>
  So, for example, a <code>sizeof</code> expression has shape 
  <code>~convert ( size_t )</code>.  The token <code>~cpp.bool</code>
  is defined in the default implementation, but the other tokens are
  defined according to their definitions on the target machine in the
  normal API library building mechanism. 
  </para>
  </sect3>  
  
  <sect3 id="literal">
    <title>2.6.2. Integer literal types</title>
  <para>
  The <A HREF="pragma.html#int">type of an integer literal</A> is defined
  in terms of the first in a list of possible integral types.  The first
  type in which the literal value can be represented gives the type
  of the literal.  For small literals it is possible to work out the
  type exactly; for larger literals, however, the result is target dependent.
  For example, the literal <code>50000</code> will have type <code>int</code>
  on machines in which <code>50000</code> fits into an <code>int</code>,
  and 
  <code>long</code> otherwise.  This target dependent mapping is given
  by a series of tokens of the form: 
  <programlisting>
        ~lit_* : ( SIGNED_NAT ) -&gt; SIGNED_NAT
  </programlisting>
  which map a literal value to the representation of an integral type.
  The token used depends on the list of possible types, which in turn
  depends on the base used to represent the literal and the integer
  suffix used, as given in the following table: 
  </para>
  
  <table>
  <tr><th>Base</th>
  <th>Suffix</th>
  <th>Token</th>
  <th>Types</th>
  </tr>
  <tr><td>decimal</td>
  <td>none</td>
  <td>~lit_int</td>
  <td>int, long, unsigned long</td>
  </tr>
  <tr><td>octal</td>
  <td>none</td>
  <td>~lit_hex</td>
  <td>int, unsigned int, long, unsigned long</td>
  </tr>
  <tr><td>hexadecimal</td>
  <td>none</td>
  <td>~lit_hex</td>
  <td>int, unsigned int, long, unsigned long</td>
  </tr>
  <tr><td>any</td>
  <td>U</td>
  <td>~lit_unsigned</td>
  <td>unsigned int, unsigned long</td>
  </tr>
  <tr><td>any</td>
  <td>L</td>
  <td>~lit_long</td>
  <td>long, unsigned long</td>
  </tr>
  <tr><td>any</td>
  <td>UL</td>
  <td>~lit_ulong</td>
  <td>unsigned long</td>
  </tr>
  <tr><td>any</td>
  <td>LL</td>
  <td>~lit_longlong</td>
  <td>long long, unsigned long long</td>
  </tr>
  <tr><td>any</td>
  <td>ULL</td>
  <td>~lit_ulonglong</td>
  <td>unsigned long long</td>
  </tr>
  </table>
  
  <para>
  Thus, for example, the shape of the integer literal 50000 is: 
  <programlisting>
        ~convert ( ~lit_int ( 50000 ) )
  </programlisting>
  </para>
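  <para>
  The "first type that fits" rule behind the <code>~lit_*</code> tokens can
  be sketched in C++ as follows.  The function names and the width parameters
  are hypothetical; only the type lists and the encodings are taken from the
  tables above, and a literal too large for every listed type is simply given
  the last type rather than diagnosed.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstdint>

// Largest value of a signed or unsigned type of the given width (1..64 bits).
std::uint64_t signed_max(int bits)
{
    return (std::uint64_t(1) << (bits - 1)) - 1;
}

std::uint64_t unsigned_max(int bits)
{
    return bits >= 64 ? ~std::uint64_t(0) : (std::uint64_t(1) << bits) - 1;
}

// Decimal, no suffix: int (6), long (7), unsigned long (11).
unsigned lit_int(std::uint64_t value, int int_bits, int long_bits)
{
    if (value <= signed_max(int_bits)) return 6;
    if (value <= signed_max(long_bits)) return 7;
    return 11;
}

// Octal or hexadecimal, no suffix:
// int (6), unsigned int (10), long (7), unsigned long (11).
unsigned lit_hex(std::uint64_t value, int int_bits, int long_bits)
{
    if (value <= signed_max(int_bits)) return 6;
    if (value <= unsigned_max(int_bits)) return 10;
    if (value <= signed_max(long_bits)) return 7;
    return 11;
}

void check_literals()
{
    assert(lit_int(50000, 32, 32) == 6);  // 50000 fits into a 32-bit int
    assert(lit_int(50000, 16, 32) == 7);  // otherwise it becomes long
}
```
]]></programlisting>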
  </sect3>  
  
  <sect3 id="bitfield">
    <title>2.6.3. Bitfield types</title>
  <para>
  The sign of a plain bitfield type, declared without using 
  <code>signed</code> or <code>unsigned</code>, is left unspecified
  in C and C++.  The token: 
  <programlisting>
        ~cpp.bitf_sign : ( SIGNED_NAT ) -&gt; BOOL
  </programlisting>
  is used to give a mapping from integral types to the sign of a plain
  bitfield of that type, in a form suitable for use in the TDF 
  <code>bfvar_bits</code> construct.  (Note that <code>~cpp.bitf_sign</code>
  should have been a standard C token but was omitted.) 
  </para>
  </sect3>  
  
  <sect3 id="pointer">
    <title>2.6.4. Generic pointers</title>
  <para>
  TDF has no concept of a generic pointer type, so tokens are used to
  defer the representation of <code>void *</code> and the basic operations
  on it to the target machine.  The fundamental token is: 
  <programlisting>
        ~ptr_void : () -&gt; SHAPE
  </programlisting>
  which gives the representation of <code>void *</code>.  This shape
  will be denoted by <code>pv</code> in the description of the following
  tokens.  It is not guaranteed that <code>pv</code> is a TDF <code>pointer</code>
  shape, although normally it will be implemented as a pointer to a
  suitable alignment. 
  </para>
  <para>
  The token: 
  <programlisting>
        ~null_pv : () -&gt; EXP pv
  </programlisting>
  gives the value of a null pointer of type <code>void *</code>.  Generic
  pointers can also be converted to and from other pointers.  These
  conversions are represented by the tokens: 
  <programlisting>
        ~to_ptr_void : ( ALIGNMENT a, EXP POINTER a ) -&gt; EXP pv
        ~from_ptr_void : ( ALIGNMENT a, EXP pv ) -&gt; EXP POINTER a
  </programlisting>
  where the given alignment describes the destination or source pointer
  type.  Finally a generic pointer may be tested against the null pointer
  or two generic pointers may be compared.  These operations are represented
  by the tokens: 
  <programlisting>
        ~pv_test : ( EXP pv, LABEL, NTEST ) -&gt; EXP TOP
        ~cpp.pv_compare : ( EXP pv, EXP pv, LABEL, NTEST ) -&gt; EXP TOP
  </programlisting>
  where the given <code>NTEST</code> gives the comparison to be applied
  and the given label gives the destination to jump to if the test fails.
  (Note that <code>~cpp.pv_compare</code> should have been a standard
  C token but was omitted.) 
  </para>
  </sect3>  
  
  <sect3 id="undefined-conversions">
    <title>2.6.5. Undefined conversions</title>
  <para>
  Several conversions in C and C++ can only be represented by undefined
  TDF.  For example, converting a pointer to an integer can only be
  represented in TDF by forming a union of the pointer and integer shapes,
  putting the pointer into the union and pulling the integer out.  Such
  conversions are tokenised.  Undefined conversions not mentioned below
  may be performed by combining those given with the standard, well-defined,
  conversions. 
  </para>
  <para>
  The token: 
  <programlisting>
        ~ptr_to_ptr : ( ALIGNMENT a, ALIGNMENT b, EXP POINTER a ) -&gt; EXP POINTER b
  </programlisting>
  is used to convert between two incompatible pointer types.  The first
  alignment describes the source pointer shape while the second describes
  the destination pointer shape.  Note that if the destination alignment
  is greater than the source alignment then the source pointer can be
  used in most TDF constructs in place of the destination pointer, so
  the use of <code>~ptr_to_ptr</code> can be omitted (the exception
  is 
  <code>pointer_test</code> which requires equal alignments).  Base
  class pointer conversions are examples of these well-behaved, alignment
  preserving conversions. 
  </para>
  <para>
  The tokens: 
  <programlisting>
        ~f_to_pv : ( EXP PROC ) -&gt; EXP pv
        ~pv_to_f : ( EXP pv ) -&gt; EXP PROC
  </programlisting>
  are used to convert pointers to functions to and from <code>void *</code>
  (these conversions are not allowed in ISO C/C++ but are in older dialects).
  </para>
  <para>
  The tokens: 
  <programlisting>
        ~i_to_p : ( VARIETY v, ALIGNMENT a, EXP INTEGER v ) -&gt; EXP POINTER a
        ~p_to_i : ( ALIGNMENT a, VARIETY v, EXP POINTER a ) -&gt; EXP INTEGER v
        ~i_to_pv : ( VARIETY v, EXP INTEGER v ) -&gt; EXP pv
        ~pv_to_i : ( VARIETY v, EXP pv ) -&gt; EXP INTEGER v
  </programlisting>
  are used to convert integers to and from <code>void *</code> and other
  pointers. 
  </para>
  </sect3>  
  
  <sect3 id="div">
    <title>2.6.6. Integer division</title>
  <para>
  The precise form of the integer division and remainder operations
  in C and C++ is left unspecified with respect to the sign of the result
  if either operand is negative.  The tokens: 
  <programlisting>
        ~div : ( EXP INTEGER v, EXP INTEGER v ) -&gt; EXP INTEGER v
        ~rem : ( EXP INTEGER v, EXP INTEGER v ) -&gt; EXP INTEGER v
  </programlisting>
  are used to represent integer division and remainder.  They will map
  onto one of the pairs of TDF constructs, <code>div0</code> and <code>rem0</code>,
  <code>div1</code> and <code>rem1</code> or <code>div2</code> and 
  <code>rem2</code>. 
  </para>
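  <para>
  The difference between the candidate constructs can be seen in a short C++
  sketch.  It assumes, following our reading of the TDF specification, that
  <code>div1</code> and <code>rem1</code> round the quotient towards negative
  infinity, that <code>div2</code> and <code>rem2</code> round it towards
  zero, and that <code>div0</code> and <code>rem0</code> may be either.  The
  function names are hypothetical, and the code relies on the C++11 rule that
  <code>/</code> truncates towards zero.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Truncating division: the remainder takes the sign of the dividend.
long div_trunc(long a, long b) { return a / b; }
long rem_trunc(long a, long b) { return a % b; }

// Flooring division: the remainder takes the sign of the divisor.
long div_floor(long a, long b)
{
    long q = a / b, r = a % b;
    return (r != 0 && ((r < 0) != (b < 0))) ? q - 1 : q;
}

long rem_floor(long a, long b)
{
    long r = a % b;
    return (r != 0 && ((r < 0) != (b < 0))) ? r + b : r;
}

void check_division()
{
    // Both conventions satisfy a == b * quotient + remainder.
    assert(div_trunc(-7, 2) == -3 && rem_trunc(-7, 2) == -1);
    assert(div_floor(-7, 2) == -4 && rem_floor(-7, 2) == 1);
}
```
]]></programlisting>
  <para>
  The two conventions agree whenever both operands have the same sign or the
  division is exact, which is why the choice can be deferred to the target.
  </para>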
  </sect3>  
  
  <sect3 id="call">
    <title>2.6.7. Calling conventions</title>
  <para>
  The function calling conventions used by the C++ producer are essentially
  the same as those used by the C producer with one exception.  That
  is to say, all types except arrays are passed by value (note that
  individual installers may modify these conventions to conform to their
  own ABIs). 
  </para>
  <para>
  The exception concerns classes with a non-trivial constructor, destructor
  or assignment operator.  These classes are passed as function arguments
  by taking a reference to a copy of the object (although it is often
  possible to eliminate the copy and pass a reference to the object
  directly).  They are passed as function return values by adding an
  extra parameter to the start of the function parameters giving a reference
  to a location into which the return value should be copied. 
  </para>
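  <para>
  The convention for such classes can be pictured at the source level as
  below.  This is only a conceptual sketch: the class <code>Counter</code>
  and the function names are invented for illustration, and the producer
  performs the equivalent transformation in TDF rather than in C++.  In
  particular the sketch assigns into an existing object, whereas the real
  return location is storage into which the value is copied.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// A class with a non-trivial copy constructor and destructor.
struct Counter {
    int n;
    explicit Counter(int v) : n(v) {}
    Counter(const Counter &other) : n(other.n) {}
    ~Counter() {}
};

// The function as written by the programmer.
Counter bump(Counter c)
{
    c.n += 1;
    return c;
}

// Hypothetical lowered form: the return location comes first, and the
// argument is a reference to a copy made by the caller.
void bump_lowered(Counter &ret, Counter &arg)
{
    arg.n += 1;
    ret = arg;
}

// What a call site conceptually does under this convention.
int call_bump(const Counter &original)
{
    Counter copy(original);   // the caller makes the copy
    Counter result(0);        // the caller provides the return location
    bump_lowered(result, copy);
    return result.n;
}

void check_call()
{
    Counter c(41);
    assert(call_bump(c) == bump(c).n);
}
```
]]></programlisting>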
  
  <H4>Member functions</H4>
  <para>
  Non-static member functions are implemented in the obvious fashion,
  by passing a pointer to the object the method is being applied to
  as the first argument (or the second argument if the method has an
  extra argument for its return value). 
  </para>
  
  <H4><A id="ellipsis">Ellipsis functions</A></H4>
  <para>
  Calls to functions declared with ellipses are via the 
  <code>apply_proc</code> TDF construct, with all the arguments being
  treated as non-variable.  However the definition of such a function
  uses the <code>make_proc</code> construct with a variable parameter.
  This parameter can be referred to within the program using the 
  <A HREF="pragma.html#ellipsis"><code>...</code> expression</A>.  The
  type of this expression is given by the built-in token: 
  <programlisting>
        ~__va_t : () -&gt; SHAPE
  </programlisting>
  The <code>va_start</code> macro declared in the 
  <code>&lt;stdarg.h&gt;</code> header then describes how the variable
  parameter (expressed as <code>...</code>) can be converted to an expression
  of type <code>va_list</code> suitable for use in the 
  <code>va_arg</code> macro. 
  </para>
  <para>
  Note that the variable parameter is in effect only being used to determine
  where the first optional parameter is defined.  The assumption is
  that all such parameters are located contiguously on the stack, however
  the fact that calls to such functions do not use the variable parameter
  mechanism means that this is not automatically the case.  Strictly
  speaking this means that the implementation of ellipsis functions
  uses undefined behaviour in TDF, however given the non-type-safe function
  calling rules in C this is unavoidable and installers need to make
  provision for such calls (by dumping any parameters from registers
  to the stack if necessary).  Given the theoretically type-safe nature
  of C++ it would be possible to avoid such undefined behaviour, but
  the need for C-compatible calling conventions prevents this. 
  </para>
  </sect3>  
  
  <sect3 id="ptr_mem">
    <title>2.6.8. Pointers to data members</title>
  <para>
  The representation of, and operations on, pointers to data members
  are represented by tokens to allow for a variety of implementations.
  It is assumed that all pointers to data members (as opposed to pointers
  to function members) are represented by the same shape: 
  <programlisting>
        ~cpp.pm.type : () -&gt; SHAPE
  </programlisting>
  This shape will be denoted by <code>pm</code> in the description of
  the following tokens. 
  </para>
  <para>
  There are two basic methods of constructing a pointer to a data member.
  The first is to take the address of a data member of a class.  A data
  member is represented in TDF by an expression which gives the offset
  of the member from the start of its enclosing <code>compound</code>
  shape (note that it is not possible to take the address of a member
  of a virtual base). The mapping from this offset to a pointer to a
  data member is given by: 
  <programlisting>
        ~cpp.pm.make : ( EXP OFFSET ) -&gt; EXP pm
  </programlisting>
  The second way of constructing a pointer to a data member is to use
  a null pointer to member: 
  <programlisting>
        ~cpp.pm.null : () -&gt; EXP pm
  </programlisting>
  The other fundamental operation on a pointer to data member is to
  turn it back into an offset expression which can be added to a pointer
  to a class to access a member of that class in a <code>.*</code> or
  <code>-&gt;*</code>
  operation.  This is done by the token: 
  <programlisting>
        ~cpp.pm.offset : ( EXP pm, ALIGNMENT a ) -&gt; EXP OFFSET ( a, a )
  </programlisting>
  Note that it is necessary to specify an alignment in order to describe
  the shape of the result.  The value of this token is undefined if
  the given expression is a null pointer to data member. 
  </para>
  <para>
  A pointer to a data member of a non-virtual base class can be converted
  to a pointer to a data member of a derived class.  The reverse conversion
  is also possible using <code>static_cast</code>.  If the base is a
  <A HREF="#primary">primary base class</A> then these conversions are
  trivial and have no effect.  Otherwise null pointers to data members
  are converted to null pointers to data members, and the non-null cases
  are handled by the tokens: 
  <programlisting>
        ~cpp.pm.cast : ( EXP pm, EXP OFFSET ) -&gt; EXP pm
        ~cpp.pm.uncast : ( EXP pm, EXP OFFSET ) -&gt; EXP pm
  </programlisting>
  where the given offset is the offset of the base class within the
  derived class.  It is also possible to convert between any two pointers
  to data members using <code>reinterpret_cast</code>.  This conversion
  is implied by the equality of representation between any two pointers
  to data members and has no effect. 
  </para>
  <para>
  The only remaining operations on pointers to data members are to test
  one against the null pointer to data member and to compare two pointers
  to data members.  These are represented by the tokens: 
  <programlisting>
        ~cpp.pm.test : ( EXP pm, LABEL, NTEST ) -&gt; EXP TOP
        ~cpp.pm.compare : ( EXP pm, EXP pm, LABEL, NTEST ) -&gt; EXP TOP
  </programlisting>
  where the given <code>NTEST</code> gives the comparison to be applied
  and the given label gives the destination to jump to if the test fails.
  </para>
  <para>
  In the default implementation, pointers to data members are implemented
  as <code>int</code>.  The null pointer to member is represented by
  0 and the address of a class member is represented by 1 plus the offset
  of the member (in bytes).  Casting to and from a derived class then
  corresponds to adding or subtracting the base class offset (in bytes),
  and pointer to member comparisons correspond to integer comparisons.
  </para>
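  <para>
  The default representation just described can be modelled by the following
  C++ sketch.  The names mirror the <code>~cpp.pm.*</code> tokens but are
  otherwise hypothetical, and offsets are plain byte counts rather than TDF
  <code>OFFSET</code> expressions.  For convenience the cast functions handle
  the null case themselves, whereas the tokens only deal with the non-null
  cases.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

typedef int pm_t;             // pointers to data members are plain ints
const pm_t pm_null = 0;       // the null pointer to member

// Address of a member: one plus its byte offset, so offset 0 is not null.
pm_t pm_make(int offset_bytes) { return offset_bytes + 1; }

// Recover the byte offset; meaningless for the null pointer to member.
int pm_offset(pm_t pm) { return pm - 1; }

// Base-to-derived and derived-to-base conversions add or subtract the
// offset of the base class within the derived class.
pm_t pm_cast(pm_t pm, int base_offset)
{
    return pm == pm_null ? pm_null : pm + base_offset;
}

pm_t pm_uncast(pm_t pm, int base_offset)
{
    return pm == pm_null ? pm_null : pm - base_offset;
}

// Tests and comparisons reduce to integer comparisons.
bool pm_is_null(pm_t pm) { return pm == pm_null; }

void check_pm()
{
    pm_t first = pm_make(0);              // member at offset 0
    assert(!pm_is_null(first));
    assert(pm_offset(pm_make(8)) == 8);
    assert(pm_cast(pm_make(4), 16) == pm_make(20));
}
```
]]></programlisting>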
  </sect3>  
  
  <sect3 id="ptr_mem_func">
    <title>2.6.9. Pointers to function members</title>
  <para>
  As with pointers to data members, pointers to function members and
  the operations on them are represented by tokens to allow for a range
  of implementations.  All pointers to function members are represented
  by the same shape: 
  <programlisting>
        ~cpp.pmf.type : () -&gt; SHAPE
  </programlisting>
  This shape will be denoted by <code>pmf</code> in the description
  of the following tokens.  Many of the tokens take an expression which
  has a shape which is a pointer to the alignment of <code>pmf</code>.
  This will be denoted by <code>ppmf</code>. 
  </para>
  <para>
  There are two basic methods for constructing a pointer to a function
  member.  The first is to take the address of a non-static member function
  of a class.  There are two cases, depending on whether or not the
  member function is virtual.  The non-virtual case is given by the
  token: 
  <programlisting>
        ~cpp.pmf.make : ( EXP PROC, EXP OFFSET, EXP OFFSET ) -&gt; EXP pmf
  </programlisting>
  where the first argument is the address of the corresponding function,
  the second argument gives any base class offset which is to be added
  when calling this function (to deal with inherited member functions),
  and the third argument is a zero offset. 
  </para>
  <para>
  For virtual functions, a pointer to function member of the form above
  is entered in the <A HREF="#vtable">virtual function table</A> for
  the corresponding class.  The actual pointer to the virtual function
  member then gives a reference into the virtual function table as follows:
  <programlisting>
        ~cpp.pmf.vmake : ( SIGNED_NAT, EXP OFFSET, EXP, EXP ) -&gt; EXP pmf
  </programlisting>
  where the first argument gives the index of the function within the
  virtual function table, the second argument gives the offset of the
  <I>vptr</I> field within the class, and the third and fourth arguments
  are zero offsets. 
  </para>
  <para>
  The second way of constructing a pointer to a function member is to
  use a null pointer to function member: 
  <programlisting>
        ~cpp.pmf.null : () -&gt; EXP pmf
        ~cpp.pmf.null2 : () -&gt; EXP pmf
  </programlisting>
  For technical reasons there are two versions of this token, although
  they have the same value.  The first token is used in static initialisers;
  the second token is used in other expressions. </para>
  <para>
  The cast operations on pointers to function members are more complex
  than those on pointers to data members.  The value to be cast is copied
  into a temporary and one of the tokens: 
  <programlisting>
        ~cpp.pmf.cast : ( EXP ppmf, EXP OFFSET, EXP, EXP OFFSET ) -&gt; EXP TOP
        ~cpp.pmf.uncast : ( EXP ppmf, EXP OFFSET, EXP, EXP OFFSET ) -&gt; EXP TOP
  </programlisting>
  is applied to modify the value of the temporary according to the given
  cast.  The first argument gives the address of the temporary, the
  second gives the base class offset to be added or subtracted, the
  third gives the number to be added or subtracted to convert virtual
  function indexes for the base class into virtual function indexes
  for the derived class, and the fourth gives the offset of the <I>vptr</I>
  field within the class.  Again, the ability to use <code>reinterpret_cast</code>
  to convert between any two pointer to function member types arises
  because of the uniform representation of these types. 
  </para>
  <para>
  As with pointers to data members, there are tokens implementing comparisons
  on pointers to function members: 
  <programlisting>
        ~cpp.pmf.test : ( EXP ppmf, LABEL, NTEST ) -&gt; EXP TOP
        ~cpp.pmf.compare : ( EXP ppmf, EXP ppmf, LABEL, NTEST ) -&gt; EXP TOP
  </programlisting>
  Note however that the arguments are passed by reference. 
  </para>
  <para>
  The most important, and most complex, operation is calling a function
  through a pointer to function member.  The first step is to copy the
  pointer to function member into a temporary.  The token: 
  <programlisting>
        ~cpp.pmf.virt : ( EXP ppmf, EXP, ALIGNMENT ) -&gt; EXP TOP
  </programlisting>
  is then applied to the temporary to convert a pointer to a virtual
  function member to a normal pointer to function member by looking
  it up in the corresponding virtual function table.  The first argument
  gives the address of the temporary, the second gives the object to
  which the function is to be applied, and the third gives the alignment
  of the corresponding class.  Now the base class conversion to be applied
  to the object can be determined by applying the token: 
  <programlisting>
        ~cpp.pmf.delta : ( ALIGNMENT a, EXP ppmf ) -&gt; EXP OFFSET ( a, a )
  </programlisting>
  to the temporary to find the offset to be added.  Finally the function
  to be called can be extracted from the temporary using the token:
  <programlisting>
        ~cpp.pmf.func : ( EXP ppmf ) -&gt; EXP PROC
  </programlisting>
  The function call then proceeds as normal. 
  </para>
  <para>
  The default implementation is that described in the ARM, where each
  pointer to function member is represented in the form: 
  <programlisting>
        struct PTR_MEM_FUNC {
            short delta ;
            short index ;
            union {
                void ( *func ) () ;
                short off ;
            } u ;
        } ;
  </programlisting>
  The <code>delta</code> field gives the base class offset (in bytes)
  to be added before applying the function.  The <code>index</code>
  field is 0 for null pointers, -1 for non-virtual function pointers
  and the index into the virtual function table for virtual function
  pointers (as described below these indexes start from 1).  For non-virtual
  function pointers the function itself is given by the <code>u.func</code>
  field. For virtual function pointers the offset of the <I>vptr</I>
  field within the class is given by the <code>u.off</code> field. 
  </para>
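  <para>
  As an illustration only, the sketch below models this representation and
  the call sequence (look up a virtual entry, add <code>delta</code>, call
  <code>u.func</code>) in plain C++.  All names other than the field names
  are hypothetical.  The uniform <code>int (*)(void *)</code> signature,
  the hand-built table and the unused slot 0 are simplifying assumptions,
  not the producer's actual output.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model; every member function is flattened to this signature.
typedef int (*member_fn)(void *self);

struct PtrMemFunc {
    short delta;          // base class offset in bytes, added to the object
    short index;          // 0: null, -1: non-virtual, >0: virtual table slot
    union {
        member_fn func;   // non-virtual: the function itself
        short off;        // virtual: offset of the vptr field in the object
    } u;
};

inline PtrMemFunc pmf_null() {
    PtrMemFunc p; p.delta = 0; p.index = 0; p.u.func = 0; return p;
}

inline PtrMemFunc pmf_make(member_fn f, short delta) {
    PtrMemFunc p; p.delta = delta; p.index = -1; p.u.func = f; return p;
}

inline PtrMemFunc pmf_vmake(short index, short vptr_off) {
    PtrMemFunc p; p.delta = 0; p.index = index; p.u.off = vptr_off; return p;
}

// Call through a pointer to function member (assumed not to be null).
inline int pmf_call(void *object, PtrMemFunc pmf) {
    char *self = static_cast<char *>(object);
    if (pmf.index > 0) {
        // Virtual case: follow the vptr and replace pmf by the table entry.
        const PtrMemFunc *const *vptr =
            reinterpret_cast<const PtrMemFunc *const *>(self + pmf.u.off);
        pmf = (*vptr)[pmf.index];
    }
    // Apply the base class offset, then call the function.
    return pmf.u.func(self + pmf.delta);
}

// Demonstration class with a hand-built table; slot 0 is unused here.
struct Widget {
    const PtrMemFunc *vptr;
    int value;
};

inline int widget_value(void *self) {
    return static_cast<Widget *>(self)->value;
}
```
]]></programlisting>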
  </sect3>  
  
  <sect3 id="class">
    <title>2.6.10. Class layout</title>
  <para>
  Consider a class with no base classes: 
  <programlisting>
        class A {
            // A's members
        } ;
  </programlisting>
  Each object of class <I>A</I> needs its own copy of the non-static
  data members of <I>A</I> and, for polymorphic types, a means of referencing
  the virtual function table and run-time type information for <I>A</I>.
  This is accomplished using a layout of the form: 
  
  <IMG SRC="../images/class.gif" ALT="class A"/>
  
  where the <I>A</I> component consists of the non-static data members
  and 
  <I>vptr A</I> is a pointer to the virtual function table for <I>A</I>.
  For non-polymorphic classes the <I>vptr A</I> field is omitted; otherwise
  space for <I>vptr A</I> needs to be allocated within the class and
  the pointer needs to be initialised in each constructor for <I>A</I>.
  The precise layout of the <A HREF="#vtable">virtual function table</A>
  and the <A HREF="#rtti">run-time type information</A> is given below.
  </para>
  <para>
  Two alternative ways of laying out the non-static data members within
  the class are implemented.  The first, which is the default, gives them
  in the order in which they are declared in the class definition. 
  The second lays out the <code>public</code>, the <code>protected</code>,
  and the <code>private</code> members in three distinct sections, the
  members within each section being given in the order in which they
  are declared. The latter can be enabled using the <code>-jo</code>
  command-line option. 
  </para>
  <para>
  The offset of each member within the class (including <I>vptr A</I>)
  can be calculated in terms of the offset of the previous member. 
  The first member has offset zero.  The offset of any other member
  is given by the offset of the previous member plus the size of the
  previous member, rounded up to the alignment of the current member.
  The overall size of the class is given by the offset of the last member
  plus the size of the last member, rounded up using the token: 
  <programlisting>
        ~comp_off : ( EXP OFFSET ) -&gt; EXP OFFSET
  </programlisting>
  which allows for any target dependent padding at the end of the class.
  The shape of the class is then a <code>compound</code> shape with
  this offset. 
  </para>
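  <para>
  The offset rule above can be sketched as follows.  The types
  <code>MemberInfo</code> and <code>Layout</code> and the functions
  <code>round_up</code> and <code>layout_class</code> are hypothetical.
  The final padding, which the producer leaves to the target dependent
  <code>~comp_off</code> token, is modelled here simply as rounding up to
  an assumed class alignment.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size and alignment of one member, in bytes (hypothetical model).
struct MemberInfo {
    std::size_t size;
    std::size_t align;
};

// Result of the layout: one offset per member plus the overall size.
struct Layout {
    std::vector<std::size_t> offsets;
    std::size_t size;
};

// Round offset up to the next multiple of align (align must be nonzero).
inline std::size_t round_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

inline Layout layout_class(const std::vector<MemberInfo> &members,
                           std::size_t class_align) {
    Layout out;
    std::size_t end = 0;  // offset plus size of the previous member
    for (std::size_t i = 0; i < members.size(); ++i) {
        // The first member lands at zero; later ones follow the previous
        // member, rounded up to their own alignment.
        std::size_t off = round_up(end, members[i].align);
        out.offsets.push_back(off);
        end = off + members[i].size;
    }
    // Stand-in for ~comp_off: pad the end of the class.
    out.size = round_up(end, class_align);
    return out;
}
```
]]></programlisting>
  <para>
  For members of sizes 1, 4 and 1 with alignments 1, 4 and 1, and an assumed
  class alignment of 4, this gives offsets 0, 4 and 8 and an overall size
  of 12.
  </para>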
  <para>
  Classes with no members need to be treated slightly differently. 
  The shape of such a class is given by the token: 
  <programlisting>
        ~cpp.empty.shape : () -&gt; SHAPE
  </programlisting>
  (recall that an empty class still has a nonzero size).  The token:
  <programlisting>
        ~cpp.empty.offset : () -&gt; EXP OFFSET
  </programlisting>
  is used to represent the offset required for an empty class when it
  is used as a base class.  This may be a zero offset. 
  </para>
  <para>
  Bitfield members provide a slight complication to the picture above.
  The offset of a bitfield is additionally padded using the token: 
  <programlisting>
        ~pad : ( EXP OFFSET, SHAPE, SHAPE ) -&gt; EXP OFFSET
  </programlisting>
  where the two shapes give the type underlying the bitfield and the
  bitfield itself. 
  </para>
  <para>
  The layout of unions is similar to that of classes except that all
  members have zero offset, and the size of the union is the maximum
  of the sizes of its members, suitably padded.  Of course unions cannot
  be polymorphic and cannot have base classes. 
  </para>
  <para>
  Pointers to incomplete classes are represented by means of the alignment:
  <programlisting>
        ~cpp.empty.align : () -&gt; ALIGNMENT
  </programlisting>
  This token is also used for the alignment of a complete class if that
  class is never used in the generated TDF in a manner which requires
  it to be complete.  This can lead to savings on the size of the generated
  code by preventing the need to define all the member offset tokens
  in order to find the shape of the class. 
  </para>
  </sect3>  
  
  <sect3 id="derive">
    <title>2.6.11. Derived class layout</title>
  <para>
  The description of the implementation of derived classes will be given
  in terms of the example class hierarchy given by: 
  <programlisting>
        class A {
            // A's members
        } ;
  
        class B : public A {
            // B's members
        } ;
  
        class C : public A {
            // C's members
        } ;
  
        class D : public B, public C {
            // D's members
        } ;
  </programlisting>
  or, as a directed acyclic graph: 
  </para>
  
  <IMG SRC="../images/graph.gif" ALT="class D"/>
  
  
  <H4>Single inheritance</H4>
  <para>
  The layout of class <I>A</I> is given by: 
  
  <IMG SRC="../images/classA.gif" ALT="class A"/>
  
  as above.  Class <I>B</I> inherits all the members of class <I>A</I>
  plus those members explicitly declared within class <I>B</I>.  In
  addition, class <I>B</I> inherits all the virtual member functions
  of <I>A</I>, some of which may be overridden in <I>B</I>, extended
  by any additional virtual functions declared in <I>B</I>.  This may
  be represented as follows: 
  
  <IMG SRC="../images/classB.gif" ALT="class B"/>
  
  where <I>A</I> denotes those members inherited from the base class
  and 
  <I>B</I> denotes those members added in the derived class.  Note that
  an object of class <I>B</I> contains a sub-object of class <I>A</I>.
  The fact that this sub-object is located at the start of <I>B</I>
  means that the base class conversion from <I>B</I> to <I>A</I> is
  trivial.  Any base class with this property is called a 
  <A id="primary">primary base class</A>. 
  </para>
  <para>
  Note that in theory two virtual function tables are required, the
  normal virtual function table for <I>B</I>, denoted by <I>vtbl B</I>,
  and a modified virtual function table for <I>A</I>, denoted by <I>vtbl
  B::A</I>, taking into account any overriding virtual functions within
  <I>B</I>, and pointing to <I>B</I>'s run-time type information.  This
  latter means that the dynamic type information for the <I>A</I> sub-object
  relates to 
  <I>B</I> rather than <I>A</I>.  However these two tables can usually
  be combined - if the virtual functions added in <I>B</I> are listed
  in the virtual function table after those inherited from <I>A</I>
  and the form of the overriding is <A HREF="#override">suitably well
  behaved</A>
  (in the sense defined below) then <I>vtbl B::A</I> is an initial segment
  of <I>vtbl B</I>.  It is also possible to remove the <I>vptr B</I>
  field and use <I>vptr B::A</I> in its place in this case (it has to
  be this way round to preserve the <I>A</I> sub-object).  Thus the
  items shaded in the diagram can be removed. 
  </para>
  <para>
  The class <I>C</I> is similarly given by: 
  
  <IMG SRC="../images/classC.gif" ALT="class C"/>
  
  </para>
  
  <H4>Multiple inheritance</H4>
  <para>
  Class <I>D</I> is more complex because of the presence of multiple
  inheritance.  <I>D</I> inherits all the members of <I>B</I>, including
  those which <I>B</I> inherits from <I>A</I>, plus all the members
  of 
  <I>C</I>, including those which <I>C</I> inherits from <I>A</I>. 
  It also inherits all of the virtual member functions from <I>B</I>
  and 
  <I>C</I>, some of which may be overridden in <I>D</I>, extended by
  any additional virtual functions declared in <I>D</I>.  This may be
  represented as follows: 
  
  <IMG SRC="../images/classD.gif" ALT="class D"/>
  
  Note that there are two copies of <I>A</I> in <I>D</I> because virtual
  inheritance has not been used. 
  </para>
  <para>
  The <I>B</I> base class of <I>D</I> is essentially similar to the
  single inheritance case already discussed; the <I>C</I> base class
  is different however.  Note firstly that the <I>C</I> sub-object of
  <I>D</I> is located at a non-zero offset, <I>delta D::C</I>, from
  the start of the object. This means that the base class conversion
  from <I>D</I> to <I>C</I>
  consists of adding this offset (for pointer conversions things are
  further complicated by the need to allow for null pointers).  Also
  <I>vtbl D::C</I> is not an initial segment of <I>vtbl D</I> because
  this contains the virtual functions inherited from <I>B</I> first,
  followed by those inherited from <I>C</I>, followed by those first
  declared in <I>D</I> (there are <A HREF="#override">other reasons</A>
  as well).  Thus <I>vtbl D::C</I> cannot be eliminated. 
  </para>
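  <para>
  The effect of <I>delta D::C</I> can be observed from ordinary C++, as in
  the sketch below.  The helper functions are hypothetical, and the actual
  value of the offset depends on the implementation; the sketch only relies
  on the offset being non-zero when <I>B</I> is laid out first, and on a
  null pointer remaining null after the conversion.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

struct A { int a; virtual ~A() {} };
struct B : public A { int b; };
struct C : public A { int c; };
struct D : public B, public C { int d; };

// Derived-to-base pointer conversion.  The compiler adds delta D::C,
// except that a null pointer must stay null.
inline C *d_to_c(D *d) { return d; }

// Observed byte offset of the C sub-object within a D object.
inline std::ptrdiff_t delta_d_c(D *d) {
    return reinterpret_cast<char *>(static_cast<C *>(d)) -
           reinterpret_cast<char *>(d);
}

// The two distinct A sub-objects, reached through B and through C.
inline A *a_via_b(D *d) { return static_cast<B *>(d); }
inline A *a_via_c(D *d) { return static_cast<C *>(d); }
```
]]></programlisting>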
  
  <H4>Virtual inheritance</H4>
  <para>
  Virtual inheritance introduces a further complication.  Now consider
  the class hierarchy given by: 
  <programlisting>
        class A {
            // A's members
        } ;
  
        class B : virtual public A {
            // B's members
        } ;
  
        class C : virtual public A {
            // C's members
        } ;
  
        class D : public B, public C {
            // D's members
        } ;
  </programlisting>
  or, as a <A id="diamond">directed acyclic graph</A>: 
  
  <IMG SRC="../images/diamond.gif" ALT="class D"/>
  
  As before <I>A</I> is given by: 
  
  <IMG SRC="../images/classA.gif" ALT="class A"/>
  
  but now <I>B</I> is given by: 
  
  <IMG SRC="../images/virtualB.gif" ALT="class B"/>
  
  Rather than having the sub-object of class <I>A</I> directly as part
  of 
  <I>B</I>, the class now contains a pointer, <I>ptr A</I>, to this
  sub-object.  The virtual sub-objects are always located at the end
  of a class layout; their offset may therefore vary for different objects,
  but the offset of <I>ptr A</I> is always fixed.  The <I>ptr A</I>
  field is initialised in each constructor for <I>B</I>.  In order to
  perform the base class conversion from <I>B</I> to <I>A</I>, the contents
  of <I>ptr A</I> are taken (again provision needs to be made for null
  pointers in pointer conversions).  In cases when the dynamic type
  of the <I>B</I> object can be determined statically it is possible
  to access the <I>A</I> sub-object directly by adding a suitable offset.
  Because this conversion is non-trivial (see <A HREF="#override">below</A>)
  the virtual function table <I>vtbl B::A</I> is not an initial segment
  of 
  <I>vtbl B</I> and cannot be eliminated. 
  </para>
  <para>
  The class <I>C</I> is similarly given by: 
  
  <IMG SRC="../images/virtualC.gif" ALT="class C"/>
  
  Now the class <I>D</I> is given by: 
  
  <IMG SRC="../images/virtualD.gif" ALT="class D"/>
  
  Note that there is a single <I>A</I> sub-object of <I>D</I> referenced
  by the <I>ptr A</I> fields in both the <I>B</I> and <I>C</I> sub-objects.
  The elimination of <I>vtbl D::B</I> is as above. 
  </para>
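  <para>
  The sharing of the single <I>A</I> sub-object can be checked from C++
  itself.  In the sketch below the helper functions are hypothetical; the
  conversions through <I>B</I> and through <I>C</I> must reach the same
  address, however the implementation locates the virtual base.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

struct A { int a; A() : a(7) {} virtual ~A() {} };
struct B : virtual public A { int b; B() : b(0) {} };
struct C : virtual public A { int c; C() : c(0) {} };
struct D : public B, public C { int d; D() : d(0) {} };

// Base class conversions to the virtual base, taken along each path.
inline A *a_via_b(D *d) { B *b = d; return b; }
inline A *a_via_c(D *d) { C *c = d; return c; }
```
]]></programlisting>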
  </sect3>  
  
  <sect3 id="constr">
    <title>2.6.12. Constructors and destructors</title>
  <para>
  The implementation of constructors and destructors, whether explicitly
  or implicitly defined, is slightly more complex than that of other
  member functions.  For example, the constructors need to set up the
  internal <I>vptr</I> and <I>ptr</I> fields mentioned above. 
  </para>
  <para>
  The order of initialisation in a constructor is as follows: 
  <itemizedlist>
  <listitem>The internal <I>ptr</I> fields giving the locations of the virtual
  base classes are initialised. 
  </listitem>
  <listitem>The constructors for the virtual base classes are called. 
  </listitem>
  <listitem>The constructors for the non-virtual direct base classes are called.
  </listitem>
  <listitem>The internal <I>vptr</I> fields giving the locations of the virtual
  function tables are initialised. 
  </listitem>
  <listitem>The constructors for the members of the class are called. 
  </listitem>
  <listitem>The main constructor body is executed. 
  </listitem>
  </itemizedlist>
  To ensure that each virtual base is only initialised once, if a class
  has a virtual base class then all its constructors have an implicit
  extra parameter of type <code>int</code>.  The first two steps above
  are then only applied if this flag is nonzero.  In normal applications
  of the constructor this argument will be 1; however, in base class
  initialisations such as those in the second and third steps above,
  it will be 0. 
  </para>
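  <para>
  The role of the extra constructor parameter can be illustrated by
  emulating the constructors of the diamond hierarchy with free functions.
  This is only a sketch: the function names and the logging are
  hypothetical, and the initialisation of the <I>ptr</I> and <I>vptr</I>
  fields and of the members is omitted.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the constructors of A, B, C and D, where
// B and C have A as a virtual base and D derives from B and C.
inline void construct_A(std::string &log) { log += "A "; }

inline void construct_B(std::string &log, int init_virtual) {
    if (init_virtual) construct_A(log);  // virtual bases, only if flagged
    log += "B ";                         // remaining steps for B
}

inline void construct_C(std::string &log, int init_virtual) {
    if (init_virtual) construct_A(log);
    log += "C ";
}

inline void construct_D(std::string &log, int init_virtual) {
    if (init_virtual) construct_A(log);  // the virtual base, exactly once
    construct_B(log, 0);                 // base class initialisation: flag 0
    construct_C(log, 0);
    log += "D ";
}

// A normal application of the constructor passes 1.
inline std::string trace_D() {
    std::string log;
    construct_D(log, 1);
    return log;
}
```
]]></programlisting>
  <para>
  Passing 0 to the base class constructors is what prevents <I>A</I> from
  being constructed a second and third time when a <I>D</I> is created.
  </para>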
  <para>
  Note that similar steps to protect virtual base classes are not taken
  in an implicitly declared <code>operator=</code> function.  The order
  of assignment in this case is as follows: 
  <itemizedlist>
  <listitem>The assignment operators for the direct base classes (both virtual
  and non-virtual) are called. 
  </listitem>
  <listitem>The assignment operators for the members of the class are called.
  </listitem>
  <listitem>A reference to the object assigned to (i.e. <code>*this</code>)
  is returned. 
  </listitem>
  </itemizedlist>
  </para>
  <para>
  The order of destruction in a destructor is essentially the reverse
  of the order of construction: 
  <itemizedlist>
  <listitem>The main destructor body is executed. 
  </listitem>
  <listitem>The destructors for the members of the class are called. 
  </listitem>
  <listitem>The internal <I>vptr</I> fields giving the locations of the virtual
  function tables are re-initialised. 
  </listitem>
  <listitem>The destructors for the non-virtual direct base classes are called.
  </listitem>
  <listitem>The destructors for the virtual base classes are called. 
  </listitem>
  <listitem>If necessary the space occupied by the object is deallocated.
  </listitem>
  </itemizedlist>
  All destructors have an extra parameter of type <code>int</code>.
  The virtual base classes are only destroyed if this flag is nonzero
  when and-ed with 2.  The space occupied by the object is only deallocated
  if this flag is nonzero when and-ed with 1.  This deallocation is
  equivalent to inserting:  
  <programlisting>
        delete this ;
  </programlisting>
  in the destructor.  The <code>operator delete</code> function is called
  via the destructor in this way in order to implement the pseudo-virtual
  nature of these deallocation functions.  Thus for normal destructor
  calls the extra argument is 2, for base class destructor calls it
  is 0, and for calls arising from a <code>delete</code> expression
  it is 3. 
  </para>
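  <para>
  The two bits of the destructor flag can be emulated in the same style.
  The sketch below is illustrative only: the function names, the constants
  and the logging are hypothetical, and member destruction and
  <I>vptr</I> re-initialisation are omitted.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

// Hypothetical names for the two bits of the extra destructor argument.
enum {
    DTOR_DEALLOCATE = 1,     // deallocate the space occupied by the object
    DTOR_VIRTUAL_BASES = 2   // destroy the virtual base classes
};

inline void destroy_A(std::string &log) { log += "~A "; }

inline void destroy_B(std::string &log, int flag) {
    log += "~B ";
    if (flag & DTOR_VIRTUAL_BASES) destroy_A(log);
    if (flag & DTOR_DEALLOCATE) log += "delete ";
}

inline void destroy_C(std::string &log, int flag) {
    log += "~C ";
    if (flag & DTOR_VIRTUAL_BASES) destroy_A(log);
    if (flag & DTOR_DEALLOCATE) log += "delete ";
}

inline void destroy_D(std::string &log, int flag) {
    log += "~D ";
    destroy_C(log, 0);       // base class destructor calls pass 0
    destroy_B(log, 0);
    if (flag & DTOR_VIRTUAL_BASES) destroy_A(log);
    if (flag & DTOR_DEALLOCATE) log += "delete ";
}

inline std::string trace_destroy_D(int flag) {
    std::string log;
    destroy_D(log, flag);
    return log;
}
```
]]></programlisting>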
  <para>
  The point at which the virtual function tables are initialised in
  the constructor, and the fact that they are re-initialised in the
  destructor, is to ensure that virtual functions called from base class
  initialisers are handled correctly (see ISO C++ 12.7). 
  </para>
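  <para>
  The behaviour this ordering is designed to produce can be seen in a small
  example.  The class names are hypothetical; the point is that a virtual
  function called while the base class is being constructed resolves to
  the base class version.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen;
    // During this constructor the object is still treated as a Base,
    // so the call below selects Base::name.
    Base() { seen = name(); }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : public Base {
    virtual std::string name() const { return "Derived"; }
};
```
]]></programlisting>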
  <para>
  A further complication arises from the need to destroy 
  <A id="partial">partially constructed objects</A> if an exception
  is thrown in a constructor.  A count is maintained of the number of
  base classes and members constructed within a constructor.  If an
  exception is thrown then it is caught in the constructor, the constructed
  base classes and members are destroyed, and the exception is re-thrown.
  The count variable is used to determine which bases and members need
  to be destroyed. 
  </para>
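  <para>
  The visible effect of this mechanism is shown in the following sketch, in
  which a constructor throws after both members have been constructed.
  The names <code>Part</code>, <code>Faulty</code>, <code>events</code> and
  <code>run_faulty</code> are hypothetical; the sketch shows the language
  behaviour being implemented rather than the count variable itself.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Records construction and destruction events in order.
static std::string events;

struct Part {
    char id;
    explicit Part(char c) : id(c) { events += c; }
    ~Part() { events += '~'; events += id; }
};

struct Faulty {
    Part first;
    Part second;
    // Both members are complete when the body throws, so both must be
    // destroyed, in reverse order, before the exception propagates.
    Faulty() : first('1'), second('2') {
        throw std::runtime_error("constructor failed");
    }
    // Never runs for a partially constructed object.
    ~Faulty() { events += "~F"; }
};

static std::string run_faulty() {
    events.clear();
    try {
        Faulty f;
    } catch (const std::runtime_error &) {
        events += "!";
    }
    return events;
}
```
]]></programlisting>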
  <para>
  <IMG SRC="../images/warn.gif" ALT="warning"/> These partial destructors
  currently do not interact correctly with any exception specification
  on the constructor.  Exceptions thrown within destructors are not
  correctly handled either. 
  </para>
  </sect3>  
  
  <sect3 id="vtable">
    <title>2.6.13. Virtual function tables</title>
  <para>
  The virtual functions in a polymorphic class are given in its virtual
  function table in the following order: firstly those virtual functions
  inherited from its direct base classes (which may be overridden in
  the derived class) followed by those first declared in the derived
  class in the order in which they are declared.  Note that this can
  result in virtual functions inherited from virtual base classes appearing
  more than once.  The virtual functions are numbered from 1 (this is
  slightly more convenient than numbering from 0 in the default implementation).
  </para>
  <para>
  The virtual function table for this class has shape: 
  <programlisting>
        ~cpp.vtab.type : ( NAT ) -&gt; SHAPE
  </programlisting>
  the argument being <I>n + 1</I> where <I>n</I> is the number of virtual
  functions in the class (there is also a token: 
  <programlisting>
        ~cpp.vtab.diag : () -&gt; SHAPE
  </programlisting>
  which is used in the diagnostic output for a generic virtual function
  table).  The table is created using the token: 
  <programlisting>
        ~cpp.vtab.make : ( EXP pti, EXP OFFSET, NAT, EXP NOF ) -&gt; EXP vt
  </programlisting>
  where the first expression gives the address of the <A HREF="#rtti">run-time
  type information structure</A> for the class, the second expression
  gives the offset of the <I>vptr</I> field within the class (i.e. <I>voff</I>),
  the integer constant is <I>n + 1</I>, and the final expression is
  a 
  <code>make_nof</code> construct giving information on each of the
  <I>n</I>
  virtual functions. 
  </para>
  <para>
  The information given on each virtual function in this table has the
  form of a <A HREF="#ptr_mem_func">pointer to function member</A> formed
  using the token: 
  <programlisting>
        ~cpp.pmf.make : ( EXP PROC, EXP OFFSET, EXP OFFSET ) -&gt; EXP pmf
  </programlisting>
  as above, except that the third argument gives the offset of the base
  class in virtual function tables such as <I>vtbl B::A</I>.  For pure
  virtual functions the function pointer in this token is given by:
  <programlisting>
        ~cpp.vtab.pure : () -&gt; EXP PROC
  </programlisting>
  In the default implementation this gives a function 
  <code>__TCPPLUS_pure</code> which just calls <code>abort</code>. 
  </para>
  <para>
  To avoid duplicate copies of virtual function tables and run-time
  type information structures being created, the ARM algorithm is used.
  The virtual function table and run-time type information structure
  for a class are defined in the module containing the definition of
  the first non-inline, non-pure virtual function declared in that class.
  If such a function does not exist then duplicate copies are created
  in every module which requires them.  In the former case the virtual
  function table will have an <A HREF="#other">external tag name</A>;
  in the latter case it will be an internal tag.  This scheme can be
  overridden using the <code>-jv</code> command-line option, which causes
  local virtual function tables to be output for all classes. 
  </para>
  <para>
  Note that the discussion above applies to both simple virtual function
  tables, such as <I>vtbl B</I> above, and to those arising from base
  classes, such as <I>vtbl B::A</I>.  <A id="override">We are now
  in a position to precisely determine when <I>vtbl B::A</I> is an initial
  segment of <I>vtbl B</I> and hence can be eliminated</A>.  Firstly,
  <I>A</I> must be the first direct base class of <I>B</I> and cannot
  be virtual.  This is to ensure both that there are no virtual functions
  in <I>vtbl B</I> before those inherited from <I>A</I>, and that the
  corresponding base class conversion is trivial so that the pointers
  to function members of <I>B</I> comprising the virtual function table
  can be equally regarded as pointers to function members of <I>A</I>.
  The second requirement is that if a virtual function for <I>A</I>,
  <I>f</I>, is overridden in <I>B</I> then the return type for <I>B::f</I>
  cannot differ from the return type for <I>A::f</I> by a non-trivial
  conversion (recall that ISO C++ allows the return types to differ
  by a base class conversion).  In the non-trivial conversion case the
  function entered in <I>vtbl B::A</I> needs to be, not <I>B::f</I>
  as in <I>vtbl B</I>, but a stub function which calls <I>B::f</I> and
  converts its return value to the return type of <I>A::f</I>. 
  </para>
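  <para>
  A case of the second kind can be written as follows.  The class names are
  hypothetical.  Because <code>Shape</code> is not the first base class of
  <code>Square</code>, the conversion from <code>Square *</code> to
  <code>Shape *</code> is typically non-trivial, so a call of
  <code>clone</code> through a <code>Shape</code> must adjust the returned
  pointer.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

struct Named {
    int tag;
    Named() : tag(0) {}
    virtual ~Named() {}
};

struct Shape {
    virtual ~Shape() {}
    virtual Shape *clone() const { return new Shape(*this); }
    virtual int sides() const { return 0; }
};

struct Square : public Named, public Shape {
    // Overrides Shape::clone with a return type that differs by a
    // base class conversion.
    virtual Square *clone() const { return new Square(*this); }
    virtual int sides() const { return 4; }
};

// Calls clone through the base class, where the result is a Shape *.
inline int sides_of_clone(const Shape &s) {
    Shape *copy = s.clone();
    int n = copy->sides();
    delete copy;
    return n;
}
```
]]></programlisting>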
  
  <H4>Calling virtual functions</H4>
  <para>
  The virtual function call mechanism is implemented using the token:
  <programlisting>
        ~cpp.vtab.func : ( EXP ppvt, SIGNED_NAT ) -&gt; EXP ppmf
  </programlisting>
  which has as its arguments a reference to the <I>vptr</I> field of
  the object the function is to be called for, and the number of the
  virtual function to be called.  It returns a reference to the corresponding
  pointer to function member within the object's virtual function table.
  The function is then called by extracting the base class offset to
  be added, and the function to be called, from this reference using
  the tokens: 
  <programlisting>
        ~cpp.pmf.delta : ( ALIGNMENT a, EXP ppmf ) -&gt; EXP OFFSET ( a, a )
        ~cpp.pmf.func : ( EXP ppmf ) -&gt; EXP PROC
  </programlisting>
  described as part of the <A HREF="#ptr_mem_func">pointer to function
  member call mechanism</A> above. 
  </para>
  </sect3>  
  
  <sect3 id="rtti">
    <title>2.6.14. Run-time type information</title>
  <para>
  Each C++ type can be associated with a run-time type information structure
  giving information about that type.  These type information structures
  have shape given by the token: 
  <programlisting>
        ~cpp.typeid.type : () -&gt; SHAPE
  </programlisting>
  which corresponds to the representation for the standard type 
  <code>std::type_info</code> declared in the header 
  <code>&lt;typeinfo&gt;</code>.  Each type information structure consists
  of a tag number, giving information on the kind of type represented,
  a string literal, giving the name of the type, and a pointer to a
  list of base type information structures.  These are combined to give
  a type information structure using the token: 
  <programlisting>
        ~cpp.typeid.make : ( SIGNED_NAT, EXP, EXP ) -&gt; EXP ti
  </programlisting>
  Each base type information structure has shape given by the token:
  <programlisting>
        ~cpp.baseid.type : () -&gt; SHAPE
  </programlisting>
  It consists of a pointer to a type information structure, an expression
  used to describe the offset of a base class, a pointer to the next
  base type information structure in the list, and two integers giving
  information on type qualifiers etc.  These are combined to give a
  base type information structure using the token: 
  <programlisting>
        ~cpp.baseid.make : ( EXP, EXP, EXP, SIGNED_NAT, SIGNED_NAT ) -&gt; EXP bi
  </programlisting>
  </para>
  <para>
  The following table gives the various tag numbers used in type information
  structures plus a list of the base type information structures associated
  with each type.  Macros giving these tag numbers are provided in the
  default implementation in a header, <code>interface.h</code>, which
  is shared by the C++ producer. 
  </para>
  <para>
  
  <table>
  <tr><th>Type</th>
  <th>Form</th>
  <th>Tag</th>
  <th>Base information</th>
  </tr>
  <tr><td>integer</td>
  <td>-</td>
  <td>0</td>
  <td>-</td>
  </tr>
  <tr><td>floating point</td>
  <td>-</td>
  <td>1</td>
  <td>-</td>
  </tr>
  <tr><td>void</td>
  <td>-</td>
  <td>2</td>
  <td>-</td>
  </tr>
  <tr><td>class or struct</td>
  <td>class T</td>
  <td>3</td>
  <td>[base,access,virtual], ....</td>
  </tr>
  <tr><td>union</td>
  <td>union T</td>
  <td>4</td>
  <td>-</td>
  </tr>
  <tr><td>enumeration</td>
  <td>enum T</td>
  <td>5</td>
  <td>-</td>
  </tr>
  <tr><td>pointer</td>
  <td>cv T *</td>
  <td>6</td>
  <td>[T,cv,0]</td>
  </tr>
  <tr><td>reference</td>
  <td>cv T &amp;</td>
  <td>7</td>
  <td>[T,cv,0]</td>
  </tr>
  <tr><td>pointer to member</td>
  <td>cv T S::*</td>
  <td>8</td>
  <td>[S,0,0], [T,cv,0]</td>
  </tr>
  <tr><td>array</td>
  <td>cv T [n]</td>
  <td>9</td>
  <td>[T,cv,n]</td>
  </tr>
  <tr><td>bitfield</td>
  <td>cv T : n</td>
  <td>10</td>
  <td>[T,cv,n]</td>
  </tr>
  <tr><td>C++ function</td>
  <td>cv T ( S1, ...., Sn )</td>
  <td>11</td>
  <td>[T,cv,0], [S1,0,0], ...., [Sn,0,0]</td>
  </tr>
  <tr><td>C function</td>
  <td>cv T ( S1, ...., Sn )</td>
  <td>12</td>
  <td>[T,cv,0], [S1,0,0], ...., [Sn,0,0]</td>
  </tr>
  </table>
  
  </para>
  <para>
  In the form column <code>cv T</code> is used to denote not only the
  normal cv-qualifiers but, when <code>T</code> is a function type,
  the member function cv-qualifiers.  Arrays with an unspecified bound
  are treated as if their bound was zero.  Functions with ellipsis are
  treated as if they had an extra parameter of a dummy type named 
  <code>...</code> (see below).  Note the distinction between C++ and
  C function types. 
  </para>
  <para>
  Each base type information structure is described as a triple consisting
  of a type and two integers.  One of these integers may be used to
  encode a type qualifier, <code>cv</code>, as follows: 
  </para>
  <para>
  
  <table>
  <tr><th>Qualifier</th>   <th>Encoding</th>
  </tr>
  <tr><td>none</td>  <td>0</td>
  </tr>
  <tr><td>const</td>  <td>1</td>
  </tr>
  <tr><td>volatile</td> <td>2</td>
  </tr>
  <tr><td>const volatile</td><td>3</td>
  </tr>
  </table>
  
  </para>
  <para>
  The base type information for a class consists of information on each
  of its direct base classes.  This includes the offset of this base
  within the class (for a virtual base class this is the offset of the
  corresponding 
  <I>ptr</I> field), whether the base is virtual (1) or not (0), and
  the base class access, encoded as follows: 
  </para>
  <para>
  
  <table>
  <tr><th>Access</th>   <th>Encoding</th>
  </tr>
  <tr><td>public</td> <td>0</td>
  </tr>
  <tr><td>protected</td> <td>1</td>
  </tr>
  <tr><td>private</td> <td>2</td>
  </tr>
  </table>
  
  </para>
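  <para>
  As a rough model of these structures, the sketch below mirrors the fields
  listed for <code>~cpp.typeid.make</code> and <code>~cpp.baseid.make</code>
  and uses the encodings from the tables above.  The C++ names
  (<code>TypeInfo</code>, <code>BaseInfo</code>, <code>encode_cv</code>,
  <code>count_bases</code> and the enumerators) are hypothetical, and the
  offset is simplified to a byte count; the actual layout is whatever the
  tokens are defined to be on a given target.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Encodings taken from the tables above; the enumerator names are invented.
enum CvCode { CV_NONE = 0, CV_CONST = 1, CV_VOLATILE = 2, CV_CONST_VOLATILE = 3 };
enum AccessCode { ACC_PUBLIC = 0, ACC_PROTECTED = 1, ACC_PRIVATE = 2 };
enum TypeTag { TAG_CLASS = 3, TAG_POINTER = 6 };  // subset of the tag table

// The qualifier encoding combines const (1) and volatile (2) as bits.
inline int encode_cv(bool is_const, bool is_volatile) {
    return (is_const ? CV_CONST : CV_NONE) |
           (is_volatile ? CV_VOLATILE : CV_NONE);
}

struct TypeInfo;

// One entry in the list of base type information structures.
struct BaseInfo {
    const TypeInfo *type;   // the type referred to
    std::size_t offset;     // base class offset (0 where not applicable)
    const BaseInfo *next;   // next entry in the list, or null
    int first;              // e.g. access or cv-qualifier encoding
    int second;             // e.g. virtual flag or array bound
};

// A type information structure: tag, name and list of base information.
struct TypeInfo {
    int tag;
    const char *name;
    const BaseInfo *bases;
};

// Walk the list of base type information structures.
inline int count_bases(const TypeInfo &t) {
    int n = 0;
    for (const BaseInfo *b = t.bases; b != 0; b = b->next) ++n;
    return n;
}
```
]]></programlisting>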
  <para>
  For example, the run-time type information structures for the classes
  declared in the <A HREF="#diamond">diamond lattice</A> above can be
  represented as follows: 
  
  <IMG SRC="../images/rttiD.gif" ALT="typeid D"/>
  
  </para>
  
  <H4>Defining run-time type information structures</H4>
  <para>
  For built-in types, the run-time type information structure may be
  referenced by the token: 
  <programlisting>
        ~cpp.typeid.basic : ( SIGNED_NAT ) -&gt; EXP pti
  </programlisting>
  where the argument gives the encoding of the type as given in the
  following table: 
  </para>
  
  <table>
  <tr><th>Type</th>   <th>Encoding</th>
  <th>Type</th>   <th>Encoding</th>
  </tr>
  <tr><td>char</td>  <td>0</td>
  <td>unsigned long</td> <td>11</td>
  </tr>
  <tr><td>(error)</td> <td>1</td>
  <td>float</td>  <td>12</td>
  </tr>
  <tr><td>void</td>  <td>2</td>
  <td>double</td> <td>13</td>
  </tr>
  <tr><td>(bottom)</td> <td>3</td>
  <td>long double</td> <td>14</td>
  </tr>
  <tr><td>signed char</td> <td>4</td>
  <td>wchar_t</td> <td>16</td>
  </tr>
  <tr><td>signed short</td> <td>5</td>
  <td>bool</td>  <td>17</td>
  </tr>
  <tr><td>signed int</td> <td>6</td>
  <td>(ptrdiff_t)</td> <td>18</td>
  </tr>
  <tr><td>signed long</td> <td>7</td>
  <td>(size_t)</td> <td>19</td>
  </tr>
  <tr><td>unsigned char</td> <td>8</td>
  <td>(...)</td>  <td>20</td>
  </tr>
  <tr><td>unsigned short</td><td>9</td>
  <td>signed long long</td>
  <td>23</td>
  </tr>
  <tr><td>unsigned int</td> <td>10</td>
  <td>unsigned long long</td>
  <td>27</td>
  </tr>
  </table>
  
  <para>
  Note that the encoding for the basic integral types is the same as
  that 
  <A HREF="#arith">given above</A>.  The other types are assigned to
  unused values.  Note that the encodings for <code>ptrdiff_t</code>
  and <code>size_t</code> are not used; instead the encoding of the
  type which implements each of them is used (found using the standard
  tokens <code>ptrdiff_t</code> and <code>size_t</code>).  The
  encodings for <code>bool</code> and 
  <code>wchar_t</code> are used because they are conceptually distinct
  types even though they are implemented as one of the basic integral
  types.  The type labelled <code>...</code> is the dummy used in the
  representation of ellipsis functions.  The default implementation
  uses an array of type information structures, <code>__TCPPLUS_typeid</code>,
  to implement <code>~cpp.typeid.basic</code>. 
  </para>
  <para>
  The run-time type information structures for classes are defined in
  the same place as their <A HREF="#vtable">virtual function tables</A>.
  Other run-time type information structures are defined in whatever
  modules require them.  In the former case the type information structure
  will have an <A HREF="#other">external tag name</A>; in the latter
  case it will be an internal tag. 
  </para>
  
  <H4>Accessing run-time type information</H4>
  <para>
  The primary means of accessing the run-time type information for an
  object is using the <code>typeid</code> construct.  In cases where
  the operand type can be determined statically, the address of the
  corresponding type information structure is returned.  In other cases
  the token: 
  <programlisting>
        ~cpp.typeid.ref : ( EXP ppvt ) -&gt; EXP pti
  </programlisting>
  is used, where the argument gives a reference to the <I>vptr</I> field
  of the object being checked.  From this information it is trivial
  to trace the corresponding type information. 
  </para>
  <para>
  Another means of querying the run-time type information for an object
  is using the <code>dynamic_cast</code> construct.  When the result
  cannot be determined statically, this is implemented using the token:
  <programlisting>
        ~cpp.dynam.cast : ( EXP ppvt, EXP pti ) -&gt; EXP pv
  </programlisting>
  where the first expression gives a reference to the <I>vptr</I> field
  of the object being cast and the second gives the run-time type information
  for the type being cast to.  In the default implementation this token
  is implemented by the procedure <code>__TCPPLUS_dynamic_cast</code>.
  The key point to note is that the virtual function table contains
  the offset, <I>voff</I>, of the <I>vptr</I> field from the start of
  the most complete object.  Thus it is possible to find the address
  of the most complete object.  The run-time type information contains
  enough information to determine whether this object has a sub-object
  of the type being cast to, and if so, how to find the address of this
  sub-object.  The result is returned as a <code>void *</code>, with
  the null pointer indicating that the conversion is not possible. 
  </para>
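  <para>
  The behaviour which <code>~cpp.dynam.cast</code> has to provide can
  be observed at the language level.  The sketch below uses a small
  diamond hierarchy whose class names are chosen purely for
  illustration.  It shows standard C++ semantics only: a cast to 
  <code>void *</code> yields the most complete object, a cross-cast
  finds a sibling sub-object, and an impossible cast yields the null
  pointer.  It says nothing about how 
  <code>__TCPPLUS_dynamic_cast</code> is actually written.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Illustrative diamond: D contains a single shared A sub-object.
struct A { virtual ~A() {} };
struct B : virtual A {};
struct C : virtual A {};
struct D : B, C {};
struct E { virtual ~E() {} };   // unrelated polymorphic class

// Address of the most complete object containing *p.
inline void *most_complete(A *p) { return dynamic_cast<void *>(p); }

// Cast from the A sub-object to the C sub-object, or null if there is none.
inline C *as_C(A *p) { return dynamic_cast<C *>(p); }

// Cast to an unrelated class; always null for objects of the types above.
inline E *as_E(A *p) { return dynamic_cast<E *>(p); }
```
]]></programlisting>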
  </sect3>  
  
  <sect3 id="dynamic-initialisation">
    <title>2.6.15. Dynamic initialisation</title>
  <para>
  The dynamic initialisation of variables with static storage duration
  in C++ is implemented by means of the TDF <code>initial_value</code>
  construct.  However, in order for the producer to maintain control
  over the order of initialisation, rather than each variable being
  initialised separately using <code>initial_value</code>, a single
  expression is created which initialises all the variables in a module,
  and this initialiser expression is used to initialise a single dummy
  variable using <code>initial_value</code>.  Note that, while this
  enables the variables within a single module to be initialised in
  the order in which they are defined, the order of initialisation between
  different modules is unspecified. 
  </para>
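  <para>
  The ordering guarantee within a single module can be seen in the
  following minimal sketch.  The helper names <code>init_log</code> and
  <code>record</code> are hypothetical.  Each variable is dynamically
  initialised by a call which appends to a log, so the log records the
  order in which the initialisers ran.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

// Function-local static, so the log exists before any initialiser uses it.
std::string &init_log() {
    static std::string log;
    return log;
}

// Appends a marker and returns the number of initialisers run so far.
int record(char c) {
    init_log() += c;
    return static_cast<int>(init_log().size());
}

// Dynamically initialised in the order in which they are defined.
int first = record('a');
int second = record('b');
int third = record('c');
```
]]></programlisting>
  <para>
  Had the three variables been defined in different modules, no
  particular relative order could be relied upon.
  </para>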
  <para>
  The implementation needs to keep a list of those variables with static
  storage duration which have been initialised so that it can call the
  destructors for these objects at the end of the program. This is done
  by declaring a variable of shape: 
  <programlisting>
        ~cpp.destr.type : () -&gt; SHAPE
  </programlisting>
  for each such object with a non-trivial destructor.  Each element
  of an array is considered a distinct object.  Immediately after the
  variable has been initialised the token: 
  <programlisting>
        ~cpp.destr.global : ( EXP pd, EXP POINTER c, EXP PROC ) -&gt; EXP TOP
  </programlisting>
  is called to add the variable to the list of objects to be destroyed.
  The first argument is the address of the dummy variable just declared,
  the second is the address of the object to be destroyed, and the third
  is the destructor to be used.  In this way a list giving the objects
  to be destroyed, and the order in which to destroy them, is built
  up.  Note that partially constructed objects are destroyed within
  their constructors (see <A HREF="#partial">above</A>) so that only
  completely constructed objects need to be considered. 
  </para>
  <para>
  The implementation also needs to ensure that it calls the destructors
  in this list at the end of the program, including calls of 
  <code>exit</code>.  This is done by calling the token: 
  <programlisting>
        ~cpp.destr.init : () -&gt; EXP TOP
  </programlisting>
  at the start of each <code>initial_value</code> construct.  In the
  default implementation this uses <code>atexit</code> to register a
  function, <code>__TCPPLUS_term</code>, which calls the destructors.
  To aid alternative implementations the token: 
  <programlisting>
        ~cpp.start : () -&gt; EXP TOP
  </programlisting>
  is called at the start of the <code>main</code> function; however,
  this has no effect in the default implementation. 
  </para>
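  <para>
  The list of objects to be destroyed can be modelled as a singly linked
  list threaded through the dummy variables.  The sketch below is only
  a model under that assumption: <code>DestrRecord</code>, 
  <code>destr_global</code> and <code>run_destructors</code> are
  hypothetical names standing in for <code>~cpp.destr.type</code>, 
  <code>~cpp.destr.global</code> and the termination routine, and the
  default implementation may be organised differently.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Hypothetical stand-in for a variable of shape ~cpp.destr.type.
struct DestrRecord {
    DestrRecord *next;            // previously registered record
    void *object;                 // object to be destroyed
    void (*destructor)(void *);   // destructor to call on it
};

// Head of the list; the most recently constructed object comes first.
DestrRecord *destr_list = nullptr;

// Models ~cpp.destr.global: register a completely constructed object.
void destr_global(DestrRecord *rec, void *obj, void (*dtor)(void *)) {
    rec->object = obj;
    rec->destructor = dtor;
    rec->next = destr_list;
    destr_list = rec;
}

// Models the termination routine: destroy in reverse order of construction.
void run_destructors() {
    while (destr_list != nullptr) {
        DestrRecord *rec = destr_list;
        destr_list = rec->next;
        rec->destructor(rec->object);
    }
}
```
]]></programlisting>
  <para>
  In a complete implementation a routine such as 
  <code>run_destructors</code> would be registered once using 
  <code>atexit</code>, which is the role played by 
  <code>~cpp.destr.init</code> in the description above.
  </para>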
  </sect3>  
  
  <sect3 id="except">
    <title>2.6.16. Exception handling</title>
  <para>
  Conceptually, exception handling can be described in terms of the
  following diagram: 
  
  <IMG SRC="../images/try.gif" ALT="try stack"/>
  
  At any point in the execution of the program there is a stack of currently
  active <code>try</code> blocks and currently active local variables.
  A 
  <code>try</code> block is pushed onto the stack as it is entered and
  popped from the stack when it is left (whether directly or via a jump).
  A local variable with a non-trivial destructor is pushed onto the
  stack just after its constructor has been called at the start of its
  scope, and popped from the stack just before its destructor is called
  at the end of its scope (including before jumps out of its scope).
  Each element of an array is considered a separate object.  Each <code>try</code>
  block has an associated list of handlers.  Each local variable has
  an associated destructor. 
  </para>
  <para>
  Provided no exception is thrown this stack grows and shrinks in a
  well-behaved manner as execution proceeds.  When an exception is thrown
  an exception manager is invoked to find a matching exception handler.
  The exception manager proceeds to execute a loop to unwind the stack
  as follows.  If the stack is empty then the exception cannot be caught
  and 
  <code>std::terminate</code> is called.  Otherwise the top element
  is popped from the stack.  If this is a local variable then the associated
  destructor is called for the variable.  If the top element is a 
  <code>try</code> block then the current exception is compared in turn
  to each of the associated handlers.  If a match is found then execution
  jumps to the handler body, otherwise the exception manager continues
  to the next element of the stack. 
  </para>
  <para>
  Note that this description is purely conceptual.  There is no need
  for exception handling to be implemented by a stack in this way (although
  the default implementation uses a similar technique).  It does however
  serve to illustrate the various stages which must exist in any implementation.
  </para>
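  <para>
  The conceptual unwinding loop described above can be sketched as
  follows.  This is a simulation only: the stack elements are plain
  records, handler matching is reduced to comparing type names (real
  matching also takes account of base classes, qualifiers and so on),
  and the names <code>StackElement</code> and <code>unwind</code> are
  hypothetical.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <vector>

// One element of the conceptual stack.
struct StackElement {
    bool is_try_block;                  // true: try block, false: local variable
    std::string name;                   // variable name (unused for try blocks)
    std::vector<std::string> handlers;  // handler types; "..." catches anything
};

// Unwinds the stack for an exception of type `thrown`.  Returns the handler
// type which caught it, or "terminate" if the stack was exhausted.  The names
// of the variables destroyed on the way are appended to `destroyed`.
std::string unwind(std::vector<StackElement> &stack, const std::string &thrown,
                   std::vector<std::string> &destroyed) {
    while (!stack.empty()) {
        StackElement top = stack.back();
        stack.pop_back();
        if (!top.is_try_block) {
            destroyed.push_back(top.name);  // stands for the destructor call
            continue;
        }
        for (const std::string &h : top.handlers) {
            if (h == "..." || h == thrown) return h;  // jump to handler body
        }
    }
    return "terminate";  // stands for the call of std::terminate
}
```
]]></programlisting>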
  
  <H4>Try blocks</H4>
  <para>
  At the start of a <code>try</code> block a variable of shape: 
  <programlisting>
        ~cpp.try.type : () -&gt; SHAPE
  </programlisting>
  is declared corresponding to the stack element for this block.  This
  is then initialised using the token: 
  <programlisting>
        ~cpp.try.begin : ( EXP ptb, EXP POINTER fa, EXP POINTER ca ) -&gt; EXP TOP
  </programlisting>
  where the first argument is a pointer to this variable, the second
  argument is the TDF <code>current_env</code> construct, and the third
  argument is the result of the TDF <code>make_local_lv</code> construct
  on the label which is used to mark the first handler associated with
  the block.  Note that the last two arguments enable a TDF 
  <code>long_jump</code> construct to be applied to transfer control
  to the first handler. 
  </para>
  <para>
  When control exits from a <code>try</code> block, whether by reaching
  the end of the block or jumping out of it, the block is removed from
  the stack using the token: 
  <programlisting>
        ~cpp.try.end : ( EXP ptb ) -&gt; EXP TOP
  </programlisting>
  where the argument is a pointer to the <code>try</code> block variable.
  </para>
  
  <H4>Local variables</H4>
  <para>
  The technique used to add a local variable with a non-trivial destructor
  to the stack is similar to that used in the dynamic initialisation
  of global variables.  A local variable of shape <code>~cpp.destr.type</code>
  is declared at the start of the variable scope.  This is initialised
  just after the constructor for the variable is called using the token:
  <programlisting>
        ~cpp.destr.local : ( EXP pd, EXP POINTER c, EXP PROC ) -&gt; EXP TOP
  </programlisting>
  where the first argument is a pointer to the variable being initialised,
  the  second is a pointer to the local variable to be destroyed, and
  the third is the destructor to be called.  At the end of the variable
  scope, just before its destructor is called, the token: 
  <programlisting>
        ~cpp.destr.end : ( EXP pd ) -&gt; EXP TOP
  </programlisting>
  where the argument is a pointer to the destructor variable, is called
  to remove the local variable destructor from the stack.  Note that
  partially constructed objects are destroyed within their constructors
  (see 
  <A HREF="#partial">above</A>) so that only completely constructed
  objects need to be considered. 
  </para>
  <para>
  In cases where the local variable may be conditionally initialised
  (for example a temporary variable in the second operand of a <code>||</code>
  operation) the local variable of shape <code>~cpp.destr.type</code>
  is initialised to the value given by the token: 
  <programlisting>
        ~cpp.destr.null : () -&gt; EXP d
  </programlisting>
  (normally it is  left uninitialised).  Before the destructor for this
  variable is called the value of the token: 
  <programlisting>
        ~cpp.destr.ptr : ( EXP pd ) -&gt; EXP POINTER c
  </programlisting>
  is tested.  If <code>~cpp.destr.local</code> has been called for this
  variable then this token returns a pointer to the variable, otherwise
  it returns a null pointer.  The token <code>~cpp.destr.end</code>
  and the destructor are only called if this token indicates that the
  variable has been initialised. 
  </para>
  
  <H4>Throwing an exception</H4>
  <para>
  When a <code>throw</code> expression with an argument is encountered
  a number of steps are performed.  Firstly, space is allocated to hold
  the exception value using the token: 
  <programlisting>
        ~cpp.except.alloc : ( EXP VARIETY size_t ) -&gt; EXP pv
  </programlisting>
  the argument of which gives the size of the value.  The space allocated
  is returned as an expression of type <code>void *</code>.  Secondly,
  the exception value is copied into the space allocated, using a copy
  constructor if appropriate.  Finally the exception is raised using
  the token: 
  <programlisting>
        ~cpp.except.throw : ( EXP pv, EXP pti, EXP PROC ) -&gt; EXP BOTTOM
  </programlisting>
  The first argument gives the pointer to the exception value, returned
  by 
  <code>~cpp.except.alloc</code>, the second argument gives a pointer
  to the run-time type information for the exception type, and the third
  argument gives the destructor to be called to destroy the exception
  value (if any). This token sets the current exception to the given
  values and invokes the exception manager as above. 
  </para>
  <para>
  A <code>throw</code> expression without an argument results in a call
  to the token: 
  <programlisting>
        ~cpp.except.rethrow : () -&gt; EXP BOTTOM
  </programlisting>
  which re-invokes the exception manager with the current exception.
  If there is no current exception then the implementation should call
  <code>std::terminate</code>. 
  </para>
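  <para>
  At the source level the difference between the two forms of 
  <code>throw</code> can be illustrated as follows.  The function name
  <code>rethrow_demo</code> is hypothetical.  The inner handler catches
  by reference to a base class and then uses <code>throw</code> without
  an argument; the outer handler still sees the original exception
  object with its original type, which is the behaviour 
  <code>~cpp.except.rethrow</code> is required to provide.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Returns a trace of the handlers entered while one exception propagates.
std::string rethrow_demo() {
    std::string trace;
    try {
        try {
            throw std::runtime_error("boom");  // throw with an argument
        } catch (const std::exception &) {
            trace += "inner;";
            throw;                             // rethrow the current exception
        }
    } catch (const std::runtime_error &e) {    // original type is preserved
        trace += "outer:";
        trace += e.what();
    }
    return trace;
}
```
]]></programlisting>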
  
  <H4>Handling an exception</H4>
  <para>
  The exception manager proceeds to find an exception handler in the manner
  described above, unwinding the stack and calling destructors for local
  variables.  When a <code>try</code> block is popped from the stack
  a TDF <code>long_jump</code> is applied to transfer control to its
  list of handlers.  For each handler in turn it is checked whether
  the handler can catch the current exception.  For <code>...</code>
  handlers this is always true; for other handlers it is checked using
  the token: 
  <programlisting>
        ~cpp.except.catch : ( EXP pti ) -&gt; EXP VARIETY int
  </programlisting>
  where the argument is a pointer to the run-time type information for
  the handler type.  This token gives 1 if the exception is caught by
  this handler, and 0 otherwise.  If the exception is not caught by
  the handler then the next handler is checked, until there are no more
  handlers associated with the <code>try</code> block.  In this case
  control is passed back to the exception manager by re-throwing the
  current exception using <code>~cpp.except.rethrow</code>. 
  </para>
  <para>
  If an exception is caught by a handler then a number of steps are
  performed. Firstly, if appropriate, the handler variable is initialised
  by copying the current exception value.  A pointer to the current
  exception value can be obtained using the token: 
  <programlisting>
        ~cpp.except.value : () -&gt; EXP pv
  </programlisting>
  Once this initialisation is complete the token: 
  <programlisting>
        ~cpp.except.caught : () -&gt; EXP TOP
  </programlisting>
  is called to indicate that the exception has been caught.  The handler
  body is then executed.  When control exits from the handler, whether
  by reaching the end of the handler or by jumping out of it, the token:
  <programlisting>
        ~cpp.except.end : () -&gt; EXP TOP
  </programlisting>
  is called to indicate that handling of the exception is complete.  Note
  that the implementation should call the destructor for the current
  exception and free the space allocated by <code>~cpp.except.alloc</code>
  at this point. Execution then continues with the statement following
  the handler. 
  </para>
  <para>
  To conclude, the TDF generated for a <code>try</code> block and its
  associated list of handlers has the form: 
  <programlisting>
        variable (
            long_jump_access,
            stack_tag,
            make_value ( ~cpp.try.type ),
            conditional (
                handler_label,
                sequence (
                    ~cpp.try.begin (
                        obtain_tag ( stack_tag ),
                        current_env,
                        make_local_lv ( handler_label ) ),
                    <I>try-block-body</I>,
                    ~cpp.try.end ),
                conditional (
                    catch_label_1,
                    sequence (
                        integer_test (
                            not_equal,
                            catch_label_1,
                            ~cpp.except.catch (
                                <I>handler-1-typeid</I> ) ),
                        variable (
                            handler_tag_1,
                            <I>handler-1-init</I> (
                                ~cpp.except.value ),
                            sequence (
                                ~cpp.except.caught,
                                <I>handler-1-body</I> ) ),
                        ~cpp.except.end ),
                    conditional (
                        catch_label_2,
                        <I>further-handlers</I>,
                        ~cpp.except.rethrow ) ) ) )
  </programlisting>
  </para>
  <para>
  Note that for a local variable to maintain its previous value when
  an  exception is caught in this way it is necessary to declare it
  using the TDF <code>long_jump_access</code> construct.  Any local
  variable which contains a <code>try</code> block in its scope is declared
  in this way. 
  </para>
  <para>
  To aid implementations in the writing of exception managers the following
  standard tokens are provided: 
  <programlisting>
        ~cpp.ptr.code : () -&gt; SHAPE POINTER ca
        ~cpp.ptr.frame : () -&gt; SHAPE POINTER fa
        ~cpp.except.jump : ( EXP POINTER fa, EXP POINTER ca ) -&gt; EXP BOTTOM
  </programlisting>
  These give the shape of the TDF <code>make_local_lv</code> construct,
  the shape of the TDF <code>current_env</code> construct, and direct
  access to the TDF <code>long_jump</code> construct.  The exception manager
  in the default implementation is a function called <code>__TCPPLUS_throw</code>.
  </para>
  
  <H4>Exception specifications</H4>
  <para>
  If a function is declared with an exception specification then extra
  code needs to be generated in the function definition to catch any
  unexpected exceptions thrown by the function and to call 
  <code>std::unexpected</code>.  Since this is a potentially high overhead for small functions,
  this extra code is not generated if it can be proved that such unexpected
  exceptions can never be thrown (the analysis is essentially the same
  as that in the 
  <A HREF="pragma.html#exception">exception analysis</A> check). 
  </para>
  <para>
  An exception specification is implemented by enclosing the entire
  function definition in a <code>try</code> block.  The handler for
  this block uses <code>~cpp.except.catch</code> to check whether the
  current exception can be caught by any of the types listed in the
  exception specification.  If so the current exception is re-thrown.
  If none of these types catch the current exception then the token:
  <programlisting>
        ~cpp.except.bad : ( SIGNED_NAT ) -&gt; EXP TOP
  </programlisting>
  is called.  The argument is 1 if the exception specification includes
  the special type <code>std::bad_exception</code>, and 0 otherwise.
  The implementation should call <code>std::unexpected</code>, but how
  any exceptions thrown during this call are to be handled depends on
  the value of the argument. 
  </para>
  </sect3>  
  
  <sect3 id="mangle">
    <title>2.6.17. Mangled identifier names</title>
  <para>
  In a similar fashion to other C++ compilers, the C++ producer needs
  a method of mapping C++ identifiers to a form suitable for further
  processing, namely TDF tag names.  This mangled name contains an encoding
  of the identifier name, its parent namespace or class and its type.
  Identifiers with C linkage are not mangled.  The producer contains
  a built-in <A HREF="man.html#unmangle">name unmangler</A>
  which performs the reverse operation of transforming the mangled form
  of an identifier name back to the underlying identifier.  This can
  be useful when analysing system linker errors. 
  </para>
  <para>
  Note that the type of an identifier forms part of its mangled name
  not only for functions, but also for variables.  Many other compilers
  do not mangle variable names; however, the ISO C++ rules on namespaces
  and variables with C linkage make it necessary (this can be suppressed
  using the <code>-j-n</code> command-line option).  Declaring the language
  linkage of a variable inconsistently can therefore lead to linking
  errors with the C++ producer which are not detected by other compilers.
  A common example is: 
  <programlisting>
        extern int errno ;
  </programlisting>
  which, leaving aside whether <code>errno</code> is actually an external
  variable, should be: 
  <programlisting>
        extern &quot;C&quot; int errno ;
  </programlisting>
  </para>
  <para>
  As described above, the mangled form of an identifier has three components:
  the identifier name, the identifier namespace and the identifier type.
  Two underscores (<code>__</code>) are used to separate the name component
  from the namespace and type components.  The mangling scheme used
  is based on that described in the ARM.  The description below is not
  complete; the mangling and unmangling routines themselves should be
  consulted for a complete description. 
  </para>
  
  <H4>Mangling identifier names</H4>
  <para>
  Simple identifier names are mapped to themselves.  Unicode characters
  of the forms <code>\u</code><I>xxxx</I> and <code>\U</code><I>xxxxxxxx</I>
  are mapped to <code>__k</code><I>xxxx</I> and <code>__K</code><I>xxxxxxxx</I>
  respectively, where the hex digits are output in their canonical lower-case
  form.  Constructors are mapped to <code>__ct</code> and destructors
  to <code>__dt</code>.  Conversion functions are mapped to 
  <code>__op</code><I>type</I> where <I>type</I> is the mangled form
  of the conversion type.  Overloaded operator functions, 
  <code>operator@</code>, are mapped as follows: 
  </para>
  
  <table>
  <tr><th>Operator</th>   <th>Mapping</th>
  <th>Operator</th>   <th>Mapping</th>
  <th>Operator</th>   <th>Mapping</th>
  </tr>
  <tr><td>&amp;</td>  <td>__ad</td>
  <td>&amp;=</td> <td>__aad</td>
  <td>[]</td>  <td>__vc</td>
  </tr>
  <tr><td>-&gt;</td>  <td>__rf</td>
  <td>-&gt;*</td> <td>__rm</td>
  <td>=</td>  <td>__as</td>
  </tr>
  <tr><td>,</td>  <td>__cm</td>
  <td>~</td>  <td>__co</td>
  <td>/</td>  <td>__dv</td>
  </tr>
  <tr><td>/=</td>  <td>__adv</td>
  <td>==</td>  <td>__eq</td>
  <td>()</td>  <td>__cl</td>
  </tr>
  <tr><td>&gt;</td>  <td>__gt</td>
  <td>&gt;=</td>  <td>__ge</td>
  <td>&lt;</td>  <td>__lt</td>
  </tr>
  <tr><td>&lt;=</td>  <td>__le</td>
  <td>&amp;&amp;</td> <td>__aa</td>
  <td>||</td>  <td>__oo</td>
  </tr>
  <tr><td>&lt;&lt;</td> <td>__ls</td>
  <td>&lt;&lt;=</td> <td>__als</td>
  <td>-</td>  <td>__mi</td>
  </tr>
  <tr><td>-=</td>  <td>__ami</td>
  <td>--</td>  <td>__mm</td>
  <td>!</td>  <td>__nt</td>
  </tr>
  <tr><td>!=</td>  <td>__ne</td>
  <td>|</td>  <td>__or</td>
  <td>|=</td>  <td>__aor</td>
  </tr>
  <tr><td>+</td>  <td>__pl</td>
  <td>+=</td>  <td>__apl</td>
  <td>++</td>  <td>__pp</td>
  </tr>
  <tr><td>%</td>  <td>__md</td>
  <td>%=</td>  <td>__amd</td>
  <td>&gt;&gt;</td> <td>__rs</td>
  </tr>
  <tr><td>&gt;&gt;=</td> <td>__ars</td>
  <td>*</td>  <td>__ml</td>
  <td>*=</td>  <td>__aml</td>
  </tr>
  <tr><td>^</td>  <td>__er</td>
  <td>^=</td>  <td>__aer</td>
  <td>delete</td> <td>__dl</td>
  </tr>
  <tr><td>delete []</td> <td>__vd</td>
  <td>new</td>  <td>__nw</td>
  <td>new []</td> <td>__vn</td>
  </tr>
  <tr><td>?:</td>  <td>__cn</td>
  <td>:</td>  <td>__cs</td>
  <td>::</td>  <td>__cc</td>
  </tr>
  <tr><td>.</td>  <td>__df</td>
  <td>.*</td>  <td>__dm</td>
  <td>abs</td>  <td>__ab</td>
  </tr>
  <tr><td>max</td>  <td>__mx</td>
  <td>min</td>  <td>__mn</td>
  <td>sizeof</td> <td>__sz</td>
  </tr>
  <tr><td>typeid</td> <td>__td</td>
  <td>vtable</td> <td>__tb</td>
  <td>-</td>  <td>-</td>
  </tr>
  </table>
  
  <para>
  Note that this table contains a number of operators which are not
  part of C++ or cannot be overloaded in C++.  These are used in the
  representation of target dependent integer constants. 
  </para>
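  <para>
  A lookup over a few rows of the table above might be written as in
  the following sketch.  The function name 
  <code>mangle_operator</code> is hypothetical and only a subset of the
  operators is included; it is not the producer's own mangling routine.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <map>
#include <string>

// Maps an operator spelling to its mangled name, using entries copied from
// the table above.  Returns an empty string for operators not listed here.
std::string mangle_operator(const std::string &op) {
    static const std::map<std::string, std::string> table = {
        {"+", "__pl"},   {"+=", "__apl"}, {"++", "__pp"},
        {"-", "__mi"},   {"-=", "__ami"}, {"--", "__mm"},
        {"!", "__nt"},   {"==", "__eq"},  {"()", "__cl"},
        {"[]", "__vc"},  {"new", "__nw"}, {"delete", "__dl"},
    };
    std::map<std::string, std::string>::const_iterator it = table.find(op);
    return it == table.end() ? std::string() : it->second;
}
```
]]></programlisting>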
  
  <H4>Mangling namespace names</H4>
  <para>
  The global namespace is mapped to an empty string.  Simple namespace
  and class names are mapped as above, but are preceded by a series
  of decimal digits giving the length of the mangled name.  Nested namespaces
  and classes are represented by a sequence of such namespace names,
  preceded by the number of elements in the sequence.  This takes the
  form <code>Q</code><I>digit</I> if there are fewer than 10 elements,
  or <code>Q_</code><I>digits</I><code>_</code> if there are 10 or
  more.  Note that members of anonymous classes or namespaces are local
  to their translation unit, and so do not have external tag names.
  </para>
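  <para>
  The rules for namespace and class names can be sketched as below.
  The function names <code>mangle_simple</code> and 
  <code>mangle_scope</code> are hypothetical.  Two points are
  assumptions rather than statements from this document: a single,
  unnested name is given no <code>Q</code> prefix (which is consistent
  with forms such as <code>1A</code> in the examples later in this
  section), and a sequence of exactly 10 elements uses the multi-digit
  form.  Only simple identifier names are handled.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <vector>

// Length-prefixed form of a simple name, for example "A" becomes "1A".
std::string mangle_simple(const std::string &name) {
    return std::to_string(name.size()) + name;
}

// Mangles a namespace or class name given as a sequence of components,
// outermost first.  An empty sequence stands for the global namespace.
std::string mangle_scope(const std::vector<std::string> &parts) {
    std::string out;
    if (parts.size() >= 10) {
        out = "Q_" + std::to_string(parts.size()) + "_";  // multi-digit count
    } else if (parts.size() > 1) {
        out = "Q" + std::to_string(parts.size());         // single-digit count
    }
    for (const std::string &p : parts) out += mangle_simple(p);
    return out;
}
```
]]></programlisting>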
  
  <H4>Mangling types</H4>
  <para>
  The mangling of types is essentially similar to that used in the 
  <A HREF="dump.html">symbol table dump</A> format.  The type used in
  the mangled name for an identifier ignores the return type for a function
  and ignores the most significant bound for an array. 
  </para>
  <para>
  The built-in types are mapped in precisely the same way as in the
  <A HREF="dump.html#built-in">symbol table dump</A>.  Class and enumeration
  types are mapped to their type names mangled in the same way as the
  namespace names above.  The exception to this is that in a class member,
  the parent class is mapped to <code>X</code>. 
  </para>
  <para>
  The composite types are again mapped in a similar fashion to that
  in the <A HREF="dump.html#composite">dump file</A>.  For example,
  <code>PCc</code> represents <code>const char *</code>.  The only difficult
  case concerns function parameter types where the ARM 
  <code>T</code> and <code>N</code> encodings are used for duplicate
  parameter types.  The function return type is included in the mangled
  form except for function identifier types.  In the cases where the
  identifier is known always to represent a function (constructors,
  destructors etc.) the initial <code>F</code>
  indicating a function type is also omitted. 
  </para>
  <para>
  The types of template functions and classes are represented by the
  underlying template and the template arguments giving rise to the
  instance.  Template classes are preceded by <code>t</code>; template
  functions are preceded by <code>G</code> rather than <code>F</code>.
  Type arguments are represented by <code>Z</code> followed by the type
  value; non-type arguments are represented by the argument type followed
  by the argument value.  In the underlying type the template parameters
  are represented by <code>m0</code>, <code>m1</code> etc. An alternative
  scheme, in which the mangled form of a template function includes
  the type of that instance, rather than the underlying template, can
  be enabled using the <code>-j-f</code>
  command-line option. 
  </para>
  
  <H4><A id="other">Other mangled names</A></H4>
  <para>
  The <A HREF="#vtable">virtual function table</A> for a class, when
  this is a variable with external linkage, is named <code>__vt__</code><I>type</I>,
  where <I>type</I> is the mangled form of the class name.  The
  virtual function table for a base class is named <code>__vt__</code><I>base</I>
  where <I>base</I> is a sequence of mangled class names specifying
  the base class.  The <A HREF="#rtti">run-time type information structure</A>
  for a type, when this is a variable with external linkage, is named
  <code>__ti__</code><I>type</I>, where <I>type</I> is the mangled form
  of the type name. 
  </para>
  
  <H4>Mangled name examples</H4>
  <para>
  The following gives some examples of the name mangling scheme: 
  <programlisting>
        class A {
            static int a ;                      // a__1Ai
        public :
            A () ;                              // __ct__1A
            A ( int ) ;                         // __ct__1Ai
            A ( const A &amp; ) ;                       // __ct__1ARCX
            virtual ~A () ;                     // __dt__1A
            operator bool () ;                  // __opb__1A
            bool operator! () ;                 // __nt__1A
        } ;
  
        // virtual function table       __vt__1A
        // run-time type information    __ti__1A
  
        int f ( A *, int, A * ) ;               // f__FP1AiT1
        int b = 2 ;                             // b__i
        int c [3] ;                             // c__A_i
  
        namespace N {
            int *p = 0 ;                        // p__1NPi
        }
  </programlisting>
  </para>
  </sect3>
  </sect2>
  
  <sect2>
    <title>2.7. Standard library</title>
  <para>
  At present the default implementation contains only a very small fraction
  of the ISO C++ library, namely those headers - 
  <code>&lt;exception&gt;</code>, <code>&lt;new&gt;</code> and 
  <code>&lt;typeinfo&gt;</code> - which are an integral part of the
  language specification.  These headers are also those which require
  the most cooperation between the producer and the library implementation,
  as described in the <A HREF="lib.html">previous section</A>. 
  </para>
  <para>
  It is suggested that if further library components are required then
  they be acquired from third parties.  It should be noted however that
  such libraries may require <A HREF="#porting">some effort</A> to be
  ported to an ISO compliant compiler; for example, some information
  on porting the <code>libio</code> component of <code>libg++</code>,
  which contains some very compiler-dependent code, is 
  <A HREF="#libio">given below</A>.  Libraries compiled with other C++
  compilers may not link correctly with modules compiled using <code>tcc</code>.
  </para>
  
  
  <sect3 id="porting">
    <title>2.7.1. Common porting problems</title>
  <para>
  Experience in porting pre-ISO C++ programs has shown that the following
  new ISO C++ features tend to cause the most problems: 
  <itemizedlist>
  <listitem><A HREF="pragma.html#implicit">Implicit <code>int</code></A> has
  been banned. 
  </listitem>
  <listitem><A HREF="pragma.html#string">String literals are now <code>const</code></A>,
  although in simple assignments the <code>const</code> is
  implicitly removed. 
  </listitem>
  <listitem>The scope of a <A HREF="pragma.html#for">variable declared in
  a for-init-statement</A> is the <code>for</code> statement itself.
  </listitem>
  <listitem><A HREF="lib.html#mangle">Variables have linkage</A> and so should
  be declared <code>extern &quot;C&quot;</code> if appropriate. 
  </listitem>
  <listitem>The standard C library is now declared in the <code>std</code>
  namespace. 
  </listitem>
  <listitem>The <A HREF="pragma.html#template">template compilation model</A>
  has been clarified.  The notation for explicit instantiation and 
  specialisation has changed. 
  </listitem>
  <listitem>Templates are analysed at their point of definition as well as
  their point of instantiation. 
  </listitem>
  <listitem><A HREF="pragma.html#keyword">New keywords</A> have been introduced.
  </listitem>
  </itemizedlist>
  Note that many of these features have controlling <code>#pragma</code>
  directives, so that it is possible to switch to using the pre-ISO
  features. 
  </para>
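  <para>
  Several of the points above can be illustrated in a few lines.  The
  following is only a sketch of ISO-conforming usage; the names
  <code>porting_counter</code> and <code>count_chars</code> are invented
  for the example and do not come from any library discussed here.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstring>

// Variables have linkage in ISO C++: an object shared with C code is
// declared extern "C" so that its name is not mangled.
// (porting_counter is a hypothetical name.)
extern "C" { int porting_counter = 0; }

// Implicit int is banned: the return type must be written out.
int count_chars(const char *s)
{
    // String literals are const, so point at them with const char *.
    // The C library function strlen is reached through namespace std.
    const char *greeting = "hello";
    assert(std::strlen(greeting) == 5);

    int n = 0;
    // i is in scope only inside this for statement ...
    for (int i = 0; s[i] != '\0'; i++) {
        n++;
    }
    // ... so a later loop must declare its own variable.
    for (int i = 0; i < n; i++) {
        porting_counter++;
    }
    return n;
}
```
]]></programlisting>
  <para>
  Pre-ISO code typically fails on the second loop, which would have
  relied on <code>i</code> still being in scope after the first.
  </para>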
  </sect3>  
  
  <sect3 id="libio">
    <title>2.7.2. Porting <code>libio</code></title>
  <para>
  Perhaps the library component which is most likely to be required
  is 
  <code>&lt;iostream&gt;</code>.  A readily available freeware implementation
  of a pre-ISO (i.e. non-template) <code>&lt;iostream&gt;</code>
  package is given by the <code>libio</code> component of <code>libg++</code>.
  This section describes some of the problems encountered in porting
  this package (version 2.7.1).  
  </para>
  <para>
  The <A HREF="man.html"><code>tcc</code> compiler flags</A> used in
  porting <code>libio</code> were: 
  <programlisting>
        tcc -Yposix -Yc++ -sC:cc
  </programlisting>
  indicating that the POSIX API is to be used and that the <code>.cc</code>
  suffix is used to identify C++ source files. 
  </para>
  <para>
  In <code>iostream.h</code>, <code>cin</code>, <code>cout</code>, 
  <code>cerr</code> and <code>clog</code> should be declared with C
  linkage, otherwise the C++ producer includes the type in the 
  <A HREF="lib.html#mangle">mangled name</A> and the fake 
  <code>iostream</code> hacks in <code>stdstream.cc</code> don't work.
  The definition of <code>EOF</code> in this header can cause problems
  if both <code>iostream.h</code> and <code>stdio.h</code> are included.
  In this case <code>stdio.h</code> should be included first. 
  </para>
  <para>
  In <code>stdstream.cc</code>, the <A HREF="lib.html#derive">correct
  definitions</A> for the fake <code>iostream</code> structures are
  as follows: 
  <programlisting>
        struct _fake_istream::myfields {
            _ios_fields *vb ;           // pointer to virtual base class ios
            _IO_ssize_t _gcount ;       // istream fields
            void *vptr ;                // pointer to virtual function table
        } ;
  
        struct _fake_ostream::myfields {
            _ios_fields *vb ;           // pointer to virtual base class ios
            void *vptr ;                // pointer to virtual function table
        } ;
  </programlisting>
  The fake definition macros are then defined as follows: 
  <programlisting>
        #define OSTREAM_DEF( NAME, SBUF, TIE, EXTRA_FLAGS )\
            extern &quot;C&quot; _fake_ostream NAME = { { &amp;NAME.base, 0 }, .... } ;
  
        #define ISTREAM_DEF( NAME, SBUF, TIE, EXTRA_FLAGS )\
            extern &quot;C&quot; _fake_istream NAME = { { &amp;NAME.base, 0, 0 }, .... } ;
  </programlisting>
  Note that these are declared with C linkage as above. 
  </para>
  <para>
  In <code>stdstrbufs.cc</code>, the <A HREF="lib.html#other">correct
  definitions</A> for the virtual function table names are as follows:
  <programlisting>
        #define filebuf_vtable          __vt__7filebuf
        #define stdiobuf_vtable         __vt__8stdiobuf
  </programlisting>
  Note that the <code>_G_VTABLE_LABEL_PREFIX</code> macro is incorrectly
  defined by the configuration process (it should be <code>__vt__</code>),
  but the <code>##</code> directives in which it is used don't work
  on an ISO compliant preprocessor anyway (token concatenation takes
  place after replacement of macro parameters, but before further macro
  expansion). The dummy virtual function tables should also be declared
  with C linkage to suppress name mangling. 
  </para>
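  <para>
  The remark about <code>##</code> can be demonstrated with a small,
  hedged example.  The prefix <code>vt_</code> and all macro names below
  are hypothetical stand-ins rather than the <code>libio</code>
  definitions; the two-level macro shown is the usual ISO idiom for
  forcing expansion before concatenation, and is not necessarily the fix
  applied in the port.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstring>

#define PREFIX vt_   // hypothetical stand-in for a vtable label prefix

// Direct use: ## is applied before PREFIX is macro-expanded, so the
// result is the single token PREFIXfilebuf, not vt_filebuf.
#define DIRECT(name) PREFIX ## name

// Two-level idiom: the arguments of EXPAND_PASTE are fully expanded
// before being handed to PASTE, which does the concatenation.
#define PASTE(a, b) a ## b
#define EXPAND_PASTE(a, b) PASTE(a, b)
#define INDIRECT(name) EXPAND_PASTE(PREFIX, name)

// Stringising helpers, used only to make the resulting tokens visible.
#define STR(x) #x
#define XSTR(x) STR(x)

const char *direct_name() { return XSTR(DIRECT(filebuf)); }
const char *indirect_name() { return XSTR(INDIRECT(filebuf)); }
```
]]></programlisting>
  <para>
  On an ISO compliant preprocessor <code>direct_name</code> yields
  <code>PREFIXfilebuf</code>, whereas <code>indirect_name</code> yields
  <code>vt_filebuf</code>.
  </para>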
  <para>
  In addition, the initialisation of the standard streams relies on
  the file pointers <code>stdout</code> etc. being constant expressions,
  which in general they are not.  The directive:
  <programlisting>
        #pragma TenDRA++ rvalue token as const allow
  </programlisting>
  will cause the C++ producer to assume that all <A HREF="token.html#exp">
  tokenised rvalue expressions</A> are constant.
  </para>
  <para>
  In <code>streambuf.cc</code>, if <code>errno</code> is to be explicitly
  declared it should have C linkage or be declared in the <code>std</code>
  namespace. 
  </para>
  <para>
  In <code>iomanip.cc</code>, the explicit template instantiations should
  be prefixed by <code>template</code>.  The corresponding template
  declarations in <code>iomanip.h</code> should be declared using 
  <A HREF="pragma.html#template"><code>export</code></A> (note that
  the <code>__GNUG__</code> version uses <code>extern</code>, which
  may yet win out over <code>export</code>). 
  </para>
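  <para>
  For reference, the ISO notation for explicit instantiation and explicit
  specialisation looks roughly as follows.  The class template
  <code>holder</code> is invented for this sketch and merely stands in
  for the templates declared in <code>iomanip.h</code>.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Hypothetical template, standing in for those in iomanip.h.
template <class T>
class holder {
public:
    explicit holder(T v) : value_(v) {}
    T get() const { return value_; }
private:
    T value_;
};

// ISO explicit instantiation: the declaration is prefixed by `template`.
template class holder<int>;

// ISO explicit specialisation: the declaration is prefixed by `template <>`.
template <>
class holder<bool> {
public:
    explicit holder(bool v) : value_(v ? 1 : 0) {}
    bool get() const { return value_ != 0; }
private:
    int value_;
};
```
]]></programlisting>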
  </sect3>
  </sect2>
  </sect1>
  
  <sect1>
  <title>
  C++ Producer Guide: Style guide 
  </title>
  
  <sect2>
    <title>3.1. Source code organisation</title>
  <para>
  This section describes the basic organisation of the source code for
  the C++ producer.  This includes the coding conventions applied, the
  application programming interface (API) observed and the division
  of the code into separate modules. 
  </para>
  
  
  <sect3 id="language">
    <title>3.1.1. C coding standard</title>
  <para>
  The C++ producer is written in a subset of C which is compatible with
  C++ (it compiles with most C compilers, but also bootstraps itself).
  It has been written to conform to the local (OSSG) 
  <A HREF="index.html#cstyle">C coding standard</A>; most of the conformance
  checking being automated by use of a 
  <A HREF="pragma.html#usr">user-defined compilation profile</A>, 
  <code>ossg_std.h</code>.  The standard macros described in the coding
  standard are defined in the standard header <code>ossg.h</code>. This
  is included from the header <code>config.h</code> which is included
  by all source files.  The default definitions for these macros, set
  according to the value of <code>__STDC__</code> and other compiler-defined
  macros, should be correct, but they can be overridden by defining
  the <code>FS_*</code> macros, described in the header, as command-line
  options. 
  </para>
  <para>
  The most important of these macros are those used to handle function
  prototypes, enabling both ISO and pre-ISO C compilers to be accommodated.
  Simple function definitions take the form: 
  <programlisting>
        ret function
            PROTO_N ( ( p1, p2, ...., pn ) )
            PROTO_T ( par1 p1 X par2 p2 X .... X parn pn )
        {
            ....
        }
  </programlisting>
  with the <code>PROTO_N</code> macro being used to list the parameter
  names (note the double bracket) and the <code>PROTO_T</code> macro
  being used to list the parameter types using <code>X</code> (cartesian
  product) as a separator.  The corresponding function declaration will
  have the form: 
  <programlisting>
        ret function PROTO_S ( ( par1, par2, ...., parn ) ) ;
  </programlisting>
  A function taking no parameters is defined using: 
  <programlisting>
        ret function
            PROTO_Z ()
        {
            ....
        }
  </programlisting>
  and declared as: 
  <programlisting>
        ret function PROTO_S ( ( void ) ) ;
  </programlisting>
  Functions with ellipses are defined using: 
  <programlisting>
        #if FS_STDARG
        #include &lt;stdarg.h&gt;
        #else
        #include &lt;varargs.h&gt;
        #endif
  
        ret function
            PROTO_V ( ( par1 p1, par2 p2, ...., parn pn, ... ) )
        {
            va_list args ;
            ....
        #if FS_STDARG
            va_start ( args, pn ) ;
        #else
            par1 p1 ;
            par2 p2 ;
            ....
            parn pn ;
            va_start ( args ) ;
            p1 = va_arg ( args, par1 ) ;
            p2 = va_arg ( args, par2 ) ;
            ....
            pn = va_arg ( args, parn ) ;
        #endif
            ....
            va_end ( args ) ;
            ....
        }
  </programlisting>
  and declared as: 
  <programlisting>
        ret function PROTO_W ( ( par1, par2, ...., parn, ... ) ) ;
  </programlisting>
  Note that <code>&lt;varargs.h&gt;</code> does not allow for parameters
  preceding the <code>va_alist</code>, so the fixed parameters need
  to be explicitly assigned from <code>args</code>. 
  </para>
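  <para>
  To make the expansion concrete, the following sketch shows how the
  prototype macros could be defined for an ISO compiler.  These
  definitions are assumptions made for illustration (the real ones are in
  <code>ossg.h</code> and may differ in detail), and the functions
  <code>add</code> and <code>zero</code> are invented.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Assumed ISO-compiler definitions; the real ones live in ossg.h and
// may differ in detail.
#define PROTO_S(types)   types       // declaration: keep the parameter types
#define PROTO_N(names)               // definition: drop the name list
#define PROTO_T(params)  (params)    // definition: keep the typed parameters
#define PROTO_Z()        (void)      // definition with no parameters
#define X                ,           // separator between typed parameters

// Declarations.
int add PROTO_S((int, int));
int zero PROTO_S((void));

// Definitions; with the macros above these expand to
// `int add (int a , int b)` and `int zero (void)`.
int add
    PROTO_N((a, b))
    PROTO_T(int a X int b)
{
    return a + b;
}

int zero
    PROTO_Z()
{
    return 0;
}
```
]]></programlisting>
  <para>
  For a pre-ISO compiler the roles would be reversed: <code>PROTO_N</code>
  would keep the name list and <code>PROTO_T</code> would emit
  old-style parameter declarations, with <code>X</code> expanding to a
  semicolon.
  </para>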
  <para>
  The following <A HREF="pragma.html#keyword">TenDRA keywords</A> are
  defined (with suitable default values for non-TenDRA compilers): 
  <programlisting>
        #pragma TenDRA keyword SET for set
        #pragma TenDRA keyword UNUSED for discard variable
        #pragma TenDRA keyword IGNORE for discard value
        #pragma TenDRA keyword EXHAUSTIVE for exhaustive
        #pragma TenDRA keyword REACHED for set reachable
        #pragma TenDRA keyword UNREACHED for set unreachable
        #pragma TenDRA keyword FALL_THROUGH for fall into case
  </programlisting>
  </para>
  <para>
  Various flags giving properties of the compiler being used are defined
  in <code>ossg.h</code>.  Among the most useful are <code>FS_STDARG</code>,
  which is true if the compiler supports ellipsis functions (see above),
  and <code>FS_STDC_HASH</code>, which is true if the preprocessor supports
  the ISO stringising and concatenation operators.  The macros 
  <code>CONST</code> and <code>VOLATILE</code>, to be used in place
  of 
  <code>const</code> and <code>volatile</code>, are also defined. 
  </para>
  <para>
  A policy of rigorous static program checking is enforced.  The TenDRA
  C producer is applied with the user-defined compilation mode 
  <code>ossg_std.h</code> and intermodule checks enabled.  Checking
  is applied with both the C and <code>#pragma token</code>  
  <A HREF="../utilities/calc.html"><code>calculus</code> output files</A>.
  The C++ producer itself is applied with the same checks.  <code>gcc
  -Wall</code> and various versions of <code>lint</code> are also periodically
  applied. 
  </para>
  </sect3>  
  
  <sect3 id="api">
    <title>3.1.2. API usage and target dependencies</title>
  <para>
  Most of the API features used in the C++ producer are to be found
  in the ISO C API, with just a couple of extensions from POSIX required.
  These POSIX features can be disabled with minimal loss of functionality
  by defining the macro <code>FS_POSIX</code> to be false. 
  </para>
  <para>
  The following features are used from the ISO <code>&lt;stdio.h&gt;</code>
  header: 
  <programlisting>
        BUFSIZ          EOF             FILE            SEEK_SET
        fclose          fflush          fgetc           fgets
        fopen           fprintf         fputc           fputs
        fread           fseek           fwrite          rewind
        sprintf         stderr          stdin           stdout
        vfprintf
  </programlisting>
  from the ISO <code>&lt;stdlib.h&gt;</code> header: 
  <programlisting>
        EXIT_SUCCESS    EXIT_FAILURE    NULL            abort
        exit            free            malloc          realloc
        size_t
  </programlisting>
  and from the ISO <code>&lt;string.h&gt;</code> header: 
  <programlisting>
        memcmp          memcpy          strchr          strcmp
        strcpy          strlen          strncmp         strrchr
  </programlisting>
  The three headers just mentioned are included in all source files
  via the 
  <code>ossg_api.h</code> header file (included by <code>config.h</code>).
  The remaining headers are only included as and when they are needed.
  The following features are used from the ISO <code>&lt;ctype.h&gt;</code>
  header: 
  <programlisting>
        isalpha         isprint
  </programlisting>
  from the ISO <code>&lt;limits.h&gt;</code> header: 
  <programlisting>
        UCHAR_MAX       UINT_MAX        ULONG_MAX
  </programlisting>
  from the ISO <code>&lt;stdarg.h&gt;</code> header: 
  <programlisting>
        va_arg          va_end          va_list         va_start
  </programlisting>
  (note that if <code>FS_STDARG</code> is false the XPG3 
  <code>&lt;varargs.h&gt;</code> header is used instead); and from the
  ISO 
  <code>&lt;time.h&gt;</code> header: 
  <programlisting>
        localtime       time            time_t          struct tm
        tm::tm_hour     tm::tm_mday     tm::tm_min      tm::tm_mon
        tm::tm_sec      tm::tm_year
  </programlisting>
  The following features are used from the POSIX 
  <code>&lt;sys/stat.h&gt;</code> header: 
  <programlisting>
        stat            struct stat     stat::st_dev    stat::st_ino
        stat::st_mtime
  </programlisting>
  The <code>&lt;sys/types.h&gt;</code> header is also included to provide
  the necessary types for <code>&lt;sys/stat.h&gt;</code>. 
  </para>
  <para>
  There are a couple of target dependencies in the producer which can
  be overridden using command-line options: 
  <itemizedlist>
  <listitem>It assumes that if a count of the number of characters read from
  an input file is maintained, then that count value can be used as
  an argument to <code>fseek</code>.  This may not be true on machines
  where the end of line marker consists of both a newline and a carriage
  return.  In this case the <code>-m-f</code> command-line option can
  be used to switch to a slower, but more portable, algorithm for setting
  file positions. 
  </listitem>
  <listitem>It assumes that a file is uniquely determined by the 
  <code>st_dev</code> and <code>st_ino</code> fields of its corresponding
  <code>stat</code> value.  This is used when processing 
  <code>#include</code> directives to prevent a file being read more
  than once if this is not necessary.  This assumption may not be true
  on machines with a small <code>ino_t</code> type which have file systems
  mounted from machines with a larger <code>ino_t</code> type.  In this
  case the <code>-m-i</code> command-line option can be used to disable
  this check. 
  </listitem>
  </itemizedlist>
  </para>
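  <para>
  The second assumption amounts to a comparison of two
  <code>stat</code> values.  A minimal sketch of such a check, using only
  the POSIX features listed above, might look like this;
  <code>same_file</code> is a hypothetical name and is not taken from the
  producer source.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/stat.h>

// Returns true if both paths name the same file, judged (as the text
// describes) by the st_dev and st_ino fields of their stat values.
// same_file is a hypothetical name, not a function from the producer.
bool same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) {
        return false;   // cannot stat one of them: treat as distinct
    }
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```
]]></programlisting>
  <para>
  On a system where <code>st_ino</code> values can be truncated, two
  distinct files could compare equal under this test, which is the
  situation the <code>-m-i</code> option is intended to work around.
  </para>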
  </sect3>  
  
  <sect3 id="src">
    <title>3.1.3. Source code modules</title>
  <para>
  For convenience, the source code is divided between a number of directories:
  <itemizedlist>
  
  <listitem>The base directory contains only the module containing the 
  <code>main</code> function, the basic type descriptions and the 
  <code>Makefile</code>.  
  </listitem>
  <listitem>The directories <code>obj_c</code> and <code>obj_tok</code> contain
  respectively the C and <code>#pragma token</code> headers generated
  from the type algebra by <A HREF="../utilities/calc.html"><code>calculus</code></A>.
  The directory <code>obj_templ</code> contains certain <code>calculus</code>
  template files. 
  </listitem>
  <listitem>The directory <code>utility</code> contains routines for such
  utility operations as memory allocation and error reporting, including
  the <A HREF="error.html">error catalogue</A>. 
  </listitem>
  <listitem>The directory <code>parse</code> contains routines concerned with
  parsing and preprocessing the input, including the 
  <A HREF="../utilities/sid.html"><code>sid</code> grammar</A>. 
  </listitem>
  <listitem>The directory <code>construct</code> contains routines for building
  up and analysing the internal representation of the parsed code. 
  </listitem>
  <listitem>The directory <code>output</code> contains routines for outputting
  the internal representation in various formats including as a 
  <A HREF="tdf.html">TDF capsule</A>, a <A HREF="link.html">C++ spec
  file</A>, or a <A HREF="dump.html">symbol table dump file</A>. 
  </listitem>
  </itemizedlist>
  </para>
  <para>
  Each module consists of a C source file, <code><I>file</I>.c</code>
  say, containing function definitions, and a corresponding header file
  <code><I>file</I>.h</code> containing the declarations of these functions.
  The header is included within its corresponding source file to check
  these declarations; it is protected against multiple inclusions by
  a macro of the form <code><I>FILE</I>_INCLUDED</code>. The header
  contains a brief comment describing the purpose of the module; each
  function in the source file contains a comment describing its purpose,
  its inputs and its output. 
  </para>
  <para>
  The following table lists all the source modules in the C++ producer
  with a brief description of the purpose of each: 
  </para>
  <para>
  
  <table>
  <tr><th>Module</th> <th>Directory</th>
  <th>Purpose</th>
  </tr>
  <tr><td>access</td> <td>construct</td>
  <td>member access control</td>
  </tr>
  <tr><td>allocate</td> <td>construct</td>
  <td><code>new</code> and <code>delete</code> expressions</td>
  </tr>
  <tr><td>assign</td> <td>construct</td>
  <td>assignment expressions</td>
  </tr>
  <tr><td>basetype</td> <td>construct</td>
  <td>basic type operations</td>
  </tr>
  <tr><td>buffer</td> <td>utility</td>
  <td>buffer reading and writing routines</td>
  </tr>
  <tr><td>c_class</td> <td>obj_c</td>
  <td><code>calculus</code> support routines</td>
  </tr>
  <tr><td>capsule</td> <td>output</td>
  <td>top-level TDF encoding routines</td>
  </tr>
  <tr><td>cast</td> <td>construct</td>
  <td>cast expressions</td>
  </tr>
  <tr><td>catalog</td> <td>utility</td>
  <td>error catalogue definition</td>
  </tr>
  <tr><td>char</td> <td>parse</td>
  <td>character sets</td>
  </tr>
  <tr><td>check</td> <td>construct</td>
  <td>expression checking</td>
  </tr>
  <tr><td>chktype</td> <td>construct</td>
  <td>type checking</td>
  </tr>
  <tr><td>class</td> <td>construct</td>
  <td>class and enumeration definitions</td>
  </tr>
  <tr><td>compile</td> <td>output</td>
  <td>TDF tag definition encoding routines</td>
  </tr>
  <tr><td>constant</td> <td>parse</td>
  <td>integer constant evaluation</td>
  </tr>
  <tr><td>construct</td> <td>construct</td>
  <td>constructors and destructors</td>
  </tr>
  <tr><td>convert</td> <td>construct</td>
  <td>standard type conversions</td>
  </tr>
  <tr><td>copy</td> <td>construct</td>
  <td>expression copying</td>
  </tr>
  <tr><td>debug</td> <td>utility</td>
  <td>development aids</td>
  </tr>
  <tr><td>declare</td> <td>construct</td>
  <td>variable and function declarations</td>
  </tr>
  <tr><td>decode</td> <td>output</td>
  <td>bitstream reading routines</td>
  </tr>
  <tr><td>derive</td> <td>construct</td>
  <td>base class graphs; inherited members</td>
  </tr>
  <tr><td>destroy</td> <td>construct</td>
  <td>garbage collection routines</td>
  </tr>
  <tr><td>diag</td> <td>output</td>
  <td>TDF diagnostic output routines</td>
  </tr>
  <tr><td>dump</td> <td>output</td>
  <td>symbol table dump routines</td>
  </tr>
  <tr><td>encode</td> <td>output</td>
  <td>bitstream writing routines</td>
  </tr>
  <tr><td>error</td> <td>utility</td>
  <td>error output routines</td>
  </tr>
  <tr><td>exception</td> <td>construct</td>
  <td>exception handling</td>
  </tr>
  <tr><td>exp</td> <td>output</td>
  <td>TDF expression encoding routines</td>
  </tr>
  <tr><td>expression</td> <td>construct</td>
  <td>expression processing</td>
  </tr>
  <tr><td>file</td> <td>parse</td>
  <td>low-level I/O routines</td>
  </tr>
  <tr><td>function</td> <td>construct</td>
  <td>function definitions and calls</td>
  </tr>
  <tr><td>hash</td> <td>parse</td>
  <td>hash table and identifier name routines</td>
  </tr>
  <tr><td>identifier</td> <td>construct</td>
  <td>identifier expressions</td>
  </tr>
  <tr><td>init</td> <td>output</td>
  <td>TDF initialiser expression encoding routines</td>
  </tr>
  <tr><td>initialise</td> <td>construct</td>
  <td>variable initialisers</td>
  </tr>
  <tr><td>instance</td> <td>construct</td>
  <td>template instances and specialisations</td>
  </tr>
  <tr><td>inttype</td> <td>construct</td>
  <td>integer and floating point type routines</td>
  </tr>
  <tr><td>label</td> <td>construct</td>
  <td>labels and jumps</td>
  </tr>
  <tr><td>lex</td> <td>parse</td>
  <td>lexical analysis</td>
  </tr>
  <tr><td>literal</td> <td>parse</td>
  <td>integer and string literals</td>
  </tr>
  <tr><td>load</td> <td>output</td>
  <td>C++ spec reading routines</td>
  </tr>
  <tr><td>macro</td> <td>parse</td>
  <td>macro expansion</td>
  </tr>
  <tr><td>main</td> <td>-</td>
  <td>main routine; command-line arguments</td>
  </tr>
  <tr><td>mangle</td> <td>output</td>
  <td>identifier name mangling</td>
  </tr>
  <tr><td>member</td> <td>construct</td>
  <td>member selector expressions</td>
  </tr>
  <tr><td>merge</td> <td>construct</td>
  <td>intermodule merge routines</td>
  </tr>
  <tr><td>namespace</td> <td>construct</td>
  <td>namespaces; name look-up</td>
  </tr>
  <tr><td>operator</td> <td>construct</td>
  <td>overloaded operators</td>
  </tr>
  <tr><td>option</td> <td>utility</td>
  <td>compiler options</td>
  </tr>
  <tr><td>overload</td> <td>construct</td>
  <td>overload resolution</td>
  </tr>
  <tr><td>parse</td> <td>parse</td>
  <td>low-level parser routines</td>
  </tr>
  <tr><td>pragma</td> <td>parse</td>
  <td><code>#pragma</code> directives</td>
  </tr>
  <tr><td>predict</td> <td>parse</td>
  <td>parser look-ahead routines</td>
  </tr>
  <tr><td>preproc</td> <td>parse</td>
  <td>preprocessing directives</td>
  </tr>
  <tr><td>print</td> <td>utility</td>
  <td>error argument printing routines</td>
  </tr>
  <tr><td>quality</td> <td>construct</td>
  <td>extra expression checks</td>
  </tr>
  <tr><td>redeclare</td> <td>construct</td>
  <td>variable and function redeclarations</td>
  </tr>
  <tr><td>rewrite</td> <td>construct</td>
  <td>inline member function definitions</td>
  </tr>
  <tr><td>save</td> <td>output</td>
  <td>C++ spec writing routines</td>
  </tr>
  <tr><td>shape</td> <td>output</td>
  <td>TDF shape encoding routines</td>
  </tr>
  <tr><td>statement</td> <td>construct</td>
  <td>statement processing</td>
  </tr>
  <tr><td>stmt</td> <td>output</td>
  <td>TDF statement encoding routines</td>
  </tr>
  <tr><td>struct</td> <td>output</td>
  <td>TDF structure encoding routines</td>
  </tr>
  <tr><td>syntax[0-9]*</td> <td>parse</td>
  <td><code>sid</code> parser output</td>
  </tr>
  <tr><td>system</td> <td>utility</td>
  <td>system dependent routines</td>
  </tr>
  <tr><td>table</td> <td>parse</td>
  <td>portability table reading</td>
  </tr>
  <tr><td>template</td> <td>construct</td>
  <td>template declarations and checks</td>
  </tr>
  <tr><td>throw</td> <td>output</td>
  <td>TDF exception handling encoding routines</td>
  </tr>
  <tr><td>tok</td> <td>output</td>
  <td>TDF standard tokens encoding</td>
  </tr>
  <tr><td>tokdef</td> <td>construct</td>
  <td>token definitions</td>
  </tr>
  <tr><td>token</td> <td>construct</td>
  <td>token declarations and expansion</td>
  </tr>
  <tr><td>typeid</td> <td>construct</td>
  <td>run-time type information</td>
  </tr>
  <tr><td>unmangle</td> <td>output</td>
  <td>identifier name unmangling</td>
  </tr>
  <tr><td>variable</td> <td>construct</td>
  <td>variable analysis</td>
  </tr>
  <tr><td>virtual</td> <td>construct</td>
  <td>virtual functions</td>
  </tr>
  <tr><td>xalloc</td> <td>utility</td>
  <td>memory allocation routines</td>
  </tr>
  </table>
  </para>
  </sect3>
  </sect2>
  
  <sect2>
    <title>3.2. Type system</title>
  <para>
  This section describes the type system used in the C++ producer. Unless
  otherwise stated the types are declared using the 
  <A HREF="../utilities/calc.html"><code>calculus</code> tool</A> as
  part of the algebra, <code>c_class.alg</code>.  The design of this
  type algebra was clearly based largely on the concepts underlying
  the C++ language; however TDF provided an important influence, not
  merely as the intended target language, but also because of its clear
  presentation of essential language features. 
  </para>
  
  
  <sect3 id="primitive">
    <title>3.2.1. Primitive types</title>
  <para>
  The primitive types used within the algebra <code>c_class</code> are
  defined as follows: 
  <programlisting>
        int = &quot;int&quot; ;
        unsigned = &quot;unsigned&quot; ;
        string = &quot;character *&quot; ;
        ulong_type (ulong) = &quot;unsigned long&quot; ;
        BITSTREAM_P (bits) = &quot;BITSTREAM *&quot; ;
        PPTOKEN_P (pptok) = &quot;PPTOKEN *&quot; ;
  </programlisting>
  The integral types are self-explanatory.  All string literals used
  in the C++ producer are based on the character type: 
  <programlisting>
        typedef unsigned char character ;
  </programlisting>
  hence the definition of <code>string</code>.  The remaining primitives
  give links to those portions of the type system which are defined
  outside of the algebra.  The types <A HREF="#bits"><code>BITSTREAM</code></A>
  and <A HREF="#pptok"><code>PPTOKEN</code></A> are described below.
  </para>
  </sect3>  
  
  <sect3 id="cv">
    <title>3.2.2. <code>CV_SPEC</code></title>
  <para>
  The enumeration type <code>CV_SPEC</code> (short name <code>cv</code>)
  is used to represent a C++ type qualifier.  It takes the form of a
  bitfield, the elements of which can be or-ed together to represent
  combinations of type qualifiers.  The cv-qualifiers are represented
  by <code>cv_const</code> and <code>cv_volatile</code> in the obvious
  manner.  The value <code>cv_lvalue</code> is used as a qualifier to
  indicate whether a type is an lvalue or an rvalue.  Other values are
  used in function types to represent the function language linkage.
  </para>
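  <para>
  The bitfield style of enumeration can be sketched as follows.  The
  enumerator names are those mentioned above, but the bit values,
  <code>cv_none</code> and the two helper functions are assumptions made
  for the example; the real values are generated from the algebra.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Hypothetical bit values; the real ones are generated from c_class.alg.
typedef unsigned CV_SPEC;
const CV_SPEC cv_none     = 0x0;
const CV_SPEC cv_const    = 0x1;
const CV_SPEC cv_volatile = 0x2;
const CV_SPEC cv_lvalue   = 0x4;

// Test whether every qualifier in `want` is present in `cv`.
bool has_qual(CV_SPEC cv, CV_SPEC want)
{
    return (cv & want) == want;
}

// Strip the lvalue marker, leaving only the cv-qualifiers.
CV_SPEC rvalue_qual(CV_SPEC cv)
{
    return cv & (cv_const | cv_volatile);
}
```
]]></programlisting>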
  </sect3>  
  
  <sect3 id="ntype">
    <title>3.2.3. <code>BUILTIN_TYPE</code></title>
  <para>
  The enumeration type <code>BUILTIN_TYPE</code> (<code>ntype</code>)
  is used to represent the built-in C++ types (<code>char</code>, 
  <code>float</code>, <code>void</code> etc.).  It is used chiefly as
  an index into tables of type information. 
  </para>
  </sect3>  
  
  <sect3 id="btype">
    <title>3.2.4. <code>BASE_TYPE</code></title>
  <para>
  The enumeration type <code>BASE_TYPE</code> (<code>btype</code>) is
  used to represent a C++ simple type specifier such as <code>signed</code>,
  <code>short</code> or <code>int</code>.  It takes the form of a bitfield,
  the elements of which can be or-ed together to represent combinations
  of type specifiers.  Its chief use is when reading a type from the
  input file; the various simple type specifiers are combined to give
  a value of this type, which is then mapped to an actual <A HREF="#type">C++
  type</A>. 
  </para>
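  <para>
  As an illustration of the mapping step, the sketch below combines
  specifier bits and reduces them to a canonical type name.  All of the
  enumerator names, bit values and the function
  <code>canonical_type</code> are hypothetical; the producer maps to a
  <code>TYPE</code> object rather than to a string, and handles many
  more specifiers.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstring>

// Hypothetical specifier bits; names and values are illustrative only.
typedef unsigned BASE_TYPE;
const BASE_TYPE btype_none     = 0x00;
const BASE_TYPE btype_signed   = 0x01;
const BASE_TYPE btype_unsigned = 0x02;
const BASE_TYPE btype_short    = 0x04;
const BASE_TYPE btype_long     = 0x08;
const BASE_TYPE btype_int      = 0x10;

// Map a combination of specifiers to a canonical type name, returning
// a null pointer for combinations this sketch rejects.
const char *canonical_type(BASE_TYPE bt)
{
    if (bt == btype_none) return 0;
    if ((bt & btype_signed) && (bt & btype_unsigned)) return 0;
    if ((bt & btype_short) && (bt & btype_long)) return 0;

    // `int` and `signed` are redundant once a size or sign is given,
    // so only the unsigned, short and long bits select the result.
    bool is_unsigned = (bt & btype_unsigned) != 0;
    if (bt & btype_short) return is_unsigned ? "unsigned short" : "short";
    if (bt & btype_long) return is_unsigned ? "unsigned long" : "long";
    return is_unsigned ? "unsigned int" : "int";
}
```
]]></programlisting>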
  </sect3>  
  
  <sect3 id="itype">
    <title>3.2.5. <code>INT_TYPE</code></title>
  <para>
  The union type <code>INT_TYPE</code> (<code>itype</code>) is used
  to represent an integral or bitfield C++ type.  The basic integral
  types are given by the <code>basic</code> field.  Bitfield types are
  represented by the <code>bitfield</code> field.  There are also fields
  representing target dependent integral promotion, arithmetic and integer
  literal types, plus <code>VARIETY</code> tokens.  Only one <code>INT_TYPE</code>
  object is created for each integral type. 
  </para>
  </sect3>  
  
  <sect3 id="ftype">
    <title>3.2.6. <code>FLOAT_TYPE</code></title>
  <para>
  The union type <code>FLOAT_TYPE</code> (<code>ftype</code>) is used
  to represent a floating point C++ type.  The basic floating point
  types are given by the <code>basic</code> field.  There are also fields
  representing target dependent argument promotion and arithmetic types,
  plus <code>FLOAT</code> tokens.  Only one <code>FLOAT_TYPE</code>
  object is created for each floating point type. 
  </para>
  </sect3>  
  
  <sect3 id="cinfo">
    <title>3.2.7. <code>CLASS_INFO</code></title>
  <para>
  The enumeration type <code>CLASS_INFO</code> (<code>cinfo</code>)
  is used to represent information relating to a class or enumeration
  definition.  It takes the form of a bitfield, the elements of which
  can be or-ed together to represent various combinations of properties.
  </para>
  </sect3>  
  
  <sect3 id="cusage">
    <title>3.2.8. <code>CLASS_USAGE</code></title>
  <para>
  The enumeration type <code>CLASS_USAGE</code> (<code>cusage</code>)
  is used to represent information relating to the way a class is used.
  It takes the form of a bitfield, the elements of which can be or-ed
  together to represent various combinations of properties. 
  </para>
  </sect3>  
  
  <sect3 id="ctype">
    <title>3.2.9. <code>CLASS_TYPE</code></title>
  <para>
  The union type <code>CLASS_TYPE</code> (<code>ctype</code>) is used
  to represent a C++ class or union.  The main components are an 
  <A HREF="#id">identifier</A> giving the class name, 
  <A HREF="#cinfo">class information</A> and <A HREF="#cusage">class
  usage</A> fields, a <A HREF="#nspace">namespace</A> giving the class
  members, a <A HREF="#graph">graph</A> representing the base class
  structure, and a <A HREF="#virt">virtual function table</A>.  Only
  one 
  <code>CLASS_TYPE</code> object is created for each class or union.
  </para>
  <para>
  Each class maintains a list, <code>pals</code>, of class and function
  identifiers which are declared as friends of that class.  It also
  maintains a list, <code>chums</code>, of those class types which declare
  it to be a friend (this is what is actually used in the access checks).
  Similarly each function identifier maintains a list, 
  <code>chums</code>, of those class types which declare it to be a
  friend. 
  </para>
  <para>
  Each class maintains a list of its constructors, destructors and conversion
  functions (including inherited conversion functions).  It also maintains
  a list of its virtual base classes.  This information can be obtained
  by other means but it is more convenient to record it within the class
  type itself. 
  </para>
  </sect3>  
  
  <sect3 id="graph">
    <title>3.2.10. <code>GRAPH</code></title>
  <para>
  The union type <code>GRAPH</code> (<code>graph</code>) is used to
  represent a directed acyclic graph arising from the base classes of
  a class.  Each node of the graph has a <code>head</code> which is
  a 
  <A HREF="#ctype">class type</A>, and several <code>tails</code> which
  give the base class graphs for that class.  Each node has pointers,
  <code>top</code>, to the top of the graph (i.e. the most derived class),
  and <code>up</code>, to the node of which the current node is a direct
  base.  Each node also has an <code>access</code> field which gives
  information on the base access, whether it is virtual or not, and
  so on, in the form of a <A HREF="#dspec"><code>DECL_SPEC</code></A>.
  Virtual bases are handled by the <code>equal</code> field which defines
  an equivalence relation on the graph which identifies equivalent virtual
  bases.  
  </para>
  </sect3>  
  
  <sect3 id="virt">
    <title>3.2.11. <code>VIRTUAL</code></title>
  <para>
  The union type <code>VIRTUAL</code> (<code>virt</code>) is used to
  represent the virtual functions declared in a class.  The <code>table</code>
  field is used to represent a virtual function table, and consists
  primarily of a list of <code>VIRTUAL</code> objects giving the virtual
  functions for the associated class.  These virtual functions are of
  four kinds, each represented by a union field.  A virtual function
  first declared in a class is represented by the <code>simple</code>
  field; a virtual function in a class which overrides an inherited
  virtual function is represented by the <code>override</code> field;
  an inherited, non-overridden virtual function which is not overridden
  in a base class is represented by the 
  <code>inherit</code> field; an inherited, non-overridden virtual function
  which is overridden in some base class is represented by the 
  <code>complex</code> field. 
  </para>
  </sect3>  
  
  <sect3 id="etype">
    <title>3.2.12. <code>ENUM_TYPE</code></title>
  <para>
  The union type <code>ENUM_TYPE</code> (<code>etype</code>) is used
  to represent a C++ enumeration type.  This consists primarily of an
  <A HREF="#id">identifier</A> giving the enumeration name, a 
  <A HREF="#cinfo">class information</A> field, a <A HREF="#type">type</A>
  giving the underlying representation of the enumeration type, and
  a list of <A HREF="#id">identifiers</A> giving the enumerators comprising
  the enumeration. 
  </para>
  </sect3>  
  
  <sect3 id="type">
    <title>3.2.13. <code>TYPE</code></title>
  <para>
  The union type <code>TYPE</code> (<code>type</code>) is used to represent
  a C++ type.  Every type has an associated <A HREF="#cv">type qualifier</A>,
  <code>qual</code>, which determines whether the type is 
  <code>const</code>, <code>volatile</code> or an lvalue.  A type may
  also have an associated <A HREF="#id">identifier</A>, <code>name</code>,
  giving the corresponding type name (the null identifier being used
  for unnamed types).  The other type components are determined by the
  union tag.  Each of the type constructs above has a corresponding
  field in the <code>TYPE</code> union: 
  <code>integer</code> for <A HREF="#itype">integral types</A>, 
  <code>floating</code> for <A HREF="#ftype">floating point types</A>,
  <code>bitfield</code> for <A HREF="#itype">bitfield types</A>, 
  <code>compound</code> for <A HREF="#ctype">class or union types</A>,
  and 
  <code>enumerate</code> for <A HREF="#etype">enumeration types</A>.
  There are also fields <code>top</code> and <code>bottom</code>
  corresponding to <code>void</code> and bottom (the type used to represent
  expressions which never return). 
  </para>
  <para>
  Other fields of the <code>TYPE</code> union represent composite types;
  for example, the <code>array</code> field, representing array types,
  comprises a base type, <code>sub</code>, and an <A HREF="#nat">integer
  constant</A> giving the array bound, <code>size</code>.  These are
  generally simple, apart from <code>func</code>, representing a function
  type.  This has the obvious components: a return type, <code>ret</code>,
  a list of parameter types, <code>ptypes</code>, and a flag indicating
  ellipsis functions, <code>ellipsis</code>.  It also has an associated
  <A HREF="#nspace">namespace</A>, <code>pars</code>, in which the function
  parameters are declared.  The parameter identifiers are extracted
  from this as a list, <code>pids</code>.  Member function qualifiers
  and language linkage information are represented by a 
  <A HREF="#cv"><code>CV_QUAL</code></A>, <code>mqual</code>.  The implicit
  extra parameter for member functions is recorded in the list 
  <code>mtypes</code>, which adds this extra type to the start of 
  <code>ptypes</code>.  Finally <code>except</code> gives any exception
  specifiers; the case where the exception specifier is absent being
  represented by the special value, <code>univ_type_set</code>. 
  </para>
  </sect3>  
  
  <sect3 id="dspec">
    <title>3.2.14. <code>DECL_SPEC</code></title>
  <para>
  The enumeration type <code>DECL_SPEC</code> (<code>dspec</code>) is
  used to represent information on the declaration and usage of an identifier.
  It takes the form of a bitfield, the elements of which can be or-ed
  together to represent various combinations of properties.  The 32
  bits in this bitfield (the maximum which can be represented portably)
  are a significant restriction.  This means that the same member of
  <code>DECL_SPEC</code> is often used to mean different things in different
  contexts.  This can prove confusing on occasions. 
  </para>
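  <para>
  As a minimal sketch of this kind of or-ed bitfield, the following
  fragment packs a few properties into a 32 bit value.  The flag names
  and helper functions are hypothetical; the real members of
  <code>DECL_SPEC</code> are not listed in this section.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag names, one bit each, at most 32 in total.
typedef std::uint32_t DeclSpec;
const DeclSpec ds_none    = 0;
const DeclSpec ds_static  = DeclSpec(1) << 0;
const DeclSpec ds_extern  = DeclSpec(1) << 1;
const DeclSpec ds_defined = DeclSpec(1) << 2;
const DeclSpec ds_used    = DeclSpec(1) << 3;

// True if every bit in 'mask' is set in 'ds'.
inline bool has_all(DeclSpec ds, DeclSpec mask) { return (ds & mask) == mask; }

// Returns 'ds' with the bits in 'mask' added.
inline DeclSpec with_bits(DeclSpec ds, DeclSpec mask) { return ds | mask; }
```
]]></programlisting>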
  </sect3>  
  
  <sect3 id="hashid">
    <title>3.2.15. <code>HASHID</code></title>
  <para>
  The union type <code>HASHID</code> (<code>hashid</code>) is used to
  represent a C++ identifier name.  The simplest form of identifier
  name, 
  <code>name</code>, consists of just a string of characters, such as
  <code>foo</code>.  Extended identifier names, <code>ename</code>,
  are similar, but may contain Unicode characters.  There are however
  other forms of identifier name in C++: conversion function names
  (<code>conv</code>) such as <code>operator int</code>, overloaded operator names
  (<code>op</code>) such as <code>operator+</code>, constructor names
  (<code>constr</code>), and destructor names (<code>destr</code>).
  There are also names which are used for anonymous identifiers (<code>anon</code>).
  </para>
  <para>
  Note the distinction between an identifier name and an actual 
  <A HREF="#id">identifier</A>.  The latter is a meaning associated
  with a name in a particular context.  Every identifier name has an
  associated underlying meaning, <code>id</code>.  This is used to handle
  keywords and macros, but for most identifier names this will be a
  dummy identifier. Nested underlying meanings (such as a macro hiding
  a keyword) are handled by linking the <code>alias</code> fields of
  the corresponding identifiers.  Every identifier name also has a
  <code>cache</code> field which is used to record the look-up of this name as
  an unqualified identifier.  This may be set to the null identifier
  to indicate that the look-up needs to be re-evaluated. 
  </para>
  <para>
  Identifier names are stored in one of a small number of hash tables,
  linked using their <code>next</code> field.  Each name has only one
  entry in these tables, allowing equality of names to be implemented
  as <code>EQ_hashid</code>. 
  </para>
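  <para>
  The idea of one table entry per name, with equality reduced to a
  pointer comparison, can be sketched as follows.  This is not the
  producer's code: it uses <code>std::unordered_map</code> in place of
  the hand-written hash tables, and the names <code>Name</code>,
  <code>NameTable</code> and <code>eq_name</code> are invented for the
  example.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an identifier name.
struct Name { std::string text; };

class NameTable {
    std::unordered_map<std::string, std::unique_ptr<Name>> table_;
public:
    // Returns the unique entry for 'text', creating it on first use.
    const Name *lookup(const std::string &text) {
        std::unique_ptr<Name> &slot = table_[text];
        if (!slot) slot.reset(new Name{text});
        return slot.get();
    }
};

// Because each name has exactly one entry, equality is pointer equality.
inline bool eq_name(const Name *a, const Name *b) { return a == b; }
```
]]></programlisting>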
  </sect3>  
  
  <sect3 id="qual">
    <title>3.2.16. <code>QUALIFIER</code></title>
  <para>
  The enumeration type <code>QUALIFIER</code> (<code>qual</code>) is
  used to represent the various ways in which an identifier name can
  be qualified.  For example, <code>::A::a</code> is represented by
  <code>qual_full</code>.  The value <code>qual_mark</code> is used
  in the representation of function identifier expressions to indicate
  that overload resolution has been performed. 
  </para>
  </sect3>  
  
  <sect3 id="identifier">
    <title>3.2.17. <code>IDENTIFIER</code></title>
  <para>
  The union type <code>IDENTIFIER</code> (<code>id</code>) is used to
  represent the various kinds of C++ identifiers.  Every identifier
  has an associated <A HREF="#hashid">identifier name</A>, a parent
  <A HREF="#nspace">namespace</A>, a <A HREF="#dspec">declaration information</A>
  field, and a <A HREF="#loc">location</A> for its declaration or definition.
  Each identifier also has an 
  <code>alias</code> field which is normally used to represent the aliasing
  which can occur in inheritance or <code>using</code>
  declarations. 
  </para>
  <para>
  The various fields of the <code>IDENTIFIER</code> union correspond
  to the various kinds of identifier which can arise in C++ - class
  names, functions, variables, class members, macros, keywords etc.
  Each field has appropriate components giving its type, its definition
  or whatever other information is required.  For example, the
  <code>variable</code> field has a <A HREF="#type">type</A> and two
  <A HREF="#exp">expressions</A>,
  giving the constructor and destructor values for the object. 
  </para>
  <para>
  Most of these identifier components are self-explanatory; however,
  the treatment of overloaded functions bears discussion.  The various
  fields representing functions have an <code>over</code> component
  which is used to link overloaded functions together.  A set of overloaded
  functions is treated as if it were a single <code>IDENTIFIER</code>
  - the first in the list - for the purposes of storing in a <A HREF="#member">namespace
  member</A>; the other overloaded meanings are accessed by chasing
  down the <code>over</code> components.  In other situations, whether
  a function identifier represents a single function or a set of overloaded
  functions can be worked out from the context.  For example, in identifier
  expressions the <A HREF="#qual">identifier qualifier</A> is used to
  mark whether overload resolution has taken place. 
  </para>
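  <para>
  Chasing down the <code>over</code> components amounts to walking a
  singly linked list.  The sketch below assumes a much-reduced,
  hypothetical function identifier in which a parameter count stands in
  for the function type; only the <code>over</code> link is taken from
  the description above.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, much-reduced stand-in for a function identifier.
struct FuncId {
    int nparams;     // stands in for the function type
    FuncId *over;    // next overloaded function, or null
};

// Counts the functions in the overload set headed by 'id'.
inline std::size_t overload_count(const FuncId *id) {
    std::size_t n = 0;
    for (; id != nullptr; id = id->over) ++n;
    return n;
}

// Finds the first overload taking 'nparams' parameters, or null.
inline const FuncId *find_by_arity(const FuncId *id, int nparams) {
    for (; id != nullptr; id = id->over)
        if (id->nparams == nparams) return id;
    return nullptr;
}
```
]]></programlisting>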
  </sect3>  
  
  <sect3 id="member">
    <title>3.2.18. <code>MEMBER</code></title>
  <para>
  The union type <code>MEMBER</code> (<code>member</code>) is used to
  represent a member of a <A HREF="#nspace">namespace</A>.  Each member
  contains two identifiers, <code>id</code> and <code>alt</code>.  The
  <code>id</code> field gives the meaning associated with a particular
  name in this namespace; the <code>alt</code> field is used to represent
  a type name which may be hidden by a non-type name. 
  </para>
  <para>
  There are two kinds of member, <code>small</code> and <code>large</code>,
  corresponding to whether the namespace holds its members in a simple
  linked list or in a hash table. 
  </para>
  </sect3>  
  
  <sect3 id="nspace">
    <title>3.2.19. <code>NAMESPACE</code></title>
  <para>
  The union type <code>NAMESPACE</code> (<code>nspace</code>) is used
  to represent the set of identifiers declared in a particular scope.
  For example, the members declared in a C++ class or namespace, the
  parameters declared in a function declarator and the local variables
  declared in a block all form scopes.  The various kinds of scope are
  distinguished as different fields of the union, but there are basically
  two categories.  Scopes in the first category, such as function blocks,
  have relatively small numbers of elements and store their members in a
  simple linked list.  Scopes in the second, such as classes, have larger
  numbers of elements and store their members in hash tables.  In both
  cases the elements
  are stored using the <A HREF="#member"><code>MEMBER</code></A>
  type. 
  </para>
  <para>
  The key operation on a namespace is to look up a particular 
  <A HREF="#hashid">identifier name</A> in its linked list or hash table
  of members to find the meaning, if any, associated with that name
  in the namespace.  This can be a complex operation because of the
  need to take base classes and <code>using</code> directives (as stored
  in the <code>use</code> component) into account. 
  </para>
  </sect3>  
  
  <sect3 id="nat">
    <title>3.2.20. <code>NAT</code></title>
  <para>
  The union type <code>NAT</code> (<code>nat</code>) is used to represent
  an integer constant expression.  Values are represented as lists of
  16 bit 'digits'.  Values which fit into a single digit are represented
  by the <code>small</code> field; larger values by the <code>large</code>
  field.  Negated values can be represented by the <code>neg</code>
  field.  Folding of integer constant expressions is performed in the
  producer; however, the result can only be represented as described
  above if its value is target independent.  Target dependent values
  are represented by the <code>calc</code> field which contains an 
  <A HREF="#exp">expression</A> describing how to calculate the value.
  The <code>token</code> field is used to represent <code>NAT</code>
  tokens. 
  </para>
  <para>
  Objects representing small integer constants are created at the start
  of the program and stored in a table for ease of access.  Larger constants
  are created as and when they are required. 
  </para>
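  <para>
  A representation as a list of 16 bit digits can be sketched as below.
  The digit order (least significant first) and the function names are
  assumptions made for the example; the section above does not specify
  either.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Splits 'value' into 16 bit digits, least significant first (assumed order).
inline std::vector<std::uint16_t> to_digits(std::uint64_t value) {
    std::vector<std::uint16_t> digits;
    do {
        digits.push_back(static_cast<std::uint16_t>(value & 0xFFFFu));
        value >>= 16;
    } while (value != 0);
    return digits;
}

// Rebuilds the value from its digits (at most four digits fit in 64 bits).
inline std::uint64_t from_digits(const std::vector<std::uint16_t> &digits) {
    std::uint64_t value = 0;
    for (std::size_t i = digits.size(); i-- > 0; )
        value = (value << 16) | digits[i];
    return value;
}

// In this sketch a value is "small" if it needs only a single digit.
inline bool is_small(std::uint64_t value) { return to_digits(value).size() == 1; }
```
]]></programlisting>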
  </sect3>  
  
  <sect3 id="flt">
    <title>3.2.21. <code>FLOAT</code></title>
  <para>
  The union type <code>FLOAT</code> (<code>flt</code>) is used to represent
  a floating point constant expression.  There is only one field,
  <code>simple</code>, which corresponds to a floating point literal.  No folding
  of floating point constant expressions is attempted in the producer
  (it is virtually impossible to do so in a target independent manner).
  </para>
  <para>
  Objects representing useful floating point constants (0.0, 1.0 etc.)
  are created for each floating point type and stored as part of the
  corresponding <A HREF="#ftype"><code>FLOAT_TYPE</code></A>.  Other
  values are created as and when they are required. 
  </para>
  </sect3>  
  
  <sect3 id="str">
    <title>3.2.22. <code>STRING</code></title>
  <para>
  The union type <code>STRING</code> (<code>str</code>) is used to represent
  a string constant expression.  There is only one field, 
  <code>simple</code>, which corresponds to a character string literal;
  however, the <code>kind</code> field can be used to modify the interpretation
  put on the characters appearing in the <code>text</code>
  field.  By default, each character in <code>text</code> corresponds
  to a single character in the literal; however an alternative representation,
  in which <code>text</code> consists of a sequence of multibyte characters
  - one control character plus four value characters - is used in more
  complex cases. 
  </para>
  <para>
  All strings are stored in a hash table intended to ensure that the
  same <code>STRING</code> object is used for equal string literals.
  This not only saves space during the processing of the input file,
  but also facilitates the output of shared string literals in the TDF
  capsule. 
  </para>
  <para>
  Note that the terminal zero character does not form part of the 
  <code>STRING</code> object.  Instead information on this is stored
  as part of the type of a <A HREF="#exp">string literal expression</A>.
  The text of the string literal is either truncated or padded with
  zeros until its length matches the size of the array bound in the
  type of the corresponding literal expression. 
  </para>
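  <para>
  The truncate-or-pad step can be sketched with a single helper.  The
  function name <code>fit_to_bound</code> is hypothetical, and
  <code>std::string</code> is used here purely for convenience; it is not
  how the producer stores string text.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns 'text' truncated or zero-padded to exactly 'bound' characters.
inline std::string fit_to_bound(const std::string &text, std::size_t bound) {
    std::string out = text.substr(0, bound);  // truncate if too long
    out.resize(bound, '\0');                  // pad with zeros if too short
    return out;
}
```
]]></programlisting>
  <para>
  For a three character literal and an array bound of 4, for instance,
  the result is the three characters followed by one zero, which plays
  the role of the terminal zero character.
  </para>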
  </sect3>  
  
  <sect3 id="ntest">
    <title>3.2.23. <code>NTEST</code></title>
  <para>
  The enumeration type <code>NTEST</code> (<code>ntest</code>) is used
  to represent the various C++ relational operators (<code>==</code>,
  <code>!=</code>, <code>&gt;</code> etc.).  The values correspond to
  the encoding of the TDF <code>NTEST</code> sort, which facilitates
  code generation.  The encoding also has the property that the values
  for complementary operators (such as <code>&lt;</code> and 
  <code>&gt;=</code>) always add up to the same value, 
  <code>ntest_negate</code>, allowing operators to be complemented in
  a straightforward manner. 
  </para>
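  <para>
  The complement-by-subtraction property can be illustrated as follows.
  The numeric values below are chosen only so that complementary pairs
  share a common sum; they are not the real TDF <code>NTEST</code>
  encoding, and the names are hypothetical.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

// Illustrative values only: NOT the real TDF NTEST encoding.
enum Ntest {
    nt_eq = 0, nt_less = 1, nt_greater = 2,
    nt_less_eq = 3, nt_greater_eq = 4, nt_not_eq = 5
};

// Every complementary pair above sums to this value.
const int nt_negate = 5;

// Complements a test by subtracting it from the common sum.
inline Ntest complement(Ntest t) { return static_cast<Ntest>(nt_negate - t); }
```
]]></programlisting>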
  </sect3>  
  
  <sect3 id="rmode">
    <title>3.2.24. <code>RMODE</code></title>
  <para>
  The enumeration type <code>RMODE</code> (<code>rmode</code>) is used
  to represent the various C++ rounding modes (towards zero, towards
  smaller etc.).  The values correspond to the encoding of the TDF 
  <code>RMODE</code> sort, which facilitates code generation. 
  </para>
  </sect3>  
  
  <sect3 id="exp">
    <title>3.2.25. <code>EXP</code></title>
  <para>
  The union type <code>EXP</code> (<code>exp</code>) is used to represent
  a C++ expression or statement.  Each expression has an associated
  <A HREF="#type">type</A>, <code>type</code>, but most of the information
  about an expression is stored in one of the large number of fields
  of the <code>EXP</code> union.  Most of these fields are fairly simple.
  For example, there are fields corresponding to <A HREF="#nat">integer
  literals</A>, <A HREF="#flt">floating point literals</A>, 
  <A HREF="#str">string literals</A> and <A HREF="#id">identifiers</A>.
  Composite expressions are formed in the normal way; for example, there
  are various binary operators comprising two argument expressions.
  The 
  <code>EXP</code> fields corresponding to statements are slightly more
  complex.  They each have a <code>parent</code> field which points
  to the enclosing statement.  A couple of cases bear additional discussion.
  </para>
  <para>
  The <code>sequence</code> field represents a compound statement or
  block.  This contains a <A HREF="#nspace">namespace</A>, in which
  any local variables are declared, and a list of expressions, giving
  the statements comprising the block.  The null namespace is used if
  the block does not constitute a scope.  The first statement in the
  list is always a dummy to enable <code>first</code> and <code>last</code>
  pointers to be maintained to the start and end of the list without
  having to worry about null lists. 
  </para>
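  <para>
  The benefit of the leading dummy statement is that appending never has
  to treat the empty list specially.  A minimal sketch, using
  hypothetical <code>Stmt</code> and <code>Block</code> structures rather
  than the <code>EXP</code> union itself, might look like this:
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a statement in a block.
struct Stmt {
    int kind;      // stands in for the statement's contents
    Stmt *next;
};

struct Block {
    Stmt dummy;    // always present, never a real statement
    Stmt *first;   // always points at the dummy
    Stmt *last;    // last element; the dummy itself when the block is empty

    Block() : dummy(), first(&dummy), last(&dummy) {
        dummy.kind = -1;
        dummy.next = nullptr;
    }
    Block(const Block &) = delete;             // copying would dangle 'first'
    Block &operator=(const Block &) = delete;

    // Appends without any special case for an empty block.
    void append(Stmt *s) { s->next = nullptr; last->next = s; last = s; }

    // Number of real statements, skipping the dummy.
    std::size_t size() const {
        std::size_t n = 0;
        for (const Stmt *s = first->next; s != nullptr; s = s->next) ++n;
        return n;
    }
};
```
]]></programlisting>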
  <para>
  <A id="solve">The <code>solve_stmt</code> field corresponds to the
  TDF <code>labelled</code> construct</A> (in early versions of TDF
  this construct was called <code>solve</code>, hence the terminology).
  The problem is that C and C++ labels and <code>goto</code>s are totally
  unstructured, whereas the TDF label constructs are structured.  Any
  statement which contains unstructured labels is enclosed in a 
  <code>solve_stmt</code> construct, enclosing both the labelled statement
  and all jumps to it (in general this cannot be done until the end
  of the function).  Any labels or variables which are bypassed by such
  unstructured jumps also need to be pulled out to the <code>solve_stmt</code>
  construct.  It is not just explicit labels which can cause such problems;
  complex <code>switch</code> statements have the same effect. 
  </para>
  </sect3>  
  
  <sect3 id="off">
    <title>3.2.26. <code>OFFSET</code></title>
  <para>
  The union type <code>OFFSET</code> (<code>off</code>) is used to represent
  an offset expression.  This is used as an adjunct to the normal 
  <A HREF="#exp">expression</A> representation.  The <code>OFFSET</code>
  union has fields corresponding to a type offset (used in pointer arithmetic),
  the offset of a member of a class and the offset of a base class.
  There are also simple operations on offsets, such as multiplication
  by an expression. 
  </para>
  </sect3>  
  
  <sect3 id="tok">
    <title>3.2.27. <code>TOKEN</code></title>
  <para>
  The union type <code>TOKEN</code> (<code>tok</code>) is used to represent
  one of a number of different categories within the C++ language. 
  It corresponds to the sort of a token declared using the 
  <A HREF="token.html"><code>#pragma token</code> syntax</A>.  Thus
  there are fields corresponding to expression, statement, integer constant,
  type, function, member and procedure tokens.  The similarities between
  <code>PROC</code> tokens and templates have been remarked above; for
  example, the parameters of the template: 
  <programlisting>
        template &lt; class T, int n &gt; class A {
            T a [n] ;
            // ....
        } ;
  </programlisting>
  are essentially equivalent to those in the procedure token: 
  <programlisting>
        PROC ( TYPE T, EXP const : int : n ) ....
  </programlisting>
  (recall that non-type template arguments are always constant expressions).
  Thus a field, <code>templ</code>, of the <code>TOKEN</code> union
  is used to represent lists of template parameters.  Note that a further
  field, <code>class</code>, is also required to represent template
  template parameters.  A <A HREF="#type">template type</A> is represented
  by a field, <code>templ</code>, of the union <code>TYPE</code>, which
  comprises a template sort and a sub-type expressed in terms of the
  template parameters. 
  </para>
  <para>
  In addition to representing token and template sorts in this way,
  the 
  <code>TOKEN</code> union is used to represent token and template arguments.
  Each of the parameter sorts listed above has an appropriate 
  <code>value</code> component which can store a value of that sort.
  Many of the union types in the algebra, including <A HREF="#type">types</A>
  and <A HREF="#exp">expressions</A>, have a field of the form: 
  <programlisting>
        token -&gt; {
            IDENTIFIER tok ;
            LIST TOKEN args ;
        }
  </programlisting>
  representing the given token <A HREF="#id">identifier</A> applied
  to the given list of arguments. 
  </para>
  <para>
  <A id="form">Template instances are represented slightly differently
  from token applications</A>.  Each instance of a template class or
  a template function gives rise to a new class or function 
  <A HREF="#id">identifier</A>.  This identifier has an underlying form
  giving the template identifier and the template arguments.  This is
  expressed as a <code>token</code> member of the 
  <A HREF="#type"><code>TYPE</code></A> union (although it is not technically
  a type, this happens to be the most convenient representation).  Each
  such form has an associated 
  <A HREF="#inst"><code>INSTANCE</code></A> component which gives further
  information about the template instance.  The form for a template
  function instance is stored in the <code>form</code> component of
  the corresponding <A HREF="#id">identifier</A>.  The form for a template
  class instance is stored in the <code>form</code> component of the
  corresponding <A HREF="#ctype">class type</A>. 
  </para>
  <para>
  Members of instances of template classes also have a form type, but
  in this case the form is an <code>instance</code> type.  This gives
  a link back to the corresponding member of the template class. 
  </para>
  </sect3>  
  
  <sect3 id="inst">
    <title>3.2.28. <code>INSTANCE</code></title>
  <para>
  The union type <code>INSTANCE</code> (<code>inst</code>) is used to
  represent a particular instance of a template or token.  Each 
  <A HREF="#tok">template sort</A> has an associated list of all the
  instances of that template, which is used to ensure that the same
  template applied with the same arguments always has the same value.
  Information on partial or explicit specialisations and usage information
  are stored as part of the corresponding 
  <code>INSTANCE</code>.  Each template instance identifier has a link
  back to its corresponding <code>INSTANCE</code> via its 
  <A HREF="#form"><code>form</code> component</A>. 
  </para>
  </sect3>  
  
  <sect3 id="err">
    <title>3.2.29. <code>ERROR</code></title>
  <para>
  The union type <code>ERROR</code> (<code>err</code>) is used to represent
  an error arising during the compilation of a C++ program. Errors are
  first class objects within the producer and can be passed to and from
  procedures.  Each error has an associated <code>severity</code>
  (serious, warning, none etc.).  Simple errors are represented by the
  <code>simple</code> field, which consists of an index, <code>number</code>,
  into the error catalogue, plus a variable length list of error arguments.
  Errors can be combined into composite errors using the 
  <code>compound</code> field, which represents the join of two errors:
  <code>head</code> followed by <code>tail</code>. 
  </para>
  <para>
  The chief operation on an error after it has been built up is to report
  it.  Each error report consists of an error object and a 
  <A HREF="#loc">file location</A> indicating where the error occurred.
  </para>
  </sect3>  
  
  <sect3 id="var">
    <title>3.2.30. <code>VARIABLE</code></title>
  <para>
  The structure type <code>VARIABLE</code> (<code>var</code>) is used
  to represent a variable state and is used in the variable analysis
  checks. 
  </para>
  </sect3>  
  
  <sect3 id="location">
    <title>3.2.31. <code>LOCATION</code></title>
  <para>
  The structure type <code>LOCATION</code> (<code>loc</code>) is used
  to represent a location in an input file.  It comprises a pointer
  to an 
  <A HREF="#posn">input file position</A>, <code>posn</code>, modified
  by a line number, taking <code>#line</code> directives into account,
  <code>line</code>.  Note that character positions within the line
  are not currently recorded. 
  </para>
  </sect3>  
  
  <sect3 id="posn">
    <title>3.2.32. <code>POSITION</code></title>
  <para>
  The structure type <code>POSITION</code> (<code>posn</code>) is used
  to represent a position in an input file.  It consists of two file
  names, 
  <code>file</code> taking <code>#line</code> directives into account,
  and 
  <code>input</code> giving the actual file name, plus a line number
  offset, <code>offset</code>, which gives the difference between the
  line number taking <code>#line</code> directives into account and
  the actual line number.  Other information stored includes the datestamp
  on the input file, <code>datestamp</code>, and a pointer to a 
  <A HREF="#loc">file location</A> which, for files included using 
  <code>#include</code>, gives the location the file was included from.
  </para>
  </sect3>  
  
  <sect3 id="bits">
    <title>3.2.33. <code>BITSTREAM</code></title>
  <para>
  The structure <code>BITSTREAM</code> is not part of the 
  <code>calculus</code> type system.  It is used to represent a sequence
  of bits such as is used, for example, in the encoding of TDF. 
  </para>
  </sect3>  
  
  <sect3 id="buff">
    <title>3.2.34. <code>BUFFER</code></title>
  <para>
  The structure <code>BUFFER</code> is not part of the <code>calculus</code>
  type system.  It is used to represent a sequence of characters. 
  </para>
  </sect3>  
  
  <sect3 id="opt">
    <title>3.2.35. <code>OPTIONS</code></title>
  <para>
  The structure <code>OPTIONS</code> is not part of the <code>calculus</code>
  type system.  It is used to represent the state of the 
  <A HREF="pragma.html#low">compiler options</A> at a particular point
  in the input file. 
  </para>
  </sect3>  
  
  <sect3 id="pptok">
    <title>3.2.36. <code>PPTOKEN</code></title>
  <para>
  The structure <code>PPTOKEN</code> is not part of the <code>calculus</code>
  type system.  It is used to represent a linked list of preprocessing
  tokens.  Each token has an associated <code>sid</code> lexical token
  number, <code>tok</code>, plus additional data dependent on the token
  type.  Each token also records a pointer to the current 
  <A HREF="#opt"><code>OPTIONS</code></A> value. 
  </para>
  </sect3>
  </sect2>
  
  <sect2>
    <title>3.3. Error catalogue</title>
  <para>
  This section describes the error catalogue which lies at the heart
  of the C++ producer's error reporting routines.  The full 
  <A HREF="error1.html">error catalogue syntax</A> is given as an annex.
  A typical entry in the catalogue is as follows: 
  <programlisting>
        class_union_deriv ( CLASS_TYPE: ct )
        {
            USAGE:              serious
            PROPERTIES:         ansi
            KEY (ISO)           &quot;9.5&quot;
            KEY (STANDARD)      &quot;The union '&quot;ct&quot;' can't have base classes&quot;
        }
  </programlisting>
  This defines an error, <code>class_union_deriv</code>, which takes
  a single parameter <code>ct</code> of type <code>CLASS_TYPE</code>.
  The severity of this error is <code>serious</code>; that is to say,
  a constraint error.  The error property <code>ansi</code> indicates
  that the error arises from the ISO C++ standard, the associated 
  <code>ISO</code> key indicating section 9.5.  Finally the text to
  be printed for this error, including a reference to <code>ct</code>,
  is given.  Looking up section 9.5 in the ISO C++ standard reveals
  the corresponding constraint in paragraph 1: 
  <BLOCKQUOTE>
  <I>A union shall not have base classes.</I>
  </BLOCKQUOTE>
  Each constraint within the ISO C++ standard has a corresponding error
  in this way.  The errors are named in a systematic fashion using the
  section names used in the draft standard.  For example, section 9.5
  is called <code>class.union</code>, so all the constraint errors arising
  from this section have names of the form <code>class_union_*</code>.
  These error names can be used in the <A HREF="pragma.html#low">low
  level directives</A> such as: 
  <programlisting>
        #pragma TenDRA++ error &quot;class_union_deriv&quot; <I>allow</I>
  </programlisting>
  to modify the error severity.  The effect of reducing the severity
  of a constraint error in this way is undefined. 
  </para>
  <para>
  In addition to the obvious error severity levels, <code>serious</code>,
  <code>warning</code> and <code>none</code>, the error catalogue specifies
  a list of optional severity levels along with their default values.
  For example, the entry: 
  <programlisting>
        link_incompat = serious
  </programlisting>
  sets up an option named <code>link_incompat</code> which is a constraint
  error by default.  Errors with this severity, such as: 
  <programlisting>
        dcl_stc_external ( LONG_ID: id, PTR_LOC: loc )
        {
            USAGE:              link_incompat
            PROPERTIES:         ansi
            KEY (ISO)           &quot;7.1.1&quot;
            KEY (STANDARD)      &quot;'&quot;id&quot;' previously declared with external
                                 linkage (at &quot;loc&quot;)&quot;
        }
  </programlisting>
  are therefore constraint errors.  The severity associated with 
  <code>link_incompat</code> can be modified either 
  <A HREF="pragma.html#low">directly</A>, using the directive: 
  <programlisting>
        #pragma TenDRA++ option &quot;link_incompat&quot; <I>allow</I>
  </programlisting>
  or <A HREF="pragma.html#linkage">indirectly</A> using the directive:
  <programlisting>
        #pragma TenDRA incompatible linkage <I>allow</I>
  </programlisting>
  the effect being to modify the severity of the associated error messages.
  </para>
  <para>
  The error catalogue is processed by a simple tool, 
  <code>make_err</code>, which generates C code which is compiled into
  the C++ producer.  Each error in the catalogue is assigned a number
  (there are currently 873 errors in the catalogue) which gives an index
  into an automatically generated table of error information.  It is
  this error number, together with a list of error arguments, which
  forms the associated <A HREF="alg.html#err"><code>ERROR</code> object</A>.
  <code>make_err</code> generates a macro for each error in the catalogue
  which takes arguments of the appropriate types (which may be statically
  checked) and creates an <code>ERROR</code> object.  For example, for
  the entry above this macro takes the form: 
  <programlisting>
        ERROR ERR_class_union_deriv ( CLASS_TYPE ) ;
  </programlisting>
  These macros hide the error catalogue numbers from the rest of the
  C++ producer. 
  </para>
  <para>
  It is also possible to join a number of simple <code>ERROR</code>
  objects to form a single composite <code>ERROR</code>.  The severity
  of the composite error is the maximum of the severities of the component
  errors.  For this purpose a dummy error severity level <code>whatever</code>
  is introduced which is less severe than any other level.  This is
  intended for use with error messages which are only ever used to add
  information to existing errors, and which inherit their severity level
  from the main error. 
  </para>
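  <para>
  The severity rule for composite errors can be sketched as below.  The
  structure, the function names and the numeric ordering of the severity
  levels are assumptions made for the example; only the rule itself
  (maximum of the components, with <code>whatever</code> below every
  other level) comes from the text above.
  </para>
  <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <memory>

// Hypothetical severity scale, ordered so that larger means more severe.
enum Severity { sev_whatever = 0, sev_none = 1, sev_warning = 2, sev_serious = 3 };

struct Error {
    Severity severity;
    int number;                         // catalogue index; -1 for a composite
    std::shared_ptr<Error> head, tail;  // set only for composite errors
};

// Builds a simple error from a catalogue index and a severity.
inline std::shared_ptr<Error> simple_error(int number, Severity sev) {
    return std::make_shared<Error>(Error{sev, number, nullptr, nullptr});
}

// Joins two errors; the result takes the more severe of the two levels.
inline std::shared_ptr<Error> join_errors(std::shared_ptr<Error> head,
                                          std::shared_ptr<Error> tail) {
    Severity sev = std::max(head->severity, tail->severity);
    return std::make_shared<Error>(Error{sev, -1, head, tail});
}
```
]]></programlisting>
  <para>
  Joining a warning with a <code>whatever</code> note therefore yields a
  warning, so that the note inherits the severity of the main error.
  </para>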
  <para>
  The text of a simple error message can be found in the table of error
  information.  The text contains certain escape sequences indicating
  where the error arguments are to be printed.  For example, 
  <code>%1</code> indicates the second argument, the arguments being
  numbered from zero.  The sorts of the error arguments, referred to as
  the error signature, are also stored
  in the table of error information as an array of characters, each
  corresponding to an <code>ERR_KEY_</code><I>type</I> macro.  The producer
  defines printing routines for each of the types given by these values,
  and calls the appropriate routine to print the argument. 
  </para>
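  <para>
  A simplified sketch of the escape expansion is given below.  It
  assumes single-digit, zero-based escapes and arguments which have
  already been converted to strings; the producer instead selects a
  printing routine for each argument from the error signature.  The
  function name <code>expand</code> is hypothetical.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Expands %0, %1, ... in 'text' using 'args' (single digit, zero-based).
// Escapes with no matching argument are copied through unchanged.
inline std::string expand(const std::string &text,
                          const std::vector<std::string> &args) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '%' && i + 1 < text.size() &&
            std::isdigit(static_cast<unsigned char>(text[i + 1]))) {
            std::size_t n = static_cast<std::size_t>(text[i + 1] - '0');
            if (n < args.size()) {
                out += args[n];
                ++i;  // skip the digit as well
                continue;
            }
        }
        out += text[i];
    }
    return out;
}
```
]]></programlisting>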
  <para>
  There are several command-line options which can be used to modify
  the form in which the error message is printed.  The default format
  is as follows: 
  <programlisting>
        &quot;file.C&quot;, line 42: Error:
            [ISO 9.5]: The union 'U' can't have base classes.
  </programlisting>
  The ISO section number can be suppressed using <code>-m-s</code>.
  The <code>-mc</code> option causes the source code line giving rise
  to the error to be printed as part of the message, with <code>!!!!</code>
  marking the position of the error within the line.  The <code>-me</code>
  option causes the error name, <code>class_union_deriv</code>, to be
  printed as part of the message.  The <code>-ml</code> option causes
  the full file location, including the list of <code>#include</code>
  directives used in reaching the file, to be printed.  The <code>-mt</code>
  option causes <code>typedef</code> names to be used when printing
  types, rather than expanding to the type definition. 
  </para>
  </sect2>
  
  <sect2>
    <title>3.4. Parsing C++</title>
  <para>
  The parser used in the C++ producer is generated using the 
  <A HREF="../utilities/sid.html"><code>sid</code> tool</A>.  Because
  of the large size of the generated code (1.3MB), the <code>sid</code>
  output is run through a simple program, <code>sidsplit</code>, which
  splits the output into a number of more manageable modules.  It also
  transforms the code to use the <A HREF="style.html#language"><code>PROTO</code>
  macros</A> used in the rest of the program. 
  </para>
  <para>
  <code>sid</code> is designed as a parser generator for grammars which can be
  transformed into LL(1) grammars.  The distinguishing feature of these
  grammars is that the parser can always decide what to do next based
  on the current terminal.  This is not the case in C++; in some circumstances
  a potentially unlimited look-ahead is required to distinguish, for
  example, declaration statements from expression statements.  In
  technical terms, the C++ grammar is not LL(k) for any fixed k.
  Fortunately there are relatively few such situations, and <code>sid</code>
  provides a mechanism, <A HREF="../utilities/sid.html#predicate">predicates</A>,
  for bypassing the normal parsing mechanism in these cases.  Thus it
  is possible, although difficult, to express C++ as a <code>sid</code>
  grammar. 
  </para>
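  <para>
  The difficulty of telling declaration statements from expression
  statements can be seen in a small example.  The class <code>T</code> and
  the function <code>ambiguity_demo</code> are invented for illustration.
  Both statements below begin with the tokens <code>T ( name )</code>, and
  only what follows the closing parenthesis settles the matter.  Since the
  parenthesised part may be an arbitrarily long declarator, no fixed amount
  of look-ahead is enough in general.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>

struct T {
    int m;
    explicit T(int v) : m(v) {}
    T *operator->() { return this; }
};

// Returns the value held by the object declared in the second statement.
int ambiguity_demo() {
    int a = 1;

    // Expression statement: a function-style cast of 'a' to T, followed by
    // a member access.  This cannot be parsed as a declaration, which the
    // parser only learns on reaching the '->' token.
    T(a)->m = 7;

    // Declaration statement: the same opening tokens, but the statement as
    // a whole is a valid declaration of 'b' (with redundant parentheses),
    // and the language rules then treat it as a declaration.
    T(b) = T(3);

    return b.m;
}
```
]]></programlisting>
  <para>
  A compiler may warn about the redundant parentheses in the second
  statement; they are kept here deliberately to show the shared prefix.
  </para>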
  <para>
  The <code>sid</code> grammar file, <code>syntax.sid</code>, is closely
  based on the ISO C++ grammar.  In particular, the same production
  names have been used.  The grammar has been extended slightly to allow
  common syntactic errors to be detected elegantly.  Other parsing errors
  are handled by <code>sid</code>'s exception mechanism.  At present
  there is only limited recovery after such errors. 
  </para>
  <para>
  The lexical analysis routines in the C++ producer are hand-crafted,
  based on an initial version generated by the simple lexical analyser
  generator, <code>lexi</code>.  <code>lexi</code> has been used more directly
  to generate the lexical analysers for some of the other automatic
  code generating tools used in the producer, including
  <code>calculus</code>. 
  </para>
  <para>
  The <code>sid</code> grammar contains a number of entry points.  The
  most important is <code>parse_file</code>, which is used to parse
  a complete C++ translation unit.  The syntax for the 
  <A HREF="pragma.html"><code>#pragma TenDRA</code> directives</A> is
  included within the same grammar with two entry points, 
  <code>parse_tendra</code> in normal use, and <code>parse_preproc</code>
  for use in preprocessing mode.  There are also entry points in the
  grammar for each of the kinds of <A HREF="token.html#args">token argument</A>.
  The parsing routines for token and template arguments are largely
  hand-crafted, based on these primitives. 
  </para>
  <para>
  Certain parsing operations are performed before control passes to
  the 
  <code>sid</code> grammar.  As mentioned above, these include the processing
  of token and template applications.  The other important case concerns
  nested name specifiers.  For example, in: 
  <programlisting>
        class A {
            class B {
                static int c ;
            } ;
        } ;
  
        int A::B::c = 0 ;
  </programlisting>
  the qualified identifier <code>A::B::c</code> is split into two terminals,
  a nested name specifier, <code>A::B::</code>, and an identifier, <code>c</code>,
  which is looked up in the corresponding namespace.  Name look-up
  therefore occurs at this stage.  An identifier can be mapped
  to one of a number of terminals, including keywords, type names,
  namespace names and other identifiers, according to the result of
  this look-up.  If the look-up gives a macro then the macro is expanded
  at this stage. 
  </para>
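  <para>
  The two steps described above, splitting a qualified identifier and
  choosing a terminal from the result of name look-up, can be sketched as
  follows.  This is a simplification under stated assumptions: the producer
  works on tokens rather than strings, and the names
  <code>split_qualified</code>, <code>classify</code> and
  <code>Terminal</code> are hypothetical.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical terminal kinds; the real grammar has many more.
enum class Terminal { Keyword, TypeName, NamespaceName, Identifier };

// Split "A::B::c" into the pair ("A::B::", "c").  An unqualified name
// yields an empty specifier.  Templates and operator names are
// deliberately not treated specially here.
std::pair<std::string, std::string> split_qualified(const std::string &name) {
    std::string::size_type pos = name.rfind("::");
    if (pos == std::string::npos) {
        return {std::string(), name};
    }
    return {name.substr(0, pos + 2), name.substr(pos + 2)};
}

// Choose a terminal from the result of name look-up.  The map stands in
// for the producer's namespace tables; a name that is not found is
// treated as an ordinary identifier.
Terminal classify(const std::map<std::string, Terminal> &scope,
                  const std::string &name) {
    auto it = scope.find(name);
    return it == scope.end() ? Terminal::Identifier : it->second;
}
```
]]></programlisting>
  <para>
  Doing the look-up before the grammar sees the identifier means that the
  grammar itself can distinguish type names from other identifiers simply
  by the terminal it is handed.
  </para>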
  </sect2>
  
  <sect2>
    <title>3.5. TDF generation</title>
  <para>
  The TDF encoding as a bitstream is expressed as a series of macros
  generated by the <code>make_tdf</code> tool from the TDF specification
  database.  Note that the version of the TDF database used contains
  a couple of corrections from the standard version: 
  <itemizedlist>
  <listitem>A construct <code>make_token_def</code> has been added to represent
  a token definition. 
  </listitem>
  <listitem>The sort <code>diag_tag</code> has been added to the edge constructors.
  </listitem>
  </itemizedlist>
  The generated macros handle only the encoding of the construct itself;
  the construct parameters need to be encoded by hand (the C producer does
  something similar, but its macros also encode the construct parameters).
  For example, <code>make_tdf</code> generates a macro: 
  <programlisting>
        void ENC_plus ( BITSTREAM * ) ;
  </programlisting>
  which encodes the <code>plus</code> construct (91 as 7 bits in extended
  format).  A typical use of this macro, for adding the expressions
  <code>a</code> and <code>b</code>, would be: 
  <programlisting>
        ENC_plus ( bs ) ;
        ENC_impossible ( bs ) ;
        bs = enc_exp ( bs, a ) ;
        bs = enc_exp ( bs, b ) ;
  </programlisting>
  </para>
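  <para>
  What a macro such as <code>ENC_plus</code> amounts to can be sketched as
  follows.  Everything here is an assumption made for illustration: the
  <code>BitStream</code> type and the functions <code>put_bits</code>,
  <code>put_extended</code> and <code>enc_plus_sketch</code> are
  hypothetical, bits are taken to be written most significant first, and
  the extended format is taken to mean that values from 1 to 2^n - 1 occupy
  a single n-bit group while larger values are preceded by all-zero groups,
  each standing for 2^n - 1.  The TDF specification remains the
  authority on the actual encoding.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the producer's BITSTREAM type.
struct BitStream {
    std::vector<bool> bits;
};

// Append the low 'n' bits of 'value', most significant bit first
// (the bit order is an assumption of this sketch).
void put_bits(BitStream &bs, unsigned n, unsigned value) {
    for (unsigned i = n; i > 0; --i) {
        bs.bits.push_back(((value >> (i - 1)) & 1u) != 0);
    }
}

// Extended encoding as assumed here: values 1 .. 2^n - 1 are written as a
// single n-bit group; larger values are preceded by all-zero groups, each
// standing for 2^n - 1.
void put_extended(BitStream &bs, unsigned n, unsigned value) {
    assert(n > 0 && n < 32 && value > 0);
    const unsigned limit = (1u << n) - 1u;
    while (value > limit) {
        put_bits(bs, n, 0u);
        value -= limit;
    }
    put_bits(bs, n, value);
}

// The construct code only; the error treatment and the two operands are
// encoded separately by the caller, as in the usage shown above.
void enc_plus_sketch(BitStream &bs) {
    put_extended(bs, 7, 91);
}
```
]]></programlisting>
  <para>
  Under these assumptions the <code>plus</code> construct occupies exactly
  seven bits, since 91 is below the 7-bit limit of 127.
  </para>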
  <para>
  Each function or variable is compiled to TDF as its definition is
  encountered.  For some definitions, such as inline functions, the
  compilation may be deferred until it is clear whether or not the identifier
  has been used.  There is a final pass over all identifiers during
  the variable analysis routines which incorporates this check. Because
  of the organisation of a TDF capsule it is necessary to store all
  of the compiled TDF in memory until the end of the program, when the
  complete capsule, including external tag and token names and linkage
  information, is written to the output file. 
  </para>
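  <para>
  The deferral scheme described above can be sketched as follows.  The
  <code>Definition</code> and <code>Unit</code> types and their members are
  hypothetical, and a list of names stands in for the compiled TDF held in
  memory; the sketch shows only the order of events, not the producer's
  data structures.
  </para>
  <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record for one definition seen by the front end.
struct Definition {
    std::string name;
    bool deferrable;   // for example, an inline function
    bool used;         // set when a use of the identifier is seen
    bool compiled;
};

struct Unit {
    std::vector<Definition> defs;
    std::vector<std::string> capsule;   // compiled output, held in memory

    void compile(Definition &d) {
        if (!d.compiled) {
            capsule.push_back(d.name);  // stands in for the generated TDF
            d.compiled = true;
        }
    }

    // Called as each definition is encountered.
    void define(const std::string &name, bool deferrable) {
        defs.push_back(Definition{name, deferrable, false, false});
        if (!deferrable) compile(defs.back());
    }

    void mark_used(const std::string &name) {
        for (Definition &d : defs) {
            if (d.name == name) d.used = true;
        }
    }

    // Final pass: compile deferred definitions that turned out to be used.
    // Only after this would the complete capsule be written out.
    void finish() {
        for (Definition &d : defs) {
            if (d.deferrable && d.used) compile(d);
        }
    }
};
```
]]></programlisting>
  <para>
  A deferred definition which is never used is simply never compiled, which
  is the purpose of waiting until the final pass.
  </para>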
  </sect2>
  </sect1>
  
  <sect1>
    <title>Annex A. <code>#pragma</code> directive syntax</title>
  <para>
  The following gives a summary of the syntax for the <code>#pragma</code>
  directives used for <A HREF="pragma.html">compiler configuration</A>
  and <A HREF="token.html">token specification</A>: 
  <programlisting>
  
        <I>pragma-directive</I> :
                <A HREF="#tendra"># pragma TenDRA ++<I><SUB>opt</SUB> tendra-directive</I></A>
                <A HREF="#token"># pragma <I>token-directive</I></A>
  
        <A id="tendra"><I>tendra-directive</I></A> :
                <A HREF="#scope"><I>scope-directive</I></A>
                <A HREF="#low"><I>low-level-directive</I></A>
                <A HREF="#analysis"><I>analysis-directive on</I></A>
                <A HREF="#check"><I>check-directive allow</I></A>
                <A HREF="#keyword"><I>keyword-directive</I></A>
                <A HREF="#type-directive"><I>type-directive</I></A>
                <A HREF="#linkage"><I>linkage-directive</I></A>
                <A HREF="#misc"><I>misc-directive</I></A>
                <A HREF="#token1"><I>tendra-token-directive</I></A>
  
        <I>on</I> :
                on
                warning
                off
  
        <I>allow</I> :
                allow
                warning
                disallow
  
  
        <A id="scope"><I>scope-directive</I></A> :
                <A HREF="pragma.html#scope">begin</A>
                <A HREF="pragma.html#scope">begin name environment <I>identifier</I></A>
                <A HREF="pragma.html#scope">end</A>
                <A HREF="pragma.html#scope">directory <I>identifier</I> use environment <I>identifier</I></A>
                <A HREF="pragma.html#scope">use environment <I>identifier</I></A>
                <A HREF="pragma.html#scope">use environment <I>identifier</I> reset <I>allow</I></A>
  
  
        <A id="low"><I>low-level-directive</I></A> :
                <A HREF="pragma.html#low">error <I>string-literal allow</I></A>
                <A HREF="pragma.html#low">error <I>string-literal on</I></A>
                <A HREF="pragma.html#low">error <I>string-literal</I> as option <I>string-literal</I></A>
                <A HREF="pragma.html#low">option <I>string-literal allow</I></A>
                <A HREF="pragma.html#low">option <I>string-literal on</I></A>
                <A HREF="pragma.html#limits">option value <I>string-literal integer-literal</I></A>
                <A HREF="pragma.html#low">use error <I>string-literal</I></A>
  
  
        <A id="analysis"><I>analysis-directive</I></A> :
                <A HREF="pragma.html#init">complete initialization analysis</A>
                <A HREF="pragma.html#elab">complete struct / union analysis</A>
                <A HREF="pragma.html#conv">conversion analysis <I>conversion-spec<SUB>opt</SUB></I></A>
                <A HREF="pragma.html#discard">discard analysis <I>discard-spec<SUB>opt</SUB></I></A>
                <A HREF="pragma.html#switch">enum switch analysis</A>
                <A HREF="pragma.html#linkage">external function linkage</A>
                <A HREF="pragma.html#for">for initialization block</A>
                <A HREF="pragma.html#elab">ignore struct / union / enum tag</A>
                <A HREF="pragma.html#template">implicit export template</A>
                <A HREF="pragma.html#impl_func">implicit function declaration</A>
                <A HREF="pragma.html#exp">integer operator analysis</A>
                <A HREF="pragma.html#exp">integer overflow analysis</A>
                <A HREF="pragma.html#comment">nested comment analysis</A>
                <A HREF="pragma.html#exp">operator precedence analysis</A>
                <A HREF="pragma.html#exp">pointer operator analysis</A>
                <A HREF="pragma.html#throw">throw analysis</A>
                <A HREF="pragma.html#linkage">unify external linkage</A>
                <A HREF="pragma.html#variable">variable analysis</A>
                <A HREF="pragma.html#hide">variable hiding analysis</A>
                <A HREF="pragma.html#weak">weak prototype analysis</A>
  
        <I>conversion-spec</I> :
                ( int - int <I>implicit-spec<SUB>opt</SUB></I> )
                ( int - pointer <I>implicit-spec<SUB>opt</SUB></I> )
                ( pointer - int <I>implicit-spec<SUB>opt</SUB></I> )
                ( pointer - pointer <I>implicit-spec<SUB>opt</SUB></I> )
                ( int - enum implicit )
                ( pointer - void * implicit )
                ( void * - pointer implicit )
  
        <I>implicit-spec</I> :
                implicit
                explicit
  
        <I>discard-spec</I> :
                ( function return )
                ( static )
                ( value )
  
  
        <A id="check"><I>check-directive</I></A> :
                <A HREF="pragma.html#overload">ambiguous overload resolution</A>
                <A HREF="pragma.html#if">assignment as bool</A>
                <A HREF="pragma.html#bitfield">bitfield overflow</A>
                <A HREF="pragma.html#linkage">block function static</A>
                <A HREF="pragma.html#catch_all">catch all</A>
                <A HREF="pragma.html#escape">character escape overflow</A>
                <A HREF="token.html#tokdef">compatible token</A>
                <A HREF="pragma.html#include">complete file includes</A>
                <A HREF="pragma.html#target-if">conditional declaration</A>
                <A HREF="pragma.html#lvalue">conditional lvalue</A>
                <A HREF="pragma.html#overload">conditional overload resolution <I>overload-spec<SUB>opt</SUB></I></A>
                <A HREF="pragma.html#if">const conditional</A>
                <A HREF="pragma.html#macro">directive as macro argument</A>
                <A HREF="pragma.html#identifier">dollar as ident</A>
                <A HREF="pragma.html#elab">extra ,</A>
                <A HREF="pragma.html#decl_none">extra ;</A>
                <A HREF="pragma.html#if">extra ; after conditional</A>
                <A HREF="pragma.html#weak">extra ...</A>
                <A HREF="pragma.html#bitfield">extra bitfield int type</A>
                <A HREF="pragma.html#macro">extra macro definition</A>
                <A HREF="pragma.html#typedef">extra type definition</A>
                <A HREF="pragma.html#switch">fall into case</A>
                <A HREF="pragma.html#elab">forward enum declaration</A>
                <A HREF="pragma.html#conv">function pointer as pointer</A>
                <A HREF="pragma.html#ellipsis">ident ...</A>
                <A HREF="pragma.html#implicit">implicit int type <I>inttype-spec<SUB>opt</SUB></I></A>
                <A HREF="token.html#tokdef">implicit token definition</A>
                <A HREF="token.html#spec">incompatible interface declaration</A>
                <A HREF="token.html#member">incompatible member declaration</A>
                <A HREF="pragma.html#linkage">incompatible linkage</A>
                <A HREF="pragma.html#weak">incompatible promoted function argument</A>
                <A HREF="pragma.html#compatible">incompatible type qualifier</A>
                <A HREF="pragma.html#return">incompatible void return</A>
                <A HREF="pragma.html#complete">incomplete type as object type</A>
                <A HREF="pragma.html#ppdir">indented # directive</A>
                <A HREF="pragma.html#ppdir">indented directive after #</A>
                <A HREF="pragma.html#init">initialization of struct / union ( auto )</A>
                <A HREF="pragma.html#longlong">longlong type</A>
                <A HREF="pragma.html#ppdir">no directive / nline after ident</A>
                <A HREF="pragma.html#empty">no external declaration</A>
                <A HREF="pragma.html#macro">no ident after #</A>
                <A HREF="pragma.html#lex">no nline after file end</A>
                <A HREF="token.html#tokdef">no token definition</A>
                <A HREF="pragma.html#overload">overload resolution</A>
                <A HREF="pragma.html#weak">prototype</A>
                <A HREF="pragma.html#weak">prototype ( weak )</A>
                <A HREF="token.html#exp">rvalue token as const</A>
                <A HREF="pragma.html#ppdir">text after directive</A>
                <A HREF="pragma.html#lvalue">this lvalue</A>
                <A HREF="pragma.html#string">unify incompatible string literal</A>
                <A HREF="pragma.html#ppdir">unknown directive</A>
                <A HREF="pragma.html#escape">unknown escape</A>
                <A HREF="pragma.html#ppdir">unknown pragma</A>
                <A HREF="pragma.html#decl_none">unknown struct / union</A>
                <A HREF="pragma.html#string">unmatched quote</A>
                <A HREF="pragma.html#reach">unreachable code</A>
                <A HREF="pragma.html#init">variable initialization</A>
                <A HREF="pragma.html#macro">weak macro equality</A>
                <A HREF="pragma.html#string">writeable string literal</A>
  
        <I>inttype-spec</I> :
                for const / volatile
                for external declaration
                for function return
  
        <I>overload-spec</I> :
                ( complete )
                ( incomplete )
  
  
        <A id="keyword"><I>keyword-directive</I></A> :
                <A HREF="#keyword-spec">keyword <I>identifier</I> for <I>keyword-spec</I></A>
                <A HREF="pragma.html#keyword">undef keyword <I>identifier</I></A>
  
        <A id="keyword-spec"><I>keyword-spec</I></A> :
                <A HREF="pragma.html#discard">discard value</A>
                <A HREF="pragma.html#variable">discard variable</A>
                <A HREF="pragma.html#switch">exhaustive</A>
                <A HREF="pragma.html#switch">fall into case</A>
                <A HREF="pragma.html#keyword">keyword <I>identifier</I></A>
                <A HREF="pragma.html#keyword">operator <I>operator</I></A>
                <A HREF="pragma.html#variable">set</A>
                <A HREF="pragma.html#reach">set reachable</A>
                <A HREF="pragma.html#reach">set unreachable</A>
                <A HREF="pragma.html#conv">type representation</A>
                <A HREF="pragma.html#weak">weak</A>
  
  
        <A id="type-directive"><I>type-directive</I></A> :
                <A HREF="pragma.html#reach">bottom <I>identifier</I></A>
                <A HREF="pragma.html#char">character <I>character-sign</I></A>
                <A HREF="pragma.html#identifier">character <I>character-literal character-mapping</I></A>
                <A HREF="pragma.html#identifier">character <I>string-literal character-mapping</I></A>
                <A HREF="lib.html#arith">compute promote <I>identifier</I></A>
                <A HREF="pragma.html#escape">escape <I>character-literal character-mapping</I></A>
                <A HREF="pragma.html#int">integer literal <I>literal-spec</I></A>
                <A HREF="lib.html#arith">promoted <I>type-id</I> : <I>type-id</I></A>
                <A HREF="pragma.html#char">set character literal : <I>type-id</I></A>
                <A HREF="pragma.html#longlong">set longlong type : <I>longlong-spec</I></A>
                <A HREF="pragma.html#char">set ptrdiff_t : <I>type-id</I></A>
                <A HREF="pragma.html#char">set size_t : <I>type-id</I></A>
                <A HREF="pragma.html#char">set wchar_t : <I>type-id</I></A>
                <A HREF="pragma.html#string">set string literal : <I>string-const</I></A>
                <A HREF="pragma.html#std">set std namespace : <I>scope-name</I></A>
                <A HREF="#type-spec">type <I>identifier</I> for <I>type-spec</I></A>
  
        <I>character-sign</I> :
                signed
                unsigned
                either
  
        <I>character-mapping</I> :
                as <I>character-literal</I> allow
                disallow
  
        <I>literal-spec</I> :
                <I>literal-base literal-suffix<SUB>opt</SUB> literal-type-list</I>
  
        <I>literal-base</I> :
                decimal
                octal
                hexadecimal
  
        <I>literal-suffix</I> :
                unsigned
                long
                unsigned long
                long long
                unsigned long long
  
        <I>literal-type-list</I> :
                * <I>literal-type-spec</I>
                <I>integer-literal literal-type-spec</I> | <I>literal-type-list</I>
                ? <I>literal-type-spec</I> | <I>literal-type-list</I>
  
        <I>literal-type-spec</I> :
                : <I>type-id</I>
                * <I>allow<SUB>opt</SUB></I> : <I>identifier</I>
                * * <I>allow<SUB>opt</SUB></I> :
  
        <I>longlong-spec</I> :
                long
                long long
  
        <I>string-const</I> :
                const
                no const
  
        <I>scope-name</I> :
                <I>identifier</I>
                ::
  
        <A id="type-spec"><I>type-spec</I></A> :
                <A HREF="pragma.html#reach">bottom</A>
                <A HREF="pragma.html#char">ptrdiff_t</A>
                <A HREF="pragma.html#char">size_t</A>
                <A HREF="pragma.html#char">wchar_t</A>
                <A HREF="pragma.html#printf">... printf</A>
                <A HREF="pragma.html#printf">... scanf</A>
  
  
        <A id="linkage"><I>linkage-directive</I></A> :
                <A HREF="pragma.html#linkage">const linkage <I>linkage</I></A>
                <A HREF="pragma.html#linkage">external linkage <I>string-literal</I></A>
                <A HREF="pragma.html#linkage">external volatile_t</A>
                <A HREF="pragma.html#linkage">inline linkage <I>linkage</I></A>
                <A HREF="pragma.html#linkage">linkage resolution : <I>linkage-spec</I></A>
  
        <I>linkage</I> :
                external
                internal
  
        <I>linkage-spec</I> :
                ( <I>linkage</I> ) on
                ( <I>linkage</I> ) warning
                off
  
  
        <A id="misc"><I>misc-directive</I></A> :
                <A HREF="pragma.html#weak">argument <I>type-id</I> as ...</A>
                <A HREF="pragma.html#weak">argument <I>type-id</I> as <I>type-id</I></A>
                <A HREF="pragma.html#compatible">compatible type : <I>type-id</I> == <I>type-id</I> : <I>allow</I></A>
                <A HREF="pragma.html#conv">conversion <I>identifier-list</I> allow</A>
                <A HREF="dump.html#scope">declaration block <I>identifier</I> begin</A>
                <A HREF="dump.html#scope">declaration block end</A>
                <A HREF="pragma.html#ppdir">directive <I>directive-spec directive-state</I></A>
                <A HREF="pragma.html#variable">discard <I>expression</I></A>
                <A HREF="pragma.html#switch">exhaustive</A>
                <A HREF="pragma.html#cast">explicit cast <I>cast-spec<SUB>opt</SUB> allow</I></A>
                <A HREF="pragma.html#include">includes depth <I>integer-literal</I></A>
                <A HREF="pragma.html#static">preserve <I>preserve-list</I></A>
                <A HREF="pragma.html#variable">set <I>expression</I></A>
                <A HREF="pragma.html#limits">set error limit <I>integer-literal</I></A>
                <A HREF="pragma.html#identifier">set name limit <I>integer-literal</I> warning<I><SUB>opt</SUB></I></A>
                <A HREF="pragma.html#discard">suspend static <I>identifier-list</I></A>
  
        <I>directive-spec</I> :
                assert
                file
                ident
                import
                include_next
                unassert
                warning
                weak
  
        <I>directive-state</I> :
                allow
                warning
                disallow
                ( ignore ) allow
                ( ignore ) warning
  
        <I>cast-operator</I> :
                static_cast
                const_cast
                reinterpret_cast
  
        <I>cast-spec</I> :
                as <I>cast-operator</I>
                <I>cast-spec</I> | <I>cast-operator</I>
  
        <I>preserve-list</I> :
                <I>identifier-list</I>
                *
  
        <I>identifier-list</I> :
                <I>identifier identifier-list<SUB>opt</SUB></I>
  
  
        <A id="token"><I>token-directive</I></A> :
                <A HREF="token.html#spec">token <I>token-spec</I></A>
                <A HREF="token.html#tokdef">no_def <I>token-list</I></A>
                <A HREF="token.html#tokdef">define <I>token-list</I></A>
                <A HREF="token.html#tokdef">ignore <I>token-list</I></A>
                <A HREF="token.html#tokdef">interface <I>token-list</I></A>
                <A HREF="token.html#tokdef">undef token <I>token-list</I></A>
                <A HREF="token.html#tokdef">extend interface <I>header-name</I></A>
                <A HREF="token.html#tokdef">implement interface <I>header-name</I></A>
  
        <A id="token1"><I>tendra-token-directive</I></A> :
                <A HREF="token.html#spec">token <I>token-spec</I></A>
                <A HREF="token.html#tokdef">no_def <I>token-list</I></A>
                <A HREF="token.html#tokdef">define <I>token-list</I></A>
                <A HREF="token.html#tokdef">reject <I>token-list</I></A>
                <A HREF="token.html#tokdef">interface <I>token-list</I></A>
                <A HREF="token.html#tokdef">undef token <I>token-list</I></A>
                <A HREF="token.html#tokdef">extend <I>header-name</I></A>
                <A HREF="token.html#tokdef">implement <I>header-name</I></A>
                <A HREF="token.html#tokdef">member definition <I>type-id</I> : <I>identifier member-offset</I></A>
  
        <I>member-offset</I> :
                ::<I><SUB>opt</SUB> id-expression</I>
                <I>member-offset</I> . ::<I><SUB>opt</SUB> id-expression</I>
                <I>member-offset</I> [ <I>constant-expression</I> ]
  
        <I>token-list</I> :
                <I>token-id token-list<SUB>opt</SUB></I>
                # <I>preproc-token-list</I>
  
        <I>token-id</I> :
                <I>token-namespace<SUB>opt</SUB> identifier</I>
                <I>type-id</I> . <I>identifier</I>
  
  
        <I>token-spec</I> :
                <I>token-introduction token-identification</I>
  
        <I>token-introduction</I> :
                <I>exp-token</I>
                <I>statement-token</I>
                <I>type-token</I>
                <I>member-token</I>
                <I>procedure-token</I>
  
        <I>token-identification</I> :
                <I>token-namespace<SUB>opt</SUB> identifier</I> # <I>external-identifier<SUB>opt</SUB></I>
  
        <I>token-namespace</I> :
                TAG
  
        <I>external-identifier</I> :
                -
                <I>preproc-token-list</I>
  
        <I>exp-token</I> :
                EXP <I>exp-storage<SUB>opt</SUB></I> : <I>type-id</I> :
                NAT
                INTEGER
  
        <I>exp-storage</I> :
                lvalue
                rvalue
                const
  
        <I>statement-token</I> :
                STATEMENT
  
        <I>type-token</I> :
                TYPE
                VARIETY
                VARIETY signed
                VARIETY unsigned
                FLOAT
                ARITHMETIC
                SCALAR
                CLASS
                STRUCT
                UNION
  
        <I>member-token</I> :
                MEMBER <I>access-specifier<SUB>opt</SUB> member-type-id</I> : <I>type-id</I> :
  
        <I>member-type-id</I> :
                <I>type-id</I>
                <I>type-id</I> % <I>constant-expression</I>
  
        <I>access-specifier</I> :
                public
                protected
                private
  
        <I>procedure-token</I> :
                <I>general-procedure</I>
                <I>simple-procedure</I>
                <I>function-procedure</I>
  
        <I>general-procedure</I> :
                PROC { <I>bound-toks<SUB>opt</SUB></I> | <I>prog-pars<SUB>opt</SUB></I> } <I>token-introduction</I>
  
        <I>bound-toks</I> :
                <I>bound-token</I>
                <I>bound-token</I> , <I>bound-toks</I>
  
        <I>bound-token</I> :
                <I>token-introduction token-namespace<SUB>opt</SUB> identifier</I>
  
        <I>prog-pars</I> :
                <I>program-parameter</I>
                <I>program-parameter</I> , <I>prog-pars</I>
  
        <I>program-parameter</I> :
                EXP <I>identifier</I>
                STATEMENT <I>identifier</I>
                TYPE <I>type-id</I>
                MEMBER <I>type-id</I> : <I>identifier</I>
                PROC <I>identifier</I>
  
        <I>simple-procedure</I> :
                PROC ( <I>simple-toks<SUB>opt</SUB></I> ) <I>token-introduction</I>
  
        <I>simple-toks</I> :
                <I>simple-token</I>
                <I>simple-token</I> , <I>simple-toks</I>
  
        <I>simple-token</I> :
                <I>token-introduction token-namespace<SUB>opt</SUB> identifier<SUB>opt</SUB></I>
  
        <I>function-procedure</I> :
                FUNC <I>type-id</I> :
  </programlisting>
  </para>
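  <para>
  As a reading aid, the following directives have been derived by hand from
  the productions above, with the production used noted before each one.
  They have not been checked against the producer, and the token name
  <code>zero</code> and the error limit of 20 are arbitrary choices made
  for illustration.
  </para>
  <programlisting><![CDATA[
```cpp
// tendra-directive -> scope-directive -> begin
#pragma TenDRA begin

// tendra-directive -> analysis-directive on
// analysis-directive -> conversion analysis conversion-spec
#pragma TenDRA conversion analysis ( int - pointer explicit ) warning

// tendra-directive -> check-directive allow
#pragma TenDRA longlong type allow

// tendra-directive -> type-directive -> set size_t : type-id
#pragma TenDRA set size_t : unsigned long

// tendra-directive -> misc-directive -> set error limit integer-literal
#pragma TenDRA set error limit 20

// tendra-directive -> scope-directive -> end
#pragma TenDRA end

// pragma-directive -> # pragma token-directive -> token token-spec
// token-spec -> exp-token token-identification
#pragma token EXP rvalue : int : zero #
```
]]></programlisting>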
  </sect1>
  
  <sect1>
    <title>Annex B. Symbol table dump syntax</title>
  <para>
  The following gives a summary of the syntax for the 
  <A HREF="dump.html">symbol table dump file</A> (version 1.1): 
  <programlisting>
  
        <I>dump-file</I> :
                <I>command-list<SUB>opt</SUB></I>
  
        <I>command-list</I> :
                <I>command command-list<SUB>opt</SUB></I>
  
        <I>command</I> :
                <I>version-command</I>
                <I>identifier-command</I>
                <I>scope-command</I>
                <I>override-command</I>
                <I>base-command</I>
                <I>api-command</I>
                <I>template-command</I>
                <I>promotion-command</I>
                <I>error-command</I>
                <I>path-command</I>
                <I>file-command</I>
                <I>include-command</I>
                <I>string-command</I>
  
        <I>version-command</I> :
                V <I>number number string</I>
  
  
        <I>location</I> :
                <I>number number number string string</I>
                <I>number number number string</I> *
                <I>number number number</I> *
                <I>number number</I> *
                <I>number</I> *
                *
  
  
        <I>identifier</I> :
                <I>number</I> = <I>identifier-name access<SUB>opt</SUB> scope-identifier</I>
                <I>number</I>
  
        <I>identifier-name</I> :
                <I>string</I>
                C <I>type</I>
                D <I>type</I>
                O <I>string</I>
                T <I>type</I>
  
        <I>access</I> :
                N
                B
                P
  
        <I>scope-identifier</I> :
                <I>identifier</I>
                *
  
        <I>identifier-command</I> :
                D <I>identifier-info type-info</I>
                M <I>identifier-info type-info</I>
                T <I>identifier-info type-info</I>
                Q <I>identifier-info</I>
                U <I>identifier-info</I>
                L <I>identifier-info</I>
                C <I>identifier-info</I>
                W <I>identifier-info type-info</I>
                I <I>identifier-command</I>
  
        <I>identifier-info</I> :
                <I>identifier-key location identifier</I>
  
        <I>identifier-key</I> :
                K
                MO
                MF
                MB
                TC
                TS
                TU
                TE
                TA
                NN
                NA
                VA
                VP
                VE
                VS
                FE <I>function-key<SUB>opt</SUB></I>
                FS <I>function-key<SUB>opt</SUB></I>
                FB <I>function-key<SUB>opt</SUB></I>
                CF <I>function-key<SUB>opt</SUB></I>
                CS <I>function-key<SUB>opt</SUB></I>
                CV <I>function-key<SUB>opt</SUB></I>
                CM
                CD
                E
                L
                XO
                XF
                XP
                XT
  
        <I>function-key</I> :
                C <I>function-key<SUB>opt</SUB></I>
                I <I>function-key<SUB>opt</SUB></I>
  
        <I>type-info</I> :
                <I>type identifier<SUB>opt</SUB></I>
                <I>sort</I>
                <I>scope-identifier</I>
                *
  
  
        <I>scope-command</I> :
                SS <I>scope-key location identifier</I>
                SE <I>scope-key location identifier</I>
  
        <I>scope-key</I> :
                N
                S
                B
                D
                H
                CT
                CF
                CC
  
  
        <I>override-command</I> :
                O <I>identifier identifier</I>
  
  
        <I>base-command</I> :
                B <I>identifier-key identifier base-graph</I>
  
        <I>base-graph</I> :
                <I>base-class</I>
                <I>base-class</I> ( <I>base-list</I> )
  
        <I>base-class</I> :
                <I>number</I> = V<I><SUB>opt</SUB> access<SUB>opt</SUB> type-name</I>
                <I>number</I> :
  
        <I>base-list</I> :
                <I>base-graph base-list<SUB>opt</SUB></I>
  
        <I>base-number</I> :
                <I>number</I> : <I>type-name</I>
  
  
        <I>api-command</I> :
                X <I>identifier-key identifier string</I>
  
  
        <I>template-command</I> :
                Z <I>identifier-key identifier token-application specialise-info</I>
  
        <I>specialise-info</I> :
                <I>identifier</I>
                <I>token-application</I>
                *
  
  
        <I>type</I> :
                <I>type-name</I>
                c
                s
                i
                l
                x
                b
                w
                y
                z
                f
                d
                r
                v
                u
                Sc
                Uc
                Us
                Ui
                Ul
                Ux
                C <I>type</I>
                V <I>type</I>
                P <I>type</I>
                R <I>type</I>
                M <I>type-name</I> : <I>type</I>
                F <I>type parameter-types</I>
                A <I>nat<SUB>opt</SUB></I> : <I>type</I>
                B <I>nat</I> : <I>type</I>
                t <I>parameter-list<SUB>opt</SUB></I> : <I>type</I>
                p <I>type</I>
                a <I>type</I> : <I>type</I>
                n <I>lit-base<SUB>opt</SUB> lit-suffix<SUB>opt</SUB></I>
                W <I>type parameter-types</I>
                q <I>type</I>
                Q <I>string</I>
                *
  
        <I>type-name</I> :
                <I>identifier</I>
                <I>token-application</I>
  
        <I>parameter-types</I> :
                : <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> :
                . <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> :
                . <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> .
                , <I>type parameter-types</I>
  
        <I>func-qualifier</I> :
                C <I>func-qualifier<SUB>opt</SUB></I>
                V <I>func-qualifier<SUB>opt</SUB></I>
  
        <I>exception-spec</I> :
                ( <I>exception-list<SUB>opt</SUB></I> )
  
        <I>exception-list</I> :
                <I>type</I>
                <I>type</I> , <I>exception-list</I>
  
        <I>nat</I> :
                + <I>number</I>
                - <I>number</I>
                <I>identifier</I>
                <I>token-application</I>
                <I>string</I>
  
        <I>parameter-list</I> :
                <I>identifier</I>
                <I>identifier</I> , <I>parameter-list</I>
  
        <I>lit-base</I> :
                O
                X
  
        <I>lit-suffix</I> :
                U
                l
                Ul
                x
                Ux
  
  
        <I>promotion-command</I> :
                P <I>type</I> : <I>type</I>
  
  
        <I>sort</I> :
                <I>expression-sort</I>
                <I>statement-sort</I>
                <I>type-sort</I>
                <I>tag-type-sort</I>
                <I>member-sort</I>
                <I>proc-sort</I>
                <I>func-sort</I>
                <I>template-sort</I>
                <I>macro-sort</I>
  
        <I>expression-sort</I> :
                ZEL <I>type</I>
                ZER <I>type</I>
                ZEC <I>type</I>
                ZN
  
        <I>statement-sort</I> :
                ZS
  
        <I>type-sort</I> :
                ZTO
                ZTI
                ZTF
                ZTA
                ZTP
                ZTS
                ZTU
  
        <I>tag-type-sort</I> :
                ZTTS
                ZTTU
  
        <I>member-sort</I> :
                ZM <I>type</I> : <I>type-name</I>
  
        <I>proc-sort</I> :
                ZPG <I>parameter-list<SUB>opt</SUB></I> ; <I>parameter-list<SUB>opt</SUB></I> : <I>sort</I>
                ZPS <I>parameter-list<SUB>opt</SUB></I> : <I>sort</I>
  
        <I>func-sort</I> :
                ZF <I>type</I>
  
        <I>template-sort</I> :
                ZTt <I>parameter-list<SUB>opt</SUB></I> :
  
        <I>macro-sort</I> :
                ZUO
                ZUF <I>number</I>
  
        <I>token-application</I> :
                T <I>identifier</I> , <I>token-argument-list</I> :
  
        <I>token-argument-list</I> :
                <I>token-argument</I>
                <I>token-argument</I> , <I>token-argument-list</I>
  
        <I>token-argument</I> :
                E <I>expression</I>
                N <I>nat</I>
                S <I>statement</I>
                T <I>type</I>
                M <I>member</I>
                F <I>identifier</I>
                C <I>identifier</I>
  
        <I>expression</I> :
                <I>nat</I>
  
        <I>statement</I> :
                <I>expression</I>
  
        <I>member</I> :
                <I>identifier</I>
                <I>string</I>
  
  
        <I>error-name</I> :
                <I>number</I> = <I>string</I>
                <I>number</I>
  
        <I>error-command</I> :
                ES <I>location error-info</I>
                EW <I>location error-info</I>
                EI <I>location error-info</I>
                EF <I>location error-info</I>
                EC <I>error-info</I>
                EA <I>error-argument</I>
  
        <I>error-info</I> :
                <I>error-name number number</I>
  
        <I>error-argument</I> :
                B <I>base-number</I>
                C <I>scope-identifier</I>
                E <I>expression</I>
                H <I>identifier-name</I>
                I <I>identifier</I>
                L <I>location</I>
                N <I>nat</I>
                S <I>string</I>
                T <I>type</I>
                V <I>number</I>
                V - <I>number</I>
  
  
        <I>path-command</I> :
                FD <I>number</I> = <I>string string<SUB>opt</SUB></I>
  
        <I>directory</I> :
                <I>number</I>
                *
  
        <I>file-command</I> :
                FS <I>location directory</I>
                FE <I>location</I>
  
        <I>include-command</I> :
                FIA <I>location string</I>
                FIQ <I>location string</I>
                FIN <I>location string</I>
                FIS <I>location string</I>
                FIE <I>location string</I>
                FIR <I>location</I>
  
  
        <I>string-command</I> :
                A <I>location string</I>
                AC <I>location string</I>
                AL <I>location string</I>
                ACL <I>location string</I>
  </programlisting>
  </para>
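  <para>
  As an illustration of how the <I>type</I> productions compose, the
  following sketch pairs a few C++ types with encodings derived by hand
  from the grammar above.  It assumes the meanings usually given to the
  codes involved (<code>c</code> for <code>char</code>, <code>i</code> for
  <code>int</code>, <code>l</code> for <code>long</code>, <code>C</code>
  for <code>const</code>, <code>P</code> for pointer, <code>A</code> for
  array and <code>F</code> for a function type whose first <I>type</I> is
  the return type).  Symbols are separated by spaces as in the grammar.
  The lines have not been checked against actual producer output and
  should be read as an aid to reading the grammar only.
  <programlisting>

        const char *            P C c
        int [10]                A + 10 : i
        int ( char, long )      F i , c , l : :
  </programlisting>
  </para>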
  </sect1>
  
  <sect1>
    <title>Annex C. Error catalogue syntax</title>
  <para>
  The following gives a summary of the syntax of the
  <A HREF="error.html">error catalogue</A> accepted by the
  <code>make_err</code> tool.  Identifiers are normal C-style identifiers;
  strings consist of any sequence of characters enclosed in
  <code>&quot;....&quot;</code>.  The escape sequences <code>\&quot;</code>
  and
  <code>\\</code> are allowed in strings; all other characters (including
  newline characters) map to themselves.  C-style comments are allowed.
  <programlisting>
  
        <I>error-database</I> :
                <I>header types<SUB>opt</SUB> properties<SUB>opt</SUB> keys<SUB>opt</SUB> usages<SUB>opt</SUB> entries<SUB>opt</SUB></I>
  
        <I>header</I> :
                <I>database-name<SUB>opt</SUB> rig-name<SUB>opt</SUB> prefixes<SUB>opt</SUB></I>
  
  
        <I>database-name</I> :
                DATABASE_NAME : <I>identifier</I>
  
        <I>rig-name</I> :
                RIG : <I>identifier</I>
  
  
        <I>prefixes</I> :
                PREFIX : <I>output-prefix<SUB>opt</SUB> compiler-prefix<SUB>opt</SUB> error-prefix<SUB>opt</SUB></I>
  
        <I>output-prefix</I> :
                compiler_output -&gt; <I>identifier</I>
  
        <I>compiler-prefix</I> :
                from_compiler -&gt; <I>identifier</I>
  
        <I>error-prefix</I> :
                from_database -&gt; <I>identifier</I>
  
  
        <I>types</I> :
                TYPES : <I>name-list<SUB>opt</SUB></I>
  
        <I>properties</I> :
                PROPERTIES : <I>name-list<SUB>opt</SUB></I>
  
        <I>keys</I> :
                KEYS : <I>name-list<SUB>opt</SUB></I>
  
        <I>usages</I> :
                USAGE : <I>name-list<SUB>opt</SUB></I>
  
        <I>name</I> :
                <I>identifier</I>
                <I>identifier</I> = <I>identifier</I>
                <I>identifier</I> = <I>identifier</I> | <I>identifier</I>
  
        <I>name-list</I> :
                <I>name</I>
                <I>name</I> , <I>name-list</I>
  
  
        <I>type-name</I> :
                <I>identifier</I>
  
        <I>property-name</I> :
                <I>identifier</I>
  
        <I>key-name</I> :
                <I>identifier</I>
  
        <I>usage-name</I> :
                <I>identifier</I>
  
  
        <I>entries</I> :
                ENTRIES : <I>entry-list<SUB>opt</SUB></I>
  
        <I>entry-list</I> :
                <I>entry entry-list<SUB>opt</SUB></I>
  
        <I>entry</I> :
                <I>identifier</I> ( <I>param-list<SUB>opt</SUB></I> ) { <I>entry-body</I> }
  
        <I>entry-body</I> :
                <I>alt-name<SUB>opt</SUB> entry-usage<SUB>opt</SUB> entry-properties<SUB>opt</SUB> map-list<SUB>opt</SUB></I>
  
  
        <I>parameter</I> :
                <I>type-name</I> : <I>identifier</I>
  
        <I>param-list</I> :
                <I>parameter</I>
                <I>parameter</I> , <I>param-list</I>
  
        <I>param-name</I> :
                <I>identifier</I>
  
  
        <I>alt-name</I> :
                ALT_NAME : <I>identifier</I>
  
        <I>entry-usage</I> :
                USAGE : <I>usage-name</I>
                USAGE : <I>usage-name</I> | <I>usage-name</I>
  
        <I>entry-properties</I> :
                PROPERTIES : <I>property-list<SUB>opt</SUB></I>
  
        <I>property-list</I> :
                <I>property-name</I>
                <I>property-name</I> , <I>property-list</I>
  
  
        <I>map</I> :
                KEY ( <I>key-name</I> ) <I>message-list<SUB>opt</SUB></I>
                KEY ( <I>key-name</I> ) <I>message-list<SUB>opt</SUB></I> | <I>message-list<SUB>opt</SUB></I>
  
        <I>map-list</I> :
                <I>map map-list<SUB>opt</SUB></I>
  
        <I>message-list</I> :
                <I>string message-list<SUB>opt</SUB></I>
                <I>param-name message-list<SUB>opt</SUB></I>
        </programlisting>
      </para>
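    <para>
    To show how these productions fit together, the following is a
    minimal sketch of a catalogue that follows the grammar above.  Every
    name in it (the database name, the prefixes, the type, property, key
    and usage names, and the entry itself) is hypothetical and chosen only
    for illustration; none is taken from the catalogue shipped with the
    producer.  The message for the single key is built from strings and
    parameter names, and uses the <code>\&quot;</code> escape to embed
    quotation marks.
    <programlisting>

        /* Hypothetical example; all names are illustrative only */
        DATABASE_NAME : EXAMPLE
        RIG : example

        PREFIX :
                compiler_output -&gt; ERR_
                from_compiler -&gt; ERR_
                from_database -&gt; ERR_

        TYPES :
                number, text

        PROPERTIES :
                extension

        KEYS :
                STD

        USAGE :
                serious, warning

        ENTRIES :

        file_open ( text : name, number : code )
        {
                USAGE :         serious
                PROPERTIES :    extension
                KEY ( STD )     "Can't open \"" name "\" (code " code ")"
        }
    </programlisting>
    </para>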
    </sect1>
  </chapter>
</book>