<?xml version="1.0" standalone="no"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
<!--
$Id$
-->
<book>
<bookinfo>
<title>C++ Producer Guide</title>
<corpauthor>The TenDRA Project</corpauthor>
<author>
<firstname>Jeroen</firstname>
<surname>Ruigrok van der Werven</surname>
</author>
<authorinitials>JRvdW</authorinitials>
<pubdate>2004</pubdate>
<copyright>
<year>2004</year>
<year>2005</year>
<holder>The TenDRA Project</holder>
</copyright>
<copyright>
<year>1998</year>
<holder>DERA</holder>
</copyright>
</bookinfo>
<chapter>
<sect1 id="intro">
<title>Introduction</title>
<para>This document is designed as a technical overview of the TenDRA C++
to TDF/ANDF producer. It is divided into two broad areas: descriptions
of the <A HREF="#interface">public interfaces</A> of the producer, and
an overview of the producer <A HREF="#program">source code</A>.</para>
<para>Whereas the interface description contains most of the information
which would be required in a users' guide, it is not necessarily in a
readily digestible form. The C++ producer is designed to complement the
existing TenDRA C to TDF producer; although they are completely distinct
programs, the same design philosophy underlies both and they share a
number of common interfaces. There are no radical differences between
the two producers, besides the fact that the C++ producer covers a
vastly larger and more complex language. This means that much of the
<A HREF="#tdfc">existing documentation on the C producer</A> can be
taken as also applying to the C++ producer. This document tries to make
clear where the C++ producer extends the C producer's interfaces, and
those portions of these interfaces which are not directly applicable to
C++.</para>
<para>
A familiarity with both C++ and TDF is assumed. The version of C++
implemented is that given by the <A HREF="#cplusplus">draft ISO C++
standard</A>. All references to "ISO C++" within the document
should strictly be qualified using the word "draft", but
for convenience this has been left implicit. The C++ producer has
a number of switches which allow it to be configured for older dialects
of C++. In particular, the version of C++ described in the <A HREF="#arm">ARM
(Annotated Reference Manual)</A> is fully supported.
</para>
<para>The <A HREF="#tdf">TDF specification</A> (version 4.0) may be consulted
for a description of the compiler intermediate language used. The
paper
<A HREF="#port"><I>TDF and Portability</I></A> provides a useful (if
slightly old) introduction to some of the ideas relating to static
program analysis and interface checking which underlie the whole TenDRA
compilation system.
</para>
<para>
The warning sign:
<IMG SRC="../images/warn.gif" ALT="warning"/>
is used within the document to indicate areas where the implementation
is currently incomplete or incorrect.
</para>
<sect2 id="update">
<title>1.1. Updated introduction</title>
<para>Since this document was originally written, the old C producer,
<I>tdfc</I>, has been replaced by a new C producer, <I>tdfc2</I>,
which is just a modified version of the C++ producer, <I>tcpplus</I>.
All C producer documentation continues to apply to the new C producer,
but the new C producer also has many of the features described in this
document as only applying to the C++ producer.</para>
</sect2>
</sect1>
<sect1 id="interface">
<title>Interface descriptions</title>
<para>
The most important public interfaces of the C++ producer are the ISO
C++ standard and the TDF 4.0 specification; however there are other
interfaces, mostly common to both the C and C++ producers, which are
described in this section.
</para>
<para>
An important design criterion of the C++ producer was that it should
be strictly ISO conformant by default, but have a method whereby dialect
features and extra static program analysis can be enabled. This compiler
configuration is controlled by the
<A HREF="pragma.html"><code>#pragma TenDRA</code> directives</A>
described in the first section.
</para>
<para>
The requirement that the C and C++ producers should be able to translate
portable C or C++ programs into target independent TDF requires a
mechanism whereby the target dependent implementations of APIs can
be represented. This mechanism, the <A HREF="token.html"><code>#pragma
token</code> syntax</A>, is described in the following section. Note
that at present this mechanism only contains support for C APIs; it
is considered that the C++ language itself contains sufficient interface
mechanisms for C++ APIs to be described.
</para>
<para>
The C and C++ producers provide two mechanisms whereby type and declaration
information derived from a translation unit can be stored to a file
for post-processing by other tools. The first is the
<A HREF="dump.html">symbol table dump</A>, which is a public interface
designed for use by third party tools. The second is the
<A HREF="link.html">C/C++ spec file</A>, which is designed for ease
of reading and writing by the producers themselves, and is used for
intermodule analysis.
</para>
<para>
The mapping from C++ to TDF implemented by the C++ producer is largely
straightforward. There are however target dependencies arising within
the language itself which require special handling. These are represented
by certain <A HREF="lib.html">standard tokens</A> which the producer
requires to be defined on the target machine. These tokens are also
used to describe the interface between the producer and the run-time
system. Note that the C++ producer is primarily concerned with the
C++ language, not with the standard C++ library. An example implementation
of those library components which are required as an integral part
of the language (memory allocation, exception handling, run-time type
information etc.) is provided. Otherwise, libraries should be obtained
from third parties. A number of hints on <A HREF="std.html">integrating
such libraries</A> with the C++ producer are given.
</para>
</sect1>
<sect1 id="program">
<title>Program overview</title>
<para>
The C++ producer is a large program (over 200,000 lines, including
automatically generated code) written in C. The
<A HREF="style.html#language">coding conventions</A> used, the
<A HREF="style.html#api">API</A> observed and the basic organisation
of the <A HREF="style.html#src">source code</A> are described in the
first section.
</para>
<para>
One of the design methods used in the C++ producer is the extensive
use of automatic code generation tools. The type system is based
around the <code>calculus</code> tool, which allows complex type systems
to be described in a simple format. The interface generated by <code>calculus
</code> allows for rigorous static type checking, generic type constructors
for lists, stacks etc., encapsulation of the operations on the types
within the system, and optional run-time checking for null pointers
and discriminated union tags. An overview is given of the <A HREF="alg.html">type
system</A> used as the basis of the C++ producer design. Also see
the
<A HREF="../utilities/calc.html"><code>calculus</code> users' guide</A>.
</para>
<para>
The other general purpose code generation tool used in the C++ producer
is the parser generator, <code>sid</code>. A brief description of
the problems in writing a <A HREF="parse.html">C++ parser</A> is given.
Also see the <A HREF="../utilities/sid.html"><code>sid</code> users'
guide</A>.
</para>
<para>
The other code generation tools used were written specifically for
the C++ producer. The error reporting routines within the producer
are based on an <A HREF="error.html">error catalogue</A>, from which
code for constructing and printing errors is generated. The
<A HREF="tdf.html">TDF output routines</A> are based on primitives
automatically generated from a standard database describing the TDF
specification.
</para>
<para>
The program itself is well commented, so no lower level program documentation
has been provided. When performing development work the producer
should be compiled with the <code>DEBUG</code> macro defined. This
enables the <code>calculus</code> run-time checks, along with other
assertions, and makes available the debugging routines,
<code>DEBUG_</code><I>type</I>, which can be used to print an object
from the internal type system.
</para>
</sect1>
<sect1 id="reference">
<title>References</title>
<itemizedlist>
<listitem><A id="cplusplus"><B>Working paper for Draft Proposed
International Standard for Information Systems - Programming Language
C++</B></A>, X3J16/96-0225, December 1996:
<A HREF="http://www.cygnus.com/misc/wp/dec96pub/">
<code>http://www.cygnus.com/misc/wp/dec96pub/</code></A> or
<A HREF="http://www.maths.warwick.ac.uk/c++/pub/wp/html/cd2/">
<code>http://www.maths.warwick.ac.uk/c++/pub/wp/html/cd2/</code></A>.
</listitem>
<listitem><A id="arm"><B>The Annotated C++ Reference Manual</B></A>,
Margaret Ellis and Bjarne Stroustrup, ISBN 0-201-51459-1,
Addison-Wesley, 1990:
<A HREF="http://heg-school.aw.com/cseng/authors/ellis/annocpp/annocpp.html">
<code>http://heg-school.aw.com/cseng/authors/ellis/annocpp/annocpp.html</code>
</A>
</listitem>
<listitem><A id="tdf"><B>TDF Specification, Issue 4.0</B></A>:
<A HREF="../tdf/spec1.html">attached</A>.
</listitem>
<listitem><A id="tdfc"><B>C Checker Reference Manual</B></A>:
<A HREF="../tdfc/tdfc1.html">attached</A>.
</listitem>
<listitem><A id="port"><B>TDF and Portability</B></A>:
<A HREF="../port/port1.html">attached</A>.
</listitem>
<listitem><A id="cstyle"><B>C Coding Standards</B></A>,
DRA/CIS(SE2)/WI/94/57/2.0 (OSSG internal document).
</listitem>
</itemizedlist>
</sect1>
<sect1>
<title>
C++ Producer Guide: Invocation
</title>
<sect2>
<title>2.1. Invocation</title>
<para>
This section describes how the C++ to TDF producer,
<code>tcpplus</code>, fits into the overall compilation scheme controlled
by the TenDRA compiler front-end, <code>tcc</code>, or the TenDRA
checker front-end, <code>tchk</code>. While it is possible to use
<code>tcpplus</code> as a stand-alone program, it is recommended that
it should be invoked via <code>tcc</code> or <code>tchk</code>. The
<code>tcc</code> users' guide should be consulted for more details.
</para>
<para>
<code>tcc</code> and <code>tchk</code> require the <code>-Yc++</code>
command-line option in order to enable their C++ capabilities. Files
with a <code>.C</code> suffix are recognised as C++ source files and
passed to <code>tcpplus</code> for processing (see
<A HREF="#compile">below</A>). It is possible to change the suffix
used for C++ source files; for example <code>-sC:cc</code> causes
<code>.cc</code> files to be recognised as C++ source files. An interesting
variation is <code>-sC:c</code> which causes C source files to be
processed by the C++ producer. Similarly <code>.I</code> files are
recognised as preprocessed C++ source files and <code>.K</code>
files are recognised as C++ spec files.
</para>
<para>
Most of the command-line option handling for <code>tcpplus</code>
is done by <code>tcc</code> and <code>tchk</code>; however, it is possible
to pass the option <I>opt</I> directly to <code>tcpplus</code> using
the option <code>-Wx,</code><I>opt</I> to <code>tcc</code> or <code>tchk</code>.
Similarly <code>-Wg,</code><I>opt</I> and <code>-WS,</code><I>opt</I>
can be used to pass options to the C++ preprocessor and the C++ spec
linker (both of which are actually <code>tcpplus</code> invoked with
different options) respectively.
</para>
<sect3 id="compile">
<title>2.1.1. Compilation scheme</title>
<para>
The overall compilation scheme controlled by <code>tcc</code>, as
it relates to the C++ producer, can be represented as follows:
<IMG SRC="../images/compile.gif" ALT="compilation scheme"/>
Each C++ source file, <code>a.C</code> say, is processed using
<code>tcpplus</code> to give an output TDF capsule, <code>a.j</code>,
which is passed to the installer phase of <code>tcc</code>. The capsule
is linked with any target dependent token definition libraries, translated
to assembler and assembled to give a binary object file,
<code>a.o</code>. The various object files comprising the program
are then linked with the system libraries to give a final executable,
<code>a.out</code>.
</para>
<para>
In addition to this main compilation scheme, <code>tcpplus</code>
can be made to output a <A HREF="link.html">C++ spec
file</A>
for each C++ source file, <code>a.K</code> say. These C++ spec files
can be linked, using <code>tcpplus</code> in its spec linker mode,
to give an additional TDF capsule, <code>x.j</code> say, and a combined
C++ spec file, <code>x.K</code>. The main purpose of this C++ spec
linking is to perform intermodule checks on the program; however, in
the course of this checking, exported templates which are defined in
one module and used in another are instantiated. This extra code
is output to <code>x.j</code>, which is then installed and linked
in the normal way.
</para>
<para>
Note that intermodule checks, and hence intermodule template instantiations,
are only performed if the <code>-im</code> option is passed to <code>tcc</code>.
</para>
<para>
The TenDRA checker, <code>tchk</code>, is similar to <code>tcc</code>
except that it disables TDF output and has intermodule analysis enabled
by default.
</para>
</sect3>
<sect3 id="option">
<title>2.1.2. Producer options</title>
<para>
The general form for the invocation of <code>tcpplus</code> is as
follows:
<programlisting>
tcpplus [ <I>options</I> ] [ <I>input-file</I> ] .... [ <I>output-file</I> ]
</programlisting>
The output file can alternatively be specified using the
<A HREF="#output"><code>-o</code> option</A>. If no output file is
given, or the output file is <code>-</code>, the standard output is
used. In general there can be any number of input files. If no input
file is given, or the input file is <code>-</code>, the standard input
is used.
</para>
<para>
<code>tcpplus</code> has three modes which determine the form of its
input and output files. The default mode is compilation, in which
a single input C++ source file is translated into an output TDF capsule.
In preprocessing mode, specified using the
<A HREF="#preproc"><code>-E</code> option</A>, a single input C++
source file is preprocessed into an output C++ source file. Note
that the preprocessor is built into <code>tcpplus</code>, rather than,
as with most other compilers, being a separate program. The final
mode is
<A HREF="link.html">C++ spec linking</A>, specified using the
<A HREF="#linker"><code>-S</code> option</A>. Any number of C++ spec
input files are linked and any code generated as a result (for example,
template instantiations) is written to the output TDF capsule.
</para>
<para>
In either compilation or spec linking mode, a C++ spec output file
can be generated, in addition to the TDF capsule, using the
<A HREF="#spec"><code>-s</code> option</A>. In any mode a symbol
table dump output file can be generated using the <A HREF="#dump"><code>-d</code>
option</A>.
</para>
<para>
Command-line options can appear in any order and can be interspersed
with the input and output files, except following a <code>--</code>
option. All the multi-part options can be given either as one or
two command-line arguments, so that <code>-I</code><I>directory</I>
and
<code>-I</code> <I>directory</I> are equivalent. The recognised options
are as follows:
<itemizedlist>
<listitem><B>-A<I>predicate</I>(<I>tokens</I>)</B>
Asserts that the given predicate is true, that is to say:
<programlisting>
#assert <I>predicate</I> ( <I>tokens</I> )
</programlisting>
The special case <code>-A-</code> undefines all the built-in predicates
(of which there are none). Use of this option automatically enables
support for the <A HREF="pragma.html#ppdir"><code>#assert</code> and
<code>#unassert</code> directives</A>.
</listitem>
<listitem><B>-D<I>macro</I></B>
<B>-D<I>macro</I>=<I>tokens</I></B>
Defines the given macro to be 1 in the first case, or the given sequence
of preprocessing tokens in the second case, that is to say:
<programlisting>
#define <I>macro</I> 1
#define <I>macro tokens</I>
</programlisting>
respectively. In fact <code>-D</code> and <code>-U</code> options
to
<code>tcc</code> are not passed as <code>-D</code> and <code>-U</code>
options to <code>tcpplus</code>. Instead a
<A HREF="#start-up">start-up</A> file containing the equivalent
<code>#define</code> and <code>#undef</code> directives is used.
</listitem>
<listitem><A id="preproc"><B>-E</B></A>
Enables preprocessing mode in which the input C++ source file is preprocessed
into the output file.
</listitem>
<listitem><B>-F<I>file</I></B>
Causes a list of command-line options to be read from <I>file</I>.
Other than empty lines and lines beginning with <code>#</code>, each
line in the file is treated as if it had been specified as a separate
command-line option.
</listitem>
<listitem><B>-H</B>
Enables verbose inclusion mode in which warnings are printed at the
start and end of each included source file.
</listitem>
<listitem><B>-I<I>directory</I></B>
Adds the given directory to the list searched for included source
files. No such directories are built into the producer by default.
</listitem>
<listitem><A id="directory"><B>-N<I>name</I>:<I>directory</I></B></A>
This is identical to <code>-I</code><I>directory</I> except that it
also associates the given identifier with the directory. The directory
name can be used to specify a <A HREF="pragma.html#scope">compilation
profile</A> to be used on files included from this directory.
</listitem>
<listitem><A id="linker"><B>-S</B></A>
Enables C++ spec linker mode, in which any number of C++ spec input
files are linked together.
</listitem>
<listitem><B>-U<I>macro</I></B>
Undefines the given macro, that is to say:
<programlisting>
#undef <I>macro</I>
</programlisting>
The special case <code>-U-</code> undefines all the built-in macros.
These may be described as follows:
<programlisting>
#define __FILE__ <I>(current file)</I>
#define __LINE__ <I>(current line)</I>
#define __TIME__ <I>(current time)</I>
#define __DATE__ <I>(current date)</I>
#define __STDC__ 1
#define __STDC_VERSION__ 199409L
#define __cplusplus 199711L
</programlisting>
The actual value of <code>__cplusplus</code> gives the date of the
draft ISO C++ standard on which the current version of the producer
is based. The value given above gives the expected date of the final
C++ standard.
</listitem>
<listitem><B>-V</B>
Causes the name of each function to be printed to the standard output
as it is compiled.
</listitem>
<listitem><B>-W<I>option</I></B>
Sets the given <A HREF="pragma.html#low">compiler option</A> to give
a warning, that is to say:
<programlisting>
#pragma TenDRA option "<I>option</I>" warning
</programlisting>
The special case <code>-Wall</code> enables a wide range of warnings.
</listitem>
<listitem><B>-X</B>
Disables exception handling. The <A HREF="lib.html#except">current
implementation</A> can impose a large run-time overhead if exceptions are not required.
The effect of linking any module compiled with this option with a
module which throws an exception is undefined. This is equivalent
to <A HREF="#output"><code>-j-e</code></A>.
</listitem>
<listitem><B>-a</B>
Causes complete program analysis to be applied. That is to say it
is assumed that no other translation units need to be linked in order
for the program to execute.
</listitem>
<listitem><B>-c</B>
Disables TDF output. The output file will still be a valid TDF capsule,
but it will contain no information. This is equivalent to
<A HREF="#output"><code>-j-c</code></A>.
</listitem>
<listitem><para><A id="dump"><B>-d<I>opt</I>=<I>dump-file</I></B></A>
Specifies the given file as a <A HREF="dump.html">symbol table dump</A>
output file. <I>opt</I> will be a series of characters describing
the information to be dumped, as follows:
<table>
<tr><th>Key</th>
<th>Description</th>
</tr>
<tr><td><code>a</code></td>
<td>equivalent to <code>ehlmu</code></td>
</tr>
<tr><td><code>c</code></td>
<td>dump string literals</td>
</tr>
<tr><td><code>e</code></td>
<td>dump error messages</td>
</tr>
<tr><td><code>h</code></td>
<td>dump header information</td>
</tr>
<tr><td><code>k</code></td>
<td>dump keyword identifiers</td>
</tr>
<tr><td><code>l</code></td>
<td>dump local variables</td>
</tr>
<tr><td><code>m</code></td>
<td>dump macro identifiers</td>
</tr>
<tr><td><code>s</code></td>
<td>dump scope information</td>
</tr>
<tr><td><code>u</code></td>
<td>dump identifier usage information</td>
</tr>
</table>
</para>
<para>
Note that these correspond to the <code>tcc -sym</code> options.
</para>
</listitem>
<listitem><A id="end-up"><B>-e<I>file</I></B></A>
Specifies the given file as an end-up file. This is equivalent to
adding:
<programlisting>
#include "<I>file</I>"
</programlisting>
at the end of the input source file. More than one end-up file may
be given; they are processed in the order given.
</listitem>
<listitem><A id="start-up"><B>-f<I>file</I></B></A>
Specifies the given file as a start-up file. This is equivalent to
adding:
<programlisting>
#include "<I>file</I>"
</programlisting>
at the start of the input source file. More than one start-up file
may be given; they are processed in the order given.
</listitem>
<listitem><B>-g</B>
Specifies that the output TDF capsule should also contain information
to allow for the generation of run-time debugging directives. This
is equivalent to <A HREF="#output"><code>-jg</code></A>.
</listitem>
<listitem><B>-h</B>
Causes a full list of command-line options to be printed. This includes
a number not documented here which are unlikely to prove useful to
the normal user.
</listitem>
<listitem><A id="output"><B>-j<I>opt</I></B></A>
Sets the TDF output options given by <I>opt</I>. This consists of
a sequence of characters describing the options to be enabled or disabled.
By default, or following a <code>+</code>, the options are enabled;
following a <code>-</code> they are disabled. The available options
are as follows:
<table>
<tr><th>Key</th>
<th>Default</th>
<th>Description</th>
</tr>
<tr><td><code>a</code></td>
<td>off</td>
<td>output external names for local objects</td>
</tr>
<tr><td><code>b</code></td>
<td>off</td>
<td>work round old installer bugs</td>
</tr>
<tr><td><code>c</code></td>
<td>on</td>
<td>output TDF capsule</td>
</tr>
<tr><td><code>d</code></td>
<td>off</td>
<td>output termination function</td>
</tr>
<tr><td><code>e</code></td>
<td>on</td>
<td>output exceptions</td>
</tr>
<tr><td><code>f</code></td>
<td>on</td>
<td>mangle template function signatures</td>
</tr>
<tr><td><code>g</code></td>
<td>off</td>
<td>output debugging information</td>
</tr>
<tr><td><code>i</code></td>
<td>off</td>
<td>output dynamic initialisers as a function</td>
</tr>
<tr><td><code>n</code></td>
<td>on</td>
<td>mangle object names</td>
</tr>
<tr><td><code>o</code></td>
<td>off</td>
<td>order class data members by access</td>
</tr>
<tr><td><code>p</code></td>
<td>on</td>
<td>output partial destructors</td>
</tr>
<tr><td><code>r</code></td>
<td>on</td>
<td>output run-time type information</td>
</tr>
<tr><td><code>s</code></td>
<td>on</td>
<td>output shared string literals</td>
</tr>
<tr><td><code>t</code></td>
<td>off</td>
<td>output token declarations</td>
</tr>
<tr><td><code>u</code></td>
<td>on</td>
<td>output unused static variables</td>
</tr>
<tr><td><code>v</code></td>
<td>off</td>
<td>output local virtual function tables</td>
</tr>
</table>
</listitem>
<listitem><A id="error"><B>-m<I>opt</I></B></A>
Sets the error formatting options given by <I>opt</I>. This consists
of a sequence of characters describing the options to be enabled or
disabled. By default, or following a <code>+</code>, the options are
enabled; following a <code>-</code> they are disabled. The available
options are as follows:
<table>
<tr><th>Key</th>
<th>Default</th>
<th>Description</th>
</tr>
<tr><td><code>c</code></td>
<td>off</td>
<td>show source code with error</td>
</tr>
<tr><td><code>e</code></td>
<td>off</td>
<td>show error name</td>
</tr>
<tr><td><code>f</code></td>
<td>on</td>
<td>reliable <code>fseek</code> function</td>
</tr>
<tr><td><code>g</code></td>
<td>off</td>
<td>record statement locations</td>
</tr>
<tr><td><code>i</code></td>
<td>on</td>
<td>reliable <code>stat</code> function</td>
</tr>
<tr><td><code>k</code></td>
<td>off</td>
<td>enable C++ spec output</td>
</tr>
<tr><td><code>l</code></td>
<td>off</td>
<td>output full error location</td>
</tr>
<tr><td><code>s</code></td>
<td>on</td>
<td>output ISO section number</td>
</tr>
<tr><td><code>t</code></td>
<td>off</td>
<td>use <code>typedef</code> names in errors</td>
</tr>
<tr><td><code>w</code></td>
<td>off</td>
<td>disable warnings</td>
</tr>
<tr><td><code>z</code></td>
<td>off</td>
<td>continue after error</td>
</tr>
</table>
</listitem>
<listitem><A id="table"><B>-n<I>port-table</I></B></A>
Specifies that the given <A HREF="pragma.html#table">portability table</A>
should be used to specify the basic configuration parameters.
</listitem>
<listitem><A id="output"><B>-o<I>output-file</I></B></A>
Gives an alternative method of specifying the output file.
</listitem>
<listitem><B>-q</B>
Causes the program to quit immediately without processing its input
files. This is useful primarily in version and command-line option
queries.
</listitem>
<listitem><A id="spec"><B>-s<I>spec-file</I></B></A>
Specifies the given file as a C++ spec output file.
</listitem>
<listitem><B>-t</B>
Specifies that token declarations should be included in the output
TDF capsule. While these are strictly unnecessary, they help when
pretty-printing the output. This is equivalent to
<A HREF="#output"><code>-jt</code></A>.
</listitem>
<listitem><A id="unmangle"><B>-u</B></A>
The form:
<programlisting>
tcpplus -u <I>name</I> .... <I>name</I>
</programlisting>
can be used to print the unmangled forms of a list of
<A HREF="lib.html#mangle">mangled identifier names</A> to the standard
output.
</listitem>
<listitem><B>-v</B>
Causes the C++ producer version number, plus information on the versions
of C++ and TDF supported, to be printed to the standard error.
</listitem>
<listitem><B>-w</B>
Disables all warning messages. This is equivalent to
<A HREF="#error"><code>-mw</code></A>.
</listitem>
<listitem><B>-z</B>
Forces an output file to be created even if compilation errors occur.
The effect of installing a TDF capsule produced using this option
is undefined. This is equivalent to <A HREF="#error"><code>-mz</code></A>.
</listitem>
<listitem><B>--</B>
Marks the last option. Any subsequent arguments are interpreted as
input and output files even if they resemble command-line options.
</listitem>
</itemizedlist>
</para>
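<para>
The <code>+</code>/<code>-</code> convention shared by the <code>-j</code>
and <code>-m</code> options can be illustrated with a short C++ sketch.
This is not the producer's own code: the names
<code>apply_option_string</code> and <code>default_output_flags</code>
are hypothetical, only four of the <code>-j</code> keys are modelled,
and it is assumed that a <code>+</code> or <code>-</code> remains in
force until the next sign character.
</para>
<programlisting><![CDATA[
#include <map>
#include <string>

// Apply an option string such as "-e" or "g-c+t" to a table of flags.
// Assumption: '+' and '-' stay in force until the next '+' or '-'.
// Returns false on the first unknown key; flags changed before that
// point keep their new values.
bool apply_option_string(std::map<char, bool>& flags, const std::string& opt) {
    bool enable = true;                       // options are enabled by default
    for (char ch : opt) {
        if (ch == '+') { enable = true; continue; }
        if (ch == '-') { enable = false; continue; }
        auto it = flags.find(ch);
        if (it == flags.end()) return false;  // key not in the table
        it->second = enable;
    }
    return true;
}

// Defaults for four of the -j keys, copied from the -j table.
std::map<char, bool> default_output_flags() {
    return {{'c', true}, {'e', true}, {'g', false}, {'t', false}};
}
]]></programlisting>
<para>
Under these assumptions, applying the string <code>-e</code> to the
defaults clears only the <code>e</code> flag, which mirrors the documented
equivalence of <code>-X</code> and <code>-j-e</code>.
</para>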
</sect3>
</sect2>
<sect2>
<title>2.2. Compiler configuration</title>
<para>
This section describes how the C++ producer can be configured to apply
extra static checks or to support various dialects of C++. In all
cases the default behaviour is precisely that specified in the ISO
C++ standard with no extra checks.
</para>
<para>
Certain very basic configuration information is specified using a
<A HREF="#table">portability table</A>; however, the primary method
of configuration is by means of <code>#pragma</code> directives.
These directives may be placed within the program itself; however,
it is generally more convenient to group them into a
<A HREF="man.html#start-up">start-up file</A> in order to create a
<A id="usr">user-defined compilation profile</A>. The
<code>#pragma</code> directives recognised by the C++ producer have
one of the equivalent forms:
<programlisting>
#pragma TenDRA ....
#pragma TenDRA++ ....
</programlisting>
Some of these are common to the C and C++ producers (although often
with differing default behaviour). The C producer will ignore any
<code>TenDRA++</code> directives, so these may be used in compilation
profiles which are to be used by both producers. In the descriptions
below, the presence of a <code>++</code> is used to indicate a directive
which is C++ specific; the other directives are common to both producers.
</para>
<para>
Within the description of the <code>#pragma</code> syntax, <I>on</I>
stands for <code>on</code>, <code>off</code> or <code>warning</code>,
<I>allow</I> stands for <code>allow</code>, <code>disallow</code>
or
<code>warning</code>, <I>string-literal</I> is any string literal,
<I>integer-literal</I> is any integer literal, <I>identifier</I> is
any simple, unqualified identifier name, and <I>type-id</I> is any
type identifier. Other syntactic items are described in the text.
A
<A HREF="pragma1.html">complete grammar</A> for the <code>#pragma</code>
directives accepted by the C++ producer is given as an annex.
</para>
<sect3 id="table">
<title>2.2.1. Portability tables</title>
<para>
Certain very basic configuration information is read from a file called
a portability table, which may be specified to the producer using
a
<A HREF="man.html#table"><code>-n</code> option</A>. This information
includes the minimum sizes of the basic integral types, the
<A HREF="#char">sign of plain <code>char</code></A>, and whether signed
types can be assumed to be symmetric (for example, [-127,127]) or
maximum (for example, [-128,127]).
</para>
<para>
The default portability table values, which are built into the producer,
can be expressed in the form:
<programlisting>
char_bits 8
short_bits 16
int_bits 16
long_bits 32
signed_range symmetric
char_type either
ptr_int none
ptr_fn no
non_prototype_checks yes
multibyte 1
</programlisting>
This illustrates the syntax for the portability table; note that all
ten entries are required, even though the last four are ignored.
</para>
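<para>
The difference between the two <code>signed_range</code> settings is a
matter of simple arithmetic, sketched below in C++. The type
<code>SignedRange</code> and the function <code>signed_limits</code>
are hypothetical names introduced for illustration only; they are not
part of the producer.
</para>
<programlisting><![CDATA[
#include <cstdint>
#include <utility>

enum class SignedRange { symmetric, maximum };

// Smallest and largest value assumed for a signed type of `bits` bits
// (valid for 2 <= bits <= 63 in this sketch).
std::pair<std::int64_t, std::int64_t> signed_limits(int bits, SignedRange r) {
    const std::int64_t max = (std::int64_t{1} << (bits - 1)) - 1;
    // A symmetric range drops the most negative value.
    const std::int64_t min = (r == SignedRange::symmetric) ? -max : -max - 1;
    return {min, max};
}
]]></programlisting>
<para>
With <code>char_bits 8</code> this gives [-127,127] for
<code>symmetric</code> and [-128,127] for <code>maximum</code>, matching
the examples quoted above.
</para>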
</sect3>
<sect3 id="low">
<title>2.2.2. Low level configuration</title>
<para>
The simplest level of configuration is to reset the severity level
of a particular error message using:
<programlisting>
#pragma TenDRA++ error <I>string-literal on</I>
#pragma TenDRA++ error <I>string-literal allow</I>
</programlisting>
The given <I>string-literal</I> should name an error from the
<A HREF="error.html">error catalogue</A>. A severity of <code>on</code>
or <code>disallow</code> indicates that the associated diagnostic
message should be an error, which causes the compilation to fail.
A severity of
<code>warning</code> indicates that the associated diagnostic message
should be a warning, which is printed but allows the compilation to
continue. A severity of <code>off</code> or <code>allow</code>
indicates that the associated error should be ignored. Reducing the
severity of any error from its default value, other than via one of
the dialect directives described in this section, results in undefined
behaviour.
</para>
<para>
The next level of configuration is to reset the severity level of
a particular compiler option using:
<programlisting>
#pragma TenDRA++ option <I>string-literal on</I>
#pragma TenDRA++ option <I>string-literal allow</I>
</programlisting>
The given <I>string-literal</I> should name an option from the option
catalogue. The simplest form of compiler option just sets the severity
level of one or more error messages. Some of these options may require
additional processing to be applied.</para>
<para>
It is possible to link a particular error message to a particular
compiler option using:
<programlisting>
#pragma TenDRA++ error <I>string-literal</I> as option <I>string-literal</I>
</programlisting>
</para>
<para>
Note that the directive:
<programlisting>
#pragma TenDRA++ use error <I>string-literal</I>
</programlisting>
can be used to raise a given error at any point in a translation unit
in a similar fashion to the <code>#error</code> directive. The values
of any parameters for this error are unspecified.
</para>
<para>
The directives just described give the primitive operations on error
messages and compiler options. Many of the remaining directives in
this section are merely higher level ways of expressing these primitives.
</para>
</sect3>
<sect3 id="scope">
<title>2.2.3. Checking scopes</title>
<para>
Most compiler options are scoped. A checking scope may be defined
by enclosing a list of declarations within:
<programlisting>
#pragma TenDRA begin
....
#pragma TenDRA end
</programlisting>
If the final <code>end</code> directive is omitted then the scope
ends at the end of the translation unit. Checking scopes may be nested
in the obvious way. A checking scope inherits its initial set of
checks from its enclosing scope (this includes the implicit main checking
scope consisting of the entire input file). Any checks switched on
or off within a scope apply only to the remainder of that scope and
any scope it contains. A particular check can only be set once in
a given scope. The set of applied checks reverts to its previous state
at the end of the scope.</para>
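<para>
The inheritance and reversion rules above can be sketched as a small stack
of check tables. This is only a toy model, not the producer's implementation:
the class <code>CheckScopes</code> and its member names are hypothetical, and
the rule that a check may only be set once per scope is not enforced.
</para>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of nested checking scopes; all names here are illustrative.
class CheckScopes {
public:
    // Start with the implicit scope covering the whole input file.
    CheckScopes() : stack_(1) {}

    // Models "#pragma TenDRA begin": the new scope inherits its parent's checks.
    void begin() {
        std::map<std::string, bool> inherited = stack_.back();
        stack_.push_back(inherited);
    }

    // Models "#pragma TenDRA end": the checks revert to their previous state.
    void end() {
        if (stack_.size() > 1) stack_.pop_back();
    }

    // Switch a check on or off for the remainder of the current scope.
    void set(const std::string& check, bool on) { stack_.back()[check] = on; }

    // Checks that were never mentioned are treated as off in this model.
    bool enabled(const std::string& check) const {
        std::map<std::string, bool>::const_iterator it = stack_.back().find(check);
        return it != stack_.back().end() && it->second;
    }

private:
    std::vector<std::map<std::string, bool> > stack_;
};
```

<para>
Switching a check off between <code>begin()</code> and <code>end()</code>
leaves the enclosing scope's setting untouched once <code>end()</code> is
called.
</para>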
<para>
A checking scope can be named using the directives:
<programlisting>
#pragma TenDRA begin name environment <I>identifier</I>
....
#pragma TenDRA end
</programlisting>
Checking scope names occupy a namespace distinct from any other namespace
within the translation unit. A named scope defines a set of modifications
to the current checking scope. These modifications may be reapplied
within a different scope using:
<programlisting>
#pragma TenDRA use environment <I>identifier</I>
</programlisting>
The default behaviour is not to allow checks set in the named checking
scope to be reset in the current scope. This can however be modified
using:
<programlisting>
#pragma TenDRA use environment <I>identifier</I> reset <I>allow</I>
</programlisting>
</para>
<para>
Another use of a named checking scope is to associate a checking scope
with a named include file directory. This is done using:
<programlisting>
#pragma TenDRA directory <I>identifier</I> use environment <I>identifier</I>
</programlisting>
where the directory name is one introduced via a
<A HREF="man.html#directory"><code>-N</code> command-line option</A>.
The effect of this directive, if a <code>#include</code> directive
is found to resolve to a file from the given directory, is as if the
file were enclosed in directives of the form:
<programlisting>
#pragma TenDRA begin
#pragma TenDRA use environment <I>identifier</I> reset allow
....
#pragma TenDRA end
</programlisting>
</para>
<para>
The checks applied to the expansion of a macro definition are those
from the scope in which the macro was defined, not that in which it
was expanded. The macro arguments are checked in the scope in which
they are specified, that is to say, the scope in which the macro is
expanded. This enables macro definitions to remain localised with
respect to checking scopes.
</para>
</sect3>
<sect3 id="limits">
<title>2.2.4. Implementation limits</title>
<para>
This table gives the default implementation limits imposed by the
C++ producer for the various implementation quantities listed in Annex
B of the ISO C++ standard, together with the minimum limits allowed
in ISO C and C++. A default limit of <I>none</I> means that the quantity
is limited only by the host machine (either by <code>ULONG_MAX</code>
or by available memory). A limit of <I>target</I> means that
while no limit is imposed by the C++ front-end, particular target
machines may impose such limits.
</para>
<table>
<tr><th>Quantity identifier</th>
<th>Min C limit</th> <th>Min C++ limit</th>
<th>Default limit</th>
</tr>
<tr><td>statement_depth</td>
<td>15</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>hash_if_depth</td>
<td>8</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>declarator_max</td>
<td>12</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>paren_depth</td>
<td>32</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>name_limit</td>
<td>31</td> <td>1024</td>
<td>none</td>
</tr>
<tr><td>extern_name_limit</td>
<td>6</td> <td>1024</td>
<td>target</td>
</tr>
<tr><td>external_ids</td>
<td>511</td> <td>65536</td>
<td>target</td>
</tr>
<tr><td>block_ids</td>
<td>127</td> <td>1024</td>
<td>none</td>
</tr>
<tr><td>macro_ids</td>
<td>1024</td> <td>65536</td>
<td>none</td>
</tr>
<tr><td>func_pars</td>
<td>31</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>func_args</td>
<td>31</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>macro_pars</td>
<td>31</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>macro_args</td>
<td>31</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>line_length</td>
<td>509</td> <td>65536</td>
<td>none</td>
</tr>
<tr><td>string_length</td>
<td>509</td> <td>65536</td>
<td>none</td>
</tr>
<tr><td>sizeof_object</td>
<td>32767</td> <td>262144</td>
<td>target</td>
</tr>
<tr><td>include_depth</td>
<td>8</td> <td>256</td>
<td>256</td>
</tr>
<tr><td>switch_cases</td>
<td>257</td> <td>16384</td>
<td>none</td>
</tr>
<tr><td>data_members</td>
<td>127</td> <td>16384</td>
<td>none</td>
</tr>
<tr><td>enum_consts</td>
<td>127</td> <td>4096</td>
<td>none</td>
</tr>
<tr><td>nested_class</td>
<td>15</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>atexit_funcs</td>
<td>32</td> <td>32</td>
<td>target</td>
</tr>
<tr><td>base_classes</td>
<td>N/A</td> <td>16384</td>
<td>none</td>
</tr>
<tr><td>direct_bases</td>
<td>N/A</td> <td>1024</td>
<td>none</td>
</tr>
<tr><td>class_members</td>
<td>N/A</td> <td>4096</td>
<td>none</td>
</tr>
<tr><td>virtual_funcs</td>
<td>N/A</td> <td>16384</td>
<td>none</td>
</tr>
<tr><td>virtual_bases</td>
<td>N/A</td> <td>1024</td>
<td>none</td>
</tr>
<tr><td>static_members</td>
<td>N/A</td> <td>1024</td>
<td>none</td>
</tr>
<tr><td>friends</td>
<td>N/A</td> <td>4096</td>
<td>none</td>
</tr>
<tr><td>access_declarations</td>
<td>N/A</td> <td>4096</td>
<td>none</td>
</tr>
<tr><td>ctor_initializers</td>
<td>N/A</td> <td>6144</td>
<td>none</td>
</tr>
<tr><td>scope_qualifiers</td>
<td>N/A</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>external_specs</td>
<td>N/A</td> <td>1024</td>
<td>none</td>
</tr>
<tr><td>template_pars</td>
<td>N/A</td> <td>1024</td>
<td>none</td>
</tr>
<tr><td>instance_depth</td>
<td>N/A</td> <td>17</td>
<td>17</td>
</tr>
<tr><td>exception_handlers</td>
<td>N/A</td> <td>256</td>
<td>none</td>
</tr>
<tr><td>exception_specs</td>
<td>N/A</td> <td>256</td>
<td>none</td>
</tr>
</table>
<para>
It is possible to impose lower limits on most of the quantities listed
above by means of the directive:
<programlisting>
#pragma TenDRA++ option value <I>string-literal integer-literal</I>
</programlisting>
where <I>string-literal</I> gives one of the quantity identifiers
listed above and <I>integer-literal</I> gives the limit to be imposed.
An error is reported if the quantity exceeds this limit (note however
that checks have not yet been implemented for all of the quantities
listed). Note that the <A HREF="#identifier"><code>name_limit</code></A>
and
<A HREF="#include"><code>include_depth</code></A> implementation limits
can be set using dedicated directives.
</para>
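<para>
As a usage sketch, the fragment below imposes a limit on the
<code>macro_pars</code> quantity and then defines a macro that stays within
it. The value 4 is arbitrary, and the <code>__TenDRA__</code> guard assumes
that the producer predefines that macro, so that other compilers skip the
directive. Whether a check is actually implemented for this quantity is not
stated here.
</para>

```cpp
#include <cassert>

// Assumption: __TenDRA__ is predefined by the producer; other compilers
// skip the directive entirely.
#ifdef __TenDRA__
#pragma TenDRA++ option value "macro_pars" 4
#endif

// Two parameters: within the illustrative limit of 4 imposed above.
#define CLAMP_LOW(x, lo) ((x) < (lo) ? (lo) : (x))

// Wrap the macro in a function so that it can be exercised normally.
int clamp_low(int x, int lo) { return CLAMP_LOW(x, lo); }
```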
<para>
The maximum number of errors allowed before the producer bails out
can be set using the directive:
<programlisting>
#pragma TenDRA++ set error limit <I>integer-literal</I>
</programlisting>
The default value is 32.
</para>
</sect3>
<sect3 id="lex">
<title>2.2.5. Lexical analysis</title>
<para>
During lexical analysis, a source file which is not empty should end
in a newline character. It is possible to relax this constraint using
the directive:
<programlisting>
#pragma TenDRA no nline after file end <I>allow</I>
</programlisting>
</para>
</sect3>
<sect3 id="keyword">
<title>2.2.6. Keywords</title>
<para>
Several places in this section describe how to introduce
keywords for TenDRA language extensions. By default, no such extra
keywords are defined. There are also low-level directives for defining
and undefining keywords. The directive:
<programlisting>
#pragma TenDRA++ keyword <I>identifier</I> for keyword <I>identifier</I>
</programlisting>
can be used to introduce a keyword (the first identifier) standing
for the standard C++ keyword given by the second identifier. The
directive:
<programlisting>
#pragma TenDRA++ keyword <I>identifier</I> for operator <I>operator</I>
</programlisting>
can similarly be used to introduce a keyword giving an alternative
representation for the given operator or punctuator, as, for example,
in:
<programlisting>
#pragma TenDRA++ keyword and for operator &amp;&amp;
</programlisting>
Finally the directive:
<programlisting>
#pragma TenDRA++ undef keyword <I>identifier</I>
</programlisting>
can be used to undefine a keyword.
</para>
</sect3>
<sect3 id="comment">
<title>2.2.7. Comments</title>
<para>
C-style comments do not nest. The directive:
<programlisting>
#pragma TenDRA nested comment analysis <I>on</I>
</programlisting>
enables a check for the characters <code>/*</code> within C-style
comments.
</para>
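<para>
The kind of scan such a check implies can be sketched as follows. The
function name <code>has_nested_comment_open</code> is hypothetical, and the
sketch deliberately ignores string literals, character literals and
<code>//</code> comments, which a real lexer would have to handle.
</para>

```cpp
#include <cassert>
#include <string>

// Returns true if "/*" occurs inside a C-style comment in src.
// Simplification: literals and // comments are not recognised.
bool has_nested_comment_open(const std::string& src) {
    bool in_comment = false;
    for (std::string::size_type i = 0; i + 1 < src.size(); ++i) {
        if (!in_comment) {
            if (src[i] == '/' && src[i + 1] == '*') {
                in_comment = true;
                ++i;              // skip the '*' of the opening delimiter
            }
        } else if (src[i] == '*' && src[i + 1] == '/') {
            in_comment = false;
            ++i;                  // skip the '/' of the closing delimiter
        } else if (src[i] == '/' && src[i + 1] == '*') {
            return true;          // "/*" seen while already inside a comment
        }
    }
    return false;
}
```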
</sect3>
<sect3 id="identifier-names">
<title>2.2.8. Identifier names</title>
<para>
During lexical analysis, each character in the source file has an
associated look-up value which is used to determine whether the character
can be used in an identifier name, is a white space character etc.
These values are stored in a simple look-up table. It is possible
to set the look-up value using:
<programlisting>
#pragma TenDRA++ character <I>character-literal</I> as <I>character-literal</I> allow
</programlisting>
which sets the look-up for the first character to be the default look-up
for the second character. The form:
<programlisting>
#pragma TenDRA++ character <I>character-literal</I> disallow
</programlisting>
sets the look-up of the character to be that of an invalid character.
The forms:
<programlisting>
#pragma TenDRA++ character <I>string-literal</I> as <I>character-literal</I> allow
#pragma TenDRA++ character <I>string-literal</I> disallow
</programlisting>
can be used to modify the look-up values for the set of characters
given by the string literal. For example:
<programlisting>
#pragma TenDRA character '$' as 'a' allow
#pragma TenDRA character '\r' as ' ' allow
</programlisting>
allows <code>$</code> to be used in identifier names (like <code>a</code>)
and carriage return to be a white space character. The former is
a common dialect feature and can also be controlled by the directive:
<programlisting>
#pragma TenDRA dollar as ident <I>allow</I>
</programlisting>
</para>
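<para>
A minimal model of such a look-up table is sketched below. The class
<code>LookupTable</code>, the <code>CharClass</code> values and the default
classification are all assumptions made for illustration; the producer's own
table distinguishes more cases.
</para>

```cpp
#include <cassert>
#include <cctype>

enum CharClass { INVALID, SPACE, IDENT, OTHER };

// Default classification, standing in for the producer's built-in table.
CharClass default_class(unsigned char c) {
    if (std::isalnum(c) || c == '_') return IDENT;
    if (c == ' ' || c == '\t' || c == '\n') return SPACE;
    if (std::isprint(c)) return OTHER;
    return INVALID;
}

class LookupTable {
public:
    LookupTable() {
        for (int c = 0; c < 256; ++c)
            table_[c] = default_class(static_cast<unsigned char>(c));
    }
    // Models "character 'c' as 'like' allow": c takes the default
    // look-up value of like.
    void allow_as(unsigned char c, unsigned char like) {
        table_[c] = default_class(like);
    }
    // Models "character 'c' disallow": c becomes an invalid character.
    void disallow(unsigned char c) { table_[c] = INVALID; }

    CharClass lookup(unsigned char c) const { return table_[c]; }

private:
    CharClass table_[256];
};
```

<para>
Note that the new value is taken from the <I>default</I> look-up of the
second character, so earlier modifications to that character do not
propagate.
</para>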
<para>
The maximum number of characters allowed in an identifier name can
be set using the directives:
<programlisting>
#pragma TenDRA set name limit <I>integer-literal</I>
#pragma TenDRA++ set name limit <I>integer-literal</I> warning
</programlisting>
This length is given by the <code>name_limit</code> implementation
quantity
<A HREF="#limits">mentioned above</A>. Identifiers which exceed this
length raise an error or a warning, but are not truncated.
</para>
</sect3>
<sect3 id="int">
<title>2.2.9. Integer literals</title>
<para>
The rules for finding the type of an integer literal can be described
using directives of the form:
<programlisting>
#pragma TenDRA integer literal <I>literal-spec</I>
</programlisting>
where:
<programlisting>
<I>literal-spec</I> :
<I>literal-base literal-suffix<SUB>opt</SUB> literal-type-list</I>
<I>literal-base</I> :
octal
decimal
hexadecimal
<I>literal-suffix</I> :
unsigned
long
unsigned long
long long
unsigned long long
<I>literal-type-list</I> :
* <I>literal-type-spec</I>
<I>integer-literal literal-type-spec</I> | <I>literal-type-list</I>
? <I>literal-type-spec</I> | <I>literal-type-list</I>
<I>literal-type-spec</I> :
: <I>type-id</I>
* <I>allow<SUB>opt</SUB></I> : <I>identifier</I>
* * <I>allow<SUB>opt</SUB></I> :
</programlisting>
Each directive gives a literal base and suffix, describing the form
of an integer literal, and a list of possible types for literals of
this form. This list gives a mapping from the value of the literal
to the type to be used to represent the literal. There are three
cases for the literal type; it may be a given integral type, it may
be calculated using a given <A HREF="lib.html#literal">literal type
token</A>, or it may cause an error to be raised. There are also
three cases for describing a literal range; it may be given by values
less than or equal to a given integer literal, it may be given by
values which are guaranteed to fit into a given integral type, or
it may match any value. For example:
<programlisting>
#pragma token PROC ( VARIETY c ) VARIETY l_i # ~lit_int
#pragma TenDRA integer literal decimal 32767 : int | ** : l_i
</programlisting>
describes how to find the type of a decimal literal with no suffix.
Values less than or equal to 32767 have type <code>int</code>; larger
values have target dependent type calculated using the token
<code>~lit_int</code>. Introducing a <code>warning</code> into the
directive will cause a warning to be printed if the token is used
to calculate the value.
</para>
<para>
Note that this scheme extends that implemented by the C producer,
because of the need for more accurate information in the C++ producer.
For example, the specification above does not fully express the ISO
rule that the type of a decimal integer is the first of the types
<code>int</code>, <code>long</code> and <code>unsigned long</code>
which it fits into (it only expresses the first step). However with
the C++ extensions it is possible to write:
<programlisting>
#pragma token PROC ( VARIETY c ) VARIETY l_i # ~lit_int
#pragma TenDRA integer literal decimal ? : int | ? : long |\
? : unsigned long | ** : l_i
</programlisting>
</para>
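<para>
The selection that the second specification describes can be mimicked on the
host machine as below. This is only a sketch: the function name
<code>decimal_literal_type</code> is hypothetical, it reports type names as
strings, and it uses the host's ranges where the producer would defer the
decision to the target.
</para>

```cpp
#include <cassert>
#include <limits>
#include <string>

// Name of the first type in { int, long, unsigned long } that can hold
// value, or "token" where the directive would fall back to ~lit_int.
std::string decimal_literal_type(unsigned long long value) {
    if (value <= static_cast<unsigned long long>(std::numeric_limits<int>::max()))
        return "int";
    if (value <= static_cast<unsigned long long>(std::numeric_limits<long>::max()))
        return "long";
    if (value <= std::numeric_limits<unsigned long>::max())
        return "unsigned long";
    return "token";
}
```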
</sect3>
<sect3 id="char">
<title>2.2.10. Character literals and built-in types</title>
<para>
By default, a simple character literal has type <code>int</code> in
C and type <code>char</code> in C++. The type of such literals can
be controlled using the directive:
<programlisting>
#pragma TenDRA++ set character literal : <I>type-id</I>
</programlisting>
The type of a wide character literal is given by the implementation
defined type <code>wchar_t</code>. By default, the definition of
this type is taken from the target machine's <code>&lt;stddef.h&gt;</code>
C header (note that in ISO C++, <code>wchar_t</code> is actually a
keyword, but its underlying representation must be the same as in
C). This definition can be overridden in the producer by means of
the directive:
<programlisting>
#pragma TenDRA set wchar_t : <I>type-id</I>
</programlisting>
for an integral type <I>type-id</I>. Similarly, the definitions of
the other implementation dependent integral types which arise naturally
within the language - the type of the difference of two pointers,
<code>ptrdiff_t</code>, and the type of the <code>sizeof</code>
operator, <code>size_t</code> - given in the <code>&lt;stddef.h&gt;</code>
header can be overridden using the directives:
<programlisting>
#pragma TenDRA set ptrdiff_t : <I>type-id</I>
#pragma TenDRA set size_t : <I>type-id</I>
</programlisting>
These directives are useful when targeting a specific machine on which
the definitions of these types are known; while they may not affect
the code generated they can cut down on spurious conversion warnings.
Note that although these types are built into the producer they are
not visible to the user unless an appropriate header is included (with
the exception of the keyword <code>wchar_t</code> in ISO C++), however
the directives:
<programlisting>
#pragma TenDRA++ type <I>identifier</I> for <I>type-name</I>
</programlisting>
can be used to make these types visible. They are equivalent to a
<code>typedef</code> declaration of <I>identifier</I> as the given
built-in type, <code>ptrdiff_t</code>, <code>size_t</code> or
<code>wchar_t</code>.
</para>
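<para>
The ISO C++ defaults described above can be confirmed with a few
compile-time checks. The sketch assumes a compiler that supports
<code>static_assert</code> and <code>decltype</code>, which postdate the
dialect this guide describes; the function <code>char_literal_size</code> is
a hypothetical helper.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// A simple character literal has type char in ISO C++ ...
static_assert(std::is_same<decltype('a'), char>::value,
              "simple character literal has type char");
// ... and a wide character literal has type wchar_t.
static_assert(std::is_same<decltype(L'a'), wchar_t>::value,
              "wide character literal has type wchar_t");
// sizeof yields size_t; subtracting two pointers yields ptrdiff_t.
static_assert(std::is_same<decltype(sizeof(int)), std::size_t>::value,
              "sizeof yields size_t");
static_assert(std::is_same<decltype(static_cast<int*>(0) - static_cast<int*>(0)),
                           std::ptrdiff_t>::value,
              "pointer difference yields ptrdiff_t");

// Size of a simple character literal: 1 in C++, sizeof(int) in C.
std::size_t char_literal_size() { return sizeof('a'); }
```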
<para>
Whether plain <code>char</code> is signed or unsigned is implementation
dependent. By default the implementation is determined by the definition
of the <A HREF="lib.html#arith"><code>~char</code> token</A>, however
this can be overridden in the producer either by means of the
<A HREF="#table">portability table</A> or by the directive:
<programlisting>
#pragma TenDRA character <I>character-sign</I>
</programlisting>
where <I>character-sign</I> can be <code>signed</code>,
<code>unsigned</code> or <code>either</code> (the default). Again
this directive is useful primarily when targeting a specific machine
on which the signedness of <code>char</code> is known.
</para>
</sect3>
<sect3 id="string">
<title>2.2.11. String literals</title>
<para>
By default, character string literals have type <code>char [n]</code>
in C and older dialects of C++, but type <code>const char [n]</code>
in ISO C++. Similarly wide string literals have type
<code>wchar_t [n]</code>
or <code>const wchar_t [n]</code>. Whether string literals are
<code>const</code> or not can be controlled using the two directives:
<programlisting>
#pragma TenDRA++ set string literal : const
#pragma TenDRA++ set string literal : no const
</programlisting>
In the case where literals are <code>const</code>, the array-to-pointer
conversion is allowed to cast away the <code>const</code> to allow
for a degree of backwards compatibility. The status of this deprecated
conversion can be controlled using the directive:
<programlisting>
#pragma TenDRA writeable string literal <I>allow</I>
</programlisting>
(yes, I know that that should be <code>writable</code>). Note that
this directive has a slightly different meaning in the C producer.
</para>
<para>
Adjacent string literal tokens of similar types (either both character
string literals or both wide string literals) are concatenated at
an early stage in the parser; however it is unspecified what happens if
a character string literal token is adjacent to a wide string literal
token. By default this gives an error, but the directive:
<programlisting>
#pragma TenDRA unify incompatible string literal <I>allow</I>
</programlisting>
can be used to enable the strings to be concatenated to give a wide
string literal.
</para>
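<para>
The ISO C++ behaviour for literal types and for concatenation of like
literals can be illustrated as follows. The sketch assumes a compiler with
<code>static_assert</code> and <code>decltype</code>; the function
<code>joined</code> is a hypothetical helper, and the mixed narrow and wide
case is deliberately left out because its treatment is configurable.
</para>

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// A string literal is an array of const char in ISO C++. It is an lvalue,
// hence the reference in the decltype result; "abc" has four elements
// including the terminating null character.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "character string literal is const char [n]");
static_assert(std::is_same<decltype(L"abc"), const wchar_t (&)[4]>::value,
              "wide string literal is const wchar_t [n]");

// Adjacent literals of the same kind are concatenated into one literal.
const char* joined() { return "ab" "cd"; }
```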
<para>
If a <code>'</code> or <code>"</code> character does not have
a matching closing quote on the same line then it is undefined whether
an implementation should report an unterminated string or treat the
quote as a single unknown character. By default, the C++ producer
treats this as an unterminated string, but this behaviour can be controlled
using the directive:
<programlisting>
#pragma TenDRA unmatched quote <I>allow</I>
</programlisting>
</para>
</sect3>
<sect3 id="escape">
<title>2.2.12. Escape sequences</title>
<para>
By default, if the character following the <code>\</code> in an escape
sequence is not one of those listed in the ISO C or C++ standards
then an error is given. This behaviour, which is left unspecified
by the standards, can be controlled by the directive:
<programlisting>
#pragma TenDRA unknown escape <I>allow</I>
</programlisting>
The result is that the <code>\</code> in unknown escape sequences
is ignored, so that <code>\z</code> is interpreted as <code>z</code>,
for example. Individual escape sequences can be enabled or disabled
using the directives:
<programlisting>
#pragma TenDRA++ escape <I>character-literal</I> as <I>character-literal</I> allow
#pragma TenDRA++ escape <I>character-literal</I> disallow
</programlisting>
so that, for example:
<programlisting>
#pragma TenDRA++ escape 'e' as '\033' allow
#pragma TenDRA++ escape 'a' disallow
</programlisting>
sets <code>\e</code> to be the ASCII escape character and disables
the alert character <code>\a</code>.
</para>
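<para>
The effect of these directives can be modelled with a small table of escape
characters. The class <code>EscapeTable</code> and its members are
hypothetical, and only a few of the standard escapes are preloaded; this is
a sketch of the behaviour described, not of the producer's code.
</para>

```cpp
#include <cassert>
#include <map>

class EscapeTable {
public:
    EscapeTable() {
        // A few of the standard escapes; the real set is larger.
        map_['n'] = '\n';
        map_['t'] = '\t';
        map_['a'] = '\a';
        map_['\\'] = '\\';
    }
    // Models "escape 'c' as 'value' allow".
    void allow(char c, char value) { map_[c] = value; }
    // Models "escape 'c' disallow".
    void disallow(char c) { map_.erase(c); }

    // Decode the character following a backslash. An unknown escape fails
    // unless allow_unknown is set, in which case the backslash is ignored
    // and the character stands for itself.
    bool decode(char c, bool allow_unknown, char& out) const {
        std::map<char, char>::const_iterator it = map_.find(c);
        if (it != map_.end()) { out = it->second; return true; }
        if (allow_unknown) { out = c; return true; }
        return false;
    }

private:
    std::map<char, char> map_;
};
```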
<para>
By default, if the value of a character, given for example by a
<code>\x</code> escape sequence, does not fit into its type then an
error is given. This implementation dependent behaviour can however
be controlled by the directive:
<programlisting>
#pragma TenDRA character escape overflow <I>allow</I>
</programlisting>
the value being converted to its type in the normal way.
</para>
</sect3>
<sect3 id="ppdir">
<title>2.2.13. Preprocessing directives</title>
<para>
Non-standard preprocessing directives can be controlled using the
directives:
<programlisting>
#pragma TenDRA directive <I>ppdir allow</I>
#pragma TenDRA directive <I>ppdir</I> (ignore) <I>allow</I>
</programlisting>
where <I>ppdir</I> can be <code>assert</code>, <code>file</code>,
<code>ident</code>, <code>import</code> (C++ only),
<code>include_next</code> (C++ only), <code>unassert</code>,
<code>warning</code> (C++ only) or <code>weak</code>. The second form
causes the directive to be processed but ignored (note that there is no
<code>(ignore) disallow</code> form). The treatment of other unknown
preprocessing directives can be controlled using:
<programlisting>
#pragma TenDRA unknown directive <I>allow</I>
</programlisting>
Cases where the token following the <code>#</code> in a preprocessing
directive is not an identifier can be controlled using:
<programlisting>
#pragma TenDRA no directive/nline after ident <I>allow</I>
</programlisting>
When permitted, unknown preprocessing directives are ignored.
</para>
<para>
By default, unknown <code>#pragma</code> directives are ignored without
comment, however this behaviour can be modified using the directive:
<programlisting>
#pragma TenDRA unknown pragma <I>allow</I>
</programlisting>
Note that any unknown <code>#pragma TenDRA</code> directives always
give an error.
</para>
<para>
Older preprocessors allowed text after <code>#else</code> and
<code>#endif</code> directives. The following directive can be used
to enable such behaviour:
<programlisting>
#pragma TenDRA text after directive <I>allow</I>
</programlisting>
Such text after a directive is ignored.
</para>
<para>
Some older preprocessors have problems with white space in preprocessing
directives - whether at the start of the line, before the initial
<code>#</code>, or between the <code>#</code> and the directive identifier.
Such white space can be detected using the directives:
<programlisting>
#pragma TenDRA indented # directive <I>allow</I>
#pragma TenDRA indented directive after # <I>allow</I>
</programlisting>
respectively.
</para>
</sect3>
<sect3 id="target-if">
<title>2.2.14. Target dependent conditional inclusion</title>
<para>
One of the effects of trying to compile code in a target independent
manner is that it is not always possible to completely evaluate the
condition in a <code>#if</code> directive. Thus the conditional inclusion
needs to be preserved until the installer phase. This can only be
done if the target dependent <code>#if</code> is more structured than
is normally required for preprocessing directives. There are two cases;
in the first, where the <code>#if</code> appears in a statement, it
is treated as if it were an <code>if</code> statement with braces enclosing
its branches; that is:
<programlisting>
#if cond
true_statements
#else
false_statements
#endif
</programlisting>
maps to:
<programlisting>
if ( cond ) {
true_statements
} else {
false_statements
}
</programlisting>
The second case, where the <code>#if</code> appears in a list of
declarations, normally gives an error. This can however be overridden
by the directive:
<programlisting>
#pragma TenDRA++ conditional declaration <I>allow</I>
</programlisting>
which causes both branches of the <code>#if</code> to be analysed.
</para>
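<para>
The statement-level mapping can be seen by writing the same test both ways.
In this sketch the condition is decidable on the host, so both functions
compile with any compiler; the function names and the returned labels 16 and
32 are illustrative only.
</para>

```cpp
#include <cassert>
#include <climits>

// Preprocessor form: the condition is resolved before compilation proper.
int int_kind_pp() {
#if INT_MAX > 32767
    return 32;
#else
    return 16;
#endif
}

// Statement form: roughly what the #if above maps to when the condition is
// target dependent and has to be preserved for the installer.
int int_kind_stmt() {
    if (INT_MAX > 32767) {
        return 32;
    } else {
        return 16;
    }
}
```

<para>
Because both branches survive as ordinary statements in the second form,
each must be valid code in its own right, which is why the target dependent
<code>#if</code> needs more structure than a normal preprocessing directive.
</para>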
</sect3>
<sect3 id="include">
<title>2.2.15. File inclusion directives</title>
<para>
There is a maximum depth of nested <code>#include</code>
directives allowed by the C++ producer. This depth is given by the
<code>include_depth</code> implementation quantity
<A HREF="#limits">mentioned above</A>. Its value is fairly small
in order to detect recursive inclusions. The maximum depth can be
set using:
<programlisting>
#pragma TenDRA includes depth <I>integer-literal</I>
</programlisting>
</para>
<para>
A further check, for full pathnames in <code>#include</code> directives
(which may not be portable), can be enabled using the directive:
<programlisting>
#pragma TenDRA++ complete file includes <I>allow</I>
</programlisting>
</para>
</sect3>
<sect3 id="macro">
<title>2.2.16. Macro definitions</title>
<para>
By default, multiple consistent definitions of a macro are allowed.
This behaviour can be controlled using the directive:
<programlisting>
#pragma TenDRA extra macro definition <I>allow</I>
</programlisting>
The ISO C/C++ rules for determining whether two macro definitions
are consistent are fairly restrictive. A more relaxed rule allowing
for consistent renaming of macro parameters can be enabled using:
<programlisting>
#pragma TenDRA weak macro equality <I>allow</I>
</programlisting>
</para>
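<para>
The difference between the two notions of consistency is sketched below.
The macro <code>SQUARE</code> is a hypothetical example; the renamed form is
left in a comment because it is only consistent under the relaxed rule.
</para>

```cpp
#include <cassert>

#define SQUARE(x) ((x) * (x))
// Same parameter name and same replacement list: a consistent
// redefinition under the ISO rules.
#define SQUARE(x) ((x) * (x))

// Consistent only under the relaxed rule, since the parameter is renamed:
//   #define SQUARE(y) ((y) * (y))

// Wrap the macro in a function so that it can be exercised normally.
int square(int v) { return SQUARE(v); }
```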
<para>
In the definition of macros with parameters, a <code>#</code> in the
replacement list must be followed by a parameter name, indicating
the stringising operation. This behaviour can be controlled by the
directive:
<programlisting>
#pragma TenDRA no ident after # <I>allow</I>
</programlisting>
which allows a <code>#</code> which is not followed by a parameter
name to be treated as a normal preprocessing token.
</para>
<para>
In a list of macro arguments, the effect of a sequence of preprocessing
tokens which otherwise resembles a preprocessing directive is undefined.
The C++ producer treats such directives as normal sequences of preprocessing
tokens, but can be made to report such behaviour using:
<programlisting>
#pragma TenDRA directive as macro argument <I>allow</I>
</programlisting>
</para>
</sect3>
<sect3 id="empty">
<title>2.2.17. Empty source files</title>
<para>
ISO C requires that a translation unit should contain at least one
declaration. C++ and older dialects of C allow translation units
which contain no declarations. This behaviour can be controlled using
the directive:
<programlisting>
#pragma TenDRA no external declaration <I>allow</I>
</programlisting>
</para>
</sect3>
<sect3 id="std">
<title>2.2.18. The <code>std</code> namespace</title>
<para>
Several classes declared in the <code>std</code> namespace arise naturally
as part of the C++ language specification. These are as follows:
<programlisting>
std::type_info // type of typeid construct
std::bad_cast // thrown by dynamic_cast construct
std::bad_typeid // thrown by typeid construct
std::bad_alloc // thrown by new construct
std::bad_exception // used in exception specifications
</programlisting>
The definitions of these classes are found, when needed, by looking
up the appropriate class name in the <code>std</code> namespace.
Depending on the context, an error may be reported if the class is
not found. It is possible to modify the namespace which is searched
for these classes using the directive:
<programlisting>
#pragma TenDRA++ set std namespace : <I>scope-name</I>
</programlisting>
where <I>scope-name</I> can be an identifier giving a namespace name
or <code>::</code>, indicating the global namespace.
</para>
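<para>
Two of these classes can be seen arising from ordinary language constructs
in the sketch below, which uses the default <code>std</code> namespace. The
class and function names are hypothetical.
</para>

```cpp
#include <cassert>
#include <typeinfo>

struct Base { virtual ~Base() {} };
struct Derived : Base {};
struct Other : Base {};

// A failed dynamic_cast to a reference type throws std::bad_cast.
bool cast_to_derived_fails(Base& b) {
    try {
        (void)dynamic_cast<Derived&>(b);
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}

// The typeid construct yields a reference to a std::type_info object.
bool same_dynamic_type(const Base& a, const Base& b) {
    const std::type_info& ta = typeid(a);
    const std::type_info& tb = typeid(b);
    return ta == tb;
}
```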
</sect3>
<sect3 id="linkage">
<title>2.2.19. Object linkage</title>
<para>
If an object is declared with both external and internal linkage in
the same translation unit then, by default, an error is given. This
behaviour can be changed using the directive:
<programlisting>
#pragma TenDRA incompatible linkage <I>allow</I>
</programlisting>
When incompatible linkages are allowed, whether the resultant identifier
has external or internal linkage can be set using one of the directives:
<programlisting>
#pragma TenDRA linkage resolution : off
#pragma TenDRA linkage resolution : (external) <I>on</I>
#pragma TenDRA linkage resolution : (internal) <I>on</I>
</programlisting>
</para>
<para>
It is possible to declare objects with external linkage in a block.
C leaves it undefined whether declarations of the same object in different
blocks, such as:
<programlisting>
void f ()
{
extern int a ;
....
}
void g ()
{
extern double a ;
....
}
</programlisting>
are checked for compatibility. However in C++ the one definition
rule implies that such declarations are indeed checked for compatibility.
The status of this check can be set using the directive:
<programlisting>
#pragma TenDRA unify external linkage <I>on</I>
</programlisting>
Note that it is not possible in ISO C or C++ to declare objects or
functions with internal linkage in a block. While <code>static</code>
object definitions in a block have a specific meaning, there is no
real reason why <code>static</code> functions should not be declared
in a block. This behaviour can be enabled using the directive:
<programlisting>
#pragma TenDRA block function static <I>allow</I>
</programlisting>
</para>
<para>
Inline functions have external linkage by default in ISO C++, but
internal linkage in older dialects. The default linkage can be set
using the directive:
<programlisting>
#pragma TenDRA++ inline linkage <I>linkage-spec</I>
</programlisting>
where <I>linkage-spec</I> can be <code>external</code> or
<code>internal</code>. Similarly <code>const</code> objects have
internal linkage by default in C++, but external linkage in C. The
default linkage can be set using the directive:
<programlisting>
#pragma TenDRA++ const linkage <I>linkage-spec</I>
</programlisting>
</para>
<para>
Older dialects of C treated all identifiers with external linkage
as if they had been declared <code>volatile</code> (i.e. by being
conservative in optimising such values). This behaviour can be enabled
using the directive:
<programlisting>
#pragma TenDRA external volatile_t
</programlisting>
</para>
<para>
It is possible to set the default language linkage using the directive:
<programlisting>
#pragma TenDRA++ external linkage <I>string-literal</I>
</programlisting>
This is equivalent to enclosing the rest of the current checking scope
in:
<programlisting>
extern <I>string-literal</I> {
....
}
</programlisting>
It is unspecified what happens if such a directive is used within
an explicit linkage specification and does not nest correctly. This
directive is particularly useful when used in a <A HREF="#scope">named
environment</A> associated with an include directory. For example,
it can be used to express the fact that all the objects declared in
headers included from that directory have C linkage.
</para>
<para>
A change in ISO C++ relative to older dialects is that the language
linkage of a function now forms part of the function type. For example:
<programlisting>
extern "C" int f ( int ) ;
int ( *pf ) ( int ) = f ; // error
</programlisting>
The directive:
<programlisting>
#pragma TenDRA++ external function linkage <I>on</I>
</programlisting>
can be used to control whether function types with differing language
linkages, but which are otherwise compatible, are considered compatible
or not.
</para>
</sect3>
<sect3 id="static">
<title>2.2.20. Static identifiers</title>
<para>
By default, objects and functions with internal linkage are mapped
to tags without external names in the output TDF capsule. Thus such
names are not available to the installer and it needs to make up internal
names to represent such objects in its output. This is not desirable
in such operations as profiling, where a meaningful internal name
is needed to make sense of the output. The directive:
<programlisting>
#pragma TenDRA preserve <I>identifier-list</I>
</programlisting>
can be used to preserve the names of the given list of identifiers
with internal linkage. This is done using the <code>static_name_def</code>
TDF construct. The form:
<programlisting>
#pragma TenDRA preserve *
</programlisting>
will preserve the names of all identifiers with internal linkage in
this way.
</para>
</sect3>
<sect3 id="decl_none">
<title>2.2.21. Empty declarations</title>
<para>
ISO C++ requires every declaration or member declaration to introduce
one or more names into the program. The directive:
<programlisting>
#pragma TenDRA unknown struct/union <I>allow</I>
</programlisting>
can be used to relax one particular instance of this rule, by allowing
anonymous class definitions (recall that anonymous unions are objects,
not types, in C++ and so are not covered by this rule). The C++ grammar
also allows a solitary semicolon as a declaration or member declaration;
however such a declaration does not introduce a name and so contravenes
the rule above. The rule can be relaxed in this case using the directive:
<programlisting>
#pragma TenDRA extra ; <I>allow</I>
</programlisting>
Note that the C++ grammar explicitly allows for an extra semicolon
following an inline member function definition, but that semicolons
following other function definitions are actually empty declarations
of the form above. A solitary semicolon in a statement is interpreted
as an empty expression statement rather than an empty declaration
statement.
</para>
</sect3>
<sect3 id="implicit">
<title>2.2.22. Implicit <code>int</code></title>
<para>
The C "implicit <code>int</code>" rule, whereby a type of
<code>int</code>
is inferred in a list of type or declaration specifiers which does
not contain a type name, has been removed in ISO C++, although it
was supported in older dialects of C++. This check is controlled
by the directive:
<programlisting>
#pragma TenDRA++ implicit int type <I>allow</I>
</programlisting>
Partial relaxations of this rule are allowed. The directive:
<programlisting>
#pragma TenDRA++ implicit int type for const/volatile <I>allow</I>
</programlisting>
will allow for implicit <code>int</code> when the list of type specifiers
contains a cv-qualifier. Similarly the directive:
<programlisting>
#pragma TenDRA implicit int type for function return <I>allow</I>
</programlisting>
will allow for implicit <code>int</code> in the return type of a function
definition (this excludes constructors, destructors and conversion
functions, where special rules apply). A function definition is the
only kind of declaration in ISO C where a declaration specifier is
not required. Older dialects of C allowed declaration specifiers to
be omitted in other cases. Support for this behaviour can be enabled
using:
<programlisting>
#pragma TenDRA implicit int type for external declaration <I>allow</I>
</programlisting>
The four cases can be demonstrated in the following example:
<programlisting>
extern a ; // implicit int
const b = 1 ; // implicit const int
f () // implicit function return
{
return 2 ;
}
c = 3 ; // error: not allowed in C++
</programlisting>
</para>
</sect3>
<sect3 id="longlong">
<title>2.2.23. Extended integral types</title>
<para>
The <code>long long</code> integral types are not part of ISO C or
C++ by default; however, support for them can be enabled using the
directive:
<programlisting>
#pragma TenDRA longlong type <I>allow</I>
</programlisting>
This support includes allowing <code>long long</code> in type specifiers
and allowing <code>LL</code> and <code>ll</code> as integer literal
suffixes.
</para>
<para>
There is a further directive given by the two cases:
<programlisting>
#pragma TenDRA set longlong type : long long
#pragma TenDRA set longlong type : long
</programlisting>
which can be used to control the implementation of the <code>long
long</code> types. Either they can be mapped to the
<A HREF="lib.html#arith">default representation</A>, which is guaranteed
to contain at least 64 bits, or they can be mapped to the corresponding
<code>long</code> types.
</para>
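<para>
As a minimal sketch of what this support permits, the fragment below
assumes a compiler that already accepts <code>long long</code> (the type
became standard in C++11, after this guide was written). The function
name is hypothetical and chosen only for illustration:
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below
#include <climits>

// Uses the long long type specifier and the LL literal suffix.
inline bool long_long_has_64_bits() {
    long long big = 9000000000LL;   // does not fit in 32 bits
    return sizeof(big) * CHAR_BIT >= 64 && big / 1000000000LL == 9;
}
```
]]></programlisting>
Under the default mapping described above,
<code>assert ( long_long_has_64_bits () )</code> is expected to hold.
</para>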
<para>
Because these <code>long long</code> types are not an intrinsic part
of C++ the implementation does not integrate them into the language
as fully as is possible. This is to prevent the presence or otherwise
of
<code>long long</code> types affecting the semantics of code which
does not use them. For example, it would be possible to extend the
rules for the types of integer literals, integer promotion types and
arithmetic types to say that if the given value does not fit into
the standard integral types then the extended types are tried. This
has not been done, although these rules could be implemented by changing
the definitions of the <A HREF="lib.html#arith">standard tokens</A>
used to determine these types. By default, only the rules for arithmetic
types involving a <code>long long</code> operand and for <code>LL</code>
integer literals mention <code>long long</code> types.
</para>
</sect3>
<sect3 id="bitfield-types">
<title>2.2.24. Bitfield types</title>
<para>
The C++ rules on bitfield types differ slightly from the C rules.
Firstly any integral or enumeration type is allowed in a bitfield,
and secondly the bitfield width may exceed the underlying type size
(the extra bits being treated as padding). These properties can be
controlled using the directives:
<programlisting>
#pragma TenDRA extra bitfield int type <I>allow</I>
#pragma TenDRA bitfield overflow <I>allow</I>
</programlisting>
respectively.
</para>
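<para>
The two C++ properties can be sketched as follows. The type and member
names are hypothetical, and some compilers are likely to warn about the
over-wide member even though C++ permits it:
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below

enum Colour { RED, GREEN, BLUE };

struct Flags {
    bool ready : 1;            // bool is an integral type, so it may be a bitfield
    Colour colour : 3;         // enumeration type used as a bitfield
    unsigned char wide : 12;   // wider than unsigned char on typical targets;
                               // the excess bits are padding
};

// Stores a value in each bitfield and reads it back.
inline int bitfield_demo() {
    Flags f;
    f.ready = true;
    f.colour = BLUE;
    f.wide = 200;
    return (f.ready ? 1 : 0) + static_cast<int>(f.colour) + f.wide;
}
```
]]></programlisting>
With these values <code>assert ( bitfield_demo () == 203 )</code> holds,
since the padding bits never contribute to the stored value.
</para>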
</sect3>
<sect3 id="elab">
<title>2.2.25. Elaborated type specifiers</title>
<para>
In elaborated type specifiers, the class key (<code>class</code>,
<code>struct</code>, <code>union</code> or <code>enum</code>) should
agree with any previous declaration of the type (except that <code>class</code>
and <code>struct</code> are interchangeable). This requirement can
be relaxed using the directive:
<programlisting>
#pragma TenDRA ignore struct/union/enum tag <I>on</I>
</programlisting>
</para>
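<para>
The interchangeability of <code>class</code> and <code>struct</code>
can be sketched as follows (hypothetical names; some compilers warn
about the mismatched keys but accept the code):
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below

class Point;                 // first declared with the key 'class'

struct Point {               // defined with 'struct': the same type
    int x, y;
};

// 'class Point' is an elaborated type specifier naming the type above.
inline int coordinate_sum(const class Point &p) { return p.x + p.y; }

inline int elaborated_demo() {
    Point p = { 3, 4 };
    return coordinate_sum(p);
}
```
]]></programlisting>
Here <code>assert ( elaborated_demo () == 7 )</code> holds. Using
<code>union Point</code> or <code>enum Point</code> to refer to the same
type would violate the rule that the directive above relaxes.
</para>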
<para>
In ISO C and C++ it is not possible to give a forward declaration
of an enumeration type. This constraint can be relaxed using the
directive:
<programlisting>
#pragma TenDRA forward enum declaration <I>allow</I>
</programlisting>
Until the end of its definition, an enumeration type is treated as
an incomplete type (as with class types). In enumeration definitions,
and a couple of other contexts where comma-separated lists are required,
the directive:
<programlisting>
#pragma TenDRA extra , <I>allow</I>
</programlisting>
can be used to allow a trailing comma at the end of the list.
</para>
<para>
The directive:
<programlisting>
#pragma TenDRA complete struct/union analysis <I>on</I>
</programlisting>
can be used to enable a check that every class or union has been completed
within each translation unit in which it is declared.
</para>
</sect3>
<sect3 id="impl_func">
<title>2.2.26. Implicit function declarations</title>
<para>
C, but not C++, allows calls to undeclared functions, the function
being declared implicitly. It is possible to enable support for implicit
function declarations using the directive:
<programlisting>
#pragma TenDRA implicit function declaration <I>on</I>
</programlisting>
Such implicitly declared functions have C linkage and type
<code>int ( ... )</code>.
</para>
</sect3>
<sect3 id="weak">
<title>2.2.27. Weak function prototypes</title>
<para>
The C producer supports a concept, weak prototypes, whereby type checking
can be applied to the arguments of a non-prototype function. This
checking can be enabled using the directive:
<programlisting>
#pragma TenDRA weak prototype analysis <I>on</I>
</programlisting>
The concept of weak prototypes is not applicable to C++, where all
functions are prototyped. The C++ producer does allow the syntax
for explicit weak prototype declarations, but treats them as if they
were normal prototypes. These declarations are denoted by means of
a keyword,
<code>WEAK</code> say, introduced by the directive:
<programlisting>
#pragma TenDRA keyword <I>identifier</I> for weak
</programlisting>
preceding the <code>(</code> of the function declarator. The directives:
<programlisting>
#pragma TenDRA prototype <I>allow</I>
#pragma TenDRA prototype (weak) <I>allow</I>
</programlisting>
which can be used in the C producer to warn of prototype or weak prototype
declarations, are similarly ignored by the C++ producer.
</para>
<para>
The C producer also allows the directives:
<programlisting>
#pragma TenDRA argument <I>type-id</I> as <I>type-id</I>
#pragma TenDRA argument <I>type-id</I> as ...
#pragma TenDRA extra ... <I>allow</I>
#pragma TenDRA incompatible promoted function argument <I>allow</I>
</programlisting>
which control the compatibility of function types. These directives
are ignored by the C++ producer (some of them would make sense in
the context of C++ but would over-complicate function overloading).
</para>
</sect3>
<sect3 id="printf">
<title>2.2.28. <code>printf</code> and <code>scanf</code>
argument checking</title>
<para>
The C producer includes a number of checks that the arguments in a
call to a function in the <code>printf</code> or <code>scanf</code>
families match the given format string. The check is implemented
by using the directives:
<programlisting>
#pragma TenDRA type <I>identifier</I> for ... printf
#pragma TenDRA type <I>identifier</I> for ... scanf
</programlisting>
to introduce a type representing a <code>printf</code> or <code>scanf</code>
format string. For most purposes this type is treated as <code>const
char *</code>, but when it appears in a function declaration it alerts
the producer that any extra arguments passed to that function should
match the format string passed as the corresponding argument. The
TenDRA API headers conditionally declare <code>printf</code>,
<code>scanf</code> and similar functions in something like the form:
<programlisting>
#ifdef __NO_PRINTF_CHECKS
typedef const char *__printf_string ;
#else
#pragma TenDRA type __printf_string for ... printf
#endif
int printf ( __printf_string, ... ) ;
int fprintf ( FILE *, __printf_string, ... ) ;
int sprintf ( char *, __printf_string, ... ) ;
</programlisting>
These declarations can be skipped, effectively disabling this check,
by defining the <code>__NO_PRINTF_CHECKS</code> macro.
</para>
<para>
<IMG SRC="../images/warn.gif" ALT="warning"/>
These <code>printf</code> and <code>scanf</code> format string checks
have not yet been implemented in the C++ producer due to the presence
of an alternative, type checked, I/O package, namely
<code>&lt;iostream&gt;</code>. The format string types are simply
treated as <code>const char *</code>.
</para>
</sect3>
<sect3 id="typedef">
<title>2.2.29. Type declarations</title>
<para>
C does not allow multiple definitions of a <code>typedef</code> name,
whereas C++ allows multiple consistent definitions. This behaviour
can be controlled using the directive:
<programlisting>
#pragma TenDRA extra type definition <I>allow</I>
</programlisting>
</para>
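<para>
A short sketch of the C++ behaviour, using a hypothetical type name:
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below

typedef unsigned long count_t;   // first definition
typedef unsigned long count_t;   // consistent redefinition: accepted in C++

inline count_t twice(count_t n) { return 2 * n; }
```
]]></programlisting>
Here <code>assert ( twice ( 21 ) == 42 )</code> holds. Changing the
second definition to a different type, <code>typedef long count_t</code>
say, would be an error in both languages.
</para>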
</sect3>
<sect3 id="compatible">
<title>2.2.30. Type compatibility</title>
<para>
The directive:
<programlisting>
#pragma TenDRA incompatible type qualifier <I>allow</I>
</programlisting>
allows objects to be redeclared with different cv-qualifiers (normally
such redeclarations would be incompatible). The composite type is
qualified using the join of the cv-qualifiers in the various redeclarations.
</para>
<para>
The directive:
<programlisting>
#pragma TenDRA compatible type : <I>type-id</I> == <I>type-id</I> : <I>allow</I>
</programlisting>
asserts that the given two types are compatible. Currently the only
implemented version is <code>char * == void *</code> which enables
<code>char *</code> to be used as a generic pointer as it was in older
dialects of C.
</para>
</sect3>
<sect3 id="complete">
<title>2.2.31. Incomplete types</title>
<para>
Some dialects of C allow incomplete arrays as member types. These
are generally used as a place-holder at the end of a structure to
allow for the allocation of an arbitrarily sized array. Support for
this feature can be enabled using the directive:
<programlisting>
#pragma TenDRA incomplete type as object type <I>allow</I>
</programlisting>
</para>
</sect3>
<sect3 id="type-conversions">
<title>2.2.32. Type conversions</title>
<para>
There are a number of directives which allow various classes of type
conversion to be checked. The directives:
<programlisting>
#pragma TenDRA conversion analysis (int-int explicit) <I>on</I>
#pragma TenDRA conversion analysis (int-int implicit) <I>on</I>
</programlisting>
will check for unsafe explicit or implicit conversions between arithmetic
types. Similarly conversions between pointers and arithmetic types
can be checked using:
<programlisting>
#pragma TenDRA conversion analysis (int-pointer explicit) <I>on</I>
#pragma TenDRA conversion analysis (int-pointer implicit) <I>on</I>
</programlisting>
or equivalently:
<programlisting>
#pragma TenDRA conversion analysis (pointer-int explicit) <I>on</I>
#pragma TenDRA conversion analysis (pointer-int implicit) <I>on</I>
</programlisting>
Conversions between pointer types can be checked using:
<programlisting>
#pragma TenDRA conversion analysis (pointer-pointer explicit) <I>on</I>
#pragma TenDRA conversion analysis (pointer-pointer implicit) <I>on</I>
</programlisting>
</para>
<para>
There are some further variants which can be used to enable useful
sets of conversion checks. For example:
<programlisting>
#pragma TenDRA conversion analysis (int-int) <I>on</I>
</programlisting>
enables both implicit and explicit arithmetic conversion checks.
The directives:
<programlisting>
#pragma TenDRA conversion analysis (int-pointer) <I>on</I>
#pragma TenDRA conversion analysis (pointer-int) <I>on</I>
#pragma TenDRA conversion analysis (pointer-pointer) <I>on</I>
</programlisting>
are equivalent to their corresponding explicit forms (because the
implicit forms are illegal by default). The directive:
<programlisting>
#pragma TenDRA conversion analysis <I>on</I>
</programlisting>
is equivalent to the four directives just given. It enables checks
on implicit and explicit arithmetic conversions, explicit arithmetic
to pointer conversions and explicit pointer conversions.
</para>
<para>
The default settings for these checks are determined by the implicit
and explicit conversions allowed in C++. Note that there are differences
between the conversions allowed in C and C++. For example, an arithmetic
type can be converted implicitly to an enumeration type in C, but
not in C++. The directive:
<programlisting>
#pragma TenDRA conversion analysis (int-enum implicit) <I>on</I>
</programlisting>
can be used to control the status of this conversion. The level of
severity for an error message arising from such a conversion is the
maximum of the severity set by this directive and that set by the
<code>int-int implicit</code> directive above.
</para>
<para>
The implicit pointer conversions described above do not include conversions
to and from the generic pointer <code>void *</code>, which have their
own controlling directives. A pointer of type <code>void *</code>
can be converted implicitly to another pointer type in C but not in
C++; this is controlled by the directive:
<programlisting>
#pragma TenDRA++ conversion analysis (void*-pointer implicit) <I>on</I>
</programlisting>
The reverse conversion, from a pointer type to <code>void *</code>
is allowed in both C and C++, and has a controlling directive:
<programlisting>
#pragma TenDRA++ conversion analysis (pointer-void* implicit) <I>on</I>
</programlisting>
</para>
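<para>
The asymmetry between the two directions can be sketched in standard
C++ as follows (the function name is hypothetical):
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below

inline int round_trip(int value) {
    int *p = &value;
    void *generic = p;                      // pointer to void *: implicit in C and C++
    int *q = static_cast<int *>(generic);   // void * to pointer: explicit in C++
    return *q;
}
```
]]></programlisting>
Here <code>assert ( round_trip ( 7 ) == 7 )</code> holds. Writing
<code>int *q = generic ;</code> without the cast is the implicit
conversion that C accepts and C++ rejects.
</para>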
<para>
In ISO C and C++, a function pointer can only be cast to other function
pointers, not to object pointers or <code>void *</code>. Many dialects
however allow function pointers to be cast to and from other pointers.
This behaviour can be controlled using the directive:
<programlisting>
#pragma TenDRA function pointer as pointer <I>allow</I>
</programlisting>
which causes function pointers to be treated in the same way as all
other pointers.
</para>
<para>
The integer conversion checks described above only apply to unsafe
conversions. A simple-minded check for shortening conversions is
not adequate, as is shown by the following example:
<programlisting>
char a = 1, b = 2 ;
char c = a + b ;
</programlisting>
Here the sum <code>a + b</code> is evaluated as an <code>int</code>, which
is then shortened to a <code>char</code>. Any check which does not
distinguish this sort of "safe" shortening conversion from
unsafe shortening conversions such as:
<programlisting>
int a = 1, b = 2 ;
char c = a + b ;
</programlisting>
is not likely to be very useful. The producer therefore associates
two types with each integral expression; the first is the normal,
representation type and the second is the underlying, semantic type.
Thus in the first example, the representation type of <code>a + b</code>
is <code>int</code>, but semantically it is still a <code>char</code>.
The conversion analysis is based on the semantic types.
</para>
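<para>
The representation type can be observed directly in standard C++. The
sketch below assumes a C++11 compiler for <code>decltype</code>; the
semantic type is internal to the producer and has no counterpart in the
language, so it appears only in the comments:
<programlisting><![CDATA[
```cpp
#include <cassert>       // for the checks quoted in the note below
#include <type_traits>

// The representation type of a + b is int, although both operands are char.
inline bool char_sum_is_int() {
    char a = 1, b = 2;
    return std::is_same<decltype(a + b), int>::value;
}

// "Safe" shortening: the value was built from chars (semantic type char).
inline char safe_sum() {
    char a = 1, b = 2;
    char c = a + b;
    return c;
}

// "Unsafe" shortening: genuine int values are narrowed to char. The
// explicit cast marks the narrowing as deliberate.
inline char unsafe_sum(int a, int b) {
    return static_cast<char>(a + b);
}
```
]]></programlisting>
On typical targets <code>assert ( char_sum_is_int () )</code> and
<code>assert ( safe_sum () == 3 )</code> both hold.
</para>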
<para>
<IMG SRC="../images/warn.gif" ALT="warning"/>
The C producer supports a directive:
<programlisting>
#pragma TenDRA keyword <I>identifier</I> for type representation
</programlisting>
whereby a keyword can be introduced which can be used to explicitly
declare a type with given representation and semantic components.
Unfortunately this makes the <A HREF="parse.html">C++ grammar</A>
ambiguous, so it has not yet been implemented in the C++ producer.
</para>
<para>
It is possible to allow individual conversions by means of conversion
tokens. A <A HREF="token.html">procedure token</A> which takes one
rvalue expression program parameter and returns an rvalue expression,
such as:
<programlisting>
#pragma token PROC ( EXP : t : ) EXP : s : conv #
</programlisting>
can be regarded as mapping expressions of type <code>t</code> to expressions
of type <code>s</code>. The directive:
<programlisting>
#pragma TenDRA conversion <I>identifier-list</I> allow
</programlisting>
can be used to nominate such a token as a conversion token. That
is to say, if the conversion, whether explicit or implicit, from <code>t</code>
to <code>s</code> cannot be done by other means, it is done by applying
the token <code>conv</code>, so:
<programlisting>
t a ;
s b = a ; // maps to conv ( a )
</programlisting>
Note that, unlike conversion functions, conversion tokens can be applied
to any types.
</para>
</sect3>
<sect3 id="cast">
<title>2.2.33. Cast expressions</title>
<para>
ISO C++ introduces the constructs <code>static_cast</code>,
<code>const_cast</code> and <code>reinterpret_cast</code>, which can
be used in various contexts where an old style explicit cast would
previously have been used. By default, an explicit cast can perform
any combination of the conversions performed by these three constructs.
To aid migration to the new style casts the directives:
<programlisting>
#pragma TenDRA++ explicit cast as <I>cast-state</I> <I>allow</I>
#pragma TenDRA++ explicit cast <I>allow</I>
</programlisting>
where <I>cast-state</I> is defined as follows:
<programlisting>
<I>cast-state</I> :
static_cast
const_cast
reinterpret_cast
static_cast | <I>cast-state</I>
const_cast | <I>cast-state</I>
reinterpret_cast | <I>cast-state</I>
</programlisting>
can be used to restrict the conversions which can be performed using
explicit casts. The first form sets the interpretation of explicit
cast to be combinations of the given constructs; the second resets
the interpretation to the default. For example:
<programlisting>
#pragma TenDRA++ explicit cast as static_cast | const_cast allow
</programlisting>
means that conversions requiring <code>reinterpret_cast</code> (the
most unportable conversions) will not be allowed to be performed using
explicit casts, but will have to be given as a <code>reinterpret_cast</code>
construct. Changing <code>allow</code> to <code>warning</code> will
also cause a warning to be issued for every explicit cast expression.
</para>
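<para>
The three new-style constructs, and an old-style cast standing in for
one of them, can be sketched as follows (hypothetical function name):
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below

inline int cast_demo() {
    double d = 3.9;
    int truncated = static_cast<int>(d);       // arithmetic conversion, yields 3

    const int locked = 5;
    const int *cp = &locked;
    int *p = const_cast<int *>(cp);            // drops const; p is only read from

    // Reinterpret the int as raw bytes and convert the pointer back again.
    unsigned char *bytes = reinterpret_cast<unsigned char *>(&truncated);
    int *back = reinterpret_cast<int *>(bytes);

    int old_style = *(int *) cp;               // old-style cast acting as const_cast
    return truncated + *p + *back + old_style; // 3 + 5 + 3 + 5
}
```
]]></programlisting>
Here <code>assert ( cast_demo () == 16 )</code> holds. Under the example
directive above, the old-style cast shown would still be accepted, since
it needs only <code>const_cast</code>.
</para>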
</sect3>
<sect3 id="ellipsis">
<title>2.2.34. Ellipsis functions</title>
<para>
The directive:
<programlisting>
#pragma TenDRA ident ... <I>allow</I>
</programlisting>
may be used to enable or disable the use of <code>...</code> as a
primary expression in a function defined with ellipsis. The type
of such an expression is implementation defined. This expression
is used in the definition of the <A HREF="lib.html#ellipsis"><code>va_start</code>
macro</A> in the <code>&lt;stdarg.h&gt;</code> header. This header
automatically enables this switch.
</para>
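<para>
For context, a function defined with ellipsis normally reaches its extra
arguments through the standard macros, as in this sketch (the function
name is hypothetical, and the way <code>va_start</code> expands is
internal to the TenDRA headers and not shown):
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below
#include <cstdarg>

// Adds up 'count' int arguments passed through the ellipsis.
inline int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; i++) {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}
```
]]></programlisting>
Here <code>assert ( sum_ints ( 3, 1, 2, 3 ) == 6 )</code> holds.
</para>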
</sect3>
<sect3 id="overload">
<title>2.2.35. Overloaded functions</title>
<para>
Older dialects of C++ did not report ambiguous overloaded function
resolutions, but instead resolved the call to the first of the most
viable candidates to be declared. This behaviour can be controlled
using the directive:
<programlisting>
#pragma TenDRA++ ambiguous overload resolution <I>allow</I>
</programlisting>
There are occasions when the resolution of an overloaded function
call is not clear. The directive:
<programlisting>
#pragma TenDRA++ overload resolution <I>allow</I>
</programlisting>
can be used to report the resolution of any such call (whether explicit
or implicit) where there is more than one viable candidate.
</para>
<para>
An interesting consequence of compiling C++ in a target independent
manner is that certain overload resolutions can only be determined
at install-time. For example, in:
<programlisting>
int f ( int ) ;
int f ( unsigned int ) ;
int f ( long ) ;
int f ( unsigned long ) ;
int a = f ( sizeof ( int ) ) ; // which f?
</programlisting>
the type of the <code>sizeof</code> operator, <code>size_t</code>,
is target dependent, but its promotion must be one of the types
<code>int</code>, <code>unsigned int</code>, <code>long</code> or
<code>unsigned long</code>. Thus the call to <code>f</code> always
has a unique resolution, but which function is selected is target dependent. The
equivalent directives:
<programlisting>
#pragma TenDRA++ conditional overload resolution <I>allow</I>
#pragma TenDRA++ conditional overload resolution (complete) <I>allow</I>
</programlisting>
can be used to warn about such target dependent overload resolutions.
By default, such resolutions are only allowed if there is a unique
resolution for each possible implementation of the argument types
(note that, for simplicity, the possibility of <code>long long</code>
implementation types is ignored). The directive:
<programlisting>
#pragma TenDRA++ conditional overload resolution (incomplete) <I>allow</I>
</programlisting>
can be used to allow target dependent overload resolutions which only
have resolutions for some of the possible implementation types (if
one of the <code>f</code> declarations above were removed, for example).
If the implementation does not match one of these types then an install-time
error is given.
</para>
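<para>
The target dependence can be observed with an ordinary compiler, which
resolves the call once the type of <code>size_t</code> is known. In this
sketch the return values are arbitrary labels, and the two
<code>long long</code> overloads are an addition to the example above,
made only so that it also compiles on targets where <code>size_t</code>
is <code>unsigned long long</code>:
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below
#include <cstddef>

inline int f(int)                { return 1; }
inline int f(unsigned int)       { return 2; }
inline int f(long)               { return 3; }
inline int f(unsigned long)      { return 4; }
inline int f(long long)          { return 5; }   // added for this sketch
inline int f(unsigned long long) { return 6; }   // added for this sketch

// Which overload runs depends on how the target defines size_t.
inline int chosen_overload() { return f(sizeof(int)); }
```
]]></programlisting>
The only portable statement is that some overload is chosen, so that
<code>assert ( chosen_overload () >= 1 )</code> holds. The particular
value differs between targets.
</para>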
<para>
There are restrictions on the set of candidate functions involved
in a target dependent overload resolution. Most importantly, it should
be possible to bring their return types to a common type, as if by
a series of <code>?:</code> operations. This common type is the type
of the target dependent call. By this means, target dependent types
are prevented from propagating further out into the program. Note
that since sets of overloaded functions usually have the same semantics,
this does not usually present a problem.
</para>
</sect3>
<sect3 id="expressions">
<title>2.2.36. Expressions</title>
<para>
The directive:
<programlisting>
#pragma TenDRA operator precedence analysis <I>on</I>
</programlisting>
can be used to enable a check for expressions where the operator precedence
is not necessarily what might be expected. The intended precedence
can be clarified by means of explicit parentheses. The precedence
levels checked are as follows:
<itemizedlist>
<listitem><code>&amp;&amp;</code> versus <code>||</code>.
</listitem>
<listitem><code>&lt;&lt;</code> and <code>&gt;&gt;</code> versus binary
<code>+</code> and <code>-</code>.
</listitem>
<listitem>Binary <code>&amp;</code> versus binary <code>+</code>, <code>-</code>,
<code>==</code>, <code>!=</code>, <code>&gt;</code>, <code>&gt;=</code>,
<code>&lt;</code> and <code>&lt;=</code>.
</listitem>
<listitem><code>^</code> versus binary <code>&amp;</code>, <code>+</code>,
<code>-</code>, <code>==</code>, <code>!=</code>, <code>&gt;</code>,
<code>&gt;=</code>, <code>&lt;</code> and <code>&lt;=</code>.
</listitem>
<listitem><code>|</code> versus binary <code>^</code>, <code>&amp;</code>,
<code>+</code>, <code>-</code>, <code>==</code>, <code>!=</code>,
<code>&gt;</code>, <code>&gt;=</code>, <code>&lt;</code> and <code>&lt;=</code>.
</listitem>
</itemizedlist>
Also checked are expressions such as <code>a &lt; b &lt; c</code>
which do not have their normal mathematical meaning. For example,
in:
<programlisting>
d = a &lt;&lt; b + c ; // precedence is a &lt;&lt; ( b + c )
</programlisting>
the precedence is counter-intuitive, although strangely enough, it
isn't in:
<programlisting>
cout &lt;&lt; b + c ; // precedence is cout &lt;&lt; ( b + c )
</programlisting>
</para>
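<para>
Both kinds of surprise can be checked with concrete values. The sketch
below uses hypothetical function names, and most compilers will warn
about these expressions, which is the point of the check:
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the checks quoted in the note below

// a << b + c groups as a << (b + c), not as (a << b) + c.
inline bool shift_binds_looser_than_plus() {
    int a = 1, b = 2, c = 3;
    return (a << b + c) == (a << (b + c))     // 1 << 5 is 32
        && (a << b + c) != ((a << b) + c);    // (1 << 2) + 3 is 7
}

// a < b < c groups as (a < b) < c. With 5, 3, 2 the inner comparison
// is false, which converts to 0, and 0 < 2 is true.
inline bool chained_comparison() {
    int a = 5, b = 3, c = 2;
    return a < b < c;
}
```
]]></programlisting>
Both <code>assert ( shift_binds_looser_than_plus () )</code> and
<code>assert ( chained_comparison () )</code> hold, although 5, 3, 2 is
not an increasing sequence.
</para>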
<para>
Other dubious arithmetic operations can be checked for using the directive:
<programlisting>
#pragma TenDRA integer operator analysis <I>on</I>
</programlisting>
This includes checks for operations, such as division by a negative
value, which are implementation dependent, and those such as testing
whether an unsigned value is less than zero, which serve no purpose.
Similarly the directive:
<programlisting>
#pragma TenDRA++ pointer operator analysis <I>on</I>
</programlisting>
checks for dubious pointer operations. This includes very simple
bounds checking for arrays and checking that only the simple literal
<code>0</code>
is used in null pointer constants:
<programlisting>
char *p = 1 - 1 ; // valid, but weird
</programlisting>
</para>
<para>
The directive:
<programlisting>
#pragma TenDRA integer overflow analysis <I>on</I>
</programlisting>
is used to control the treatment of overflows in the evaluation of
integer constant expressions. This includes the detection of division
by zero.
</para>
</sect3>
<sect3 id="initialiser-expressions">
<title>2.2.37. Initialiser expressions</title>
<para>
C, but not C++, only allows constant expressions in static initialisers.
The directive:
<programlisting>
#pragma TenDRA variable initialization <I>allow</I>
</programlisting>
can be used to enable support for C++-style dynamic initialisers. Conversely,
it can be used in C++ to detect such dynamic initialisers.
</para>
<para>
In older dialects of C it was not possible to initialise an automatic
variable of structure or union type. This can be checked for using
the directive:
<programlisting>
#pragma TenDRA initialization of struct/union (auto) <I>allow</I>
</programlisting>
</para>
<para>
The directive:
<programlisting>
#pragma TenDRA++ complete initialization analysis <I>on</I>
</programlisting>
can be used to check aggregate initialisers. The initialiser should
be fully bracketed (i.e. with no elision of braces), and should have
an entry for each member of the structure or array.
</para>
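<para>
The forms this check distinguishes can be sketched as follows. The type
names are hypothetical, and which lines the producer reports is inferred
from the description above:
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below

struct Pair { int a, b; };
struct Line { Pair from, to; };

inline int init_demo() {
    Line elided  = { 1, 2, 3, 4 };           // valid, but inner braces are elided
    Line full    = { { 1, 2 }, { 3, 4 } };   // fully bracketed, every member given
    Line partial = { { 1, 2 } };             // no entry for 'to', which becomes zero
    return elided.to.b + full.to.b + partial.to.a;   // 4 + 4 + 0
}
```
]]></programlisting>
All three are valid C++ and <code>assert ( init_demo () == 8 )</code>
holds; only <code>full</code> satisfies both conditions of the check.
</para>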
</sect3>
<sect3 id="lvalue">
<title>2.2.38. Lvalue expressions</title>
<para>
C++ defines the results of several operations to be lvalues, whereas
they are rvalues in C. The directive:
<programlisting>
#pragma TenDRA conditional lvalue <I>allow</I>
</programlisting>
is used to apply the C++ rules for lvalues in conditional (<code>?:</code>)
expressions.
</para>
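<para>
Under the C++ rules a conditional expression whose second and third
operands are lvalues of the same type is itself an lvalue, as this
sketch with a hypothetical function name shows:
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the checks quoted in the note below

inline int assign_through_conditional(bool pick_first) {
    int a = 1, b = 2;
    (pick_first ? a : b) = 10;   // assigns to a or to b
    return a * 100 + b;
}
```
]]></programlisting>
Here <code>assert ( assign_through_conditional ( true ) == 1002 )</code>
and <code>assert ( assign_through_conditional ( false ) == 110 )</code>
both hold. In C the assignment would be rejected because the conditional
expression is an rvalue.
</para>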
<para>
Older dialects of C++ allowed <code>this</code> to be treated as an
lvalue. It is possible to enable support for this dialect feature
using the directive:
<programlisting>
#pragma TenDRA++ this lvalue <I>allow</I>
</programlisting>
however it is recommended that programs using this feature should
be modified.
</para>
</sect3>
<sect3 id="discard">
<title>2.2.39. Discarded expressions</title>
<para>
The directive:
<programlisting>
#pragma TenDRA discard analysis <I>on</I>
</programlisting>
can be used to enable a check for values which are calculated but
not used. There are three checks controlled by this directive, each
of which can be controlled independently. The directive:
<programlisting>
#pragma TenDRA discard analysis (function return) <I>on</I>
</programlisting>
checks for functions which return a value which is not used. The
check needs to be enabled for both the declaration and the call of
the function in order for a discarded function return to be reported.
Discarded returns for overloaded operator functions are never reported.
The directive:
<programlisting>
#pragma TenDRA discard analysis (value) <I>on</I>
</programlisting>
checks for other expressions which are not used. Finally, the directive:
<programlisting>
#pragma TenDRA discard analysis (static) <I>on</I>
</programlisting>
checks for variables with internal linkage which are defined but not
used.
</para>
<para>
An unused function return or other expression can be asserted to be
deliberately discarded by explicitly casting it to <code>void</code>
or, equivalently, preceding it by a keyword introduced using the directive:
<programlisting>
#pragma TenDRA keyword <I>identifier</I> for discard value
</programlisting>
A static variable can be asserted to be deliberately unused by including
it in the list of identifiers in a directive of the form:
<programlisting>
#pragma TenDRA suspend static <I>identifier-list</I>
</programlisting>
</para>
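<para>
The difference between a silent and a deliberate discard can be
sketched as follows (hypothetical names; the keyword form is omitted
because it depends on the directive above):
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the check quoted in the note below

// Increments the counter and returns its new value.
inline int bump(int &counter) { return ++counter; }

inline int discard_demo() {
    int calls = 0;
    bump(calls);                       // return value silently discarded
    (void) bump(calls);                // cast to void: discard is deliberate
    static_cast<void>(bump(calls));    // the same assertion, new-style cast
    return calls;
}
```
]]></programlisting>
Here <code>assert ( discard_demo () == 3 )</code> holds, since all three
calls are executed; only the first is the kind of call the function
return check is aimed at.
</para>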
</sect3>
<sect3 id="if">
<title>2.2.40. Conditional and iteration statements</title>
<para>
The directive:
<programlisting>
#pragma TenDRA const conditional <I>allow</I>
</programlisting>
can be used to enable a check for constant expressions used in conditional
contexts. A literal constant is allowed in the condition of a
<code>while</code>, <code>for</code> or <code>do</code> statement to allow for
such common constructs as:
<programlisting>
while ( true ) {
// while statement body
}
</programlisting>
and target dependent constant expressions are allowed in the condition
of an <code>if</code> statement, but otherwise constant conditions
are reported according to the status of this check.
</para>
<para>
The common error of writing <code>=</code> rather than <code>==</code>
in conditions can be detected using the directive:
<programlisting>
#pragma TenDRA assignment as bool <I>allow</I>
</programlisting>
which can be used to disallow such assignment expressions in contexts
where a boolean is expected. The error message can be suppressed
by enclosing the assignment within parentheses.
</para>
<para>
Another common error associated with iteration statements, particularly
with certain <A HREF="style.html">heretical</A> brace styles, is the
accidental insertion of an extra semicolon as in:
<programlisting>
for ( init ; cond ; step ) ;
{
// for statement body
}
</programlisting>
The directive:
<programlisting>
#pragma TenDRA extra ; after conditional <I>allow</I>
</programlisting>
can be used to enable a check for such suspicious empty iteration
statement bodies (it actually checks for <code>;{</code>).
</para>
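<para>
The effect of the stray semicolon can be made visible by counting how
often the braced block runs (hypothetical function names; compilers may
warn about the empty loop body):
<programlisting><![CDATA[
```cpp
#include <cassert>   // for the checks quoted in the note below

inline int stray_semicolon() {
    int runs = 0;
    for (int i = 0; i < 3; i++) ;   // the ';' is the entire loop body
    {
        runs++;                     // ordinary block, executed once
    }
    return runs;
}

inline int intended_loop() {
    int runs = 0;
    for (int i = 0; i < 3; i++) {
        runs++;                     // loop body, executed three times
    }
    return runs;
}
```
]]></programlisting>
Here <code>assert ( stray_semicolon () == 1 )</code> and
<code>assert ( intended_loop () == 3 )</code> both hold.
</para>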
</sect3>
<sect3 id="switch">
<title>2.2.41. Switch statements</title>
<para>
A <code>switch</code> statement is said to be exhaustive if its control
statement is guaranteed to take one of the values of its
<code>case</code> labels, or if it has a <code>default</code> label.
The TenDRA C and C++ producers allow a <code>switch</code> statement
to be asserted to be exhaustive using the syntax:
<programlisting>
switch ( cond ) EXHAUSTIVE {
// switch statement body
}
</programlisting>
where <code>EXHAUSTIVE</code> is either the directive:
<programlisting>
#pragma TenDRA exhaustive
</programlisting>
or a keyword introduced using:
<programlisting>
#pragma TenDRA keyword <I>identifier</I> for exhaustive
</programlisting>
Knowing whether a <code>switch</code> statement is exhaustive or not
means that checks relying on flow analysis (including variable usage
checks) can be applied more precisely.
</para>
<para>
In certain circumstances it is possible to deduce whether a
<code>switch</code> statement is exhaustive or not. For example,
the directive:
<programlisting>
#pragma TenDRA enum switch analysis <I>on</I>
</programlisting>
enables a check on <code>switch</code> statements on values of enumeration
type. Such statements should be exhaustive, either explicitly by
using the <code>EXHAUSTIVE</code> keyword or declaring a
<code>default</code> label, or implicitly by having a <code>case</code>
label for each enumerator. Conversely, the value of each <code>case</code>
label should equal the value of an enumerator. For the purposes of
this check, boolean values are treated as if they were declared using
an enumeration type of the form:
<programlisting>
enum bool { false = 0, true = 1 } ;
</programlisting>
</para>
<para>
A common source of errors in <code>switch</code> statements is the
fall-through from one <code>case</code> or <code>default</code>
statement to the next. A check for this can be enabled using:
<programlisting>
#pragma TenDRA fall into case <I>allow</I>
</programlisting>
<code>case</code> or <code>default</code> labels where fall-through
from the previous statement is intentional can be marked by preceding
them by a keyword, <code>FALL_THRU</code> say, introduced using the
directive:
<programlisting>
#pragma TenDRA keyword <I>identifier</I> for fall into case
</programlisting>
</para>
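<para>
A minimal sketch of how the two directives might be combined is given
below. It assumes that <code>warning</code> is an accepted state for this
directive, as for the other <I>allow</I>-style directives; the function
<code>classify</code> is invented for the example:
<programlisting>
#pragma TenDRA keyword FALL_THRU for fall into case
#pragma TenDRA fall into case warning

void classify ( int c, int &amp;vowels, int &amp;letters )
{
    switch ( c ) {
        case 'a' :
            vowels++ ;
        FALL_THRU case 'b' :    // intentional: 'a' is also a letter
            letters++ ;
            break ;
        case '0' :
            letters = 0 ;
        case '1' :              // unmarked fall-through from case '0'
            vowels = 0 ;
            break ;
    }
}
</programlisting>
Only the unmarked fall-through into <code>case '1'</code> would be
expected to be reported.
</para>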
</sect3>
<sect3 id="for">
<title>2.2.42. For statements</title>
<para>
In ISO C++ the scope of a variable declared in a for-init-statement
is the body of the <code>for</code> statement; in older dialects it
extended to the end of the enclosing block. So:
<programlisting>
for ( int i = 0 ; i < 10 ; i++ ) {
// for statement body
}
return i ; // OK in older dialects, error in ISO C++
</programlisting>
This behaviour is controlled by the directive:
<programlisting>
#pragma TenDRA++ for initialization block <I>on</I>
</programlisting>
a state of <code>on</code> corresponding to the ISO rules and
<code>off</code> to the older rules. Perhaps most useful is the
<code>warning</code> state which implements the old rules but gives
a warning if a variable declared in a for-init-statement is used outside
the corresponding <code>for</code> statement body. A program which
does not give such warnings should compile correctly under either
set of rules.
</para>
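<para>
Code which needs the loop variable after the loop can be written so that
it is valid under both sets of rules by declaring the variable before the
<code>for</code> statement. The function <code>first_negative</code>
below is a hypothetical example of this style:
<programlisting>
// Returns the index of the first negative element of v, or n if
// there is none.
int first_negative ( const int *v, int n )
{
    int i ;                         // declared outside the for statement
    for ( i = 0 ; i < n ; i++ ) {
        if ( v [i] < 0 ) break ;
    }
    return i ;                      // valid in ISO C++ and older dialects
}
</programlisting>
</para>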
</sect3>
<sect3 id="return">
<title>2.2.43. Return statements</title>
<para>
In C, but not in C++, it is possible to have a <code>return</code>
statement without an expression in a function which does not return
<code>void</code>. It is possible to enable this behaviour using
the directive:
<programlisting>
#pragma TenDRA incompatible void return <I>allow</I>
</programlisting>
Note that this check includes the implicit <code>return</code> caused
by falling off the end of a function. The effect of such a
<code>return</code> statement is undefined. The C++ rule that falling
off the end of <code>main</code> is equivalent to returning a value
of 0 overrides this check.
</para>
</sect3>
<sect3 id="reach">
<title>2.2.44. Unreached code analysis</title>
<para>
The directive:
<programlisting>
#pragma TenDRA unreachable code <I>allow</I>
</programlisting>
enables a flow analysis check to detect unreachable code. It is possible
to assert that a statement is reached or not reached by preceding
it by a keyword introduced by one of the directives:
<programlisting>
#pragma TenDRA keyword <I>identifier</I> for set reachable
#pragma TenDRA keyword <I>identifier</I> for set unreachable
</programlisting>
</para>
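<para>
The following sketch shows one way such a keyword might be used. The
keyword name <code>UNREACHED</code> and the function <code>fatal</code>
are invented for the example, and <code>warning</code> is assumed to be
an accepted state for the directive:
<programlisting>
#pragma TenDRA unreachable code warning
#pragma TenDRA keyword UNREACHED for set unreachable

extern void fatal ( const char * ) ;    // hypothetical: never returns

int checked_div ( int a, int b )
{
    if ( b == 0 ) {
        fatal ( "division by zero" ) ;
        UNREACHED ;     // asserts that this (empty) statement is not reached
    }
    return a / b ;
}
</programlisting>
</para>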
<para>
The fact that certain functions, such as <code>exit</code>, never
return to their caller can be exploited in the flow analysis routines. The
equivalent directives:
<programlisting>
#pragma TenDRA bottom <I>identifier</I>
#pragma TenDRA++ type <I>identifier</I> for bottom
</programlisting>
can be used to introduce a <code>typedef</code> declaration for the
type, bottom, returned by such functions. The TenDRA API headers
declare
<code>exit</code> and similar functions in this way, for example:
<programlisting>
#pragma TenDRA bottom __bottom
__bottom exit ( int ) ;
__bottom abort ( void ) ;
</programlisting>
The bottom type is compatible with <code>void</code> in function declarations
to allow such functions to be redeclared in their conventional form.
</para>
</sect3>
<sect3 id="variable">
<title>2.2.45. Variable flow analysis</title>
<para>
The directive:
<programlisting>
#pragma TenDRA variable analysis <I>on</I>
</programlisting>
enables checks on the uses of automatic variables and function parameters.
These checks detect:
<itemizedlist>
<listitem>If a variable is not used in its scope.
</listitem>
<listitem>If the value of a variable is used before it has been assigned
to.
</listitem>
<listitem>If a variable is assigned to twice without an intervening use.
</listitem>
<listitem>If a variable is assigned to twice without an intervening sequence
point.
</listitem>
</itemizedlist>
as illustrated by the variables <code>a</code>, <code>b</code>,
<code>c</code> and <code>d</code> respectively in:
<programlisting>
void f ()
{
int a ; // a never used
int b ;
int c = b ; // b not initialised
c = 0 ; // c assigned to twice
int d = 0 ;
d = ++d ; // d assigned to twice
}
</programlisting>
The second, and more particularly the third, of these checks requires
some fairly sophisticated flow analysis, so any hints which can be
picked up from <A HREF="#switch">exhaustive <code>switch</code>
statements</A> etc. are likely to increase the accuracy of the errors
detected.
</para>
<para>
In a non-static member function the various non-static data members
are analysed as if they were automatic variables. It is checked that
each member is initialised in a constructor. A common source of initialisation
problems in a constructor is that the base classes and members are
initialised in the canonical order of virtual bases, non-virtual direct
bases and members in the order of their declaration, rather than in
the order in which their initialisers appear in the constructor definition.
Therefore a check that the initialisers appear in the canonical order
is also applied.
</para>
<para>
It is possible to change the state of a variable during the variable
analysis using the directives:
<programlisting>
#pragma TenDRA set <I>expression</I>
#pragma TenDRA discard <I>expression</I>
</programlisting>
The first asserts that the variable given by the <I>expression</I>
has been assigned to; the second asserts that the variable is not
used. An alternative way of expressing this is by means of keywords:
<programlisting>
SET ( <I>expression</I> )
DISCARD ( <I>expression</I> )
</programlisting>
introduced using the directives:
<programlisting>
#pragma TenDRA keyword <I>identifier</I> for set
#pragma TenDRA keyword <I>identifier</I> for discard variable
</programlisting>
respectively. These expressions can appear in expression statements
and as the first argument of a comma expression.
</para>
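<para>
A short sketch of the keyword forms in use follows. The function
<code>fill</code> is hypothetical, and whether the analysis would
complain about this code without the assertions is not claimed here:
<programlisting>
#pragma TenDRA keyword SET for set
#pragma TenDRA keyword DISCARD for discard variable

extern void fill ( int * ) ;    // hypothetical: stores a value through its argument

int g ( int unused )
{
    int n ;
    fill ( &amp;n ) ;
    SET ( n ) ;             // assert that n has been assigned to
    DISCARD ( unused ) ;    // assert that unused is deliberately not used
    return n ;
}
</programlisting>
</para>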
<para>
<IMG SRC="../images/warn.gif" ALT="warning"/>
The variable flow analysis checks have not yet been completely implemented.
They may not detect errors in certain circumstances and for extremely
convoluted code may occasionally give incorrect errors.
</para>
</sect3>
<sect3 id="hide">
<title>2.2.46. Variable hiding</title>
<para>
The directive:
<programlisting>
#pragma TenDRA variable hiding analysis <I>on</I>
</programlisting>
can be used to enable a check for hiding of other variables and, in
member functions, data members, by local variable declarations.
</para>
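<para>
For example, both local declarations in the following sketch (in which
all names are invented) hide another variable and would be candidates
for this check:
<programlisting>
#pragma TenDRA variable hiding analysis on

int total = 0 ;

class counter {
    int count ;
public :
    counter () : count ( 0 ) {}
    int add ( int n )
    {
        int count = n ;         // hides the data member counter::count
        int total = count + 1 ; // hides the global variable ::total
        return total ;
    }
} ;
</programlisting>
</para>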
</sect3>
<sect3 id="exception">
<title>2.2.47. Exception analysis</title>
<para>
The ISO C++ rules do not require exception specifications to be checked
statically. This is to facilitate the integration of large systems
where a single change in an exception specification could have ramifications
throughout the system. However it is often useful to apply such checks,
which can be enabled using the directive:
<programlisting>
#pragma TenDRA++ throw analysis <I>on</I>
</programlisting>
This detects any potentially uncaught exceptions and other exception
problems. In the error messages arising from this check, an uncaught
exception of type <code>...</code> means that an uncaught exception
of an unknown type (arising, for example, from a function without
an exception specification) may be thrown. For example:
<programlisting>
void f ( int ) throw ( int ) ;
void g ( int ) throw ( long ) ;
void h ( int ) ;
void e () throw ( int )
{
f ( 1 ) ; // OK
g ( 2 ) ; // uncaught 'long' exception
h ( 3 ) ; // uncaught '...' exception
}
</programlisting>
</para>
</sect3>
<sect3 id="template">
<title>2.2.48. Template compilation</title>
<para>
The C++ producer makes the distinction between exported templates,
which may be used in one module and defined in another, and non-exported
templates, which must be defined in every module in which they are
used. As in the ISO C++ standard, the <code>export</code> keyword
is used to distinguish between the two cases. In the past, different
compilers have had different template compilation models; either all
templates were exported or no templates were exported. The latter
is easily emulated: if the <code>export</code> keyword is not used
then no templates will be exported. To emulate the former behaviour
the directive:
<programlisting>
#pragma TenDRA++ implicit export template <I>on</I>
</programlisting>
can be used to treat all templates as if they had been declared using
the <code>export</code> keyword.
</para>
<para>
<IMG SRC="../images/warn.gif" ALT="warning"/>
The automatic instantiation of exported templates has not yet been
implemented correctly. It is intended that such instantiations will
be generated during <A HREF="link.html">intermodule analysis</A>
(where they conceptually belong). At present it is necessary to work
round this using explicit instantiations.
</para>
</sect3>
<sect3 id="catch_all">
<title>2.2.49. Other checks</title>
<para>
Several checks of varying utility have been implemented in the C++
producer but do not as yet have individual directives controlling
their use. These can be enabled <I>en masse</I> using the directive:
<programlisting>
#pragma TenDRA++ catch all <I>allow</I>
</programlisting>
It is intended that this directive will be phased out as these checks
are assigned controlling directives. It is possible to achieve finer
control over these checks by enabling their individual error messages
<A HREF="#low">as described above</A>.
</para>
</sect3>
</sect2>
<sect2 id="token">
<title>2.3. Token syntax</title>
<para>
The C and C++ producers allow place-holders for various categories
of syntactic classes to be expressed using directives of the form:
<programlisting>
#pragma TenDRA token <I>token-spec</I>
</programlisting>
or simply:
<programlisting>
#pragma token <I>token-spec</I>
</programlisting>
These place-holders are represented as TDF tokens and hence are called
tokens. These tokens stand for a certain type, expression or other construct
which is to be represented by a certain named TDF token in the producer
output. This mechanism is used, for example, to allow C API specifications
to be represented target independently. The types, functions and
expressions comprising the API can be described using <code>#pragma
token</code> directives and the target dependent definitions of these
tokens, representing the implementation of the API on a particular
machine, can be linked in later. This mechanism is described in detail
elsewhere.
</para>
<para>
A <A HREF="pragma1.html#token">summary of the grammar</A> for the
<code>#pragma token</code> directives accepted by the C++ producer
is given as an annex.
</para>
<sect3 id="spec">
<title>2.3.1. Token specifications</title>
<para>
A token specification is divided into two components, a
<I>token-introduction</I> giving the token sort, and a
<I>token-identification</I> giving the internal and external token
names:
<programlisting>
<I>token-spec</I> :
<I>token-introduction token-identification</I>
<I>token-introduction</I> :
<I>exp-token</I>
<I>statement-token</I>
<I>type-token</I>
<I>member-token</I>
<I>procedure-token</I>
<I>token-identification</I> :
<I>token-namespace<SUB>opt</SUB> identifier</I> # <I>external-identifier<SUB>opt</SUB></I>
<I>token-namespace</I> :
TAG
<I>external-identifier</I> :
-
<I>preproc-token-list</I>
</programlisting>
The <code>TAG</code> qualifier is used to indicate that the internal
name lies in the C tag namespace. This only makes sense for structure
and union types. The external token name can be given by any sequence
of preprocessing tokens. These tokens are not macro expanded. If
no external name is given then the internal name is used. The special
external name <code>-</code> is used to indicate that the token does
not have an associated external name, and hence is local to the current
translation unit. Such a local token must be defined. White space
in the external name (other than at the start or end) is used to indicate
that a TDF unique name should be used. The white space serves as
a separator for the unique name components.
</para>
<H4><A id="expression-tokens">Expression tokens</A></H4>
<para>
Expression tokens are specified as follows:
<programlisting>
<I>exp-token</I> :
EXP <I>exp-storage<SUB>opt</SUB></I> : <I>type-id</I> :
NAT
INTEGER
</programlisting>
representing an expression of the given type, a non-negative integer
constant and a general integer constant, respectively. Each expression
has an associated storage class:
<programlisting>
<I>exp-storage</I> :
lvalue
rvalue
const
</programlisting>
indicating whether it is an lvalue, an rvalue or a compile-time constant
expression. An absent <I>exp-storage</I> is equivalent to
<code>rvalue</code>. All expression tokens lie in the macro namespace;
that is, they may potentially be defined as macros.
</para>
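<para>
Combining this with the <I>token-identification</I> syntax above, some
expression token declarations might look as follows. All of the token
names are invented for illustration; each ends with <code>#</code> and no
external name, so the internal name is also used externally:
<programlisting>
#pragma token EXP rvalue : int : error_code #
#pragma token EXP lvalue : int : error_flag #
#pragma token EXP const : int : MAX_OPEN #
#pragma token NAT BUF_SIZE #
#pragma token INTEGER MIN_LEVEL #
</programlisting>
</para>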
<para>
For backwards compatibility with the C producer, the directive:
<programlisting>
#pragma TenDRA++ rvalue token as const <I>allow</I>
</programlisting>
causes <code>rvalue</code> tokens to be treated as <code>const</code>
tokens.</para>
<H4>Statement tokens</H4>
<para>
Statement tokens are specified as follows:
<programlisting>
<I>statement-token</I> :
STATEMENT
</programlisting>
All statement tokens lie in the macro namespace.
</para>
<H4>Type tokens</H4>
<para>
Type tokens are specified as follows:
<programlisting>
<I>type-token</I> :
TYPE
VARIETY
VARIETY signed
VARIETY unsigned
FLOAT
ARITHMETIC
SCALAR
CLASS
STRUCT
UNION
</programlisting>
representing a generic type, an integral type, a signed integral type,
an unsigned integral type, a floating point type, an arithmetic (integral
or floating point) type, a scalar (arithmetic or pointer) type, a
class type, a structure type and a union type respectively.
</para>
<para>
<IMG SRC="../images/warn.gif" ALT="warning"/>
Floating-point, arithmetic and scalar token types have not yet been
implemented correctly in either the C or C++ producers.
</para>
<H4><A id="member">Member tokens</A></H4>
<para>
Member tokens are specified as follows:
<programlisting>
<I>member-token</I> :
MEMBER <I>access-specifier<SUB>opt</SUB> member-type-id</I> : <I>type-id</I> :
</programlisting>
where an <I>access-specifier</I> of <code>public</code> is assumed
if none is given. The member type is given by:
<programlisting>
<I>member-type-id</I> :
<I>type-id</I>
<I>type-id</I> % <I>constant-expression</I>
</programlisting>
where <code>%</code> is used to denote bitfield members (since
<code>:</code> is used as a separator). The second type denotes the
structure or union the given member belongs to. Different types can
have members with the same internal name, but the external token name
must be unique. Note that only non-static data members can be represented
in this form.
</para>
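<para>
As a sketch, a structure type and some of its members might be tokenised
as follows. The names <code>pair_t</code>, <code>first</code>,
<code>second</code> and <code>flags</code> are invented for the example:
<programlisting>
#pragma token STRUCT pair_t #
#pragma token MEMBER int : pair_t : first #
#pragma token MEMBER public long : pair_t : second #
#pragma token MEMBER unsigned % 3 : pair_t : flags #
</programlisting>
The last declaration describes a bitfield member of width 3, using
<code>%</code> in place of the usual <code>:</code>.
</para>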
<para>
Two declarations for the same <code>MEMBER</code> token (including token
definitions) should have the same type; however, the directive:
<programlisting>
#pragma TenDRA++ incompatible member declaration <I>allow</I>
</programlisting>
allows declarations with different types, provided these types have the
same size and alignment requirements.
</para>
<H4>Procedure tokens</H4>
<para>
Procedure, or high-level, tokens are specified in one of three ways:
<programlisting>
<I>procedure-token</I> :
<I>general-procedure</I>
<I>simple-procedure</I>
<I>function-procedure</I>
</programlisting>
All procedure tokens (except ellipsis functions - see below) lie in
the macro namespace. The most general form of procedure token specifies
two sets of parameters. The bound parameters are those which are
used in encoding the actual TDF output, and the program parameters
are those which are <A HREF="#args">specified in the program</A>.
The program parameters are expressed in terms of the bound parameters.
A program parameter can be an expression token parameter, a statement
token parameter, a member token parameter, a procedure token parameter
or any type. The bound parameters are deduced from the program parameters
by a similar process to that used in template argument deduction.
<programlisting>
<I>general-procedure</I> :
PROC { <I>bound-toks<SUB>opt</SUB></I> | <I>prog-pars<SUB>opt</SUB></I> } <I>token-introduction
</I>
<I>bound-toks</I> :
<I>bound-token</I>
<I>bound-token</I> , <I>bound-toks</I>
<I>bound-token</I> :
<I>token-introduction token-namespace<SUB>opt</SUB> identifier</I>
<I>prog-pars</I> :
<I>program-parameter</I>
<I>program-parameter</I> , <I>prog-pars</I>
<I>program-parameter</I> :
EXP <I>identifier</I>
STATEMENT <I>identifier</I>
TYPE <I>type-id</I>
MEMBER <I>type-id</I> : <I>identifier</I>
PROC <I>identifier</I>
</programlisting>
</para>
<para>
The simplest form of a <I>general-procedure</I> is one in which the
<I>prog-pars</I> correspond precisely to the <I>bound-toks</I>. In
this case the syntax:
<programlisting>
<I>simple-procedure</I> :
PROC ( <I>simple-toks<SUB>opt</SUB></I> ) <I>token-introduction</I>
<I>simple-toks</I> :
<I>simple-token</I>
<I>simple-token</I> , <I>simple-toks</I>
<I>simple-token</I> :
<I>token-introduction token-namespace<SUB>opt</SUB> identifier<SUB>opt</SUB></I>
</programlisting>
may be used. Note that the parameter names are optional.
</para>
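<para>
The two forms can be compared in the following sketch, in which the
token names <code>max_int</code> and <code>swap_vals</code> are invented.
The first declaration uses the simple form; the second uses the general
form, binding a type <code>t</code> which does not appear among the
program parameters:
<programlisting>
#pragma token PROC ( EXP rvalue : int : a, EXP rvalue : int : b ) EXP rvalue : int : max_int #

#pragma token PROC { TYPE t, EXP rvalue : t * : p, EXP rvalue : t * : q | EXP p, EXP q } EXP rvalue : void : swap_vals #
</programlisting>
In a use such as <code>swap_vals ( &amp;x, &amp;y )</code> with
<code>x</code> and <code>y</code> of type <code>int</code>, the bound
parameter <code>t</code> would be deduced as <code>int</code> from the
two expression arguments.
</para>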
<para>
A function token is specified as follows:
<programlisting>
<I>function-procedure</I> :
FUNC <I>type-id</I> :
</programlisting>
where the given type is a function type. This has two effects: firstly
a function with the given type is declared; secondly, if the function
type has the form:
<programlisting>
r ( p1, ...., pn )
</programlisting>
a procedure token with sort:
<programlisting>
PROC ( EXP rvalue : p1 :, ...., EXP rvalue : pn : ) EXP rvalue : r :
</programlisting>
is declared. For ellipsis function types only the function, not the
token, is declared. Note that the token behaves like a macro definition
of the corresponding function. Unless explicitly enclosed in a linkage
specification, a function declared using a <code>FUNC</code>
token has C linkage. Note that it is possible for two <code>FUNC</code>
tokens to have the same internal name, because of function overloading;
however, external names must be unique.
</para>
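<para>
For instance, with a hypothetical function name <code>put_line</code>,
the declaration:
<programlisting>
#pragma token FUNC int ( const char * ) : put_line #
</programlisting>
would, following the description above, declare a function of type
<code>int ( const char * )</code> with C linkage, together with a
procedure token of sort:
<programlisting>
PROC ( EXP rvalue : const char * : ) EXP rvalue : int :
</programlisting>
</para>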
<para>
The directive:
<programlisting>
#pragma TenDRA incompatible interface declaration <I>allow</I>
</programlisting>
can be used to allow incompatible redeclarations of functions declared
using <code>FUNC</code> tokens. The token declaration takes precedence.
</para>
<para>
<IMG SRC="../images/warn.gif" ALT="warning"/>
Certain of the more complex examples of <code>PROC</code> tokens such
as, for example, tokens with <code>PROC</code> parameters, have not
been implemented in either the C or C++ producers.
</para>
</sect3>
<sect3 id="token-arguments">
<title>2.3.2. Token arguments</title>
<para>
As mentioned above, the program parameters for a <code>PROC</code>
token are those specified in the program itself. These arguments
are expressed as a comma-separated list enclosed in brackets, the
form of each argument being determined by the corresponding program
parameter.
</para>
<para>
An <code>EXP</code> argument is an assignment expression. This must
be an lvalue for <code>lvalue</code> tokens and a constant expression
for
<code>const</code> tokens. The argument is converted to the token
type (for <code>lvalue</code> tokens this is essentially a conversion
between the corresponding reference types). A <code>NAT</code> or
<code>INTEGER</code> argument is an integer constant expression.
In the former case this must be non-negative.
</para>
<para>
A <code>STATEMENT</code> argument is a statement. This statement
should not contain any labels or any <code>goto</code> or <code>return</code>
statements.
</para>
<para>
A type argument is a type identifier. This must name a type of the
correct category for the corresponding token. For example, a
<code>VARIETY</code> token requires an integral type.
</para>
<para>
<A id="offset">A member argument must describe the offset of a member
or nested member of the given structure or union type</A>. The type
of the member should agree with that of the <code>MEMBER</code> token.
The general form of a member offset can be described in terms of member
selectors and array indexes as follows:
<programlisting>
<I>member-offset</I> :
::<I><SUB>opt</SUB> id-expression</I>
<I>member-offset</I> . ::<I><SUB>opt</SUB> id-expression</I>
<I>member-offset</I> [ <I>constant-expression</I> ]
</programlisting>
</para>
<para>
A <code>PROC</code> argument is an identifier. This identifier must
name a <code>PROC</code> token of the appropriate sort.
</para>
</sect3>
<sect3 id="tokdef">
<title>2.3.3. Defining tokens</title>
<para>
Given a token specification of a syntactic object and a normal language
definition of the same object (including macro definitions if the
token lies in the macro namespace), the producers attempt to unify
the two by defining the TDF token in terms of the given definition.
Whether the token specification occurs before or after the language
definition is immaterial. Unification also takes place in situations
where, for example, two types are known to be compatible. Multiple
consistent explicit token definitions are allowed by default when
allowed by the language; this is controlled by the directive:
<programlisting>
#pragma TenDRA compatible token <I>allow</I>
</programlisting>
The default unification behaviour may be modified using the directives:
<programlisting>
#pragma TenDRA no_def <I>token-list</I>
#pragma TenDRA define <I>token-list</I>
#pragma TenDRA reject <I>token-list</I>
</programlisting>
or equivalently:
<programlisting>
#pragma no_def <I>token-list</I>
#pragma define <I>token-list</I>
#pragma ignore <I>token-list</I>
</programlisting>
which set the state of the tokens given in <I>token-list</I>. A state
of <code>no_def</code> means that no unification is attempted and
that any attempt to explicitly define the token results in an error.
A state of <code>define</code> means that unification takes place
and that the token must be defined somewhere in the translation unit.
A state of <code>reject</code> means that unification takes place as
normal, but any resulting token definition is discarded and not output
to the TDF capsule.
</para>
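<para>
The following sketch illustrates unification for a type token and an
expression token. The names <code>handle_t</code> and
<code>MAX_HANDLES</code> are invented, and placing the
<code>define</code> directive before the definitions is a choice made
for the example rather than a documented requirement:
<programlisting>
#pragma token TYPE handle_t #
#pragma token EXP const : int : MAX_HANDLES #

#pragma TenDRA define handle_t MAX_HANDLES

typedef unsigned long handle_t ;    // unified with the TYPE token
#define MAX_HANDLES 64              // unified with the EXP token, which
                                    // lies in the macro namespace
</programlisting>
</para>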
<para>
If a token with the state <code>define</code> is not defined, then the
behaviour depends on the sort of the token. A <code>FUNC</code> token
is implicitly defined in terms of its underlying function, such as:
<programlisting>
#define f( a1, ...., an ) ( f ) ( a1, ...., an )
</programlisting>
Other undefined tokens cause an error. This behaviour can be modified
using the directives:
<programlisting>
#pragma TenDRA++ implicit token definition <I>allow</I>
#pragma TenDRA++ no token definition <I>allow</I>
</programlisting>
respectively.</para>
<para>
The primitive operations, <code>no_def</code>, <code>define</code> and
<code>reject</code>, can also be expressed using the context sensitive
directive:
<programlisting>
#pragma TenDRA interface <I>token-list</I>
</programlisting>
or equivalently:
<programlisting>
#pragma interface <I>token-list</I>
</programlisting>
By default this is equivalent to <code>no_def</code>, but may be modified
by inclusion using one of the directives:
<programlisting>
#pragma TenDRA extend <I>header-name</I>
#pragma TenDRA implement <I>header-name</I>
</programlisting>
or equivalently:
<programlisting>
#pragma extend interface <I>header-name</I>
#pragma implement interface <I>header-name</I>
</programlisting>
These are equivalent to:
<programlisting>
#include <I>header-name</I>
</programlisting>
except that the form <code>[....]</code> is allowed as a header name.
This is equivalent to <code><....></code> except that it starts
the directory search after the point at which the including file was
found, rather than at the start of the path (i.e. it is equivalent
to the
<code>#include_next</code> directive found in some preprocessors).
The effect of the <code>extend</code> directive on the state of the
<code>interface</code> directive is as follows:
<programlisting>
no_def -> no_def
define -> reject
reject -> reject
</programlisting>
The effect of the <code>implement</code> directive is as follows:
<programlisting>
no_def -> define
define -> define
reject -> reject
</programlisting>
That is to say, an <code>implement</code> directive will cause all
the tokens in the given header to be defined and their definitions
output. Any tokens included in this header by <code>extend</code>
may be defined, but their definitions will not be output. This is
precisely the behaviour which is required to ensure that each token
is defined exactly once in an API library build.
</para>
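<para>
As a sketch of how these directives might fit together in a library
build, consider two hypothetical files, <code>api.h</code> describing an
API and <code>impl.cc</code> implementing it; the header
<code>base.h</code> and all token names are likewise invented:
<programlisting>
// api.h
#pragma TenDRA extend "base.h"      // tokens of base.h: may be defined,
                                    // but definitions are not output
#pragma token TYPE handle_t #
#pragma token EXP const : int : MAX_HANDLES #
#pragma TenDRA interface handle_t MAX_HANDLES

// impl.cc
#pragma TenDRA implement "api.h"    // interface tokens of api.h now
                                    // have the state define
typedef unsigned long handle_t ;
#define MAX_HANDLES 64
</programlisting>
When <code>api.h</code> is included normally by client code, the
<code>interface</code> directive leaves its tokens in the default
<code>no_def</code> state.
</para>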
<para>
The lists of tokens in the directives above are expressed in the form:
<programlisting>
<I>token-list</I> :
<I>token-id token-list<SUB>opt</SUB></I>
# <I>preproc-token-list</I>
</programlisting>
where a <I>token-id</I> represents an internal token name:
<programlisting>
<I>token-id</I> :
<I>token-namespace<SUB>opt</SUB> identifier</I>
<I>type-id</I> . <I>identifier</I>
</programlisting>
Note that member tokens are specified by means of both the member
name and its parent type. In this type specifier, <code>TAG</code>,
rather than
<code>class</code>, <code>struct</code> or <code>union</code>, may
be used in elaborated type specifiers for structure and union tokens.
If the
<I>token-id</I> names an overloaded function then the directive is
applied to all <code>FUNC</code> tokens of that name. It is possible
to be more selective using the <code>#</code> form which allows the
external token name to be specified. Such an entry must be the last
in a <I>token-list</I>.
</para>
<para>
A related directive has the form:
<programlisting>
#pragma TenDRA++ undef token <I>token-list</I>
</programlisting>
which undefines all the given tokens so that they are no longer visible.
</para>
<para>
As noted above, a macro is only considered as a token definition if
the token lies in the macro namespace. Tokens which are not in the
macro namespace, such as types and members, cannot be defined using
macros. Occasionally API implementations do define member selectors
as macros in terms of other member selectors. Such a token needs
to be explicitly defined using a directive of the form:
<programlisting>
#pragma TenDRA member definition <I>type-id</I> : <I>identifier member-offset
</I>
</programlisting>
where <I>member-offset</I> is <A HREF="#offset">as above</A>.
</para>
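<para>
For example, an implementation header might provide a member by means of
a macro such as the one shown in the comment below. The structure and
member names are illustrative, modelled on a common implementation
technique, and are not taken from the TenDRA headers:
<programlisting>
// The implementation header contains:
//     #define st_atime st_atim.tv_sec
// which cannot define a MEMBER token. The corresponding explicit
// token definition would be:
#pragma TenDRA member definition struct stat : st_atime st_atim.tv_sec
</programlisting>
</para>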
</sect3>
</sect2>
<sect2>
<title>2.4. Symbol table dump</title>
<para>
The symbol table dump provides a method whereby third party tools
can interface with the C and C++ producers. The producer outputs
information on the identifiers declared within a source file, their
uses etc. into a file which can then be post-processed by a separate
tool. Any error messages and warnings can also be included in this
file, allowing more sophisticated error presentation tools to be written.
</para>
<para>
The file to be used as the symbol table output file, plus details
of what information is to be included in the dump file can be specified
using the <A HREF="man.html#dump"><code>-d</code> command-line option</A>.
The format of the dump file is described below; a
<A HREF="dump1.html">summary of the syntax</A> is given as an annex.
</para>
<sect3 id="lexical-elements">
<title>2.4.1. Lexical elements</title>
<para>
A symbol table dump file consists of a sequence of characters giving
information on identifiers, errors etc. arising from a translation
unit. The fundamental lexical tokens are a <I>number</I>, consisting
of a sequence of decimal digits, and a <I>string</I>, consisting of
a sequence of characters enclosed in angle brackets. A <I>string</I>
can have one of two forms:
<programlisting>
<I>string</I> :
<<I>characters</I>>
&<I>number</I><<I>characters</I>>
</programlisting>
In the first form, the <I>characters</I> are terminated by the first
<code>></code> character encountered. In the second form, the
number of characters is given by the preceding <I>number</I>. No
white space is allowed either before or after the <I>number</I>.
To aid parsers, the C++ producer always uses the second form for strings
containing more than 100 characters. There are no escape characters
in strings; the
<I>characters</I> can contain any characters, including newlines and
<code>#</code>, except that the first form cannot contain a
<code>></code> character.
</para>
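<para>
A tool reading dump files needs to handle both string forms. The
following is a minimal sketch of such a reader, not code taken from the
producer; the function name <code>read_dump_string</code> and its
interface are assumptions made for the example:
<programlisting>
#include <cctype>
#include <cstddef>
#include <string>

// Reads one dump-file string starting at position pos of text.
// On success the characters are stored in out, pos is advanced past
// the closing '>' and true is returned; otherwise pos is unchanged.
bool read_dump_string ( const std::string &text, std::size_t &pos,
                        std::string &out )
{
    std::size_t p = pos ;
    if ( p < text.size () && text [p] == '<' ) {
        // First form: the characters run up to the first '>'
        std::size_t end = text.find ( '>', p + 1 ) ;
        if ( end == std::string::npos ) return false ;
        out = text.substr ( p + 1, end - p - 1 ) ;
        pos = end + 1 ;
        return true ;
    }
    if ( p < text.size () && text [p] == '&' ) {
        // Second form: an explicit character count, with no white
        // space between the count and the opening '<'
        std::size_t len = 0 ;
        std::size_t digits = 0 ;
        for ( p++ ; p < text.size () &&
              std::isdigit ( static_cast < unsigned char > ( text [p] ) ) ; p++ ) {
            len = 10 * len + static_cast < std::size_t > ( text [p] - '0' ) ;
            digits++ ;
        }
        if ( digits == 0 || p >= text.size () || text [p] != '<' ) return false ;
        if ( len > text.size () ) return false ;
        std::size_t start = p + 1 ;
        if ( start + len >= text.size () || text [start + len] != '>' ) return false ;
        out = text.substr ( start, len ) ;  // may contain '>' and '#'
        pos = start + len + 1 ;
        return true ;
    }
    return false ;
}
</programlisting>
Because the counted form may contain <code>></code> and <code>#</code>
characters, strings have to be recognised before comments are stripped.
</para>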
<para>
Space, tab and newline characters are white space. Comments begin
with
<code>#</code> and run to the end of the line. Comments are treated
as white space. All other characters are treated as distinct lexical
tokens.
</para>
</sect3>
<sect3 id="main">
<title>2.4.2. Overall syntax</title>
<para>
A symbol table dump file takes the form of a list of commands of various
kinds conveying information on the analysed file. This can be represented
as follows:
<programlisting>
<I>dump-file</I> :
<I>command-list<SUB>opt</SUB></I>
<I>command-list</I> :
<I>command command-list<SUB>opt</SUB></I>
<I>command</I> :
<I>version-command</I>
<I>identifier-command</I>
<I>scope-command</I>
<I>override-command</I>
<I>base-command</I>
<I>api-command</I>
<I>template-command</I>
<I>promotion-command</I>
<I>error-command</I>
<I>path-command</I>
<I>file-command</I>
<I>include-command</I>
<I>string-command</I>
</programlisting>
The various kinds of command are discussed below. The first command
in the dump file should be of the form:
<programlisting>
<I>version-command</I> :
V <I>number number string</I>
</programlisting>
where the two numbers give the version of the dump file format (the
version described here is 1.1 so both numbers should be 1) and the
string gives the language being represented, for example,
<code><C++></code>.
</para>
</sect3>
<sect3 id="file-locations">
<title>2.4.3. File locations</title>
<para>
A location within a source file can be specified using three
<I>number</I>s and two <I>string</I>s. These give, respectively, the
column number, the line number taking <code>#line</code> directives
into account, the line number not taking <code>#line</code> directives
into account, the file name taking <code>#line</code> directives into
account, and the file name not taking <code>#line</code> directives
into account. Any or all of the trailing elements can be replaced
by
<code>*</code> to indicate that they have not changed relative to
the last <I>location</I> given. Note that for the two line numbers,
unchanged means that the difference of the line numbers, taking
<code>#line</code> directives into account or not, is unchanged.
Thus:
<programlisting>
<I>location</I> :
<I>number number number string string</I>
<I>number number number string</I> *
<I>number number number</I> *
<I>number number</I> *
<I>number</I> *
*
</programlisting>
Note that there is a concept of the <A id="crt_loc">current file
location</A>, relative to which other locations are given. The initial
value of the current file location is undefined. Unless otherwise
stated, all <I>location</I> elements update the current file location.
</para>
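<para>
As an illustration of the abbreviated forms, consider the following
sequence of <I>location</I> elements, shown one per line with comments.
The file name is invented, and in a real dump file each location appears
as part of a command rather than on its own:
<programlisting>
5 10 10 <main.cc> <main.cc>     # column 5, line 10 of main.cc
9 12 12 *                       # column 9, line 12, same file names
3 13 *                          # column 3, line 13, same line difference
17 *                            # column 17, same line and file names
*                               # same location as the previous one
</programlisting>
In the third element only the first line number is given; since the
difference between the two line numbers is unchanged (here zero), the
second line number is also 13.
</para>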
</sect3>
<sect3 id="identifiers">
<title>2.4.4. Identifiers</title>
<para>
Each identifier is represented in the symbol table dump by a unique
number. The same number always represents the same identifier.
</para>
<H4><A id="hashid">Identifier names</A></H4>
<para>
The number representing an identifier is introduced in the first declaration
or use of that identifier and thereafter the number alone is used
to denote the identifier:
<programlisting>
<I>identifier</I> :
<I>number</I> = <I>identifier-name access<SUB>opt</SUB> scope-identifier</I>
<I>number</I>
</programlisting>
</para>
<para>
The identifier name is given by:
<programlisting>
<I>identifier-name</I> :
<I>string</I>
C <I>type</I>
D <I>type</I>
O <I>string</I>
T <I>type</I>
</programlisting>
denoting respectively, a simple identifier name, a constructor for
a type, a destructor for a type, an overloaded operator function name,
and a conversion function name. The empty string is used for anonymous
identifiers.
</para>
<para>
The optional identifier access is given by:
<programlisting>
<I>access</I> :
N
B
P
</programlisting>
denoting <code>public</code>, <code>protected</code> and
<code>private</code> respectively. An absent <I>access</I> is equivalent
to <code>public</code>. Note that all identifiers, not just class
members, can have access specifiers; however the access of a non-member
is always <code>public</code>.
</para>
<para>
The <A HREF="#scope">scope</A> (i.e. class, namespace, block etc.)
in which an identifier is declared is given by:
<programlisting>
<I>scope-identifier</I> :
<I>identifier</I>
*
</programlisting>
denoting either a named or an unnamed scope.
</para>
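<para>
A tool reading the dump has to remember each identifier when its number
is first introduced, so that later bare numbers can be resolved. The
following Python sketch shows one possible way of doing this. It works
on fields which have already been separated, and makes no assumption
about the lexical layout of the dump file. The names
<code>Identifier</code> and <code>IdentifierTable</code> are illustrative
only and are not part of the producer.
</para>

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Access codes from the grammar above; an absent access means public.
ACCESS = {"N": "public", "B": "protected", "P": "private"}


@dataclass
class Identifier:
    number: int
    name: str                 # identifier-name, kept as opaque text here
    access: str               # "public", "protected" or "private"
    scope: Optional[int]      # number of the parent scope, None for "*"


class IdentifierTable:
    """Hypothetical helper: remembers identifiers by their dump number."""

    def __init__(self) -> None:
        self._by_number: Dict[int, Identifier] = {}

    def introduce(self, number, name, access_code, scope):
        # First declaration or use: number = identifier-name access scope
        access = ACCESS[access_code] if access_code else "public"
        ident = Identifier(number, name, access, scope)
        self._by_number[number] = ident
        return ident

    def lookup(self, number):
        # Later references give the number alone.
        return self._by_number[number]
```

<para>
A reader would call <code>introduce</code> for the long form of an
<I>identifier</I> and <code>lookup</code> for the short form.
</para>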
<H4><A id="use">Identifier uses</A></H4>
<para>
Each declaration or use of an identifier is represented by a command
of the form:
<programlisting>
<I>identifier-command</I> :
D <I>identifier-info type-info</I>
M <I>identifier-info type-info</I>
T <I>identifier-info type-info</I>
Q <I>identifier-info</I>
U <I>identifier-info</I>
L <I>identifier-info</I>
C <I>identifier-info</I>
W <I>identifier-info type-info</I>
</programlisting>
where:
<programlisting>
<I>identifier-info</I> :
<I>identifier-key location identifier</I>
</programlisting>
gives the kind of identifier being declared or used, the location
of the declaration or use, and the number associated with the identifier.
Each declaration may, depending on the <I>identifier-key</I>, associate
various <I>type-info</I> with the identifier, giving its type etc.
</para>
<para>
The various kinds of <I>identifier-command</I> are described below.
Any can be preceded by <code>I</code> to indicate an implicit declaration
or use. <code>D</code> denotes a definition. <code>M</code> (make)
denotes a declaration. <code>T</code> denotes a tentative definition
(C only). <code>Q</code> denotes the end of a definition, for those
identifiers such as classes and functions whose definitions may be
spread over several lines. <code>U</code> denotes an undefine operation
(such as <code>#undef</code> for macro identifiers). <code>C</code>
denotes a call to a function identifier; <code>L</code> (load) denotes
other identifier uses. Finally <code>W</code> denotes implicit type
information such as the C producer gleans from its
<A HREF="pragma.html#weak">weak prototype analysis</A>.
</para>
<para>
The various <I>identifier-key</I>s and their associated <I>type-info</I>
fields are given by the following table:
</para>
<table>
<tr><th>Key</th>
<th>Type information</th>
<th>Description</th>
</tr>
<tr><td><code>K</code></td>
<td><code>*</code></td>
<td>keyword</td>
</tr>
<tr><td><code>MO</code></td>
<td><I>sort</I></td>
<td>object macro</td>
</tr>
<tr><td><code>MF</code></td>
<td><I>sort</I></td>
<td>function macro</td>
</tr>
<tr><td><code>MB</code></td>
<td><I>sort</I></td>
<td>built-in macro</td>
</tr>
<tr><td><code>TC</code></td>
<td><I>type</I></td>
<td>class tag</td>
</tr>
<tr><td><code>TS</code></td>
<td><I>type</I></td>
<td>structure tag</td>
</tr>
<tr><td><code>TU</code></td>
<td><I>type</I></td>
<td>union tag</td>
</tr>
<tr><td><code>TE</code></td>
<td><I>type</I></td>
<td>enumeration tag</td>
</tr>
<tr><td><code>TA</code></td>
<td><I>type</I></td>
<td><code>typedef</code> name</td>
</tr>
<tr><td><code>NN</code></td>
<td><code>*</code></td>
<td>namespace name</td>
</tr>
<tr><td><code>NA</code></td>
<td><I>scope-identifier</I></td>
<td>namespace alias</td>
</tr>
<tr><td><code>VA</code></td>
<td><I>type</I></td>
<td>automatic variable</td>
</tr>
<tr><td><code>VP</code></td>
<td><I>type</I></td>
<td>function parameter</td>
</tr>
<tr><td><code>VE</code></td>
<td><I>type</I></td>
<td><code>extern</code> variable</td>
</tr>
<tr><td><code>VS</code></td>
<td><I>type</I></td>
<td><code>static</code> variable</td>
</tr>
<tr><td><code>FE</code></td>
<td><I>type identifier<SUB>opt</SUB></I></td>
<td><code>extern</code> function</td>
</tr>
<tr><td><code>FS</code></td>
<td><I>type identifier<SUB>opt</SUB></I></td>
<td><code>static</code> function</td>
</tr>
<tr><td><code>FB</code></td>
<td><I>type identifier<SUB>opt</SUB></I></td>
<td>built-in operator function</td>
</tr>
<tr><td><code>CF</code></td>
<td><I>type identifier<SUB>opt</SUB></I></td>
<td>member function</td>
</tr>
<tr><td><code>CS</code></td>
<td><I>type identifier<SUB>opt</SUB></I></td>
<td><code>static</code> member function</td>
</tr>
<tr><td><code>CV</code></td>
<td><I>type identifier<SUB>opt</SUB></I></td>
<td>virtual member function</td>
</tr>
<tr><td><code>CM</code></td>
<td><I>type</I></td>
<td>data member</td>
</tr>
<tr><td><code>CD</code></td>
<td><I>type</I></td>
<td><code>static</code> data member</td>
</tr>
<tr><td><code>E</code></td>
<td><I>type</I></td>
<td>enumerator</td>
</tr>
<tr><td><code>L</code></td>
<td><code>*</code></td>
<td>label</td>
</tr>
<tr><td><code>XO</code></td>
<td><I>sort</I></td>
<td>object token</td>
</tr>
<tr><td><code>XF</code></td>
<td><I>sort</I></td>
<td>procedure token</td>
</tr>
<tr><td><code>XP</code></td>
<td><I>sort</I></td>
<td>token parameter</td>
</tr>
<tr><td><code>XT</code></td>
<td><I>sort</I></td>
<td>template parameter</td>
</tr>
</table>
<para>
The function identifier keys can optionally be followed by
<code>C</code> indicating that the function has C linkage, and
<code>I</code> indicating that the function is inline. By default,
functions declared in a C++ dump file have C++ linkage and functions
declared in a C dump file have C linkage. The optional
<I>identifier</I> which forms part of the <I>type-info</I> of these
functions is used to form linked lists of overloaded functions.
</para>
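<para>
As an illustration, the Python sketch below splits a command word into
its implicit marker, command letter, identifier key and function flags.
It assumes that these parts are written adjacently with no separators,
as the <code>QFE</code> command mentioned under file inclusions suggests;
that layout, and the function name <code>decode_command</code>, are
assumptions rather than a statement of the producer's behaviour.
</para>

```python
COMMANDS = {
    "D": "definition", "M": "declaration", "T": "tentative definition",
    "Q": "end of definition", "U": "undefine", "L": "use",
    "C": "call", "W": "implicit type information",
}

# Keys which may be followed by the C (C linkage) and I (inline) markers.
FUNCTION_KEYS = {"FE", "FS", "FB", "CF", "CS", "CV"}
SINGLE_LETTER_KEYS = {"K", "E", "L"}


def decode_command(word):
    """Split a word such as "QFE" into its parts (assumed layout)."""
    implicit = word.startswith("I")
    if implicit:
        word = word[1:]
    if len(word) == 0:
        raise ValueError("missing identifier command")
    command, rest = word[0], word[1:]
    if command not in COMMANDS:
        raise ValueError("unknown identifier command: " + command)
    if rest in SINGLE_LETTER_KEYS:
        key, flags = rest, ""
    else:
        key, flags = rest[:2], rest[2:]
    if flags and key not in FUNCTION_KEYS:
        raise ValueError("linkage flags on non-function key: " + key)
    return {
        "implicit": implicit,
        "command": COMMANDS[command],
        "key": key,
        "c_linkage": "C" in flags,
        "inline": "I" in flags,
    }
```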
<H4><A id="scope">Identifier scopes</A></H4>
<para>
Each identifier belongs to a scope, called its parent scope, in which
it is declared. For example, the parent of a member of a class is
the class itself. This information is expressed in an identifier
declaration using a <I>scope-identifier</I>. In addition to the obvious
scopes such as classes and namespaces, there are other scopes such
as blocks in function definitions. It is possible to introduce dummy
identifiers to name such scopes. The parent of such a dummy identifier
will be the enclosing scope identifier, so these dummy identifiers
naturally represent the block structure. The parent of the top-level
block in a function definition can be considered to be the function
itself.
</para>
<para>
Information on the start and end of such scopes is given by:
<programlisting>
<I>scope-command</I> :
SS <I>scope-key location identifier</I>
SE <I>scope-key location identifier</I>
</programlisting>
where:
<programlisting>
<I>scope-key</I> :
N
S
B
D
H
CT
CF
CC
</programlisting>
gives the kind of scope involved: a namespace, a class, a block, some
other declarative scope, a declaration block (see below), a true conditional
scope, a false conditional scope or a target dependent conditional
scope.
</para>
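<para>
The start and end commands can be replayed with a simple stack. The
Python sketch below does this for triples which have already been
extracted from <code>SS</code> and <code>SE</code> commands, omitting
the locations. It assumes that scopes are strictly nested, which the
text above implies but does not state outright; the function name
<code>max_scope_depth</code> is hypothetical.
</para>

```python
SCOPE_KEYS = {
    "N": "namespace", "S": "class", "B": "block",
    "D": "other declarative scope", "H": "declaration block",
    "CT": "true conditional", "CF": "false conditional",
    "CC": "target dependent conditional",
}


def max_scope_depth(commands):
    """Replay (command, scope_key, identifier) triples; return max depth."""
    stack = []
    deepest = 0
    for command, key, identifier in commands:
        if key not in SCOPE_KEYS:
            raise ValueError("unknown scope key: " + key)
        if command == "SS":
            # Entering a scope: remember which one so SE can be checked.
            stack.append((key, identifier))
            deepest = max(deepest, len(stack))
        elif command == "SE":
            if not stack or stack[-1] != (key, identifier):
                raise ValueError("unbalanced scope end")
            stack.pop()
        else:
            raise ValueError("not a scope command: " + command)
    if stack:
        raise ValueError("scope left open at end of input")
    return deepest
```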
<para>
A declaration block is a sequence of declarations enclosed in directives
of the form:
<programlisting>
#pragma TenDRA declaration block <I>identifier</I> begin
....
#pragma TenDRA declaration block end
</programlisting>
This allows the sequence of declarations to be associated with the
given
<I>identifier</I> in the symbol dump file. This technique is used
in the API description files to aid analysis tools in determining
which declarations are part of the API.
</para>
<H4><A id="other">Other identifier information</A></H4>
<para>
Other information associated with an identifier may be expressed using
other dump commands. For example:
<programlisting>
<I>override-command</I> :
O <I>identifier identifier</I>
</programlisting>
is used to express the fact that the two <I>identifier</I>s are virtual
member functions, the first of which overrides the second.
</para>
<para>
The command:
<programlisting>
<I>base-command</I> :
B <I>identifier-key identifier base-graph</I>
<I>base-graph</I> :
<I>base-class</I>
<I>base-class</I> ( <I>base-list</I> )
<I>base-class</I> :
<I>number</I> = V<I><SUB>opt</SUB> access<SUB>opt</SUB> type-name</I>
<I>number</I> :
<I>base-list</I> :
<I>base-graph base-list<SUB>opt</SUB></I>
</programlisting>
associates a base class graph with a class identifier. Any class
which does not have an associated <I>base-command</I> can be assumed
to have no base classes. Each node in the graph is a <I>type-name</I>
with an associated list of base classes. A <code>V</code> is used
to indicate a virtual base class. Each node is numbered; duplicate
numbers are used to indicate bases identified via the virtual base
class structure. Any base class can then be referred to as:
<programlisting>
<I>base-number</I> :
<I>number</I> : <I>type-name</I>
</programlisting>
indicating the base class with the given number in the given class.
</para>
<para>
The command:
<programlisting>
<I>api-command</I> :
X <I>identifier-key identifier string</I>
</programlisting>
associates the external token name given by the <I>string</I> with
the given tokenised identifier.
</para>
<para>
The command:
<programlisting>
<I>template-command</I> :
Z <I>identifier-key identifier token-application specialise-info</I>
</programlisting>
is used to introduce an identifier corresponding to an instance of
a template, <I>token-application</I>. This instance may correspond
to a specialisation of the primary template; this information is represented
by:
<programlisting>
<I>specialise-info</I> :
<I>identifier</I>
<I>token-application</I>
*
</programlisting>
where <code>*</code> indicates a non-specialised instance.
</para>
</sect3>
<sect3 id="types">
<title>2.4.5. Types</title>
<para>
The <A id="built-in">built-in types</A> are represented in the symbol
table dump as follows:
</para>
<table>
<tr><th>Type</th>
<th>Encoding</th>
<th>Type</th>
<th>Encoding</th>
</tr>
<tr><td>char</td>
<td><code>c</code></td>
<td>float</td>
<td><code>f</code></td>
</tr>
<tr><td>signed char</td>
<td><code>Sc</code></td>
<td>double</td>
<td><code>d</code></td>
</tr>
<tr><td>unsigned char</td>
<td><code>Uc</code></td>
<td>long double</td>
<td><code>r</code></td>
</tr>
<tr><td>signed short</td>
<td><code>s</code></td>
<td>void</td>
<td><code>v</code></td>
</tr>
<tr><td>unsigned short</td>
<td><code>Us</code></td>
<td>(bottom)</td>
<td><code>u</code></td>
</tr>
<tr><td>signed int</td>
<td><code>i</code></td>
<td>bool</td>
<td><code>b</code></td>
</tr>
<tr><td>unsigned int</td>
<td><code>Ui</code></td>
<td>ptrdiff_t</td>
<td><code>y</code></td>
</tr>
<tr><td>signed long</td>
<td><code>l</code></td>
<td>size_t</td>
<td><code>z</code></td>
</tr>
<tr><td>unsigned long</td>
<td><code>Ul</code></td>
<td>wchar_t</td>
<td><code>w</code></td>
</tr>
<tr><td>signed long long</td>
<td><code>x</code></td>
<td>-</td>
<td>-</td>
</tr>
<tr><td>unsigned long long</td>
<td><code>Ux</code></td>
<td>-</td>
<td>-</td>
</tr>
</table>
<para>
Named types (classes, enumeration types etc.) can be represented by
the corresponding identifier or token application:
<programlisting>
<I>type-name</I> :
<I>identifier</I>
<I>token-application</I>
</programlisting>
<A id="composite">Composite and qualified types</A> are represented
in terms of their subtypes as follows:
</para>
<table>
<tr><th>Type</th>
<th>Encoding</th>
</tr>
<tr><td><code>const</code> type</td>
<td><code>C</code> <I>type</I></td>
</tr>
<tr><td><code>volatile</code> type</td>
<td><code>V</code> <I>type</I></td>
</tr>
<tr><td>pointer type</td>
<td><code>P</code> <I>type</I></td>
</tr>
<tr><td>reference type</td>
<td><code>R</code> <I>type</I></td>
</tr>
<tr><td>pointer to member type</td>
<td><code>M</code> <I>type-name</I> <code>:</code> <I>type</I></td>
</tr>
<tr><td>function type</td>
<td><code>F</code> <I>type parameter-types</I></td>
</tr>
<tr><td>array type</td>
<td><code>A</code> <I>nat<SUB>opt</SUB></I> <code>:</code> <I>type</I></td>
</tr>
<tr><td>bitfield type</td>
<td><code>B</code> <I>nat</I> <code>:</code> <I>type</I></td>
</tr>
<tr><td>template type</td>
<td><code>t</code> <I>parameter-list<SUB>opt</SUB></I> <code>:</code> <I>type</I></td>
</tr>
<tr><td>promotion type</td>
<td><code>p</code> <I>type</I></td>
</tr>
<tr><td>arithmetic type</td>
<td><code>a</code> <I>type</I> <code>:</code> <I>type</I></td>
</tr>
<tr><td>integer literal type</td>
<td><code>n</code> <I>lit-base<SUB>opt</SUB> lit-suffix<SUB>opt</SUB></I></td>
</tr>
<tr><td>weak function prototype (C only)</td>
<td><code>W</code> <I>type parameter-types</I></td>
</tr>
<tr><td>weak parameter type (C only)</td>
<td><code>q</code> <I>type</I></td>
</tr>
</table>
<para>
Other types can be represented by their textual representation using
the form <code>Q</code> <I>string</I>, or by <code>*</code>, indicating
an unknown type.
</para>
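<para>
To show how these encodings compose, the Python sketch below describes
a type encoding in words. It handles only the built-in types and the
single-operand forms (<code>C</code>, <code>V</code>, <code>P</code>,
<code>R</code>, <code>p</code> and <code>q</code>), and assumes that the
parts are written adjacently, as in the <code>pUs</code> example given
later in this section. The function name <code>describe_type</code> is
illustrative.
</para>

```python
BUILTIN = {
    "c": "char", "Sc": "signed char", "Uc": "unsigned char",
    "s": "signed short", "Us": "unsigned short",
    "i": "signed int", "Ui": "unsigned int",
    "l": "signed long", "Ul": "unsigned long",
    "x": "signed long long", "Ux": "unsigned long long",
    "f": "float", "d": "double", "r": "long double",
    "v": "void", "u": "(bottom)", "b": "bool",
    "y": "ptrdiff_t", "z": "size_t", "w": "wchar_t",
}

# Only the single-operand forms from the table are handled here.
PREFIX = {
    "C": "const ", "V": "volatile ", "P": "pointer to ",
    "R": "reference to ", "p": "promotion of ", "q": "weak parameter ",
}


def describe_type(code):
    """Describe an encoding such as "PCc" in words."""
    words = ""
    while code and code[0] in PREFIX:
        words += PREFIX[code[0]]
        code = code[1:]
    if code == "*":
        return words + "unknown type"
    if code not in BUILTIN:
        raise ValueError("unsupported type encoding: " + code)
    return words + BUILTIN[code]
```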
<para>
The parameter types for a function type are represented as follows:
<programlisting>
<I>parameter-types</I> :
: <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> :
. <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> :
. <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> .
, <I>type parameter-types</I>
</programlisting>
where the <code>::</code> form indicates that there are no further
parameters, the <code>.:</code> form indicates that the parameters
are terminated by an ellipsis, and the <code>..</code> form indicates
that no information is available on the further parameters (this can
only happen with non-prototyped functions in C). The function qualifiers
are given by:
<programlisting>
<I>func-qualifier</I> :
C <I>func-qualifier<SUB>opt</SUB></I>
V <I>func-qualifier<SUB>opt</SUB></I>
</programlisting>
representing <code>const</code> and <code>volatile</code> member functions.
The function exception specifier is given by:
<programlisting>
<I>exception-spec</I> :
( <I>exception-list<SUB>opt</SUB></I> )
<I>exception-list</I> :
<I>type</I>
<I>type</I> , <I>exception-list</I>
</programlisting>
with an absent exception specifier, as in C++, indicating that any
exception may be thrown.
</para>
<para>
Array and bitfield sizes are represented as follows:
<programlisting>
<I>nat</I> :
+ <I>number</I>
- <I>number</I>
<I>identifier</I>
<I>token-application</I>
<I>string</I>
</programlisting>
where a <I>string</I> is used to hold a textual representation of
complex values.
</para>
<para>
Template types are represented by a list of template parameters, which
will have previously been declared using the <code>XT</code> identifier
key, followed by the underlying type expressed in terms of these parameters.
The parameters are represented as follows:
<programlisting>
<I>parameter-list</I> :
<I>identifier</I>
<I>identifier</I> , <I>parameter-list</I>
</programlisting>
</para>
<para>
Integer literal types are represented by the value of the literal
followed by a representation of the literal base and suffix. These
are given by:
<programlisting>
<I>lit-base</I> :
O
X
</programlisting>
representing octal and hexadecimal literals respectively (decimal
is the default), and:
<programlisting>
<I>lit-suffix</I> :
U
l
Ul
x
Ux
</programlisting>
representing the <code>U</code>, <code>L</code>, <code>UL</code>,
<code>LL</code> and <code>ULL</code> suffixes respectively.
</para>
<para>
Target dependent integral promotion types are represented using
<code>p</code>, so for example the promotion of <code>unsigned short</code>
is represented as <code>pUs</code>. Information on the other cases,
where the promotion type is known, can be given in a command of the
form:
<programlisting>
<I>promotion-command</I> :
P <I>type</I> : <I>type</I>
</programlisting>
Thus the fact that the promotion of <code>short</code> is <code>int</code>
would be expressed by the command <code>Ps:i</code>.
</para>
</sect3>
<sect3 id="sort">
<title>2.4.6. Sorts</title>
<para>
A <I>sort</I> in the symbol table dump corresponds to the sort of
a token declared in the <A HREF="token.html#spec"><code>#pragma token</code>
syntax</A>. Expression tokens are represented as follows:
<programlisting>
<I>expression-sort</I> :
ZEL <I>type</I>
ZER <I>type</I>
ZEC <I>type</I>
ZN
</programlisting>
corresponding to <code>lvalue</code>, <code>rvalue</code> and
<code>const</code> <code>EXP</code> tokens of the given type, and
<code>NAT</code> or <code>INTEGER</code> tokens, respectively. Statement
tokens are represented by:
<programlisting>
<I>statement-sort</I> :
ZS
</programlisting>
</para>
<para>
Type tokens are represented as follows:
<programlisting>
<I>type-sort</I> :
ZTO
ZTI
ZTF
ZTA
ZTP
ZTS
ZTU
</programlisting>
corresponding to <code>TYPE</code>, <code>VARIETY</code>, <code>FLOAT</code>,
<code>ARITHMETIC</code>, <code>SCALAR</code>, <code>STRUCT</code>
or
<code>CLASS</code>, and <code>UNION</code> tokens respectively. There
are corresponding <code>TAG</code> forms:
<programlisting>
<I>tag-type-sort</I> :
ZTTS
ZTTU
</programlisting>
</para>
<para>
Member tokens are represented using:
<programlisting>
<I>member-sort</I> :
ZM <I>type</I> : <I>type-name</I>
</programlisting>
where the first type gives the member type and the second gives the
parent structure or union type.
</para>
<para>
Procedure tokens can be represented using:
<programlisting>
<I>proc-sort</I> :
ZPG <I>parameter-list<SUB>opt</SUB></I> ; <I>parameter-list<SUB>opt</SUB></I> : <I>sort</I>
ZPS <I>parameter-list<SUB>opt</SUB></I> : <I>sort</I>
</programlisting>
The first form corresponds to the more general form of <code>PROC</code>
token, that expressed using <code>{ .... | .... }</code>, which has
separate lists of bound and program parameters. These token parameters
will have previously been declared using the <code>XP</code> identifier
key. The second form corresponds to the case where the bound and
program parameter lists are equal, that expressed as a <code>PROC</code>
token using <code>( .... )</code>. A more specialised version of
this second form is a <code>FUNC</code> token, which is represented
as:
<programlisting>
<I>func-sort</I> :
ZF <I>type</I>
</programlisting>
</para>
<para>
As noted above, template parameters are represented by a <I>sort</I>.
Template type parameters are represented by <code>ZTO</code>, while
template expression parameters are represented by <code>ZEC</code>
(recall that such parameters are always constant expressions). The
remaining case, template template parameters, can be represented as:
<programlisting>
<I>template-sort</I> :
ZTt <I>parameter-list<SUB>opt</SUB></I> :
</programlisting>
</para>
<para>
Finally, the number of parameters in a macro definition is represented
by a <I>sort</I> of the form:
<programlisting>
<I>macro-sort</I> :
ZUO
ZUF <I>number</I>
</programlisting>
corresponding to an object-like macro and a function-like macro with
the given number of parameters, respectively.
</para>
</sect3>
<sect3 id="token-applications">
<title>2.4.7. Token applications</title>
<para>
Given an identifier representing a <code>PROC</code> token or a template,
an application of that token or an instance of that template can be
represented using:
<programlisting>
<I>token-application</I> :
T <I>identifier</I> , <I>token-argument-list</I> :
</programlisting>
where the token or template arguments are given by:
<programlisting>
<I>token-argument-list</I> :
<I>token-argument</I>
<I>token-argument</I> , <I>token-argument-list</I>
</programlisting>
Note that the case where there are no arguments is generally just
represented by <I>identifier</I>; this case is specified separately
in the rest of the grammar.
</para>
<para>
A <I>token-argument</I> can represent a value of any of the sorts
listed above: expressions, integer constants, statements, types, members,
functions and templates. These are given respectively by:
<programlisting>
<I>token-argument</I> :
E <I>expression</I>
N <I>nat</I>
S <I>statement</I>
T <I>type</I>
M <I>member</I>
F <I>identifier</I>
C <I>identifier</I>
</programlisting>
where:
<programlisting>
<I>expression</I> :
<I>nat</I>
<I>statement</I> :
<I>expression</I>
<I>member</I> :
<I>identifier</I>
<I>string</I>
</programlisting>
</para>
</sect3>
<sect3 id="error">
<title>2.4.8. Errors</title>
<para>
Each error in the C++ <A HREF="error.html">error catalogue</A> is
represented by a number. These numbers happen to correspond to the
position of the error within the catalogue, but in general this need
not be the case. The first use of each error introduces the error
number by associating it with a <I>string</I> giving the error name.
This has the form <code>cpp.</code><I>error</I> where <I>error</I>
gives an error name from the C++ (<code>cpp</code>) error catalogue.
Thus:
<programlisting>
<I>error-name</I> :
<I>number</I> = <I>string</I>
<I>number</I>
</programlisting>
</para>
<para>
Each error message written to the symbol table dump has the form:
<programlisting>
<I>error-command</I> :
ES <I>location error-info</I>
EW <I>location error-info</I>
EI <I>location error-info</I>
EF <I>location error-info</I>
EC <I>error-info</I>
EA <I>error-argument</I>
</programlisting>
denoting constraint errors, warnings, internal errors, fatal errors,
continuation errors and error arguments respectively. Note that an
error message may consist of several components; the initial error
plus a number of continuation errors. Each error message may also
have a number of error arguments associated with it. This error information
is given by:
<programlisting>
<I>error-info</I> :
<I>error-name number number</I>
</programlisting>
where the first <I>number</I> gives the number of error arguments
which should be read, and the second is nonzero to indicate that a
continuation error should be read.
</para>
<para>
Each error argument has one of the forms:
<programlisting>
<I>error-argument</I> :
B <I>base-number</I>
C <I>scope-identifier</I>
E <I>expression</I>
H <I>identifier-name</I>
I <I>identifier</I>
L <I>location</I>
N <I>nat</I>
S <I>string</I>
T <I>type</I>
V <I>number</I>
V - <I>number</I>
</programlisting>
corresponding to the various syntactic categories described above.
Note that a <I>location</I> error argument, while expressed relative
to the
<A HREF="#crt_loc">current file location</A>, does not change this
location.
</para>
</sect3>
<sect3 id="file">
<title>2.4.9. File inclusions</title>
<para>
It is possible to include information on header files within the symbol
table dump. Firstly a number is associated with each directory on
the <code>#include</code> search path:
<programlisting>
<I>path-command</I> :
FD <I>number</I> = <I>string string<SUB>opt</SUB></I>
</programlisting>
The first <I>string</I> gives the directory pathname; the second,
if present, gives the associated directory name as specified in the
<A HREF="man.html#directory"><code>-N</code> command-line option</A>.
</para>
<para>
Now the start and end of each file are marked using:
<programlisting>
<I>file-command</I> :
FS <I>location directory</I>
FE <I>location</I>
</programlisting>
where <I>directory</I> gives the number of the directory in the search
path where the file was found, or <code>*</code> if the file was found
by other means. It is worth noting that if, for example, a function
definition is the last item in a file, the <code>FE</code> command
will appear in the symbol table dump before the <code>QFE</code> command
for the end of the function definition. This is because lexical analysis,
where the end of file is detected, takes place before parsing, where
the end of function is detected.
</para>
<para>
A <code>#include</code> directive, whether explicit or implicit, can
be represented using:
<programlisting>
<I>include-command</I> :
FIA <I>location string</I>
FIQ <I>location string</I>
FIN <I>location string</I>
FIS <I>location string</I>
FIE <I>location string</I>
FIR <I>location</I>
</programlisting>
the first three corresponding to header names of the forms
<code>&lt;....&gt;</code>, <code>"...."</code> and <code>[....]</code>
respectively, the next two corresponding to
<A HREF="man.html#start-up">start-up</A>
and <A HREF="man.html#end-up">end-up</A> files, and the final form
being used to resume the original file after the <code>#include</code>
directive has been processed.
</para>
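<para>
An analysis tool might summarise the inclusions recorded in a dump along
the following lines. This Python sketch takes pairs of command name and
header name which have already been extracted from the dump;
<code>summarise_includes</code> and the descriptive kind names are
illustrative, not part of the dump format.
</para>

```python
INCLUDE_KINDS = {
    "FIA": "angle-bracket header",
    "FIQ": "quoted header",
    "FIN": "square-bracket header",
    "FIS": "start-up file",
    "FIE": "end-up file",
    "FIR": "resume including file",
}


def summarise_includes(commands):
    """Count header inclusions by kind from (command, name) pairs.

    FIR carries no header name, so its pair is expected to hold None.
    """
    counts = {}
    for command, name in commands:
        kind = INCLUDE_KINDS[command]
        if command == "FIR":
            continue  # marks the return to the including file
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```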
</sect3>
<sect3 id="string-literals">
<title>2.4.10. String literals</title>
<para>
It is possible to dump information on string literals to the symbol
table dump file using the commands:
<programlisting>
<I>string-command</I> :
A <I>location string</I>
AC <I>location string</I>
AL <I>location string</I>
ACL <I>location string</I>
</programlisting>
representing string literals, character literals, wide string literals
and wide character literals respectively. The given <I>string</I>
gives the string text.
</para>
</sect3>
</sect2>
<sect2>
<title>2.5. Intermodule analysis</title>
<para>
<IMG SRC="../images/warn.gif" ALT="warning"/>
The C++ spec linking routines have not yet been completely implemented,
and so are disabled in the current version of the C++ producer.
</para>
<para>
A C++ spec file is a dump of the C++ producer's <A HREF="alg.html">internal
representation</A> of a translation unit. Such files can be written
to, and read from, disk to perform such operations as intermodule
analysis.
</para>
<para>
Note that the format of a C++ spec file is specific to the C++ producer
and may change between releases to reflect modifications in the internal
type system. The C producer has a similar dump format, called a C
spec file; however, the two are incompatible. If intermodule analysis
between C and C++ source files is required then the <A HREF="dump.html">symbol
table dump</A> format should be used.
</para>
</sect2>
<sect2>
<title>2.6. Implementation details</title>
<para>
This section describes various implementation details of the
C++ producer TDF output. In particular it describes the standard
TDF tokens used to represent the target dependent aspects of the language
and to provide links into the run-time system. Many of these tokens
are common to the C and C++ producers. Those which are unique to
the C++ producer have names of the form <code>~cpp.*</code>. Note
that the description is in terms of TDF tokens, not the internal tokens
introduced by the
<A HREF="token.html"><code>#pragma token</code> syntax</A>.
</para>
<para>
There are two levels of implementation in the run-time system. The
actual interface between the producer and the run-time system is given
by the standard tokens. The provided implementation defines these
tokens in a way appropriate to itself. An alternative implementation
would have to define the tokens differently. It is intended that
the standard tokens are sufficiently generic to allow a variety of
implementations to hook into the producer output in the manner they
require.
</para>
<sect3 id="arith">
<title>2.6.1. Arithmetic types</title>
<para>
The representations of the basic arithmetic types are target dependent,
so, for example, an <code>int</code> may contain 16, 32, 64 or some
other number of bits. Thus it is necessary to introduce a token to
stand for each of the built-in arithmetic types (including the
<A HREF="pragma.html#longlong"><code>long long</code> types</A>).
Each integral type is represented by a <code>VARIETY</code> token
as follows: </para>
<table>
<tr><th>Type</th>
<th>Token</th>
<th>Encoding</th>
</tr>
<tr><td>char</td>
<td>~char</td>
<td>0</td>
</tr>
<tr><td>signed char</td>
<td>~signed_char</td>
<td>0 | 4 = 4</td>
</tr>
<tr><td>unsigned char</td>
<td>~unsigned_char</td>
<td>0 | 8 = 8</td>
</tr>
<tr><td>signed short</td>
<td>~signed_short</td>
<td>1 | 4 = 5</td>
</tr>
<tr><td>unsigned short</td>
<td>~unsigned_short</td>
<td>1 | 8 = 9</td>
</tr>
<tr><td>signed int</td>
<td>~signed_int</td>
<td>2 | 4 = 6</td>
</tr>
<tr><td>unsigned int</td>
<td>~unsigned_int</td>
<td>2 | 8 = 10</td>
</tr>
<tr><td>signed long</td>
<td>~signed_long</td>
<td>3 | 4 = 7</td>
</tr>
<tr><td>unsigned long</td>
<td>~unsigned_long</td>
<td>3 | 8 = 11</td>
</tr>
<tr><td>signed long long</td>
<td>~signed_longlong</td>
<td>3 | 4 | 16 = 23 </td>
</tr>
<tr><td>unsigned long long</td>
<td>~unsigned_longlong</td>
<td>3 | 8 | 16 = 27</td>
</tr>
</table>
<para>
Similarly, each floating point type is represented by a
<code>FLOATING_VARIETY</code> token:
</para>
<table>
<tr><th>Type</th> <th>Token</th>
</tr>
<tr><td>float</td> <td>~float</td>
</tr>
<tr><td>double</td> <td>~double</td>
</tr>
<tr><td>long double</td> <td>~long_double</td>
</tr>
</table>
<para>
Each integral type also has an encoding as a <code>SIGNED_NAT</code>
as shown above. This number is a bit pattern built up from the following
values:
</para>
<table>
<tr><th>Type</th> <th>Encoding</th>
</tr>
<tr><td>char</td> <td>0</td>
</tr>
<tr><td>short</td> <td>1</td>
</tr>
<tr><td>int</td> <td>2</td>
</tr>
<tr><td>long</td> <td>3</td>
</tr>
<tr><td>signed</td> <td>4</td>
</tr>
<tr><td>unsigned</td> <td>8</td>
</tr>
<tr><td>long long</td> <td>16</td>
</tr>
</table>
<para>
Any target dependent integral type can be represented by a
<code>SIGNED_NAT</code> token using this encoding. This representation,
rather than one based on <code>VARIETY</code>s, is used for ease of
manipulation. The token:
<programlisting>
~convert : ( SIGNED_NAT ) -> VARIETY
</programlisting>
gives the mapping from the integral encoding to the representing variety.
For example, it will map <code>6</code> to <code>~signed_int</code>.
</para>
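<para>
The bit pattern can be checked with a few lines of Python. The sketch
below rebuilds the encodings in the first table from the component
values in the second; the names <code>encode</code> and
<code>describe</code> are illustrative and do not correspond to any
producer routine.
</para>

```python
# Component values from the table above.
CHAR, SHORT, INT, LONG = 0, 1, 2, 3
SIGNED, UNSIGNED, LONG_LONG = 4, 8, 16

BASE_NAMES = ["char", "short", "int", "long"]


def encode(base, sign=0, long_long=False):
    """Combine the component values into an integral type encoding."""
    code = base | sign
    if long_long:
        code |= LONG_LONG
    return code


def describe(code):
    """Turn an encoding back into a type name, using arithmetic only."""
    is_long_long = code // LONG_LONG % 2 == 1
    sign_bits = code // 4 % 4 * 4       # 0, SIGNED or UNSIGNED
    base = code % 4
    name = "long long" if is_long_long else BASE_NAMES[base]
    if sign_bits == SIGNED:
        return "signed " + name
    if sign_bits == UNSIGNED:
        return "unsigned " + name
    return name
```

<para>
For instance, <code>encode(INT, SIGNED)</code> gives <code>6</code>,
the value which <code>~convert</code> maps to <code>~signed_int</code>.
</para>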
<para>
The token:
<programlisting>
~promote : ( SIGNED_NAT ) -> SIGNED_NAT
</programlisting>
describes how to form the promotion of an integral type according
to the ISO C/C++ value preserving rules, and is used by the producer
to represent target dependent promotion types. For example, the promotion
of <code>unsigned short</code> may be <code>int</code> or <code>unsigned
int</code> depending on the representation of these types; that is
to say, <code>~promote ( 9 )</code> will be <code>6</code> on some
machines and <code>10</code> on others. Although <code>~promote</code>
is used by default, a program may specify another token with the same
sort signature to be used in its place by means of the directive:
<programlisting>
#pragma TenDRA compute promote <I>identifier</I>
</programlisting>
For example, a standard token <code>~sign_promote</code> is defined
which gives the older C sign preserving promotion rules. In addition,
the promotion of an individual type can be specified using:
<programlisting>
#pragma TenDRA promoted <I>type-id</I> : <I>promoted-type-id</I>
</programlisting>
</para>
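<para>
The effect of the two promotion rules can be modelled as follows. This
Python sketch assumes that the widths of the types are supplied by the
caller and that the types have no padding bits; it is a model of the
ISO rules on the integral encodings, not the definition of
<code>~promote</code> or <code>~sign_promote</code> used by any
particular installer.
</para>

```python
SIGNED_INT, UNSIGNED_INT = 6, 10


def _is_unsigned(code, char_is_signed):
    if code == 0:                      # plain char: sign is target dependent
        return not char_is_signed
    return code // 8 % 2 == 1          # the "unsigned" component is 8


def promote(code, bits, char_is_signed=True):
    """Value-preserving promotion of an encoded integral type.

    bits maps "char", "short" and "int" to assumed widths in bits.
    """
    base = code % 4                    # 0 char, 1 short, 2 int, 3 long
    if base >= 2:
        return code                    # int and wider types are unchanged
    width = bits["char"] if base == 0 else bits["short"]
    if _is_unsigned(code, char_is_signed) and width >= bits["int"]:
        return UNSIGNED_INT            # int cannot hold every value
    return SIGNED_INT


def sign_promote(code, char_is_signed=True):
    """Older sign-preserving rule: unsigned narrow types stay unsigned."""
    if code % 4 >= 2:
        return code
    return UNSIGNED_INT if _is_unsigned(code, char_is_signed) else SIGNED_INT
```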
<para>
The token:
<programlisting>
~arith_type : ( SIGNED_NAT, SIGNED_NAT ) -> SIGNED_NAT
</programlisting>
similarly describes how to form the usual arithmetic result type from
two promoted integral operand types. For example, the arithmetic
type of <code>long</code> and <code>unsigned int</code> may be
<code>long</code> or <code>unsigned long</code> depending on the representation
of these types; that is to say,
<code>~arith_type ( 7, 10 )</code> will be <code>7</code> on some
machines and <code>11</code> on others.
</para>
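<para>
The same kind of model can be written for the arithmetic type. The
Python sketch below covers only the promoted <code>int</code> and
<code>long</code> encodings (<code>6</code>, <code>7</code>,
<code>10</code> and <code>11</code>), takes the type widths as
parameters, and assumes no padding bits. The function name mirrors the
token for readability only.
</para>

```python
SIGNED_INT, SIGNED_LONG = 6, 7
UNSIGNED_INT, UNSIGNED_LONG = 10, 11


def arith_type(a, b, int_bits, long_bits):
    """Usual arithmetic result type for two promoted operand encodings."""
    allowed = (SIGNED_INT, SIGNED_LONG, UNSIGNED_INT, UNSIGNED_LONG)
    if a not in allowed or b not in allowed:
        raise ValueError("operands must be promoted int or long types")
    pair = {a, b}
    if UNSIGNED_LONG in pair:
        return UNSIGNED_LONG
    if pair == {SIGNED_LONG, UNSIGNED_INT}:
        # long wins only if it can hold every unsigned int value.
        return SIGNED_LONG if long_bits > int_bits else UNSIGNED_LONG
    if SIGNED_LONG in pair:
        return SIGNED_LONG
    if UNSIGNED_INT in pair:
        return UNSIGNED_INT
    return SIGNED_INT
```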
<para>
Any tokenised type declared using:
<programlisting>
#pragma token VARIETY v # tv
</programlisting>
will be represented by a <code>SIGNED_NAT</code> token with external
name
<code>tv</code> corresponding to the encoding of <code>v</code>.
Special cases of this are the implementation dependent integral types
which arise naturally within the language. The external token names
for these types are given below:
</para>
<table>
<tr><th>Type</th> <th>Token</th>
</tr>
<tr><td>bool</td> <td>~cpp.bool</td>
</tr>
<tr><td>ptrdiff_t</td> <td>ptrdiff_t</td>
</tr>
<tr><td>size_t</td> <td>size_t</td>
</tr>
<tr><td>wchar_t</td> <td>wchar_t</td>
</tr>
</table>
<para>
So, for example, a <code>sizeof</code> expression has shape
<code>~convert ( size_t )</code>. The token <code>~cpp.bool</code>
is defined in the default implementation, but the other tokens are
defined according to their definitions on the target machine in the
normal API library building mechanism.
</para>
</sect3>
<sect3 id="literal">
<title>2.6.2. Integer literal types</title>
<para>
The <A HREF="pragma.html#int">type of an integer literal</A> is determined
from an ordered list of candidate integral types: the first type in
the list in which the literal value can be represented gives the type
of the literal. For small literals it is possible to work out the
type exactly; however, for larger literals the result is target dependent.
For example, the literal <code>50000</code> will have type <code>int</code>
on machines in which <code>50000</code> fits into an <code>int</code>,
and
<code>long</code> otherwise. This target dependent mapping is given
by a series of tokens of the form:
<programlisting>
~lit_* : ( SIGNED_NAT ) -> SIGNED_NAT
</programlisting>
which map a literal value to the representation of an integral type.
The token used depends on the list of possible types, which in turn
depends on the base used to represent the literal and the integer
suffix used, as given in the following table:
</para>
<table>
<tr><th>Base</th>
<th>Suffix</th>
<th>Token</th>
<th>Types</th>
</tr>
<tr><td>decimal</td>
<td>none</td>
<td>~lit_int</td>
<td>int, long, unsigned long</td>
</tr>
<tr><td>octal</td>
<td>none</td>
<td>~lit_hex</td>
<td>int, unsigned int, long, unsigned long</td>
</tr>
<tr><td>hexadecimal</td>
<td>none</td>
<td>~lit_hex</td>
<td>int, unsigned int, long, unsigned long</td>
</tr>
<tr><td>any</td>
<td>U</td>
<td>~lit_unsigned</td>
<td>unsigned int, unsigned long</td>
</tr>
<tr><td>any</td>
<td>L</td>
<td>~lit_long</td>
<td>long, unsigned long</td>
</tr>
<tr><td>any</td>
<td>UL</td>
<td>~lit_ulong</td>
<td>unsigned long</td>
</tr>
<tr><td>any</td>
<td>LL</td>
<td>~lit_longlong</td>
<td>long long, unsigned long long</td>
</tr>
<tr><td>any</td>
<td>ULL</td>
<td>~lit_ulonglong</td>
<td>unsigned long long</td>
</tr>
</table>
<para>
Thus, for example, the shape of the integer literal 50000 is:
<programlisting>
~convert ( ~lit_int ( 50000 ) )
</programlisting>
</para>
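<para>
The selection rule can be modelled in a few lines. This is only a sketch
under stated assumptions: the helper <code>fits</code> and the functions
<code>lit_int</code> and <code>lit_hex</code> are hypothetical names chosen
to echo the tokens, they return a type name rather than the
<code>SIGNED_NAT</code> encoding used in TDF, and literals too large for
the last candidate type are not modelled.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical helper: true if value is representable in the integral type T.
template <typename T>
bool fits(unsigned long long value) {
    return value <=
           static_cast<unsigned long long>(std::numeric_limits<T>::max());
}

// Model of the list for a decimal literal with no suffix:
// int, long, unsigned long.
std::string lit_int(unsigned long long value) {
    if (fits<int>(value)) return "int";
    if (fits<long>(value)) return "long";
    return "unsigned long";  // last candidate; overflow is not modelled
}

// Model of the list for an octal or hexadecimal literal with no suffix:
// int, unsigned int, long, unsigned long.
std::string lit_hex(unsigned long long value) {
    if (fits<int>(value)) return "int";
    if (fits<unsigned int>(value)) return "unsigned int";
    if (fits<long>(value)) return "long";
    return "unsigned long";  // last candidate; overflow is not modelled
}
```
]]></programlisting>
<para>
On a machine where <code>int</code> cannot hold 50000 the first function
reports <code>long</code>, matching the example given earlier in this
section.
</para>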
</sect3>
<sect3 id="bitfield">
<title>2.6.3. Bitfield types</title>
<para>
The sign of a plain bitfield type, declared without using
<code>signed</code> or <code>unsigned</code>, is left unspecified
in C and C++. The token:
<programlisting>
~cpp.bitf_sign : ( SIGNED_NAT ) -> BOOL
</programlisting>
is used to give a mapping from integral types to the sign of a plain
bitfield of that type, in a form suitable for use in the TDF
<code>bfvar_bits</code> construct. (Note that <code>~cpp.bitf_sign</code>
should have been a standard C token but was omitted.)
</para>
</sect3>
<sect3 id="pointer">
<title>2.6.4. Generic pointers</title>
<para>
TDF has no concept of a generic pointer type, so tokens are used to
defer the representation of <code>void *</code> and the basic operations
on it to the target machine. The fundamental token is:
<programlisting>
~ptr_void : () -> SHAPE
</programlisting>
which gives the representation of <code>void *</code>. This shape
will be denoted by <code>pv</code> in the description of the following
tokens. It is not guaranteed that <code>pv</code> is a TDF <code>pointer</code>
shape, although normally it will be implemented as a pointer to a
suitable alignment.
</para>
<para>
The token:
<programlisting>
~null_pv : () -> EXP pv
</programlisting>
gives the value of a null pointer of type <code>void *</code>. Generic
pointers can also be converted to and from other pointers. These
conversions are represented by the tokens:
<programlisting>
~to_ptr_void : ( ALIGNMENT a, EXP POINTER a ) -> EXP pv
~from_ptr_void : ( ALIGNMENT a, EXP pv ) -> EXP POINTER a
</programlisting>
where the given alignment describes the destination or source pointer
type. Finally a generic pointer may be tested against the null pointer
or two generic pointers may be compared. These operations are represented
by the tokens:
<programlisting>
~pv_test : ( EXP pv, LABEL, NTEST ) -> EXP TOP
~cpp.pv_compare : ( EXP pv, EXP pv, LABEL, NTEST ) -> EXP TOP
</programlisting>
where the given <code>NTEST</code> gives the comparison to be applied
and the given label gives the destination to jump to if the test fails.
(Note that <code>~cpp.pv_compare</code> should have been a standard
C token but was omitted.)
</para>
</sect3>
<sect3 id="undefined-conversions">
<title>2.6.5. Undefined conversions</title>
<para>
Several conversions in C and C++ can only be represented by undefined
TDF. For example, converting a pointer to an integer can only be
represented in TDF by forming a union of the pointer and integer shapes,
putting the pointer into the union and pulling the integer out. Such
conversions are tokenised. Undefined conversions not mentioned below
may be performed by combining those given with the standard, well-defined,
conversions.
</para>
<para>
The token:
<programlisting>
~ptr_to_ptr : ( ALIGNMENT a, ALIGNMENT b, EXP POINTER a ) -> EXP POINTER b
</programlisting>
is used to convert between two incompatible pointer types. The first
alignment describes the source pointer shape while the second describes
the destination pointer shape. Note that if the destination alignment
is less than the source alignment then the source pointer can be
used in most TDF constructs in place of the destination pointer, so
the use of <code>~ptr_to_ptr</code> can be omitted (the exception
is <code>pointer_test</code>, which requires equal alignments). Base
class pointer conversions are examples of these well-behaved, alignment
preserving conversions.
</para>
<para>
The tokens:
<programlisting>
~f_to_pv : ( EXP PROC ) -> EXP pv
~pv_to_f : ( EXP pv ) -> EXP PROC
</programlisting>
are used to convert pointers to functions to and from <code>void *</code>
(these conversions are not allowed in ISO C/C++ but are in older dialects).
</para>
<para>
The tokens:
<programlisting>
~i_to_p : ( VARIETY v, ALIGNMENT a, EXP INTEGER v ) -> EXP POINTER a
~p_to_i : ( ALIGNMENT a, VARIETY v, EXP POINTER a ) -> EXP INTEGER v
~i_to_pv : ( VARIETY v, EXP INTEGER v ) -> EXP pv
~pv_to_i : ( VARIETY v, EXP pv ) -> EXP INTEGER v
</programlisting>
are used to convert integers to and from <code>void *</code> and other
pointers.
</para>
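<para>
The union-style conversion described at the start of this section can be
imitated in C++ by copying the bytes of one representation into the other.
The sketch below is illustrative only: the function names merely echo the
tokens, and it assumes that <code>std::uintptr_t</code> exists and is at
least as wide as a pointer on the target.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(std::uintptr_t) >= sizeof(const void *),
              "integer type too small to hold a pointer");

// Pointer to integer: copy the pointer's bytes into an integer, much like
// writing one member of a union and reading the other.
std::uintptr_t p_to_i(const void *p) {
    std::uintptr_t i = 0;
    std::memcpy(&i, &p, sizeof p);
    return i;
}

// Integer to pointer: the reverse byte copy.
const void *i_to_p(std::uintptr_t i) {
    const void *p = 0;
    std::memcpy(&p, &i, sizeof p);
    return p;
}
```
]]></programlisting>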
</sect3>
<sect3 id="div">
<title>2.6.6. Integer division</title>
<para>
The precise form of the integer division and remainder operations
in C and C++ is left unspecified with respect to the sign of the result
if either operand is negative. The tokens:
<programlisting>
~div : ( EXP INTEGER v, EXP INTEGER v ) -> EXP INTEGER v
~rem : ( EXP INTEGER v, EXP INTEGER v ) -> EXP INTEGER v
</programlisting>
are used to represent integer division and remainder. They will map
onto one of the pairs of TDF constructs, <code>div0</code> and <code>rem0</code>,
<code>div1</code> and <code>rem1</code> or <code>div2</code> and
<code>rem2</code>.
</para>
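<para>
The difference between the possible conventions only shows when an operand
is negative. The sketch below contrasts two common conventions, rounding
the quotient towards zero and rounding it towards minus infinity. The
function names are hypothetical and no claim is made here about which TDF
construct corresponds to which convention. Note that C++11 and later fix
the built-in operators to the first convention, whereas the language
dialects described in this section leave the choice open.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>

// Quotient rounded towards zero; the remainder takes the sign of the
// dividend. C++11 guarantees this behaviour for the built-in operators.
int div_trunc(int a, int b) { return a / b; }
int rem_trunc(int a, int b) { return a % b; }

// Quotient rounded towards minus infinity; the remainder takes the sign
// of the divisor.
int div_floor(int a, int b) {
    int q = a / b;
    // Adjust when the division was inexact and the operands differ in sign.
    if ((a % b != 0) && ((a < 0) != (b < 0))) --q;
    return q;
}
int rem_floor(int a, int b) { return a - b * div_floor(a, b); }
```
]]></programlisting>
<para>
In both conventions the identity <code>a == b * q + r</code> holds; only
the choice of <code>q</code> and <code>r</code> differs.
</para>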
</sect3>
<sect3 id="call">
<title>2.6.7. Calling conventions</title>
<para>
The function calling conventions used by the C++ producer are essentially
the same as those used by the C producer with one exception. That
is to say, all types except arrays are passed by value (note that
individual installers may modify these conventions to conform to their
own ABIs).
</para>
<para>
The exception concerns classes with a non-trivial constructor, destructor
or assignment operator. These classes are passed as function arguments
by taking a reference to a copy of the object (although it is often
possible to eliminate the copy and pass a reference to the object
directly). They are passed as function return values by adding an
extra parameter to the start of the function parameters giving a reference
to a location into which the return value should be copied.
</para>
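<para>
The effect of this convention can be pictured by rewriting a function by
hand. The following is a sketch of the idea only, not the producer's actual
output: <code>Tracked</code>, <code>twice_lowered</code>, <code>Slot</code>
and <code>call_twice_lowered</code> are hypothetical names, and the
"lowered" form is ordinary C++ standing in for the generated TDF.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <new>

// A class with a non-trivial copy constructor and destructor.
struct Tracked {
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked &other) : value(other.value) {}
    ~Tracked() {}
};

// The function as written in the source program.
Tracked twice(Tracked t) { return Tracked(t.value * 2); }

// Hand-lowered equivalent: the return location comes first, and the
// argument arrives as a reference to a copy made by the caller.
void twice_lowered(Tracked *ret, Tracked &t) {
    new (ret) Tracked(t.value * 2);  // construct the result in place
}

// Uninitialised storage for the return value.
union Slot {
    Tracked object;
    Slot() {}
    ~Slot() {}
};

// What a call site might look like after lowering.
int call_twice_lowered(const Tracked &arg) {
    Tracked copy(arg);                  // the caller makes the copy
    Slot slot;                          // the caller provides the result space
    twice_lowered(&slot.object, copy);
    int result = slot.object.value;
    slot.object.~Tracked();             // the caller destroys the result
    return result;
}
```
]]></programlisting>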
<H4>Member functions</H4>
<para>
Non-static member functions are implemented in the obvious fashion,
by passing a pointer to the object the method is being applied to
as the first argument (or the second argument if the method has an
extra argument for its return value).
</para>
<H4><A id="ellipsis">Ellipsis functions</A></H4>
<para>
Calls to functions declared with ellipses are via the
<code>apply_proc</code> TDF construct, with all the arguments being
treated as non-variable. However the definition of such a function
uses the <code>make_proc</code> construct with a variable parameter.
This parameter can be referred to within the program using the
<A HREF="pragma.html#ellipsis"><code>...</code> expression</A>. The
type of this expression is given by the built-in token:
<programlisting>
~__va_t : () -> SHAPE
</programlisting>
The <code>va_start</code> macro declared in the
<code>&lt;stdarg.h&gt;</code> header then describes how the variable
parameter (expressed as <code>...</code>) can be converted to an expression
of type <code>va_list</code> suitable for use in the
<code>va_arg</code> macro.
</para>
<para>
Note that the variable parameter is in effect only being used to determine
where the first optional parameter is defined. The assumption is
that all such parameters are located contiguously on the stack, however
the fact that calls to such functions do not use the variable parameter
mechanism means that this is not automatically the case. Strictly
speaking this means that the implementation of ellipsis functions
uses undefined behaviour in TDF, however given the non-type-safe function
calling rules in C this is unavoidable and installers need to make
provision for such calls (by dumping any parameters from registers
to the stack if necessary). Given the theoretically type-safe nature
of C++ it would be possible to avoid such undefined behaviour, but
the need for C-compatible calling conventions prevents this.
</para>
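<para>
For reference, a typical ellipsis function at the source level looks like
the sketch below. The function <code>sum_ints</code> is an illustrative
example and not part of the producer; it relies only on the standard
<code>va_start</code>, <code>va_arg</code> and <code>va_end</code> macros,
and on the caller passing exactly <code>count</code> arguments of type
<code>int</code>.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstdarg>

// Adds up count optional arguments, each of which must be an int.
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);  // locate the first optional argument
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);  // step through the optional arguments
    }
    va_end(args);
    return total;
}
```
]]></programlisting>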
</sect3>
<sect3 id="ptr_mem">
<title>2.6.8. Pointers to data members</title>
<para>
The representation of, and operations on, pointers to data members
are represented by tokens to allow for a variety of implementations.
It is assumed that all pointers to data members (as opposed to pointers
to function members) are represented by the same shape:
<programlisting>
~cpp.pm.type : () -> SHAPE
</programlisting>
This shape will be denoted by <code>pm</code> in the description of
the following tokens.
</para>
<para>
There are two basic methods of constructing a pointer to a data member.
The first is to take the address of a data member of a class. A data
member is represented in TDF by an expression which gives the offset
of the member from the start of its enclosing <code>compound</code>
shape (note that it is not possible to take the address of a member
of a virtual base). The mapping from this offset to a pointer to a
data member is given by:
<programlisting>
~cpp.pm.make : ( EXP OFFSET ) -> EXP pm
</programlisting>
The second way of constructing a pointer to a data member is to use
a null pointer to member:
<programlisting>
~cpp.pm.null : () -> EXP pm
</programlisting>
The other fundamental operation on a pointer to data member is to
turn it back into an offset expression which can be added to a pointer
to a class to access a member of that class in a <code>.*</code> or
<code>->*</code>
operation. This is done by the token:
<programlisting>
~cpp.pm.offset : ( EXP pm, ALIGNMENT a ) -> EXP OFFSET ( a, a )
</programlisting>
Note that it is necessary to specify an alignment in order to describe
the shape of the result. The value of this token is undefined if
the given expression is a null pointer to data member.
</para>
<para>
A pointer to a data member of a non-virtual base class can be converted
to a pointer to a data member of a derived class. The reverse conversion
is also possible using <code>static_cast</code>. If the base is a
<A HREF="#primary">primary base class</A> then these conversions are
trivial and have no effect. Otherwise null pointers to data members
are converted to null pointers to data members, and the non-null cases
are handled by the tokens:
<programlisting>
~cpp.pm.cast : ( EXP pm, EXP OFFSET ) -> EXP pm
~cpp.pm.uncast : ( EXP pm, EXP OFFSET ) -> EXP pm
</programlisting>
where the given offset is the offset of the base class within the
derived class. It is also possible to convert between any two pointers
to data members using <code>reinterpret_cast</code>. This conversion
is implied by the equality of representation between any two pointers
to data members and has no effect.
</para>
<para>
The only remaining operations on pointers to data members are to test
one against the null pointer to data member and to compare two pointers
to data members. These are represented by the tokens:
<programlisting>
~cpp.pm.test : ( EXP pm, LABEL, NTEST ) -> EXP TOP
~cpp.pm.compare : ( EXP pm, EXP pm, LABEL, NTEST ) -> EXP TOP
</programlisting>
where the given <code>NTEST</code> gives the comparison to be applied
and the given label gives the destination to jump to if the test fails.
</para>
<para>
In the default implementation, pointers to data members are implemented
as <code>int</code>. The null pointer to member is represented by
0 and the address of a class member is represented by 1 plus the offset
of the member (in bytes). Casting to and from a derived class then
corresponds to adding or subtracting the base class offset (in bytes),
and pointer to member comparisons correspond to integer comparisons.
</para>
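<para>
The default representation just described can be mimicked directly in C++.
The sketch below is an illustration under stated assumptions: the functions
are hypothetical stand-ins named after the tokens, offsets are plain byte
counts, and <code>Point</code> is an example class. It shows why 1 is added
to the offset: a member at offset zero must still be distinguishable from
the null pointer to member.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

typedef int pm_t;  // default shape for pointers to data members

pm_t pm_null() { return 0; }
pm_t pm_make(std::size_t offset) { return static_cast<pm_t>(offset) + 1; }

// Undefined for the null pointer to member, as stated in the text.
std::size_t pm_offset(pm_t pm) { return static_cast<std::size_t>(pm - 1); }

// Non-null casts to and from a derived class adjust by the base offset.
pm_t pm_cast(pm_t pm, std::size_t base_offset) {
    return pm + static_cast<pm_t>(base_offset);
}
pm_t pm_uncast(pm_t pm, std::size_t base_offset) {
    return pm - static_cast<pm_t>(base_offset);
}

bool pm_is_null(pm_t pm) { return pm == 0; }

struct Point {
    int x;
    int y;
};

// Equivalent of object.*pm for an int member of Point.
int &pm_access(Point &object, pm_t pm) {
    char *base = reinterpret_cast<char *>(&object);
    return *reinterpret_cast<int *>(base + pm_offset(pm));
}
```
]]></programlisting>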
</sect3>
<sect3 id="ptr_mem_func">
<title>2.6.9. Pointers to function members</title>
<para>
As with pointers to data members, pointers to function members and
the operations on them are represented by tokens to allow for a range
of implementations. All pointers to function members are represented
by the same shape:
<programlisting>
~cpp.pmf.type : () -> SHAPE
</programlisting>
This shape will be denoted by <code>pmf</code> in the description
of the following tokens. Many of the tokens take an expression which
has a shape which is a pointer to the alignment of <code>pmf</code>.
This will be denoted by <code>ppmf</code>.
</para>
<para>
There are two basic methods for constructing a pointer to a function
member. The first is to take the address of a non-static member function
of a class. There are two cases, depending on whether or not the
member function is virtual. The non-virtual case is given by the
token:
<programlisting>
~cpp.pmf.make : ( EXP PROC, EXP OFFSET, EXP OFFSET ) -> EXP pmf
</programlisting>
where the first argument is the address of the corresponding function,
the second argument gives any base class offset which is to be added
when calling this function (to deal with inherited member functions),
and the third argument is a zero offset.
</para>
<para>
For virtual functions, a pointer to function member of the form above
is entered in the <A HREF="#vtable">virtual function table</A> for
the corresponding class. The actual pointer to the virtual function
member then gives a reference into the virtual function table as follows:
<programlisting>
~cpp.pmf.vmake : ( SIGNED_NAT, EXP OFFSET, EXP, EXP ) -> EXP pmf
</programlisting>
where the first argument gives the index of the function within the
virtual function table, the second argument gives the offset of the
<I>vptr</I> field within the class, and the third and fourth arguments
are zero offsets.
</para>
<para>
The second way of constructing a pointer to a function member is to
use a null pointer to function member:
<programlisting>
~cpp.pmf.null : () -> EXP pmf
~cpp.pmf.null2 : () -> EXP pmf
</programlisting>
For technical reasons there are two versions of this token, although
they have the same value. The first token is used in static initialisers;
the second token is used in other expressions. </para>
<para>
The cast operations on pointers to function members are more complex
than those on pointers to data members. The value to be cast is copied
into a temporary and one of the tokens:
<programlisting>
~cpp.pmf.cast : ( EXP ppmf, EXP OFFSET, EXP, EXP OFFSET ) -> EXP TOP
~cpp.pmf.uncast : ( EXP ppmf, EXP OFFSET, EXP, EXP OFFSET ) -> EXP TOP
</programlisting>
is applied to modify the value of the temporary according to the given
cast. The first argument gives the address of the temporary, the
second gives the base class offset to be added or subtracted, the
third gives the number to be added or subtracted to convert virtual
function indexes for the base class into virtual function indexes
for the derived class, and the fourth gives the offset of the <I>vptr</I>
field within the class. Again, the ability to use <code>reinterpret_cast</code>
to convert between any two pointer to function member types arises
because of the uniform representation of these types.
</para>
<para>
As with pointers to data members, there are tokens implementing comparisons
on pointers to function members:
<programlisting>
~cpp.pmf.test : ( EXP ppmf, LABEL, NTEST ) -> EXP TOP
~cpp.pmf.compare : ( EXP ppmf, EXP ppmf, LABEL, NTEST ) -> EXP TOP
</programlisting>
Note however that the arguments are passed by reference.
</para>
<para>
The most important, and most complex, operation is calling a function
through a pointer to function member. The first step is to copy the
pointer to function member into a temporary. The token:
<programlisting>
~cpp.pmf.virt : ( EXP ppmf, EXP, ALIGNMENT ) -> EXP TOP
</programlisting>
is then applied to the temporary to convert a pointer to a virtual
function member to a normal pointer to function member by looking
it up in the corresponding virtual function table. The first argument
gives the address of the temporary, the second gives the object to
which the function is to be applied, and the third gives the alignment
of the corresponding class. Now the base class conversion to be applied
to the object can be determined by applying the token:
<programlisting>
~cpp.pmf.delta : ( ALIGNMENT a, EXP ppmf ) -> EXP OFFSET ( a, a )
</programlisting>
to the temporary to find the offset to be added. Finally the function
to be called can be extracted from the temporary using the token:
<programlisting>
~cpp.pmf.func : ( EXP ppmf ) -> EXP PROC
</programlisting>
The function call then proceeds as normal.
</para>
<para>
The default implementation is that described in the ARM, where each
pointer to function member is represented in the form:
<programlisting>
struct PTR_MEM_FUNC {
short delta ;
short index ;
union {
void ( *func ) () ;
short off ;
} u ;
} ;
</programlisting>
The <code>delta</code> field gives the base class offset (in bytes)
to be added before applying the function. The <code>index</code>
field is 0 for null pointers, -1 for non-virtual function pointers
and the index into the virtual function table for virtual function
pointers (as described below these indexes start from 1). For non-virtual
function pointers the function itself is given by the <code>u.func</code>
field. For virtual function pointers the offset of the <I>vptr</I>
field within the class is given by the <code>u.off</code> field.
</para>
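<para>
The three kinds of value this structure can hold may be easier to see in
code. The sketch below reuses the layout given above under the name
<code>PtrMemFunc</code>; the constructor and query functions are
hypothetical helpers loosely modelled on the tokens, with offsets
simplified to <code>short</code> byte counts.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>

struct PtrMemFunc {
    short delta;  // base class offset to add before the call
    short index;  // 0: null, -1: non-virtual, otherwise a table index
    union {
        void (*func)();  // the function, for non-virtual pointers
        short off;       // offset of the vptr field, for virtual pointers
    } u;
};

// Null pointer to function member.
PtrMemFunc pmf_null() {
    PtrMemFunc p;
    p.delta = 0;
    p.index = 0;
    p.u.func = 0;
    return p;
}

// Pointer to a non-virtual member function.
PtrMemFunc pmf_make(void (*func)(), short delta) {
    PtrMemFunc p;
    p.delta = delta;
    p.index = -1;
    p.u.func = func;
    return p;
}

// Pointer to a virtual member function; indexes start from 1.
PtrMemFunc pmf_vmake(short index, short vptr_offset) {
    PtrMemFunc p;
    p.delta = 0;
    p.index = index;
    p.u.off = vptr_offset;
    return p;
}

bool pmf_is_null(const PtrMemFunc &p) { return p.index == 0; }
bool pmf_is_virtual(const PtrMemFunc &p) { return p.index > 0; }

// Example target for a non-virtual pointer.
void example_function() {}
```
]]></programlisting>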
</sect3>
<sect3 id="class">
<title>2.6.10. Class layout</title>
<para>
Consider a class with no base classes:
<programlisting>
class A {
// A's members
} ;
</programlisting>
Each object of class <I>A</I> needs its own copy of the non-static
data members of <I>A</I> and, for polymorphic types, a means of referencing
the virtual function table and run-time type information for <I>A</I>.
This is accomplished using a layout of the form:
<IMG SRC="../images/class.gif" ALT="class A"/>
where the <I>A</I> component consists of the non-static data members
and
<I>vptr A</I> is a pointer to the virtual function table for <I>A</I>.
For non-polymorphic classes the <I>vptr A</I> field is omitted; otherwise
space for <I>vptr A</I> needs to be allocated within the class and
the pointer needs to be initialised in each constructor for <I>A</I>.
The precise layout of the <A HREF="#vtable">virtual function table</A>
and the <A HREF="#rtti">run-time type information</A> is given below.
</para>
<para>
Two alternative ways of laying out the non-static data members within
the class are implemented. The first, which is the default, gives them
in the order in which they are declared in the class definition.
The second lays out the <code>public</code>, the <code>protected</code>,
and the <code>private</code> members in three distinct sections, the
members within each section being given in the order in which they
are declared. The latter can be enabled using the <code>-jo</code>
command-line option.
</para>
<para>
The offset of each member within the class (including <I>vptr A</I>)
can be calculated in terms of the offset of the previous member.
The first member has offset zero. The offset of any other member
is given by the offset of the previous member plus the size of the
previous member, rounded up to the alignment of the current member.
The overall size of the class is given by the offset of the last member
plus the size of the last member, rounded up using the token:
<programlisting>
~comp_off : ( EXP OFFSET ) -> EXP OFFSET
</programlisting>
which allows for any target dependent padding at the end of the class.
The shape of the class is then a <code>compound</code> shape with
this offset.
</para>
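<para>
The offset calculation can be sketched as follows. This is an illustration
only: <code>Member</code>, <code>round_up</code> and <code>layout</code>
are hypothetical names, sizes and alignments are byte counts, and the final
padding, which in TDF is left to <code>~comp_off</code>, is assumed here to
round up to the largest member alignment.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Member {
    std::size_t size;
    std::size_t align;
};

// Rounds offset up to the next multiple of align.
std::size_t round_up(std::size_t offset, std::size_t align) {
    return (offset + align - 1) / align * align;
}

// Returns the offset of each member and stores the overall size in total.
std::vector<std::size_t> layout(const std::vector<Member> &members,
                                std::size_t &total) {
    std::vector<std::size_t> offsets;
    std::size_t end = 0;        // offset just past the previous member
    std::size_t max_align = 1;
    for (std::size_t i = 0; i < members.size(); ++i) {
        const Member &m = members[i];
        std::size_t offset = round_up(end, m.align);
        offsets.push_back(offset);
        end = offset + m.size;
        if (m.align > max_align) max_align = m.align;
    }
    // Assumed stand-in for the target dependent padding at the end.
    total = round_up(end, max_align);
    return offsets;
}
```
]]></programlisting>
<para>
For members of sizes 1, 4 and 1 with matching alignments this gives offsets
0, 4 and 8 and, under the padding assumption above, an overall size of 12.
</para>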
<para>
Classes with no members need to be treated slightly differently.
The shape of such a class is given by the token:
<programlisting>
~cpp.empty.shape : () -> SHAPE
</programlisting>
(recall that an empty class still has a nonzero size). The token:
<programlisting>
~cpp.empty.offset : () -> EXP OFFSET
</programlisting>
is used to represent the offset required for an empty class when it
is used as a base class. This may be a zero offset.
</para>
<para>
Bitfield members provide a slight complication to the picture above.
The offset of a bitfield is additionally padded using the token:
<programlisting>
~pad : ( EXP OFFSET, SHAPE, SHAPE ) -> EXP OFFSET
</programlisting>
where the two shapes give the type underlying the bitfield and the
bitfield itself.
</para>
<para>
The layout of unions is similar to that of classes except that all
members have zero offset, and the size of the union is the maximum
of the sizes of its members, suitably padded. Of course unions cannot
be polymorphic and cannot have base classes.
</para>
<para>
Pointers to incomplete classes are represented by means of the alignment:
<programlisting>
~cpp.empty.align : () -> ALIGNMENT
</programlisting>
This token is also used for the alignment of a complete class if that
class is never used in the generated TDF in a manner which requires
it to be complete. This can lead to savings on the size of the generated
code by preventing the need to define all the member offset tokens
in order to find the shape of the class.
</para>
</sect3>
<sect3 id="derive">
<title>2.6.11. Derived class layout</title>
<para>
The description of the implementation of derived classes will be given
in terms of the example class hierarchy given by:
<programlisting>
class A {
// A's members
} ;
class B : public A {
// B's members
} ;
class C : public A {
// C's members
} ;
class D : public B, public C {
// D's members
} ;
</programlisting>
or, as a directed acyclic graph:
</para>
<IMG SRC="../images/graph.gif" ALT="class D"/>
<H4>Single inheritance</H4>
<para>
The layout of class <I>A</I> is given by:
<IMG SRC="../images/classA.gif" ALT="class A"/>
as above. Class <I>B</I> inherits all the members of class <I>A</I>
plus those members explicitly declared within class <I>B</I>. In
addition, class <I>B</I> inherits all the virtual member functions
of <I>A</I>, some of which may be overridden in <I>B</I>, extended
by any additional virtual functions declared in <I>B</I>. This may
be represented as follows:
<IMG SRC="../images/classB.gif" ALT="class B"/>
where <I>A</I> denotes those members inherited from the base class
and
<I>B</I> denotes those members added in the derived class. Note that
an object of class <I>B</I> contains a sub-object of class <I>A</I>.
The fact that this sub-object is located at the start of <I>B</I>
means that the base class conversion from <I>B</I> to <I>A</I> is
trivial. Any base class with this property is called a
<A id="primary">primary base class</A>.
</para>
<para>
Note that in theory two virtual function tables are required, the
normal virtual function table for <I>B</I>, denoted by <I>vtbl B</I>,
and a modified virtual function table for <I>A</I>, denoted by <I>vtbl
B::A</I>, taking into account any overriding virtual functions within
<I>B</I>, and pointing to <I>B</I>'s run-time type information. This
latter means that the dynamic type information for the <I>A</I> sub-object
relates to
<I>B</I> rather than <I>A</I>. However these two tables can usually
be combined - if the virtual functions added in <I>B</I> are listed
in the virtual function table after those inherited from <I>A</I>
and the form of the overriding is <A HREF="#override">suitably well
behaved</A>
(in the sense defined below) then <I>vtbl B::A</I> is an initial segment
of <I>vtbl B</I>. It is also possible to remove the <I>vptr B</I>
field and use <I>vptr B::A</I> in its place in this case (it has to
be this way round to preserve the <I>A</I> sub-object). Thus the
items shaded in the diagram can be removed.
</para>
<para>
The class <I>C</I> is similarly given by:
<IMG SRC="../images/classC.gif" ALT="class C"/>
</para>
<H4>Multiple inheritance</H4>
<para>
Class <I>D</I> is more complex because of the presence of multiple
inheritance. <I>D</I> inherits all the members of <I>B</I>, including
those which <I>B</I> inherits from <I>A</I>, plus all the members
of
<I>C</I>, including those which <I>C</I> inherits from <I>A</I>.
It also inherits all of the virtual member functions from <I>B</I>
and
<I>C</I>, some of which may be overridden in <I>D</I>, extended by
any additional virtual functions declared in <I>D</I>. This may be
represented as follows:
<IMG SRC="../images/classD.gif" ALT="class D"/>
Note that there are two copies of <I>A</I> in <I>D</I> because virtual
inheritance has not been used.
</para>
<para>
The <I>B</I> base class of <I>D</I> is essentially similar to the
single inheritance case already discussed; the <I>C</I> base class
is different however. Note firstly that the <I>C</I> sub-object of
<I>D</I> is located at a non-zero offset, <I>delta D::C</I>, from
the start of the object. This means that the base class conversion
from <I>D</I> to <I>C</I>
consists of adding this offset (for pointer conversions things are
further complicated by the need to allow for null pointers). Also
<I>vtbl D::C</I> is not an initial segment of <I>vtbl D</I> because
this contains the virtual functions inherited from <I>B</I> first,
followed by those inherited from <I>C</I>, followed by those first
declared in <I>D</I> (there are <A HREF="#override">other reasons</A>
as well). Thus <I>vtbl D::C</I> cannot be eliminated.
</para>
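<para>
The offset adjustment is visible from ordinary C++ code. The sketch below
uses the hierarchy above with one data member added to each class so that
no sub-object is empty; the helper names are hypothetical. The exact
offsets depend on the implementation, so the sketch relies only on the
<I>B</I> and <I>C</I> sub-objects lying at different places and on null
pointers being preserved.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>

struct A { int a; };
struct B : A { int b; };
struct C : A { int c; };
struct D : B, C { int d; };

// Offset in bytes of the B sub-object within a D object.
std::ptrdiff_t delta_D_B(D &object) {
    B *b = &object;  // implicit base class conversion
    return reinterpret_cast<char *>(b) - reinterpret_cast<char *>(&object);
}

// Offset in bytes of the C sub-object within a D object
// (delta D::C in the text).
std::ptrdiff_t delta_D_C(D &object) {
    C *c = &object;  // implicit base class conversion adds the offset
    return reinterpret_cast<char *>(c) - reinterpret_cast<char *>(&object);
}

// Pointer conversion: the compiler adds the offset only for non-null
// pointers, so a null pointer stays null.
C *to_C(D *p) { return p; }
```
]]></programlisting>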
<H4>Virtual inheritance</H4>
<para>
Virtual inheritance introduces a further complication. Now consider
the class hierarchy given by:
<programlisting>
class A {
// A's members
} ;
class B : virtual public A {
// B's members
} ;
class C : virtual public A {
// C's members
} ;
class D : public B, public C {
// D's members
} ;
</programlisting>
or, as a <A id="diamond">directed acyclic graph</A>:
<IMG SRC="../images/diamond.gif" ALT="class D"/>
As before <I>A</I> is given by:
<IMG SRC="../images/classA.gif" ALT="class A"/>
but now <I>B</I> is given by:
<IMG SRC="../images/virtualB.gif" ALT="class B"/>
Rather than having the sub-object of class <I>A</I> directly as part
of
<I>B</I>, the class now contains a pointer, <I>ptr A</I>, to this
sub-object. The virtual sub-objects are always located at the end
of a class layout; their offset may therefore vary for different objects,
however the offset for <I>ptr A</I> is always fixed. The <I>ptr A</I>
field is initialised in each constructor for <I>B</I>. In order to
perform the base class conversion from <I>B</I> to <I>A</I>, the contents
of <I>ptr A</I> are taken (again provision needs to be made for null
pointers in pointer conversions). In cases when the dynamic type
of the <I>B</I> object can be determined statically it is possible
to access the <I>A</I> sub-object directly by adding a suitable offset.
Because this conversion is non-trivial (see <A HREF="#override">below</A>)
the virtual function table <I>vtbl B::A</I> is not an initial segment
of
<I>vtbl B</I> and cannot be eliminated.
</para>
<para>
The class <I>C</I> is similarly given by:
<IMG SRC="../images/virtualC.gif" ALT="class C"/>
Now the class <I>D</I> is given by:
<IMG SRC="../images/virtualD.gif" ALT="class D"/>
Note that there is a single <I>A</I> sub-object of <I>D</I> referenced
by the <I>ptr A</I> fields in both the <I>B</I> and <I>C</I> sub-objects.
The elimination of <I>vtbl D::B</I> is as above.
</para>
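<para>
That the two routes reach the same <I>A</I> sub-object can be checked from
the source language. The following is a small sketch using the virtual
hierarchy above with a data member added to each class; the functions
<code>a_via_B</code> and <code>a_via_C</code> are hypothetical helpers.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>

struct A { int a; A() : a(0) {} };
struct B : virtual A { int b; B() : b(0) {} };
struct C : virtual A { int c; C() : c(0) {} };
struct D : B, C { int d; D() : d(0) {} };

// Reach the A sub-object by way of the B sub-object.
A *a_via_B(D &object) {
    B *b = &object;
    return b;  // conversion to the virtual base
}

// Reach the A sub-object by way of the C sub-object.
A *a_via_C(D &object) {
    C *c = &object;
    return c;  // conversion to the virtual base
}
```
]]></programlisting>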
</sect3>
<sect3 id="constr">
<title>2.6.12. Constructors and destructors</title>
<para>
The implementation of constructors and destructors, whether explicitly
or implicitly defined, is slightly more complex than that of other
member functions. For example, the constructors need to set up the
internal <I>vptr</I> and <I>ptr</I> fields mentioned above.
</para>
<para>
The order of initialisation in a constructor is as follows:
<itemizedlist>
<listitem>The internal <I>ptr</I> fields giving the locations of the virtual
base classes are initialised.
</listitem>
<listitem>The constructors for the virtual base classes are called.
</listitem>
<listitem>The constructors for the non-virtual direct base classes are called.
</listitem>
<listitem>The internal <I>vptr</I> fields giving the locations of the virtual
function tables are initialised.
</listitem>
<listitem>The constructors for the members of the class are called.
</listitem>
<listitem>The main constructor body is executed.
</listitem>
</itemizedlist>
To ensure that each virtual base is only initialised once, if a class
has a virtual base class then all its constructors have an implicit
extra parameter of type <code>int</code>. The first two steps above
are then only applied if this flag is nonzero. In normal applications
of the constructor this argument will be 1, however in base class
initialisations such as those in the second and third steps above,
it will be 0.
</para>
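<para>
The visible part of this ordering, the sequence of constructor calls, can
be observed with a short program. The sketch below is illustrative; the
<code>trace</code> variable, the member class <code>M</code> and the
function <code>construction_order</code> are hypothetical. It also shows
that the virtual base is constructed only once.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

std::string trace;  // records the order in which constructors run

struct A { A() { trace += "A "; } };
struct B : virtual A { B() { trace += "B "; } };
struct C : virtual A { C() { trace += "C "; } };
struct M { M() { trace += "M "; } };

struct D : B, C {
    M member;
    D() { trace += "D"; }
};

// Constructs a D object and returns the recorded order.
std::string construction_order() {
    trace.clear();
    D object;
    (void)object;
    return trace;
}
```
]]></programlisting>
<para>
The virtual base comes first, then the non-virtual bases, then the member,
and finally the constructor body.
</para>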
<para>
Note that similar steps to protect virtual base classes are not taken
in an implicitly declared <code>operator=</code> function. The order
of assignment in this case is as follows:
<itemizedlist>
<listitem>The assignment operators for the direct base classes (both virtual
and non-virtual) are called.
</listitem>
<listitem>The assignment operators for the members of the class are called.
</listitem>
<listitem>A reference to the object assigned to (i.e. <code>*this</code>)
is returned.
</listitem>
</itemizedlist>
</para>
<para>
The order of destruction in a destructor is essentially the reverse
of the order of construction:
<itemizedlist>
<listitem>The main destructor body is executed.
</listitem>
<listitem>The destructors for the members of the class are called.
</listitem>
<listitem>The internal <I>vptr</I> fields giving the locations of the virtual
function tables are re-initialised.
</listitem>
<listitem>The destructors for the non-virtual direct base classes are called.
</listitem>
<listitem>The destructors for the virtual base classes are called.
</listitem>
<listitem>If necessary the space occupied by the object is deallocated.
</listitem>
</itemizedlist>
All destructors have an extra parameter of type <code>int</code>.
The virtual base classes are only destroyed if this flag is nonzero
when and-ed with 2. The space occupied by the object is only deallocated
if this flag is nonzero when and-ed with 1. This deallocation is
equivalent to inserting:
<programlisting>
delete this ;
</programlisting>
in the destructor. The <code>operator delete</code> function is called
via the destructor in this way in order to implement the pseudo-virtual
nature of these deallocation functions. Thus for normal destructor
calls the extra argument is 2, for base class destructor calls it
is 0, and for calls arising from a <code>delete</code> expression
it is 3.
</para>
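<para>
The role of the two flag bits can be modelled as below. This is a sketch
only: the constants and the function <code>destroy</code> are hypothetical,
and the function records the steps taken as text instead of destroying
anything.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

const int FLAG_DEALLOCATE = 1;     // deallocate the space afterwards
const int FLAG_VIRTUAL_BASES = 2;  // destroy the virtual base classes

// Returns the steps a destructor would perform for the given flag.
std::string destroy(int flag) {
    // These steps are always performed.
    std::string steps = "body members vptr bases";
    if (flag & FLAG_VIRTUAL_BASES) steps += " vbases";
    if (flag & FLAG_DEALLOCATE) steps += " delete";
    return steps;
}
```
]]></programlisting>
<para>
A flag of 2 models a normal destructor call, 0 a base class destructor
call, and 3 a call arising from a <code>delete</code> expression.
</para>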
<para>
The point at which the virtual function tables are initialised in
the constructor, and the fact that they are re-initialised in the
destructor, is to ensure that virtual functions called from base class
initialisers are handled correctly (see ISO C++ 12.7).
</para>
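<para>
The language rule being implemented can be seen in the following sketch,
in which <code>Base</code> and <code>Derived</code> are example classes.
While the base class constructor is running the object behaves as a
<code>Base</code>, so the virtual call made there reaches the base class
version of the function.
</para>
<programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_in_constructor;
    // The virtual call here resolves to Base::name, because the object
    // is not yet a Derived while this constructor runs.
    Base() { seen_in_constructor = name(); }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    virtual std::string name() const { return "Derived"; }
};
```
]]></programlisting>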
<para>
A further complication arises from the need to destroy
<A id="partial">partially constructed objects</A> if an exception
is thrown in a constructor. A count is maintained of the number of
base classes and members constructed within a constructor. If an
exception is thrown then it is caught in the constructor, the constructed
base classes and members are destroyed, and the exception is re-thrown.
The count variable is used to determine which bases and members need
to be destroyed.
</para>
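<para>
The counting scheme can be pictured with the following C++ sketch. It is
only a model: the <code>Part</code> type, the <code>events</code> log and
the <code>construct_all</code> function are hypothetical, and the reverse
order of destruction is the usual C++ rule rather than something stated
above.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Log of construction and destruction events, kept for illustration.
std::vector<std::string> events;

// A base class or member to be constructed; `fail` makes its
// (simulated) constructor throw.
struct Part {
    std::string name;
    bool fail;
};

// Constructs the parts in order, counting how many have succeeded.
// If one throws, the exception is caught, the parts constructed so
// far are destroyed in reverse order, and the exception is re-thrown.
void construct_all(const std::vector<Part>& parts) {
    std::size_t count = 0;
    try {
        for (const Part& p : parts) {
            if (p.fail) throw std::runtime_error("constructor failed: " + p.name);
            events.push_back("construct " + p.name);
            ++count;
        }
    } catch (...) {
        while (count > 0) {
            --count;
            events.push_back("destroy " + parts[count].name);
        }
        throw;
    }
}
```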
<para>
<IMG SRC="../images/warn.gif" ALT="warning"/> These partial destructors
currently do not interact correctly with any exception specification
on the constructor. Exceptions thrown within destructors are not
correctly handled either.
</para>
</sect3>
<sect3 id="vtable">
<title>2.6.13. Virtual function tables</title>
<para>
The virtual functions in a polymorphic class are given in its virtual
function table in the following order: firstly those virtual functions
inherited from its direct base classes (which may be overridden in
the derived class) followed by those first declared in the derived
class in the order in which they are declared. Note that this can
result in virtual functions inherited from virtual base classes appearing
more than once. The virtual functions are numbered from 1 (this is
slightly more convenient than numbering from 0 in the default implementation).
</para>
<para>
The virtual function table for this class has shape:
<programlisting>
~cpp.vtab.type : ( NAT ) -> SHAPE
</programlisting>
the argument being <I>n + 1</I> where <I>n</I> is the number of virtual
functions in the class (there is also a token:
<programlisting>
~cpp.vtab.diag : () -> SHAPE
</programlisting>
which is used in the diagnostic output for a generic virtual function
table). The table is created using the token:
<programlisting>
~cpp.vtab.make : ( EXP pti, EXP OFFSET, NAT, EXP NOF ) -> EXP vt
</programlisting>
where the first expression gives the address of the <A HREF="#rtti">run-time
type information structure</A> for the class, the second expression
gives the offset of the <I>vptr</I> field within the class (i.e. <I>voff</I>),
the integer constant is <I>n + 1</I>, and the final expression is
a
<code>make_nof</code> construct giving information on each of the
<I>n</I>
virtual functions.
</para>
<para>
The information given on each virtual function in this table has the
form of a <A HREF="#ptr_mem_func">pointer to function member</A> formed
using the token:
<programlisting>
~cpp.pmf.make : ( EXP PROC, EXP OFFSET, EXP OFFSET ) -> EXP pmf
</programlisting>
as above, except that the third argument gives the offset of the base
class in virtual function tables such as <I>vtbl B::A</I>. For pure
virtual functions the function pointer in this token is given by:
<programlisting>
~cpp.vtab.pure : () -> EXP PROC
</programlisting>
In the default implementation this gives a function
<code>__TCPPLUS_pure</code> which just calls <code>abort</code>.
</para>
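<para>
A minimal C++ model of such a table is sketched below. The types
<code>PmfEntry</code> and <code>VTable</code>, and the functions
<code>make_vtable</code> and <code>pure_virtual_called</code>, are
assumptions made for the example; in particular, how the default
implementation uses the extra slot implied by <I>n + 1</I> is not
described here, so the sketch simply reserves index 0 and numbers the
functions from 1.
</para>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct TypeInfo;  // opaque stand-in for the run-time type information

using Proc = void (*)(void*);

// Stand-in for a pointer to function member: the function to call and
// the base class offset to apply.
struct PmfEntry {
    Proc func;
    std::ptrdiff_t delta;
};

// Hypothetical table layout: type information, offset of the vptr
// field, and n + 1 entries of which index 0 is not a function.
struct VTable {
    const TypeInfo* rtti;
    std::ptrdiff_t voff;
    std::vector<PmfEntry> entries;
};

// Plays the role described for __TCPPLUS_pure: it just aborts.
void pure_virtual_called(void*) { std::abort(); }

// Builds a table from n function entries; a null function pointer
// marks a pure virtual function and is replaced by the aborting stub.
VTable make_vtable(const TypeInfo* rtti, std::ptrdiff_t voff,
                   const std::vector<PmfEntry>& funcs) {
    VTable vt{rtti, voff, {}};
    vt.entries.push_back(PmfEntry{nullptr, 0});  // reserved slot 0
    for (const PmfEntry& e : funcs) {
        PmfEntry entry = e;
        if (entry.func == nullptr) entry.func = pure_virtual_called;
        vt.entries.push_back(entry);
    }
    return vt;
}
```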
<para>
To avoid duplicate copies of virtual function tables and run-time
type information structures being created, the ARM algorithm is used.
The virtual function table and run-time type information structure
for a class are defined in the module containing the definition of
the first non-inline, non-pure virtual function declared in that class.
If such a function does not exist then duplicate copies are created
in every module which requires them. In the former case the virtual
function table will have an <A HREF="#other">external tag name</A>;
in the latter case it will be an internal tag. This scheme can be
overridden using the <code>-jv</code> command-line option, which causes
local virtual function tables to be output for all classes.
</para>
<para>
Note that the discussion above applies to both simple virtual function
tables, such as <I>vtbl B</I> above, and to those arising from base
classes, such as <I>vtbl B::A</I>. <A id="override">We are now
in a position to precisely determine when <I>vtbl B::A</I> is an initial
segment of <I>vtbl B</I> and hence can be eliminated</A>. Firstly,
<I>A</I> must be the first direct base class of <I>B</I> and cannot
be virtual. This is to ensure both that there are no virtual functions
in <I>vtbl B</I> before those inherited from <I>A</I>, and that the
corresponding base class conversion is trivial so that the pointers
to function members of <I>B</I> comprising the virtual function table
can be equally regarded as pointers to function members of <I>A</I>.
The second requirement is that if a virtual function for <I>A</I>,
<I>f</I>, is overridden in <I>B</I> then the return type for <I>B::f</I>
cannot differ from the return type for <I>A::f</I> by a non-trivial
conversion (recall that ISO C++ allows the return types to differ
by a base class conversion). In the non-trivial conversion case the
function entered in <I>vtbl B::A</I> needs to be, not <I>B::f</I>
as in <I>vtbl B</I>, but a stub function which calls <I>B::f</I> and
converts its return value to the return type of <I>A::f</I>.
</para>
<H4>Calling virtual functions</H4>
<para>
The virtual function call mechanism is implemented using the token:
<programlisting>
~cpp.vtab.func : ( EXP ppvt, SIGNED_NAT ) -> EXP ppmf
</programlisting>
which has as its arguments a reference to the <I>vptr</I> field of
the object the function is to be called for, and the number of the
virtual function to be called. It returns a reference to the corresponding
pointer to function member within the object's virtual function table.
The function is then called by extracting the base class offset to
be added, and the function to be called, from this reference using
the tokens:
<programlisting>
~cpp.pmf.delta : ( ALIGNMENT a, EXP ppmf ) -> EXP OFFSET ( a, a )
~cpp.pmf.func : ( EXP ppmf ) -> EXP PROC
</programlisting>
described as part of the <A HREF="#ptr_mem_func">pointer to function
member call mechanism</A> above.
</para>
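<para>
The call sequence can be modelled in C++ roughly as follows. The functions
<code>vtab_func</code>, <code>pmf_delta</code> and <code>pmf_func</code>
are hypothetical analogues of the three tokens, and the <code>Object</code>
and <code>Pmf</code> layouts are assumptions chosen to keep the sketch
short; the real representations are target dependent.
</para>

```cpp
#include <cassert>
#include <cstddef>

using Proc = int (*)(void* self);

// Stand-in for a pointer to function member held in the table.
struct Pmf {
    Proc func;             // function to be called
    std::ptrdiff_t delta;  // base class offset to add to the object pointer
};

// Hypothetical object layout: a vptr field followed by some data.
struct Object {
    const Pmf* vptr;  // table of entries; functions are numbered from 1
    int value;
};

// Analogue of ~cpp.vtab.func: from a reference to the vptr field and a
// function number, find the corresponding table entry.
const Pmf& vtab_func(const Pmf* const* ppvt, int n) { return (*ppvt)[n]; }

// Analogues of ~cpp.pmf.delta and ~cpp.pmf.func.
std::ptrdiff_t pmf_delta(const Pmf& pmf) { return pmf.delta; }
Proc pmf_func(const Pmf& pmf) { return pmf.func; }

// Calls virtual function number n for the given object.
int call_virtual(Object* obj, int n) {
    const Pmf& pmf = vtab_func(&obj->vptr, n);
    char* self = reinterpret_cast<char*>(obj) + pmf_delta(pmf);
    return pmf_func(pmf)(self);
}
```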
</sect3>
<sect3 id="rtti">
<title>2.6.14. Run-time type information</title>
<para>
Each C++ type can be associated with a run-time type information structure
giving information about that type. These type information structures
have shape given by the token:
<programlisting>
~cpp.typeid.type : () -> SHAPE
</programlisting>
which corresponds to the representation for the standard type
<code>std::type_info</code> declared in the header
<code>&lt;typeinfo&gt;</code>. Each type information structure consists
of a tag number, giving information on the kind of type represented,
a string literal, giving the name of the type, and a pointer to a
list of base type information structures. These are combined to give
a type information structure using the token:
<programlisting>
~cpp.typeid.make : ( SIGNED_NAT, EXP, EXP ) -> EXP ti
</programlisting>
Each base type information structure has shape given by the token:
<programlisting>
~cpp.baseid.type : () -> SHAPE
</programlisting>
It consists of a pointer to a type information structure, an expression
used to describe the offset of a base class, a pointer to the next
base type information structure in the list, and two integers giving
information on type qualifiers etc. These are combined to give a
base type information structure using the token:
<programlisting>
~cpp.baseid.make : ( EXP, EXP, EXP, SIGNED_NAT, SIGNED_NAT ) -> EXP bi
</programlisting>
</para>
<para>
The following table gives the various tag numbers used in type information
structures plus a list of the base type information structures associated
with each type. Macros giving these tag numbers are provided in the
default implementation in a header, <code>interface.h</code>, which
is shared by the C++ producer.
</para>
<para>
<table>
<tr><th>Type</th>
<th>Form</th>
<th>Tag</th>
<th>Base information</th>
</tr>
<tr><td>integer</td>
<td>-</td>
<td>0</td>
<td>-</td>
</tr>
<tr><td>floating point</td>
<td>-</td>
<td>1</td>
<td>-</td>
</tr>
<tr><td>void</td>
<td>-</td>
<td>2</td>
<td>-</td>
</tr>
<tr><td>class or struct</td>
<td>class T</td>
<td>3</td>
<td>[base,access,virtual], ....</td>
</tr>
<tr><td>union</td>
<td>union T</td>
<td>4</td>
<td>-</td>
</tr>
<tr><td>enumeration</td>
<td>enum T</td>
<td>5</td>
<td>-</td>
</tr>
<tr><td>pointer</td>
<td>cv T *</td>
<td>6</td>
<td>[T,cv,0]</td>
</tr>
<tr><td>reference</td>
<td>cv T &amp;</td>
<td>7</td>
<td>[T,cv,0]</td>
</tr>
<tr><td>pointer to member</td>
<td>cv T S::*</td>
<td>8</td>
<td>[S,0,0], [T,cv,0]</td>
</tr>
<tr><td>array</td>
<td>cv T [n]</td>
<td>9</td>
<td>[T,cv,n]</td>
</tr>
<tr><td>bitfield</td>
<td>cv T : n</td>
<td>10</td>
<td>[T,cv,n]</td>
</tr>
<tr><td>C++ function</td>
<td>cv T ( S1, ...., Sn )</td>
<td>11</td>
<td>[T,cv,0], [S1,0,0], ...., [Sn,0,0]</td>
</tr>
<tr><td>C function</td>
<td>cv T ( S1, ...., Sn )</td>
<td>12</td>
<td>[T,cv,0], [S1,0,0], ...., [Sn,0,0]</td>
</tr>
</table>
</para>
<para>
In the form column <code>cv T</code> is used to denote not only the
normal cv-qualifiers but, when <code>T</code> is a function type,
the member function cv-qualifiers. Arrays with an unspecified bound
are treated as if their bound were zero. Functions with ellipsis are
treated as if they had an extra parameter of a dummy type named
<code>...</code> (see below). Note the distinction between C++ and
C function types.
</para>
<para>
Each base type information structure is described as a triple consisting
of a type and two integers. One of these integers may be used to
encode a type qualifier, <code>cv</code>, as follows:
</para>
<para>
<table>
<tr><th>Qualifier</th> <th>Encoding</th>
</tr>
<tr><td>none</td> <td>0</td>
</tr>
<tr><td>const</td> <td>1</td>
</tr>
<tr><td>volatile</td> <td>2</td>
</tr>
<tr><td>const volatile</td><td>3</td>
</tr>
</table>
</para>
<para>
The base type information for a class consists of information on each
of its direct base classes. This includes the offset of this base
within the class (for a virtual base class this is the offset of the
corresponding
<I>ptr</I> field), whether the base is virtual (1) or not (0), and
the base class access, encoded as follows:
</para>
<para>
<table>
<tr><th>Access</th> <th>Encoding</th>
</tr>
<tr><td>public</td> <td>0</td>
</tr>
<tr><td>protected</td> <td>1</td>
</tr>
<tr><td>private</td> <td>2</td>
</tr>
</table>
</para>
<para>
For example, the run-time type information structures for the classes
declared in the <A HREF="#diamond">diamond lattice</A> above can be
represented as follows:
<IMG SRC="../images/rttiD.gif" ALT="typeid D"/>
</para>
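<para>
To make the tag and encoding tables more concrete, the following C++
sketch models the two structures and builds the information for a class
<code>A</code> and for the type <code>const A *</code>. The struct
layouts, field names, enumerator names and the spelling of the name
strings are all assumptions made for the example; only the numeric
values are taken from the tables above.
</para>

```cpp
#include <cassert>
#include <cstddef>

struct TypeInfo;

// Hypothetical layout for a base type information structure: a type,
// an offset, the next list element, and two integers.
struct BaseInfo {
    const TypeInfo* type;
    std::ptrdiff_t offset;
    const BaseInfo* next;
    int first;   // access for a class base, cv encoding otherwise
    int second;  // virtual flag for a class base, n for arrays and bitfields
};

// Hypothetical layout for a type information structure.
struct TypeInfo {
    int tag;
    const char* name;
    const BaseInfo* bases;
};

enum { TAG_CLASS = 3, TAG_POINTER = 6 };               // from the tag table
enum { CV_NONE = 0, CV_CONST = 1, CV_VOLATILE = 2 };   // from the cv table
enum { ACC_PUBLIC = 0, ACC_PROTECTED = 1, ACC_PRIVATE = 2 };

// Encodes a cv-qualifier; const volatile gives 3.
int encode_cv(bool is_const, bool is_volatile) {
    return (is_const ? CV_CONST : CV_NONE) | (is_volatile ? CV_VOLATILE : CV_NONE);
}

// Counts the elements in a type's list of base type information.
std::size_t count_bases(const TypeInfo& ti) {
    std::size_t n = 0;
    for (const BaseInfo* b = ti.bases; b != nullptr; b = b->next) ++n;
    return n;
}

// class A with no base classes.
const TypeInfo ti_A = {TAG_CLASS, "A", nullptr};

// const A * is a pointer (tag 6) with base information [A, const, 0].
const BaseInfo bi_ptr_A = {&ti_A, 0, nullptr, encode_cv(true, false), 0};
const TypeInfo ti_ptr_const_A = {TAG_POINTER, "const A *", &bi_ptr_A};
```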
<H4>Defining run-time type information structures</H4>
<para>
For built-in types, the run-time type information structure may be
referenced by the token:
<programlisting>
~cpp.typeid.basic : ( SIGNED_NAT ) -> EXP pti
</programlisting>
where the argument gives the encoding of the type as given in the
following table:
</para>
<table>
<tr><th>Type</th> <th>Encoding</th>
<th>Type</th> <th>Encoding</th>
</tr>
<tr><td>char</td> <td>0</td>
<td>unsigned long</td> <td>11</td>
</tr>
<tr><td>(error)</td> <td>1</td>
<td>float</td> <td>12</td>
</tr>
<tr><td>void</td> <td>2</td>
<td>double</td> <td>13</td>
</tr>
<tr><td>(bottom)</td> <td>3</td>
<td>long double</td> <td>14</td>
</tr>
<tr><td>signed char</td> <td>4</td>
<td>wchar_t</td> <td>16</td>
</tr>
<tr><td>signed short</td> <td>5</td>
<td>bool</td> <td>17</td>
</tr>
<tr><td>signed int</td> <td>6</td>
<td>(ptrdiff_t)</td> <td>18</td>
</tr>
<tr><td>signed long</td> <td>7</td>
<td>(size_t)</td> <td>19</td>
</tr>
<tr><td>unsigned char</td> <td>8</td>
<td>(...)</td> <td>20</td>
</tr>
<tr><td>unsigned short</td><td>9</td>
<td>signed long long</td>
<td>23</td>
</tr>
<tr><td>unsigned int</td> <td>10</td>
<td>unsigned long long</td>
<td>27</td>
</tr>
</table>
<para>
Note that the encoding for the basic integral types is the same as
that
<A HREF="#arith">given above</A>. The other types are assigned to
unused values. Note that the encodings for <code>ptrdiff_t</code>
and
<code>size_t</code> are not used; instead, the encoding for the type
implementing each is used (using the standard tokens <code>ptrdiff_t</code> and
<code>size_t</code>). The encodings for <code>bool</code> and
<code>wchar_t</code> are used because they are conceptually distinct
types even though they are implemented as one of the basic integral
types. The type labelled <code>...</code> is the dummy used in the
representation of ellipsis functions. The default implementation
uses an array of type information structures, <code>__TCPPLUS_typeid</code>,
to implement <code>~cpp.typeid.basic</code>.
</para>
<para>
The run-time type information structures for classes are defined in
the same place as their <A HREF="#vtable">virtual function tables</A>.
Other run-time type information structures are defined in whatever
modules require them. In the former case the type information structure
will have an <A HREF="#other">external tag name</A>; in the latter
case it will be an internal tag.
</para>
<H4>Accessing run-time type information</H4>
<para>
The primary means of accessing the run-time type information for an
object is using the <code>typeid</code> construct. In cases where
the operand type can be determined statically, the address of the
corresponding type information structure is returned. In other cases
the token:
<programlisting>
~cpp.typeid.ref : ( EXP ppvt ) -> EXP pti
</programlisting>
is used, where the argument gives a reference to the <I>vptr</I> field
of the object being checked. From this information it is trivial
to trace the corresponding type information.
</para>
<para>
Another means of querying the run-time type information for an object
is using the <code>dynamic_cast</code> construct. When the result
cannot be determined statically, this is implemented using the token:
<programlisting>
~cpp.dynam.cast : ( EXP ppvt, EXP pti ) -> EXP pv
</programlisting>
where the first expression gives a reference to the <I>vptr</I> field
of the object being cast and the second gives the run-time type information
for the type being cast to. In the default implementation this token
is implemented by the procedure <code>__TCPPLUS_dynamic_cast</code>.
The key point to note is that the virtual function table contains
the offset, <I>voff</I>, of the <I>vptr</I> field from the start of
the most complete object. Thus it is possible to find the address
of the most complete object. The run-time type information contains
enough information to determine whether this object has a sub-object
of the type being cast to, and if so, how to find the address of this
sub-object. The result is returned as a <code>void *</code>, with
the null pointer indicating that the conversion is not possible.
</para>
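<para>
The two stages described above, finding the most complete object from
<I>voff</I> and then searching its type information for a suitable
sub-object, can be sketched in C++ as follows. This is a deliberately
simplified model and not the code of <code>__TCPPLUS_dynamic_cast</code>:
the struct layouts and function names are invented, and virtual bases,
access checking and ambiguity are all ignored.
</para>

```cpp
#include <cassert>
#include <cstddef>

struct TypeInfo;

// Simplified base information: base type, its offset, next in list.
struct BaseInfo {
    const TypeInfo* type;
    std::ptrdiff_t offset;
    const BaseInfo* next;
};

struct TypeInfo {
    const char* name;
    const BaseInfo* bases;
};

// Simplified table: type of the most complete object, and the offset
// of the vptr field from the start of that object.
struct VTable {
    const TypeInfo* rtti;
    std::ptrdiff_t voff;
};

// Searches an object of type `ti` located at `p` for a sub-object of
// type `target`, returning its address or a null pointer.
void* find_subobject(void* p, const TypeInfo* ti, const TypeInfo* target) {
    if (ti == target) return p;
    for (const BaseInfo* b = ti->bases; b != nullptr; b = b->next) {
        void* r = find_subobject(static_cast<char*>(p) + b->offset, b->type, target);
        if (r != nullptr) return r;
    }
    return nullptr;
}

// `ppvt` refers to the vptr field of the object being cast.
void* dynamic_cast_sketch(const VTable** ppvt, const TypeInfo* target) {
    const VTable* vt = *ppvt;
    // Step back from the vptr field to the start of the complete object.
    char* complete = reinterpret_cast<char*>(ppvt) - vt->voff;
    return find_subobject(complete, vt->rtti, target);
}
```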
</sect3>
<sect3 id="dynamic-initialisation">
<title>2.6.15. Dynamic initialisation</title>
<para>
The dynamic initialisation of variables with static storage duration
in C++ is implemented by means of the TDF <code>initial_value</code>
construct. However, in order for the producer to maintain control
over the order of initialisation, rather than each variable being
initialised separately using <code>initial_value</code>, a single
expression is created which initialises all the variables in a module,
and this initialiser expression is used to initialise a single dummy
variable using <code>initial_value</code>. Note that, while this
enables the variables within a single module to be initialised in
the order in which they are defined, the order of initialisation between
different modules is unspecified.
</para>
<para>
The implementation needs to keep a list of those variables with static
storage duration which have been initialised so that it can call the
destructors for these objects at the end of the program. This is done
by declaring a variable of shape:
<programlisting>
~cpp.destr.type : () -> SHAPE
</programlisting>
for each such object with a non-trivial destructor. Each element
of an array is considered a distinct object. Immediately after the
variable has been initialised the token:
<programlisting>
~cpp.destr.global : ( EXP pd, EXP POINTER c, EXP PROC ) -> EXP TOP
</programlisting>
is called to add the variable to the list of objects to be destroyed.
The first argument is the address of the dummy variable just declared,
the second is the address of the object to be destroyed, and the third
is the destructor to be used. In this way a list giving the objects
to be destroyed, and the order in which to destroy them, is built
up. Note that partially constructed objects are destroyed within
their constructors (see <A HREF="#partial">above</A>) so that only
completely constructed objects need to be considered.
</para>
<para>
The implementation also needs to ensure that it calls the destructors
in this list at the end of the program, including calls of
<code>exit</code>. This is done by calling the token:
<programlisting>
~cpp.destr.init : () -> EXP TOP
</programlisting>
at the start of each <code>initial_value</code> construct. In the
default implementation this uses <code>atexit</code> to register a
function, <code>__TCPPLUS_term</code>, which calls the destructors.
To aid alternative implementations the token:
<programlisting>
~cpp.start : () -> EXP TOP
</programlisting>
is called at the start of the <code>main</code> function; however,
this has no effect in the default implementation.
</para>
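<para>
One way to picture this bookkeeping is the C++ sketch below, which keeps
a linked list of destructor records and runs them from a function
registered with <code>atexit</code>. It is an assumption-laden model, not
the default implementation: <code>DestrRecord</code>,
<code>destr_global</code>, <code>destr_init</code> and
<code>run_destructors</code> are hypothetical names, and running the
destructors in reverse order of registration follows the usual C++ rule.
</para>

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical record, one per object with a non-trivial destructor.
struct DestrRecord {
    DestrRecord* next;
    void* object;
    void (*destroy)(void*);
};

// Head of the list of objects to be destroyed; most recent first.
DestrRecord* destr_list = nullptr;

// Analogue of ~cpp.destr.global: called just after the object has
// been initialised, it pushes the record onto the front of the list.
void destr_global(DestrRecord* rec, void* object, void (*destroy)(void*)) {
    rec->object = object;
    rec->destroy = destroy;
    rec->next = destr_list;
    destr_list = rec;
}

// Plays the role described for __TCPPLUS_term: destroys the listed
// objects, most recently initialised first, emptying the list.
void run_destructors() {
    while (destr_list != nullptr) {
        DestrRecord* rec = destr_list;
        destr_list = rec->next;
        rec->destroy(rec->object);
    }
}

// Analogue of ~cpp.destr.init: registers run_destructors once.
void destr_init() {
    static bool registered = false;
    if (!registered) {
        registered = true;
        std::atexit(run_destructors);
    }
}
```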
</sect3>
<sect3 id="except">
<title>2.6.16. Exception handling</title>
<para>
Conceptually, exception handling can be described in terms of the
following diagram:
<IMG SRC="../images/try.gif" ALT="try stack"/>
At any point in the execution of the program there is a stack of currently
active <code>try</code> blocks and currently active local variables.
A
<code>try</code> block is pushed onto the stack as it is entered and
popped from the stack when it is left (whether directly or via a jump).
A local variable with a non-trivial destructor is pushed onto the
stack just after its constructor has been called at the start of its
scope, and popped from the stack just before its destructor is called
at the end of its scope (including before jumps out of its scope).
Each element of an array is considered a separate object. Each <code>try</code>
block has an associated list of handlers. Each local variable has
an associated destructor.
</para>
<para>
Provided no exception is thrown this stack grows and shrinks in a
well-behaved manner as execution proceeds. When an exception is thrown
an exception manager is invoked to find a matching exception handler.
The exception manager proceeds to execute a loop to unwind the stack
as follows. If the stack is empty then the exception cannot be caught
and
<code>std::terminate</code> is called. Otherwise the top element
is popped from the stack. If this is a local variable then the associated
destructor is called for the variable. If the top element is a
<code>try</code> block then the current exception is compared in turn
to each of the associated handlers. If a match is found then execution
jumps to the handler body, otherwise the exception manager continues
to the next element of the stack.
</para>
<para>
Note that this description is purely conceptual. There is no need
for exception handling to be implemented by a stack in this way (although
the default implementation uses a similar technique). It does however
serve to illustrate the various stages which must exist in any implementation.
</para>
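<para>
The conceptual unwinding loop can be written down as a small C++ model.
Everything here is hypothetical (<code>StackElem</code>,
<code>Outcome</code>, <code>unwind</code>), and handler matching is
reduced to comparing type names, which ignores the real matching rules
for base classes and qualifiers.
</para>

```cpp
#include <cassert>
#include <string>
#include <vector>

// One element of the conceptual stack: either a try block with its
// list of handler types, or a local variable with a destructor.
struct StackElem {
    bool is_try;
    std::string name;                   // variable name or try block label
    std::vector<std::string> handlers;  // used by try blocks only
};

// What the exception manager did while unwinding.
struct Outcome {
    std::vector<std::string> destroyed;  // locals destroyed, in order
    std::string caught_by;               // label of the catching try block
    bool terminated;                     // true if the stack ran out
};

// Pops elements until a try block with a matching handler is found.
// Locals are destroyed as they are popped; an empty stack corresponds
// to calling std::terminate.
Outcome unwind(std::vector<StackElem>& stack, const std::string& exception_type) {
    Outcome out{{}, "", false};
    while (!stack.empty()) {
        StackElem top = stack.back();
        stack.pop_back();
        if (!top.is_try) {
            out.destroyed.push_back(top.name);
            continue;
        }
        for (const std::string& h : top.handlers) {
            if (h == "..." || h == exception_type) {
                out.caught_by = top.name;
                return out;
            }
        }
    }
    out.terminated = true;
    return out;
}
```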
<H4>Try blocks</H4>
<para>
At the start of a <code>try</code> block a variable of shape:
<programlisting>
~cpp.try.type : () -> SHAPE
</programlisting>
is declared corresponding to the stack element for this block. This
is then initialised using the token:
<programlisting>
~cpp.try.begin : ( EXP ptb, EXP POINTER fa, EXP POINTER ca ) -> EXP TOP
</programlisting>
where the first argument is a pointer to this variable, the second
argument is the TDF <code>current_env</code> construct, and the third
argument is the result of the TDF <code>make_local_lv</code> construct
on the label which is used to mark the first handler associated with
the block. Note that the last two arguments enable a TDF
<code>long_jump</code> construct to be applied to transfer control
to the first handler.
</para>
<para>
When control exits from a <code>try</code> block, whether by reaching
the end of the block or jumping out of it, the block is removed from
the stack using the token:
<programlisting>
~cpp.try.end : ( EXP ptb ) -> EXP TOP
</programlisting>
where the argument is a pointer to the <code>try</code> block variable.
</para>
<H4>Local variables</H4>
<para>
The technique used to add a local variable with a non-trivial destructor
to the stack is similar to that used in the dynamic initialisation
of global variables. A local variable of shape <code>~cpp.destr.type</code>
is declared at the start of the variable scope. This is initialised
just after the constructor for the variable is called using the token:
<programlisting>
~cpp.destr.local : ( EXP pd, EXP POINTER c, EXP PROC ) -> EXP TOP
</programlisting>
where the first argument is a pointer to the variable being initialised,
the second is a pointer to the local variable to be destroyed, and
the third is the destructor to be called. At the end of the variable
scope, just before its destructor is called, the token:
<programlisting>
~cpp.destr.end : ( EXP pd ) -> EXP TOP
</programlisting>
where the argument is a pointer to the destructor variable, is called
to remove the local variable destructor from the stack. Note that
partially constructed objects are destroyed within their constructors
(see
<A HREF="#partial">above</A>) so that only completely constructed
objects need to be considered.
</para>
<para>
In cases where the local variable may be conditionally initialised
(for example a temporary variable in the second operand of a <code>||</code>
operation) the local variable of shape <code>~cpp.destr.type</code>
is initialised to the value given by the token:
<programlisting>
~cpp.destr.null : () -> EXP d
</programlisting>
(normally it is left uninitialised). Before the destructor for this
variable is called the value of the token:
<programlisting>
~cpp.destr.ptr : ( EXP pd ) -> EXP POINTER c
</programlisting>
is tested. If <code>~cpp.destr.local</code> has been called for this
variable then this token returns a pointer to the variable, otherwise
it returns a null pointer. The token <code>~cpp.destr.end</code>
and the destructor are only called if this token indicates that the
variable has been initialised.
</para>
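<para>
The conditional case can be illustrated with a short C++ model of a
temporary in the second operand of <code>||</code>. The record layout and
the functions <code>destr_null</code>, <code>destr_local</code>,
<code>destr_ptr</code> and <code>or_with_temporary</code> are hypothetical
analogues written for this example; the removal of the record from the
stack (<code>~cpp.destr.end</code>) is not modelled.
</para>

```cpp
#include <cassert>

// Hypothetical destructor record for a local variable.
struct DestrRecord {
    void* object;
    void (*destroy)(void*);
};

// Analogue of ~cpp.destr.null: a record referring to no object.
DestrRecord destr_null() { return DestrRecord{nullptr, nullptr}; }

// Analogue of ~cpp.destr.local: called just after construction.
void destr_local(DestrRecord* rec, void* object, void (*destroy)(void*)) {
    rec->object = object;
    rec->destroy = destroy;
}

// Analogue of ~cpp.destr.ptr: null unless destr_local has been called.
void* destr_ptr(const DestrRecord* rec) { return rec->object; }

// Number of simulated temporaries currently alive.
int live_temporaries = 0;
void destroy_temp(void*) { --live_temporaries; }

// Models `lhs || temporary`: the temporary is only constructed when
// lhs is false, so its destructor must only run in that case.
bool or_with_temporary(bool lhs) {
    DestrRecord rec = destr_null();
    int temp = 0;
    bool result = lhs;
    if (!lhs) {
        temp = 1;  // simulated construction of the temporary
        ++live_temporaries;
        destr_local(&rec, &temp, destroy_temp);
        result = (temp != 0);
    }
    if (destr_ptr(&rec) != nullptr) {
        rec.destroy(rec.object);  // destroyed only if it was initialised
    }
    return result;
}
```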
<H4>Throwing an exception</H4>
<para>
When a <code>throw</code> expression with an argument is encountered
a number of steps are performed. Firstly, space is allocated to hold
the exception value using the token:
<programlisting>
~cpp.except.alloc : ( EXP VARIETY size_t ) -> EXP pv
</programlisting>
the argument of which gives the size of the value. The space allocated
is returned as an expression of type <code>void *</code>. Secondly,
the exception value is copied into the space allocated, using a copy
constructor if appropriate. Finally the exception is raised using
the token:
<programlisting>
~cpp.except.throw : ( EXP pv, EXP pti, EXP PROC ) -> EXP BOTTOM
</programlisting>
The first argument gives the pointer to the exception value, returned
by
<code>~cpp.except.alloc</code>, the second argument gives a pointer
to the run-time type information for the exception type, and the third
argument gives the destructor to be called to destroy the exception
value (if any). This token sets the current exception to the given
values and invokes the exception manager as above.
</para>
<para>
A <code>throw</code> expression without an argument results in a call
to the token:
<programlisting>
~cpp.except.rethrow : () -> EXP BOTTOM
</programlisting>
which re-invokes the exception manager with the current exception.
If there is no current exception then the implementation should call
<code>std::terminate</code>.
</para>
<H4>Handling an exception</H4>
<para>
The exception manager proceeds to find an exception handler in the manner
described above, unwinding the stack and calling destructors for local
variables. When a <code>try</code> block is popped from the stack
a TDF <code>long_jump</code> is applied to transfer control to its
list of handlers. For each handler in turn it is checked whether
the handler can catch the current exception. For <code>...</code>
handlers this is always true; for other handlers it is checked using
the token:
<programlisting>
~cpp.except.catch : ( EXP pti ) -> EXP VARIETY int
</programlisting>
where the argument is a pointer to the run-time type information for
the handler type. This token gives 1 if the exception is caught by
this handler, and 0 otherwise. If the exception is not caught by
the handler then the next handler is checked, until there are no more
handlers associated with the <code>try</code> block. In this case
control is passed back to the exception manager by re-throwing the
current exception using <code>~cpp.except.rethrow</code>.
</para>
<para>
If an exception is caught by a handler then a number of steps are
performed. Firstly, if appropriate, the handler variable is initialised
by copying the current exception value. A pointer to the current
exception value can be obtained using the token:
<programlisting>
~cpp.except.value : () -> EXP pv
</programlisting>
Once this initialisation is complete the token:
<programlisting>
~cpp.except.caught : () -> EXP TOP
</programlisting>
is called to indicate that the exception has been caught. The handler
body is then executed. When control exits from the handler, whether
by reaching the end of the handler or by jumping out of it, the token:
<programlisting>
~cpp.except.end : () -> EXP TOP
</programlisting>
is called to indicate that the exception has been completed. Note
that the implementation should call the destructor for the current
exception and free the space allocated by <code>~cpp.except.alloc</code>
at this point. Execution then continues with the statement following
the handler.
</para>
<para>
To conclude, the TDF generated for a <code>try</code> block and its
associated list of handlers has the form:
<programlisting>
variable (
    long_jump_access,
    stack_tag,
    make_value ( ~cpp.try.type ),
    conditional (
        handler_label,
        sequence (
            ~cpp.try.begin (
                obtain_tag ( stack_tag ),
                current_env,
                make_local_lv ( handler_label ) ),
            <I>try-block-body</I>,
            ~cpp.try.end ),
        conditional (
            catch_label_1,
            sequence (
                integer_test (
                    not_equal,
                    catch_label_1,
                    ~cpp.except.catch (
                        <I>handler-1-typeid</I> ) ),
                variable (
                    handler_tag_1,
                    <I>handler-1-init</I> (
                        ~cpp.except.value ),
                    sequence (
                        ~cpp.except.caught,
                        <I>handler-1-body</I> ) ),
                ~cpp.except.end ),
            conditional (
                catch_label_2,
                <I>further-handlers</I>,
                ~cpp.except.rethrow ) ) ) )
</programlisting>
</para>
<para>
Note that for a local variable to maintain its previous value when
an exception is caught in this way it is necessary to declare it
using the TDF <code>long_jump_access</code> construct. Any local
variable which contains a <code>try</code> block in its scope is declared
in this way.
</para>
<para>
To aid implementations in the writing of exception managers the following
standard tokens are provided:
<programlisting>
~cpp.ptr.code : () -> SHAPE POINTER ca
~cpp.ptr.frame : () -> SHAPE POINTER fa
~cpp.except.jump : ( EXP POINTER fa, EXP POINTER ca ) -> EXP BOTTOM
</programlisting>
These give the shape of the TDF <code>make_local_lv</code> construct,
the shape of the TDF <code>current_env</code> construct, and direct
access to the TDF <code>long_jump</code> construct. The exception manager
in the default implementation is a function called <code>__TCPPLUS_throw</code>.
</para>
<H4>Exception specifications</H4>
<para>
If a function is declared with an exception specification then extra
code needs to be generated in the function definition to catch any
unexpected exceptions thrown by the function and to call
<code>std::unexpected</code>. Since this is a potentially high overhead for small functions,
this extra code is not generated if it can be proved that such unexpected
exceptions can never be thrown (the analysis is essentially the same
as that in the
<A HREF="pragma.html#exception">exception analysis</A> check).
</para>
<para>
The implementation of exception specification is to enclose the entire
function definition in a <code>try</code> block. The handler for
this block uses <code>~cpp.except.catch</code> to check whether the
current exception can be caught by any of the types listed in the
exception specification. If so the current exception is re-thrown.
If none of these types catch the current exception then the token:
<programlisting>
~cpp.except.bad : ( SIGNED_NAT ) -> EXP TOP
</programlisting>
is called. The argument is 1 if the exception specification includes
the special type <code>std::bad_exception</code>, and 0 otherwise.
The implementation should call <code>std::unexpected</code>, but how
any exceptions thrown during this call are to be handled depends on
the value of the argument.
</para>
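<para>
In source terms the transformation resembles the following C++ sketch for
a function whose exception specification lists only
<code>std::range_error</code>. The names <code>Unexpected</code>,
<code>except_bad</code> and <code>with_spec_range_error</code> are
hypothetical. Because <code>std::unexpected</code> is no longer available
in current C++, the stand-in for <code>~cpp.except.bad</code> throws a
marker type instead of calling it, which is an assumption of the sketch
and not the behaviour of the default implementation.
</para>

```cpp
#include <cassert>
#include <stdexcept>

// Marker thrown by the sketch where a real implementation would call
// std::unexpected.
struct Unexpected {};

// Stand-in for ~cpp.except.bad; the argument says whether the
// specification includes std::bad_exception (unused in this sketch).
[[noreturn]] void except_bad(int has_bad_exception) {
    (void)has_bad_exception;
    throw Unexpected{};
}

// Runs `body` as if it were the definition of a function declared
// with the exception specification throw(std::range_error).
template <class Body>
int with_spec_range_error(Body body) {
    try {
        return body();
    } catch (const std::range_error&) {
        throw;          // listed in the specification: re-throw
    } catch (...) {
        except_bad(0);  // not listed: report an unexpected exception
    }
}
```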
</sect3>
<sect3 id="mangle">
<title>2.6.17. Mangled identifier names</title>
<para>
In a similar fashion to other C++ compilers, the C++ producer needs
a method of mapping C++ identifiers to a form suitable for further
processing, namely TDF tag names. This mangled name contains an encoding
of the identifier name, its parent namespace or class and its type.
Identifiers with C linkage are not mangled. The producer contains
a built-in <A HREF="man.html#unmangle">name unmangler</A>
which performs the reverse operation of transforming the mangled form
of an identifier name back to the underlying identifier. This can
be useful when analysing system linker errors.
</para>
<para>
Note that the type of an identifier forms part of its mangled name
not only for functions, but also for variables. Many other compilers
do not mangle variable names; however, the ISO C++ rules on namespaces
and variables with C linkage make it necessary (this can be suppressed
using the <code>-j-n</code> command-line option). Declaring the language
linkage of a variable inconsistently can therefore lead to linking
errors with the C++ producer which are not detected by other compilers.
A common example is:
<programlisting>
extern int errno ;
</programlisting>
which, leaving aside whether <code>errno</code> is actually an external
variable, should be:
<programlisting>
extern "C" int errno ;
</programlisting>
</para>
<para>
As described above, the mangled form of an identifier has three components:
the identifier name, the identifier namespace and the identifier type.
Two underscores (<code>__</code>) are used to separate the name component
from the namespace and type components. The mangling scheme used
is based on that described in the ARM. The description below is not
complete; the mangling and unmangling routines themselves should be
consulted for a complete description.
</para>
<H4>Mangling identifier names</H4>
<para>
Simple identifier names are mapped to themselves. Unicode characters
of the forms <code>\u</code><I>xxxx</I> and <code>\U</code><I>xxxxxxxx</I>
are mapped to <code>__k</code><I>xxxx</I> and <code>__K</code><I>xxxxxxxx</I>
respectively, where the hex digits are output in their canonical lower-case
form. Constructors are mapped to <code>__ct</code> and destructors
to <code>__dt</code>. Conversion functions are mapped to
<code>__op</code><I>type</I> where <I>type</I> is the mangled form
of the conversion type. Overloaded operator functions,
<code>operator@</code>, are mapped as follows:
</para>
<table>
<tr><th>Operator</th> <th>Mapping</th>
<th>Operator</th> <th>Mapping</th>
<th>Operator</th> <th>Mapping</th>
</tr>
<tr><td>&amp;</td> <td>__ad</td>
<td>&amp;=</td> <td>__aad</td>
<td>[]</td> <td>__vc</td>
</tr>
<tr><td>-></td> <td>__rf</td>
<td>->*</td> <td>__rm</td>
<td>=</td> <td>__as</td>
</tr>
<tr><td>,</td> <td>__cm</td>
<td>~</td> <td>__co</td>
<td>/</td> <td>__dv</td>
</tr>
<tr><td>/=</td> <td>__adv</td>
<td>==</td> <td>__eq</td>
<td>()</td> <td>__cl</td>
</tr>
<tr><td>></td> <td>__gt</td>
<td>>=</td> <td>__ge</td>
<td>&lt;</td> <td>__lt</td>
</tr>
<tr><td>&lt;=</td> <td>__le</td>
<td>&amp;&amp;</td> <td>__aa</td>
<td>||</td> <td>__oo</td>
</tr>
<tr><td>&lt;&lt;</td> <td>__ls</td>
<td>&lt;&lt;=</td> <td>__als</td>
<td>-</td> <td>__mi</td>
</tr>
<tr><td>-=</td> <td>__ami</td>
<td>--</td> <td>__mm</td>
<td>!</td> <td>__nt</td>
</tr>
<tr><td>!=</td> <td>__ne</td>
<td>|</td> <td>__or</td>
<td>|=</td> <td>__aor</td>
</tr>
<tr><td>+</td> <td>__pl</td>
<td>+=</td> <td>__apl</td>
<td>++</td> <td>__pp</td>
</tr>
<tr><td>%</td> <td>__md</td>
<td>%=</td> <td>__amd</td>
<td>>></td> <td>__rs</td>
</tr>
<tr><td>>>=</td> <td>__ars</td>
<td>*</td> <td>__ml</td>
<td>*=</td> <td>__aml</td>
</tr>
<tr><td>^</td> <td>__er</td>
<td>^=</td> <td>__aer</td>
<td>delete</td> <td>__dl</td>
</tr>
<tr><td>delete []</td> <td>__vd</td>
<td>new</td> <td>__nw</td>
<td>new []</td> <td>__vn</td>
</tr>
<tr><td>?:</td> <td>__cn</td>
<td>:</td> <td>__cs</td>
<td>::</td> <td>__cc</td>
</tr>
<tr><td>.</td> <td>__df</td>
<td>.*</td> <td>__dm</td>
<td>abs</td> <td>__ab</td>
</tr>
<tr><td>max</td> <td>__mx</td>
<td>min</td> <td>__mn</td>
<td>sizeof</td> <td>__sz</td>
</tr>
<tr><td>typeid</td> <td>__td</td>
<td>vtable</td> <td>__tb</td>
<td>-</td> <td>-</td>
</tr>
</table>
<para>
Note that this table contains a number of operators which are not
part of C++ or cannot be overloaded in C++. These are used in the
representation of target dependent integer constants.
</para>
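<para>
As a rough illustration of the operator mapping, the following sketch
looks up a handful of rows copied from the table above. The names
<code>OperatorMangling</code>, <code>operator_table</code> and
<code>mangle_operator</code> are hypothetical and are not taken from
the producer's own mangling routines.
</para>

```cpp
#include <cstring>

// A few rows copied from the operator table above; the full table is longer.
struct OperatorMangling {
    const char *op;       // operator spelling, e.g. "+="
    const char *mangled;  // mangled name component, e.g. "__apl"
};

static const OperatorMangling operator_table[] = {
    { "+",   "__pl" }, { "+=",     "__apl" }, { "++", "__pp" },
    { "-",   "__mi" }, { "-=",     "__ami" }, { "--", "__mm" },
    { "==",  "__eq" }, { "!=",     "__ne"  }, { "!",  "__nt" },
    { "[]",  "__vc" }, { "()",     "__cl"  }, { "->", "__rf" },
    { "new", "__nw" }, { "delete", "__dl"  },
};

// Hypothetical helper: returns the mangled form, or a null pointer if the
// operator is not present in this partial table.
const char *mangle_operator(const char *op)
{
    for (const OperatorMangling &row : operator_table) {
        if (std::strcmp(row.op, op) == 0) return row.mangled;
    }
    return nullptr;
}
```

<para>
A caller would append the <code>__</code> separator and the mangled
namespace and type components to the returned string.
</para>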
<H4>Mangling namespace names</H4>
<para>
The global namespace is mapped to an empty string. Simple namespace
and class names are mapped as above, but are preceded by a series
of decimal digits giving the length of the mangled name. Nested namespaces
and classes are represented by a sequence of such namespace names,
preceded by the number of elements in the sequence. This takes the
form <code>Q</code><I>digit</I> if there are fewer than 10 elements,
or
<code>Q_</code><I>digits</I><code>_</code> if there are 10 or
more. Note that members of anonymous classes or namespaces are local
to their translation unit, and so do not have external tag names.
</para>
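<para>
The length-prefix and <code>Q</code> rules can be sketched as follows.
This is an illustration only: the helper names are hypothetical, and
the choice to omit the <code>Q</code> prefix for a single enclosing
scope is inferred from examples such as <code>__ct__1A</code> rather
than from the mangling routines themselves.
</para>

```cpp
#include <string>
#include <vector>

// Length-prefix a single name, e.g. "A" becomes "1A".
std::string mangle_simple_scope(const std::string &name)
{
    return std::to_string(name.size()) + name;
}

// Hypothetical helper: mangle a chain of enclosing namespaces and classes,
// outermost first. An empty chain stands for the global namespace.
std::string mangle_scope(const std::vector<std::string> &chain)
{
    if (chain.empty()) return "";
    std::string out;
    if (chain.size() > 1) {
        const std::string count = std::to_string(chain.size());
        // Single-digit counts use Qn; larger counts use Q_n_.
        out = (chain.size() < 10) ? "Q" + count : "Q_" + count + "_";
    }
    for (const std::string &name : chain) out += mangle_simple_scope(name);
    return out;
}
```

<para>
Under these assumptions a class <code>A</code> nested in a namespace
<code>N</code> would give <code>Q21N1A</code>, while <code>A</code>
at global scope gives <code>1A</code>.
</para>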
<H4>Mangling types</H4>
<para>
The mangling of types is essentially similar to that used in the
<A HREF="dump.html">symbol table dump</A> format. The type used in
the mangled name for an identifier ignores the return type for a function
and ignores the most significant bound for an array.
</para>
<para>
The built-in types are mapped in precisely the same way as in the
<A HREF="dump.html#built-in">symbol table dump</A>. Class and enumeration
types are mapped to their type names mangled in the same way as the
namespace names above. The exception to this is that in a class member,
the parent class is mapped to <code>X</code>.
</para>
<para>
The composite types are again mapped in a similar fashion to that
in the <A HREF="dump.html#composite">dump file</A>. For example,
<code>PCc</code> represents <code>const char *</code>. The only difficult
case concerns function parameter types where the ARM
<code>T</code> and <code>N</code> encodings are used for duplicate
parameter types. The function return type is included in the mangled
form except for function identifier types. In the cases where the
identifier is known always to represent a function (constructors,
destructors etc.) the initial <code>F</code>
indicating a function type is also omitted.
</para>
<para>
The types of template functions and classes are represented by the
underlying template and the template arguments giving rise to the
instance. Template classes are preceded by <code>t</code>; template
functions are preceded by <code>G</code> rather than <code>F</code>.
Type arguments are represented by <code>Z</code> followed by the type
value; non-type arguments are represented by the argument type followed
by the argument value. In the underlying type the template parameters
are represented by <code>m0</code>, <code>m1</code> etc. An alternative
scheme, in which the mangled form of a template function includes
the type of that instance, rather than the underlying template, can
be enabled using the <code>-j-f</code>
command-line option.
</para>
<H4><A id="other">Other mangled names</A></H4>
<para>
The <A HREF="#vtable">virtual function table</A> for a class, when
this is a variable with external linkage, is named <code>__vt__</code><I>type
</I>, where <I>type</I> is the mangled form of the class name. The
virtual function table for a base class is named <code>__vt__</code><I>base</I>
where <I>base</I> is a sequence of mangled class names specifying
the base class. The <A HREF="#rtti">run-time type information structure</A>
for a type, when this is a variable with external linkage, is named
<code>__ti__</code><I>type</I>, where <I>type</I> is the mangled form
of the type name.
</para>
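<para>
A minimal sketch of how these names are formed, assuming the class or
type name has already been mangled (for example <code>1A</code> for a
global class <code>A</code>). The function names are hypothetical.
</para>

```cpp
#include <string>

// Name of the virtual function table for an already mangled class name.
std::string vtable_name(const std::string &mangled_class)
{
    return "__vt__" + mangled_class;
}

// Name of the run-time type information structure for a mangled type name.
std::string type_info_name(const std::string &mangled_type)
{
    return "__ti__" + mangled_type;
}
```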
<H4>Mangled name examples</H4>
<para>
The following gives some examples of the name mangling scheme:
<programlisting>
class A {
static int a ; // a__1Ai
public :
A () ; // __ct__1A
A ( int ) ; // __ct__1Ai
A ( const A & ) ; // __ct__1ARCX
virtual ~A () ; // __dt__1A
operator bool () ; // __opb__1A
bool operator! () ; // __nt__1A
} ;
// virtual function table __vt__1A
// run-time type information __ti__1A
int f ( A *, int, A * ) ; // f__FP1AiT1
int b = 2 ; // b__i
int c [3] ; // c__A_i
namespace N {
int *p = 0 ; // p__1NPi
}
</programlisting>
</para>
</sect3>
</sect2>
<sect2>
<title>2.7. Standard library</title>
<para>
At present the default implementation contains only a very small fraction
of the ISO C++ library, namely those headers -
<code><exception></code>, <code><new></code> and
<code><typeinfo></code> - which are an integral part of the
language specification. These headers are also those which require
the most cooperation between the producer and the library implementation,
as described in the <A HREF="lib.html">previous section</A>.
</para>
<para>
It is suggested that if further library components are required then
they be acquired from third parties. It should be noted however that
such libraries may require <A HREF="#porting">some effort</A> to be
ported to an ISO compliant compiler; for example, some information
on porting the <code>libio</code> component of <code>libg++</code>,
which contains some very compiler-dependent code, is
<A HREF="#libio">given below</A>. Libraries compiled with other C++
compilers may not link correctly with modules compiled using <code>tcc</code>.
</para>
<sect3 id="porting">
<title>2.7.1. Common porting problems</title>
<para>
Experience in porting pre-ISO C++ programs has shown that the following
new ISO C++ features tend to cause the most problems:
<itemizedlist>
<listitem><A HREF="pragma.html#implicit">Implicit <code>int</code></A> has
been banned.
</listitem>
<listitem><A HREF="pragma.html#string">String literals are now <code>const</code>
</A>, although in simple assignments the <code>const</code> is
implicitly removed.
</listitem>
<listitem>The scope of a <A HREF="pragma.html#for">variable declared in
a for-init-statement</A> is the <code>for</code> statement itself.
</listitem>
<listitem><A HREF="lib.html#mangle">Variables have linkage</A> and so should
be declared <code>extern "C"</code> if appropriate.
</listitem>
<listitem>The standard C library is now declared in the <code>std</code>
namespace.
</listitem>
<listitem>The <A HREF="pragma.html#template">template compilation model</A>
has been clarified. The notation for explicit instantiation and
specialisation has changed.
</listitem>
<listitem>Templates are analysed at their point of definition as well as
their point of instantiation.
</listitem>
<listitem><A HREF="pragma.html#keyword">New keywords</A> have been introduced.
</listitem>
</itemizedlist>
Note that many of these features have controlling <code>#pragma</code>
directives, so that it is possible to switch to using the pre-ISO
features.
</para>
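<para>
The following fragment illustrates several of these points in ISO
form. It is a sketch written for this guide, not code taken from any
ported package; the names <code>porting_demo_total</code> and
<code>count_letters</code> are invented.
</para>

```cpp
#include <cstring>   // ISO spelling of <string.h>; its names are in namespace std

extern "C" {
    int porting_demo_total = 0;   // C linkage: the variable name is not mangled
}

// The return type is written out (no implicit int), and the parameter is
// const because string literals are arrays of const char.
int count_letters(const char *text)
{
    int count = 0;
    for (int i = 0; text[i] != '\0'; ++i) {   // this i lives only inside the loop
        ++count;
    }
    for (int i = 0; i < count; ++i) {         // so a second i does not clash
        ++porting_demo_total;
    }
    return count == static_cast<int>(std::strlen(text)) ? count : -1;
}
```

<para>
Pre-ISO code that relied on the first <code>i</code> remaining in
scope after the loop, or that passed string literals to
<code>char *</code> parameters, would need the corresponding
<code>#pragma</code> directives or a source change.
</para>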
</sect3>
<sect3 id="libio">
<title>2.7.2. Porting <code>libio</code></title>
<para>
Perhaps the library component which is most likely to be required
is
<code><iostream></code>. A readily available freeware implementation
of a pre-ISO (i.e. non-template) <code><iostream></code>
package is given by the <code>libio</code> component of <code>libg++</code>.
This section describes some of the problems encountered in porting
this package (version 2.7.1).
</para>
<para>
The <A HREF="man.html"><code>tcc</code> compiler flags</A> used in
porting <code>libio</code> were:
<programlisting>
tcc -Yposix -Yc++ -sC:cc
</programlisting>
indicating that the POSIX API is to be used and that the <code>.cc</code>
suffix is used to identify C++ source files.
</para>
<para>
In <code>iostream.h</code>, <code>cin</code>, <code>cout</code>,
<code>cerr</code> and <code>clog</code> should be declared with C
linkage, otherwise the C++ producer includes the type in the
<A HREF="lib.html#mangle">mangled name</A> and the fake
<code>iostream</code> hacks in <code>stdstream.cc</code> don't work.
The definition of <code>EOF</code> in this header can cause problems
if both <code>iostream.h</code> and <code>stdio.h</code> are included.
In this case <code>stdio.h</code> should be included first.
</para>
<para>
In <code>stdstream.cc</code>, the <A HREF="lib.html#derive">correct
definitions</A> for the fake <code>iostream</code> structures are
as follows:
<programlisting>
struct _fake_istream::myfields {
_ios_fields *vb ; // pointer to virtual base class ios
_IO_ssize_t _gcount ; // istream fields
void *vptr ; // pointer to virtual function table
} ;
struct _fake_ostream::myfields {
_ios_fields *vb ; // pointer to virtual base class ios
void *vptr ; // pointer to virtual function table
} ;
</programlisting>
The fake definition macros are then defined as follows:
<programlisting>
#define OSTREAM_DEF( NAME, SBUF, TIE, EXTRA_FLAGS )\
extern "C" _fake_ostream NAME = { { &NAME.base, 0 }, .... } ;
#define ISTREAM_DEF( NAME, SBUF, TIE, EXTRA_FLAGS )\
extern "C" _fake_istream NAME = { { &NAME.base, 0, 0 }, .... } ;
</programlisting>
Note that these are declared with C linkage as above.
</para>
<para>
In <code>stdstrbufs.cc</code>, the <A HREF="lib.html#other">correct
definitions</A> for the virtual function table names are as follows:
<programlisting>
#define filebuf_vtable __vt__7filebuf
#define stdiobuf_vtable __vt__8stdiobuf
</programlisting>
Note that the <code>_G_VTABLE_LABEL_PREFIX</code> macro is incorrectly
defined by the configuration process (it should be <code>__vt__</code>),
but the <code>##</code> directives in which it is used don't work
on an ISO compliant preprocessor anyway (token concatenation takes
place after replacement of macro parameters, but before further macro
expansion). The dummy virtual function tables should also be declared
with C linkage to suppress name mangling.
</para>
<para>
In addition, the initialisation of the standard streams relies on
the file pointers <code>stdout</code> etc. being constant expressions,
which in general they are not. The directive:
<programlisting>
#pragma TenDRA++ rvalue token as const allow
</programlisting>
will cause the C++ producer to assume that all <A HREF="token.html#exp">
tokenised rvalue expressions</A> are constant.
</para>
<para>
In <code>streambuf.cc</code>, if <code>errno</code> is to be explicitly
declared it should have C linkage or be declared in the <code>std</code>
namespace.
</para>
<para>
In <code>iomanip.cc</code>, the explicit template instantiations should
be prefixed by <code>template</code>. The corresponding template
declarations in <code>iomanip.h</code> should be declared using
<A HREF="pragma.html#template"><code>export</code></A> (note that
the <code>__GNUG__</code> version uses <code>extern</code>, which
may yet win out over <code>export</code>).
</para>
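<para>
For reference, the ISO notation for explicit instantiation looks like
the following. The templates <code>demo_manip</code> and
<code>demo_twice</code> are hypothetical stand-ins and are not the
declarations found in <code>iomanip.h</code>.
</para>

```cpp
// Hypothetical manipulator-like class template.
template <class T>
class demo_manip {
public:
    explicit demo_manip(T value) : value_(value) {}
    T get() const { return value_; }
private:
    T value_;
};

// Explicit instantiation of a class template: note the template prefix.
template class demo_manip<int>;

// Explicit instantiation of a function template uses the same prefix.
template <class T> T demo_twice(T x) { return x + x; }
template int demo_twice<int>(int);
```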
</sect3>
</sect2>
</sect1>
<sect1>
<title>
C++ Producer Guide: Style guide
</title>
<sect2>
<title>3.1. Source code organisation</title>
<para>
This section describes the basic organisation of the source code for
the C++ producer. This includes the coding conventions applied, the
application programming interface (API) observed and the division
of the code into separate modules.
</para>
<sect3 id="language">
<title>3.1.1. C coding standard</title>
<para>
The C++ producer is written in a subset of C which is compatible with
C++ (it compiles with most C compilers, but also bootstraps itself).
It has been written to conform to the local (OSSG)
<A HREF="index.html#cstyle">C coding standard</A>; most of the conformance
checking being automated by use of a
<A HREF="pragma.html#usr">user-defined compilation profile</A>,
<code>ossg_std.h</code>. The standard macros described in the coding
standard are defined in the standard header <code>ossg.h</code>. This
is included from the header <code>config.h</code> which is included
by all source files. The default definitions for these macros, set
according to the value of <code>__STDC__</code> and other compiler-defined
macros, should be correct, but they can be overridden by defining
the <code>FS_*</code> macros, described in the header, as command-line
options.
</para>
<para>
The most important of these macros are those used to handle function
prototypes, enabling both ISO and pre-ISO C compilers to be accommodated.
Simple function definitions take the form:
<programlisting>
ret function
PROTO_N ( ( p1, p2, ...., pn ) )
PROTO_T ( par1 p1 X par2 p2 X .... X parn pn )
{
....
}
</programlisting>
with the <code>PROTO_N</code> macro being used to list the parameter
names (note the double bracket) and the <code>PROTO_T</code> macro
being used to list the parameter types using <code>X</code> (Cartesian
product) as a separator. The corresponding function declaration will
have the form:
<programlisting>
ret function PROTO_S ( ( par1, par2, ...., parn ) ) ;
</programlisting>
The case where there are no parameter types is defined using:
<programlisting>
ret function
PROTO_Z ()
{
....
}
</programlisting>
and declared as:
<programlisting>
ret function PROTO_S ( ( void ) ) ;
</programlisting>
Functions with ellipses are defined using:
<programlisting>
#if FS_STDARG
#include <stdarg.h>
#else
#include <varargs.h>
#endif
ret function
PROTO_V ( ( par1 p1, par2 p2, ...., parn pn, ... ) )
{
va_list args ;
....
#if FS_STDARG
va_start ( args, pn ) ;
#else
par1 p1 ;
par2 p2 ;
....
parn pn ;
va_start ( args ) ;
p1 = va_arg ( args, par1 ) ;
p2 = va_arg ( args, par2 ) ;
....
pn = va_arg ( args, parn ) ;
#endif
....
va_end ( args ) ;
....
}
</programlisting>
and declared as:
<programlisting>
ret function PROTO_W ( ( par1, par2, ...., parn, ... ) ) ;
</programlisting>
Note that <code><varargs.h></code> does not allow for parameters
preceding the <code>va_alist</code>, so the fixed parameters need
to be explicitly assigned from <code>args</code>.
</para>
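<para>
To make the scheme concrete, here is one way the prototype macros
could be defined for an ISO compiler. These definitions are an
assumption made for illustration; the actual definitions are those in
<code>ossg.h</code> and may differ. The functions
<code>demo_add</code> and <code>demo_zero</code> are invented.
</para>

```cpp
// Assumed ISO-compiler definitions (illustrative only; see ossg.h).
#define PROTO_S(types) types          // declaration: keep the type list
#define PROTO_N(names)                // definition: the name list is dropped
#define PROTO_T(params) (params)      // definition: typed parameter list
#define PROTO_Z() (void)              // definition: no parameters
#define X ,                           // separator used inside PROTO_T

int demo_add PROTO_S((int, int));
int demo_zero PROTO_S((void));

int demo_add
    PROTO_N((a, b))
    PROTO_T(int a X int b)
{
    return a + b;
}

int demo_zero
    PROTO_Z()
{
    return 0;
}

#undef X   // keep the one-letter macro from leaking out of this sketch
```

<para>
With these definitions the definition of <code>demo_add</code>
expands to <code>int demo_add (int a , int b)</code>. A pre-ISO
variant would instead keep the name list from <code>PROTO_N</code> and
turn <code>X</code> into a semicolon, giving an old-style definition.
</para>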
<para>
The following <A HREF="pragma.html#keyword">TenDRA keywords</A> are
defined (with suitable default values for non-TenDRA compilers):
<programlisting>
#pragma TenDRA keyword SET for set
#pragma TenDRA keyword UNUSED for discard variable
#pragma TenDRA keyword IGNORE for discard value
#pragma TenDRA keyword EXHAUSTIVE for exhaustive
#pragma TenDRA keyword REACHED for set reachable
#pragma TenDRA keyword UNREACHED for set unreachable
#pragma TenDRA keyword FALL_THROUGH for fall into case
</programlisting>
</para>
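<para>
For a non-TenDRA compiler the keywords have to reduce to ordinary
code. The fallback definitions below are an assumption about what
"suitable default values" might look like, not a copy of
<code>ossg.h</code>; the functions are invented to show typical use.
</para>

```cpp
// Assumed fallbacks for compilers without the TenDRA keywords.
#define SET(name)
#define UNUSED(name) (void)(name)
#define IGNORE (void)
#define EXHAUSTIVE
#define REACHED
#define UNREACHED
#define FALL_THROUGH

static int demo_note() { return 0; }   // stands in for a call whose result is dropped

int demo_classify(int kind, int spare)
{
    int score = 0;
    UNUSED(spare);                 // the parameter is deliberately unused
    switch (kind) EXHAUSTIVE {
        case 0:
            score += 1;
            FALL_THROUGH           // falling into case 1 is intended
        case 1:
            score += 10;
            break;
        default:
            score = -1;
            break;
    }
    IGNORE demo_note();            // the return value is deliberately discarded
    return score;
}
```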
<para>
Various flags giving properties of the compiler being used are defined
in <code>ossg.h</code>. Among the most useful are <code>FS_STDARG</code>,
which is true if the compiler supports ISO ellipsis functions through
<code><stdarg.h></code> (see above),
and <code>FS_STDC_HASH</code>, which is true if the preprocessor supports
the ISO stringising and concatenation operators. The macros
<code>CONST</code> and <code>VOLATILE</code>, to be used in place
of
<code>const</code> and <code>volatile</code>, are also defined.
</para>
<para>
A policy of rigorous static program checking is enforced. The TenDRA
C producer is applied with the user-defined compilation mode
<code>ossg_std.h</code> and intermodule checks enabled. Checking
is applied with both the C and <code>#pragma token</code>
<A HREF="../utilities/calc.html"><code>calculus</code> output files</A>.
The C++ producer itself is applied with the same checks. <code>gcc
-Wall</code> and various versions of <code>lint</code> are also periodically
applied.
</para>
</sect3>
<sect3 id="api">
<title>3.1.2. API usage and target dependencies</title>
<para>
Most of the API features used in the C++ producer are to be found
in the ISO C API, with just a couple of extensions from POSIX required.
These POSIX features can be disabled with minimal loss of functionality
by defining the macro <code>FS_POSIX</code> to be false.
</para>
<para>
The following features are used from the ISO <code><stdio.h></code>
header:
<programlisting>
BUFSIZ EOF FILE SEEK_SET
fclose fflush fgetc fgets
fopen fprintf fputc fputs
fread fseek fwrite rewind
sprintf stderr stdin stdout
vfprintf
</programlisting>
from the ISO <code><stdlib.h></code> header:
<programlisting>
EXIT_SUCCESS EXIT_FAILURE NULL abort
exit free malloc realloc
size_t
</programlisting>
and from the ISO <code><string.h></code> header:
<programlisting>
memcmp memcpy strchr strcmp
strcpy strlen strncmp strrchr
</programlisting>
The three headers just mentioned are included in all source files
via the
<code>ossg_api.h</code> header file (included by <code>config.h</code>).
The remaining headers are only included as and when they are needed.
The following features are used from the ISO <code><ctype.h></code>
header:
<programlisting>
isalpha isprint
</programlisting>
from the ISO <code><limits.h></code> header:
<programlisting>
UCHAR_MAX UINT_MAX ULONG_MAX
</programlisting>
from the ISO <code><stdarg.h></code> header:
<programlisting>
va_arg va_end va_list va_start
</programlisting>
(note that if <code>FS_STDARG</code> is false the XPG3
<code><varargs.h></code> header is used instead); and from the
ISO
<code><time.h></code> header:
<programlisting>
localtime time time_t struct tm
tm::tm_hour tm::tm_mday tm::tm_min tm::tm_mon
tm::tm_sec tm::tm_year
</programlisting>
The following features are used from the POSIX
<code><sys/stat.h></code> header:
<programlisting>
stat struct stat stat::st_dev stat::st_ino
stat::st_mtime
</programlisting>
The <code><sys/types.h></code> header is also included to provide
the necessary types for <code><sys/stat.h></code>.
</para>
<para>
There are a couple of target dependencies in the producer which can
be overridden using command-line options:
<itemizedlist>
<listitem>It assumes that if a count of the number of characters read from
an input file is maintained, then that count value can be used as
an argument to <code>fseek</code>. This may not be true on machines
where the end of line marker consists of both a newline and a carriage
return. In this case the <code>-m-f</code> command-line option can
be used to switch to a slower, but more portable, algorithm for setting
file positions.
</listitem>
<listitem>It assumes that a file is uniquely determined by the
<code>st_dev</code> and <code>st_ino</code> fields of its corresponding
<code>stat</code> value. This is used when processing
<code>#include</code> directives to prevent a file being read more
than once if this is not necessary. This assumption may not be true
on machines with a small <code>ino_t</code> type which have file systems
mounted from machines with a larger <code>ino_t</code> type. In this
case the <code>-m-i</code> command-line option can be used to disable
this check.
</listitem>
</itemizedlist>
</para>
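<para>
The file identity assumption in the second item amounts to a test
like the one sketched below. The function <code>same_file</code> is
hypothetical and is not the producer's own routine; it requires a
POSIX system.
</para>

```cpp
#include <sys/types.h>
#include <sys/stat.h>

// Hypothetical helper: true if both paths name the same file, judged by the
// st_dev and st_ino fields alone. Returns false if either stat call fails.
bool same_file(const char *a, const char *b)
{
    struct stat sa, sb;
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0) return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

<para>
If inode numbers from a remote file system are truncated to fit a
smaller <code>ino_t</code>, two distinct files could compare equal
here, which is the case the <code>-m-i</code> option guards against.
</para>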
</sect3>
<sect3 id="src">
<title>3.1.3. Source code modules</title>
<para>
For convenience, the source code is divided between a number of directories:
<itemizedlist>
<listitem>The base directory contains only the module containing the
<code>main</code> function, the basic type descriptions and the
<code>Makefile</code>.
</listitem>
<listitem>The directories <code>obj_c</code> and <code>obj_tok</code> contain
respectively the C and <code>#pragma token</code> headers generated
from the type algebra by <A HREF="../utilities/calc.html"><code>calculus</code>
</A>. The directory <code>obj_templ</code> contains certain <code>calculus
</code>
template files.
</listitem>
<listitem>The directory <code>utility</code> contains routines for such
utility operations as memory allocation and error reporting, including
the <A HREF="error.html">error catalogue</A>.
</listitem>
<listitem>The directory <code>parse</code> contains routines concerned with
parsing and preprocessing the input, including the
<A HREF="../utilities/sid.html"><code>sid</code> grammar</A>.
</listitem>
<listitem>The directory <code>construct</code> contains routines for building
up and analysing the internal representation of the parsed code.
</listitem>
<listitem>The directory <code>output</code> contains routines for outputting
the internal representation in various formats including as a
<A HREF="tdf.html">TDF capsule</A>, a <A HREF="link.html">C++ spec
file</A>, or a <A HREF="dump.html">symbol table dump file</A>.
</listitem>
</itemizedlist>
</para>
<para>
Each module consists of a C source file, <code><I>file</I>.c</code>
say, containing function definitions, and a corresponding header file
<code><I>file</I>.h</code> containing the declarations of these functions.
The header is included within its corresponding source file to check
these declarations; it is protected against multiple inclusions by
a macro of the form <code><I>FILE</I>_INCLUDED</code>. The header
contains a brief comment describing the purpose of the module; each
function in the source file contains a comment describing its purpose,
its inputs and its output.
</para>
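<para>
A module following these conventions might be laid out as below. The
module name <code>demo</code> and its single routine are invented;
the two files are shown in one listing for brevity.
</para>

```cpp
/* ---- demo.h : hypothetical module header ---- */
#ifndef DEMO_INCLUDED
#define DEMO_INCLUDED

/* This module provides a toy string length routine. */
extern unsigned long demo_length(const char *);

#endif

/* ---- demo.c : corresponding source file ---- */
/* The line #include "demo.h" would appear here, so that the compiler
   checks the declaration above against the definition below. */

/*
    FIND THE LENGTH OF A STRING

    This routine returns the number of characters in s before the
    terminating null character.
*/
unsigned long demo_length(const char *s)
{
    unsigned long n = 0;
    while (s[n] != '\0') n++;
    return n;
}
```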
<para>
The following table lists all the source modules in the C++ producer
with a brief description of the purpose of each:
</para>
<para>
<table>
<tr><th>Module</th> <th>Directory</th>
<th>Purpose</th>
</tr>
<tr><td>access</td> <td>construct</td>
<td>member access control</td>
</tr>
<tr><td>allocate</td> <td>construct</td>
<td><code>new</code> and <code>delete</code> expressions</td>
</tr>
<tr><td>assign</td> <td>construct</td>
<td>assignment expressions</td>
</tr>
<tr><td>basetype</td> <td>construct</td>
<td>basic type operations</td>
</tr>
<tr><td>buffer</td> <td>utility</td>
<td>buffer reading and writing routines</td>
</tr>
<tr><td>c_class</td> <td>obj_c</td>
<td><code>calculus</code> support routines</td>
</tr>
<tr><td>capsule</td> <td>output</td>
<td>top-level TDF encoding routines</td>
</tr>
<tr><td>cast</td> <td>construct</td>
<td>cast expressions</td>
</tr>
<tr><td>catalog</td> <td>utility</td>
<td>error catalogue definition</td>
</tr>
<tr><td>char</td> <td>parse</td>
<td>character sets</td>
</tr>
<tr><td>check</td> <td>construct</td>
<td>expression checking</td>
</tr>
<tr><td>chktype</td> <td>construct</td>
<td>type checking</td>
</tr>
<tr><td>class</td> <td>construct</td>
<td>class and enumeration definitions</td>
</tr>
<tr><td>compile</td> <td>output</td>
<td>TDF tag definition encoding routines</td>
</tr>
<tr><td>constant</td> <td>parse</td>
<td>integer constant evaluation</td>
</tr>
<tr><td>construct</td> <td>construct</td>
<td>constructors and destructors</td>
</tr>
<tr><td>convert</td> <td>construct</td>
<td>standard type conversions</td>
</tr>
<tr><td>copy</td> <td>construct</td>
<td>expression copying</td>
</tr>
<tr><td>debug</td> <td>utility</td>
<td>development aids</td>
</tr>
<tr><td>declare</td> <td>construct</td>
<td>variable and function declarations</td>
</tr>
<tr><td>decode</td> <td>output</td>
<td>bitstream reading routines</td>
</tr>
<tr><td>derive</td> <td>construct</td>
<td>base class graphs; inherited members</td>
</tr>
<tr><td>destroy</td> <td>construct</td>
<td>garbage collection routines</td>
</tr>
<tr><td>diag</td> <td>output</td>
<td>TDF diagnostic output routines</td>
</tr>
<tr><td>dump</td> <td>output</td>
<td>symbol table dump routines</td>
</tr>
<tr><td>encode</td> <td>output</td>
<td>bitstream writing routines</td>
</tr>
<tr><td>error</td> <td>utility</td>
<td>error output routines</td>
</tr>
<tr><td>exception</td> <td>construct</td>
<td>exception handling</td>
</tr>
<tr><td>exp</td> <td>output</td>
<td>TDF expression encoding routines</td>
</tr>
<tr><td>expression</td> <td>construct</td>
<td>expression processing</td>
</tr>
<tr><td>file</td> <td>parse</td>
<td>low-level I/O routines</td>
</tr>
<tr><td>function</td> <td>construct</td>
<td>function definitions and calls</td>
</tr>
<tr><td>hash</td> <td>parse</td>
<td>hash table and identifier name routines</td>
</tr>
<tr><td>identifier</td> <td>construct</td>
<td>identifier expressions</td>
</tr>
<tr><td>init</td> <td>output</td>
<td>TDF initialiser expression encoding routines</td>
</tr>
<tr><td>initialise</td> <td>construct</td>
<td>variable initialisers</td>
</tr>
<tr><td>instance</td> <td>construct</td>
<td>template instances and specialisations</td>
</tr>
<tr><td>inttype</td> <td>construct</td>
<td>integer and floating point type routines</td>
</tr>
<tr><td>label</td> <td>construct</td>
<td>labels and jumps</td>
</tr>
<tr><td>lex</td> <td>parse</td>
<td>lexical analysis</td>
</tr>
<tr><td>literal</td> <td>parse</td>
<td>integer and string literals</td>
</tr>
<tr><td>load</td> <td>output</td>
<td>C++ spec reading routines</td>
</tr>
<tr><td>macro</td> <td>parse</td>
<td>macro expansion</td>
</tr>
<tr><td>main</td> <td>-</td>
<td>main routine; command-line arguments</td>
</tr>
<tr><td>mangle</td> <td>output</td>
<td>identifier name mangling</td>
</tr>
<tr><td>member</td> <td>construct</td>
<td>member selector expressions</td>
</tr>
<tr><td>merge</td> <td>construct</td>
<td>intermodule merge routines</td>
</tr>
<tr><td>namespace</td> <td>construct</td>
<td>namespaces; name look-up</td>
</tr>
<tr><td>operator</td> <td>construct</td>
<td>overloaded operators</td>
</tr>
<tr><td>option</td> <td>utility</td>
<td>compiler options</td>
</tr>
<tr><td>overload</td> <td>construct</td>
<td>overload resolution</td>
</tr>
<tr><td>parse</td> <td>parse</td>
<td>low-level parser routines</td>
</tr>
<tr><td>pragma</td> <td>parse</td>
<td><code>#pragma</code> directives</td>
</tr>
<tr><td>predict</td> <td>parse</td>
<td>parser look-ahead routines</td>
</tr>
<tr><td>preproc</td> <td>parse</td>
<td>preprocessing directives</td>
</tr>
<tr><td>print</td> <td>utility</td>
<td>error argument printing routines</td>
</tr>
<tr><td>quality</td> <td>construct</td>
<td>extra expression checks</td>
</tr>
<tr><td>redeclare</td> <td>construct</td>
<td>variable and function redeclarations</td>
</tr>
<tr><td>rewrite</td> <td>construct</td>
<td>inline member function definitions</td>
</tr>
<tr><td>save</td> <td>output</td>
<td>C++ spec writing routines</td>
</tr>
<tr><td>shape</td> <td>output</td>
<td>TDF shape encoding routines</td>
</tr>
<tr><td>statement</td> <td>construct</td>
<td>statement processing</td>
</tr>
<tr><td>stmt</td> <td>output</td>
<td>TDF statement encoding routines</td>
</tr>
<tr><td>struct</td> <td>output</td>
<td>TDF structure encoding routines</td>
</tr>
<tr><td>syntax[0-9]*</td> <td>parse</td>
<td><code>sid</code> parser output</td>
</tr>
<tr><td>system</td> <td>utility</td>
<td>system dependent routines</td>
</tr>
<tr><td>table</td> <td>parse</td>
<td>portability table reading</td>
</tr>
<tr><td>template</td> <td>construct</td>
<td>template declarations and checks</td>
</tr>
<tr><td>throw</td> <td>output</td>
<td>TDF exception handling encoding routines</td>
</tr>
<tr><td>tok</td> <td>output</td>
<td>TDF standard tokens encoding</td>
</tr>
<tr><td>tokdef</td> <td>construct</td>
<td>token definitions</td>
</tr>
<tr><td>token</td> <td>construct</td>
<td>token declarations and expansion</td>
</tr>
<tr><td>typeid</td> <td>construct</td>
<td>run-time type information</td>
</tr>
<tr><td>unmangle</td> <td>output</td>
<td>identifier name unmangling</td>
</tr>
<tr><td>variable</td> <td>construct</td>
<td>variable analysis</td>
</tr>
<tr><td>virtual</td> <td>construct</td>
<td>virtual functions</td>
</tr>
<tr><td>xalloc</td> <td>utility</td>
<td>memory allocation routines</td>
</tr>
</table>
</para>
</sect3>
</sect2>
<sect2>
<title>3.2. Type system</title>
<para>
This section describes the type system used in the C++ producer. Unless
otherwise stated the types are declared using the
<A HREF="../utilities/calc.html"><code>calculus</code> tool</A> as
part of the algebra, <code>c_class.alg</code>. The design of this
type algebra is, naturally, largely based on the concepts underlying
the C++ language; however, TDF provided an important influence, not
merely as the intended target language, but also because of its clear
presentation of essential language features.
</para>
<sect3 id="primitive">
<title>3.2.1. Primitive types</title>
<para>
The primitive types used within the algebra <code>c_class</code> are
defined as follows:
<programlisting>
int = "int" ;
unsigned = "unsigned" ;
string = "character *" ;
ulong_type (ulong) = "unsigned long" ;
BITSTREAM_P (bits) = "BITSTREAM *" ;
PPTOKEN_P (pptok) = "PPTOKEN *" ;
</programlisting>
The integral types are self-explanatory. All string literals used
in the C++ producer are based on the character type:
<programlisting>
typedef unsigned char character ;
</programlisting>
hence the definition of <code>string</code>. The remaining primitive
types give links to those portions of the type system which are defined
outside of the algebra. The types <A HREF="#bits"><code>BITSTREAM</code></A>
and <A HREF="#pptok"><code>PPTOKEN</code></A> are described below.
</para>
</sect3>
<sect3 id="cv">
<title>3.2.2. <code>CV_SPEC</code></title>
<para>
The enumeration type <code>CV_SPEC</code> (short name <code>cv</code>)
is used to represent a C++ type qualifier. It takes the form of a
bitfield, the elements of which can be or-ed together to represent
combinations of type qualifiers. The cv-qualifiers are represented
by <code>cv_const</code> and <code>cv_volatile</code> in the obvious
manner. The value <code>cv_lvalue</code> is used as a qualifier to
indicate whether a type is an lvalue or an rvalue. Other values are
used in function types to represent the function language linkage.
</para>
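<para>
The bitfield style can be sketched as follows. Only the names
<code>cv_const</code>, <code>cv_volatile</code> and
<code>cv_lvalue</code> come from the description above; the bit
values, the name <code>cv_none</code> and the helper
<code>has_qualifiers</code> are assumptions made for the example.
</para>

```cpp
// Assumed representation: an unsigned bit mask with one bit per qualifier.
typedef unsigned CV_SPEC;
const CV_SPEC cv_none = 0x0;       // hypothetical name for "no qualifiers"
const CV_SPEC cv_const = 0x1;
const CV_SPEC cv_volatile = 0x2;
const CV_SPEC cv_lvalue = 0x4;

// True if every qualifier in wanted is present in actual.
bool has_qualifiers(CV_SPEC actual, CV_SPEC wanted)
{
    return (actual & wanted) == wanted;
}
```

<para>
A const lvalue would then be written <code>cv_const | cv_lvalue</code>.
</para>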
</sect3>
<sect3 id="ntype">
<title>3.2.3. <code>BUILTIN_TYPE</code></title>
<para>
The enumeration type <code>BUILTIN_TYPE</code> (<code>ntype</code>)
is used to represent the built-in C++ types (<code>char</code>,
<code>float</code>, <code>void</code> etc.). It is used chiefly as
an index into tables of type information.
</para>
</sect3>
<sect3 id="btype">
<title>3.2.4. <code>BASE_TYPE</code></title>
<para>
The enumeration type <code>BASE_TYPE</code> (<code>btype</code>) is
used to represent a C++ simple type specifier such as <code>signed</code>,
<code>short</code> or <code>int</code>. It takes the form of a bitfield,
the elements of which can be or-ed together to represent combinations
of type specifiers. Its chief use is when reading a type from the
input file; the various simple type specifiers are combined to give
a value of this type, which is then mapped to an actual <A HREF="#type">C++
type</A>.
</para>
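<para>
The idea of combining specifiers and then mapping the result to a type
can be sketched as below. All of the <code>btype_*</code> names, their
bit values and the function <code>canonical_type</code> are
hypothetical, and only a few integer types are covered.
</para>

```cpp
#include <string>

// Assumed names and bit values, for illustration only.
typedef unsigned BASE_TYPE;
const BASE_TYPE btype_signed = 0x01;
const BASE_TYPE btype_unsigned = 0x02;
const BASE_TYPE btype_short = 0x04;
const BASE_TYPE btype_long = 0x08;
const BASE_TYPE btype_int = 0x10;

// Map a combination of specifiers to a canonical type name, or to an empty
// string if the combination is empty or contradictory.
std::string canonical_type(BASE_TYPE bt)
{
    const bool is_signed = (bt & btype_signed) != 0;
    const bool is_unsigned = (bt & btype_unsigned) != 0;
    const bool is_short = (bt & btype_short) != 0;
    const bool is_long = (bt & btype_long) != 0;
    if (bt == 0) return "";
    if ((is_signed && is_unsigned) || (is_short && is_long)) return "";
    const std::string sign = is_unsigned ? "unsigned " : "";
    if (is_short) return sign + "short";
    if (is_long) return sign + "long";
    return sign + "int";
}
```

<para>
In this sketch <code>unsigned long int</code> and
<code>long unsigned</code> both reduce to the same value and hence to
the same type, which is the point of accumulating the specifiers in a
bitfield before mapping.
</para>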
</sect3>
<sect3 id="itype">
<title>3.2.5. <code>INT_TYPE</code></title>
<para>
The union type <code>INT_TYPE</code> (<code>itype</code>) is used
to represent an integral or bitfield C++ type. The basic integral
types are given by the <code>basic</code> field. Bitfield types are
represented by the <code>bitfield</code> field. There are also fields
representing target dependent integral promotion, arithmetic and integer
literal types, plus <code>VARIETY</code> tokens. Only one <code>INT_TYPE</code>
object is created for each integral type.
</para>
</sect3>
<sect3 id="ftype">
<title>3.2.6. <code>FLOAT_TYPE</code></title>
<para>
The union type <code>FLOAT_TYPE</code> (<code>ftype</code>) is used
to represent a floating point C++ type. The basic floating point
types are given by the <code>basic</code> field. There are also fields
representing target dependent argument promotion and arithmetic types,
plus <code>FLOAT</code> tokens. Only one <code>FLOAT_TYPE</code>
object is created for each floating point type.
</para>
</sect3>
<sect3 id="cinfo">
<title>3.2.7. <code>CLASS_INFO</code></title>
<para>
The enumeration type <code>CLASS_INFO</code> (<code>cinfo</code>)
is used to represent information relating to a class or enumeration
definition. It takes the form of a bitfield, the elements of which
can be or-ed together to represent various combinations of properties.
</para>
</sect3>
<sect3 id="cusage">
<title>3.2.8. <code>CLASS_USAGE</code></title>
<para>
The enumeration type <code>CLASS_USAGE</code> (<code>cusage</code>)
is used to represent information relating to the way a class is used.
It takes the form of a bitfield, the elements of which can be or-ed
together to represent various combinations of properties.
</para>
</sect3>
<sect3 id="ctype">
<title>3.2.9. <code>CLASS_TYPE</code></title>
<para>
The union type <code>CLASS_TYPE</code> (<code>ctype</code>) is used
to represent a C++ class or union. The main components are an
<A HREF="#id">identifier</A> giving the class name,
<A HREF="#cinfo">class information</A> and <A HREF="#cusage">class
usage</A> fields, a <A HREF="#nspace">namespace</A> giving the class
members, a <A HREF="#graph">graph</A> representing the base class
structure, and a <A HREF="#virt">virtual function table</A>. Only
one
<code>CLASS_TYPE</code> object is created for each class or union.
</para>
<para>
Each class maintains a list, <code>pals</code>, of class and function
identifiers which are declared as friends of that class. It also
maintains a list, <code>chums</code>, of those class types which declare
it to be a friend (this is what is actually used in the access checks).
Similarly each function identifier maintains a list,
<code>chums</code>, of those class types which declare it to be a
friend.
</para>
<para>
Each class maintains a list of its constructors, destructors and conversion
functions (including inherited conversion functions). It also maintains
a list of its virtual base classes. This information can be obtained
by other means but it is more convenient to record it within the class
type itself.
</para>
</sect3>
<sect3 id="graph">
<title>3.2.10. <code>GRAPH</code></title>
<para>
The union type <code>GRAPH</code> (<code>graph</code>) is used to
represent a directed acyclic graph arising from the base classes of
a class. Each node of the graph has a <code>head</code> which is
a
<A HREF="#ctype">class type</A>, and several <code>tails</code> which
give the base class graphs for that class. Each node has pointers,
<code>top</code>, to the top of the graph (i.e. the most derived class),
and <code>up</code>, to the node of which the current node is a direct
base. Each node also has an <code>access</code> field which gives
information on the base access, whether it is virtual or not, and
so on, in the form of a <A HREF="#dspec"><code>DECL_SPEC</code></A>.
Virtual bases are handled by the <code>equal</code> field which defines
an equivalence relation on the graph which identifies equivalent virtual
bases.
</para>
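<para>
The shape of such a graph can be sketched in C as follows. This is an
illustration only: the field names <code>head</code>, <code>tails</code>,
<code>top</code> and <code>up</code> follow the description above, but the
fixed-size array, the use of a string for the class, and the helper
functions <code>graph_add_base</code>, <code>graph_find</code> and
<code>graph_depth</code> are assumptions made for the example, and the
<code>access</code> and <code>equal</code> fields are omitted.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>
#include <string.h>

#define MAX_TAILS 4

/* Simplified stand-in for a GRAPH node. */
typedef struct graph {
    const char *head;               /* class at this node */
    struct graph *tails[MAX_TAILS]; /* direct base class graphs */
    size_t n_tails;
    struct graph *top;              /* most derived class */
    struct graph *up;               /* node of which this is a direct base */
} graph;

/* Attach 'base' as a direct base of 'derived'.
   Returns 0 on success, -1 if 'derived' has no room for another base. */
static int graph_add_base(graph *derived, graph *base)
{
    if (derived->n_tails >= MAX_TAILS) return -1;
    derived->tails[derived->n_tails++] = base;
    base->up = derived;
    base->top = derived->top;
    return 0;
}

/* Depth-first search for the class 'name' at or below 'g'. */
static graph *graph_find(graph *g, const char *name)
{
    size_t i;
    if (strcmp(g->head, name) == 0) return g;
    for (i = 0; i < g->n_tails; i++) {
        graph *r = graph_find(g->tails[i], name);
        if (r != NULL) return r;
    }
    return NULL;
}

/* Number of 'up' links between 'g' and the top of its graph. */
static unsigned graph_depth(const graph *g)
{
    unsigned n = 0;
    while (g->up != NULL) { g = g->up; n++; }
    return n;
}
```
]]></programlisting>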
</sect3>
<sect3 id="virt">
<title>3.2.11. <code>VIRTUAL</code></title>
<para>
The union type <code>VIRTUAL</code> (<code>virt</code>) is used to
represent the virtual functions declared in a class. The <code>table</code>
field is used to represent a virtual function table, and consists
primarily of a list of <code>VIRTUAL</code> objects giving the virtual
functions for the associated class. These virtual functions are of
four kinds, each represented by a union field. A virtual function
first declared in a class is represented by the <code>simple</code>
field; a virtual function in a class which overrides an inherited
virtual function is represented by the <code>override</code> field;
an inherited virtual function which is overridden neither in this class
nor in any base class is represented by the
<code>inherit</code> field; an inherited virtual function which is not
overridden in this class but is overridden in some base class is represented by the
<code>complex</code> field.
</para>
</sect3>
<sect3 id="etype">
<title>3.2.12. <code>ENUM_TYPE</code></title>
<para>
The union type <code>ENUM_TYPE</code> (<code>etype</code>) is used
to represent a C++ enumeration type. This consists primarily of an
<A HREF="#id">identifier</A> giving the enumeration name, a
<A HREF="#cinfo">class information</A> field, a <A HREF="#type">type</A>
giving the underlying representation of the enumeration type, and
a list of <A HREF="#id">identifiers</A> giving the enumerators comprising
the enumeration.
</para>
</sect3>
<sect3 id="type">
<title>3.2.13. <code>TYPE</code></title>
<para>
The union type <code>TYPE</code> (<code>type</code>) is used to represent
a C++ type. Every type has an associated <A HREF="#cv">type qualifier</A>,
<code>qual</code>, which determines whether the type is
<code>const</code>, <code>volatile</code> or an lvalue. A type may
also have an associated <A HREF="#id">identifier</A>, <code>name</code>,
giving the corresponding type name (the null identifier being used
for unnamed types). The other type components are determined by the
union tag. Each of the type constructs above has a corresponding
field in the <code>TYPE</code> union:
<code>integer</code> for <A HREF="#itype">integral types</A>,
<code>floating</code> for <A HREF="#ftype">floating point types</A>,
<code>bitfield</code> for <A HREF="#itype">bitfield types</A>,
<code>compound</code> for <A HREF="#ctype">class or union types</A>,
and
<code>enumerate</code> for <A HREF="#etype">enumeration types</A>.
There are also fields <code>top</code> and <code>bottom</code>
corresponding to <code>void</code> and bottom (the type used to represent
values which never return).
</para>
<para>
Other fields of the <code>TYPE</code> union represent composite types;
for example, the <code>array</code> field, representing array types,
comprises a base type, <code>sub</code>, and an <A HREF="#nat">integer
constant</A> giving the array bound, <code>size</code>. These are
generally simple, apart from <code>func</code>, representing a function
type. This has the obvious components: a return type, <code>ret</code>,
a list of parameter types, <code>ptypes</code>, and a flag indicating
ellipsis functions, <code>ellipsis</code>. It also has an associated
<A HREF="#nspace">namespace</A>, <code>pars</code>, in which the function
parameters are declared. The parameter identifiers are extracted
from this as a list, <code>pids</code>. Member function qualifiers
and language linkage information are represented by a
<A HREF="#cv"><code>CV_QUAL</code></A>, <code>mqual</code>. The implicit
extra parameter for member functions is recorded in the list
<code>mtypes</code>, which adds this extra type to the start of
<code>ptypes</code>. Finally <code>except</code> gives any exception
specifiers; the case where the exception specifier is absent being
represented by the special value, <code>univ_type_set</code>.
</para>
</sect3>
<sect3 id="dspec">
<title>3.2.14. <code>DECL_SPEC</code></title>
<para>
The enumeration type <code>DECL_SPEC</code> (<code>dspec</code>) is
used to represent information on the declaration and usage of an identifier.
It takes the form of a bitfield, the elements of which can be or-ed
together to represent various combinations of properties. The 32
bits in this bitfield (the maximum which can be represented portably)
are a significant restriction: the same member of
<code>DECL_SPEC</code> is often used to mean different things in different
contexts, which can prove confusing on occasion.
</para>
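<para>
The bitfield style used by <code>DECL_SPEC</code> (and by the other
enumeration types described as bitfields in this section) can be sketched
as below. The flag names <code>DS_STATIC</code>, <code>DS_EXTERN</code>,
<code>DS_INLINE</code> and <code>DS_DEFINED</code> and the helpers
<code>ds_join</code> and <code>ds_has</code> are hypothetical; they are not
the producer's actual members.
</para>
<programlisting><![CDATA[
```c
#include <stdint.h>

/* Hypothetical flag names; the producer's real DECL_SPEC members differ. */
typedef uint32_t decl_spec;

#define DS_NONE    ((decl_spec)0)
#define DS_STATIC  ((decl_spec)1 << 0)
#define DS_EXTERN  ((decl_spec)1 << 1)
#define DS_INLINE  ((decl_spec)1 << 2)
#define DS_DEFINED ((decl_spec)1 << 31)   /* the last of the 32 available bits */

/* Combine two sets of properties by or-ing them together. */
static decl_spec ds_join(decl_spec a, decl_spec b)
{
    return a | b;
}

/* Test whether every bit of 'mask' is set in 'ds'. */
static int ds_has(decl_spec ds, decl_spec mask)
{
    return (ds & mask) == mask;
}
```
]]></programlisting>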
</sect3>
<sect3 id="hashid">
<title>3.2.15. <code>HASHID</code></title>
<para>
The union type <code>HASHID</code> (<code>hashid</code>) is used to
represent a C++ identifier name. The simplest form of identifier
name,
<code>name</code>, consists of just a string of characters, such as
<code>foo</code>. Extended identifier names, <code>ename</code>,
are similar, but may contain Unicode characters. There are however
other forms of identifier name in C++: conversion function names (<code>conv</code>)
such as <code>operator int</code>, overloaded operator names
(<code>op</code>) such as <code>operator+</code>, constructor names
(<code>constr</code>), and destructor names (<code>destr</code>).
There are also names which are used for anonymous identifiers (<code>anon</code>).
</para>
<para>
Note the distinction between an identifier name and an actual
<A HREF="#id">identifier</A>. The latter is a meaning associated
with a name in a particular context. Every identifier name has an
associated underlying meaning, <code>id</code>. This is used to handle
keywords and macros, but for most identifier names this will be a
dummy identifier. Nested underlying meanings (such as a macro hiding
a keyword) are handled by linking the <code>alias</code> fields of
the corresponding identifiers. Every identifier name also has a <code>cache</code>
field which is used to record the look-up of this name as
an unqualified identifier. This may be set to the null identifier
to indicate that the look-up needs to be re-evaluated.
</para>
<para>
Identifier names are stored in one of a small number of hash tables,
linked using their <code>next</code> field. Each name has only one
entry in these tables, allowing equality of names to be implemented
as <code>EQ_hashid</code>.
</para>
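<para>
The idea that a unique table entry makes name equality a pointer comparison
can be sketched as follows. Only the <code>next</code> link is taken from
the description above; the single table, its size, the hash function and
the names <code>lookup_name</code> and <code>EQ_name</code> are assumptions
made for this example.
</para>
<programlisting><![CDATA[
```c
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 64

/* Simplified stand-in for a HASHID 'name' entry. */
typedef struct hashid {
    char *text;            /* the identifier spelling */
    struct hashid *next;   /* next entry in the same hash bucket */
} hashid;

static hashid *hash_table[HASH_SIZE];

static unsigned hash_text(const char *s)
{
    unsigned h = 0;
    while (*s) h = 31u * h + (unsigned char)*s++;
    return h % HASH_SIZE;
}

/* Return the unique entry for 's', creating it on first use.
   Returns NULL if memory is exhausted. */
static hashid *lookup_name(const char *s)
{
    unsigned h = hash_text(s);
    hashid *p;
    for (p = hash_table[h]; p != NULL; p = p->next) {
        if (strcmp(p->text, s) == 0) return p;
    }
    p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->text = malloc(strlen(s) + 1);
    if (p->text == NULL) { free(p); return NULL; }
    strcpy(p->text, s);
    p->next = hash_table[h];
    hash_table[h] = p;
    return p;
}

/* Because entries are unique, name equality is pointer equality. */
#define EQ_name(a, b) ((a) == (b))
```
]]></programlisting>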
</sect3>
<sect3 id="qual">
<title>3.2.16. <code>QUALIFIER</code></title>
<para>
The enumeration type <code>QUALIFIER</code> (<code>qual</code>) is
used to represent the various ways in which an identifier name can
be qualified. For example, <code>::A::a</code> is represented by
<code>qual_full</code>. The value <code>qual_mark</code> is used
in the representation of function identifier expressions to indicate
that overload resolution has been performed.
</para>
</sect3>
<sect3 id="identifier">
<title>3.2.17. <code>IDENTIFIER</code></title>
<para>
The union type <code>IDENTIFIER</code> (<code>id</code>) is used to
represent the various kinds of C++ identifiers. Every identifier
has an associated <A HREF="#hashid">identifier name</A>, a parent
<A HREF="#nspace">namespace</A>, a <A HREF="#dspec">declaration information</A>
field, and a <A HREF="#loc">location</A> for its declaration or definition.
Each identifier also has an
<code>alias</code> field which is normally used to represent the aliasing
which can occur in inheritance or <code>using</code>
declarations.
</para>
<para>
The various fields of the <code>IDENTIFIER</code> union correspond
to the various kinds of identifier which can arise in C++ - class
names, functions, variables, class members, macros, keywords etc.
Each field has appropriate components giving its type, its definition
or whatever other information is required. For example, the <code>variable</code>
field has a <A HREF="#type">type</A> and two <A HREF="#exp">expressions</A>,
giving the constructor and destructor values for the object.
</para>
<para>
Most of these identifier components are self-explanatory; however,
the treatment of overloaded functions bears discussion. The various
fields representing functions have an <code>over</code> component
which is used to link overloaded functions together. A set of overloaded
functions is treated as if it were a single <code>IDENTIFIER</code>
- the first in the list - for the purposes of storing in a <A HREF="#member">namespace
member</A>; the other overloaded meanings are accessed by chasing
down the <code>over</code> components. In other situations, whether
a function identifier represents a single function or a set of overloaded
functions can be worked out from the context. For example, in identifier
expressions the <A HREF="#qual">identifier qualifier</A> is used to
mark whether overload resolution has taken place.
</para>
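<para>
A minimal sketch of chasing the <code>over</code> components is given
below. The structure <code>function_id</code>, its
<code>signature</code> string and the helper names are hypothetical, and
adding new functions at the front of the chain is an assumption of the
example rather than a statement about the producer.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Simplified stand-in for a function IDENTIFIER. */
typedef struct function_id {
    const char *signature;      /* e.g. "f(int)"; illustration only */
    struct function_id *over;   /* next overloaded function, or NULL */
} function_id;

/* Count the functions in the overload set headed by 'id'. */
static size_t overload_count(const function_id *id)
{
    size_t n = 0;
    for (; id != NULL; id = id->over) n++;
    return n;
}

/* Add 'fn' to the front of the set and return the new head, which is
   what would be stored in the namespace member. */
static function_id *overload_add(function_id *head, function_id *fn)
{
    fn->over = head;
    return fn;
}
```
]]></programlisting>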
</sect3>
<sect3 id="member">
<title>3.2.18. <code>MEMBER</code></title>
<para>
The union type <code>MEMBER</code> (<code>member</code>) is used to
represent a member of a <A HREF="#nspace">namespace</A>. Each member
contains two identifiers, <code>id</code> and <code>alt</code>. The
<code>id</code> field gives the meaning associated with a particular
name in this namespace; the <code>alt</code> field is used to represent
a type name which may be hidden by a non-type name.
</para>
<para>
There are two kinds of member, <code>small</code> and <code>large</code>,
corresponding to whether the namespace holds its members in a simple
linked list or in a hash table.
</para>
</sect3>
<sect3 id="nspace">
<title>3.2.19. <code>NAMESPACE</code></title>
<para>
The union type <code>NAMESPACE</code> (<code>nspace</code>) is used
to represent the set of identifiers declared in a particular scope.
For example, the members declared in a C++ class or namespace, the
parameters declared in a function declarator and the local variables
declared in a block all form scopes. The various kinds of scope are
distinguished as different fields of the union, but there are basically
two categories. Scopes in the first, such as function blocks, have relatively
small numbers of elements and store their members as simple linked
lists. Scopes in the second, such as classes, have larger numbers of
elements and store their members in hash tables. In both cases the elements
are stored using the <A HREF="#member"><code>MEMBER</code></A>
type.
</para>
<para>
The key operation on a namespace is to look up a particular
<A HREF="#hashid">identifier name</A> in its linked list or hash table
of members to find the meaning, if any, associated with that name
in the namespace. This can be a complex operation because of the
need to take base classes and <code>using</code> directives (as stored
in the <code>use</code> component) into account.
</para>
</sect3>
<sect3 id="nat">
<title>3.2.20. <code>NAT</code></title>
<para>
The union type <code>NAT</code> (<code>nat</code>) is used to represent
an integer constant expression. Values are represented as lists of
16-bit 'digits'. Values which fit into a single digit are represented
by the <code>small</code> field; larger values by the <code>large</code>
field. Negated values can be represented by the <code>neg</code>
field. Folding of integer constant expressions is performed in the
producer; however, the result can only be represented as described
above if its value is target independent. Target dependent values
are represented by the <code>calc</code> field which contains an
<A HREF="#exp">expression</A> describing how to calculate the value.
The <code>token</code> field is used to represent <code>NAT</code>
tokens.
</para>
<para>
Objects representing small integer constants are created at the start
of the program and stored in a table for ease of access. Larger constants
are created as and when they are required.
</para>
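<para>
The decomposition of a value into 16-bit digits might look like the
following sketch. The digit order (least significant first), the fixed
<code>MAX_DIGITS</code> bound and the function names
<code>nat_digits</code> and <code>nat_is_small</code> are assumptions made
for illustration; the producer's own representation is a list.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

#define DIGIT_BITS 16
#define DIGIT_MASK 0xFFFFul
#define MAX_DIGITS 8

/* Split 'v' into 16-bit digits, least significant first (the digit
   order is an assumption).  Returns the number of digits written;
   zero is written as the single digit 0. */
static size_t nat_digits(unsigned long v, unsigned digits[MAX_DIGITS])
{
    size_t n = 0;
    do {
        digits[n++] = (unsigned)(v & DIGIT_MASK);
        v >>= DIGIT_BITS;
    } while (v != 0 && n < MAX_DIGITS);
    return n;
}

/* A value fits the 'small' form when one digit is enough. */
static int nat_is_small(unsigned long v)
{
    return v <= DIGIT_MASK;
}
```
]]></programlisting>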
</sect3>
<sect3 id="flt">
<title>3.2.21. <code>FLOAT</code></title>
<para>
The union type <code>FLOAT</code> (<code>flt</code>) is used to represent
a floating point constant expression. There is only one field, <code>simple</code>,
which corresponds to a floating point literal. No folding
of floating point constant expressions is attempted in the producer
(it is virtually impossible to do so in a target independent manner).
</para>
<para>
Objects representing useful floating point constants (0.0, 1.0 etc.)
are created for each floating point type and stored as part of the
corresponding <A HREF="#ftype"><code>FLOAT_TYPE</code></A>. Other
values are created as and when they are required.
</para>
</sect3>
<sect3 id="str">
<title>3.2.22. <code>STRING</code></title>
<para>
The union type <code>STRING</code> (<code>str</code>) is used to represent
a string constant expression. There is only one field,
<code>simple</code>, which corresponds to a character string literal;
however, the <code>kind</code> field can be used to modify the interpretation
put on the characters appearing in the <code>text</code>
field. By default, each character in <code>text</code> corresponds
to a single character in the literal; however an alternative representation,
in which <code>text</code> consists of a sequence of multibyte characters
- one control character plus four value characters - is used in more
complex cases.
</para>
<para>
All strings are stored in a hash table intended to ensure that the
same <code>STRING</code> object is used for equal string literals.
This not only saves space during the processing of the input file,
but also facilitates the output of shared string literals in the TDF
capsule.
</para>
<para>
Note that the terminal zero character does not form part of the
<code>STRING</code> object. Instead information on this is stored
as part of the type of a <A HREF="#exp">string literal expression</A>.
The text of the string literal is either truncated or padded with
zeros until its length matches the size of the array bound in the
type of the corresponding literal expression.
</para>
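<para>
The truncation or zero padding described above can be sketched as follows,
assuming the simple one-character-per-element representation. The function
name <code>fit_string</code> and its parameters are hypothetical.
</para>
<programlisting><![CDATA[
```c
#include <string.h>

/* Copy a literal's text (which has no terminating zero and may contain
   embedded zeros, hence the explicit length) into an array of 'bound'
   characters, truncating or zero-padding as required. */
static void fit_string(char *dst, size_t bound, const char *text, size_t len)
{
    size_t n = len < bound ? len : bound;
    memcpy(dst, text, n);
    memset(dst + n, 0, bound - n);
}
```
]]></programlisting>
<para>
With a bound of 4 the text <code>abc</code> gains one terminating zero;
with a bound of 3 it is stored exactly; with a bound of 2 it is truncated.
</para>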
</sect3>
<sect3 id="ntest">
<title>3.2.23. <code>NTEST</code></title>
<para>
The enumeration type <code>NTEST</code> (<code>ntest</code>) is used
to represent the various C++ relational operators (<code>==</code>,
<code>!=</code>, <code>&gt;</code> etc.). The values correspond to
the encoding of the TDF <code>NTEST</code> sort, which facilitates
code generation. The values also have the property that the values
for complementary operators (such as <code>&lt;</code> and
<code>&gt;=</code>) always add up to the same value,
<code>ntest_negate</code>, allowing operators to be complemented in
a straightforward manner.
</para>
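<para>
The complement property can be illustrated with the sketch below. The
numeric values and the <code>NT_</code> names are invented for the example
and are not claimed to match the real TDF encoding; only the property that
complementary tests sum to a fixed value is taken from the text.
</para>
<programlisting><![CDATA[
```c
/* Illustrative values only: chosen so that complementary tests sum to
   NT_NEGATE.  They are not claimed to match the real TDF encoding. */
enum ntest {
    NT_EQ         = 1,
    NT_LESS       = 2,
    NT_LESS_EQ    = 3,
    NT_GREATER    = 4,
    NT_GREATER_EQ = 5,
    NT_NOT_EQ     = 6,
    NT_NEGATE     = 7
};

/* Complement a test by subtracting it from the fixed sum. */
static enum ntest ntest_complement(enum ntest t)
{
    return (enum ntest)(NT_NEGATE - t);
}
```
]]></programlisting>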
</sect3>
<sect3 id="rmode">
<title>3.2.24. <code>RMODE</code></title>
<para>
The enumeration type <code>RMODE</code> (<code>rmode</code>) is used
to represent the various C++ rounding modes (towards zero, towards
smaller etc.). The values correspond to the encoding of the TDF
<code>RMODE</code> sort, which facilitates code generation.
</para>
</sect3>
<sect3 id="exp">
<title>3.2.25. <code>EXP</code></title>
<para>
The union type <code>EXP</code> (<code>exp</code>) is used to represent
a C++ expression or statement. Each expression has an associated
<A HREF="#type">type</A>, <code>type</code>, but most of the information
about an expression is stored in one of the large number of fields
of the <code>EXP</code> union. Most of these fields are fairly simple.
For example, there are fields corresponding to <A HREF="#nat">integer
literals</A>, <A HREF="#flt">floating point literals</A>,
<A HREF="#str">string literals</A> and <A HREF="#id">identifiers</A>.
Composite expressions are formed in the normal way; for example, there
are various binary operators comprising two argument expressions.
The
<code>EXP</code> fields corresponding to statements are slightly more
complex. They each have a <code>parent</code> field which points
to the enclosing statement. A couple of cases bear additional discussion.
</para>
<para>
The <code>sequence</code> field represents a compound statement or
block. This contains a <A HREF="#nspace">namespace</A>, in which
any local variables are declared, and a list of expressions, giving
the statements comprising the block. The null namespace is used if
the block does not constitute a scope. The first statement in the
list is always a dummy to enable <code>first</code> and <code>last</code>
pointers to be maintained to the start and end of the list without
having to worry about null lists.
</para>
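<para>
The benefit of the dummy first statement can be seen in a small sketch.
The types <code>stmt</code> and <code>sequence</code> and the functions
<code>seq_init</code>, <code>seq_append</code> and <code>seq_begin</code>
are hypothetical simplifications; the point is only that appending never
needs a special case for an empty list.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

typedef struct stmt {
    int kind;              /* 0 marks the dummy; illustration only */
    struct stmt *next;
} stmt;

typedef struct sequence {
    stmt dummy;            /* always present, never a real statement */
    stmt *first;           /* always points at the dummy */
    stmt *last;            /* the dummy when the block is empty */
} sequence;

static void seq_init(sequence *s)
{
    s->dummy.kind = 0;
    s->dummy.next = NULL;
    s->first = &s->dummy;
    s->last = &s->dummy;
}

/* Append without any special case for the empty list. */
static void seq_append(sequence *s, stmt *e)
{
    e->next = NULL;
    s->last->next = e;
    s->last = e;
}

/* The real statements start after the dummy. */
static stmt *seq_begin(const sequence *s)
{
    return s->first->next;
}
```
]]></programlisting>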
<para>
<A id="solve">The <code>solve_stmt</code> field corresponds to the
TDF <code>labelled</code> construct</A> (in early versions of TDF
this construct was called <code>solve</code>, hence the terminology).
The problem is that C and C++ labels and <code>goto</code>s are totally
unstructured, whereas the TDF label constructs are structured. Any
statement which contains unstructured labels is enclosed in a
<code>solve_stmt</code> construct, enclosing both the labelled statement
and all jumps to it (in general this cannot be done until the end
of the function). Any labels or variables which are bypassed by such
unstructured jumps also need to be pulled out to the <code>solve_stmt</code>
construct. It is not just explicit labels which can cause such problems;
complex <code>switch</code> statements have the same effect.
</para>
</sect3>
<sect3 id="off">
<title>3.2.26. <code>OFFSET</code></title>
<para>
The union type <code>OFFSET</code> (<code>off</code>) is used to represent
an offset expression. This is used as an adjunct to the normal
<A HREF="#exp">expression</A> representation. The <code>OFFSET</code>
union has fields corresponding to a type offset (used in pointer arithmetic),
the offset of a member of a class and the offset of a base class.
There are also simple operations on offsets, such as multiplication
by an expression.
</para>
</sect3>
<sect3 id="tok">
<title>3.2.27. <code>TOKEN</code></title>
<para>
The union type <code>TOKEN</code> (<code>tok</code>) is used to represent
one of a number of different categories within the C++ language.
It corresponds to the sort of a token declared using the
<A HREF="token.html"><code>#pragma token</code> syntax</A>. Thus
there are fields corresponding to expression, statement, integer constant,
type, function, member and procedure tokens. The similarities between
<code>PROC</code> tokens and templates have been remarked above; for
example, the parameters of the template:
<programlisting>
template &lt; class T, int n &gt; class A {
    T a [n] ;
    // ....
} ;
</programlisting>
are essentially equivalent to those in the procedure token:
<programlisting>
PROC ( TYPE T, EXP const : int : n ) ....
</programlisting>
(recall that non-type template arguments are always constant expressions).
Thus a field, <code>templ</code>, of the <code>TOKEN</code> union
is used to represent lists of template parameters. Note that a further
field, <code>class</code>, is also required to represent template
template parameters. A <A HREF="#type">template type</A> is represented
by a field, <code>templ</code>, of the union <code>TYPE</code>, which
comprises a template sort and a sub-type expressed in terms of the
template parameters.
</para>
<para>
In addition to representing token and template sorts in this way,
the
<code>TOKEN</code> union is used to represent token and template arguments.
Each of the parameter sorts listed above has an appropriate
<code>value</code> component which can store a value of that sort.
Many of the union types in the algebra, including <A HREF="#type">types</A>
and <A HREF="#exp">expressions</A>, have a field of the form:
<programlisting>
token -&gt; {
    IDENTIFIER tok ;
    LIST TOKEN args ;
}
</programlisting>
representing the given token <A HREF="#id">identifier</A> applied
to the given list of arguments.
</para>
<para>
<A id="form">Template instances are represented slightly differently
from token applications</A>. Each instance of a template class or
a template function gives rise to a new class or function
<A HREF="#id">identifier</A>. This identifier has an underlying form
giving the template identifier and the template arguments. This is
expressed as a <code>token</code> member of the
<A HREF="#type"><code>TYPE</code></A> union (although it is not technically
a type, this happens to be the most convenient representation). Each
such form has an associated
<A HREF="#inst"><code>INSTANCE</code></A> component which gives further
information about the template instance. The form for a template
function instance is stored in the <code>form</code> component of
the corresponding <A HREF="#id">identifier</A>. The form for a template
class instance is stored in the <code>form</code> component of the
corresponding <A HREF="#ctype">class type</A>.
</para>
<para>
Members of instances of template classes also have a form type, but
in this case the form is an <code>instance</code> type. This gives
a link back to the corresponding member of the template class.
</para>
</sect3>
<sect3 id="inst">
<title>3.2.28. <code>INSTANCE</code></title>
<para>
The union type <code>INSTANCE</code> (<code>inst</code>) is used to
represent a particular instance of a template or token. Each
<A HREF="#tok">template sort</A> has an associated list of all the
instances of that template, which is used to ensure that the same
template applied with the same arguments always has the same value.
Information on partial or explicit specialisations and usage information
are stored as part of the corresponding
<code>INSTANCE</code>. Each template instance identifier has a link
back to its corresponding <code>INSTANCE</code> via its
<A HREF="#form"><code>form</code> component</A>.
</para>
</sect3>
<sect3 id="err">
<title>3.2.29. <code>ERROR</code></title>
<para>
The union type <code>ERROR</code> (<code>err</code>) is used to represent
an error arising during the compilation of a C++ program. Errors are
first class objects within the producer and can be passed to and from
procedures. Each error has an associated <code>severity</code>
(serious, warning, none etc.). Simple errors are represented by the
<code>simple</code> field, which consists of an index, <code>number</code>,
into the error catalogue, plus a variable length list of error arguments.
Errors can be combined into composite errors using the
<code>compound</code> field, which represents the join of two errors,
<code>head</code> followed by <code>tail</code>.
</para>
<para>
The chief operation on an error after it has been built up is to report
it. Each error report consists of an error object and a
<A HREF="#loc">file location</A> indicating where the error occurred.
</para>
</sect3>
<sect3 id="var">
<title>3.2.30. <code>VARIABLE</code></title>
<para>
The structure type <code>VARIABLE</code> (<code>var</code>) is used
to represent a variable state and is used in the variable analysis
checks.
</para>
</sect3>
<sect3 id="location">
<title>3.2.31. <code>LOCATION</code></title>
<para>
The structure type <code>LOCATION</code> (<code>loc</code>) is used
to represent a location in an input file. It comprises a pointer
to an
<A HREF="#posn">input file position</A>, <code>posn</code>, together
with a line number, <code>line</code>, which takes <code>#line</code>
directives into account. Note that character positions within the line
are not currently recorded.
</para>
</sect3>
<sect3 id="posn">
<title>3.2.32. <code>POSITION</code></title>
<para>
The structure type <code>POSITION</code> (<code>posn</code>) is used
to represent a position in an input file. It consists of two file
names,
<code>file</code> taking <code>#line</code> directives into account,
and
<code>input</code> giving the actual file name, plus a line number
offset, <code>offset</code>, which gives the difference between the
line number taking <code>#line</code> directives into account and
the actual line number. Other information stored includes the datestamp
on the input file, <code>datestamp</code>, and a pointer to a
<A HREF="#loc">file location</A> which, for files included using
<code>#include</code>, gives the location the file was included from.
</para>
</sect3>
<sect3 id="bits">
<title>3.2.33. <code>BITSTREAM</code></title>
<para>
The structure <code>BITSTREAM</code> is not part of the
<code>calculus</code> type system. It is used to represent a sequence
of bits such as is used, for example, in the encoding of TDF.
</para>
</sect3>
<sect3 id="buff">
<title>3.2.34. <code>BUFFER</code></title>
<para>
The structure <code>BUFFER</code> is not part of the <code>calculus</code>
type system. It is used to represent a sequence of characters.
</para>
</sect3>
<sect3 id="opt">
<title>3.2.35. <code>OPTIONS</code></title>
<para>
The structure <code>OPTIONS</code> is not part of the <code>calculus</code>
type system. It is used to represent the state of the
<A HREF="pragma.html#low">compiler options</A> at a particular point
in the input file.
</para>
</sect3>
<sect3 id="pptok">
<title>3.2.36. <code>PPTOKEN</code></title>
<para>
The structure <code>PPTOKEN</code> is not part of the <code>calculus</code>
type system. It is used to represent a linked list of preprocessing
tokens. Each token has an associated <code>sid</code> lexical token
number, <code>tok</code>, plus additional data dependent on the token
type. Each token also records a pointer to the current
<A HREF="#opt"><code>OPTIONS</code></A> value.
</para>
</sect3>
</sect2>
<sect2>
<title>3.3. Error catalogue</title>
<para>
This section describes the error catalogue which lies at the heart
of the C++ producer's error reporting routines. The full
<A HREF="error1.html">error catalogue syntax</A> is given as an annex.
A typical entry in the catalogue is as follows:
<programlisting>
class_union_deriv ( CLASS_TYPE: ct )
{
    USAGE: serious
    PROPERTIES: ansi
    KEY (ISO) "9.5"
    KEY (STANDARD) "The union '"ct"' can't have base classes"
}
</programlisting>
This defines an error, <code>class_union_deriv</code>, which takes
a single parameter <code>ct</code> of type <code>CLASS_TYPE</code>.
The severity of this error is <code>serious</code>; that is to say,
a constraint error. The error property <code>ansi</code> indicates
that the error arises from the ISO C++ standard, the associated
<code>ISO</code> key indicating section 9.5. Finally the text to
be printed for this error, including a reference to <code>ct</code>,
is given. Looking up section 9.5 in the ISO C++ standard reveals
the corresponding constraint in paragraph 1:
<BLOCKQUOTE>
<I>A union shall not have base classes.</I>
</BLOCKQUOTE>
Each constraint within the ISO C++ standard has a corresponding error
in this way. The errors are named in a systematic fashion using the
section names used in the draft standard. For example, section 9.5
is called <code>class.union</code>, so all the constraint errors arising
from this section have names of the form <code>class_union_*</code>.
These error names can be used in the <A HREF="pragma.html#low">low
level directives</A> such as:
<programlisting>
#pragma TenDRA++ error "class_union_deriv" <I>allow</I>
</programlisting>
to modify the error severity. The effect of reducing the severity
of a constraint error in this way is undefined.
</para>
<para>
In addition to the obvious error severity levels, <code>serious</code>,
<code>warning</code> and <code>none</code>, the error catalogue specifies
a list of optional severity levels along with their default values.
For example, the entry:
<programlisting>
link_incompat = serious
</programlisting>
sets up an option named <code>link_incompat</code> which is a constraint
error by default. Errors with this severity, such as:
<programlisting>
dcl_stc_external ( LONG_ID: id, PTR_LOC: loc )
{
    USAGE: link_incompat
    PROPERTIES: ansi
    KEY (ISO) "7.1.1"
    KEY (STANDARD) "'"id"' previously declared with external
                    linkage (at "loc")"
}
</programlisting>
are therefore constraint errors by default. The severity associated with
<code>link_incompat</code> can be modified either
<A HREF="pragma.html#low">directly</A>, using the directive:
<programlisting>
#pragma TenDRA++ option "link_incompat" <I>allow</I>
</programlisting>
or <A HREF="pragma.html#linkage">indirectly</A> using the directive:
<programlisting>
#pragma TenDRA incompatible linkage <I>allow</I>
</programlisting>
the effect being to modify the severity of the associated error messages.
</para>
<para>
The error catalogue is processed by a simple tool,
<code>make_err</code>, which generates C code which is compiled into
the C++ producer. Each error in the catalogue is assigned a number
(there are currently 873 errors in the catalogue) which gives an index
into an automatically generated table of error information. It is
this error number, together with a list of error arguments, which
forms the associated <A HREF="alg.html#err"><code>ERROR</code> object</A>.
<code>make_err</code> generates a macro for each error in the catalogue
which takes arguments of the appropriate types (which may be statically
checked) and creates an <code>ERROR</code> object. For example, for
the <code>class_union_deriv</code> entry above, this macro takes the form:
<programlisting>
ERROR ERR_class_union_deriv ( CLASS_TYPE ) ;
</programlisting>
These macros hide the error catalogue numbers from the rest of the
C++ producer.
</para>
<para>
It is also possible to join a number of simple <code>ERROR</code>
objects to form a single composite <code>ERROR</code>. The severity
of the composite error is the maximum of the severities of the component
errors. For this purpose a dummy error severity level <code>whatever</code>
is introduced which is less severe than any other level. This is
intended for use with error messages which are only ever used to add
information to existing errors, and which inherit their severity level
from the main error.
</para>
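<para>
The rule that a composite error takes the maximum severity of its
components can be sketched as follows. The numeric ordering of the
severity levels, the structure <code>err_node</code> and the function
<code>join_errors</code> are assumptions made for the example; only the
relative position of <code>whatever</code> below every other level is
taken from the text.
</para>
<programlisting><![CDATA[
```c
#include <stddef.h>

/* Severity levels ordered so that a larger value is more severe.
   The numeric values are assumptions; only the ordering matters here. */
enum severity { SEV_WHATEVER, SEV_NONE, SEV_WARNING, SEV_SERIOUS };

typedef struct err_node {
    enum severity severity;
    int number;                    /* catalogue index; unused for joins */
    struct err_node *head, *tail;  /* components of a composite error */
} err_node;

/* Join two errors into 'out'; either argument may be NULL, in which
   case the other is returned unchanged and 'out' is not touched. */
static err_node *join_errors(err_node *out, err_node *head, err_node *tail)
{
    if (head == NULL) return tail;
    if (tail == NULL) return head;
    out->severity = head->severity > tail->severity
                  ? head->severity : tail->severity;
    out->number = -1;
    out->head = head;
    out->tail = tail;
    return out;
}
```
]]></programlisting>
<para>
An informational error of severity <code>whatever</code> joined to a
serious error therefore yields a serious composite error.
</para>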
<para>
The text of a simple error message can be found in the table of error
information. The text contains certain escape sequences indicating
where the error arguments are to be printed. For example,
<code>%1</code> indicates the second argument, since arguments are
numbered from zero. The sorts of the error arguments, referred to as
the error signature, are also stored in the table of error information
as an array of characters, each
corresponding to an <code>ERR_KEY_</code><I>type</I> macro. The producer
defines printing routines for each of the types given by these values,
and calls the appropriate routine to print the argument.
</para>
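<para>
A minimal C sketch of this expansion is given below. It assumes a signature
in which the character <code>i</code> marks an integer argument and
<code>s</code> a string argument; these key characters, the
<code>ARG_SKETCH</code> union and the function <code>format_message</code>
are hypothetical, and the real <code>ERR_KEY_</code><I>type</I> values and
printing routines are not reproduced here.
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument holder: one member per supported signature key. */
typedef union {
    int i;
    const char *s;
} ARG_SKETCH;

/* Copy text to out, replacing %0..%9 by the corresponding argument.
   sig[n] selects the printing routine for argument n. Output is
   truncated to fit in size bytes and is always null-terminated. */
static void format_message(const char *text, const char *sig,
                           const ARG_SKETCH *args, char *out, size_t size)
{
    size_t used = 0;
    size_t nargs = strlen(sig);
    assert(size > 0);
    out[0] = '\0';
    while (*text != '\0' && used + 1 < size) {
        if (text[0] == '%' && text[1] >= '0' && text[1] <= '9'
            && (size_t)(text[1] - '0') < nargs) {
            size_t n = (size_t)(text[1] - '0');
            size_t room = size - used;
            int w;
            if (sig[n] == 'i') {
                w = snprintf(out + used, room, "%d", args[n].i);
            } else {
                w = snprintf(out + used, room, "%s", args[n].s);
            }
            if (w < 0) {
                break;
            }
            used += ((size_t)w < room) ? (size_t)w : room - 1;
            text += 2;
        } else {
            out[used++] = *text++;
            out[used] = '\0';
        }
    }
}
```
]]></programlisting>
With the signature <code>"si"</code>, the text
<code>"union %0 at line %1"</code> and the arguments <code>"U"</code> and
<code>42</code>, this sketch produces <code>union U at line 42</code>.
</para>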
<para>
There are several command-line options which can be used to modify
the form in which the error message is printed. The default format
is as follows:
<programlisting>
"file.C", line 42: Error:
[ISO 9.5]: The union 'U' can't have base classes.
</programlisting>
The ISO section number can be suppressed using <code>-m-s</code>.
The <code>-mc</code> option causes the source code line giving rise
to the error to be printed as part of the message, with <code>!!!!</code>
marking the position of the error within the line. The <code>-me</code>
option causes the error name, <code>class_union_deriv</code>, to be
printed as part of the message. The <code>-ml</code> option causes
the full file location, including the list of <code>#include</code>
directives used in reaching the file, to be printed. The <code>-mt</code>
option causes <code>typedef</code> names to be used when printing
types, rather than expanding to the type definition.
</para>
</sect2>
<sect2>
<title>3.4. Parsing C++</title>
<para>
The parser used in the C++ producer is generated using the
<A HREF="../utilities/sid.html"><code>sid</code> tool</A>. Because
of the large size of the generated code (1.3MB), the <code>sid</code>
output is run through a simple program, <code>sidsplit</code>, which
splits the output into a number of more manageable modules. It also
transforms the code to use the <A HREF="style.html#language"><code>PROTO</code>
macros</A> used in the rest of the program.
</para>
<para>
<code>sid</code> is designed as a parser for grammars which can be
transformed into LL(1) grammars. The distinguishing feature of these
grammars is that the parser can always decide what to do next based
on the current terminal. This is not the case in C++; in some circumstances
a potentially unlimited look-ahead is required to distinguish, for
example, declaration statements from expression statements. In the
technical terms, the C++ grammar is not LL(1), nor LL(k) for any fixed k. Fortunately there are relatively
few such situations, and <code>sid</code>
provides a mechanism, <A HREF="../utilities/sid.html#predicate">predicates</A>,
for bypassing the normal parsing mechanism in these cases. Thus it
is possible, although difficult, to express C++ as a <code>sid</code>
grammar.
</para>
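<para>
The idea of a predicate that looks arbitrarily far ahead before an
alternative is chosen can be sketched in C with a toy rule. The rule used
here is an assumption invented for the example and is far simpler than the
real C++ disambiguation: a statement consisting of a one-character name
followed by a parenthesised group is treated as a declaration if the token
after the matching <code>)</code> is <code>;</code>, and as an expression
otherwise. The function name <code>is_declaration</code> is hypothetical
and does not correspond to any predicate in <code>syntax.sid</code>.
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Toy predicate: each character of s is one token. Returns 1 if the
   statement starting at s looks like a declaration, 0 otherwise.
   The scan runs to the matching ')', however far away that is, so no
   fixed amount of look-ahead would be enough to make the decision. */
static int is_declaration(const char *s)
{
    int depth = 0;
    assert(s != NULL);
    if (*s == '\0' || *s == '(') {
        return 0;
    }
    s++; /* skip the leading name token */
    if (*s != '(') {
        return 0;
    }
    do {
        if (*s == '(') {
            depth++;
        } else if (*s == ')') {
            depth--;
        } else if (*s == '\0') {
            return 0; /* unbalanced input */
        }
        s++;
    } while (depth > 0);
    return *s == ';';
}
```
]]></programlisting>
A parser would call such a predicate once, at the point where the two
alternatives diverge, and then continue with ordinary one-token look-ahead.
</para>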
<para>
The <code>sid</code> grammar file, <code>syntax.sid</code>, is closely
based on the ISO C++ grammar. In particular, the same production
names have been used. The grammar has been extended slightly to allow
common syntactic errors to be detected elegantly. Other parsing errors
are handled by <code>sid</code>'s exception mechanism. At present
there is only limited recovery after such errors.
</para>
<para>
The lexical analysis routines in the C++ producer are hand-crafted,
based on an initial version generated by the simple lexical analyser
generator,
<code>lexi</code>. <code>lexi</code> has been used more directly
to generate the lexical analysers for some of the other automatic
code generation tools used in the producer, including
<code>calculus</code>.
</para>
<para>
The <code>sid</code> grammar contains a number of entry points. The
most important is <code>parse_file</code>, which is used to parse
a complete C++ translation unit. The syntax for the
<A HREF="pragma.html"><code>#pragma TenDRA</code> directives</A> is
included within the same grammar with two entry points,
<code>parse_tendra</code> in normal use, and <code>parse_preproc</code>
for use in preprocessing mode. There are also entry points in the
grammar for each of the kinds of <A HREF="token.html#args">token argument</A>.
The parsing routines for token and template arguments are largely
hand-crafted, based on these primitives.
</para>
<para>
Certain parsing operations are performed before control passes to
the
<code>sid</code> grammar. As mentioned above, these include the processing
of token and template applications. The other important case concerns
nested name specifiers. For example, in:
<programlisting>
class A {
    class B {
        static int c ;
    } ;
} ;
int A::B::c = 0 ;
</programlisting>
the qualified identifier <code>A::B::c</code> is split into two terminals,
a nested name specifier, <code>A::B::</code>, and an identifier, <code>c</code>,
which is looked up in the corresponding namespace. Note that it is
at this stage that name look-up occurs. An identifier can be mapped
to one of a number of terminals, including keywords, type names,
namespace names and other identifiers, according to the result of
this look-up. If the look-up gives a macro then this is expanded
at this stage.
</para>
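<para>
The split of a qualified identifier into a nested name specifier and a
final identifier can be illustrated at the level of plain text by the C
sketch below. This is only an approximation: the producer works on
preprocessing tokens and performs name look-up at this stage, neither of
which is modelled here, and the function name
<code>nested_name_length</code> is an assumption made for the example.
<programlisting><![CDATA[
```c
#include <assert.h>
#include <string.h>

/* Returns the length of the nested name specifier at the start of qid,
   that is the offset just past the last "::", or 0 if qid contains no
   "::" at all. The final identifier then starts at qid + result. */
static size_t nested_name_length(const char *qid)
{
    size_t len = 0;
    const char *p;
    assert(qid != NULL);
    for (p = qid; (p = strstr(p, "::")) != NULL; p += 2) {
        len = (size_t)(p - qid) + 2;
    }
    return len;
}
```
]]></programlisting>
For <code>A::B::c</code> the result is 6, so the specifier is the first six
characters, <code>A::B::</code>, and the remaining text is the identifier
<code>c</code>.
</para>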
</sect2>
<sect2>
<title>3.5. TDF generation</title>
<para>
The TDF encoding as a bitstream is expressed as a series of macros
generated by the <code>make_tdf</code> tool from the TDF specification
database. Note that the version of the TDF database used contains
two corrections to the standard version:
<itemizedlist>
<listitem>A construct <code>make_token_def</code> has been added to represent
a token definition.
</listitem>
<listitem>The sort <code>diag_tag</code> has been added to the edge constructors.
</listitem>
</itemizedlist>
The generated macros handle only the encoding of the construct itself; the
construct parameters need to be encoded by hand (the C producer uses
similar macros, but these also encode the construct parameters). For example,
<code>make_tdf</code> generates a macro:
<programlisting>
void ENC_plus ( BITSTREAM * ) ;
</programlisting>
which encodes the <code>plus</code> construct (91 as 7 bits in extended
format). A typical use of this macro, for adding the expressions
<code>a</code> and <code>b</code> would be:
<programlisting>
ENC_plus ( bs ) ;
ENC_impossible ( bs ) ;
bs = enc_exp ( bs, a ) ;
bs = enc_exp ( bs, b ) ;
</programlisting>
</para>
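<para>
To make the phrase "91 as 7 bits in extended format" more concrete, the
following C sketch writes a construct number to a toy bitstream. It assumes
the extendable integer encoding in which a value between 1 and
2<SUP>n</SUP> - 1 is written directly in n bits, and a larger value is
written as n zero bits followed by the encoding of the value reduced by
2<SUP>n</SUP> - 1; it also assumes that bits are written most significant
first. Both assumptions should be checked against the TDF specification.
The type <code>BITS_SKETCH</code> and the functions <code>put_bits</code>
and <code>put_extended</code> are hypothetical and are not the producer's
<code>BITSTREAM</code> routines.
<programlisting><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Toy bitstream storing one bit per array element, for readability. */
typedef struct {
    unsigned char bits[64];
    size_t count;
} BITS_SKETCH;

/* Append the low n bits of value, most significant bit first. */
static void put_bits(BITS_SKETCH *bs, unsigned n, unsigned long value)
{
    unsigned i;
    assert(bs->count + n <= sizeof bs->bits);
    for (i = n; i > 0; i--) {
        bs->bits[bs->count++] = (unsigned char)((value >> (i - 1)) & 1u);
    }
}

/* Append value (which must be positive) in the assumed extendable
   encoding with n bits per group. */
static void put_extended(BITS_SKETCH *bs, unsigned n, unsigned long value)
{
    unsigned long max;
    assert(value > 0 && n > 0 && n < 32);
    max = (1ul << n) - 1;
    while (value > max) {
        put_bits(bs, n, 0); /* an all-zero group means "continue" */
        value -= max;
    }
    put_bits(bs, n, value);
}
```
]]></programlisting>
Under these assumptions the value 91 fits in a single 7-bit group and is
written as the bits <code>1011011</code>.
</para>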
<para>
Each function or variable is compiled to TDF as its definition is
encountered. For some definitions, such as inline functions, the
compilation may be deferred until it is clear whether or not the identifier
has been used. There is a final pass over all identifiers during
the variable analysis routines which incorporates this check. Because
of the organisation of a TDF capsule it is necessary to store all
of the compiled TDF in memory until the end of the program, when the
complete capsule, including external tag and token names and linkage
information, is written to the output file.
</para>
</sect2>
</sect1>
<sect1>
<title>Annex A. <code>#pragma</code> directive syntax</title>
<para>
The following gives a summary of the syntax for the <code>#pragma</code>
directives used for <A HREF="pragma.html">compiler configuration</A>
and <A HREF="token.html">token specification</A>:
<programlisting>
<I>pragma-directive</I> :
<A HREF="#tendra"># pragma TenDRA ++<I><SUB>opt</SUB> tendra-directive</I></A>
<A HREF="#token"># pragma <I>token-directive</I></A>
<A id="tendra"><I>tendra-directive</I></A> :
<A HREF="#scope"><I>scope-directive</I></A>
<A HREF="#low"><I>low-level-directive</I></A>
<A HREF="#analysis"><I>analysis-directive on</I></A>
<A HREF="#check"><I>check-directive allow</I></A>
<A HREF="#keyword"><I>keyword-directive</I></A>
<A HREF="#type-directive"><I>type-directive</I></A>
<A HREF="#linkage"><I>linkage-directive</I></A>
<A HREF="#misc"><I>misc-directive</I></A>
<A HREF="#token1"><I>tendra-token-directive</I></A>
<I>on</I> :
on
warning
off
<I>allow</I> :
allow
warning
disallow
<A id="scope"><I>scope-directive</I></A> :
<A HREF="pragma.html#scope">begin</A>
<A HREF="pragma.html#scope">begin name environment <I>identifier</I></A>
<A HREF="pragma.html#scope">end</A>
<A HREF="pragma.html#scope">directory <I>identifier</I> use environment <I>identifier</I></A>
<A HREF="pragma.html#scope">use environment <I>identifier</I></A>
<A HREF="pragma.html#scope">use environment <I>identifier</I> reset <I>allow</I></A>
<A id="low"><I>low-level-directive</I></A> :
<A HREF="pragma.html#low">error <I>string-literal allow</I></A>
<A HREF="pragma.html#low">error <I>string-literal on</I></A>
<A HREF="pragma.html#low">error <I>string-literal</I> as option <I>string-literal</I></A>
<A HREF="pragma.html#low">option <I>string-literal allow</I></A>
<A HREF="pragma.html#low">option <I>string-literal on</I></A>
<A HREF="pragma.html#limits">option value <I>string-literal integer-literal</I></A>
<A HREF="pragma.html#low">use error <I>string-literal</I></A>
<A id="analysis"><I>analysis-directive</I></A> :
<A HREF="pragma.html#init">complete initialization analysis</A>
<A HREF="pragma.html#elab">complete struct / union analysis</A>
<A HREF="pragma.html#conv">conversion analysis <I>conversion-spec<SUB>opt</SUB></I></A>
<A HREF="pragma.html#discard">discard analysis <I>discard-spec<SUB>opt</SUB></I></A>
<A HREF="pragma.html#switch">enum switch analysis</A>
<A HREF="pragma.html#linkage">external function linkage</A>
<A HREF="pragma.html#for">for initialization block</A>
<A HREF="pragma.html#elab">ignore struct / union / enum tag</A>
<A HREF="pragma.html#template">implicit export template</A>
<A HREF="pragma.html#impl_func">implicit function declaration</A>
<A HREF="pragma.html#exp">integer operator analysis</A>
<A HREF="pragma.html#exp">integer overflow analysis</A>
<A HREF="pragma.html#comment">nested comment analysis</A>
<A HREF="pragma.html#exp">operator precedence analysis</A>
<A HREF="pragma.html#exp">pointer operator analysis</A>
<A HREF="pragma.html#throw">throw analysis</A>
<A HREF="pragma.html#linkage">unify external linkage</A>
<A HREF="pragma.html#variable">variable analysis</A>
<A HREF="pragma.html#hide">variable hiding analysis</A>
<A HREF="pragma.html#weak">weak prototype analysis</A>
<I>conversion-spec</I> :
( int - int <I>implicit-spec<SUB>opt</SUB></I> )
( int - pointer <I>implicit-spec<SUB>opt</SUB></I> )
( pointer - int <I>implicit-spec<SUB>opt</SUB></I> )
( pointer - pointer <I>implicit-spec<SUB>opt</SUB></I> )
( int - enum implicit )
( pointer - void * implicit )
( void * - pointer implicit )
<I>implicit-spec</I> :
implicit
explicit
<I>discard-spec</I> :
( function return )
( static )
( value )
<A id="check"><I>check-directive</I></A> :
<A HREF="pragma.html#overload">ambiguous overload resolution</A>
<A HREF="pragma.html#if">assignment as bool</A>
<A HREF="pragma.html#bitfield">bitfield overflow</A>
<A HREF="pragma.html#linkage">block function static</A>
<A HREF="pragma.html#catch_all">catch all</A>
<A HREF="pragma.html#escape">character escape overflow</A>
<A HREF="token.html#tokdef">compatible token</A>
<A HREF="pragma.html#include">complete file includes</A>
<A HREF="pragma.html#target-if">conditional declaration</A>
<A HREF="pragma.html#lvalue">conditional lvalue</A>
<A HREF="pragma.html#overload">conditional overload resolution <I>overload-spec<SUB>opt</SUB></I></A>
<A HREF="pragma.html#if">const conditional</A>
<A HREF="pragma.html#macro">directive as macro argument</A>
<A HREF="pragma.html#identifier">dollar as ident</A>
<A HREF="pragma.html#elab">extra ,</A>
<A HREF="pragma.html#decl_none">extra ;</A>
<A HREF="pragma.html#if">extra ; after conditional</A>
<A HREF="pragma.html#weak">extra ...</A>
<A HREF="pragma.html#bitfield">extra bitfield int type</A>
<A HREF="pragma.html#macro">extra macro definition</A>
<A HREF="pragma.html#typedef">extra type definition</A>
<A HREF="pragma.html#switch">fall into case</A>
<A HREF="pragma.html#elab">forward enum declaration</A>
<A HREF="pragma.html#conv">function pointer as pointer</A>
<A HREF="pragma.html#ellipsis">ident ...</A>
<A HREF="pragma.html#implicit">implicit int type <I>inttype-spec<SUB>opt</SUB></I></A>
<A HREF="token.html#tokdef">implicit token definition</A>
<A HREF="token.html#spec">incompatible interface declaration</A>
<A HREF="token.html#member">incompatible member declaration</A>
<A HREF="pragma.html#linkage">incompatible linkage</A>
<A HREF="pragma.html#weak">incompatible promoted function argument</A>
<A HREF="pragma.html#compatible">incompatible type qualifier</A>
<A HREF="pragma.html#return">incompatible void return</A>
<A HREF="pragma.html#complete">incomplete type as object type</A>
<A HREF="pragma.html#ppdir">indented # directive</A>
<A HREF="pragma.html#ppdir">indented directive after #</A>
<A HREF="pragma.html#init">initialization of struct / union ( auto )</A>
<A HREF="pragma.html#longlong">longlong type</A>
<A HREF="pragma.html#ppdir">no directive / nline after ident</A>
<A HREF="pragma.html#empty">no external declaration</A>
<A HREF="pragma.html#macro">no ident after #</A>
<A HREF="pragma.html#lex">no nline after file end</A>
<A HREF="token.html#tokdef">no token definition</A>
<A HREF="pragma.html#overload">overload resolution</A>
<A HREF="pragma.html#weak">prototype</A>
<A HREF="pragma.html#weak">prototype ( weak )</A>
<A HREF="token.html#exp">rvalue token as const</A>
<A HREF="pragma.html#ppdir">text after directive</A>
<A HREF="pragma.html#lvalue">this lvalue</A>
<A HREF="pragma.html#string">unify incompatible string literal</A>
<A HREF="pragma.html#ppdir">unknown directive</A>
<A HREF="pragma.html#escape">unknown escape</A>
<A HREF="pragma.html#ppdir">unknown pragma</A>
<A HREF="pragma.html#decl_none">unknown struct / union</A>
<A HREF="pragma.html#string">unmatched quote</A>
<A HREF="pragma.html#reach">unreachable code</A>
<A HREF="pragma.html#init">variable initialization</A>
<A HREF="pragma.html#macro">weak macro equality</A>
<A HREF="pragma.html#string">writeable string literal</A>
<I>inttype-spec</I> :
for const / volatile
for external declaration
for function return
<I>overload-spec</I> :
( complete )
( incomplete )
<A id="keyword"><I>keyword-directive</I></A> :
<A HREF="#keyword-spec">keyword <I>identifier</I> for <I>keyword-spec</I></A>
<A HREF="pragma.html#keyword">undef keyword <I>identifier</I></A>
<A id="keyword-spec"><I>keyword-spec</I></A> :
<A HREF="pragma.html#discard">discard value</A>
<A HREF="pragma.html#variable">discard variable</A>
<A HREF="pragma.html#switch">exhaustive</A>
<A HREF="pragma.html#switch">fall into case</A>
<A HREF="pragma.html#keyword">keyword <I>identifier</I></A>
<A HREF="pragma.html#keyword">operator <I>operator</I></A>
<A HREF="pragma.html#variable">set</A>
<A HREF="pragma.html#reach">set reachable</A>
<A HREF="pragma.html#reach">set unreachable</A>
<A HREF="pragma.html#conv">type representation</A>
<A HREF="pragma.html#weak">weak</A>
<A id="type-directive"><I>type-directive</I></A> :
<A HREF="pragma.html#reach">bottom <I>identifier</I></A>
<A HREF="pragma.html#char">character <I>character-sign</I></A>
<A HREF="pragma.html#identifier">character <I>character-literal character-mapping</I></A>
<A HREF="pragma.html#identifier">character <I>string-literal character-mapping</I></A>
<A HREF="lib.html#arith">compute promote <I>identifier</I></A>
<A HREF="pragma.html#escape">escape <I>character-literal character-mapping</I></A>
<A HREF="pragma.html#int">integer literal <I>literal-spec</I></A>
<A HREF="lib.html#arith">promoted <I>type-id</I> : <I>type-id</I></A>
<A HREF="pragma.html#char">set character literal : <I>type-id</I></A>
<A HREF="pragma.html#longlong">set longlong type : <I>longlong-spec</I></A>
<A HREF="pragma.html#char">set ptrdiff_t : <I>type-id</I></A>
<A HREF="pragma.html#char">set size_t : <I>type-id</I></A>
<A HREF="pragma.html#char">set wchar_t : <I>type-id</I></A>
<A HREF="pragma.html#string">set string literal : <I>string-const</I></A>
<A HREF="pragma.html#std">set std namespace : <I>scope-name</I></A>
<A HREF="#type-spec">type <I>identifier</I> for <I>type-spec</I></A>
<I>character-sign</I> :
signed
unsigned
either
<I>character-mapping</I> :
as <I>character-literal</I> allow
disallow
<I>literal-spec</I> :
<I>literal-base literal-suffix<SUB>opt</SUB> literal-type-list</I>
<I>literal-base</I> :
decimal
octal
hexadecimal
<I>literal-suffix</I> :
unsigned
long
unsigned long
long long
unsigned long long
<I>literal-type-list</I> :
* <I>literal-type-spec</I>
<I>integer-literal literal-type-spec</I> | <I>literal-type-list</I>
? <I>literal-type-spec</I> | <I>literal-type-list</I>
<I>literal-type-spec</I> :
: <I>type-id</I>
* <I>allow<SUB>opt</SUB></I> : <I>identifier</I>
* * <I>allow<SUB>opt</SUB></I> :
<I>longlong-spec</I> :
long
long long
<I>string-const</I> :
const
no const
<I>scope-name</I> :
<I>identifier</I>
::
<A id="type-spec"><I>type-spec</I></A> :
<A HREF="pragma.html#reach">bottom</A>
<A HREF="pragma.html#char">ptrdiff_t</A>
<A HREF="pragma.html#char">size_t</A>
<A HREF="pragma.html#char">wchar_t</A>
<A HREF="pragma.html#printf">... printf</A>
<A HREF="pragma.html#printf">... scanf</A>
<A id="linkage"><I>linkage-directive</I></A> :
<A HREF="pragma.html#linkage">const linkage <I>linkage</I></A>
<A HREF="pragma.html#linkage">external linkage <I>string-literal</I></A>
<A HREF="pragma.html#linkage">external volatile_t</A>
<A HREF="pragma.html#linkage">inline linkage <I>linkage</I></A>
<A HREF="pragma.html#linkage">linkage resolution : <I>linkage-spec</I></A>
<I>linkage</I> :
external
internal
<I>linkage-spec</I> :
( <I>linkage</I> ) on
( <I>linkage</I> ) warning
off
<A id="misc"><I>misc-directive</I></A> :
<A HREF="pragma.html#weak">argument <I>type-id</I> as ...</A>
<A HREF="pragma.html#weak">argument <I>type-id</I> as <I>type-id</I></A>
<A HREF="pragma.html#compatible">compatible type : <I>type-id</I> == <I>type-id</I> : <I>allow</I></A>
<A HREF="pragma.html#conv">conversion <I>identifier-list</I> allow</A>
<A HREF="dump.html#scope">declaration block <I>identifier</I> begin</A>
<A HREF="dump.html#scope">declaration block end</A>
<A HREF="pragma.html#ppdir">directive <I>directive-spec directive-state</I></A>
<A HREF="pragma.html#variable">discard <I>expression</I></A>
<A HREF="pragma.html#switch">exhaustive</A>
<A HREF="pragma.html#cast">explicit cast <I>cast-spec<SUB>opt</SUB> allow</I></A>
<A HREF="pragma.html#include">includes depth <I>integer-literal</I></A>
<A HREF="pragma.html#static">preserve <I>preserve-list</I></A>
<A HREF="pragma.html#variable">set <I>expression</I></A>
<A HREF="pragma.html#limits">set error limit <I>integer-literal</I></A>
<A HREF="pragma.html#identifier">set name limit <I>integer-literal</I> warning<I><SUB>opt</SUB></I></A>
<A HREF="pragma.html#discard">suspend static <I>identifier-list</I></A>
<I>directive-spec</I> :
assert
file
ident
import
include_next
unassert
warning
weak
<I>directive-state</I> :
allow
warning
disallow
( ignore ) allow
( ignore ) warning
<I>cast-operator</I> :
static_cast
const_cast
reinterpret_cast
<I>cast-spec</I> :
as <I>cast-operator</I>
<I>cast-spec</I> | <I>cast-operator</I>
<I>preserve-list</I> :
<I>identifier-list</I>
*
<I>identifier-list</I> :
<I>identifier identifier-list<SUB>opt</SUB></I>
<A id="token"><I>token-directive</I></A> :
<A HREF="token.html#spec">token <I>token-spec</I></A>
<A HREF="token.html#tokdef">no_def <I>token-list</I></A>
<A HREF="token.html#tokdef">define <I>token-list</I></A>
<A HREF="token.html#tokdef">ignore <I>token-list</I></A>
<A HREF="token.html#tokdef">interface <I>token-list</I></A>
<A HREF="token.html#tokdef">undef token <I>token-list</I></A>
<A HREF="token.html#tokdef">extend interface <I>header-name</I></A>
<A HREF="token.html#tokdef">implement interface <I>header-name</I></A>
<A id="token1"><I>tendra-token-directive</I></A> :
<A HREF="token.html#spec">token <I>token-spec</I></A>
<A HREF="token.html#tokdef">no_def <I>token-list</I></A>
<A HREF="token.html#tokdef">define <I>token-list</I></A>
<A HREF="token.html#tokdef">reject <I>token-list</I></A>
<A HREF="token.html#tokdef">interface <I>token-list</I></A>
<A HREF="token.html#tokdef">undef token <I>token-list</I></A>
<A HREF="token.html#tokdef">extend <I>header-name</I></A>
<A HREF="token.html#tokdef">implement <I>header-name</I></A>
<A HREF="token.html#tokdef">member definition <I>type-id</I> : <I>identifier member-offset</I></A>
<I>member-offset</I> :
::<I><SUB>opt</SUB> id-expression</I>
<I>member-offset</I> . ::<I><SUB>opt</SUB> id-expression</I>
<I>member-offset</I> [ <I>constant-expression</I> ]
<I>token-list</I> :
<I>token-id token-list<SUB>opt</SUB></I>
# <I>preproc-token-list</I>
<I>token-id</I> :
<I>token-namespace<SUB>opt</SUB> identifier</I>
<I>type-id</I> . <I>identifier</I>
<I>token-spec</I> :
<I>token-introduction token-identification</I>
<I>token-introduction</I> :
<I>exp-token</I>
<I>statement-token</I>
<I>type-token</I>
<I>member-token</I>
<I>procedure-token</I>
<I>token-identification</I> :
<I>token-namespace<SUB>opt</SUB> identifier</I> # <I>external-identifier<SUB>opt</SUB></I>
<I>token-namespace</I> :
TAG
<I>external-identifier</I> :
-
<I>preproc-token-list</I>
<I>exp-token</I> :
EXP <I>exp-storage<SUB>opt</SUB></I> : <I>type-id</I> :
NAT
INTEGER
<I>exp-storage</I> :
lvalue
rvalue
const
<I>statement-token</I> :
STATEMENT
<I>type-token</I> :
TYPE
VARIETY
VARIETY signed
VARIETY unsigned
FLOAT
ARITHMETIC
SCALAR
CLASS
STRUCT
UNION
<I>member-token</I> :
MEMBER <I>access-specifier<SUB>opt</SUB> member-type-id</I> : <I>type-id</I> :
<I>member-type-id</I> :
<I>type-id</I>
<I>type-id</I> % <I>constant-expression</I>
<I>access-specifier</I> :
public
protected
private
<I>procedure-token</I> :
<I>general-procedure</I>
<I>simple-procedure</I>
<I>function-procedure</I>
<I>general-procedure</I> :
PROC { <I>bound-toks<SUB>opt</SUB></I> | <I>prog-pars<SUB>opt</SUB></I> } <I>token-introduction</I>
<I>bound-toks</I> :
<I>bound-token</I>
<I>bound-token</I> , <I>bound-toks</I>
<I>bound-token</I> :
<I>token-introduction token-namespace<SUB>opt</SUB> identifier</I>
<I>prog-pars</I> :
<I>program-parameter</I>
<I>program-parameter</I> , <I>prog-pars</I>
<I>program-parameter</I> :
EXP <I>identifier</I>
STATEMENT <I>identifier</I>
TYPE <I>type-id</I>
MEMBER <I>type-id</I> : <I>identifier</I>
PROC <I>identifier</I>
<I>simple-procedure</I> :
PROC ( <I>simple-toks<SUB>opt</SUB></I> ) <I>token-introduction</I>
<I>simple-toks</I> :
<I>simple-token</I>
<I>simple-token</I> , <I>simple-toks</I>
<I>simple-token</I> :
<I>token-introduction token-namespace<SUB>opt</SUB> identifier<SUB>opt</SUB></I>
<I>function-procedure</I> :
FUNC <I>type-id</I> :
</programlisting>
</para>
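<para>
As an illustration of how the productions above combine, the following C
fragment contains a few directives derived by hand from the grammar. The
two <code>link_incompat</code> directives also appear in the discussion of
the error catalogue; the others have been read off the
<I>scope-directive</I> and <I>analysis-directive</I> rules and have not
been checked against the producer, so they should be treated as
assumptions. The declaration of <code>f</code> is a hypothetical
placeholder.
<programlisting><![CDATA[
```c
/* scope-directive: open a scope so that the settings below are local */
#pragma TenDRA begin

/* analysis-directive followed by "on", with an optional conversion-spec */
#pragma TenDRA conversion analysis ( int - pointer ) warning

/* check-directive followed by "allow" */
#pragma TenDRA incompatible linkage allow

/* low-level-directive, using the optional "++" form of the prefix */
#pragma TenDRA++ option "link_incompat" allow

extern int f ( int ) ;

/* scope-directive: close the scope opened above */
#pragma TenDRA end
```
]]></programlisting>
Compilers that do not recognise these directives would normally ignore
them, so the fragment remains ordinary C.
</para>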
</sect1>
<sect1>
<title>Annex B. Symbol table dump syntax</title>
<para>
The following gives a summary of the syntax for the
<A HREF="dump.html">symbol table dump file</A> (version 1.1):
<programlisting>
<I>dump-file</I> :
<I>command-list<SUB>opt</SUB></I>
<I>command-list</I> :
<I>command command-list<SUB>opt</SUB></I>
<I>command</I> :
<I>version-command</I>
<I>identifier-command</I>
<I>scope-command</I>
<I>override-command</I>
<I>base-command</I>
<I>api-command</I>
<I>template-command</I>
<I>promotion-command</I>
<I>error-command</I>
<I>path-command</I>
<I>file-command</I>
<I>include-command</I>
<I>string-command</I>
<I>version-command</I> :
V <I>number number string</I>
<I>location</I> :
<I>number number number string string</I>
<I>number number number string</I> *
<I>number number number</I> *
<I>number number</I> *
<I>number</I> *
*
<I>identifier</I> :
<I>number</I> = <I>identifier-name access<SUB>opt</SUB> scope-identifier</I>
<I>number</I>
<I>identifier-name</I> :
<I>string</I>
C <I>type</I>
D <I>type</I>
O <I>string</I>
T <I>type</I>
<I>access</I> :
N
B
P
<I>scope-identifier</I> :
<I>identifier</I>
*
<I>identifier-command</I> :
D <I>identifier-info type-info</I>
M <I>identifier-info type-info</I>
T <I>identifier-info type-info</I>
Q <I>identifier-info</I>
U <I>identifier-info</I>
L <I>identifier-info</I>
C <I>identifier-info</I>
W <I>identifier-info type-info</I>
I <I>identifier-command</I>
<I>identifier-info</I> :
<I>identifier-key location identifier</I>
<I>identifier-key</I> :
K
MO
MF
MB
TC
TS
TU
TE
TA
NN
NA
VA
VP
VE
VS
FE <I>function-key<SUB>opt</SUB></I>
FS <I>function-key<SUB>opt</SUB></I>
FB <I>function-key<SUB>opt</SUB></I>
CF <I>function-key<SUB>opt</SUB></I>
CS <I>function-key<SUB>opt</SUB></I>
CV <I>function-key<SUB>opt</SUB></I>
CM
CD
E
L
XO
XF
XP
XT
<I>function-key</I> :
C <I>function-key<SUB>opt</SUB></I>
I <I>function-key<SUB>opt</SUB></I>
<I>type-info</I> :
<I>type identifier<SUB>opt</SUB></I>
<I>sort</I>
<I>scope-identifier</I>
*
<I>scope-command</I> :
SS <I>scope-key location identifier</I>
SE <I>scope-key location identifier</I>
<I>scope-key</I> :
N
S
B
D
H
CT
CF
CC
<I>override-command</I> :
O <I>identifier identifier</I>
<I>base-command</I> :
B <I>identifier-key identifier base-graph</I>
<I>base-graph</I> :
<I>base-class</I>
<I>base-class</I> ( <I>base-list</I> )
<I>base-class</I> :
<I>number</I> = V<I><SUB>opt</SUB> access<SUB>opt</SUB> type-name</I>
<I>number</I> :
<I>base-list</I> :
<I>base-graph base-list<SUB>opt</SUB></I>
<I>base-number</I> :
<I>number</I> : <I>type-name</I>
<I>api-command</I> :
X <I>identifier-key identifier string</I>
<I>template-command</I> :
Z <I>identifier-key identifier token-application specialise-info</I>
<I>specialise-info</I> :
<I>identifier</I>
<I>token-application</I>
*
<I>type</I> :
<I>type-name</I>
c
s
i
l
x
b
w
y
z
f
d
r
v
u
Sc
Uc
Us
Ui
Ul
Ux
C <I>type</I>
V <I>type</I>
P <I>type</I>
R <I>type</I>
M <I>type-name</I> : <I>type</I>
F <I>type parameter-types</I>
A <I>nat<SUB>opt</SUB></I> : <I>type</I>
B <I>nat</I> : <I>type</I>
t <I>parameter-list<SUB>opt</SUB></I> : <I>type</I>
p <I>type</I>
a <I>type</I> : <I>type</I>
n <I>lit-base<SUB>opt</SUB> lit-suffix<SUB>opt</SUB></I>
W <I>type parameter-types</I>
q <I>type</I>
Q <I>string</I>
*
<I>type-name</I> :
<I>identifier</I>
<I>token-application</I>
<I>parameter-types</I> :
: <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> :
. <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> :
. <I>exception-spec<SUB>opt</SUB> func-qualifier<SUB>opt</SUB></I> .
, <I>type parameter-types</I>
<I>func-qualifier</I> :
C <I>func-qualifier<SUB>opt</SUB></I>
V <I>func-qualifier<SUB>opt</SUB></I>
<I>exception-spec</I> :
( <I>exception-list<SUB>opt</SUB></I> )
<I>exception-list</I> :
<I>type</I>
<I>type</I> , <I>exception-list</I>
<I>nat</I> :
+ <I>number</I>
- <I>number</I>
<I>identifier</I>
<I>token-application</I>
<I>string</I>
<I>parameter-list</I> :
<I>identifier</I>
<I>identifier</I> , <I>parameter-list</I>
<I>lit-base</I> :
O
X
<I>lit-suffix</I> :
U
l
Ul
x
Ux
<I>promotion-command</I> :
P <I>type</I> : <I>type</I>
<I>sort</I> :
<I>expression-sort</I>
<I>statement-sort</I>
<I>type-sort</I>
<I>tag-type-sort</I>
<I>member-sort</I>
<I>proc-sort</I>
<I>func-sort</I>
<I>template-sort</I>
<I>macro-sort</I>
<I>expression-sort</I> :
ZEL <I>type</I>
ZER <I>type</I>
ZEC <I>type</I>
ZN
<I>statement-sort</I> :
ZS
<I>type-sort</I> :
ZTO
ZTI
ZTF
ZTA
ZTP
ZTS
ZTU
<I>tag-type-sort</I> :
ZTTS
ZTTU
<I>member-sort</I> :
ZM <I>type</I> : <I>type-name</I>
<I>proc-sort</I> :
ZPG <I>parameter-list<SUB>opt</SUB></I> ; <I>parameter-list<SUB>opt</SUB></I> : <I>sort</I>
ZPS <I>parameter-list<SUB>opt</SUB></I> : <I>sort</I>
<I>func-sort</I> :
ZF <I>type</I>
<I>template-sort</I> :
ZTt <I>parameter-list<SUB>opt</SUB></I> :
<I>macro-sort</I> :
ZUO
ZUF <I>number</I>
<I>token-application</I> :
T <I>identifier</I> , <I>token-argument-list</I> :
<I>token-argument-list</I> :
<I>token-argument</I>
<I>token-argument</I> , <I>token-argument-list</I>
<I>token-argument</I> :
E <I>expression</I>
N <I>nat</I>
S <I>statement</I>
T <I>type</I>
M <I>member</I>
F <I>identifier</I>
C <I>identifier</I>
<I>expression</I> :
<I>nat</I>
<I>statement</I> :
<I>expression</I>
<I>member</I> :
<I>identifier</I>
<I>string</I>
<I>error-name</I> :
<I>number</I> = <I>string</I>
<I>number</I>
<I>error-command</I> :
ES <I>location error-info</I>
EW <I>location error-info</I>
EI <I>location error-info</I>
EF <I>location error-info</I>
EC <I>error-info</I>
EA <I>error-argument</I>
<I>error-info</I> :
<I>error-name number number</I>
<I>error-argument</I> :
B <I>base-number</I>
C <I>scope-identifier</I>
E <I>expression</I>
H <I>identifier-name</I>
I <I>identifier</I>
L <I>location</I>
N <I>nat</I>
S <I>string</I>
T <I>type</I>
V <I>number</I>
V - <I>number</I>
<I>path-command</I> :
FD <I>number</I> = <I>string string<SUB>opt</SUB></I>
<I>directory</I> :
<I>number</I>
*
<I>file-command</I> :
FS <I>location directory</I>
FE <I>location</I>
<I>include-command</I> :
FIA <I>location string</I>
FIQ <I>location string</I>
FIN <I>location string</I>
FIS <I>location string</I>
FIE <I>location string</I>
FIR <I>location</I>
<I>string-command</I> :
A <I>location string</I>
AC <I>location string</I>
AL <I>location string</I>
ACL <I>location string</I>
</programlisting>
</para>
</sect1>
<sect1>
<title>Annex C. Error catalogue syntax</title>
<para>
The following gives a summary of the syntax for the
<A HREF="error.html">error catalogue</A> accepted by the
<code>make_err</code> tool. Identifiers are normal C-style identifiers;
strings consist of any sequence of characters enclosed inside
<code>"...."</code>. The escape sequences <code>\"</code>
and
<code>\\</code> are allowed in strings; all other characters (including
newline characters) map to themselves. C-style comments are allowed.
<programlisting>
<I>error-database</I> :
<I>header types<SUB>opt</SUB> properties<SUB>opt</SUB> keys<SUB>opt</SUB> usages<SUB>opt</SUB> entries<SUB>opt</SUB></I>
<I>header</I> :
<I>database-name<SUB>opt</SUB> rig-name<SUB>opt</SUB> prefixes<SUB>opt</SUB></I>
<I>database-name</I> :
DATABASE_NAME : <I>identifier</I>
<I>rig-name</I> :
RIG : <I>identifier</I>
<I>prefixes</I> :
PREFIX : <I>output-prefix<SUB>opt</SUB> compiler-prefix<SUB>opt</SUB> error-prefix<SUB>opt</SUB></I>
<I>output-prefix</I> :
compiler_output -> <I>identifier</I>
<I>compiler-prefix</I> :
from_compiler -> <I>identifier</I>
<I>error-prefix</I> :
from_database -> <I>identifier</I>
<I>types</I> :
TYPES : <I>name-list<SUB>opt</SUB></I>
<I>properties</I> :
PROPERTIES : <I>name-list<SUB>opt</SUB></I>
<I>keys</I> :
KEYS : <I>name-list<SUB>opt</SUB></I>
<I>usages</I> :
USAGE : <I>name-list<SUB>opt</SUB></I>
<I>name</I> :
<I>identifier</I>
<I>identifier</I> = <I>identifier</I>
<I>identifier</I> = <I>identifier</I> | <I>identifier</I>
<I>name-list</I> :
<I>name</I>
<I>name</I> , <I>name-list</I>
<I>type-name</I> :
<I>identifier</I>
<I>property-name</I> :
<I>identifier</I>
<I>key-name</I> :
<I>identifier</I>
<I>usage-name</I> :
<I>identifier</I>
<I>entries</I> :
ENTRIES : <I>entry-list<SUB>opt</SUB></I>
<I>entry-list</I> :
<I>entry entry-list<SUB>opt</SUB></I>
<I>entry</I> :
<I>identifier</I> ( <I>param-list<SUB>opt</SUB></I> ) { <I>entry-body</I> }
<I>entry-body</I> :
<I>alt-name<SUB>opt</SUB> entry-usage<SUB>opt</SUB> entry-properties<SUB>opt</SUB> map-list<SUB>opt</SUB></I>
<I>parameter</I> :
<I>type-name</I> : <I>identifier</I>
<I>param-list</I> :
<I>parameter</I>
<I>parameter</I> , <I>param-list</I>
<I>param-name</I> :
<I>identifier</I>
<I>alt-name</I> :
ALT_NAME : <I>identifier</I>
<I>entry-usage</I> :
USAGE : <I>usage-name</I>
USAGE : <I>usage-name</I> | <I>usage-name</I>
<I>entry-properties</I> :
PROPERTIES : <I>property-list<SUB>opt</SUB></I>
<I>property-list</I> :
<I>property-name</I>
<I>property-name</I> , <I>property-list</I>
<I>map</I> :
KEY ( <I>key-name</I> ) <I>message-list<SUB>opt</SUB></I>
KEY ( <I>key-name</I> ) <I>message-list<SUB>opt</SUB></I> | <I>message-list<SUB>opt</SUB></I>
<I>map-list</I> :
<I>map map-list<SUB>opt</SUB></I>
<I>message-list</I> :
<I>string message-list<SUB>opt</SUB></I>
<I>param-name message-list<SUB>opt</SUB></I>
</programlisting>
</para>
</sect1>
</chapter>
</book>