<!-- Crown Copyright (c) 1998 -->
<HTML>
<HEAD>
<TITLE>
C++ Producer Guide: Type system 
</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">

<H1>C++ Producer Guide</H1>
<H3>March 1998</H3>
<A HREF="error.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
<A HREF="style.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
<A HREF="index.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
</A>
<IMG SRC="../images/no_index.gif" ALT="document index"><P>
<HR>

<DL>
<DT><A HREF="#primitive"><B>3.2.1</B> - Primitive types</A><DD>
<DT><A HREF="#cv"><B>3.2.2</B> - <CODE>CV_SPEC</CODE></A><DD>
<DT><A HREF="#ntype"><B>3.2.3</B> - <CODE>BUILTIN_TYPE</CODE></A><DD>
<DT><A HREF="#btype"><B>3.2.4</B> - <CODE>BASE_TYPE</CODE></A><DD>
<DT><A HREF="#itype"><B>3.2.5</B> - <CODE>INT_TYPE</CODE></A><DD>
<DT><A HREF="#ftype"><B>3.2.6</B> - <CODE>FLOAT_TYPE</CODE></A><DD>
<DT><A HREF="#cinfo"><B>3.2.7</B> - <CODE>CLASS_INFO</CODE></A><DD>
<DT><A HREF="#cusage"><B>3.2.8</B> - <CODE>CLASS_USAGE</CODE></A><DD>
<DT><A HREF="#ctype"><B>3.2.9</B> - <CODE>CLASS_TYPE</CODE></A><DD>
<DT><A HREF="#graph"><B>3.2.10</B> - <CODE>GRAPH</CODE></A><DD>
<DT><A HREF="#virt"><B>3.2.11</B> - <CODE>VIRTUAL</CODE></A><DD>
<DT><A HREF="#etype"><B>3.2.12</B> - <CODE>ENUM_TYPE</CODE></A><DD>
<DT><A HREF="#type"><B>3.2.13</B> - <CODE>TYPE</CODE></A><DD>
<DT><A HREF="#dspec"><B>3.2.14</B> - <CODE>DECL_SPEC</CODE></A><DD>
<DT><A HREF="#hashid"><B>3.2.15</B> - <CODE>HASHID</CODE></A><DD>
<DT><A HREF="#qual"><B>3.2.16</B> - <CODE>QUALIFIER</CODE></A><DD>
<DT><A HREF="#id"><B>3.2.17</B> - <CODE>IDENTIFIER</CODE></A><DD>
<DT><A HREF="#member"><B>3.2.18</B> - <CODE>MEMBER</CODE></A><DD>
<DT><A HREF="#nspace"><B>3.2.19</B> - <CODE>NAMESPACE</CODE></A><DD>
<DT><A HREF="#nat"><B>3.2.20</B> - <CODE>NAT</CODE></A><DD>
<DT><A HREF="#flt"><B>3.2.21</B> - <CODE>FLOAT</CODE></A><DD>
<DT><A HREF="#str"><B>3.2.22</B> - <CODE>STRING</CODE></A><DD>
<DT><A HREF="#ntest"><B>3.2.23</B> - <CODE>NTEST</CODE></A><DD>
<DT><A HREF="#rmode"><B>3.2.24</B> - <CODE>RMODE</CODE></A><DD>
<DT><A HREF="#exp"><B>3.2.25</B> - <CODE>EXP</CODE></A><DD>
<DT><A HREF="#off"><B>3.2.26</B> - <CODE>OFFSET</CODE></A><DD>
<DT><A HREF="#tok"><B>3.2.27</B> - <CODE>TOKEN</CODE></A><DD>
<DT><A HREF="#inst"><B>3.2.28</B> - <CODE>INSTANCE</CODE></A><DD>
<DT><A HREF="#err"><B>3.2.29</B> - <CODE>ERROR</CODE></A><DD>
<DT><A HREF="#var"><B>3.2.30</B> - <CODE>VARIABLE</CODE></A><DD>
<DT><A HREF="#loc"><B>3.2.31</B> - <CODE>LOCATION</CODE></A><DD>
<DT><A HREF="#posn"><B>3.2.32</B> - <CODE>POSITION</CODE></A><DD>
<DT><A HREF="#bits"><B>3.2.33</B> - <CODE>BITSTREAM</CODE></A><DD>
<DT><A HREF="#buff"><B>3.2.34</B> - <CODE>BUFFER</CODE></A><DD>
<DT><A HREF="#opt"><B>3.2.35</B> - <CODE>OPTIONS</CODE></A><DD>
<DT><A HREF="#pptok"><B>3.2.36</B> - <CODE>PPTOKEN</CODE></A><DD>
</DL>
<HR>

<H2>3.2. Type system</H2>
<P>
This section describes the type system used in the C++ producer.  Unless
otherwise stated, the types are declared using the 
<A HREF="../utilities/calc.html"><CODE>calculus</CODE> tool</A> as
part of the algebra <CODE>c_class.alg</CODE>.  The design of this
type algebra was largely based on the concepts underlying the C++
language; however, TDF was also an important influence, not merely
as the intended target language, but also because of its clear
presentation of essential language features. 
</P>

<HR>
<H3><A NAME="primitive">3.2.1. Primitive types</A></H3>
<P>
The primitive types used within the algebra <CODE>c_class</CODE> are
defined as follows: 
<PRE>
        int = &quot;int&quot; ;
        unsigned = &quot;unsigned&quot; ;
        string = &quot;character *&quot; ;
        ulong_type (ulong) = &quot;unsigned long&quot; ;
        BITSTREAM_P (bits) = &quot;BITSTREAM *&quot; ;
        PPTOKEN_P (pptok) = &quot;PPTOKEN *&quot; ;
</PRE>
The integral types are self-explanatory.  All string literals used
in the C++ producer are based on the character type: 
<PRE>
        typedef unsigned char character ;
</PRE>
hence the definition of <CODE>string</CODE>.  The remaining primitive
types provide links to those portions of the type system which are defined
outside of the algebra.  The types <A HREF="#bits"><CODE>BITSTREAM</CODE></A>
and <A HREF="#pptok"><CODE>PPTOKEN</CODE></A> are described below.
</P>

<HR>
<H3><A NAME="cv">3.2.2. <CODE>CV_SPEC</CODE></A></H3>
<P>
The enumeration type <CODE>CV_SPEC</CODE> (short name <CODE>cv</CODE>)
is used to represent a C++ type qualifier.  It takes the form of a
bitfield, the elements of which can be or-ed together to represent
combinations of type qualifiers.  The cv-qualifiers are represented
by <CODE>cv_const</CODE> and <CODE>cv_volatile</CODE> in the obvious
manner.  The value <CODE>cv_lvalue</CODE> is used as a qualifier to
indicate whether a type is an lvalue or an rvalue.  Other values are
used in function types to represent the function language linkage.
</P>
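<P>
As an illustration of how such a bitfield might be manipulated, the
following is a minimal C sketch.  The enumerator values are assumptions
chosen for the example; the actual <CODE>cv_*</CODE> values (including
the language linkage bits, which are omitted here) are defined in
<CODE>c_class.alg</CODE> and may differ, and the helper functions are
hypothetical rather than part of the producer.
</P>

```c
#include <assert.h>

/* Hypothetical CV_SPEC bits; the real cv_* values in c_class.alg may differ. */
typedef unsigned CV_SPEC;
enum {
    cv_none     = 0x0,
    cv_const    = 0x1,   /* const */
    cv_volatile = 0x2,   /* volatile */
    cv_lvalue   = 0x4    /* the type is an lvalue */
};

/* Combine two sets of qualifiers by or-ing their bits together. */
static CV_SPEC cv_join(CV_SPEC a, CV_SPEC b)
{
    return a | b;
}

/* Return nonzero if every bit of 'want' is present in 'have'. */
static int cv_has(CV_SPEC have, CV_SPEC want)
{
    return (have & want) == want;
}

/* Clear the lvalue bit, leaving the remaining qualifiers unchanged. */
static CV_SPEC cv_rvalue(CV_SPEC cv)
{
    return cv & ~(CV_SPEC)cv_lvalue;
}
```

<P>
Under these assumptions an lvalue of type <CODE>const volatile int</CODE>
would carry the qualifier <CODE>cv_const | cv_volatile | cv_lvalue</CODE>,
and <CODE>cv_rvalue</CODE> recovers the qualifiers of the corresponding
rvalue.
</P>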

<HR>
<H3><A NAME="ntype">3.2.3. <CODE>BUILTIN_TYPE</CODE></A></H3>
<P>
The enumeration type <CODE>BUILTIN_TYPE</CODE> (<CODE>ntype</CODE>)
is used to represent the built-in C++ types (<CODE>char</CODE>, 
<CODE>float</CODE>, <CODE>void</CODE> etc.).  It is used chiefly as
an index into tables of type information. 
</P>

<HR>
<H3><A NAME="btype">3.2.4. <CODE>BASE_TYPE</CODE></A></H3>
<P>
The enumeration type <CODE>BASE_TYPE</CODE> (<CODE>btype</CODE>) is
used to represent a C++ simple type specifier such as <CODE>signed</CODE>,
<CODE>short</CODE> or <CODE>int</CODE>.  It takes the form of a bitfield,
the elements of which can be or-ed together to represent combinations
of type specifiers.  Its chief use is when reading a type from the
input file; the various simple type specifiers are combined to give
a value of this type, which is then mapped to an actual <A HREF="#type">C++
type</A>. 
</P>
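<P>
The reading process described above can be sketched in C as follows.
The bit values, the <CODE>ntype_*</CODE> names and the functions
<CODE>btype_add</CODE> and <CODE>btype_map</CODE> are all hypothetical,
and only a handful of types are handled (in particular <CODE>long long</CODE>,
<CODE>float</CODE> and <CODE>double</CODE> are omitted).  The sketch merely
shows how or-ed specifiers can be checked for duplicates and then mapped
to a <A HREF="#ntype"><CODE>BUILTIN_TYPE</CODE></A> index.
</P>

```c
#include <assert.h>

/* Hypothetical BASE_TYPE bits, one per simple type specifier. */
typedef unsigned BASE_TYPE;
enum {
    btype_none     = 0x00,
    btype_char     = 0x01,
    btype_short    = 0x02,
    btype_int      = 0x04,
    btype_long     = 0x08,
    btype_signed   = 0x10,
    btype_unsigned = 0x20
};

/* Hypothetical BUILTIN_TYPE indices (only a few types shown). */
typedef enum {
    ntype_error, ntype_char, ntype_schar, ntype_uchar, ntype_sshort,
    ntype_ushort, ntype_sint, ntype_uint, ntype_slong, ntype_ulong
} BUILTIN_TYPE;

/* Add one specifier to the accumulated set. A repeated specifier such as
   'short short' is rejected (this sketch does not support 'long long'). */
static int btype_add(BASE_TYPE *acc, BASE_TYPE spec)
{
    if (*acc & spec)
        return 0;
    *acc |= spec;
    return 1;
}

/* Map a combination of specifiers to a built-in type. */
static BUILTIN_TYPE btype_map(BASE_TYPE bt)
{
    int sgn = (bt & btype_signed) != 0;
    int uns = (bt & btype_unsigned) != 0;
    if (sgn && uns)
        return ntype_error;                       /* 'signed unsigned' */
    switch (bt & ~(BASE_TYPE)(btype_signed | btype_unsigned)) {
    case btype_char:
        return uns ? ntype_uchar : (sgn ? ntype_schar : ntype_char);
    case btype_short:
    case btype_short | btype_int:
        return uns ? ntype_ushort : ntype_sshort;
    case btype_none:                              /* bare 'signed' or 'unsigned' */
        if (!sgn && !uns)
            return ntype_error;                   /* no specifiers at all */
        /* fall through */
    case btype_int:
        return uns ? ntype_uint : ntype_sint;
    case btype_long:
    case btype_long | btype_int:
        return uns ? ntype_ulong : ntype_slong;
    default:
        return ntype_error;                       /* e.g. 'short long' */
    }
}
```

<P>
With this mapping, <CODE>unsigned short int</CODE>, <CODE>short unsigned</CODE>
and <CODE>unsigned short</CODE> all produce equivalent bit patterns, and hence
the same type, regardless of the order in which the specifiers were read.
</P>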

<HR>
<H3><A NAME="itype">3.2.5. <CODE>INT_TYPE</CODE></A></H3>
<P>
The union type <CODE>INT_TYPE</CODE> (<CODE>itype</CODE>) is used
to represent an integral or bitfield C++ type.  The basic integral
types are given by the <CODE>basic</CODE> field.  Bitfield types are
represented by the <CODE>bitfield</CODE> field.  There are also fields
representing target-dependent integral promotion, arithmetic and integer
literal types, plus <CODE>VARIETY</CODE> tokens.  Only one <CODE>INT_TYPE</CODE>
object is created for each integral type. 
</P>

<HR>
<H3><A NAME="ftype">3.2.6. <CODE>FLOAT_TYPE</CODE></A></H3>
<P>
The union type <CODE>FLOAT_TYPE</CODE> (<CODE>ftype</CODE>) is used
to represent a floating point C++ type.  The basic floating point
types are given by the <CODE>basic</CODE> field.  There are also fields
representing target-dependent argument promotion and arithmetic types,
plus <CODE>FLOAT</CODE> tokens.  Only one <CODE>FLOAT_TYPE</CODE>
object is created for each floating point type. 
</P>

<HR>
<H3><A NAME="cinfo">3.2.7. <CODE>CLASS_INFO</CODE></A></H3>
<P>
The enumeration type <CODE>CLASS_INFO</CODE> (<CODE>cinfo</CODE>)
is used to represent information relating to a class or enumeration
definition.  It takes the form of a bitfield, the elements of which
can be or-ed together to represent various combinations of properties.
</P>

<HR>
<H3><A NAME="cusage">3.2.8. <CODE>CLASS_USAGE</CODE></A></H3>
<P>
The enumeration type <CODE>CLASS_USAGE</CODE> (<CODE>cusage</CODE>)
is used to represent information relating to the way a class is used.
It takes the form of a bitfield, the elements of which can be or-ed
together to represent various combinations of properties. 
</P>

<HR>
<H3><A NAME="ctype">3.2.9. <CODE>CLASS_TYPE</CODE></A></H3>
<P>
The union type <CODE>CLASS_TYPE</CODE> (<CODE>ctype</CODE>) is used
to represent a C++ class or union.  The main components are an 
<A HREF="#id">identifier</A> giving the class name, 
<A HREF="#cinfo">class information</A> and <A HREF="#cusage">class
usage</A> fields, a <A HREF="#nspace">namespace</A> giving the class
members, a <A HREF="#graph">graph</A> representing the base class
structure, and a <A HREF="#virt">virtual function table</A>.  Only
one 
<CODE>CLASS_TYPE</CODE> object is created for each class or union.
</P>
<P>
Each class maintains a list, <CODE>pals</CODE>, of class and function
identifiers which are declared as friends of that class.  It also
maintains a list, <CODE>chums</CODE>, of those class types which declare
it to be a friend (this is what is actually used in the access checks).
Similarly each function identifier maintains a list, 
<CODE>chums</CODE>, of those class types which declare it to be a
friend. 
</P>
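<P>
The relationship between the two lists can be illustrated by a small C
sketch, assuming a much simplified class record with fixed-size arrays in
place of the algebra's lists.  The structure layout and the functions
<CODE>declare_friend</CODE> and <CODE>has_access</CODE> are hypothetical;
the point is that declaring a friend updates both lists, while the access
check only needs to search the <CODE>chums</CODE> of the class requesting
access.
</P>

```c
#include <assert.h>

#define MAX_FRIENDS 8   /* hypothetical fixed capacity */

/* Hypothetical, simplified class record. */
typedef struct CLASS_TYPE {
    const char *name;
    const struct CLASS_TYPE *pals[MAX_FRIENDS];   /* friends declared by this class */
    int npals;
    const struct CLASS_TYPE *chums[MAX_FRIENDS];  /* classes declaring this one a friend */
    int nchums;
} CLASS_TYPE;

/* Record 'friend class f' inside class c, updating both lists. */
static void declare_friend(CLASS_TYPE *c, CLASS_TYPE *f)
{
    assert(c->npals < MAX_FRIENDS && f->nchums < MAX_FRIENDS);
    c->pals[c->npals++] = f;
    f->chums[f->nchums++] = c;
}

/* May members of 'from' use the private members of 'c'?
   Only the chums list of 'from' is searched. */
static int has_access(const CLASS_TYPE *from, const CLASS_TYPE *c)
{
    if (from == c)
        return 1;
    for (int i = 0; i < from->nchums; i++) {
        if (from->chums[i] == c)
            return 1;
    }
    return 0;
}
```

<P>
Here a call of <CODE>declare_friend</CODE> with <CODE>A</CODE> and
<CODE>B</CODE> models the declaration <CODE>class A { friend class B ; } ;</CODE>,
after which <CODE>B</CODE>, but not <CODE>A</CODE> or any other class,
gains access.
</P>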
<P>
Each class maintains a list of its constructors, destructors and conversion
functions (including inherited conversion functions).  It also maintains
a list of its virtual base classes.  This information can be obtained
by other means but it is more convenient to record it within the class
type itself. 
</P>

<HR>
<H3><A NAME="graph">3.2.10. <CODE>GRAPH</CODE></A></H3>
<P>
The union type <CODE>GRAPH</CODE> (<CODE>graph</CODE>) is used to
represent a directed acyclic graph arising from the base classes of
a class.  Each node of the graph has a <CODE>head</CODE> which is
a 
<A HREF="#ctype">class type</A>, and several <CODE>tails</CODE> which
give the base class graphs for that class.  Each node has pointers,
<CODE>top</CODE>, to the top of the graph (i.e. the most derived class),
and <CODE>up</CODE>, to the node of which the current node is a direct
base.  Each node also has an <CODE>access</CODE> field which gives
information on the base access, whether it is virtual or not, and
so on, in the form of a <A HREF="#dspec"><CODE>DECL_SPEC</CODE></A>.
Virtual bases are handled by the <CODE>equal</CODE> field, which defines
an equivalence relation on the graph identifying equivalent virtual
bases.  
</P>
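<P>
The following C sketch shows the shape of such a graph for non-virtual
inheritance.  It is a simplification under stated assumptions: class names
stand in for class types, each node has at most four direct bases, and the
<CODE>access</CODE> and <CODE>equal</CODE> fields are omitted.  The functions
<CODE>graph_add_base</CODE> and <CODE>graph_count</CODE> are hypothetical.
Because a non-virtual base gives rise to a separate node each time it is
inherited, counting the nodes whose head names a particular class shows
whether a reference to that base would be ambiguous.
</P>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified base class graph node. */
typedef struct GRAPH {
    const char *head;         /* class name standing in for a CLASS_TYPE */
    struct GRAPH *tails[4];   /* direct base classes of 'head' */
    int ntails;
    struct GRAPH *up;         /* node of which this is a direct base */
    struct GRAPH *top;        /* node for the most derived class */
} GRAPH;

/* Attach 'base' as a direct base of 'node', setting its up and top links.
   The graph must be built from the most derived class downwards, since
   'top' is copied from 'node'. */
static void graph_add_base(GRAPH *node, GRAPH *base)
{
    assert(node->ntails < 4);
    node->tails[node->ntails++] = base;
    base->up = node;
    base->top = node->top;
}

/* Count the nodes reachable from 'node' (including itself) whose head
   is the class 'name'. */
static int graph_count(const GRAPH *node, const char *name)
{
    int n = strcmp(node->head, name) == 0;
    for (int i = 0; i < node->ntails; i++)
        n += graph_count(node->tails[i], name);
    return n;
}
```

<P>
For example, the graph for <CODE>D</CODE> derived from <CODE>B1</CODE> and
<CODE>B2</CODE>, each of which is derived from <CODE>A</CODE>, contains two
<CODE>A</CODE> nodes.  If <CODE>A</CODE> were a virtual base, the full
representation would instead identify these two nodes through the
<CODE>equal</CODE> field.
</P>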

<HR>
<H3><A NAME="virt">3.2.11. <CODE>VIRTUAL</CODE></A></H3>
<P>
The union type <CODE>VIRTUAL</CODE> (<CODE>virt</CODE>) is used to
represent the virtual functions declared in a class.  The <CODE>table</CODE>
field is used to represent a virtual function table, and consists
primarily of a list of <CODE>VIRTUAL</CODE> objects giving the virtual
functions for the associated class.  These virtual functions are of
four kinds, each represented by a union field.  A virtual function
first declared in a class is represented by the <CODE>simple</CODE>
field; a virtual function in a class which overrides an inherited
virtual function is represented by the <CODE>override</CODE> field;
an inherited, non-overridden virtual function which is not overridden
in a base class is represented by the 
<CODE>inherit</CODE> field; and an inherited, non-overridden virtual
function which is overridden in some base class is represented by the 
<CODE>complex</CODE> field. 
</P>

<HR>
<H3><A NAME="etype">3.2.12. <CODE>ENUM_TYPE</CODE></A></H3>
<P>
The union type <CODE>ENUM_TYPE</CODE> (<CODE>etype</CODE>) is used
to represent a C++ enumeration type.  This consists primarily of an
<A HREF="#id">identifier</A> giving the enumeration name, a 
<A HREF="#cinfo">class information</A> field, a <A HREF="#type">type</A>
giving the underlying representation of the enumeration type, and
a list of <A HREF="#id">identifiers</A> giving the enumerators comprising
the enumeration. 
</P>

<HR>
<H3><A NAME="type">3.2.13. <CODE>TYPE</CODE></A></H3>
<P>
The union type <CODE>TYPE</CODE> (<CODE>type</CODE>) is used to represent
a C++ type.  Every type has an associated <A HREF="#cv">type qualifier</A>,
<CODE>qual</CODE>, which determines whether the type is 
<CODE>const</CODE>, <CODE>volatile</CODE> or an lvalue.  A type may
also have an associated <A HREF="#id">identifier</A>, <CODE>name</CODE>,
giving the corresponding type name (the null identifier being used
for unnamed types).  The other type components are determined by the
union tag.  Each of the type constructs above has a corresponding
field in the <CODE>TYPE</CODE> union: 
<CODE>integer</CODE> for <A HREF="#itype">integral types</A>, 
<CODE>floating</CODE> for <A HREF="#ftype">floating point types</A>,
<CODE>bitfield</CODE> for <A HREF="#itype">bitfield types</A>, 
<CODE>compound</CODE> for <A HREF="#ctype">class or union types</A>,
and 
<CODE>enumerate</CODE> for <A HREF="#etype">enumeration types</A>.
There are also fields <CODE>top</CODE> and <CODE>bottom</CODE>
corresponding to <CODE>void</CODE> and bottom (the type used to represent
expressions which never return). 
</P>
<P>
Other fields of the <CODE>TYPE</CODE> union represent composite types;
for example, the <CODE>array</CODE> field, representing array types,
comprises a base type, <CODE>sub</CODE>, and an <A HREF="#nat">integer
constant</A> giving the array bound, <CODE>size</CODE>.  These are
generally simple, apart from <CODE>func</CODE>, representing a function
type.  This has the obvious components: a return type, <CODE>ret</CODE>,
a list of parameter types, <CODE>ptypes</CODE>, and a flag indicating
ellipsis functions, <CODE>ellipsis</CODE>.  It also has an associated
<A HREF="#nspace">namespace</A>, <CODE>pars</CODE>, in which the function
parameters are declared.  The parameter identifiers are extracted
from this as a list, <CODE>pids</CODE>.  Member function qualifiers
and language linkage information are represented by a 
<A HREF="#cv"><CODE>CV_SPEC</CODE></A>, <CODE>mqual</CODE>.  The implicit
extra parameter for member functions is recorded in the list 
<CODE>mtypes</CODE>, which consists of <CODE>ptypes</CODE> with this
extra type added at the start.  Finally, <CODE>except</CODE> gives the
exception specification, if any; the case where the exception specification
is absent is represented by the special value <CODE>univ_type_set</CODE>. 
</P>

<HR>
<H3><A NAME="dspec">3.2.14. <CODE>DECL_SPEC</CODE></A></H3>
<P>
The enumeration type <CODE>DECL_SPEC</CODE> (<CODE>dspec</CODE>) is
used to represent information on the declaration and usage of an identifier.
It takes the form of a bitfield, the elements of which can be or-ed
together to represent various combinations of properties.  The limit
of 32 bits in this bitfield (the maximum which can be represented
portably) is a significant restriction.  It means that the same member
of <CODE>DECL_SPEC</CODE> is often used to mean different things in
different contexts, which can occasionally prove confusing. 
</P>

<HR>
<H3><A NAME="hashid">3.2.15. <CODE>HASHID</CODE></A></H3>
<P>
The union type <CODE>HASHID</CODE> (<CODE>hashid</CODE>) is used to
represent a C++ identifier name.  The simplest form of identifier
name, 
<CODE>name</CODE>, consists of just a string of characters, such as
<CODE>foo</CODE>.  Extended identifier names, <CODE>ename</CODE>,
are similar, but may contain Unicode characters.  There are, however,
other forms of identifier name in C++: conversion function names
(<CODE>conv</CODE>) such as <CODE>operator int</CODE>, overloaded operator names
(<CODE>op</CODE>) such as <CODE>operator+</CODE>, constructor names
(<CODE>constr</CODE>), and destructor names (<CODE>destr</CODE>).
There are also names which are used for anonymous identifiers (<CODE>anon</CODE>).
</P>
<P>
Note the distinction between an identifier name and an actual 
<A HREF="#id">identifier</A>.  The latter is a meaning associated
with a name in a particular context.  Every identifier name has an
associated underlying meaning, <CODE>id</CODE>.  This is used to handle
keywords and macros, but for most identifier names this will be a
dummy identifier. Nested underlying meanings (such as a macro hiding
a keyword) are handled by linking the <CODE>alias</CODE> fields of
the corresponding identifiers.  Every identifier name also has a
<CODE>cache</CODE> field which is used to record the look-up of this name as
an unqualified identifier.  This may be set to the null identifier
to indicate that the look-up needs to be re-evaluated. 
</P>
<P>
Identifier names are stored in one of a small number of hash tables,
linked using their <CODE>next</CODE> field.  Each name has only one
entry in these tables, so equality of names can be tested by simple
pointer comparison, using <CODE>EQ_hashid</CODE>. 
</P>
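<P>
The following minimal C sketch illustrates this kind of name table.  It
assumes a single fixed-size table, a simple multiplicative string hash,
and a name record holding only the text and the <CODE>next</CODE> link;
the names <CODE>lookup_name</CODE>, <CODE>HASH_SIZE</CODE> and
<CODE>same_name</CODE> are hypothetical and not taken from the producer
sources.
</P>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASH_SIZE 64   /* hypothetical table size */

/* Hypothetical identifier name: only the text and the hash chain link. */
typedef struct HASHID {
    char *text;
    struct HASHID *next;   /* next name in the same bucket */
} HASHID;

static HASHID *hash_table[HASH_SIZE];

/* Simple multiplicative string hash, reduced to a bucket index. */
static unsigned hash_text(const char *s)
{
    unsigned h = 0;
    while (*s != '\0')
        h = 31u * h + (unsigned char)*s++;
    return h % HASH_SIZE;
}

/* Return the unique entry for 's', creating it on first use.
   Returns NULL only if memory is exhausted. */
static HASHID *lookup_name(const char *s)
{
    unsigned h = hash_text(s);
    HASHID *p;
    for (p = hash_table[h]; p != NULL; p = p->next) {
        if (strcmp(p->text, s) == 0)
            return p;
    }
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->text = malloc(strlen(s) + 1);
    if (p->text == NULL) {
        free(p);
        return NULL;
    }
    strcpy(p->text, s);
    p->next = hash_table[h];
    hash_table[h] = p;
    return p;
}

/* Each name has a single entry, so equal names are identical pointers. */
#define same_name(a, b) ((a) == (b))
```

<P>
Since <CODE>lookup_name</CODE> never creates two entries with the same
text, comparing two names reduces to comparing two pointers.
</P>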

<HR>
<H3><A NAME="qual">3.2.16. <CODE>QUALIFIER</CODE></A></H3>
<P>
The enumeration type <CODE>QUALIFIER</CODE> (<CODE>qual</CODE>) is
used to represent the various ways in which an identifier name can
be qualified.  For example, <CODE>::A::a</CODE> is represented by
<CODE>qual_full</CODE>.  The value <CODE>qual_mark</CODE> is used
in the representation of function identifier expressions to indicate
that overload resolution has been performed. 
</P>

<HR>
<H3><A NAME="id">3.2.17. <CODE>IDENTIFIER</CODE></A></H3>
<P>
The union type <CODE>IDENTIFIER</CODE> (<CODE>id</CODE>) is used to
represent the various kinds of C++ identifiers.  Every identifier
has an associated <A HREF="#hashid">identifier name</A>, a parent
<A HREF="#nspace">namespace</A>, a <A HREF="#dspec">declaration information</A>
field, and a <A HREF="#loc">location</A> for its declaration or definition.
Each identifier also has an 
<CODE>alias</CODE> field which is normally used to represent the aliasing
which can occur in inheritance or <CODE>using</CODE>
declarations. 
</P>
<P>
The various fields of the <CODE>IDENTIFIER</CODE> union correspond
to the various kinds of identifier which can arise in C++ - class
names, functions, variables, class members, macros, keywords etc.
Each field has appropriate components giving its type, its definition
or whatever other information is required.  For example, the <CODE>variable</CODE>
field has a <A HREF="#type">type</A> and two <A HREF="#exp">expressions</A>,
giving the constructor and destructor values for the object. 
</P>
<P>
Most of these identifier components are self-explanatory; however,
the treatment of overloaded functions bears discussion.  The various
fields representing functions have an <CODE>over</CODE> component
which is used to link overloaded functions together.  A set of overloaded
functions is treated as if it were a single <CODE>IDENTIFIER</CODE>
- the first in the list - for the purposes of storing in a <A HREF="#member">namespace
member</A>; the other overloaded meanings are accessed by chasing
down the <CODE>over</CODE> components.  In other situations, whether
a function identifier represents a single function or a set of overloaded
functions can be worked out from the context.  For example, in identifier
expressions the <A HREF="#qual">identifier qualifier</A> is used to
mark whether overload resolution has taken place. 
</P>

<HR>
<H3><A NAME="member">3.2.18. <CODE>MEMBER</CODE></A></H3>
<P>
The union type <CODE>MEMBER</CODE> (<CODE>member</CODE>) is used to
represent a member of a <A HREF="#nspace">namespace</A>.  Each member
contains two identifiers, <CODE>id</CODE> and <CODE>alt</CODE>.  The
<CODE>id</CODE> field gives the meaning associated with a particular
name in this namespace; the <CODE>alt</CODE> field is used to represent
a type name which may be hidden by a non-type name. 
</P>
<P>
There are two kinds of member, <CODE>small</CODE> and <CODE>large</CODE>,
corresponding to whether the namespace holds its members in a simple
linked list or in a hash table. 
</P>
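<P>
The use of the <CODE>alt</CODE> field can be seen in the familiar case of
a class and a function sharing a name, as in <CODE>struct stat</CODE> and
the function <CODE>stat</CODE>.  The C sketch below assumes a
<CODE>small</CODE>-style namespace held as a linked list, uses string
comparison in place of identifier name comparison, and reduces an
identifier to a name and a flag saying whether it is a type; all of the
names in it are hypothetical.
</P>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical identifier: a name and whether it denotes a type. */
typedef struct {
    const char *name;
    int is_type;
} IDENTIFIER;

/* Hypothetical 'small' namespace member, kept in a linked list. */
typedef struct MEMBER {
    const char *name;      /* stands in for the identifier name */
    IDENTIFIER *id;        /* meaning of the name in this namespace */
    IDENTIFIER *alt;       /* type name hidden by a non-type 'id', or NULL */
    struct MEMBER *next;
} MEMBER;

/* Linear search of a small namespace for the member with the given name. */
static MEMBER *find_member(MEMBER *list, const char *name)
{
    for (; list != NULL; list = list->next) {
        if (strcmp(list->name, name) == 0)
            return list;
    }
    return NULL;
}

/* Ordinary look-up yields 'id'; a look-up which only accepts type names
   (for example after 'struct') falls back to 'alt' when 'id' is not a type. */
static IDENTIFIER *lookup(MEMBER *list, const char *name, int want_type)
{
    MEMBER *m = find_member(list, name);
    if (m == NULL)
        return NULL;
    if (want_type && (m->id == NULL || !m->id->is_type))
        return m->alt;
    return m->id;
}
```

<P>
An ordinary look-up of <CODE>stat</CODE> then finds the function, while a
look-up for a type name, such as one following <CODE>struct</CODE>, finds
the hidden class.
</P>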

<HR>
<H3><A NAME="nspace">3.2.19. <CODE>NAMESPACE</CODE></A></H3>
<P>
The union type <CODE>NAMESPACE</CODE> (<CODE>nspace</CODE>) is used
to represent the set of identifiers declared in a particular scope.
For example, the members declared in a C++ class or namespace, the
parameters declared in a function declarator and the local variables
declared in a block all form scopes.  The various kinds of scope are
distinguished as different fields of the union, but there are basically
two categories.  The first, such as function blocks, have relatively
small numbers of elements and store their members as simple linked
lists.  The second, such as classes, have larger numbers of
elements and store their members in hash tables.  In both cases the elements
are stored using the <A HREF="#member"><CODE>MEMBER</CODE></A>
type. 
</P>
<P>
The key operation on a namespace is to look up a particular 
<A HREF="#hashid">identifier name</A> in its linked list or hash table
of members to find the meaning, if any, associated with that name
in the namespace.  This can be a complex operation because of the
need to take base classes and <CODE>using</CODE> directives (as stored
in the <CODE>use</CODE> component) into account. 
</P>

<HR>
<H3><A NAME="nat">3.2.20. <CODE>NAT</CODE></A></H3>
<P>
The union type <CODE>NAT</CODE> (<CODE>nat</CODE>) is used to represent
an integer constant expression.  Values are represented as lists of
16-bit 'digits'.  Values which fit into a single digit are represented
by the <CODE>small</CODE> field; larger values by the <CODE>large</CODE>
field.  Negated values can be represented by the <CODE>neg</CODE>
field.  Folding of integer constant expressions is performed in the
producer; however, the result can only be represented as described
above if its value is target-independent.  Target-dependent values
are represented by the <CODE>calc</CODE> field, which contains an 
<A HREF="#exp">expression</A> describing how to calculate the value.
The <CODE>token</CODE> field is used to represent <CODE>NAT</CODE>
tokens. 
</P>
<P>
Objects representing small integer constants are created at the start
of the program and stored in a table for ease of access.  Larger constants
are created as and when they are required. 
</P>
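<P>
The digit representation can be illustrated by a small C sketch.  It
assumes a fixed array of at most four digits (enough for a 64-bit
magnitude), stored least significant digit first, with a separate flag
standing in for the <CODE>neg</CODE> field.  The structure and the
functions <CODE>make_nat</CODE>, <CODE>nat_is_small</CODE> and
<CODE>nat_value</CODE> are hypothetical, and the algebra itself uses a
list of digits rather than an array.
</P>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NAT_DIGITS 4   /* four 16-bit digits cover a 64-bit magnitude */

/* Hypothetical NAT: magnitude as 16-bit digits, least significant first,
   plus a flag standing in for the 'neg' field. */
typedef struct {
    int neg;
    size_t ndigits;
    unsigned short digit[NAT_DIGITS];
} NAT;

/* Split a magnitude into 16-bit digits. Zero gives a single zero digit. */
static NAT make_nat(uint64_t v, int neg)
{
    NAT n = { neg, 0, { 0 } };
    do {
        n.digit[n.ndigits++] = (unsigned short)(v & 0xFFFFu);
        v >>= 16;
    } while (v != 0);
    return n;
}

/* A value corresponds to the 'small' case if it fits in one digit. */
static int nat_is_small(const NAT *n)
{
    return n->ndigits == 1;
}

/* Reassemble the magnitude, starting from the most significant digit. */
static uint64_t nat_value(const NAT *n)
{
    uint64_t v = 0;
    size_t i = n->ndigits;
    while (i-- > 0)
        v = (v << 16) | n->digit[i];
    return v;
}
```

<P>
Under these assumptions 42 is a single-digit (<CODE>small</CODE>) value,
whereas 0x12345678 needs the two digits 0x5678 and 0x1234.
</P>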

<HR>
<H3><A NAME="flt">3.2.21. <CODE>FLOAT</CODE></A></H3>
<P>
The union type <CODE>FLOAT</CODE> (<CODE>flt</CODE>) is used to represent
a floating point constant expression.  There is only one field,
<CODE>simple</CODE>, which corresponds to a floating point literal.  No folding
of floating point constant expressions is attempted in the producer
(it is virtually impossible to do so in a target-independent manner).
</P>
<P>
Objects representing useful floating point constants (0.0, 1.0 etc.)
are created for each floating point type and stored as part of the
corresponding <A HREF="#ftype"><CODE>FLOAT_TYPE</CODE></A>.  Other
values are created as and when they are required. 
</P>

<HR>
<H3><A NAME="str">3.2.22. <CODE>STRING</CODE></A></H3>
<P>
The union type <CODE>STRING</CODE> (<CODE>str</CODE>) is used to represent
a string constant expression.  There is only one field, 
<CODE>simple</CODE>, which corresponds to a character string literal;
however, the <CODE>kind</CODE> field can be used to modify the interpretation
put on the characters appearing in the <CODE>text</CODE>
field.  By default, each character in <CODE>text</CODE> corresponds
to a single character in the literal; however, an alternative representation,
in which <CODE>text</CODE> consists of a sequence of multibyte characters,
each encoded as one control character plus four value characters, is used
in more complex cases. 
</P>
<P>
All strings are stored in a hash table intended to ensure that the
same <CODE>STRING</CODE> object is used for equal string literals.
This not only saves space during the processing of the input file,
but also facilitates the output of shared string literals in the TDF
capsule. 
</P>
<P>
Note that the terminating zero character does not form part of the 
<CODE>STRING</CODE> object.  Instead, information on this is stored
as part of the type of a <A HREF="#exp">string literal expression</A>.
The text of the string literal is either truncated or padded with
zeros so that its length matches the array bound in the
type of the corresponding literal expression. 
</P>
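<P>
The truncation and padding rule can be sketched in C as follows, assuming
a simplified string record holding the text and its length (without a
terminating zero); the structure and the function
<CODE>string_to_array</CODE> are hypothetical.  For the literal
<CODE>&quot;abc&quot;</CODE>, which has three characters of text, an array
bound of 4 gives the usual terminating zero, a bound of 6 adds further zero
padding, and a bound of 3 (as C permits in
<CODE>char s [3] = &quot;abc&quot; ;</CODE>) drops the terminator.
</P>

```c
#include <assert.h>
#include <string.h>

/* Hypothetical STRING: the literal's text, without a terminating zero. */
typedef struct {
    const char *text;
    size_t len;
} STRING;

/* Lay out the literal in an array of 'bound' characters: copy at most
   'bound' characters of text, then fill any remaining space with zeros. */
static void string_to_array(const STRING *s, char *out, size_t bound)
{
    size_t n = s->len < bound ? s->len : bound;
    memcpy(out, s->text, n);
    memset(out + n, 0, bound - n);
}
```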

<HR>
<H3><A NAME="ntest">3.2.23. <CODE>NTEST</CODE></A></H3>
<P>
The enumeration type <CODE>NTEST</CODE> (<CODE>ntest</CODE>) is used
to represent the various C++ relational operators (<CODE>==</CODE>,
<CODE>!=</CODE>, <CODE>&gt;</CODE> etc.).  The values correspond to
the encoding of the TDF <CODE>NTEST</CODE> sort, which facilitates
code generation.  The values also have the property that the values
for complementary operators (such as <CODE>&lt;</CODE> and 
<CODE>&gt;=</CODE>) always add up to the same value, 
<CODE>ntest_negate</CODE>, allowing operators to be complemented in
a straightforward manner. 
</P>
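<P>
The complementing property can be illustrated with a small C sketch.  The
numeric values below are assumptions chosen only so that complementary
pairs sum to the same constant; they are not the actual TDF encoding, and
<CODE>ntest_complement</CODE> and <CODE>ntest_eval</CODE> are hypothetical
helpers.
</P>

```c
#include <assert.h>

/* Hypothetical NTEST values: complementary operators sum to ntest_negate. */
typedef enum {
    ntest_eq         = 0,   /* == */
    ntest_greater    = 1,   /* >  */
    ntest_greater_eq = 2,   /* >= */
    ntest_less       = 3,   /* <  */
    ntest_less_eq    = 4,   /* <= */
    ntest_not_eq     = 5,   /* != */
    ntest_negate     = 5
} NTEST;

/* Complement an operator by subtracting it from ntest_negate. */
static NTEST ntest_complement(NTEST op)
{
    return (NTEST)(ntest_negate - op);
}

/* Evaluate an operator on two integers, to check the complement. */
static int ntest_eval(NTEST op, int a, int b)
{
    switch (op) {
    case ntest_eq:         return a == b;
    case ntest_greater:    return a > b;
    case ntest_greater_eq: return a >= b;
    case ntest_less:       return a < b;
    case ntest_less_eq:    return a <= b;
    case ntest_not_eq:     return a != b;   /* same value as ntest_negate */
    }
    return 0;
}
```

<P>
With these values, complementing <CODE>&lt;</CODE> (3) gives 5 - 3 = 2,
that is <CODE>&gt;=</CODE>, and complementing <CODE>==</CODE> (0) gives
5, that is <CODE>!=</CODE>.
</P>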

<HR>
<H3><A NAME="rmode">3.2.24. <CODE>RMODE</CODE></A></H3>
<P>
The enumeration type <CODE>RMODE</CODE> (<CODE>rmode</CODE>) is used
to represent the various C++ rounding modes (towards zero, towards
smaller etc.).  The values correspond to the encoding of the TDF 
<CODE>RMODE</CODE> sort, which facilitates code generation. 
</P>

<HR>
<H3><A NAME="exp">3.2.25. <CODE>EXP</CODE></A></H3>
<P>
The union type <CODE>EXP</CODE> (<CODE>exp</CODE>) is used to represent
a C++ expression or statement.  Each expression has an associated
<A HREF="#type">type</A>, <CODE>type</CODE>, but most of the information
about an expression is stored in one of the large number of fields
of the <CODE>EXP</CODE> union.  Most of these fields are fairly simple.
For example, there are fields corresponding to <A HREF="#nat">integer
literals</A>, <A HREF="#flt">floating point literals</A>, 
<A HREF="#str">string literals</A> and <A HREF="#id">identifiers</A>.
Composite expressions are formed in the normal way; for example, there
are various binary operators comprising two argument expressions.
The 
<CODE>EXP</CODE> fields corresponding to statements are slightly more
complex.  They each have a <CODE>parent</CODE> field which points
to the enclosing statement.  A couple of cases bear additional discussion.
</P>
<P>
The <CODE>sequence</CODE> field represents a compound statement or
block.  This contains a <A HREF="#nspace">namespace</A>, in which
any local variables are declared, and a list of expressions, giving
the statements comprising the block.  The null namespace is used if
the block does not constitute a scope.  The first statement in the
list is always a dummy to enable <CODE>first</CODE> and <CODE>last</CODE>
pointers to be maintained to the start and end of the list without
having to worry about null lists. 
</P>
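<P>
The benefit of the dummy first statement is the usual one of a sentinel
list head, as the following C sketch shows.  It assumes a simplified
statement node carrying just an integer value, and the names
<CODE>STMT</CODE>, <CODE>BLOCK</CODE>, <CODE>block_init</CODE>,
<CODE>block_append</CODE> and <CODE>block_sum</CODE> are hypothetical; the
producer itself uses a list of expressions within the <CODE>sequence</CODE>
field.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement node standing in for an EXP. */
typedef struct STMT {
    int value;
    struct STMT *next;
} STMT;

/* A block whose list always starts with a dummy statement, so that
   'first' and 'last' are never null, even for an empty block.
   Since 'first' points into the block, a BLOCK must not be copied. */
typedef struct {
    STMT dummy;
    STMT *first;   /* always &dummy */
    STMT *last;    /* last real statement, or &dummy if there is none */
} BLOCK;

static void block_init(BLOCK *b)
{
    b->dummy.value = 0;
    b->dummy.next = NULL;
    b->first = b->last = &b->dummy;
}

/* Append a statement; the empty block needs no special case. */
static void block_append(BLOCK *b, STMT *s)
{
    s->next = NULL;
    b->last->next = s;
    b->last = s;
}

/* Sum the values of the real statements, skipping the dummy. */
static int block_sum(const BLOCK *b)
{
    int sum = 0;
    for (const STMT *s = b->first->next; s != NULL; s = s->next)
        sum += s->value;
    return sum;
}
```

<P>
Appending to an empty block is handled by exactly the same code as
appending to a non-empty one, since <CODE>last</CODE> always points at an
actual node.
</P>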
<P>
<A NAME="solve">The <CODE>solve_stmt</CODE> field corresponds to the
TDF <CODE>labelled</CODE> construct</A> (in early versions of TDF
this construct was called <CODE>solve</CODE>, hence the terminology).
The problem is that C and C++ labels and <CODE>goto</CODE>s are totally
unstructured, whereas the TDF label constructs are structured.  Any
statement which contains unstructured labels is enclosed in a 
<CODE>solve_stmt</CODE> construct, enclosing both the labelled statement
and all jumps to it (in general this cannot be done until the end
of the function).  Any labels or variables which are bypassed by such
unstructured jumps also need to be pulled out to the <CODE>solve_stmt</CODE>
construct.  It is not just explicit labels which can cause such problems;
complex <CODE>switch</CODE> statements have the same effect. 
</P>

<HR>
<H3><A NAME="off">3.2.26. <CODE>OFFSET</CODE></A></H3>
<P>
The union type <CODE>OFFSET</CODE> (<CODE>off</CODE>) is used to represent
an offset expression.  This is used as an adjunct to the normal 
<A HREF="#exp">expression</A> representation.  The <CODE>OFFSET</CODE>
union has fields corresponding to a type offset (used in pointer arithmetic),
the offset of a member of a class and the offset of a base class.
There are also simple operations on offsets, such as multiplication
by an expression. 
</P>

<HR>
<H3><A NAME="tok">3.2.27. <CODE>TOKEN</CODE></A></H3>
<P>
The union type <CODE>TOKEN</CODE> (<CODE>tok</CODE>) is used to represent
one of a number of different categories within the C++ language. 
It corresponds to the sort of a token declared using the 
<A HREF="token.html"><CODE>#pragma token</CODE> syntax</A>.  Thus
there are fields corresponding to expression, statement, integer constant,
type, function, member and procedure tokens.  The similarities between
<CODE>PROC</CODE> tokens and templates have been remarked on above; for
example, the parameters of the template: 
<PRE>
        template &lt; class T, int n &gt; class A {
            T a [n] ;
            // ....
        } ;
</PRE>
are essentially equivalent to those in the procedure token: 
<PRE>
        PROC ( TYPE T, EXP const : int : n ) ....
</PRE>
(recall that non-type template arguments are always constant expressions).
Thus a field, <CODE>templ</CODE>, of the <CODE>TOKEN</CODE> union
is used to represent lists of template parameters.  Note that a further
field, <CODE>class</CODE>, is also required to represent template
template parameters.  A <A HREF="#type">template type</A> is represented
by a field, <CODE>templ</CODE>, of the union <CODE>TYPE</CODE>, which
comprises a template sort and a sub-type expressed in terms of the
template parameters. 
</P>
<P>
In addition to representing token and template sorts in this way,
the 
<CODE>TOKEN</CODE> union is used to represent token and template arguments.
Each of the parameter sorts listed above has an appropriate 
<CODE>value</CODE> component which can store a value of that sort.
Many of the union types in the algebra, including <A HREF="#type">types</A>
and <A HREF="#exp">expressions</A>, have a field of the form: 
<PRE>
        token -&gt; {
            IDENTIFIER tok ;
            LIST TOKEN args ;
        }
</PRE>
representing the given token <A HREF="#id">identifier</A> applied
to the given list of arguments. 
</P>
<P>
<A NAME="form">Template instances are represented slightly differently
from token applications</A>.  Each instance of a template class or
a template function gives rise to a new class or function 
<A HREF="#id">identifier</A>.  This identifier has an underlying form
giving the template identifier and the template arguments.  This is
expressed as a <CODE>token</CODE> member of the 
<A HREF="#type"><CODE>TYPE</CODE></A> union (although it is not technically
a type, this happens to be the most convenient representation).  Each
such form has an associated 
<A HREF="#inst"><CODE>INSTANCE</CODE></A> component which gives further
information about the template instance.  The form for a template
function instance is stored in the <CODE>form</CODE> component of
the corresponding <A HREF="#id">identifier</A>.  The form for a template
class instance is stored in the <CODE>form</CODE> component of the
corresponding <A HREF="#ctype">class type</A>. 
</P>
<P>
Members of instances of template classes also have a form type, but
in this case the form is an <CODE>instance</CODE> type.  This gives
a link back to the corresponding member of the template class. 
</P>

<HR>
<H3><A NAME="inst">3.2.28. <CODE>INSTANCE</CODE></A></H3>
<P>
The union type <CODE>INSTANCE</CODE> (<CODE>inst</CODE>) is used to
represent a particular instance of a template or token.  Each 
<A HREF="#tok">template sort</A> has an associated list of all the
instances of that template, which is used to ensure that the same
template applied with the same arguments always has the same value.
Information on partial or explicit specialisations and usage information
are stored as part of the corresponding 
<CODE>INSTANCE</CODE>.  Each template instance identifier has a link
back to its corresponding <CODE>INSTANCE</CODE> via its 
<A HREF="#form"><CODE>form</CODE> component</A>. 
</P>

<HR>
<H3><A NAME="err">3.2.29. <CODE>ERROR</CODE></A></H3>
<P>
The union type <CODE>ERROR</CODE> (<CODE>err</CODE>) is used to represent
an error arising during the compilation of a C++ program.  Errors are
first-class objects within the producer and can be passed to and from
procedures.  Each error has an associated <CODE>severity</CODE>
(serious, warning, none etc.).  Simple errors are represented by the
<CODE>simple</CODE> field, which consists of an index, <CODE>number</CODE>,
into the error catalogue, plus a variable-length list of error arguments.
Errors can be combined into composite errors using the 
<CODE>compound</CODE> field, which represents the join of two errors:
<CODE>head</CODE> followed by <CODE>tail</CODE>. 
</P>
<P>
The chief operation on an error after it has been built up is to report
it.  Each error report consists of an error object and a 
<A HREF="#loc">file location</A> indicating where the error occurred.
</P>
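<P>
A composite error can be viewed as a binary tree whose leaves are simple
errors; reporting it amounts to visiting the leaves from left to right.
The following C sketch assumes a much simplified error record (no severity
and no arguments), and the functions <CODE>simple_error</CODE>,
<CODE>join_errors</CODE> and <CODE>list_errors</CODE> are hypothetical.
</P>

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified ERROR: either a simple error (an index into
   the error catalogue) or a compound of two errors, head then tail. */
typedef struct ERROR {
    int is_compound;
    int number;                 /* simple: catalogue index */
    struct ERROR *head, *tail;  /* compound: the two joined errors */
} ERROR;

static ERROR *new_error(int is_compound, int number, ERROR *head, ERROR *tail)
{
    ERROR *e = malloc(sizeof *e);
    if (e != NULL) {
        e->is_compound = is_compound;
        e->number = number;
        e->head = head;
        e->tail = tail;
    }
    return e;
}

static ERROR *simple_error(int number)
{
    return new_error(0, number, NULL, NULL);
}

/* Join two errors; a null error stands for "no error". */
static ERROR *join_errors(ERROR *a, ERROR *b)
{
    ERROR *e;
    if (a == NULL)
        return b;
    if (b == NULL)
        return a;
    e = new_error(1, 0, a, b);
    return e != NULL ? e : a;   /* keep the first error if allocation fails */
}

/* Visit the simple errors in order (head before tail), as a reporting
   routine would, storing up to 'max' catalogue numbers in 'out'.
   Returns the running count of simple errors seen. */
static int list_errors(const ERROR *e, int *out, int max, int n)
{
    if (e == NULL)
        return n;
    if (e->is_compound) {
        n = list_errors(e->head, out, max, n);
        return list_errors(e->tail, out, max, n);
    }
    if (n < max)
        out[n] = e->number;
    return n + 1;
}
```

<P>
Treating a null error as &quot;no error&quot; in <CODE>join_errors</CODE>
means that callers can accumulate errors without first checking whether
any have occurred.
</P>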

<HR>
<H3><A NAME="var">3.2.30. <CODE>VARIABLE</CODE></A></H3>
<P>
The structure type <CODE>VARIABLE</CODE> (<CODE>var</CODE>) is used
to represent the state of a variable, and is used in the variable
analysis checks. 
</P>

<HR>
<H3><A NAME="loc">3.2.31. <CODE>LOCATION</CODE></A></H3>
<P>
The structure type <CODE>LOCATION</CODE> (<CODE>loc</CODE>) is used
to represent a location in an input file.  It comprises a pointer,
<CODE>posn</CODE>, to an <A HREF="#posn">input file position</A>,
together with a line number, <CODE>line</CODE>, which takes
<CODE>#line</CODE> directives into account.  Note that character
positions within the line are not currently recorded. 
</P>

<HR>
<H3><A NAME="posn">3.2.32. <CODE>POSITION</CODE></A></H3>
<P>
The structure type <CODE>POSITION</CODE> (<CODE>posn</CODE>) is used
to represent a position in an input file.  It consists of two file
names, <CODE>file</CODE>, which takes <CODE>#line</CODE> directives into
account, and <CODE>input</CODE>, which gives the actual file name, plus a
line number offset, <CODE>offset</CODE>, which gives the difference between
the line number taking <CODE>#line</CODE> directives into account and
the actual line number.  Other information stored includes the datestamp
on the input file, <CODE>datestamp</CODE>, and a pointer to a 
<A HREF="#loc">file location</A> which, for files included using 
<CODE>#include</CODE>, gives the location the file was included from.
</P>
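<P>
The relationship between the two line numbers can be shown with a small C
sketch.  It assumes that a <CODE>#line</CODE> directive on physical line
<I>p</I> giving line number <I>n</I> makes the following line report as
line <I>n</I>, so that the offset becomes <I>n</I> - (<I>p</I> + 1).  The
simplified structures and the functions <CODE>apply_line_directive</CODE>
and <CODE>actual_line</CODE> are hypothetical.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified POSITION. */
typedef struct {
    const char *file;    /* file name taking #line directives into account */
    const char *input;   /* actual file name */
    long offset;         /* #line-adjusted line number minus actual line */
} POSITION;

/* Hypothetical, simplified LOCATION. */
typedef struct {
    unsigned long line;  /* line number taking #line directives into account */
    POSITION *posn;
} LOCATION;

/* Apply '#line n "name"' appearing on physical line p: line p + 1 is
   reported as line n. A null 'name' leaves the file name unchanged. */
static void apply_line_directive(POSITION *pos, const char *name,
                                 unsigned long n, unsigned long p)
{
    if (name != NULL)
        pos->file = name;
    pos->offset = (long)n - (long)(p + 1);
}

/* Recover the physical line number from a location. */
static unsigned long actual_line(const LOCATION *loc)
{
    return (unsigned long)((long)loc->line - loc->posn->offset);
}
```

<P>
For example, <CODE>#line 100 &quot;gen.y&quot;</CODE> on line 10 of
<CODE>a.c</CODE> gives an offset of 89, so a location reported as line 105
of <CODE>gen.y</CODE> is actually line 16 of <CODE>a.c</CODE>.
</P>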

<HR>
<H3><A NAME="bits">3.2.33. <CODE>BITSTREAM</CODE></A></H3>
<P>
The structure <CODE>BITSTREAM</CODE> is not part of the 
<CODE>calculus</CODE> type system.  It is used to represent a sequence
of bits such as is used, for example, in the encoding of TDF. 
</P>

<HR>
<H3><A NAME="buff">3.2.34. <CODE>BUFFER</CODE></A></H3>
<P>
The structure <CODE>BUFFER</CODE> is not part of the <CODE>calculus</CODE>
type system.  It is used to represent a sequence of characters. 
</P>

<HR>
<H3><A NAME="opt">3.2.35. <CODE>OPTIONS</CODE></A></H3>
<P>
The structure <CODE>OPTIONS</CODE> is not part of the <CODE>calculus</CODE>
type system.  It is used to represent the state of the 
<A HREF="pragma.html#low">compiler options</A> at a particular point
in the input file. 
</P>

<HR>
<H3><A NAME="pptok">3.2.36. <CODE>PPTOKEN</CODE></A></H3>
<P>
The structure <CODE>PPTOKEN</CODE> is not part of the <CODE>calculus</CODE>
type system.  It is used to represent a linked list of preprocessing
tokens.  Each token has an associated <CODE>sid</CODE> lexical token
number, <CODE>tok</CODE>, plus additional data dependent on the token
type.  Each token also records a pointer to the current 
<A HREF="#opt"><CODE>OPTIONS</CODE></A> value. 
</P>

<HR>
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
Copyright &copy; 1998.</I></P>
</BODY>
</HTML>