<!-- Crown Copyright (c) 1998 -->
<HTML>
<HEAD>
<TITLE>
C++ Producer Guide: Token syntax
</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">
<H1>C++ Producer Guide</H1>
<H3>March 1998</H3>
<A HREF="dump.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
<A HREF="pragma.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
<A HREF="index.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
</A>
<IMG SRC="../images/no_index.gif" ALT="document index"><P>
<HR>
<DL>
<DT><A HREF="#spec"><B>2.3.1</B> - Token specifications</A><DD>
<DT><A HREF="#args"><B>2.3.2</B> - Token arguments</A><DD>
<DT><A HREF="#tokdef"><B>2.3.3</B> - Defining tokens</A><DD>
</DL>
<HR>
<H2>2.3. Token syntax</H2>
<P>
The C and C++ producers allow place-holders for various categories
of syntactic classes to be expressed using directives of the form:
<PRE>
#pragma TenDRA token <I>token-spec</I>
</PRE>
or simply:
<PRE>
#pragma token <I>token-spec</I>
</PRE>
These place-holders are represented as TDF tokens and hence are called
tokens. Each token stands for a type, expression or other construct
which is to be represented by a named TDF token in the producer
output. This mechanism is used, for example, to allow C API specifications
to be represented target independently. The types, functions and
expressions comprising the API can be described using <CODE>#pragma
token</CODE> directives and the target dependent definitions of these
tokens, representing the implementation of the API on a particular
machine, can be linked in later. This mechanism is described in detail
elsewhere.
</P>
<P>
A <A HREF="pragma1.html#token">summary of the grammar</A> for the
<CODE>#pragma token</CODE> directives accepted by the C++ producer
is given as an annex.
</P>
<HR>
<H3><A NAME="spec">2.3.1. Token specifications</A></H3>
<P>
A token specification is divided into two components, a
<I>token-introduction</I> giving the token sort, and a
<I>token-identification</I> giving the internal and external token
names:
<PRE>
<I>token-spec</I> :
    <I>token-introduction token-identification</I>

<I>token-introduction</I> :
    <I>exp-token</I>
    <I>statement-token</I>
    <I>type-token</I>
    <I>member-token</I>
    <I>procedure-token</I>

<I>token-identification</I> :
    <I>token-namespace<SUB>opt</SUB> identifier</I> # <I>external-identifier<SUB>opt</SUB></I>

<I>token-namespace</I> :
    TAG

<I>external-identifier</I> :
    -
    <I>preproc-token-list</I>
</PRE>
The <CODE>TAG</CODE> qualifier is used to indicate that the internal
name lies in the C tag namespace. This only makes sense for structure
and union types. The external token name can be given by any sequence
of preprocessing tokens. These tokens are not macro expanded. If
no external name is given then the internal name is used. The special
external name <CODE>-</CODE> is used to indicate that the token does
not have an associated external name, and hence is local to the current
translation unit. Such a local token must be defined. White space
in the external name (other than at the start or end) is used to indicate
that a TDF unique name should be used. The white space serves as
a separator for the unique name components.
</P>
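<P>
As a hedged illustration of these naming rules, the following sketch
declares four tokens (the <CODE>TYPE</CODE> and <CODE>STRUCT</CODE>
sorts are described below). All the internal and external names are
hypothetical and are chosen only for illustration:
<PRE>
#pragma token TYPE handle_t #
#pragma token TYPE count_t # example.count_t
#pragma token STRUCT TAG node # example node
#pragma token TYPE scratch_t # -
typedef int scratch_t ;
</PRE>
Here <CODE>handle_t</CODE> has no explicit external name, so its internal
name is also used externally, while <CODE>count_t</CODE> has the external
name <CODE>example.count_t</CODE>. The internal name <CODE>node</CODE>
lies in the tag namespace, and the white space in its external name
requests a unique name with the components <CODE>example</CODE> and
<CODE>node</CODE>. Finally <CODE>scratch_t</CODE> is local to the
translation unit and so must be defined; the sketch assumes that the
<CODE>typedef</CODE> is unified with the token as described under
<A HREF="#tokdef">defining tokens</A>.
</P>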
<H4><A NAME="exp">Expression tokens</A></H4>
<P>
Expression tokens are specified as follows:
<PRE>
<I>exp-token</I> :
    EXP <I>exp-storage<SUB>opt</SUB></I> : <I>type-id</I> :
    NAT
    INTEGER
</PRE>
representing an expression of the given type, a non-negative integer
constant and a general integer constant, respectively. Each expression
has an associated storage class:
<PRE>
<I>exp-storage</I> :
    lvalue
    rvalue
    const
</PRE>
indicating whether it is an lvalue, an rvalue or a compile-time constant
expression. An absent <I>exp-storage</I> is equivalent to
<CODE>rvalue</CODE>. All expression tokens lie in the macro namespace;
that is, they may potentially be defined as macros.
</P>
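<P>
The expression token forms above might be written as follows. This
is only a sketch: every token name and external name is hypothetical.
<PRE>
#pragma token EXP lvalue : int : last_error # example.last_error
#pragma token EXP rvalue : double : scale # example.scale
#pragma token EXP : double : shift # example.shift
#pragma token EXP const : int : max_items # example.max_items
#pragma token NAT buf_len # example.buf_len
#pragma token INTEGER bias # example.bias
</PRE>
The third directive omits the <I>exp-storage</I>, so <CODE>shift</CODE>
is an <CODE>rvalue</CODE> token just as <CODE>scale</CODE> is.
</P>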
<P>
For backwards compatibility with the C producer, the directive:
<PRE>
#pragma TenDRA++ rvalue token as const <I>allow</I>
</PRE>
causes <CODE>rvalue</CODE> tokens to be treated as <CODE>const</CODE>
tokens.
</P>
<H4>Statement tokens</H4>
<P>
Statement tokens are specified as follows:
<PRE>
<I>statement-token</I> :
    STATEMENT
</PRE>
All statement tokens lie in the macro namespace.
</P>
<H4>Type tokens</H4>
<P>
Type tokens are specified as follows:
<PRE>
<I>type-token</I> :
    TYPE
    VARIETY
    VARIETY signed
    VARIETY unsigned
    FLOAT
    ARITHMETIC
    SCALAR
    CLASS
    STRUCT
    UNION
</PRE>
representing a generic type, an integral type, a signed integral type,
an unsigned integral type, a floating point type, an arithmetic (integral
or floating point) type, a scalar (arithmetic or pointer) type, a
class type, a structure type and a union type respectively.
</P>
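<P>
For illustration, type tokens of several of these sorts might be declared
as in the sketch below. The names are hypothetical, and the
<CODE>FLOAT</CODE>, <CODE>ARITHMETIC</CODE> and <CODE>SCALAR</CODE>
sorts are deliberately left out:
<PRE>
#pragma token TYPE opaque_t # example.opaque_t
#pragma token VARIETY index_t # example.index_t
#pragma token VARIETY signed delta_t # example.delta_t
#pragma token VARIETY unsigned mask_t # example.mask_t
#pragma token CLASS widget # example.widget
#pragma token STRUCT TAG record # example.record
#pragma token UNION TAG value # example.value
</PRE>
The last two directives use the <CODE>TAG</CODE> qualifier, so the types
are referred to as <CODE>struct record</CODE> and <CODE>union value</CODE>.
</P>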
<P>
<IMG SRC="../images/warn.gif" ALT="warning">
Floating-point, arithmetic and scalar token types have not yet been
implemented correctly in either the C or C++ producers.
</P>
<H4><A NAME="member">Member tokens</A></H4>
<P>
Member tokens are specified as follows:
<PRE>
<I>member-token</I> :
    MEMBER <I>access-specifier<SUB>opt</SUB> member-type-id</I> : <I>type-id</I> :
</PRE>
where an <I>access-specifier</I> of <CODE>public</CODE> is assumed
if none is given. The member type is given by:
<PRE>
<I>member-type-id</I> :
    <I>type-id</I>
    <I>type-id</I> % <I>constant-expression</I>
</PRE>
where <CODE>%</CODE> is used to denote bitfield members (since
<CODE>:</CODE> is used as a separator). The second type denotes the
structure or union the given member belongs to. Different types can
have members with the same internal name, but the external token name
must be unique. Note that only non-static data members can be represented
in this form.
</P>
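<P>
A minimal sketch of member token declarations, assuming a hypothetical
structure token <CODE>record</CODE> with hypothetical members, is:
<PRE>
#pragma token STRUCT TAG record # example.record
#pragma token MEMBER int : struct record : id # example.record.id
#pragma token MEMBER public long : struct record : stamp # example.record.stamp
#pragma token MEMBER unsigned % 3 : struct record : flags # example.record.flags
</PRE>
The first <CODE>MEMBER</CODE> directive relies on the default
<CODE>public</CODE> access, the second states it explicitly, and the
third uses <CODE>%</CODE> to describe a three-bit bitfield. Each member
has its own external name, as required.
</P>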
<P>
Two declarations for the same <CODE>MEMBER</CODE> token (including token
definitions) should have the same type; however, the directive:
<PRE>
#pragma TenDRA++ incompatible member declaration <I>allow</I>
</PRE>
allows declarations with different types, provided these types have the
same size and alignment requirements.
</P>
<H4>Procedure tokens</H4>
<P>
Procedure, or high-level, tokens are specified in one of three ways:
<PRE>
<I>procedure-token</I> :
    <I>general-procedure</I>
    <I>simple-procedure</I>
    <I>function-procedure</I>
</PRE>
All procedure tokens (except ellipsis functions - see below) lie in
the macro namespace. The most general form of procedure token specifies
two sets of parameters. The bound parameters are those which are
used in encoding the actual TDF output, and the program parameters
are those which are <A HREF="#args">specified in the program</A>.
The program parameters are expressed in terms of the bound parameters.
A program parameter can be an expression token parameter, a statement
token parameter, a member token parameter, a procedure token parameter
or any type. The bound parameters are deduced from the program parameters
by a similar process to that used in template argument deduction.
<PRE>
<I>general-procedure</I> :
    PROC { <I>bound-toks<SUB>opt</SUB></I> | <I>prog-pars<SUB>opt</SUB></I> } <I>token-introduction</I>

<I>bound-toks</I> :
    <I>bound-token</I>
    <I>bound-token</I> , <I>bound-toks</I>

<I>bound-token</I> :
    <I>token-introduction token-namespace<SUB>opt</SUB> identifier</I>

<I>prog-pars</I> :
    <I>program-parameter</I>
    <I>program-parameter</I> , <I>prog-pars</I>

<I>program-parameter</I> :
    EXP <I>identifier</I>
    STATEMENT <I>identifier</I>
    TYPE <I>type-id</I>
    MEMBER <I>type-id</I> : <I>identifier</I>
    PROC <I>identifier</I>
</PRE>
</P>
<P>
The simplest form of a <I>general-procedure</I> is one in which the
<I>prog-pars</I> correspond precisely to the <I>bound-toks</I>. In
this case the syntax:
<PRE>
<I>simple-procedure</I> :
    PROC ( <I>simple-toks<SUB>opt</SUB></I> ) <I>token-introduction</I>

<I>simple-toks</I> :
    <I>simple-token</I>
    <I>simple-token</I> , <I>simple-toks</I>

<I>simple-token</I> :
    <I>token-introduction token-namespace<SUB>opt</SUB> identifier<SUB>opt</SUB></I>
</PRE>
may be used. Note that the parameter names are optional.
</P>
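<P>
The two forms can be compared in the following sketch, in which all
the names are hypothetical. The first two directives are simple procedures
(the second omits the parameter names). The third is a general procedure,
loosely modelled on how an <CODE>offsetof</CODE>-like construct could
be described; it is not taken from any actual API specification:
<PRE>
#pragma token PROC ( EXP rvalue : int : a, EXP rvalue : int : b ) EXP rvalue : int : larger # example.larger
#pragma token PROC ( EXP rvalue : int :, EXP rvalue : int : ) EXP rvalue : int : smaller # example.smaller
#pragma token PROC { STRUCT s, TYPE t, MEMBER t : s : m | TYPE s, MEMBER s : m } EXP const : unsigned long : field_offset # example.field_offset
</PRE>
In the general form the bound parameters are <CODE>s</CODE>, <CODE>t</CODE>
and <CODE>m</CODE>, but the program supplies only a type and a member;
the member type <CODE>t</CODE> is left to be deduced. Whether a particular
general procedure is accepted may depend on the implementation restrictions
noted at the end of this section.
</P>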
<P>
A function token is specified as follows:
<PRE>
<I>function-procedure</I> :
    FUNC <I>type-id</I> :
</PRE>
where the given type is a function type. This has two effects: firstly
a function with the given type is declared; secondly, if the function
type has the form:
<PRE>
r ( p1, ...., pn )
</PRE>
a procedure token with sort:
<PRE>
PROC ( EXP rvalue : p1 :, ...., EXP rvalue : pn : ) EXP rvalue : r :
</PRE>
is declared. For ellipsis function types only the function, not the
token, is declared. Note that the token behaves like a macro definition
of the corresponding function. Unless explicitly enclosed in a linkage
specification, a function declared using a <CODE>FUNC</CODE>
token has C linkage. Note that, because of function overloading, two
<CODE>FUNC</CODE> tokens may have the same internal name; their external
names, however, must be unique.
</P>
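<P>
As a sketch, with hypothetical function and external names, two overloaded
<CODE>FUNC</CODE> tokens might be declared as:
<PRE>
#pragma token FUNC int ( int, int ) : add_sat # example.add_sat
#pragma token FUNC long ( long, long ) : add_sat # example.add_sat_long
</PRE>
Following the rule above, the first directive declares a function
<CODE>int add_sat ( int, int )</CODE> together with a procedure token
of sort:
<PRE>
PROC ( EXP rvalue : int :, EXP rvalue : int : ) EXP rvalue : int :
</PRE>
The two tokens share the internal name <CODE>add_sat</CODE> but have
distinct external names.
</P>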
<P>
The directive:
<PRE>
#pragma TenDRA incompatible interface declaration <I>allow</I>
</PRE>
can be used to allow incompatible redeclarations of functions declared
using <CODE>FUNC</CODE> tokens. The token declaration takes precedence.
</P>
<P>
<IMG SRC="../images/warn.gif" ALT="warning">
Certain of the more complex examples of <CODE>PROC</CODE> tokens such
as, for example, tokens with <CODE>PROC</CODE> parameters, have not
been implemented in either the C or C++ producers.
</P>
<HR>
<H3><A NAME="args">2.3.2. Token arguments</A></H3>
<P>
As mentioned above, the program parameters for a <CODE>PROC</CODE>
token are those specified in the program itself. These arguments
are expressed as a comma-separated list enclosed in brackets, the
form of each argument being determined by the corresponding program
parameter.
</P>
<P>
An <CODE>EXP</CODE> argument is an assignment expression. This must
be an lvalue for <CODE>lvalue</CODE> tokens and a constant expression
for
<CODE>const</CODE> tokens. The argument is converted to the token
type (for <CODE>lvalue</CODE> tokens this is essentially a conversion
between the corresponding reference types). A <CODE>NAT</CODE> or
<CODE>INTEGER</CODE> argument is an integer constant expression.
In the former case this must be non-negative.
</P>
<P>
A <CODE>STATEMENT</CODE> argument is a statement. This statement
should not contain any labels or any <CODE>goto</CODE> or <CODE>return</CODE>
statements.
</P>
<P>
A type argument is a type identifier. This must name a type of the
correct category for the corresponding token. For example, a
<CODE>VARIETY</CODE> token requires an integral type.
</P>
<P>
<A NAME="offset">A member argument must describe the offset of a member
or nested member of the given structure or union type</A>. The type
of the member should agree with that of the <CODE>MEMBER</CODE> token.
The general form of a member offset can be described in terms of member
selectors and array indexes as follows:
<PRE>
<I>member-offset</I> :
    ::<I><SUB>opt</SUB> id-expression</I>
    <I>member-offset</I> . ::<I><SUB>opt</SUB> id-expression</I>
    <I>member-offset</I> [ <I>constant-expression</I> ]
</PRE>
</P>
<P>
A <CODE>PROC</CODE> argument is an identifier. This identifier must
name a <CODE>PROC</CODE> token of the appropriate sort.
</P>
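<P>
The argument rules can be illustrated by the following sketch, which
assumes two hypothetical <CODE>PROC</CODE> tokens, <CODE>bump</CODE>
and <CODE>width_of</CODE>, and a hypothetical variable <CODE>counter</CODE>:
<PRE>
#pragma token PROC ( EXP lvalue : int : v, EXP const : int : n ) EXP rvalue : int : bump # example.bump
#pragma token PROC ( TYPE t ) EXP const : unsigned long : width_of # example.width_of

int counter ;

int step ()
{
    /* counter is an lvalue; 4 is a constant expression */
    return bump ( counter, 4 ) ;
}

unsigned long width ()
{
    /* the argument is a type identifier */
    return width_of ( long ) ;
}
</PRE>
Passing a non-lvalue as the first argument of <CODE>bump</CODE>, or a
non-constant expression as the second, would not satisfy the rules for
<CODE>lvalue</CODE> and <CODE>const</CODE> tokens given above.
</P>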
<HR>
<H3><A NAME="tokdef">2.3.3. Defining tokens</A></H3>
<P>
Given a token specification of a syntactic object and a normal language
definition of the same object (including macro definitions if the
token lies in the macro namespace), the producers attempt to unify
the two by defining the TDF token in terms of the given definition.
Whether the token specification occurs before or after the language
definition is immaterial. Unification also takes place in situations
where, for example, two types are known to be compatible. Multiple
consistent explicit token definitions are allowed by default when
allowed by the language; this is controlled by the directive:
<PRE>
#pragma TenDRA compatible token <I>allow</I>
</PRE>
The default unification behaviour may be modified using the directives:
<PRE>
#pragma TenDRA no_def <I>token-list</I>
#pragma TenDRA define <I>token-list</I>
#pragma TenDRA reject <I>token-list</I>
</PRE>
or equivalently:
<PRE>
#pragma no_def <I>token-list</I>
#pragma define <I>token-list</I>
#pragma ignore <I>token-list</I>
</PRE>
which set the state of the tokens given in <I>token-list</I>. A state
of <CODE>no_def</CODE> means that no unification is attempted and
that any attempt to explicitly define the token results in an error.
A state of <CODE>define</CODE> means that unification takes place
and that the token must be defined somewhere in the translation unit.
A state of <CODE>reject</CODE> means that unification takes place as
normal, but any resulting token definition is discarded and not output
to the TDF capsule.
</P>
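<P>
A minimal sketch of these states, using hypothetical token names and
definitions, is:
<PRE>
#pragma token EXP const : int : max_items # example.max_items
#pragma token TYPE count_t # example.count_t
#pragma token TYPE handle_t # example.handle_t

#pragma TenDRA define max_items
#pragma TenDRA reject count_t
#pragma TenDRA no_def handle_t

#define max_items 64            /* required: max_items is in the define state */
typedef unsigned count_t ;      /* unified, but the definition is not output */
</PRE>
No definition of <CODE>handle_t</CODE> is given, since it has been placed
in the <CODE>no_def</CODE> state. The example assumes that a macro definition
and a <CODE>typedef</CODE> are acceptable definitions for an <CODE>EXP</CODE>
token and a <CODE>TYPE</CODE> token respectively.
</P>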
<P>
If a token with the state <CODE>define</CODE> is not defined, then the
behaviour depends on the sort of the token. A <CODE>FUNC</CODE> token
is implicitly defined in terms of its underlying function, such as:
<PRE>
#define f( a1, ...., an ) ( f ) ( a1, ...., an )
</PRE>
Other undefined tokens cause an error. This behaviour can be modified
using the directives:
<PRE>
#pragma TenDRA++ implicit token definition <I>allow</I>
#pragma TenDRA++ no token definition <I>allow</I>
</PRE>
respectively.
</P>
<P>
The primitive operations, <CODE>no_def</CODE>, <CODE>define</CODE> and
<CODE>reject</CODE>, can also be expressed using the context sensitive
directive:
<PRE>
#pragma TenDRA interface <I>token-list</I>
</PRE>
or equivalently:
<PRE>
#pragma interface <I>token-list</I>
</PRE>
By default this is equivalent to <CODE>no_def</CODE>, but may be modified
by inclusion using one of the directives:
<PRE>
#pragma TenDRA extend <I>header-name</I>
#pragma TenDRA implement <I>header-name</I>
</PRE>
or equivalently:
<PRE>
#pragma extend interface <I>header-name</I>
#pragma implement interface <I>header-name</I>
</PRE>
These are equivalent to:
<PRE>
#include <I>header-name</I>
</PRE>
except that the form <CODE>[....]</CODE> is allowed as a header name.
This is equivalent to <CODE>&lt;....&gt;</CODE> except that it starts
the directory search after the point at which the including file was
found, rather than at the start of the path (i.e. it is equivalent
to the
<CODE>#include_next</CODE> directive found in some preprocessors).
The effect of the <CODE>extend</CODE> directive on the state of the
<CODE>interface</CODE> directive is as follows:
<PRE>
no_def -> no_def
define -> reject
reject -> reject
</PRE>
The effect of the <CODE>implement</CODE> directive is as follows:
<PRE>
no_def -> define
define -> define
reject -> reject
</PRE>
That is to say, an <CODE>implement</CODE> directive will cause all
the tokens in the given header to be defined and their definitions
output. Any tokens included in this header by <CODE>extend</CODE>
may be defined, but their definitions will not be output. This is
precisely the behaviour which is required to ensure that each token
is defined exactly once in an API library build.
</P>
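<P>
The following sketch shows how these state changes might combine across
files. The file names, token names and definitions are all hypothetical,
and the three files are shown in one listing only for brevity:
<PRE>
/* base_api.h */
#pragma token TYPE base_t # example.base_t
#pragma TenDRA interface base_t

/* ext_api.h */
#pragma TenDRA extend "base_api.h"
#pragma token TYPE ext_t # example.ext_t
#pragma TenDRA interface ext_t

/* ext_impl.c */
#pragma TenDRA implement "ext_api.h"
typedef int base_t ;
typedef long ext_t ;
</PRE>
When <CODE>ext_impl.c</CODE> is compiled, the <CODE>interface</CODE>
directive for <CODE>ext_t</CODE> is read under <CODE>implement</CODE>,
so it acts as <CODE>define</CODE> and the definition of <CODE>ext_t</CODE>
is output. The directive for <CODE>base_t</CODE> is reached through
an <CODE>extend</CODE> inside the implemented header, so
<CODE>define</CODE> becomes <CODE>reject</CODE>: <CODE>base_t</CODE>
may be defined, but its definition is not output.
</P>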
<P>
The lists of tokens in the directives above are expressed in the form:
<PRE>
<I>token-list</I> :
    <I>token-id token-list<SUB>opt</SUB></I>
    # <I>preproc-token-list</I>
</PRE>
where a <I>token-id</I> represents an internal token name:
<PRE>
<I>token-id</I> :
    <I>token-namespace<SUB>opt</SUB> identifier</I>
    <I>type-id</I> . <I>identifier</I>
</PRE>
Note that member tokens are specified by means of both the member
name and its parent type. In this type specifier, <CODE>TAG</CODE>,
rather than
<CODE>class</CODE>, <CODE>struct</CODE> or <CODE>union</CODE>, may
be used in elaborated type specifiers for structure and union tokens.
If the
<I>token-id</I> names an overloaded function then the directive is
applied to all <CODE>FUNC</CODE> tokens of that name. It is possible
to be more selective using the <CODE>#</CODE> form which allows the
external token name to be specified. Such an entry must be the last
in a <I>token-list</I>.
</P>
<P>
A related directive has the form:
<PRE>
#pragma TenDRA++ undef token <I>token-list</I>
</PRE>
which undefines all the given tokens so that they are no longer visible.
</P>
<P>
As noted above, a macro is only considered as a token definition if
the token lies in the macro namespace. Tokens which are not in the
macro namespace, such as types and members, cannot be defined using
macros. Occasionally, API implementations do define member selectors
as macros in terms of other member selectors. Such a token needs
to be explicitly defined using a directive of the form:
<PRE>
#pragma TenDRA member definition <I>type-id</I> : <I>identifier member-offset</I>
</PRE>
where <I>member-offset</I> is <A HREF="#offset">as above</A>.
</P>
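<P>
As a hedged sketch, suppose a hypothetical API specifies a member
<CODE>secs</CODE> of <CODE>struct stamp_rec</CODE>, while the implementation
stores the value in a nested member. All the names below are assumptions
made for illustration, and the sketch assumes that the structure definition
is unified with the <CODE>STRUCT</CODE> token:
<PRE>
#pragma token STRUCT TAG stamp_rec # example.stamp_rec
#pragma token MEMBER long : struct stamp_rec : secs # example.stamp_rec.secs

struct stamp_rec {
    struct { long sec ; long nsec ; } when ;
} ;

#pragma TenDRA member definition struct stamp_rec : secs when.sec
</PRE>
The final directive plays the role that a macro such as
<CODE>#define secs when.sec</CODE> would play if member tokens lay in
the macro namespace.
</P>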
<HR>
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
Copyright © 1998.</I></P>
</BODY>
</HTML>