<!-- Crown Copyright (c) 1998 -->
<HTML>
<HEAD>
<TITLE>C Checker Reference Manual: The Token Syntax</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">
<A NAME=S144>
<H1>C Checker Reference Manual</H1>
<H3>January 1998</H3>
<A HREF="tdfc21.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
<A HREF="tdfc19.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
<A HREF="tdfc1.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
</A>
<IMG SRC="../images/no_index.gif" ALT="document index"><P>
<HR>
<DL>
<DT><A HREF="#S145"><B>F.1 </B> - Introduction</A><DD>
<DT><A HREF="#S146"><B>F.2 </B> - Program construction using TDF</A><DD>
<DT><A HREF="#S147"><B>F.3 </B> - The token syntax</A><DD>
<DT><A HREF="#S148"><B>F.4 </B> - Token identification</A><DD>
<DT><A HREF="#S149"><B>F.5 </B> - Expression tokens</A><DD>
<DT><A HREF="#S150"><B>F.6 </B> - Statement tokens</A><DD>
<DT><A HREF="#S151"><B>F.7 </B> - Type tokens</A><DD>
<DL>
<DT><A HREF="#S152"><B>F.7.1 </B> - General type tokens</A><DD>
<DT><A HREF="#S153"><B>F.7.2 </B> - Integral type tokens</A><DD>
<DT><A HREF="#S154"><B>F.7.3 </B> - Arithmetic type tokens</A><DD>
<DT><A HREF="#S155"><B>F.7.4 </B> - Compound type tokens</A><DD>
<DT><A HREF="#S156"><B>F.7.5 </B> - Type token compatibility, definitions etc.</A><DD>
</DL>
<DT><A HREF="#S157"><B>F.8 </B> - Selector tokens</A><DD>
<DT><A HREF="#S158"><B>F.9 </B> - Procedure tokens</A><DD>
<DL>
<DT><A HREF="#S159"><B>F.9.1 </B> - General procedure tokens</A><DD>
<DT><A HREF="#S160"><B>F.9.2 </B> - Simple procedure tokens</A><DD>
<DT><A HREF="#S161"><B>F.9.3 </B> - Function procedure tokens</A><DD>
<DT><A HREF="#S162"><B>F.9.4 </B> - Defining procedure tokens</A><DD>
</DL>
<DT><A HREF="#S163"><B>F.10 </B> - Tokens and APIs</A><DD>
</DL>

<HR>
<H1>F The Token Syntax</H1>
<A NAME=S145>
<HR><H2>F.1 Introduction</H2>
The token syntax is used to introduce references to program constructs
such as types, expressions etc. that can be defined in other compilation
modules. This can be thought of as a generalisation of function prototyping
which is used to introduce references to functions defined elsewhere.
The references introduced by the token syntax are called tokens because
they are tokens for the program constructs that they reference. The
token syntax is said to specify a token interface.<P>
It is expected that the general user will have little direct contact
with the token syntax, instead using the abstract standard headers
provided or using the tspec tool [Ref. 5] to generate their own token
interface header files automatically. However, it may occasionally
be necessary to use the raw power of the token syntax directly.<P>
As an example of the power of the token syntax, consider the program
below:
<PRE>
#pragma token TYPE FILE#
#pragma token EXP rvalue:FILE *:stderr#
int fprintf(FILE *, const char *, ...);
void f(void) {
    fprintf(stderr,"hello world\n");
}
</PRE>
The first line of the program introduces a token, FILE, for a type.
By using its identification, FILE, this token can be used wherever
a type could have been used throughout the rest of the program. The
compiler can then compile this program to TDF (the abstract TenDRA
machine) even though it contains an undefined type. This is fundamental
to the construction of portable software, where the developer cannot
assume the definitions of various types as they may be different on
different machines.<P>
The second line of the example, which introduces a token for an expression,
is somewhat more complicated. In order to make use of an expression,
it is necessary to know its type and whether or not it is an lvalue
(i.e. whether or not it can be assigned to). As the example shows,
however, it is not necessary to know the exact type of the
expression because a token can be used to represent its type.<P>
The TenDRA compiler makes no assumptions about the possible definitions
of tokens and will raise an error if a program requires information
about an undefined token. In this way many errors resulting from inadvertent
use of a definition present on the developer's system can be detected.
For example, developers often assume that the type FILE will be implemented
by a structure type when in fact the ISO C standard permits the implementation
of FILE by any type. In the program above, any attempt to access members
of stderr would cause the compiler to raise an error.<P>
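For comparison, the discipline that a token interface enforces can be
followed by hand in standard C. The sketch below is an illustration only:
it relies on the ordinary stdio.h declarations rather than on token pragmas,
and the function name report is hypothetical. It handles FILE purely through
the library interface, so it remains valid whatever type an implementation
chooses for FILE.<P>

```c
#include <stdio.h>

/* Hypothetical helper: writes msg followed by a newline to stream.
   FILE is treated as opaque; no member of *stream is ever accessed. */
int report(FILE *stream, const char *msg)
{
    return fprintf(stream, "%s\n", msg);
}
```

A call such as report(stderr, "hello world") depends only on the declared
interface, which is the property the token introductions above let the
compiler check.<P>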
<A NAME=S146>
<HR><H2>F.2 Program construction using TDF</H2>
Traditional program construction using the C language has two phases:
compilation and linking.<P>
In the compilation phase the source text written in the C language
is mapped to an object code format. This object code is generally
not complete in itself and must be linked with other program segments
such as definitions from the system libraries.<P>
When tokens are involved there is an extra stage in the construction
process where undefined tokens in one program segment are linked with
their definitions in another program segment. To summarise, program
construction using TDF and the TenDRA tools has four basic operations:<P>
<OL>
<LI>Source file compilation to TDF. The TDF produced may be incomplete
in the sense that it may contain undefined tokens;<P>
<LI>TDF linking. The undefined tokens represented in TDF are linked
to their definitions in other compilation modules or libraries. During
linking, tokens with the same identifier are treated as the same token;<P>
<LI>TDF translation. This is the conversion of TDF into standard object
file format for a particular machine system. The program is still
incomplete at this stage in the sense that it may contain undefined
library functions;<P>
<LI>Object file linking. This corresponds directly to standard object
file linking.<P>
</OL>
<A NAME=S147>
<HR><H2>F.3 The token syntax</H2>
The token syntax is an extension to the ISO C standard language to
allow the use of tokens to represent program constructs. Tokens can
be used either in place of, or as well as, the definitions required
by a program. In the latter case, the tokens exist merely to enforce
correct definitions and usage of the objects they reference. Note,
however, that the presence of a token introduction can alter
the semantics of a program (examples are given in <A HREF="#3">F.5
Expression tokens</A>). The semantics have been altered to force programs
to respect token interfaces where they would otherwise fail to do
so.<P>
The token syntax takes the following basic form:<P>
<PRE>
#pragma token token-introduction token-identification
</PRE>
It is introduced as a pragma to allow other compilers to ignore it,
though if tokens are being used to replace the definitions needed
by a program, ignoring these pragmas will generally cause the compilation
to fail.<P>
The <EM>token-introduction</EM> defines the kind of token being introduced
along with any additional information associated with that kind of
token. Currently there are five kinds of token that can be introduced,
corresponding approximately to expressions, statements, type-names,
member designators and function-like macros.<P>
The <EM>token-identification</EM> provides the means of referring
to the token, both internally within the program and externally for
TDF linking purposes.<P>
<A NAME=S148>
<HR><H2>F.4 Token identification</H2>
The syntax for the token-identification is as follows:<P>
<PRE>
token-identification:
    <EM>name-space</EM><SUB><EM>opt</EM></SUB> <EM>identifier</EM> # <EM>external-identifier</EM><SUB><EM>opt</EM></SUB>

name-space:
    TAG
</PRE>
There is a default name space associated with each kind of token and
internal identifiers for tokens generally reside in these default
name spaces. The ISO C standard describes the five name spaces as
being:<P>
<OL>
<LI>The label space, in which all label identifiers reside;<P>
<LI>The tag space, in which structure, union and enumeration tags
reside;<P>
<LI>The member name space, in which structure and union member selectors
reside;<P>
<LI>The macro name space, in which all macro definitions reside. Token
identifiers in the macro name space have no definition and so are
not expanded. However, they behave as macros in all other respects;<P>
<LI>The ordinary name space in which all other identifiers reside.<P>
</OL>
The exception is compound type-token identifiers (see <A HREF="#16">F.7.4
Compound type tokens</A>) which by default reside in the ordinary
name space but can be forced to reside in the tag name space by setting
the optional name-space to be TAG.<P>
The first identifier of the <EM>token-identification</EM> provides
the internal identification of the token. This is the name used to
identify the token within the program. It must be followed by a #.<P>
All further preprocessing tokens until the end of the line are treated
as part of the <EM>external-identifier</EM>, with non-empty white space
sequences being replaced by a single space. The <EM>external-identifier</EM>
specifies the external identification of the token which is used
for TDF linking. External token identifications reside in their own
name space, which is distinct from the external name space for functions
and objects. This means that it is possible to have both a function
and a token with the same external identification. If the external-identifier
is omitted, it is assumed that the internal and external identifications
are the same.<P>
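The label, tag, member and ordinary name spaces from the list above can
be seen at work in a short standard C fragment. This is a sketch for
illustration, with a hypothetical function name; the macro name space
and the token pragmas are not involved. The identifier id is used as a
tag, a member, an ordinary identifier and a label without conflict.<P>

```c
struct id { int id; };        /* tag space: id; member space: id */

/* Hypothetical demonstration function. */
int name_space_demo(void)
{
    struct id id = { 42 };    /* ordinary space: the variable id */
    goto id;                  /* label space: the label id */
id:
    return id.id;             /* the variable id, then its member id */
}
```

Because each use is resolved in its own name space, the choice of name
space for a token identifier determines which of these uses refers to
the token.<P>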
<A NAME=S149>
<HR><H2>F.5 <A NAME=3>Expression tokens</H2>
There are various properties associated with expression tokens which
are used to determine the operations that may be performed upon them.
<P>
<UL>
<LI>Designation is the classification of the value delivered by evaluating
the expression. The three possible designations, implied by the ISO
C standard, section 6.3, are:
<UL>
<LI>value - expression describes the computation of a value;<P>
<LI>object - expression designates a variable which may have an associated
type qualifier giving the access conditions;<P>
<LI>function designation - expression designates a function.<P>
</UL>
<LI>Type specifies the type of the expression ignoring any type qualification;
<P>
<LI>Constancy is the property of being a constant expression as specified
in the ISO C Standard section 6.4.<P>
</UL>
The syntax for introducing expression tokens is:
<PRE>
exp-token:
    EXP <EM>exp-storage</EM> : <EM>type-name</EM> :
    NAT
exp-storage:
    rvalue
    lvalue
    const
</PRE>
Expression tokens can be introduced using either the EXP or NAT token
introductions. Expression tokens introduced using NAT are constant
value designations of type int, i.e. they reference constant integer
expressions. All other expression tokens are assumed to be non-constant
and are introduced using EXP.<P>
<UL>
<LI>The <CODE>exp-storage</CODE> is either lvalue or rvalue. If it
is lvalue, then the token is an object designation without type qualification.
If it is rvalue, then the token is either a value or a function designation
depending on whether or not its type is a function type.<P>
<LI>The <EM>type-name</EM> is the type of the expression to which
the token refers.<P>
</UL>
All internal expression token identifiers must reside in the macro
name space and this is consequently the default name space for such
identifiers. Hence the optional name-space, TAG, should not be present
in an EXP token introduction. Although the use of an expression token
after it has been introduced is very similar to that of an ordinary
identifier, as it resides in the macro name space, it has the additional
properties listed below:<P>
<UL>
<LI>expression tokens cannot be hidden by using an inner scope;<P>
<LI>with respect to #ifdef, expression tokens are defined;<P>
<LI>the scope of expression tokens can be terminated by #undef.<P>
</UL>
In order to make use of tokenised expressions, a new symbol, <CODE>exp-token-name</CODE>,
has been introduced at translation phase seven of the syntax
analysis as defined in the ISO C standard. When an expression token
identifier is encountered by the preprocessor, an <EM>exp-token-name</EM>
symbol is passed through to the syntax analyser. An <CODE>exp-token-name</CODE>
provides information about an expression token in the same
way that a <CODE>typedef-name</CODE> provides information about a
type introduced using a typedef. This symbol can only occur as part
of a primary-expression (ISO C standard section 6.3.1) and the expression
resulting from the use of <CODE>exp-token-name</CODE> will have the
type, designation and constancy specified in the token introduction.
As an example, consider the pragma:<P>
<PRE>
#pragma token EXP rvalue : int : x#
</PRE>
This introduces a token for an expression which is a value designation
of type int with internal and external name x.<P>
Expression tokens can either be defined using #define directives or
by using externals. They can also be resolved as a result of applying
the type-resolution or assignment-resolution operators (see <A HREF="#17">F.7.5
Type token compatibility, definitions etc.</A>). Expression token
definitions are subject to the following constraints:<P>
<UL>
<LI>if the <CODE>exp-token-name</CODE> refers to a constant expression
(i.e. it was introduced using the NAT token introduction), then the
defining expression must also be a constant expression as expressed
in the ISO C standard, section 6.4;<P>
<LI>if the <CODE>exp-token-name</CODE> refers to an lvalue expression,
then the defining expression must also designate an object and the
type of the expression token must be resolvable to the type of the
defining expression. All the type qualifiers of the defining expression
must appear in the object designation of the token introduction;<P>
<LI>if the <CODE>exp-token-name</CODE> refers to an expression that
has function designation, then the type of the expression token must
be resolvable to the type of the defining expression.<P>
</UL>
<P>
The program below provides two examples of the violation of the second
constraint.<P>
<PRE>
#pragma token EXP lvalue : int : i#
extern short k;
#define i 6
#define i k
</PRE>
The expression token i is an object designation of type int. The first
violation occurs because the expression, 6, does not designate an
object. The second violation is because the type of the token expression,
i, is int which cannot be resolved to the type short.<P>
If the <EM>exp-token-name</EM> refers to an expression that designates
a value, then the defining expression is converted, as if by assignment,
to the type of the expression token using the <EM>assignment-resolution</EM>
operator (see <A HREF="#17">F.7.5 Type token compatibility, definitions
etc.</A>). With all other designations the defining expression is
left unchanged. In both cases the resulting expression is used as
the definition of the expression token. This can subtly alter the
semantics of a program. Consider the program:<P>
<PRE>
#pragma token EXP rvalue:long:li#
#define li 6
int f() {
    return sizeof(li);
}
</PRE>
The definition of the token li causes the expression, 6, to be converted
to long (this is essential to separate the use of li from its definition).
The function, f, then returns sizeof(long). If the token introduction
were absent, however, f would return sizeof(int).<P>
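The difference can be reproduced in standard C by writing the conversion
explicitly. This is a sketch only: the cast stands in for the conversion
that the token definition performs, and both function names are
hypothetical.<P>

```c
#include <stddef.h>

/* Without a token introduction, li is just the int constant 6. */
size_t size_as_plain_macro(void)
{
    return sizeof(6);
}

/* With the token introduction, 6 is first converted to long. */
size_t size_as_long_token(void)
{
    return sizeof((long)6);
}
```

On a machine where int and long have different sizes the two functions
return different values, which is the change in semantics described
above.<P>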
Although they look similar, expression token definitions using #defines
are not quite the same as macro definitions. A macro can be defined
by any preprocessing tokens, which are then expanded in phase 4 of
translation as defined in the ISO C standard, whereas tokens are defined
by assignment-expressions which are computed in phase 7. One of the
consequences of this is illustrated by the program below:<P>
<PRE>
#pragma token EXP rvalue:int :X#
#define X M+3
#define M sizeof(int)
int f(int x) { return (x+X); }
</PRE>
If the token introduction of X is absent, the program above will compile
because, at the time the definition of X is interpreted (when evaluating
x+X), both M and X are in scope. When the token introduction is present
the compilation will fail as the definition of X, being part of translation
phase 7, is interpreted when it is encountered and at this stage M
is not defined. This can be rectified by reversing the order of the
definitions of X and M or by bracketing the definition of X, i.e.<P>
<PRE>
<EM>#define X (M+3)</EM>
</PRE>
Conversely consider:<P>
<PRE>
#pragma token EXP rvalue:int:X#
#define M sizeof(int)
#define X M+3
#undef M
int M(int x) { return (x+X); }
</PRE>
The definition of X is computed on line 3, when M is in scope, not
on line 5, where it is used. Token definitions can be used in this
way to relieve some of the pressures on name spaces by undefining
macros that are only used in token definitions. This facility should
be used with care as it may not be a straightforward matter to convert
the program back to a conventional C program.<P>
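Definition-time evaluation of this kind can be approximated in standard
C with an enumeration constant, which is also evaluated in phase 7 at
the point where it is declared. The sketch below is an analogy under
that assumption, not a token definition: it mirrors the program above
with X declared as an enumerator instead of a token.<P>

```c
#define M sizeof(int)
enum { X = M + 3 };   /* value fixed here, while the macro M is defined */
#undef M

/* M is now free for reuse as an ordinary identifier. */
int M(int x)
{
    return x + X;
}
```

Had X been an ordinary macro, its use inside the function would expand
to M+3 with M naming the function rather than sizeof(int).<P>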
Expression tokens can also be defined by declaring the <EM>exp-token-name</EM>
that references the token to be an object with external linkage, e.g.<P>
<PRE>
#pragma token EXP lvalue:int:x#
extern int x;
</PRE>
The semantics of this program are effectively the same as the semantics
of:
<PRE>
#pragma token EXP lvalue:int:x#
extern int _x;
#define x _x
</PRE>
<A NAME=S150>
<HR><H2>F.6 <A NAME=6>Statement tokens</H2>
The syntax for introducing a statement token is simply:<P>
<PRE>
#pragma token STATEMENT init_globs#
int g(int);
int f(int x) { init_globs return g(x);}
</PRE>
Internal statement token identifiers reside in the macro name space.
The optional name space, TAG, should not appear in statement token
introductions.<P>
The use of statement tokens is analogous to the use of expression
tokens (see <A HREF="#3">F.5 Expression tokens</A>). A new symbol,
<EM>stat-token-name</EM>, has been introduced into the syntax analysis
at phase 7 of translation as defined in the ISO C standard. This token
is passed through to the syntax analyser whenever the preprocessor
encounters an identifier referring to a statement token. A <CODE>stat-token-name</CODE>
can only occur as part of the statement syntax (ISO C standard,
section 6.6).<P>
As with expression tokens, statement tokens are defined using #define
directives. An example of this is shown below:<P>
<PRE>
#pragma token STATEMENT i_globs#
#define i_globs {int i=x;x=3;}
</PRE>
The constraints on the definition of statement tokens are:<P>
<UL>
<LI>the use of labels is forbidden unless the definition of the statement
token occurs at the outer level (i.e. outside of any compound statement
forming a function definition);<P>
<LI>the use of return within the defining statement is not allowed.<P>
</UL>
The semantics of the defining statement are precisely the same as
the semantics of a compound statement forming the definition of a
function with no parameters and void result. The definition of statement
tokens carries the same implications for phases of translation as
the definition of expression tokens (see <A HREF="#3">F.5 Expression
tokens</A>).<P>
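The way a statement token is used can be imitated with an ordinary macro
in standard C. This sketch is only an analogy: a plain macro is expanded
textually by the preprocessor and receives none of the checking described
above. The names init_globs, x, f and g are modelled on the examples in
this section, while the bodies of the macro and of g are assumptions made
for illustration.<P>

```c
static int x = 1;

/* Ordinary macro standing in for the statement token init_globs.
   Its body is a compound statement, so its use needs no semicolon. */
#define init_globs { int saved = x; x = 3; (void)saved; }

static int g(int v)
{
    return v + x;
}

int f(int v)
{
    init_globs return g(v);
}
```

The macro body contains neither a label nor a return, in keeping with the
constraints listed above.<P>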
<A NAME=S151>
<HR><H2>F.7 <A NAME=7>Type tokens</H2>
Type tokens are used to introduce references to types. The ISO C standard,
section 6.1.2.5, identifies the following classification of types:<P>
<UL>
<LI>the type char;
<LI>signed integral types;
<LI>unsigned integral types;
<LI>floating types;
<LI>character types;
<LI>enumeration types;
<LI>array types;
<LI>structure types;
<LI>union types;
<LI>function types;
<LI>pointer types.
</UL>
These types fall into the following broader type classifications:<P>
<UL>
<LI>integral types - consisting of the signed integral types, the
unsigned integral types and the type char;<P>
<LI>arithmetic types - consisting of the integral types and the floating
types;<P>
<LI>scalar types - consisting of the arithmetic types and the pointer
types;<P>
<LI>aggregate types - consisting of the structure and array types;<P>
<LI>derived types - consisting of array, structure, union, function
and pointer types;<P>
<LI>derived declarator types - consisting of array, function and pointer
types.<P>
</UL>
The classification of a type determines which operations are permitted
on objects of that type. For example, the ! operator can only be applied
to objects of scalar type. In order to reflect this, there are several
type token introductions which can be used to classify the type to
be referenced, so that the compiler can perform semantic checking
on the application of operators. The possible type token introductions
are:<P>
<PRE>
type-token:
    TYPE
    VARIETY
    ARITHMETIC
    STRUCT
    UNION
</PRE>
<A NAME=S152>
<H3>F.7.1 <A NAME=9>General type tokens</H3>
The most general type token introduction is TYPE. This introduces
a type of unknown classification which can be defined to be any C
type. Only a few generic operations can be applied to such type tokens,
since the semantics must be defined for all possible substituted types.
Assignment and function argument passing are effectively generic operations,
apart from the treatment of array types. For example, according to
the ISO C standard, even assignment is not permitted if the left operand
has array type and we might therefore expect assignment of general
token types to be illegal. Tokens introduced using the TYPE token
introduction can thus be regarded as representing non-array types
with extensions to represent array types provided by applying non-array
semantics as described below.<P>
Once general type tokens have been introduced, they can be used to
construct derived declarator types in the same way as conventional
type declarators. For example:<P>
<PRE>
#pragma token TYPE t_t#
#pragma token TYPE t_p#
#pragma token NAT n#
typedef t_t *ptr_type;    /* introduces pointer type */
typedef t_t fn_type(t_p); /* introduces function type */
typedef t_t arr_type[n];  /* introduces array type */
</PRE>
The only standard conversion that can be performed on an object of
general token type is the lvalue conversion (ISO C standard section
6.2). Lvalue conversion of an object with general token type is defined
to return the item stored in the object. The semantics of lvalue conversion
are thus fundamentally altered by the presence of a token introduction.
If type <CODE>t_t</CODE> is defined to be an array type, the lvalue
conversion of an object of type <EM>t_t</EM> will deliver a pointer
to the first array element. If, however, <EM>t_t</EM> is defined to
be a general token type, which is later defined to be an array type,
lvalue conversion on an object of type <EM>t_t</EM> will deliver the
components of the array.<P>
This definition of lvalue conversion for general token types is used
to allow objects of general tokenised types to be assigned to and
passed as arguments to functions. The extensions to the semantics
of function argument passing and assignment are as follows: if the
type token is defined to be an array then the components of the array
are assigned and passed as arguments to the function call; in all
other cases the assignment and function call are the same as if the
defining type had been used directly.<P>
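Standard C offers a rough analogue of these extended semantics: an array
wrapped in a structure is assigned and passed by value, component by
component. The sketch below uses that analogy. The structure wrapper and
the function name are assumptions made for illustration and say nothing
about how the TenDRA compiler represents a tokenised array type.<P>

```c
/* Stand-in for a type token t_t that is defined to be int[4]. */
typedef struct { int elems[4]; } t_t;

/* The argument is a copy: all four components are passed by value. */
t_t bump_first(t_t v)
{
    v.elems[0] += 1;
    return v;
}
```

After t_t b = bump_first(a), the object a is unchanged and b holds its own
copy of every component, which is the behaviour described above for a
general type token defined as an array.<P>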
The default name space for the internal identifiers for general type
tokens is the ordinary name space and all such identifiers must reside
in this name space. The local identifier behaves exactly as if it
had been introduced with a typedef statement and is thus treated as
a <EM>typedef-name</EM> by the syntax analyser.<P>
<A NAME=S153>
<H3>F.7.2 <A NAME=11>Integral type tokens</H3>
The token introduction VARIETY is used to introduce a token representing
an integral type. A token introduced in this way can only be defined
as an integral type and can be used wherever an integral type is valid.<P>
Values which have integral tokenised types can be converted to any
scalar type (see <A HREF="#7">F.7 Type tokens</A>). Similarly values
with any scalar type can be converted to a value with a tokenised
integral type. The semantics of these conversions are exactly the
same as if the type defining the token were used directly. Consider:<P>
<PRE>
#pragma token VARIETY i_t#
short f(void) {
    i_t x_i = 5;
    return x_i;
}
short g(void) {
    long x_i = 5;
    return x_i;
}
</PRE>
Within the function f there are two conversions: the value, 5, of
type int, is converted to <EM>i_t</EM>, the tokenised integral type,
and a value of tokenised integral type <EM>i_t</EM> is converted to
a value of type short. If the type <EM>i_t</EM> were defined to be
long then the function f would be exactly equivalent to the function
g.<P>
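The equivalence can be checked in standard C by letting a typedef play
the part of the token definition. This is a sketch under that assumption:
the typedef replaces both the pragma and the later definition of i_t as
long.<P>

```c
typedef long i_t;   /* stands in for the token i_t defined as long */

short f(void)
{
    i_t x_i = 5;    /* conversion from int to i_t */
    return x_i;     /* conversion from i_t to short */
}

short g(void)
{
    long x_i = 5;
    return x_i;
}
```

With this definition in place f and g perform the same two conversions
and return the same value.<P>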
The usual arithmetic conversions described in the ISO C standard (section
6.2.1.5) are defined on integral type tokens and are applied where
required by the ISO C standard.<P>
The integral promotions are defined according to the rules introduced
in Chapter 4. These promotions are first applied to the integral type
token and then the usual arithmetic conversions are applied to the
resulting type.<P>
As with general type tokens, integral type tokens can only reside
in their default name space, the ordinary name space (the optional
name-space, TAG, cannot be specified in the token introduction). They
also behave as though they had been introduced using a typedef statement.<P>
<A NAME=S154>
<H3>F.7.3 <A NAME=13>Arithmetic type tokens</H3>
The token introduction ARITHMETIC introduces an arithmetic type token.
In theory, such tokens can be defined by any arithmetic type, but
the current implementation of the compiler only permits them to be
defined by integral types. These type tokens are thus exactly equivalent
to the integral type tokens introduced using the token introduction
VARIETY.<P>
<A NAME=S155>
<H3>F.7.4 <A NAME=16>Compound type tokens</H3>
For the purposes of this document, a compound type is a type describing
objects which have components that are accessible via member selectors.
All structure and union types are thus compound types, but, unlike
structure and union types in C, compound types do not necessarily
have an ordering on their member selectors. In particular, this means
that some objects of compound type cannot be initialised with an <EM>initialiser-list</EM>
(see ISO C standard section 6.5.7).<P>
Compound type tokens are introduced using either the STRUCT or UNION
token introductions. A compound type token can be defined by any compound
type, regardless of the introduction used. It is expected, however,
that programmers will use STRUCT for compound types with non-overlapping
member selectors and UNION for compound types with overlapping member
selectors. The compound type token introduction does not specify the
member selectors of the compound type - these are added later (see
<A HREF="#19">F.8 Selector tokens</A>).<P>
Values and objects with tokenised compound types can be used anywhere
that a structure or union type can be used.<P>
559 |
Internal identifiers of compound type tokens can reside in either
|
|
|
560 |
the ordinary name space or the tag name space. The default is the
|
|
|
561 |
ordinary name space; identifiers placed in the ordinary name space
|
|
|
562 |
behave as if the type had been declared using a typedef statement.
|
|
|
563 |
If the identifier, id say, is placed in the tag name space, it is
|
|
|
564 |
as if the type had been declared as struct id or union id. Examples
|
|
|
565 |
of the introduction and use of compound type tokens are shown below:<P>
|
|
|
566 |
<PRE>
|
|
|
567 |
#pragma token STRUCT n_t#
#pragma token STRUCT TAG s_t#
#pragma token UNION TAG u_t#

void f() {
    n_t x1;
    struct n_t x2;  /* Illegal, n_t not in the tag name space */
    s_t x3;         /* Illegal, s_t not in the ordinary name space */
    struct s_t x4;
    union u_t x5;
}
|
|
|
578 |
</PRE>
|
|
|
579 |
<A NAME=S156>
|
|
|
580 |
<H3>F.7.5 <A NAME=17>Type token compatibility, definitions etc.</H3>
|
|
|
581 |
A type represented by an undefined type token is incompatible (ISO
|
|
|
582 |
C standard section 6.1.3.6) with all other types except for itself.
|
|
|
583 |
A type represented by a defined type token is compatible with everything
|
|
|
584 |
that is compatible with its definition.<P>
|
|
|
585 |
Type tokens can only be defined by using one of the operations known
|
|
|
586 |
as <CODE>type-resolution </CODE>and <CODE>assignment-resolution</CODE>.
|
|
|
587 |
Note that, as type token identifiers do not reside in the macro name
|
|
|
588 |
space, they cannot be defined using #define statements.<P>
|
|
|
589 |
<CODE>Type-resolution</CODE> operates on two types and is essentially
|
|
|
590 |
identical to the operation of type compatibility (ISO C standard section
|
|
|
591 |
6.1.3.6) with one major exception. In the case where an undefined
|
|
|
592 |
type token is found to be incompatible with the type with which it
|
|
|
593 |
is being compared, the type token is defined to be the type with which
|
|
|
594 |
it is being compared, thereby making them compatible.<P>
|
|
|
595 |
The ISO C standard prohibits the repeated use of typedef statements
|
|
|
596 |
to define a type. However, in order to allow type resolution, the
|
|
|
597 |
compiler allows types to be consistently redefined using multiple
|
|
|
598 |
typedef statements if:<P>
|
|
|
599 |
<UL>
|
|
|
600 |
<LI>there is a resolution of the two types;
|
|
|
601 |
<LI>as a result of the resolution, at least one token is defined.
|
|
|
602 |
</UL>
|
|
|
603 |
As an example, consider the program below:<P>
|
|
|
604 |
<PRE>
|
|
|
605 |
#pragma token TYPE t_t#
typedef t_t *ptr_t_t;
typedef int **ptr_t_t;
|
|
|
608 |
</PRE>
|
|
|
609 |
The second definition of <EM>ptr_t_t</EM> causes a resolution of the
|
|
|
610 |
types <EM>t_t</EM><CODE> *</CODE> and <EM>int</EM> <CODE>**</CODE>.
|
|
|
611 |
The rules of type compatibility state that two pointers are compatible
|
|
|
612 |
if their dependent types are compatible, thus type resolution results
|
|
|
613 |
in the definition of <EM>t_t</EM> as int *.<P>
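The outcome of this resolution can be sketched in plain C by writing the deduced definition out by hand. The typedef of t_t is a stand-in for the token definition, and the sketch assumes a C11 compiler, since C11 accepts a repeated typedef that names the same type.<P>

```c
/* Hypothetical stand-in: the definition that type-resolution deduces. */
typedef int *t_t;

typedef t_t *ptr_t_t;
typedef int **ptr_t_t;   /* same type as the line above; C11 permits the repeat */

/* Dereference twice through the resolved type. */
int deref_twice(ptr_t_t p) {
    return **p;
}
```

Once t_t is int *, the two declarations of ptr_t_t name the same type, which is why the redefinition is consistent.<P>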
|
|
|
614 |
<CODE>Type-resolution</CODE> can also result in the definition of
|
|
|
615 |
other tokens. The program below results in the expression token N
|
|
|
616 |
being defined as (4*sizeof(int)).<P>
|
|
|
617 |
<PRE>
|
|
|
618 |
#pragma token EXP rvalue:int:N#
typedef int arr[N];
typedef int arr[4*sizeof(int)];
|
|
|
621 |
</PRE>
|
|
|
622 |
The <CODE>type-resolution</CODE> operator is not symmetric; a resolution
|
|
|
623 |
of two types, A and B say, is an attempt to resolve type A to type
|
|
|
624 |
B. Thus only the undefined tokens of A can be defined as a result
|
|
|
625 |
of applying the <CODE>type-resolution</CODE> operator. In the examples
|
|
|
626 |
above, if the typedefs were reversed, no <CODE>type-resolution</CODE>
|
|
|
627 |
would take place and the types would be incompatible.<P>
|
|
|
628 |
<CODE>Assignment-resolution</CODE> is similar to <CODE>type-resolution</CODE>
|
|
|
629 |
but it occurs when converting an object of one type to another type
|
|
|
630 |
for the purposes of assignment. Suppose the conversion is not possible
|
|
|
631 |
and the type to which the object is being converted is an undefined
|
|
|
632 |
token type. If the token can be defined in such a way that the conversion
|
|
|
633 |
is possible, then that token will be suitably defined. If there is
|
|
|
634 |
more than one possible definition, the definition causing no conversion
|
|
|
635 |
will be chosen.<P>
|
|
|
636 |
<A NAME=S157>
|
|
|
637 |
<HR><H2>F.8 <A NAME=19>Selector tokens</H2>
|
|
|
638 |
The use of selector tokens is the primary method of adding member
|
|
|
639 |
selectors to compound type tokens. (The only other method is to define
|
|
|
640 |
the compound type token to be a particular structure or union type.)
|
|
|
641 |
The introduction of new selector tokens can occur at any point in
|
|
|
642 |
a program and they can thus be used to add new member selectors to
|
|
|
643 |
existing compound types.<P>
|
|
|
644 |
The syntax for introducing member selector tokens is as follows:<P>
|
|
|
645 |
<PRE>
|
|
|
646 |
selector-token:
    MEMBER <EM>selector-type-name</EM> : <EM>type-name</EM> :
selector-type-name:
    <EM>type-name</EM>
    <EM>type-name</EM> % <EM>constant-expression</EM>
|
|
|
651 |
</PRE>
|
|
|
652 |
The <EM>selector-type-name</EM> specifies the type of the object selected
|
|
|
653 |
by the selector token. If the <EM>selector-type-name</EM> is a plain
|
|
|
654 |
<EM>type-name</EM>, the member selector token has that type. If the
|
|
|
655 |
<CODE>selector-type-name</CODE> consists of a <EM>type-name</EM> and
|
|
|
656 |
a <EM>constant-expression</EM> separated by a % sign, the member selector
|
|
|
657 |
token refers to a bitfield of type <EM>type-name</EM> and width <EM>constant-expression
|
|
|
658 |
</EM>. The second <EM>type-name</EM> gives the compound type to which
|
|
|
659 |
the member selector belongs. For example:<P>
|
|
|
660 |
<PRE>
|
|
|
661 |
#pragma token STRUCT TAG s_t#
#pragma token MEMBER char*: struct s_t:s_t_mem#
|
|
|
663 |
</PRE>
|
|
|
664 |
introduces a compound token type, <EM>s_t</EM>, which has a member
|
|
|
665 |
selector, <EM>s_t_mem</EM>, which selects an object of type char*.<P>
|
|
|
666 |
Internal identifiers of member selector tokens can only reside in
|
|
|
667 |
the member name space of the compound type to which they belong. Clearly
|
|
|
668 |
this is also the default name space for such identifiers.<P>
|
|
|
669 |
When structure or union types are declared, according to the ISO C
|
|
|
670 |
standard there is an implied ordering on the member selectors. In
|
|
|
671 |
particular this means that:<P>
|
|
|
672 |
<UL>
|
|
|
673 |
<LI>during initialisation with an initialiser-list the identified
|
|
|
674 |
members of a structure are initialised in the order in which they
|
|
|
675 |
were declared. The first identified member of a union is initialised;<P>
|
|
|
676 |
<LI>the addresses of structure members will increase in the order
|
|
|
677 |
in which they were declared.<P>
|
|
|
678 |
</UL>
|
|
|
679 |
The member selectors introduced as selector tokens are not related
|
|
|
680 |
to any other member selectors until they are defined. There is thus
|
|
|
681 |
no ordering on the undefined tokenised member selectors of a compound
|
|
|
682 |
type. If a compound type has only undefined token selectors, it cannot
|
|
|
683 |
be initialised with an initialiser-list. There will be an ordering
|
|
|
684 |
on the defined members of a compound type and in this case, the compound
|
|
|
685 |
type can be initialised automatically.<P>
|
|
|
686 |
The decision to allow unordered member selectors has been taken deliberately
|
|
|
687 |
in order to separate the decision of which members belong to a structure
|
|
|
688 |
from that of where such member components lie within the structure.
|
|
|
689 |
This makes it possible to represent extensions to APIs which require
|
|
|
690 |
extra member selectors to be added to existing compound types.<P>
|
|
|
691 |
As an example of the use of token member selectors, consider the structure
|
|
|
692 |
lconv specified in the ISO C Standard library (section 7.4.3.1). The
|
|
|
693 |
standard does not specify all the members of struct lconv or the order
|
|
|
694 |
in which they appear. This type cannot be represented naturally by
|
|
|
695 |
existing C types, but can be described by the token syntax.<P>
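From the application's side, code that respects this abstraction refers to members of struct lconv by name only. The sketch below is ordinary ISO C rather than token syntax, and the helper name current_decimal_point is chosen for illustration.<P>

```c
#include <locale.h>

/* Reads one lconv member by name; nothing here depends on which other
   members exist or on where decimal_point lies within the structure. */
const char *current_decimal_point(void) {
    struct lconv *lc = localeconv();
    return lc->decimal_point;
}
```

Code written in this way stays valid whatever extra members or ordering a particular implementation chooses.<P>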
|
|
|
696 |
There are two methods for defining selector tokens, one explicit and
|
|
|
697 |
one implicit. As selector token identifiers do not reside in the macro
|
|
|
698 |
name space they cannot be defined using #define statements.<P>
|
|
|
699 |
Suppose A is an undefined compound token type and mem is an undefined
|
|
|
700 |
selector token for A. If A is later defined to be the compound type
|
|
|
701 |
B and B has a member selector with identifier mem then A.mem is defined
|
|
|
702 |
to be B.mem providing the type of A.mem can be resolved to the type
|
|
|
703 |
of B.mem. This is known as implicit selector token definition. <P>
|
|
|
704 |
In the program shown below the redefinition of the compound type <EM>s_t</EM>
|
|
|
705 |
causes the token for the selector <EM>mem_x</EM> to be implicitly
|
|
|
706 |
defined to be the second member of struct <EM>s_tag</EM>. The consequential
|
|
|
707 |
type resolution leads to the token type <EM>t_t</EM> being defined
|
|
|
708 |
to be int.<P>
|
|
|
709 |
<PRE>
|
|
|
710 |
#pragma token TYPE t_t#
#pragma token STRUCT s_t#
#pragma token MEMBER t_t : s_t : mem_x#
#pragma token MEMBER t_t : s_t : mem_y#
struct s_tag { int a, mem_x, b; };
typedef struct s_tag s_t;
|
|
|
716 |
</PRE>
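Seen from plain C, the result of these implicit definitions might look like the sketch below, in which typedefs stand in for the token definitions. The accessor names get_mem_x and mem_x_offset are assumptions introduced for illustration.<P>

```c
#include <stddef.h>

/* Plain C view of the program once the tokens are defined:
   t_t is int and s_t.mem_x is the second member of struct s_tag. */
typedef int t_t;
struct s_tag { int a, mem_x, b; };
typedef struct s_tag s_t;

/* Access through the selector, with the resolved selector type. */
t_t get_mem_x(const s_t *s) {
    return s->mem_x;
}

/* Position of mem_x within s_t, fixed only once s_t is defined. */
size_t mem_x_offset(void) {
    return offsetof(s_t, mem_x);
}
```

Before the definition, the position of mem_x within s_t is unknown; afterwards it follows from the layout of struct s_tag.<P>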
|
|
|
717 |
Explicit selector token definition takes place using the pragma:<P>
|
|
|
718 |
<PRE>
|
|
|
719 |
#pragma DEFINE MEMBER <EM>type-name</EM> <EM>identifier</EM> : <EM>member-designator</EM>

member-designator:
    <EM>identifier</EM>
    <EM>identifier . member-designator</EM>
|
|
|
725 |
</PRE>
|
|
|
726 |
The <EM>type-name</EM> specifies the compound type to which the selector
|
|
|
727 |
belongs.<P>
|
|
|
728 |
The <EM>identifier</EM> provides the identification of the member
|
|
|
729 |
selector within that compound type.<P>
|
|
|
730 |
The <EM>member-designator</EM> provides the definition of the selector
|
|
|
731 |
token. It must identify a selector of a compound type.<P>
|
|
|
732 |
If the <EM>member-designator</EM> is an identifier, then the identifier
|
|
|
733 |
must be a member of the compound type specified by the <EM>type-name</EM>.
|
|
|
734 |
If the <EM>member-designator</EM> is an identifier, <EM>id</EM> say,
|
|
|
735 |
followed by a further member-designator, M say, then:<P>
|
|
|
736 |
<UL>
|
|
|
737 |
<LI>the identifier id must be a member identifying a selector of the
|
|
|
738 |
compound type specified by<EM> type-name</EM>;<P>
|
|
|
739 |
<LI>the type of the selector identified by id must have compound type,
|
|
|
740 |
C say;<P>
|
|
|
741 |
<LI>the <CODE>member-designator</CODE> M must identify a member selector
|
|
|
742 |
of the compound type C.<P>
|
|
|
743 |
</UL>
|
|
|
744 |
As with implicit selector token definitions, the type of the selector
|
|
|
745 |
token must be resolved to the type of the selector identified by the
|
|
|
746 |
<EM>member-designator</EM>.<P>
|
|
|
747 |
In the example shown below, the selector token mem is defined to be
|
|
|
748 |
the second member of struct <EM>s</EM> which in turn is the second
|
|
|
749 |
member of struct <EM>s_t</EM>.<P>
|
|
|
750 |
<PRE>
|
|
|
751 |
#pragma token STRUCT s_t#
#pragma token MEMBER int : s_t : mem#
typedef struct {int x; struct {char y; int z;} s; } s_t;
#pragma DEFINE MEMBER s_t : mem s.z
|
|
|
755 |
</PRE>
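In plain C terms, an access through the selector mem then amounts to the nested member access s.z. The following sketch shows that access directly; the function name get_mem is hypothetical.<P>

```c
typedef struct { int x; struct { char y; int z; } s; } s_t;

/* What an access through the selector mem amounts to after the
   definition: the member z of the member s. */
int get_mem(const s_t *p) {
    return p->s.z;
}
```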
|
|
|
756 |
<A NAME=S158>
|
|
|
757 |
<HR><H2>F.9 <A NAME=21>Procedure tokens</H2>
|
|
|
758 |
Consider the macro SWAP defined below:<P>
|
|
|
759 |
<PRE>
|
|
|
760 |
#define SWAP(T,A,B) { \
        T x; \
        x=B; \
        B=A; \
        A=x; \
}
|
|
|
766 |
</PRE>
|
|
|
767 |
SWAP can be thought of as a statement that is parameterised by a type
|
|
|
768 |
and two expressions.<P>
|
|
|
769 |
Procedure tokens are based on this concept of parameterisation. Procedure
|
|
|
770 |
tokens reference program constructs that are parameterised by other
|
|
|
771 |
program constructs.<P>
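The parameterisation by a type and two expressions can be seen by instantiating such a macro at two different types. This sketch uses an ordinary macro, not a procedure token, and names its temporary swap_tmp (an arbitrary choice) so that it is unlikely to clash with the arguments.<P>

```c
/* A statement parameterised by a type T and two lvalue expressions. */
#define SWAP(T, A, B) { T swap_tmp; swap_tmp = (B); (B) = (A); (A) = swap_tmp; }

void swap_ints(int *a, int *b) {
    SWAP(int, *a, *b)
}

void swap_doubles(double *a, double *b) {
    SWAP(double, *a, *b)
}
```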
|
|
|
772 |
There are three methods of introducing procedure tokens. These are
|
|
|
773 |
described in the sections below.<P>
|
|
|
774 |
<A NAME=S159>
|
|
|
775 |
<H3>F.9.1 <A NAME=22>General procedure tokens</H3>
|
|
|
776 |
The syntax for introducing a general procedure token is:<P>
|
|
|
777 |
<P>
|
|
|
778 |
<PRE>
|
|
|
779 |
general procedure:
    PROC { <EM>bound-toks</EM><SUB><EM>opt</EM></SUB> | <EM>prog-pars</EM><SUB><EM>opt</EM></SUB> } <EM>token-introduction</EM>

simple procedure:
    PROC ( <EM>bound-toks</EM><SUB><EM>opt</EM></SUB> ) <EM>token-introduction</EM>

bound-toks:
    <EM>bound-token</EM>
    <EM>bound-token</EM>, <EM>bound-toks</EM>

bound-token:
    <EM>token-introduction name-space</EM><SUB><EM>opt</EM></SUB> <EM>identifier</EM>

prog-pars:
    <EM>program-parameter</EM>
    <EM>program-parameter</EM>, <EM>prog-pars</EM>

program parameter:
    EXP <EM>identifier</EM>
    STATEMENT <EM>identifier</EM>
    TYPE <EM>type-name-identifier</EM>
    MEMBER <EM>type-name-identifier</EM> : <EM>identifier</EM>
|
|
|
804 |
</PRE>
|
|
|
805 |
The final <CODE>token-introduction</CODE> specifies the kind of program
|
|
|
806 |
construct being parameterised. In the current implementation of the
|
|
|
807 |
compiler, only expressions and statements may be parameterised. The
|
|
|
808 |
internal procedure token identifier is placed in the default name
|
|
|
809 |
space of the program construct which it parameterises. For example,
|
|
|
810 |
the internal identifier of a procedure token parameterising an expression
|
|
|
811 |
would be placed in the macro name space.<P>
|
|
|
812 |
The <CODE>bound-toks</CODE> are the bound token dependencies which
|
|
|
813 |
describe the program constructs upon which the procedure token depends.
|
|
|
814 |
These should not be confused with the parameters of the token. The
|
|
|
815 |
procedure token introduced in:<P>
|
|
|
816 |
<PRE>
|
|
|
817 |
#pragma token PROC {TYPE t,EXP rvalue:t**:e|EXP e} EXP rvalue:t:dderef#
|
|
|
818 |
</PRE>
|
|
|
819 |
is intended to represent a double dereference and depends upon the
|
|
|
820 |
type of the expression to be dereferenced and upon the expression
|
|
|
821 |
itself but takes only one argument, namely the expression, from which
|
|
|
822 |
both dependencies can be deduced.<P>
|
|
|
823 |
The bound token dependencies are introduced in exactly the same way
|
|
|
824 |
as the tokens described in the previous sections with the identifier
|
|
|
825 |
corresponding to the internal identification of the token. No external
|
|
|
826 |
identification is allowed. The scope of these local identifiers terminates
|
|
|
827 |
at the end of the procedure token introduction, and whilst in scope,
|
|
|
828 |
they hide all other identifiers in the same name space. Such tokens
|
|
|
829 |
are referred to as "bound" because they are local to the
|
|
|
830 |
procedure token.<P>
|
|
|
831 |
Once a bound token dependency has been introduced, it can be used
|
|
|
832 |
throughout the rest of the procedure token introduction in the construction
|
|
|
833 |
of other components.<P>
|
|
|
834 |
The <CODE>prog-pars</CODE> are the program parameters. They describe
|
|
|
835 |
the parameters with which the procedure token is called. The bound
|
|
|
836 |
token dependencies are deduced from these program parameters. <P>
|
|
|
837 |
Each program parameter is introduced with a keyword expressing the
|
|
|
838 |
kind of program construct that it represents. The keywords are as
|
|
|
839 |
follows:<P>
|
|
|
840 |
<UL>
|
|
|
841 |
<LI>EXP. The parameter is an expression and the identifier following
|
|
|
842 |
EXP must be the identification of a bound token for an expression.
|
|
|
843 |
When the procedure token is called, the corresponding parameter must
|
|
|
844 |
be an assignment-expression and is treated as the definition of the
|
|
|
845 |
bound token, thereby providing definitions for all dependencies relating
|
|
|
846 |
to that token. For example, the call of the procedure token dderef,
|
|
|
847 |
introduced above, in the code below:<P>
|
|
|
848 |
<PRE>
|
|
|
849 |
char f(char **c_ptr_ptr){
    return dderef(c_ptr_ptr);
}
|
|
|
852 |
</PRE>
|
|
|
853 |
causes the expression, e, to be defined to be c_ptr_ptr thus resolving
|
|
|
854 |
the type t** to be char **. The type t is hence defined to be char,
|
|
|
855 |
also providing the type of the expression obtained by the application
|
|
|
856 |
of the procedure token dderef;<P>
|
|
|
857 |
<LI>STATEMENT. The parameter is a statement. Its semantics correspond
|
|
|
858 |
directly to those of EXP;<P>
|
|
|
859 |
<LI>TYPE. The parameter is a type. When the procedure token is applied,
|
|
|
860 |
the corresponding argument must be a<CODE> type-name</CODE>. The parameter
|
|
|
861 |
type is resolved to the argument type in order to define any related
|
|
|
862 |
dependencies;<P>
|
|
|
863 |
<LI>MEMBER. The parameter is a member selector. The <CODE>type-name</CODE>
|
|
|
864 |
specifies the compound type to which the member selector belongs
|
|
|
865 |
and the identifier is the identification of the member selector. When
|
|
|
866 |
the procedure token is applied, the corresponding argument must be
|
|
|
867 |
a <EM>member-designator</EM> of the compound type.<P>
|
|
|
868 |
</UL>
|
|
|
869 |
Currently PROC tokens cannot be passed as program parameters.<P>
|
|
|
870 |
<A NAME=S160>
|
|
|
871 |
<H3>F.9.2 Simple procedure tokens</H3>
|
|
|
872 |
In cases where there is a direct, one-to-one correspondence between
|
|
|
873 |
the bound token dependencies and the program parameters a simpler
|
|
|
874 |
form of procedure token introduction is available.<P>
|
|
|
875 |
Consider the two procedure token introductions below, corresponding
|
|
|
876 |
to the macro SWAP described earlier. <P>
|
|
|
877 |
<PRE>
|
|
|
878 |
/* General procedure introduction */
|
|
|
879 |
#pragma token PROC{TYPE t,EXP lvalue:t:e1,EXP lvalue:t:e2 | \
|
|
|
880 |
TYPE t,EXP e1,EXP e2 } STATEMENT SWAP#
|
|
|
881 |
/* Simple procedure introduction */
|
|
|
882 |
#pragma token PROC(TYPE t,EXP lvalue:t:,EXP lvalue:t: ) STATEMENT SWAP#
|
|
|
883 |
</PRE>
|
|
|
884 |
The <EM>simple-token</EM> syntax is similar to the <EM>bound-token</EM>
|
|
|
885 |
syntax, but it also introduces a program parameter for each bound
|
|
|
886 |
token. The bound token introduced by the <EM>simple-token</EM> syntax
|
|
|
887 |
is defined as though it had been introduced with the<EM> bound-token</EM>
|
|
|
888 |
syntax. If the final identifier is omitted, then no name space can
|
|
|
889 |
be specified, the bound token is not identified and in effect there
|
|
|
890 |
is a local hidden identifier.<P>
|
|
|
891 |
<A NAME=S161>
|
|
|
892 |
<H3>F.9.3 <A NAME=24>Function procedure tokens</H3>
|
|
|
893 |
One of the commonest uses of simple procedure tokens is to represent
|
|
|
894 |
function in-lining. In this case, the procedure token represents the
|
|
|
895 |
in-lining of the function, with the function parameters being the
|
|
|
896 |
program arguments of the procedure token call, and the program construct
|
|
|
897 |
resulting from the call of the procedure token being the corresponding
|
|
|
898 |
in-lining of the function. This is a direct parallel to the use of
|
|
|
899 |
macros to represent functions.<P>
|
|
|
900 |
The syntax for introducing function procedure tokens is:<P>
|
|
|
901 |
<PRE>
|
|
|
902 |
function-procedure:
    FUNC <EM>type-name</EM> :
|
|
|
904 |
</PRE>
|
|
|
905 |
The type-name must be a prototyped function type. The pragma results
|
|
|
906 |
in the declaration of a function of that type with external linkage
|
|
|
907 |
and the introduction of a procedure token suitable for an in-lining
|
|
|
908 |
of the function. (If an ellipsis is present in the prototyped function
|
|
|
909 |
type, it is used in the function declaration but not in the procedure
|
|
|
910 |
token introduction.) Every parameter type and result type is mapped
|
|
|
911 |
onto the token introduction:<P>
|
|
|
912 |
<PRE>
|
|
|
913 |
EXP rvalue:
|
|
|
914 |
</PRE>
|
|
|
915 |
The example below:<P>
|
|
|
916 |
<PRE>
|
|
|
917 |
#pragma token FUNC int(int): putchar#
|
|
|
918 |
</PRE>
|
|
|
919 |
declares a function, putchar, which returns an int and takes an int
|
|
|
920 |
as its argument, and introduces a procedure token suitable for in-lining
|
|
|
921 |
putchar. Note that:<P>
|
|
|
922 |
<PRE>
|
|
|
923 |
#undef putchar
|
|
|
924 |
</PRE>
|
|
|
925 |
will remove the procedure token but not the underlying function.<P>
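This mirrors the behaviour of standard library macros in ISO C, where a library function may additionally be provided as a macro and #undef leaves the function itself available. A minimal sketch in plain C, with the wrapper name emit chosen for illustration:<P>

```c
#include <stdio.h>

/* Remove any macro form of putchar; the library function remains. */
#undef putchar

int emit(int c) {
    return putchar(c);   /* a call of the underlying function */
}
```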
|
|
|
926 |
<A NAME=S162>
|
|
|
927 |
<H3>F.9.4 Defining procedure tokens</H3>
|
|
|
928 |
All procedure tokens are defined by the same mechanism. Since simple
|
|
|
929 |
and function procedure tokens can be transformed into general procedure
|
|
|
930 |
tokens, the definition will be explained in terms of general procedure
|
|
|
931 |
tokens.<P>
|
|
|
932 |
The syntax for defining procedure tokens is given below and is based
|
|
|
933 |
upon the standard parameterised macro definition. However, as in the
|
|
|
934 |
definitions of expressions and statements, the #defines of procedure
|
|
|
935 |
token identifiers are evaluated in phase 7 of translation as described
|
|
|
936 |
in the ISO C standard.<P>
|
|
|
937 |
<PRE>
|
|
|
938 |
#define <EM>identifier</EM> ( <EM>id-list</EM><SUB><EM>opt</EM></SUB> ) <EM>assignment-expression</EM>
#define <EM>identifier</EM> ( <EM>id-list</EM><SUB><EM>opt</EM></SUB> ) <EM>statement</EM>

id-list:
    <EM>identifier</EM>
    <EM>identifier</EM>, <EM>id-list</EM>
|
|
|
946 |
</PRE>
|
|
|
947 |
The <EM>id-list</EM> must correspond directly to the program parameters
|
|
|
948 |
of the procedure token introduction. There must be precisely one identifier
|
|
|
949 |
for each program parameter. These identifiers are used to identify
|
|
|
950 |
the program parameters of the procedure token being defined and have
|
|
|
951 |
a scope that terminates at the end of the procedure token definition.
|
|
|
952 |
They are placed in the default name spaces for the kinds of program
|
|
|
953 |
constructs which they identify.<P>
|
|
|
954 |
None of the bound token dependencies can be defined during the evaluation
|
|
|
955 |
of the definition of the procedure token since they are effectively
|
|
|
956 |
provided by the arguments of the procedure token each time it is called.
|
|
|
957 |
To illustrate this, consider the example below based on the dderef
|
|
|
958 |
token used earlier.<P>
|
|
|
959 |
<PRE>
|
|
|
960 |
#pragma token PROC{TYPE t, EXP rvalue:t**:e|EXP e}EXP rvalue:t:dderef#
#define dderef(A) (**(A))
|
|
|
962 |
</PRE>
|
|
|
963 |
The identifiers t and e are not in scope during the definition, being
|
|
|
964 |
merely local identifiers for use in the procedure token introduction.
|
|
|
965 |
The only identifier in scope is A. A identifies an expression token
|
|
|
966 |
which is an rvalue whose type is a pointer to a pointer to a type
|
|
|
967 |
token. The expression token and the type token are provided by the
|
|
|
968 |
arguments at the time of calling the procedure token.<P>
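The way the argument supplies the type can be imitated with an ordinary macro, where the role of the type token is played by whatever type the argument happens to have. This is a plain C sketch without the token introduction; the function names first_char and first_int are hypothetical.<P>

```c
/* Macro counterpart of the dderef definition; A is the only name in scope. */
#define dderef(A) (**(A))

char first_char(char **c_ptr_ptr) {
    return dderef(c_ptr_ptr);   /* here char plays the role of t */
}

int first_int(int **i_ptr_ptr) {
    return dderef(i_ptr_ptr);   /* here int plays the role of t */
}
```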
|
|
|
969 |
Again, the presence of a procedure token introduction can alter the
|
|
|
970 |
semantics of a program. Consider the program below.<P>
|
|
|
971 |
<PRE>
|
|
|
972 |
#pragma token PROC(TYPE t, EXP lvalue:t:, EXP lvalue:t:) STATEMENT SWAP#
#define SWAP(T,A,B)\
    {T x; x=B; B=A; A=x;}
void f(int x, int y) {
    SWAP(int, x, y)
}
|
|
|
978 |
</PRE>
|
|
|
979 |
The definition and call of the procedure token are straightforward.
However, if the procedure token introduction is absent, SWAP is an ordinary
macro and the swap does not take place: the x declared inside the macro body
hides the parameter x of f, so the argument x refers to the variable in the inner scope.<P>
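The macro-only behaviour can be reproduced in plain C. In this sketch, which contains no token introduction, the function first_after_swap (a name chosen for illustration) returns its first argument after the attempted swap.<P>

```c
#define SWAP(T, A, B) { T x; x = B; B = A; A = x; }

/* Returns the first argument after the attempted swap. */
int first_after_swap(int x, int y) {
    SWAP(int, x, y)   /* expands to { int x; x = y; y = x; x = x; } */
    return x;         /* the outer x was never assigned */
}
```

Every use of x inside the expansion refers to the temporary, so the parameter x keeps its original value.<P>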
|
|
|
982 |
Function procedure tokens are introduced with tentative implicit definitions,
|
|
|
983 |
defining them to be direct calls of the functions they reference and
|
|
|
984 |
effectively removing the in-lining capability. If a genuine definition
|
|
|
985 |
is found later in the compilation, it overrides the tentative definition.
|
|
|
986 |
An example of a tentative definition is shown below:<P>
|
|
|
987 |
<PRE>
|
|
|
988 |
#pragma token FUNC int(int, long) : func#
#define func(A, B) (func) (A, B)
|
|
|
990 |
</PRE>
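The same shape of definition is valid in plain C, where a function-like macro may share its name with a function. The sketch below assumes a simple body for func, invented for illustration, and shows that the macro reduces to a direct call.<P>

```c
/* Hypothetical body; only the prototype appears in the example above. */
int func(int a, long b) {
    return a + (int)b;
}

/* Same shape as the tentative definition.  The name func inside the
   replacement is not expanded again, so this is a direct call. */
#define func(A, B) (func)(A, B)

int call_through_macro(void) {
    return func(1, 2L);
}
```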
|
|
|
991 |
<A NAME=S163>
|
|
|
992 |
<HR><H2>F.10 Tokens and APIs</H2>
|
|
|
993 |
In Chapter 1 we mentioned that one of the main problems in writing
|
|
|
994 |
portable software is the lack of separation between specification
|
|
|
995 |
and implementation of APIs. The TenDRA technology uses the token syntax
|
|
|
996 |
described in the previous sections to provide an abstract description
|
|
|
997 |
of an API specification. Collections of tokens representing APIs are
|
|
|
998 |
called "interfaces". Tchk can compile programs with these
|
|
|
999 |
interfaces in order to check applications against API specifications
|
|
|
1000 |
independently of any particular implementation that may be present
|
|
|
1001 |
on the developer's machine. <P>
|
|
|
1002 |
In order to produce executable code, definitions of the interface
|
|
|
1003 |
tokens must be provided on all target machines. This is done by compiling
|
|
|
1004 |
the interfaces with the system headers and libraries.<P>
|
|
|
1005 |
When developing applications, programmers must ensure that they do
|
|
|
1006 |
not accidentally define a token expressing an API. Implementers of
|
|
|
1007 |
APIs, however, do not want to inadvertently fail to define a token
|
|
|
1008 |
expressing that API. Token definition states have been introduced
|
|
|
1009 |
to enable programmers to instruct the compiler to check that tokens
|
|
|
1010 |
are defined when and only when they wish them to be. This is fundamental
|
|
|
1011 |
to the separation of programs into portable and unportable parts.<P>
|
|
|
1012 |
When tokens are first introduced, they are in the free state. This
|
|
|
1013 |
means that the token can be defined or left undefined and if the token
|
|
|
1014 |
is defined during compilation, its definition will be output as TDF.
|
|
|
1015 |
<P>
|
|
|
1016 |
Once a token has been given a valid definition, its definition state
|
|
|
1017 |
moves to defined. Tokens may only be defined once. Any attempt to
|
|
|
1018 |
define a token in the defined state is flagged as an error.<P>
|
|
|
1019 |
There are three more token definition states which may be set by the
|
|
|
1020 |
programmer. These are as follows:<P>
|
|
|
1021 |
<UL>
|
|
|
1022 |
<LI>Indefinable - the token is not defined and must not be defined.
|
|
|
1023 |
Any attempt to define the token will cause an error. When compiling
|
|
|
1024 |
applications, interface tokens should be in the indefinable state.
|
|
|
1025 |
It is not possible for a token to move from the state of defined to
|
|
|
1026 |
indefinable;<P>
|
|
|
1027 |
<LI>Committed - the token must be defined during the compilation.
|
|
|
1028 |
If no definition is found the compiler will raise an error. Interface
|
|
|
1029 |
tokens should be in the committed state when being compiled with the
|
|
|
1030 |
system headers and libraries to provide definitions;<P>
|
|
|
1031 |
<LI>Ignored - any definition of the token that is assigned during
|
|
|
1032 |
the compilation of the program will not be output as TDF.<P>
|
|
|
1033 |
</UL>
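The five definition states can be summarised as a small table of permissions. The sketch below is only a model of the rules stated above: the enumeration and function names are hypothetical, and it assumes that a definition supplied for a committed token is output in the same way as one for a free token.<P>

```c
/* Hypothetical names; the compiler's internal representation is not
   described here. */
enum tok_state {
    TOK_FREE, TOK_DEFINED, TOK_INDEFINABLE, TOK_COMMITTED, TOK_IGNORED
};

/* May a definition be supplied while the token is in state s? */
int may_define(enum tok_state s) {
    return s == TOK_FREE || s == TOK_COMMITTED || s == TOK_IGNORED;
}

/* Is it an error if no definition is found during the compilation? */
int must_define(enum tok_state s) {
    return s == TOK_COMMITTED;
}

/* Would a definition supplied in state s be output as TDF?
   (Assumption: committed tokens behave like free ones here.) */
int outputs_definition(enum tok_state s) {
    return s == TOK_FREE || s == TOK_COMMITTED;
}
```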
|
|
|
1034 |
These token definition states are set using the pragmas: <P>
|
|
|
1035 |
<PRE>
|
|
|
1036 |
#pragma <EM>token-op token-id-list</EM><SUB><EM>opt</EM></SUB>

token-op:
    define
    no_def
    ignore
    interface

token-id-list:
    TAG<SUB><EM>opt</EM></SUB> <EM>identifier</EM> <EM>dot-list</EM><SUB><EM>opt</EM></SUB> <EM>token-id-list</EM><SUB><EM>opt</EM></SUB>

dot-list:
    . <EM>member-designator</EM>
|
|
|
1050 |
</PRE>
|
|
|
1051 |
The <EM>token-id-list</EM> is the list of tokens to which the definition
|
|
|
1052 |
state applies. The tokens in the <CODE>token-id-list</CODE> are identified
|
|
|
1053 |
by an identifier, optionally preceded by TAG. If TAG is present, the
|
|
|
1054 |
identifier refers to the tag name space, otherwise the macro and ordinary
|
|
|
1055 |
name spaces are searched for the identifier. If there is no <EM>dot-list
|
|
|
1056 |
</EM>present, the identifier must refer to a token. If the <EM>dot-list
|
|
|
1057 |
</EM>is present, the identifier must refer to a compound type and
|
|
|
1058 |
the member-designator must identify a member selector token of that
|
|
|
1059 |
compound type.<P>
|
|
|
1060 |
The<EM> token-op </EM>specifies the definition state to be associated
|
|
|
1061 |
with the tokens in the <CODE>token-id-list</CODE>. There are three
|
|
|
1062 |
literal operators and one context-dependent operator, as follows:<P>
|
|
|
1063 |
<OL>
|
|
|
1064 |
<LI><CODE>no_def</CODE> causes the token state to move to indefinable;<P>
|
|
|
1065 |
<LI><CODE>define</CODE> causes the token state to move to committed;<P>
|
|
|
1066 |
<LI><CODE>ignore</CODE> causes the token state to move to ignored;<P>
|
|
|
1067 |
<LI><CODE>interface</CODE> is the context-dependent operator and is
|
|
|
1068 |
used when describing extensions to existing APIs.<P>
|
|
|
1069 |
</OL>
|
|
|
As an example of an extension API, consider the POSIX <CODE>stdio.h</CODE>. This
is an extension of the ANSI <CODE>stdio.h</CODE> and uses the same tokens to represent
the common part of the interface. When compiling applications, nothing
can be assumed about the implementation of the ANSI tokens accessed
via the POSIX API, so they should be in the indefinable state. When
the POSIX tokens are being implemented, however, the ANSI implementations
can be assumed, so the ANSI tokens are then in the ignored state. (Since
the definitions of these tokens will already have been output during
the definition of the ANSI interface, they should not be output again.)<P>
The <CODE>interface</CODE> operator has a variable interpretation
to allow the correct definition state to be associated with these
`base-API tokens'. The compiler associates a compilation state with
each file it processes. This compilation state determines the interpretation
of the <CODE>interface</CODE> operator within that file.<P>
The default compilation state is the standard state. In this state
the <CODE>interface</CODE> operator is interpreted as the <CODE>no_def</CODE>
operator. This is the standard state for compiling applications in
the presence of APIs.<P>
Files included using:
<PRE>
#include <EM>header</EM>
</PRE>
have the same compilation state as the file from which they were included.<P>
The implementation compilation state is associated with files included
using:
<PRE>
#pragma implement interface <EM>header</EM>
</PRE>
In this context the <CODE>interface</CODE> operator is interpreted
as the <CODE>define</CODE> operator.<P>
Including a file using:
<PRE>
#pragma extend interface <EM>header</EM>
</PRE>
puts that file in the extension compilation state, unless the file from
which it was included was in the standard state, in which case the
included file is also in the standard state. In the extension state the
<CODE>interface</CODE> operator is interpreted as the <CODE>ignore</CODE>
operator.<P>
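The rules above can be summarised as a small state model. The following C
fragment is only an illustrative sketch and is not taken from the checker's
sources: the enumeration and function names are hypothetical, chosen here to
mirror the terms used in the text. It shows how the compilation state of an
included file might be derived from that of the including file, and how the
<CODE>interface</CODE> operator would then be interpreted.<P>

```c
/* Illustrative model only: all names below are hypothetical and do not
   correspond to identifiers in the checker itself. */

/* Definition states a token-op can move a token to. */
enum token_state { TOKEN_INDEFINABLE, TOKEN_COMMITTED, TOKEN_IGNORED };

/* Compilation states the compiler associates with each file. */
enum comp_state { COMP_STANDARD, COMP_IMPLEMENTATION, COMP_EXTENSION };

/* The three ways of including a file described in the text. */
enum include_kind {
    INC_PLAIN,      /* #include header                    */
    INC_IMPLEMENT,  /* #pragma implement interface header */
    INC_EXTEND      /* #pragma extend interface header    */
};

/* Token state selected by the interface operator in a file that has
   the given compilation state. */
enum token_state interface_result(enum comp_state cs)
{
    switch (cs) {
    case COMP_IMPLEMENTATION: return TOKEN_COMMITTED;   /* as define */
    case COMP_EXTENSION:      return TOKEN_IGNORED;     /* as ignore */
    case COMP_STANDARD:
    default:                  return TOKEN_INDEFINABLE; /* as no_def */
    }
}

/* Compilation state of an included file, given the state of the file
   that includes it and the form of inclusion used. */
enum comp_state included_state(enum comp_state parent, enum include_kind kind)
{
    switch (kind) {
    case INC_IMPLEMENT:
        return COMP_IMPLEMENTATION;
    case INC_EXTEND:
        /* The standard state is preserved; otherwise extension. */
        return parent == COMP_STANDARD ? COMP_STANDARD : COMP_EXTENSION;
    case INC_PLAIN:
    default:
        return parent;
    }
}
```

Under this model, the POSIX <CODE>stdio.h</CODE> example works out as described
earlier: an application compiled in the standard state sees the ANSI tokens
as indefinable, whereas a file in the implementation state that extends the
ANSI header places that header in the extension state, so the ANSI tokens
are ignored.<P>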
<HR>
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
Copyright © 1998.</I></P>
</BODY>
</HTML>