<!-- Crown Copyright (c) 1998 -->
<HTML>
<HEAD>
<TITLE>
C++ Producer Guide: Parsing C++
</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">

<H1>C++ Producer Guide</H1>
<H3>March 1998</H3>
<A HREF="tdf.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
<A HREF="error.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
<A HREF="index.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
</A>
<IMG SRC="../images/no_index.gif" ALT="document index"><P>
<HR>

<H2>3.4. Parsing C++</H2>
<P>
The parser used in the C++ producer is generated using the
<A HREF="../utilities/sid.html"><CODE>sid</CODE> tool</A>. Because
of the large size of the generated code (1.3MB), the <CODE>sid</CODE>
output is run through a simple program, <CODE>sidsplit</CODE>, which
splits the output into a number of more manageable modules. It also
transforms the code to use the <A HREF="style.html#language"><CODE>PROTO</CODE>
macros</A> used in the rest of the program.
</P>
<P>
<CODE>sid</CODE> is designed as a parser generator for grammars which can be
transformed into LL(1) grammars. The distinguishing feature of these
grammars is that the parser can always decide what to do next based
on the current terminal. This is not the case in C++; in some circumstances
a potentially unlimited look-ahead is required to distinguish, for
example, declaration statements from expression statements. In
technical terms, the C++ grammar is not LL(1), or indeed LL(<I>k</I>)
for any fixed <I>k</I>. Fortunately there are relatively
few such situations, and <CODE>sid</CODE>
provides a mechanism, <A HREF="../utilities/sid.html#predicate">predicates</A>,
for bypassing the normal parsing mechanism in these cases. Thus it
is possible, although difficult, to express C++ as a <CODE>sid</CODE>
grammar.
</P>
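<P>
As an illustration of the look-ahead problem (a sketch written for this
guide, not code taken from the producer), consider the two statements
marked below. Both begin with the terminals <CODE>T ( a )</CODE>, and
only the terminal which follows shows whether the statement is a
declaration or an expression. The names <CODE>T</CODE>, <CODE>store</CODE>,
<CODE>as_declaration</CODE> and <CODE>as_expression</CODE> are invented
for the example.
</P>
<PRE>
struct T {
    int m ;
    T ( int v = 0 ) : m ( v ) {}
    void store ( int *out ) { *out = m ; }
} ;

// "T ( a ) ;" is a declaration statement, equivalent to "T a ;".
int as_declaration ()
{
    int a = 5 ;
    {
        T ( a ) ;                 // declares a new T named a, hiding the int
        return a.m ;              // default-constructed, so m is 0
    }
}

// The same leading terminals, but ".store" makes this an expression statement.
int as_expression ()
{
    int a = 5 ;
    int out [1] = { -1 } ;
    T ( a ).store ( out ) ;       // temporary T built from the int a
    return out [0] ;              // the value copied from the temporary
}
</PRE>
<P>
Because the parenthesised declarator may be nested to any depth, as in
<CODE>T ( ( ( a ) ) )</CODE>, no fixed amount of look-ahead is enough
to make this decision in general.
</P>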
<P>
The <CODE>sid</CODE> grammar file, <CODE>syntax.sid</CODE>, is closely
based on the ISO C++ grammar. In particular, the same production
names have been used. The grammar has been extended slightly to allow
common syntactic errors to be detected elegantly. Other parsing errors
are handled by <CODE>sid</CODE>'s exception mechanism. At present
there is only limited recovery after such errors.
</P>
<P>
The lexical analysis routines in the C++ producer are hand-crafted,
based on an initial version generated by the simple lexical analyser
generator, <CODE>lexi</CODE>. The <CODE>lexi</CODE> tool has been used
more directly to generate the lexical analysers for some of the other
automatic code generation tools used in the producer, including
<CODE>calculus</CODE>.
</P>
<P>
The <CODE>sid</CODE> grammar contains a number of entry points. The
most important is <CODE>parse_file</CODE>, which is used to parse
a complete C++ translation unit. The syntax for the
<A HREF="pragma.html"><CODE>#pragma TenDRA</CODE> directives</A> is
included within the same grammar, with two entry points:
<CODE>parse_tendra</CODE> for normal use, and <CODE>parse_preproc</CODE>
for use in preprocessing mode. There are also entry points in the
grammar for each of the kinds of <A HREF="token.html#args">token argument</A>.
The parsing routines for token and template arguments are largely
hand-crafted, based on these primitives.
</P>
<P>
Certain parsing operations are performed before control passes to
the
<CODE>sid</CODE> grammar. As mentioned above, these include the processing
of token and template applications. The other important case concerns
nested name specifiers. For example, in:
<PRE>
class A {
    class B {
        static int c ;
    } ;
} ;

int A::B::c = 0 ;
</PRE>
the qualified identifier <CODE>A::B::c</CODE> is split into two terminals,
a nested name specifier, <CODE>A::B::</CODE>, and an identifier, <CODE>c</CODE>,
which is looked up in the corresponding namespace. Note that it is
at this stage that name look-up occurs. An identifier can be mapped
to one of a number of terminals, including keywords, type names,
namespace names and other identifiers, according to the result of
this look-up. If the look-up gives a macro then this is expanded
at this stage.
</P>
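<P>
The reason the look-up has to happen before the grammar sees the
identifier can be shown with a small C++ example (again a sketch for
illustration, with invented names <CODE>T</CODE>, <CODE>with_type_name</CODE>
and <CODE>with_object_name</CODE>). The statement <CODE>T * p ;</CODE>
appears in both functions, but it can only be parsed correctly once
it is known whether <CODE>T</CODE> is a type name or an ordinary identifier.
</P>
<PRE>
struct T { int v ; } ;

// T is found to be a type name, so "T * p ;" is a declaration.
int with_type_name ()
{
    T objs [1] = { { 4 } } ;
    T * p ;                   // declares p as a pointer to T
    p = objs ;
    return p [0].v ;
}

// A local object named T hides the type, so "T * p ;" is an expression.
int with_object_name ()
{
    int T = 6, p = 7 ;
    T * p ;                   // multiplication, result discarded
    return T * p ;
}
</PRE>
<P>
Mapping the identifier to a different terminal in each case lets the
grammar itself remain unaware of this context dependence.
</P>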

<HR>
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
Copyright &copy; 1998.</I></P>
</BODY>
</HTML>