<!-- Crown Copyright (c) 1998 -->
<HTML>
<HEAD>
<TITLE>
C++ Producer Guide: Parsing C++
</TITLE>
</HEAD>
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">

<H1>C++ Producer Guide</H1>
<H3>March 1998</H3>
<A HREF="tdf.html"><IMG SRC="../images/next.gif" ALT="next section"></A>
<A HREF="error.html"><IMG SRC="../images/prev.gif" ALT="previous section"></A>
<A HREF="index.html"><IMG SRC="../images/top.gif" ALT="current document"></A>
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
</A>
<IMG SRC="../images/no_index.gif" ALT="document index"><P>
<HR>

<H2>3.4. Parsing C++</H2>
<P>
The parser used in the C++ producer is generated using the
<A HREF="../utilities/sid.html"><CODE>sid</CODE> tool</A>.  Because
the generated code is large (1.3MB), the <CODE>sid</CODE>
output is run through a simple program, <CODE>sidsplit</CODE>, which
splits it into a number of more manageable modules.  <CODE>sidsplit</CODE>
also transforms the code to use the <A HREF="style.html#language"><CODE>PROTO</CODE>
macros</A> used in the rest of the program.
</P>
<P>
<CODE>sid</CODE> is designed as a parser generator for grammars which can be
transformed into LL(1) grammars.  The distinguishing feature of these
grammars is that the parser can always decide what to do next based
on the current terminal.  This is not the case in C++; in some circumstances
a potentially unlimited look-ahead is required to distinguish, for
example, declaration statements from expression statements.  In technical
terms, C++ is not LL(1); indeed, because the look-ahead required is
unbounded, it is not LL(k) for any fixed k.  Fortunately there are relatively
few such situations, and <CODE>sid</CODE>
provides a mechanism, <A HREF="../utilities/sid.html#predicate">predicates</A>,
for bypassing the normal parsing mechanism in these cases.  Thus it
is possible, although difficult, to express C++ as a <CODE>sid</CODE>
grammar.
</P>
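<P>
The look-ahead problem can be seen in a small, self-contained C++ sketch.
The type <CODE>T</CODE>, the global <CODE>value</CODE> and both functions
are hypothetical names introduced here for illustration; they are not taken
from the producer.  Both statements begin with the same tokens,
<CODE>T ( value )</CODE>, and only what follows the closing parenthesis
shows whether the statement is a declaration or an expression.
</P>

```cpp
#include <cassert>

// Hypothetical type, used only to illustrate the ambiguity.
struct T {
    int m;
    T(int v = 0) : m(v) {}
    T *operator->() { return this; }
    void store(int &out) const { out = m; }
};

int value = 5;  // hypothetical global, hidden by the local declared below

// "T(value);" is a declaration statement: it declares a local object
// named value of type T (with redundant parentheses), constructed
// using the default argument.
int declaration_case() {
    T(value);
    return value.m;
}

// "T(value)->store(r);" is an expression statement: a function-style
// cast of the global value, followed by a member call.  The parser
// cannot tell the two cases apart until it has seen the "->".
int expression_case() {
    int r = -1;
    T(value)->store(r);
    return r;
}
```

<P>
Calling <CODE>declaration_case()</CODE> reads the member of the freshly
declared local object, while <CODE>expression_case()</CODE> reads the value
converted from the global.  Because the parenthesised part may be arbitrarily
long, no fixed amount of look-ahead is enough to choose between the two
readings.  Some compilers warn about the redundant parentheses in the
declaration.
</P>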
<P>
The <CODE>sid</CODE> grammar file, <CODE>syntax.sid</CODE>, is closely
based on the ISO C++ grammar.  In particular, the same production
names have been used.  The grammar has been extended slightly to allow
common syntactic errors to be detected elegantly.  Other parsing errors
are handled by <CODE>sid</CODE>'s exception mechanism.  At present
there is only limited recovery after such errors.
</P>
<P>
The lexical analysis routines in the C++ producer are hand-crafted,
based on an initial version generated by the simple lexical analyser
generator,
<CODE>lexi</CODE>.  <CODE>lexi</CODE> has been used more directly
to generate the lexical analysers for some of the other automatic
code generating tools used in the producer, including
<CODE>calculus</CODE>.
</P>
<P>
The <CODE>sid</CODE> grammar contains a number of entry points.  The
most important is <CODE>parse_file</CODE>, which is used to parse
a complete C++ translation unit.  The syntax for the
<A HREF="pragma.html"><CODE>#pragma TenDRA</CODE> directives</A> is
included within the same grammar with two entry points:
<CODE>parse_tendra</CODE> in normal use, and <CODE>parse_preproc</CODE>
for use in preprocessing mode.  There are also entry points in the
grammar for each of the kinds of <A HREF="token.html#args">token argument</A>.
The parsing routines for token and template arguments are largely
hand-crafted, built on these entry points.
</P>
<P>
Certain parsing operations are performed before control passes to
the
<CODE>sid</CODE> grammar.  As mentioned above, these include the processing
of token and template applications.  The other important case concerns
nested name specifiers.  For example, in:
<PRE>
	class A {
	    class B {
		static int c ;
	    } ;
	} ;

	int A::B::c = 0 ;
</PRE>
the qualified identifier <CODE>A::B::c</CODE> is split into two terminals,
a nested name specifier, <CODE>A::B::</CODE>, and an identifier, <CODE>c</CODE>,
which is looked up in the corresponding namespace.  Note that name
look-up occurs at this stage.  An identifier can be mapped
to one of a number of terminals, including keywords, type names,
namespace names and other identifiers, according to the result of
this look-up.  If the look-up gives a macro then the macro is expanded
at this stage.
</P>
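<P>
The splitting and classification steps described above can be sketched
roughly as follows.  This is an illustration only, not the producer's code:
the names <CODE>Terminal</CODE>, <CODE>split_qualified</CODE> and
<CODE>classify</CODE> are hypothetical, and a simple table stands in for
real name look-up.  The sketch handles plain identifiers only and ignores
cases such as operator names and template arguments.
</P>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical terminal kinds; the producer's own terminal names are
// not given in the text.
enum Terminal { KEYWORD, TYPE_NAME, NAMESPACE_NAME, IDENTIFIER };

// Split a qualified identifier at its last "::".  The first component
// is the nested name specifier (empty if the name is unqualified) and
// the second is the final identifier.
std::pair<std::string, std::string> split_qualified(const std::string &id) {
    std::string::size_type pos = id.rfind("::");
    if (pos == std::string::npos) {
        return std::make_pair(std::string(), id);
    }
    return std::make_pair(id.substr(0, pos + 2), id.substr(pos + 2));
}

// Toy stand-in for name look-up: names found in the table map to the
// recorded terminal, anything else is treated as a plain identifier.
Terminal classify(const std::map<std::string, Terminal> &scope,
                  const std::string &name) {
    std::map<std::string, Terminal>::const_iterator it = scope.find(name);
    return it == scope.end() ? IDENTIFIER : it->second;
}
```

<P>
For the example above, <CODE>split_qualified("A::B::c")</CODE> gives the
pair <CODE>A::B::</CODE> and <CODE>c</CODE>.  Classifying each identifier
before the grammar sees it is what allows a single spelling to reach the
parser as different terminals, depending on what look-up finds.
</P>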

<HR>
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
Copyright &copy; 1998.</I></P>
</BODY>
</HTML>