Subversion Repositories tendra.SVN

Rev

Details | Last modification | View Log | RSS feed

Rev Author Line No. Line
2 7u83 1
<!-- Crown Copyright (c) 1998 -->
2
<HTML>
3
<HEAD>
4
<TITLE>sid users' guide</TITLE>
5
</HEAD>
6
<BODY TEXT="#000000" BGCOLOR="#FFFFFF" LINK="#0000FF" VLINK="#400080" ALINK="#FF0000">
7
<H1>sid Users' Guide</H1>
8
<H3>January 1998</H3>
9
<IMG SRC="../images/no_next.gif" ALT="next section">
10
<IMG SRC="../images/no_prev.gif" ALT="previous section">
11
<IMG SRC="../images/no_top.gif" ALT="current document">
12
<A HREF="../index.html"><IMG SRC="../images/home.gif" ALT="TenDRA home page">
13
</A>
14
<IMG SRC="../images/no_index.gif" ALT="document index"><P>
15
<HR>
16
<DL>
17
<DT><A HREF="#intro"><B>1</B> - Introduction</A><DD>
18
<DT><A HREF="#grammar"><B>2</B> - Grammars</A><DD>
19
<DL>
20
<DT><A HREF="#parse"><B>2.1</B> - Parsing</A><DD>
21
<DT><A HREF="#context-free"><B>2.2</B> - Context free grammars</A><DD>
22
<DT><A HREF="#sid-grammar"><B>2.3</B> - <CODE>sid</CODE> grammars</A><DD>
23
</DL>
24
<DT><A HREF="#overview"><B>3</B> - Overview</A><DD>
25
<DL>
26
<DT><A HREF="#left-recursion"><B>3.1</B> - Left recursion elimination</A><DD>
27
<DT><A HREF="#factoring"><B>3.2</B> - Factoring</A><DD>
28
<DT><A HREF="#optimise"><B>3.3</B> - Optimisations</A><DD>
29
</DL>
30
<DT><A HREF="#invoke"><B>4</B> - Invocation</A><DD>
31
<DT><A HREF="#grammar-file"><B>5</B> - The <CODE>sid</CODE> grammar
32
file</A><DD>
33
<DL>
34
<DT><A HREF="#grammar-lex"><B>5.1</B> - Lexical conventions</A><DD>
35
<DT><A HREF="#type-decl"><B>5.2</B> - The type declaration section</A><DD>
36
<DT><A HREF="#term-decl"><B>5.3</B> - The terminal declaration section</A><DD>
37
<DT><A HREF="#rule-defn"><B>5.4</B> - The rule definition section</A><DD>
38
<DT><A HREF="#entry-point"><B>5.5</B> - The grammar entry points section</A><DD>
39
</DL>
40
<DT><A HREF="#info-file"><B>6</B> - The C information file</A><DD>
41
<DL>
42
<DT><A HREF="#info-lex"><B>6.1</B> - Lexical conventions</A><DD>
43
<DT><A HREF="#prefixes"><B>6.2</B> - The prefixes section</A><DD>
44
<DT><A HREF="#maps"><B>6.3</B> - The maps section</A><DD>
45
<DT><A HREF="#header"><B>6.4</B> - The header section</A><DD>
46
<DT><A HREF="#assign"><B>6.5</B> - The assignments section</A><DD>
47
<DT><A HREF="#param-assign"><B>6.6</B> - The parameter assignments
48
section</A><DD>
49
<DT><A HREF="#result-assign"><B>6.7</B> - The result assignments section</A><DD>
50
<DT><A HREF="#term-result"><B>6.8</B> - The terminal result extraction
51
section</A><DD>
52
<DT><A HREF="#action-defn"><B>6.9</B> - The action definition section</A><DD>
53
<DT><A HREF="#trailer"><B>6.10</B> - The trailer section</A><DD>
54
</DL>
55
<DT><A HREF="#predicate"><B>7</B> - Predicates</A><DD>
56
<DT><A HREF="#exception"><B>8</B> - Error handling</A><DD>
57
<DT><A HREF="#call-by-ref"><B>9</B> - Call by reference</A><DD>
58
<DT><A HREF="#call-entry"><B>10</B> - Calling entry points</A><DD>
59
<DT><A HREF="#glossary"><B>11</B> - Glossary</A><DD>
60
<DT><A HREF="#errors"><B>12</B> - Understanding error messages</A><DD>
61
<DL>
62
<DT><A HREF="#left-errors"><B>12.1</B> - Left recursion elimination
63
errors</A><DD>
64
<DT><A HREF="#first-errors"><B>12.2</B> - First set computation errors</A><DD>
65
<DT><A HREF="#factor-errors"><B>12.3</B> - Factoring errors</A><DD>
66
<DT><A HREF="#check-errors"><B>12.4</B> - Checking errors</A><DD>
67
</DL>
68
</DL>
69
 
70
<HR>
71
<H2><A NAME="intro">1. Introduction</A></H2>
72
<P>
73
This document describes how to use the <CODE>sid</CODE> parser generator.
74
It was written for <CODE>sid</CODE> version 1.9.  The main features
75
of each version of <CODE>sid</CODE> are listed below: 
76
<UL>
77
 
78
<LI><CODE>sid</CODE> 1.0 - this was the first version of <CODE>sid</CODE>
79
to support attribute grammars.  The output language was C. 
80
<LI><CODE>sid</CODE> 1.1 - this was a bug fix version of <CODE>sid</CODE>
81
1.0. 
82
<LI><CODE>sid</CODE> 1.2 - this version of <CODE>sid</CODE> added
83
predicates, exception handling, improved inlining options, and a 
84
grammar test pseudo-language. 
85
<LI><CODE>sid</CODE> 1.3 - this version of <CODE>sid</CODE> added
86
anonymous rules, and a better syntax for the C specific information.
87
 
88
<LI><CODE>sid</CODE> 1.4 - this was a bug fix version of <CODE>sid</CODE>
89
1.3. 
90
<LI><CODE>sid</CODE> 1.5 - this was a bug fix version of <CODE>sid</CODE>
91
1.4.  The command line option syntax changed in this release as well.
92
 
93
<LI><CODE>sid</CODE> 1.6 - this version of <CODE>sid</CODE> changed
94
the     input syntax, added scoped rules, and removed basics (replacing
95
them with terminal symbols).  It also added a stricter ISO C language.
96
 
97
<LI><CODE>sid</CODE> 1.7 - The syntax of the actions file changed
98
slightly in this version.  The error messages and recovery from  
99
syntax errors were also improved in this version.  This version  
100
also added explicit call by reference support (rather than the   
101
inconsistent function call semantics of earlier versions).  The  
102
command line options changed in this version, to support language
103
specific options, and the <CODE>strict-ansi-c</CODE> language was
104
dropped.  Non-local variables were also added. 
105
<LI><CODE>sid</CODE> 1.8 - Initialisers were added for non-local 
106
variables.  Assignment was added. 
107
<LI><CODE>sid</CODE> 1.9 - this was a bug fix version of <CODE>sid</CODE>
108
1.8. 
109
</UL>
110
</P>
111
<P>
112
<CODE>sid</CODE> turns specifications of languages into programs that
113
recognise those languages.  One of the aims of <CODE>sid</CODE> was
114
to separate the specification of the language to be recognised from
115
the language that the recogniser program is written in.  For this
116
reason, input to <CODE>sid</CODE> is split into two components: output
117
language independent information, and output language dependent information.
118
</P>
119
<P>
120
At present, <CODE>sid</CODE> will only output programs in C (either
121
ISO or pre-ISO), but it is designed so that adding new output languages
122
should be fairly simple.  There is one other pseudo-language: the
123
test language.  This is used for testing grammars and the transforms,
124
but will not output a parser. 
125
</P>
126
 
127
<HR>
128
<H2><A NAME="grammar">2. Grammars</A></H2>
129
 
130
<H3><A NAME="parse">2.1. Parsing</A></H3>
131
<P>
132
There are two phases in traditional language recognition that are
133
relevant to <CODE>sid</CODE>: the first is lexical analysis (breaking
134
the input up into terminal symbols); the second is syntax analysis
135
or parsing (ensuring that the terminal symbols in the input are in
136
the correct order). 
137
</P>
138
<P>
139
<CODE>sid</CODE> currently does very little to help with lexical analysis.
140
It is possible to use <CODE>sid</CODE> to produce the lexical analyser,
141
but <CODE>sid</CODE> provides no real support for it at present. 
142
For now, the programmer is expected to write the lexical analyser,
143
or use another tool to produce it. 
144
</P>
145
<P>
146
The lexical analyser should break the input up into a series of terminals.
147
Each of these terminals is allocated a number.  In 
148
<CODE>sid</CODE>, these numbers range from zero to the maximum terminal
149
number (specified in the grammar description).  The terminals may
150
also have data associated with them (e.g. the value of a number),
151
known as the attributes of the terminal, or the results of the terminal.
152
</P>
153
<P>
154
<CODE>sid</CODE> generates the parser.  The parser is a program that
155
reads the sequence of terminals from the lexical analyser, and ensures
156
that they form a valid input in the language being recognised.  If
157
the input is not valid, then the parser will fail (<CODE>sid</CODE>
158
provides mechanisms to allow the parser to recover from errors). 
159
</P>
160
 
161
<H3><A NAME="context-free">2.2. Context free grammars</A></H3>
162
<P>
163
This section provides a brief introduction to grammars.  It is not
164
intended to be an in-depth guide to grammars, more an introduction
165
which defines some terminology. 
166
</P>
167
<P>
168
<CODE>sid</CODE> works with a subset of what are known as context
169
free grammars.  A context free grammar consists of a set of input
170
symbols (known as terminals), a set of rules (descriptions of what
171
are legal forms in the language, also known as non-terminals), and
172
an entry point (the rule from which the grammar starts). 
173
</P>
174
<P>
175
A rule is defined as a series of alternatives (throughout this document
176
the definition of a rule is called a production - this may conflict
177
with some other uses of the term).  Each alternative consists of a
178
sequence of items.  An item can be a terminal or a rule.  As an example
179
(using the <CODE>sid</CODE> notation, which now looks unlike the conventional
180
syntax for grammars), a comma separated list of integers could be
181
specified as: 
182
<PRE>
183
	list-of-integers = {
184
		integer ;
185
	    ||
186
		integer ;
187
		comma ;
188
		list-of-integers ;
189
	} ;
190
</PRE>
191
This production contains two alternatives: the first matches the terminal
192
<CODE>integer</CODE>; the second matches the sequence of terminals
193
<CODE>integer</CODE> followed by <CODE>comma</CODE>, followed by another
194
list of integers. 
195
</P>
196
<P>
197
There is much documentation available on context free grammars (and
198
other types of formal grammar).  The reader is advised to find an
199
alternative source for more information. 
200
</P>
201
 
202
<H3><A NAME="sid-grammar">2.3. <CODE>sid</CODE> grammars</A></H3>
203
<P>
204
<CODE>sid</CODE> grammars are based upon a subset of context free
205
grammars, known as LL(1) grammars.  The main property of such grammars
206
is that the parser can always tell which alternative it is going to
207
parse next by looking at the current terminal symbol.  <CODE>sid</CODE>
208
does a number of transforms to turn grammars that are not in this
209
form into it (although it cannot turn all possible grammars into this
210
form). It also provides facilities that allow the user to alter the
211
control flow of the parser. 
212
</P>
213
<P>
214
<CODE>sid</CODE> makes the following changes to the context free grammars
215
described above: 
216
<OL>
217
<LI>There may be more than one entry point to the grammar. 
218
<LI>As well as being a rule or a terminal, an item may be an action,
219
a predicate or an identity.  An action is just a user supplied function.
220
A predicate is a cross between a basic and an action (it is a user
221
defined function but it may alter the control flow).  An identity
222
is like an assignment in a conventional programming language. 
223
<LI>Rules may take parameters and return results (as may actions;
224
terminals may only return results).  These may be passed between items
225
using names. 
226
<LI>Each rule can have an exception handler associated with it. Exception
227
handlers are used for error recovery when the input being parsed does
228
not match the grammar. 
229
<LI>Rules may be anonymous. 
230
<LI>Rules may have non-local names associated with them.  These names
231
are in scope for that rule and any rules that are defined inside it.
232
The value of each non-local name is saved on entry to the rule, and
233
restored when the rule is exited. 
234
</OL>
235
</P>
236
 
237
<HR>
238
<H2><A NAME="overview">3. Overview</A></H2>
239
<P>
240
<CODE>sid</CODE> takes the input grammar and applies several transformations
241
to it, before it produces the parser.  These transformations allow
242
the language descriptions to be written in a slightly more natural
243
form than would otherwise be necessary. 
244
</P>
245
 
246
<H3><A NAME="left-recursion">3.1. Left recursion elimination</A></H3>
247
<P>
248
The first transformation is to eliminate any left recursive productions,
249
replacing them with right recursive ones.  This transforms a set of
250
rules of the form: 
251
<PRE>
252
	Ai = Aj Bji, Ci
253
</PRE>
254
into: 
255
<PRE>
256
	Ai = Cj Xji
257
	Xji = Bjk Xki, Yji
258
</PRE>
259
where <CODE>Yji</CODE> is the identity function if <CODE>i</CODE>
260
equals 
261
<CODE>j</CODE>, and an error otherwise.  In order for this transformation
262
to work, the following conditions must hold: 
263
<OL>
264
 
265
<LI>The parameter type for all <CODE>Ai</CODE> must be the same. 
266
<LI>The result type for all <CODE>Ai</CODE> must be the same. 
267
<LI>The exception handlers for all <CODE>Ai</CODE> must be the same.
268
 
269
<LI>The parameters to each left recursive call to some <CODE>Aj</CODE>
270
must be exactly the formal parameters to the calling rule <CODE>Ai</CODE>.
271
 
272
<LI>Any non-local name referenced by any rule must be in scope for
273
all rules. 
274
<LI>A rule may not define non-local variables unless it is the only
275
entry point into the cycle (i.e. there is only one named rule in the
276
cycle). 
277
</OL>
278
<CODE>sid</CODE> will report an error if these conditions are not
279
met. 
280
</P>
281
 
282
<H3><A NAME="factoring">3.2. Factoring</A></H3>
283
<P>
284
The second major transformation is to factor the grammar.  It is possible
285
that this phase could go on for ever, so there is a limit to the number
286
of rules that can be generated.  This limit may be changed (see the
287
<A HREF="#invoke">invocation section</A>).  In practice it is rare
288
for this to happen. 
289
</P>
290
<P>
291
The factoring phase tries to increase the number of language specifications
292
that <CODE>sid</CODE> can produce a parser for, by factoring out common
293
initial items in alternatives, e.g.: 
294
<PRE>
295
	X = {
296
		a ; b ;
297
	    ||	a ; c ;
298
	} ;
299
</PRE>
300
would be transformed into something like: 
301
<PRE>
302
	X = {
303
		a ; X1 ;
304
	} ;
305
 
306
	X1 = {
307
		b ;
308
	    ||	c ;
309
	} ;
310
</PRE>
311
It will also expand calls to rules at the beginning of alternatives
312
if this is necessary to allow the parser to be produced (this is the
313
phase that the limit is needed for).  The expansions are done in the
314
following cases: 
315
<OL>
316
 
317
<LI>If the rule is see-through (i.e. there is an expansion of the
318
rule that does not contain any terminals or predicates) and its first
319
set contains terminals in the first set of the rest of the alternative.
320
 
321
<LI>If the rule has a predicate in its first set. 
322
<LI>If the rule has terminals in its first set that are also in the
323
first set of another alternative that does not begin with the same
324
rule. 
325
</OL>
326
If the rule being expanded contains an exception handler, and it is
327
not identical to the exception handler in the enclosing rule, then
328
an error will occur.  Similarly if the rule being expanded defines
329
any non-local variables then an error will occur. 
330
</P>
331
 
332
<H3><A NAME="optimise">3.3. Optimisations</A></H3>
333
<P>
334
After these two transformations, <CODE>sid</CODE> performs some optimisations
335
on the grammar, such as reducing multiple copies of identical rules,
336
removing tail recursion, inlining all basic rules, inlining single
337
alternative rules, inling rules used only once, and inlining everything
338
that can be inlined.  All of the inlining is optional. 
339
</P>
340
<P>
341
After these optimisations, <CODE>sid</CODE> checks the language description
342
to ensure that it is possible to produce a parser for it, and if so
343
it outputs the parser. 
344
</P>
345
 
346
<HR>
347
<H2><A NAME="invoke">4. Invocation</A></H2>
348
<P>
349
<CODE>sid</CODE> should be invoked in the following manner: 
350
<PRE>
351
	sid <I>options input-files output-files</I>
352
</PRE>
353
The options are described later.  The input files should be a number
354
of input file names dependent upon the output language.  The first
355
input file is the <CODE>sid</CODE> grammar file.  In the case of either
356
C dialects there should be one other input file that provides C specific
357
information to <CODE>sid</CODE>.  The number of output files is also
358
language specific.  At present, two output files should be specified
359
for the C languages.  The first should be a <CODE>.c</CODE> file into
360
which the parser is written; the second should be a <CODE>.h</CODE>
361
file into which the terminal definitions and external function declarations
362
are written. 
363
</P>
364
<P>
365
The options list should consist of zero or more of the following options.
366
There are short forms for each of these options as well; see the 
367
<CODE>sid</CODE> manual page for more information on invocation. 
368
<DL>
369
 
370
<DT><B>--dump-file <I>FILE</I></B>
371
<DD>
372
This option causes intermediate dumps of the grammar to be written
373
to the named file.  The format of the dump files is similar to the
374
format of the grammar specification, with the following exceptions:
375
<OL>
376
 
377
<LI>Predicates are written with the predicate result replaced by the
378
predicate identifier (this will always be zero), and the result is
379
followed by a <CODE>?</CODE> to indicate that it was a predicate.
380
As an example, the predicate: 
381
<PRE>
382
	( b, ? ) = &lt;pred&gt; ( a )
383
</PRE>
384
would be printed out as: 
385
<PRE>
386
	( b : Type1T, 0 : Type2T ) ? = &lt;pred&gt; ( a : Type3T )
387
</PRE>
388
 
389
<LI>Items that are considered to be inlinable are prefixed by a 
390
<CODE>+</CODE>.  Items that are tail calls which will be eliminated
391
are prefixed by a <CODE>*</CODE>. 
392
<LI>Nested rules are written at the outer level, with names of the
393
form <CODE>outer-rule::....::inner-rule</CODE>. 
394
<LI>Types are provided on call parameter and result tuples. 
395
<LI>Inline rules are given a generated name, and are written out as
396
a call to the generated rule (and a definition elsewhere). 
397
</OL>
398
<P>
399
 
400
<DT><B>--factor-limit <I>LIMIT</I></B>
401
<DD>
402
This option limits the number of rules that can be created during
403
the factorisation process.  It is probably best not to change this.
404
<P>
405
 
406
<DT><B>--help</B>
407
<DD>
408
This option writes a summary of the command line options to the standard
409
error stream. 
410
<P>
411
 
412
<DT><B>--inline <I>INLINES</I></B>
413
<DD>
414
This option controls what inlining will be done in the output parser.
415
The inlines argument should be a comma separated list of the following
416
words: 
417
<P>
418
<DL>
419
 
420
<DT><CODE>SINGLES</CODE>
421
<DD>
422
This causes single alternative rules to be inlined.  This inlining
423
is no longer performed as a modification to the grammar (it was in
424
version 1.0). 
425
<DT><CODE>BASICS</CODE>
426
<DD>
427
This causes rules that contain only basics (and no exception handlers
428
or empty alternatives) to be inlined.  The restriction on exception
429
handlers and empty alternatives is rather arbitrary, and may be changed
430
later. 
431
<DT><CODE>TAIL</CODE>
432
<DD>
433
This causes tail recursive calls to be inlined.  Without this, tail
434
recursion elimination will not be performed. 
435
<DT><CODE>OTHER</CODE>
436
<DD>
437
This causes other calls to be inlined wherever possible.  Unless the
438
<CODE>MULTI</CODE> inlining is also specified, this will be done only
439
for productions that are called once. 
440
<DT><CODE>MULTI</CODE>
441
<DD>
442
This causes calls to be inlined, even if the rule being called is
443
called more than once.  Turning this inlining on implies <CODE>OTHER</CODE>.
444
Similarly turning off <CODE>OTHER</CODE> inlining will turn off 
445
<CODE>MULTI</CODE> inlining.  For grammars of any size, this is probably
446
best avoided; if used the generated parser may be huge (e.g. a C grammar
447
has produced a file that was several hundred MB in size). 
448
<DT><CODE>ALL</CODE>
449
<DD>
450
This turns on all inlining. 
451
</DL>
452
</P>
453
<P>
454
In addition, prefixing a word with <CODE>NO</CODE> turns off that
455
inlining phase.  The words may be given in any case.  They are evaluated
456
in the order given, so: 
457
<PRE>
458
	--inline noall,singles
459
</PRE>
460
would turn on single alternative rule inlining only, whilst: 
461
<PRE>
462
	--inline singles,noall
463
</PRE>
464
would turn off all inlining.  The default is as if <CODE>sid</CODE>
465
were invoked with the option: 
466
<PRE>
467
	--inline noall,basics,tail
468
</PRE>
469
</P>
470
 
471
<DT><A NAME="lang"><B>--language <I>LANGUAGE</I></B></A>
472
<DD>
473
This option specifies the output language.  Currently this should
474
be one of <CODE>ansi-c</CODE>, <CODE>pre-ansi-c</CODE> and <CODE>test</CODE>.
475
The default is <CODE>ansi-c</CODE>. 
476
<P>
477
The <CODE>ansi-c</CODE> and <CODE>pre-ansi-c</CODE> languages are
478
basically the same.  The only difference is that <CODE>ansi-c</CODE>
479
initially uses function prototypes, and <CODE>pre-ansi-c</CODE> doesn't.
480
The C language specific options are: 
481
</P>
482
<P>
483
<DL>
484
 
485
<DT><CODE>prototypes, proto, no-prototypes, no-proto</CODE>
486
<DD>
487
These enable or disable the use of function prototypes.  By default
488
this is enabled for <CODE>ansi-c</CODE> and disabled for <CODE>pre-ansi-c</CODE>.
489
<P>
490
 
491
<DT><CODE>numeric-ids, numeric, no-numeric-ids, no-numeric</CODE>
492
<DD>
493
These enable or disable the use of numeric identifiers.  Numeric identifiers
494
replace the identifier name with a number, which is mainly of use
495
in stopping identifier names getting too long.  The disadvantage is
496
that the code becomes less readable, and more difficult to debug.
497
Numeric identifiers are not used by default. 
498
<P>
499
 
500
<DT><A NAME="casts"><CODE>casts, cast, no-casts, no-cast</CODE></A>
501
<DD>
502
These enable or disable casting of action and assignment operator
503
immutable parameters.  If enabled, a parameter is cast to its own
504
type when it is substituted into the action.  This will cause some
505
compilers to complain about attempts to modify the parameter (which
506
can help pick out attempts at mutating parameters that should not
507
be mutated).  The disadvantage is that not all compilers will reject
508
attempts at mutation, and that ISO C doesn't allow casting to structure
509
and union types, which means that some code may be illegal.  Parameter
510
casting is disabled by default. 
511
<P>
512
 
513
<DT><CODE>unreachable-macros, unreachable-macro, unreachable-comments,
514
unreachable-comment</CODE>
515
<DD>
516
These choose whether unreachable code is marked by a macro or a comment.
517
The default is to mark unreachable code with a comment 
518
<CODE>/*UNREACHED*/</CODE>, however a macro <CODE>UNREACHED ;</CODE>
519
may be used instead, if desired. 
520
</DL>
521
</P>
522
<P>
523
The <CODE>test</CODE> language only takes one input file, and produces
524
no output file.  It may be used to check that a grammar is valid.
525
In conjunction with the dump file, it may be used to check the transformations
526
that would be applied to the grammar.  There are no language specific
527
options for the <CODE>test</CODE> language. 
528
</P>
529
 
530
<DT><B>--show-errors</B>
531
<DD>
532
This option writes a copy of the current error messages to the standard
533
output.  See the manual entry for more details about changing the
534
error message content. 
535
<P>
536
 
537
<DT><B>--switch <I>OPTION</I></B>
538
<DD>
539
This passes through <I>OPTION</I> as a language specific option. 
540
The valid options are described <A HREF="#lang">above</A>. 
541
<P>
542
 
543
<DT><B>--tab-width <I>NUMBER</I></B>
544
<DD>
545
This option specifies the number of spaces that a tab occupies.  It
546
defaults to 8.  It is only used when indenting output. 
547
<P>
548
 
549
<DT><B>--version</B>
550
<DD>
551
This option causes the version number and supported languages to be
552
written to the standard error stream. 
553
</DL>
554
 
555
<HR>
556
<H2><A NAME="grammar-file">5. The <CODE>sid</CODE> grammar file</A></H2>
557
<P>
558
The <CODE>sid</CODE> grammar file should always be the first input
559
file specified on the command line.  It provides an output language
560
independent description of the language to be recognised.  The file
561
is split up into a number of different sections, each of which begins
562
with a section header.  All sections must be included, although it
563
is possible to leave most of them empty.  The following sections exist
564
at present: type declaration, terminal declaration, rule definition,
565
and grammar entry points.  They must appear in that order.  The sections
566
are detailed below, after the lexical conventions. 
567
</P>
568
 
569
<H3><A NAME="grammar-lex">5.1. Lexical conventions</A></H3>
570
<P>
571
Almost all non-printable characters (but definitely space, tab, newline
572
and form-feed) are considered to be white space, and are ignored other
573
than to separate other tokens.  In addition, two comment forms are
574
recognised: the C comment syntax, where comments begin with 
575
<CODE>/*</CODE>, and end with <CODE>*/</CODE>, although unlike C these
576
comments nest properly; and the C++ line comment syntax, where comments
577
begin with <CODE>//</CODE>, and are terminated at the end of the line.
578
Comments are treated in the same way as white space characters. 
579
</P>
580
<P>
581
Section headers are enclosed in percent characters, and are case insensitive.
582
The following section headers are currently recognised: 
583
<PRE>
584
	%types%
585
	%terminals%
586
	%productions%
587
	%entry%
588
</PRE>
589
</P>
590
<P>
591
Identifiers must begin with a letter, an underscore or a hyphen. 
592
This may be followed by zero or more letters, digits, underscores
593
and hyphens. Identifiers are case sensitive.  The following are all
594
legal identifiers: 
595
<PRE>
596
	expression	mult-expr	plus_expr	expr-1
597
</PRE>
598
Identifiers are split into two namespaces: local names have one space;
599
types, actions, rules, non-local names and terminals share the other
600
namespace, so it is not possible to have an identifier that is a type
601
as well as being a rule for example. 
602
</P>
603
<P>
604
The following symbols are also used: 
605
<PRE>
606
	&amp;	;	=	[	:	]	!	,
607
	||	$	?	{	}	(	)	&lt;
608
	&gt;	-&gt;	::	##
609
</PRE>
610
All other characters are illegal. 
611
</P>
612
 
613
<H3><A NAME="type-decl">5.2. The type declaration section</A></H3>
614
<P>
615
The first section is the type declaration section.  This begins with
616
the types section header, followed by the names of all of the types
617
used by the grammar.  Each type name should be terminated by a semicolon.
618
An example of the type declaration section follows: 
619
<PRE>
620
	%types%
621
	NumberT ;
622
	StringT ;
623
</PRE>
624
This declares two types: <CODE>NumberT</CODE> and <CODE>StringT</CODE>.
625
There is no requirement for the type names to resemble names in the
626
target language (in fact this should be avoided, as it is possible
627
to use many different target languages).  All types used in the grammar
628
must be declared here.  Similarly, all types declared here must be
629
used in the grammar. 
630
</P>
631
 
632
<H3><A NAME="term-decl">5.3. The terminal declaration section</A></H3>
633
<P>
634
After the type declaration section comes the terminal declaration
635
section.  This section declares the terminals that will be used by
636
the grammar.  A terminal is a recogniser for a symbol in the input
637
alphabet of the language to be recognised.  It is possible to declare
638
terminals that are not used by the grammar. 
639
</P>
640
<P>
641
The section begins with its section header, followed by the declarations
642
of the terminals.  Each terminal declaration begins with the name
643
of the terminal being defined, followed by its type, and terminated
644
by a semicolon.  If the terminal is not used in the grammar, the declaration
645
should be preceded by a <CODE>!</CODE> symbol. 
646
</P>
647
<P>
648
A type (for terminals, rules and actions) is written as a colon, followed
649
by the parameter tuple, followed by the <CODE>-&gt;</CODE>
650
symbol, followed by the result tuple.  If the type is zero-tuple to
651
zero-tuple, then the type may be omitted.  A tuple consists of a comma
652
separated sequence of name-type pairs (with the name and type being
653
separated by a colon), surrounded by parentheses.  For parameter tuples,
654
the type may be suffixed by a <CODE>&amp;</CODE> symbol, indicating
655
that call by reference should be used (the default is call by copy).
656
For declarations, the names should be omitted.  For terminals, the
657
parameter type must be the zero-tuple. 
658
</P>
659
<P>
660
The simplest type of terminal declaration is as follows: 
661
<PRE>
662
	terminal1 ;
663
</PRE>
664
This means the same as: 
665
<PRE>
666
	terminal1 : () -&gt; () ;
667
</PRE>
668
An example of a more complex terminal declaration is: 
669
<PRE>
670
	terminal2 : () -&gt; ( :StringT ) ;
671
</PRE>
672
If these terminals were not to be used in the grammar, they should
673
be declared as: 
674
<PRE>
675
	!terminal1 ;
676
	!terminal2 : () -&gt; ( :StringT ) ;
677
</PRE>
678
</P>
679
 
680
<H3><A NAME="rule-defn">5.4. The rule definition section</A></H3>
681
<P>
682
The rule definition section follows the terminal declaration section.
683
It begins with the section header, followed by the definitions and
684
declarations of all of the rules used in the grammar, and the declarations
685
of all of the actions used in the grammar. 
686
</P>
687
<P>
688
Rule declarations look identical to terminal declarations, e.g.: 
689
<PRE>
690
	rule1 : ( :NumberT ) -&gt; ( :NumberListT ) ;
691
	rule2 ;
692
</PRE>
693
Action declarations are similar, although the names are surrounded
694
by angle brackets, e.g.: 
695
<PRE>
696
	&lt;action1&gt; : ( :StringT &amp; ) -&gt; () ;
697
	&lt;action2&gt; ;
698
</PRE>
699
A declaration (or a definition) may be prefixed with the <CODE>::</CODE>
700
symbol.  This symbol forces the definition into the outermost scope.
701
Scopes are described later on. 
702
</P>
703
<P>
704
A rule definition (called a production) looks something like the following:
705
<PRE>
706
	add-expr : () -&gt; ( v : NumberT ) = {
707
		v1 = mul-expr ;
708
		plus ;
709
		v2 = expr ;
710
		v  = &lt;add&gt; ( v1, v2 ) ;
711
	    ||
712
		v1 = mul-expr ;
713
		minus ;
714
		v2 = expr ;
715
		v  = &lt;subtract&gt; ( v1, v2 ) ;
716
	    ##
717
		v = &lt;ex&gt; ;
718
	} ;
719
</PRE>
720
The production begins with the rule name, followed by the parameter
721
and result names and types (in this case, the rule name is 
722
<CODE>add-expr</CODE>, there are no parameters, and there is one result
723
name <CODE>v</CODE> of type <CODE>NumberT</CODE>).  This may optionally
724
be followed by local declarations (there are none here - they are
725
described later). 
726
</P>
727
<P>
728
The left hand side of the rule is followed by the <CODE>=</CODE> symbol,
729
a list of alternatives surrounded by curly braces, and is terminated
730
by a semicolon.  The alternatives are separated by the <CODE>||</CODE>
731
symbol, and the last alternative may be separated from its predecessor
732
(there must be one) using the <CODE>##</CODE> symbol; if this is the
733
case, then this alternative is the exception handler for the production
734
(otherwise it has no exception handler). 
735
</P>
736
<P>
737
An alternative may match the empty string, using the symbol <CODE>$</CODE>
738
and the terminator symbol <CODE>;</CODE>, i.e.: 
739
<PRE>
740
	$ ;
741
</PRE>
742
unless it is an exception handler alternative (in which case it must
743
do something), or a sequence of items.  The empty string is only valid
744
if the production has no results.  If you want to match an empty string
745
in a production that has a result, it is necessary to use an action
746
(or identity) to provide a result. 
747
</P>
748
<P>
749
An item is an identity, or a call to a (possibly anonymous) rule,
750
a terminal, an action or a predicate.  An identity looks like an assignment
751
in conventional programming languages: 
752
<PRE>
753
	( a, b, c ) = ( d, e, f ) ;
754
</PRE>
755
Each tuple must contain the same number of names.  In the case of
756
a one-tuple, the parentheses may be dropped, e.g.: 
757
<PRE>
758
	a = d ;
759
</PRE>
760
Note that this is a binding operation, not an assignment.  Each name
761
on the left hand side must be a new name.  It is illegal to redeclare
762
a name that is already in scope.  It is possible to assign to a name
763
which is already in scope by prefixing that name with the <CODE>&amp;</CODE>
764
symbol, e.g.: 
765
<PRE>
766
	( a, &amp;b, c ) = ( d, e, f ) ;
767
</PRE>
768
would assign to the name <CODE>b</CODE>, which must have been previously
769
defined (it may be a parameter; if it is a call by reference parameter,
770
then the change will propagate outside to the calling rule). 
771
</P>
772
<P>
773
It is also possible to use the <CODE>!</CODE> symbol in the result
774
tuple to ignore results, e.g.: 
775
<PRE>
776
	( a, !, b, ! ) = ( c, d, e, f ) ;
777
</PRE>
778
This is not particularly useful in an identity, but may be more useful
779
in a call to a rule, terminal or action.  A call to a terminal or
780
rule looks like a call to a function in a conventional programming
781
language, e.g.: 
782
<PRE>
783
	( a, b ) = rule1 ( c, d ) ;
784
	( e, f ) = terminal1 () ;
785
</PRE>
786
Calls to actions have the same form, except that action names are
787
surrounded by angle brackets, e.g.: 
788
<PRE>
789
	( g, h, i ) = &lt;action1&gt; ( a, e ) ;
790
</PRE>
791
In addition, one of the names in the result tuple of the call to the
792
action may be the predicate result symbol <CODE>?</CODE>, in which
793
case the action is used as a predicate (more details on predicates
794
are given later). 
795
</P>
796
<P>
797
When calling a rule, terminal or action, it is necessary to have declared
798
it (or in the case of a rule declared or defined it) before the call.
799
</P>
800
<P>
801
If a rule or action is being invoked, and it takes one or more call
802
by reference parameters, then the corresponding arguments should be
803
prefixed by the <CODE>&amp;</CODE> symbol, e.g.: 
804
<PRE>
805
	length = &lt;string-length&gt; ( &amp;string ) ;
806
</PRE>
807
If the rule, terminal or action has the zero-tuple as a result, then
808
only the right hand side of the definition is required, e.g.: 
809
<PRE>
810
	rule2 ( a, b ) ;
811
	terminal2 () ;
812
	&lt;action2&gt; ( c, d ) ;
813
</PRE>
814
If the rule, terminal or action has the zero-tuple as a parameter,
815
then the parameter tuple may be omitted, e.g.: 
816
<PRE>
817
	( a, b ) = rule3 ;
818
	terminal3 ;
819
	c = &lt;action3&gt; ;
820
</PRE>
821
In older versions of <CODE>sid</CODE>, it used to be possible to have
822
ambiguous items, e.g.: 
823
<PRE>
824
	a = b ;
825
</PRE>
826
where <CODE>b</CODE> was both a rule and a name.  As local names may
827
not shadow non-local and global names, then this is no longer a problem.
828
</P>
829
<P>
830
In each case, the data flow through the rule is indicated using names.
831
In the previous example of a production, both alternatives have the
832
same data flow: the call to <CODE>mul-expr</CODE> returns one value,
833
which is stored in the name <CODE>v1</CODE>, and the call to <CODE>expr</CODE>
834
returns one value, which is stored in the name <CODE>v2</CODE>.  Both
835
of these names are passed to the action (<CODE>add</CODE> in the first
836
alternative, <CODE>subtract</CODE> in the second), which returns a
837
value into the name <CODE>v</CODE> (presumably the sum or difference
838
of the previous two values).  The name <CODE>v</CODE> is the name
839
of the result, so this will be returned as the result of the rule.
840
The exception handler (which is invoked if something fails) calls
841
the action 
842
<CODE>ex</CODE> to produce the result <CODE>v</CODE>. 
843
</P>
844
<P>
845
It is necessary that the types of the data flow through the production
846
are correct.  It is also necessary to define all of the result names
847
for the production in each of the alternatives in that production.
848
</P>
849
<P>
850
An anonymous rule is written in the same way as the body of a normal
851
rule, e.g.: 
852
<PRE>
853
	list : () -&gt; ( l : ListT ) = {
854
		n = number ;
855
		/* START ANONYMOUS RULE */ {
856
			? = &lt;at-eof&gt; ;
857
			l = &lt;make-list&gt; ( n ) ;
858
		    ||
859
			comma ;
860
			l1 = list ;
861
			l  = &lt;cons&gt; ( n, l1 ) ;
862
		    ##
863
			l = &lt;com-or-eof&gt; ( n ) ;
864
		} ; /* END ANONYMOUS RULE */
865
	} ;
866
</PRE>
867
An anonymous rule is always inlined. 
868
</P>
869
<P>
870
The rule name may be followed by a sequence of definitions and declarations
871
surrounded by the <CODE>[</CODE> and <CODE>]</CODE> symbols (which
872
are followed by the rest of the rule).  In this case, the definitions
873
are local to the rule, e.g.: 
874
<PRE>
875
	x-list [
876
		x = {
877
			terminal1 ;
878
			terminal2 ;
879
		    ||
880
			terminal3 ;
881
		} ;
882
	] = {
883
		x ;
884
	    ||
885
		x ;
886
		x-list ;
887
	} ;
888
</PRE>
889
In this case, the rule <CODE>x</CODE> would be local to the rule 
890
<CODE>x-list</CODE> and no other rule would be able to use it.  In
891
error messages, the name of the rule <CODE>x</CODE> would be written
892
as 
893
<CODE>x-list::x</CODE>.  All declarations and definitions that occur
894
inside of the <CODE>[</CODE> and <CODE>]</CODE> symbols have the scope
895
of the enclosing rule, unless they are preceded by the <CODE>::</CODE>
896
symbol, in which case they have global scope.  This is particularly
897
necessary for actions, as actions can only be defined with global
898
scope. 
899
</P>
900
<P>
901
It is also possible to define non-local names.  These are declared
902
as an identifier (the name), followed by the <CODE>:</CODE> symbol,
903
followed by another identifier (its type), in a similar manner to
904
an entry in a type tuple.  Non-local names are not allowed at the
905
outermost level (so they may not be prefixed with the <CODE>::</CODE>
906
symbol either).  When a non-local name is defined, it is in scope
907
for all of the rules in its scope that are defined after it is, plus
908
its defining rule. 
909
</P>
910
<P>
911
Non-local names have their values saved on entry to their defining
912
rule, and the value will be restored when the rule is exited.  This
913
includes exiting the rule tail recursively or because of an exception
914
(if the rule has an exception handler, the non-local name will not
915
be restored until the exception handler has exited).  In almost all
916
other respects non-local names are the same as local names.  An example
917
follows: 
918
<PRE>
919
	rule1 [
920
		name1 : Type1T ;
921
		rule1_1 = {
922
			&lt;action1&gt; ( name1 ) ;
923
			rule2 ( name1 ) ;
924
		} ;
925
	] = {
926
		&lt;action2&gt; ( &amp;name1 ) ;
927
		rule1_1 ;
928
	} ;
929
</PRE>
930
It is possible to associate an initialiser with a non-local name,
931
by following the type name with a <CODE>=</CODE> symbol and the action
932
name in angle brackets, e.g.: 
933
<PRE>
934
	rule1 [
935
		name1 : Type1T = &lt;action1&gt; ;
936
	] = {
937
		// ....
938
	} ;
939
</PRE>
940
In this case the action is called at the start of the rule to initialise
941
the non-local name.  The action should return an object of the same
942
type as the non-local name.  Normally, the action takes no parameters,
943
however it may take one parameter of the same type as the non-local
944
name (or a reference to that type), in which case it will be given
945
the saved value of the non-local name as an argument (this may be
946
used to build a stack automatically for example). 
947
</P>
948
 
949
<H3><A NAME="entry-point">5.5. The grammar entry points section</A></H3>
950
<P>
951
The final section lists the entry points to the grammar.  It begins
952
with the section header, followed by a comma separated list of rule
953
names, terminated by a semicolon, e.g.: 
954
<PRE>
955
	%entry% expr ;
956
</PRE>
957
If you are going to use a rule as an entry point into the grammar
958
(i.e. you wish to call the associated function), you must list it
959
in the entry points list.  If not, the function may not exist. 
960
</P>
961
 
962
<HR>
963
<H2><A NAME="info-file">6. The C information file</A></H2>
964
<P>
965
The grammar specification itself is not sufficient to produce a parser.
966
There also needs to be output language specific information to allow
967
the parser to interface with the program it is to be part of.  In
968
the case of the C output routines, <CODE>sid</CODE> needs to know
969
the following information: 
970
<OL>
971
 
972
<LI>What code should precede and succeed the automatically generated
973
code. 
974
<LI>How to map the <CODE>sid</CODE> identifiers into C identifiers.
975
 
976
<LI>How to do assignments for each type. 
977
<LI>How to get the current terminal number. 
978
<LI>How to get the result of the current terminal. 
979
<LI>How to advance the lexical analyser, to get the next terminal.
980
 
981
<LI>What the actions are defined as, and how to pass parameters to
982
them. 
983
<LI>How to save and restore the current terminal when an error occurs.
984
 
985
</OL>
986
Eventually almost all of this should be user suppliable.  At the moment,
987
some of the information is supplied by the user in the C information
988
file, some through macros, and some is built in.  <CODE>sid</CODE>
989
currently gets the information as follows: 
990
</P>
991
<P>
992
1.  The C information file has a header and a trailer section, which
993
define code that precedes and succeeds the code that <CODE>sid</CODE>
994
generates. 
995
</P>
996
<P>
997
2.  The C information file has a section that allows the user to specify
998
mappings from <CODE>sid</CODE> identifiers into C identifiers.  These
999
are only valid for the following types of identifiers: types, functions
1000
(implementations of rules) and terminals.  For other identifier types
1001
(or when no mapping is supplied), <CODE>sid</CODE> uses some default
1002
rules: 
1003
<P>
1004
Firstly, <CODE>sid</CODE> applies a transform to the <CODE>sid</CODE>
1005
identifier name, to make it a legal C identifier.  At present this
1006
maps 
1007
<CODE>_</CODE> to <CODE>__</CODE>, <CODE>-</CODE> to <CODE>_H</CODE>
1008
and 
1009
<CODE>:</CODE> (this occurs in scoped names) to <CODE>_C</CODE>. 
1010
All other characters are left unmodified.  This transform cannot be
1011
changed. 
1012
</P>
1013
<P>
1014
<CODE>sid</CODE> also puts a prefix before all identifiers, to try
1015
to prevent clashes (and also to make automatically generated - i.e.
1016
numeric - identifiers legal).  These prefixes can be redefined for
1017
each class of identifier, in the C information file.  They should
1018
be chosen so as not to clash with any other identifiers (i.e. no other
1019
identifiers should begin with that prefix). 
1020
</P>
1021
<P>
1022
By default, the following prefixes are used: 
1023
</P>
1024
<P>
1025
<DL>
1026
 
1027
<DT><CODE>ZT</CODE>
1028
<DD>
1029
This prefix is used before type identifiers, for the type name itself.
1030
 
1031
<DT><CODE>ZR</CODE>
1032
<DD>
1033
This prefix is used before rule identifiers, for the rule's implementation
1034
function. 
1035
<DT><CODE>ZL</CODE>
1036
<DD>
1037
This prefix is used before rule identifiers, for the rule's label
1038
when tail recursion is being eliminated.  In this case, a number is
1039
added to the suffix before the identifier name, to prevent clashes
1040
when a rule is inlined twice in the same function.  It is also used
1041
before other labels that are automatically generated and are just
1042
numbered. 
1043
<DT><CODE>ZI</CODE>
1044
<DD>
1045
This prefix is used before name identifiers used as parameters to
1046
functions, or in normal usage.  It is also used by non-local names
1047
(which doesn't cause a problem as they always occur scoped, and local
1048
names never do). 
1049
<DT><CODE>ZO</CODE>
1050
<DD>
1051
This prefix is used before name identifiers used as results of functions.
1052
Results are passed as reference parameters, and this suffix is used
1053
then.  Another identifier with the <CODE>ZI</CODE> prefix is also
1054
used within the function, and the type reference assignment operator
1055
is used at the end of the function to assign the results to the reference
1056
parameters. 
1057
<DT><CODE>ZB</CODE>
1058
<DD>
1059
This prefix is used before the terminal symbol names in the generated
1060
header file. 
1061
</DL>
1062
</P>
1063
 
1064
<P>
1065
3.  Normally, <CODE>sid</CODE> will do assignments using the C assignment
1066
operator.  Sometimes, this will not do the right thing, so the user
1067
can define a set of assignment operations for any type in the C information
1068
file. 
1069
</P>
1070
<P>
1071
4.  <CODE>sid</CODE> expects the <CODE>CURRENT_TERMINAL</CODE> macro
1072
to be defined, and its definition should return an integer that is
1073
the current terminal.  The macro should be an expression, not a statement.
1074
</P>
1075
<P>
1076
5.  It is necessary to define how to extract the results of all terminals
1077
in the C information file (if a terminal doesn't return anything,
1078
then it is not necessary to define how to get the result). 
1079
</P>
1080
<P>
1081
6.  <CODE>sid</CODE> expects the <CODE>ADVANCE_LEXER</CODE> macro
1082
to be defined, and its definition should cause the lexical analyser
1083
to read a new token.  The new terminal number should be accessible
1084
through the 
1085
<CODE>CURRENT_TERMINAL</CODE> macro.  On entry into the parser 
1086
<CODE>CURRENT_TERMINAL</CODE> should give the first terminal number.
1087
</P>
1088
<P>
1089
7.  All actions, and their parameter and result names are defined
1090
in the C information file. 
1091
</P>
1092
<P>
1093
8.  <CODE>sid</CODE> expects the <CODE>SAVE_LEXER</CODE> and 
1094
<CODE>RESTORE_LEXER</CODE> macros to be defined.  The first is called
1095
with an argument which is the error terminal value.  The macro should
1096
save the current terminal's value, and set the current terminal to
1097
be the error terminal value.  The second macro is called without arguments,
1098
and should restore the saved value of the current terminal. 
1099
<CODE>SAVE_LEXER</CODE> will never be called more than once without
1100
a call to <CODE>RESTORE_LEXER</CODE>, so the save stack only needs
1101
one element. 
1102
</P>
1103
<P>
1104
The remainder of this section describes the layout of the C information
1105
file.  The lexical conventions are described first, followed by a
1106
description of the sections in the order in which they should appear.
1107
Unlike the <CODE>sid</CODE> grammar file, not all sections are mandatory.
1108
</P>
1109
 
1110
<H3><A NAME="info-lex">6.1. Lexical conventions</A></H3>
1111
<P>
1112
The lexical conventions of the C information file are very similar
1113
to those of the <CODE>sid</CODE> grammar file.  There is a second
1114
class of identifier: the C identifier, which is a subset of the valid
1115
<CODE>sid</CODE> identifiers; there is also the C code block. 
1116
</P>
1117
<P>
1118
A C code block begins with <CODE>@{</CODE> and is terminated by 
1119
<CODE>@}</CODE>.  The code block consists of all of the characters
1120
between the start and end of the code block, subject to substitutions.
1121
All substitutions begin with the <CODE>@</CODE> character.  The following
1122
substitutions are recognised: 
1123
</P>
1124
<P>
1125
<DL>
1126
 
1127
<DT><CODE>@@</CODE>
1128
<DD>
1129
This substitutes the <CODE>@</CODE> character itself. 
1130
<P>
1131
 
1132
<DT><CODE>@:</CODE><I>label</I>
1133
<DD>
1134
This form marks a label, which will be substituted for in the output
1135
code.  This is necessary, because an action may be inlined into the
1136
same function more than once.  If this happens, then without doing
1137
label substitution there would be two identical labels in the same
1138
scope. With label substitution, this problem is avoided.  In general,
1139
all references to a label within an action should be prefixed with
1140
<CODE>@:</CODE>.  This substitution may not be used in header and
1141
trailer code blocks. 
1142
<P>
1143
 
1144
<DT><CODE>@</CODE><I>identifier</I>
1145
<DD>
1146
This form marks a parameter or result identifier substitution.  If
1147
parameter and result identifiers are not prefixed with an <CODE>@</CODE>
1148
character, then they will not be substituted.  It is an error if the
1149
identifier is not a parameter or a result.  Header and trailer code
1150
blocks have no parameters or results, so it is always an error to
1151
use identifier substitution in them.  It is an error if any of the
1152
result identifiers are not substituted at least once. 
1153
<P>
1154
Result identifiers may be assigned to using this form of identifier
1155
substitution, but parameter identifiers may not be (nor may there
1156
address be taken - they are immutable).  To try to prevent this, parameters
1157
that are substituted may be cast to their own type, which makes them
1158
unmodifiable in ISO C (see the notes on the 
1159
<A HREF="#casts"><CODE>casts</CODE> language specific option</A>).
1160
</P>
1161
 
1162
<DT><CODE>@&amp;</CODE><I>identifier</I>
1163
<DD>
1164
This form marks a parameter identifier whose address is to be substituted,
1165
but whose contents will not be modified.  The effects of modifying
1166
the identifier are undefined.  It is an error to use this in parameter
1167
assignment operator definitions. 
1168
<P>
1169
 
1170
<DT><CODE>@=</CODE><I>identifier</I>
1171
<DD>
1172
This form marks a parameter identifier that will be modified.  For
1173
this to be useful, the parameter should be a call by reference parameter,
1174
so that the effect of the modification will be propagated.  This substitution
1175
is only valid in actions (assignment operators are not allowed to
1176
modify their parameters). 
1177
<P>
1178
 
1179
<DT><CODE>@!</CODE>
1180
<DD>
1181
This form marks an exception raise.  In the generated code, a jump
1182
to the current exception handler will be substituted.  This substitution
1183
is only valid in actions. 
1184
<P>
1185
 
1186
<DT><CODE>@.</CODE>
1187
<DD>
1188
This form marks an attempt to access the current terminal.  This substitution
1189
is only valid in actions. 
1190
<P>
1191
 
1192
<DT><CODE>@&gt;</CODE>
1193
<DD>
1194
This form marks an attempt to advance the lexical analyser.  This
1195
substitution is only valid in actions. 
1196
</DL>
1197
</P>
1198
<P>
1199
All other forms are illegal.  Note that in the case of labels and
1200
identifiers, no white space is allowed between the <CODE>@:</CODE>,
1201
<CODE>@</CODE>, <CODE>@&amp;</CODE> or <CODE>@=</CODE> and the identifier
1202
name.  An example of a code block is: 
1203
<PRE>
1204
	@{
1205
		/* A code block */
1206
		{
1207
			int i ;
1208
			if ( @param ) {
1209
				@! ;
1210
			}
1211
			@result = 0 ;
1212
			for ( i = 0 ; i &lt; 100 ; i++ ) {
1213
				printf ( &quot;{%d}\n&quot;, i ) ;
1214
				@result += i ;
1215
			}
1216
			@=param += @result ;
1217
			if ( @. == TOKEN_SEMI ) {
1218
				@&gt; ;
1219
			}
1220
		}
1221
	@}
1222
</PRE>
1223
</P>
1224
 
1225
<H3><A NAME="prefixes">6.2. The prefixes section</A></H3>
1226
<P>
1227
The first section in the C information file is the prefix definition
1228
section.  This section is optional.  It begins with the section header,
1229
followed by a list of prefix definitions.  A prefix definition begins
1230
with the prefix name, followed by a <CODE>=</CODE> symbol, followed
1231
by a C identifier that is the new prefix, and terminated by a semicolon.
1232
The following example shows all of the prefix names, and their default
1233
values: 
1234
<PRE>
1235
	%prefixes%
1236
	type		= ZT ;
1237
	function	= ZR ;
1238
	label		= ZL ;
1239
	input		= ZI ;
1240
	output		= ZO ;
1241
	terminal	= ZB ;
1242
</PRE>
1243
</P>
1244
 
1245
<H3><A NAME="maps">6.3. The maps section</A></H3>
1246
<P>
1247
The section that follows the prefixes section is the maps section.
1248
This section is also optional.  It begins with its section header,
1249
followed by a list of identifier mappings.  An identifier mapping
1250
begins with a 
1251
<CODE>sid</CODE> identifier (either a type, a rule or a terminal),
1252
followed by the <CODE>-&gt;</CODE> symbol, followed by the C identifier
1253
it is to be mapped to, and terminated by a semicolon.  An example
1254
follows: 
1255
<PRE>
1256
	%maps%
1257
	NumberT		-&gt; unsigned ;
1258
	calculator	-&gt; calculator ;
1259
</PRE>
1260
Note that it is not possible to map type identifiers to be arbitrary
1261
C types.  It will be necessary to <CODE>typedef</CODE> or macro define
1262
the type name in the C file. 
1263
</P>
1264
<P>
1265
It is recommended that all types, terminals and entry point rules
1266
have their names mapped in this section, although this is not necessary.
1267
If the names are not mapped, they will have funny names in the rest
1268
of the program. 
1269
</P>
1270
 
1271
<H3><A NAME="header">6.4. The header section</A></H3>
1272
<P>
1273
After the maps section comes the header section.  This begins with
1274
the section header, followed by a code block, followed by a comma,
1275
followed by a second code block, and terminated with a semicolon.
1276
The first code block will be inserted at the beginning of the generated
1277
parser file; the second code block will be inserted at the start of
1278
the generated header file.  An example is: 
1279
<PRE>
1280
	%header% @{
1281
	#include &quot;lexer.h&quot;
1282
 
1283
	LexerT token ;
1284
 
1285
	#define CURRENT_TERMINAL	token.t
1286
	#define ADVANCE_LEXER		next_token ()
1287
 
1288
	extern void terminal_error () ;
1289
	extern void syntax_error () ;
1290
	@}, @{
1291
	@} ;
1292
</PRE>
1293
</P>
1294
 
1295
<H3><A NAME="assign">6.5. The assignments section</A></H3>
1296
<P>
1297
The assignments section follows the header section.  This section
1298
is optional.  Normally, assignment between two identifiers will be
1299
done using the C assignment operator.  In some cases this will not
1300
do the correct thing, and it is necessary to do the assignment differently.
1301
All types for which this applies should have an entry in the assignments
1302
section.  The section begins with its header, followed by definitions
1303
for each type that needs its own assignment operator.  Each definition
1304
should have one parameter, and one result.  The action's name should
1305
be the name of the type.  An example follows: 
1306
<PRE>
1307
	%assignments%
1308
 
1309
	ListT : ( l1 ) -&gt; ( l2 ) = @{
1310
		if ( @l2.head = @l1.head ) {
1311
			@l2.tail = @l1.tail ;
1312
		} else {
1313
			@l2.tail = &amp;( @l2.head ) ;
1314
		}
1315
	@} ;
1316
</PRE>
1317
If a type has an assignment operator defined, it must also have a
1318
parameter assignment operator type defined and a result assignment
1319
operator defined (more precisely it must have either no assignment
1320
operations defined, or all three assignment operations defined). 
1321
</P>
1322
 
1323
<H3><A NAME="param-assign">6.6. The parameter assignments section</A></H3>
1324
<P>
1325
The parameter assignments section is very similar to the assignments
1326
section (which it follows), and is also optional.  If a type has an
1327
assignment section entry, it must have a parameter assignment entry
1328
as well. 
1329
</P>
1330
<P>
1331
The parameter assignment operator is used in function calls to ensure
1332
that the object is copied correctly: if no parameter assignment operator
1333
is provided for a type, the standard C call by copy mechanism is used;
1334
if a parameter assignment operator is provided for a type, then the
1335
address of the object is passed by the calling function, and the called
1336
function declares a local of the same type, and uses the parameter
1337
assignment operator to copy the object (this should be remembered
1338
when passing parameters to entry points that have arguments of a type
1339
that has a parameter assignment operator defined). 
1340
</P>
1341
<P>
1342
The difference between the parameter assignment operator and the assignment
1343
operator is that the parameter identifier to the parameter assignment
1344
operator is a pointer to the object being manipulated, rather than
1345
the object itself.  An example reference assignment section is: 
1346
<PRE>
1347
	%parameter-assignments%
1348
 
1349
	ListT : ( l1 ) -&gt; ( l2 ) = @{
1350
		if ( @l2.head = @l1-&gt;head ) {
1351
			@l2.tail = @l1-&gt;tail ;
1352
		} else {
1353
			@l2.tail = &amp;( @l2.head ) ;
1354
		}
1355
	@} ;
1356
</PRE>
1357
</P>
1358
 
1359
<H3><A NAME="result-assign">6.7. The result assignments section</A></H3>
1360
<P>
1361
The result assignments section is very similar to the assignments
1362
section and the parameter assignments section (which it follows),
1363
and is also optional.  If a type has an assignment section entry,
1364
it must also have a result assignment entry.  The only difference
1365
between the two is that the result identifier of the result assignment
1366
operation is a pointer to the object being manipulated, rather than
1367
the object itself. Result assignments are only used when the results
1368
of a rule are assigned back through the reference parameters passed
1369
into the function.  An example result assignment section is: 
1370
<PRE>
1371
	%result-assignments%
1372
 
1373
	ListT : ( l1 ) -&gt; ( l2 ) = @{
1374
		if ( @l2-&gt;head = @l1.head ) {
1375
			@l2-&gt;tail = @l1.tail ;
1376
		} else {
1377
			@l2-&gt;tail = &amp;( @l2-&gt;head ) ;
1378
		}
1379
	@} ;
1380
</PRE>
1381
</P>
1382
 
1383
<H3><A NAME="term-result">6.8. The terminal result extraction section</A></H3>
1384
<P>
1385
The terminal result extraction section follows the reference assignment
1386
section.  It defines how to extract the results from terminals.  The
1387
section begins with its section header, followed by the terminal extraction
1388
definitions. 
1389
</P>
1390
<P>
1391
There must be a definition for every terminal in the grammar that
1392
returns a result.  It is an error to include a definition for a terminal
1393
that doesn't return a result.  The result of the definition should
1394
be the same as the result of the terminal.  An example of the terminal
1395
result extraction section follows: 
1396
<PRE>
1397
	%terminals%
1398
 
1399
	number : () -&gt; ( n ) = @{
1400
		@n = token.u.number ;
1401
	@} ;
1402
 
1403
	identifier : () -&gt; ( i ) = @{
1404
		@i = token.u.identifier ;
1405
	@} ;
1406
 
1407
	string : () -&gt; ( s ) = @{
1408
		@s = token.u.string ;
1409
	@} ;
1410
</PRE>
1411
</P>
1412
 
1413
<H3><A NAME="action-defn">6.9. The action definition section</A></H3>
1414
<P>
1415
The action definition section follows the terminal result extractor
1416
definition section.  The format is similar to the previous sections:
1417
the section header followed by definitions for all of the actions.
1418
An action definition has the following form: 
1419
<PRE>
1420
	&lt;action-name&gt; : ( parameters ) -&gt; ( results ) = code-block ;
1421
</PRE>
1422
This is similar to the form of all previous definitions, except that
1423
the name is surrounded in angle brackets.  What follows is also true
1424
of the other definitions as well (unless they state otherwise). 
1425
</P>
1426
<P>
1427
The <CODE>action-name</CODE> is a <CODE>sid</CODE> identifier that
1428
is the name of the action being defined; <CODE>parameters</CODE> is
1429
a comma separated list of C identifiers that will be the names of
1430
the parameters passed to the action, and <CODE>results</CODE> is a
1431
comma separated list of C identifiers that will be the names of the
1432
result parameters passed to the action.  The <CODE>code-block</CODE>
1433
is the C code that defines the action.  It is expected that this will
1434
assign a valid result to each of the result identifier names. 
1435
</P>
1436
<P>
1437
The parameter and result tuples have the same form as in the language
1438
independent file, except that the types are optional.  Like the language
1439
independent file, if the type of an action is zero-tuple to zero-tuple,
1440
then the type can be omitted, e.g.: 
1441
<PRE>
1442
	&lt;action&gt; = @{ /* .... */ @} ;
1443
</PRE>
1444
An example action definition section is: 
1445
<PRE>
1446
	%actions%
1447
 
1448
	&lt;add&gt; : ( v1, v2 ) -&gt; ( v3 ) = @{
1449
		@v3 = @v1 + @v2 ;
1450
	@} ;
1451
 
1452
	&lt;subtract&gt; : ( v1 : NumberT, v2 : NumberT ) -&gt; ( v3 : NumberT ) = @{
1453
		@v3 = @v1 - @v2 ;
1454
	@} ;
1455
 
1456
	&lt;multiply&gt; : ( v1 : NumberT, v2 ) -&gt; ( v3 ) = @{
1457
		@v3 = @v1 * @v2 ;
1458
	@} ;
1459
 
1460
	&lt;divide&gt; : ( v1, v2 ) -&gt; ( v3 : NumberT ) = @{
1461
		@v3 = @v1 / @v2 ;
1462
	@} ;
1463
 
1464
	&lt;print&gt; : ( v ) -&gt; () = @{
1465
		printf ( &quot;%u\n&quot;, @v ) ;
1466
	@} ;
1467
 
1468
	&lt;error&gt; = @{
1469
		fprintf ( stderr, &quot;ERROR\n&quot; ) ;
1470
		exit ( EXIT_FAILURE ) ;
1471
	@} ;
1472
</PRE>
1473
Do not define static variables in action definitions; if you do, you
1474
will get unexpected results.  If you wish to use static variables
1475
in actions definitions, then define them in the header block. 
1476
</P>
1477
 
1478
<H3><A NAME="trailer">6.10. The trailer section</A></H3>
1479
<P>
1480
After the action definition section comes the trailer section.  This
1481
has the same form as the header section.  An example is: 
1482
<PRE>
1483
	%trailer% @{
1484
	int main ()
1485
	{
1486
		next_token () ;
1487
		calculator ( NULL ) ;
1488
		return 0 ;
1489
	}
1490
	@}, @{
1491
	@} ;
1492
</PRE>
1493
The code blocks will be appended to the generated parser, and the
1494
generated header file respectively. 
1495
</P>
1496
 
1497
<HR>
1498
<H2><A NAME="predicate">7. Predicates</A></H2>
1499
<P>
1500
Predicates provide the user with a mechanism for altering the control
1501
flow in a manner that terminals alone cannot do. 
1502
</P>
1503
<P>
1504
During the factorisation process, rules that begin with predicates
1505
are expanded if necessary to ensure that predicates that may be used
1506
to select which alternative to go down always begin the alternative,
1507
e.g.: 
1508
<PRE>
1509
	rule1 = {
1510
		rule2 ;
1511
		/* .... */
1512
	    ||
1513
		/* .... */
1514
	} ;
1515
 
1516
	rule2 = {
1517
		? = &lt;predicate&gt; ;
1518
		/* .... */
1519
	    ||
1520
		/* .... */
1521
	} ;
1522
</PRE>
1523
would be expanded into: 
1524
<PRE>
1525
	rule1 = {
1526
		? = predicate ;
1527
		/* .... */
1528
		/* .... */
1529
	    ||
1530
		/* .... */
1531
		/* .... */
1532
	    ||
1533
		/* .... */
1534
	} ;
1535
</PRE>
1536
Also, if a predicate is used to select which alternative to use, it
1537
must be the first thing in the alternative, so the following would
1538
not be allowed: 
1539
<PRE>
1540
	rule = {
1541
		&lt;action&gt; ;
1542
		? = &lt;predicate&gt; ;
1543
		/* .... */
1544
	    ||
1545
		/* .... */
1546
	} ;
1547
</PRE>
1548
When predicates begin a rule, they are executed (in some arbitrary
1549
order) until one of them returns true.  The alternative that this
1550
predicate begins is then selected.  If no predicates return true,
1551
then one of the remaining alternatives is selected based upon the
1552
current terminal (or an error occurs). 
1553
</P>
1554
<P>
1555
It is important that predicates do not contain dependencies upon the
1556
order of evaluation.  In practice, predicates are likely to be simple,
1557
so this shouldn't be a problem. 
1558
</P>
1559
<P>
1560
When predicates are used within an alternative, they behave like terminals.
1561
If they evaluate to true, then parsing continues.  If they evaluate
1562
to false, then an exception is raised. 
1563
</P>
1564
 
1565
<HR>
1566
<H2><A NAME="exception">8. Error handling</A></H2>
1567
<P>
1568
If the input given to the parser is valid, then the parser will not
1569
need to produce any errors.  Unfortunately this is not always the
1570
case, so 
1571
<CODE>sid</CODE> provides a mechanism for handling errors. 
1572
</P>
1573
<P>
1574
When an error occurs, an exception is raised.  This passes control
1575
to the nearest enclosing exception handler.  If there is no exception
1576
handler at all, the entry point function will return with the current
1577
terminal set to the error value. 
1578
</P>
1579
<P>
1580
An exception handler is just an alternative that is executed when
1581
a terminal or predicate fails.  This should obviate the need to rely
1582
upon language specific mechanisms (such as <CODE>setjmp</CODE> and
1583
<CODE>longjmp</CODE>) for error recovery. 
1584
</P>
1585
 
1586
<HR>
1587
<H2><A NAME="call-by-ref">9. Call by reference</A></H2>
1588
<P>
1589
The default behaviour of <CODE>sid</CODE> is to do argument passing
1590
using call by copy semantics, and to not allow mutation of parameters
1591
of rules and actions (however inlined rules, and rules created during
1592
factoring have call by reference parameters).  However it is possible
1593
to give rule and action parameters call by reference semantics, using
1594
the 
1595
<CODE>&amp;</CODE> symbol in the type specification (as described
1596
earlier).  It is also possible to mutate parameters of actions, using
1597
the <CODE>@=</CODE> substitution in the action body (also described
1598
earlier).  It is important to do the correct substitutions in action
1599
definitions, as <CODE>sid</CODE> uses this information to decide where
1600
it can optimise the output code. 
1601
</P>
1602
<P>
1603
If a call by copy parameter is mutated, then <CODE>sid</CODE> will
1604
introduce a new temporary variable and copy the parameter into it
1605
- this temporary will then be mutated.  Similar code will be output
1606
for rules that have call by copy parameters that are mutated (e.g.
1607
as a call by reference argument to an action that mutates its parameters).
1608
</P>
1609
 
1610
<HR>
1611
<H2><A NAME="call-entry">10. Calling entry points</A></H2>
1612
<P>
1613
When calling a function that implements an entry point rule, it should
1614
be called with the rule's parameters as the first arguments, followed
1615
by the addresses of the rule's results as the remaining arguments.
1616
The parameters should have their addresses passed if they are of a
1617
type that has a parameter assignment operator defined, or if the parameter
1618
is a call by reference parameter. 
1619
</P>
1620
<P>
1621
For example, given the following rule: 
1622
<PRE>
1623
	rule1 : ( :Type1T, :Type2T, :Type3T &amp; ) -&gt; ( :Type4T ) ;
1624
</PRE>
1625
where <CODE>Type2T</CODE> has a parameter assignment operator defined,
1626
and <CODE>rule1</CODE> is mapped to <CODE>rule1</CODE> (and the type
1627
names are mapped to themselves), the call would be something like:
1628
<PRE>
1629
	Type1T a = make_type1 () ;
1630
	Type2T b = make_type2 () ;
1631
	Type3T c = make_type3 () ;
1632
	Type4T d ;
1633
 
1634
	rule1 ( a, b, &amp;c, &amp;d ) ;
1635
</PRE>
1636
</P>
1637
 
1638
<HR>
1639
<H2><A NAME="glossary">11. Glossary</A></H2>
1640
<P>
1641
This section describes some of the terms used in the <CODE>sid</CODE>
1642
documentation. 
1643
<DL>
1644
 
1645
<DT><B>Alternative</B>
1646
<DD>
1647
An alternative is a sequence of items. 
1648
<P>
1649
 
1650
<DT><B>Exception handler</B>
1651
<DD>
1652
An exception handler is a special type of alternative.  Each rule
1653
may have at most one exception handler.  An exception handler is invoked
1654
if the current terminal does not match any of the expected terminals,
1655
if a predicate fails, or if an action raises an exception, within
1656
the scope of the exception handler. 
1657
<P>
1658
 
1659
<DT><B>Expansion</B>
1660
<DD>
1661
This is the process of physically inlining a rule into another rule.
1662
It is done during the factoring process to turn the grammar into a
1663
form that a parser can be produced for.  See the entry for factoring.
1664
<P>
1665
 
1666
<DT><B>Factoring</B>
1667
<DD>
1668
This is one of the transforms that <CODE>sid</CODE> performs on the
1669
grammar. See the <A HREF="#overview">overview section</A> for a description
1670
of the factoring process. 
1671
<P>
1672
 
1673
<DT><B>First set</B>
1674
<DD>
1675
The first set of a rule (or alternative) is the set of terminals and
1676
predicates that can start that rule (or alternative). 
1677
<P>
1678
 
1679
<DT><B>Follow set</B>
1680
<DD>
1681
The follow set of a rule is the set of terminals and predicates that
1682
can follow the rule in any of its invocations. 
1683
<P>
1684
 
1685
<DT><B>Inlining</B>
1686
<DD>
1687
This is the process of outputting the code for parsing one rule within
1688
the function that parses another rule.  This is normally done as part
1689
of the output process.  Expansion is a form of inlining performed
1690
during the factoring process, but the inlining is done by modifying
1691
the grammar, rather than as part of the output phase. 
1692
<P>
1693
 
1694
<DT><B>Item</B>
1695
<DD>
1696
An item is the equivalent of a statement in a conventional programming
1697
language.  It can be an invocation of a rule, terminal, action or
1698
predicate, or an identity operation (assignment). 
1699
<P>
1700
 
1701
<DT><B>Name</B>
1702
<DD>
1703
A name is an identifier that is used to pass information between rules
1704
and actions.  Local names are defined within a rule, and only exist
1705
within the rule itself.  Non-local names are defined in a rule's scoped
1706
definitions section and exists in all of the rules in that scope.
1707
Non-local rules are also saved and restored across calls to the rule
1708
that defines them. 
1709
<P>
1710
 
1711
<DT><B>Recursion</B>
1712
<DD>
1713
Recursion is where a rule invokes itself.  Direct recursion is where
1714
the rule invokes itself from one of its own alternatives; indirect
1715
recursion is where a rule invokes another rule (which invokes another
1716
rule etc.) which eventually invokes the original rule. 
1717
</P>
1718
<P>
1719
Left recursion is a form of recursion where all of the recursive calls
1720
occur as the first item in an alternative.  It is not possible to
1721
produce a parser for a grammar that contains left recursions, so 
1722
<CODE>sid</CODE> turns left recursions into right recursions.  This
1723
process is known as left recursion elimination. 
1724
</P>
1725
<P>
1726
Right recursion is a form of recursion where all of the recursive
1727
calls occur as the last item in an alternative. 
1728
</P>
1729
 
1730
<DT><B>Production</B>
1731
<DD>
1732
See rule. 
1733
<P>
1734
 
1735
<DT><B>Rule</B>
1736
<DD>
1737
A rule is a sequence of alternatives.  A rule may contain a special
1738
alternative that is used as an exception handler.  A rule is also
1739
referred to as a production; this term is normally used when talking
1740
about the definition of a rule. 
1741
<P>
1742
 
1743
<DT><B>See-through</B>
1744
<DD>
1745
A rule is said to be see-through if there is an expansion of the rule
1746
that does not contain any terminals or predicates. 
1747
</DL>
1748
</P>
1749
 
1750
<HR>
1751
<H2><A NAME="errors">12. Understanding error messages</A></H2>
1752
<P>
1753
This section tries to explain what some of the error messages that
1754
are reported by the <CODE>sid</CODE> transforms mean.  It does not
1755
contain descriptions of messages like &quot;type 'some type' is unknown&quot;,
1756
as these should be self-explanatory. 
1757
</P>
1758
 
1759
<H3><A NAME="left-errors">12.1. Left recursion elimination errors</A></H3>
1760
 
1761
<P>
1762
<B>The parameter or result types of the left recursive calls in the
1763
following productions do not match: <I>PRODUCTIONS</I></B>: This means
1764
that there is a set of rules which call each other left recursively
1765
(i.e. the first item in some of the alternatives in each rule is a
1766
call to another rule in the set), and they do not all have the same
1767
parameter and result types, e.g.: 
1768
<PRE>
1769
	rule1 : ( a : Type1T, b : Type1T, c : Type2T, d : Type2T ) -&gt; () = {
1770
		rule2 ( a, b ) ;
1771
	    ||
1772
		terminal1 ;
1773
	} ;
1774
 
1775
	rule2 : ( a : Type1T, b : Type2T ) -&gt; () = {
1776
		rule1 ( a, a, b, b ) ;
1777
	    ||
1778
		terminal2 ;
1779
	} ;
1780
</PRE>
1781
</P>
1782
 
1783
<P>
1784
<B>The exception handlers in the left recursion involving the following
1785
productions do not match: <I>PRODUCTIONS</I></B>: This means that
1786
there is a set of productions which call each other left recursively
1787
(i.e. the first item in an alternative is a call to another production
1788
in the set), and they do not all have the same exception handler,
1789
e.g.: 
1790
<PRE>
1791
	rule1 = {
1792
		rule2 ;
1793
	    ||
1794
		terminal1 ;
1795
	    ##
1796
		&lt;action1&gt ;
1797
	} ;
1798
 
1799
	rule2 = {
1800
		rule1 ;
1801
	    ||
1802
		terminal2 ;
1803
	    ##
1804
		&lt;action2&gt ;
1805
	} ;
1806
</PRE>
1807
It is quite likely that when using exception handlers, it may be necessary
1808
to do the left recursion elimination manually to ensure that the exception
1809
handlers occur at the correct place. 
1810
</P>
1811
 
1812
<P>
1813
<B>The argument names of the left recursive calls in the following
1814
productions do not match: <I>PRODUCTIONS</I></B>: This means that
1815
there is a set of productions which call each other left recursively
1816
(i.e. the first item in an alternative is a call to another production
1817
in the set), and the arguments to one of the left recursive calls
1818
are not the same as the parameters of the calling rule, e.g.: 
1819
<PRE>
1820
	rule1 : ( a : Type1T, b : Type1T ) -&gt; () = {
1821
		rule1 ( b, a ) ;
1822
	    ||
1823
		terminal1 ;
1824
	} ;
1825
</PRE>
1826
</P>
1827
 
1828
<P>
1829
<B>A non-local name in the rule '<I>RULE</I>' is not in scope in the
1830
rule '<I>RULE</I>' in the left recursive cycle involving the following
1831
productions: <I>PRODUCTIONS</I></B>: This means that there is a set
1832
of productions which call each other left recursively (i.e. the first
1833
item in an alternative is a call to another production in the set),
1834
and the first named rule uses a non-local name that is not in scope
1835
in the second named rule, e.g.: 
1836
<PRE>
1837
	rule1 [
1838
		name1 : Type1T ;
1839
		rule1_1 [
1840
			name1_1 : Type1T ;
1841
		] = {
1842
			rule1 ;
1843
			&lt;action1_1&gt; ( name1_1 ) ;
1844
		  ||
1845
			terminal1 ;
1846
		} ;
1847
	] = {
1848
		terminal2 ;
1849
	  ||
1850
		rule1_1 ;
1851
		&lt;action1&gt; ( name1 ) ;
1852
	} ;
1853
</PRE>
1854
</P>
1855
 
1856
<P>
1857
<B>The rule '<I>RULE</I>' declares non-local names in the left recursive
1858
cycle with more than one entry point involving the following productions:
1859
<I>PRODUCTIONS</I></B>: This means that there is a set of productions
1860
which call each other left recursively (i.e. the first item in an
1861
alternative is a call to another production in the set), and the named
1862
rule defines non-local variables even though it is not the only entry
1863
point to the cycle, e.g.: 
1864
<PRE>
1865
	rule1 [
1866
		name1 : Type1T ;
1867
		rule1_1 = {
1868
			&lt;action1_1&gt; ( name1 ) ;
1869
		} ;
1870
	] = {
1871
		terminal1 ;
1872
		rule1_1 ;
1873
	    ||
1874
		rule2 ;
1875
		&lt;action1&gt; ( name1 ) ;
1876
	} ;
1877
 
1878
	rule2 = {
1879
		rule1 ;
1880
		&lt;action2&gt ;
1881
	    ||
1882
		terminal2 ;
1883
	} ;
1884
</PRE>
1885
</P>
1886
 
1887
<P>
1888
<B>No cycle termination for the left recursive set involving the following
1889
rules: <I>RULES</I></B>: This means that there is a set of productions
1890
which call each other left recursively (i.e. the first item in an
1891
alternative is a call to another production in the set), and they
1892
do not contain an alternative that begins with a non-left recursive
1893
call, e.g.: 
1894
<PRE>
1895
	rule1 = {
1896
		rule2 ;
1897
	    ||
1898
		rule3 ;
1899
	} ;
1900
 
1901
	rule2 = {
1902
		rule1 ;
1903
	    ||
1904
		rule3 ;
1905
	} ;
1906
 
1907
	rule3 = {
1908
		rule1 ;
1909
	    ||
1910
		rule2 ;
1911
	} ;
1912
</PRE>
1913
</P>
1914
 
1915
<H3><A NAME="first-errors">12.2. First set computation errors</A></H3>
1916
 
1917
<P>
1918
<B>Cannot compute first set for <I>PRODUCTION</I></B>: This means
1919
that <CODE>sid</CODE> cannot compute the set of terminals and predicates
1920
that may start the production.  This is normally because there is
1921
a recursive call (or cycle) that contains no terminals, e.g.: 
1922
<PRE>
1923
	rule1 = {
1924
		&lt;action1&gt; ;
1925
		rule1 ;
1926
	    ||
1927
		terminal1 ;
1928
	} ;
1929
</PRE>
1930
This is not removed by the left recursion elimination phase, as the
1931
call is  not the leftmost item in the alternative. 
1932
</P>
1933
 
1934
<P>
1935
<B>Can see through to predicate '<I>PREDICATE</I>' in production 
1936
<I>PRODUCTION</I></B>: This means that there is a predicate that isn't
1937
the first item in its alternative, but is preceded only by see-through
1938
items, e.g.: 
1939
<PRE>
1940
	rule1 = {
1941
		&lt;action1&gt; ;
1942
		? = &lt;predicate&gt; ;
1943
	    ||
1944
		terminal1 ;
1945
	} ;
1946
</PRE>
1947
</P>
1948
 
1949
<P>
1950
<B>Can see through to predicates in rule '<I>RULE</I>' in production
1951
<I>PRODUCTION</I></B>: This means that the first rule has at least
1952
one predicate in its first set, and the second rule calls it in a
1953
position where it is not the first item in the alternative and is
1954
preceded only by see-through items, e.g.: 
1955
<PRE>
1956
	rule1 = {
1957
		? = &lt;predicate&gt; ;
1958
	    ||
1959
		terminal1 ;
1960
	} ;
1961
 
1962
	rule2 = {
1963
		&lt;action&gt; ;
1964
		rule1 ;
1965
	    ||
1966
		terminal2 ;
1967
	} ;
1968
</PRE>
1969
</P>
1970
 
1971
<P>
1972
<B>The rule '<I>RULE</I>' has all terminals in its first set and has
1973
a redundant see-through alternative</B>: This means that the rule's
1974
first set (the set of all terminals that can start the rule) includes
1975
all possible input terminals, and the rule also has a see-through
1976
alternative.  The see-through alternative will never be used, as one
1977
of the other alternatives will always be chosen. 
1978
</P>
1979
 
1980
<H3><A NAME="factor-errors">12.3. Factoring errors</A></H3>
1981
 
1982
<P>
1983
<B>Too many productions (<I>NUMBER</I>) created during factorisation</B>:
1984
This normally means that <CODE>sid</CODE> cannot factor the grammar.
1985
You will need to rewrite the offending part.  Unfortunately there
1986
is no easy way to do this.  Start by looking at the dump file for
1987
a set of rules that seem to have been expanded a lot. 
1988
</P>
1989
 
1990
<P>
1991
<B>The rule '<I>RULE</I>' cannot be expanded into '<I>RULE</I>' as
1992
the exception handlers don't match</B>: When <CODE>sid</CODE> performs
1993
factoring, it needs to expand calls to certain rules into the rules
1994
that calls them (this is described in the 
1995
<A HREF="#overview">overview section</A>).  If the called rule has
1996
an exception handler and it is not the same as the exception handler
1997
of the calling rule, then the expansion will fail. 
1998
</P>
1999
 
2000
<P>
2001
<B>The rule '<I>RULE</I>' cannot be expanded into '<I>RULE</I>' as
2002
it contains non-local name definitions</B>: When <CODE>sid</CODE>
2003
performs factoring, it needs to expand calls to certain rules into
2004
the rules that calls them (this is described in the 
2005
<A HREF="#overview">overview section</A>).  If the called rule defines
2006
any non-local names, then the expansion will fail. 
2007
</P>
2008
 
2009
<H3><A NAME="check-errors">12.4. Checking errors</A></H3>
2010
 
2011
<P>
2012
<B>Collision of terminal(s) <I>TERMINALS</I> in rule '<I>RULE</I>'</B>:
2013
This error means that more than one alternative in the named rule
2014
begins with the named terminals, e.g.: 
2015
<PRE>
2016
	rule1 = {
2017
		&lt;action1&gt; ;
2018
		terminal1 ;
2019
	    ||
2020
		terminal1 ;
2021
	} ;
2022
</PRE>
2023
Normally, the factoring process will remove the problem, but when
2024
something like the above happens to stop the factoring occurring,
2025
this error will be produced. 
2026
</P>
2027
 
2028
<P>
2029
<B>Collision of predicate '<I>PREDICATE</I>' in rule '<I>RULE</I>'</B>:
2030
This error occurs when more than one alternative in the named rule
2031
begins with the named predicate, e.g.: 
2032
<PRE>
2033
	rule1 = {
2034
		( a, ? ) = &lt;predicate&gt; ;
2035
		&lt;action1&gt; ( a ) ;
2036
	    ||
2037
		( ?, b ) = &lt;predicate&gt; ;
2038
		&lt;action2&gt; ( b ) ;
2039
	} ;
2040
</PRE>
2041
Again, it is normally the case that the factoring process will remove
2042
this problem, but if the same predicate uses different predicate results
2043
in different alternatives, this error will be produced. 
2044
</P>
2045
 
2046
<P>
2047
<B>The terminal(s) <I>TERMINALS</I> can start rule '<I>RULE</I>' which
2048
is see-through, and the same terminal(s) may appear in the following
2049
situations: <I>ALTERNATIVES</I></B>: This means that there are one
2050
or more terminals that can start the named rule (which is see-through),
2051
and may also follow it, e.g.: 
2052
<PRE>
2053
	rule1 = {
2054
		terminal1 ;
2055
	    ||
2056
		$ ;
2057
	} ;
2058
 
2059
	rule2 = {
2060
		rule1 ;
2061
		terminal1 ;
2062
	    ||
2063
		terminal2 ;
2064
	} ;
2065
</PRE>
2066
The alternatives listed are the alternatives which call the rule,
2067
and contain (some of) the named terminals after the call.  The call
2068
is highlighted. 
2069
</P>
2070
 
2071
<P>
2072
<B>The predicate(s) <I>PREDICATES</I> can start rule '<I>RULE</I>'
2073
which is see-through and the same predicate(s) may appear in the following
2074
situations: <I>ALTERNATIVES</I></B>: This means that there are one
2075
or more predicates that can start the named rule (which is see-through),
2076
and may also follow it, e.g.: 
2077
<PRE>
2078
	rule1 = {
2079
		? = &lt;predicate&gt; ;
2080
	    ||
2081
		$ ;
2082
	} ;
2083
 
2084
	rule2 = {
2085
		terminal1 ;
2086
		rule1 ;
2087
		? = &lt;predicate&gt; ;
2088
	    ||
2089
		terminal2 ;
2090
	} ;
2091
</PRE>
2092
The alternatives listed are the alternatives which call the rule,
2093
and contain (some of) the named predicates after the call.  The call
2094
is highlighted. 
2095
</P>
2096
 
2097
<P>
2098
<B>The rule '<I>RULE</I>' contains more than one see-through alternative</B>:
2099
This error occurs if the rule has more than one alternative that doesn't
2100
need to read a terminal or a predicate, e.g.: 
2101
<PRE>
2102
	rule1 = {
2103
		&lt;action1&gt; ;
2104
	    ||
2105
		&lt;action2&gt; ;
2106
	} ;
2107
</PRE>
2108
</P>
2109
 
2110
<HR>
2111
<P><I>Part of the <A HREF="../index.html">TenDRA Web</A>.<BR>Crown
2112
Copyright &copy; 1998.</I></P>
2113
</BODY>
2114
</HTML>