gppg grammar: how to discard a token (for example, a comment)

Mar 21, 2015 at 11:51 AM
Hi, I got the lex and y files for ANSI C (http://www.quut.com) and modified them to be gppg-compatible. All is OK, but I need a "COMMENT" token for the lexer. My parser does not know about this token, so it fails every time. I am new to parser theory, but I know about error discarding mode:
So my questions: 1) How do I use the token discard mechanism of GPPG? 2) If I don't want to use error discard mode, how can I fix the C grammar from www.quut.com for the COMMENT token?
Coordinator
Mar 22, 2015 at 11:46 PM
Edited Mar 22, 2015 at 11:57 PM
Hi fsmoke
Thanks for the enquiry. This is a common question: there are some input sequences that you want to recognize (to make sure that they are well formed) but that count as whitespace as far as the parser is concerned. There are two ways of doing this. The first is to write a rule in your lex file that recognizes the pattern but does not return a token.
You can see an example of this in the file gppg.lex. Look at lines 95 to 111.
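As a rough illustration of the first method for the COMMENT case, rules along these lines match C comments and return nothing, so the scanner simply carries on to the next token. The patterns are ordinary lex notation and are not copied from gppg.lex, so treat them as an untested sketch and check them against your own scanner.

```lex
"/*"([^*]|\*+[^*/])*\*+"/"    { /* block comment: no return, so the parser never sees it */ }
"//"[^\n]*                    { /* one-line comment: likewise skipped */ }
```

With rules like these there is no COMMENT token at all, so the grammar from www.quut.com needs no change.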

The other method is a little more complicated. If you look at any scanner file produced by GPLEX you will see that the yylex function goes ...
do {next = Scan(); } while (next >= parserMax); return next;
The idea is that the parser only wants to know about token values below parserMax. If this value is not defined then the parser gets ALL the tokens. However, if you define it in the *.y file (as the token maxParseToken in the example that follows), then none of the tokens with higher numbers will be passed on to the parser. You will see an example of this in gplex.y and gplex.lex.
In gplex.y you have, at line 29
%token maxParseToken EOL csCommentL errTok repErr
This is the last line of token declarations, so the last four have values greater than maxParseToken. This means that although the gplex.lex file says things like (line 89)
<*>{OneLineCmnt} { return (int)Tokens.csCommentL; }
the returned token will never make it out of yylex and through to the parser.
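Applied to the COMMENT token from the question, the second method might look roughly like this. It is only a sketch: it assumes COMMENT is declared on the last %token line of the .y file, after maxParseToken, and the one-line comment pattern is just a placeholder for whatever pattern your scanner uses.

In the .y file, after all the tokens that the grammar really uses:

```lex
%token maxParseToken COMMENT
```

In the .lex file:

```lex
"//"[^\n]*    { return (int)Tokens.COMMENT; }
```

Because COMMENT then has a value above maxParseToken, the loop in yylex keeps scanning past it and the grammar rules never need to mention it.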
So, why would anyone want to write a "return someToken" that the parser would never see?
Well, sometimes a scanner can have more than one client. For example, if you have a scanner that feeds both a parser and a text editor, the parser wants to skip comments, but the editor wants to see them so that it can show them in a different color. There is a bit about "colorizing scanners" in the manual, in section 1.2.1.

Best of luck with the project.
John
Marked as answer by k_john_gough on 3/22/2015 at 4:57 PM