Unexpected token, expecting EOF

Aug 23, 2013 at 10:03 PM
I'm creating a lexical analyzer and a parser that takes an input and returns a structured tree that the application can use. The lexer itself is pretty simple:
%option stack, minimize, parser, verbose, persistbuffer, unicode, compressNext, embedbuffers

andop AND|and|And
orop OR|or|Or
notop NOT|not|Not
literals \"[^"]+\"
expression [A-Za-z0-9()]+

%%
{andop} return (int)Tokens.CONJUNCTION;
{orop} return (int)Tokens.DISJUNCTION;
{notop} return (int)Tokens.NOTOPERAND;
{literals} {yylval.StringValue = yytext; return (int)Tokens.LITERAL;}
{expression} {yylval.StringValue = yytext; return (int)Tokens.EXPRESSION;}

/* skip whitespace */
[ \t\n]+ ;
%%
When I test this scanner on its own, it works fine and returns all the expected tokens. Here's the parser:
%{
    public ParameterList program = new ParameterList();
    public string field;
%}

%start program

%union{ 
public Parameter parameter;
public ParameterList parameterList;
public ParameterType type;
public CriteriaParameter criteriaParameter;
public Criterion criterion;
public CriteriaList criteriaList;
public string StringValue; 
}

%token CONJUNCTION
%token DISJUNCTION
%token COMPARISON
%token NOTOPERAND
%token <StringValue> LITERAL
%token <StringValue> EXPRESSION

%%
program             : parameterList {program = $1.parameterList;}
                    ;

parameterList       : { if ($$.parameterList == null) {$$.parameterList = new ParameterList();} }
                    | parameter { 
                                    if ($$.parameterList == null) {$$.parameterList = new ParameterList();}
                                    $$.parameterList.Add($1.parameter);
                                }
                    ;

parameter           : CONJUNCTION {$$.parameter = new OperationParameter(ParameterType.Conjunction, $$.parameterList); }
                    | DISJUNCTION {$$.parameter = new OperationParameter(ParameterType.Disjunction, $$.parameterList); }
                    | COMPARISON {$$.parameter = new OperationParameter(ParameterType.Comparison, $$.parameterList); }
                    | criterion {
                                    if ($$.criteriaList == null) {$$.criteriaList = new CriteriaList();}
                                    $$.criteriaList.Add($$.criterion);

                                    CriteriaParameter crParameter = new CriteriaParameter(field, $$.criteriaList);
                                    $$.parameter = crParameter;
                                }
                    ;

criterion           : NOTOPERAND criterion
                            {
                                Criterion currentCriterion = $2.criterion;
                                currentCriterion.Operator = Operator.DoesNotContain;
                                $$.criterion = currentCriterion;
                            }

                    | '(' criterion ')'
                        {
                            $$.criterion = $2.criterion;
                        }
                    | LITERAL
                            {
                                Criterion criterion = new Criterion();
                                criterion.Value = $1;
                                criterion.Operator = Operator.Contains;
                                $$.criterion = criterion;
                            }
                    | EXPRESSION 
                            {
                                Criterion criterion = new Criterion();
                                criterion.Value = $1;
                                criterion.Operator = Operator.Contains;
                                $$.criterion = criterion;
                            }
                    ;
%%

public Parser(Scanner scn) : base(scn) { }
When I invoke the parser for an input like "hello and world", I get the following parse error:

Syntax error, unexpected CONJUNCTION, expecting EOF

It seems like the parser stops at the first token ("hello"), which makes me think my program rule is not defined correctly. How would I tell the parser to parse the whole input and not stop at the first token? Any help is appreciated.
Coordinator
Aug 25, 2013 at 5:30 AM
Hi
The problem is with your grammar. Briefly, you have a NonTerminal called "parameterList" but there is no recursion in the definition: in essence this declares that a parameterList is just a single parameter. So the parser stops after the first parameter expecting EOF. Here is my test framework, with all your semantic actions edited out.
%namespace gppg24
%output=Parser.cs
%visibility internal

%start program

%token CONJUNCTION
%token DISJUNCTION
%token COMPARISON
%token NOTOPERAND
%token LITERAL
%token EXPRESSION

%%

program             : parameterList 
                    ;

parameterList       : parameter
                    | parameterList parameter 
                    ;

parameter           : CONJUNCTION 
                    | DISJUNCTION 
                    | COMPARISON 
                    | criterion 
                    ;

criterion           : NOTOPERAND criterion

                    | '(' criterion ')'

                    | LITERAL

                    | EXPRESSION 
                    ;
%%

public Parser(Scanner scn) : base(scn) { }

public static void Main(string[] argp) {
  Scanner scanner = new Scanner();
  scanner.SetSource(argp);
  Parser parser = new Parser(scanner);
  parser.Parse();
}
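As a rough sketch of how the list-building actions could be put back onto the recursive rule in the framework above, something like the following may work. It assumes that ParameterList is a reference type with the Add method used in the original grammar, and it keeps the original $n.field access style instead of adding %type declarations. It has not been compiled or run.

```yacc
parameterList       : parameter
                        {
                            /* First parameter: start a fresh list (assumes ParameterList is a class). */
                            $$.parameterList = new ParameterList();
                            $$.parameterList.Add($1.parameter);
                        }
                    | parameterList parameter
                        {
                            /* Later parameters: reuse the list built so far ($1) and append the new item ($2). */
                            $$.parameterList = $1.parameterList;
                            $$.parameterList.Add($2.parameter);
                        }
                    ;
```

Because the rule is left-recursive, the first alternative is reduced once for the first parameter and the second alternative once for each further parameter, so a single list object is carried along and filled in input order.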
There is also a serious problem with your lexical grammar. In your syntax you want to recognize the symbols '(' and ')', but these are captured by your catch-all definition of "EXPRESSION". I think you need a real grammar for expressions, with the left and right parentheses as separate tokens. Here is the lexical definition that I tested. I did not attempt to fix the problem that I mentioned with expressions.
%option stack, minimize, verbose, persistbuffer, unicode, compressNext, embedbuffers
%option out:Scanner.cs
%visibility internal

%namespace gppg24

andop AND|and|And
orop OR|or|Or
notop NOT|not|Not
literals \"[^"]+\"
expression [A-Za-z0-9()]+

%%
{andop}      return (int)Tokens.CONJUNCTION;
{orop}       return (int)Tokens.DISJUNCTION;
{notop}      return (int)Tokens.NOTOPERAND;
{literals}   return (int)Tokens.LITERAL;
{expression} return (int)Tokens.EXPRESSION;

/* skip whitespace */
[ \t\n]+ ;
%%
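One possible way to address the parenthesis problem is to drop the parentheses from the expression character class and return them as tokens of their own. The sketch below assumes the parser keeps using the character literals '(' and ')' as in the grammar above, so the scanner returns the character codes directly. It is untested and only shows the changed definition and the rules section.

```lex
expression [A-Za-z0-9]+

%%
{andop}      return (int)Tokens.CONJUNCTION;
{orop}       return (int)Tokens.DISJUNCTION;
{notop}      return (int)Tokens.NOTOPERAND;
/* assumption: the parser refers to these as the literals '(' and ')' */
\(           return (int)'(';
\)           return (int)')';
{literals}   return (int)Tokens.LITERAL;
{expression} return (int)Tokens.EXPRESSION;

/* skip whitespace */
[ \t\n]+ ;
%%
```

With this change an input such as (hello) would reach the parser as '(' EXPRESSION ')' rather than as a single EXPRESSION, which is what the '(' criterion ')' production expects.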
Finally, a piece of advice. If you need to follow the actions of your parser over a small input, compile with the symbol TRACE_ACTIONS defined. You get a trace of parsing steps that is best understood alongside the output of gppg /report file.y, or even gppg /report /verbose file.y.

Best of luck with the project.
John
Aug 26, 2013 at 6:16 PM
Edited Aug 26, 2013 at 10:55 PM
Hi John,
Thank you for your reply. I tried your code and it compiles just fine. parser.Parse() also returns true when I use it in code; however, it's not returning the desired results. Let's say my input is "Hello and World". I should see two criteria parameters in my parameterList, but I only see "world". I assume I need to do the same thing (a recursive rule) at the parameter level, but I'm not sure how it should be structured. Basically, I'd like to have something like this:
parameter           : criterion 
                    | parameter CONJUNCTION
                    | parameter DISJUNCTION
                    | parameter COMPARISON
                    ;
I'm just not sure how to go about the actions in this case. Any help is appreciated.