Semantic value as RefType

Mar 8, 2012 at 9:20 AM
Edited Mar 8, 2012 at 9:22 AM

Hello! Thank you for GPPG. I use it in my course on compiler construction over three years (Russia, Southern Federal University, faculty of mechanics, mathematics and computer sciences).

One question. How correclly make semantic value ref type? Documentation says that it's possible. But I don't see good way.

I do the following things:

1. I define class Union { } instead of %union - this is perfectly clear

2. I write %YYSTYPE NameSpace.Union

3. In all lex branches I write following lines:

yylval = new Union();
yylval.ti = new token_info(yytext);

I.e. I write yylval = new Union(); before any lexical action. Am I right?

4. In yacc-generated file I write by hands the following line:

  protected override void DoAction(int action)
      CurrentSemanticValue = new Union();    /// Oh!!!!!


Am I right?

It seems to me  there is another way... But there is no appropriate example in documentation or in distributive. Even gppg.y uses old %union 


Mikhalkovich Stanislav
Mar 19, 2012 at 9:49 AM
Edited Mar 19, 2012 at 10:00 AM

Hi All.

And thanks to Mikhalkovich for raising this important point.

It is true that all the programs in the distribution, including the examples do not use reference types for YYSTYPE.  I will include a simple example for the future, because using reference types is a little different.  The most important point is that the "CurrentValue" belonging to the parser needs to be explicitly created.

However, using a reference type is almost always the best choice for compilers, or other tools that create abstract syntax trees.  Here is an outline of how that might work.

Define a new abstract class, say call it "Node", and declare

%YYSTYPE Namespace.Node

Now, Node will have derived classes like ExpressionNode, StatementNode, StatementListNode, BlockNode and so on.  Each of these classes will have a static method called MakeExpressionNode, MakeStatementNode and so on taking appropriate argument types. The lists will also have method to add new elements, and so on.  For simplicity I will assume that the names of the sub-types correspond to the names of the non-terminal symbols that they represent. However, an abstract syntax tree would probably use the same BinaryNode type for ALL expressions of binary valence.

Now a production such as for Block, would like like this --

Block : '{' StatementList '}'        { $$ = MakeBlockNode($2); } // $2 is of type StatementListNode


StatementList : Statement       { $$ = MakeStatementListNode($1); } // Method creates a singleton list.

                      | StatementList  Statement  { $1.Append($2); $$ = $1; } // no new object needed


Note that the fields of these classes will generally be of sub-types of Node, so the incoming arguments of Node type must be cast to the appropriate type inside the factory method, or in the text of the semantic action.

Hope this helps to get everyone started.  I will post a full example soon.

Mar 23, 2012 at 10:29 AM
Edited Mar 23, 2012 at 10:32 AM

Hi John!

Thank you for your answer and example. It is absolutely clear for me.

Your approach is good and simple but has some problems in my project.

1. My compiler has about 800 rules and about 70 fields in Union type. Among them there are such fields as SyntaxTreeNode (base class for all nodes in syntax tree), Object, List<int>, bool. For example:

for_statement : FOR id := expr direction expr DO statement { $$ = new ForNode($2,$4,$6,$8,$5,@$); }

direction : TO {$$ = true;}

               | DOWNTO {$$ = false;}


This example demonstates the necessity to use types such as bool together with types SyntaxTreeNode.

How to modify your approach with %YYSTYPE RealTree.Node? I think there are 2 ways.

The 1-st one is to use %union with 4-5 fields and with SyntaxTreeNode as one of these fields.

The 2-nd one is to use  %YYSTYPE Mynamespace.Node and to make fictive nodes such as FictiveSyntaxTreeNodeForBoolean, FictiveSyntaxTreeNodeForListObjects with the only field in every such class.

Both ways have one defect: they require EXPLICIT TYPE CAST in situations such as

expr : expr + expr { $$ = new BinaryNode((ExprNode)$1,(ExprNode)$2,@$); }

These type casts slows parsing of big program code about two times. I want to write

expr : expr + expr { $$ = new BinaryNode($1,$2,@$); }

and to have gain in performance 2 times!

I want to solve this problem in the following way. I'll use %union with 5-6 fields: SyntaxTreeNode and some other most frequent Nodes (4-5). I think this approach will be the middle way between reftype and big valuetype SemanticValue. I draw your attention to the fact that in situation where I used big valuetype SemanticValue (70-80 fields), I received the catastrophic runtime error StackOverflow (!!! - only under Win 7 - 64). Since that I decided not to use big valuetyped SemanticValue or to use reftype SemanticValue... Now I test this approach on performance.


And one question on your code RealTree Example

You write:

...    | LITERAL            { $$ = $1; }


...    |  LITERAL          // $$ is automatically lexer.yylval

Why in first case you write $$ = $1; and in second one you say this assignment will occur automatically? What is the the fundamental difference between these two cases?

And I thought that in case when  %YYSTYPE is reftype the default action is

$$ = null

but not a

$$ = $1

Am I wrong?


Mar 29, 2012 at 6:27 AM

Hi Mikhalkovich

Yes, you are correct.  In your case creating a small "union" with a Node field and some primitive types is a good choice.  If you declare the types of the non-terminals in the *.y file, then the field selection inside the semantic actions will be automatic.

As to your last questions:  The different treatment of the the default actions was just carelessness on my part. It is not wrong, just not consistent.

The default action for productions with empty right hand side is $$ = default(TValue).  So if %YYSTYPE is a reference type $$ = null.

The default action for any other production is $$ = $1.   In the case that the first symbol is a terminal this will be the scanner yylval at the time that the terminal was scanned.   For most scanners the yylval value will only be set for those symbols that actually have some extra information to pass to the parser.  For my RealTree example this only happens for numbers (passing a leaf node holding the double value), or for a variable name (passing a leaf node holding the name character).

 If I were to follow your example and declare a %union for the RealTree example it would allow the scanner to directly pass the double or the char and the tree builder would then access those fields to construct the leaf nodes.

Sometimes it may be important to know that the "default action" is executed even if there is a user-specified action.  So if wish to have a semantic action that goes

{ $$ = $1; DoSomething($2); }

then you may leave out the first statement, since the default action will perform the assignment anyway.  I have considered having the default action performed ONLY if there is no user-defined action, but making such a change would break existing applications!