AST

Introduction

The AST is the main data structure of the front-end and is created by the parser.

AST stands for Abstract Syntax Tree.

This is a tree because it is a graph with nodes and links between nodes. As the graph is acyclic and each node but the root has only one parent (the link that point to it). In the front-end there is only one root which represent the set of libraries.

The tree is a syntax tree because it follows the grammar of the VHDL language: there is for example a node per operation (like or, and or +), a node per declaration, a node per statement, a node per design unit (like entity or architecture). The front-end needs to represent the source file using the grammar because most of the VHDL rules are defined according to the grammar.

Finally, the tree is abstract because it is an abstraction of the source file. Comments and layout aren’t kept in the syntax tree. Furthermore, if you rename a declaration or change the value of a literal, the tree will have exactely the same shape.

But we can also say that the tree is neither abstract, nor syntaxic and nor a tree.

It is not abstract because it contains all the information from the source file (except comments) are available in the AST, inclusing the location. So the source file can be reprinted (the name unparsed is also used) from the AST. If a mechanism is also added to deal with comments, the source file can even be pretty-printed from the AST.

It is not purely syntaxic because the semantic analysis pass decorate the tree with semantic information. For example the type of each expression and sub-expression is computed. This is necessary to detect some semantic error like assigning an array to an integer.

Finally, it is not anymore a tree because new links are added during semantic analysis. Simple names are linked to their declaration.

The AST in GHDL

The GHDL AST is described in file vhdl-nodes.ads.

An interesting particularity about the AST is the presence of a meta-model.

The meta-model is not formally described. What would be the meta-meta-model is very simple: there are elements and attributes. An element is composed of attributes, and an attribute is either a value (a flag, an integer, an enumeration) or a link to an element.

(When someone wants to be clever, he often speaks about meta-model in order to confuse you. Don’t let him impress you. The trick is to answer him with any sentence containing ‘meta-meta-model’).

In the GHDL meta-mode, there are only 3 elements:

  • variable list of nodes (List). These are like vectors as the length can be changed.
  • Fixed lists of nodes (Flist). The length of a fixed list is defined at creation.
  • Nodes. A node has a kind (Iir_Kind which is also defined in the file), and fields. The kind is set at creation and cannot be changed, while fields can be.

Or without using the word meta-model, the AST is composed of nodes and lists.

The meta-model describes the type of the attributes: most of them are either a node reference, a boolean flag or a enumerated type (like Iir_Staticness). But there are also links: a reference to another node or to a list.

The accessors for the node are generated automatically by the python script src/xtools/pnodes.py.

Why a meta-model ?

All ASTs could have a meta-model, because the definition of elements and attributes is very generic. But there is a detail: the definition of an element is static. So for each node, the list of attribute and their type is static and each list is a list of the same element type. So there is no bag, nor dynamic typing. This is per the definition of the meta-meta-model.

But in GHDL there is an API at the meta-model level in file vhdl-nodes_meta.ads. There is the list of all attribute types in enumeration Types_Enum. There is the list of all possible attributes in enumeration Fields_Enum. For a particular kind of node, you can get the list of fields with Get_Field and for every type, there is API to get or set any field of any node.

Having a meta-model API allows to build algorithm that deals with any node. The dumper (in file vhdl-disp_tree.ad[sb]) is used to dump a node and possibly its sub-nodes. This is very useful while debugging GHDL. It is written using the meta-model, so it knows how to display a boolean and the various other enumerated types, and how to display a list. To display a node, it just gets the kind of the type, prints the kind name and queries all the fields of the node. There is nothing particular to a specific kind, so you don’t need to modify the dumper if you add a node.

The dumper won’t be a strong enough reason by itself to have a meta-model. But the pass to create instances is a good one. When a vhdl-2008 package is instantiated, at least the package declaration is created in the AST (this is needed because there are possibly new types). And creating an instance using the meta-model is much simpler (and much more generic) that creating the instance using directly the nodes. The code to create instances is in files vhdl-sem_inst.ad[sb].

The meta-model API is moslty automatically generated by the python script.

Dealing with ownership

The meta-model also structures the tree, because there is a notion of ownership: every element (but the root) has only one parent that owns it, and there are no cycle in the ownership. So the tree is really a tree.

That simplifies algorithms because it is easier to walk a tree than a graph. It is also easier to free a sub-tree than a sub-graph.

Getting a real tree from the parser might look obvious, but it is not. Consider the following VHDL declaration:

variable v1, v2 : std_logic_vector (1 downto 0) := "00";

Both variables v1 and v2 share the same type and the same initial value. The GHDL AST uses two different strategies:

  • For the type, there is two fields in the node: subtype_indication and type. The subtype_indication is owned and set only on the first variable to the output of the parser. The type field is a reference and set on all variables to the result of analysis of subtype_indication.
  • For the initial value, there is only one field default_value that is set on all variables. But the ownership is controlled by a flag in the node (an attribute) named is_ref. It is set to false on the first variable and true for the others.

The notion of ownership is highlighten by the Rust language, and indeed this is an important notion. The implementation of the Rust AST has to be investigated.

Node Type

TBC: 32-bit, extensions.