Header lexy/parse_tree.hpp

Class lexy::parse_tree

lexy/parse_tree.hpp
namespace lexy
{
    template <reader Reader, typename TokenKind = void,
              typename MemoryResource = default-resource>
    class parse_tree
    {
    public:
        //=== construction ===//
        class builder;

        constexpr parse_tree();
        constexpr explicit parse_tree(MemoryResource* resource);

        parse_tree(const parse_tree&) = delete;
        parse_tree& operator=(const parse_tree&) = delete;

        parse_tree(parse_tree&&);
        parse_tree& operator=(parse_tree&&);

        //=== container interface ===//
        bool empty() const noexcept;

        void clear() noexcept;

        //=== nodes ===//
        class node;
        class node_kind;

        node root() const noexcept;

        //=== traversal ===//
        class traverse_range;

        traverse_range traverse(node n) const noexcept;
        traverse_range traverse() const noexcept;
    };

    template <input Input, typename TokenKind = void,
              typename MemoryResource = default-resource>
    using parse_tree_for
      = lexy::parse_tree<input_reader<Input>, TokenKind, MemoryResource>;
}

A lossless, untyped, immutable parse tree.

It is an ordered, rooted tree that represents the structure of the parsed inputs. It has two kinds of nodes: token nodes and production nodes. Token nodes represent individual tokens of the input: they store the lexy::lexeme and the lexy::token_kind and do not have children. Production nodes represent productions of the input: they are identified by the lexy::production_name and have zero or more child nodes for the tokens and child productions of the production. The root node of a non-empty tree is always the production node for the top-level production of the grammar.

The tree is parametrized on the Reader, which determines the type of lexy::lexeme stored by token nodes, and the TokenKind of lexy::token_kind. The latter is void by default, which means integers are used to identify tokens.

The tree is immutable: once constructed, the nodes cannot be modified in any way; changing a tree is only possible by re-assigning it. It is not copyable, but moveable.

All memory allocation for the tree is done via a MemoryResource object, which must be a class with the same interface as std::pmr::memory_resource. By default, it uses new and delete. The memory resource object passed to the constructor does not propagate during copy/move/swap.

Internally, the nodes of the tree are stored in big chunks of continuous memory. Each node has the size of three pointers and they form a linked list.

Tip
Use lexy::parse_as_tree to build a parse tree for an input.
Caution
The parse tree does not own the contents of token nodes, so make sure the input stays alive as long as the tree does.

Construction: Constructors

lexy/parse_tree.hpp
constexpr parse_tree();
constexpr explicit parse_tree(MemoryResource* resource);

Construct an empty parse tree without any nodes.

The default constructor is only valid for the `default-resource` and uses the default resource for memory allocation. The second overload assigns the specified resource, which is not changed by further assignment.

Construction: lexy::parse_tree::builder

lexy/parse_tree.hpp
class parse_tree::builder
{
    using iterator = typename Reader::iterator;

public:
    //=== constructors ===//
    explicit builder(parse_tree&& tree, production auto root);
    explicit builder(production auto root)
    : builder(parse_tree{}, root)
    {}

    //=== building ===//
    struct marker;

    marker start_production(production auto production);

    void token(token_kind<TokenKind> kind, iterator begin, _iterator _end);

    void finish_production(marker&& m);
    void cancel_production(marker&& m);

    parse_tree&& finish() &&;
};

Manually builds a non-empty parse tree.

The constructor can optionally take an existing parse tree, which will be clear()`ed. This allows re-using already allocated memory or a custom memory resource. The root node of the tree will be a production node for the specified `root production, which is the active node (see below).

Then the tree can be built using the following methods:

start_production

Start construction for a new production node for production and push it the active node’s list of children. It returns a marker object, which must eventually be passed to finish_production or cancel_production. The new production node will be the active node.

If production is a lexy::transparent_production, no new node is created. However, the marker object must still be passed to finish_production or cancel_production.

token

Construct a new token node and push it to the active node’s list of children. The node will have the specified lexy::token_kind and the lexeme [begin, end) of the input.

If kind is lexy::unknown_token_kind and begin == end, no token node is constructed.

If the production of the active node is a lexy::token_production, the previously last child of the production node is also a token node, and its kind is the same as kind, no token node is constructed. Instead, the previous token node is extended to cover everything up to end. This assumes that there is no gap between token nodes.

finish_production

Finishes the production node of the corresponding marker object, which must be the active node. The parent node will become active node again.

cancel_production

Cancels construction of the production node of the corresponding marker object, which must be the active node. The node and all children already added to it will be removed from the parse tree; it is returned to the same state it had before the corresponding start_production call.

finish

Finishes the construction of the entire tree and returns it. The active node must be the root node.

Container interface

lexy/parse_tree.hpp
bool empty() const noexcept; (1)

void clear() noexcept;       (2)
  1. Returns true if the tree is empty, false otherwise. An empty tree does not have any nodes.

  2. Clears the tree by removing all nodes, but without deallocating memory.

Nodes: lexy::parse_tree::node_kind

lexy/parse_tree.hpp
class parse_tree::node_kind
{
public:
    //=== access ===//
    bool is_token() const noexcept;
    bool is_production() const noexcept;

    bool is_root() const noexcept;
    bool is_token_production() const noexcept;

    const char* name() const noexcept;

    //=== comparison ===//
    friend bool operator==(node_kind lhs, node_kind rhs);
    friend bool operator!=(node_kind lhs, node_kind rhs);

    friend bool operator==(node_kind nk, token_kind<TokenKind> tk);
    friend bool operator==(token_kind<TokenKind> tk, node_kind nk);
    friend bool operator!=(node_kind nk, token_kind<TokenKind> tk);
    friend bool operator!=(token_kind<TokenKind> tk, node_kind nk);

    friend bool operator==(node_kind nk, production auto p);
    friend bool operator==(production auto p, node_kind nk);
    friend bool operator!=(node_kind nk, production auto p);
    friend bool operator!=(production auto p, node_kind nk);
};

Information about the kind of a node.

is_token

true if the node is a token node, false otherwise. is_token() == !is_production().

is_production

true if the node is a production node, false otherwise. is_production() == !is_token().

is_root

true if the node is the root node of the tree, false otherwise. The root node is always a production node.

is_token_production

true if the node is a production node that is a lexy::token_production, false otherwise.

name

For a production node, returns lexy::production_name. For a token node, returns .name() of its lexy::token_kind.

Node kinds can be compared with equality with each other, lexy::token_kind and productions. Two node kinds are equal if they are either both token nodes with the same token nodes, or both production nodes for the same production. A node kind and a token kind is equal, if the node kind is a token node with that kind, and a node kind and a production is equal, if it is a production node for that production.

Nodes: lexy::parse_tree::node

lexy/parse_tree.hpp
class parse_tree::node
{
public:
    //=== properties ===//
    void* address() const noexcept;

    node_kind kind() const noexcept;

    lexy::lexeme<Reader>           lexeme() const noexcept;
    lexy::token<Reader, TokenKind> token()  const noexcept;

    //=== relationships ===//
    node parent() const noexcept;

    class children_range;
    children_range children() const noexcept;

    class sibling_range;
    sibling_range siblings() const noexcept;

    bool is_last_child() const noexcept;

    //=== comparison ===//
    friend bool operator==(node lhs, node rhs) noexcept;
    friend bool operator!=(node lhs, node rhs) noexcept;
};

A reference to node in the parse tree.

Internally, this is just a pointer to the node data structure.

Properties:
address

The address of the referenced node in memory. It uniquely identifies the node.

kind

The lexy::parse_tree::node_kind of the node.

lexeme

For a token node, returns the lexy::lexeme of the node. For a production node, returns an empty lexeme.

token

Requires that the node is a token node; returns the stored lexy::token of the node.

Two node references can be compared for equality, which compares their addresses.

Node relationships: Parent

lexy/parse_tree.hpp
node parse_tree::node::parent() const noexcept;

Returns a reference to its parent node.

For the root node, which does not have a parent node, returns *this.

This operation is O(number of siblings).

Node relationships: Children

lexy/parse_tree.hpp
class parse_tree::node::children_range
{
public:
    class iterator; // value_type = node
    class sentinel;

    iterator begin() const noexcept;
    sentinel end()   const noexcept;

    bool empty() const noexcept;
    std::size_t size() const noexcept;
};

children_range parse_tree::node::children() const noexcept;

A sized range that iterates over all direct children of the referenced node in order.

For a token node, this is always an empty range.

Node relationships: Siblings

lexy/parse_tree.hpp
class parse_tree::node::sibling_range
{
public:
    class iterator; // value_type = node

    iterator begin() const noexcept;
    iterator end() const noexcept;
};

sibling_range parse_tree::node::siblings() const noexcept;

A range that iterates over all siblings of the referenced node.

The siblings of a node are all other child nodes of its parent. Iteration begins with the child that is following the referenced node and continues until the last child of the parent node. It then wraps around to the first child and ends when it reaches the referenced node again. The referenced node is not included; no node is its own sibling.

For the root node, this is always an empty range.


Nodes: Root node

lexy/parse_tree.hpp
node parse_tree::root() const noexcept;

A reference to the root node of the tree.

The tree must not be empty.

Traversal

lexy/parse_tree.hpp
namespace lexy
{
    enum class traverse_event
    {
        enter,
        exit,
        leaf,
    };
}
lexy/parse_tree.hpp
class parse_tree::traverse_range
{
public:
    class iterator; // struct value_type { traverse_event event; node node; };

    iterator begin() const noexcept;
    iterator end()   const noexcept;

    bool empty() const noexcept;
};

traverse_range parse_tree::traverse(node n) const noexcept;

traverse_range parse_tree::traverse() const noexcept;

A range that traverses all descendants of a node.

The first overload traverses all descendants of the node n, which includes n itself. The second overload traverses all nodes in the parse tree. For a non-empty tree, it is equivalent to traverse(root()). For an empty tree, it returns the empty range.

The value type of the traverse range’s iterator is a pair of lexy::traverse_event and node. The traverse event indicates why a node is visited, and node is the reference to the current node.

For a token node n, traverse(n) is a one element range whose value is n itself with the traverse_event::leaf.

For a production node n, traverse(n) is at least a two element range. The first element is n itself with the traverse_event::enter. It then recursively traverses all direct children of n. The final element is again n with the traverse_event::exit.

Example 1. Print a tree
auto depth = 0;
for (auto [event, node] : tree.traverse())
{
    switch (event)
    {
    case lexy::traverse_event::enter:
        ++depth;
        indent(depth);
        print_node(node);
        break;
    case lexy::traverse_event::exit:
        --depth;
        break;

    case lexy::traverse_event::leaf:
        indent(depth);
        print_node(node);
        break;
    }
}
Note
Traversing the parse tree is an optimized operation that does not involve dynamic memory allocation or recursion. Instead, each iteration step simply follows a pointer.

See also