Header lexy/dsl.hpp

The rule DSL for specifying the grammar.

template <typename T>
concept rule = ;

The grammar in lexy is specified in several productions, where each one defines an associated rule. This rule is an object built from the objects and functions of namespace lexy::dsl that defines some (implementation-defined) parsing function. Parsing a rule takes the reader, which remembers the current position of the input, and the context, which stores information about the current production and whitespace rules, and is responsible for handling errors and values.

Parsing can have one of the following results:

  • Parsing can succeed. Then it consumes some input by advancing the reader position and produces zero or more values.

  • Parsing can fail. Then it reports an error, potentially after having consumed some input but without producing values. The parent rule can react to the failure by recovering from it; otherwise, it fails itself.

  • Parsing can fail, but then recover. Then it has reported an error, but has consumed enough input to be in a known good state, and parsing continues normally. See error recovery for details.
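The three results above can be illustrated with a small toy model in standard C++. This is only a sketch and not lexy's implementation: `parse_result`, `outcome_reader`, `parse_digits`, and `parse_statement` are hypothetical names, and the recovery strategy (skip ahead to the next `;`) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy model only; none of these names exist in lexy.
enum class parse_result { success, failed, recovered };

struct outcome_reader
{
    std::string_view input;
    std::size_t      pos    = 0; // current position of the input
    int              errors = 0; // number of errors reported so far
};

// Child rule: one or more decimal digits.
// On success it advances the reader and produces a value.
// On failure it reports an error and produces no value.
parse_result parse_digits(outcome_reader& reader, int& value)
{
    assert(reader.pos <= reader.input.size());
    const std::size_t start  = reader.pos;
    int               result = 0; // overflow is ignored in this sketch
    while (reader.pos < reader.input.size()
           && reader.input[reader.pos] >= '0' && reader.input[reader.pos] <= '9')
    {
        result = result * 10 + (reader.input[reader.pos] - '0');
        ++reader.pos;
    }
    if (reader.pos == start)
    {
        ++reader.errors;
        return parse_result::failed;
    }
    value = result;
    return parse_result::success;
}

// Parent rule: digits followed by ';'.
// It reacts to a failed child by skipping ahead to the next ';' (a known good
// state); if there is none, it fails itself.
parse_result parse_statement(outcome_reader& reader, int& value)
{
    const bool had_error = parse_digits(reader, value) == parse_result::failed;
    if (had_error)
    {
        const auto semicolon = reader.input.find(';', reader.pos);
        if (semicolon == std::string_view::npos)
            return parse_result::failed; // cannot recover
        reader.pos = semicolon;
    }
    if (reader.pos >= reader.input.size() || reader.input[reader.pos] != ';')
    {
        ++reader.errors;
        return parse_result::failed;
    }
    ++reader.pos;
    return had_error ? parse_result::recovered : parse_result::success;
}
```

In this sketch, a recovered parse still counts the reported error, but the caller can continue after the `;` as if parsing had succeeded.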

A branch rule is a special kind of rule that has an easy-to-check condition. Branch rules are used to guide decisions in the parsing algorithm. Every branch rule defines some (implementation-defined) branch parsing function. It mostly behaves the same as the normal parsing function, but can have one additional result: branch parsing can backtrack. If it backtracks, it hasn’t consumed any input, raised errors, or produced values. The parsing algorithm is then free to try another branch.

Note
The idea is that a branch rule can relatively quickly decide whether or not it should backtrack. If a branch rule does not backtrack, but fails instead, this failure is propagated and the parsing algorithm does not try another branch.
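The difference between backtracking and failing can be sketched with a toy model in standard C++. It is not lexy's implementation: `branch_result`, `toy_reader`, `parse_branch`, and `parse_choice` are hypothetical names, and the branches (`"a"` then `"bc"`, `"x"` then `"yz"`) are made up for illustration. The sketch also assumes that a choice where no branch is taken simply fails.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Toy model only; none of these names exist in lexy.
enum class branch_result { success, backtracked, failed };

struct toy_reader
{
    std::string_view input;
    std::size_t      pos = 0;
};

// Hypothetical branch rule "condition, then rest":
// - condition does not match: backtrack, nothing is consumed;
// - condition matches but rest does not: fail (the branch was already taken);
// - both match: succeed and consume both.
branch_result parse_branch(toy_reader& reader, std::string_view condition,
                           std::string_view rest)
{
    assert(reader.pos <= reader.input.size());
    if (reader.input.substr(reader.pos, condition.size()) != condition)
        return branch_result::backtracked;
    reader.pos += condition.size();

    if (reader.input.substr(reader.pos, rest.size()) != rest)
        return branch_result::failed;
    reader.pos += rest.size();
    return branch_result::success;
}

// Hypothetical choice between two branches.
// The second branch is tried only if the first one backtracks.
branch_result parse_choice(toy_reader& reader)
{
    const auto first = parse_branch(reader, "a", "bc");
    if (first != branch_result::backtracked)
        return first; // success or failure is final; no other branch is tried

    const auto second = parse_branch(reader, "x", "yz");
    // Assumption of this sketch: if no branch is taken, the choice fails.
    return second == branch_result::backtracked ? branch_result::failed : second;
}
```

For the input `abz`, the first branch is taken because `a` matches, then fails on `bz`; the failure is propagated and the second branch is never tried.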

A token rule is a special kind of rule that describes the atomic elements of the input. Parsing a token rule never produces any values and can be checked cheaply; as such, token rules are also branch rules where the entire rule is used as the condition. Because they are the atomic elements of the input, they also participate in automatic whitespace skipping: after every token, lexy automatically skips whitespace, if a whitespace rule has been defined.
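A toy model in standard C++ may help to picture "try matching" a token followed by automatic whitespace skipping. It is a sketch under stated assumptions, not lexy's implementation: `ws_reader`, `skip_whitespace`, and `try_match_token` are hypothetical names, a token is modelled as a fixed string, and whitespace is whatever `std::isspace` accepts.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Toy model only; none of these names exist in lexy.
struct ws_reader
{
    std::string_view input;
    std::size_t      pos = 0;
};

// Assumed whitespace definition for this sketch: anything std::isspace accepts.
void skip_whitespace(ws_reader& reader)
{
    while (reader.pos < reader.input.size()
           && std::isspace(static_cast<unsigned char>(reader.input[reader.pos])))
        ++reader.pos;
}

// "Try matching" a token: the entire token is the branch condition.
// On mismatch nothing is consumed and no error is raised.
// On match the token is consumed and trailing whitespace is skipped.
bool try_match_token(ws_reader& reader, std::string_view token)
{
    assert(reader.pos <= reader.input.size());
    if (reader.input.substr(reader.pos, token.size()) != token)
        return false;
    reader.pos += token.size();
    skip_whitespace(reader);
    return true;
}
```

Note that whitespace is skipped after the token, not before it, which mirrors the description above.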

The parse context stores state that can be accessed during parsing. This includes things like the current recursion depth (see lexy::dsl::recurse), whether or not automatic whitespace skipping is currently enabled (see whitespace skipping), but also arbitrary user-defined variables (see lexy::dsl::context_flag, lexy::dsl::context_counter, and lexy::dsl::context_identifier).

When a rule modifies the context during parsing, for example by adding an additional context variable, this modification is available to all following rules in the current production and to all child productions. In particular, the modification is not visible in any parent production. If a rule is parsed in a loop, e.g. by lexy::dsl::loop or lexy::dsl::list, any context modification does not persist between loop iterations and is also not available outside the loop.
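The visibility rules for newly added context variables can be pictured with a toy model in standard C++. This is a deliberately crude sketch, not lexy's implementation: the context is modelled as a map that is copied whenever a child production or a loop iteration starts, and `toy_context`, `child_production`, `loop_iteration`, `parent_production`, and `scoping_report` are hypothetical names. It only models adding variables, not modifying existing ones.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only; none of these names exist in lexy.
using toy_context = std::map<std::string, int>;

// A child production receives a copy of the context: it sees the parent's
// variables, but a variable it adds disappears when it returns.
bool child_production(toy_context context)
{
    assert(context.count("child_only") == 0); // the variable is new
    const bool sees_parent_var = context.count("depth") == 1;
    context["child_only"] = 1;
    return sees_parent_var;
}

// One loop iteration: it starts from the context as it was before the loop,
// so a variable added by a previous iteration is not visible.
int loop_iteration(toy_context context)
{
    const int seen_from_previous = static_cast<int>(context.count("in_loop"));
    context["in_loop"] = 1;
    return seen_from_previous;
}

struct scoping_report
{
    bool child_saw_parent_var;
    bool parent_sees_child_var;
    int  iterations_seeing_previous;
    bool visible_after_loop;
};

scoping_report parent_production()
{
    toy_context context;
    context["depth"] = 0; // a rule in the parent adds a variable

    scoping_report report{};
    report.child_saw_parent_var  = child_production(context);
    report.parent_sees_child_var = context.count("child_only") == 1;
    for (int i = 0; i < 3; ++i)
        report.iterations_seeing_previous += loop_iteration(context);
    report.visible_after_loop = context.count("in_loop") == 1;
    return report;
}
```

In this model the child sees `depth`, the parent never sees `child_only`, and `in_loop` is visible neither in later iterations nor after the loop.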

How to read the DSL documentation [1]

The behavior of a rule is described by the following sections.

Matching/parsing

This section describes what input is matched for the rule to succeed, and what is consumed. For token rules it is called "matching", otherwise "parsing".

It often delegates to the behavior of other rules. Here, "parsing" refers to the parsing operation of a rule; "branch parsing" or "try to parse" refers to the special parsing operation of a branch rule, which can backtrack; "matching" refers to the parsing operation of a token rule, which cannot produce values; and "try matching" refers to the branch parsing operation of a token rule, which cannot produce values or raise errors.

Branch parsing

This section describes what input is matched, consumed, and leads to a backtracking for a branch rule. Note that a rule can parse something different here than during non-branch parsing.

Errors

This section describes what errors are raised, when, and where. It also describes whether the rule can recover after the error.

Values

This section describes what values are produced during a successful parsing operation. It is omitted for token rules, which never produce values.

Parse tree

This section describes what nodes are created in the lexy::parse_tree. If omitted, a token rule creates a single token node covering everything consumed, and a rule produces no extra nodes besides the ones created by the other rules it parses.

If a rule parses another rule in a new context (e.g. lexy::dsl::peek), the other rule does not have access to context variables, and any context modification is not visible outside of the rule.

The rule DSL

Primitive rules
lexy::dsl::any 

match anything

lexy::dsl::eof 

match EOF

lexy::dsl::newline  and lexy::dsl::eol 

match the end of a line

Literal rules
lexy::dsl::lit_c 

match a single character

lexy::dsl::lit  and LEXY_LIT 

match character sequences

lexy::dsl::lit_b 

match a sequence of bytes

lexy::dsl::lit_cp 

match a code point with the specified value

punctuators 

match common punctuation

lexy::dsl::literal_set  and LEXY_LITERAL_SET 

match one of the specified literals

lexy::dsl::followed_by  and lexy::dsl::not_followed_by 

ensure a literal is (not) followed by a char class

lexy::dsl::ascii::case_folding  and lexy::dsl::unicode::simple_case_folding 

match a literal case-insensitively

Char classes
lexy::dsl::code_point 

match specific Unicode code points

lexy::dsl::ascii 

match ASCII char classes

lexy::dsl::unicode 

match Unicode char classes

lexy::dsl::operator/ (char class), lexy::dsl::operator- (unary), lexy::dsl::operator-, lexy::dsl::operator&

combine char classes

LEXY_CHAR_CLASS 

create a named char class

Branch conditions
lexy::dsl::operator>> 

add a branch condition to a rule

lexy::dsl::else_ 

branch condition that is always taken

lexy::dsl::peek  and lexy::dsl::peek_not 

check whether something matches without consuming it

lexy::dsl::lookahead 

check whether something matches somewhere in the input without consuming it

Combinators
lexy::dsl::token 

turn a rule into a token

lexy::dsl::operator+ 

parse a sequence of rules

lexy::dsl::operator| 

parse one of the specified (branch) rules

lexy::dsl::combination  and lexy::dsl::partial_combination 

parse all (some) of the (branch) rules in arbitrary order

lexy::dsl::if_  and lexy::dsl::opt 

parse a branch rule if its condition matches

lexy::dsl::loop 

parse a rule repeatedly

lexy::dsl::while_  and lexy::dsl::while_one 

parse a branch rule while its condition matches

lexy::dsl::list 

parse a list of things

lexy::dsl::times  and lexy::dsl::repeat 

parse a rule N times

lexy::dsl::until 

skip everything until a rule matches

Brackets and delimited
lexy::dsl::terminator 

parse something that ends with a terminator

lexy::dsl::brackets 

parse something surrounded by brackets

lexy::dsl::delimited  and lexy::dsl::escape 

parse everything between two delimiters, with optional escape sequences

Productions
lexy::dsl::p  and lexy::dsl::recurse 

parse another production

lexy::dsl::inline_ 

parse another production’s rule inline

lexy::dsl::return_ 

exit early from parsing a production

lexy::dsl::subgrammar 

parse a production defined in a different source file

Values
lexy::dsl::capture 

capture everything consumed by a token rule

lexy::dsl::position 

produce the current input position

lexy::dsl::nullopt 

produce an empty placeholder value

lexy::dsl::member 

parse something into a member variable

lexy::dsl::scan 

parse a completely user-defined rule

lexy::dsl::parse_as 

parse a rule, ensuring it always produces a specific value

Errors and error recovery
lexy::dsl::error 

explicitly raise an error

lexy::dsl::must 

raise an error if a branch backtracks

lexy::dsl::try_ 

recover from a failed rule

lexy::dsl::recover 

recover by looking for and then continuing with some other rule

lexy::dsl::find 

recover by looking for synchronization tokens

Whitespace
lexy::dsl::whitespace 

explicitly skip whitespace

lexy::dsl::no_whitespace 

do not skip whitespace

Identifiers
lexy::dsl::identifier 

parse an identifier

lexy::dsl::keyword 

parse a keyword

lexy::dsl::symbol 

parse one of the specified symbols and produce their value

lexy::dsl::flag  and lexy::dsl::flags 

parse (multiple) symbols representing enum flags in any order

Numbers
lexy::dsl::zero 

parse zero

lexy::dsl::digit 

parse a digit

lexy::dsl::digits 

parse one or more digits

lexy::dsl::n_digits 

parse N digits

lexy::dsl::integer 

convert digits to an integer

lexy::dsl::sign, lexy::dsl::plus_sign and lexy::dsl::minus_sign

parse a sign

lexy::dsl::code_point_id 

convert N digits into a code point

Operator precedence parsing
lexy::dsl::op 

parse an operator

lexy::dsl::operator/ (operator) 

parse one of multiple operators

expression 

parse an expression consisting of multiple operators

Context-sensitive parsing
Byte input
lexy::dsl::bytes  and lexy::dsl::padding_bytes 

parse N bytes

lexy::dsl::bint8, lexy::dsl::bint16, …

parse a little/big endian integer

lexy::dsl::bits 

parse a byte with specific bit patterns

lexy::dsl::bom 

parse a byte-order mark (BOM)

Input and action specific rules
lexy::dsl::argv_separator 

match the argument separator of a lexy::argv_input

lexy::dsl::tnode (experimental) and lexy::dsl::pnode (experimental)

match a node of a lexy::parse_tree_input (experimental)

lexy::dsl::debug 

generate a debug event that is visualized by lexy::trace