Header lexy/dsl.hpp
The rule DSL for specifying the grammar.
template <typename T>
concept rule = …;
The grammar in lexy is specified as a set of productions, each of which defines an associated rule.
This rule is an object built from the objects and functions of namespace lexy::dsl
that defines some (implementation-defined) parsing function.
Parsing a rule takes the reader, which remembers the current position of the input, and the context, which stores information about the current production and whitespace rules, and is responsible for handling errors and values.
Parsing can have one of the following results:
- Parsing can succeed. Then it consumes some input by advancing the reader position and produces zero or more values.
- Parsing can fail. Then it reports an error, potentially after having consumed some input but without producing values. The parent rule can react to the failure by recovering from it, or it fails itself.
- Parsing can fail, but then recover. Then it has reported an error, but it has consumed enough input to be in a known good state, and parsing continues normally. See error recovery for details.
A branch rule is a special kind of rule that has an easy-to-check condition. Branch rules are used to guide decisions in the parsing algorithm. Every branch rule defines some (implementation-defined) branch parsing function. It mostly behaves the same as the normal parsing function, but can have one additional result: branch parsing can backtrack. If it backtracks, it hasn’t consumed any input, raised errors, or produced values. The parsing algorithm is then free to try another branch.
Note: The idea is that a branch rule can relatively quickly decide whether or not it should backtrack. If a branch rule does not backtrack, but fails instead, this failure is propagated and the parsing algorithm does not try another branch.
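The difference between backtracking and failing can be sketched with a small toy model in plain C++. This is not lexy's implementation: the names `reader`, `branch`, `outcome`, `parse_branch`, and `parse_choice` are hypothetical and exist only for this illustration, and error recovery is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Toy model only: the possible results of branch parsing (recovery omitted).
enum class outcome { success, failure, backtracked };

// The reader remembers the current position in the input.
struct reader {
    std::string_view input;
    std::size_t      pos = 0;

    bool starts_with(std::string_view text) const {
        assert(pos <= input.size());
        return input.substr(pos, text.size()) == text;
    }
};

// A hypothetical branch: an easy-to-check condition followed by the rest.
struct branch {
    std::string_view condition;
    std::string_view rest;
};

// Branch parsing: backtrack if the condition does not match (nothing is
// consumed); otherwise commit. A mismatch after committing is a failure.
outcome parse_branch(reader& r, const branch& b) {
    if (!r.starts_with(b.condition))
        return outcome::backtracked;
    r.pos += b.condition.size();
    if (!r.starts_with(b.rest))
        return outcome::failure; // input has already been consumed
    r.pos += b.rest.size();
    return outcome::success;
}

// Choice: try the branches in order. Only a backtrack lets the next branch
// run; a failure is propagated immediately.
outcome parse_choice(reader& r, const std::vector<branch>& branches) {
    for (const auto& b : branches) {
        const outcome result = parse_branch(r, b);
        if (result != outcome::backtracked)
            return result;
    }
    return outcome::failure; // no branch was taken
}
```

In this sketch, the input `ab!` with the branches (`a`, `bc`) and (`ab`, `!`) fails: the first condition matches, so the choice commits to the first branch and never tries the second one.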
A token rule is a special kind of rule that describes the atomic elements of the input. Parsing a token rule never produces any values and is cheap to check; as such, token rules are also branch rules where the entire rule is used as the condition. Because they’re the atomic elements of the input, they also participate in automatic whitespace skipping: after every token, lexy will automatically skip whitespace, if a whitespace rule has been defined.
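As a rough illustration of "try matching" a token followed by automatic whitespace skipping, here is a toy model in plain C++. It does not use lexy: `reader`, `skip_whitespace`, and `try_match_token` are hypothetical names, and the whitespace rule is assumed to be "any ASCII whitespace".

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// The reader remembers the current position in the input.
struct reader {
    std::string_view input;
    std::size_t      pos = 0;
};

// Assumed whitespace rule for this sketch: skip ASCII whitespace.
void skip_whitespace(reader& r) {
    while (r.pos < r.input.size()
           && std::isspace(static_cast<unsigned char>(r.input[r.pos])))
        ++r.pos;
}

// Try matching a token: the entire token is the condition. Either it is
// consumed (followed by whitespace skipping), or the reader is left untouched.
// No values are produced and no errors are raised.
bool try_match_token(reader& r, std::string_view token) {
    assert(r.pos <= r.input.size());
    if (r.input.substr(r.pos, token.size()) != token)
        return false; // backtrack: nothing consumed
    r.pos += token.size();
    skip_whitespace(r);
    return true;
}
```

With the input `let   x`, matching the token `let` in this sketch leaves the reader directly in front of `x`, so the next token does not need to deal with the whitespace itself.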
The parse context stores state that can be accessed during parsing.
This includes things like the current recursion depth (see lexy::dsl::recurse), whether or not automatic whitespace skipping is currently enabled (see whitespace skipping), but also arbitrary user-defined variables (see lexy::dsl::context_flag, lexy::dsl::context_counter, and lexy::dsl::context_identifier).
When a rule modifies the context during parsing, by adding an additional context variable for example,
this modification is available for all following rules in the current production and all child productions.
In particular, the modification is no longer visible in any parent production.
If a rule is parsed in a loop, e.g. by lexy::dsl::loop or lexy::dsl::list, any context modification does not persist between loop iterations, and is also not available outside the loop.
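These visibility rules can be pictured with a toy model in which the context is passed by value. This is only a sketch of the observable behavior, not of how lexy stores its context; `context`, `child_production`, `loop_iteration`, and `parent_production` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model only: the context is a set of named integer variables.
using context = std::map<std::string, int>;

// Child production: it receives a copy of the context, so it sees the
// parent's variables, but its own additions vanish when it returns.
bool child_production(context ctx) {
    const bool sees_parent_variable = ctx.count("parent_var") == 1;
    ctx["child_var"] = 1; // visible only for the rest of this production
    return sees_parent_variable;
}

// One loop iteration: it also works on a copy of the context.
void loop_iteration(context ctx) {
    assert(ctx.count("iteration_var") == 0); // nothing persists between iterations
    ctx["iteration_var"] = 1;                // dropped at the end of the iteration
}

// Parent production: returns the variables still visible at its end.
context parent_production() {
    context ctx;
    ctx["parent_var"] = 0; // available to following rules and child productions
    const bool seen_by_child = child_production(ctx);
    assert(seen_by_child);
    for (int i = 0; i < 3; ++i)
        loop_iteration(ctx);
    return ctx;
}
```

At the end of `parent_production()` in this sketch, only `parent_var` is left: neither the child's variable nor the one added inside the loop is visible.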
How to read the DSL documentation [1]
The behavior of a rule is described by the following sections.
- Matching/parsing
This section describes what input is matched for the rule to succeed, and what is consumed. For token rules it is called "matching", otherwise "parsing".
It often delegates to the behavior of other rules. Here, "parsing" refers to the parsing operation of a rule; "branch parsing" or "try to parse" refers to the special parsing operation of a branch rule, which can backtrack; "matching" refers to the parsing operation of a token rule, which cannot produce values; and "try matching" refers to the branch parsing operation of a token rule, which cannot produce values or raise errors.
- Branch parsing
This section describes what input is matched and consumed by a branch rule, and what leads to backtracking. Note that a rule can parse something different here than during non-branch parsing.
- Errors
This section describes what errors are raised, when, and where. It also describes whether the rule can recover after the error.
- Values
This section describes what values are produced during a successful parsing operation. It is omitted for token rules, which never produce values.
- Parse tree
This section describes what nodes are created in the lexy::parse_tree. If omitted, a token rule creates a single token node covering everything consumed, and a rule produces no extra nodes besides the ones created by the other rules it parses.
If a rule parses another rule in a new context (e.g. lexy::dsl::peek), the other rule does not have access to context variables, and any context modification is not visible outside of the rule.
The rule DSL
Primitive rules
lexy::dsl::any
match anything
lexy::dsl::eof
match EOF
lexy::dsl::newline and lexy::dsl::eol
match the end of a line
Literal rules
lexy::dsl::lit_c
match a single character
lexy::dsl::lit and LEXY_LIT
match character sequences
lexy::dsl::lit_b
match a sequence of bytes
lexy::dsl::lit_cp
match a code point with the specified value
punctuators
match common punctuation
lexy::dsl::literal_set and LEXY_LITERAL_SET
match one of the specified literals
lexy::dsl::followed_by and lexy::dsl::not_followed_by
ensure a literal is (not) followed by a char class
lexy::dsl::ascii::case_folding and lexy::dsl::unicode::simple_case_folding
match a literal case-insensitively
Char classes
lexy::dsl::code_point
match specific Unicode code points
lexy::dsl::ascii
match ASCII char classes
lexy::dsl::unicode
match Unicode char classes
lexy::dsl::operator/ (char class), lexy::dsl::operator- (unary), lexy::dsl::operator-, lexy::dsl::operator&
combine char classes
LEXY_CHAR_CLASS
create a named char class
Branch conditions
lexy::dsl::operator>>
add a branch condition to a rule
lexy::dsl::else_
branch condition that is always taken
lexy::dsl::peek and lexy::dsl::peek_not
check whether something matches without consuming it
lexy::dsl::lookahead
check whether something matches somewhere in the input without consuming it
Combinators
lexy::dsl::token
turn a rule into a token
lexy::dsl::operator+
parse a sequence of rules
lexy::dsl::operator|
parse one of the specified (branch) rules
lexy::dsl::combination and lexy::dsl::partial_combination
parse all (some) of the (branch) rules in arbitrary order
lexy::dsl::if_ and lexy::dsl::opt
parse a branch rule if its condition matches
lexy::dsl::loop
parse a rule repeatedly
lexy::dsl::while_ and lexy::dsl::while_one
parse a branch rule while its condition matches
lexy::dsl::list
parse a list of things
lexy::dsl::times and lexy::dsl::repeat
parse a rule N times
lexy::dsl::until
skip everything until a rule matches
Brackets and delimited
lexy::dsl::terminator
parse something that ends with a terminator
lexy::dsl::brackets
parse something surrounded by brackets
lexy::dsl::delimited and lexy::dsl::escape
parse everything between two delimiters, with optional escape sequences
Productions
lexy::dsl::p and lexy::dsl::recurse
parse another production
lexy::dsl::inline_
parse another production’s rule inline
lexy::dsl::return_
exit early from parsing a production
lexy::dsl::subgrammar
parse a production defined in a different source file
Values
lexy::dsl::capture
capture everything consumed by a token rule
lexy::dsl::position
produce the current input position
lexy::dsl::nullopt
produce an empty placeholder value
lexy::dsl::member
parse something into a member variable
lexy::dsl::scan
parse a completely user-defined rule
lexy::dsl::parse_as
parse a rule, ensuring it always produces a specific value
Errors and error recovery
lexy::dsl::error
explicitly raise an error
lexy::dsl::must
raise an error if a branch backtracks
lexy::dsl::try_
recover from a failed rule
lexy::dsl::recover
recover by looking for, and then continuing with, some other rule
lexy::dsl::find
recover by looking for synchronization tokens
Whitespace
lexy::dsl::whitespace
explicitly skip whitespace
lexy::dsl::no_whitespace
do not skip whitespace
Identifiers
lexy::dsl::identifier
parse an identifier
lexy::dsl::keyword
parse a keyword
lexy::dsl::symbol
parse one of the specified symbols and produce their value
lexy::dsl::flag and lexy::dsl::flags
parse (multiple) symbols representing enum flags in any order
Numbers
lexy::dsl::zero
parse zero
lexy::dsl::digit
parse a digit
lexy::dsl::digits
parse one or more digits
lexy::dsl::n_digits
parse N digits
lexy::dsl::integer
convert digits to an integer
lexy::dsl::sign, lexy::dsl::plus_sign and lexy::dsl::minus_sign
parse a sign
lexy::dsl::code_point_id
convert N digits into a code point
Operator precedence parsing
lexy::dsl::op
parse an operator
lexy::dsl::operator/ (operator)
parse one of multiple operators
expression
parse an expression consisting of multiple operators
Context-sensitive parsing
lexy::dsl::context_flag
a boolean flag
lexy::dsl::context_counter
an integer counter
lexy::dsl::context_identifier
an identifier variable
Byte input
lexy::dsl::bytes and lexy::dsl::padding_bytes
parse N bytes
lexy::dsl::bint8, lexy::dsl::bint16, …
parse a little/big endian integer
lexy::dsl::bits
parse a byte with specific bit patterns
lexy::dsl::bom
parse a byte-order mark (BOM)
Input and action specific rules
lexy::dsl::argv_separator
match the argument separator of a lexy::argv_input
lexy::dsl::tnode and lexy::dsl::pnode (experimental)
match a node of a lexy::parse_tree_input
lexy::dsl::debug
generate a debug event that is visualized by lexy::trace