Header lexy/dsl/delimited.hpp
Rules for parsing delimited/quoted strings with escape sequences.
Rule DSL lexy::dsl::delimited
lexy/dsl/delimited.hpp
namespace lexy
{
struct missing_delimiter {};
}
namespace lexy::dsl
{
struct delimited-dsl // note: not a rule itself
{
constexpr branch-rule auto open() const;
constexpr branch-rule auto close() const;
constexpr delimited-dsl limit(char-class-rule limit);
template <typename ErrorTag>
constexpr delimited-dsl limit(char-class-rule limit);
//=== rules ===//
constexpr rule auto operator()(char-class-rule c) const;
constexpr rule auto operator()(char-class-rule c, escape-dsl ... escapes) const;
};
constexpr delimited-dsl delimited(branch-rule auto open,
branch-rule auto close);
constexpr delimited-dsl delimited(branch-rule auto delim)
{
return delimited(delim, delim);
}
}
delimited is not a rule, but a DSL for specifying rules that all parse zero or more "characters" surrounded by delimiters, with optional escape sequences.
It can be created using two overloads.
The first overload takes a branch rule that matches the opening delimiter and one that matches the closing delimiter.
The second overload takes just one rule that matches both the opening and the closing delimiter.
Common delimiters, like quotation marks, are predefined (see below).
Note | See lexy::dsl::brackets if you want to parse an arbitrary rule surrounded by brackets.
This one is designed for lists of characters. |
Branch rules .open() and .close()
lexy/dsl/delimited.hpp
constexpr branch-rule auto open() const;
constexpr branch-rule auto close() const;
.open() and .close() return the branch rules that were passed to the top-level lexy::dsl::delimited().
.limit()
lexy/dsl/delimited.hpp
constexpr delimited-dsl limit(char-class-rule limit);
template <typename ErrorTag>
constexpr delimited-dsl limit(char-class-rule limit);
Provide a limit to detect a missing closing delimiter.
delimited only stops parsing once it matches close; if close is missing in the input, it consumes the entire input.
By specifying a limit, which is a char class rule, the rule instead fails as soon as it matches a character of that class before the closing delimiter.
The second overload also specifies an ErrorTag, which is used instead of lexy::missing_delimiter.
Caution | The limit must only match characters that are not allowed inside the delimited content. |
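To make the effect of a limit concrete, here is a small stand-alone C++ model. It is not lexy code: the names scan_result and scan_until_close are hypothetical, the closing delimiter and the limit are assumed to be single characters, and scanning starts right after the opening delimiter. Like the rule, it checks for the closing delimiter before it checks the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical model, not lexy code: how a limit changes what happens when
// the closing delimiter is missing. `input` starts right after the opening
// delimiter; the delimiter and the limit are single characters here.
struct scan_result
{
    bool        closed; // true if the closing delimiter was found
    std::size_t end;    // position where scanning stopped
};

scan_result scan_until_close(std::string_view input, char close,
                             std::optional<char> limit = std::nullopt)
{
    for (std::size_t pos = 0; pos < input.size(); ++pos)
    {
        if (input[pos] == close)
            return {true, pos + 1};  // consume the closing delimiter
        if (limit && input[pos] == *limit)
            return {false, pos};     // limit hit: missing delimiter, early
    }
    return {false, input.size()};    // EOF: everything was consumed
}
```

Without a limit, an unterminated input such as abc followed by a newline and def is scanned up to EOF; with '\n' as the limit, scanning stops at the line break, which is where the missing delimiter would be reported.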
Rule .operator()
lexy/dsl/delimited.hpp
constexpr rule auto operator()(char-class-rule c) const;
constexpr rule auto operator()(char-class-rule c, escape-dsl ... escapes) const;
.operator() returns a rule that parses zero or more occurrences of the char class rule c inside the delimiters, with optional escape sequences.
- Requires
  - The encoding of the input is a char encoding.
  - Each of the escapes must begin with a distinct escape character (e.g. one with backslash and one with dollar).
- Parsing
Parses open(), then enters a loop where it repeatedly does the following, in order:
  1. Tries parsing close(). If that succeeds, finishes.
  2. Tries to match the limit, if one was provided, and lexy::dsl::eof. If either matches, fails with a missing delimiter.
  3. For the second overload, tries to parse each of the escapes (see below for what it does).
  4. Parses c.
While parsing the delimited, automatic whitespace skipping is disabled; whitespace is only skipped after close().
- Branch parsing
Tries parsing open() and backtracks if that did not succeed. Otherwise, parses the same loop as described above.
- Errors
  - All errors raised by parsing open().
  - lexy::missing_delimiter, or the specified ErrorTag: if the limit or lexy::dsl::eof matches. Its range covers everything since the opening delimiter. The rule then fails.
  - All errors raised by parsing escape. Recovers by simply continuing with the next iteration of the loop at the position where escape has left off. Note that no value of escape is produced.
  - All errors raised by parsing c. It can recover by simply discarding the bad character and continuing after it; otherwise, it fails.
- Values
It creates a sink of the current context. The sink is invoked with a lexy::lexeme capturing everything consumed by c; a sequence of contiguous characters is merged into a single lexeme. It is also invoked with every value produced by escape. The invocations happen separately, in lexical order. The rule then produces all values of open(), the final value of the sink, and all values of close().
- Parse tree
delimited does not have any special parse tree handling: it creates the nodes for open(), then the nodes for each c and escape, and then the nodes for close(). However, instead of creating separate token nodes for each c, adjacent token nodes are merged into a single one covering as much as possible. A character that is skipped during error recovery creates a token node whose lexy::predefined_token_kind is lexy::error_token_kind.
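The parsing loop and the error handling described above can be modeled in plain C++. This is a simplified, hypothetical sketch rather than lexy's implementation: every name in it is made up, the delimiters, the limit and the escape character are single characters, c is a predicate, the only escape sequence is the escape character followed by any character x (producing x), and a character rejected by c is always recovered from by discarding it.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, simplified model of the loop above; not lexy's implementation.
enum class parse_status { ok, open_failed, missing_delimiter };

struct delimited_result
{
    parse_status status = parse_status::ok;
    std::string  content;       // accepted characters and escape values
    std::size_t  discarded = 0; // characters skipped during recovery
};

struct delimited_model
{
    char                      open;
    char                      close;
    std::optional<char>       limit;       // what .limit() provides
    std::optional<char>       escape_char; // second overload only
    std::function<bool(char)> c;           // the char class rule

    delimited_result parse(std::string_view in) const
    {
        delimited_result r;
        if (in.empty() || in.front() != open)
        {
            r.status = parse_status::open_failed;
            return r;
        }

        std::size_t pos = 1;
        while (true)
        {
            // 1. Try close().
            if (pos < in.size() && in[pos] == close)
                return r;
            // 2. Limit or EOF: missing delimiter.
            if (pos == in.size() || (limit && in[pos] == *limit))
            {
                r.status = parse_status::missing_delimiter;
                return r;
            }
            // 3. Try the (single) escape sequence.
            if (escape_char && in[pos] == *escape_char && pos + 1 < in.size())
            {
                r.content += in[pos + 1];
                pos += 2;
                continue;
            }
            // 4. Parse c, recovering by discarding a bad character.
            if (c(in[pos]))
                r.content += in[pos];
            else
                ++r.discarded;
            ++pos;
        }
    }
};
```

Checking close() first is what lets the closing delimiter win over c, and checking the limit before c is why the limit must not match characters that are allowed inside the content.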
#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

struct quoted : lexy::token_production
{
static constexpr auto rule = [] {
// Arbitrary code points that aren't control characters.
auto c = -dsl::ascii::control;
return dsl::quoted(c);
}();
};
struct production
{
static constexpr auto whitespace = dsl::ascii::space;
static constexpr auto rule = dsl::p<quoted> + dsl::semicolon;
};
Tip | Use the sink lexy::as_string to produce a std::string from the rule. |
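The Values entry for .operator() says that contiguous characters reach the sink as a single lexeme, while escape values arrive in separate invocations, all in lexical order. The hypothetical function sink_calls below models only that ordering in plain C++; std::string_view stands in for lexy::lexeme, a backslash followed by x is assumed to produce the char x, and this is not how lexy implements the sink.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <variant>
#include <vector>

// Hypothetical model of the sink invocations, not lexy code: std::string_view
// stands in for lexy::lexeme, and an escape "\x" produces the char x.
using sink_arg = std::variant<std::string_view, char>;

// `content` is the text between the delimiters.
std::vector<sink_arg> sink_calls(std::string_view content)
{
    std::vector<sink_arg> calls;
    std::size_t run_begin = 0; // start of the current run of characters
    std::size_t pos = 0;

    // Emit the current run, if any, as a single lexeme.
    auto flush = [&] {
        if (pos > run_begin)
            calls.emplace_back(content.substr(run_begin, pos - run_begin));
    };

    while (pos < content.size())
    {
        if (content[pos] == '\\' && pos + 1 < content.size())
        {
            flush();                              // escape ends the run
            calls.emplace_back(content[pos + 1]); // value of the escape
            pos += 2;
            run_begin = pos;
        }
        else
        {
            ++pos;                                // extend the run
        }
    }
    flush();
    return calls;
}
```

For the content ab\ncd, the sink would see the lexeme ab, the value n, and the lexeme cd, in that order; empty content produces no invocations at all.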
Predefined delimited
lexy/dsl/delimited.hpp
namespace lexy::dsl
{
constexpr delimited-dsl quoted = delimited(lit<"\"">);
constexpr delimited-dsl triple_quoted = delimited(lit<"\"\"\"">);
constexpr delimited-dsl single_quoted = delimited(lit<"'">);
constexpr delimited-dsl backticked = delimited(lit<"`">);
constexpr delimited-dsl double_backticked = delimited(lit<"``">);
constexpr delimited-dsl triple_backticked = delimited(lit<"```">);
}
ASCII quotation marks are pre-defined.
Warning | The naming scheme for triple_quoted and single_quoted is not consistent,
but the terminology is common elsewhere. |
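Because triple_quoted uses the same three-character literal for opening and closing, a single quotation mark inside the content is just another character. The stand-alone function below (hypothetical, not part of lexy) mimics triple_quoted with a c that accepts every character and no escapes: since close() is tried at each position first, the first occurrence of the delimiter after the opening one ends the content.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-alone mimic of triple_quoted with a `c` that accepts
// every character and no escapes; not part of lexy.
std::optional<std::string_view> triple_quoted_content(std::string_view in)
{
    constexpr std::string_view delim = "\"\"\"";
    if (in.substr(0, delim.size()) != delim)
        return std::nullopt;                         // open() did not match
    auto end = in.find(delim, delim.size());         // first possible close()
    if (end == std::string_view::npos)
        return std::nullopt;                         // missing delimiter
    return in.substr(delim.size(), end - delim.size());
}
```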
Rule DSL lexy::dsl::escape
lexy/dsl/delimited.hpp
namespace lexy
{
struct invalid_escape_sequence {};
}
namespace lexy::dsl
{
struct escape-dsl // note: not a rule itself
{
constexpr escape-dsl rule(branch-rule auto r) const;
constexpr escape-dsl capture(branch-rule auto r) const;
template <const symbol_table& SymbolTable>
constexpr escape-dsl symbol(token-rule auto t) const;
template <const symbol_table& SymbolTable>
constexpr escape-dsl symbol() const;
};
constexpr escape-dsl escape(token-rule auto escape_char);
}
escape is not a rule, but a DSL for specifying escape sequences.
It is created by giving it the escape_char, a token rule that matches the initial escape character.
Common escape characters are predefined.
The various member functions all add potential rules that parse the part of an escape sequence after the initial escape character.
The resulting DSL can then only be used with delimited, where it is treated like a branch rule and as such documented like one.
- Branch parsing
Tries to match and consume escape_char, and backtracks otherwise. After escape_char has been consumed, tries to parse each escape sequence (see below) in the order of the member function invocations, like a choice would.
- Errors
  - All errors raised by each escape sequence. escape then fails, but delimited recovers (see above).
  - lexy::invalid_escape_sequence: if none of the escape sequences match; its range covers the escape_char. escape then fails, but delimited recovers (see above).
- Values
All values produced by the selected escape sequence. delimited forwards them to the sink in one invocation.
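The branch parsing and error behavior of escape can be sketched in plain C++ as follows. This is a hypothetical model, not lexy code: each escape sequence is represented as a std::function that inspects the input after the escape character, and the alternatives are tried in the order they were added, like a choice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical model of escape parsing, not lexy code.
// An alternative inspects the input after the escape character and returns
// how much it consumed plus its value, or std::nullopt if it does not match.
struct escape_match
{
    std::size_t length;
    char        value;
};
using escape_alternative =
    std::function<std::optional<escape_match>(std::string_view)>;

enum class escape_status { not_an_escape, ok, invalid_escape_sequence };

struct escape_result
{
    escape_status status;
    std::size_t   consumed = 0; // includes the escape character
    char          value    = '\0';
};

escape_result parse_escape(std::string_view in, char escape_char,
                           const std::vector<escape_alternative>& alternatives)
{
    if (in.empty() || in.front() != escape_char)
        return {escape_status::not_an_escape}; // branch backtracks

    // Try the alternatives in the order they were added, like a choice.
    for (const auto& alternative : alternatives)
        if (auto m = alternative(in.substr(1)))
            return {escape_status::ok, 1 + m->length, m->value};

    // Nothing matched: the error covers just the escape character.
    return {escape_status::invalid_escape_sequence, 1};
}
```

In the real rule, delimited then recovers from the invalid escape sequence and continues its loop; this sketch only reports the outcome.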
#include <string>

#include <lexy/callback.hpp>
#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

struct production
{
// A mapping of the simple escape sequences to their replacement values.
static constexpr auto escaped_symbols = lexy::symbol_table<char> //
.map<'"'>('"')
.map<'\\'>('\\')
.map<'/'>('/')
.map<'b'>('\b')
.map<'f'>('\f')
.map<'n'>('\n')
.map<'r'>('\r')
.map<'t'>('\t');
static constexpr auto rule = [] {
// Arbitrary code points that aren't control characters.
auto c = -dsl::ascii::control;
// Escape sequences start with a backslash.
// They either map one of the symbols,
// or a Unicode code point of the form uXXXX.
auto escape = dsl::backslash_escape //
.symbol<escaped_symbols>()
.rule(dsl::lit_c<'u'> >> dsl::code_point_id<4>);
return dsl::quoted(c, escape);
}();
// Need to specify a target encoding to handle the code point.
static constexpr auto value = lexy::as_string<std::string, lexy::utf8_encoding>;
};
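To see what the example above computes, here is a stand-alone C++ sketch that decodes the text between the quotes by hand. It does not use lexy, and the names simple_escape, append_utf8, hex_digit and decode are hypothetical. Simple escapes use the same entries as escaped_symbols, and \uXXXX is turned into a code point and encoded as UTF-8, which is the job lexy::utf8_encoding does in the example. Unlike the real rule, it does not reject control characters, does not treat surrogates specially, and gives up on the first bad escape instead of recovering.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-alone decoder, not lexy code.
// Same entries as escaped_symbols in the example above.
std::optional<char> simple_escape(char ch)
{
    switch (ch)
    {
    case '"': return '"';
    case '\\': return '\\';
    case '/': return '/';
    case 'b': return '\b';
    case 'f': return '\f';
    case 'n': return '\n';
    case 'r': return '\r';
    case 't': return '\t';
    default: return std::nullopt;
    }
}

// Encodes a code point from \uXXXX (at most 0xFFFF) as UTF-8.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

int hex_digit(char h)
{
    if (h >= '0' && h <= '9') return h - '0';
    if (h >= 'a' && h <= 'f') return h - 'a' + 10;
    if (h >= 'A' && h <= 'F') return h - 'A' + 10;
    return -1;
}

// `content` is the text between the quotes; nullopt on a bad escape.
std::optional<std::string> decode(std::string_view content)
{
    std::string out;
    for (std::size_t i = 0; i < content.size(); ++i)
    {
        if (content[i] != '\\')
        {
            out += content[i];
            continue;
        }
        if (++i == content.size())
            return std::nullopt;                 // lone trailing backslash
        if (auto value = simple_escape(content[i]))
        {
            out += *value;
        }
        else if (content[i] == 'u' && i + 4 < content.size())
        {
            std::uint32_t cp = 0;
            for (std::size_t k = i + 1; k <= i + 4; ++k)
            {
                int digit = hex_digit(content[k]);
                if (digit < 0)
                    return std::nullopt;
                cp = cp * 16 + static_cast<std::uint32_t>(digit);
            }
            append_utf8(out, cp);
            i += 4;
        }
        else
        {
            return std::nullopt;                 // invalid escape sequence
        }
    }
    return out;
}
```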
Escape sequence .rule()
lexy/dsl/delimited.hpp
constexpr escape-dsl rule(branch-rule auto r) const;
.rule() specifies an escape sequence that simply tries to parse the branch rule r.
Escape sequence .capture()
lexy/dsl/delimited.hpp
constexpr escape-dsl capture(branch-rule auto r) const
{
return this->rule(lexy::dsl::capture(r));
}
.capture() specifies an escape sequence that tries to parse the branch rule r and produces a lexy::lexeme.
It is equivalent to calling .rule() with lexy::dsl::capture(r).
Escape sequence .symbol()
lexy/dsl/delimited.hpp
template <const symbol_table& SymbolTable>
constexpr escape-dsl symbol(token-rule auto t) const
{
return this->rule(lexy::dsl::symbol<SymbolTable>(t));
}
template <const symbol_table& SymbolTable>
constexpr escape-dsl symbol() const
{
return this->rule(lexy::dsl::symbol<SymbolTable>);
}
.symbol() specifies an escape sequence that parses a symbol.
The first overload forwards to the argument version of lexy::dsl::symbol: it matches t, looks the matched text up in the SymbolTable, and produces the corresponding value.
The second overload forwards to the non-argument version, which immediately looks up a symbol of the SymbolTable.
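The difference between the two .symbol() overloads can be modeled in plain C++. In this hypothetical sketch (not lexy's symbol_table), the table is a vector of pairs, the token t of the first overload is assumed to be a run of ASCII letters, and the second overload is assumed to pick the longest table entry that is a prefix of the input.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical model, not lexy's symbol_table: a list of (name, value) pairs.
using table = std::vector<std::pair<std::string_view, char>>;

struct lookup_result
{
    std::size_t length; // characters consumed
    char        value;  // value from the table
};

bool is_letter(char ch)
{
    return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z');
}

// First overload: match the token t (assumed: a run of letters), then
// look the whole matched text up; fail if it is not in the table.
std::optional<lookup_result> symbol_with_token(std::string_view in, const table& symbols)
{
    std::size_t len = 0;
    while (len < in.size() && is_letter(in[len]))
        ++len;
    if (len == 0)
        return std::nullopt;
    for (const auto& [name, value] : symbols)
        if (name == in.substr(0, len))
            return lookup_result{len, value};
    return std::nullopt;
}

// Second overload: look a symbol up directly (assumed: longest prefix wins).
std::optional<lookup_result> symbol_direct(std::string_view in, const table& symbols)
{
    std::optional<lookup_result> best;
    for (const auto& [name, value] : symbols)
        if (in.substr(0, name.size()) == name && (!best || name.size() > best->length))
            best = lookup_result{name.size(), value};
    return best;
}
```

With the entries n, t and tab, the input nx fails with the token version (the token nx is not in the table) but succeeds with the direct version, which only needs the prefix n.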
Predefined escapes
lexy/dsl/delimited.hpp
namespace lexy::dsl
{
constexpr escape-dsl backslash_escape = escape(lit_c<'\\'>);
constexpr escape-dsl dollar_escape = escape(lit_c<'$'>);
}
Escape sequences beginning with common ASCII characters are pre-defined.
Note | They don’t actually define any escape sequences, just the initial character. |