Header lexy/dsl/whitespace.hpp
Facilities for skipping whitespace.
By default, lexy does not treat whitespace in any particular way and it has to be parsed just like anything else in the input. However, as there are grammars that allow whitespace in a lot of places, it is often convenient to have it taken care of. lexy can be instructed to handle whitespace, using either manual or automatic whitespace skipping.
Manual whitespace skipping is done using lexy::dsl::whitespace(ws).
It skips zero or more occurrences of the whitespace defined by ws and can be inserted wherever you want to skip over whitespace.
This method is recommended where whitespace is an essential part of the grammar.
See email.cpp or xml.cpp for examples of manual whitespace skipping.
Automatic whitespace skipping is done by adding a static constexpr auto whitespace member to the root production.
This is a rule that defines the default whitespace for the entire grammar, as the ws argument does for manual whitespace skipping.
lexy then skips zero or more occurrences of whitespace after every token rule in the grammar, unless it has been manually disabled (see below).
This method is recommended where whitespace is not important and is just there to format the input nicely.
See config.cpp or json.cpp for examples of automatic whitespace skipping.
Note | "Whitespace" does not mean literal whitespace characters. It can also include comments (or whatever else you want). |
Rule lexy::dsl::whitespace
lexy/dsl/whitespace.hpp
namespace lexy::dsl
{
    class ws-rule // models rule
    {};

    constexpr ws-rule whitespace(rule auto ws);

    constexpr ws-rule operator|(ws-rule lhs, rule auto rhs);
    constexpr ws-rule operator|(rule auto lhs, ws-rule rhs);
}
The whitespace rule skips whitespace as defined by its argument ws.
- Requires
  - ws is either a branch rule or a choice that does not produce any values.
  - ws does not contain a lexy::dsl::p or lexy::dsl::recurse rule.
- Parses
  Parses lexy::dsl::loop(ws | lexy::dsl::else_ >> lexy::dsl::break_) in a context where whitespace skipping is disabled.
- Errors
  All errors raised during parsing of ws | lexy::dsl::else_ >> lexy::dsl::break_. The rule then fails if ws has failed, even in a branch context.
- Parse tree
  A single token node with the lexy::predefined_token_kind lexy::whitespace_token_kind whose range covers everything consumed; all individual token nodes of the whitespace rules are merged into this one. It is only added to the parse tree if it is not empty.
For convenience, operator| is overloaded for the whitespace rule.
Here, whitespace(a) | b is entirely equivalent to whitespace(a | b), and likewise for the other overload.
These overloads simply allow adding more whitespace to a rule after it has already been wrapped in whitespace.
Tip | Use lexy::dsl::ascii::space to skip all ASCII whitespace characters. |
Automatic whitespace skipping
For automatic whitespace skipping, lexy inserts a lexy::dsl::whitespace(ws) rule after every token rule, after a lexy::dsl::p or lexy::dsl::recurse rule that parses a production inheriting from lexy::token_production, and after a lexy::dsl::no_whitespace rule; it also inserts one when starting to parse a production that defines a new whitespace rule.
Here, ws is determined as follows:
1. If automatic whitespace skipping has been disabled (e.g. by using lexy::dsl::no_whitespace()), ws is the rule that matches the empty string. As such, no automatic whitespace skipping takes place.
2. If lexy::production_whitespace for the current production and the whitespace production is non-void, ws is that rule. Here, the whitespace production is determined by following any lexy::dsl::p or lexy::dsl::recurse calls backwards, until a production that defines a ::whitespace member, the top-level production originally passed to a parse function, or a production inheriting from lexy::token_production is reached.
3. Otherwise (if it is void), ws is the rule that matches the empty string and no whitespace skipping takes place.
// An inner production that does not override the whitespace.
struct inner_normal
{
    // After every token in this rule, the whitespace is '+',
    // as determined by its root production `production`.
    static constexpr auto rule //
        = dsl::parenthesized(LEXY_LIT("inner") + LEXY_LIT("normal"));
};

// An inner production that overrides the current whitespace definition.
struct inner_override
{
    static constexpr auto whitespace = dsl::lit_c<'-'>;

    // After every token in this rule, the whitespace is '-',
    // as determined by the `whitespace` member of the current production.
    static constexpr auto rule //
        = dsl::parenthesized(LEXY_LIT("inner") + LEXY_LIT("override"));
};

// A token production that does not have inner whitespace.
struct inner_token : lexy::token_production
{
    struct inner_inner
    {
        // No whitespace is skipped here, as its root production is `inner_token`,
        // which does not have a `whitespace` member.
        static constexpr auto rule = LEXY_LIT("inner") + LEXY_LIT("token");
    };

    // No whitespace is skipped here, as the current production inherits from
    // `lexy::token_production`.
    static constexpr auto rule = dsl::parenthesized(dsl::p<inner_inner>);
};

// A token production that does have inner whitespace, but a different one.
struct inner_token_whitespace : lexy::token_production
{
    struct inner_inner
    {
        // After every token in this rule, the whitespace is '_',
        // as determined by its root production `inner_token_whitespace`.
        static constexpr auto rule //
            = LEXY_LIT("inner") + LEXY_LIT("token") + LEXY_LIT("whitespace");
    };

    static constexpr auto whitespace = dsl::lit_c<'_'>;
    static constexpr auto rule       = dsl::parenthesized(dsl::p<inner_inner>);
};

// The root production defines whitespace.
struct production
{
    static constexpr auto whitespace = dsl::lit_c<'+'>;

    // After every token in this rule, the whitespace is '+',
    // as determined by the `whitespace` member of the current production.
    // Whitespace is also skipped after the two token productions.
    static constexpr auto rule
        = dsl::p<inner_normal> + dsl::comma + dsl::p<inner_override> + dsl::comma
          + dsl::p<inner_token> + dsl::comma + dsl::p<inner_token_whitespace> //
          + dsl::period + dsl::eof;
};
Caution | If e.g. a token production defines a new whitespace rule, this is skipped after the last token of the production. Then the whitespace rule of the parent production is skipped as well, as seen in the example. |
Tip | Use lexy::dsl::ascii::space to skip all ASCII whitespace characters. |
Rule lexy::dsl::no_whitespace
lexy/dsl/whitespace.hpp
namespace lexy::dsl
{
    constexpr rule no_whitespace(rule auto rule);
    constexpr branch-rule no_whitespace(branch-rule auto rule);
}
no_whitespace is a rule that parses rule without automatic whitespace skipping.
- (Branch) Parsing
  Parses rule in a context where there is no current whitespace rule and lexy::dsl::whitespace does nothing.
- Errors
  All errors raised by rule. The rule then fails if rule has failed.
- Values
  All values produced by rule.
Tip | In most situations, you should prefer a lexy::token_production instead.
no_whitespace is mostly used as an implementation detail for rules that should never have whitespace skipping, like lexy::dsl::delimited. |
Caution | When rule contains a lexy::dsl::p or lexy::dsl::recurse rule, whitespace skipping is re-enabled while parsing that production. |