Header lexy/dsl/identifier.hpp
The identifier and keyword rules.
Rule lexy::dsl::identifier
lexy/dsl/identifier.hpp
namespace lexy
{
    struct reserved_identifier {};
}
namespace lexy::dsl
{
    struct identifier-dsl // models branch-rule
    {
        //=== modifiers ===//
        constexpr identifier-dsl reserve(auto ... rules) const;
        constexpr identifier-dsl reserve_prefix(auto ... rules) const;
        constexpr identifier-dsl reserve_containing(auto ... rules) const;
        constexpr identifier-dsl reserve_suffix(auto ... rules) const;
        //=== sub-rules ===//
        constexpr token-rule auto pattern() const;
        constexpr token-rule auto leading_pattern() const;
        constexpr token-rule auto trailing_pattern() const;
    };
    constexpr identifier-dsl identifier(char-class-rule auto leading,
                                        char-class-rule auto trailing);
    constexpr identifier-dsl identifier(char-class-rule auto c)
    {
        return identifier(c, c);
    }
}
identifier is a rule that parses an identifier.
It can be created using two overloads.
The first overload takes a char class rule that matches the leading character of the identifier, and one that matches all trailing characters after the first.
The second overload takes just one char class rule and uses it for both the leading and the trailing characters.
- Requires
The encoding of the input is a char encoding.
- Parsing
Matches and consumes the token .pattern() (see below). Then verifies that the lexeme formed from .pattern() (excluding any trailing whitespace) is not reserved (see below).
- Branch parsing
Tries to match and consume the token .pattern() (see below), backtracking if that fails. Otherwise it checks for reserved identifiers and backtracks if it was reserved. As such, branch parsing only raises errors due to the implicit whitespace skipping.
- Errors
All errors raised by .pattern(). The rule then fails if not during branch parsing.
lexy::reserved_identifier: if the identifier is reserved; its range covers the identifier. The rule then recovers.
- Values
A single lexy::lexeme that is the parsed identifier (excluding any trailing whitespace).
- Parse tree
The single token node created by .pattern() (see below). Its kind cannot be overridden.
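The difference between regular parsing and branch parsing described above can be sketched with a small model in plain C++. This is not lexy code: `outcome`, `is_word_char` and `parse_identifier_model` are hypothetical names, the reserved words are hard-coded, the same character class is used for leading and trailing characters, and whitespace skipping is ignored.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical outcomes of the model, not lexy types.
enum class outcome
{
    matched,            // identifier parsed and not reserved
    failed,             // pattern did not match during regular parsing
    reserved_recovered, // reserved identifier reported, rule recovers
    backtracked         // branch parsing gave up without raising an error
};

// Assumed character class: ASCII letters and underscore.
bool is_word_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

outcome parse_identifier_model(std::string_view input, bool as_branch)
{
    // Assumed reserved words, standing in for a .reserve(...) call.
    const std::string_view reserved[] = {"int", "struct"};

    // Step 1: match the basic pattern.
    std::size_t length = 0;
    while (length < input.size() && is_word_char(input[length]))
        ++length;
    if (length == 0)
        return as_branch ? outcome::backtracked : outcome::failed;

    // Step 2: check the lexeme against the reserved words.
    const std::string_view lexeme = input.substr(0, length);
    for (std::string_view word : reserved)
    {
        if (lexeme == word)
            return as_branch ? outcome::backtracked : outcome::reserved_recovered;
    }
    return outcome::matched;
}
```

In this model, `parse_identifier_model("int", false)` reports a reserved identifier and recovers, while `parse_identifier_model("int", true)` backtracks silently, mirroring the two bullet points above.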
Tip | Use the character classes from lexy::dsl::ascii for simple identifier matching as seen in the example. |
Tip | Use the callback lexy::as_string to convert the lexy::lexeme to a string. |
Reserving identifiers
lexy/dsl/identifier.hpp
constexpr identifier-dsl reserve(auto ... rules) const; (1)
constexpr identifier-dsl reserve_prefix(auto ... rules) const; (2)
constexpr identifier-dsl reserve_containing(auto ... rules) const; (3)
constexpr identifier-dsl reserve_suffix(auto ... rules) const; (4)
Reserves an identifier.
Initially, no identifier is reserved.
Identifiers are reserved by calling .reserve() or its variants, passing it a literal rule or lexy::dsl::literal_set.
If this has happened, parsing the identifier rule creates a partial input from the lexeme and matches it against the specified rules as follows:
(1) .reserve(): All rules specified here are matched against the partial input. If they match the entire partial input, the identifier is reserved.
(2) .reserve_prefix(): All rules specified here are matched against the partial input. If they match a prefix of the partial input, the identifier is reserved.
(3) .reserve_containing(): All rules specified here are matched against the partial input. If they match somewhere in the partial input, the identifier is reserved.
(4) .reserve_suffix(): All rules specified here are matched against the partial input. If they match a suffix of the partial input, the identifier is reserved.
If one rule passed to a .reserve() call or variant uses case folding (e.g. lexy::dsl::ascii::case_folding), all other rules in the same call also use that case folding, but not rules in a different call.
This is because internally each call creates a fresh lexy::dsl::literal_set, which has that behavior.
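To make the four matching modes concrete, here is a minimal model in plain C++ that uses only the standard library. It is not how lexy implements reservation: `reserve_mode` and `is_reserved` are hypothetical names, a single literal stands in for the rules passed to a call, and case folding is not modeled.

```cpp
#include <string_view>

// Hypothetical enum mirroring the four modifiers.
enum class reserve_mode
{
    exact,      // .reserve()
    prefix,     // .reserve_prefix()
    containing, // .reserve_containing()
    suffix      // .reserve_suffix()
};

// Returns true if `identifier` (the lexeme) is reserved by `literal`
// under the given mode.
bool is_reserved(std::string_view identifier, std::string_view literal, reserve_mode mode)
{
    switch (mode)
    {
    case reserve_mode::exact:
        // The literal must match the entire lexeme.
        return identifier == literal;
    case reserve_mode::prefix:
        // substr clamps the length, so a too-short lexeme never matches.
        return identifier.substr(0, literal.size()) == literal;
    case reserve_mode::containing:
        return identifier.find(literal) != std::string_view::npos;
    case reserve_mode::suffix:
        return identifier.size() >= literal.size()
               && identifier.substr(identifier.size() - literal.size()) == literal;
    }
    return false;
}
```

For example, `is_reserved("integer", "int", reserve_mode::exact)` is false because the literal covers only part of the lexeme, whereas the same arguments with `reserve_mode::prefix` yield true.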
struct production
{
    static constexpr auto rule = [] {
        // Define the general identifier syntax.
        auto head = dsl::ascii::alpha_underscore;
        auto tail = dsl::ascii::alpha_digit_underscore;
        auto id   = dsl::identifier(head, tail);
        // Define some keywords.
        auto kw_int    = LEXY_KEYWORD("int", id);
        auto kw_struct = LEXY_KEYWORD("struct", id);
        // ...
        // Parse an identifier
        return id
            // ... that is not a keyword,
            .reserve(kw_int, kw_struct)
            // ... doesn't start with an underscore,
            .reserve_prefix(dsl::lit_c<'_'>)
            // ... and doesn't contain a double underscore.
            .reserve_containing(LEXY_LIT("__"));
    }();
};
struct production
{
    static constexpr auto rule = [] {
        // Define the general identifier syntax.
        auto head = dsl::ascii::alpha_underscore;
        auto tail = dsl::ascii::alpha_digit_underscore;
        auto id   = dsl::identifier(head, tail);
        // Define some case insensitive keywords.
        auto kw_int    = dsl::ascii::case_folding(LEXY_KEYWORD("int", id));
        auto kw_struct = dsl::ascii::case_folding(LEXY_KEYWORD("struct", id));
        // ...
        // Parse an identifier that is not a keyword.
        return id.reserve(kw_int, kw_struct);
    }();
};
Caution | The identifier rule doesn’t magically learn about the keywords you have created.
They are only reserved if you actually pass them to .reserve().
This design allows you to use a different set of reserved identifiers in different places in the grammar. |
Token rule .pattern()
lexy/dsl/identifier.hpp
constexpr token-rule auto pattern() const;
.pattern() is a token rule that matches the basic form of the identifier without checking for reserved identifiers.
- Matching
Matches and consumes leading, then matches and consumes lexy::dsl::while_(trailing), where leading and trailing are the arguments passed to identifier(). Whitespace skipping is disabled inside the pattern(), but it will be skipped after pattern().
- Errors
All errors raised by matching leading. The rule then fails.
- Parse tree
A single token node whose range covers everything consumed. Its lexy::predefined_token_kind is lexy::identifier_token_kind.
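The matching behavior of the pattern (one leading character, then as many trailing characters as possible) can be modeled in plain C++ as follows. This is only a sketch under stated assumptions: `is_alpha_underscore`, `is_alpha_digit_underscore` and `match_pattern` are hypothetical helpers standing in for ASCII character classes, and whitespace skipping after the pattern is not modeled.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the `leading` char class (ASCII only).
bool is_alpha_underscore(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Hypothetical stand-in for the `trailing` char class (ASCII only).
bool is_alpha_digit_underscore(char c)
{
    return is_alpha_underscore(c) || (c >= '0' && c <= '9');
}

// Returns the number of characters consumed at the start of `input`,
// or 0 if the leading character does not match (the failure case).
std::size_t match_pattern(std::string_view input)
{
    // Match `leading` exactly once.
    if (input.empty() || !is_alpha_underscore(input[0]))
        return 0;

    // Then match `trailing` zero or more times, like a while loop.
    std::size_t length = 1;
    while (length < input.size() && is_alpha_digit_underscore(input[length]))
        ++length;
    return length;
}
```

Only the leading character can cause a failure here; once it has matched, the loop over trailing characters always succeeds, possibly consuming nothing.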
Token rules .leading_pattern(), .trailing_pattern()
lexy/dsl/identifier.hpp
constexpr token-rule auto leading_pattern() const;
constexpr token-rule auto trailing_pattern() const;
They simply return leading/trailing from the arguments passed to identifier().
Literal rule lexy::dsl::keyword
lexy/dsl/identifier.hpp
namespace lexy::dsl
{
    template <auto Char>
    constexpr literal-rule auto keyword(identifier-dsl identifier);
    template <auto Str>
    constexpr literal-rule auto keyword(identifier-dsl identifier);
}
#define LEXY_KEYWORD(Str, Identifier) lexy::dsl::keyword<Str>(Identifier)
keyword is a literal rule that matches a keyword.
- Matching
Tries to match and consume identifier.pattern(), i.e. the basic pattern of an identifier ignoring any reserved identifiers. Then creates a partial input that covers everything just consumed (without the trailing whitespace) and matches lexy::dsl::lit<Str> on that input. Succeeds only if that consumes the entire partial input.
- Errors
lexy::expected_keyword: if either identifier.pattern() or the lit rule failed. Its range covers everything consumed by identifier.pattern() and its .string() is Str.
- Parse tree
Single token node with the lexy::predefined_token_kind lexy::literal_token_kind.
The macro LEXY_KEYWORD(Str, Identifier) is equivalent to keyword<Str>(Identifier), except that it also works on older compilers that do not support C++20’s extended NTTPs.
Use this instead of keyword<Str>(identifier) if you need to support them.
Note | While lexy::dsl::lit<"int"> would happily consume a prefix of "integer", keyword<"int">(id), for a matching id, would not. |
Note | A keyword does not necessarily need to be a reserved identifier or vice-versa. |
Note | The encoding caveats of literal rules apply here as well. |
Tip | Use lexy::dsl::ascii::case_folding or its Unicode variants to parse a case insensitive keyword. |
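The difference between a plain literal and a keyword noted above can be illustrated with a small model in plain C++. This is not lexy's implementation: `is_identifier_char`, `match_literal` and `match_keyword` are hypothetical helpers, the identifier uses one ASCII character class for both leading and trailing characters, and whitespace and error reporting are left out.

```cpp
#include <cstddef>
#include <string_view>

// Assumed identifier character class: ASCII letters, digits, underscore.
bool is_identifier_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
           || (c >= '0' && c <= '9') || c == '_';
}

// Models a plain literal: succeeds if `input` starts with `str`.
// Returns the number of characters consumed, or 0 on failure.
std::size_t match_literal(std::string_view input, std::string_view str)
{
    return input.substr(0, str.size()) == str ? str.size() : 0;
}

// Models a keyword: first consume the whole identifier pattern, then
// require that the literal matches all of what was consumed.
std::size_t match_keyword(std::string_view input, std::string_view str)
{
    std::size_t length = 0;
    while (length < input.size() && is_identifier_char(input[length]))
        ++length;
    return input.substr(0, length) == str ? length : 0;
}
```

With these helpers, `match_literal("integer", "int")` consumes three characters, while `match_keyword("integer", "int")` fails because the identifier pattern consumed all of "integer", which the literal does not cover entirely.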