Header lexy/dsl/identifier.hpp

The identifier and keyword rules.

Rule lexy::dsl::identifier

lexy/dsl/identifier.hpp
namespace lexy
{
    struct reserved_identifier {};
}

namespace lexy::dsl
{
    struct identifier-dsl // models branch-rule
    {
        //=== modifiers ===//
        constexpr identifier-dsl reserve(rule auto ... rules) const;
        constexpr identifier-dsl reserve_prefix(rule auto ... rules) const;
        constexpr identifier-dsl reserve_containing(rule auto ... rules) const;

        //=== sub-rules ===//
        constexpr token-rule auto pattern() const;

        constexpr token-rule auto leading_pattern() const;
        constexpr token-rule auto trailing_pattern() const;
    };

    constexpr identifier-dsl identifier(token-rule auto leading,
                                        token-rule auto trailing);

    constexpr identifier-dsl identifier(token-rule auto pattern)
    {
        return identifier(pattern, pattern);
    }

}

identifier is a rule that parses an identifier.

It can be created using two overloads. The first overload takes a token rule that matches the leading character of the identifier, and one that matches all trailing characters after the first. The second overload takes just one token rule and uses it for both the leading and the trailing characters.

Parsing

Matches and consumes the token .pattern() (see below). Then verifies that the lexeme formed from .pattern(), excluding any trailing whitespace, is not reserved (see below).

Branch parsing

Tries to match and consume the token .pattern() (see below), backtracking if that fails. Otherwise it checks for reserved identifiers but does not backtrack when one is encountered.

Errors
  • All errors raised by .pattern(). The rule then fails, unless it is being parsed as a branch, in which case it backtracks instead.

  • lexy::reserved_identifier: if the identifier is reserved; its range covers the identifier. The rule then recovers.
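The difference between backtracking and raising an error can be modelled in plain C++. The following is only a sketch under simplifying assumptions: it does not use lexy, the names outcome, is_alpha and parse_identifier_branch are hypothetical, the pattern is reduced to a run of ASCII letters, and a single reserved word stands in for the rules passed to .reserve().

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical model of the three possible results of branch parsing.
enum class outcome
{
    backtracked,    // pattern did not match, nothing consumed, branch not taken
    ok,             // pattern matched and the lexeme is not reserved
    reserved_error, // pattern matched, but the lexeme is reserved
};

// Simplified stand-in for the leading and trailing character class.
constexpr bool is_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

constexpr outcome parse_identifier_branch(std::string_view input,
                                          std::string_view reserved_word)
{
    // Step 1: try to match the pattern. On failure the branch is not taken.
    std::size_t length = 0;
    while (length < input.size() && is_alpha(input[length]))
        ++length;
    if (length == 0)
        return outcome::backtracked;

    // Step 2: the branch has been taken. A reserved lexeme is reported as an
    // error (lexy::reserved_identifier in lexy); it does not cause backtracking.
    if (input.substr(0, length) == reserved_word)
        return outcome::reserved_error;
    return outcome::ok;
}

static_assert(parse_identifier_branch("123", "int") == outcome::backtracked);
static_assert(parse_identifier_branch("foo", "int") == outcome::ok);
static_assert(parse_identifier_branch("int", "int") == outcome::reserved_error);
```

In this model a reserved identifier inside a choice does not cause the next alternative to be tried, because the branch condition (the pattern) has already succeeded.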

Values

A single lexy::lexeme that is the parsed identifier (excluding any trailing whitespace).

Parse tree

The single token node created by .pattern() (see below). Its kind cannot be overridden.

Example 1. Parse a C-like identifier
struct production
{
    static constexpr auto rule = [] {
        auto head = dsl::ascii::alpha_underscore;
        auto tail = dsl::ascii::alpha_digit_underscore;
        return dsl::identifier(head, tail);
    }();
};
Tip
Use the character classes from lexy::dsl::ascii for simple identifier matching as seen in the example.
Tip
Use the callback lexy::as_string to convert the lexy::lexeme to a string.

Reserving identifiers

lexy/dsl/identifier.hpp
constexpr identifier-dsl reserve(rule auto ... rules) const; (1)
constexpr identifier-dsl reserve_prefix(rule auto ... rules) const; (2)
constexpr identifier-dsl reserve_containing(rule auto ... rules) const; (3)

Reserves an identifier.

Initially, no identifier is reserved. Identifiers are reserved by calling .reserve() or its variants. Once at least one of them has been called, parsing the identifier rule creates a partial input from the lexeme and matches it against the specified rules as follows:

  • (1) .reserve(): All rules specified here are matched against the partial input. If they match the entire partial input, the identifier is reserved. This is comparable to lexy::dsl::operator-.

  • (2) .reserve_prefix(): All rules specified here are matched against the partial input. If they match a prefix of the partial input, the identifier is reserved. This is comparable to lexy::dsl::operator- with lexy::dsl::any.

  • (3) .reserve_containing(): All rules specified here are matched against the partial input. If they match somewhere in the partial input, the identifier is reserved. This is comparable to lexy::dsl::find.
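The three modes differ only in how much of the lexeme the reserved rule has to cover. As a rough illustration, here is a sketch in plain C++ that does not use lexy: the helpers reserved_whole, reserved_prefix and reserved_containing are hypothetical names, and a fixed string stands in for the rules, whereas lexy accepts arbitrary rules.

```cpp
#include <string_view>

// (1) Models .reserve(): the reserved text must match the entire lexeme.
constexpr bool reserved_whole(std::string_view lexeme, std::string_view reserved)
{
    return lexeme == reserved;
}

// (2) Models .reserve_prefix(): the reserved text must match a prefix of the lexeme.
constexpr bool reserved_prefix(std::string_view lexeme, std::string_view reserved)
{
    // substr() clamps the length, so a lexeme shorter than `reserved` never matches.
    return lexeme.substr(0, reserved.size()) == reserved;
}

// (3) Models .reserve_containing(): the reserved text must match somewhere in the lexeme.
constexpr bool reserved_containing(std::string_view lexeme, std::string_view reserved)
{
    return lexeme.find(reserved) != std::string_view::npos;
}

static_assert(reserved_whole("int", "int"));
static_assert(!reserved_whole("integer", "int"));
static_assert(reserved_prefix("_tmp", "_"));
static_assert(!reserved_prefix("tmp_", "_"));
static_assert(reserved_containing("a__b", "__"));
static_assert(!reserved_containing("a_b", "__"));
```

Note that "integer" is not reserved by the whole-lexeme check even though it starts with "int"; it would only be rejected if "int" were passed to the prefix or containing variant.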

Example 2. Parse a C-like identifier that is not reserved
struct production
{
    static constexpr auto rule = [] {
        // Define the general identifier syntax.
        auto head = dsl::ascii::alpha_underscore;
        auto tail = dsl::ascii::alpha_digit_underscore;
        auto id   = dsl::identifier(head, tail);

        // Define some keywords.
        auto kw_int    = LEXY_KEYWORD("int", id);
        auto kw_struct = LEXY_KEYWORD("struct", id);
        // ...

        // Parse an identifier
        return id
            // ... that is not a keyword,
            .reserve(kw_int, kw_struct)
            // ... does not start with an underscore,
            .reserve_prefix(dsl::lit_c<'_'>)
            // ... and does not contain a double underscore.
            .reserve_containing(LEXY_LIT("__"));
    }();
};
Caution
The identifier rule doesn’t magically learn about the keywords you have created. They are only reserved if you actually pass them to .reserve(). This design allows you to use a different set of reserved identifiers in different places in the grammar.
Note
The common case of passing keywords or literals to .reserve() is optimized using a trie.

Token rule .pattern()

lexy/dsl/identifier.hpp
constexpr token-rule auto pattern() const;

.pattern() is a token rule that matches the basic form of the identifier without checking for reserved identifiers.

Matching

Matches and consumes leading, then matches and consumes lexy::dsl::while_(trailing), where leading and trailing are the arguments passed to identifier(). Whitespace skipping is disabled inside .pattern(), but whitespace is skipped after it.

Errors

All errors raised by matching leading. The rule then fails.

Parse tree

A single token node whose range covers everything consumed. Its lexy::predefined_token_kind is lexy::identifier_token_kind.
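The matching behavior of .pattern() can be approximated by a small hand-written loop. This is a sketch in plain C++ rather than lexy code: is_leading and is_trailing are hypothetical stand-ins for dsl::ascii::alpha_underscore and dsl::ascii::alpha_digit_underscore, and match_pattern is a hypothetical helper that returns how many characters would be consumed.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the `leading` token rule.
constexpr bool is_leading(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Hypothetical stand-in for the `trailing` token rule.
constexpr bool is_trailing(char c)
{
    return is_leading(c) || (c >= '0' && c <= '9');
}

// Returns the number of characters the pattern consumes,
// or 0 if the leading character does not match (the pattern fails).
constexpr std::size_t match_pattern(std::string_view input)
{
    if (input.empty() || !is_leading(input[0]))
        return 0;

    // Equivalent in spirit to while_(trailing): consume as long as it matches.
    // No whitespace is skipped inside the loop.
    std::size_t length = 1;
    while (length < input.size() && is_trailing(input[length]))
        ++length;
    return length;
}

static_assert(match_pattern("foo_1 bar") == 5); // stops at the space
static_assert(match_pattern("1abc") == 0);      // leading character fails
```

Only the leading character can make the pattern fail; the trailing loop always succeeds, possibly consuming nothing.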

Token rules .leading_pattern(), .trailing_pattern()

lexy/dsl/identifier.hpp
constexpr token-rule auto leading_pattern() const;
constexpr token-rule auto trailing_pattern() const;

They return the leading and trailing token rules, respectively, that were passed to identifier().

Token rule lexy::dsl::keyword

lexy/dsl/identifier.hpp
namespace lexy::dsl
{
    template <auto Str>
    constexpr token-rule auto keyword(identifier-dsl identifier);
}

#define LEXY_KEYWORD(Str, Identifier) lexy::dsl::keyword<Str>(Identifier)

keyword is a token rule that matches a keyword.

Matching

Tries to match and consume identifier.pattern(), i.e. the basic pattern of an identifier ignoring any reserved identifiers. Then creates a partial input that covers everything just consumed (without the trailing whitespace) and matches lexy::dsl::lit<Str> on that input. Succeeds only if that consumes the entire partial input.
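To see why matching the literal against the complete identifier matters, consider the following sketch in plain C++. It does not use lexy; match_identifier, match_literal and match_keyword are hypothetical helpers, and the identifier pattern is simplified to a run of ASCII letters and underscores.

```cpp
#include <cstddef>
#include <string_view>

// Simplified identifier character class (an assumption of this sketch).
constexpr bool is_word_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
}

// Length of the identifier at the start of the input, 0 if there is none.
constexpr std::size_t match_identifier(std::string_view input)
{
    std::size_t length = 0;
    while (length < input.size() && is_word_char(input[length]))
        ++length;
    return length;
}

// Plain literal matching: succeeds if the input merely starts with the text.
constexpr bool match_literal(std::string_view input, std::string_view text)
{
    return input.substr(0, text.size()) == text;
}

// Keyword matching: first consume the whole identifier, then require that
// the literal covers all of it.
constexpr bool match_keyword(std::string_view input, std::string_view keyword)
{
    const std::size_t length = match_identifier(input);
    return length != 0 && input.substr(0, length) == keyword;
}

static_assert(match_literal("integer", "int"));  // literal accepts the prefix
static_assert(!match_keyword("integer", "int")); // keyword rejects it
static_assert(match_keyword("int x", "int"));    // whole identifier is "int"
```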

Errors

lexy::expected_keyword: if either identifier.pattern() or the lit rule failed. Its range covers everything consumed by identifier.pattern() and its .string() is Str.

The macro LEXY_KEYWORD(Str, Identifier) is equivalent to keyword<Str>(Identifier), except that it also works on older compilers that do not support C++20’s extended NTTPs. Use it instead of keyword<Str>(Identifier) if you need to support such compilers.

Example 3. Parse a keyword
struct production
{
    static constexpr auto rule = [] {
        // Define the general identifier syntax.
        auto head = dsl::ascii::alpha_underscore;
        auto tail = dsl::ascii::alpha_digit_underscore;
        auto id   = dsl::identifier(head, tail);

        // Parse a keyword.
        return LEXY_KEYWORD("int", id);
    }();
};
Note
While lexy::dsl::lit<"int"> would happily consume a prefix of "integer", keyword<"int">(id), for a matching id, would not.
Note
A keyword does not necessarily need to be a reserved identifier or vice-versa.
Note
The encoding caveats of lexy::dsl::lit apply here as well.

See also