Header lexy/token.hpp

Identifying and storing tokens of the input.

Enum lexy::predefined_token_kind

lexy/token.hpp
namespace lexy
{
    enum predefined_token_kind
    {
        unknown_token_kind,

        error_token_kind,
        whitespace_token_kind,
        any_token_kind,

        literal_token_kind,
        position_token_kind,
        eof_token_kind,

        identifier_token_kind,
        digits_token_kind,
    };
}

Predefined token kinds for special token rules, as given in the table below.

The predefined token kinds:

* lexy::unknown_token_kind: all token rules by default
* lexy::error_token_kind: tokens produced during the "discard input" phase of error recovery, e.g. by lexy::dsl::find or lexy::dsl::recover
* lexy::whitespace_token_kind: lexy::dsl::whitespace (not actually a token rule)
* lexy::any_token_kind: lexy::dsl::any, lexy::dsl::code_point (without predicate), lexy::dsl::until
* lexy::literal_token_kind: lexy::dsl::lit, lexy::dsl::code_point (literal version), and other tokens that are fully identified by their spelling
* lexy::position_token_kind: lexy::dsl::position (not actually a token rule)
* lexy::eof_token_kind: lexy::dsl::eof
* lexy::identifier_token_kind: lexy::dsl::identifier and lexy::dsl::symbol
* lexy::digits_token_kind: lexy::dsl::digit, lexy::dsl::digits, and other rules parsing digits

Class lexy::token_kind

lexy/token.hpp
namespace lexy
{
    template <typename TokenKind = void>
    class token_kind
    {
        using underlying-type
          = std::conditional_t<std::is_void_v<TokenKind>, int, TokenKind>;

    public:
        //=== constructors ===//
        constexpr token_kind() noexcept;
        constexpr token_kind(predefined_token_kind value) noexcept;

        constexpr token_kind(underlying-type value) noexcept;

        constexpr token_kind(token-rule auto token_rule) noexcept;

        //=== access ===//
        constexpr explicit operator bool() const noexcept;
        constexpr bool is_predefined() const noexcept;
        constexpr bool ignore_if_empty() const noexcept;

        constexpr underlying-type get() const noexcept;

        constexpr const char* name() const noexcept;

        static constexpr std::uint_least16_t to_raw(token_kind kind) noexcept;
        static constexpr token_kind from_raw(std::uint_least16_t kind) noexcept;

        friend constexpr bool operator==(token_kind lhs, token_kind rhs) noexcept;
        friend constexpr bool operator!=(token_kind lhs, token_kind rhs) noexcept;
    };
}

Identifies a token rule.

It either stores a lexy::predefined_token_kind or a user-defined token kind given by TokenKind. If TokenKind is void (the default), it assumes a user-defined token kind of type int. Otherwise, TokenKind must be an enumeration type.

Token rules are associated with their kind using .kind or by specializing lexy::token_kind_map_for. Some token rules that require special behavior in the parse tree have a lexy::predefined_token_kind; for all others, the token kind is unknown by default.

Internally, all values are stored as a std::uint_least16_t.

Constructors

lexy/token.hpp
constexpr token_kind() noexcept
: token_kind(lexy::unknown_token_kind)
{}

constexpr token_kind(predefined_token_kind value) noexcept;

Initialize with the given lexy::predefined_token_kind .

lexy/token.hpp
constexpr token_kind(underlying-type value) noexcept;

template <typename T>
    requires std::is_enum_v<T>
token_kind(T value)                  -> token_kind<T>;
token_kind(std::integral auto value) -> token_kind<void>;

Initialize with the given user-defined token kind. If TokenKind is void, it accepts an int; otherwise, TokenKind itself. value must fit in a 15-bit unsigned integer.

If CTAD is used and the argument is an integer, deduces void for TokenKind. Otherwise, deduces the enumeration type of the argument.

lexy/token.hpp
constexpr token_kind(token-rule auto token_rule) noexcept;

Initialize with the token kind of the given token rule.

This is determined as follows:

  1. If the specialization lexy::token_kind_map_for<TokenKind> contains a token kind for token_rule, uses that.

  2. Otherwise, if token_rule has been assigned a lexy::predefined_token_kind  by lexy, uses that.

  3. Otherwise, if token_rule has been assigned a user-defined token kind by .kind , whose type is compatible, uses that. If TokenKind == void, a user-defined token kind is compatible if it is an integral value; else, a user-defined token kind is compatible if it has the same enumeration type.

  4. Otherwise, uses lexy::unknown_token_kind.

Cases 2 and 3 are subject to the same range restrictions as the constructor that takes a user-defined value directly.

Access

lexy/token.hpp
constexpr explicit operator bool() const noexcept; (1)

constexpr bool is_predefined() const noexcept;     (2)

constexpr bool ignore_if_empty() const noexcept;   (3)
  1. Returns true if the token kind is not lexy::unknown_token_kind, false otherwise.

  2. Returns true if the token kind is predefined (including lexy::unknown_token_kind), false otherwise.

  3. Returns true if an empty token of that kind should be ignored by lexy::parse_tree  and related, false otherwise. It currently returns true for lexy::unknown_token_kind, lexy::error_token_kind, lexy::whitespace_token_kind.

lexy/token.hpp
constexpr underlying-type get() const noexcept;

Returns the value of the token kind.

If TokenKind is void, the return type is int. Otherwise, it is TokenKind.

If the token kind is user-defined, returns its value unchanged. If the token kind is predefined, returns an implementation-defined value. This value is guaranteed to uniquely identify the predefined token kind and distinguish it from all user-defined token kinds, but it must not be passed to the constructor taking a user-defined token kind.

lexy/token.hpp
constexpr const char* name() const noexcept;

Returns the name of the token kind.

If the token kind is lexy::unknown_token_kind, the name is "token". If the token kind is some other predefined token kind, the name is a nice version of the enumeration name (e.g. "EOF" for lexy::eof_token_kind). If the token kind is user-defined and the ADL call token_kind_name(get()) resolves to a const char*, returns that. Otherwise, returns "token" for user-defined token kinds.

Note
ADL only works if TokenKind is an enumeration type and not void.

lexy::token_kind_map

lexy/token.hpp
namespace lexy
{
    class token-kind-map
    {
    public:
        template <auto TokenKind>
        consteval token-kind-map map(token-rule auto token_rule) const;
    };

    constexpr auto token_kind_map = token-kind-map();

    template <typename TokenKind>
    constexpr auto token_kind_map_for = token_kind_map;
}

Defines a compile-time mapping of token rules to a user-defined TokenKind enum.

It is initially empty. A mapping is added by calling .map(), which associates TokenKind with token_rule; the result is a map that contains this mapping in addition to all previous mappings. All TokenKind values in one map must have the same enumeration type.

The mapping is associated with the user-defined TokenKind enum by specializing token_kind_map_for; the default specialization is the empty mapping for all token kinds. This specialization is used by the lexy::token_kind  constructor that takes a token rule.

Example 1. Associate custom token kinds with the default playground example
enum class my_token_kind
{
    greeting,
    exclamation_mark,
};

template <>
constexpr auto lexy::token_kind_map_for<my_token_kind>
    // Start with the empty map.
    = lexy::token_kind_map
          // Map the greeting token.
          .map<my_token_kind::greeting>(LEXY_LIT("Hello"))
          // Map the exclamation token.
          .map<my_token_kind::exclamation_mark>(dsl::exclamation_mark);
Caution
Token rules are identified based on type. If two token rules are equivalent but have different types, their token kind is not going to be picked up.
Tip
It is usually better to specify the token kind inline in the grammar using .kind .

Class lexy::token

lexy/token.hpp
namespace lexy
{
    template <reader Reader, typename TokenKind = void>
    class token
    {
    public:
        using encoding  = typename Reader::encoding;
        using char_type = typename encoding::char_type;
        using iterator  = typename Reader::iterator;

        //=== constructors ===//
        explicit constexpr token(token_kind<TokenKind> kind,
                                 lexy::lexeme<Reader> lexeme) noexcept;
        explicit constexpr token(token_kind<TokenKind> kind,
                                 iterator begin, iterator end) noexcept;

        //=== access ===//
        constexpr token_kind<TokenKind> kind()   const noexcept;
        constexpr lexy::lexeme<Reader>  lexeme() const noexcept;

        constexpr const char* name() const noexcept
        {
            return kind().name();
        }

        constexpr iterator position() const noexcept
        {
            return lexeme().begin();
        }
    };

    template <input Input, typename TokenKind = void>
    using token_for = token<input_reader<Input>, TokenKind>;
}

Stores a token as a pair of lexy::token_kind  and lexy::lexeme .

A token is not to be confused with a token rule: the latter describes what sort of input constitutes a token (e.g. a sequence of decimal digits or the keyword int), while the former is the concrete realization of the rule (e.g. the number 123 at offset 10, or the keyword int at offset 23).

Constructors

lexy/token.hpp
explicit constexpr token(token_kind<TokenKind> kind,
                         lexy::lexeme<Reader> lexeme) noexcept;
explicit constexpr token(token_kind<TokenKind> kind,
                         iterator begin, iterator end) noexcept;

template <typename TokenKind, typename Reader>
token(token_kind<TokenKind>, lexy::lexeme<Reader>) -> token<Reader, TokenKind>;
template <typename T, typename Reader>
    requires std::is_enum_v<T>
token(T kind, lexy::lexeme<Reader>) -> token<Reader, T>;
template <typename Reader>
token(std::integral auto kind, lexy::lexeme<Reader>) -> token<Reader, void>;

Constructs the token from kind and lexeme.

If CTAD is used, the template parameters can be deduced from the arguments of the first overload.
