Header `lexy/dsl/literal.hpp`

Token rules that match exact characters.

lexy/dsl/literal.hpp

template <typename T>
concept literal-rule = token-rule<T> && …;

A literal rule is a special token rule that matches a specified sequence of code units. They include lexy::dsl::lit and variants, but also lexy::dsl::keyword.

Each literal rule has the same parsing behavior, unless otherwise specified:

Requires

The encoding of the input is a char encoding and compatible with the char type of the rule: Either the char type is the same as the char type of the encoding, or the literal rule matches only ASCII characters.
If it attempts to match a non-ASCII character, the encoding must be UTF-8, UTF-16, or UTF-32.

Matching

If the char type of the rule is the same as the char type of the encoding, compares each code unit in the sequence with the current code unit of the input in order. If they match, one code unit is consumed and the process is repeated.
If the char type of the rule is not the same as the char type of the encoding, the requirement above guarantees that all code units in the sequence are ASCII characters. They are transcoded to the target encoding by a static_cast, after which matching proceeds as in the case above.

Errors

lexy::expected_literal: if one code unit did not compare equal or the reader reached the end of the input. Its .string() is the code unit sequence, its .index() is the index of the code unit where the mismatch/missing one occurred, and its .position() is the reader position where it started to match the literal.
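The matching and error rules above can be modelled in a few lines of standard C++. This is a minimal sketch, not lexy's implementation: `match_literal` and `literal_match` are hypothetical names, and the struct only mirrors the information that lexy::expected_literal reports (the index of the mismatching or missing code unit). When the two char types differ, the literal is assumed to be ASCII-only and each code unit is converted with a static_cast, as described in Matching.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical result type (NOT lexy's API).
struct literal_match
{
    bool        success;
    std::size_t index;    // on failure: index of the mismatching/missing code unit
    std::size_t consumed; // on success: number of input code units consumed
};

// Compares the literal against the input one code unit at a time.
// If LitChar differs from InputChar, the literal is assumed to contain only
// ASCII characters, so a static_cast "transcodes" each code unit.
template <typename LitChar, typename InputChar>
literal_match match_literal(std::basic_string_view<LitChar>   literal,
                            std::basic_string_view<InputChar> input)
{
    for (std::size_t i = 0; i != literal.size(); ++i)
    {
        // Running out of input counts as a mismatch at index i.
        if (i == input.size())
            return {false, i, 0};
        if (static_cast<InputChar>(literal[i]) != input[i])
            return {false, i, 0};
    }
    return {true, 0, literal.size()};
}
```

For example, matching `Help` against `Hello` fails at index 3, and a `char16_t` literal `Hi` matches a `char` input `Hi!` by consuming two code units.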

Parse tree

Single token node with the lexy::predefined_token_kind lexy::literal_token_kind.

Note	As they are token rules, literal rules try to skip whitespace directly following the literal. Use `lexy::dsl::no_whitespace` to prevent that.

Caution

It is not checked whether the code unit sequence is a well-formed string (e.g. that it contains no ill-formed UTF-8), and no normalization or other equivalence checking is done while matching. It is also not checked whether the input contains actual well-formed code units, they are simply compared one by one.
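To make the lack of normalization concrete, the following sketch (plain standard C++, no lexy involved) spells "ä" in UTF-8 twice: once as the precomposed code point U+00E4 and once as `a` followed by U+0308 COMBINING DIAERESIS. The two look identical when rendered, but their code units differ, so a literal written in one form would not match input written in the other.

```cpp
#include <cassert>
#include <string_view>

// Two UTF-8 spellings of "ä" (written with explicit byte escapes).
constexpr std::string_view precomposed = "\xC3\xA4";  // U+00E4
constexpr std::string_view decomposed  = "a\xCC\x88"; // U+0061 U+0308

// A code-unit-by-code-unit comparison, as done by literal rules,
// treats them as different strings; they even have different lengths.
static_assert(precomposed != decomposed);
static_assert(precomposed.size() == 2 && decomposed.size() == 3);
```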

Literal rule `lexy::dsl::lit_c`

lexy/dsl/literal.hpp

namespace lexy::dsl
{
    template <auto C>
    constexpr literal-rule auto lit_c;
}

lit_c<C>, where C is a value of a character type, is a literal rule that matches the single code unit C.

Tip	Literals that match common `punctuators` are pre-defined.

Literal rule `lexy::dsl::lit`

lexy/dsl/literal.hpp

namespace lexy::dsl
{
    template <auto Str>
    constexpr literal-rule auto lit;
}

#define LEXY_LIT(Str) lexy::dsl::lit<Str>

lit is a literal rule that matches the specified sequence of code units.

The macro LEXY_LIT(Str) is equivalent to lit<Str>, except that it also works on older compilers that do not support C++20’s extended NTTPs. Use this instead of lit<Str> if you need to support them.

Example 1. Hello World!

struct production
{
    static constexpr auto rule = LEXY_LIT("Hello World!");
};

Example 2. A different character type, but only ASCII characters

struct production
{
    // The character type doesn't matter if it only contains ASCII characters.
    // The literal is encoded in UTF-16 whereas the (playground) input
    // is encoded in UTF-8, but as the literal contains only ASCII characters,
    // lexy will transcode it for you.
    static constexpr auto rule = LEXY_LIT(u"Hello World!");
};

Example 3. UTF-8 encoded string literal

struct production
{
    // The string literal contains UTF-8 text,
    // which means the input needs to be UTF-8 encoded as well.
    //
    // WARNING: This will only match if both agree on a normalization for 'ä'!
    static constexpr auto rule = LEXY_LIT(u8"ä");
};

Tip	When using non-ASCII characters in a `lit<Str>` rule, it is best to specify code points with `\uXXXX` escape sequences and to normalize the input before passing it to `lexy`.

Note	While `lit<"int">` would happily consume a prefix of `"integer"`, `lexy::dsl::keyword<"int">(id)`, for a matching `id`, would not. Similarly, `lit<"=">` would also consume a prefix of `==`; use `lexy::dsl::not_followed_by` to prevent that.
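The difference between a plain literal and a keyword can be approximated with ordinary string checks. The helpers below (`matches_lit`, `matches_keyword`, `matches_single_equal`, `is_identifier_char`) are hypothetical and only illustrate the idea; in particular, treating letters, digits and `_` as identifier characters is an assumption of this sketch, whereas lexy's keyword uses whatever identifier rule `id` you pass.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Like lit<"...">: succeeds if the input starts with the literal.
bool matches_lit(std::string_view lit, std::string_view input)
{
    return input.substr(0, lit.size()) == lit;
}

// Assumption: identifier characters are letters, digits and '_'.
bool is_identifier_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Rough model of keyword<"...">(id): the literal must match and must not
// be followed by another identifier character.
bool matches_keyword(std::string_view kw, std::string_view input)
{
    return matches_lit(kw, input)
           && (input.size() == kw.size() || !is_identifier_char(input[kw.size()]));
}

// Rough model of not_followed_by(lit<"=">, ...): accept "=" only if the
// next code unit is not another '='.
bool matches_single_equal(std::string_view input)
{
    return matches_lit("=", input) && (input.size() == 1 || input[1] != '=');
}
```

With these helpers, `matches_lit("int", "integer")` succeeds while `matches_keyword("int", "integer")` fails, and `matches_single_equal("==")` rejects the prefix that `matches_lit("=", "==")` would accept.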

Literal rule `lexy::dsl::lit_b`

lexy/dsl/literal.hpp

namespace lexy::dsl
{
    template <unsigned char ... C>
    constexpr literal-rule auto lit_b;
}

lit_b<C…> is a literal rule that matches the specified sequence of bytes.

Unless all bytes are also valid ASCII characters, it requires that the input encoding is lexy::byte_encoding.

Tip	Use `lexy::dsl::bom` to match a byte-order mark.

Literal rule `lexy::dsl::lit_cp`

lexy/dsl/literal.hpp

namespace lexy::dsl
{
    template <char32_t ... CodePoint>
    constexpr literal-rule auto lit_cp;
}

lit_cp<CodePoint…> is a literal rule that matches the specified sequence of code points, expressed as a sequence of code units in the encoding of the input.

It behaves identically to lexy::dsl::lit, where Str is determined by encoding all `CodePoint`s in the encoding of the input.
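To see what "encoding all `CodePoint`s in the encoding of the input" produces, here is a small standard C++ sketch that encodes a single code point as UTF-8 and as UTF-16. `encode_utf8` and `encode_utf16` are hypothetical helpers, not part of lexy, and they skip validation (surrogate code points and values above 0x10FFFF are not rejected).

```cpp
#include <cassert>
#include <string>

// Encodes one code point as UTF-8 code units (no validation).
std::string encode_utf8(char32_t cp)
{
    std::string result;
    if (cp < 0x80)
        result += static_cast<char>(cp);
    else if (cp < 0x800)
    {
        result += static_cast<char>(0xC0 | (cp >> 6));
        result += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        result += static_cast<char>(0xE0 | (cp >> 12));
        result += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        result += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        result += static_cast<char>(0xF0 | (cp >> 18));
        result += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        result += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        result += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return result;
}

// Encodes one code point as UTF-16 code units; code points above 0xFFFF
// become a surrogate pair (no validation).
std::u16string encode_utf16(char32_t cp)
{
    if (cp < 0x10000)
        return std::u16string(1, static_cast<char16_t>(cp));
    cp -= 0x10000;
    return {static_cast<char16_t>(0xD800 + (cp >> 10)),
            static_cast<char16_t>(0xDC00 + (cp & 0x3FF))};
}
```

For the smiley face U+1F642 used below, this yields the four UTF-8 code units `F0 9F 99 82` and the UTF-16 surrogate pair `D83D DE42`.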

Example 4. Match a smiley face

struct production
{
    static constexpr auto rule = dsl::lit_cp<0x1F642> + dsl::eof;
};

Token rule `lexy::dsl::literal_set`

lexy/dsl/literal.hpp

namespace lexy
{
    struct expected_literal_set {};
}

namespace lexy::dsl
{
    constexpr literal-set literal_set(literal-rule auto ... literals);

    template <typename T>
    constexpr literal-set literal_set(symbol-table<T> symbols);

    constexpr literal-set operator/(literal-set lhs, literal-rule auto rhs);
    constexpr literal-set operator/(literal-set lhs, literal-set auto rhs);
}

#define LEXY_LITERAL_SET(...)

literal_set is a token rule that matches one of the specified literals.

Requires

Each argument is a literal rule.
If one literal rule uses case folding (e.g. lexy::dsl::ascii::case_folding), the other rules either do not use it, or use the same case folding rule; different case foldings cannot be mixed.

Matching

Tries to match each literal rule. If case folding is used, it applies to all rules in the set. Succeeds if at least one of them matched, consuming the longest match.

Errors

lexy::expected_literal_set: if none of the literal rules matched, raised at the original reader position. The rule then fails without consuming anything.

Parse tree

Single token node with the lexy::predefined_token_kind lexy::literal_token_kind.

The second overload creates a literal set that matches all the symbols of the specified lexy::symbol_table. It ignores their respective values.

operator/ can be used to extend a literal set and add more literal rules to it. The resulting literal set matches everything already matched by lhs, as well as rhs.

The macro LEXY_LITERAL_SET(args) is equivalent to literal_set(args), except the type of the individual rules is erased. This can shorten type names in error messages.

Example 5. Match one of the given literals

struct production
{
    static constexpr auto rule //
        = dsl::literal_set(LEXY_LIT("a"), LEXY_LIT("abc"), LEXY_LIT("bc"));
};

Note	The implementation uses a trie to match them efficiently, instead of trying one after the other.
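The longest-match behavior of a literal set can be illustrated with a small runtime trie. This is a sketch under simplifying assumptions, not lexy's actual implementation (which builds its trie at compile time): `literal_trie` is a hypothetical class, it only handles `char` code units, and it ignores case folding. A result of 0 stands for "no literal matched", where lexy would instead raise lexy::expected_literal_set without consuming input.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string_view>

class literal_trie
{
    struct node
    {
        std::map<char, std::unique_ptr<node>> children;
        bool is_end = false; // a literal ends at this node
    };
    node root_;

public:
    void insert(std::string_view literal)
    {
        node* cur = &root_;
        for (char c : literal)
        {
            auto& child = cur->children[c];
            if (!child)
                child = std::make_unique<node>();
            cur = child.get();
        }
        cur->is_end = true;
    }

    // Length of the longest inserted literal that is a prefix of input,
    // or 0 if none matches (empty literals are not supported here).
    std::size_t longest_match(std::string_view input) const
    {
        const node* cur = &root_;
        std::size_t best = 0;
        for (std::size_t i = 0; i != input.size(); ++i)
        {
            auto it = cur->children.find(input[i]);
            if (it == cur->children.end())
                break;
            cur = it->second.get();
            if (cur->is_end)
                best = i + 1;
        }
        return best;
    }
};
```

Using the literals from Example 5 (`a`, `abc`, `bc`), the input `abcd` consumes three code units, `ab` consumes one, and `xyz` matches nothing. The input is walked once instead of once per literal, which is the efficiency argument made in the note above.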

Tip	If you want to match a set of literals but also get information about which one matched, use `lexy::dsl::symbol` instead.
