Header lexy/dsl/literal.hpp

Token rules that match exact characters.

Token rule lexy::dsl::lit

lexy/dsl/literal.hpp
namespace lexy::dsl
{
    template <auto Str>
    constexpr token-rule auto lit;
}

#define LEXY_LIT(Str) lexy::dsl::lit<Str>

lit is a token rule that matches the specified sequence of characters.

Requires

Str is a string literal of some character type CharT that is compatible with the input encoding, i.e. it must fall into one of the two cases described below.

Matching
  1. If CharT is the same as the character type of the input encoding, lit<Str> compares each code unit of Str in order with the next code unit of the reader and consumes it. It is not checked whether Str is a well-formed string (e.g. that it contains no ill-formed UTF-8), and no normalization or other equivalence checking is done.

  2. Otherwise, if CharT is char, Str must only contain ASCII characters (i.e. 0x00-0x7F). lit<Str> converts each ASCII character to the input encoding with a simple static_cast and then proceeds as in case 1.

Errors

lexy::expected_literal: if a code unit did not compare equal or the reader reached the end of the input. Its .string() is Str, its .index() is the index of the code unit that mismatched or was missing, and its .position() is the reader position where it started to match the literal.

The macro LEXY_LIT(Str) is equivalent to lit<Str>, except that it also works on older compilers that do not support C++20’s extended NTTPs. Use this instead of lit<Str> if you need to support them.
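
On a compiler with C++20's extended NTTPs, the same rule can also be spelled without the macro; the following sketch is equivalent to Example 1 below.

struct production
{
    // Same as LEXY_LIT("Hello World!"), but requires C++20 extended NTTPs.
    static constexpr auto rule = lexy::dsl::lit<"Hello World!">;
};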

Example 1. Hello World!
struct production
{
    static constexpr auto rule = LEXY_LIT("Hello World!");
};
Example 2. A different character type, but only ASCII characters
struct production
{
    // The character type doesn't matter if it only contains ASCII characters.
    // The literal is encoded in UTF-16 whereas the (playground) input
    // is encoded in UTF-8, but as it contains only ASCII characters,
    // lexy will transcode for you.
    static constexpr auto rule = LEXY_LIT(u"Hello World!");
};
Example 3. UTF-8 encoded string literal
struct production
{
    // The string literal contains UTF-8 text,
    // which means the input needs to be UTF-8 encoded as well.
    //
    // WARNING: This will only match if both agree on a normalization for 'ä'!
    static constexpr auto rule = LEXY_LIT(u8"ä");
};
Tip
When using non-ASCII characters in a lit<Str> rule, it is best to specify code points with the \uXXXX escape sequences and normalize the input before passing it to lexy.
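
Following that tip, the rule from Example 3 could be written with an escape sequence instead (a sketch; the input must still use the same composed form):

struct production
{
    // \u00E4 is the pre-composed code point for 'ä'.
    static constexpr auto rule = LEXY_LIT(u8"\u00E4");
};
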
Note
As a token rule, lit<Str> tries to skip whitespace directly following the literal. Use lexy::dsl::no_whitespace to prevent that.
Note
While lit<"int"> would happily consume a prefix of "integer", lexy::dsl::keyword<"int">(id), for a matching id, would not.
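
As a rough usage sketch (the matching code is not part of this header), a rule like the one from Example 1 can be checked against an input with lexy::match and lexy::zstring_input:

#include <lexy/action/match.hpp>
#include <lexy/dsl.hpp>
#include <lexy/input/string_input.hpp>

struct production
{
    static constexpr auto rule = LEXY_LIT("Hello World!");
};

bool example()
{
    // true: every code unit of the literal compares equal to the input.
    auto ok  = lexy::match<production>(lexy::zstring_input("Hello World!"));
    // false: the first mismatching code unit makes the token (and the match) fail.
    auto bad = lexy::match<production>(lexy::zstring_input("Goodbye"));
    return ok && !bad;
}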

Token rule lexy::dsl::lit_c

lexy/dsl/literal.hpp
namespace lexy::dsl
{
    template <auto C>
    constexpr token-rule auto lit_c;
}

lit_c<C>, where C is a character, is equivalent to lit<Str>, where Str is the string literal consisting of the single character C.

The same restrictions on character type apply.
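
For instance, the following sketch (production name assumed) matches a single comma and behaves exactly like LEXY_LIT(","):

struct comma
{
    static constexpr auto rule = lexy::dsl::lit_c<','>;
};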

Tip
Literals that match common punctuators are pre-defined.

Token rule lexy::dsl::lit_b

lexy/dsl/literal.hpp
namespace lexy::dsl
{
    template <unsigned char ... C>
    constexpr token-rule auto lit_b;
}

lit_b<C...> matches a sequence of bytes.

Requires

The encoding must be a single-byte encoding, such as lexy::default_encoding or lexy::byte_encoding.

Matching

Matches and consumes the exact sequence of bytes.

Errors

lexy::expected_literal: if a code unit did not compare equal or the reader reached the end of the input. Its .string() is the byte sequence converted to the appropriate character type, its .index() is the index of the code unit that mismatched or was missing, and its .position() is the reader position where it started to match the literal.

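A minimal sketch of a rule built from lit_b (the production name and byte values are illustrative only):

struct magic_number
{
    // Matches exactly the two bytes 0xAB 0xCD; the input must use a
    // single-byte encoding such as lexy::byte_encoding.
    static constexpr auto rule = lexy::dsl::lit_b<0xAB, 0xCD>;
};
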
Tip
Use lexy::dsl::bom to match a byte-order mark.

See also