Header lexy/dsl/literal.hpp

Token rules that match exact characters.

lexy/dsl/literal.hpp
template <typename T>
concept literal-rule = token-rule<T> && …;

A literal rule is a special token rule that matches a specified sequence of code units. Literal rules include lexy::dsl::lit and its variants, as well as lexy::dsl::keyword.

Each literal rule has the same parsing behavior, unless otherwise specified:

Requires
  • The encoding of the input is a char encoding and compatible with the char type of the rule: Either the char type is the same as the char type of the encoding, or the literal rule matches only ASCII characters.

  • If it attempts to match a non-ASCII character, the encoding must be UTF-8, UTF-16, or UTF-32.

Matching
  • If the char type of the rule is the same as the char type of the encoding, compares each code unit in the sequence with the current code unit of the input in order. If they match, one code unit is consumed and the process is repeated.

  • If the char type of the rule is not the same as the char type of the encoding, all code units in the sequence are ASCII characters. They are transcoded to the target encoding by a static_cast, then it behaves the same as in the case above.

Errors

lexy::expected_literal: if a code unit did not compare equal or the reader reached the end of the input. Its .string() is the code unit sequence, its .index() is the index of the code unit where the mismatch occurred or the input ended, and its .position() is the reader position where matching of the literal started.
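
The matching and error behavior described above can be modeled with a small standalone function. This is only a sketch, not lexy's implementation: match_result and match_literal are hypothetical names, plain string views stand in for the reader, and the static_cast is valid only under the stated assumption that a literal of a different char type contains ASCII characters only.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Result of a match attempt in this model (hypothetical type).
struct match_result
{
    bool        matched; // true if every code unit compared equal
    std::size_t index;   // on failure: index of the mismatching or missing code unit
                         // on success: number of code units consumed
};

// Compares the literal with the input code unit by code unit.
// LitChar and InChar may differ; in that case the literal is assumed to be
// ASCII only, so a static_cast is enough to transcode each code unit.
template <typename LitChar, typename InChar>
constexpr match_result match_literal(std::basic_string_view<LitChar> literal,
                                     std::basic_string_view<InChar>  input)
{
    for (std::size_t i = 0; i != literal.size(); ++i)
    {
        // Reaching the end of the input counts as a failure at index i.
        if (i == input.size())
            return {false, i};
        if (static_cast<InChar>(literal[i]) != input[i])
            return {false, i};
    }
    return {true, literal.size()};
}

void run_examples()
{
    using namespace std::literals;

    // Same char type: plain comparison of code units.
    assert(match_literal("Hello"sv, "Hello World!"sv).matched);
    // UTF-16 literal with ASCII-only content against a char input.
    assert(match_literal(u"Hello"sv, "Hello World!"sv).matched);

    // Mismatch: 'l' vs 'p' at index 3.
    auto mismatch = match_literal("Hello"sv, "Help"sv);
    assert(!mismatch.matched && mismatch.index == 3);

    // Input ends before the literal does: the missing code unit is at index 4.
    auto too_short = match_literal("Hello"sv, "Hell"sv);
    assert(!too_short.matched && too_short.index == 4);
}
```

The index field plays the role of .index() in the error description; the reader position reported by .position() would be the start of the input view in this model.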

Parse tree

Single token node with the lexy::predefined_token_kind lexy::literal_token_kind.

Note
As they are token rules, literal rules try to skip whitespace directly following the literal. Use lexy::dsl::no_whitespace to prevent that.
Caution
It is not checked whether the code unit sequence is a well-formed string (e.g. that it contains no ill-formed UTF-8), and no normalization or other equivalence checking is done while matching. It is also not checked whether the input contains well-formed code units; they are simply compared one by one.

Literal rule lexy::dsl::lit_c

lexy/dsl/literal.hpp
namespace lexy::dsl
{
    template <auto C>
    constexpr literal-rule auto lit_c;
}

lit_c<C>, where C is a value of a character type, is a literal rule that matches the single code unit C.

Tip
Literals that match common punctuators are pre-defined.

Literal rule lexy::dsl::lit

lexy/dsl/literal.hpp
namespace lexy::dsl
{
    template <auto Str>
    constexpr literal-rule auto lit;
}

#define LEXY_LIT(Str) lexy::dsl::lit<Str>

lit is a literal rule that matches the specified sequence of code units.

The macro LEXY_LIT(Str) is equivalent to lit<Str>, except that it also works on older compilers that do not support C++20’s extended NTTPs. Use this instead of lit<Str> if you need to support them.

Example 1. Hello World!
struct production
{
    static constexpr auto rule = LEXY_LIT("Hello World!");
};
Example 2. A different character type, but only ASCII characters
struct production
{
    // The character type doesn't matter if it only contains ASCII characters.
    // The literal is encoded in UTF-16 whereas the (playground) input
    // is encoded in UTF-8, but as it's only ASCII characters,
    // lexy will transcode for you.
    static constexpr auto rule = LEXY_LIT(u"Hello World!");
};
Example 3. UTF-8 encoded string literal
struct production
{
    // The string literal contains UTF-8 text,
    // which means the input needs to be UTF-8 encoded as well.
    //
    // WARNING: This will only match if both agree on a normalization for 'ä'!
    static constexpr auto rule = LEXY_LIT(u8"ä");
};
Tip
When using non-ASCII characters in a lit<Str> rule, it is best to specify code points with the \uXXXX escape sequences and normalize the input before passing it to lexy.
Note
While lit<"int"> would happily consume a prefix of "integer", lexy::dsl::keyword<"int">(id), for a matching id, would not. Similarly, lit<"="> would also consume a prefix of ==; lexy::dsl::not_followed_by can be used to prevent that.
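
The difference described in this note can be illustrated with a standalone model. The sketch below does not use lexy; matches_prefix, matches_keyword, matches_not_followed_by and is_identifier_char are hypothetical helpers, and the ASCII identifier character class is an assumption made for the example (in lexy, the identifier rule passed to keyword determines which characters count).

```cpp
#include <cassert>
#include <string_view>

// Models a plain literal: succeeds if the input starts with the literal.
constexpr bool matches_prefix(std::string_view literal, std::string_view input)
{
    return input.substr(0, literal.size()) == literal;
}

// Hypothetical stand-in for the trailing character class of an identifier.
constexpr bool is_identifier_char(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
           || (c >= '0' && c <= '9') || c == '_';
}

// Models a keyword: the literal must match and must not be followed by
// another identifier character.
constexpr bool matches_keyword(std::string_view literal, std::string_view input)
{
    if (!matches_prefix(literal, input))
        return false;
    return input.size() == literal.size()
           || !is_identifier_char(input[literal.size()]);
}

// Models "literal not followed by a given character",
// e.g. '=' that is not followed by another '='.
constexpr bool matches_not_followed_by(std::string_view literal, char forbidden,
                                       std::string_view input)
{
    if (!matches_prefix(literal, input))
        return false;
    return input.size() == literal.size() || input[literal.size()] != forbidden;
}

void run_examples()
{
    // A plain literal consumes a prefix of a longer identifier.
    assert(matches_prefix("int", "integer"));
    // The keyword model rejects it, but accepts "int" followed by a space.
    assert(!matches_keyword("int", "integer"));
    assert(matches_keyword("int", "int x"));

    // "=" is a prefix of "==", unless a following '=' is forbidden.
    assert(matches_prefix("=", "=="));
    assert(!matches_not_followed_by("=", '=', "=="));
    assert(matches_not_followed_by("=", '=', "= 1"));
}
```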

Literal rule lexy::dsl::lit_b

lexy/dsl/literal.hpp
namespace lexy::dsl
{
    template <unsigned char ... C>
    constexpr literal-rule auto lit_b;
}

lit_b<C…​> is a literal rule that matches the specified sequence of bytes.

Unless all bytes are also valid ASCII characters, it requires that the input encoding is lexy::byte_encoding.

Tip
Use lexy::dsl::bom to match a byte-order mark.

Literal rule lexy::dsl::lit_cp

lexy/dsl/literal.hpp
namespace lexy::dsl
{
    template <char32_t ... CodePoint>
    constexpr literal-rule auto lit_cp;
}

lit_cp is a literal rule that matches the specified sequence of code points, expressed as a sequence of code units in the encoding of the input.

It behaves identically to lexy::dsl::lit where Str is determined by encoding all CodePoint values in the encoding of the input.

Example 4. Match a smiley face
struct production
{
    static constexpr auto rule = dsl::lit_cp<0x1F642> + dsl::eof;
};
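
To make "encoding all code points in the encoding of the input" concrete for a UTF-8 input, the following standalone sketch encodes a single code point as UTF-8 code units. encode_utf8 is a hypothetical helper written for this illustration, not part of lexy, and it assumes its argument is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Encodes a single code point as UTF-8 code units.
// Assumes cp is a valid Unicode scalar value (<= 0x10FFFF, not a surrogate).
inline std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80)
    {
        // One code unit: 0xxxxxxx
        out.push_back(static_cast<char>(cp));
    }
    else if (cp < 0x800)
    {
        // Two code units: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else if (cp < 0x10000)
    {
        // Three code units: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    else
    {
        // Four code units: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

void run_examples()
{
    // U+1F642, the code point used in the example above, is four code units in UTF-8.
    assert(encode_utf8(0x1F642) == "\xF0\x9F\x99\x82");
    // U+00E4 (ä) is two code units.
    assert(encode_utf8(0xE4) == "\xC3\xA4");
    // ASCII code points map to a single code unit.
    assert(encode_utf8(U'A') == "A");
}
```

Under this model, lit_cp<0x1F642> on a UTF-8 input would compare against the four code units F0 9F 99 82, one by one.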

Token rule lexy::dsl::literal_set

lexy/dsl/literal.hpp
namespace lexy
{
    struct expected_literal_set {};
}

namespace lexy::dsl
{
    constexpr literal-set literal_set(literal-rule auto ... literals);

    template <typename T>
    constexpr literal-set literal_set(symbol-table<T> symbols);

    constexpr literal-set operator/(literal-set lhs, literal-rule auto rhs);
    constexpr literal-set operator/(literal-set lhs, literal-set auto rhs);
}

#define LEXY_LITERAL_SET(...)

literal_set is a token rule that matches one of the specified literals.

Requires
  • Each argument is a literal rule.

  • If one literal rule uses case folding (e.g. lexy::dsl::ascii::case_folding), the other rules either do not use it, or use the same case folding rule; different case foldings cannot be mixed.

Matching

Tries to match each literal rule. If case folding is used, it applies to all rules in the set. Succeeds if at least one of them matched, consuming the longest match.

Errors

lexy::expected_literal_set: if none of the literal rules matched; at the original reader position. The rule then fails without consuming anything.

Parse tree

Single token node with the lexy::predefined_token_kind lexy::literal_token_kind.

The second overload creates a literal set that matches all the symbols of the specified lexy::symbol_table. It ignores their respective values.

operator/ can be used to extend a literal set and add more literal rules to it. The resulting literal set matches everything already matched by lhs, as well as rhs.

The macro LEXY_LITERAL_SET(args) is equivalent to literal_set(args), except the type of the individual rules is erased. This can shorten type names in error messages.

Example 5. Match one of the given literals
struct production
{
    static constexpr auto rule //
        = dsl::literal_set(LEXY_LIT("a"), LEXY_LIT("abc"), LEXY_LIT("bc"));
};
Note
The implementation uses a trie to match them efficiently, instead of trying one after the other.
Tip
If you want to match a set of literals but also get information about which one matched, use lexy::dsl::symbol instead.
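
The longest-match behavior of literal_set, and the idea of matching with a trie instead of trying each literal in turn, can be sketched as follows. This is a runtime model written for illustration only; trie_node and literal_trie are hypothetical names, and lexy's own trie is built at compile time and is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <string_view>

// One node of the trie; each edge is labeled with a code unit.
struct trie_node
{
    std::map<char, std::unique_ptr<trie_node>> children;
    bool is_end = false; // true if a literal ends at this node
};

class literal_trie
{
public:
    void insert(std::string_view literal)
    {
        trie_node* cur = &root_;
        for (char c : literal)
        {
            auto& child = cur->children[c];
            if (!child)
                child = std::make_unique<trie_node>();
            cur = child.get();
        }
        cur->is_end = true;
    }

    // Returns the length of the longest literal that is a prefix of the input,
    // or std::nullopt if no literal matches (nothing would be consumed).
    std::optional<std::size_t> longest_match(std::string_view input) const
    {
        std::optional<std::size_t> best;
        if (root_.is_end)
            best = 0; // the empty literal matches everywhere
        const trie_node* cur = &root_;
        for (std::size_t i = 0; i != input.size(); ++i)
        {
            auto it = cur->children.find(input[i]);
            if (it == cur->children.end())
                break;
            cur = it->second.get();
            if (cur->is_end)
                best = i + 1; // remember the longest literal seen so far
        }
        return best;
    }

private:
    trie_node root_;
};

void run_examples()
{
    // The same literals as in the example above.
    literal_trie set;
    set.insert("a");
    set.insert("abc");
    set.insert("bc");

    assert(set.longest_match("abcd") == 3); // "abc" wins over "a"
    assert(set.longest_match("ab") == 1);   // only "a" matches
    assert(set.longest_match("bcd") == 2);  // "bc"
    assert(!set.longest_match("xyz"));      // no literal matches
}
```

A single pass over the input visits each code unit at most once, regardless of how many literals are in the set, which is the motivation for a trie over sequential attempts.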

See also