Header lexy/dsl/char_class.hpp

Char class builders and combinators.

lexy/dsl/char_class.hpp
template <typename T>
concept char-class-rule = token-rule<T> && …;

A char class rule is a special token rule that matches a single code point from a given set, the char class. Each char class rule has the same parsing behavior:

Requires

The encoding of the input is a char encoding.

Matching
  • If the current code unit is an ASCII character, matches and consumes it. Checks if that character is part of the char class.

  • Otherwise, if the char class contains non-ASCII characters, matches and consumes all code units that form a code point in this encoding. For ASCII, UTF-32, lexy::default_encoding, and lexy::byte_encoding, this is always a single code unit; for UTF-8, it is up to 4 code units; and for UTF-16, it is up to 2 code units. Checks if that code point is part of the char class.
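The code unit counts listed above can be sketched with a few standalone helpers. The function names are hypothetical and not part of lexy; the sketch only illustrates how many code units a valid code point occupies in each encoding.

```cpp
// Hypothetical helpers (not lexy API): number of code units needed to
// encode the code point `cp`, assumed to be a valid Unicode scalar value.
constexpr int utf8_code_units(char32_t cp)
{
    if (cp < 0x80)
        return 1; // ASCII: a single code unit
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4; // everything up to U+10FFFF
}

constexpr int utf16_code_units(char32_t cp)
{
    // Code points outside the Basic Multilingual Plane need a surrogate pair.
    return cp < 0x10000 ? 1 : 2;
}

constexpr int utf32_code_units(char32_t)
{
    return 1; // always a single code unit
}
```

For instance, U+20AC takes three code units in UTF-8 but one in UTF-16, while U+1F600 takes four in UTF-8 and two in UTF-16.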

Errors

lexy::expected_char_class with the name of the char class, at the starting reader position if

  • the current code unit is ASCII but not part of the char class, or

  • the current code point is not part of the char class, or

  • the current code unit(s) do not form a valid code point. This includes surrogates, overlong UTF-8 sequences, and out-of-range code points.

The rule then fails.
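To make the invalid code point cases concrete, here is a minimal validating UTF-8 decoder written against the standard library only (C++17). It is an illustrative sketch, not lexy's implementation; `decoded` and `decode_utf8` are hypothetical names.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type: the code point and the number of code units consumed.
struct decoded
{
    char32_t    code_point;
    std::size_t length;
};

// Decodes one code point from the start of `input`.
// Returns std::nullopt for empty, truncated, or malformed input, for overlong
// sequences, for surrogates, and for values above U+10FFFF.
inline std::optional<decoded> decode_utf8(std::string_view input)
{
    if (input.empty())
        return std::nullopt;

    auto byte = [&](std::size_t i) { return static_cast<unsigned char>(input[i]); };

    unsigned char lead = byte(0);
    if (lead < 0x80)
        return decoded{lead, 1}; // ASCII: a single code unit

    std::size_t length;
    char32_t    cp, min; // `min` is the smallest value that needs `length` units
    if ((lead & 0xE0) == 0xC0)      { length = 2; cp = lead & 0x1F; min = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { length = 3; cp = lead & 0x0F; min = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { length = 4; cp = lead & 0x07; min = 0x10000; }
    else
        return std::nullopt; // stray continuation byte or invalid lead byte

    if (input.size() < length)
        return std::nullopt; // truncated sequence

    for (std::size_t i = 1; i < length; ++i)
    {
        if ((byte(i) & 0xC0) != 0x80)
            return std::nullopt; // not a continuation byte
        cp = (cp << 6) | (byte(i) & 0x3F);
    }

    if (cp < min)
        return std::nullopt; // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return std::nullopt; // surrogate
    if (cp > 0x10FFFF)
        return std::nullopt; // out of range
    return decoded{cp, length};
}
```

In this sketch, `decode_utf8("\xE2\x82\xAC")` yields U+20AC with a length of 3, whereas the overlong sequence `"\xC0\x80"` and the encoded surrogate `"\xED\xA0\x80"` are both rejected.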

For a char class rule, .error and .kind are overridden to ensure the resulting token rule is still a char class rule. In the case of .error, the specified error type is raised instead of lexy::expected_char_class.

lexy/dsl/char_class.hpp
template <typename T>
concept literal-char-class-rule;

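A literal char class rule is a rule that can behave like a char class. Those are literal rules that match a single character, such as lexy::dsl::lit_c, lexy::dsl::lit_b, and single-character LEXY_LIT literals as used in the examples on this page.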

Literal char class rules can be used with the char class operators. If they specify a non-ASCII character, the input encoding must be ASCII, UTF-8, UTF-16, or UTF-32 (if it is ASCII, the character will never be matched). The exception is lexy::dsl::lit_b, where the encoding must be lexy::default_encoding or lexy::byte_encoding.

Char class LEXY_CHAR_CLASS

lexy/dsl/char_class.hpp
#define LEXY_CHAR_CLASS(Name, CharClassRule)

LEXY_CHAR_CLASS is a char class rule that is a (re)named version of CharClassRule.

It is equivalent to CharClassRule, but overrides its name (in the error) to be the string literal Name. It also type-erases the type of CharClassRule to shorten it, which can improve error messages and compilation times.

Example 1. Match an atom of an email address
struct production
{
    static constexpr auto atext
        = LEXY_CHAR_CLASS("atext",
                          dsl::ascii::alpha / dsl::ascii::digit / LEXY_LIT("!") / LEXY_LIT("#")
                              / LEXY_LIT("$") / LEXY_LIT("%") / LEXY_LIT("&") / LEXY_LIT("'")
                              / LEXY_LIT("*") / LEXY_LIT("+") / LEXY_LIT("-") / LEXY_LIT("/")
                              / LEXY_LIT("=") / LEXY_LIT("?") / LEXY_LIT("^") / LEXY_LIT("_")
                              / LEXY_LIT("`") / LEXY_LIT("{") / LEXY_LIT("|") / LEXY_LIT("}"));

    static constexpr auto rule = dsl::identifier(atext);
};
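When writing tests for a grammar like this, it can help to have a plain predicate that spells out the same set of characters. The following standalone sketch (C++17) mirrors the atext class from the example; `is_atext` is a hypothetical helper and not part of lexy.

```cpp
#include <string_view>

// Hypothetical reference predicate for the "atext" class above:
// ASCII letters, ASCII digits, and the listed punctuation characters.
constexpr bool is_atext(char c)
{
    constexpr std::string_view symbols = "!#$%&'*+-/=?^_`{|}";

    bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    bool digit = c >= '0' && c <= '9';
    return alpha || digit || symbols.find(c) != std::string_view::npos;
}
```

Characters such as `@` and `.` are not in the set, so an identifier built from this class stops at them.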

Char class union lexy::dsl::operator/ (char class)

lexy/dsl/char_class.hpp
namespace lexy::dsl
{
    constexpr char-class-rule auto operator/(char-class-rule auto lhs,
                                             char-class-rule auto rhs);
    constexpr char-class-rule auto operator/(char-class-rule auto lhs,
                                             literal-char-class-rule auto rhs);
    constexpr char-class-rule auto operator/(literal-char-class-rule auto lhs,
                                             char-class-rule auto rhs);
}

operator/ (char class union) is a char class rule that matches a union of char classes.

It matches all characters that are contained in at least one of the char classes. Its name is union.

Example 2. Match an identifier consisting of upper case characters or digits
struct production
{
    static constexpr auto rule //
        = dsl::identifier(dsl::ascii::upper / dsl::ascii::digit);
};
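The set semantics of the union can be modelled with one bit per ASCII character. This is only a conceptual sketch using the standard library, not a description of how lexy represents char classes; `ascii_set`, `ascii_range`, `in_set`, and `upper_or_digit` are hypothetical names.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical model: one bit per ASCII character.
using ascii_set = std::bitset<128>;

// Builds the set of all ASCII characters in the inclusive range [first, last].
inline ascii_set ascii_range(char first, char last)
{
    ascii_set result;
    for (int c = first; c <= last; ++c)
        result.set(static_cast<std::size_t>(c));
    return result;
}

// Checks membership; non-ASCII characters are never part of the set.
inline bool in_set(const ascii_set& set, char c)
{
    auto index = static_cast<unsigned char>(c);
    return index < 128 && set.test(index);
}

// Model of dsl::ascii::upper / dsl::ascii::digit: a character is in the
// union if it is in at least one of the operands.
inline ascii_set upper_or_digit()
{
    return ascii_range('A', 'Z') | ascii_range('0', '9');
}
```

With this model, `'A'` and `'7'` are members of `upper_or_digit()`, while `'a'` and `'_'` are not.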

Char class complement lexy::dsl::operator- (unary)

lexy/dsl/char_class.hpp
namespace lexy::dsl
{
    constexpr char-class-rule auto operator-(char-class-rule auto rule);
    constexpr char-class-rule auto operator-(literal-char-class-rule auto rule);
}

operator- (char class complement) is a char class rule that matches the complement of another char class.

It matches all characters that are not part of the char class rule. Its name is complement.

Example 3. Match non-control characters in a string literal
struct production
{
    static constexpr auto rule = [] {
        // Arbitrary code points that aren't control characters.
        auto c = -dsl::ascii::control;

        return dsl::quoted(c);
    }();
};
Note
For most char classes, operator- does not work with the encoding lexy::default_encoding. For example, -dsl::ascii::control could mean either non-control ASCII characters or non-control Unicode characters. You need to explicitly pick one interpretation with lexy::ascii_encoding or lexy::utf8_encoding.
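The ambiguity comes from the fact that a complement is always taken relative to some universe of characters. The sketch below models the two interpretations as plain predicates over code points; the function names are hypothetical and the code is not taken from lexy.

```cpp
// Hypothetical model of dsl::ascii::control: U+0000..U+001F and U+007F.
constexpr bool is_ascii_control(char32_t cp)
{
    return cp <= 0x1F || cp == 0x7F;
}

// Complement relative to ASCII: only U+0000..U+007F are candidates.
constexpr bool non_control_ascii(char32_t cp)
{
    return cp <= 0x7F && !is_ascii_control(cp);
}

// Complement relative to Unicode: every scalar value is a candidate.
constexpr bool non_control_unicode(char32_t cp)
{
    bool is_scalar = cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
    return is_scalar && !is_ascii_control(cp);
}
```

Both predicates accept `a` and reject a line feed, but they disagree on U+00E4: it is outside the ASCII universe, so only `non_control_unicode` accepts it.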

Char class minus lexy::dsl::operator-

lexy/dsl/char_class.hpp
namespace lexy::dsl
{
    constexpr char-class-rule auto operator-(char-class-rule auto set,
                                             char-class-rule auto minus);
    constexpr char-class-rule auto operator-(char-class-rule auto set,
                                             literal-char-class-rule auto minus);
}

operator- (char class minus) is a char class rule that removes characters from another char class.

It matches all characters that are part of set but not part of minus. Its name is minus.

Example 4. Match upper case characters except for X
struct production
{
    static constexpr auto rule //
        = dsl::identifier(dsl::ascii::upper - dsl::lit_c<'X'>);
};
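Conceptually, a character is in `set - minus` when it is in `set` and not in `minus`. A minimal sketch of that definition with predicate combinators follows; `char_minus` and the other names are hypothetical, and the code only models the semantics rather than lexy's implementation.

```cpp
// Hypothetical combinator: builds a predicate for "in `set` but not in `minus`".
template <typename Set, typename Minus>
auto char_minus(Set set, Minus minus)
{
    return [=](char32_t cp) { return set(cp) && !minus(cp); };
}

inline bool is_upper(char32_t cp)
{
    return cp >= U'A' && cp <= U'Z';
}

inline bool is_x(char32_t cp)
{
    return cp == U'X';
}

// Model of dsl::ascii::upper - dsl::lit_c<'X'>.
inline bool upper_except_x(char32_t cp)
{
    return char_minus(is_upper, is_x)(cp);
}
```

Here `upper_except_x(U'A')` is true, while `upper_except_x(U'X')` and `upper_except_x(U'a')` are false.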

Char class intersection lexy::dsl::operator&

lexy/dsl/char_class.hpp
namespace lexy::dsl
{
    constexpr char-class-rule auto operator&(char-class-rule auto lhs,
                                             char-class-rule auto rhs);
    constexpr char-class-rule auto operator&(char-class-rule auto lhs,
                                             literal-char-class-rule auto rhs);
    constexpr char-class-rule auto operator&(literal-char-class-rule auto lhs,
                                             char-class-rule auto rhs);
    constexpr char-class-rule auto operator&(literal-char-class-rule auto lhs,
                                             literal-char-class-rule auto rhs);
}

operator& (char class intersection) is a char class rule that matches an intersection of char classes.

It matches all characters that are contained in all of the char classes. Its name is intersection.

Example 5. Match all printable space characters
struct production
{
    static constexpr auto rule //
        = dsl::quoted(dsl::ascii::space & dsl::ascii::print);
};
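To see which characters survive this intersection, the following sketch enumerates the ASCII range using the standard `<cctype>` classifiers. It assumes the default "C" locale, in which `std::isspace` and `std::isprint` correspond to the ASCII whitespace and printable classes; `printable_spaces` is a hypothetical helper.

```cpp
#include <cctype>
#include <string>

// Collects every ASCII character that is both whitespace and printable,
// i.e. the intersection of the two classes.
inline std::string printable_spaces()
{
    std::string result;
    for (int c = 0; c < 128; ++c)
    {
        if (std::isspace(c) && std::isprint(c))
            result.push_back(static_cast<char>(c));
    }
    return result;
}
```

Under that assumption the result contains only the space character: tab, line feed, and the other whitespace characters are control characters and therefore not printable.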

See also