Header lexy/dsl/char_class.hpp
Char class builders and combinators.
lexy/dsl/char_class.hpp
template <typename T>
concept char-class-rule = token-rule<T> && …;
A char class rule is a special token rule that matches a single code point from a given set, the char class. Each char class rule has the same parsing behavior:
- Requires
The encoding of the input is a char encoding.
- Matching
If the current code unit is an ASCII character, matches and consumes it. Checks if that character is part of the char class.
Otherwise, if the char class contains non-ASCII characters, matches and consumes all code units that form a code point in this encoding. For ASCII, UTF-32, lexy::default_encoding, and lexy::byte_encoding, this is always a single code unit; for UTF-8, it is up to 4 code units; and for UTF-16, it is up to 2 code units. Checks if that code point is part of the char class.
- Errors
lexy::expected_char_class with the name of the char class, at the starting reader position, if
  - the current code unit is ASCII but not part of the char class, or
  - the current code point is not part of the char class, or
  - the current code unit(s) do not form a valid code point. This includes surrogates, overlong UTF-8 sequences, and out-of-range code points.
The rule then fails.
For a char class rule, .error and .kind are overridden to ensure the resulting token rule is still a char class rule. In the case of .error, the corresponding error type is raised instead of lexy::expected_char_class.
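The matching and error rules above can be illustrated with a standalone sketch of the UTF-8 case in plain C++17. It does not use lexy, and the names decode_utf8, Decoded, and match_char_class are hypothetical. The sketch assumes the char class may contain non-ASCII code points, so it always decodes a full code point (1 to 4 code units) and treats a truncated sequence, an overlong encoding, a surrogate, or a code point above U+10FFFF as an error, just like a code point outside the class.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of lexy): one decoded code point and the
// number of UTF-8 code units it occupied.
struct Decoded {
    char32_t cp;
    std::size_t length;
};

// Decodes one UTF-8 code point from the start of `in`, or returns
// std::nullopt if the code units do not form a valid code point.
std::optional<Decoded> decode_utf8(std::string_view in) {
    if (in.empty()) return std::nullopt;
    auto b0 = static_cast<unsigned char>(in[0]);
    if (b0 < 0x80) return Decoded{b0, 1};  // ASCII: a single code unit

    std::size_t len = 0;
    char32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
    else return std::nullopt;               // stray continuation or invalid lead byte

    if (in.size() < len) return std::nullopt;  // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        auto b = static_cast<unsigned char>(in[i]);
        if ((b & 0xC0) != 0x80) return std::nullopt;  // expected a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }

    static constexpr char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
    if (cp < min_for_len[len]) return std::nullopt;       // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt; // surrogate
    if (cp > 0x10FFFF) return std::nullopt;               // out of range
    return Decoded{cp, len};
}

// Matches one code point against the predicate `in_class`. Returns the number
// of code units consumed, or std::nullopt where a char class rule would
// report lexy::expected_char_class.
template <typename Pred>
std::optional<std::size_t> match_char_class(std::string_view in, Pred in_class) {
    auto d = decode_utf8(in);
    if (!d || !in_class(d->cp)) return std::nullopt;
    return d->length;
}
```

For example, matching the code units C3 A4 (U+00E4) against a class containing U+00E4 consumes 2 code units, while the overlong sequence C0 AF is rejected even by a class that accepts every code point.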
lexy/dsl/char_class.hpp
template <typename T>
concept literal-char-class-rule;
A literal char class rule is a rule that can behave like a char class. Those are:
- a lexy::dsl::lit (or lexy::dsl::lit_c) rule matching a single ASCII character,
- a lexy::dsl::lit (or lexy::dsl::lit_c) rule whose char type is char32_t matching a single character,
- a lexy::dsl::lit_b rule matching an arbitrary byte,
- a lexy::dsl::lit_cp rule, and
- lexy::dsl::operator/ (char class) combinations of the above.
Literal char class rules can be used with the char class operators.
If they specify a non-ASCII character, the input encoding must be ASCII, UTF-8, UTF-16, or UTF-32 (if it is ASCII, the character will never be matched).
The exception is lexy::dsl::lit_b, for which the encoding must be lexy::default_encoding or lexy::byte_encoding.
Char class LEXY_CHAR_CLASS
lexy/dsl/char_class.hpp
#define LEXY_CHAR_CLASS(Name, CharClassRule)
LEXY_CHAR_CLASS is a char class rule that is a (re)named version of CharClassRule.
It is equivalent to CharClassRule, but overrides its name, which is the name reported in lexy::expected_char_class, with the string literal Name.
It also type-erases the type of CharClassRule to shorten it, which can improve compiler error messages and compilation times.
```cpp
#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

struct production
{
    // Named char class: errors report "atext" instead of a long union name.
    static constexpr auto atext
        = LEXY_CHAR_CLASS("atext",
                          dsl::ascii::alpha / dsl::ascii::digit / LEXY_LIT("!") / LEXY_LIT("#")
                              / LEXY_LIT("$") / LEXY_LIT("%") / LEXY_LIT("&") / LEXY_LIT("'")
                              / LEXY_LIT("*") / LEXY_LIT("+") / LEXY_LIT("-") / LEXY_LIT("/")
                              / LEXY_LIT("=") / LEXY_LIT("?") / LEXY_LIT("^") / LEXY_LIT("_")
                              / LEXY_LIT("`") / LEXY_LIT("{") / LEXY_LIT("|") / LEXY_LIT("}"));

    // One or more atext characters.
    static constexpr auto rule = dsl::identifier(atext);
};
```
Char class union lexy::dsl::operator/ (char class)
lexy/dsl/char_class.hpp
namespace lexy::dsl
{
constexpr char-class-rule auto operator/(char-class-rule auto lhs,
char-class-rule auto rhs);
constexpr char-class-rule auto operator/(char-class-rule auto lhs,
literal-char-class-rule auto rhs);
constexpr char-class-rule auto operator/(literal-char-class-rule auto lhs,
char-class-rule auto rhs);
}
operator/ (char class union) is a char class rule that matches a union of char classes.
It matches all characters that are contained in at least one of the char classes.
Its name is union.
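The union semantics can be modeled in plain C++17 as below. This is an illustration, not lexy's implementation: CharClass is a hypothetical type that pairs the name used in error messages with a membership predicate on code points.

```cpp
#include <functional>
#include <string>

// Hypothetical model type (not part of lexy): a name for error messages
// plus a membership test on code points.
struct CharClass {
    std::string name;
    std::function<bool(char32_t)> contains;
};

// Union: a code point matches if at least one operand contains it.
CharClass operator/(const CharClass& lhs, const CharClass& rhs) {
    return {"union", [lhs, rhs](char32_t c) { return lhs.contains(c) || rhs.contains(c); }};
}
```

Chaining a / b / c in this model nests unions left to right, which accepts the same set of code points as a flat union of all operands.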
Char class complement lexy::dsl::operator- (unary)
lexy/dsl/char_class.hpp
namespace lexy::dsl
{
constexpr char-class-rule auto operator-(char-class-rule auto rule);
constexpr char-class-rule auto operator-(literal-char-class-rule auto rule);
}
operator- (char class complement) is a char class rule that matches the complement of another char class.
It matches all characters that are not part of the char class rule.
Its name is complement.
Note: For most char classes, operator- does not work with lexy::default_encoding.
For example, -dsl::ascii::control could mean either non-control ASCII characters or non-control Unicode characters.
You need to explicitly pick one interpretation with lexy::ascii_encoding or lexy::utf8_encoding.
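The note above comes down to the fact that a complement is taken relative to the set of code points the encoding can represent. The standalone model below makes that universe an explicit parameter; CharClass and complement_within are hypothetical names, not lexy API. The same base class yields a different complement for an ASCII universe (up to U+007F) than for a Unicode universe (up to U+10FFFF, excluding surrogates).

```cpp
#include <functional>
#include <string>

// Hypothetical model type (not part of lexy): name plus membership test.
struct CharClass {
    std::string name;
    std::function<bool(char32_t)> contains;
};

// Complement relative to the universe [0, max_cp]. Surrogates
// (U+D800..U+DFFF) are never valid code points, so they are excluded.
CharClass complement_within(const CharClass& cls, char32_t max_cp) {
    return {"complement", [cls, max_cp](char32_t c) {
                bool valid = c <= max_cp && !(c >= 0xD800 && c <= 0xDFFF);
                return valid && !cls.contains(c);
            }};
}
```

With the universe fixed at U+007F, U+00E9 is outside the complement of the control characters; with the universe fixed at U+10FFFF, it is inside. An encoding that leaves the universe open cannot decide between the two.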
Char class minus lexy::dsl::operator-
lexy/dsl/char_class.hpp
namespace lexy::dsl
{
constexpr char-class-rule auto operator-(char-class-rule auto set,
char-class-rule auto minus);
constexpr char-class-rule auto operator-(char-class-rule auto set,
literal-char-class-rule auto minus);
}
operator- (char class minus) is a char class rule that removes characters from another char class.
It matches all characters that are part of set but not part of minus.
Its name is minus.
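In a hypothetical standalone model (a CharClass pairing a name with a predicate; not lexy's implementation), minus keeps a code point only if set contains it and minus does not.

```cpp
#include <functional>
#include <string>

// Hypothetical model type (not part of lexy): name plus membership test.
struct CharClass {
    std::string name;
    std::function<bool(char32_t)> contains;
};

// Minus: in `set` and not in `minus`.
CharClass operator-(const CharClass& set, const CharClass& minus) {
    return {"minus", [set, minus](char32_t c) { return set.contains(c) && !minus.contains(c); }};
}
```

Unlike the complement, this needs no notion of a universe: the result is always a subset of set, regardless of the encoding.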
Char class intersection lexy::dsl::operator&
lexy/dsl/char_class.hpp
namespace lexy::dsl
{
constexpr char-class-rule auto operator&(char-class-rule auto lhs,
char-class-rule auto rhs);
constexpr char-class-rule auto operator&(char-class-rule auto lhs,
literal-char-class-rule auto rhs);
constexpr char-class-rule auto operator&(literal-char-class-rule auto lhs,
char-class-rule auto rhs);
constexpr char-class-rule auto operator&(literal-char-class-rule auto lhs,
literal-char-class-rule auto rhs);
}
operator& (char class intersection) is a char class rule that matches an intersection of char classes.
It matches all characters that are contained in all of the char classes.
Its name is intersection.
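Using a hypothetical standalone CharClass model (not lexy API), intersection requires every operand to accept the code point. For example, hexadecimal digits intersected with ASCII letters leaves a-f and A-F.

```cpp
#include <functional>
#include <string>

// Hypothetical model type (not part of lexy): name plus membership test.
struct CharClass {
    std::string name;
    std::function<bool(char32_t)> contains;
};

// Intersection: a code point matches only if both operands contain it.
CharClass operator&(const CharClass& lhs, const CharClass& rhs) {
    return {"intersection", [lhs, rhs](char32_t c) { return lhs.contains(c) && rhs.contains(c); }};
}
```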