Header lexy/dsl/code_point.hpp

Rules for matching (specific) code points.

Token rule lexy::dsl::code_point

lexy/dsl/code_point.hpp
namespace lexy::dsl
{
    class code-point-dsl // models token-rule
    {
    public:
        template <typename Predicate>
        constexpr token-rule auto if_() const; // see below
    };

    constexpr code-point-dsl auto code_point;
}

code_point is a token rule that matches a single Unicode code point.

Requires

The input encoding is ASCII, UTF-8, UTF-16, or UTF-32. In particular, lexy::default_encoding and lexy::byte_encoding are not supported.

Matching

Matches and consumes all code units that form a code point in this encoding. For ASCII and UTF-32, this is always a single code unit; for UTF-8, it is up to 4 code units; and for UTF-16, it is up to 2 code units.
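To make the code unit counts concrete, the following standalone sketch computes how many code units a given code point occupies in UTF-8 and UTF-16. It does not use lexy and is not how lexy implements the rule; the function names utf8_length and utf16_length are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper (not part of lexy): number of UTF-8 code units
// needed to encode `cp`. Returns 0 if `cp` is not a valid code point
// for encoding purposes (a surrogate, or above U+10FFFF).
constexpr int utf8_length(std::uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; // surrogates cannot be encoded
    if (cp <= 0x7F)
        return 1; // ASCII range
    if (cp <= 0x7FF)
        return 2;
    if (cp <= 0xFFFF)
        return 3;
    if (cp <= 0x10FFFF)
        return 4;
    return 0; // out of range
}

// Hypothetical helper (not part of lexy): number of UTF-16 code units
// needed to encode `cp`, or 0 if it cannot be encoded.
constexpr int utf16_length(std::uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0; // surrogates cannot be encoded
    if (cp <= 0xFFFF)
        return 1; // basic multilingual plane
    if (cp <= 0x10FFFF)
        return 2; // encoded as a surrogate pair
    return 0; // out of range
}
```

For instance, U+20AC needs three code units in UTF-8 but only one in UTF-16, while U+1F600 needs four and two respectively.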

Errors

lexy::expected_char_class ("<encoding>.code_point"): if the current code unit(s) do not form a valid code point; at the starting reader position. This includes surrogates, overlong UTF-8 sequences, and out-of-range code points (especially for ASCII). The rule then fails.
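The failure cases listed above can be illustrated with a standalone UTF-8 decoder. This is a minimal sketch under stated assumptions, not lexy's implementation: the names decode_error, decode_result, and decode_utf8 are hypothetical, and the sketch distinguishes the individual causes only for illustration, whereas the rule reports all of them as the single error described above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical error kinds, used only in this sketch.
enum class decode_error
{
    none,
    eof,           // no input left
    invalid_lead,  // first code unit cannot start a sequence
    missing_trail, // a continuation code unit is missing or malformed
    overlong,      // value encoded with more code units than necessary
    surrogate,     // U+D800..U+DFFF
    out_of_range   // above U+10FFFF
};

struct decode_result
{
    decode_error  error;
    std::uint32_t value;  // meaningful only if error == none
    std::size_t   length; // code units consumed if error == none
};

// Decodes one code point from the first code units of `data`.
constexpr decode_result decode_utf8(const unsigned char* data, std::size_t size)
{
    if (size == 0)
        return {decode_error::eof, 0, 0};

    const unsigned char lead    = data[0];
    std::uint32_t       value   = 0;
    std::size_t         length  = 0;
    std::uint32_t       minimum = 0; // smallest value that needs `length` units

    if (lead < 0x80)
    {
        value = lead; length = 1; minimum = 0;
    }
    else if ((lead & 0xE0) == 0xC0)
    {
        value = lead & 0x1F; length = 2; minimum = 0x80;
    }
    else if ((lead & 0xF0) == 0xE0)
    {
        value = lead & 0x0F; length = 3; minimum = 0x800;
    }
    else if ((lead & 0xF8) == 0xF0)
    {
        value = lead & 0x07; length = 4; minimum = 0x10000;
    }
    else
    {
        return {decode_error::invalid_lead, 0, 0};
    }

    if (size < length)
        return {decode_error::missing_trail, 0, 0};

    // Each continuation code unit has the form 10xxxxxx.
    for (std::size_t i = 1; i < length; ++i)
    {
        if ((data[i] & 0xC0) != 0x80)
            return {decode_error::missing_trail, 0, 0};
        value = (value << 6) | (data[i] & 0x3F);
    }

    if (value < minimum)
        return {decode_error::overlong, 0, 0};
    if (value >= 0xD800 && value <= 0xDFFF)
        return {decode_error::surrogate, 0, 0};
    if (value > 0x10FFFF)
        return {decode_error::out_of_range, 0, 0};

    return {decode_error::none, value, length};
}
```

In this sketch, the byte sequence E2 82 AC decodes to U+20AC, while C0 80 is rejected as overlong and ED A0 80 as a surrogate.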

Example 1. Parse one code point in the input's encoding
struct production
{
    static constexpr auto rule = dsl::code_point + dsl::eof;
};
Caution
As a token rule, it matches whitespace immediately following the code point. As such, the rule is best used in contexts where automatic whitespace skipping is disabled.
Note
If the input has been validated, the rule only fails if the reader is at the end of the input.

Token rule lexy::dsl::code_point.if_

lexy/dsl/code_point.hpp
template <std::predicate<lexy::code_point> Predicate>
  requires std::is_default_constructible_v<Predicate>
constexpr token-rule auto if_() const;

code_point.if_ is a token rule that matches a code point fulfilling a given predicate.

Matching

Matches and consumes the normal code_point rule.

Errors
  • lexy::expected_char_class ("<type name of Predicate>"): if Predicate{}(cp) == false, where cp is the code point we have just consumed; at the starting reader position. The rule then fails.

  • All errors raised by the normal code_point rule. The rule then fails.
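The order of the two checks can be sketched without lexy as follows. This is an illustrative model only: decoded, match_result, match_if, and is_ascii_digit are hypothetical names, and the sketch assumes the code point has already been decoded elsewhere. It shows why the predicate must be default constructible: it is created on the spot as Predicate{} and called with the code point.

```cpp
#include <cstdint>

// Hypothetical stand-in for the outcome of matching a single code point.
struct decoded
{
    bool          valid; // false if the code units formed no valid code point
    std::uint32_t value; // meaningful only if valid
};

// Hypothetical result kinds, used only in this sketch.
enum class match_result
{
    ok,
    invalid_code_point, // the underlying code point match failed
    predicate_failed    // Predicate{}(cp) returned false
};

// A named, default-constructible predicate type.
struct is_ascii_digit
{
    constexpr bool operator()(std::uint32_t cp) const
    {
        return cp >= 0x30 && cp <= 0x39; // '0' through '9'
    }
};

// Models the check order: first the code point itself, then the predicate.
template <typename Predicate>
constexpr match_result match_if(decoded cp)
{
    if (!cp.valid)
        return match_result::invalid_code_point;
    if (!Predicate{}(cp.value))
        return match_result::predicate_failed;
    return match_result::ok;
}
```

In this model, an invalid code point never reaches the predicate, which mirrors the two separate error cases listed above.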

Example 2. Parse even code points only
struct production
{
    struct even
    {
        constexpr bool operator()(lexy::code_point cp) const
        {
            return cp.value() % 2 == 0;
        }
    };

    static constexpr auto rule = dsl::code_point.if_<even>() + dsl::eof;
};
Note
As the rule uses the type name of Predicate in the error, it does not accept a lambda as the predicate; pass a named type instead.
Note
In the future, lexy will gain support for specifying code point ranges and Unicode character classes in a more convenient way, as done for lexy::dsl::ascii.
Caution
The same caveat about whitespace as for code_point applies here as well.

See also