Header lexy/dsl/code_point.hpp

Char class rule lexy::dsl::code_point

lexy/dsl/code_point.hpp
namespace lexy::dsl
{
    class code-point-dsl // models char-class-rule
    {
    public:
        template <typename Predicate>
        constexpr token-rule auto if_() const;

        template <char32_t ... CPs>
        constexpr token-rule auto set() const;
        template <char32_t Low, char32_t High>
        constexpr token-rule auto range() const;

        constexpr token-rule auto ascii() const;
        constexpr token-rule auto bmp() const;
        constexpr token-rule auto noncharacter() const;

        template <lexy::code_point::general_category_t Category>
        constexpr token-rule auto general_category() const;
        template <lexy::code_point::gc-group CategoryGroup>
        constexpr token-rule auto general_category() const;
    };

    constexpr code-point-dsl auto code_point;
}

code_point is a char class rule that matches a specified set of Unicode code points.

code_point

matches an arbitrary scalar code point. Its char class name is code-point.
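A Unicode scalar value is any code point from U+0000 to U+10FFFF except the surrogates U+D800 to U+DFFF. The sketch below models that set on a plain char32_t so it compiles without lexy. The helper is_scalar_value is hypothetical and only illustrates which values plain code_point accepts; it is not lexy's implementation (lexy::code_point has its own is_scalar()).

```cpp
// Hypothetical standalone helper, not part of lexy: models the set of
// code points that plain dsl::code_point accepts (Unicode scalar values).
constexpr bool is_scalar_value(char32_t cp)
{
    // Surrogates (U+D800 to U+DFFF) are code points but not scalar values.
    bool is_surrogate = 0xD800 <= cp && cp <= 0xDFFF;
    // Anything above U+10FFFF is outside the Unicode code space.
    return cp <= 0x10FFFF && !is_surrogate;
}

static_assert(is_scalar_value(U'A'));
static_assert(!is_scalar_value(0xD800));
```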

code_point.if_<P>()

matches all code points for which the predicate P returns true. P must be a type with a constexpr bool operator()(lexy::code_point). Its char class name is the type name of P.
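The following sketch shows the shape of a predicate and models the matching step outside of lexy. To stay self-contained it uses a hypothetical stand-in struct, also named code_point, in place of lexy::code_point, and a hypothetical matches_if helper; the assumption that the rule default-constructs P and calls it on the decoded code point is for illustration only.

```cpp
// Hypothetical stand-in for lexy::code_point so the sketch compiles without lexy.
struct code_point
{
    char32_t v;
    constexpr char32_t value() const { return v; }
};

// A predicate in the shape code_point.if_<P>() expects:
// a type with a constexpr bool operator() taking a code point.
struct is_ascii_digit
{
    constexpr bool operator()(code_point cp) const
    {
        return U'0' <= cp.value() && cp.value() <= U'9';
    }
};

// Hypothetical model of the matching step: default-construct P and
// ask it about the code point. Not lexy's actual code.
template <typename P>
constexpr bool matches_if(code_point cp)
{
    return P{}(cp);
}

static_assert(matches_if<is_ascii_digit>(code_point{U'7'}));
static_assert(!matches_if<is_ascii_digit>(code_point{U'x'}));
```

With lexy itself, the same predicate (taking lexy::code_point instead of the stand-in) would be passed as dsl::code_point.if_<is_ascii_digit>().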

code_point.set<CPs...>()

matches the specified code points. Its name is code-point.set.
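Matching against a set amounts to "equal to any of the listed code points". The sketch below models that with a C++17 fold expression; in_set is a hypothetical helper, not lexy code. With lexy, the equivalent rule would be written dsl::code_point.set<U'+', U'-', U'*'>().

```cpp
// Hypothetical model of code_point.set<CPs...>(): a code point matches
// if it equals any of the listed code points (C++17 fold over ||).
template <char32_t... CPs>
constexpr bool in_set(char32_t cp)
{
    return ((cp == CPs) || ...);
}

static_assert(in_set<U'+', U'-', U'*'>(U'-'));
static_assert(!in_set<U'+', U'-', U'*'>(U'/'));
```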

code_point.range<Low, High>()

matches the code points in the range [Low, High] (both sides inclusive). Its name is code-point.range.
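Because both bounds are inclusive, Low and High themselves match. The hypothetical helper in_range below models that check on a plain char32_t; with lexy, the corresponding rule would be dsl::code_point.range<U'a', U'z'>().

```cpp
// Hypothetical model of code_point.range<Low, High>(): both bounds are inclusive.
template <char32_t Low, char32_t High>
constexpr bool in_range(char32_t cp)
{
    return Low <= cp && cp <= High;
}

static_assert(in_range<U'a', U'z'>(U'a'));  // lower bound is included
static_assert(in_range<U'a', U'z'>(U'z'));  // upper bound is included
static_assert(!in_range<U'a', U'z'>(U'A'));
```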

code_point.ascii()

matches all ASCII code points. Its name is code-point.ASCII.

code_point.bmp()

matches all code points in the BMP (Basic Multilingual Plane, U+0000 to U+FFFF). Its name is code-point.BMP.

code_point.noncharacter()

matches all noncharacter code points (U+FDD0 to U+FDEF and the last two code points of every plane). Its name is code-point.noncharacter.
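The three fixed classes above have simple definitions in terms of code point values. The sketch below models them as standalone constexpr functions on char32_t; the names is_ascii, is_bmp, and is_noncharacter are hypothetical free functions written for illustration and are not lexy's implementation.

```cpp
// Hypothetical standalone models of the classes behind
// code_point.ascii(), .bmp(), and .noncharacter().

// ASCII: U+0000 to U+007F.
constexpr bool is_ascii(char32_t cp) { return cp <= 0x7F; }

// Basic Multilingual Plane (plane 0): U+0000 to U+FFFF.
constexpr bool is_bmp(char32_t cp) { return cp <= 0xFFFF; }

// Noncharacters: U+FDD0 to U+FDEF, plus U+xxFFFE and U+xxFFFF in each
// of the 17 planes (66 code points in total).
constexpr bool is_noncharacter(char32_t cp)
{
    bool in_fdd0_block = 0xFDD0 <= cp && cp <= 0xFDEF;
    // Low 16 bits are 0xFFFE or 0xFFFF, and cp is inside the code space.
    bool plane_end = (cp & 0xFFFE) == 0xFFFE && cp <= 0x10FFFF;
    return in_fdd0_block || plane_end;
}

static_assert(is_ascii(0x7F) && !is_ascii(0x80));
static_assert(is_noncharacter(0x1FFFF) && !is_noncharacter(0xFFFD));
```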

code_point.general_category<Category>()

matches all code points with the specified category or category group. Its name is the name of the category. This requires the Unicode database, except for Cc (control), Cs (surrogate), and Co (private use), whose members are fixed.
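Cc, Cs, and Co can work without the database because Unicode fixes their members as a handful of ranges. The sketch below spells those ranges out; is_cc, is_cs, and is_co are hypothetical helpers for illustration, not lexy functions.

```cpp
// Hypothetical sketch: the fixed general categories as plain range checks.

// Cc (control): C0 controls U+0000 to U+001F, DEL, and C1 controls up to U+009F.
constexpr bool is_cc(char32_t cp)
{
    return cp <= 0x1F || (0x7F <= cp && cp <= 0x9F);
}

// Cs (surrogate): U+D800 to U+DFFF.
constexpr bool is_cs(char32_t cp)
{
    return 0xD800 <= cp && cp <= 0xDFFF;
}

// Co (private use): the BMP private use area plus planes 15 and 16,
// excluding the noncharacters at the end of those planes.
constexpr bool is_co(char32_t cp)
{
    return (0xE000 <= cp && cp <= 0xF8FF)
        || (0xF0000 <= cp && cp <= 0xFFFFD)
        || (0x100000 <= cp && cp <= 0x10FFFD);
}

static_assert(is_cc(U'\n') && !is_cc(U' '));
static_assert(is_co(0xE000) && !is_co(0x10FFFE));
```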

Example 1. Parse one code point in the input's encoding
#include <lexy/dsl.hpp>
namespace dsl = lexy::dsl;

struct production
{
    static constexpr auto rule = dsl::code_point + dsl::eof;
};
Example 2. Parse even code points only
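#include <lexy/dsl.hpp>
namespace dsl = lexy::dsl;

struct production
{
    // Predicate for code_point.if_: accepts code points with an even value.
    struct even
    {
        constexpr bool operator()(lexy::code_point cp) const
        {
            return cp.value() % 2 == 0;
        }
    };

    static constexpr auto rule = dsl::code_point.if_<even>() + dsl::eof;
};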
Caution
As a token rule, it matches whitespace immediately following the code point. As such, the rule is best used in contexts where automatic whitespace skipping is disabled.
Note
See lexy::dsl::unicode for common predefined predicates.
Note
.ascii(), .bmp(), and .noncharacter() correspond to the matching member functions of lexy::code_point (is_ascii(), is_bmp(), and is_noncharacter()). The other classification functions don't have rules:
* cp.is_valid() and cp.is_scalar() are always true; cp.is_surrogate() is never true.
* cp.is_control() is general category Cc.
* cp.is_private_use() is general category Co.

See also