Header lexy/dsl/integer.hpp

The integer, code_point_id and code_unit_id rules.

lexy::integer_traits

lexy/dsl/integer.hpp
namespace lexy
{
    template <typename T>
    struct integer_traits;

    template <typename Integer>
        requires std::is_integral_v<Integer> && !std::is_same_v<Integer, bool>
    struct integer_traits<Integer> {  };

    template <>
    struct integer_traits<lexy::code_point> {  };

    template <typename T>
    struct unbounded {};
    template <typename T>
    struct integer_traits<unbounded<T>> {  };

    template <typename T, T Max>
    struct bounded {};
    template <typename T, T Max>
    struct integer_traits<bounded<T, Max>> {  };
}

The class template integer_traits provides the information about an integer type that is needed to parse it.

Every specialization must have the following members, with the indicated semantics:

lexy/dsl/integer.hpp
struct integer-traits-specialization
{
    // The actual integer type that is being parsed;
    // it is usually the template parameter itself.
    // type(0) must be a valid expression that creates the zero integer.
    using type = some-type;

    // If true, type has a maximal value it can store.
    // If false, type is unbounded and range checks during parsing are disabled.
    static constexpr bool is_bounded;

    // Precondition: digit < Radix.
    // Effects: result = result * Radix + digit or equivalent.
    template <int Radix>
    static constexpr void add_digit_unchecked(type& result, unsigned digit);
};

If is_bounded == true, it must also have the following members:

lexy/dsl/integer.hpp
struct integer-traits-bounded-specialization
{
    // The number of digits necessary to write the maximal value of type in the given Radix.
    template <int Radix>
    static constexpr std::size_t max_digit_count;

    // Precondition: digit < Radix.
    // Effects: result = result * Radix + digit or equivalent.
    // Returns: true if the resulting value fits in type, false if an overflow occurred or would occur.
    template <int Radix>
    static constexpr bool add_digit_checked(type& result, unsigned digit);
};
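To make the required members concrete, here is a minimal sketch of a traits class for std::uint32_t that provides everything listed above. The names u32_traits and count_digits are hypothetical and used only for illustration; this is not lexy's own specialization, which lives in lexy::integer_traits.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: number of digits needed to write value in the given radix (at least one).
template <typename T>
constexpr std::size_t count_digits(T value, int radix)
{
    std::size_t count = 1;
    while (value >= static_cast<T>(radix))
    {
        value /= static_cast<T>(radix);
        ++count;
    }
    return count;
}

// Hypothetical traits class with the members described above, for std::uint32_t.
// It only illustrates the interface; it is not lexy's own specialization.
struct u32_traits
{
    using type = std::uint32_t;

    static constexpr bool is_bounded = true;

    // Digits needed to write std::numeric_limits<type>::max() in Radix.
    template <int Radix>
    static constexpr std::size_t max_digit_count
        = count_digits(std::numeric_limits<type>::max(), Radix);

    // Precondition: digit < Radix. Wraps around on overflow (unsigned arithmetic).
    template <int Radix>
    static constexpr void add_digit_unchecked(type& result, unsigned digit)
    {
        result = static_cast<type>(result * Radix + digit);
    }

    // Precondition: digit < Radix. Leaves result unchanged and returns false
    // if result * Radix + digit would not fit in type.
    template <int Radix>
    static constexpr bool add_digit_checked(type& result, unsigned digit)
    {
        constexpr type max = std::numeric_limits<type>::max();
        if (result > (max - digit) / Radix)
            return false;
        result = static_cast<type>(result * Radix + digit);
        return true;
    }
};
```

The overflow check compares against (max - digit) / Radix so that the product is never computed when it would not fit.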

The integer traits are specialized for the built-in integral types (except bool) with the expected semantics, as well as for lexy::code_point.

The specialization for lexy::unbounded<T> behaves the same as the one for plain T, except that is_bounded == false. Integer overflow can then happen and has the usual C++ semantics: undefined behavior for signed types, wrapping for unsigned types.

The specialization for lexy::bounded<T, Max> behaves the same as the one for plain T, except that the maximal accepted integer value is Max instead of std::numeric_limits<T>::max(). Every value greater than Max is considered an overflow.
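A minimal sketch of the checked addition for a bounded type, assuming an unsigned T: the arithmetic is that of T, but the limit is Max. The name bounded_traits is hypothetical and only mirrors the idea of lexy::bounded<T, Max>; max_digit_count and add_digit_unchecked are omitted for brevity.

```cpp
#include <type_traits>

// Hypothetical analogue of the lexy::bounded<T, Max> specialization (not lexy's code):
// the digit arithmetic of T, but every value above Max counts as overflow.
template <typename T, T Max>
struct bounded_traits
{
    static_assert(std::is_unsigned_v<T>, "sketch assumes an unsigned type");

    using type = T;
    static constexpr bool is_bounded = true;

    // Precondition: digit < Radix. Leaves result unchanged and returns false
    // if result * Radix + digit would be greater than Max.
    template <int Radix>
    static constexpr bool add_digit_checked(type& result, unsigned digit)
    {
        if (digit > Max || result > (Max - digit) / Radix)
            return false;
        result = static_cast<type>(result * Radix + digit);
        return true;
    }
};
```

For example, with bounded_traits<unsigned, 999>, appending 9 to 99 yields 999 and succeeds, while appending 0 to 100 would yield 1000 and is rejected.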

Tip
Use lexy::unbounded<T> when you know that overflow is impossible, e.g. because you’re parsing two hexadecimal digits as an unsigned char.
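The reasoning behind this tip can be checked with a small sketch: two hexadecimal digits are at most 0xFF == 255, so unchecked accumulation into an unsigned char cannot overflow, while a third digit can. The helpers hex_digit_value and parse_hex_unchecked are hypothetical and only imitate what an unbounded traits type does; the sketch assumes 8-bit chars.

```cpp
#include <string_view>

// Hypothetical helper: value of a hexadecimal digit, or -1 if c is not one.
constexpr int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical unchecked accumulation into unsigned char, like an unbounded traits type:
// no range check, so an out-of-range value wraps around (modulo 256 with 8-bit chars).
// Precondition: every character in digits is a hexadecimal digit.
constexpr unsigned char parse_hex_unchecked(std::string_view digits)
{
    unsigned char result = 0;
    for (char c : digits)
        result = static_cast<unsigned char>(result * 16 + hex_digit_value(c));
    return result;
}

// Two hexadecimal digits always fit.
static_assert(parse_hex_unchecked("ff") == 255);
// Three digits may not: 0x100 == 256 wraps around to 0.
static_assert(parse_hex_unchecked("100") == 0);
```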

Rule lexy::dsl::integer

lexy/dsl/integer.hpp
namespace lexy
{
    struct integer_overflow {};
}

namespace lexy::dsl
{
    template <typename T, typename Base>
    constexpr branch-rule auto integer(token-rule auto digits);

    template <typename T>
    constexpr branch-rule auto integer(digits-dsl  digits);
    template <typename T>
    constexpr branch-rule auto integer(ndigits-dsl digits);

    template <typename T>
    constexpr auto integer = integer<T>(digits<>);
    template <typename T, typename Base>
    constexpr auto integer = integer<T>(digits<Base>);
}

integer is a branch rule that parses a sequence of digits as an integer.

Requires
Parsing

Parses digits. For the overloads that take no explicit digits, it is lexy::dsl::digits<Base>, where Base defaults to lexy::dsl::decimal.

Branch parsing

Tries to parse digits and backtracks if that backtracks. It does not backtrack on integer overflow.

Errors
  • lexy::integer_overflow: if converting the consumed digits to an integer leads to overflow. Its range covers everything consumed by digits. The rule then recovers without consuming additional input; the integer value produced is the last value it had before the overflow occurred.

  • All errors raised by parsing digits. The rule then recovers by consuming as many additional digits of Base as possible. If digits is a known instantiation with a separator, it also skips separators. This happens without any validation of trailing separators or leading zeros. Recovery fails if digits and this recovery process combined have not consumed any input; otherwise, the rule converts everything consumed and recovery succeeds.
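The overflow behavior from the first error above can be illustrated with a minimal sketch that converts decimal digits into a std::uint8_t and, on overflow, keeps the last value that still fit. The names conversion_result and convert_decimal_u8 are hypothetical; the sketch covers only the value computation, not error ranges or recovery from errors raised by digits.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical result type: the converted value plus an overflow flag.
struct conversion_result
{
    std::uint8_t value;
    bool         overflow;
};

// Converts decimal digits to std::uint8_t. On overflow it stops and keeps the last
// value that still fit, as described for lexy::integer_overflow above.
// Precondition: every character in digits is a decimal digit.
constexpr conversion_result convert_decimal_u8(std::string_view digits)
{
    unsigned value = 0; // never exceeds 255, so value * 10 + 9 cannot overflow unsigned
    for (char c : digits)
    {
        unsigned next = value * 10 + static_cast<unsigned>(c - '0');
        if (next > 255)
            return {static_cast<std::uint8_t>(value), true};
        value = next;
    }
    return {static_cast<std::uint8_t>(value), false};
}
```

For the input 1234, the value is 123 and the overflow flag is set, because appending the final 4 would exceed 255.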

Values

First produces all values from parsing digits. Then produces the integer of type T by iterating over the code units consumed by digits and handling each of them as follows: if a code unit is a valid digit of Base, its numerical value is determined and added to the result as a digit using lexy::integer_traits. Otherwise, the code unit is ignored without any additional validation.
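The conversion loop described above can be sketched as follows, assuming ASCII-compatible characters: digits of the base contribute to the result and every other code unit, such as a ' separator, is skipped. The names digit_value and convert_consumed are hypothetical, and overflow checking is deliberately left out to keep the focus on skipping.

```cpp
#include <string_view>

// Hypothetical helper: numerical value of c as a digit of base (up to 36),
// or -1 if c is not a digit of that base. Assumes ASCII letter ordering.
constexpr int digit_value(char c, int base)
{
    int value = -1;
    if (c >= '0' && c <= '9')
        value = c - '0';
    else if (c >= 'a' && c <= 'z')
        value = c - 'a' + 10;
    else if (c >= 'A' && c <= 'Z')
        value = c - 'A' + 10;
    return value < base ? value : -1;
}

// Hypothetical conversion of the code units consumed by a digits rule: digits of base
// are added to the result, every other code unit is skipped without validation.
// Overflow checking is omitted in this sketch.
constexpr unsigned long long convert_consumed(std::string_view consumed, int base)
{
    unsigned long long result = 0;
    for (char c : consumed)
    {
        int digit = digit_value(c, base);
        if (digit < 0)
            continue; // not a digit of base, e.g. a separator: ignored
        result = result * static_cast<unsigned>(base) + static_cast<unsigned>(digit);
    }
    return result;
}
```

With this approach, 1'000'000 in base 10 converts to 1000000, and ff'ff in base 16 converts to 65535.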

Example 1. Parse an int
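#include <lexy/callback.hpp>
#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

struct production
{
    static constexpr auto rule = [] {
        // Decimal digits with ' as separator; numbers other than 0 may not start with 0.
        auto digits = dsl::digits<>.sep(dsl::digit_sep_tick).no_leading_zero();
        return dsl::integer<int>(digits);
    }();

    static constexpr auto value = lexy::as_integer<int>;
};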

Rule lexy::dsl::code_point_id

lexy/dsl/integer.hpp
namespace lexy
{
    struct invalid_code_point {};
}

namespace lexy::dsl
{
    template <std::size_t N, typename Base = hex>
    constexpr branch-rule auto code_point_id;
}

code_point_id is a branch rule that parses a sequence of N digits as a lexy::code_point.

code_point_id<N, Base> behaves almost exactly like integer<lexy::code_point>(n_digits<N, Base>). The only difference is that an integer overflow raises a generic error with tag lexy::invalid_code_point instead of lexy::integer_overflow.
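As a rough illustration of this behavior, the following sketch reads exactly N hexadecimal digits and treats any value above U+10FFFF, the largest Unicode code point, as an overflow, keeping the last value that fit in the same way integer does. The names code_point_result and parse_code_point_id are hypothetical, and the result uses char32_t rather than lexy::code_point so that the sketch stands on its own.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical result type: the code point value and whether it stayed in range.
struct code_point_result
{
    char32_t value;
    bool     valid;
};

// Reads exactly N hexadecimal digits. As soon as the value would exceed U+10FFFF,
// it reports an error and keeps the last value that fit.
// Precondition: digits holds at least N characters, all of them hexadecimal digits.
template <std::size_t N>
constexpr code_point_result parse_code_point_id(std::string_view digits)
{
    char32_t value = 0;
    for (std::size_t i = 0; i < N; ++i)
    {
        const char     c     = digits[i];
        const unsigned digit = (c >= '0' && c <= '9')   ? static_cast<unsigned>(c - '0')
                               : (c >= 'a' && c <= 'f') ? static_cast<unsigned>(c - 'a' + 10)
                                                        : static_cast<unsigned>(c - 'A' + 10);
        const char32_t next = value * 16 + digit; // value <= 0x10FFFF, so this cannot wrap
        if (next > 0x10FFFF)
            return {value, false};
        value = next;
    }
    return {value, true};
}
```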

Example 2. Parse a code point value
#include <string>

#include <lexy/callback.hpp>
#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

struct production
{
    static constexpr auto rule = [] {
        return LEXY_LIT("\\u") >> dsl::code_point_id<4>    // \uXXXX
               | LEXY_LIT("\\U") >> dsl::code_point_id<8>; // \UXXXXXXXX
    }();

    // Encode the resulting code point as UTF-8.
    static constexpr auto value = lexy::as_string<std::string, lexy::utf8_encoding>;
};
Caution
The rule still recovers from a lexy::invalid_code_point. The lexy::code_point produced might be invalid in that case, i.e. .is_valid() == false.

Rule lexy::dsl::code_unit_id

lexy/dsl/integer.hpp
namespace lexy
{
    struct invalid_code_unit {};
}

namespace lexy::dsl
{
    template <encoding Encoding, std::size_t N, typename Base = hex>
    constexpr branch-rule auto code_unit_id;
}

code_unit_id is a branch rule that parses a sequence of N digits as a code unit of the specified encoding.

code_unit_id<Encoding, N, Base> behaves almost exactly like integer<typename Encoding::char_type>(n_digits<N, Base>). The only difference is that an integer overflow raises a generic error with tag lexy::invalid_code_unit instead of lexy::integer_overflow.
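A minimal sketch of the same idea for code units: N hexadecimal digits are accumulated and any value above the maximum of the code unit type counts as an overflow. It assumes an unsigned code unit type of at most 32 bits, such as unsigned char or char16_t; the names code_unit_result and parse_code_unit_id are hypothetical and stand in for Encoding::char_type handling.

```cpp
#include <cstddef>
#include <limits>
#include <string_view>

// Hypothetical result type: the code unit value and whether it fit.
template <typename CodeUnit>
struct code_unit_result
{
    CodeUnit value;
    bool     valid;
};

// Reads exactly N hexadecimal digits into CodeUnit, assumed to be an unsigned code
// unit type of at most 32 bits. Exceeding its maximum is an error; the last value
// that fit is kept.
// Precondition: digits holds at least N characters, all of them hexadecimal digits.
template <typename CodeUnit, std::size_t N>
constexpr code_unit_result<CodeUnit> parse_code_unit_id(std::string_view digits)
{
    constexpr unsigned long long max = std::numeric_limits<CodeUnit>::max();
    unsigned long long value = 0;
    for (std::size_t i = 0; i < N; ++i)
    {
        const char     c     = digits[i];
        const unsigned digit = (c >= '0' && c <= '9')   ? static_cast<unsigned>(c - '0')
                               : (c >= 'a' && c <= 'f') ? static_cast<unsigned>(c - 'a' + 10)
                                                        : static_cast<unsigned>(c - 'A' + 10);
        const unsigned long long next = value * 16 + digit;
        if (next > max)
            return {static_cast<CodeUnit>(value), false};
        value = next;
    }
    return {static_cast<CodeUnit>(value), true};
}
```

With unsigned char, two digits always fit, whereas three digits such as 1ff overflow after the second digit.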

Example 3. Parse a code unit value
#include <string>

#include <lexy/callback.hpp>
#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

struct production
{
    // Quoted string that accepts \xNN escape sequences.
    static constexpr auto rule
        = dsl::quoted(dsl::ascii::print,
                      dsl::backslash_escape.rule(LEXY_LIT("x")
                                                 >> dsl::code_unit_id<lexy::utf8_encoding, 2>));

    static constexpr auto value = lexy::as_string<std::string, lexy::utf8_encoding>;
};
Caution
The rule still recovers from a lexy::invalid_code_unit. In that case, the code unit produced has been truncated in an unspecified way.

See also