Header lexy/dsl/integer.hpp
The integer, code_point_id and code_unit_id rules.
lexy::integer_traits
lexy/dsl/integer.hpp
namespace lexy
{
template <typename T>
struct integer_traits;
template <typename Integer>
requires std::is_integral_v<Integer> && !std::is_same_v<Integer, bool>
struct integer_traits<Integer> { … };
template <>
struct integer_traits<lexy::code_point> { … };
template <typename T>
struct unbounded {};
template <typename T>
struct integer_traits<unbounded<T>> { … };
template <typename T, T Max>
struct bounded {};
template <typename T, T Max>
struct integer_traits<bounded<T, Max>> { … };
}
The class template integer_traits provides the information about an integer type that is needed to parse it.
Every specialization must have members that look like the following, with the indicated semantic meaning:
lexy/dsl/integer.hpp
struct integer-traits-specialization
{
// The actual integer type that is being parsed;
// it is usually the template parameter itself.
// type(0) must be a valid expression that creates the zero integer.
using type = some-type;
// If true, type has a maximal value it can store.
// If false, type is unbounded and range checks during parsing are disabled.
static constexpr bool is_bounded;
// Precondition: digit < Radix.
// Effects: result = result * Radix + digit or equivalent.
template <int Radix>
static constexpr void add_digit_unchecked(type& result, unsigned digit);
};
If is_bounded == true, it must also have the following additional members:
lexy/dsl/integer.hpp
struct integer-traits-bounded-specialization
{
// The number of digits necessary to write the maximal value of type in the given Radix.
template <int Radix>
static constexpr std::size_t max_digit_count;
// Precondition: digit < Radix.
// Effects: result = result * Radix + digit or equivalent.
// Returns: true if the resulting value fits in type, false if an overflow occurred or would occur.
template <int Radix>
static constexpr bool add_digit_checked(type& result, unsigned digit);
};
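To make the member requirements concrete, here is a minimal stand-alone sketch of a traits-like class for std::uint8_t. The names uint8_traits_sketch and count_digits are hypothetical and not part of lexy; in real code the same members would go into a specialization of lexy::integer_traits for your own type. The overflow test is one possible way to satisfy the contract, not necessarily how the library implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: number of digits needed to write `value` in `radix`.
constexpr std::size_t count_digits(unsigned value, unsigned radix)
{
    std::size_t count = 0;
    for (; value != 0; value /= radix)
        ++count;
    return count;
}

// Hypothetical traits-like class with the members required of a bounded specialization.
struct uint8_traits_sketch
{
    using type = std::uint8_t;

    static constexpr bool is_bounded = true;

    template <int Radix>
    static constexpr std::size_t max_digit_count
        = count_digits(std::numeric_limits<type>::max(), static_cast<unsigned>(Radix));

    template <int Radix>
    static constexpr void add_digit_unchecked(type& result, unsigned digit)
    {
        assert(digit < static_cast<unsigned>(Radix)); // precondition
        result = static_cast<type>(result * Radix + digit);
    }

    template <int Radix>
    static constexpr bool add_digit_checked(type& result, unsigned digit)
    {
        assert(digit < static_cast<unsigned>(Radix)); // precondition
        constexpr unsigned max = std::numeric_limits<type>::max();
        // result * Radix + digit > max  <=>  result > (max - digit) / Radix
        if (static_cast<unsigned>(result) > (max - digit) / static_cast<unsigned>(Radix))
            return false; // would overflow; result is left unchanged
        result = static_cast<type>(result * Radix + digit);
        return true;
    }
};
```

Comparing against (max - digit) / Radix before multiplying avoids ever computing a value that does not fit, which matters for types where the intermediate result cannot be widened.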
The integer traits are specialized for the built-in integral types (except bool) with the expected semantics, as well as lexy::code_point.
The specialization for lexy::unbounded<T> behaves the same as the one for plain T, except that is_bounded == false:
integer overflow can happen and has the usual C++ semantics (UB for signed, wrapping for unsigned).
The specialization for lexy::bounded<T, Max> behaves the same as the one for plain T, except that the maximal accepted integer value has been lowered to Max instead of std::numeric_limits<T>::max().
Everything greater than Max is considered to be overflow.
Tip: Use lexy::unbounded<T> when you know that overflow is impossible, e.g. because you’re parsing two hexadecimal digits as an unsigned char.
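The difference between a lowered maximum and no check at all can be sketched without the library. The helpers add_digit_with_max and add_digit_no_check below are hypothetical names, not lexy API; they only mirror the contracts described above for bounded and unbounded accumulation, using unsigned arithmetic for simplicity.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: checked accumulation against a custom maximum,
// in the spirit of what lexy::bounded<T, Max> asks of the traits.
template <unsigned Max, unsigned Radix>
constexpr bool add_digit_with_max(unsigned& result, unsigned digit)
{
    assert(digit < Radix); // precondition
    // Overflow if result * Radix + digit would exceed Max.
    if (digit > Max || result > (Max - digit) / Radix)
        return false;
    result = result * Radix + digit;
    return true;
}

// Hypothetical helper: accumulation without any range check,
// in the spirit of lexy::unbounded<T>.
template <unsigned Radix>
constexpr void add_digit_no_check(unsigned char& result, unsigned digit)
{
    assert(digit < Radix); // precondition
    result = static_cast<unsigned char>(result * Radix + digit);
}

// Two hexadecimal digits encode at most 0xFF, which always fits in unsigned char,
// so skipping the check is safe in that situation.
static_assert(0xFF <= std::numeric_limits<unsigned char>::max(), "two hex digits fit");
```

With Max = 100 and base 10, the digits "99" and "100" are accepted while "101" and "990" are reported as overflow, even though all of them fit in unsigned.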
Rule lexy::dsl::integer
lexy/dsl/integer.hpp
namespace lexy
{
struct integer_overflow {};
}
namespace lexy::dsl
{
template <typename T, typename Base>
constexpr branch-rule auto integer(token-rule auto digits);
template <typename T>
constexpr branch-rule auto integer(digits-dsl digits);
template <typename T>
constexpr branch-rule auto integer(ndigits-dsl digits);
template <typename T>
constexpr auto integer = integer<T>(digits<>);
template <typename T, typename Base>
constexpr auto integer = integer<T>(digits<Base>);
}
integer is a branch rule that parses a sequence of digits as an integer.
- Requires
  T is a type with a specialization of lexy::integer_traits. Base is one of the supported bases. If digits is some instantiation of lexy::dsl::digits or lexy::dsl::n_digits, Base is deduced and must not be specified.
- Parsing
  Parses digits, which defaults to lexy::dsl::digits, with the base defaulting to lexy::dsl::decimal.
- Branch parsing
  Tries to parse digits, backtracks if that backtracks. It will not backtrack on integer overflow.
- Errors
  - lexy::integer_overflow: if converting the consumed digits to an integer leads to overflow. Its range covers everything consumed by digits. The rule then recovers without consuming additional input; the integer value produced is the last value it had before the overflow occurred.
  - All errors raised by parsing digits. The rule then recovers by consuming as many additional digits of the Base as possible. If digits is a known instantiation with a separator, it will also skip separators. This happens without any validation for trailing separators or leading zeros. Recovery fails if digits and this recovery process combined haven’t consumed any input. Otherwise, it converts everything consumed and recovery succeeds.
- Values
  First produces all values from parsing digits. Then produces the integer of type T by iterating over the code units consumed by digits and handling them as follows: If a code unit is a valid digit of Base, its numerical value is determined and the resulting digit added to the result using lexy::integer_traits. Otherwise, the code unit is ignored without any additional validation.
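The conversion described under Values, together with the overflow behavior described under Errors, can be sketched as a stand-alone function. This is an illustration under stated assumptions, not the library's implementation: conversion_result, digit_value and convert_digits are hypothetical names, the target type is fixed to std::uint32_t, and the input is assumed to be ASCII.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string_view>

// Hypothetical result type: the converted value and whether overflow occurred.
struct conversion_result
{
    std::uint32_t value;
    bool          overflow;
};

// Hypothetical helper: numerical value of an ASCII code unit in the radix, or -1.
constexpr int digit_value(char c, int radix)
{
    int value = -1;
    if (c >= '0' && c <= '9')
        value = c - '0';
    else if (c >= 'a' && c <= 'z')
        value = c - 'a' + 10;
    else if (c >= 'A' && c <= 'Z')
        value = c - 'A' + 10;
    return value < radix ? value : -1;
}

// Iterates over the consumed code units: digits are accumulated,
// everything else (e.g. a digit separator) is ignored without validation.
template <int Radix>
constexpr conversion_result convert_digits(std::string_view consumed)
{
    constexpr std::uint32_t max   = std::numeric_limits<std::uint32_t>::max();
    constexpr std::uint32_t radix = static_cast<std::uint32_t>(Radix);

    std::uint32_t result = 0;
    for (char c : consumed)
    {
        const int digit = digit_value(c, Radix);
        if (digit < 0)
            continue; // not a digit of the base: ignored
        const std::uint32_t d = static_cast<std::uint32_t>(digit);
        if (result > (max - d) / radix)
            return {result, true}; // keep the last value before the overflow
        result = result * radix + d;
    }
    return {result, false};
}
```

For example, convert_digits<10>("1'000'000") skips the separators and yields one million, while "4294967296" reports overflow and keeps 429496729, the value accumulated before the digit that no longer fits.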
Rule lexy::dsl::code_point_id
lexy/dsl/integer.hpp
namespace lexy
{
struct invalid_code_point {};
}
namespace lexy::dsl
{
template <std::size_t N, typename Base = hex>
constexpr branch-rule auto code_point_id;
}
code_point_id is a branch rule that parses a sequence of N digits as a lexy::code_point.
code_point_id<N, Base> behaves almost exactly like integer<lexy::code_point>(n_digits<N, Base>).
The only difference is that an integer overflow raises a generic error with tag lexy::invalid_code_point as opposed to lexy::integer_overflow.
struct production
{
static constexpr auto rule = [] {
return LEXY_LIT("\\u") >> dsl::code_point_id<4> // \uXXXX
| LEXY_LIT("\\U") >> dsl::code_point_id<8>; // \UXXXXXXXX
}();
// Encode the resulting code point as UTF-8.
static constexpr auto value = lexy::as_string<std::string, lexy::utf8_encoding>;
};
Caution: The rule still recovers from a lexy::invalid_code_point.
The lexy::code_point produced might be invalid in that case, i.e. .is_invalid() == true.
Rule lexy::dsl::code_unit_id
lexy/dsl/integer.hpp
namespace lexy
{
struct invalid_code_unit {};
}
namespace lexy::dsl
{
template <encoding Encoding, std::size_t N, typename Base = hex>
constexpr branch-rule auto code_unit_id;
}
code_unit_id is a branch rule that parses a sequence of N digits as a code unit of the specified encoding.
code_unit_id<Encoding, N, Base> behaves almost exactly like integer<typename Encoding::char_type>(n_digits<N, Base>).
The only difference is that an integer overflow raises a generic error with tag lexy::invalid_code_unit as opposed to lexy::integer_overflow.
Caution: The rule still recovers from a lexy::invalid_code_unit.
The code unit produced has been truncated in some unspecified way in that case.