Header lexy/dsl/unicode.hpp
Char class rules for matching Unicode char classes.
Unicode char classes
lexy/dsl/unicode.hpp
namespace lexy::dsl
{
namespace unicode
{
constexpr char-class-rule auto control;
constexpr char-class-rule auto blank;
constexpr char-class-rule auto newline;
constexpr char-class-rule auto other_space;
constexpr char-class-rule auto space;
constexpr char-class-rule auto digit;
constexpr char-class-rule auto lower;
constexpr char-class-rule auto upper;
constexpr char-class-rule auto alpha;
constexpr char-class-rule auto alpha_digit;
constexpr char-class-rule auto alnum = alpha_digit;
constexpr char-class-rule auto word;
constexpr char-class-rule auto graph;
constexpr char-class-rule auto print;
constexpr char-class-rule auto character;
}
}
These char class rules match one Unicode code point from a char class, as specified in the table below.
Each class is a superset of the corresponding rule in lexy::dsl::ascii.
They require the Unicode database.
The char classes
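Token Rule | Char Class |
---|---|
control | |
blank | |
newline | |
other_space | |
space | |
digit | |
lower | |
upper | |
alpha | |
alpha_digit | |
word | |
graph | everything but |
print | |
character | any code point that is assigned (i.e. not |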
Caution | Unlike in the ASCII case, alpha is not just lower or upper: there are alphabetic characters that don’t have a case. |
Caution | Differentiate between lexy::dsl::unicode::newline, which matches \r or \n and others, and lexy::dsl::newline, which matches \r\n or \n! |
Caution | As token rules, they match whitespace immediately following the character. As such, the rules are best used in contexts where automatic whitespace skipping is disabled. They can safely be used as part of the whitespace definition. |
Note | There is no dsl::unicode::punct.
The Unicode standard defines it as general category P (Punctuation), which is unsatisfactory as it does not include e.g. $, unlike dsl::ascii::punct (Unicode classifies it as a currency symbol instead).
POSIX includes $ as well as other non-alphabetic symbols, which is also unsatisfactory, as dsl::unicode::punct would then include characters Unicode does not consider punctuation. |
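The caution about the two newline rules can be illustrated with a small standalone sketch in plain C++. It is not lexy's implementation: the helper names are hypothetical, and the code points beyond \r and \n in the char class are an assumption about what "and others" covers. The point is only that a char class tests a single code point, while a newline sequence can span two.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper, not part of lexy: models a char class, which always
// tests exactly one code point.
// ASSUMPTION: the "others" are NEL (U+0085), LINE SEPARATOR (U+2028) and
// PARAGRAPH SEPARATOR (U+2029).
constexpr bool is_newline_code_point(char32_t cp)
{
    return cp == U'\r' || cp == U'\n'
        || cp == 0x0085 || cp == 0x2028 || cp == 0x2029;
}

// Hypothetical helper, not part of lexy: models a newline sequence matcher.
// Returns how many code points of `text` form a leading "\r\n" or "\n",
// or 0 if the text starts with neither.
constexpr std::size_t newline_sequence_length(std::u32string_view text)
{
    if (text.size() >= 2 && text[0] == U'\r' && text[1] == U'\n')
        return 2;
    if (!text.empty() && text[0] == U'\n')
        return 1;
    return 0;
}

// Checked at compile time:
// the class accepts a lone '\r', the sequence matcher does not.
static_assert(is_newline_code_point(U'\r'), "class accepts a single CR");
static_assert(newline_sequence_length(U"\rnext") == 0, "lone CR is no sequence");
// "\r\n" is one sequence of two code points, but two separate class matches.
static_assert(newline_sequence_length(U"\r\nnext") == 2, "CRLF is one sequence");
```

In this model, input ending in \r\n would be consumed in one step by the sequence matcher but would need two applications of the char class, which is why the two rules should not be swapped for one another.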
Unicode identifier classes
lexy/dsl/unicode.hpp
namespace lexy::dsl
{
namespace unicode
{
constexpr char-class-rule auto xid_start;
constexpr char-class-rule auto xid_start_underscore;
constexpr char-class-rule auto xid_continue;
}
}
These char class rules match one Unicode code point from the XID_Start/XID_Continue character classes.
They are used to parse Unicode-aware lexy::dsl::identifier.
xid_start matches any Unicode character that can occur at the beginning of an identifier. It is a superset of lexy::dsl::ascii::alpha.
xid_start_underscore matches xid_start or _ (underscore). It is a superset of lexy::dsl::ascii::alpha_underscore.
xid_continue matches any Unicode character that can occur after the initial character of an identifier. It is a superset of lexy::dsl::ascii::alpha_digit_underscore.
They require the Unicode database.
Warning | xid_start does not include _ (underscore)! |
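The practical effect of the warning can be sketched in plain C++ using only the ASCII subsets named above, so that no Unicode database is needed. This is a simplified model under that assumption, not lexy's implementation, and every function name in it is hypothetical. It shows how the choice of leading class decides whether an identifier may start with an underscore.

```cpp
#include <string_view>

constexpr bool ascii_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}
constexpr bool ascii_digit(char c) { return c >= '0' && c <= '9'; }

// ASCII stand-ins (hypothetical names) for the three identifier classes.
constexpr bool model_xid_start(char c) { return ascii_alpha(c); }
constexpr bool model_xid_start_underscore(char c)
{
    return ascii_alpha(c) || c == '_';
}
constexpr bool model_xid_continue(char c)
{
    return ascii_alpha(c) || ascii_digit(c) || c == '_';
}

// An identifier is one leading character from `leading`, followed by any
// number of characters from the continue class.
constexpr bool is_identifier(std::string_view text, bool (*leading)(char))
{
    if (text.empty() || !leading(text.front()))
        return false;
    for (char c : text.substr(1))
        if (!model_xid_continue(c))
            return false;
    return true;
}

// Checked at compile time:
static_assert(is_identifier("name_1", model_xid_start), "letter may lead");
static_assert(!is_identifier("_name", model_xid_start), "underscore may not lead");
static_assert(is_identifier("_name", model_xid_start_underscore),
              "underscore may lead with the underscore variant");
```

A grammar that should accept names such as _name would therefore use the underscore variant as its leading class. The underscore is always allowed in later positions because the continue class already includes it.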