dsl::scan: it will now be invoked with the previously produced values.
dsl::parse_as to ensure that a rule always produces a value (e.g. when combined with the
lexy::lexeme_input to support multi-pass parsing.
dsl::terminator(term)(branch) into a branch rule, as opposed to being a plain rule (#74).
lexy::forward<void> into a sink.
Support references in
Fix bug that prevented
lexy::parse with a root production whose value is
Fix bug that caused infinite template instantiations for recursive scans.
Fix bug that didn’t skip whitespace in lexy::scanner for token productions.
The following changelog items track the historic development; only breaking changes are documented.
dsl::lit_cp in a char class now requires a Unicode encoding; use dsl::lit_b to support default/byte encoding.
lexy::token_production that define a ::whitespace member now skip whitespace in the direct rule as well. Previously, it would only apply the whitespace rule to child productions but not the production itself.
BEHAVIOR CHANGE: production rules that define a ::whitespace member now skip whitespace before parsing. This also applies to the root production, so whitespace at the beginning of the input is now skipped automatically.
::whitespace member is no longer a branch rule: as it now skips whitespace first, it can’t be used as a branch condition.
dsl::whitespace (no arguments); it’s now unnecessary as initial whitespace is skipped automatically.
dsl::capture() is removed. If you’re using dsl::capture_token(), you need to rename it to dsl::capture() (compile error). If you’re using dsl::capture() on a non-token rule, you need to use dsl::scan instead and manually produce the value (compile error). If you’re using dsl::capture() on a token, this will no longer capture trailing whitespace (silent behavior change). I can’t imagine a situation where capturing trailing whitespace was intended.
BEHAVIOR CHANGE: if a non-root production defines a ::whitespace member, it will now also apply to all children. Previously, it would only apply to the production that defined the member, and not its children (except if it was a token production).
dsl::eol in the newline case) generate a token node with the
lexy::eol_token_kind have been removed.
dsl::eol are now branch rules: replace
Require a char class rule in dsl::delimited(): instead of
Require literal rules in .limit() of error recovery rules.
Require literal rules in .reserve() and variants of
dsl::bom now generates a lexy::expected_literal error instead of
BEHAVIOR CHANGE: the introduction of char class rules changes error messages and token kinds in some situations.
dsl::lit_cp<Cp> and moved to
Require char classes in operator- for tokens; removed
Require char classes in
dsl::integer now uses lexy::digits_token_kind instead of
lexy::error_token_kind during recovery.
dsl::lit_b now require
lexy/input_location.hpp instead, which has a different interface but more functionality.
Added more pre-defined token kinds: for example, tokens created by LEXY_LIT() now have their own literal token kind. This breaks code that does not use user-defined token kinds and does matching on
dsl::delimited() now merges adjacent characters into a single lexy::lexeme that is passed to the sink.
lexy::token_production no longer merges adjacent tokens, but dsl::delimited() merges character tokens.
Terminator rules are no longer branch rules; this behavior was somewhat confusing. If you need branch rules, you can manually write the equivalent rules.
dsl::integer() now requires a token rule. This ensures the correct behavior in combination with whitespace skipping.
BEHAVIOR CHANGE: branch parsing an identifier will now backtrack without raising an error if it can match an identifier, but it is reserved. Previously, this would not backtrack and then raise an error (but trivially recover). This behavior is consistent with
Removed branch functionality of token sequences (again). It was already removed once as it was unimplementable due to automatic whitespace skipping, but then re-implemented later on. But as it turns out, it is in fact unimplementable and the current implementation was completely broken. Instead of tok1 + tok2 >> rule1 | tok1 + tok3 >> rule2, use tok1 >> (tok2 >> rule1 | tok3 >> rule2).
dsl::encode(). The rule was completely broken in combination with dsl::capture() and rules built on top like
BEHAVIOR CHANGE: error recovery now produces a new error token in the parse tree. This ensures that the parse tree stays lossless even in the presence of errors.
dsl::find() now always raise the recovery events. If you’re using them outside of dsl::try_(), this is not what you want, so don’t use them there; they’re not meant for it.
lexy::read_file_result is no longer an input; you need to call .buffer() when passing it to a parse action.
lexy::parse_as_tree() to new header
cancel_production, and its
Moved callback adapters and composition into new header files, but still implicitly included by
Removed overload of lexy::bind that takes a sink; bind individual items in a separate production instead.
Removed unneeded overloads of lexy::as_sink and changed the transcoding behavior: it will now only use the pointer + size constructor if the character types match and no longer
Simplified and minimized interface of the input classes, removing e.g. iterators from them.
Moved definition of
encoding.hpp to new header
dsl::else_ into a tag object that can only be used with operator>>, instead of a stand-alone rule.
dsl::lookahead() are no longer no-ops when used outside a branch condition. Instead, they will perform lookahead and raise an error if that fails.
Improved and simplified interface for dsl::context_counter: instead of .compare(), you now use .is() as a branch condition, and instead of .require(), you now use
lexy::fold[_inplace] is no longer a callback, only a sink; use lexy::callback(lexy::fold(…)) to turn it into a callback if needed.
lexy::nullopt instead of an empty sink result if the list has no items. If you’re using pre-defined callbacks like lexy::as_string, it continues to work as expected. If you’re using sink >> callback, callback now requires one overload that takes
operator|) is no longer a branch rule if it would be an unconditional branch rule; using an unconditional choice as a branch is almost surely a bug.
dsl::id; use a separate production instead.
lexy::sink; instead of
dsl::twice no longer produce an array, but instead all values individually. Use lexy::fold instead of a loop.
lexy_ext/shell.hpp, with the namespace change to
BEHAVIOR CHANGE: Don’t produce a tag value if no sign was present in dsl::[minus/plus_]sign. If you use lexy::as_integer as callback, this doesn’t affect you.
BEHAVIOR CHANGE: Don’t consume input in
BEHAVIOR CHANGE: Produce only a single whitespace node in the parse tree, instead of the individual token nodes. Prohibited dsl::recurse inside the whitespace rule.
-1. Also changed callback
The alternative rule / now tries to find the longest match instead of the first one. If it was well-specified before, this doesn’t change anything.
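To make the difference between the old first-match and the new longest-match behavior of / concrete, here is a standalone C++17 sketch over plain string literals. It is a conceptual model only, not lexy's implementation. The functions first_match and longest_match are hypothetical names.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Conceptual model only (not lexy's code): how many characters a set of
// literal alternatives consumes under two different matching policies.

// First-match policy: returns the length of the first alternative, in
// declaration order, that is a prefix of `input`, or 0 if none matches.
std::size_t first_match(std::string_view input,
                        const std::vector<std::string_view>& alternatives) {
    for (auto alt : alternatives)
        if (input.substr(0, alt.size()) == alt)
            return alt.size();
    return 0;
}

// Longest-match policy: returns the length of the longest alternative that is
// a prefix of `input`, or 0 if none matches; declaration order is irrelevant.
std::size_t longest_match(std::string_view input,
                          const std::vector<std::string_view>& alternatives) {
    std::size_t best = 0;
    for (auto alt : alternatives)
        if (alt.size() > best && input.substr(0, alt.size()) == alt)
            best = alt.size();
    return best;
}
```

With the alternatives {"=", "=="} and the input "==x", first_match consumes one character and longest_match consumes two. If "==" is listed first, both policies consume two characters. Grammars written this way do not notice the change.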
dsl::switch_(); use the new dsl::symbol() instead, which is more efficient as well.
dsl::escape(); use the new
Restructured callback header files; an #include <lexy/callback.hpp> might be necessary now.
Support empty token nodes in the parse tree if they don’t have an unknown kind. In particular, the parse tree will now contain an EOF node at the end.
lexy::unknown_token_kind into a value (as opposed to the type it was before).
Changed the return type of
lexy_ext::read_file()) to use a new
Changed the return type of lexy::parse_as_tree() to a new
Changed the return type of lexy::parse() to a new
An error callback that returns a non-void type must now be a sink. Use lexy::collect<Container>(error_callback) to create a sink that stores all results in the container. If the error callback returns void, no change is required.
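The sketch below models, in plain C++17, the idea behind turning a value-returning callback into a sink. It calls the callback once per reported item, stores each result in a container, and hands the container over at the end. This is an illustration under that assumption, not lexy's implementation. The names collecting_sink and make_collecting_sink are hypothetical, and only the general "call repeatedly, then finish" shape is meant to mirror a sink.

```cpp
#include <string>
#include <utility>
#include <vector>

// Conceptual model only (not lexy's code): wraps a callback that returns a
// value so that every result is collected into `Container`.
template <typename Container, typename Callback>
class collecting_sink {
public:
    explicit collecting_sink(Callback cb) : cb_(std::move(cb)) {}

    // Invoked once per item (e.g. once per reported error); stores the
    // callback's result instead of discarding it.
    template <typename... Args>
    void operator()(Args&&... args) {
        results_.push_back(cb_(std::forward<Args>(args)...));
    }

    // Ends collection and moves out everything gathered so far.
    Container finish() && { return std::move(results_); }

private:
    Callback cb_;
    Container results_;
};

// Helper so the callback type can be deduced while the container is explicit.
template <typename Container, typename Callback>
collecting_sink<Container, Callback> make_collecting_sink(Callback cb) {
    return collecting_sink<Container, Callback>(std::move(cb));
}
```

For example, make_collecting_sink<std::vector<std::string>>(cb) with a callback that formats a line number yields a vector with one message per invocation after std::move(sink).finish().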
dsl::sep()now has that behavior as well.
dsl::prevent() now recover from errors, which might lead to worse error messages in certain situations. If they’re used as intended (to create a better error message if something didn’t work out), this shouldn’t happen.
Removed empty state from lexy::result. It was only added because it was useful internally, but this is no longer the case.
Reverted optimization that merged multiple lexemes in the sink/tokens of dsl::delimited(). Tokens are instead now automatically merged by the parse tree builder if direct children of a
dsl::switch_(rule).case_() now requires a branch of the form token >> rule; previously, it could take an arbitrary branch.
Unified error interface: .error<Tag> (e.g. for tokens,
BEHAVIOR CHANGE: if dsl::code_point_id overflows, the tag is now
Replaced use of
const char* in all user-facing functions. As a consequence, automatic type name now requires GCC > 8.
lexy::make_error_location(). It has been replaced by
Removed support for arbitrary rules as content of a dsl::delimited() rule; now only tokens are allowed.
Also removed support for an escape choice in the dsl::delimited() rule; it must be a branch now.
As a related change, the sink will now be invoked with a lexy::lexeme that can span multiple occurrences of the content token, rather than multiple times (one lexeme per token occurrence) as it was previously. This means that a dsl::quoted(dsl::code_point) rule will now invoke the sink only once, giving it a lexy::lexeme that spans the entire content of the string literal.
Previously it was invoked once per
Limited implicit conversion of lexy::nullopt to types that are like std::optional or pointers.
rule | lexy::dsl::value_t<T> to keep the previous behavior of getting a default constructed object of type
dsl::whitespace rule. Whitespace can now be parsed manually or automatically.
To parse whitespace manually, replace
rule + dsl::whitespace(rule), or otherwise insert dsl::whitespace(rule) calls where appropriate. See examples/xml.cpp for an example of manual whitespace skipping.
To parse whitespace automatically, define a static constexpr auto whitespace member in the root production of the grammar. This rule is then skipped after every token. To temporarily disable automatic whitespace skipping inside one production, inherit from
examples/json.cpp for an example of automatic whitespace skipping.
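As a rough illustration of the automatic mode, in which the whitespace rule is skipped after every token, here is a standalone C++17 model of a scanner. It is not lexy's implementation. The class ws_scanner is a hypothetical name, and it hard-codes whitespace as std::isspace characters rather than taking a user-defined whitespace rule. It also skips leading whitespace in its constructor, mirroring the newer behavior described earlier in this changelog for productions with a ::whitespace member.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Conceptual model only (not lexy's code): whitespace is skipped once before
// parsing starts and again after every successfully matched token.
class ws_scanner {
public:
    explicit ws_scanner(std::string_view input) : input_(input) { skip_ws(); }

    // Tries to match a literal token at the current position. On success it
    // consumes the token plus any trailing whitespace; on failure it consumes
    // nothing.
    bool token(std::string_view lit) {
        if (input_.substr(pos_, lit.size()) != lit)
            return false;
        pos_ += lit.size();
        skip_ws();
        return true;
    }

    bool at_end() const { return pos_ == input_.size(); }

private:
    // Stand-in for a user-defined whitespace rule: any std::isspace character.
    void skip_ws() {
        while (pos_ < input_.size() &&
               std::isspace(static_cast<unsigned char>(input_[pos_])))
            ++pos_;
    }

    std::string_view input_;
    std::size_t pos_ = 0;
};
```

Because whitespace is consumed right after each token, the grammar (here, the sequence of token calls) never has to mention it. On the input "  int  x ;", the calls token("int"), token("x") and token(";") all succeed and leave the scanner at the end.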
Removed support for choices in while, i.e. dsl::while_(a | b | c). This can be replaced by dsl::loop(a | b | c | dsl::break_).
dsl::context_counter due to implementation problems. Use
A sequence rule using operator+ is no longer a branch. Previously, it was a branch if it consisted of only tokens. However, this was unimplementable in combination with automatic whitespace skipping.
A branch condition that is a sequence is only required if you have something like prefix + a >> rule_a | prefix + b >> rule_b. Use prefix + (a >> rule_a | b >> rule_b) instead.
Removed context sensitive parsing mechanism from
Removed callback from
lexy::as_collection; they’re now only sinks.
lexy::construct can be used in most cases instead.
::value callbacks from productions. There are three cases:
A production has a value member only: this continues to work as before.
A production has a list member only: just rename it to value. It is treated as a sink automatically when required.
A production has a
value member: add a
value member that uses
sink >> callback, where
sink was the previous
callback. This will use
sink to construct the list then pass everything to
lexy::result now has an empty state. It is only used internally and never exposed to the user. As a related change, the default constructor has been removed due to unclear semantics. Use lexy::result(lexy::result_error) to restore its behavior of creating a default-constructed error.
Pattern concept with a new Branch concept (see #10). A Branch is a rule that can make a branching decision (it is required by choices and can be used as a branch condition). A Token is an atomic parse unit; it is also a
Most patterns (e.g. LEXY_LIT) are now tokens, which doesn’t break anything. Some patterns are now branches (e.g. dsl::peek()), which breaks in rules that now require tokens (e.g. dsl::until()). The remaining patterns are now plain rules (e.g. dsl::while_(condition >> then)), which makes them unusable as branch conditions.
The patterns that are now branches:
condition >> then was a pattern if then is a pattern; now it is always a branch
The patterns that are now plain rules:
a sequence using operator+ (it is still a token if all arguments are tokens, so it can be used as a condition)
a choice using operator|, even if all arguments are tokens (use operator/ instead, which is a token)
dsl::while_[one](), even if the argument is a token
The following rules previously required only patterns but now require tokens:
a minus using
dsl::escape() (the escape character itself) and its
automatic capturing of
If you have a breaking change because you now use a non-token rule where a token was expected, use dsl::token(), which turns an arbitrary rule into a token (just like dsl::match() turned a rule into a pattern).
dsl::token() instead. If you previously had dsl::peek(dsl::match(rule)) >> then, you can now even use dsl::peek(rule) >> then, as dsl::peek[_not]() have learned to support arbitrary rules.
pattern is now a token, you can use
rule.error<Tag>() instead. Otherwise, use
dsl::trailing_sep(pattern). You can now use
dsl::capture() is now a branch and the separators have learned to support branches.
dsl::digit<Base> - dsl::zero (potentially with a nice error specified using .error()) instead of
dsl::success, as it is no longer needed internally. It can be added back if needed.
BEHAVIOR CHANGE: As part of the branch changes,
dsl::lookahead() are now no-ops if not used as a branch condition. For example, prefix + dsl::peek(rule) + suffix is equivalent to prefix + suffix. In most cases, this is only a change in the error message as they don’t consume characters. Use dsl::prevent() if the lookahead was intended.
BEHAVIOR CHANGE: Errors in whitespace are currently not reported. For example, if you have /* unterminated C comment int i; and support space and C comments as whitespace, this would previously raise an error about the unterminated C comment. Right now, it will try to skip the C comment, fail, and then just be done with whitespace skipping. The error for the unterminated C comment then manifests as expected 'int', got '/*'.
This behavior is only temporary until a better solution for whitespace is implemented (see #10).
dsl::item(). They were mainly used to implement dsl::list(), and became unnecessary after an internal restructuring.
Removed support for choices in lists, i.e. dsl::list(a | b | c). This can be added back if needed.
dsl::operator! due to implementation problems. Existing uses of dsl::peek(!rule) can be replaced by dsl::peek_not(rule); existing uses of !rule >> do_sth can be replaced using