Changelog
Upcoming
New features
Add dsl::context_counter::is<Pred>() and convenience overloads to check whether the value matches some predicate (#238, #239).
Bug fixes
dsl::no_whitespace did not handle nested dsl::no_whitespace correctly.
dsl::no_whitespace did not re-enable whitespace in child productions.
Improve compiler support (#254) and CMake (#244).
Release 2025.05.0
Potential breaking changes
scanner-common::capture_token was renamed to scanner-common::capture, and the old scanner-common::capture was removed. Previously, capture_token was a linker error anyway, but if you’re calling scanner-common::capture, it will no longer work for arbitrary rules and instead only works like dsl::capture.
lexy::parse_as_tree will add a position token to production nodes that would otherwise be empty. That way, no production node will be empty, unless the builder API is used directly.
Change lexy::dsl::try_() error recovery behavior: it will now skip whitespace after the (optional) error recovery rule.
Deprecate the lexy::parse_tree::builder::finish() overload that does not take a remaining_input.
The typo lexy::code_point::spaing_mark was fixed to spacing_mark.
New features
Experimental: Add lexy::parse_tree_input and lexy::dsl::tnode/lexy::dsl::pnode to support multi-pass parsing.
Add lexy::dsl::byte.if_/set/range/ascii methods to match specific bytes.
Add an overload of fatal_error() on scanners that allows construction of type-erased generic errors (#134).
Add lexy::buffer::release() and lexy::buffer::adopt() to deconstruct a buffer into its components and re-assemble it later.
Add lexy::parse_tree::node::position() and ::covering_lexeme().
Add default argument to lexy::dsl::flag().
Add lexy::callback_with_state.
Pass the parse state to the tag of lexy::dsl::op if required (#172) and to lexy::dsl::error (#211).
Enable CMake install rule for subdirectory builds (#205).
Bug fixes
Add missing constexpr to container callbacks and lexy::as_string.
Fix infinite loop in dsl::delimited when dealing with invalid code points (#173).
Fix swallowed errors from case-folding rules (#149).
Fix lexy::production_name for productions in an anonymous namespace.
Fix bugs in dsl::scan (#133, #135, #142, #154, #209).
Fix bug with the position passed to the tag constructor of lexy::dsl::op (#170).
Fix bug where lexy_ext::report_error unconditionally wrote to stderr, ignoring the output iterator.
Fix bug with missing lexy::error_context::position in lexy::parse_as_tree (#184).
Fix static_assert in lexy::parse_tree (#190).
Fix bugs in lexy::input_location::operator< (#228).
Fix bugs in examples (#183).
Add missing && in lexy::bind_sink (#221).
Workaround compiler bugs and improve documentation (#128, #129, #146, #181, #197, #216, #227).
Release 2022.12.1
Add constructor to lexy::input_location.
lexy::error_context::production will not be a transparent production.
Fix lexy::production_info::operator== when the compiler doesn’t merge string literals.
Fix SWAR matching of dsl::ascii::print and dsl::ascii::graph.
Fix CMake target installation (#108).
Release 2022.12.0
Potential breaking changes
Change lexy::dsl::peek_not() error recovery behavior: it will now consume the input it matched to recover, which is more useful.
Remove Production parameter from lexy::error_context. It is replaced by a type-erased lexy::production_info.
lexy::validate, lexy::parse, and lexy::parse_as_tree now type-erase generic error tags prior to invoking the callback.
Use type-erased lexy::production_info instead of Production type in lexy::parse_tree. This is technically a breaking change, as it may affect overload resolution.
New features
Update Unicode database to Unicode 15.
Use SWAR (SIMD within a register) techniques to optimize token parsing.
Add lexy::dsl::subgrammar to split a grammar into multiple translation units.
Add lexy::dsl::flags and lexy::dsl::flag to parse enum flags.
Add overload of lexy::dsl::position that parses a rule. This allows using it as branch conditions.
Add lexy::dsl::effect to trigger side-effects during parsing.
Add lexy::subexpression_production to parse a subexpression.
Add lexy::utf8_char_encoding.
Add lexy::parse_tree::remaining_input() and populate it by lexy::parse_as_tree.
Add lexy::make_buffer_from_input function.
Add type-erased version of lexy::error.
Support non-const parse state.
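For readers unfamiliar with the SWAR technique named in the list above, the following is a minimal standalone sketch of the general idea, not lexy's actual implementation. It tests eight bytes at once for being ASCII by masking the high bit of every byte in a 64-bit word. The helper names load8, all_ascii and count_ascii_prefix are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Loads 8 bytes into one 64-bit word; memcpy avoids alignment and aliasing issues.
inline std::uint64_t load8(const char* p)
{
    std::uint64_t word;
    std::memcpy(&word, p, sizeof word);
    return word;
}

// True if no byte of the word has its high bit set, i.e. all 8 bytes are ASCII.
inline bool all_ascii(std::uint64_t word)
{
    return (word & 0x8080808080808080ull) == 0;
}

// Counts the leading ASCII bytes of [data, data + size).
// Hypothetical helper: processes 8 bytes per step while a full word is ASCII,
// then finishes byte by byte (tail, or the first word with a non-ASCII byte).
inline std::size_t count_ascii_prefix(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::size_t i = 0;
    while (size - i >= 8 && all_ascii(load8(data + i)))
        i += 8;
    while (i < size && (static_cast<unsigned char>(data[i]) & 0x80) == 0)
        ++i;
    return i;
}
```

The byte-wise loop is still needed for correctness; the word-wise loop only skips over long runs that are known to match.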
Bugfixes
Fix bug where lexy::bind callback does not forward rvalue arguments; they got turned into lvalues instead.
Fix bug where callback composition was not allowed if the final callback returns void.
Fix bug where dsl::quoted(cc.error<foo>) did not use foo as the error.
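The class of bug described for lexy::bind above (rvalue arguments arriving as lvalues) can be reproduced in plain C++. This is a generic sketch of the language mechanism only and says nothing about how lexy::bind is implemented; all function names here are hypothetical.

```cpp
#include <string>
#include <utility>

// Reports which overload is selected for the argument it receives.
inline const char* value_category(const std::string&) { return "lvalue"; }
inline const char* value_category(std::string&&) { return "rvalue"; }

// Buggy wrapper: a named parameter is always an lvalue expression,
// so the rvalue overload is never chosen.
template <typename Arg>
const char* call_without_forward(Arg&& arg)
{
    return value_category(arg);
}

// Fixed wrapper: std::forward restores the caller's value category.
template <typename Arg>
const char* call_with_forward(Arg&& arg)
{
    return value_category(std::forward<Arg>(arg));
}
```

Passing a temporary std::string to call_without_forward selects the const reference overload, while call_with_forward selects the rvalue overload, which is what a callback that wants to move from its argument needs.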
Release 2022.05.1
Change dsl::scan: it will now be invoked with the previously produced values.
Add dsl::parse_as to ensure that a rule always produces a value (e.g. when combined with the dsl::scan change above).
Add lexy::lexeme_input to support multi-pass parsing.
Turn dsl::terminator(term)(branch) into a branch rule, as opposed to being a plain rule (#74).
Add dsl::ignore_trailing_sep() separator.
Add lexy::bounded<T, Max> for bounded integer parsing (#72).
Add dsl::code_unit_id rule.
Turn lexy::forward<void> into a sink.
Support references in lexy::parse_result and lexy::scan_result.
Fix bug that prevented lexy::parse with a root production whose value is void.
Fix bug that caused infinite template instantiations for recursive scans.
Fix bug that didn’t skip whitespace in lexy::scanner for token productions.
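To illustrate what bounded integer parsing means in the lexy::bounded<T, Max> entry above, here is a standalone sketch that does not use lexy. It assumes a non-negative Max and decimal digits only. The function parse_bounded is a hypothetical helper, not a lexy API, and it makes no claim about how lexy performs the check internally.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of lexy): parses a decimal integer and
// rejects any value greater than Max. The bound is checked before each
// step, so the accumulator itself never overflows.
template <typename T, T Max>
constexpr std::optional<T> parse_bounded(std::string_view digits)
{
    static_assert(Max >= 0, "this sketch assumes a non-negative bound");
    if (digits.empty())
        return std::nullopt;

    T value = 0;
    for (char c : digits)
    {
        if (c < '0' || c > '9')
            return std::nullopt; // not a decimal digit
        T digit = static_cast<T>(c - '0');
        // value * 10 + digit > Max  <=>  value > (Max - digit) / 10
        if (digit > Max || value > (Max - digit) / 10)
            return std::nullopt; // would exceed the bound
        value = static_cast<T>(value * 10 + digit);
    }
    return value;
}
```

For example, with T = int and Max = 31 (a day of the month), the input "31" is accepted and "32" is rejected, even though both fit comfortably in an int.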
Release 2022.05.0
Initial release.
Note: The following changelog items track the historic development; only breaking changes are documented.
2022-04-21
dsl::lit_cp in a char class now requires a Unicode encoding; use dsl::lit_b to support default/byte encoding.
2022-03-21
BEHAVIOR CHANGE: A lexy::token_production that defines a ::whitespace member now skips whitespace in the direct rule as well. Previously, it would only apply the whitespace rule to child productions but not the production itself.
BEHAVIOR CHANGE: production rules that define a ::whitespace member now skip whitespace before parsing. This also applies to the root production, so whitespace at the beginning of the input is now skipped automatically.
dsl::p<Production> where Production defines a ::whitespace member is no longer a branch rule: as it will now skip whitespace first, it can’t be used as a branch condition.
Remove dsl::whitespace (no arguments); it’s now unnecessary as initial whitespace is skipped automatically.
2022-03-02
BEHAVIOR CHANGE: dsl::capture_token() is now dsl::capture(), old dsl::capture() is removed. If you’re using dsl::capture_token() you need to rename it to dsl::capture() (compile error). If you’re using dsl::capture() on a non-token rule, you need to use dsl::scan instead and manually produce the value (compile error). If you’re using dsl::capture() on a token, this will no longer capture trailing whitespace (silent behavior change). I can’t imagine a situation where capturing trailing whitespace was intended.
BEHAVIOR CHANGE: if a non-root production defines a ::whitespace member, it will now also apply to all children. Previously, it would only apply to the production that defined the member, and not its children (except if it was a token production).
2022-02-09
BEHAVIOR CHANGE: dsl::newline (and dsl::eol in the newline case) generate a token node with the lexy::literal_token_kind; lexy::newline_token_kind and lexy::eol_token_kind have been removed.
dsl::eof and dsl::eol are now branch rules: replace dsl::until(dsl::eol) by dsl::until(dsl::newline).or_eof().
Removed generic dsl::operator/ (alternative): use dsl::literal_set() or dsl::operator| instead.
Require a char class rule in .limit() of dsl::delimited(): instead of dsl::eol or dsl::newline, use dsl::ascii::newline.
Require literal rules in dsl::lookahead(), dsl::find(), and .limit() of error recovery rules.
Require literal rules in .reserve() and variants of dsl::identifier.
dsl::bom now generates a lexy::expected_literal error instead of lexy::expected_char_class.
2022-01-30
BEHAVIOR CHANGE: the introduction of char class rules changes error messages and token kinds in some situations.
Renamed dsl::code_point.lit<Cp>() to dsl::lit_cp<Cp> and moved to dsl/literal.hpp.
Require char classes in operator- for tokens; removed dsl::contains() and dsl::prefix().
Require char classes in dsl::delimited() and dsl::identifier().
Renamed .character_class() of dsl::error to .name().
2021-12-08
dsl::integer now uses lexy::digits_token_kind instead of lexy::error_token_kind during recovery.
2021-12-01
dsl::bom and dsl::lit_b now require lexy::byte_encoding.
2021-11-30
Remove lexy_ext/input_location.hpp: use lexy/input_location.hpp instead, which has a different interface but more functionality.
2021-11-23
Added more pre-defined token kinds: for example, tokens created by LEXY_LIT() now have their own literal token kind. This breaks code that does not use user-defined token kinds and does matching on lexy::parse_tree.
dsl::delimited() now merges adjacent characters into a single lexy::lexeme that is passed to the sink.
lexy::token_production no longer merges adjacent tokens, but dsl::delimited() merges character tokens.
2021-10-13
Terminator rules are no longer branch rules; this behavior was somewhat confusing. If you need branch rules, you can manually write the equivalent rules.
dsl::integer() now requires a token rule. This ensures the correct behavior in combination with whitespace skipping.
BEHAVIOR CHANGE: branch parsing an identifier will now backtrack without raising an error if it can match an identifier, but it is reserved. Previously, this would not backtrack and then raise an error (but trivially recover). This behavior is consistent with dsl::symbol().
2021-10-07
Removed branch functionality of token sequence (again). It was already removed once as it was unimplementable due to automatic whitespace skipping, but then re-implemented later on. But as it turns out, it is in fact unimplementable and the current implementation was completely broken. Instead of tok1 + tok2 >> rule1 | tok1 + tok3 >> rule2 use tok1 >> (tok2 >> rule1 | tok3 >> rule2).
Removed dsl::encode(). The rule was completely broken in combination with dsl::capture() and rules built on top like dsl::identifier().
BEHAVIOR CHANGE: error recovery now produces a new error token in the parse tree. This ensures that the parse tree stays lossless even in the presence of errors.
Potential pitfall: dsl::recover() and dsl::find() now always raise the recovery events. If you’re using them outside of dsl::try_(), this is not what you want, so don’t do that; they’re not meant for it.
2021-08-22
lexy::read_file_result is no longer an input; you need to call .buffer() when passing it to a parse action.
2021-08-17
Replaced lexy_ext::dump_parse_tree() by lexy::visualize().
2021-07-15
Moved lexy/match.hpp, lexy/parse.hpp, and lexy/validate.hpp to lexy/action/match.hpp, lexy/action/parse.hpp and lexy/action/validate.hpp.
Moved lexy::parse_as_tree() to new header lexy/action/parse_as_tree.hpp; lexy::parse_tree stayed in lexy/parse_tree.hpp.
Renamed lexy::parse_tree::builder::backtrack_production to cancel_production, and its production_state to marker.
2021-07-01
Moved callback adapters and composition into new header files, but still implicitly included by callback.hpp.
Removed overload of lexy::bind that takes a sink; bind individual items in a separate production instead.
Removed unneeded overloads of lexy::as_sink and changed the transcoding behavior: it will now only use the pointer + size constructor if the character types match and no longer reinterpret_cast.
2021-06-27
Simplified and minimized interface of the input classes, removing e.g. iterators from them.
Moved definition of lexy::code_point from encoding.hpp to new header code_point.hpp.
2021-06-20
Turned dsl::else_ into a tag object that can only be used with operator>>, instead of a stand-alone rule.
BEHAVIOR CHANGE: dsl::peek[_not]() and dsl::lookahead() are no longer no-ops when used outside a branch condition. Instead, they will perform lookahead and raise an error if that fails.
Removed dsl::require/prevent(rule).error<tag>; use dsl::peek[_not](rule).error<tag> instead.
Improved and simplified interface for dsl::context_flag and dsl::context_counter: instead of .select()/.compare(), you now use .is_set()/.is() as a branch condition, and instead of .require(), you now use dsl::must() with .is[_set]().
Removed dsl::context_lexeme; use dsl::context_identifier instead.
2021-06-18
lexy::fold[_inplace] is no longer a callback, only a sink; use lexy::callback(lexy::fold(…)) to turn it into a callback if needed.
Removed dsl::opt_list(); use dsl::opt(dsl::list()) instead.
BEHAVIOR CHANGE: .opt_list() of dsl::terminator/dsl::brackets now produces lexy::nullopt instead of an empty sink result if the list has no items. If you’re using pre-defined callbacks like lexy::as_list, lexy::as_collection, or lexy::as_string, it continues to work as expected. If you’re using sink >> callback, callback now requires one overload that takes lexy::nullopt.
Removed .while[_one]() from dsl::terminator/dsl::brackets.
2021-06-14
Choice (operator|) is no longer a branch rule if it would be an unconditional branch rule; using an unconditional choice as a branch is almost surely a bug.
2021-06-13
Removed dsl::label and dsl::id; use a separate production instead.
Removed lexy::sink; instead of lexy::sink<T>(fn) use lexy::fold_inplace<T>({}, fn).
BEHAVIOR CHANGE: dsl::times/dsl::twice no longer produce an array, but instead all values individually. Use lexy::fold instead of a loop.
2021-06-12
Removed lexy::null_input.
Downgraded lexy/input/shell.hpp to lexy_ext/shell.hpp, with the namespace change to lexy_ext.
Removed .capture() from dsl::code_point; use dsl::capture() instead.
BEHAVIOR CHANGE: Don’t produce a tag value if no sign was present in dsl::[minus/plus_]sign. If you use lexy::as_integer as callback, this doesn’t affect you.
BEHAVIOR CHANGE: Don’t consume input in dsl::prevent.
BEHAVIOR CHANGE: Produce only a single whitespace node in parse tree, instead of the individual token nodes. Prohibited dsl::p/dsl::recurse inside the whitespace rule.
2021-05-25
Changed dsl::[plus/minus_]sign to produce lexy::plus/minus_sign instead of +1/-1. Also changed callback lexy::as_integer to adapt.
Removed dsl::parse_state and dsl::parse_state_member; use lexy::bind() with lexy::parse_state instead.
Removed dsl::value_* rules; use lexy::bind() or dsl::id/dsl::label instead.
2021-04-24
The alternative rule / now tries to find the longest match instead of the first one. If it was well-specified before, this doesn’t change anything.
Removed dsl::switch_(); use the new dsl::symbol() instead which is more efficient as well.
Removed .lit[_c]() from dsl::escape(); use the new .symbol() instead.
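The difference between first-match and longest-match alternatives in the entry above can be shown with a small standalone sketch that does not use lexy. The functions first_match and longest_match are hypothetical; both return the length of the matched alternative, or 0 if none matches (empty alternatives are ignored in this sketch).

```cpp
#include <cstddef>
#include <initializer_list>
#include <string_view>

// First-match semantics: the first alternative that is a prefix of the input wins.
inline std::size_t first_match(std::string_view input,
                               std::initializer_list<std::string_view> alternatives)
{
    for (std::string_view alt : alternatives)
        if (!alt.empty() && input.substr(0, alt.size()) == alt)
            return alt.size();
    return 0;
}

// Longest-match semantics: every alternative is tried and the longest prefix wins,
// so the order of the alternatives no longer matters.
inline std::size_t longest_match(std::string_view input,
                                 std::initializer_list<std::string_view> alternatives)
{
    std::size_t best = 0;
    for (std::string_view alt : alternatives)
        if (alt.size() > best && input.substr(0, alt.size()) == alt)
            best = alt.size();
    return best;
}
```

With the alternatives "<" and "<=" in that order and the input "<=3", first-match consumes one character and longest-match consumes two. If the longer alternative is listed first, both strategies agree, which matches the remark that well-specified alternatives are unaffected.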
2021-03-29
Restructure callback header files; an #include <lexy/callback.hpp> might be necessary now.
2021-03-29
Support empty token nodes in the parse tree if they don’t have an unknown kind. In particular, the parse tree will now contain an EOF node at the end.
Turn lexy::unknown_token_kind into a value (as opposed to the type it was before).
2021-03-26
Renamed lexy::raw_encoding to lexy::byte_encoding.
2021-03-23
Changed the return type of lexy::read_file() (and lexy_ext::read_file()) to use a new lexy::read_file_result over lexy::result.
Changed the return type of lexy::validate() and lexy::parse_as_tree() to a new lexy::validate_result type.
Changed the return type of lexy::parse() to a new lexy::parse_result type.
Removed lexy::result.
An error callback that returns a non-void type must now be a sink. Use lexy::collect<Container>(error_callback) to create a sink that stores all results in the container. If the error callback returns void, no change is required.
Removed dsl::no_trailing_sep(); dsl::sep() now has that behavior as well.
dsl::require() and dsl::prevent() now recover from errors, which might lead to worse error messages in certain situations. If they’re used as intended (to create a better error message if something didn’t work out), this shouldn’t happen.
2021-02-25
Removed empty state from lexy::result. It was only added because it was useful internally, but this is no longer the case.
Reverted optimization that merged multiple lexemes in the sink/tokens of dsl::delimited(). Tokens are instead now automatically merged by the parse tree builder if direct children of a lexy::token_production.
dsl::switch_(rule).case_() now requires a branch of the form token >> rule; previously it could take an arbitrary branch.
2021-02-21
Unified error interface:
.error<Tag>() has become .error<Tag> (e.g. for tokens, dsl::switch()).
f<Tag>(…) has become f(…).error<Tag> (e.g. for dsl::require()).
ctx.require<Tag>() has become ctx.require().error<Tag>.
dsl::[partial_]combination() now have .missing_error<Tag> and .duplicate_error<Tag> members.
BEHAVIOR CHANGE: if dsl::code_point_id overflows, the tag is now lexy::invalid_code_point instead of lexy::integer_overflow.
2021-02-20
Replaced use of lexy::_detail::string_view by const char* in all user facing functions. As a consequence, automatic type name now requires GCC > 8.
Removed lexy::make_error_location(). It has been replaced by lexy_ext::find_input_location().
2021-02-17
Renamed lexy::make_buffer to lexy::make_buffer_from_raw.
2021-02-04
Removed support for arbitrary rules as content of a dsl::delimited() rule; now only tokens are allowed.
Also removed support for an escape choice in the dsl::delimited() rule; it must be a branch now.
As a related change, the sink will now be invoked with a lexy::lexeme that can span multiple occurrences of the content token,
not multiple times (one lexeme per token occurrence) as it was previously.
This means that a dsl::quoted(dsl::code_point) rule will now invoke the sink only once giving it a lexy::lexeme that spans the entire content of the string literal.
Previously it was invoked once per dsl::code_point.
2021-01-11
Limited implicit conversion of lexy::nullopt to types that are like std::optional or pointers.
Replaced lexy::dsl::nullopt by lexy::dsl::value_t<T> and lexy::dsl::opt(rule) by rule | lexy::dsl::value_t<T> to keep the previous behavior of getting a default constructed object of type T.
2021-01-10
Replaced operator[] and dsl::whitespaced() by new dsl::whitespace rule. Whitespace can now be parsed manually or automatically.
To parse whitespace manually, replace rule[ws] by rule + dsl::whitespace(rule), or otherwise insert dsl::whitespace(rule) calls where appropriate. See examples/email.cpp or examples/xml.cpp for an example of manual whitespace skipping.
To parse whitespace automatically, define a static constexpr auto whitespace member in the root production of the grammar. This rule is then skipped after every token. To temporarily disable automatic whitespace skipping inside one production, inherit from lexy::token_production. See examples/tutorial.cpp or examples/json.cpp for an example of automatic whitespace skipping.
Removed support for choices in while, i.e. dsl::while_(a | b | c). This can be replaced by dsl::loop(a | b | c | dsl::break_).
2021-01-09
Removed .check() from dsl::context_flag and .check_eq/lt/gt from dsl::context_counter due to implementation problems. Use .select() and .compare() instead.
A sequence rule using operator+ is no longer a branch. Previously, it was a branch if it consisted of only tokens. However, this was unimplementable in combination with automatic whitespace skipping.
A branch condition that is a sequence is only required if you have something like prefix + a >> rule_a | prefix + b >> rule_b. Use prefix + (a >> rule_a | b >> rule_b) instead.
2021-01-08
Removed context sensitive parsing mechanism from context.hpp (dsl::context_push(), _pop() etc.).
Use dsl::context_lexeme instead: .capture() replaces dsl::context_push() and .require() replaces dsl::context_pop().
2021-01-03
Removed callback from lexy::as_list and lexy::as_collection; they’re now only sinks. lexy::construct can be used in most cases instead.
Merged ::list and ::value callbacks from productions. There are three cases:
A production has a value member only: this continues to work as before.
A production has a list member only: just rename it to value. It is treated as a sink automatically when required.
A production has a list and value member: add a value member that uses sink >> callback, where sink was the previous list value and callback the previous callback. This will use sink to construct the list, then pass everything to callback.
lexy::result now has an empty state. It is only used internally and never exposed to the user. As a related change, the default constructor has been removed due to unclear semantics. Use lexy::result(lexy::result_error) to restore its behavior of creating a default constructed error.
2020-12-26
Replaced Pattern concept with a new Token and Branch concept (see #10). A Branch is a rule that can make a branching decision (it is required by choices and can be used as branch condition). A Token is an atomic parse unit; it is also a Branch.
Most patterns (e.g. LEXY_LIT) are now tokens, which doesn’t break anything. Some patterns are now branches (e.g. dsl::peek()), which breaks in rules that now require tokens (e.g. dsl::until()). The remaining patterns are now plain rules (e.g. dsl::while_(condition >> then)), which makes them unusable as branch conditions.
The patterns that are now branches:
dsl::error
dsl::peek() and dsl::peek_not()
condition >> then was a pattern if then is a pattern, now it is always a branch
The patterns that are now plain rules:
a sequence using operator+ (it is still a token if all arguments are tokens, so it can be used as condition)
a choice using operator|, even if all arguments are tokens (use operator/ instead which is a token)
dsl::while_[one](), even if the argument is a token
dsl::times()
dsl::if_()
The following rules previously required only patterns but now require tokens:
a minus using operator- (both arguments)
dsl::until()
dsl::lookahead()
dsl::escape() (the escape character itself) and its .capture()
digit separators
automatic capturing of dsl::delimited()
lexy::make_error_location()
If you have a breaking change because you now use a non-token rule where a token was expected, use dsl::token(), which turns an arbitrary rule into a token (just like dsl::match() turned a rule into a pattern).
Removed dsl::match(); use dsl::token() instead. If you previously had dsl::peek(dsl::match(rule)) >> then you can now even use dsl::peek(rule) >> then, as dsl::peek[_not]() have learned to support arbitrary rules.
Removed dsl::try_<Tag>(pattern). If pattern is now a token, you can use rule.error<Tag>() instead. Otherwise, use dsl::token(pattern).error<Tag>().
Removed .capture() on dsl::sep(pattern) and dsl::trailing_sep(pattern). You can now use dsl::sep(dsl::capture(pattern)), as dsl::capture() is now a branch and the separators have learned to support branches.
Removed .zero() and .non_zero() from dsl::digit<Base>. Use dsl::zero instead of dsl::digit<Base>.zero(). Use dsl::digit<Base> - dsl::zero (potentially with a nice error specified using .error()) instead of dsl::digit<Base>.non_zero().
Removed dsl::success, as it is no longer needed internally. It can be added back if needed.
BEHAVIOR CHANGE: As part of the branch changes, dsl::peek(), dsl::peek_not() and dsl::lookahead() are now no-ops if not used as branch condition. For example, prefix + dsl::peek(rule) + suffix is equivalent to prefix + suffix. In most cases, this is only a change in the error message as they don’t consume characters. Use dsl::require() and dsl::prevent() if the lookahead was intended.
BEHAVIOR CHANGE: Errors in whitespace are currently not reported. For example, if you have /* unterminated C comment int i; and support space and C comments as whitespace, this would previously raise an error about the unterminated C comment. Right now, it will try to skip the C comment, fail, and then just be done with whitespace skipping. The error for the unterminated C comment then manifests as expected 'int', got '/*'.
This behavior is only temporary until a better solution for whitespace is implemented (see #10).
2020-12-22
Removed dsl::build_list() and dsl::item(). They were mainly used to implement dsl::list(), and became unnecessary after an internal restructuring.
Removed support for choices in lists, i.e. dsl::list(a | b | c). This can be added back if needed.
Removed dsl::operator! due to implementation problems. Existing uses of dsl::peek(!rule) can be replaced by dsl::peek_not(rule); existing uses of !rule >> do_sth can be replaced using dsl::terminator().