Changelog
Upcoming
Potential breaking changes
scanner-common::capture_token was renamed to scanner-common::capture, and the old scanner-common::capture was removed. Previously, capture_token was a linker error anyway, but if you’re calling scanner-common::capture, it will no longer work for arbitrary rules and instead only works like dsl::capture.
lexy::parse_as_tree will add a position token to production nodes that would otherwise be empty. That way, no production node will be empty, unless the builder API is used directly.
Change lexy::dsl::try_() error recovery behavior: it will now skip whitespace after the (optional) error recovery rule.
Deprecate the lexy::parse_tree::builder::finish() overload that does not take a remaining_input.
The typo lexy::code_point::spaing_mark was fixed to spacing_mark.
New Features
Experimental: Add lexy::parse_tree_input and lexy::dsl::tnode/lexy::dsl::pnode to support multi-pass parsing.
Add lexy::dsl::byte.if_/set/range/ascii methods to match specific bytes.
Add an overload of fatal_error() on scanners that allows construction of type-erased generic errors (#134).
Add lexy::buffer::release() and lexy::buffer::adopt() to deconstruct a buffer into its components and re-assemble it later.
Add lexy::parse_tree::node::position() and ::covering_lexeme().
Add default argument to lexy::dsl::flag().
Add lexy::callback_with_state.
Pass the parse state to the tag of lexy::dsl::op if required (#172) and to lexy::dsl::error (#211).
Enable CMake install rule for subdirectory builds (#205).
Bug fixes
Add missing constexpr to container callbacks and lexy::as_string.
Fix infinite loop in dsl::delimited when dealing with invalid code points (#173).
Fix swallowed errors from case-folding rules (#149).
Fix lexy::production_name for productions in an anonymous namespace.
Fix bugs in dsl::scan (#133, #135, #142, #154, #209).
Fix bug with the position passed to the tag constructor of lexy::dsl::op (#170).
Fix bug where lexy_ext::report_error unconditionally wrote to stderr, ignoring the output iterator.
Fix bug with missing lexy::error_context::position in lexy::parse_as_tree (#184).
Fix static_assert in lexy::parse_tree (#190).
Add missing && in lexy::bind_sink (#221).
Work around compiler bugs and improve documentation.
Release 2022.12.1
Add constructor to lexy::input_location.
lexy::error_context::production will not be a transparent production.
Fix lexy::production_info::operator== when the compiler doesn’t merge string literals.
Fix SWAR matching of dsl::ascii::print and dsl::ascii::graph.
Fix CMake target installation (#108).
Release 2022.12.0
Potential breaking changes
Change lexy::dsl::peek_not() error recovery behavior: it will now consume the input it matched to recover, which is more useful.
Remove Production parameter from lexy::error_context. It is replaced by a type-erased lexy::production_info.
lexy::validate, lexy::parse, and lexy::parse_as_tree now type-erase generic error tags prior to invoking the callback.
Use type-erased lexy::production_info instead of Production type in lexy::parse_tree. This is technically a breaking change, as it may affect overload resolution.
New features
Update Unicode database to Unicode 15.
Use SWAR (SIMD within a register) techniques to optimize token parsing.
Add lexy::dsl::subgrammar to split a grammar into multiple translation units.
Add lexy::dsl::flags and lexy::dsl::flag to parse enum flags.
Add overload of lexy::dsl::position that parses a rule. This allows using it as a branch condition.
Add lexy::dsl::effect to trigger side-effects during parsing.
Add lexy::subexpression_production to parse a subexpression.
Add lexy::utf8_char_encoding.
Add lexy::parse_tree::remaining_input(), which is populated by lexy::parse_as_tree.
Add lexy::make_buffer_from_input function.
Add type-erased version of lexy::error.
Support non-const parse state.
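The SWAR item in this list refers to handling several code units at once inside a single machine word. As a rough illustration of the general idea only (this is not lexy’s implementation, and the names load_word, all_ascii and is_ascii are hypothetical), the following sketch tests eight bytes for being ASCII with one mask operation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Load 8 bytes into one 64-bit word; memcpy avoids alignment and aliasing issues.
inline std::uint64_t load_word(const char* bytes)
{
    std::uint64_t word;
    std::memcpy(&word, bytes, sizeof(word));
    return word;
}

// True if none of the 8 bytes has its high bit set, i.e. all of them are ASCII.
inline bool all_ascii(std::uint64_t word)
{
    return (word & 0x8080808080808080ull) == 0;
}

// Check a whole buffer: 8 bytes per step, then the remaining tail byte by byte.
inline bool is_ascii(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::size_t i = 0;
    for (; i + 8 <= size; i += 8)
        if (!all_ascii(load_word(data + i)))
            return false;
    for (; i < size; ++i)
        if (static_cast<unsigned char>(data[i]) >= 0x80)
            return false;
    return true;
}
```

The mask test does not depend on byte order, since it only asks whether any byte has its high bit set.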
Bugfixes
Fix bug where lexy::bind callback does not forward rvalue arguments; they got turned into lvalues instead.
Fix bug where callback composition was not allowed if the final callback returns void.
Fix bug where dsl::quoted(cc.error<foo>) did not use foo as the error.
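The lexy::bind fix in this list is about perfect forwarding. The sketch below is not lexy’s code; it only reproduces the general class of bug with hypothetical names (probe, probe_fn, call_as_lvalue, call_forwarding): a wrapper that passes its named parameter on directly always hands over an lvalue, while std::forward preserves the caller’s value category.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Overload set that reports which value category it received.
inline const char* probe(const std::string&) { return "lvalue"; }
inline const char* probe(std::string&&) { return "rvalue"; }

// Callable that forwards to probe(), so overload resolution sees the category.
struct probe_fn
{
    template <typename T>
    const char* operator()(T&& arg) const
    {
        return probe(std::forward<T>(arg));
    }
};

// Buggy wrapper: a named parameter is always an lvalue, so rvalue-ness is lost.
template <typename Fn, typename Arg>
auto call_as_lvalue(Fn fn, Arg&& arg)
{
    return fn(arg);
}

// Fixed wrapper: std::forward restores the original value category.
template <typename Fn, typename Arg>
auto call_forwarding(Fn fn, Arg&& arg)
{
    return fn(std::forward<Arg>(arg));
}

// Usage: only the forwarding wrapper lets the callee see the temporary as an rvalue.
inline void demo_forwarding()
{
    assert(std::string(call_as_lvalue(probe_fn{}, std::string("tmp"))) == "lvalue");
    assert(std::string(call_forwarding(probe_fn{}, std::string("tmp"))) == "rvalue");
}
```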
Release 2022.05.1
Change dsl::scan: it will now be invoked with the previously produced values.
Add dsl::parse_as to ensure that a rule always produces a value (e.g. when combined with the dsl::scan change above).
Add lexy::lexeme_input to support multi-pass parsing.
Turn dsl::terminator(term)(branch) into a branch rule, as opposed to being a plain rule (#74).
Add dsl::ignore_trailing_sep() separator.
Add lexy::bounded<T, Max> for bounded integer parsing (#72).
Add dsl::code_unit_id rule.
Turn lexy::forward<void> into a sink.
Support references in lexy::parse_result and lexy::scan_result.
Fix bug that prevented lexy::parse with a root production whose value is void.
Fix bug that caused infinite template instantiations for recursive scans.
Fix bug that didn’t skip whitespace in lexy::scanner for token productions.
Release 2022.05.0
Initial release.
Note: The following changelog items track the historic development; only breaking changes are documented.
2022-04-21
dsl::lit_cp in a char class now requires a Unicode encoding; use dsl::lit_b to support default/byte encoding.
2022-03-21
BEHAVIOR CHANGE: a lexy::token_production that defines a ::whitespace member now skips whitespace in the direct rule as well. Previously, it would only apply the whitespace rule to child productions but not the production itself.
BEHAVIOR CHANGE: production rules that define a ::whitespace member now skip whitespace before parsing. This also applies to the root production, so whitespace at the beginning of the input is now skipped automatically.
dsl::p<Production> where Production defines a ::whitespace member is no longer a branch rule: as it will now skip whitespace first, it can’t be used as a branch condition.
Remove dsl::whitespace (no arguments); it’s now unnecessary as initial whitespace is skipped automatically.
2022-03-02
BEHAVIOR CHANGE: dsl::capture_token() is now dsl::capture(), and the old dsl::capture() is removed. If you’re using dsl::capture_token(), you need to rename it to dsl::capture() (compile error). If you’re using dsl::capture() on a non-token rule, you need to use dsl::scan instead and manually produce the value (compile error). If you’re using dsl::capture() on a token, this will no longer capture trailing whitespace (silent behavior change). I can’t imagine a situation where capturing trailing whitespace was intended.
BEHAVIOR CHANGE: if a non-root production defines a ::whitespace member, it will now also apply to all children. Previously, it would only apply to the production that defined the member, and not its children (except if it was a token production).
2022-02-09
BEHAVIOR CHANGE: dsl::newline (and dsl::eol in the newline case) generate a token node with the lexy::literal_token_kind; lexy::newline_token_kind and lexy::eol_token_kind have been removed.
dsl::eof and dsl::eol are now branch rules: replace dsl::until(dsl::eol) by dsl::until(dsl::newline).or_eof().
Removed generic dsl::operator/ (alternative): use dsl::literal_set() or dsl::operator| instead.
Require a char class rule in .limit() of dsl::delimited(): instead of dsl::eol or dsl::newline, use dsl::ascii::newline.
Require literal rules in dsl::lookahead(), dsl::find(), and .limit() of error recovery rules.
Require literal rules in .reserve() and variants of dsl::identifier.
dsl::bom now generates a lexy::expected_literal error instead of lexy::expected_char_class.
2022-01-30
BEHAVIOR CHANGE: the introduction of char class rules changes error messages and token kinds in some situations.
Renamed dsl::code_point.lit<Cp>() to dsl::lit_cp<Cp> and moved to dsl/literal.hpp.
Require char classes in operator- for tokens; removed dsl::contains() and dsl::prefix().
Require char classes in dsl::delimited() and dsl::identifier().
Renamed .character_class() of dsl::error to .name().
2021-12-08
dsl::integer now uses lexy::digits_token_kind instead of lexy::error_token_kind during recovery.
2021-12-01
dsl::bom and dsl::lit_b now require lexy::byte_encoding.
2021-11-30
Remove lexy_ext/input_location.hpp: use lexy/input_location.hpp instead, which has a different interface but more functionality.
2021-11-23
Added more pre-defined token kinds: for example, tokens created by LEXY_LIT() now have their own literal token kind. This breaks code that does not use user-defined token kinds and does matching on lexy::parse_tree.
dsl::delimited() now merges adjacent characters into a single lexy::lexeme that is passed to the sink.
lexy::token_production no longer merges adjacent tokens, but dsl::delimited() merges character tokens.
2021-10-13
Terminator rules are no longer branch rules; this behavior was somewhat confusing. If you need branch rules, you can manually write the equivalent rules.
dsl::integer() now requires a token rule. This ensures the correct behavior in combination with whitespace skipping.
BEHAVIOR CHANGE: branch parsing an identifier will now backtrack without raising an error if it can match an identifier, but it is reserved. Previously, this would not backtrack and then raise an error (but trivially recover). This behavior is consistent with dsl::symbol().
2021-10-07
Removed branch functionality of token sequence (again). It was already removed once as it was unimplementable due to automatic whitespace skipping, but then re-implemented later on. But as it turns out, it is in fact unimplementable and the current implementation was completely broken. Instead of tok1 + tok2 >> rule1 | tok1 + tok3 >> rule2 use tok1 >> (tok2 >> rule1 | tok3 >> rule2).
Removed dsl::encode(). The rule was completely broken in combination with dsl::capture() and rules built on top like dsl::identifier().
BEHAVIOR CHANGE: error recovery now produces a new error token in the parse tree. This ensures that the parse tree stays lossless even in the presence of errors.
Potential pitfall: dsl::recover() and dsl::find() now always raise the recovery events. If you’re using them outside of dsl::try_(), this is not what you want, so don’t use them there; they’re not meant for it.
2021-08-22
lexy::read_file_result is no longer an input; you need to call .buffer() when passing it to a parse action.
2021-08-17
Replaced lexy_ext::dump_parse_tree() by lexy::visualize().
2021-07-15
Moved lexy/match.hpp, lexy/parse.hpp, and lexy/validate.hpp to lexy/action/match.hpp, lexy/action/parse.hpp and lexy/action/validate.hpp.
Moved lexy::parse_as_tree() to new header lexy/action/parse_as_tree.hpp; lexy::parse_tree stayed in lexy/parse_tree.hpp.
Renamed lexy::parse_tree::builder::backtrack_production to cancel_production, and its production_state to marker.
2021-07-01
Moved callback adapters and composition into new header files, but still implicitly included by callback.hpp.
Removed overload of lexy::bind that takes a sink; bind individual items in a separate production instead.
Removed unneeded overloads of lexy::as_sink and changed the transcoding behavior: it will now only use the pointer + size constructor if the character types match and no longer reinterpret_cast.
2021-06-27
Simplified and minimized interface of the input classes, removing e.g. iterators from them.
Moved definition of lexy::code_point from encoding.hpp to new header code_point.hpp.
2021-06-20
Turned dsl::else_ into a tag object that can only be used with operator>>, instead of a stand-alone rule.
BEHAVIOR CHANGE: dsl::peek[_not]() and dsl::lookahead() are no longer no-ops when used outside a branch condition. Instead, they will perform lookahead and raise an error if that fails.
Removed dsl::require/prevent(rule).error<tag>; use dsl::peek[_not](rule).error<tag> instead.
Improved and simplified interface for dsl::context_flag and dsl::context_counter: instead of .select()/.compare(), you now use .is_set()/.is() as a branch condition, and instead of .require(), you now use dsl::must() with .is[_set]().
Removed dsl::context_lexeme; use dsl::context_identifier instead.
2021-06-18
lexy::fold[_inplace] is no longer a callback, only a sink; use lexy::callback(lexy::fold(…)) to turn it into a callback if needed.
Removed dsl::opt_list(); use dsl::opt(dsl::list()) instead.
BEHAVIOR CHANGE: .opt_list() of dsl::terminator/dsl::brackets now produces lexy::nullopt instead of an empty sink result if the list has no items. If you’re using pre-defined callbacks like lexy::as_list, lexy::as_collection, or lexy::as_string, it continues to work as expected. If you’re using sink >> callback, callback now requires one overload that takes lexy::nullopt.
Removed .while[_one]() from dsl::terminator/dsl::brackets.
2021-06-14
Choice (operator|) is no longer a branch rule if it would be an unconditional branch rule; using an unconditional choice as a branch is almost surely a bug.
2021-06-13
Removed dsl::label and dsl::id; use a separate production instead.
Removed lexy::sink; instead of lexy::sink<T>(fn) use lexy::fold_inplace<T>({}, fn).
BEHAVIOR CHANGE: dsl::times/dsl::twice no longer produce an array, but instead all values individually. Use lexy::fold instead of a loop.
2021-06-12
Removed lexy::null_input.
Downgraded lexy/input/shell.hpp to lexy_ext/shell.hpp, with the namespace change to lexy_ext.
Removed .capture() from dsl::code_point; use dsl::capture() instead.
BEHAVIOR CHANGE: Don’t produce a tag value if no sign was present in dsl::[minus/plus_]sign. If you use lexy::as_integer as callback, this doesn’t affect you.
BEHAVIOR CHANGE: Don’t consume input in dsl::prevent.
BEHAVIOR CHANGE: Produce only a single whitespace node in parse tree, instead of the individual token nodes. Prohibited dsl::p/dsl::recurse inside the whitespace rule.
2021-05-25
Changed dsl::[plus/minus_]sign to produce lexy::plus/minus_sign instead of +1/-1. Also changed callback lexy::as_integer to adapt.
Removed dsl::parse_state and dsl::parse_state_member; use lexy::bind() with lexy::parse_state instead.
Removed dsl::value_* rules; use lexy::bind() or dsl::id/dsl::label instead.
2021-04-24
The alternative rule / now tries to find the longest match instead of the first one. If it was well-specified before, this doesn’t change anything.
Removed dsl::switch_(); use the new dsl::symbol() instead, which is more efficient as well.
Removed .lit[_c]() from dsl::escape(); use the new .symbol() instead.
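To make the difference between first-match and longest-match alternatives concrete, here is a small sketch. It does not use lexy; first_match and longest_match are hypothetical helpers over plain, non-empty string literals, and they only show why the result changes when an earlier alternative is a prefix of a later one.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Length of the first listed alternative that is a prefix of input, or 0 if none is.
inline std::size_t first_match(std::string_view input,
                               const std::vector<std::string_view>& alternatives)
{
    for (auto alt : alternatives)
        if (input.substr(0, alt.size()) == alt)
            return alt.size();
    return 0;
}

// Length of the longest alternative that is a prefix of input, or 0 if none is.
inline std::size_t longest_match(std::string_view input,
                                 const std::vector<std::string_view>& alternatives)
{
    std::size_t best = 0;
    for (auto alt : alternatives)
        if (input.substr(0, alt.size()) == alt && alt.size() > best)
            best = alt.size();
    return best;
}

// Usage: with "=" listed before "==", only the longest-match strategy consumes "==".
inline void demo_longest_match()
{
    assert(first_match("== 1", {"=", "=="}) == 1);
    assert(longest_match("== 1", {"=", "=="}) == 2);
}
```

If the alternatives are ordered so that no earlier one is a prefix of a later one (for example "==" before "="), both strategies agree, which matches the remark that well-specified uses are unaffected.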
2021-03-29
Restructure callback header files; an #include <lexy/callback.hpp> might be necessary now.
2021-03-29
Support empty token nodes in the parse tree if they don’t have an unknown kind. In particular, the parse tree will now contain an EOF node at the end.
Turn lexy::unknown_token_kind into a value (as opposed to the type it was before).
2021-03-26
Renamed lexy::raw_encoding to lexy::byte_encoding.
2021-03-23
Changed the return type of lexy::read_file() (and lexy_ext::read_file()) to use a new lexy::read_file_result over lexy::result.
Changed the return type of lexy::validate() and lexy::parse_as_tree() to a new lexy::validate_result type.
Changed the return type of lexy::parse() to a new lexy::parse_result type.
Removed lexy::result.
An error callback that returns a non-void type must now be a sink. Use lexy::collect<Container>(error_callback) to create a sink that stores all results in the container. If the error callback returns void, no change is required.
Removed dsl::no_trailing_sep(); dsl::sep() now has that behavior as well.
dsl::require() and dsl::prevent() now recover from errors, which might lead to worse error messages in certain situations. If they’re used as intended (to create a better error message if something didn’t work out), this shouldn’t happen.
2021-02-25
Removed empty state from lexy::result. It was only added because it was useful internally, but this is no longer the case.
Reverted optimization that merged multiple lexemes in the sink/tokens of dsl::delimited(). Tokens are instead now automatically merged by the parse tree builder if they are direct children of a lexy::token_production.
dsl::switch_(rule).case_() now requires a branch of the form token >> rule; previously it could take an arbitrary branch.
2021-02-21
Unified error interface:
  .error<Tag>() has become .error<Tag> (e.g. for tokens, dsl::switch_()).
  f<Tag>(…) has become f(…).error<Tag> (e.g. for dsl::require()).
  ctx.require<Tag>() has become ctx.require().error<Tag>.
  dsl::[partial_]combination() now have .missing_error<Tag> and .duplicate_error<Tag> members.
BEHAVIOR CHANGE: if dsl::code_point_id overflows, the tag is now lexy::invalid_code_point instead of lexy::integer_overflow.
2021-02-20
Replaced use of lexy::_detail::string_view by const char* in all user facing functions. As a consequence, automatic type name now requires GCC > 8.
Removed lexy::make_error_location(). It has been replaced by lexy_ext::find_input_location().
2021-02-17
Renamed lexy::make_buffer to lexy::make_buffer_from_raw.
2021-02-04
Removed support for arbitrary rules as content of a dsl::delimited() rule; now only tokens are allowed.
Also removed support for an escape choice in the dsl::delimited() rule; it must be a branch now.
As a related change, the sink will now be invoked with a lexy::lexeme that can span multiple occurrences of the content token, not multiple times (one lexeme per token occurrence) as it was previously.
This means that a dsl::quoted(dsl::code_point) rule will now invoke the sink only once, giving it a lexy::lexeme that spans the entire content of the string literal.
Previously it was invoked once per dsl::code_point.
2021-01-11
Limited implicit conversion of lexy::nullopt to types that are like std::optional or pointers.
Replaced lexy::dsl::nullopt by lexy::dsl::value_t<T> and lexy::dsl::opt(rule) by rule | lexy::dsl::value_t<T> to keep the previous behavior of getting a default constructed object of type T.
2021-01-10
Replaced operator[] and dsl::whitespaced() by new dsl::whitespace rule. Whitespace can now be parsed manually or automatically.
To parse whitespace manually, replace rule[ws] by rule + dsl::whitespace(rule), or otherwise insert dsl::whitespace(rule) calls where appropriate. See examples/email.cpp or examples/xml.cpp for an example of manual whitespace skipping.
To parse whitespace automatically, define a static constexpr auto whitespace member in the root production of the grammar. This rule is then skipped after every token. To temporarily disable automatic whitespace skipping inside one production, inherit from lexy::token_production. See examples/tutorial.cpp or examples/json.cpp for an example of automatic whitespace skipping.
Removed support for choices in while, i.e. dsl::while_(a | b | c). This can be replaced by dsl::loop(a | b | c | dsl::break_).
2021-01-09
Removed .check() from dsl::context_flag and .check_eq/lt/gt from dsl::context_counter due to implementation problems. Use .select() and .compare() instead.
A sequence rule using operator+ is no longer a branch. Previously, it was a branch if it consisted of only tokens. However, this was unimplementable in combination with automatic whitespace skipping.
A branch condition that is a sequence is only required if you have something like prefix + a >> rule_a | prefix + b >> rule_b. Use prefix + (a >> rule_a | b >> rule_b) instead.
2021-01-08
Removed context sensitive parsing mechanism from context.hpp (dsl::context_push(), _pop() etc.).
Use dsl::context_lexeme instead: .capture() replaces dsl::context_push() and .require() replaces dsl::context_pop().
2021-01-03
Removed callback from lexy::as_list and lexy::as_collection; they’re now only sinks. lexy::construct can be used in most cases instead.
Merged ::list and ::value callbacks from productions. There are three cases:
  A production has a value member only: this continues to work as before.
  A production has a list member only: just rename it to value. It is treated as a sink automatically when required.
  A production has a list and value member: add a value member that uses sink >> callback, where sink was the previous list value and callback the previous callback. This will use sink to construct the list, then pass everything to callback.
lexy::result now has an empty state. It is only used internally and never exposed to the user. As a related change, the default constructor has been removed due to unclear semantics. Use lexy::result(lexy::result_error) to restore its behavior of creating a default constructed error.
2020-12-26
Replaced Pattern concept with a new Token and Branch concept (see #10). A Branch is a rule that can make a branching decision (it is required by choices and can be used as a branch condition). A Token is an atomic parse unit; it is also a Branch.
Most patterns (e.g. LEXY_LIT) are now tokens, which doesn’t break anything. Some patterns are now branches (e.g. dsl::peek()), which breaks in rules that now require tokens (e.g. dsl::until()). The remaining patterns are now plain rules (e.g. dsl::while_(condition >> then)), which makes them unusable as branch conditions.
The patterns that are now branches:
  dsl::error
  dsl::peek() and dsl::peek_not()
  condition >> then was a pattern if then is a pattern, now it is always a branch
The patterns that are now plain rules:
  a sequence using operator+ (it is still a token if all arguments are tokens, so it can be used as condition)
  a choice using operator|, even if all arguments are tokens (use operator/ instead which is a token)
  dsl::while_[one](), even if the argument is a token
  dsl::times()
  dsl::if_()
The following rules previously required only patterns but now require tokens:
  a minus using operator- (both arguments)
  dsl::until()
  dsl::lookahead()
  dsl::escape() (the escape character itself) and its .capture()
  digit separators
  automatic capturing of dsl::delimited()
  lexy::make_error_location()
If you have a breaking change because you now use a non-token rule where a token was expected, use dsl::token(), which turns an arbitrary rule into a token (just like dsl::match() turned a rule into a pattern).
Removed dsl::match(); use dsl::token() instead. If you previously had dsl::peek(dsl::match(rule)) >> then you can now even use dsl::peek(rule) >> then, as dsl::peek[_not]() have learned to support arbitrary rules.
Removed dsl::try_<Tag>(pattern). If pattern is now a token, you can use rule.error<Tag>() instead. Otherwise, use dsl::token(pattern).error<Tag>().
Removed .capture() on dsl::sep(pattern) and dsl::trailing_sep(pattern). You can now use dsl::sep(dsl::capture(pattern)), as dsl::capture() is now a branch and the separators have learned to support branches.
Removed .zero() and .non_zero() from dsl::digit<Base>. Use dsl::zero instead of dsl::digit<Base>.zero(). Use dsl::digit<Base> - dsl::zero (potentially with a nice error specified using .error()) instead of dsl::digit<Base>.non_zero().
Removed dsl::success, as it is no longer needed internally. It can be added back if needed.
BEHAVIOR CHANGE: As part of the branch changes, dsl::peek(), dsl::peek_not() and dsl::lookahead() are now no-ops if not used as branch condition. For example, prefix + dsl::peek(rule) + suffix is equivalent to prefix + suffix. In most cases, this is only a change in the error message as they don’t consume characters. Use dsl::require() and dsl::prevent() if the lookahead was intended.
BEHAVIOR CHANGE: Errors in whitespace are currently not reported. For example, if you have /* unterminated C comment int i; and support space and C comments as whitespace, this would previously raise an error about the unterminated C comment. Right now, it will try to skip the C comment, fail, and then just be done with whitespace skipping. The error for the unterminated C comment then manifests as expected 'int', got '/*'.
This behavior is only temporary until a better solution for whitespace is implemented (see #10).
2020-12-22
Removed dsl::build_list() and dsl::item(). They were mainly used to implement dsl::list(), and became unnecessary after an internal restructuring.
Removed support for choices in lists, i.e. dsl::list(a | b | c). This can be added back if needed.
Removed dsl::operator! due to implementation problems. Existing uses of dsl::peek(!rule) can be replaced by dsl::peek_not(rule); existing uses of !rule >> do_sth can be replaced using dsl::terminator().