This is the reference documentation for lexy.

If anything in the documentation could be improved (and there is probably a lot), please raise an issue or — even better — create a PR. Thank you!

Inputs and Encodings

An Input defines the input that will be parsed by lexy. It has a corresponding Encoding that controls, among other things, its character type and whether certain rules are available. The Input itself is unchanging and it produces a Reader which remembers the current position of the input during parsing.

Encodings

lexy/encoding.hpp
namespace lexy
{
    struct default_encoding;
    struct ascii_encoding;
    struct utf8_encoding;
    struct utf16_encoding;
    struct utf32_encoding;
    struct byte_encoding;

    template <typename CharT>
    using deduce_encoding = /* see below */;

    enum class encoding_endianness;
}

An Encoding is a set of pre-defined policy classes that determine the text encoding of an input.

Each encoding has a primary character type, which is the character type of the input. It can also have a secondary character type, which the input should accept, but internally convert to the primary character type. For example, lexy::utf8_encoding's primary character type is char8_t, but it also accepts char.

The encoding also has an integer type, which can store either any valid character (code unit to be precise) or a special EOF value, similar to std::char_traits. For some encodings, the integer type can be the same as the character type as not all values are valid code units. This allows optimizations.
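The relationship between character type and integer type can be sketched in the style of std::char_traits. The two structs below are illustrative only (they are not lexy's actual definitions, and the concrete EOF values are assumptions):

```cpp
// Illustrative sketch: an encoding's integer type must hold every valid
// code unit plus a distinct EOF value.

// ASCII: only 0x00-0x7F are valid code units, so the character type has
// spare values and can double as the integer type (assumed EOF: 0xFF).
struct ascii_encoding_sketch
{
    using char_type = char;
    using int_type  = char; // same type: enables optimizations
    static constexpr int_type eof() { return static_cast<int_type>(0xFF); }
};

// An unknown 8-bit encoding: all 256 values may be valid code units, so
// EOF needs a wider integer type, as in std::char_traits<char>.
struct default_encoding_sketch
{
    using char_type = char;
    using int_type  = int;
    static constexpr int_type eof() { return -1; }
};
```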

Certain rules require a certain encoding. For example, lexy::dsl::code_point does not work with lexy::default_encoding, and lexy::dsl::encode requires lexy::byte_encoding.

The supported encodings

lexy::default_encoding

The encoding that will be used when no other encoding is specified. Its character type is char and it can work with any 8-bit encoding (ASCII, UTF-8, extended ASCII etc.). Only use this encoding if you don’t know the exact encoding of your input.

lexy::ascii_encoding

Assumes the input is valid ASCII. Its character type is char.

lexy::utf8_encoding

Assumes the input is valid UTF-8. Its character type is char8_t, but it also accepts char.

lexy::utf16_encoding

Assumes the input is valid UTF-16. Its character type is char16_t, but it also accepts wchar_t on Windows.

lexy::utf32_encoding

Assumes the input is valid UTF-32. Its character type is char32_t, but it also accepts wchar_t on Linux.

lexy::byte_encoding

Does not assume the input is text. Its character type is unsigned char, but it also accepts char and std::byte. Use this encoding if you’re not parsing text or if you’re parsing text consisting of multiple encodings.

Note
If you specify an encoding that does not match the input's actual encoding, e.g. you say it is UTF-8 but in reality it is some Windows code page, the library will handle it by generating parse errors. The worst that can happen is that you'll get an unexpected EOF error because the input contains the character that is used to signal EOF in the encoding.

Deducing encoding

If you don’t specify an encoding for your input, lexy can sometimes deduce it by matching the character type to the primary character type. For example, a string of char8_t will be deduced to use lexy::utf8_encoding. If the character type is char, lexy will deduce lexy::default_encoding (unless that has been overridden by a build option).

Encoding endianness

enum class encoding_endianness
{
    little,
    big,
    bom,
};

In memory, UTF-16 and UTF-32 come in two flavors: big and little endian. Which flavor is used can be specified with the encoding_endianness enumeration. This is only relevant when, for example, reading data from files.

little

The encoding is written using little endian. For single-byte encodings, this has no effect.

big

The encoding is written using big endian. For single-byte encodings, this has no effect.

bom

The endianness is determined using the byte-order mark (BOM) of the encoding. If no BOM is present, defaults to big endian as per Unicode recommendation. For UTF-8, this will skip the optional BOM, but has otherwise no effect. For non-Unicode encodings, this has no effect.
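The bom behavior for UTF-16 can be sketched as follows. This is an illustrative helper mirroring the rules described above (BOM bytes per the Unicode standard), not lexy's implementation:

```cpp
#include <cstddef>

enum class endianness { little, big };

// Detect UTF-16 endianness from a leading byte-order mark; without a BOM,
// default to big endian as per the Unicode recommendation.
inline endianness detect_utf16_endianness(const unsigned char* bytes, std::size_t size)
{
    if (size >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return endianness::little; // FF FE: little endian BOM
    if (size >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return endianness::big; // FE FF: big endian BOM
    return endianness::big; // no BOM present
}
```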

The pre-defined Inputs

Range input

lexy/input/range_input.hpp
namespace lexy
{
    template <typename Encoding, typename Iterator, typename Sentinel = Iterator>
    class range_input
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;
        using iterator  = Iterator;

        constexpr range_input() noexcept;
        constexpr range_input(Iterator begin, Sentinel end) noexcept;

        constexpr iterator begin() const noexcept;
        constexpr iterator end() const noexcept;

        constexpr Reader reader() const& noexcept;
    };
}

The class lexy::range_input is an input that represents the range [begin, end). CTAD can be used to deduce the encoding from the value type of the iterator.

Note
The input is a lightweight view and does not own any data.
Tip
Use lexy::string_input instead if the range is contiguous.
Example

Using the range input to parse content from a list.

std::list<char8_t> list = /* ... */;

// Create the input, deducing the encoding.
auto input = lexy::range_input(list.begin(), list.end());

String input

lexy/input/string_input.hpp
namespace lexy
{
    template <typename Encoding = default_encoding>
    class string_input
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;
        using iterator  = const char_type*;

        constexpr string_input() noexcept;

        template <typename CharT>
        constexpr string_input(const CharT* begin, const CharT* end) noexcept;
        template <typename CharT>
        constexpr string_input(const CharT* data, std::size_t size) noexcept;

        template <typename View>
        constexpr explicit string_input(const View& view) noexcept;

        constexpr iterator begin() const noexcept;
        constexpr iterator end() const noexcept;

        constexpr Reader reader() const& noexcept;
    };

    template <typename Encoding, typename CharT>
    constexpr auto zstring_input(const CharT* str) noexcept;
    template <typename CharT>
    constexpr auto zstring_input(const CharT* str) noexcept;

    template <typename Encoding = default_encoding>
    using string_lexeme = lexeme_for<string_input<Encoding>>;
    template <typename Tag, typename Encoding = default_encoding>
    using string_error = error_for<string_input<Encoding>, Tag>;
    template <typename Production, typename Encoding = default_encoding>
    using string_error_context = error_context<Production, string_input<Encoding>>;
} // namespace lexy

The class lexy::string_input is an input that represents the string view defined by the constructors. CTAD can be used to deduce the encoding from the character type.

Note
The input is a lightweight view and does not own any data. Use lexy::buffer if you want an owning version.
Pointer constructor
template <typename CharT>
constexpr string_input(const CharT* begin, const CharT* end) noexcept; // (1)
template <typename CharT>
constexpr string_input(const CharT* data, std::size_t size) noexcept; // (2)
  1. The input is the contiguous range [begin, end).

  2. The input is the contiguous range [data, data + size).

CharT must be the primary or secondary character type of the encoding.

View constructor
template <typename View>
constexpr explicit string_input(const View& view) noexcept;

The input is given by the View, which requires a .data() and .size() member. The character type of the View must be the primary or secondary character type of the encoding.

Null-terminated string functions
template <typename Encoding, typename CharT>
constexpr auto zstring_input(const CharT* str) noexcept; // (1)
template <typename CharT>
constexpr auto zstring_input(const CharT* str) noexcept; // (2)
  1. Use the specified encoding.

  2. Deduce the encoding from the character type.

The input is given by the range [str, end), where end is a pointer to the first null character of the string. The return type is an appropriate lexy::string_input instantiation.
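How `end` is found can be sketched with a hypothetical helper (not part of lexy's interface): scan forward to the first null character.

```cpp
// Find the pointer one-past the last non-null character, i.e. the `end`
// of the range [str, end) used by zstring_input. Illustrative only.
template <typename CharT>
constexpr const CharT* find_null(const CharT* str)
{
    while (*str != CharT(0))
        ++str;
    return str; // points at the first null character
}
```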

Example

Using the string input to parse content from a std::string.

std::string str = /* ... */;
auto input = lexy::string_input(str);

Using the string input to parse content from a string literal.

auto input = lexy::zstring_input(u"Hello World!");

Buffer Input

lexy/input/buffer.hpp
namespace lexy
{
template <typename Encoding       = default_encoding,
          typename MemoryResource = /* default resource */>
class buffer
{
public:
    using encoding  = Encoding;
    using char_type = typename encoding::char_type;

    class builder;

    constexpr buffer() noexcept;
    constexpr explicit buffer(MemoryResource* resource) noexcept;

    template <typename CharT>
    explicit buffer(const CharT* data, std::size_t size,
                    MemoryResource* resource = /* default resource */);
    template <typename CharT>
    explicit buffer(const CharT* begin, const CharT* end,
                    MemoryResource* resource = /* default resource */);

    template <typename View>
    explicit buffer(const View&     view,
                    MemoryResource* resource = /* default resource */);

    buffer(const buffer& other, MemoryResource* resource);

    const char_type* begin() const noexcept;
    const char_type* end() const noexcept;

    const char_type* data() const noexcept;

    bool empty() const noexcept;

    std::size_t size() const noexcept;
    std::size_t length() const noexcept;

    Reader reader() const& noexcept;
};

template <typename Encoding, encoding_endianness Endianness>
constexpr auto make_buffer_from_raw;

template <typename Encoding       = default_encoding,
          typename MemoryResource = /* default resource */>
using buffer_lexeme = lexeme_for<buffer<Encoding, MemoryResource>>;
template <typename Tag, typename Encoding = default_encoding,
          typename MemoryResource = /* default resource */>
using buffer_error = error_for<buffer<Encoding, MemoryResource>, Tag>;
template <typename Production, typename Encoding = default_encoding,
          typename MemoryResource = /* default resource */>
using buffer_error_context = error_context<Production, buffer<Encoding, MemoryResource>>;
}

The class lexy::buffer is an immutable, owning variant of lexy::string_input. The memory for the input is allocated using the MemoryResource, which is a class with the same interface as std::pmr::memory_resource. By default, it uses new and delete for the allocation, just like std::pmr::new_delete_resource. Construction of the buffer is just like lexy::string_input, except for the additional MemoryResource parameter. Once a memory resource has been specified, it will not propagate on assignment.

Tip
As the buffer owns the input, it can terminate it with the EOF character for encodings that have the same character and integer type. This eliminates the "is the reader at eof?"-branch during parsing.
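The effect of that optimization can be sketched with a scanning loop. This is illustrative only, and it assumes a `'\0'` terminator as the EOF character; the point is that a guaranteed terminator lets the loop drop the separate "reached end?" comparison:

```cpp
#include <cstddef>

// Count leading digits of an EOF-terminated buffer. Because the owning
// buffer guarantees a terminator, there is no `cur != end` check: the
// terminator simply fails the digit test and stops the loop.
inline std::size_t count_digits(const char* cur)
{
    std::size_t n = 0;
    while (*cur >= '0' && *cur <= '9')
    {
        ++n;
        ++cur;
    }
    return n;
}
```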
Builder
class builder
{
public:
    explicit builder(std::size_t     size,
                     MemoryResource* resource = /* default resource */);

    char_type* data() const noexcept;
    std::size_t size() const noexcept;

    buffer finish() && noexcept;
};

The builder class separates the allocation and copying of the buffer data. This allows, for example, writing into the immutable buffer from a file. The constructor allocates memory for size characters, then data() gives a mutable pointer to that memory.

Make buffer from raw memory
struct /* unspecified */
{
    auto operator()(const void* memory, std::size_t size) const;

    template <typename MemoryResource>
    auto operator()(const void* memory, std::size_t size, MemoryResource* resource) const;
};

template <typename Encoding, encoding_endianness Endianness>
constexpr auto make_buffer_from_raw = /* unspecified */;

lexy::make_buffer_from_raw is a function object that constructs a lexy::buffer of the specified encoding from raw memory. If necessary, it will take care of the endianness conversion as instructed by the lexy::encoding_endianness enumeration. Any BOM, if present, will not be part of the input.
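The endianness conversion for little endian UTF-16 can be sketched like this. It is an illustrative helper, not lexy's implementation (which writes into a buffer builder rather than a vector):

```cpp
#include <cstddef>
#include <vector>

// Reassemble little endian byte pairs into UTF-16 code units.
inline std::vector<char16_t> utf16_from_little_endian(const unsigned char* bytes,
                                                      std::size_t size)
{
    std::vector<char16_t> out;
    for (std::size_t i = 0; i + 1 < size; i += 2)
        out.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    return out;
}
```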

Example

Using a buffer to parse content from a std::string using UTF-8. This enables the sentinel optimization.

std::string str = /* ... */;
auto input = lexy::buffer<lexy::utf8_encoding>(str);

Using a buffer to parse a memory-mapped file containing little endian UTF-16.

auto ptr = mmap(/* ... */);

constexpr auto make_utf16_little
  = lexy::make_buffer_from_raw<lexy::utf16_encoding, lexy::encoding_endianness::little>;
auto input = make_utf16_little(ptr, length);

File Input

lexy/input/file.hpp
namespace lexy
{
    enum class file_error
    {
        os_error,
        file_not_found,
        permission_denied,
    };

    template <typename Encoding       = default_encoding,
              typename MemoryResource = /* default resource */>
    class read_file_result
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;

        explicit operator bool() const noexcept;

        file_error error() const noexcept;

        const char_type* data() const noexcept;
        std::size_t size() const noexcept;

        Reader reader() const& noexcept;
    };

    template <typename Encoding          = default_encoding,
              encoding_endianness Endian = encoding_endianness::bom,
              typename MemoryResource>
    auto read_file(const char*     path,
                   MemoryResource* resource = /* default resource */)
        -> read_file_result<Encoding, MemoryResource>;
}

The function lexy::read_file() reads the file at the specified path using the specified encoding and endianness. It returns a lexy::read_file_result. If reading failed, the operator bool will return false and .error() will return the error code. If reading was successful, the operator bool will return true and you can call .data()/.size() to get the file contents or treat it as an Input.

Example

Reading UTF-16 from a file with a BOM.

auto result = lexy::read_file<lexy::utf16_encoding>("input.txt");
if (!result)
    throw my_file_read_error_exception(result.error()); // (1)

/* ... */ // (2)
  1. Throw an exception giving it the lexy::file_error.

  2. Now you can use result as an Input or access the file contents.

Command-line argument Input

lexy/input/argv_input.hpp
namespace lexy
{
    class argv_sentinel;
    class argv_iterator;

    constexpr argv_iterator argv_begin(int argc, char* argv[]) noexcept;
    constexpr argv_iterator argv_end(int argc, char* argv[]) noexcept;

    template <typename Encoding = default_encoding>
    class argv_input
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;
        using iterator  = argv_iterator;

        constexpr argv_input() = default;
        constexpr argv_input(argv_iterator begin, argv_iterator end) noexcept;
        constexpr argv_input(int argc, char* argv[]) noexcept;

        constexpr Reader reader() const& noexcept;
    };

    template <typename Encoding = default_encoding>
    using argv_lexeme = lexeme_for<argv_input<Encoding>>;
    template <typename Tag, typename Encoding = default_encoding>
    using argv_error = error_for<argv_input<Encoding>, Tag>;
    template <typename Production, typename Encoding = default_encoding>
    using argv_error_context = error_context<Production, argv_input<Encoding>>;
}

The class lexy::argv_input is an input that uses the command-line arguments passed to main(). It excludes argv[0], which is the executable name, and includes \0 as a separator between command line arguments.

Note
The input is a lightweight view and does not own any data.
Command-line iterators
class argv_sentinel;
class argv_iterator;

constexpr argv_iterator argv_begin(int argc, char* argv[]) noexcept;
constexpr argv_iterator argv_end(int argc, char* argv[]) noexcept;

The lexy::argv_iterator is a bidirectional iterator iterating over the command-line arguments excluding the initial argument which is the executable name. It can be created using argv_begin() and argv_end().

Example

Use the command line arguments as input.

int main(int argc, char* argv[])
{
    auto input = lexy::argv_input(argc, argv);
    /* ... */
}

If the program is invoked with ./a.out a 123 b, the input will be a\0123\0b.
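The character sequence the input exposes can be sketched by flattening the arguments by hand. This helper is illustrative only (the real argv_input iterates in place and owns nothing):

```cpp
#include <string>

// Join the arguments after argv[0] with '\0' separators, mirroring the
// character sequence described above.
inline std::string flatten_argv(int argc, char* argv[])
{
    std::string result;
    for (int i = 1; i < argc; ++i)
    {
        if (i > 1)
            result += '\0'; // separator between arguments
        result += argv[i];
    }
    return result;
}
```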

Lexemes and Tokens

A lexeme is the part of the input matched by a token rule. It is represented by the class lexy::lexeme. A token is a combination of an identifier that defines the rule it matches, as well as the matched lexeme.

Note
When talking about tokens in the context of rules, it is usually short for token rule, i.e. the rule that defines what is matched, not the concrete realization.

Code point

lexy/encoding.hpp
namespace lexy
{
    class code_point
    {
    public:
        constexpr code_point() noexcept;
        constexpr explicit code_point(char32_t value) noexcept;

        constexpr char32_t value() const noexcept;

        constexpr bool is_valid() const noexcept;
        constexpr bool is_surrogate() const noexcept;
        constexpr bool is_scalar() const noexcept;

        constexpr bool is_ascii() const noexcept;
        constexpr bool is_bmp() const noexcept;

        friend constexpr bool operator==(code_point lhs, code_point rhs) noexcept;
        friend constexpr bool operator!=(code_point lhs, code_point rhs) noexcept;
    };
}

The class lexy::code_point represents a single code point from the input. It is merely a wrapper over a char32_t that contains the numerical code.

Constructors
constexpr code_point() noexcept; // (1)
constexpr explicit code_point(char32_t value) noexcept; // (2)
  1. Creates an invalid code point.

  2. Creates the specified code point. The value will be returned from value() unchanged.

Validity
constexpr bool is_valid() const noexcept; // (1)
constexpr bool is_surrogate() const noexcept; // (2)
constexpr bool is_scalar() const noexcept; // (3)
  1. Returns true if the code point is less than or equal to 0x10'FFFF, false otherwise.

  2. Returns true if the code point is a UTF-16 surrogate, false otherwise.

  3. Returns true if the code point is valid and not a surrogate, false otherwise.
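These checks can be written down directly; the ranges below come from the Unicode standard (surrogates are 0xD800-0xDFFF), and the free functions are an illustrative sketch rather than lexy's members:

```cpp
// Validity checks on a raw code point value, mirroring
// is_valid(), is_surrogate(), and is_scalar() above.
constexpr bool is_valid_cp(char32_t v)     { return v <= 0x10FFFF; }
constexpr bool is_surrogate_cp(char32_t v) { return v >= 0xD800 && v <= 0xDFFF; }
constexpr bool is_scalar_cp(char32_t v)    { return is_valid_cp(v) && !is_surrogate_cp(v); }
```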

Category
constexpr bool is_ascii() const noexcept; // (1)
constexpr bool is_bmp() const noexcept; // (2)
  1. Returns true if the code point is ASCII (7-bit value), false otherwise.

  2. Returns true if the code point is in the Unicode BMP (16-bit value), false otherwise.

Lexeme

lexy/lexeme.hpp
namespace lexy
{
    template <typename Reader>
    class lexeme
    {
    public:
        using encoding  = typename Reader::encoding;
        using char_type = typename encoding::char_type;
        using iterator  = typename Reader::iterator;

        constexpr lexeme() noexcept;
        constexpr lexeme(iterator begin, iterator end) noexcept;

        constexpr explicit lexeme(const Reader& reader, iterator begin) noexcept
        : lexeme(begin, reader.cur())
        {}

        constexpr bool empty() const noexcept;

        constexpr iterator begin() const noexcept;
        constexpr iterator end() const noexcept;

        // Only if the iterator is a pointer.
        constexpr const char_type* data() const noexcept;

        // Only if the iterator has `operator-`.
        constexpr std::size_t size() const noexcept;

        // Only if the iterator has `operator[]`.
        constexpr char_type operator[](std::size_t idx) const noexcept;
    };

    template <typename Input>
    using lexeme_for = lexeme<input_reader<Input>>;
}

The class lexy::lexeme represents a sub-range of the input. For convenience, most inputs also provide convenience typedefs that can be used instead of lexy::lexeme_for.

Token Kind

lexy/token.hpp
namespace lexy
{
    enum predefined_token_kind
    {
        unknown_token_kind,
        eof_token_kind,
        position_token_kind,
        identifier_token_kind,
    };

    template <typename TokenKind = void>
    class token_kind
    {
    public:
        constexpr token_kind() noexcept;
        constexpr token_kind(predefined_token_kind value) noexcept;
        constexpr token_kind(TokenKind value) noexcept;
        template <typename TokenRule>
        constexpr token_kind(TokenRule token_rule) noexcept;

        constexpr explicit operator bool() const noexcept;

        constexpr bool is_predefined() const noexcept;

        constexpr const char* name() const noexcept;

        constexpr TokenKind get() const noexcept;

        static constexpr std::uint_least16_t to_raw(token_kind<TokenKind> kind) noexcept;
        static constexpr token_kind<TokenKind> from_raw(std::uint_least16_t kind) noexcept;

        friend constexpr bool operator==(token_kind lhs, token_kind rhs) noexcept;
        friend constexpr bool operator!=(token_kind lhs, token_kind rhs) noexcept;
    };
}

The class lexy::token_kind identifies a token rule. It is merely a wrapper over the specified TokenKind, which is an enum. If TokenKind is void, it is a wrapper over an int.

A token kind can represent any of the lexy::predefined_token_kind as well as any values specified in the given enum, or any integer value. Predefined token kinds are mapped to spare enum values.

Constructors
constexpr token_kind() noexcept;                         // (1)

constexpr token_kind(predefined_token_kind value) noexcept; // (2)

constexpr token_kind(TokenKind value) noexcept; // (3)

template <typename TokenRule>
constexpr token_kind(TokenRule token_rule) noexcept; // (4)
  1. Creates an unknown token kind.

  2. Creates a predefined token kind.

  3. Creates the specified token kind; if TokenKind is void, the constructor takes an int.

  4. Creates a token kind from a token rule.

The token kind of a rule is computed as follows:

  • If the token rule was associated with a token kind by calling .kind<value>, the resulting kind is the specified value.

  • Otherwise, if the map found at lexy::token_kind_map_for<TokenKind> contains a mapping for the TokenRule, it uses that.

  • Otherwise, the token kind is unknown.

Access
constexpr explicit operator bool() const noexcept; // (1)

constexpr bool is_predefined() const noexcept; // (2)

constexpr const char* name() const noexcept; // (3)

constexpr TokenKind get() const noexcept; // (4)
  1. Returns true if the token kind is not unknown, false otherwise.

  2. Returns true if the token kind is one of the lexy::predefined_token_kinds, false otherwise.

  3. Returns the name of the token kind.

  4. Returns the underlying value of the token kind; predefined token kinds are mapped to spare values.

The name of a token kind is determined as follows:

  • If the TokenKind is void, the name is "token" for all token kinds.

  • Otherwise, if the token kind is unknown, the name is "token".

  • Otherwise, if the token kind is predefined, the name describes the predefined token.

  • Otherwise, if ADL finds an overload const char* token_kind_name(TokenKind kind), returns that as the name.

  • Otherwise, the name is "token" for all tokens.

Token Kind Map

lexy/token.hpp
namespace lexy
{
    class Token-Kind-Map
    {
    public:
        template <auto TokenKind, typename TokenRule>
        consteval Token-Kind-Map map(TokenRule) const;
    };

    inline constexpr auto token_kind_map = Token-Kind-Map{};

    template <typename TokenKind>
    constexpr auto token_kind_map_for = token_kind_map;
}

There are two ways to associate a token kind with a token rule. Either by calling .kind<Kind> on the token rule and giving it a value there, or by specializing the lexy::token_kind_map_for for your TokenKind enumeration.

Example
enum class my_token_kind // (1)
{
    code_point,
    period,
    open_paren,
    close_paren,
};

// (2)
template <>
constexpr auto lexy::token_kind_map_for<my_token_kind>
    = lexy::token_kind_map.map<my_token_kind::code_point>(lexy::dsl::code_point)
                          .map<my_token_kind::period>(lexy::dsl::period)
                          .map<my_token_kind::open_paren>(lexy::dsl::parenthesized.open())
                          .map<my_token_kind::close_paren>(lexy::dsl::parenthesized.close());
  1. Define your TokenKind enumeration.

  2. Define the mapping of token rules to enumeration values.

Note
The token kind is only relevant when lexy::parse_as_tree() is used to parse the input.

Token

lexy/token.hpp
namespace lexy
{
    template <typename Reader, typename TokenKind = void>
    class token
    {
    public:
        explicit constexpr token(token_kind<TokenKind> kind, lexy::lexeme<Reader> lex) noexcept;
        explicit constexpr token(token_kind<TokenKind> kind,
                                 typename Reader::iterator begin,
                                 typename Reader::iterator end) noexcept;

        constexpr token_kind<TokenKind> kind() const noexcept;
        constexpr auto lexeme() const noexcept;

        constexpr auto name() const noexcept { return kind().name(); }

        constexpr auto position() const noexcept -> typename Reader::iterator
        {
            return lexeme().begin();
        }
    };

    template <typename Input, typename TokenKind = void>
    using token_for = token<input_reader<Input>, TokenKind>;
}

The class lexy::token just combines a lexy::token_kind and a lexy::lexeme.

Writing custom Inputs

The Input concept
class Input
{
public:
    Reader reader() const&;
};

An Input is just a class with a reader() member function that returns a Reader to the beginning of the input. The type alias lexy::input_reader<Reader> returns the type of the corresponding reader.

Warning
The interface of a Reader is currently experimental. Refer to the comments in lexy/input/base.hpp.

Matching, parsing and validating

The Production concept
struct Production
{
    static constexpr auto rule = /* ... */;
    static constexpr auto whitespace = /* ... */; // optional

    static constexpr auto value = /* ... */; // optional
};

A Production is a type containing a rule and optional callbacks that produce the value. A grammar consists of an entry production, where parsing begins, and all productions referenced by it.

Tip
It is recommended to put all productions of a grammar into a separate namespace.

By passing the entry production of the grammar to lexy::match(), lexy::parse(), or lexy::validate(), the production is parsed.

Matching

lexy/match.hpp
namespace lexy
{
    template <typename Production, typename Input>
    constexpr bool match(const Input& input);
}

The function lexy::match() matches the Production on the given input. If the production accepts the input, returns true, otherwise, returns false. It will discard any values produced and does not give detailed information about why the production did not accept the input.

Note
A production does not necessarily need to consume the entire input for it to match. Add lexy::dsl::eof to the end if the production should consume the entire input.

Validating

lexy/validate.hpp
namespace lexy
{
    template <typename ErrorCallback>
    class validate_result
    {
    public:
        using error_callback = ErrorCallback;
        using error_type     = /* return type of the sink */;

        constexpr explicit operator bool() const noexcept
        {
            return is_success();
        }

        constexpr bool is_success() const noexcept; // (1)
        constexpr bool is_error() const noexcept; // (2)
        constexpr bool is_recovered_error() const noexcept; // (3)
        constexpr bool is_fatal_error() const noexcept; // (4)

        constexpr std::size_t error_count() const noexcept;

        constexpr const error_type& errors() const& noexcept;
        constexpr error_type&& errors() && noexcept;
    };

    template <typename Production, typename Input, typename ErrorCallback>
    constexpr auto validate(const Input& input, ErrorCallback error_callback)
        -> validate_result<ErrorCallback>;
}
  1. Returns true if no error occurred during validation.

  2. Returns true if at least one error occurred during validation.

  3. Returns true if at least one error occurred during validation, but parsing could recover after all of them.

  4. Returns true if at least one error occurred during validation and parsing had to cancel.

The function lexy::validate() validates that the Production matches on the given input. If a parse error occurs, it will invoke the error callback (see Error handling); all errors are then returned. It will discard any values produced.

Note
A production does not necessarily need to consume the entire input for it to match. Add lexy::dsl::eof to the end if the production should consume the entire input.

Parsing

lexy/parse.hpp
namespace lexy
{
    template <typename T, typename ErrorCallback>
    class parse_result
    {
    public:
        using value_type     = T;
        using error_callback = ErrorCallback;
        using error_type     = /* return type of the sink */;

        //=== status ===//
        constexpr explicit operator bool() const noexcept
        {
            return is_success();
        }

        constexpr bool is_success() const noexcept; // (1)
        constexpr bool is_error() const noexcept; // (2)
        constexpr bool is_recovered_error() const noexcept; // (3)
        constexpr bool is_fatal_error() const noexcept; // (4)

        //=== value ===//
        constexpr bool has_value() const noexcept; // (5)

        constexpr const T& value() const& noexcept;
        constexpr T&& value() && noexcept;

        //=== error ===//
        constexpr std::size_t error_count() const noexcept;

        constexpr const error_type& errors() const& noexcept;
        constexpr error_type&& errors() && noexcept;
    };

    template <typename Production, typename Input, typename ErrorCallback>
    constexpr auto parse(const Input& input, ErrorCallback error_callback)
        -> parse_result</* see below */, ErrorCallback>;

    template <typename Production, typename Input, typename State, typename ErrorCallback>
    constexpr auto parse(const Input& input, const State& state, ErrorCallback error_callback)
        -> parse_result</* see below */, ErrorCallback>;
}
  1. Returns true if no error occurred during parsing.

  2. Returns true if at least one error occurred during parsing.

  3. Returns true if at least one error occurred during parsing, but parsing could recover after all of them.

  4. Returns true if at least one error occurred during parsing and parsing had to cancel.

  5. Returns true if parsing could produce a value. This can only happen if there was no fatal error.

The function lexy::parse() parses the Production on the given input. The return value is a lexy::parse_result<T, ErrorCallback>, where T is the return type of the Production::value or Production::list callback. If the production accepts the input or there are only recoverable errors, invokes Production::value (see below) with the produced values and returns their result. Invokes the error callback for each parse error (see Error handling) and collects the errors.

The return value on success is determined using Production::value depending on three cases:

  • Production::rule does not contain a list. Then all arguments will be forwarded to Production::value as a callback whose result is returned.

  • Production::rule contains a list and no other rule produces a value. Then Production::value will be used as sink for the list values. If Production::value is also a callback that accepts the result of the sink as argument, it will be invoked with the sink result and the processed result returned. Otherwise, the result of the sink is the final result.

  • Production::rule contains a list and other rules produce values as well. Then Production::value will be used as sink for the list values. The sink result will be added to the other values in order and everything forwarded to Production::value as a callback. The callback result is then returned.

Note
The callback operator>> is useful for case 3 to create a combined callback and sink with the desired behavior.

The second overload of lexy::parse() allows passing an arbitrary state argument. This state will be made available to lexy::parse_state (see Binding arguments) and passed to the .sink() of Production::value, if it accepts it. That way, you can access other information (e.g. allocators for your containers) in the callbacks.

Callbacks

The Callback concept
struct Callback
{
    using return_type = /* ... */;

    return_type operator()(Args&&... args) const;
};

struct Sink
{
    class _sink // exposition only
    {
    public:
        using return_type = /* ... */;

        void operator()(Args&&... args);

        return_type&& finish() &&;
    };

    _sink sink(Args&&... args) const;
};

A Callback is a function object whose return type is specified by a member typedef. A Sink is a type with a sink() member function, which can take arbitrary arguments (e.g. allocators) and returns a sink callback. The sink callback can be invoked multiple times, and the final value is returned by calling .finish().

Callbacks are used by lexy to compute the parse result and handle error values. They can either be written manually by implementing the above concepts, or composed from the pre-defined callbacks described below.

Note
When a Sink is used as Production::value, the .sink() member function must either have zero parameters or a single one that matches the state passed to the lexy::parse() overload. The latter is the case if lexy::bind_sink() is used.
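As an illustration, here is a minimal hand-written pair that models both concepts. The names JoinCallback and CountSink are invented for this sketch and are not part of lexy:

```cpp
#include <cstddef>
#include <string>

// A Callback: a const operator() plus a return_type member typedef.
struct JoinCallback
{
    using return_type = std::string;

    return_type operator()(const char* a, const char* b) const
    {
        return std::string(a) + b;
    }
};

// A Sink: sink() returns a stateful sink callback whose final value
// is obtained by calling .finish() on an rvalue.
struct CountSink
{
    class _sink
    {
    public:
        using return_type = std::size_t;

        // May be invoked any number of times with arbitrary arguments.
        template <typename... Args>
        void operator()(Args&&...)
        {
            ++count_;
        }

        return_type finish() &&
        {
            return count_;
        }

    private:
        std::size_t count_ = 0;
    };

    _sink sink() const
    {
        return {};
    }
};
```

A sink callback like CountSink::_sink is what lexy invokes once per list item; .finish() then produces the final result.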

Callback adapters

lexy/callback/base.hpp
namespace lexy
{
    template <typename ReturnType = void, typename... Fns>
    constexpr Callback callback(Fns&&... fns);
}

Creates a callback with the given ReturnType from multiple functions. When calling the resulting callback, it will use overload resolution to determine the correct function to call. It supports function pointers, lambdas, and member function or data pointers.

lexy/callback/base.hpp
namespace lexy
{
    template <typename Sink>
    constexpr Callback callback(Sink&& sink);
}

Creates a callback from the sink. Each argument will be forwarded to a separate invocation of the sink callback. The final result of the sink is returned.

Callback composition

lexy/callback/base.hpp
namespace lexy
{
    template <typename First, typename Second>
    constexpr auto operator|(First first, Second second); // (1)

    template <typename Sink, typename Callback>
    constexpr auto operator>>(Sink sink, Callback callback); // (2)

}
  1. The result of first | second, where first and second are both callbacks, is another callback that first invokes first and then passes the result to second. The result cannot be used as sink.

  2. The result of sink >> callback is both a sink and a callback. As a sink, it behaves just like sink. As a callback, it takes the result of the sink as well as any other arguments and forwards them to callback.
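The semantics of (1) can be sketched with a small helper; compose_callbacks is a hypothetical name for this illustration, not lexy's implementation:

```cpp
#include <utility>

// Sketch of operator|: a combined callback that invokes first and
// passes its result on to second.
template <typename First, typename Second>
constexpr auto compose_callbacks(First first, Second second)
{
    return [first, second](auto&&... args) {
        return second(first(std::forward<decltype(args)>(args)...));
    };
}
```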

Example

Build a string, then get its length.

constexpr auto make_string = lexy::callback<std::string>([](const char* str) { return str; });
constexpr auto string_length = lexy::callback<std::size_t>(&std::string::size);

constexpr auto inefficient_strlen = make_string | string_length; // (1)

assert(inefficient_strlen("1234") == 4); // (2)
  1. Compose the two callbacks.

  2. Use it.

Note
The callback operator>> is used for productions whose rule contains both a list and other value-producing rules. The list will be constructed using the sink and then everything will be passed to callback.

The no-op callback

lexy/callback/noop.hpp
namespace lexy
{
    constexpr auto noop = /* unspecified */;
}

lexy::noop is both a callback and a sink. It ignores all arguments passed to it and its return type is void.

Example

Parse the production, but do nothing on errors.

auto result = lexy::parse<my_production>(my_input, lexy::noop); // (1)
if (!result)
    throw my_parse_error(); // (2)
auto value = result.value(); // (3)
  1. Parse my_production. If an error occurs, just return a parse_result in the error state.

  2. lexy::noop does not make errors disappear, they still need to be handled.

  3. Do something with the parsed value.

The constant callback

lexy/callback/constant.hpp
namespace lexy
{
    template <typename Arg>
    consteval auto constant(Arg&& arg);
}

The lexy::constant() callback does not take any arguments and always produces the given value.

Forwarding a result

lexy/callback/forward.hpp
namespace lexy
{
    template <typename T>
    constexpr auto forward = /* unspecified */;
}

The callback lexy::forward<T> can accept either a const T& or a T&& and forwards it. It does not have a sink.

Constructing objects

lexy/callback/object.hpp
namespace lexy
{
    template <typename T>
    constexpr auto construct = /* unspecified */;

    template <typename T, typename PtrT = T*>
    constexpr auto new_ = /* unspecified */;
}

The callback lexy::construct<T> constructs a T by forwarding all arguments to a suitable constructor. If the type does not have a constructor, it forwards all arguments using brace initialization. It does not have a sink.

The callback lexy::new_<T, PtrT> works just like lexy::construct<T>, but it constructs the object on the heap by calling new. The resulting pointer is then converted to the specified PtrT. It does not have a sink.

Example

A callback that creates a std::unique_ptr<std::string>.

constexpr auto make_unique_str = lexy::new_<std::string, std::unique_ptr<std::string>>; // (1)

constexpr auto make_unique_str2 = lexy::new_<std::string> | lexy::construct<std::unique_ptr<std::string>>; // (2)
  1. Specify a suitable PtrT.

  2. Equivalent version that uses composition and lexy::construct instead.

Constructing containers

lexy/callback/container.hpp
namespace lexy
{
    template <typename Container>
    constexpr auto as_list = /* unspecified */;

    template <typename Container>
    constexpr auto as_collection = /* unspecified */;
}

lexy::as_list<Container> is a callback and a sink:

  • As a callback, it accepts lexy::nullopt, which results in an empty Container, or a sequence of arguments with an optional allocator. In the latter case, all arguments are added to an empty container via push_back().

  • If .sink() is called with no arguments, it default constructs a Container. If .sink() is called with a single argument of type Container::allocator_type, it constructs an empty Container using that allocator. It then repeatedly calls push_back() for single arguments and emplace_back() otherwise.

lexy::as_collection<Container> is like lexy::as_list<Container>, but instead of calling push_back() and emplace_back(), it calls insert() and emplace().
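The sink behavior can be approximated as follows; list_sink is an illustrative sketch of the semantics described above, not lexy's actual code:

```cpp
#include <utility>
#include <vector>

// Sketch of as_list's sink: single arguments go through push_back(),
// multiple arguments through emplace_back().
template <typename Container>
class list_sink
{
public:
    template <typename Arg>
    void operator()(Arg&& arg)
    {
        result_.push_back(std::forward<Arg>(arg));
    }

    template <typename... Args>
    void operator()(Args&&... args)
    {
        result_.emplace_back(std::forward<Args>(args)...);
    }

    Container finish() &&
    {
        return std::move(result_);
    }

private:
    Container result_;
};
```

For lexy::as_collection, push_back()/emplace_back() would be replaced by insert()/emplace().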

Example

Create a std::vector<int> and std::set<int>.

constexpr auto as_int_vector = lexy::as_list<std::vector<int>>;
constexpr auto as_int_set = lexy::as_collection<std::set<int>>;

lexy/callback/container.hpp
namespace lexy
{
    template <typename Callback>
    constexpr Sink collect(Callback&& callback);

    template <typename Container, typename Callback>
    constexpr Sink collect(Callback&& callback);
}

Turns a callback into a sink by invoking it multiple times and collecting all the results in a container.

The first version requires that the callback returns void; its sink callback forwards all arguments and increases a count. The final count as a std::size_t is then returned by finish().

The second version requires that the callback returns non-void. Its sink callback creates a default constructed Container, optionally using a passed Container::allocator_type. It then invokes the callback multiple times and adds the result to the container using .push_back(). The final container is then returned.
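The first (void) version can be sketched like this; counting_sink is a hypothetical name for the illustration:

```cpp
#include <cstddef>
#include <utility>

// Sketch of lexy::collect() for a void-returning callback: forward
// every invocation to the callback and count how often it was called.
template <typename Callback>
class counting_sink
{
public:
    explicit counting_sink(Callback callback) : callback_(std::move(callback)) {}

    template <typename... Args>
    void operator()(Args&&... args)
    {
        callback_(std::forward<Args>(args)...);
        ++count_;
    }

    std::size_t finish() &&
    {
        return count_;
    }

private:
    Callback    callback_;
    std::size_t count_ = 0;
};
```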

Note
collect() is useful for the error callback to handle multiple errors.

Constructing strings

lexy/callback/string.hpp
namespace lexy
{
    template <typename String, typename Encoding = /* see below */>
    constexpr auto as_string = /* unspecified */;
}

lexy::as_string<String, Encoding> is both a callback and a sink. It constructs a String object in the given Encoding. If no encoding is specified, it deduces one from the character type of the string.

As a callback, it constructs the string directly from the given arguments. It accepts:

  • lexy::nullopt, which results in a default-constructed String object.

  • A reference to an existing String object, which is forwarded as the result.

  • A const CharT* and a std::size_t, where CharT is a compatible character type. The two arguments are forwarded to a String constructor.

  • A lexy::lexeme<Reader> lex, where Reader::iterator is a pointer. The character type of the reader must be compatible with the encoding. It constructs the string using String(lex.data(), lex.size()) (potentially casting the pointer type if necessary).

  • A lexy::lexeme<Reader> lex, where Reader::iterator is not a pointer. It constructs the string using String(lex.begin(), lex.end()). The range constructor has to take care of any necessary character conversion.

  • A lexy::code_point. It is encoded into a local character array according to the specified Encoding. Then the string is constructed using a two-argument (const CharT*, std::size_t) constructor.

As a sink, it first default constructs the string, optionally using a passed String::allocator_type. Then it will repeatedly append the following arguments:

  • A single CharT, which is convertible to the string's character type. It is appended by calling .push_back().

  • A reference to an existing String object, which is appended by calling .append().

  • A const CharT* and a std::size_t, where CharT is a compatible character type. The two arguments are forwarded to .append().

  • A lexy::lexeme<Reader> lex, where Reader::iterator is a pointer. The character type of the reader must be compatible with the encoding. It is appended using .append(lex.data(), lex.size()) (potentially casting the pointer type if necessary).

  • A lexy::lexeme<Reader> lex, where Reader::iterator is not a pointer. It constructs the string using .append(lex.begin(), lex.end()). The range append function has to take care of any necessary character conversion.

  • A lexy::code_point. It is encoded into a local character array according to the specified Encoding. Then it is appended to the string using a two-argument .append(const CharT*, std::size_t) overload.

Example
constexpr auto as_utf16_string = lexy::as_string<std::u16string>;                   // (1)
constexpr auto as_utf8_string  = lexy::as_string<std::string, lexy::utf8_encoding>; // (2)
  1. Constructs a std::u16string, deducing the encoding as UTF-16.

  2. Constructs a std::string, specifying the encoding as UTF-8.

Binding arguments

lexy/callback/bind.hpp
namespace lexy
{
    template <typename Callback, typename ... Args>
    constexpr auto bind(Callback&& callback, Args&&... args);
}

lexy::bind() allows binding some parameters of callback to given values or re-ordering arguments before invoking the callback, similar to std::bind().

It can accept either a Callback or a Sink. For a Callback, binds the invocation of its operator(). For a Sink, binds the invocation of the operator() of the sink’s callback that will be invoked for every item.

The arguments to lexy::bind() can either be arbitrary expressions or bind placeholders (see below). If the Nth argument is an expression, the value of the expression will be passed as the Nth argument to callback. If the Nth argument is a bind placeholder, the value produced by the placeholder will be passed as the Nth argument to callback.

Example
constexpr auto make_stars = lexy::bind(lexy::construct<std::string>, lexy::_1 or 0, '*'); // (1)
std::string a = make_stars(3); // (2)
std::string b = make_stars(); // (3)
  1. Bind the (std::size_t, char) constructor of std::string: the first argument will be forwarded, but the second is fixed to '*'.

  2. a will be "***"

  3. b will be "" as the default value for the first argument is 0.

lexy/callback/bind.hpp
namespace lexy
{
    template <typename Sink, typename ... Args>
    constexpr auto bind_sink(Sink&& callback, Args&&... args);
}

lexy::bind_sink() behaves similarly to lexy::bind(), but it binds the .sink() function of a Sink instead.

Example
constexpr auto list = lexy::bind_sink(lexy::as_list<std::vector<int>>, lexy::parse_state.map(&MyState::allocator));

Constructs a list using an allocator by forwarding the .allocator member of MyState to the .sink(). This requires that you call lexy::parse() passing it a MyState object.

Note
It does not make sense to use lexy::nth_value or lexy::values as bind placeholders: .sink() does not accept any arguments in normal parsing code, it only takes the state accessible via lexy::parse_state.
lexy::nth_value
lexy/callback/bind.hpp
namespace lexy
{
    template <std::size_t N>
    class nth-value-impl
    {
    public:
        template <typename Arg>
        constexpr nth-value-impl or_(Arg&& fallback) const;
        template <typename Arg>
        constexpr nth-value-impl operator||(Arg&& fallback) const;

        constexpr nth-value-impl or_default() const;

        template <typename Fn>
        constexpr nth-value-impl map(Fn&& fn) const;
    };

    template <std::size_t N>
    constexpr auto nth_value = nth-value-impl<N>{};

    inline namespace placeholders
    {
        constexpr auto _1 = nth_value<1>;
        /* ... */
        constexpr auto _8 = nth_value<8>;
    }
}

The bind placeholder lexy::nth_value<N> expands to the Nth argument passed to the bound callback, i.e. the Nth argument produced by a rule. Arguments are indexed beginning with 1. For convenience, placeholders 1 through 8 are pre-defined.

The member function .map() takes a function fn and returns a placeholder that will expand to fn(value), where value is the Nth argument. It supports member pointers.

The member function .or_() (alternatively spelled || or or) provides a fallback value. If there are fewer than N arguments, the placeholder expands to fallback; without a fallback, this would be a compile-time error. If the Nth argument is of type lexy::nullopt (e.g. as produced by a dsl::opt() rule), it likewise expands to fallback; without a fallback, it would expand to the lexy::nullopt value unchanged. The member function .or_default() provides a fallback that can produce a default-constructed value of any type via implicit conversion.

If both a mapping and a fallback are specified, the fallback is not passed to fn but kept as-is.

lexy::values
lexy/callback/bind.hpp
namespace lexy
{
    constexpr auto values = values-impl{};
}

The bind placeholder lexy::values expands to all arguments passed to the bound callback, unchanged and in the same order. If the bound callback is invoked with N arguments, it is equivalent to passing lexy::_1, ..., lexy::nth_value<N>.

Note
If you use both lexy::nth_value<N> and lexy::values, the Nth argument will be duplicated.
lexy::parse_state
lexy/callback/bind.hpp
namespace lexy
{
    class parse-state-impl
    {
    public:
        template <typename Fn>
        constexpr parse-state-impl map(Fn&& fn) const;
    };

    constexpr auto parse_state = parse-state-impl{};
}

The bind placeholder lexy::parse_state expands to the state passed in the second overload of lexy::parse(). If there is no parse state, e.g. because the first overload was used or the callback is invoked in a different context, the program is ill-formed.

The member function .map() takes a function fn and returns a placeholder that will expand to fn(state), where state is the parse state. It supports member pointers.

Tip
Use lexy::parse_state to pass allocators to your containers.

Folding and counting

lexy/callback/fold.hpp
namespace lexy
{
    template <typename T, typename Arg = T, typename Op>
    constexpr auto fold(Arg&& init, Op&& op);

    template <typename T, typename Arg = T, typename Op>
    constexpr auto fold_inplace(Arg&& init, Op&& op);
}

lexy::fold and lexy::fold_inplace are sinks.

They initialize their result of type T from init and then fold it with every invocation of the sink callback: lexy::fold with result = invoke(op, result, args...), lexy::fold_inplace with invoke(op, result, args...), where args... are the arguments of the sink callback.
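The difference between the two folds can be sketched with plain helpers; fold_step and fold_inplace_step are invented names for this illustration:

```cpp
#include <functional>
#include <string>
#include <utility>

// lexy::fold: rebinds the result via result = op(result, args...).
template <typename T, typename Op, typename... Args>
void fold_step(T& result, Op&& op, Args&&... args)
{
    result = std::invoke(std::forward<Op>(op), std::move(result),
                         std::forward<Args>(args)...);
}

// lexy::fold_inplace: op(result, args...) mutates the result directly,
// which avoids copies for types like containers or strings.
template <typename T, typename Op, typename... Args>
void fold_inplace_step(T& result, Op&& op, Args&&... args)
{
    std::invoke(std::forward<Op>(op), result, std::forward<Args>(args)...);
}
```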

lexy/callback/fold.hpp
namespace lexy
{
    constexpr auto count = /* unspecified */;
}

lexy::count is a sink that counts the number of arguments.

It returns the number of invocations of the sink callback as a std::size_t. Each invocation of the sink callback can be done with an arbitrary number of arguments, which are all ignored.

It is equivalent to a lexy::fold with init set to 0 and an operation that increments the result, ignoring all other arguments.

Rule-specific callbacks

lexy/callback/aggregate.hpp
namespace lexy
{
    template <typename T>
    constexpr auto as_aggregate = /* unspecified */;
}

The callback and sink lexy::as_aggregate<T> is only used together with the lexy::dsl::member rule and documented there.

lexy/callback/integer.hpp
namespace lexy
{
    template <typename T>
    constexpr auto as_integer = /* unspecified */;
}

The callback lexy::as_integer<T> constructs an integer type T and has three overloads:

template <typename Integer>
T operator()(const Integer& value) const; // (1)

template <typename Integer>
T operator()(lexy::plus_sign sign, const Integer& value) const;  // (2)
template <typename Integer>
T operator()(lexy::minus_sign sign, const Integer& value) const; // (2)
  1. Returns T(value).

  2. Returns T(sign * value).

The second overload is meant to be used together with lexy::dsl::sign and related rules.
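The behavior can be sketched as follows; the sign tags here are stand-ins with an illustrative value member, not lexy's actual types:

```cpp
// Stand-ins for lexy::plus_sign/lexy::minus_sign.
struct plus_sign  { static constexpr int value = +1; };
struct minus_sign { static constexpr int value = -1; };

// Sketch of lexy::as_integer<T>'s overload set.
template <typename T>
struct as_integer_sketch
{
    template <typename Integer>
    T operator()(const Integer& value) const
    {
        return T(value); // (1)
    }

    template <typename Sign, typename Integer>
    T operator()(Sign, const Integer& value) const
    {
        return T(Sign::value * value); // (2)
    }
};
```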

Error handling

Parsing errors are reported by constructing a lexy::error object and passing it to the error callback of lexy::parse and lexy::validate together with the lexy::error_context. The error callback must either be a sink, in which case it can return an arbitrary type that represents a collection of all the errors, or a non-sink callback that returns void, in which case it will be passed to lexy::collect() to turn it into a sink.

The error_type of lexy::validate_result and lexy::parse_result will be the return type of the sink. For a void returning non-sink callback it will be std::size_t, which is the result of lexy::collect().

Example
A void-returning error callback that is not a sink.
class ErrorCallbackVoid
{
public:
    using return_type = void;

    template <typename Production, typename Input, typename Tag>
    void operator()(const lexy::error_context<Production, Input>& context,
                           const lexy::error<lexy::input_reader<Input>, Tag>& error) const;
};
A non-void-returning error callback that is a sink.
class ErrorCallbackSink
{
public:
    class Sink
    {
    public:
        using return_type = /* ... */;

        template <typename Production, typename Input, typename Tag>
        void operator()(const lexy::error_context<Production, Input>& context,
                               const lexy::error<lexy::input_reader<Input>, Tag>& error) const;

        return_type finish() &&;
    };

    Sink sink();
};

Of course, overloading can be used to differentiate between various error types and contexts.

Error types

lexy/error.hpp
namespace lexy
{
    template <typename Reader, typename Tag>
    class error;

    struct expected_literal {};
    template <typename Reader>
    class error<Reader, expected_literal>;

    struct expected_keyword {};
    template <typename Reader>
    class error<Reader, expected_keyword>;

    struct expected_char_class {};
    template <typename Reader>
    class error<Reader, expected_char_class>;

    template <typename Input, typename Tag>
    using error_for = error<input_reader<Input>, Tag>;

    template <typename Reader, typename Tag, typename ... Args>
    constexpr auto make_error(Args&&... args);
}

All errors are represented by instantiations of lexy::error<Reader, Tag>. The Tag is an empty type that specifies the kind of error. There are specializations for two tags to store additional information.

The function lexy::make_error constructs an error object given the reader and tag by forwarding all the arguments.

Generic error
template <typename Reader, typename Tag>
class error
{
    using iterator = typename Reader::iterator;

public:
    constexpr explicit error(iterator pos) noexcept;
    constexpr explicit error(iterator begin, iterator end) noexcept;

    constexpr iterator position() const noexcept;

    constexpr iterator begin() const noexcept;
    constexpr iterator end() const noexcept;

    constexpr const char* message() const noexcept;
};

The primary class template lexy::error<Reader, Tag> represents a generic error without additional metadata. It can either be constructed giving it a single position, then position() == begin() == end(); or a range of the input, then position() == begin() <= end().

The message() is determined using the Tag. By default, it returns the type name of Tag after removing the top-level namespace name. This can be overridden by defining either Tag::name() or Tag::name.

Expected literal error
struct expected_literal
{};

template <typename Reader>
class error<Reader, expected_literal>
{
    using iterator    = typename Reader::iterator;

public:
    constexpr explicit error(iterator position,
                             const typename Reader::char_type* string, std::size_t index) noexcept;

    constexpr iterator position() const noexcept;

    constexpr auto string() const noexcept -> const typename Reader::char_type*;
    constexpr auto character() const noexcept -> typename Reader::char_type;

    constexpr std::size_t index() const noexcept;
};

A specialization of lexy::error is provided if Tag == lexy::expected_literal. It represents the error where a literal string was expected, but could not be matched. It is mainly raised by the lexy::dsl::lit rule.

The error happens at a given position() and with a given string(). The index() is the index into the string where matching failed; e.g. 0 if the input starts with a different character, 2 if the first two characters matched, etc. The character() is the string character at that index.

Expected keyword error
struct expected_keyword
{};

template <typename Reader>
class error<Reader, expected_keyword>
{
    using iterator = typename Reader::iterator;

public:
    constexpr explicit error(iterator begin, iterator end,
                             const typename Reader::char_type* str);

    constexpr iterator position() const noexcept;
    constexpr iterator begin() const noexcept;
    constexpr iterator end() const noexcept;

    constexpr auto string() const noexcept -> const typename Reader::char_type*;
};

A specialization of lexy::error is provided if Tag == lexy::expected_keyword. It represents the error where a keyword was expected, but could not be matched. It is raised by the lexy::dsl::keyword rule.

The error happens at a given position() and with a given string(), which is the text of the keyword. begin() and end() span the entire range of the identifier, which should have been string() but wasn’t.

Character class error
struct expected_char_class
{};

template <typename Reader>
class error<Reader, expected_char_class>
{
    using iterator = typename Reader::iterator;

public:
    constexpr explicit error(iterator position, const char* name) noexcept;

    constexpr iterator position() const noexcept;

    constexpr const char* name() const noexcept;
};

A specialization of lexy::error is provided if Tag == lexy::expected_char_class. It represents the error where any character from a given set of characters was expected, but could not be matched. It is raised by the lexy::dsl::ascii::* rules or lexy::dsl::newline, among others.

The error happens at the given position() and a symbolic name of the character class is returned by name(). By convention, the name has the format <group>.<name> or just <name>. Examples include newline, ASCII.alnum and digit.decimal.

Error context

lexy/error.hpp
namespace lexy
{
    template <typename Production, typename Input>
    class error_context
    {
        using iterator = typename input_reader<Input>::iterator;

    public:
        constexpr explicit error_context(const Input& input, iterator pos) noexcept;

        constexpr const Input& input() const noexcept;

        static consteval const char* production();

        constexpr iterator position() const noexcept;
    };
}

The class lexy::error_context<Production, Input> contains information about the context where the error occurred.

The entire input containing the error is returned by input().

The Production whose rule has raised the error is specified as a template parameter and its name is returned by production(). Like lexy::error<Reader, Tag>::message(), it returns the name of the type after removing the top-level namespace name. This can be overridden by defining Production::name() or Production::name.

The position() of the error context is the input position where the production started parsing.

Parse Tree

lexy/parse_tree.hpp
namespace lexy
{
    enum class traverse_event
    {
        enter,
        exit,
        leaf,
    };

    template <typename Reader, typename TokenKind = void,
              typename MemoryResource = /* default */>
    class parse_tree
    {
    public:
        class builder;

        constexpr parse_tree();
        constexpr explicit parse_tree(MemoryResource* resource);

        bool empty() const noexcept;
        void clear() noexcept;

        class node;
        class node_kind;

        node root() const noexcept; // requires: !empty()

        class traverse_range;

        traverse_range traverse(const node& n) const noexcept;
        traverse_range traverse() const noexcept;
    };

    template <typename Input, typename TokenKind = void,
              typename MemoryResource = /* default */>
    using parse_tree_for = lexy::parse_tree<input_reader<Input>, TokenKind, MemoryResource>;

    template <typename Production, typename TokenKind, typename MemoryResource, typename Input,
              typename ErrorCallback>
    auto parse_as_tree(parse_tree<input_reader<Input>, TokenKind, MemoryResource>& tree,
                       const Input& input, ErrorCallback error_callback)
      -> lexy::validate_result<ErrorCallback>;
}

The class lexy::parse_tree represents a lossless untyped syntax tree.

The function lexy::parse_as_tree() parses a Production on the given input and constructs a lossless parse tree from the result. All parse errors are passed to the error callback (see Error handling) and later returned. If a non-recoverable parse error happens, the tree will be cleared, otherwise it contains the (partial) parse tree of the input. It will discard any values produced by parsing the rules.

The resulting parse tree will contain a parent node for each production, and leaf node for every token. If a token is empty and has an unknown token kind, it will not be added to the parse tree. If a production inherits from lexy::transparent_production, no separate node will be created; instead all child nodes will be added to its parent. If a production inherits from lexy::token_production, tokens are merged when possible: if there are two or more tokens with the same kind directly after each other, only a single node spanning all of them will be added, as opposed to multiple nodes for each individual token.

Traversing the tree and concatenating the lexemes of all tokens will result in the original input.

Manual Tree Building

template <typename Reader, typename TokenKind, typename MemoryResource>
class parse_tree<Reader, TokenKind, MemoryResource>::builder
{
public:
    template <typename Production>
    explicit builder(parse_tree&& tree, Production production); // (1)
    template <typename Production>
    explicit builder(Production production); // (2)

    struct production_state;

    template <typename Production>
    production_state start_production(Production production); // (3)

    void token(token_kind<TokenKind> kind,
               typename Reader::iterator begin, typename Reader::iterator end); // (4)

    void finish_production(production_state&& s); // (5)
    void backtrack_production(production_state&& s); // (6)

    parse_tree finish() &&; // (7)
};
  1. Create a builder that will re-use the memory of the existing tree. Its root node will be associated with the given Production.

  2. Same as above, but does not re-use memory.

  3. Adds a production child node as last child of the current node and activates it. Returns a handle that remembers the previous current node.

  4. Adds a token node to the current node.

  5. Finishes with a child production and activates its parent.

  6. Cancels the currently activated node, by deallocating it and all children. Activates its parent node again.

  7. Returns the finished tree.

Tree Node

template <typename Reader, typename TokenKind, typename MemoryResource>
class parse_tree<Reader, TokenKind, MemoryResource>::node_kind
{
public:
    bool is_token() const noexcept;
    bool is_production() const noexcept;

    bool is_root() const noexcept;
    bool is_token_production() const noexcept;

    const char* name() const noexcept;

    friend bool operator==(node_kind lhs, node_kind rhs);
    friend bool operator!=(node_kind lhs, node_kind rhs);

    friend bool operator==(node_kind nk, token_kind<TokenKind> tk);
    friend bool operator==(token_kind<TokenKind> tk, node_kind nk);
    friend bool operator!=(node_kind nk, token_kind<TokenKind> tk);
    friend bool operator!=(token_kind<TokenKind> tk, node_kind nk);

    template <typename Production>
    friend bool operator==(node_kind nk, Production);
    template <typename Production>
    friend bool operator==(Production p, node_kind nk);
    template <typename Production>
    friend bool operator!=(node_kind nk, Production p);
    template <typename Production>
    friend bool operator!=(Production p, node_kind nk);
};

The class node_kind stores information about the kind of a node. Nodes are either associated with a Production or a token rule. The root node is always a Production node.

template <typename Reader, typename TokenKind, typename MemoryResource>
class parse_tree<Reader, TokenKind, MemoryResource>::node
{
public:
    void* address() const noexcept;

    node_kind kind() const noexcept;

    node parent() const noexcept;

    /* sized range */ children() const noexcept;

    /* range */ siblings() const noexcept;

    bool is_last_child() const noexcept;

    lexy::lexeme<Reader> lexeme() const noexcept;
    lexy::token<Reader, TokenKind> token() const noexcept;

    friend bool operator==(node lhs, node rhs) noexcept;
    friend bool operator!=(node lhs, node rhs) noexcept;
};

The class node is a reference to a node in the tree. Two nodes are equal if and only if they point to the same node in the same tree.

Parent Access
node parent() const noexcept;

Returns a reference to the parent node. For the root node, returns a reference to itself.

This operation is O(number of siblings).

Child Access
class children_range
{
public:
    class iterator; // value_type = node
    class sentinel;

    iterator begin() const noexcept;
    sentinel end() const noexcept;

    bool empty() const noexcept;
    std::size_t size() const noexcept;
};

children_range children() const noexcept;

Returns a range object that iterates over all children of the node. For a token node, this is always the empty range.

Sibling Access
class sibling_range
{
public:
    class iterator; // value_type = node

    iterator begin() const noexcept;
    iterator end() const noexcept;

    bool empty() const noexcept;
};

sibling_range siblings() const noexcept;

Returns a range object that iterates over all siblings of a node. It begins with the sibling immediately following the node and continues until it reaches the last child of the parent. Iteration then wraps around to the first child of the parent and ends at the original node. The original node itself is not included in the sibling range.

Token Access
lexy::lexeme<Reader> lexeme() const noexcept; // (1)
lexy::token<Reader, TokenKind> token() const noexcept; // (2)
  1. Returns the spelling of a token node. For a production node, returns the empty lexeme.

  2. Returns the spelling and token kind of a token node; must not be called on a production node.
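Since lexeme() returns an empty lexeme for production nodes, it is safe to call on every node, whereas token() has a precondition. A sketch with a hypothetical helper name (spelling is not part of lexy; it assumes any node whose lexeme() returns an iterable character range):

```cpp
#include <string>

// Hypothetical helper: materialize a node's spelling as a std::string.
// Safe on any node: production nodes simply yield an empty string.
template <typename Node>
std::string spelling(const Node& n)
{
    auto lex = n.lexeme();
    return std::string(lex.begin(), lex.end());
}
```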

Tree Traversal

enum class traverse_event
{
    enter,
    exit,
    leaf,
};
class traverse_range
{
public:
    class iterator; // value_type = { traverse_event, node }

    iterator begin() const noexcept;
    iterator end() const noexcept;

    bool empty() const noexcept;
};

traverse_range traverse(const node& n) const noexcept; // (1)
traverse_range traverse() const noexcept; // (2)
  1. Returns a range that traverses descendants of the given node.

  2. Returns a range that traverses the root node, or an empty range if the tree is empty.

The traverse_range iterates over a node, all its children, their children, and so on. Its value type is an (unspecified) pair whose first member is a lexy::traverse_event and whose second member is a node reference.

For a token node, the range contains only the original node with event leaf.

For a production node, the range begins with the original node and event enter. It then does a depth-first traversal of all descendants, beginning with the children of the node. When it reaches a token node, it produces it with event leaf. When it reaches a production node, it produces it with event enter, then all its descendants recursively, and then produces it again with event exit. After all descendants of the original node have been produced, the range finishes with the original node again and event exit.
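The event order can be illustrated with a self-contained recursive reference implementation over a hypothetical mock tree (everything here is an illustration, not lexy's API; the real traverse_range is iterative and involves no allocation or recursion):

```cpp
#include <string>
#include <utility>
#include <vector>

enum class traverse_event { enter, exit, leaf };

// Hypothetical mock: leaves model token nodes, inner nodes productions.
struct MockNode
{
    std::string           name;
    std::vector<MockNode> children;
};

// Recursive reference implementation of the documented event order.
void traverse(const MockNode& n,
              std::vector<std::pair<traverse_event, std::string>>& out)
{
    if (n.children.empty()) // token node: a single leaf event
    {
        out.push_back({traverse_event::leaf, n.name});
        return;
    }
    out.push_back({traverse_event::enter, n.name}); // production: enter,
    for (const auto& child : n.children)
        traverse(child, out);                       // all descendants,
    out.push_back({traverse_event::exit, n.name});  // then exit
}
```

For a production P with token children a and b, this yields (enter, P), (leaf, a), (leaf, b), (exit, P).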

Example

Prints a tree.

auto depth = 0;
for (auto [event, node] : tree.traverse())
{
    switch (event)
    {
    case lexy::traverse_event::enter:
        ++depth;
        indent(depth);
        print_node(node);
        break;
    case lexy::traverse_event::exit:
        --depth;
        break;

    case lexy::traverse_event::leaf:
        indent(depth);
        print_node(node);
        break;
    }
}
Note
Traversing a node just does pointer chasing. There is no allocation or recursion involved.

The rule DSL

This documentation has been moved here.

Glossary

Branch

A rule that has an associated condition and will only be taken if the condition matches. It is used to make decisions in the parsing algorithm.

Callback

A function object with a return_type member typedef.

Encoding

Set of pre-defined classes that define the text encoding of the input.

Error Callback

The callback used to report errors.

Grammar

An entry production and all productions referenced by it.

Input

Defines the input that will be parsed.

Production

Building-block of a grammar consisting of a rule and an optional callback that produces the parsed value.

Rule

Matches a specific input and then produces a value or an error.

Sink

A type with a sink() method that returns a function object which can be called multiple times.

Token

A rule that is an atomic building block of the input.