This is the reference documentation for lexy.

If anything in the documentation could be improved (and there is probably a lot), please raise an issue or — even better — create a PR. Thank you!

Inputs and Encodings

An Input defines the input that will be parsed by lexy. It has a corresponding Encoding that controls, among other things, its character type and whether certain rules are available. The Input itself is unchanging and it produces a Reader which remembers the current position of the input during parsing.

Encodings

lexy/encoding.hpp
namespace lexy
{
    struct default_encoding;
    struct ascii_encoding;
    struct utf8_encoding;
    struct utf16_encoding;
    struct utf32_encoding;
    struct byte_encoding;

    template <typename CharT>
    using deduce_encoding = /* see below */;

    enum class encoding_endianness;
}

An Encoding is a set of pre-defined policy classes that determine the text encoding of an input.

Each encoding has a primary character type, which is the character type of the input. It can also have a secondary character type, which the input should accept, but internally convert to the primary character type. For example, lexy::utf8_encoding's primary character type is char8_t, but it also accepts char.

The encoding also has an integer type, which can store either any valid character (code unit, to be precise) or a special EOF value, similar to std::char_traits. For some encodings, the integer type can be the same as the character type, because not every value is a valid code unit and an invalid one can be used to signal EOF. This allows optimizations.

Certain rules require a certain encoding. For example, lexy::dsl::code_point does not work with lexy::default_encoding, and lexy::dsl::encode requires lexy::byte_encoding.

The supported encodings

lexy::default_encoding

The encoding that will be used when no other encoding is specified. Its character type is char and it can work with any 8-bit encoding (ASCII, UTF-8, extended ASCII etc.). Only use this encoding if you don’t know the exact encoding of your input.

lexy::ascii_encoding

Assumes the input is valid ASCII. Its character type is char.

lexy::utf8_encoding

Assumes the input is valid UTF-8. Its character type is char8_t, but it also accepts char.

lexy::utf16_encoding

Assumes the input is valid UTF-16. Its character type is char16_t, but it also accepts wchar_t on Windows.

lexy::utf32_encoding

Assumes the input is valid UTF-32. Its character type is char32_t, but it also accepts wchar_t on Linux.

lexy::byte_encoding

Does not assume the input is text. Its character type is unsigned char, but it also accepts char and std::byte. Use this encoding if you’re not parsing text or if you’re parsing text consisting of multiple encodings.

Note
If you specify an encoding that does not match the input's actual encoding, e.g. you say it is UTF-8 but in reality it is some Windows code page, the library will handle it by generating parse errors. The worst that can happen is that you'll get an unexpected EOF error because the input contains the character that is used to signal EOF in the encoding.

Deducing encoding

If you don't specify an encoding for your input, lexy can sometimes deduce it by matching the character type to the primary character type. For example, a string of char8_t will be deduced to be lexy::utf8_encoding. If the character type is char, lexy will deduce lexy::default_encoding (unless that has been overridden by a build option).
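The deduction can be checked at compile time; a minimal sketch that merely restates the rules above (assuming <type_traits> and lexy/encoding.hpp are included):

static_assert(std::is_same_v<lexy::deduce_encoding<char8_t>, lexy::utf8_encoding>);
static_assert(std::is_same_v<lexy::deduce_encoding<char16_t>, lexy::utf16_encoding>);
// char deduces default_encoding, unless overridden by a build option.
static_assert(std::is_same_v<lexy::deduce_encoding<char>, lexy::default_encoding>);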

Encoding endianness

enum class encoding_endianness
{
    little,
    big,
    bom,
};

In memory, UTF-16 and UTF-32 come in two flavors: big and little endian. Which one is used can be specified with the encoding_endianness enumeration. This is only relevant when e.g. reading data from files.

little

The encoding is written using little endian. For single-byte encodings, this has no effect.

big

The encoding is written using big endian. For single-byte encodings, this has no effect.

bom

The endianness is determined using the byte-order mark (BOM) of the encoding. If no BOM is present, defaults to big endian as per Unicode recommendation. For UTF-8, this will skip the optional BOM, but has otherwise no effect. For non-Unicode encodings, this has no effect.

The pre-defined Inputs

Null input

lexy/input/null_input.hpp
namespace lexy
{
    template <typename Encoding = default_encoding>
    class null_input
    {
    public:
        constexpr Reader reader() const& noexcept;
    };

    template <typename Encoding = default_encoding>
    using null_lexeme = lexeme_for<null_input<Encoding>>;
    template <typename Tag, typename Encoding = default_encoding>
    using null_error = error_for<null_input<Encoding>, Tag>;
    template <typename Production, typename Encoding = default_encoding>
    using null_error_context = error_context<Production, null_input<Encoding>>;
}

The class lexy::null_input is an input that is always empty.
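Since the null input is always empty, only a production that accepts the empty string can match it; a minimal sketch, assuming the relevant headers and a hypothetical production whose rule is just lexy::dsl::eof:

struct empty_production
{
    static constexpr auto rule = lexy::dsl::eof;
};

bool matched = lexy::match<empty_production>(lexy::null_input<>{}); // true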

Range input

lexy/input/range_input.hpp
namespace lexy
{
    template <typename Encoding, typename Iterator, typename Sentinel = Iterator>
    class range_input
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;
        using iterator  = Iterator;

        constexpr range_input() noexcept;
        constexpr range_input(Iterator begin, Sentinel end) noexcept;

        constexpr iterator begin() const noexcept;
        constexpr iterator end() const noexcept;

        constexpr Reader reader() const& noexcept;
    };
}

The class lexy::range_input is an input that represents the range [begin, end). CTAD can be used to deduce the encoding from the value type of the iterator.

Note
The input is a lightweight view and does not own any data.
Tip
Use lexy::string_input instead if the range is contiguous.
Example

Using the range input to parse content from a list.

std::list<char8_t> list = /* ... */;

// Create the input, deducing the encoding.
auto input = lexy::range_input(list.begin(), list.end());

String input

lexy/input/string_input.hpp
namespace lexy
{
    template <typename Encoding = default_encoding>
    class string_input
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;
        using iterator  = const char_type*;

        constexpr string_input() noexcept;

        template <typename CharT>
        constexpr string_input(const CharT* begin, const CharT* end) noexcept;
        template <typename CharT>
        constexpr string_input(const CharT* data, std::size_t size) noexcept;

        template <typename View>
        constexpr explicit string_input(const View& view) noexcept;

        constexpr iterator begin() const noexcept;
        constexpr iterator end() const noexcept;

        constexpr Reader reader() const& noexcept;
    };

    template <typename Encoding, typename CharT>
    constexpr auto zstring_input(const CharT* str) noexcept;
    template <typename CharT>
    constexpr auto zstring_input(const CharT* str) noexcept;

    template <typename Encoding = default_encoding>
    using string_lexeme = lexeme_for<string_input<Encoding>>;
    template <typename Tag, typename Encoding = default_encoding>
    using string_error = error_for<string_input<Encoding>, Tag>;
    template <typename Production, typename Encoding = default_encoding>
    using string_error_context = error_context<Production, string_input<Encoding>>;
} // namespace lexy

The class lexy::string_input is an input that represents the string view defined by the constructors. CTAD can be used to deduce the encoding from the character type.

Note
The input is a lightweight view and does not own any data. Use lexy::buffer if you want an owning version.
Pointer constructor
template <typename CharT>
constexpr string_input(const CharT* begin, const CharT* end) noexcept; // (1)
template <typename CharT>
constexpr string_input(const CharT* data, std::size_t size) noexcept; // (2)
  1. The input is the contiguous range [begin, end).

  2. The input is the contiguous range [data, data + size).

CharT must be the primary or secondary character type of the encoding.

View constructor
template <typename View>
constexpr explicit string_input(const View& view) noexcept;

The input is given by the View, which requires a .data() and .size() member. The character type of the View must be the primary or secondary character type of the encoding.

Null-terminated string functions
template <typename Encoding, typename CharT>
constexpr auto zstring_input(const CharT* str) noexcept; // (1)
template <typename CharT>
constexpr auto zstring_input(const CharT* str) noexcept; // (2)
  1. Use the specified encoding.

  2. Deduce the encoding from the character type.

The input is given by the range [str, end), where end is a pointer to the first null character of the string. The return type is an appropriate lexy::string_input instantiation.

Example

Using the string input to parse content from a std::string.

std::string str = /* ... */;
auto input = lexy::string_input(str);

Using the string input to parse content from a string literal.

auto input = lexy::zstring_input(u"Hello World!");

Buffer Input

lexy/input/buffer.hpp
namespace lexy
{
template <typename Encoding       = default_encoding,
          typename MemoryResource = /* default resource */>
class buffer
{
public:
    using encoding  = Encoding;
    using char_type = typename encoding::char_type;

    class builder;

    constexpr buffer() noexcept;
    constexpr explicit buffer(MemoryResource* resource) noexcept;

    template <typename CharT>
    explicit buffer(const CharT* data, std::size_t size,
                    MemoryResource* resource = /* default resource */);
    template <typename CharT>
    explicit buffer(const CharT* begin, const CharT* end,
                    MemoryResource* resource = /* default resource */);

    template <typename View>
    explicit buffer(const View&     view,
                    MemoryResource* resource = /* default resource */);

    buffer(const buffer& other, MemoryResource* resource);

    const char_type* begin() const noexcept;
    const char_type* end() const noexcept;

    const char_type* data() const noexcept;

    bool empty() const noexcept;

    std::size_t size() const noexcept;
    std::size_t length() const noexcept;

    Reader reader() const& noexcept;
};

template <typename Encoding, encoding_endianness Endianness>
constexpr auto make_buffer_from_raw;

template <typename Encoding       = default_encoding,
          typename MemoryResource = /* default resource */>
using buffer_lexeme = lexeme_for<buffer<Encoding, MemoryResource>>;
template <typename Tag, typename Encoding = default_encoding,
          typename MemoryResource = /* default resource */>
using buffer_error = error_for<buffer<Encoding, MemoryResource>, Tag>;
template <typename Production, typename Encoding = default_encoding,
          typename MemoryResource = /* default resource */>
using buffer_error_context = error_context<Production, buffer<Encoding, MemoryResource>>;
}

The class lexy::buffer is an immutable, owning variant of lexy::string_input. The memory for the input is allocated using the MemoryResource, which is a class with the same interface as std::pmr::memory_resource. By default, it uses new and delete for the allocation, just like std::pmr::new_delete_resource. Construction of the buffer is just like lexy::string_input, except for the additional MemoryResource parameter. Once a memory resource has been specified, it will not propagate on assignment.

Tip
As the buffer owns the input, it can terminate it with the EOF character for encodings that have the same character and integer type. This eliminates the "is the reader at eof?"-branch during parsing.
Builder
class builder
{
public:
    explicit builder(std::size_t     size,
                     MemoryResource* resource = /* default resource */);

    char_type* data() const noexcept;
    std::size_t size() const noexcept;

    buffer finish() && noexcept;
};

The builder class separates the allocation and copying of the buffer data. This allows, for example, writing into the immutable buffer from a file. The constructor allocates memory for size characters, then data() gives a mutable pointer to that memory.
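For example, the builder can be used to read a file's contents directly into the immutable buffer; a minimal sketch, assuming file is an already-opened std::FILE* and size the number of code units to read:

// Allocate memory for size code units, write into it, then finish.
lexy::buffer<lexy::utf8_encoding>::builder builder(size);
std::fread(builder.data(), sizeof(char8_t), builder.size(), file);
auto buffer = std::move(builder).finish();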

Make buffer from raw memory
struct /* unspecified */
{
    auto operator()(const void* memory, std::size_t size) const;

    template <typename MemoryResource>
    auto operator()(const void* memory, std::size_t size, MemoryResource* resource) const;
};

template <typename Encoding, encoding_endianness Endianness>
constexpr auto make_buffer_from_raw = /* unspecified */;

lexy::make_buffer_from_raw is a function object that constructs a lexy::buffer of the specified encoding from raw memory. If necessary, it will take care of the endianness conversion as instructed by the lexy::encoding_endianness enumeration. Any BOM, if present, will not be part of the input.

Example

Using a buffer to parse content from a std::string using UTF-8. This enables the sentinel optimization.

std::string str = /* ... */;
auto input = lexy::buffer<lexy::utf8_encoding>(str);

Using a buffer to parse a memory-mapped file containing little endian UTF-16.

auto ptr = mmap(/* ... */);

constexpr auto make_utf16_little
  = lexy::make_buffer_from_raw<lexy::utf16_encoding, lexy::encoding_endianness::little>;
auto input = make_utf16_little(ptr, length);

File Input

lexy/input/file.hpp
namespace lexy
{
    enum class file_error
    {
        os_error,
        file_not_found,
        permission_denied,
    };

    template <typename Encoding       = default_encoding,
              typename MemoryResource = /* default resource */>
    class read_file_result
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;

        explicit operator bool() const noexcept;

        file_error error() const noexcept;

        const char_type* data() const noexcept;
        std::size_t size() const noexcept;

        Reader reader() const& noexcept;
    };

    template <typename Encoding          = default_encoding,
              encoding_endianness Endian = encoding_endianness::bom,
              typename MemoryResource>
    auto read_file(const char*     path,
                   MemoryResource* resource = /* default resource */)
        -> read_file_result<Encoding, MemoryResource>;
}

The function lexy::read_file() reads the file at the specified path using the specified encoding and endianness. It returns a lexy::read_file_result. If reading failed, the operator bool will return false and .error() will return the error code. If reading was successful, the operator bool will return true and you can call .data()/.size() to get the file contents or treat it as an Input.

Example

Reading UTF-16 from a file with a BOM.

auto result = lexy::read_file<lexy::utf16_encoding>("input.txt");
if (!result)
    throw my_file_read_error_exception(result.error()); // (1)

/* ... */ // (2)
  1. Throw an exception giving it the lexy::file_error.

  2. Now you can use result as an Input or access the file contents.

Shell Input

lexy/input/shell.hpp
namespace lexy
{
    template <typename Encoding = default_encoding>
    struct default_prompt;

    template <typename Prompt = default_prompt<>>
    class shell
    {
    public:
        using encoding    = typename Prompt::encoding;
        using char_type   = typename encoding::char_type;
        using prompt_type = Prompt;

        shell();
        explicit shell(Prompt prompt);

        bool is_open() const noexcept;

        Input prompt_for_input();

        class writer;
        template <typename... Args>
        writer write_message(Args&&... args);

        Prompt& get_prompt() noexcept;
        const Prompt& get_prompt() const noexcept;
    };

    template <typename Prompt = default_prompt<>>
    using shell_lexeme = /* unspecified */;
    template <typename Tag, typename Prompt = default_prompt<>>
    using shell_error = /* unspecified */;
    template <typename Production, typename Prompt = default_prompt<>>
    using shell_error_context = /* unspecified */;
}

The class lexy::shell creates an interactive shell to ask for user input and write messages out. The exact behavior is controlled by the Prompt. By default, it uses lexy::default_prompt which reads from stdin and writes to stdout.

Warning
The interface of a Prompt is currently experimental. Refer to lexy::default_prompt if you want to write your own.
State
bool is_open() const noexcept;

A shell is initially open and can receive input, but the user can close the shell. For lexy::default_prompt, the shell is closed if the user enters EOF, e.g. by pressing Ctrl+D under Linux.

is_open() returns false if the user has closed it, and true otherwise.

Input
Input prompt_for_input();

A shell object is not itself an Input, but it can be used to create one. Calling prompt_for_input() will ask the user to enter some input, and then return an unspecified Input type that refers to that input. If parsing reaches the end of the input and the shell is still open, it will automatically ask the user for continuation input that will be appended to the current input. Once parsing of the input is done, prompt_for_input() can be called again to request new input from the user.

Warning
Calling prompt_for_input() again will invalidate all memory used by the previous input.

The lexy::default_prompt asks for input by displaying "> " and reading an entire line from stdin. If continuation input is requested, it displays ". " and reads another line.

Output
class writer
{
public:
    // non-copyable

    template <typename CharT>
    writer& operator()(const CharT* str, std::size_t length);
    template <typename CharT>
    writer& operator()(const CharT* str);
    template <typename CharT>
    writer& operator()(CharT c);

    writer& operator()(lexy::lexeme_for</* input type */> lexeme);
};

template <typename... Args>
writer write_message(Args&&... args);

Calling write_message() will prepare the prompt for displaying a message and returns a writer function object that can be used to specify the contents of the message. The arguments of write_message() are forwarded to the prompt and can be used to distinguish between e.g. normal and error messages. The writer can be invoked multiple times to give different parts of the message; the entire message is written out when the writer is destroyed. A writer can only write messages whose character type is the primary or secondary character type of the encoding.

lexy::default_prompt does not require any message arguments; it simply writes the message to stdout, appending a newline at the end.

Example

An interactive REPL.

lexy::shell<> shell;
while (shell.is_open())
{
    auto input = shell.prompt_for_input(); // (1)
    auto result = lexy::parse<expression>(input, /* ... */); // (2)
    if (result)
        shell.write_message()(result.value()); // (3)
}
  1. Ask the user to enter more input.

  2. Parse the input, requesting continuation input if necessary.

  3. Write the result.

For a full example, see examples/shell.cpp.

Command-line argument Input

lexy/input/argv_input.hpp
namespace lexy
{
    class argv_sentinel;
    class argv_iterator;

    constexpr argv_iterator argv_begin(int argc, char* argv[]) noexcept;
    constexpr argv_iterator argv_end(int argc, char* argv[]) noexcept;

    template <typename Encoding = default_encoding>
    class argv_input
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;
        using iterator  = argv_iterator;

        constexpr argv_input() = default;
        constexpr argv_input(argv_iterator begin, argv_iterator end) noexcept;
        constexpr argv_input(int argc, char* argv[]) noexcept;

        constexpr Reader reader() const& noexcept;
    };

    template <typename Encoding = default_encoding>
    using argv_lexeme = lexeme_for<argv_input<Encoding>>;
    template <typename Tag, typename Encoding = default_encoding>
    using argv_error = error_for<argv_input<Encoding>, Tag>;
    template <typename Production, typename Encoding = default_encoding>
    using argv_error_context = error_context<Production, argv_input<Encoding>>;
}

The class lexy::argv_input is an input that uses the command-line arguments passed to main(). It excludes argv[0], which is the executable name, and includes \0 as a separator between command line arguments.

Note
The input is a lightweight view and does not own any data.
Command-line iterators
class argv_sentinel;
class argv_iterator;

constexpr argv_iterator argv_begin(int argc, char* argv[]) noexcept;
constexpr argv_iterator argv_end(int argc, char* argv[]) noexcept;

The lexy::argv_iterator is a bidirectional iterator iterating over the command-line arguments excluding the initial argument which is the executable name. It can be created using argv_begin() and argv_end().

Example

Use the command line arguments as input.

int main(int argc, char* argv[])
{
    auto input = lexy::argv_input(argc, argv);
    /* ... */
}

If the program is invoked with ./a.out a 123 b, the input will be a\0123\0b.

Lexemes and Tokens

A lexeme is the part of the input matched by a token rule. It is represented by the class lexy::lexeme. A token is the combination of an identifier that defines the rule it matches and the matched lexeme.

Note
When talking about tokens in the context of rules, it is usually short for token rule, i.e. the rule that defines what is matched, not the concrete realization.

Code point

lexy/encoding.hpp
namespace lexy
{
    class code_point
    {
    public:
        constexpr code_point() noexcept;
        constexpr explicit code_point(char32_t value) noexcept;

        constexpr char32_t value() const noexcept;

        constexpr bool is_valid() const noexcept;
        constexpr bool is_surrogate() const noexcept;
        constexpr bool is_scalar() const noexcept;

        constexpr bool is_ascii() const noexcept;
        constexpr bool is_bmp() const noexcept;

        friend constexpr bool operator==(code_point lhs, code_point rhs) noexcept;
        friend constexpr bool operator!=(code_point lhs, code_point rhs) noexcept;
    };
}

The class lexy::code_point represents a single code point from the input. It is merely a wrapper over a char32_t that contains the numerical code.

Constructors
constexpr code_point() noexcept; // (1)
constexpr explicit code_point(char32_t value) noexcept; // (2)
  1. Creates an invalid code point.

  2. Creates the specified code point. The value will be returned from value() unchanged.

Validity
constexpr bool is_valid() const noexcept; // (1)
constexpr bool is_surrogate() const noexcept; // (2)
constexpr bool is_scalar() const noexcept; // (3)
  1. Returns true if the code point is less than or equal to 0x10'FFFF, false otherwise.

  2. Returns true if the code point is a UTF-16 surrogate, false otherwise.

  3. Returns true if the code point is valid and not a surrogate, false otherwise.

Category
constexpr bool is_ascii() const noexcept; // (1)
constexpr bool is_bmp() const noexcept; // (2)
  1. Returns true if the code point is ASCII (7-bit value), false otherwise.

  2. Returns true if the code point is in the Unicode BMP (16-bit value), false otherwise.

Lexeme

lexy/lexeme.hpp
namespace lexy
{
    template <typename Reader>
    class lexeme
    {
    public:
        using encoding  = typename Reader::encoding;
        using char_type = typename encoding::char_type;
        using iterator  = typename Reader::iterator;

        constexpr lexeme() noexcept;
        constexpr lexeme(iterator begin, iterator end) noexcept;

        constexpr explicit lexeme(const Reader& reader, iterator begin) noexcept
        : lexeme(begin, reader.cur())
        {}

        constexpr bool empty() const noexcept;

        constexpr iterator begin() const noexcept;
        constexpr iterator end() const noexcept;

        // Only if the iterator is a pointer.
        constexpr const char_type* data() const noexcept;

        // Only if the iterator has `operator-`.
        constexpr std::size_t size() const noexcept;

        // Only if the iterator has `operator[]`.
        constexpr char_type operator[](std::size_t idx) const noexcept;
    };

    template <typename Input>
    using lexeme_for = lexeme<input_reader<Input>>;
}

The class lexy::lexeme represents a sub-range of the input. For convenience, most inputs also provide typedefs that can be used instead of lexy::lexeme_for.
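For example, a callback can copy a lexeme into an owning string using only the interface above; a minimal sketch, assuming an input whose character type is char:

// Copies the matched sub-range of the input into a std::string.
template <typename Reader>
std::string to_string(lexy::lexeme<Reader> lex)
{
    return std::string(lex.begin(), lex.end());
}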

Token Kind

lexy/token.hpp
namespace lexy
{
    enum predefined_token_kind
    {
        unknown_token_kind,
    };

    template <typename TokenKind = void>
    class token_kind
    {
    public:
        constexpr token_kind() noexcept;
        constexpr token_kind(predefined_token_kind value) noexcept;
        constexpr token_kind(TokenKind value) noexcept;
        template <typename TokenRule>
        constexpr token_kind(TokenRule token_rule) noexcept;

        constexpr explicit operator bool() const noexcept;

        constexpr bool is_predefined() const noexcept;

        constexpr const char* name() const noexcept;

        constexpr TokenKind get() const noexcept;

        static constexpr std::uint_least16_t to_raw(token_kind<TokenKind> kind) noexcept;
        static constexpr token_kind<TokenKind> from_raw(std::uint_least16_t kind) noexcept;

        friend constexpr bool operator==(token_kind lhs, token_kind rhs) noexcept;
        friend constexpr bool operator!=(token_kind lhs, token_kind rhs) noexcept;
    };
}

The class lexy::token_kind identifies a token rule. It is merely a wrapper over the specified TokenKind, which is an enum. If TokenKind is void, it is a wrapper over an int.

A token kind can represent any of the lexy::predefined_token_kind as well as any values specified in the given enum, or any integer value. Predefined token kinds are mapped to spare enum values.

Constructors
constexpr token_kind() noexcept;                         // (1)

constexpr token_kind(predefined_token_kind value) noexcept; // (2)

constexpr token_kind(TokenKind value) noexcept; // (3)

template <typename TokenRule>
constexpr token_kind(TokenRule token_rule) noexcept; // (4)
  1. Creates an unknown token kind.

  2. Creates a predefined token kind.

  3. Creates the specified token kind; if TokenKind is void, the constructor takes an int instead.

  4. Creates a token kind from a token rule.

The token kind of a rule is computed as follows:

  • If the token rule was associated with a token kind by calling .kind<value>, the resulting kind is the specified value.

  • Otherwise, if the map found at lexy::token_kind_map_for<TokenKind> contains a mapping for the TokenRule, it uses that.

  • Otherwise, the token kind is unknown.

Access
constexpr explicit operator bool() const noexcept; // (1)

constexpr bool is_predefined() const noexcept; // (2)

constexpr const char* name() const noexcept; // (3)

constexpr TokenKind get() const noexcept; // (4)
  1. Returns true if the token kind is not unknown, false otherwise.

  2. Returns true if the token kind is one of the lexy::predefined_token_kinds, false otherwise.

  3. Returns the name of the token kind.

  4. Returns the underlying value of the token kind, which is some other value for predefined tokens.

The name of a token kind is determined as follows:

  • If the TokenKind is void, the name is "token" for all token kinds.

  • Otherwise, if the token kind is unknown, the name is "token".

  • Otherwise, if the token kind is predefined, the name describes the predefined token.

  • Otherwise, if ADL finds an overload const char* token_kind_name(TokenKind kind), the name is the result of invoking it, as shown in the sketch below.

  • Otherwise, the name is "token" for all tokens.
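A minimal sketch of such an ADL overload, assuming the my_token_kind enumeration defined in the example of the next section:

// Found via ADL when lexy needs the name of a my_token_kind.
constexpr const char* token_kind_name(my_token_kind kind)
{
    switch (kind)
    {
    case my_token_kind::period:
        return "period";
    default:
        return "token";
    }
}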

Token Kind Map

lexy/token.hpp
namespace lexy
{
    class Token-Kind-Map
    {
    public:
        template <auto TokenKind, typename TokenRule>
        consteval Token-Kind-Map map(TokenRule) const;
    };

    inline constexpr auto token_kind_map = Token-Kind-Map{};

    template <typename TokenKind>
    constexpr auto token_kind_map_for = token_kind_map;
}

There are two ways to associate a token kind with a token rule: either by calling .kind<Kind> on the token rule and giving it a value there, or by specializing lexy::token_kind_map_for for your TokenKind enumeration.

Example
enum class my_token_kind // (1)
{
    code_point,
    period,
    open_paren,
    close_paren,
};

// (2)
template <>
constexpr auto lexy::token_kind_map_for<my_token_kind>
    = lexy::token_kind_map.map<my_token_kind::code_point>(lexy::dsl::code_point)
                          .map<my_token_kind::period>(lexy::dsl::period)
                          .map<my_token_kind::open_paren>(lexy::dsl::parenthesized.open())
                          .map<my_token_kind::close_paren>(lexy::dsl::parenthesized.close());
  1. Define your TokenKind enumeration.

  2. Define the mapping of token rules to enumeration values.

Note
The token kind is only relevant when lexy::parse_as_tree() is used to parse the input.

Token

lexy/token.hpp
namespace lexy
{
    template <typename Reader, typename TokenKind = void>
    class token
    {
    public:
        explicit constexpr token(token_kind<TokenKind> kind, lexy::lexeme<Reader> lex) noexcept;
        explicit constexpr token(token_kind<TokenKind> kind,
                                 typename Reader::iterator begin,
                                 typename Reader::iterator end) noexcept;

        constexpr token_kind<TokenKind> kind() const noexcept;
        constexpr auto lexeme() const noexcept;

        constexpr auto name() const noexcept { return kind().name(); }

        constexpr auto position() const noexcept -> typename Reader::iterator
        {
            return lexeme().begin();
        }
    };

    template <typename Input, typename TokenKind = void>
    using token_for = token<input_reader<Input>, TokenKind>;
}

The class lexy::token just combines a lexy::token_kind and a lexy::lexeme.

Writing custom Inputs

The Input concept
class Input
{
public:
    Reader reader() const&;
};

An Input is just a class with a reader() member function that returns a Reader to the beginning of the input. The type alias lexy::input_reader<Input> returns the type of the corresponding reader.

Warning
The interface of a Reader is currently experimental. Refer to the comments in lexy/input/base.hpp.

Matching, parsing and validating

The Production concept
struct Production
{
    static constexpr auto rule = /* ... */;
    static constexpr auto whitespace = /* ... */; // optional

    static constexpr auto value = /* ... */; // optional
};

A Production is a type containing a rule and optional callbacks that produce the value. A grammar contains an entry production, where parsing begins, and all productions referenced by it.

Tip
It is recommended to put all productions of a grammar into a separate namespace.

By passing the entry production of the grammar to lexy::match(), lexy::parse(), or lexy::validate(), the production is parsed.

Matching

lexy/match.hpp
namespace lexy
{
    template <typename Production, typename Input>
    constexpr bool match(const Input& input);
}

The function lexy::match() matches the Production on the given input. If the production accepts the input, it returns true; otherwise, it returns false. It will discard any values produced and does not give detailed information about why the production did not accept the input.

Note
A production does not necessarily need to consume the entire input for it to match. Add lexy::dsl::eof to the end if the production should consume the entire input.
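A minimal sketch, assuming the grammar and input headers are included; greeting is a hypothetical production:

struct greeting
{
    // Matches exactly "Hello" followed by the end of the input.
    static constexpr auto rule = LEXY_LIT("Hello") + lexy::dsl::eof;
};

bool matched = lexy::match<greeting>(lexy::zstring_input("Hello")); // true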

Validating

lexy/validate.hpp
namespace lexy
{
    template <typename ErrorCallback>
    class validate_result
    {
    public:
        using error_callback = ErrorCallback;
        using error_type     = /* return type of the sink */;

        constexpr explicit operator bool() const noexcept
        {
            return is_success();
        }

        constexpr bool is_success() const noexcept; // (1)
        constexpr bool is_error() const noexcept; // (2)
        constexpr bool is_recovered_error() const noexcept; // (3)
        constexpr bool is_fatal_error() const noexcept; // (4)

        constexpr std::size_t error_count() const noexcept;

        constexpr const error_type& errors() const& noexcept;
        constexpr error_type&& errors() && noexcept;
    };

    template <typename Production, typename Input, typename ErrorCallback>
    constexpr auto validate(const Input& input, ErrorCallback error_callback)
        -> validate_result<ErrorCallback>;
}
  1. Returns true if no error occurred during validation.

  2. Returns true if at least one error occurred during validation.

  3. Returns true if at least one error occurred during validation, but parsing could recover after all of them.

  4. Returns true if at least one error occurred during validation and parsing had to cancel.

The function lexy::validate() validates that the Production matches on the given input. If a parse error occurs, it will invoke the error callback (see Error handling); all errors are then returned. It will discard any values produced.

Note
A production does not necessarily need to consume the entire input for it to match. Add lexy::dsl::eof to the end if the production should consume the entire input.
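A minimal sketch, assuming the hypothetical greeting production from above; lexy::noop serves as the error callback, so the error values are discarded but the errors are still counted:

auto result = lexy::validate<greeting>(lexy::zstring_input("Hi"), lexy::noop);
if (result.is_error())
    std::printf("%zu error(s)\n", result.error_count()); // assumes <cstdio>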

Parsing

lexy/parse.hpp
namespace lexy
{
    template <typename T, typename ErrorCallback>
    class parse_result
    {
    public:
        using value_type     = T;
        using error_callback = ErrorCallback;
        using error_type     = /* return type of the sink */;

        //=== status ===//
        constexpr explicit operator bool() const noexcept
        {
            return is_success();
        }

        constexpr bool is_success() const noexcept; // (1)
        constexpr bool is_error() const noexcept; // (2)
        constexpr bool is_recovered_error() const noexcept; // (3)
        constexpr bool is_fatal_error() const noexcept; // (4)

        //=== value ===//
        constexpr bool has_value() const noexcept; // (5)

        constexpr const T& value() const& noexcept;
        constexpr T&& value() && noexcept;

        //=== error ===//
        constexpr std::size_t error_count() const noexcept;

        constexpr const error_type& errors() const& noexcept;
        constexpr error_type&& errors() && noexcept;
    };

    template <typename Production, typename Input, typename ErrorCallback>
    constexpr auto parse(const Input& input, ErrorCallback error_callback)
        -> parse_result</* see below */, ErrorCallback>;

    template <typename Production, typename Input, typename State, typename ErrorCallback>
    constexpr auto parse(const Input& input, State&& state, ErrorCallback error_callback)
        -> parse_result</* see below */, ErrorCallback>;
}
  1. Returns true if no error occurred during parsing.

  2. Returns true if at least one error occurred during parsing.

  3. Returns true if at least one error occurred during parsing, but parsing could recover after all of them.

  4. Returns true if at least one error occurred during parsing and parsing had to cancel.

  5. Returns true if parsing could produce a value. This can only happen if there was no fatal error.

The function lexy::parse() parses the Production on the given input. The return value is a lexy::parse_result<T, ErrorCallback>, where T is the return type of the Production::value or Production::list callback. If the production accepts the input or there are only recoverable errors, invokes Production::value (see below) with the produced values and returns their result. Invokes the error callback for each parse error (see Error handling) and collects the errors.

The return value on success is determined using Production::value depending on three cases:

  • Production::rule does not contain a list. Then all arguments will be forwarded to Production::value as a callback whose result is returned. The Production::value callback must be present.

  • Production::rule contains a list and no other rule produces a value. Then Production::value will be used as sink for the list values. If Production::value is also a callback that accepts the result of the sink as argument, it will be invoked with the sink result and the processed result returned. Otherwise, the result of the sink is the final result.

  • Production::rule contains a list and other rules produce values as well. Then Production::value will be used as sink for the list values. The sink result will be added to the other values in order and everything forwarded to Production::value as a callback. The callback result is then returned.

Note
The callback operator>> is useful for case 3 to create a combined callback and sink with the desired behavior.

The second overload of lexy::parse() allows passing an arbitrary state argument. This will be made available to the lexy::dsl::parse_state and lexy::dsl::parse_state_member rules which can forward it to the Production::value callback.
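A minimal sketch of the state overload, assuming a hypothetical my_state type and a grammar that reads it via lexy::dsl::parse_state:

my_state state;
// The state is made available to lexy::dsl::parse_state and
// lexy::dsl::parse_state_member while my_production is parsed.
auto result = lexy::parse<my_production>(input, state, lexy::noop);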

Callbacks

The Callback concept
struct Callback
{
    using return_type = /* ... */;

    return_type operator()(Args&&... args) const;
};

struct Sink
{
    class _sink // exposition only
    {
    public:
        using return_type = /* ... */;

        void operator()(Args&&... args);

        return_type&& finish() &&;
    };

    _sink sink() const;
};

A Callback is a function object whose return type is specified by a member typedef. A Sink is a type with a sink() member function that returns a callback. The callback can be invoked multiple times, and the final value is returned by calling .finish().

Callbacks are used by lexy to compute the parse result and handle error values. They can either be written manually by implementing the above concepts, or composed from the pre-defined callbacks and sinks.

Callback adapters

lexy/callback.hpp
namespace lexy
{
    template <typename ReturnType = void, typename... Fns>
    constexpr Callback callback(Fns&&... fns);
}

Creates a callback with the given ReturnType from multiple functions. When calling the resulting callback, it will use overload resolution to determine the correct function to call. It supports function pointers, lambdas, and member function or data pointers.
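A minimal sketch: a callback whose overloads accept either an int or a digit character (the name to_int is illustrative):

constexpr auto to_int = lexy::callback<int>(
    [](int value) { return value; }, // already an int: pass it through
    [](char c) { return c - '0'; }); // a digit character: convert it

// to_int(42) == 42, to_int('7') == 7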

lexy/callback.hpp
namespace lexy
{
    template <typename T, typename... Fns>
    constexpr Sink sink(Fns&&... fns);
}

Creates a sink constructing the given T using the given functions. The sink will value-construct the T and then call one of the functions selected by overload resolution, passing it a reference to the resulting object as first argument. It supports function pointers, lambdas, and member function or data pointers.

Example

Creating a sink that will add all values.

constexpr auto adder = lexy::sink<int>([](int& cur, int arg) { cur += arg; }); // (1)

auto s = adder.sink(); // (2)
s(1);
s(2);
s(3);
auto result = std::move(s).finish();
assert(result == 1 + 2 + 3);
  1. Define the sink.

  2. Use it.

lexy/callback.hpp
namespace lexy
{
template <typename Callback>
constexpr Sink collect(Callback&& callback);

template <typename Container, typename Callback>
constexpr Sink collect(Callback&& callback);
}

Turns a callback into a sink by invoking it multiple times and collecting all the results in a container.

The first version requires that the callback returns void; its sink callback forwards all arguments and increases a count. The final count as a std::size_t is then returned by finish().

The second version requires that the callback returns non-void. Its sink callback creates a default constructed Container. It then invokes the callback multiple times and adds the result to the container using .push_back(). The final container is then returned.
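A minimal sketch of the second version, assuming a hypothetical my_error type and <vector> included; lexy::construct (documented below) serves as the non-void callback:

// Each invocation constructs one my_error; the sink appends it to the
// vector via .push_back() and finish() returns the full vector.
constexpr auto error_list
    = lexy::collect<std::vector<my_error>>(lexy::construct<my_error>);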

Note
collect() is useful for the error callback to handle multiple errors.

Callback composition

lexy/callback.hpp
namespace lexy
{
    template <typename First, typename Second>
    constexpr auto operator|(First first, Second second); // (1)

    template <typename Sink, typename Callback>
    constexpr auto operator>>(Sink sink, Callback callback); // (2)

}
  1. The result of first | second, where first and second are both callbacks, is another callback that first invokes first and then passes the result to second. The result cannot be used as sink.

  2. The result of sink >> callback is both a sink and a callback. As a sink, it behaves just like sink. As a callback, it takes the result of the sink as well as any other arguments and forwards them to callback.

Example

Build a string, then get its length.

constexpr auto make_string = lexy::callback<std::string>([](const char* str) { return str; });
constexpr auto string_length = lexy::callback<std::size_t>(&std::string::size);

constexpr auto inefficient_strlen = make_string | string_length; // (1)

assert(inefficient_strlen("1234") == 4); // (2)
  1. Compose the two callbacks.

  2. Use it.

Note
The callback operator>> is used for productions whose rule contain both a list and produce other values. The list will be constructed using the sink and then everything will be passed to callback.

The no-op callback

lexy/callback.hpp
namespace lexy
{
    constexpr auto noop = /* unspecified */;
}

lexy::noop is both a callback and a sink. It ignores all arguments passed to it and its return type is void.

Example

Parse the production, but do nothing on errors.

auto result = lexy::parse<my_production>(my_input, lexy::noop); // (1)
if (!result)
    throw my_parse_error(); // (2)
auto value = result.value(); // (3)
  1. Parse my_production. If an error occurs, just return a parse result in the error state.

  2. lexy::noop does not make errors disappear, they still need to be handled.

  3. Do something with the parsed value.

Constructing objects

lexy/callback.hpp
namespace lexy
{
    template <typename T>
    constexpr auto forward = /* unspecified */;

    template <typename T>
    constexpr auto construct = /* unspecified */;

    template <typename T, typename PtrT = T*>
    constexpr auto new_ = /* unspecified */;
}

The callback lexy::forward<T> can accept either a const T& or a T&& and forwards it. It does not have a sink.

The callback lexy::construct<T> constructs a T by forwarding all arguments to a suitable constructor. If the type does not have a constructor, it forwards all arguments using brace initialization. It does not have a sink.

The callback lexy::new_<T, PtrT> works just like lexy::construct<T>, but it constructs the object on the heap by calling new. The resulting pointer is then converted to the specified PtrT. It does not have a sink.

Example

A callback that creates a std::unique_ptr<std::string>.

constexpr auto make_unique_str = lexy::new_<std::string, std::unique_ptr<std::string>>; // (1)

constexpr auto make_unique_str2 = lexy::new_<std::string> | lexy::construct<std::unique_ptr<std::string>>; // (2)
  1. Specify a suitable PtrT.

  2. Equivalent version that uses composition and lexy::construct instead.

Constructing lists

lexy/callback.hpp
namespace lexy
{
    template <typename T>
    constexpr auto as_list = /* unspecified */;

    template <typename T>
    constexpr auto as_collection = /* unspecified */;
}

lexy::as_list<T> is both a callback and a sink. As a callback, it forwards all arguments to the std::initializer_list constructor of T and returns the result. As a sink, it first default constructs a T and then repeatedly calls push_back() for single arguments and emplace_back() otherwise.

lexy::as_collection<T> is like lexy::as_list<T>, but instead of calling push_back() and emplace_back(), it calls insert() and emplace().

Example

Create a std::vector<int> and std::set<int>.

constexpr auto as_int_vector = lexy::as_list<std::vector<int>>;
constexpr auto as_int_set = lexy::as_collection<std::set<int>>;

Constructing strings

lexy/callback.hpp
namespace lexy
{
    template <typename String, typename Encoding = /* see below */>
    constexpr auto as_string = /* unspecified */;
}

lexy::as_string<String, Encoding> is both a callback and a sink. It constructs a String object in the given Encoding. If no encoding is specified, it deduces one from the character type of the string.

As a callback, it constructs the string directly from the given argument. Then it accepts:

  • A reference to an existing String object, which is forwarded as the result.

  • A const CharT* and a std::size_t, where CharT is a compatible character type. The two arguments are forwarded to a String constructor.

  • A lexy::lexeme<Reader> lex, where Reader::iterator is a pointer. The character type of the reader must be compatible with the encoding. It constructs the string using String(lex.data(), lex.size()) (potentially casting the pointer type if necessary).

  • A lexy::lexeme<Reader> lex, where Reader::iterator is not a pointer. It constructs the string using String(lex.begin(), lex.end()). The range constructor has to take care of any necessary character conversion.

  • A lexy::code_point. It is encoded into a local character array according to the specified Encoding. Then the string is constructed using a two-argument (const CharT*, std::size_t) constructor.

As a sink, it first default constructs the string. Then it will repeatedly append the following arguments:

  • A single CharT, which is convertible to the string's character type. It is appended by calling .push_back().

  • A reference to an existing String object, which is appended by calling .append().

  • A const CharT* and a std::size_t, where CharT is a compatible character type. The two arguments are forwarded to .append().

  • A lexy::lexeme<Reader> lex, where Reader::iterator is a pointer. The character type of the reader must be compatible with the encoding. It is appended using .append(lex.data(), lex.size()) (potentially casting the pointer type if necessary).

  • A lexy::lexeme<Reader> lex, where Reader::iterator is not a pointer. It constructs the string using .append(lex.begin(), lex.end()). The range append function has to take care of any necessary character conversion.

  • A lexy::code_point. It is encoded into a local character array according to the specified Encoding. Then it is appended to the string using a two-argument .append(const CharT*, std::size_t) overload.

Example
constexpr auto as_utf16_string = lexy::as_string<std::u16string>;                   // (1)
constexpr auto as_utf8_string  = lexy::as_string<std::string, lexy::utf8_encoding>; // (2)
  1. Constructs a std::u16string, deducing the encoding as UTF-16.

  2. Constructs a std::string, specifying the encoding as UTF-8.

Rule-specific callbacks

lexy/callback.hpp
namespace lexy
{
    template <typename T>
    constexpr auto as_aggregate = /* unspecified */;

    template <typename T>
    constexpr auto as_integer = /* unspecified */;
}

The callback and sink lexy::as_aggregate<T> is only used together with the lexy::dsl::member rule and documented there.

The callback lexy::as_integer<T> constructs an integer type T and has two overloads:

template <typename Integer>
T operator()(const Integer& value) const; // (1)

template <typename Integer>
T operator()(int sign, const Integer& value) const; // (2)
  1. Returns T(value).

  2. Returns T(sign * value).

The second overload is meant to be used together with lexy::dsl::sign and related rules.
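A minimal sketch of the second overload in use, assuming a rule built from lexy::dsl::sign and lexy::dsl::integer:

struct signed_number
{
    static constexpr auto rule
        = lexy::dsl::sign + lexy::dsl::integer<int>(lexy::dsl::digits<>);
    // Receives the sign and the integer, e.g. (-1, 42) produces -42.
    static constexpr auto value = lexy::as_integer<int>;
};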

Error handling

Parsing errors are reported by constructing a lexy::error object and passing it to the error callback of lexy::parse and lexy::validate together with the lexy::error_context. The error callback must either be a sink, in which case it can return an arbitrary type that represents a collection of all the errors, or a non-sink callback that returns void, in which case it will be passed to lexy::collect() to turn it into a sink.

The error_type of lexy::validate_result and lexy::parse_result will be the return type of the sink. For a void-returning non-sink callback, it will be std::size_t, which is the result of lexy::collect().

Example
A void-returning error callback that is not a sink.
class ErrorCallbackVoid
{
public:
    using return_type = void;

    template <typename Production, typename Input, typename Tag>
    void operator()(const lexy::error_context<Production, Input>& context,
                           const lexy::error<lexy::input_reader<Input>, Tag>& error) const;
};
A non-void-returning error callback that is a sink.
class ErrorCallbackSink
{
public:
    class Sink
    {
    public:
        using return_type = /* ... */;

        template <typename Production, typename Input, typename Tag>
        void operator()(const lexy::error_context<Production, Input>& context,
                               const lexy::error<lexy::input_reader<Input>, Tag>& error) const;

        return_type finish() &&;
    };

    Sink sink();
};

Of course, overloading can be used to differentiate between various error types and contexts.

Error types

lexy/error.hpp
namespace lexy
{
    template <typename Reader, typename Tag>
    class error;

    struct expected_literal {};
    template <typename Reader>
    class error<Reader, expected_literal>;

    struct expected_char_class {};
    template <typename Reader>
    class error<Reader, expected_char_class>;

    template <typename Input, typename Tag>
    using error_for = error<input_reader<Input>, Tag>;

    template <typename Reader, typename Tag, typename ... Args>
    constexpr auto make_error(Args&&... args);
}

All errors are represented by instantiations of lexy::error<Reader, Tag>. The Tag is an empty type that specifies the kind of error. There are specializations for two tags to store additional information.

The function lexy::make_error constructs an error object given the reader and tag by forwarding all the arguments.

Generic error
template <typename Reader, typename Tag>
class error
{
    using iterator = typename Reader::iterator;

public:
    constexpr explicit error(iterator pos) noexcept;
    constexpr explicit error(iterator begin, iterator end) noexcept;

    constexpr iterator position() const noexcept;

    constexpr iterator begin() const noexcept;
    constexpr iterator end() const noexcept;

    constexpr const char* message() const noexcept;
};

The primary class template lexy::error<Reader, Tag> represents a generic error without additional metadata. It can either be constructed giving it a single position, then position() == begin() == end(); or a range of the input, then position() == begin() <= end().

The message() is determined using the Tag. By default, it returns the type name of Tag after removing the top-level namespace name. This can be overridden by defining either Tag::name() or Tag::name.
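A minimal sketch of a custom tag that overrides the default message:

struct expected_exclamation
{
    // Overrides the message; without it, the type name would be used.
    static constexpr auto name = "expected exclamation mark";
};

// lexy::error<Reader, expected_exclamation>::message() now returns
// "expected exclamation mark".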

Expected literal error
struct expected_literal
{};

template <typename Reader>
class error<Reader, expected_literal>
{
    using iterator    = typename Reader::iterator;

public:
    constexpr explicit error(iterator position,
                             string_view string, std::size_t index) noexcept;

    constexpr iterator position() const noexcept;

    constexpr auto string() const noexcept -> const typename Reader::char_type*;
    constexpr auto character() const noexcept -> typename Reader::char_type;

    constexpr std::size_t index() const noexcept;
};

A specialization of lexy::error is provided if Tag == lexy::expected_literal. It represents the error where a literal string was expected, but could not be matched. It is mainly raised by the lexy::dsl::lit rule.

The error happens at a given position() and with a given string(). The index() is the index into the string where matching failed; e.g. 0 if the input starts with a different character, 2 if the first two characters matched, etc. The character() is the string character at that index.

Character class error
struct expected_char_class
{};

template <typename Reader>
class error<Reader, expected_char_class>
{
    using iterator = typename Reader::iterator;

public:
    constexpr explicit error(iterator position, const char* name) noexcept;

    constexpr iterator position() const noexcept;

    constexpr const char* name() const noexcept;
};

A specialization of lexy::error is provided if Tag == lexy::expected_char_class. It represents the error where any character from a given set of characters was expected, but could not be matched. It is raised by the lexy::dsl::ascii::* rules or lexy::dsl::newline, among others.

The error happens at the given position() and a symbolic name of the character class is returned by name(). By convention, the name format used is <group>.<name> or <name>, where both <group> and <name> consist of characters. Examples include newline, ASCII.alnum and digit.decimal.

Error context

lexy/error.hpp
namespace lexy
{
    template <typename Production, typename Input>
    class error_context
    {
        using iterator = typename input_reader<Input>::iterator;

    public:
        constexpr explicit error_context(const Input& input, iterator pos) noexcept;

        constexpr const Input& input() const noexcept;

        static consteval const char* production();

        constexpr iterator position() const noexcept;
    };
}

The class lexy::error_context<Production, Input> contains information about the context where the error occurred.

The entire input containing the error is returned by input().

The Production whose rule has raised the error is specified as a template parameter and its name is returned by production(). Like lexy::error<Reader, Tag>::message(), it returns the name of the type without the top-level namespace name. This can be overridden by defining Production::name() or Production::name.

The position() of the error context is the input position where the production started parsing.

Parse Tree

lexy/parse_tree.hpp
namespace lexy
{
    enum class traverse_event
    {
        enter,
        exit,
        leaf,
    };

    template <typename Reader, typename TokenKind = void,
              typename MemoryResource = /* default */>
    class parse_tree
    {
    public:
        class builder;

        constexpr parse_tree();
        constexpr explicit parse_tree(MemoryResource* resource);

        bool empty() const noexcept;
        void clear() noexcept;

        class node;
        class node_kind;

        node root() const noexcept; // requires: !empty()

        class traverse_range;

        traverse_range traverse(const node& n) const noexcept;
        traverse_range traverse() const noexcept;
    };

    template <typename Input, typename TokenKind = void,
              typename MemoryResource = /* default */>
    using parse_tree_for = lexy::parse_tree<input_reader<Input>, TokenKind, MemoryResource>;

    template <typename Production, typename TokenKind, typename MemoryResource, typename Input,
              typename ErrorCallback>
    auto parse_as_tree(parse_tree<input_reader<Input>, TokenKind, MemoryResource>& tree,
                       const Input& input, ErrorCallback error_callback)
      -> lexy::validate_result<ErrorCallback>;
}

The class lexy::parse_tree represents a lossless untyped syntax tree.

The function lexy::parse_as_tree() parses a Production on the given input and constructs a lossless parse tree from the result. All parse errors are passed to the error callback (see Error handling) and later returned. If a non-recoverable parse error happens, the tree will be cleared, otherwise it contains the (partial) parse tree of the input. It will discard any values produced by parsing the rules.

The resulting parse tree will contain a parent node for each production, and a leaf node for every token. If a token is empty and has an unknown token kind, it will not be added to the parse tree. If a production inherits from lexy::transparent_production, no separate node will be created; instead all child nodes will be added to its parent. If a production inherits from lexy::token_production, tokens are merged when possible: if there are two or more tokens with the same kind directly after each other, only a single node spanning all of them will be added, as opposed to multiple nodes for each individual token.

Traversing the tree and concatenating the lexemes of all tokens will result in the original input.
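A minimal sketch, assuming a my_production grammar and a string input; lexy::noop discards the error values, while the returned result still reports whether any occurred:

auto input = lexy::zstring_input(/* ... */);

lexy::parse_tree_for<decltype(input)> tree;
auto result = lexy::parse_as_tree<my_production>(tree, input, lexy::noop);
if (result)
{
    auto root = tree.root(); // the production node for my_production
    // Traverse via tree.traverse(), root.children(), etc.
}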

Manual Tree Building

template <typename Reader, typename TokenKind, typename MemoryResource>
class parse_tree<Reader, TokenKind, MemoryResource>::builder
{
public:
    template <typename Production>
    explicit builder(parse_tree&& tree, Production production); // (1)
    template <typename Production>
    explicit builder(Production production); // (2)

    struct production_state;

    template <typename Production>
    production_state start_production(Production production); // (3)

    void token(token_kind<TokenKind> kind,
               typename Reader::iterator begin, typename Reader::iterator end); // (4)

    void finish_production(production_state&& s); // (5)
    void backtrack_production(production_state&& s); // (6)

    parse_tree finish() &&; // (7)
};
  1. Create a builder that will re-use the memory of the existing tree. Its root node will be associated with the given Production.

  2. Same as above, but does not re-use memory.

  3. Adds a production child node as last child of the current node and activates it. Returns a handle that remembers the previous current node.

  4. Adds a token node to the current node.

  5. Finishes with a child production and activates its parent.

  6. Cancels the currently active node by deallocating it and all its children. Activates its parent node again.

  7. Returns the finished tree.
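For example, a small tree can be built by hand like this (a sketch; root_p and child_p are hypothetical productions and token_begin/token_end are iterators into the input):

typename lexy::parse_tree_for<Input>::builder builder(root_p{}); // (2)

auto state = builder.start_production(child_p{});   // (3): child_p is now the current node
builder.token({}, token_begin, token_end);          // (4): add a token node (default, i.e. unknown, kind)
builder.finish_production(std::move(state));        // (5): back at the root node

auto tree = std::move(builder).finish();            // (7)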

Tree Node

template <typename Reader, typename TokenKind, typename MemoryResource>
class parse_tree<Reader, TokenKind, MemoryResource>::node_kind
{
public:
    bool is_token() const noexcept;
    bool is_production() const noexcept;

    bool is_root() const noexcept;
    bool is_token_production() const noexcept;

    const char* name() const noexcept;

    friend bool operator==(node_kind lhs, node_kind rhs);
    friend bool operator!=(node_kind lhs, node_kind rhs);

    friend bool operator==(node_kind nk, token_kind<TokenKind> tk);
    friend bool operator==(token_kind<TokenKind> tk, node_kind nk);
    friend bool operator!=(node_kind nk, token_kind<TokenKind> tk);
    friend bool operator!=(token_kind<TokenKind> tk, node_kind nk);

    template <typename Production>
    friend bool operator==(node_kind nk, Production);
    template <typename Production>
    friend bool operator==(Production p, node_kind nk);
    template <typename Production>
    friend bool operator!=(node_kind nk, Production p);
    template <typename Production>
    friend bool operator!=(Production p, node_kind nk);
};

The class node_kind stores information about the kind of a node. Nodes are either associated with a Production or a token rule. The root node is always a Production node.

template <typename Reader, typename TokenKind, typename MemoryResource>
class parse_tree<Reader, TokenKind, MemoryResource>::node
{
public:
    void* address() const noexcept;

    node_kind kind() const noexcept;

    node parent() const noexcept;

    /* sized range */ children() const noexcept;

    /* range */ siblings() const noexcept;

    bool is_last_child() const noexcept;

    lexy::lexeme<Reader> lexeme() const noexcept;
    lexy::token<Reader, TokenKind> token() const noexcept;

    friend bool operator==(node lhs, node rhs) noexcept;
    friend bool operator!=(node lhs, node rhs) noexcept;
};

The class node is a reference to a node in the tree. Two nodes are equal if and only if they point to the same node in the same tree.

Parent Access
node parent() const noexcept;

Returns a reference to the parent node. For the root node, returns a reference to itself.

This operation is O(number of siblings).

Child Access
class children_range
{
public:
    class iterator; // value_type = node
    class sentinel;

    iterator begin() const noexcept;
    sentinel end() const noexcept;

    bool empty() const noexcept;
    std::size_t size() const noexcept;
};

children_range children() const noexcept;

Returns a range object that iterates over all children of the node. For a token node, this is always the empty range.

Sibling Access
class sibling_range
{
public:
    class iterator; // value_type = node

    iterator begin() const noexcept;
    iterator end() const noexcept;

    bool empty() const noexcept;
};

sibling_range siblings() const noexcept;

Returns a range object that iterates over all siblings of a node. It begins with the sibling immediately following the node and continues until it reaches the last child of the parent. Iteration then wraps around to the first child of the parent until it ends at the original node. The original node itself is not included in the sibling range.

Token Access
lexy::lexeme<Reader> lexeme() const noexcept; // (1)
lexy::token<Reader, TokenKind> token() const noexcept; // (2)
  1. Returns the spelling of a token node. For a production node, returns the empty lexeme.

  2. Returns the spelling and token kind of a token node; must not be called on a production node.

Tree Traversal

enum class traverse_event
{
    enter,
    exit,
    leaf,
};
class traverse_range
{
public:
    class iterator; // value_type = { traverse_event, node }

    iterator begin() const noexcept;
    iterator end() const noexcept;

    bool empty() const noexcept;
};

traverse_range traverse(const node& n) const noexcept; // (1)
traverse_range traverse() const noexcept; // (2)
  1. Returns a range that traverses descendants of the given node.

  2. Returns a range that traverses the root node, or an empty range if the tree is empty.

The traverse_range iterates over a node and all its children and their children and so on. Its value type is an (unspecified) pair whose first member is a lexy::traverse_event and whose second member is a node reference.

For a token node, the range contains only the original node with event leaf.

For a production node, the range begins with the original node and event enter. It then does a depth-first traversal of all descendants, beginning with the children of the node. When it reaches a token node, it produces it with event leaf. When it reaches a production node, it produces it with event enter, then all its descendants recursively, and then the node again with event exit. After all descendants of the original node have been produced, the range finishes with the original node again and event exit.

Example

Prints a tree.

// indent() and print_node() are user-provided helpers.
auto depth = 0;
for (auto [event, node] : tree.traverse())
{
    switch (event)
    {
    case lexy::traverse_event::enter:
        // Print the production node, then indent its children one level deeper.
        indent(depth);
        print_node(node);
        ++depth;
        break;
    case lexy::traverse_event::exit:
        // All children of the production have been visited.
        --depth;
        break;

    case lexy::traverse_event::leaf:
        // A token node; print it at the current depth.
        indent(depth);
        print_node(node);
        break;
    }
}
Note
Traversing a node just does pointer chasing. There is no allocation or recursion involved.

The rule DSL

The rule of a production is specified using a DSL built on top of C++ operator overloading. Everything in the DSL is defined in the namespace lexy::dsl, with each part available in a header under lexy/dsl/. The umbrella header lexy/dsl.hpp includes all DSL headers.

A Rule is an object that defines a specific set of input to be parsed. It first tries to match a set of characters from the input by comparing the character at the current reader position to the set of expected characters, temporarily advancing the reader further if necessary. If the matching was successful, a subset of the matched characters is consumed by advancing the reader permanently. The rule can then produce zero or more values, which are eventually forwarded to the value callback of its production. If the matching was not successful, an error is produced instead. A failed rule does not consume any characters.

A Branch is a rule that has an associated condition. The parsing algorithm can efficiently check whether the condition would match at the current reader position. As such, they are used whenever the algorithm needs to decide between multiple alternatives. Once the branch condition matches, the branch is taken without any additional backtracking.

A Token is a special Rule that is an atomic element of the input. As a rule, it does not produce any value. Every Token is also a Branch that uses itself as the condition.

Whitespace

By default, lexy does not treat whitespace in any special way. You need to instruct it to do so, using either manual or automatic whitespace skipping.

Manual whitespace skipping is done using lexy::dsl::whitespace(rule). It skips zero or more whitespace characters defined by rule. Insert it everywhere you want to skip over whitespace. See examples/email.cpp or examples/xml.cpp for an example of manual whitespace skipping.

Automatic whitespace skipping is done by adding a static constexpr auto whitespace to the root production, i.e. the production passed to one of the parse functions. This member is initialized to a rule that defines a single whitespace character. lexy will then skip zero or more occurrences of ::whitespace after every token of the entire grammar.

To temporarily disable whitespace skipping for a production, inherit the production from lexy::token_production. Then whitespace will not be skipped for the rule of the production, nor for any productions reached from that rule. Likewise, lexy::dsl::no_whitespace() can be used to disable it for a single rule.

See examples/tutorial.cpp or examples/json.cpp for an example of automatic whitespace skipping.

Note
"Whitespace" can mean literal whitespace characters, but also comments (or whatever you want it to mean).

lexy::dsl::whitespace (explicit)

lexy/dsl/whitespace.hpp
whitespace(rule) : Rule

whitespace(rule_a) | rule_b = whitespace(rule_a | rule_b)
whitespace(rule_a) / rule_b = whitespace(rule_a / rule_b)

The explicit whitespace rule matches rule zero or more times and treats the result as whitespace. This happens regardless of the state of automatic whitespace skipping.

If the whitespace rule is used inside a choice or alternative, the entire choice/alternative is treated as whitespace instead.

Requires

rule is a branch or a choice rule. It must not produce any values.

Matches

While the branch condition of rule, or any of the branch conditions of the choice, matches, match and consume rule. This only stops once the branch conditions no longer match. While matching and consuming rule, automatic whitespace skipping is disabled.

Values

None.

Errors

All errors raised by rule after the branch condition has been matched.

lexy::dsl::whitespace (implicit)

lexy/dsl/whitespace.hpp
whitespace : Rule = whitespace(automatic_whitespace_rule)

The implicit whitespace rule is equivalent to the explicit whitespace rule with the current whitespace rule; i.e. it matches the current whitespace rule zero or more times.

The current whitespace rule is determined as follows:

  • If automatic whitespace skipping is disabled, there is no current whitespace rule. lexy::dsl::whitespace does nothing.

  • If the current production inherits from lexy::token_production, there is no current whitespace rule. lexy::dsl::whitespace does nothing.

  • Otherwise, if the current production defines a static constexpr auto whitespace member, its value is the current whitespace rule.

  • Otherwise, if the root production defines a static constexpr auto whitespace member, its value is the current whitespace rule.

Here, the root production is defined as follows:

  • If the current production is a token production, the root production is the current production.

  • Otherwise, if the current production is the production that was originally passed to the top-level parse function (e.g. lexy::parse()), the root production is the current production.

  • Otherwise, the root production is taken from the production that parsed the lexy::dsl::p or lexy::dsl::recurse rule to start parsing the current production.

This rule is automatically parsed after every token, after a production that inherits from lexy::token_production, or after a lexy::dsl::no_whitespace() rule.

Example
struct token_p : lexy::token_production
{
    struct child
    {
        static constexpr auto rule = dsl::whitespace; // (4)
    };

    static constexpr auto rule = dsl::whitespace + dsl::p<child>; // (3)
};

struct normal_prod
{
    static constexpr auto rule = dsl::whitespace + dsl::p<token_p>; // (2)
};

struct root_prod
{
    static constexpr auto whitespace = dsl::ascii::space;
    static constexpr auto rule = dsl::whitespace + dsl::p<normal_prod>; // (1)
};



auto result = lexy::parse<root_prod>(input, callback); // input and callback defined elsewhere
  1. Here, the automatic whitespace rule is dsl::ascii::space, as the current production has a whitespace member.

  2. Here, the automatic whitespace rule is also dsl::ascii::space. The current production doesn’t have a whitespace member, but its root production (root_prod) does.

  3. Here, the current production is a token production, so there is no automatic whitespace. The root production is reset to token_p.

  4. Here, the root production is token_p, as that is the root of the parent. As such, there is no automatic whitespace.

lexy::dsl::no_whitespace()

lexy/dsl/whitespace.hpp
no_whitespace(rule)   : Rule
no_whitespace(branch) : Branch

The no_whitespace rule parses the given rule but disables automatic whitespace skipping while doing so. It is a branch if given a branch.

Branch Condition

Whatever branch uses as branch condition. Note that automatic whitespace skipping inside a branch condition is impossible anyway, so nothing is changed there.

Matches

Matches and consumes rule but without performing automatic whitespace skipping after every token; it disables the automatic whitespace rule during the parsing of rule. After rule has been matched, skips implicit whitespace by matching and consuming lexy::dsl::whitespace.

Values

All values produced by rule.

Errors

All errors raised by rule.

Caution
When rule contains a lexy::dsl::p or lexy::dsl::recurse rule, whitespace skipping is re-enabled while that production is parsed.

Primitive Tokens

Note
All tokens, not just the tokens defined here, do implicit whitespace skipping. As such, a token t is really equivalent to t + dsl::whitespace. This has no effect unless a whitespace rule has been specified.

lexy::dsl::any

lexy/dsl/any.hpp
any : Token

The any token matches anything, i.e. all the remaining input.

Matches

All the remaining input.

Error

n/a (it never fails)

Note
any is useful in combination with partial inputs such as the minus rule or switch_.

lexy::dsl::lit

lexy/dsl/literal.hpp
lit_c<C> : Token
lit<Str> : Token

LEXY_LIT(Str) : Token

The literal tokens match the specified sequence of characters.

Requires
  • C is a character literal.

  • Str is a string literal.

    In both cases, their encoding must be ASCII or match the encoding of the input.

Matches

The specified character or string of characters, which are consumed.

Error

A lexy::expected_literal error giving the expected string and the index where the match failure occurred.

Note
lit<Str> requires C++20 support for extended NTTPs. Use the LEXY_LIT(Str) macro if your compiler does not support them.
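For example, both spellings match the same literal:

dsl::lit<"int">   // requires C++20 extended NTTPs
LEXY_LIT("int")   // portable macro equivalent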
lexy/dsl/punctuator.hpp
period    : Token = lit<".">
comma     : Token = lit<",">
colon     : Token = lit<":">
semicolon : Token = lit<";">

hyphen     : Token = lit<"-">
slash      : Token = lit<"/">
backslash  : Token = lit<"\\">
apostrophe : Token = lit<"'">

hash_sign   : Token = lit<"#">
dollar_sign : Token = lit<"$">
at_sign     : Token = lit<"@">

The header lexy/dsl/punctuator.hpp defines common punctuator literals. They are equivalent to a literal matching the specified character.

Character classes

lexy::dsl::eof

lexy/dsl/eof.hpp
eof : Token

The eof token matches EOF.

Matches

Only if the reader is at the end of the input. It does not consume anything (it can’t).

Error

lexy::expected_char_class with the name EOF.

lexy::dsl::newline

lexy/dsl/newline.hpp
newline : Token

The newline token matches a newline.

Matches

\n or \r\n, which is consumed.

Error

lexy::expected_char_class with the name newline.

lexy::dsl::eol

lexy/dsl/newline.hpp
eol : Token

The eol token matches an end-of-line (EOL).

Matches

\n or \r\n, which is consumed. Also matches EOF, which is not consumed.

Error

lexy::expected_char_class with the name EOL.

lexy::dsl::ascii::*

lexy/dsl/ascii.hpp
namespace ascii
{
    control : Token // 0x00-0x1F, 0x7F

    blank       : Token // ' ' (space character) or '\t'
    newline     : Token // '\n' or '\r'
    other_space : Token // '\f' or '\v'
    space       : Token // `blank` or `newline` or `other_space`

    lower : Token // a-z
    upper : Token // A-Z
    alpha : Token // `lower` or `upper`

    digit : Token // 0-9
    alnum : Token // `digit` or `alpha`

    punct : Token // One of: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

    graph : Token // `alnum` or `punct`
    print : Token // `graph` or ' ' (space characters)

    character : Token // 0x00-0x7F
}

All tokens defined in lexy::dsl::ascii match one of the categories of ASCII characters.

Matches

Matches and consumes one of the set of ASCII characters indicated in the comments.

Errors

A lexy::expected_char_class error with name ASCII.<token>, where <token> is the name of the token.

Note
Every ASCII character except for the space character is in exactly one of control, lower, upper, digit or punct.

lexy::dsl::code_point

lexy/dsl/code_point.hpp
code_point : Token

code_point.capture() : Rule

The code_point token will match and consume a well-formed Unicode code point according to the encoding of the input. If code_point.capture() is used, the consumed code point will be produced as value.

Requires

The encoding of the input is lexy::ascii_encoding, lexy::utf8_encoding, lexy::utf16_encoding, or lexy::utf32_encoding.

Matches

Matches and consumes all code units of the next code point. For ASCII and UTF-32 this is only one, but for UTF-8 and UTF-16 it can be multiple code units. If the code point is too big or a UTF-16 surrogate, it fails. For UTF-8, it also fails for overlong sequences.

Value

If .capture() was called, it will produce the matched code point as a lexy::code_point.

Errors

If it could not match a valid code point, it fails with a lexy::expected_char_class error with name <encoding>.code_point.

Example
// Match and capture one arbitrary code point.
dsl::code_point.capture()
Tip
If you want to match a specific code point, use a literal rule instead. This rule is useful for matching things like string literals that can contain arbitrary code points.

lexy::dsl::operator-

lexy/dsl/minus.hpp
token - except : Token

The minus rule matches the given token, but only if except does not match on the input the rule has consumed.

Requires

except is a token.

Matches

Matches and consumes whatever token matches and consumes. Then matches except on the same input. Matching fails if except matches the entire input consumed by the token.

Errors

Whatever errors are raised if token is not matched. A generic error with tag lexy::minus_failure if except has matched.

Tip
Use a minus rule to exclude characters from a character class; e.g. lexy::dsl::code_point - lexy::dsl::ascii::control matches all code points except control characters.
Note
Minus rules can be chained. This is equivalent to specifying an alternative for except.
Warning
except has to match everything the rule has consumed before; partial matches don’t count. Use token - (except + lexy::dsl::any) if you want to allow a partial match.

lexy::dsl::token

lexy/dsl/token.hpp
token(rule) : Token

The token rule turns an arbitrary rule into a token by parsing it and discarding all values it has produced.

Matches

Whatever rule matches, which will be consumed.

Error

A generic error with tag lexy::missing_token if the rule did not match.

Note
While token() is optimized to prevent any overhead created by constructing values that are later discarded, it should still only be used when required.

Values

The following rules are used to produce additional values without any additional matching.

lexy::dsl::value_*

lexy/dsl/value.hpp
value_c<Value> : Rule
value_f<Fn>    : Rule
value_t<T>     : Rule
value_str<Str> : Rule

LEXY_VALUE_STR(Str) : Rule

The value_* rules create a constant value without parsing anything.

Requires
  • Value is any constant.

  • Fn is a pointer to a function taking no arguments.

  • T is a default-constructible type.

  • Str is a string literal.

Matches

Any input, but does not consume anything.

Value
value_c

The specified constant.

value_f

The result of invoking the function.

value_t

A default constructed object of the specified type.

value_str

The string literal as a pointer, followed by its size.

Error

n/a (it does not fail)

Tip
Use the value_* rules only to create symmetry between different branches. Everything they do can also be achieved using callbacks, which is usually a better solution.
Warning
The function might not be called or the object might not be constructed in all situations. You cannot rely on their side effects.
Note
value_str<Str> requires C++20 support for extended NTTPs. Use the LEXY_VALUE_STR(Str) macro if your compiler does not support them.
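For example, value_c can give two branches symmetric values:

// Both branches produce a bool, so a single callback handles either case.
LEXY_LIT("true") >> dsl::value_c<true> | LEXY_LIT("false") >> dsl::value_c<false>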

lexy::dsl::nullopt

lexy/dsl/option.hpp
namespace lexy
{
    struct nullopt
    {
        template <typename T>
        constexpr operator T() const;
    };
}

The lexy::nullopt type represents an empty optional. It is implicitly convertible to any type that has a default constructor (T()), a dereference operator (*t), and a contextual conversion to bool (if (t)). Examples are pointers or std::optional. The conversion operator returns a default-constructed object, i.e. an empty optional.

lexy/dsl/option.hpp
nullopt : Rule

The nullopt rule produces a value of type lexy::nullopt without parsing anything.

Matches

Any input, but does not consume anything.

Value

An object of type lexy::nullopt.

Error

n/a (it does not fail)

Note
It is meant to be used together with the opt() rule for symmetry.

lexy::dsl::label and lexy::dsl::id

lexy/dsl/label.hpp
namespace lexy
{
    template <typename Tag>
    struct label
    {
        // only if Tag::value is well-formed
        consteval operator auto() const
        {
            return Tag::value;
        }
    };

    template <auto Id>
    using id = label<std::integral_constant<int, Id>>;
}
lexy/dsl/label.hpp
label<Tag> : Rule
id<Id>     : Rule

The label and id rules are used to disambiguate between two branches that otherwise create the same values but should resolve to different callbacks. They simply produce the empty tag object or the id to differentiate them, without parsing anything.

Requires
  • Tag is any type.

  • Id is an integer constant.

Matches

Any input, but does not consume anything.

Value
label<Tag>

A lexy::label<Tag> object.

id<Id>

A lexy::id<Id> object.

Error

n/a (it does not fail)

lexy/dsl/label.hpp
label<Tag>(rule)   : Rule   = label<Tag> + rule
label<Tag>(branch) : Branch = /* as above, except as branch */

id<Id>(rule)   : Rule   = id<Id> + rule
id<Id>(branch) : Branch = /* as above, except as branch */

For convenience, label and id have function call operators. They produce the label/id and then parse the rule.
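For example, the id differentiates two branches that otherwise produce no values:

// The callback receives lexy::id<0> or lexy::id<1> depending on the branch taken.
dsl::id<0>(LEXY_LIT("++")) | dsl::id<1>(LEXY_LIT("--"))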

lexy::dsl::capture

lexy/dsl/capture.hpp
capture(rule)   : Rule
capture(branch) : Branch

The capture() rule takes an arbitrary rule and parses it, capturing everything it has consumed into a lexy::lexeme. It is a branch if given a branch.

Branch Condition

The branch condition is whatever branch uses as a branch condition.

Matches

Matches and consumes whatever rule matches.

Values

A lexy::lexeme which begins at the original reader position and ends at the reader position after rule has been parsed, followed by any other values produced by parsing the rule in the same order.

Errors

All errors raised by rule. It cannot fail itself.

Example
// Captures the entire input.
dsl::capture(dsl::any)

lexy::dsl::position

lexy/dsl/position.hpp
position : Rule

The position rule creates as its value an iterator to the current reader position without consuming any input.

Matches

Any input, but does not consume anything.

Value

An iterator to the current position of the reader.

Error

n/a (it does not fail)

Example
// Parses the entire input and returns the final position.
dsl::any + dsl::position
Tip
Use position when creating an AST whose nodes are annotated with their original source position.

Errors

The following rules are used to customize/improve error messages or recover from errors.

.error<Tag>

token.error<Tag> : Token

The error member on tokens changes the error that is raised when the token fails.

Matches

Matches and consumes what token matches.

Error

A generic error with the specified Tag.

Tip
It is useful for tokens such as dsl::token() and operator-, where the default error is a generic tag such as lexy::missing_token or lexy::minus_failure.
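For example (the tag invalid_character is illustrative):

struct invalid_character {};

// Raises invalid_character instead of the generic lexy::minus_failure.
(dsl::code_point - dsl::ascii::control).error<invalid_character>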

lexy::dsl::error

lexy/dsl/error.hpp
error<Tag>       : Branch
error<Tag>(rule) : Branch

The error rule always fails and produces an error with the given tag. For the second version, the rule is matched first to determine the error range.

Branch Condition

Branch is always taken.

Matches

Nothing and always fails.

Error

An error object of the specified Tag. If the optional rule is given, it will be matched (without producing values or errors). If it matched successfully, the previous and new reader position will be used to determine the error range. Otherwise, the error has no range.

Tip
Use it as the final branch of a choice rule to customize the lexy::exhausted_choice error.
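For example, it can replace the generic lexy::exhausted_choice error (the tag expected_operator is illustrative):

struct expected_operator {};

// If neither '+' nor '-' matches, an expected_operator error is raised.
dsl::lit_c<'+'> | dsl::lit_c<'-'> | dsl::error<expected_operator>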

lexy::dsl::require and lexy::dsl::prevent

lexy/dsl/error.hpp
require(rule).error<Tag> : Rule
prevent(rule).error<Tag> : Rule

The require and prevent rules can be used to look ahead and fail if the input does not match the rule (require) or does match it (prevent).

Matches

Both match the rule without consuming input (or producing values or errors). require fails if the rule did not match; prevent fails if it did.

Error

An error object of the specified Tag.

Example
// Parses a sequence of digits but raises an error with tag `forbidden_leading_zero` if a zero is followed by more digits.
// Note: this is already available as `dsl::digits<>.no_leading_zero()`.
dsl::zero >> dsl::prevent(dsl::digits<>).error<forbidden_leading_zero>
    | dsl::digits<>
Tip
Use prevent together with times to prevent the rule from matching more than the specified number of times.

lexy::dsl::try_

lexy/dsl/recover.hpp
try_(rule) : Rule
try_(rule, recovery_rule) : Rule

The try_ rule matches and consumes rule. If that fails, it recovers from the error and continues as if it didn’t fail. The first overload recovers by doing nothing, the second recovers by parsing the recovery rule.

Matches

Matches and consumes rule. If that fails, matches and consumes recovery_rule.

Values

All values produced by rule if rule was parsed successfully. All values produced by recovery_rule otherwise.

Error

All errors raised by rule or recovery_rule.
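For example, it can tolerate a missing terminator (a sketch; statement is a hypothetical production):

// If the semicolon is missing, an error is raised,
// but parsing continues as if it were there.
dsl::p<statement> + dsl::try_(dsl::semicolon)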

lexy::dsl::find

lexy/dsl/recover.hpp
find(token_1, ..., token_n) : Rule
find(token_1, ..., token_n).limit(token_1, ..., token_n) : Rule

The find rule is designed to be used as a recovery rule. It matches and consumes everything until it finds one of the tokens. The tokens are not consumed.

If a limit is specified, recovery fails if the limiting tokens are found first. The limiting tokens are not consumed either.
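For example (a sketch):

// Skip ahead to the next semicolon or newline, but give up at a closing brace.
dsl::find(dsl::semicolon, dsl::newline).limit(dsl::lit_c<'}'>)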

lexy::dsl::recover

lexy/dsl/recover.hpp
recover(branch_1, ..., branch_n) : Rule
recover(branch_1, ..., branch_n).limit(branch_1, ..., branch_n) : Rule

The recover rule is designed to be used as a recovery rule. It matches and consumes everything until one of the recovery branches match. It then matches and consumes the branch.

If a limit is specified, recovery fails if the limiting tokens are found first. The limiting tokens are not consumed.

Branch conditions

The following rules are designed to be used as the condition of an operator>>. They have no effect if not used in a context that requires a branch.

lexy::dsl::else_

lexy/dsl/branch.hpp
else_ : Branch

If else_ is used as a condition, that branch will be taken unconditionally. It must be used as the last alternative of a choice.

lexy::dsl::peek

lexy/dsl/peek.hpp
peek(rule) : Branch

The peek branch is taken if rule matches, but does not consume it.

Caution
Automatic whitespace skipping is disabled while determining whether rule matches.
Caution
Long lookahead can slow down parsing speed due to backtracking.

lexy::dsl::peek_not

lexy/dsl/peek.hpp
peek_not(rule) : Branch

The peek_not() branch is taken if rule does not match, but does not consume it.

Caution
Automatic whitespace skipping is disabled while determining whether rule matches.
Caution
Long lookahead can slow down parsing speed due to backtracking.

lexy::dsl::lookahead

lexy/dsl/lookahead.hpp
lookahead(needle, end) : Branch

The lookahead branch is taken if needle is found before end; both must be tokens. No characters are consumed.

Caution
Long lookahead can slow down parsing speed due to backtracking.
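For example, it can decide between two interpretations of a line (a sketch; key_value_pair is a hypothetical production):

// Take the branch only if there is an '=' somewhere before the end of the line.
dsl::lookahead(dsl::lit_c<'='>, dsl::newline) >> dsl::p<key_value_pair>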

Branches

lexy::dsl::operator+

lexy/dsl/sequence.hpp
rule + rule   : Rule
token + token : Branch

A sequence rule matches multiple rules one after the other. It is a branch if it is a sequence of tokens.

Branch Condition

Branch is only taken if all tokens match in order.

Matches

Matches and consumes the first rule, then matches and consumes the second rule, and so on. Only succeeds if all of them succeed.

Values

All the values produced by the rules in the same order as they were matched.

Errors

Whatever errors are raised by the individual rules.

lexy::dsl::operator>>

lexy/dsl/branch.hpp
branch >> rule : Branch

The operator>> is used to turn a rule into a branch by giving it a branch condition, which must be a branch itself. If the branch is used as a normal rule, it first matches the condition followed by the rule. If it is used in a context that requires a branch, the branch is checked to determine whether it should be taken.

Branch Condition

Whatever branch uses as branch condition.

Matches

Matches and consumes the branch, then matches and consumes the rule. Only succeeds if both of them succeed.

Values

All the values produced by the branch and rule in the same order as they were matched.

Errors

Whatever errors are raised by the individual branch and rule.

lexy::dsl::if_

lexy/dsl/if.hpp
if_(branch) : Rule

The if_ rule matches a branch only if its condition matches.

Matches

First matches the branch condition. If that succeeds, consumes it and matches and consumes the rest of the branch. Otherwise, consumes nothing and succeeds anyway.

Values

Any values produced by the branch.

Errors

Any errors produced by the branch. It will only fail after the condition has been matched.

Example
// Matches an optional C style comment.
dsl::if_(LEXY_LIT("/*") >> dsl::until(LEXY_LIT("*/")))

lexy::dsl::opt

lexy/dsl/opt.hpp
opt(branch) : Rule = branch | else_ >> nullopt

The opt rule matches a branch only if its condition matches. Unlike if_, if the branch was not taken, it produces a lexy::nullopt.

Matches

First matches the branch condition. If that succeeds, consumes it and matches and consumes the rest of the branch. Otherwise, consumes nothing and succeeds anyway.

Values

If the branch condition matches, any values produced by the rule. Otherwise, a single object of type lexy::nullopt.

Errors

Any errors produced by the branch. It will only fail after the condition has been matched.

Example
// Matches an optional list of alpha characters.
// (The id<0> is just there, so the sink will be invoked on each character).
// If no items are present, it will default construct the list type.
dsl::opt(dsl::list(dsl::ascii::alpha >> dsl::id<0>))

lexy::dsl::operator|

lexy/dsl/choice.hpp
branch  | branch  : Branch

A choice rule matches the first branch in order whose condition was matched. It is itself a branch that is taken if any of its branches is taken.

Matches

Tries to match the condition of each branch in the order they were specified. As soon as one branch condition matches, matches and consumes that branch without ever backtracking to try another branch. If no branch condition matched, fails without consuming anything.

Values

Any values produced by the selected branch.

Errors

Any errors raised by the then of the selected branch. If no branch condition matched, a generic error with tag lexy::exhausted_choice.

Example
// A contrived example to illustrate the behavior of choice.
// Note that the branch with id 1 will never be taken, as branch 0 takes everything starting with "a" and then fails if it isn't followed by "bc".
// The correct behavior is illustrated by 2 and 3, where the branch with the longer condition is listed first.
dsl::id<0>(LEXY_LIT("a") >> LEXY_LIT("bc"))
  | dsl::id<1>(LEXY_LIT("a") >> LEXY_LIT("b"))
  | dsl::id<2>(LEXY_LIT("bc"))
  | dsl::id<3>(LEXY_LIT("b"))
Note
The C++ operator precedence is specified in such a way that condition >> a | else_ >> b works. The compiler might warn that the precedence is not intuitive without parentheses, but in the context of this DSL it is the expected result.
Tip
Use … | error<Tag> to raise a custom error instead of lexy::exhausted_choice.

lexy::dsl::operator/

lexy/dsl/alternative.hpp
token / token : Token

An alternative rule tries to match each token in order, backtracking if necessary.

Matches

Tries to match each token in the order they were specified. As soon as one token matches, consumes it and succeeds. If no token matched, fails without consuming anything.

Errors

A generic error with tag lexy::exhausted_alternatives if no token matched.

Note
If an alternative consists of only literals, a trie is used to efficiently match them without backtracking.
Caution
Use a choice rule with a suitable condition to avoid potentially long backtracking.
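For example (as all alternatives are literals, a trie is used):

LEXY_LIT("int") / LEXY_LIT("float") / LEXY_LIT("bool")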

lexy::dsl::switch_

lexy/dsl/switch.hpp
switch_(rule) : Rule

switch_(rule).case_(branch)  : Rule
switch_(rule).default_(rule) : Rule = switch_(rule).case_(else_ >> rule)
switch_(rule).error<Tag>     : Rule = switch_(rule).case_(error<Tag>(any))

The switch_ rule matches a rule and then switches over the input the rule has consumed. Switch cases can be added by calling .case_(); they are tried in order. A default case is added using .default_(); it is taken unconditionally. Alternatively, an error case can be added using .error<Tag>(); it produces an error if no previous case has matched.

Matches

First matches and consumes the switched rule. What the rule has consumed is then taken as the entire input for matching the switch cases. Then it tries to match the branch conditions of each case in order. When a branch condition matches, that case is taken and its then is matched. If no case has matched, it fails.

Values

Any values produced by the switched rule followed by any values produced by the selected case.

Errors

If the switched rule fails to match, any errors raised by it. If the branch condition of a case has matched, any errors raised by the then. If the switch had an error case, a generic error with the specified Tag is raised whose range is everything consumed by the switched rule. Otherwise, a generic error with tag lexy::exhausted_switch is raised.

Example
// Parse identifiers (one or more alpha numeric characters) but detect the three reserved keywords.
// We use `+ dsl::eof` in the case condition to ensure that `boolean` is not matched as `bool`.
dsl::switch_(dsl::while_one(dsl::ascii::alnum))
    .case_(LEXY_LIT("true")  + dsl::eof >> dsl::id<1>)
    .case_(LEXY_LIT("false") + dsl::eof >> dsl::id<2>)
    .case_(LEXY_LIT("bool")  + dsl::eof >> dsl::id<3>)
    .default_(dsl::id<0>) // It wasn't a reserved keyword but a normal identifier.

// Note: a more efficient and convenient method for handling keywords is planned.
Note
It does not matter if the then of a case does not consume everything the original rule has consumed. As soon as the then has matched, parsing continues from the reader position after the switched rule.

Loops

lexy::dsl::until

lexy/dsl/until.hpp
until(token)          : Token
until(token).or_eof() : Token = until(token / eof)

The until token consumes all input until the specified token matches, then consumes that.

Matches

If the closing token matches, consumes it and succeeds. Otherwise, consumes one code unit and tries again. If EOF is reached, it fails, unless .or_eof() was called, in which case it succeeds after consuming everything until the end of the input.

Errors

It can only fail if the reader has reached the end of the input without matching the token. It then raises the same error that matching the token at EOF would raise.

Example
// Matches a C style comment.
// Note that we don't care what it contains.
LEXY_LIT("/*") >> dsl::until(LEXY_LIT("*/"))
Note
until includes the token.

lexy::dsl::loop

lexy/dsl/loop.hpp
loop(rule) : Rule

break_ : Rule

The loop rule matches the given rule repeatedly until it either fails to match or a break_ rule was matched.

Requires

rule must not produce any values. break_ must be used inside a loop.

Matches

While the rule matches, consumes it and repeats. If a break_ is matched, parsing will stop immediately and it succeeds. If the rule does not match, it fails.

Values

No values are produced.

Errors

Any errors raised when the rule fails to match.

Note
The loop rule is mainly used to implement other rules. It is unlikely that you are going to need it yourself.
Warning
If rule contains a branch that will not consume any characters but does not break, loop will loop forever.
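For example, loop and break_ can express until-like behavior (a sketch):

// Consumes letters until a semicolon is reached; fails on any other character.
dsl::loop(dsl::semicolon >> dsl::break_ | dsl::ascii::alpha)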

lexy::dsl::while_

lexy/dsl/while.hpp
while_(branch) : Rule

The while rule matches a branch as long as its condition matches.

Requires

branch must not produce any values.

Matches

While the branch condition matches, matches and consumes the then, then repeats. Once the branch condition no longer matches, succeeds without consuming additional input.

Values

No values are produced.

Errors

The rule can only fail if the then of the branch fails. Then it will raise its error unchanged.

Warning
If the branch does not consume any characters, while_ will loop forever.

lexy::dsl::while_one()

lexy/dsl/while.hpp
while_one(branch)  : Branch  = branch >> while_(branch)

The while_one rule matches a branch one or more times.

lexy::dsl::do_while()

lexy/dsl/while.hpp
do_while(rule, condition_branch)   : Rule   = rule + while_(condition_branch >> rule)
do_while(branch, condition_branch) : Branch = branch >> while_(condition_branch >> branch)

The do_while rule matches a rule first unconditionally, and then again repeatedly while the rule matches.

Example
// Equivalent to `dsl::list(dsl::ascii::alpha, dsl::sep(dsl::comma))` but does not produce a value.
dsl::do_while(dsl::ascii::alpha, dsl::comma)

lexy::dsl::sep and lexy::dsl::trailing_sep

lexy/dsl/separator.hpp
sep(branch)
sep(branch).trailing_error<Tag>
trailing_sep(branch)

sep and trailing_sep are used to specify a separator between repeated items; they are not rules that can be parsed directly.

Use sep(branch) to indicate that branch has to be consumed between two items. If it would match after the last item (and the item is a branch to allow checking for it), an error is raised. It has the tag lexy::unexpected_trailing_separator unless a different one was specified using .trailing_error.

Use trailing_sep(branch) to indicate that branch has to be consumed between two items and can occur after the final item. If it matches after the last item, it is consumed as well.

lexy::dsl::times

lexy/dsl/times.hpp
namespace lexy
{
    template <std::size_t N, typename T>
    using times = T (&)[N];

    template <typename T>
    using twice = times<2, T>;
}
lexy/dsl/times.hpp
times<N>(rule)      : Rule
times<N>(rule, sep) : Rule

twice(rule)      : Rule = times<2>(rule)
twice(rule, sep) : Rule = times<2>(rule, sep)

The times rule repeats the rule N times with optional separator in between and collects all produced values into an array. The twice rule is a convenience alias for N = 2.

Requires

The separator must not produce any values. All values produced by parsing the rule must have a common type; in particular, the rule must produce exactly one value per repetition.

Matches

If no separator is specified, matches and consumes rule N times. If a separator is specified, matches and consumes rule N times, consuming the separator between two items and potentially after all items if the separator is trailing.

Values

Produces a single array containing N items which are all the values produced by each repetition. The typedef lexy::times or lexy::twice can be used to process that array.

Errors

All errors raised by matching the rule or separator.

Example
// Parses an IPv4 address (four uint8_t values separated by periods).
dsl::times<4>(dsl::integer<std::uint8_t>(dsl::digits<>), dsl::sep(dsl::period))
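The produced array can then be consumed in the value callback of the production (a sketch; ip_address is a hypothetical aggregate):

struct ip_address { std::uint8_t octets[4]; };

struct ipv4
{
    static constexpr auto rule
        = dsl::times<4>(dsl::integer<std::uint8_t>(dsl::digits<>), dsl::sep(dsl::period));

    static constexpr auto value = lexy::callback<ip_address>(
        [](lexy::times<4, std::uint8_t> octets) { // i.e. std::uint8_t (&)[4]
            return ip_address{{octets[0], octets[1], octets[2], octets[3]}};
        });
};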

lexy::dsl::list

lexy/dsl/list.hpp
list(rule)   : Rule
list(branch) : Branch

list(rule, sep)   : Rule
list(branch, sep) : Branch

The list rule matches a rule one or more times, optionally separated by a separator. Values produced by the list items are forwarded to a sink callback.

Branch Condition

Whatever branch uses as branch condition.

Requires

The item rule must be a branch unless a non-trailing separator is used (in that case the separator can be used as condition). A production whose rule contains list() must provide a sink.

Matches

Matches and consumes the item rule one or more times. In between items and potentially after the final item, a separator is matched and consumed if provided according to its rules. If the separator is provided and non-trailing, the existence of a separator determines whether or not the rule should be matched again. Otherwise, the branch condition of the branch rule or an added else branch of the choice rule is used to determine that.

Values

Only a single value, which is the result of the finished sink. Every time the item rule is parsed, all values it produces are passed to the sink which is invoked once per iteration. If the separator is captured, its lexeme is also passed to the sink, but in a separate invocation.

Errors

All errors raised when parsing the item rule or separator.

Example
// Parses a list of integers separated by a (potentially trailing) comma.
// As the separator is trailing, it cannot be used to determine the end of the list.
// As such we peek whether the input contains a digit in our item condition.
// The sink is invoked with each integer.
dsl::list(dsl::peek(dsl::digit<>) >> dsl::integer<int>(dsl::digits<>),
          dsl::trailing_sep(dsl::comma))
Tip
Use one of the bracketing rules if your list item does not have an easy condition and the list is surrounded by given tokens anyway.

lexy::dsl::opt_list

lexy/dsl/list.hpp
opt_list(branch)      : Rule
opt_list(branch, sep) : Rule

The opt_list rule matches a rule zero or more times, optionally separated by a separator. Values produced by the list items are forwarded to a sink callback.

Requires

The item rule must be a branch. A production whose rule contains opt_list() must provide a sink.

Matches

Checks whether the item rule would match using its branch condition. If it does, matches and consumes list(branch, sep). Otherwise, consumes nothing and succeeds.

Values

If the list is non-empty, the result of the sink produced by parsing the list() rule. Otherwise, the result of a sink that is finished immediately without ever being invoked.

Errors

If the list is non-empty, all errors raised by parsing the list() rule.

lexy::dsl::combination

lexy/dsl/combination.hpp
combination(branch1, branch2, ...) : Rule
combination(branch1, branch2, ...).duplicate_error<Tag> : Rule
combination(branch1, branch2, ...).missing_error<Tag> : Rule

The combination rule matches each of the sub-rules exactly once but in any order. Values produced by the rules are forwarded to a sink.

Requires

A production whose rule contains combination() must provide a sink.

Matches

Matches and consumes all rules in an arbitrary order. This is done by parsing the choice created from the branches exactly N times, where N is the number of branches. Branches that have already been taken are not excluded on future iterations. If one is taken again, the rule fails.

Values

Only a single value, which is the result of the finished sink. All values produced by the branches are passed to the sink which is invoked once per iteration.

Errors

All errors raised by parsing the branches. If no branch is matched, but there are still missing branches, a generic error with the tag specified using missing_error is raised, or lexy::exhausted_choice if there was none. If a branch is matched twice, a generic error with the tag specified using duplicate_error is raised, or lexy::combination_duplicate if there was none.

Example
// Matches 'a', 'b', or 'c', in any order.
dsl::combination(dsl::lit_c<'a'>, dsl::lit_c<'b'>, dsl::lit_c<'c'>)
Warning
The branches are tried in order. If an earlier branch always takes precedence over a later one, the combination can never be successful.

lexy::dsl::partial_combination

lexy/dsl/combination.hpp
partial_combination(branch1, branch2, ...) : Rule
partial_combination(branch1, branch2, ...).duplicate_error<Tag> : Rule

The partial_combination rule matches each of the sub-rules at most once but in any order. Values produced by the rules are forwarded to a sink.

Requires

A production whose rule contains partial_combination() must provide a sink.

Matches

Matches and consumes a subset of the rules in an arbitrary order. This is done by parsing the choice created from the branches up to N times, where N is the number of branches. Branches that have already been taken are not excluded on future iterations. If one is taken again, the rule fails. If no branch is taken, the rule succeeds.

Values

Only a single value, which is the result of the finished sink. All values produced by the branches are passed to the sink which is invoked once per iteration.

Errors

All errors raised by parsing the branches. If a branch is matched twice, a generic error with the tag specified using duplicate_error is raised, or lexy::combination_duplicate if there was none.

Example
// Matches a subset of 'a', 'b', or 'c', in any order.
dsl::partial_combination(dsl::lit_c<'a'>, dsl::lit_c<'b'>, dsl::lit_c<'c'>)
Warning
The branches are tried in order. If an earlier branch always takes precedence over a later one, the combination can never be successful.

Productions

Every rule is owned by a production. The following rules allow interaction with other productions.

lexy::dsl::p and lexy::dsl::recurse

lexy/dsl/production.hpp
p<Production> : Rule or Branch
recurse<Production> : Rule

The p and recurse rules parse the rule of another production. The p rule is a branch if the rule of the other production is a branch.

Requires

For p, the Production is a complete type at the point of the rule definition. The recurse rule has no such limitations.

Branch Condition

Whatever the production’s rule uses as a branch condition.

Matches

Matches and consumes Production::rule. If Production inherits from lexy::token_production, automatic whitespace is skipped afterwards by matching and consuming the lexy::dsl::whitespace rule.

Values

A single value, which is the result of parsing the production. All values produced by parsing its rule are forwarded to the production's value callback.

Errors

If matching fails, Production::rule will raise an error which is handled in the context of Production. This results in a failed result object, which is converted to our result type and returned.

Example
// Parse a sub production followed by an exclamation mark.
dsl::p<sub_production> + dsl::lit_c<'!'>
Tip
While recurse can be used to implement direct recursion (e.g. prefix >> dsl::p<current_production> | dsl::else_ >> end to match zero or more prefix followed by end), it is better to use loops instead.
Warning
Left recursion will create an infinite loop.
Caution
If a production is parsed while whitespace skipping has been disabled using lexy::dsl::no_whitespace(), it is temporarily re-enabled while Production::rule is parsed. If whitespace skipping has been disabled because the parent production inherits from lexy::token_production, whitespace skipping is still disabled while parsing Production::rule.

lexy::dsl::return_

lexy/dsl/return.hpp
return_ : Rule

Conceptually, each production has an associated function that parses the specified rule. The return_ rule will exit that function early, without parsing subsequent rules.

Requires

It must not be used inside loops.

Matches

Any input, but does not consume anything. Subsequent rules are not matched further.

Values

It does not produce any values, but all values produced so far are forwarded to the callback.

Errors

n/a (it does not fail)

Example
// Match an opening parenthesis followed by 'a' or 'b'.
// If it is followed by 'b', the closing parenthesis is not matched anymore.
dsl::parenthesized(dsl::lit_c<'a'> | dsl::lit_c<'b'> >> dsl::return_)
Caution
When using return_ together with the context sensitive parsing facilities, remember to pop all context objects before the return.

Brackets and terminator

Terminator

lexy/dsl/terminator.hpp
terminator(branch)
terminator(branch).terminator() : Branch = branch
terminator(branch).limit(token_1, ..., token_n)

A terminator can be specified using terminator(). The result is not a rule, but a DSL for specifying that a rule is followed by the terminator. The terminator is defined using a branch; it is returned by calling .terminator().

The terminator rules do automatic error recovery by matching and consuming input until the terminator is found. The error recovery can be limited by calling .limit(), which behaves like the limit of dsl::recover() or dsl::find().

lexy/dsl/terminator.hpp
t(rule) : Rule = rule + t.terminator()

Calling t(rule), where t is the result of a terminator() call, results in a rule that parses the given rule followed by the terminator.

lexy/dsl/terminator.hpp
t.try_(rule) : Rule

Calling t.try_(rule), where t is the result of a terminator() call, results in a rule that tries to parse the given rule followed by the terminator. If an error occurs while parsing rule, recovers by discarding input until the terminator (or the recovery limit) is found.

lexy/dsl/terminator.hpp
t.while_(rule) : Rule
t.while_one(rule) : Rule

t.opt(rule) : Rule

t.list(rule) : Rule
t.list(rule, sep) : Rule

t.opt_list(rule) : Rule
t.opt_list(rule, sep) : Rule

Using t.while_(), t.while_one(), t.opt(), t.list(), or t.opt_list(), where t is the result of a terminator() call, results in a rule that parses while_(rule), while_one(rule), opt(rule), list(rule), or opt_list(rule), respectively, followed by the terminator. The rule does not need to be a branch, as the terminator is used as the branch condition for the while_(), opt(), and list() rules.
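For example, a terminated statement could look like this (a sketch; expression is a hypothetical production):

constexpr auto terminated = dsl::terminator(dsl::semicolon);

// Parses expression followed by a semicolon, recovering to the semicolon on error.
constexpr auto stmt = terminated.try_(dsl::p<expression>);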

Brackets

lexy/dsl/brackets.hpp
brackets(open_branch, close_branch)
brackets(open_branch, close_branch).open()  : Branch = open_branch
brackets(open_branch, close_branch).close() : Branch = close_branch
brackets(open_branch, close_branch).limit(token_1, ..., token_n)

A set of open and close brackets can be specified using brackets(). The result is not a rule, but a DSL for specifying that a rule is surrounded by brackets. The open and close brackets are defined using branches; they are returned by calling .open() and .close().

The bracket rules do automatic error recovery by matching and consuming input until the closing bracket is found. The error recovery can be limited by calling .limit(), which behaves like the limit of dsl::recover() or dsl::find().

lexy/dsl/brackets.hpp
b(rule) : Branch = b.open() >> rule + b.close()

Calling b(rule), where b is the result of a brackets() call, results in a rule that parses the given rule surrounded by brackets. The rule is a branch that uses the opening bracket as a branch condition.

lexy/dsl/brackets.hpp
b.try_(rule) : Rule

Calling b.try_(rule), where b is the result of a brackets() call, results in a rule that tries to parse the given rule surrounded by brackets. If an error occurs while parsing rule, recovers by discarding input until the closing bracket (or the recovery limit) is found.

lexy/dsl/brackets.hpp
b.while_(rule) : Branch
b.while_one(rule) : Branch

b.opt(rule) : Branch

b.list(rule) : Branch
b.list(rule, sep) : Branch

b.opt_list(rule) : Branch
b.opt_list(rule, sep) : Branch

Using b.while_(), b.while_one(), b.opt(), b.list(), or b.opt_list(), where b is the result of a brackets() call, results in a branch that parses while_(rule), while_one(rule), opt(rule), list(rule), or opt_list(rule), respectively, surrounded by brackets. The rule does not need to be a branch, as the closing bracket is used as the branch condition for the while_(), opt(), and list() rules.

lexy/dsl/brackets.hpp
round_bracketed  = brackets(lit_c<'('>, lit_c<')'>)
square_bracketed = brackets(lit_c<'['>, lit_c<']'>)
curly_bracketed  = brackets(lit_c<'{'>, lit_c<'}'>)
angle_bracketed  = brackets(lit_c<'<'>, lit_c<'>'>)

parenthesized = round_bracketed

Common sets of open and close brackets are pre-defined.

Example
// Parses a list of integers separated by a (potentially trailing) comma, surrounded by parentheses.
// The same example without the parentheses was also used for list,
// but there we required a list condition that needed to perform lookahead.
// Now, the closing parenthesis is used as the condition and no lookahead is needed.
dsl::parenthesized.list(dsl::integer<int>(dsl::digits<>),
                        dsl::trailing_sep(dsl::comma))

Numbers

The facilities for parsing integers are split into the digit tokens, which do not produce any values, and the integer rule, which matches a digit token and converts it into an integer. The integer conversion has to be done during parsing and not in a callback, as overflow creates a parse error.

Base

lexy/dsl/digit.hpp
namespace lexy::dsl
{
    struct binary;
    struct octal;
    struct decimal;
    struct hex_lower;
    struct hex_upper;
    struct hex;
}

The set of allowed digits and their values is specified using a Base, which is a policy class passed to the rules.

binary

Matches the base 2 digits 0 and 1.

octal

Matches the base 8 digits 0-7.

decimal

Matches the base 10 digits 0-9. If no base is specified, this is the default.

hex_lower

Matches the lower-case base 16 digits 0-9 and a-f.

hex_upper

Matches the upper-case base 16 digits 0-9 and A-F.

hex

Matches the base 16 digits 0-9, A-F, and a-f.

lexy::integer_traits

lexy/dsl/integer.hpp
namespace lexy
{
    template <typename T>
    struct integer_traits
    {
        using type = T;

        static constexpr bool is_bounded;

        template <int Radix>
        static constexpr std::size_t max_digit_count;

        template <int Radix>
        static constexpr void add_digit_unchecked(type& result, unsigned digit);
        template <int Radix>
        static constexpr bool add_digit_checked(type& result, unsigned digit);
    };

    template <>
    struct integer_traits<lexy::code_point>;

    template <typename T>
    struct unbounded
    {};
    template <typename T>
    struct integer_traits<unbounded<T>>
    {
        using type                       = typename integer_traits<T>::type;
        static constexpr bool is_bounded = false;

        template <int Radix>
        static constexpr void add_digit_unchecked(type& result, unsigned digit);
    };
}

The lexy::integer_traits are used for parsing an integer. They control its maximal value and abstract away the required integer operations.

The type member is the actual type that will be returned by the parse operation; it is usually T. The parsing algorithm does not require that type is an integer type, it only needs a constructor that initializes it from an int.

If is_bounded is true, parsing requires overflow checking; otherwise, max_digit_count and add_digit_checked are not required. max_digit_count is the number of digits necessary to express the bounded integer's maximal value in the given radix; it must be bigger than 1.

add_digit_unchecked and add_digit_checked add digit to result by doing the equivalent of result = result * Radix + digit. The _checked version returns true if that has led to an integer overflow.

The primary template works with any integer type, and there is a specialization for lexy::code_point. By wrapping your integer type in lexy::unbounded, you can disable bounds checking during parsing: the specialization lexy::integer_traits<unbounded<T>> is built on top of lexy::integer_traits<T>, but disables all bounds checking. You can specialize lexy::integer_traits for your own integer types.
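For example, wrapping the type disables the overflow check (a sketch; the value then wraps around like ordinary unsigned arithmetic):

// Parses any number of digits into a std::uint8_t, ignoring overflow.
dsl::integer<lexy::unbounded<std::uint8_t>>(dsl::digits<>)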

lexy::dsl::zero

lexy/dsl/digit.hpp
zero : Token

The zero token matches the zero digit.

Matches

Matches and consumes the zero digit 0.

Errors

Raises a lexy::expected_char_class error with the name digit.zero.

lexy::dsl::digit

lexy/dsl/digit.hpp
digit<Base> : Token

The digit token matches a digit of the specified base or decimal if no base was specified.

Matches

Matches and consumes any of the valid digits of the base.

Errors

Raises a lexy::expected_char_class error with the name digit.<base>, where <base> is binary, hex-lower, etc.

lexy::dsl::digits

lexy/dsl/digit.hpp
digits<Base> : Token

digits<Base>.sep(token)        : Token
digits<Base>.no_leading_zero() : Token

The digits token matches a non-empty sequence of digits in the specified base or decimal if no base was specified. Calling .sep() adds a digit separator token that may be present at any point between two digits, but is not required. Calling .no_leading_zero() makes the token raise an error if a leading zero is encountered. The calls to .sep() and .no_leading_zero() can be chained.

Matches

Matches and consumes one or more digits of the specified base. If a separator was added, it tries to match it after every digit; the separator is consumed if it matches, but there is no failure if it is absent. If a separator is matched without a following digit, matching fails. If .no_leading_zero() was called, matching fails if the first digit is zero and it is followed by another digit or separator. Matching succeeds as soon as no further digit can be matched after the initial one.

Errors

All errors raised by digit<Base>, which can only happen for the initial digit. Raises a generic error with tag lexy::forbidden_leading_zero if a leading zero was matched.

Example
// Matches upper-case hexadecimal digits separated by ' without leading zeroes.
dsl::digits<dsl::hex_upper>.sep(dsl::digit_sep_tick).no_leading_zero()
Note
The separator can be placed at any point between two digits. It is not validated to follow thousands-separator or similar conventions.

lexy/dsl/digit.hpp
digit_sep_underscore : Token = lit<"_">
digit_sep_tick       : Token = lit<"'">

For convenience, two common digit separators _ and ' are predefined as digit_sep_underscore and digit_sep_tick respectively. However, the digit separator can be an arbitrarily complex token.
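
Example
// A sketch: accept either predefined separator, assuming the alternative
// of two literal tokens is itself a token.
dsl::digits<>.sep(dsl::digit_sep_underscore / dsl::digit_sep_tick)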

lexy::dsl::n_digits

lexy/dsl/digit.hpp
n_digits<N, Base> : Token

n_digits<N, Base>.sep(token) : Token

The n_digits token matches exactly N digits in the specified base or decimal if no base was specified. Calling .sep() allows adding a digit separator token that can be present at any point between two digits, but is not required.

Matches

Matches and consumes exactly N digits of the specified base. If a separator was added, it tries to match it after every digit. It is consumed if it was matched, but it does not fail if no separator was present. If a separator is matched without a following digit, it fails. Separators do not count towards the digit count.

Errors

All errors raised by digit<Base>, which can happen if fewer than N digits are available.

Example
// Matches 4 upper-case hexadecimal digits separated by '.
dsl::n_digits<4, dsl::hex_upper>.sep(dsl::digit_sep_tick)

lexy::dsl::integer

lexy/dsl/integer.hpp
integer<T, Base>(token) : Rule

The integer rule converts the lexeme matched by the token into an integer of type T using the given base. The Base can be omitted if the token is digits or n_digits; it is then deduced from the token.

Matches

Matches and consumes what token matches.

Values

An integer of type T created from the characters the token has consumed. Characters that are not valid digits of the base (e.g. a digit separator) are skipped; every other character is converted to a digit and added to the resulting integer using the lexy::integer_traits.

Errors

Any errors raised by matching the token. If the integer type T is bounded and the integer value would overflow, a generic error with tag lexy::integer_overflow is raised.

Example
// Matches upper-case hexadecimal digits separated by ' without leading zeroes.
// Converts them into an integer, the base is deduced from the token.
dsl::integer<int>(dsl::digits<dsl::hex_upper>
                        .sep(dsl::digit_sep_tick).no_leading_zero())

lexy::dsl::code_point_id

lexy/dsl/integer.hpp
code_point_id<N, Base> : Rule = integer<lexy::code_point>(n_digits<N, Base>) // approximately

The code_point_id rule is a convenience rule that parses a code point. It matches N digits in the specified base, which defaults to hex, and converts them into a code point.

Matches

Matches and consumes exactly N digits of the specified base.

Values

The lexy::code_point that is specified using those digits.

Errors

The same error as digit<Base> if fewer than N digits are available. A generic error with tag lexy::invalid_code_point if the code point value would exceed the maximum code point.
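
Example
// Matches exactly 4 hexadecimal digits and converts them into a code point,
// as needed for the \uXXXX escape sequence.
dsl::code_point_id<4>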

lexy::dsl::plus_sign, lexy::dsl::minus_sign, and lexy::dsl::sign

lexy/dsl/sign.hpp
plus_sign  : Rule
minus_sign : Rule

sign : Rule

The plus_sign, minus_sign, and sign rules match an optional sign.

Matches
plus_sign

Matches and consumes a + character, if there is one.

minus_sign

Matches and consumes a - character, if there is one.

sign

Matches and consumes a + or - character, if there is one.

Values

If a + sign was consumed, the value is +1. If a - sign was consumed, the value is -1. If no sign was consumed, the value is +1.

Errors

n/a (they don’t fail)

Example
// Parse a decimal integer with optional minus sign.
dsl::minus_sign + dsl::integer<int>(dsl::digits<>)
Tip
The callback lexy::as_integer takes the value produced by the sign rules together with an integer produced by the integer rule and negates it if necessary.
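
A minimal sketch of such a production, assuming the usual setup (lexy/dsl.hpp, lexy/callback.hpp, and namespace dsl = lexy::dsl):

struct signed_integer
{
    static constexpr auto rule
        = dsl::minus_sign + dsl::integer<int>(dsl::digits<>);
    // Receives the sign value (+1 or -1) and the parsed integer,
    // and negates the integer if necessary.
    static constexpr auto value = lexy::as_integer<int>;
};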

Delimited and quoted

lexy/dsl/delimited.hpp
delimited(open_branch, close_branch)
delimited(open_branch, close_branch).open()  : Branch = open_branch
delimited(open_branch, close_branch).close() : Branch = close_branch
delimited(open_branch, close_branch).limit(token_1, ..., token_n)

A set of open and close delimiters can be specified using delimited(). The result is not a rule, but a DSL for specifying a sequence of code points to be matched between the delimiters. The open and close delimiters are defined using branches; they are returned by calling .open() and .close().

An optional limit can be specified by calling .limit(). If one of the tokens specified there is found in the input before the closing delimiter has been found, it assumes that there is a missing closing delimiter and raises the appropriate error earlier. This can be used to help error recovery by exiting early instead of consuming the entire input.
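
Example
// A sketch: give up searching for the closing quote at a newline,
// assuming string literals may not span multiple lines.
dsl::delimited(LEXY_LIT("\"")).limit(dsl::newline)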

lexy/dsl/delimited.hpp
delimited(branch) = delimited(branch, branch)

There is a convenience overload if the same rule is used for the open and closing delimiters.

lexy/dsl/delimited.hpp
quoted        = delimited(lit<"\"">)
triple_quoted = delimited(lit<"\"\"\"">)

single_quoted = delimited(lit<"'">)

backticked        = delimited(lit<"`">)
double_backticked = delimited(lit<"``">)
triple_backticked = delimited(lit<"```">)

Common delimiters are predefined.

Note
The naming of quoted, triple_quoted and single_quoted is not very logical, but reflects common usage.

Simple delimited

lexy/dsl/delimited.hpp
d(token) : Branch

Calling d(token), where d is the result of a delimited() call, results in a rule that matches token as often as possible, surrounded by the delimiters. Everything between the delimiters is captured and forwarded to a sink callback.

Requires

A production whose rule contains a delimited rule must provide a sink.

Branch Condition

Whatever the opening delimiter uses as branch condition.

Matching

Matches and consumes the opening delimiter, followed by zero or more occurrences of token, followed by the closing delimiter. It determines whether or not to parse another instance of token using the condition of the closing delimiter. Automatic whitespace skipping is disabled while matching the opening and closing delimiter, as well as token using lexy::dsl::no_whitespace(). After the closing delimiter has been matched and consumed, whitespace is skipped by matching and consuming lexy::dsl::whitespace.

Values

Values produced by the opening delimiter, the finished sink (which might be empty), and values produced by the closing delimiter. Everything captured by matching the token is forwarded to the sink.

Errors

All errors raised when matching the opening delimiter and the token. If EOF is reached without a closing delimiter, a generic error with tag lexy::missing_delimiter is raised.

Example
// Match a string consisting of code points that aren't control characters.
dsl::quoted(dsl::code_point - dsl::ascii::control)
Note
The sink is only invoked once. A sink callback is only required for consistency with the overload that takes an escape sequence.
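
A minimal sketch of a production that provides the required sink, assuming lexy::as_string from lexy/callback.hpp and a UTF-8 input:

struct string_literal
{
    static constexpr auto rule
        = dsl::quoted(dsl::code_point - dsl::ascii::control);
    // The sink that receives everything captured between the quotes.
    static constexpr auto value = lexy::as_string<std::string>;
};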

Delimited with escape sequences

lexy/dsl/delimited.hpp
d(token, escape)  : Branch

This overload is used to specify escape sequences in the delimited. It behaves like the other overload, but also matches the escape rule.

Requires

A production whose rule contains a delimited rule must provide a sink. escape must be a branch.

Branch Condition

Whatever the opening delimiter uses as branch condition.

Matching

Matches and consumes the opening delimiter, followed by zero or more occurrences of escape or token, followed by the closing delimiter. It determines whether or not to parse another instance of token using the condition of the closing delimiter. It first tries to match escape, and only then token. Automatic whitespace skipping is disabled while matching the opening and closing delimiter, as well as token, using lexy::dsl::no_whitespace(). After the closing delimiter has been matched and consumed, whitespace is skipped by matching and consuming lexy::dsl::whitespace.

Values

Values produced by the opening delimiter, the finished sink (which might be empty), and values produced by the closing delimiter. Everything captured by matching the token is forwarded to the sink, as well as all values produced by escape.

Errors

All errors raised when matching the opening delimiter, escape and the token. If EOF is reached without a closing delimiter, a generic error with tag lexy::missing_delimiter is raised.

Example
// Match a string consisting of code points that aren't control characters.
// `\"` can be used to add a `"` to the string.
dsl::quoted(dsl::code_point - dsl::ascii::control,
            LEXY_LIT("\\\"") >> dsl::value_c<'"'>)
Note
The closing delimiter is used as termination condition here as well. If the escape sequence starts with a closing delimiter, it will not be matched.

lexy::dsl::escape()

lexy/dsl/delimited.hpp
escape(token) : Rule

For convenience, the escape rule can be used to specify the escape token.

An escape rule consists of a leading token that matches the escape character (e.g. \), and zero or more alternatives for characters that can be escaped. It is then equivalent to token >> (alt0 | alt1 | alt2 | error<lexy::invalid_escape_sequence>). It is only considered after the leading token has been matched; then it tries to match one of the alternatives. If no alternative matches, it raises a generic error with tag lexy::invalid_escape_sequence.

lexy/dsl/delimited.hpp
e.rule(branch) : Rule
  = escape_token >> ( ... | branch
                      | else_ >> error<lexy::invalid_escape_sequence>)

Calling e.rule(branch), where e is an escape rule, adds branch to the end of the choice.

lexy/dsl/delimited.hpp
e.capture(token) : Rule
  = escape_token >> (... | capture(token)
                      | else_ >> error<lexy::invalid_escape_sequence>)

Calling e.capture(token), where e is an escape rule, adds an escape sequence that matches and captures token to the end of the choice.

lexy/dsl/delimited.hpp
e.lit<Str>(rule) : Rule
  = escape_token >> (... | lit<Str> >> rule
                      | else_ >> error<lexy::invalid_escape_sequence>)
e.lit<Str>() : Rule
  = e.lit<Str>(value_str<Str>)

e.lit_c<C>(rule) : Rule
  = escape_token >> (... | lit_c<C> >> rule
                      | else_ >> error<lexy::invalid_escape_sequence>)
e.lit_c<C>() : Rule
  = e.lit_c<C>(value_c<C>)

Calling e.lit() or e.lit_c(), where e is an escape rule, adds an escape sequence that matches the literal and produces the values of the rule to the end of the choice. If no rule is specified, it defaults to producing the literal itself.

lexy/dsl/delimited.hpp
backslash_escape = escape(lit_c<'\\'>)
dollar_escape    = escape(lit_c<'$'>)

Common escape characters are predefined.

Example
// Match a string consisting of code points that aren't control characters.
// `\"` can be used to add a `"` to the string.
// `\uXXXX` can be used to add the code point with the specified value.
dsl::quoted(dsl::code_point - dsl::ascii::control,
            dsl::backslash_escape
              .lit_c<'"'>()
              .rule(dsl::lit_c<'u'> >> dsl::code_point_id<4>))

Aggregates

lexy/dsl/member.hpp
member<MemPtr> = rule   : Rule
member<MemPtr> = branch : Branch

LEXY_MEM(Name) = rule   : Rule
LEXY_MEM(Name) = branch : Branch

The member rule together with the lexy::as_aggregate<T> callback assigns the values produced by the rule given to it via = to the specified member of the aggregate T.

Requires

A production that contains a member rule needs to use lexy::as_aggregate<T> as sink or callback. The rule must produce exactly one value.

Matches

Matches and consumes the rule given to it via =.

Values

Produces two values. The first value identifies the targeted member. For member<MemPtr>, this is the member pointed to by the member pointer. For LEXY_MEM(Name), it is the member with the given Name. The second value is the value produced by the rule.

Errors

All errors raised during parsing of the assigned rule.

The lexy::as_aggregate<T> callback collects all member and value pairs. It then constructs an object of type T using value initialization and, for each pair, assigns the value to the specified member. It works both as a callback and as a sink. If a member is specified more than once, the value assigned last wins.

Example
// Parses two integers separated by commas.
// The first integer is assigned to a member called `second`,
// the second integer is assigned to a member called `first`.
(LEXY_MEM(second) = dsl::integer<int>(dsl::digits<>))
+ dsl::comma
+ (LEXY_MEM(first) = dsl::integer<int>(dsl::digits<>))
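
A minimal sketch of the surrounding production, assuming a simple aggregate type point:

struct point
{
    int first;
    int second;
};

struct production
{
    static constexpr auto rule
        = (LEXY_MEM(second) = dsl::integer<int>(dsl::digits<>))
        + dsl::comma
        + (LEXY_MEM(first) = dsl::integer<int>(dsl::digits<>));
    // Collects the member/value pairs and assigns them to a point.
    static constexpr auto value = lexy::as_aggregate<point>;
};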

Context sensitive parsing

To parse context sensitive grammars, lexy allows the creation of context variables. They make it possible to save state between different rules, which can be used to parse context sensitive elements such as XML with matching opening and closing tag names.

A context variable has a type, which is limited to bool, int, and lexy::lexeme, and an identifier, which is given by a type. Before a variable can be used, it needs to be created with .create(). It is then available to all rules of the current production; child and parent productions cannot access it. Variables are not persistent between multiple invocations of a production; every time a production is parsed, it starts out with no variables.

See example/xml.cpp for an example that uses the context sensitive parsing facilities.

lexy::dsl::context_flag

lexy/dsl/context_flag.hpp
context_flag<Id>

A lexy::dsl::context_flag controls a boolean that can be true or false. Each object is uniquely identified by the type Id. It is not a rule but a DSL for specifying operations which are then rules.

context_flag<Id>.create() : Rule
context_flag<Id>.create<Value>() : Rule

The .create() rule does not interact with the input at all. When it is parsed, it creates the flag with the given Id and initializes it to the Value (defaulting to false).

context_flag<Id>.set()   : Rule
context_flag<Id>.reset() : Rule

The .set()/.reset() rules do not interact with the input at all. When they are parsed, they set the flag with the given Id to true/false respectively.

context_flag<Id>.toggle() : Rule

The .toggle() rule does not interact with the input at all. When it is parsed, it toggles the value of the flag with the given Id.

context_flag<Id>.select(rule_true, rule_false) : Rule

The .select() rule selects one of the given rules depending on the value of the flag with the given Id. It then parses the selected rule.

Matches

If the value of the flag is true, matches and consumes rule_true. Otherwise, matches and consumes rule_false.

Values

All values produced by parsing the selected rule.

Errors

All errors raised by parsing the selected rule.

context_flag<Id>.require().error<Tag>        : Rule
context_flag<Id>.require<Value>().error<Tag> : Rule

The .require() rule does not interact with the input at all. When it is parsed, it checks that the value of the flag with the given Id is the given Value (defaults to true). If that is the case, parsing continues. Otherwise, the rule fails, producing an error with the given Tag.
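
Example
// A sketch: remember whether a leading minus was present and select a rule
// based on it later; sign_flag is a hypothetical identifier type.
struct sign_flag;

constexpr auto flag = dsl::context_flag<sign_flag>;
constexpr auto rule
    = flag.create()
    + dsl::if_(dsl::lit_c<'-'> >> flag.set())
    + flag.select(LEXY_LIT("negative"), LEXY_LIT("positive"));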

lexy::dsl::context_counter

lexy/dsl/context_counter.hpp
context_counter<Id>

A lexy::dsl::context_counter controls a C++ int. Each object is uniquely identified by the type Id. It is not a rule but a DSL for specifying operations which are then rules.

context_counter<Id>.create() : Rule
context_counter<Id>.create<Value>() : Rule

The .create() rule does not interact with the input at all. When it is parsed, it creates the counter with the given Id and initializes it to the Value (defaulting to 0).

context_counter<Id>.inc() : Rule
context_counter<Id>.dec() : Rule

The .inc()/.dec() rules do not interact with the input at all. When they are parsed, they increment/decrement the counter with the given Id respectively.

context_counter<Id>.push(rule) : Rule
context_counter<Id>.pop(rule)  : Rule

The .push()/.pop() rules parse the given rule. The counter with the given Id is then incremented/decremented by the number of characters (code units) consumed by rule.

Matches

Matches and consumes rule.

Values

All values produced by parsing rule.

Errors

All errors raised by parsing rule.

context_counter<Id>.compare<Value>(rule_less, rule_eq, rule_greater) : Rule

The .compare() rule compares the value of the counter with the given Id to Value. It then parses one of the three given rules, depending on the result.

Matches

If the value of the counter is less than Value, matches and consumes rule_less. If it is equal to Value, matches and consumes rule_eq. If it is greater than Value, matches and consumes rule_greater.

Values

All values produced by parsing the selected rule.

Errors

All errors raised by parsing the selected rule.

context_counter<Id>.require().error<Tag>        : Rule
context_counter<Id>.require<Value>().error<Tag> : Rule

The .require() rule does not interact with the input at all. When it is parsed, it checks that the value of the counter with the given Id is the given Value (defaults to 0). If that is the case, parsing continues. Otherwise, the rule fails, producing an error with the given Tag.
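
Example
// A sketch: match a^n b^n by counting consumed characters; counter_id and
// unequal_count are hypothetical identifier/tag types.
struct counter_id;
struct unequal_count {};

constexpr auto counter = dsl::context_counter<counter_id>;
constexpr auto rule
    = counter.create()
    + counter.push(dsl::while_(dsl::lit_c<'a'>))
    + counter.pop(dsl::while_(dsl::lit_c<'b'>))
    + counter.require().error<unequal_count>;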

lexy::dsl::context_lexeme

lexy/dsl/context_lexeme.hpp
context_lexeme<Id>

A lexy::dsl::context_lexeme controls a lexy::lexeme (i.e. a string view onto part of the input). Each object is uniquely identified by the type Id. It is not a rule but a DSL for specifying operations which are then rules.

context_lexeme<Id>.create() : Rule

The .create() rule does not interact with the input at all. When it is parsed, it creates the lexeme with the given Id and initializes it to an empty view.

context_lexeme<Id>.capture(rule) : Rule

The .capture() rule parses the given rule. The lexeme with the given Id is then set to view everything the rule has consumed, as if lexy::dsl::capture() was used.

Matches

Matches and consumes rule.

Values

All values produced by parsing rule.

Errors

All errors raised by parsing rule.

context_lexeme<Id>.require(rule).error<Tag> : Rule

The .require() rule parses the given rule, capturing it in a temporary lexeme. The temporary lexeme is then compared with the lexeme given by the Id. If the two lexemes are equal, parsing continues. Otherwise, the rule fails, producing an error with the given Tag.

Matches

Matches and consumes rule.

Values

Discards values produced by rule.

Errors

All errors raised by parsing rule. A generic error with the given Tag if the two lexemes are not equal.
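
Example
// A sketch: require that a closing name matches the opening one, as in XML
// tags; tag_id and tag_mismatch are hypothetical identifier/tag types.
struct tag_id;
struct tag_mismatch {};

constexpr auto name = dsl::while_one(dsl::ascii::alpha);
constexpr auto id   = dsl::context_lexeme<tag_id>;
constexpr auto rule
    = id.create()
    + id.capture(name) + dsl::lit_c<'/'>
    + id.require(name).error<tag_mismatch>;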

Raw input

The following facilities are meant for parsing input that uses the lexy::byte_encoding, that is, input consisting of bytes, not text.

lexy::dsl::bom

lexy/dsl/bom.hpp
bom<Encoding, Endianness> : Token

The bom token matches the byte-order mark (BOM) for the given encoding and lexy::encoding_endianness.

Requires

Endianness is lexy::encoding_endianness::little or lexy::encoding_endianness::big.

Matches

If the encoding has a BOM, matches and consumes the BOM written in the given endianness.

Errors

A lexy::expected_char_class error with the name BOM.<encoding>-<endianness> if the BOM was not matched.

Example
// Matches the UTF-16 big endian BOM (0xFE, 0xFF).
dsl::bom<lexy::utf16_encoding, lexy::encoding_endianness::big>
Note
There is a UTF-8 BOM, but it is the same regardless of endianness.
Note
This rule is only necessary when you have a raw encoding that contains a BOM. For example, lexy::read_file() already handles BOMs for you by default.

lexy::dsl::encode

lexy/dsl/encode.hpp
encode<Encoding, Endianness>(rule) : Rule

The encode rule temporarily changes the encoding of the input. The specified rule will be matched using a Reader whose encoding is Encoding converted from the raw bytes using the specified endianness. If no Endianness is specified, the default is lexy::encoding_endianness::bom, and a BOM is matched on the input to determine the endianness. If no BOM is present, big endian is assumed.

Requires

The input’s encoding is a single-byte encoding (usually lexy::byte_encoding).

Matches

If the endianness is lexy::encoding_endianness::bom, matches and consumes an optional BOM to determine the endianness. Then matches and consumes rule. However, the input of rule consists of characters according to Encoding and Endianness, not the single bytes of the actual input.

Values

All values produced by the rule.

Errors

All errors raised by the rule. The error type uses the original reader, not the encoded reader that does the input translation.

Example
// Matches a UTF-8 code point, followed by an ASCII code point.
dsl::encode<lexy::utf8_encoding>(dsl::code_point)
    + dsl::encode<lexy::ascii_encoding>(dsl::code_point)

Custom rules

The exact interface for the Rule, Token and Branch concepts is currently still experimental. Refer to the existing rules if you want to add your own.

Glossary

Branch

A rule that has an associated condition and will only be taken if the condition matches. It is used to make decisions in the parsing algorithm.

Callback

A function object with a return_type member typedef.

Encoding

Set of pre-defined classes that define the text encoding of the input.

Error Callback

The callback used to report errors.

Grammar

An entry production and all productions referenced by it.

Input

Defines the input that will be parsed.

Production

Building-block of a grammar consisting of a rule and an optional callback that produces the parsed value.

Rule

Matches a specific input and then produces a value or an error.

Sink

A type with a sink() method that then returns a function object that can be called multiple times.

Token

A rule that is an atomic building block of the input.