Header lexy/action/scan.hpp

Class lexy::scan_result

lexy/action/scan.hpp
namespace lexy
{
    struct scan_failed_t {};
    constexpr scan_failed_t scan_failed;

    template <typename T>
    class scan_result
    {
    public:
        using value_type = T;

        constexpr scan_result() = default;
        constexpr scan_result(scan_failed_t);

        constexpr scan_result(T&& value);      // T != void only
        constexpr scan_result(bool has_value); // T == void only

        constexpr bool has_value() const noexcept;
        constexpr explicit operator bool() const noexcept
        {
            return has_value();
        }

        // T != void only
        constexpr const T& value() const& noexcept;
        constexpr T&& value() && noexcept;

        // T != void only
        template <typename U = T>
        constexpr U value_or(U&& fallback) const& noexcept;
        template <typename U = T>
        constexpr U value_or(U&& fallback) && noexcept;
    };
}

The result of an individual scanning operation.

Like std::optional<T>, it is either empty or stores a T. Unlike std::optional<T>, it supports void and is immutable.
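
The optional-like behavior can be illustrated with a small stand-alone model. This is only a sketch: `result_model` is a hypothetical name, it is built on std::optional for brevity, and it is not how lexy implements lexy::scan_result.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Hypothetical stand-in for lexy::scan_result<T>; not lexy's implementation.
template <typename T>
class result_model
{
public:
    constexpr result_model() = default; // empty result
    constexpr result_model(T&& value) : _value(std::move(value)) {}

    constexpr bool has_value() const noexcept { return _value.has_value(); }
    constexpr explicit operator bool() const noexcept { return has_value(); }

    // Read-only access: the model exposes no mutating accessors.
    constexpr const T& value() const noexcept
    {
        assert(_value.has_value()); // precondition: must not be empty
        return *_value;
    }

    template <typename U = T>
    constexpr T value_or(U&& fallback) const
    {
        return _value ? *_value : static_cast<T>(std::forward<U>(fallback));
    }

private:
    std::optional<T> _value;
};

// The void case stores no value; it only records success or failure.
template <>
class result_model<void>
{
public:
    constexpr result_model() = default;
    constexpr result_model(bool has_value) : _has_value(has_value) {}

    constexpr bool has_value() const noexcept { return _has_value; }
    constexpr explicit operator bool() const noexcept { return has_value(); }

private:
    bool _has_value = false;
};
```

A default-constructed `result_model<int>` is empty, so `value_or(-1)` yields the fallback, while `result_model<void>(true)` reports success without storing anything.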

Common scanner interface

lexy/action/scan.hpp
namespace lexy
{
    template <reader Reader>
    class scanner-common
    {
    public:
        using encoding = typename Reader::encoding;

        scanner-common(const scanner-common&) = delete;
        scanner-common& operator=(const scanner-common&) = delete;

        //=== status access ===//
        constexpr explicit operator bool() const noexcept;

        constexpr bool is_at_eof() const;

        constexpr auto position() const noexcept
          -> typename Reader::iterator;

        constexpr input auto remaining_input() const noexcept;

        //=== parsing ===//
        template <typename T>
        constexpr void parse(scan_result<T>& result, rule auto rule);

        template <production Production>
        constexpr auto parse(Production production = {})
          -> scan_result<return-type-of-value>;

        constexpr void parse(rule auto rule);

        //=== branch parsing ===//
        template <typename T>
        constexpr bool branch(scan_result<T>& result, branch-rule auto rule);

        template <production Production>
        constexpr bool branch(scan_result<return-type-of-value>& result,
                              Production production = {});

        constexpr bool branch(branch-rule auto rule);

        //=== error handling ===//
        class error_recovery_guard;

        constexpr error_recovery_guard error_recovery();

        constexpr bool discard(token-rule auto rule);

        template <typename Tag, typename... Args>
        constexpr void error(Tag tag, Args&&... args);
        template <typename Tag, typename... Args>
        constexpr void fatal_error(Tag tag, Args&&... args);

        //=== convenience ===//
        // Forwards to parse(result, rule) overload.
        template <typename T>
        constexpr scan_result<T> parse(rule auto rule);

        // Equivalent to branch(dsl::peek(rule)).
        constexpr bool peek(rule auto rule);

        // Equivalent to parse(result, dsl::integer<T, Base>(digits)).
        template <typename T, typename Base>
        constexpr scan_result<T> integer(token-rule auto digits);
        template <typename T>
        constexpr scan_result<T> integer(digits-dsl  digits);
        template <typename T>
        constexpr scan_result<T> integer(ndigits-dsl digits);

        // Forwards to parse(rule) (without value!) and then returns a lexeme.
        constexpr scan_result<lexeme<Reader>> capture(rule auto rule);
        // Forwards to parse(result, dsl::capture_token(rule)).
        constexpr scan_result<lexeme<Reader>> capture_token(token-rule auto rule);
    };
}

The scanner interface common to lexy::scanner and lexy::rule_scanner.

A scanner allows parsing rules manually for full control. It internally stores a reader, which remembers the current position in the input, and it can be in one of three states:

  1. The ok state, which is the default one. In that state, the scanner can be used to parse rules.

  2. The error state, which is entered when a rule fails to parse. In that state, any further attempt to parse a rule has no effect. Error recovery can be used to return from it to the ok state.

  3. The recovery state, which is entered by error recovery. In that state, the scanner can be used to parse rules to recover.

Status access

lexy/action/scan.hpp
constexpr explicit operator bool() const noexcept;

Returns true if the scanner is currently in the ok state, false otherwise (error or recovery).

lexy/action/scan.hpp
constexpr bool is_at_eof() const;

Returns true if the current position of the reader is at EOF, false otherwise.

lexy/action/scan.hpp
constexpr auto position() const noexcept
  -> typename Reader::iterator;

Returns an iterator to the current position of the reader.

Caution
The iterator must only be dereferenced if is_at_eof() == false.

lexy/action/scan.hpp
constexpr input auto remaining_input() const noexcept;

Returns a new input that can be used to access the input from position() until EOF.

Parsing

lexy/action/scan.hpp
template <typename T>
constexpr void parse(scan_result<T>& result, rule auto rule);

template <production Production>
constexpr auto parse(Production production = {})
  -> scan_result<return-type-of-value>;

constexpr void parse(rule auto rule);

Parses the given rule.

If the scanner is in the error state, immediately returns without doing anything. This makes it unnecessary to check for errors after each parse step.

Otherwise, parses the rule beginning at the current reader position. If that succeeds, consumes everything consumed by rule, generating tokens in the parse tree as necessary, and returns. If it fails, consumes everything rule had already consumed before failing and puts the scanner in the error state.

The first overload parses the rule as if the parse action lexy::parse was used, regardless of the actual parse action used at the top level. If rule parses a child production P, it invokes the P::value callback as necessary to produce a value. When the rule succeeds, all arguments produced by rule are passed to lexy::construct<T> and the result is stored in result.

The second overload parses the production; it is equivalent to parse(result, dsl::p<Production>). The production can be specified by an explicit template argument or by passing an object as parameter.

The third overload parses the rule as if the parse action lexy::match was used; no value is produced and child productions do not need a ::value member.
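
The "no effect in the error state" behavior means a sequence of parse calls needs only a single check at the end. The following stand-alone sketch models that with a toy scanner over a std::string_view. `toy_scanner`, `parse_char`, `parse_integer`, and `sum_pair` are hypothetical names chosen for illustration; none of this is lexy code, and integer overflow is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Toy scanner with a sticky error state; a hypothetical model, not lexy's scanner.
class toy_scanner
{
public:
    explicit toy_scanner(std::string_view input) : _input(input) {}

    // True while no parse step has failed.
    explicit operator bool() const noexcept { return !_failed; }

    // Consumes `c` if it is next; otherwise enters the error state.
    void parse_char(char c)
    {
        if (_failed)
            return; // error state: do nothing
        if (_pos < _input.size() && _input[_pos] == c)
            ++_pos;
        else
            _failed = true;
    }

    // Consumes one or more decimal digits and stores their value in `result`.
    void parse_integer(std::optional<int>& result)
    {
        if (_failed)
            return; // error state: do nothing, `result` stays empty
        const std::size_t start = _pos;
        int value = 0;
        while (_pos < _input.size() && _input[_pos] >= '0' && _input[_pos] <= '9')
            value = value * 10 + (_input[_pos++] - '0');
        if (_pos == start)
            _failed = true; // no digit found
        else
            result = value;
    }

private:
    std::string_view _input;
    std::size_t      _pos    = 0;
    bool             _failed = false;
};

// Parses "<int>,<int>" and checks for errors only once, at the end.
inline std::optional<int> sum_pair(std::string_view input)
{
    toy_scanner scanner(input);
    std::optional<int> x, y;
    scanner.parse_integer(x);
    scanner.parse_char(',');
    scanner.parse_integer(y);
    if (!scanner)
        return std::nullopt;
    assert(x && y); // both were set because no step failed
    return *x + *y;
}
```

With the input "12;30", the comma step fails, the second integer step does nothing, and the single check at the end reports the failure.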

Branch parsing

lexy/action/scan.hpp
template <typename T>
constexpr bool branch(scan_result<T>& result, branch-rule auto rule);

template <production Production>
constexpr bool branch(scan_result<return-type-of-value>& result,
                      Production production = {});

constexpr bool branch(branch-rule auto rule);

Branch parses the given rule.

If the scanner is in the error state, immediately returns false without doing anything, as the branch cannot be taken. As with .parse(), this makes it unnecessary to check for errors after each parse step.

Otherwise, branch parses the rule beginning at the current reader position. If that backtracks, the reader is not advanced and it returns false. If that succeeds, consumes everything consumed by rule, generating tokens in the parse tree as necessary, and returns true. If it fails after the branch was taken, consumes everything rule had already consumed and puts the scanner in the error state. It still returns true in that case, as parsing had already committed to the branch and only failed later.

Similar to .parse(), the first overload produces a value, the second overload parses a production, and the third overload does not produce a value.

Note
scanner.branch(condition) ? scanner.parse(a) : scanner.parse(b) is entirely equivalent to scanner.parse(condition >> a | dsl::else_ >> b).
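
The three possible outcomes of branch parsing (backtracked, succeeded, failed after committing) can be made concrete with a stand-alone sketch. It models a hypothetical branch rule "a '-' followed by one or more digits" on a plain std::string_view; `branch_outcome` and `branch_negative_number` are illustrative names, not part of lexy.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

struct branch_outcome
{
    bool        taken;    // what branch() would return
    bool        failed;   // whether the scanner would end up in the error state
    std::size_t consumed; // number of characters consumed
};

// Models branch-parsing "'-' then digits" at the start of `input`.
inline branch_outcome branch_negative_number(std::string_view input)
{
    // Condition: a leading '-'. If it is missing, backtrack without consuming.
    if (input.empty() || input[0] != '-')
        return {false, false, 0};

    // The condition matched, so the branch is committed from here on.
    std::size_t pos = 1;
    while (pos < input.size() && input[pos] >= '0' && input[pos] <= '9')
        ++pos;

    // No digit after '-': the rule fails after committing. The '-' stays
    // consumed and the result is still "taken".
    const bool failed = (pos == 1);
    return {true, failed, pos};
}
```

For "42" the branch is not taken and nothing is consumed, for "-42" it is taken and succeeds, and for "-x" it is taken but leaves the model in the error state with the '-' consumed.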

Error handling

lexy/action/scan.hpp
class error_recovery_guard
{
public:
    error_recovery_guard(const error_recovery_guard&) = delete;
    error_recovery_guard& operator=(const error_recovery_guard&) = delete;

    constexpr void cancel() &&;
    constexpr void finish() &&;
};

constexpr error_recovery_guard error_recovery();

Allows recovery from the error state.

Calling .error_recovery() is only allowed while the scanner is in the error state. It puts the scanner in the recovery state and returns a new error_recovery_guard object.

The scanner can then be used to try to recover from the error. If that succeeds, calling .finish() on the error_recovery_guard object puts the scanner in the ok state. Otherwise, calling .cancel() resets the scanner back to the error state. Any input already consumed during recovery stays consumed.
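
The transitions between the three scanner states can be summarized as a small state machine. This is a stand-alone sketch with hypothetical names (`scan_state`, `state_model`); it models only the transitions described in the text and is not lexy's implementation. In particular, what a rule failure during recovery does is not modelled.

```cpp
#include <cassert>

enum class scan_state { ok, error, recovery };

// Hypothetical model of the scanner state transitions; not lexy code.
class state_model
{
public:
    scan_state state() const noexcept { return _state; }

    // A rule failed to parse while the scanner was in the ok state.
    void rule_failed()
    {
        assert(_state == scan_state::ok); // only ok -> error is modelled here
        _state = scan_state::error;
    }

    // Corresponds to calling .error_recovery().
    void begin_recovery()
    {
        assert(_state == scan_state::error); // only allowed after a failure
        _state = scan_state::recovery;
    }

    // Corresponds to .finish() on the guard: recovery succeeded.
    void finish_recovery()
    {
        assert(_state == scan_state::recovery);
        _state = scan_state::ok;
    }

    // Corresponds to .cancel() on the guard: recovery was abandoned.
    void cancel_recovery()
    {
        assert(_state == scan_state::recovery);
        _state = scan_state::error;
    }

private:
    scan_state _state = scan_state::ok;
};
```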

Example 1. Manually parse an integer surrounded by quotes
struct production : lexy::scan_production<int>
{
    template <typename Reader, typename Context>
    static constexpr scan_result scan(lexy::rule_scanner<Context, Reader>& scanner)
    {
        // Parse the initial quote.
        scanner.parse(dsl::lit_c<'"'>);
        if (!scanner)
            return lexy::scan_failed;

        // Parse an integer.
        lexy::scan_result<int> integer;
        scanner.parse(integer, dsl::integer<int>(dsl::digits<>));
        if (!scanner)
            return lexy::scan_failed;

        // Parse the closing quote.
        scanner.parse(dsl::lit_c<'"'>);
        if (!scanner)
        {
            // Recover by discarding everything until a closing quote is found.
            auto recovery = scanner.error_recovery();
            while (!scanner.branch(dsl::lit_c<'"'>))
            {
                if (!scanner.discard(dsl::ascii::character))
                {
                    // We've failed to recover.
                    LEXY_MOV(recovery).cancel();
                    return lexy::scan_failed;
                }
            }
            LEXY_MOV(recovery).finish();
        }

        return integer.value();
    }
};

lexy/action/scan.hpp
constexpr bool discard(token-rule auto rule);

Parses a token rule and discards it by producing an error token.

If the scanner is in the error state, returns false without doing anything. Otherwise, attempts to match rule at the current reader position. If that consumes a non-zero amount of input, generates an error token. It returns true if matching was successful, false otherwise.

Note
It is meant to be called during error recovery only.

lexy/action/scan.hpp
template <typename Tag, typename... Args>
constexpr void error(Tag tag, Args&&... args);

template <typename Tag, typename... Args>
constexpr void fatal_error(Tag tag, Args&&... args);

Raise a lexy::error.

Both functions construct a lexy::error object with the specified Tag from the specified arguments and forward it to the handler. fatal_error() then puts the scanner in the error state; error() leaves the state unchanged.

Action lexy::scan

lexy/action/scan.hpp
namespace lexy
{
    template <production ControlProduction,
              input Input, typename ParseState, error-callback ErrorCallback>
    class scanner
    : public scanner-common
    {
    public:
        constexpr const ParseState& parse_state() const;

        constexpr auto finish() && -> lexy::validate_result<ErrorCallback>;
    };

    template <production ControlProduction = void>
    constexpr scanner scan(const input auto& input,
                           error-callback auto error_callback);

    template <production ControlProduction = void, typename ParseState>
    constexpr scanner scan(const input auto& input,
                           const ParseState& parse_state,
                           error-callback auto error_callback);
}

A parse action that allows manual parsing of an input.

Unlike the other actions, it does not directly parse a given production on the input. Instead, it returns a scanner object that allows manual control over the parsing process.

The scanner object starts parsing the input from the beginning using the same handler as lexy::validate internally. It implements the lexy::scanner-common interface for parsing individual rules. During parsing, any errors will be forwarded to the error callback. .finish() can be called at the end to return the result in a lexy::validate_result object, whose status corresponds to the scanner state as follows:

  • If the scanner is in the ok state and no errors have been reported to the error callback, is_success() will return true.

  • If the scanner is in the ok state but errors have been reported, is_recovered_error() will return true.

  • Otherwise, if the scanner is not in the ok state, is_fatal_error() will return true.
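
The mapping in the list above can be written as a single function. This is a stand-alone sketch: `scan_status` and `final_status` are hypothetical names, and the function only restates the three cases; it is not how lexy::validate_result is implemented.

```cpp
#include <cassert>
#include <cstddef>

enum class scan_status { success, recovered_error, fatal_error };

// Maps the final scanner state and the number of reported errors to the
// status that the result would report. Hypothetical helper, not lexy code.
constexpr scan_status final_status(bool scanner_ok, std::size_t reported_errors)
{
    if (!scanner_ok)
        return scan_status::fatal_error; // scanner is not in the ok state
    return reported_errors == 0 ? scan_status::success
                                : scan_status::recovered_error;
}
```

For instance, a scanner that recovered from two errors and ended in the ok state corresponds to `final_status(true, 2)`, which is `scan_status::recovered_error`.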

If the error callback does not return an interesting result, .finish() does not need to be called.

An optional ControlProduction can be specified. It is used to specify whitespace for automatic whitespace skipping, the recursion depth for lexy::dsl::recurse, and other metadata of the "grammar" being parsed. It does not need a ::rule member; if it has one, it is ignored.

Example 2. Use lexy as a verbose std::scanf replacement
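#include <cstdio>

#include <lexy/action/scan.hpp>
#include <lexy/dsl.hpp>
#include <lexy_ext/compiler_explorer.hpp>
#include <lexy_ext/report_error.hpp>

namespace dsl = lexy::dsl;

struct control_production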
{
    // Allow ASCII whitespace.
    static constexpr auto whitespace = dsl::ascii::space;
};

int main()
{
    // Construct a scanner for the input.
    auto input   = lexy_ext::compiler_explorer_input();
    auto scanner = lexy::scan<control_production>(input, lexy_ext::report_error);

    // Parse two integers separated by comma.
    auto x = scanner.integer<int>(dsl::digits<>);
    scanner.parse(dsl::comma);
    auto y = scanner.integer<int>(dsl::digits<>);

    std::printf("%d, %d", x.value_or(-1), y.value_or(-1));
}

Tip
See shell.cpp for an example that uses lexy::scan() to handle parsing directives that don’t directly belong to the grammar.

Note
Use lexy::dsl::scan if you want to manually parse some production of your grammar.

Caution
The overload that takes a parse_state internally stores a pointer to it. As such, parse_state must live as long as the lexy::scanner object.

See also