Header lexy/callback/string.hpp

Callback and sink lexy::as_string

lexy/callback/string.hpp
namespace lexy
{
    template <typename String, encoding Encoding = deduce-encoding-from-string>
    constexpr auto as_string;
}

Callback and sink to construct the given String.

The Encoding parameter is only relevant when as_string needs to encode a lexy::code_point. By default, it is deduced from the character type of String. The character type of String must be compatible with Encoding, i.e. it must be the primary or one of the secondary character types of Encoding.

As a callback, it has the following overloads, where A is typename String::allocator_type:

(lexy::nullopt)

Returns an empty string by default-constructing it.

(String&& str)

Forwards an existing string unchanged.

(lexy::lexeme<Reader> lex) and (const A& allocator, lexy::lexeme<Reader> lex)

Requires that the character type of lex is compatible with the character type of Encoding. If the iterator type of lex is a pointer whose value type is the same as the character type of String, returns String(lex.data(), lex.size()); otherwise, returns String(lex.begin(), lex.end()). The second version passes the allocator as last parameter.

(Iterator begin, Iterator end) and (const A& allocator, Iterator begin, Iterator end)

Constructs a string from an iterator range directly.

(lexy::code_point cp) and (const A& allocator, lexy::code_point cp)

Encodes cp in the Encoding, which must be ASCII, UTF-8, UTF-16, or UTF-32. Returns String(begin, end), where [begin, end) is an iterator range to the encoded representation of cp. The second version passes the allocator as last parameter.
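To make the code point overload more concrete, the following is a minimal sketch of what encoding a code point as UTF-8 and building the string from the resulting range can look like. It uses only the standard library; encode_utf8 is a hypothetical helper for illustration and is not lexy's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of lexy: encodes a Unicode scalar value
// as UTF-8 and returns the encoded bytes as a std::string.
inline std::string encode_utf8(std::uint32_t cp)
{
    // Precondition: cp is in range and not a surrogate.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    char buffer[4];
    int  size = 0;
    if (cp < 0x80)
    {
        // One byte: plain ASCII.
        buffer[size++] = static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        // Two bytes: 110xxxxx 10xxxxxx.
        buffer[size++] = static_cast<char>(0xC0 | (cp >> 6));
        buffer[size++] = static_cast<char>(0x80 | (cp & 0x3F));
    }
    else if (cp < 0x10000)
    {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
        buffer[size++] = static_cast<char>(0xE0 | (cp >> 12));
        buffer[size++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        buffer[size++] = static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
        buffer[size++] = static_cast<char>(0xF0 | (cp >> 18));
        buffer[size++] = static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        buffer[size++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        buffer[size++] = static_cast<char>(0x80 | (cp & 0x3F));
    }

    // Same shape as the documented String(begin, end) construction.
    return std::string(buffer, buffer + size);
}
```

With this sketch, U+20AC (the euro sign) becomes the three bytes E2 82 AC. A String with a wider character type would instead receive UTF-16 or UTF-32 code units.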

As a sink, .sink() can be called with zero arguments or with one argument of type String::allocator_type. In the first case, it default-constructs an empty string. In the second case, it constructs the string using the allocator. The resulting sink callback has the following overloads and returns the finished string:

(CharT c)

Calls .push_back(c) on the resulting string. Requires that this is well-formed.

(String&& str)

Calls .append(std::move(str)) on the resulting string.

(lexy::lexeme<Reader> lex)

Requires that the character type of lex is compatible with the character type of Encoding. Calls .append(lex.begin(), lex.end()) on the resulting string.

(Iterator begin, Iterator end)

Calls .append(begin, end).

(lexy::code_point cp)

Encodes cp in the Encoding, which must be ASCII, UTF-8, UTF-16, or UTF-32. Calls .append(begin, end), where [begin, end) is an iterator range to the encoded representation of cp, on the resulting string.
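The sink behavior described above can be sketched with a small standard-library class. string_sink, its finish() member, and sink_example are hypothetical names chosen for illustration; the class only mirrors the documented push_back and append calls for std::string and is not lexy's implementation. The lexeme and code point overloads are omitted.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the sink callback; not lexy's implementation.
class string_sink
{
public:
    // Zero arguments: start from a default-constructed, empty string.
    string_sink() = default;
    // One argument: construct the string using the given allocator.
    explicit string_sink(const std::string::allocator_type& alloc) : _result(alloc) {}

    // (CharT c): append a single character.
    void operator()(char c) { _result.push_back(c); }
    // (String&& str): append an existing string.
    void operator()(std::string&& str) { _result.append(std::move(str)); }
    // (Iterator begin, Iterator end): append an iterator range.
    template <typename Iterator>
    void operator()(Iterator begin, Iterator end) { _result.append(begin, end); }

    // Returns the finished string.
    std::string finish() && { return std::move(_result); }

private:
    std::string _result;
};

// Usage: the sink is invoked once per parsed item, then finished.
inline std::string sink_example()
{
    string_sink sink;
    sink('a');
    sink(std::string("bc"));
    const char tail[] = {'d', 'e'};
    sink(tail, tail + 2);

    std::string result = std::move(sink).finish();
    assert(result == "abcde");
    return result;
}
```

Each call adds to the same string, which is why a sink suits rules such as lists that produce many values, while the callback overloads each build a complete string from one set of arguments.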

It also provides a member function .case_folding() that accepts a case folding DSL object like lexy::dsl::ascii::case_folding or lexy::dsl::unicode::simple_case_folding. It returns a callback where each resulting string is case folded before it is returned. It requires that Encoding is ASCII, UTF-8, UTF-16, or UTF-32.
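As a rough illustration of what case folding the result means in the ASCII case, the sketch below lowercases the letters A to Z and leaves every other character unchanged. ascii_case_fold and same_identifier are hypothetical helpers written against the standard library only; they assume that ASCII case folding maps uppercase to lowercase and do not show lexy's implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of lexy: folds ASCII uppercase letters
// to lowercase and leaves all other characters untouched.
inline std::string ascii_case_fold(std::string str)
{
    for (char& c : str)
    {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');
    }
    return str;
}

// Two spellings name the same identifier if they agree after folding.
inline bool same_identifier(const std::string& lhs, const std::string& rhs)
{
    return ascii_case_fold(lhs) == ascii_case_fold(rhs);
}
```

Folding the produced string once means later comparisons and lookups can use ordinary equality instead of a case-insensitive comparison at every use.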

Example 1. Build a list of characters
struct production
{
    static constexpr auto rule = [] {
        auto item = dsl::capture(dsl::ascii::alpha);
        auto sep  = dsl::sep(dsl::comma);
        return dsl::list(item, sep);
    }();

    static constexpr auto value = lexy::as_string<std::string>;
};
Example 2. Parse a string literal
struct production
{
    // A mapping of the simple escape sequences to their replacement values.
    static constexpr auto escaped_symbols = lexy::symbol_table<char> //
                                                .map<'"'>('"')
                                                .map<'\\'>('\\')
                                                .map<'/'>('/')
                                                .map<'b'>('\b')
                                                .map<'f'>('\f')
                                                .map<'n'>('\n')
                                                .map<'r'>('\r')
                                                .map<'t'>('\t');

    static constexpr auto rule = [] {
        // Arbitrary code points that aren't control characters.
        auto c = -dsl::ascii::control;

        // Escape sequences start with a backslash.
        // They either map one of the symbols,
        // or a Unicode code point of the form uXXXX.
        auto escape = dsl::backslash_escape //
                          .symbol<escaped_symbols>()
                          .rule(dsl::lit_c<'u'> >> dsl::code_point_id<4>);
        return dsl::quoted(c, escape);
    }();

    // Need to specify a target encoding to handle the code point.
    static constexpr auto value = lexy::as_string<std::string, lexy::utf8_encoding>;
};
Example 3. Parse a case-insensitive identifier
struct production
{
    static constexpr auto rule = dsl::identifier(dsl::ascii::alpha);

    static constexpr auto value
        = lexy::as_string<std::string, lexy::ascii_encoding>.case_folding(dsl::ascii::case_folding);
};
Note
lexy::as_string<std::string_view> is a valid callback that can convert a lexy::lexeme to a std::string_view, provided that the character types are an exact match and that the iterators of the input are pointers.
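The reason for these two conditions can be seen in a standard-library sketch of the construction rule given for the lexeme overload: a pointer range of the matching character type is turned into (pointer, size), which is the form std::string_view can be built from, while any other iterator falls back to the range constructor. make_string is a hypothetical helper, not a lexy function, and requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical helper, not part of lexy: builds a String from an
// iterator range following the rule described for the lexeme overload.
template <typename String, typename Iterator>
String make_string(Iterator begin, Iterator end)
{
    using char_type = typename String::value_type;
    if constexpr (std::is_pointer_v<Iterator>
                  && std::is_same_v<std::remove_cv_t<std::remove_pointer_t<Iterator>>,
                                    char_type>)
    {
        // Pointer to the exact character type: use (pointer, size).
        assert(begin <= end);
        return String(begin, static_cast<std::size_t>(end - begin));
    }
    else
    {
        // Any other iterator: use the (begin, end) range constructor.
        return String(begin, end);
    }
}
```

For example, make_string<std::string_view>(input, input + 3) on a const char buffer yields a view into that buffer without copying, so the view is only valid while the buffer is alive. make_string<std::string>(chars.begin(), chars.end()) on a std::list<char> copies the characters through the range constructor.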

See also