Header lexy/input/buffer.hpp

Inputs that read from a buffer in memory.

Input lexy::buffer

lexy/input/buffer.hpp
namespace lexy
{
    template <encoding Encoding       = default_encoding,
              typename MemoryResource = default-resource>
    class buffer
    {
    public:
        using encoding  = Encoding;
        using char_type = typename encoding::char_type;

        //=== construction ===//
        class builder;

        static buffer adopt(const char_type* data, std::size_t size,
                            MemoryResource* resource = default-resource);

        constexpr buffer() noexcept;
        constexpr explicit buffer(MemoryResource* resource) noexcept;

        template <typename CharT>
        explicit buffer(const CharT* data, std::size_t size,
                        MemoryResource* resource = default-resource);
        template <typename CharT>
        explicit buffer(const CharT* begin, const CharT* end,
                        MemoryResource* resource = default-resource);
        template <typename View>
        explicit buffer(const View&     view,
                        MemoryResource* resource = default-resource);

        buffer(const buffer& other, MemoryResource* resource);

        //=== access ===//
        const char_type* data() const noexcept;
        std::size_t      size() const noexcept;

        const char_type* release() && noexcept;

        reader auto reader() const& noexcept;
    };
}

The class buffer is an immutable, owning variant of lexy::string_input.

All memory allocation is done via a MemoryResource object, which must be a class with the same interface as std::pmr::memory_resource. By default, it uses new and delete. The memory resource object passed to the constructor does not propagate during copy/move/swap.
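
To make the interface requirement concrete, the following is a minimal sketch of a class that mirrors the public members of std::pmr::memory_resource without deriving from it. The name counting_resource and its counters are hypothetical and only serve the illustration; the sketch assumes that allocate(), deallocate() and is_equal() are the members that matter, and it only supports fundamental alignments.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical resource type (not part of lexy): it offers the same public
// member functions as std::pmr::memory_resource and counts what it hands out.
class counting_resource
{
public:
    void* allocate(std::size_t bytes, std::size_t alignment = alignof(std::max_align_t))
    {
        // Simplification: only fundamental alignments are supported here.
        assert(alignment <= alignof(std::max_align_t));
        ++allocations;
        live_bytes += bytes;
        return ::operator new(bytes);
    }

    void deallocate(void* ptr, std::size_t bytes,
                    std::size_t alignment = alignof(std::max_align_t))
    {
        assert(alignment <= alignof(std::max_align_t));
        assert(live_bytes >= bytes);
        live_bytes -= bytes;
        ::operator delete(ptr);
    }

    // Two resources compare equal if memory from one can be freed by the other.
    bool is_equal(const counting_resource& other) const noexcept
    {
        return this == &other;
    }

    std::size_t allocations = 0; // number of allocate() calls so far
    std::size_t live_bytes  = 0; // bytes allocated and not yet returned
};
```

Under that assumption, a pointer to such an object would be passed as the resource argument of a buffer whose MemoryResource parameter is counting_resource. The object has to outlive every buffer that uses it.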

Example 1. Build a buffer that contains the input
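#include <cstdio>
#include <utility>

#include <lexy/action/match.hpp>
#include <lexy/input/buffer.hpp>

// `production` is assumed to be a grammar production defined elsewhere.
int main()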
{
    // Create a buffered input.
    auto input = [] {
        lexy::buffer<lexy::utf8_encoding>::builder builder(2);
        builder.data()[0] = 'H';
        builder.data()[1] = 'i';
        return std::move(builder).finish();
    }();

    // Use the input.
    if (!lexy::match<production>(input))
    {
        std::puts("Error!\n");
        return 1;
    }
}
Tip
As the buffer owns the input, it can terminate it with the EOF character for encodings that have the same character and integer type. This eliminates a branch during parsing, because there is no need to check for the end of the buffer. It also enables the use of SWAR techniques for faster parsing.
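
The effect of the terminating character can be sketched without lexy. The two hypothetical functions below count leading ASCII digits: the first checks the position on every step, the second relies on a sentinel stored directly after the data. Here a '\0' byte plays the role of the EOF character; which value lexy actually writes is not specified in this section.

```cpp
#include <cassert>
#include <cstddef>

// Bounds-checked scan: every iteration tests the position and the character.
inline std::size_t count_digits_checked(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::size_t n = 0;
    while (n != size && data[n] >= '0' && data[n] <= '9')
        ++n;
    return n;
}

// Sentinel scan: the caller guarantees that the data is followed by a byte
// that is never a digit ('\0' in this sketch), so the character test alone
// is enough to stop the loop.
inline std::size_t count_digits_sentinel(const char* data)
{
    assert(data != nullptr);
    std::size_t n = 0;
    while (data[n] >= '0' && data[n] <= '9')
        ++n;
    return n;
}
```

Both functions return the same count for the same input; the second one simply has one comparison less per character, which is the branch the tip refers to.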

Empty constructors

lexy/input/buffer.hpp
constexpr buffer() noexcept;
constexpr explicit buffer(MemoryResource* resource) noexcept;

Both overloads construct an empty buffer. The first one requires that MemoryResource is the default-resource and uses that one; the second one uses the specified resource.

Once the resource is set, it cannot be changed; assignment will only update the memory contents, not the resource.

Pointer constructors

lexy/input/buffer.hpp
template <typename CharT>
explicit buffer(const CharT* data, std::size_t size,
                MemoryResource* resource = default-resource); (1)
template <typename CharT>
explicit buffer(const CharT* begin, const CharT* end,
                MemoryResource* resource = default-resource); (2)

template <typename CharT>
buffer(const CharT* data, std::size_t size)
  -> buffer<deduce_encoding<CharT>>;
template <typename CharT>
buffer(const CharT* begin, const CharT* end)
  -> buffer<deduce_encoding<CharT>>;

  1. Use the contiguous range [data, data + size) as input.

  2. Use the contiguous range [begin, end) as input.

CharT must be the primary or secondary character type of the encoding. Both overloads use resource for memory allocation. CTAD can be used to deduce the encoding from the character type.

View constructor

lexy/input/buffer.hpp
template <typename View>
    requires requires (View view) {
        view.data();
        view.size();
    }
explicit buffer(const View&     view,
                MemoryResource* resource = default-resource);

template <typename View>
buffer(const View&)
  -> buffer<deduce-encoding>;

Use the contiguous range [view.data(), view.data() + view.size()) as input. Its character type must be the primary or secondary character type of the encoding. It uses resource for memory allocation. CTAD can be used to deduce the encoding from the character type.

Builder

lexy/input/buffer.hpp
class buffer::builder
{
public:
    explicit builder(std::size_t     size,
                     MemoryResource* resource = default-resource);

    char_type*  data() const noexcept;
    std::size_t size() const noexcept;

    buffer finish() && noexcept;
};

Write the buffer contents incrementally.

The constructor allocates memory for size code units using resource, but does not initialize them. Content can then be written into the memory range [data(), data() + size()). Once everything has been initialized, finish() returns the finalized (and from now on immutable) buffer.
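
The allocate, fill, finish sequence can be sketched with a small stand-in type. mini_buffer and make_greeting below are hypothetical and not lexy types; the sketch ignores memory resources and the EOF terminator and only shows how finish() turns writable memory into an immutable buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical stand-in for the buffer/builder pair described above.
class mini_buffer
{
public:
    class builder
    {
    public:
        // Allocates `size` characters without initializing them.
        explicit builder(std::size_t size) : _data(new char[size]), _size(size) {}

        char*       data() const noexcept { return _data.get(); }
        std::size_t size() const noexcept { return _size; }

        // Hands the memory over to the buffer; rvalue-qualified so that the
        // builder is visibly consumed at the call site.
        mini_buffer finish() && noexcept
        {
            return mini_buffer(std::move(_data), _size);
        }

    private:
        std::unique_ptr<char[]> _data;
        std::size_t             _size;
    };

    // Only read access is available once the buffer exists.
    const char* data() const noexcept { return _data.get(); }
    std::size_t size() const noexcept { return _size; }

private:
    mini_buffer(std::unique_ptr<char[]> data, std::size_t size) noexcept
    : _data(std::move(data)), _size(size)
    {}

    std::unique_ptr<char[]> _data;
    std::size_t             _size;
};

// Writes the content in one step and finalizes the buffer.
inline mini_buffer make_greeting()
{
    static const char text[] = "Hello";
    mini_buffer::builder b(5);
    assert(b.size() == sizeof(text) - 1);
    std::memcpy(b.data(), text, b.size());
    return std::move(b).finish();
}
```

Writing directly into the builder's memory avoids the extra copy that would be needed if the content were first assembled elsewhere and then passed to a pointer constructor.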

Adoption

lexy/input/buffer.hpp
static buffer adopt(const char_type* data, std::size_t size,
                    MemoryResource* resource = default-resource);

const char_type* release() && noexcept;

release() returns a pointer to the data of the buffer and relinquishes ownership over it; adopt() reconstructs a buffer object.

Note
data must be the pointer returned by an earlier call to release(), with size and resource matching the original buffer object.
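
The ownership hand-off can be illustrated with a simplified stand-in. owned_chars and round_trip below are hypothetical and not lexy types; the sketch leaves out memory resources and only shows the contract that adopt() receives exactly the pointer and size obtained from an earlier release().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical owning buffer: release() gives up ownership of the memory,
// adopt() takes it back from the pointer and size released earlier.
class owned_chars
{
public:
    owned_chars(const char* text, std::size_t size) : _data(new char[size]), _size(size)
    {
        std::memcpy(_data, text, size);
    }
    owned_chars(owned_chars&& other) noexcept
    : _data(std::exchange(other._data, nullptr)), _size(std::exchange(other._size, 0))
    {}
    owned_chars(const owned_chars&)            = delete;
    owned_chars& operator=(const owned_chars&) = delete;
    ~owned_chars() { delete[] _data; }

    static owned_chars adopt(const char* data, std::size_t size)
    {
        owned_chars result;
        result._data = const_cast<char*>(data); // memory originally came from new[]
        result._size = size;
        return result;
    }

    const char* data() const noexcept { return _data; }
    std::size_t size() const noexcept { return _size; }

    const char* release() && noexcept
    {
        _size = 0;
        return std::exchange(_data, nullptr);
    }

private:
    owned_chars() noexcept : _data(nullptr), _size(0) {}

    char*       _data;
    std::size_t _size;
};

// Round trip: between release() and adopt() the raw pointer could be stored
// in a C struct or passed through a callback that cannot hold C++ objects.
inline owned_chars round_trip(owned_chars buffer)
{
    const std::size_t size = buffer.size();
    const char*       raw  = std::move(buffer).release();
    assert(buffer.data() == nullptr); // ownership has left the object
    return owned_chars::adopt(raw, size);
}
```

The size has to be remembered separately before calling release(), since the raw pointer carries no length information.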

Function lexy::make_buffer_from_raw

lexy/input/buffer.hpp
namespace lexy
{
    template <encoding Encoding, encoding_endianness Endianness>
    struct make-buffer-from-raw
    {
        auto operator()(const void* memory, std::size_t size) const
          -> buffer<Encoding>;

        template <typename MemoryResource>
        auto operator()(const void* memory, std::size_t size,
                        MemoryResource* resource) const
          -> buffer<Encoding, MemoryResource>;
    };

    template <encoding Encoding, encoding_endianness Endianness>
    constexpr auto make_buffer_from_raw = make-buffer-from-raw{};
}

Create a buffer from raw memory, handling endianness conversion if necessary.

It returns a buffer object that contains the input of the range [memory, memory + size), allocated using resource, but reinterpreted as code units of the specified encoding and in the specified lexy::encoding_endianness:

  • If Endianness is lexy::encoding_endianness::little/lexy::encoding_endianness::big, it will reinterpret the memory as an array of code units of Encoding, performing a byte swap if necessary. For single-byte encodings, this doesn’t do anything special.

  • If Endianness is lexy::encoding_endianness::bom, Encoding must be UTF-8, UTF-16, or UTF-32. It skips an optional BOM and uses it to determine the endianness, defaulting to big endian if no BOM is present. It then behaves as in the first case.
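
The two cases above can be sketched for UTF-16 in plain C++. byte_order and decode_utf16 are hypothetical names and this is not lexy's implementation: instead of copying and byte-swapping, the sketch assembles each code unit from its two bytes, which gives the same result independent of the host's endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class byte_order { little, big, bom };

// Decodes raw bytes as UTF-16 code units in the requested byte order.
inline std::vector<char16_t> decode_utf16(const unsigned char* bytes, std::size_t size,
                                          byte_order order)
{
    assert(size % 2 == 0); // UTF-16 code units are two bytes each
    std::size_t pos = 0;
    if (order == byte_order::bom)
    {
        // Default to big endian unless a little-endian BOM (FF FE) is found.
        order = byte_order::big;
        if (size >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        {
            order = byte_order::little;
            pos   = 2; // skip the BOM
        }
        else if (size >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        {
            pos = 2; // big-endian BOM: skip it, order stays big
        }
    }

    std::vector<char16_t> result;
    result.reserve((size - pos) / 2);
    for (; pos + 1 < size; pos += 2)
    {
        // Pick the low and high byte according to the byte order.
        const unsigned lo = order == byte_order::little ? bytes[pos] : bytes[pos + 1];
        const unsigned hi = order == byte_order::little ? bytes[pos + 1] : bytes[pos];
        result.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return result;
}
```

For example, the bytes 48 00 69 00 decode to the code units 'H' and 'i' as little endian, and the bytes FE FF 00 48 decode to a single 'H' in BOM mode.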

Example 2. Treat a memory mapped file as little endian UTF-16
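#include <cstdio>

#include <lexy/action/match.hpp>
#include <lexy/input/buffer.hpp>

// `production` and `map_file()` are assumed to be defined elsewhere;
// `map_file()` is a placeholder that returns the mapped memory and its size.
int main()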
{
    // Map a file into memory.
    auto span = map_file("input.txt");

    // Treat the file as little endian UTF-16.
    constexpr auto make_utf16_le
        = lexy::make_buffer_from_raw<lexy::utf16_encoding, lexy::encoding_endianness::little>;
    auto input = make_utf16_le(span.memory, span.size);

    // Use the input.
    if (!lexy::match<production>(input))
    {
        std::puts("Error!\n");
        return 1;
    }
}

Function lexy::make_buffer_from_input

lexy/input/buffer.hpp
namespace lexy
{
    template <input Input, typename MemoryResource = default-resource>
    auto make_buffer_from_input(const Input& input, MemoryResource* resource = default-resource)
      -> buffer<encoding-of-input<Input>, MemoryResource>;
}

Returns a buffer that contains the same characters as the specified input.

The result is a copy of the existing input allocated using the specified resource.

Note
Using a buffer instead of another input type can make parsing more efficient, as lexy can use specialized algorithms that exploit the guarantees the buffer makes.

Convenience typedefs

lexy/input/buffer.hpp
namespace lexy
{
    template <encoding Encoding = default_encoding,
              typename MemoryResource = default-resource>
    using buffer_lexeme = lexeme_for<buffer<Encoding, MemoryResource>>;

    template <typename Tag,
              encoding Encoding = default_encoding,
              typename MemoryResource = default-resource>
    using buffer_error = error_for<buffer<Encoding, MemoryResource>, Tag>;

    template <encoding Encoding = default_encoding,
              typename MemoryResource = default-resource>
    using buffer_error_context = error_context<buffer<Encoding, MemoryResource>>;
}

Convenience typedefs for buffer.

See also