Header lexy/dsl/bom.hpp

Literal rule lexy::dsl::bom

lexy/dsl/bom.hpp
namespace lexy::dsl
{
    template <typename Encoding, lexy::encoding_endianness Endianness>
    constexpr literal-rule auto bom;
}

bom is a literal rule that matches the byte order mark (BOM) of the given encoding in the given lexy::encoding_endianness.

It is equivalent to the lexy::dsl::lit_b rule listed for that combination in the table below. If the combination of Encoding and Endianness is not listed, it succeeds without matching anything.

The possible BOMs

Encoding   Endianness   BOM
UTF-8      ignored      lit_b<0xEF, 0xBB, 0xBF>
UTF-16     little       lit_b<0xFF, 0xFE>
UTF-16     big          lit_b<0xFE, 0xFF>
UTF-32     little       lit_b<0xFF, 0xFE, 0x00, 0x00>
UTF-32     big          lit_b<0x00, 0x00, 0xFE, 0xFF>
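To make the table and the fallback concrete, here is a standalone C++17 sketch that does not use lexy at all. The enums `encoding` and `endianness` and the functions `bom_bytes` and `match_bom` are hypothetical stand-ins, not part of lexy's API; they only mirror the behaviour described above: a listed combination must match its byte sequence exactly, and an unlisted one succeeds while consuming nothing.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-ins for lexy's encoding types and lexy::encoding_endianness.
// `other` represents any encoding that has no entry in the table.
enum class encoding { utf8, utf16, utf32, other };
enum class endianness { little, big };

// Looks up the table row for a combination; an empty vector means "not listed".
std::vector<std::uint8_t> bom_bytes(encoding enc, endianness order)
{
    switch (enc)
    {
    case encoding::utf8:
        return {0xEF, 0xBB, 0xBF}; // endianness is ignored for UTF-8
    case encoding::utf16:
        if (order == endianness::little)
            return {0xFF, 0xFE};
        return {0xFE, 0xFF};
    case encoding::utf32:
        if (order == endianness::little)
            return {0xFF, 0xFE, 0x00, 0x00};
        return {0x00, 0x00, 0xFE, 0xFF};
    default:
        return {}; // combination not listed: nothing to match
    }
}

// Mimics the matching behaviour: returns the number of bytes consumed on success,
// or std::nullopt if the required BOM is absent. Unlisted combinations consume 0.
std::optional<std::size_t> match_bom(const std::vector<std::uint8_t>& input,
                                     encoding enc, endianness order)
{
    const auto bom = bom_bytes(enc, order);
    if (input.size() < bom.size())
        return std::nullopt;
    for (std::size_t i = 0; i != bom.size(); ++i)
        if (input[i] != bom[i])
            return std::nullopt;
    return bom.size();
}
```

For example, `match_bom({0xFF, 0xFE, 0x41, 0x00}, encoding::utf16, endianness::little)` yields 2, while the same call with `endianness::big` yields `std::nullopt`.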

Example 1. Skip an optional UTF-8 BOM
#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

struct production
{
    static constexpr auto rule = [] {
        auto bom = dsl::bom<lexy::utf8_encoding,
                            // Doesn't matter for UTF-8.
                            lexy::encoding_endianness::little>;
        return dsl::opt(bom) + LEXY_LIT("Hello!") + dsl::eof;
    }();
};
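Without lexy, the grammar of the example roughly corresponds to the hand-written check below. It is only an illustrative sketch for UTF-8 input stored in a `std::string_view`; the function `matches_hello` is hypothetical, and it assumes no whitespace is skipped (the example's production defines none).

```cpp
#include <string_view>

// Hypothetical helper: a hand-written equivalent of
// dsl::opt(bom) + LEXY_LIT("Hello!") + dsl::eof for UTF-8 input.
bool matches_hello(std::string_view input)
{
    constexpr std::string_view utf8_bom = "\xEF\xBB\xBF";

    // dsl::opt(bom): consume the BOM if it is present, otherwise continue.
    if (input.substr(0, utf8_bom.size()) == utf8_bom)
        input.remove_prefix(utf8_bom.size());

    // LEXY_LIT("Hello!") followed by dsl::eof: the rest must be exactly "Hello!".
    return input == "Hello!";
}
```

Both `"Hello!"` and `"\xEF\xBB\xBFHello!"` are accepted, while trailing input such as `"Hello!!"` is rejected, as `dsl::eof` would reject it.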
Caution
As a token rule, it skips whitespace immediately following the BOM when automatic whitespace skipping is enabled. The rule is therefore best used in contexts where automatic whitespace skipping is disabled.
Note
When using lexy::read_file() as input, the BOM is handled by default.
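If you read raw bytes yourself instead of using lexy::read_file(), something has to inspect the leading bytes. The following standalone sketch is not lexy's implementation; `sniff_bom` and `detected_bom` are hypothetical names. It recognizes the five BOMs from the table above and reports how many bytes to skip.

```cpp
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <string_view>
#include <vector>

// Hypothetical result type: the detected BOM and how many bytes it occupies.
struct detected_bom
{
    std::string_view name; // e.g. "UTF-16LE"; empty if no BOM was found
    std::size_t size;      // number of leading bytes to skip
};

// Hypothetical helper: inspects the first bytes of a buffer for a known BOM.
detected_bom sniff_bom(const std::vector<std::uint8_t>& data)
{
    auto starts_with = [&data](std::initializer_list<std::uint8_t> prefix) {
        if (data.size() < prefix.size())
            return false;
        std::size_t i = 0;
        for (std::uint8_t byte : prefix)
            if (data[i++] != byte)
                return false;
        return true;
    };

    // UTF-32LE must be tested before UTF-16LE, because FF FE is a prefix of FF FE 00 00.
    if (starts_with({0x00, 0x00, 0xFE, 0xFF}))
        return {"UTF-32BE", 4};
    if (starts_with({0xFF, 0xFE, 0x00, 0x00}))
        return {"UTF-32LE", 4};
    if (starts_with({0xEF, 0xBB, 0xBF}))
        return {"UTF-8", 3};
    if (starts_with({0xFE, 0xFF}))
        return {"UTF-16BE", 2};
    if (starts_with({0xFF, 0xFE}))
        return {"UTF-16LE", 2};
    return {"", 0};
}
```

Even with this ordering, a UTF-16LE file whose first character is U+0000 cannot be told apart from UTF-32LE by its BOM alone.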

See also