Header `lexy/dsl/bom.hpp`

Literal rule `lexy::dsl::bom`

namespace lexy::dsl
{
    template <typename Encoding, lexy::encoding_endianness Endianness>
    constexpr literal-rule auto bom;
}

`bom` is a literal rule that matches the byte order mark (BOM) of the given encoding in the given `lexy::encoding_endianness`.

It is equivalent to the `lexy::dsl::lit_b` rule given in the table below. If the combination of `Encoding` and `Endianness` is not listed, it succeeds without matching anything.

The possible BOMs

Encoding	Endianness	BOM
UTF-8	ignored	`lit_b<0xEF, 0xBB, 0xBF>`
UTF-16	little	`lit_b<0xFF, 0xFE>`
UTF-16	big	`lit_b<0xFE, 0xFF>`
UTF-32	little	`lit_b<0xFF, 0xFE, 0x00, 0x00>`
UTF-32	big	`lit_b<0x00, 0x00, 0xFE, 0xFF>`
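
The table can be read as a lookup from an (encoding, endianness) pair to a byte sequence. The following is a minimal standalone sketch of that lookup and is not lexy's implementation: the names `encoding`, `endianness`, `bom_bytes`, and `match_bom` are hypothetical and exist only for illustration. It reports how many bytes a match would consume; a combination that is not listed consumes nothing and still succeeds.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-ins for lexy's encoding tags and lexy::encoding_endianness.
enum class encoding { ascii, utf8, utf16, utf32 };
enum class endianness { little, big };

// Returns the BOM bytes from the table; empty for combinations that are not listed.
constexpr std::string_view bom_bytes(encoding enc, endianness end)
{
    using namespace std::string_view_literals;
    switch (enc)
    {
    case encoding::utf8:
        return "\xEF\xBB\xBF"sv; // endianness is ignored
    case encoding::utf16:
        return end == endianness::little ? "\xFF\xFE"sv : "\xFE\xFF"sv;
    case encoding::utf32:
        return end == endianness::little ? "\xFF\xFE\x00\x00"sv : "\x00\x00\xFE\xFF"sv;
    default:
        return ""sv; // not listed: nothing to match
    }
}

// Returns the number of bytes consumed, or std::nullopt if a listed BOM is absent.
constexpr std::optional<std::size_t> match_bom(std::string_view input, encoding enc,
                                               endianness end)
{
    const std::string_view bom = bom_bytes(enc, end);
    if (input.substr(0, bom.size()) == bom)
        return bom.size();
    return std::nullopt;
}
```

Under these assumptions, `match_bom` on input beginning with `EF BB BF` and `encoding::utf8` consumes three bytes, while `encoding::ascii` always succeeds with zero bytes consumed.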

Example 1. Skip an optional UTF-8 BOM

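#include <lexy/dsl.hpp>

namespace dsl = lexy::dsl;

struct production
{
    static constexpr auto rule = [] {
        // Endianness is ignored for UTF-8, so either value works here.
        auto bom = dsl::bom<lexy::utf8_encoding,
                            lexy::encoding_endianness::little>;
        // The BOM is optional; the greeting and end of input are required.
        return dsl::opt(bom) + LEXY_LIT("Hello!") + dsl::eof;
    }();
};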

Caution

As a token rule, it also skips whitespace immediately following the BOM when automatic whitespace skipping is enabled. The rule is therefore best used in contexts where automatic whitespace skipping is disabled.

Note	When using `lexy::read_file()` as input, the BOM is already handled by default.
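
As a rough illustration of what handling a BOM at load time can involve, the sketch below inspects the first bytes of a buffer, reports which BOM from the table is present, and returns the remaining input. It is a standalone sketch and does not show how `lexy::read_file()` is implemented; `detected_bom`, `bom_entry`, `bom_result`, `starts_with`, and `strip_bom` are hypothetical names introduced for this example.

```cpp
#include <cstddef>
#include <string_view>

using namespace std::string_view_literals;

enum class detected_bom { none, utf8, utf16_little, utf16_big, utf32_little, utf32_big };

struct bom_entry
{
    std::string_view bytes;
    detected_bom     kind;
};

struct bom_result
{
    detected_bom     kind;
    std::string_view rest; // input with the BOM removed
};

// UTF-32 entries come before UTF-16 because FF FE 00 00 also starts with FF FE.
// This is a heuristic: UTF-16 little endian text whose first code unit is U+0000
// would be reported as UTF-32 little endian.
constexpr bom_entry bom_table[] = {
    {"\xEF\xBB\xBF"sv, detected_bom::utf8},
    {"\xFF\xFE\x00\x00"sv, detected_bom::utf32_little},
    {"\x00\x00\xFE\xFF"sv, detected_bom::utf32_big},
    {"\xFF\xFE"sv, detected_bom::utf16_little},
    {"\xFE\xFF"sv, detected_bom::utf16_big},
};

// True if `input` begins with `prefix`.
inline bool starts_with(std::string_view input, std::string_view prefix)
{
    return input.substr(0, prefix.size()) == prefix;
}

// Reports the first matching BOM and the input that follows it.
inline bom_result strip_bom(std::string_view input)
{
    for (const auto& entry : bom_table)
    {
        if (starts_with(input, entry.bytes))
            return {entry.kind, input.substr(entry.bytes.size())};
    }
    return {detected_bom::none, input};
}
```

A caller would pass the raw file contents to `strip_bom` and parse only `rest`, which is why a grammar fed by such an input would not need a `bom` rule of its own.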
