Warming up — the first parser
Using lexy takes three steps:
- Define the grammar.
- Create an input.
- Call a parse action.
Let’s apply it to parse HTML colors such as #FF00FF into the following struct:
#include <cstdint>

struct Color
{
    std::uint8_t r, g, b;
};
1. Define the grammar
A grammar consists of productions.
Each production defines a rule, which controls what is being parsed, and produces a value.
In lexy, productions are structs with a static constexpr member called rule.
The rule is defined by the DSL: lexy provides simple rules that can be composed to parse more complex things.
A grammar consists of several of those structs; it is recommended to put them all in a separate namespace:
#include <lexy/dsl.hpp> // (1)

namespace
{
namespace grammar // (2)
{
    namespace dsl = lexy::dsl; // (3)

    struct color { … }; // (4)
}
}
(1) All DSL objects are defined in lexy/dsl.hpp.
(2) We use a dedicated namespace to define our grammar. Since the grammar isn’t meant to be visible outside the current translation unit, we also put it in an anonymous namespace. This can encourage the compiler to inline more aggressively during parsing.
(3) For convenience, we alias the namespace of the DSL objects.
(4) Productions are structs.
Let’s start simple by parsing a single color channel, that is, two hex digits.
For that, there is the rule lexy::dsl::n_digits: it parses N digits in the specified base.
As such, a production that parses two hex digits looks like this:
struct channel // (1)
{
    static constexpr auto rule = dsl::n_digits<2, dsl::hex>; // (2)
};
(1) We define a new production channel…
(2) …that matches two hex digits.
A color then consists of a hash sign followed by three channels.
A hash sign can be parsed by lexy::dsl::hash_sign (which is just a convenience alias for the general lexy::dsl::lit rule, which parses a fixed string); a channel can be parsed by lexy::dsl::p, which parses another production.
If we want to parse two or more rules in sequence, we can use lexy::dsl::operator+:
struct color
{
    static constexpr auto rule = dsl::hash_sign + dsl::p<channel> + dsl::p<channel> + dsl::p<channel>;
};
That’s a lot of repetition, so we can use lexy::dsl::times instead, which parses another rule N times in sequence.
The grammar as written also allows arbitrary trailing input after the last channel, such as "#FF00FF Hello World!".
To prevent that, we need to match lexy::dsl::eof after the last channel, which only succeeds at the end of the input.
struct color
{
    static constexpr auto rule = dsl::hash_sign + dsl::times<3>(dsl::p<channel>) + dsl::eof;
};
Putting it together, we have a simple grammar that parses an HTML color.
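For intuition, here is a plain C++ sketch, without lexy, of the input this color rule accepts. It is not how lexy implements the rule; the namespace sketch and the functions match_hash, match_channel, and match_color are hypothetical names chosen for illustration, and each piece is commented with the lexy rule it mirrors.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Illustrative sketch only (not lexy's implementation) of what
//   dsl::hash_sign + dsl::times<3>(dsl::p<channel>) + dsl::eof
// accepts. The helpers advance `pos` past whatever they matched.
namespace sketch
{
// Mirrors dsl::hash_sign: match the literal '#'.
bool match_hash(std::string_view input, std::size_t& pos)
{
    if (pos < input.size() && input[pos] == '#')
    {
        ++pos;
        return true;
    }
    return false;
}

// Mirrors the channel production: exactly two hex digits (either case).
bool match_channel(std::string_view input, std::size_t& pos)
{
    for (int i = 0; i != 2; ++i)
    {
        if (pos >= input.size() || !std::isxdigit(static_cast<unsigned char>(input[pos])))
            return false;
        ++pos;
    }
    return true;
}

// Mirrors the full color rule: a sequence (operator+), a repetition
// (times<3>), and a final end-of-input check (eof).
bool match_color(std::string_view input)
{
    std::size_t pos = 0;
    if (!match_hash(input, pos))
        return false;
    for (int i = 0; i != 3; ++i) // dsl::times<3>(dsl::p<channel>)
        if (!match_channel(input, pos))
            return false;
    return pos == input.size(); // dsl::eof: nothing may follow the last channel
}
} // namespace sketch
```

With this sketch, "#FF00FF" is accepted, while "#FFF" (too few digits) and "#FF00FF Hello World!" (trailing input, rejected by the end-of-input check) are not.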
2. Create an input
You can’t parse something directly; you have to wrap it in one of the provided input classes. They also take care of the encoding where necessary.
- Strings (lexy::string_input)
#include <lexy/input/string_input.hpp>
auto literal = lexy::zstring_input("#FF00FF");
auto str = lexy::string_input(some_string);
- Files (lexy::read_file)
#include <lexy/input/file.hpp>
auto file = lexy::read_file<lexy::utf8_encoding>(path);
if (!file) { … }
auto input = file.buffer();
- Command-line arguments (lexy::argv_input)
#include <lexy/input/argv_input.hpp>
auto input = lexy::argv_input(argc, argv);
- Iterator ranges (lexy::range_input)
#include <lexy/input/range_input.hpp>
auto input = lexy::range_input<lexy::ascii_encoding>(begin, end);
3. Call a parse action
Once you have defined a grammar and an input, you invoke an action that reads the input and processes it according to the grammar.
The simplest action is lexy::match, which just gives you true if the input matches the grammar and false otherwise:
#include <lexy/action/match.hpp>
auto good = lexy::zstring_input("#FF00FF");
CHECK(lexy::match<grammar::color>(good) == true);
auto bad = lexy::zstring_input("#FFF");
CHECK(lexy::match<grammar::color>(bad) == false);
If you want to figure out why it didn’t match, you can use lexy::validate.
It takes an additional error callback that is invoked with each error, which you can use to tell the user what went wrong.
The extension library provides such a callback, lexy_ext::report_error, which formats the error message nicely and prints it to stderr:
#include <lexy/action/validate.hpp>
#include <lexy_ext/report_error.hpp>
auto bad = lexy::zstring_input("#FFF");
auto result = lexy::validate<grammar::color>(bad, lexy_ext::report_error);
CHECK(result.is_error());
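The difference between a yes/no match and validation with an error callback can be sketched in plain C++ as well. The ColorError type and the validate_color function below are hypothetical and much simpler than lexy's error types; they only illustrate the idea of reporting where and why matching failed through a caller-supplied callback.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical error type; lexy's real errors carry more information.
struct ColorError
{
    std::size_t position;     // offset into the input where matching failed
    std::string_view message; // what was expected at that position
};

// Sketch of the match/validate split: the same matching logic as a plain
// yes/no matcher, but on failure it also invokes `on_error` with details.
template <typename ErrorCallback>
bool validate_color(std::string_view input, ErrorCallback on_error)
{
    std::size_t pos = 0;
    if (pos >= input.size() || input[pos] != '#')
    {
        on_error(ColorError{pos, "expected '#'"});
        return false;
    }
    ++pos;
    for (int digit = 0; digit != 6; ++digit) // three channels of two hex digits each
    {
        if (pos >= input.size() || !std::isxdigit(static_cast<unsigned char>(input[pos])))
        {
            on_error(ColorError{pos, "expected hex digit"});
            return false;
        }
        ++pos;
    }
    if (pos != input.size())
    {
        on_error(ColorError{pos, "expected end of input"});
        return false;
    }
    return true;
}
```

For "#FFF", the callback receives position 4 and "expected hex digit", since the input ends after only three digits.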
There are also actions to parse the input into a lexy::parse_tree (lexy::parse_as_tree) and to trace the parsing algorithm for debugging purposes (lexy::trace).
Both are available to play with in the online playground.
What we really want to do, though, is parse the input into our Color struct.
For that, we need to use the action lexy::parse: it parses the input, reports errors to the error callback, and produces a user-defined value.
This value is controlled by adding an additional static constexpr member called value to each production of the grammar.
It specifies a callback that is invoked with all values produced while parsing the rule; lexy provides common callbacks in lexy/callback.hpp.
So what values are produced by parsing the rules?
Well, right now: none.
None of the primitive rules we’ve used produce any values; they just match input.
The exception is lexy::dsl::p, which produces the result of parsing the child production, but as the child production doesn’t produce a value either, nothing happens.
So instead of just blindly matching the digits, we have to convert them into an integer and produce that.
This can be done by wrapping the lexy::dsl::n_digits rule in the lexy::dsl::integer rule and providing an appropriate callback:
struct channel
{
    static constexpr auto rule = dsl::integer<std::uint8_t>(dsl::n_digits<2, dsl::hex>); // (1)
    static constexpr auto value = lexy::forward<std::uint8_t>; // (2)
};
(1) We want to convert the matched digits into a std::uint8_t, which is then produced by parsing the rule.
(2) The callback uses lexy::forward, which just forwards the produced value as the result of parsing the production.
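To make the conversion concrete, here is a plain C++ sketch of the value that dsl::integer<std::uint8_t>(dsl::n_digits<2, dsl::hex>) is described as producing. It is an illustration, not lexy's implementation: parse_channel and hex_value are hypothetical names, the function receives the two digits already isolated, and std::nullopt stands in for a parse error.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

namespace sketch
{
// Value of a single hex digit, or -1 if the character is not a hex digit.
int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Match exactly two hex digits and fold them into a base-16 value.
std::optional<std::uint8_t> parse_channel(std::string_view digits)
{
    if (digits.size() != 2)
        return std::nullopt;
    unsigned value = 0;
    for (char c : digits)
    {
        int d = hex_value(c);
        if (d < 0)
            return std::nullopt;
        value = value * 16 + static_cast<unsigned>(d); // base-16 accumulation
    }
    // Two hex digits are at most 0xFF, so the result always fits in a std::uint8_t.
    return static_cast<std::uint8_t>(value);
}
} // namespace sketch
```

Because exactly two digits are consumed, the result can never overflow a std::uint8_t; with a longer digit sequence, an integer conversion would also have to check for overflow.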
Now each lexy::dsl::p<channel> in the color production produces a single std::uint8_t, and all three values are then passed to the color production’s callback:
struct color
{
    static constexpr auto rule = dsl::hash_sign + dsl::times<3>(dsl::p<channel>) + dsl::eof;
    static constexpr auto value = lexy::construct<Color>; // (1)
};
(1) Accept the three integers and construct our Color struct from them using lexy::construct.
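Conceptually, both callbacks are function objects that receive the values produced by a rule. The following plain C++ sketch is a rough approximation for illustration only: sketch::forward and sketch::construct are hypothetical stand-ins, and the real lexy::forward and lexy::construct are more general.

```cpp
#include <cstdint>
#include <utility>

struct Color
{
    std::uint8_t r, g, b;
};

namespace sketch
{
// Rough analogue of lexy::forward<T>: return the single produced value unchanged.
template <typename T>
constexpr auto forward = [](T value) { return value; };

// Rough analogue of lexy::construct<T> for an aggregate such as Color:
// brace-initialize T from all produced values, in order.
template <typename T>
constexpr auto construct = [](auto&&... args) { return T{std::forward<decltype(args)>(args)...}; };
} // namespace sketch
```

For example, sketch::construct<Color>(r, g, b) brace-initializes Color from three std::uint8_t values, which is the role lexy::construct<Color> plays once the three dsl::p<channel> have produced their values.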
Putting it all together
Combining everything, we have the full example for parsing an HTML color into our struct Color:
#include <cstdint>
#include <cstdio>
#include <lexy/action/parse.hpp>
#include <lexy/callback.hpp>
#include <lexy/dsl.hpp>
#include <lexy_ext/compiler_explorer.hpp>
#include <lexy_ext/report_error.hpp>

namespace
{
struct Color
{
    std::uint8_t r, g, b;
};

namespace grammar
{
    namespace dsl = lexy::dsl;

    struct channel
    {
        static constexpr auto rule = dsl::integer<std::uint8_t>(dsl::n_digits<2, dsl::hex>);
        static constexpr auto value = lexy::forward<std::uint8_t>;
    };

    struct color
    {
        static constexpr auto rule = dsl::hash_sign + dsl::times<3>(dsl::p<channel>) + dsl::eof;
        static constexpr auto value = lexy::construct<Color>;
    };
} // namespace grammar
} // namespace

int main()
{
    auto input = lexy_ext::compiler_explorer_input(); // special input for Compiler Explorer examples
    auto result = lexy::parse<grammar::color>(input, lexy_ext::report_error);
    if (result.has_value())
    {
        auto color = result.value();
        std::printf("#%02x%02x%02x\n", color.r, color.g, color.b);
    }

    return result ? 0 : 1;
}
Note how we’re checking whether parsing produced a value with .has_value(), and not whether there were any parse errors (.is_error() or operator bool()).
This is because lexy implements error recovery: parsing can recover from certain errors, in which case it still produces a value even though errors were reported.
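The three possible outcomes can be modelled with a small hypothetical result type. ParseResult below is not lexy's parse_result; it only mirrors the distinction described above: a full success has a value and no errors, a recovered error has a value and at least one error, and a fatal error has no value at all.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical result type, loosely modelled on the distinction above.
template <typename T>
struct ParseResult
{
    std::optional<T> value;      // set whenever parsing produced a value
    std::size_t error_count = 0; // number of errors reported, recovered or not

    bool has_value() const { return value.has_value(); }
    bool is_error() const { return error_count > 0; }
    // Full success only: a value and no errors at all.
    explicit operator bool() const { return has_value() && !is_error(); }
};
```

A recovered result is false under operator bool() but still true under has_value(), which is why the example above checks has_value() before printing the color and uses operator bool() only for the exit code.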