Parse Rules
A Rule is an object which tries to match the beginning of an input character buffer against a particular syntax.
It returns a boost::system::result containing a value if the match was successful, or an error_code if the match failed.
Rules are not invoked directly.
Instead they are passed as values to a parse function, along with the input character buffer to process.
The first overload requires that the entire input string match; otherwise an error occurs.
The second overload advances the input buffer pointer to the first unconsumed character upon success, allowing a stream of data to be parsed sequentially:
template< class Rule >
auto parse( core::string_view s, Rule const& r )
    -> system::result< typename Rule::value_type >;

template< class Rule >
auto parse( char const*& it, char const* end, Rule const& r )
    -> system::result< typename Rule::value_type >;
To satisfy the Rule concept, a class or struct must declare the nested type value_type, indicating the type of value returned upon success, and a const member function parse with a prescribed signature.
In the following code we define a rule that matches a single comma:
struct comma_rule_t
{
    // The type of value returned upon success
    using value_type = core::string_view;

    // The algorithm which checks for a match
    system::result< value_type >
    parse( char const*& it, char const* end ) const
    {
        if( it != end && *it == ',' )
            return core::string_view( it++, 1 );
        return error::mismatch;
    }
};
Since rules are passed by value, we declare a constexpr variable of the type for syntactical convenience.
Variable names for rules are usually suffixed with _rule:
constexpr comma_rule_t comma_rule{};
Now we can call parse with the input string and the rule variable:
system::result< core::string_view > rv = parse( ",", comma_rule );
assert( rv.has_value() && rv.value() == "," );
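The difference between the two parse overloads can be illustrated with a minimal, library-free sketch. It is not the library's implementation: std::optional stands in for system::result (so no error_code is carried), std::string_view stands in for core::string_view, and the bodies of both parse functions are assumptions based only on the behavior described above.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Stand-in for system::result<T>: an empty optional means "no match".
template <class T>
using result = std::optional<T>;

// Same shape as the comma rule in the text, using the stand-in types.
struct comma_rule_t {
    using value_type = std::string_view;

    result<value_type> parse(char const*& it, char const* end) const {
        if (it != end && *it == ',')
            return std::string_view(it++, 1);
        return std::nullopt;
    }
};

constexpr comma_rule_t comma_rule{};

// Pointer overload: on success `it` is advanced past the consumed input.
template <class Rule>
result<typename Rule::value_type>
parse(char const*& it, char const* end, Rule const& r) {
    return r.parse(it, end);
}

// String overload: succeeds only if the rule consumes the whole string.
template <class Rule>
result<typename Rule::value_type>
parse(std::string_view s, Rule const& r) {
    char const* it = s.data();
    char const* const end = it + s.size();
    auto rv = r.parse(it, end);
    if (rv && it != end)
        return std::nullopt;  // leftover input is treated as a failure
    return rv;
}

void usage_example() {
    // Whole-string overload: "," matches, ",," leaves input unconsumed.
    assert(parse(",", comma_rule).has_value());
    assert(!parse(",,", comma_rule).has_value());

    // Pointer overload: parse two commas in sequence from one buffer.
    std::string_view const in = ",,";
    char const* it = in.data();
    char const* const end = it + in.size();
    assert(parse(it, end, comma_rule).has_value());
    assert(parse(it, end, comma_rule).has_value());
    assert(it == end);  // both characters were consumed
}
```

The whole-string form is convenient for validating a complete token, while the pointer form suits input where several rules are applied one after another to the same buffer.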
Rule expressions can come in several styles.
The rule defined above is a compile-time constant.
The unsigned_rule matches an unsigned decimal integer.
Here we construct the rule at run time and specify the type of unsigned integer used to hold the result with a template parameter:
system::result< unsigned short > rv =
parse( "16384", unsigned_rule< unsigned short >{} );
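To show why the result type matters, here is one way a rule of this kind could be written. The name simple_unsigned_rule is hypothetical, std::optional stands in for system::result, and the handling of overflow and of leading zeros is an assumption of this sketch; the library's unsigned_rule may behave differently in those details.

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical stand-in for unsigned_rule<U>; not the library's code.
template <class U>
struct simple_unsigned_rule {
    using value_type = U;

    std::optional<U> parse(char const*& it, char const* end) const {
        if (it == end || *it < '0' || *it > '9')
            return std::nullopt;  // no leading digit: mismatch

        constexpr U max = std::numeric_limits<U>::max();
        U value = 0;
        char const* p = it;  // work on a copy so a failure leaves `it` untouched
        while (p != end && *p >= '0' && *p <= '9') {
            U const digit = static_cast<U>(*p - '0');
            if (value > (max - digit) / 10)
                return std::nullopt;  // value * 10 + digit would not fit in U
            value = static_cast<U>(value * 10 + digit);
            ++p;
        }
        it = p;  // success: point at the first unconsumed character
        return value;
    }
};

void usage_example() {
    simple_unsigned_rule<unsigned short> const r{};

    std::string_view const ok = "16384";
    char const* it = ok.data();
    auto rv = r.parse(it, ok.data() + ok.size());
    assert(rv.has_value() && *rv == 16384);

    // Too large for a 16-bit unsigned short, so this sketch rejects it.
    std::string_view const big = "65536";
    it = big.data();
    assert(!r.parse(it, big.data() + big.size()).has_value());
}
```

Choosing the integer type through a template parameter lets the same algorithm enforce a different upper bound for each result type.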
The function delim_rule returns a rule which matches the passed character literal.
This is a more general version of the comma rule which we defined earlier.
There is also an overload which matches exactly one character from a character set.
system::result< core::string_view > rv = parse( ",", delim_rule(',') );
Error Handling
When a rule fails to match, or if the rule detects an unrecoverable problem with the input, it returns a result assigned from an error_code indicating the failure.
When using overloads of parse which have a character pointer as both an in and out parameter, it is up to the rule to define which character is pointed to upon error.
When the rule matches successfully, the pointer is always changed to point to the first unconsumed character in the input, or to the end pointer if all input was consumed.
It is the responsibility of library and user-defined implementations of compound rules (explained later) to rewind their internal pointer if a parsing operation was unsuccessful and they wish to attempt parsing the same input using a different rule.
Users who extend the library’s grammar by defining their own custom rules should follow the behaviors described above regarding the handling of errors and the modification of the caller’s input pointer.
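The rewinding requirement can be sketched with a small compound rule that tries one alternative and then another. Everything here is illustrative: literal_rule and either_rule are hypothetical names, std::optional stands in for system::result, and both alternatives are assumed to share the same value_type.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Matches a fixed string. On failure `it` is left wherever matching
// stopped, which a rule is allowed to do.
struct literal_rule {
    using value_type = std::string_view;
    std::string_view text;

    std::optional<value_type> parse(char const*& it, char const* end) const {
        char const* const start = it;
        for (char c : text) {
            if (it == end || *it != c)
                return std::nullopt;  // `it` may already have advanced
            ++it;
        }
        return std::string_view(start, text.size());
    }
};

// Compound rule: try `first`, and if it fails, rewind and try `second`.
template <class R1, class R2>
struct either_rule {
    using value_type = typename R1::value_type;
    R1 first;
    R2 second;

    std::optional<value_type> parse(char const*& it, char const* end) const {
        char const* const start = it;  // remember where parsing began
        if (auto rv = first.parse(it, end))
            return rv;
        it = start;                    // rewind before trying the alternative
        return second.parse(it, end);
    }
};

void usage_example() {
    either_rule<literal_rule, literal_rule> const r{ {"ab"}, {"ac"} };

    // "ab" fails after consuming 'a'; without the rewind, "ac" would
    // then be matched against "c" and fail as well.
    std::string_view const in = "ac";
    char const* it = in.data();
    char const* const end = it + in.size();
    auto rv = r.parse(it, end);
    assert(rv.has_value() && *rv == "ac");
    assert(it == end);
}
```

Saving the starting pointer inside the compound rule keeps the sub-rules simple: they do not need to restore the pointer themselves when they fail.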