Word count: 971
As most of us should know, Parsec is a relatively fast, lightweight monadic parser combinator library.
In this post I aim to show that monadic parsing is not only useful, but a simple concept to grok.
We shall implement a simple parsing library with instances of common typeclasses of the domain, such as Monad, Functor and Applicative, and some example combinators to show how powerful this abstraction really is.
Getting the buzzwords out of the way, being monadic just means that Parsers
instances of Monad
. Recall the Monad typeclass, as defined in
Control.Monad
,
class Applicative m => Monad m where
return :: a -> m a
(>>=) :: m a -> (a -> m b) -> m b
{- Some fields omitted -}
How can we fit a parser in the above constraints? To answer that, we must first define what a parser is.
A naïve implementation of the Parser
type would be a simple type
synonym.
type Parser a = String -> (a, String)
This just defines that a parser is a function from a string to a result pair
with the parsed value and the resulting stream. This would mean that parsers are
just state transformers, and if we define it as a synonym for the existing mtl
State
monad, we get the Monad, Functor and Applicative instances for
free! But alas, this will not do.
Apart from modeling the state transformation that a parser expresses, we need a
way to represent failure. You already know that Maybe a
expresses
failure, so we could try something like this:
type Parser a = String -> Maybe (a, String)
But, as you might have guessed, this is not the optimal representation either:
Maybe
does model failure, but in a way that is lacking. It can
only express that a computation was successful or that it failed, not why it
failed. We need a way to fail with an error message. That is, the
Either
monad.
type Parser e a = String -> Either e (a, String)
Notice how we have the Maybe
and Either
outside the
tuple, so that when an error happens we stop parsing immediately. We could
instead have them inside the tuple for better error reporting, but that’s out of
scope for a simple blag post.
This is pretty close to the optimal representation, but there are still some
warts things to address: String
is a bad representation for textual
data, so ideally you’d have your own Stream
class that has instances
for things such as Text
, ByteString
and
String
.
One issue, however, is more glaring: You can’t define typeclass instances for
type synonyms! The fix, however, is simple: make Parser
a newtype.
newtype Parser a
= Parser { parse :: String -> Either String (a, String) }
Now that that’s out of the way, we can actually get around to instancing some typeclasses.
Since the AMP landed in GHC 7.10 (base 4.8), the hierarchy of the Monad typeclass is as follows:
class Functor (m :: * -> *) where
class Functor m => Applicative m where
class Applicative m => Monad m where
That is, we need to implement Functor and Applicative before we can actually implement Monad.
We shall also add an Alternative
instance for expressing choice.
First we need some utility functions, such as runParser
, that runs a
parser from a given stream.
runParser :: Parser a -> String -> Either String a
Parser p) s = fst <$> p s runParser (
We could also use function for modifying error messages. For convenience, we
make this an infix operator, <?>
.
(<?>) :: Parser a -> String -> Parser a
Parser p) <?> err = Parser go where
(= case p s of
go s Left _ -> Left err
Right x -> return x
infixl 2 <?>
Functor
Remember that Functor models something that can be mapped over (technically,
fmap
-ed over).
We need to define semantics for fmap
on Parsers. A sane implementation would
only map over the result, and keeping errors the same. This is a homomorphism,
and follows the Functor laws.
However, since we can’t modify a function in place, we need to return a new parser that applies the given function after the parsing is done.
instance Functor Parser where
`fmap` (Parser p) = Parser go where
fn = case p st of
go st Left e -> Left e
Right (res, str') -> Right (fn res, str')
Applicative
While Functor is something that can be mapped over, Applicative defines semantics for applying a function inside a context to something inside a context.
The Applicative class is defined as
class Functor m => Applicative m where
pure :: a -> m a
(<*>) :: f (a -> b) -> f a -> f b
Notice how the pure
and the return
methods are
equivalent, so we only have to implement one of them.
Let’s go over this by parts.
instance Applicative Parser where
pure x = Parser $ \str -> Right (x, str)
The pure
function leaves the stream untouched, and sets the result
to the given value.
The (<*>)
function needs to to evaluate and parse the left-hand side
to get the in-context function to apply it.
Parser p) <*> (Parser p') = Parser go where
(= case p st of
go st Left e -> Left e
Right (fn, st') -> case p' st' of
Left e' -> Left e'
Right (v, st'') -> Right (fn v, st'')
Alternative
Since the only superclass of Alternative is Applicative, we can instance it
without a Monad instance defined. We do, however, need an import of
Control.Applicative
.
instance Alternative Parser where
= Parser $ \_ -> Left "empty parser"
empty Parser p) <|> (Parser p') = Parser go where
(= case p st of
go st Left _ -> p' st
Right x -> Right x
Monad
After almost a thousand words, one would be excused for forgetting we’re
implementing a monadic parser combinator library. That means, we need an
instance of the Monad
typeclass.
Since we have an instance of Applicative, we don’t need an implementation of
return: it is equivalent to pure
, save for the class constraint.
instance Monad Parser where
return = pure
The (>>=)
implementation, however, needs a bit more thought. Its
type signature is
(>>=) :: m a -> (a -> m b) -> m b
That means we need to extract a value from the Parser monad and apply it to the given function, producing a new Parser.
Parser p) >>= f = Parser go where
(= case p s of
go s Left e -> Left e
Right (x, s') -> parse (f x) s'
While some people think that the fail
is not supposed to be in the
Monad typeclass, we do need an implementation for when pattern matching fails.
It is also convenient to use fail
for the parsing action that
returns an error with a given message.
fail m = Parser $ \_ -> Left m
We now have a Parser
monad, that expresses a parsing action. But, a
parser library is no good when actual parsing is made harder than easier. To
make parsing easier, we define combinators, functions that modify a parser in
one way or another.
But first, we should get some parsing functions.
any, satisfying
any
is the parsing action that pops a character off the stream and returns
that. It does no further parsing at all.
any :: Parser Char
any = Parser go where
= Left "any: end of file"
go [] :xs) = Right (x,xs) go (x
satisfying
tests the parsed value against a function of type Char -> Bool
before deciding if it’s successful or a failure.
satisfy :: (Char -> Bool) -> Parser Char
= d
satisfy f <- any
x if f x
then return x
else fail "satisfy: does not satisfy"
We use the fail
function defined above to represent failure.
oneOf
, char
These functions are defined in terms of satisfying
, and parse individual
characters.
char :: Char -> Parser Char
= satisfy (c ==) <?> "char: expected literal " ++ [c]
char c
oneOf :: String -> Parser Char
= satisfy (`elem` s) <?> "oneOf: expected one of '" ++ s ++ "'" oneOf s
string
This parser parses a sequence of characters, in order.
string :: String -> Parser String
= return []
string [] :xs) = do
string (x
char x
string xsreturn $ x:xs
And that’s it! In a few hundred lines, we have built a working parser combinator library with Functor, Applicative, Alternative, and Monad instances. While it’s not as complex or featureful as Parsec in any way, it is powerful enough to define grammars for simple languages.
A transcription (with syntax highlighting) of this file is available as runnable Haskell. The transcription also features some extra combinators for use.