Lab 5: Parsing
Overview
The goals of this lab are to:
- Practice parsing strings, using parsers and parser combinators (like the ones we described in class).
- Practice defining your own parser combinators.
- Practice using parser combinators to write your own parsers.
These tasks will help you on this week’s assignment, where we will be
extending the implementation of pic2ps
.
Materials
Use the assignment workflow and the link provided on Piazza to get access to this week’s files.
This lab should work on any computer with subversion and ghci
installed; you
don’t have to ssh to knuth if you don’t want to.
Part 1: Recap, parsers, and parser combinators (20 minutes)
Look back at the notes you took in class today, while also looking through the
contents of the provided lab file BasicParser.hs
.
Recall that
-
A value of type
Parser Char
is a parsing function that, if invoked with an input string, consumes some prefix of the input and tries to parse aChar
. -
A value of type
Parser Integer
is similar, but if invoked it consumes some prefix of the input and tries to parse anInteger
.
-
When invoked, a parser generally succeeds if it can consume some prefix of the input, and fails otherwise.
-
If the parser fails, the parsing function returns
Nothing
. If the parser succeeds, the parsing function returnsJust (value, rest)
, wherevalue
is the result of the parser andrest
is the part of the input string that was not consumed during parsing. -
We don’t usually call the parsing function directly. Instead we use the function.
parse :: Parser a -> String -> a
, which takes a parser and an input string, invokes the parser on that string, and returns the result of the parser. However,parse
is very picky: even if the parser succeeds, if there are any leftover (unconsumed) characters in the input string,parse
will raise an error, usually reporting:Exception: Exception: After parsing, <rest> remained
Run some small experiments
- Start ghci and load
BasicParser.hs
. - Try some simple examples of creating parsers, using the parsers and
combinators we saw in class (i.e.,
pfail
,get
,return
,<|>
and<+>
), for example:parse get "a" parse (get <+> return 3) "a"
Implement a new parser combinator and a new parser
- Complete the code for the new
<:>
operation, described in the file, which is fairly similar to<+>
. Try out your new combinator with some examples. - Using
<=>
,get
, andisDigit
, complete the implementation of thedigit
Parser (aParser Char
). Test it out on a few examples. - Using
some
anddigit
writedigits
(which has typeParser String
), which matches one or more digits. Test it out on a few examples.
Part 2: More parser combinators
- Open
ParserCombinators.hs
in a text editor, and load the file in a ghci session.
- Work your way through
ParserCombinators.hs
. For each function:- If it’s unfinished (has a
FIXME!
comment or saysTODO
), fix it so it works. - Try out one or more examples of the function.
- If it’s unfinished (has a
Note: You’ll see the term infix
a lot in the code. This term refers to the
precedence of the operator that is being defined (e.g., the operator <=>
has
precedence 7). If you see infixl
, then the operator associates to the left;
infixr
means the operator associates to the right.
Part 3: Parsing expressions
- Open
SimpleExpr.hs
in a text editor, and load the file in a ghci session. This file uses the grammar for expressions that we saw in class. - Play with the
expression
parser inSimpleExpr.hs
to confirm it really parses expressions:parse expression "1+2+3+4*5+6*7" parse factor "42"
(Note: The parsers won’t work until you’ve finished the previous part of the lab.)
Part 4: Parsing pic
Read over the file PicParser.hs
, load it, and use
parseFile picture "test1.pic"
parseFile picture "test2.pic"
etc., to confirm it really does parse pic
code. Also check out how it handles errors
by parsing the erroneous files too (e.g., test0-bad.pic
).
Bonus (if there’s still time left in lab)
Write a hex
parser
Using some of the parsers and combinators you’ve seen so far, define a parser
that can match a hexadecimal string. A hexadecimal string is one or more
hexadecimal characters: 0
–9
or a
–f
. You can define your parser in ghci
and use it, like so:
*ParserCombinators> hex = <your definition here>
*ParserCombinators> parse hex "abc123"
"abc123"
*ParserCombinators> parse hex "0"
"0"
*ParserCombinators> parse hex "f"
"f"
*ParserCombinators> parse hex "3h"
"*** Exception: <input>:1:2 -- Different kind of character expected
It might help to know that the expression elem ch "abcdef"
returns True
if the
character ch
is one of the characters in the string "abcdef"
.
Extend the AST and parser for expressions
Try extending the expression language by:
-
Modifying the abstract syntax to add boolean values (
true
andfalse
) and at least one boolean operation (such as>
). -
Modifying the parser to parse new reserved words
true
andfalse
, and to parse your new boolean operation. Boolean operations have lower precedence than (i.e., happen after) numeric operations.