Lab 5: Parsing

Overview

The goals of this lab are to:

  • Practice parsing strings, using parsers and parser combinators (like the ones we described in class).
  • Practice defining your own parser combinators.
  • Practice using parser combinators to write your own parsers.

These tasks will help you on this week’s assignment, where we will be extending the implementation of pic2ps.

Materials

Use the assignment workflow and the link provided on Piazza to get access to this week’s files.

This lab should work on any computer with subversion and ghci installed; you don’t have to ssh to knuth if you don’t want to.

Part 1: Recap, parsers, and parser combinators (20 minutes)

Look back at the notes you took in class today, while also looking through the contents of the provided lab file BasicParser.hs.

Recall that

  • A value of type Parser Char is a parsing function that, if invoked with an input string, consumes some prefix of the input and tries to parse a Char.

  • A value of type Parser Integer is similar, but if invoked it consumes some prefix of the input and tries to parse an Integer.

  • When invoked, a parser generally succeeds if it can consume some prefix of the input, and fails otherwise.

  • If the parser fails, the parsing function returns Nothing. If the parser succeeds, the parsing function returns Just (value, rest), where value is the result of the parser and rest is the part of the input string that was not consumed during parsing.

  • We don’t usually call the parsing function directly. Instead we use the function. parse :: Parser a -> String -> a, which takes a parser and an input string, invokes the parser on that string, and returns the result of the parser. However, parse is very picky: even if the parser succeeds, if there are any leftover (unconsumed) characters in the input string, parse will raise an error, usually reporting:

      Exception: Exception: After parsing, <rest> remained
    

Run some small experiments

  1. Start ghci and load BasicParser.hs.
  2. Try some simple examples of creating parsers, using the parsers and combinators we saw in class (i.e., pfail, get, return, <|> and <+>), for example:
      parse get "a"
    
      parse (get <+> return 3) "a"
    

Implement a new parser combinator and a new parser

  1. Complete the code for the new <:> operation, described in the file, which is fairly similar to <+>. Try out your new combinator with some examples.
  2. Using <=>, get, and isDigit, complete the implementation of the digit Parser (a Parser Char). Test it out on a few examples.
  3. Using some and digit write digits (which has type Parser String), which matches one or more digits. Test it out on a few examples.

Part 2: More parser combinators

  • Open ParserCombinators.hs in a text editor, and load the file in a ghci session.
  • Work your way through ParserCombinators.hs. For each function:
    • If it’s unfinished (has a FIXME! comment or says TODO), fix it so it works.
    • Try out one or more examples of the function.

Note: You’ll see the term infix a lot in the code. This term refers to the precedence of the operator that is being defined (e.g., the operator <=> has precedence 7). If you see infixl, then the operator associates to the left; infixr means the operator associates to the right.

Part 3: Parsing expressions

  • Open SimpleExpr.hs in a text editor, and load the file in a ghci session. This file uses the grammar for expressions that we saw in class.
  • Play with the expression parser in SimpleExpr.hs to confirm it really parses expressions:
            parse expression "1+2+3+4*5+6*7" 
            parse factor "42"
    

    (Note: The parsers won’t work until you’ve finished the previous part of the lab.)

Part 4: Parsing pic

Read over the file PicParser.hs, load it, and use

          parseFile picture "test1.pic" 
          parseFile picture "test2.pic" 

etc., to confirm it really does parse pic code. Also check out how it handles errors by parsing the erroneous files too (e.g., test0-bad.pic).

Bonus (if there’s still time left in lab)

Write a hex parser

Using some of the parsers and combinators you’ve seen so far, define a parser that can match a hexadecimal string. A hexadecimal string is one or more hexadecimal characters: 09 or af. You can define your parser in ghci and use it, like so:

    *ParserCombinators> hex = <your definition here>

    *ParserCombinators> parse hex "abc123"
    "abc123"

    *ParserCombinators> parse hex "0"
    "0"

    *ParserCombinators> parse hex "f"
    "f"

    *ParserCombinators> parse hex "3h"
    "*** Exception: <input>:1:2 -- Different kind of character expected

It might help to know that the expression elem ch "abcdef" returns True if the character ch is one of the characters in the string "abcdef".

Extend the AST and parser for expressions

Try extending the expression language by:

  1. Modifying the abstract syntax to add boolean values (true and false) and at least one boolean operation (such as >).

  2. Modifying the parser to parse new reserved words true and false, and to parse your new boolean operation. Boolean operations have lower precedence than (i.e., happen after) numeric operations.