Regular Expressions
Overview
Thus far, your task has been to understand how lazy data structures can provide backtracking-style search through a solution space. In this part, you will apply your knowledge to the problem of regular-expression matching.
Background: Regular expressions
Recall that a regular expression is a pattern that succinctly describes a set of
strings. For example, the pattern a*
describes the set of strings that have
zero or more repetitions of the character a
: {"", "a", "aa", "aaa", …}
.
More formally, given a set of characters $\Sigma$ (also called an alphabet), we can describe regular expressions (abbreviated as regexps) like so:
- The empty string
ε
is a regexp. It matches only the empty string. - Any single character in the alphabet is a regexp. It matches only that specific character.
- The
.
symbol is a regexp. It matches any character in the alphabet. - If
e1
ande2
are regexps, thene1 | e2
is a regexp. It represents alternatives, so it matches anything that matches eithere1
ore2
(or both). - If
e1
ande2
are regexps, thene1 · e2
is a regexp. It represents concatenation, so it matches any string that begins with a match fore1
, immediately followed by a string that matches fore2
. - If
e
is a regexp, thene*
is a regexp. It represents repetition: zero or more concatenations of strings that matche
.
Given a regular expression $e$ and an input string $s$, we will declare a pattern match to be successful if a (possibly empty) prefix of $s$ matches the regular-expression pattern. For example:
- The pattern
a
matches the stringaaab
but notbaaa
. - There are multiple matches of
a*
for input stringaaab
:""
,a
,aa
,aaa
.
Lazy regular-expression matching [21%]
The directory regex
contains a partial implementation of a regular-expression
matcher, using laziness rather than DFA / NFA technology.
Be sure to look at all the files in the directory, including the file for the AST (which you won’t need to modify), the file for the evaluator (which you will modify), and the test cases (which provide examples of matching and which will help you test your code).
Your task: In RegexEvaluation.hs
, complete the implementation for matching
Letter
s, Alt
s, and Concat
s .
Double-check all your work…
Make sure you’ve done each of the three parts of the assignment and that you have submitted your repository and your image to Gradescope.