Little Languages Commentary

Summary

This document contains some comments on the various parts of the assignment. There are no new requirements in this document, only additional information for those who are interested.

Commentary on the phrase “Little Languages”

The term “little languages” comes from the title of an article, published in 1986 and written by Jon Bentley. The article describes the benefits of what we now call “domain-specific languages (DSLs)”, which are languages designed to do a specific task. You might have heard of SQL, Matlab, LaTeX, etc., all of which are DSLs. The article talks about a PIC language used for drawing diagrams, which will be the focus of our next two assignments.

Commentary on Random Art

Viewing pictures

If you use Visual Studio Code with RemoteSSH, you can open an image file in Visual Studio by clicking on the name in the folder.

If you are using ssh to connect to knuth, you’ll want to install XWindows on your machine. To get X-Windows on Mac OS, install XQuartz. On Windows, install Xming.

Once you have X-Windows installed, use the -Y flag when sshing: ssh -Y <username>@knuth.cs.hmc.edu. Once logged in, you can use eog *.png to get a graphical display, perhaps with the image gallery menu option enabled if you have a lot of pictures to look at. (xzgv is another image viewer.)

If you are running your code on a CS Lab Mac, you can just do open . to open the current directory in the Finder. You could also do open *.png to load the pictures into Preview.app each time you have something new. And if it’s your own machine, you may have other alternatives.

Alternative color scheme

One issue with the color art made by the doColor function is that it overlays three completely independent images, a red one, a green one, and a blue one that are unlikely to have any connection to each other. Often the resulting art lacks cohesion.

An alternative way of generating color images is doColorAlt, which you can use like so:

    sequence_ [doColorAlt 300 s d1 d2 | s <- [100..120], d1 <- [5..8], d2 <- [1..3]]

The alternative scheme generates two functions of depth d1 which we can call $P(x,y)$ and $Q(x,y)$, which calculate intermediate values (p,q) for each (x,y), and then feed those in to three simpler mixing color functions, $R(p,q)$, $G(p,q)$, $B(p,q)$.

You don’t have to use doColorAlt at all, but feel free to try it out and see if you like its output better.

Is `build` “interesting”?

Your implementation of build should produce values in the range $[-1.0,1.0]$. Additionally, it should produce random expressions whose depths tend to approach that of the maximum-depth bound, without ever exceeding that bound.

For the build function, the autograder will do something similar to this:

    build: structure
    ================
    Seeds 1..30 build 30 different depth-10 trees --> True +6 points
    Avg. depth >= 2/3 maximum --> True +4 points
    Obeys maximum depth bound --> True +2 points
    Nearly obeys maximum depth bound --> True +3 points
    Expressions produce values in range [-1.0,1.0] --> True +3 points

Additional Testing

Testing pictures is a bit tricky, and we can’t easily provide a test suite because you’re going to add to the abstract syntax tree (so the test cases will necessarily be incomplete).

However, we have provided a few additional testing tools, that let you visualize your expressions. You can also always test things by generating pictures and looking at them.

Visualization by plotting

To understand (and possibly debug) your expressions, try running some of these functions:

    plotOneArg SinPi
    plotTwoArg Avg

then examine the resulting PNG files in your favorite image viewer. There is also a plotThreeArg function, but currently no such operations exist in the Exp type.

Visualization by histogram

You can also visualize an expression as a histogram, where the histogram is the result of evaluation, multiplied by 10 and truncated to be an integer.

For example, try running

    histoOneArg SinPi

The histogram rounds away from zero, so only exactly zero matches zero. If the histogram shows values for <<< or >>> it means your function returns out-of-range values (that’s bad!).

If some output values (other than perhaps zero) never appear in your function, it may be that your function has a design flaw.

Are your new expressions “creative”?

We plan to accept a large variety of ideas, so long as the expressions produce values in the right range, don’t merely compose existing expressions, and don’t produce random values but distribute the values across the range. Any good-faith effort should suffice.

More random art

After finishing the assignment, you might be interested in www.random-art.org, which has a more sophisticated implementation that represents expressions as directed graphs rather than trees.

Backtracking and Continuations

You may have seen other backtracking algorithms whose code seemed simpler than our implementation. For example, to place $n$ queens on a chessboard so that none of them attack each other, backtracking search is essentially:

    To place queens in columns i .. n:
        For each "legal" position in the i-th column
        (given previously chosen positions for queens in columns < i):
            Recursively try to place queens in columns (i+1) .. n.
            If this succeeded, abort loop by returning success.
        If the loop ends without a success, return failure.

Or, to color an $n$-node undirected graph with no two adjacent nodes having the same color:

    To color nodes i .. n:
        For each "legal" color for the i-th node
        (given the previously chosen colors for nodes < i):
          Recursively try to color nodes (i+1) .. n.
          If this succeeded, abort loop by returning success.
        If the loop ends without a success, return failure.

The difference is that in both examples, when you’re working on step $i$, it’s obvious what the rest of the problem is going to be: steps $i+1$ to $n$. We can therefore make a recursive function call to solve the entire rest of the problem. If this doesn’t work, we can make a different choice and retry the recursive call.

More complicated strategies are required when “the rest of the problem” is not as apparent. When you’re working on satisfying some subexpression, for example, you know the subexpression but not what the original expression looked like. (Recall that the proposition $(p \lor q) \land \lnot p$, while focused recursively on satisfying $p \lor q$, there’s no way to know there’s another constraint $\lnot p$ waiting in the future.) So, there’s no way to make a recursive call to satisfy “the remainder of the original proposition that follows this particular subexpression”.

Satisfiability testing is one of the best known NP-complete problems. The backtracking search suggested above is provably exponential (even trying all possible combinations) in the worst-case.

However, in the past decade there have been quite startling advances in efficient heuristics for solving satisfiability problems, e.g., by being clever about the order we traverse the proposition and/or using information about why one assignment for the variables failed to satisfy the formula, in order to rule out (and avoid checking) many other similar-but-not-quite-identical assignments.

So even though the worst case is still terrible, the specific instances we tend to care about can be solved very quickly in practice! Thus, an increasingly popular approach to solving hard problems in many domains (cryptography, program verification, scheduling, Sudokus) is to translate the problem into a huge boolean expression, and then to run it through an efficient heuristic SAT solver.

Backtracking and regular expressions

Using backtracking search to match regular expressions will suffice for a quick and dirty implementation, but it has two general problems. First, it has worst-case exponential time for particularly bad inputs. Second, it has a tendency to go into an infinite loop if you try to match repetitions of any regular expression that itself could match the empty string. Both of these problems can be worked around, but doing so is beyond the scope of this assignment.