Lab 7: The C Preprocessor (call-by-macro)

Overview and Goals

This next assignment is about parameter passing: the semantics of calling a function and passing values for that function’s parameters. There are many different techniques for parameter passing, and one of them is “call-by-macro”. This approach is seen most often in the C preprocessor. The C preprocessor is used by some C-like languages (including C and C++), avoided by others (JavaScript, D), and optionally available in a number of other languages, including (Glasgow) Haskell!

The primary goal of this lab is to understand call-by-macro. It is strange, and you will need to understand it to do the next assignment. That being said, there is more in this lab about call-by-macro than is necessary for the next assignment.

A secondary goal is learning how to read “standardese”, the formal but sometimes unfriendly way in which programming languages are often specified.

As always, get as far into the lab as you can, thinking carefully as you go. After one hour, stop and submit your work.

Materials

Use the assignment workflow and the link provided on Piazza to get access to this week’s files.

Here are some readings and references that will help you in the lab.

Official C99 Preprocessor Definition (Reference)
Full C99 Standard (Reference)
Full C11 Standard (Reference)

The first one is likely to be the most useful. One thing that may be helpful when reading the first resource is the definition of a “preprocessing token”, which is given in Section 6.4 of the standard.

All the readings are a little hard to understand because they are specifications, not tutorials. Spend a few minutes skimming over the first one. You can come back to look over it in depth later when you are trying to resolve specific questions you may have.

Running the C preprocessor

The C preprocessor is part of every C compiler. In this lab, you can use clang, gcc, mcpp (in /cs/cs131/bin/mcpp), or tcc (in /cs/cs131/bin/tcc).

Run (on knuth or a lab Mac):

    clang -pedantic -w -E -

Type something, e.g., Hello World, and then, at the start of a new line, press Control-D (which the Unix terminal driver turns into “End of File”).

Observe what happens. You should see something like

    # 1 "<stdin>"
    # 1 "<built-in>" 1
    # 1 "<built-in>" 3
    # 349 "<built-in>" 3
    # 1 "<command line>" 1
    # 1 "<built-in>" 2
    # 1 "<stdin>" 2
    Hello World

You can ignore the lines beginning with # in the output. They tell the C/C++ compiler what file and line number subsequent lines come from.
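The same file-and-line bookkeeping can be observed from inside a C program. The following is a minimal sketch, assuming a C99 compiler; the function names and the file name `renamed.c` are hypothetical and are not part of the lab files. It uses the standard `#line` directive, which plays the same role in source code that the `#` lines play in preprocessor output.

```c
#include <assert.h>
#include <string.h>

/* #line resets the line number (and optionally the file name) that the
   compiler associates with the lines that follow it. */
#line 100 "renamed.c"
static int line_after(void) { return __LINE__; }  /* this line is now line 100 */

/* __FILE__ reports the presumed file name set by the #line directive. */
static const char *file_after(void) { return __FILE__; }
```

Calling `line_after()` yields 100 and `file_after()` yields `"renamed.c"`, regardless of where the code actually lives.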

Repeat with gcc (on knuth or a lab Mac):

    gcc -pedantic -w -E -

which should produce something like

    # 1 "<stdin>"
    # 1 "<built-in>"
    # 1 "<command-line>"
    # 31 "<command-line>"
    # 1 "/usr/include/stdc-predef.h" 1 3 4
    # 32 "<command-line>" 2
    # 1 "<stdin>"
    Hello World

Repeat the task with tcc. You should run

    /cs/cs131/bin/tcc -w -E -

Repeat the task with mcpp (note that the arguments are slightly different from the previous tools). You should run

    /cs/cs131/bin/mcpp -W 0 -

Exercises

For each of the “Think, Then Try It” examples below, stop and think about it before you run it through the C preprocessor. Ask yourself what you think the output will be. Write your answers in Answers.txt.

One way to come up with an answer is to try to simulate the C preprocessor by copy-pasting bits of text from the file, performing replacement until you get a result.

After you have written your answers, try the examples with gcc, clang, or mcpp. They are different implementations of the preprocessor, but for well-formed code, a correctly implemented preprocessor should behave as the C99 language specification requires. Of course, that is no guarantee that they produce identical output; in fact, we have already seen that they don’t (although only the # information was different).

The Official C99 Preprocessor Definition is your most useful reference, in particular Section 6.10.3 (Macro replacement), which is where it describes how #define works.

Think, Then Try It: Tokens

Remember, keep a record in Answers.txt… Decide on your answers first. Also, it’s okay to be wrong. When you get a different answer from what you expected, just explain why. Remember you can ask a professor and/or your colleagues in lab!

The C preprocessor is used as a stage in the compilation of C code. Before the code is seen by the compiler, it is run through the C preprocessor. In CS 70 (or its equivalent) you might have learned that the C preprocessor doesn’t really understand C itself; it knows just enough to get its job done.

Open tokens.pp and look at the contents.

The question for you is how each piece of text will change when it is run through the C preprocessor. Do not think about this one too long. Just think a little about what it might do, and then try it.

You can run tokens.pp through our various preprocessors with:

    gcc -w -E - < tokens.pp
    clang -w -E - < tokens.pp
    /cs/cs131/bin/tcc -w -E - < tokens.pp
    /cs/cs131/bin/mcpp -W 0 - < tokens.pp
    /cs/cs131/bin/mcpp -W 0 -@post - < tokens.pp

Think, Then Try It: Simple Substitutions, Simple Macros

A major feature of the C preprocessor is that it supports metaprogramming via macros. This facility is described in the language specification, Section 6.10.3 (Macro replacement), specifically the “Semantics” subsection. Simple macros are called “object-like” macros in standardese. Peruse this part of the standard. Then get a feel for what actually happens by trying each of these example files in turn:

subst1.pp: Object-like macros and preprocessing tokens.
subst2.pp: Multiline object-like macros and comments.
funs.pp: Function-like macros.
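To make the two kinds of macro concrete, here is a minimal sketch. The names `BUFFER_SIZE`, `SQUARE_BAD`, and `SQUARE` are hypothetical and are not taken from the lab files. The point is that replacement works on tokens, not on values, so the unparenthesized version interacts with operator precedence in a way a real function would not.

```c
#include <assert.h>

/* Object-like macro: the name is replaced by its replacement list. */
#define BUFFER_SIZE 16

/* Function-like macros: argument tokens are substituted into the body. */
#define SQUARE_BAD(x) x * x
#define SQUARE(x) ((x) * (x))

/* SQUARE_BAD(1 + 2) expands to 1 + 2 * 1 + 2. */
static int square_bad_of_sum(void) { return SQUARE_BAD(1 + 2); }

/* SQUARE(1 + 2) expands to ((1 + 2) * (1 + 2)). */
static int square_of_sum(void) { return SQUARE(1 + 2); }

/* BUFFER_SIZE expands to the single token 16. */
static int buffer_size(void) { return BUFFER_SIZE; }
```

Here `square_bad_of_sum()` evaluates to 5 while `square_of_sum()` evaluates to 9, which is why macro bodies and parameters are conventionally wrapped in parentheses.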

Think, Then Try It: Numbers of Arguments

Try these files next, which illustrate function-like macros with different numbers of arguments.

args1.pp: Be sure to think about exactly what the output will look like, including where spaces will be. Also, this one will produce some errors, so you might want to send the output to a file to separate the errors from the output. (Note that when a line contains an error, the output for that line is unspecified; in other words for an erroneous line, all C preprocessors should give an error, but they need not produce the same result.)
args2.pp: Is it call-by-value, call-by-name, or…?
- After you’ve looked at the actual results and are trying to explain them, you can use the hint found near the end of this writeup.
args3.pp: Again, also think about spaces…
- Are other answers allowed by the standard besides the ones you got from the compiler you tested with?
- If you want to test with mcpp, for correct results you must run
```
    /cs/cs131/bin/mcpp -W 0 -@post - 
```
- Note: tcc actually violates the standard here; why? (After you’ve looked at the actual results and are trying to explain them, you can use the hint found near the end of this writeup.)
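One way to probe a parameter-passing mechanism is to pass an argument with a visible side effect and count how often it is evaluated. The sketch below does that in C; the names `traced`, `TWICE`, `DISCARD`, and `twice_fn` are hypothetical and do not come from the args files. It is an illustration of the general idea, not an answer key for the exercises.

```c
#include <assert.h>

static int calls = 0;

/* Records that it was evaluated, then returns n unchanged. */
static int traced(int n) {
    calls++;
    return n;
}

#define TWICE(x) ((x) + (x))   /* uses its argument tokens twice */
#define DISCARD(x) 0           /* never uses its argument tokens */

/* An ordinary function: its argument is evaluated once, before the call. */
static int twice_fn(int x) { return x + x; }

/* Each helper resets the counter, performs one call, and reports the count. */
static int evaluations_with_macro(void) {
    calls = 0;
    (void)TWICE(traced(3));     /* ((traced(3)) + (traced(3))) */
    return calls;
}

static int evaluations_with_function(void) {
    calls = 0;
    (void)twice_fn(traced(3));
    return calls;
}

static int evaluations_when_discarded(void) {
    calls = 0;
    (void)DISCARD(traced(3));   /* the call never appears in the output */
    return calls;
}
```

The three helpers report 2, 1, and 0 evaluations respectively: the macro copies the argument text wherever the parameter appears, whereas the function receives an already computed value.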

Think, Then Try It: Recursion…?

rec1.pp: What does this tell you about the C preprocessor?
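For comparison once you have formed your own answer, here is a small sketch of self-referential macros in ordinary C. The names `depth` and `twice` are hypothetical and are not the contents of rec1.pp. It relies on the rule in Section 6.10.3.4 that a macro's own name, when it reappears during that macro's expansion, is not replaced again.

```c
#include <assert.h>

static int depth = 10;               /* an ordinary variable */

/* The inner `depth` is not expanded again, so it names the variable. */
#define depth (depth + 1)

static int read_depth(void) { return depth; }   /* (depth + 1) */

static int twice(int x) { return 2 * x; }       /* an ordinary function */

/* The inner `twice` is left alone, so it calls the function above. */
#define twice(x) (twice(x) + 1)

static int call_twice(void) { return twice(5); }  /* (twice(5) + 1) */
```

Here `read_depth()` yields 11 and `call_twice()` yields 11; in both cases expansion happens exactly once rather than looping forever.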

Think, Then Try It: Expansion revisited…

subst3.pp: Object-like and function-like macros, and #undef.
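The following sketch shows two things that #undef makes visible; the macro names `LIMIT`, `BASE`, and `NEXT` are hypothetical and are not taken from subst3.pp. First, a name can mean different things at different points in a file. Second, a replacement list is stored unexpanded and is only expanded where the macro is used.

```c
#include <assert.h>

#define LIMIT 10
static int first_limit(void) { return LIMIT; }    /* LIMIT is 10 here */

#undef LIMIT
#define LIMIT 20
static int second_limit(void) { return LIMIT; }   /* LIMIT is 20 here */

#define BASE 1
#define NEXT (BASE + 1)   /* stored as the tokens ( BASE + 1 ) */
#undef BASE
#define BASE 100

/* NEXT expands to (BASE + 1), and BASE is looked up at this point. */
static int next_value(void) { return NEXT; }
```

`first_limit()` yields 10, `second_limit()` yields 20, and `next_value()` yields 101 rather than 2, because `BASE` is resolved at the point of use.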

Think, Then Try It: Stringification

string1.pp
string2.pp
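As a reference point for the `#` operator (Section 6.10.3.2), here is a minimal sketch. The names `STR`, `XSTR`, and `VERSION` are hypothetical and are not taken from the string files. It shows that an argument used with `#` is not macro-expanded first, and that an extra level of macro call changes that.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x          /* stringify the argument exactly as written */
#define XSTR(x) STR(x)     /* expand the argument first, then stringify */
#define VERSION 42

static const char *raw_name(void) { return STR(VERSION); }    /* "VERSION" */
static const char *expanded(void) { return XSTR(VERSION); }   /* "42" */

/* Runs of whitespace between tokens collapse to one space. */
static const char *spaced(void) { return STR(a   +   b); }    /* "a + b" */

/* Quotes inside a string literal argument are escaped. */
static const char *quoted(void) { return STR("hi"); }         /* "\"hi\"" */
```

The two-level `XSTR`/`STR` pattern is a common idiom whenever the value of a macro, rather than its name, should end up in the string.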

Think, Then Try It: Token Pasting

paste1.pp
paste2.pp: Try some other examples. What can and can’t you paste together?
paste3.pp
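Here is a minimal sketch of the `##` operator (Section 6.10.3.3) in compilable C. The names `CAT`, `XCAT`, `INDEX`, `counter_1`, and `counter_2` are hypothetical and are not taken from the paste files. As with `#`, an argument that is an operand of `##` is not macro-expanded before pasting.

```c
#include <assert.h>

#define CAT(a, b) a##b         /* paste the two arguments as written */
#define XCAT(a, b) CAT(a, b)   /* expand the arguments first, then paste */
#define INDEX 2

static int counter_1 = 10;
static int counter_2 = 20;

/* 1 and 2 are pasted into the single number token 12. */
static int pasted_number(void) { return CAT(1, 2); }

/* counter_ and 1 are pasted into the identifier counter_1. */
static int pasted_name(void) { return CAT(counter_, 1); }

/* XCAT expands INDEX to 2 before pasting, giving counter_2.
   CAT(counter_, INDEX) would instead give counter_INDEX. */
static int pasted_after_expansion(void) { return XCAT(counter_, INDEX); }
```

The result of a paste must be a single valid preprocessing token; pasting, say, `+` and `-` does not form one, and the standard leaves that behavior undefined.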

Think, Then Try It: Recursion revisited…

rec2.pp

Think, Then Try It: Variable Arguments Intro

The C preprocessor has a mechanism that allows function-like macros to take a variable number of arguments. The mechanism is described in the C99 Preprocessor Definition, specifically Section 6.10.3 (Macro replacement), paragraph 12, and Section 6.10.3.1 (Argument substitution), paragraph 2. Peruse these parts, then try these files:

args4.pp: Introduction to variable number of arguments…
varargs1.pp: With a little bit of trickiness, we can use the variable number of arguments feature in some interesting ways. (After you’ve looked at the actual results and are trying to explain them, you can use the hint found near the end of this writeup.)
varargs2.pp: Putting the pieces together, we can have overloaded macros (where overloading is based on number of arguments).
varargs3.pp: So, is it possible to detect no text at all passed to a macro? (After you’ve looked at the actual results and are trying to explain them, you can use the hint found near the end of this writeup.)
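For reference after you have worked through the files, here is one common way to count arguments and to overload on that count. This is a sketch under stated assumptions: the names `PICK_6TH`, `COUNT_ARGS`, `AREA`, `AREA_1`, and `AREA_2` are hypothetical, the lab files may use a different construction, and this version only handles up to five arguments.

```c
#include <assert.h>

#define CAT(a, b) a##b
#define XCAT(a, b) CAT(a, b)

/* Selects its sixth argument and discards the rest. */
#define PICK_6TH(a1, a2, a3, a4, a5, n, ...) n

/* The real arguments push the list 5, 4, 3, 2, 1 to the right, so the
   sixth position ends up holding the number of arguments supplied. */
#define COUNT_ARGS(...) PICK_6TH(__VA_ARGS__, 5, 4, 3, 2, 1, 0)

/* Overloading: paste the count onto a prefix to choose an implementation. */
#define AREA_1(s) ((s) * (s))
#define AREA_2(w, h) ((w) * (h))
#define AREA(...) XCAT(AREA_, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)

static int count_one(void) { return COUNT_ARGS(a); }
static int count_three(void) { return COUNT_ARGS(a, b, c); }
static int count_five(void) { return COUNT_ARGS(a, b, c, d, e); }

/* An empty argument list still counts as one (empty) argument. */
static int count_none(void) { return COUNT_ARGS(); }

static int square_area(void) { return AREA(3); }      /* AREA_1(3) */
static int rect_area(void) { return AREA(3, 4); }     /* AREA_2(3, 4) */
```

The counts come out as 1, 3, 5, and 1, and the two `AREA` calls yield 9 and 12. `XCAT` is needed rather than `CAT` so that `COUNT_ARGS(...)` is expanded to a number before the paste happens.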

Hints

`args2.pp`

To understand why IGNORE2 gives an error and IGNORE1 doesn’t, read over the first paragraph of Section 6.10.3.1 (Argument substitution) in the Official C99 Preprocessor Definition. It tells you some important things about what happens, and when, in the context of argument passing and macro expansion.

We’d get the same results if we had

    #define IGNORE2(x) Blah blah blah x

`args3.pp`

Consider the code

    #define NEGATE(x) -x
    a = -NEGATE(b) 

Both clang and gcc correctly produce the answer

    a = - -b

and mcpp produces

    a = - - b

But tcc produces

    a = --b

which causes problems. Why?

(mcpp gets this one right, but gets the more complicated example wrong without @post.)
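To see what is at stake, it may help to compare the two token sequences in compiled C. This is a small sketch; the function names are hypothetical. A compiler with an integrated preprocessor keeps the two minus signs as separate tokens, so the first function negates twice, while the second function shows what the text `--b` means when it is read back in as source.

```c
#include <assert.h>

#define NEGATE(x) -x

/* The tokens are -, -, b: two unary negations. */
static int double_negation(int b) { return -NEGATE(b); }

/* The text --b is a single -- token followed by b: a pre-decrement. */
static int decrement(int b) { return --b; }
```

With an argument of 5, `double_negation` yields 5 and `decrement` yields 4.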

`varargs1.pp`

The result of VA_NUM_ARGS() might surprise you, but no text at all counts as a valid argument. If we want to distinguish this case, we can do it, but it’ll require more Variable Arguments.

`varargs3.pp`

Our trick involves the SAW_PARENS macro. It produces T if it is passed something in parentheses and F otherwise. If __VA_ARGS__ contains nothing, then SAW_PARENS(__VA_ARGS__ ()) sees only the parentheses that follow. But what if our input actually is something in parentheses? We catch that case first, to eliminate it.

And how does SAW_PARENS work? It relies on the fact that a function-like macro is expanded only when its name is followed by parentheses. Thus MK_UNICORN42 by itself stays unchanged, but MK_UNICORN42 (something,in,"parentheses") will invoke the macro and produce UNICORN42.
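The same idea can be sketched in a few lines. This is not the definition used in varargs3.pp: the names `PROBE`, `SECOND`, `XSECOND`, `HAS_PARENS`, and `IS_EMPTY` are hypothetical, the results are 1 and 0 rather than T and F, and `IS_EMPTY` here takes a single argument. It also assumes the argument does not end in the name of a function-like macro.

```c
#include <assert.h>

#define CAT(a, b) a##b
#define XCAT(a, b) CAT(a, b)

#define SECOND(a, b, ...) b
#define XSECOND(...) SECOND(__VA_ARGS__)   /* expand, then pick the second */

/* PROBE is only expanded when it is followed by parentheses. When it is,
   it inserts an extra comma, which shifts 1 into the second position. */
#define PROBE(...) ~, 1
#define HAS_PARENS(x) XSECOND(PROBE x, 0, ~)

/* Empty test: rule out a parenthesized argument first, then append ()
   and check whether the probe now sees parentheses directly. */
#define IS_EMPTY(x) XCAT(IS_EMPTY_, HAS_PARENS(x))(x)
#define IS_EMPTY_1(x) 0
#define IS_EMPTY_0(x) HAS_PARENS(x ())

static int parens_given(void) { return HAS_PARENS((a, b)); }
static int parens_absent(void) { return HAS_PARENS(abc); }
static int empty_given(void) { return IS_EMPTY(); }
static int word_given(void) { return IS_EMPTY(abc); }
static int group_given(void) { return IS_EMPTY((abc)); }
```

The helpers yield 1, 0, 1, 0, and 0 in that order. The `~` tokens are placeholders that are always discarded by `SECOND`, so they never reach the compiler.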