Lab 7: The C Preprocessor (call-by-macro)
Overview and Goals
This next assignment is about parameter passing: the semantics of calling a function and passing values for that function’s parameters. There are many different techniques for parameter passing, and one of the techniques is “call-by-macro”. This approach to parameter passing is seen most often in the C preprocessor. The C preprocessor is used by some C-like languages (including C and C++) and avoided by others (JavaScript, D), and optionally available in a number of languages, including (Glasgow) Haskell!
The primary goal of this lab is to understand call-by-macro. It is strange, and you will need to understand it to do the next assignment. That being said, there is more in this lab about call-by-macro than is necessary for the next assignment.
A secondary goal is learning how to read “standardese”, the formal but sometimes unfriendly way in which programming languages are often specified.
As always, get as far into the lab as you can, thinking carefully as you go. After one hour, stop and submit your work.
Materials
Use the assignment workflow and the link provided on Piazza to get access to this week’s files.
Here are some readings and references that will help you in the lab.
- Official C99 Preprocessor Definition (Reference)
- Full C99 Standard (Reference)
- Full C11 Standard (Reference)
The first one is likely to be the most useful. One thing that may be helpful when reading the first resource is the definition of a “preprocessing token”, which is given in Section 6.4 of the standard.
All the readings are a little hard to understand because they are specifications, not tutorials. Spend a few minutes skimming over the first one. You can come back to look over it in depth later when you are trying to resolve specific questions you may have.
Running the C preprocessor
The C preprocessor is part of every C compiler. In this lab, you can use clang, gcc, mcpp (in /cs/cs131/bin/mcpp), or tcc (in /cs/cs131/bin).
Run (on knuth or a lab Mac):
clang -pedantic -w -E -
Type something, e.g., Hello World, and then, at the start of a new line, press Control-D (which the Unix terminal driver turns into “End of File”).
Observe what happens. You should see something like
# 1 "<stdin>"
# 1 "<built-in>" 1
# 1 "<built-in>" 3
# 349 "<built-in>" 3
# 1 "<command line>" 1
# 1 "<built-in>" 2
# 1 "<stdin>" 2
Hello World
You can ignore the lines beginning with # in the output. They tell the C/C++ compiler what file and line number subsequent lines come from.
Repeat with gcc (on a Mac or knuth):
gcc -pedantic -w -E -
which should produce something like
# 1 "<stdin>"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "<stdin>"
Hello World
Repeat the task with tcc. You should run
/cs/cs131/bin/tcc -w -E -
Repeat the task with mcpp (note that the arguments are slightly different from the previous tools). You should run
/cs/cs131/bin/mcpp -W 0 -
Exercises
For each of the “Think, Then Try It” examples below, stop and think about it before you run it through the C preprocessor. Ask yourself what you think the output will be. Write your answers in Answers.txt.
One way to come up with an answer is to try to simulate the C preprocessor by copy-pasting bits of text from the file, performing replacement until you get a result.
After you have written your answers, try the examples with gcc, clang, or mcpp. They are different implementations of the preprocessor, but for well-formed code, a correctly implemented preprocessor should behave as the C99 language specification requires. Of course, that is no guarantee that they produce identical output; in fact, we have already seen that they don’t (although it was only the # information that was different).
The Official C99 Preprocessor Definition is your most useful reference, in particular Section 6.10.3 (Macro replacement), which is where it describes how #define works.
Think, Then Try It: Tokens
Remember, keep a record in Answers.txt… Decide on your answers first.
Also, it’s okay to be wrong. When you get a different answer from what you expected, just explain why. Remember you can ask a prof. and / or your colleagues in lab!
The C preprocessor is used as a stage in the compilation of C code. Before the code is seen by the compiler, it is run through the C preprocessor. In CS 70 (or its equivalent) you might have learned that the C preprocessor doesn’t really understand C itself; it knows just enough to get its job done.
Open tokens.pp and look at the contents.
The question for you is how each piece of text will change when it has been run through the C preprocessor. Do not think about this one too long. Just think a little about what it might do, and then try it.
You can run tokens.pp through our various preprocessors with:
gcc -w -E - < tokens.pp
clang -w -E - < tokens.pp
/cs/cs131/bin/tcc -w -E - < tokens.pp
/cs/cs131/bin/mcpp -W 0 - < tokens.pp
/cs/cs131/bin/mcpp -W 0 -@post - < tokens.pp
Think, Then Try It: Simple Substitutions, Simple Macros
A major feature of the C preprocessor is that it supports metaprogramming via macros. This facility is described in the language specification, Section 6.10.3 (Macro replacement), specifically the “Semantics” subsection. Simple macros are called “object-like” macros in standardese. Peruse this part of the standard. Then get a feel for what actually happens by trying each of these example files in turn:
- subst1.pp: Object-like macros and preprocessing tokens.
- subst2.pp: Multiline object-like macros and comments.
- funs.pp: Function-like macros.
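As a rough illustration of the two macro kinds, here is a small C sketch. The names BUFFER_SIZE, SQUARE, and macro_kind_checks are made up for this example and do not come from the lab files. It shows that a function-like macro substitutes the tokens of its argument rather than an evaluated value.

```c
#include <assert.h>

/* Object-like macro: the name is replaced by its replacement list. */
#define BUFFER_SIZE 16

/* Function-like macro: the argument's tokens are substituted for x. */
#define SQUARE(x) x * x

/* Hypothetical helper: returns 1 if every expectation below holds. */
static int macro_kind_checks(void) {
    assert(BUFFER_SIZE == 16);
    /* SQUARE(3) expands to 3 * 3. */
    assert(SQUARE(3) == 9);
    /* SQUARE(1 + 2) expands to 1 + 2 * 1 + 2, i.e. 1 + (2 * 1) + 2. */
    assert(SQUARE(1 + 2) == 5);
    return 1;
}
```

Running the same definitions through a preprocessor with -E shows the expanded text directly, which is the view the lab files ask you to predict.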
Think, Then Try It: Numbers of Arguments
Try these files next, which illustrate function-like macros with different numbers of arguments.
- args1.pp: Be sure to think about exactly what the output will look like, including where spaces will be. Also, this one will produce some errors, so you might want to send the output to a file to separate the errors from the output. (Note that when a line contains an error, the output for that line is unspecified; in other words, for an erroneous line, all C preprocessors should give an error, but they need not produce the same result.)
- args2.pp: Is it call-by-value, call-by-name, or…?
  - After you’ve looked at the actual results and are trying to explain them, you can use the hint found near the end of this writeup.
- args3.pp: Again, also think about spaces…
  - Are other answers allowed by the standard besides the ones you got from the compiler you tested with?
  - If you want to test with mcpp, for correct results you must run /cs/cs131/bin/mcpp -W 0 -@post -
  - Note: tcc actually violates the standard here; why? (After you’ve looked at the actual results and are trying to explain them, you can use the hint found near the end of this writeup.)
Think, Then Try It: Recursion…?
- rec1.pp: What does this tell you about the C preprocessor?
Think, Then Try It: Expansion revisited…
- subst3.pp: Object-like and function-like macros, and #undef.
Think, Then Try It: Stringification
- string1.pp
- string2.pp
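If you want a reference point while explaining your results, here is a minimal C sketch of the # operator. The macros STR, XSTR, and ANSWER and the helper stringify_checks are hypothetical and are not taken from string1.pp or string2.pp.

```c
#include <assert.h>
#include <string.h>

/* # turns the argument's tokens, as written, into a string literal. */
#define STR(x) #x
/* One extra level: x is macro-expanded before STR ever sees it. */
#define XSTR(x) STR(x)

#define ANSWER 42 /* hypothetical object-like macro for the demo */

/* Hypothetical helper: returns 1 if every expectation below holds. */
static int stringify_checks(void) {
    /* The operand of # is not expanded, so the name itself is quoted. */
    assert(strcmp(STR(ANSWER), "ANSWER") == 0);
    /* Through XSTR, ANSWER becomes 42 before it is stringified. */
    assert(strcmp(XSTR(ANSWER), "42") == 0);
    /* Whitespace between tokens collapses to a single space. */
    assert(strcmp(STR(a   +   b), "a + b") == 0);
    return 1;
}
```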
Think, Then Try It: Token Pasting
- paste1.pp
- paste2.pp: Try some other examples. What can and can’t you paste together?
- paste3.pp
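For comparison with your own experiments, here is a small C sketch of the ## operator. CAT, XCAT, INDEX, the counter_ variables, and paste_checks are all invented for this illustration and are not taken from the paste files.

```c
#include <assert.h>

/* ## joins the tokens on either side into one new token. */
#define CAT(a, b) a ## b
/* One extra level: arguments are macro-expanded before CAT pastes them. */
#define XCAT(a, b) CAT(a, b)

#define INDEX 1 /* hypothetical object-like macro for the demo */

static int counter_1 = 10;
static int counter_INDEX = 20; /* one identifier; INDEX is not expanded here */

/* Hypothetical helper: returns 1 if every expectation below holds. */
static int paste_checks(void) {
    assert(CAT(counter_, 1) == 10);      /* pastes to counter_1 */
    assert(CAT(4, 2) == 42);             /* pastes to the number 42 */
    assert(CAT(counter_, INDEX) == 20);  /* operands of ## are not expanded */
    assert(XCAT(counter_, INDEX) == 10); /* INDEX becomes 1, then counter_1 */
    return 1;
}
```

The result of a paste has to be a single valid preprocessing token; something like CAT(+, -) does not form one, and the standard leaves that case undefined.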
Think, Then Try It: Recursion revisited…
rec2.pp
Think, Then Try It: Variable Arguments Intro
The C preprocessor has a mechanism that allows macros to take a variable number of arguments. The mechanism is described in the C99 Preprocessor Definition, specifically Section 6.10.3 (Macro replacement), paragraph 12, and Section 6.10.3.1 (Argument substitution), paragraph 2. Peruse these parts, then try these files:
- args4.pp: Introduction to variable number of arguments…
- varargs1.pp: With a little bit of trickiness, we can use the variable number of arguments feature in some interesting ways. (After you’ve looked at the actual results and are trying to explain them, you can use the hint found near the end of this writeup.)
- varargs2.pp: Putting the pieces together, we can have overloaded macros (where overloading is based on number of arguments).
- varargs3.pp: So, is it possible to detect no text at all passed to a macro? (After you’ve looked at the actual results and are trying to explain them, you can use the hint found near the end of this writeup.)
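As a minimal sketch of the basic mechanism, the C fragment below forwards whatever the ellipsis matched to snprintf. FORMAT_INTO and variadic_checks are hypothetical names chosen for this example, not something defined in the lab files.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* __VA_ARGS__ stands for everything matched by "...", commas included. */
#define FORMAT_INTO(buf, ...) snprintf((buf), sizeof (buf), __VA_ARGS__)

/* Hypothetical helper: returns 1 if every expectation below holds. */
static int variadic_checks(void) {
    char buf[32];

    /* Three variable arguments: the format string and two integers. */
    FORMAT_INTO(buf, "%d-%d", 1, 2);
    assert(strcmp(buf, "1-2") == 0);

    /* A single variable argument also works. */
    FORMAT_INTO(buf, "plain");
    assert(strcmp(buf, "plain") == 0);
    return 1;
}
```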
Hints
args2.pp
To understand why IGNORE2 gives an error and IGNORE1 doesn’t, read over the first paragraph of Section 6.10.3.1 (Argument substitution) in the Official C99 Preprocessor Definition. It tells you some important things about what happens when in the context of argument passing and macro expansion.
We’d get the same results if we had
#define IGNORE2(x) Blah blah blah x
args3.pp
Consider the code
#define NEGATE(x) -x
a = -NEGATE(b)
Both clang and gcc correctly produce the answer
a = - -b
and mcpp produces
a = - - b
But tcc produces
a = --b
which causes problems. Why?
(mcpp gets this one right, but gets the more complicated example wrong without -@post.)
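One way to explore the question is to compare how a C compiler treats the two spellings. The sketch below reuses NEGATE from the hint; the helper negate_checks and its local variables are made up for illustration.

```c
#include <assert.h>

#define NEGATE(x) -x

/* Hypothetical helper: returns 1 if every expectation below holds. */
static int negate_checks(void) {
    int b = 5;
    /* In a full compile the two '-' remain separate tokens: -(-b). */
    assert(-NEGATE(b) == 5);
    assert(b == 5); /* b itself is untouched */

    /* Written as the single token "--", we get pre-decrement instead. */
    int c = 5;
    int d = --c;
    assert(d == 4 && c == 4);
    return 1;
}
```

The difference only becomes visible when preprocessed text is written out with -E and then read back in by a compiler, which is why the spacing in the output matters.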
varargs1.pp
The result of VA_NUM_ARGS() might surprise you, but no text at all counts as a valid argument. If we want to distinguish this case, we can do it, but it’ll require more Variable Arguments.
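To make this concrete, here is one common way to write a counting macro. The definition in varargs1.pp may well differ, so treat this version of VA_NUM_ARGS, the helper VA_NUM_ARGS_IMPL, and the five-argument limit as assumptions of the sketch.

```c
#include <assert.h>

/* Assumed definition: append a countdown, then pick the sixth slot.
   Each real argument pushes the countdown one place to the right. */
#define VA_NUM_ARGS(...) VA_NUM_ARGS_IMPL(__VA_ARGS__, 5, 4, 3, 2, 1, 0)
#define VA_NUM_ARGS_IMPL(_1, _2, _3, _4, _5, N, ...) N

/* Hypothetical helper: returns 1 if every expectation below holds. */
static int count_checks(void) {
    assert(VA_NUM_ARGS(a, b, c) == 3);
    assert(VA_NUM_ARGS(a) == 1);
    /* Empty parentheses still supply one (empty) argument. */
    assert(VA_NUM_ARGS() == 1);
    return 1;
}
```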
varargs3.pp
Our trick involves the SAW_PARENS macro. It returns T if it is passed something in parentheses and F otherwise. By running SAW_PARENS(__VA_ARGS__ ()), if __VA_ARGS__ contains nothing, we’ll see the parentheses that follow. But what if our input actually is something in parentheses? We catch that case first, to eliminate it.
And how does SAW_PARENS work? It relies on the fact that a function-like macro fires if it is given parentheses. Thus MK_UNICORN42 by itself stays unchanged, but MK_UNICORN42 (something,in,"parentheses") will invoke the macro and produce UNICORN42.
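Here is one possible way to build such a macro, offered as a sketch rather than as the definition used in varargs3.pp. Only the names SAW_PARENS, MK_UNICORN42, UNICORN42, T, and F come from the text above; the helpers CAT, XCAT, SECOND, SECOND_, PROBE_UNICORN42, STR, XSTR, and saw_parens_checks are assumptions, and this version of SAW_PARENS takes a single argument.

```c
#include <assert.h>
#include <string.h>

/* Function-like macro: it only fires when followed by parentheses. */
#define MK_UNICORN42(...) UNICORN42

/* Assumed helpers for pasting and for selecting the second argument. */
#define CAT(a, b) a ## b
#define XCAT(a, b) CAT(a, b)
#define SECOND(...) SECOND_(__VA_ARGS__)
#define SECOND_(a, b, ...) b

/* If MK_UNICORN42 fired, the paste yields PROBE_UNICORN42, whose comma
   shifts T into second place. Otherwise the pasted name is not a macro
   and F stays second. */
#define PROBE_UNICORN42 ~, T
#define SAW_PARENS(x) SECOND(XCAT(PROBE_, MK_UNICORN42 x), F, ~)

/* Stringify after expansion so results can be checked at run time. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Hypothetical helper: returns 1 if every expectation below holds. */
static int saw_parens_checks(void) {
    assert(strcmp(XSTR(SAW_PARENS((a, b))), "T") == 0);
    assert(strcmp(XSTR(SAW_PARENS(foo)), "F") == 0);
    /* Appending () to empty input leaves just (), which looks parenthesized. */
    assert(strcmp(XSTR(SAW_PARENS(())), "T") == 0);
    /* Appending () to non-empty, unparenthesized input still gives F. */
    assert(strcmp(XSTR(SAW_PARENS(foo ())), "F") == 0);
    return 1;
}
```

The extra level of indirection in SECOND matters: the comma produced by PROBE_UNICORN42 only separates arguments once the expanded text is rescanned as a call to SECOND_.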