One of the most common optimizations performed by compilers is the optimization of variable lookup. Our compiler, as we have implemented it so far, generates code that uses the lookup-variable-value operation of the evaluator machine. This searches for a variable by comparing it with each variable that is currently bound, working frame by frame outward through the run-time environment. This search can be expensive if the frames are deeply nested or if there are many variables. For example, consider the problem of looking up the value of x while evaluating the expression (* x y z) in an application of the procedure that is returned by
(let ((x 3) (y 4))
(lambda (a b c d e)
(let ((y (* a b x))
(z (+ c d x)))
(* x y z))))
Since a let expression is just syntactic sugar for a lambda combination, this expression is equivalent to
((lambda (x y)
(lambda (a b c d e)
((lambda (y z) (* x y z))
(* a b x)
(+ c d x))))
3
4)
Each time lookup-variable-value searches for x, it must determine that the symbol x is not eq? to y or z (in the first frame), nor to a, b, c, d, or e (in the second frame). We will assume, for the moment, that our programs do not use define -- that variables are bound only with lambda. Because our language is lexically scoped, the run-time environment for any expression will have a structure that parallels the lexical structure of the program in which the expression appears.noteThis is not true if we allow internal definitions, unless we scan them out. See exercise 5.43. Thus, the compiler can know, when it analyzes the above expression, that each time the procedure is applied the variable x in (* x y z) will be found two frames out from the current frame and will be the first variable in that frame.
We can exploit this fact by inventing a new kind of variable-lookup operation, lexical-address-lookup, that takes as arguments an environment and a lexical address that consists of two numbers: a frame number, which specifies how many frames to pass over, and a displacement number, which specifies how many variables to pass over in that frame. Lexical-address-lookup will produce the value of the variable stored at that lexical address relative to the current environment. If we add the lexical-address-lookup operation to our machine, we can make the compiler generate code that references variables using this operation, rather than lookup-variable-value. Similarly, our compiled code can use a new lexical-address-set! operation instead of set-variable-value!.
In order to generate such code, the compiler must be able to determine the lexical address of a variable it is about to compile a reference to. The lexical address of a variable in a program depends on where one is in the code. For example, in the following program, the address of x in expression <e1> is (2,0) -- two frames back and the first variable in the frame. At that point y is at address (0,0) and c is at address (1,2). In expression <e2>, x is at (1,0), y is at (1,1), and c is at (0,2).
((lambda (x y)
(lambda (a b c d e)
((lambda (y z) <e1>)
<e2>
(+ c d x))))
3
4)
One way for the compiler to produce code that uses lexical addressing is to maintain a data structure called a compile-time environment. This keeps track of which variables will be at which positions in which frames in the run-time environment when a particular variable-access operation is executed. The compile-time environment is a list of frames, each containing a list of variables. (There will of course be no values bound to the variables, since values are not computed at compile time.) The compile-time environment becomes an additional argument to compile and is passed along to each code generator. The top-level call to compile uses an empty compile-time environment. When a lambda body is compiled, compile-lambda-body extends the compile-time environment by a frame containing the procedure's parameters, so that the sequence making up the body is compiled with that extended environment. At each point in the compilation, compile-variable and compile-assignment use the compile-time environment in order to generate the appropriate lexical addresses.
Exercises 5.39 through 5.43 describe how to complete this sketch of the lexical-addressing strategy in order to incorporate lexical lookup into the compiler. Exercise 5.44 describes another use for the compile-time environment.
(find-variable 'c '((y z) (a b c d e) (x y)))
(1 2)
(find-variable 'x '((y z) (a b c d e) (x y)))
(2 0)
(find-variable 'w '((y z) (a b c d e) (x y)))
not-found
(lambda (+ * a b x y)
(+ (* a x) (* b y)))
which computes a linear combination of x and y. We might call it with arguments +matrix, *matrix, and four matrices, but the open-coding compiler would still open-code the + and the * in (+ (* a x) (* b y)) as primitive + and *. Modify the open-coding compiler to consult the compile-time environment in order to compile the correct code for expressions involving the names of primitive procedures. (The code will work correctly as long as the program does not define or set! these names.)