Lexical Addressing

One of the most common optimizations performed by compilers is the optimization of variable lookup. Our compiler, as we have implemented it so far, generates code that uses the lookup-variable-value operation of the evaluator machine. This searches for a variable by comparing it with each variable that is currently bound, working frame by frame outward through the run-time environment. This search can be expensive if the frames are deeply nested or if there are many variables. For example, consider the problem of looking up the value of x while evaluating the expression (* x y z) in an application of the procedure that is returned by

(let ((x 3) (y 4)) (lambda (a b c d e) (let ((y (* a b x)) (z (+ c d x))) (* x y z))))

Since a let expression is just syntactic sugar for a lambda combination, this expression is equivalent to

((lambda (x y) (lambda (a b c d e) ((lambda (y z) (* x y z)) (* a b x) (+ c d x)))) 3 4)

Each time lookup-variable-value searches for x, it must determine that the symbol x is not eq? to y or z (in the first frame), nor to a, b, c, d, or e (in the second frame). We will assume, for the moment, that our programs do not use define -- that variables are bound only with lambda. Because our language is lexically scoped, the run-time environment for any expression will have a structure that parallels the lexical structure of the program in which the expression appears.^{noteThis is not true if we allow internal definitions, unless we scan them out. See exercise 5.43.} Thus, the compiler can know, when it analyzes the above expression, that each time the procedure is applied the variable x in (* x y z) will be found two frames out from the current frame and will be the first variable in that frame.

We can exploit this fact by inventing a new kind of variable-lookup operation, lexical-address-lookup, that takes as arguments an environment and a lexical address that consists of two numbers: a frame number, which specifies how many frames to pass over, and a displacement number, which specifies how many variables to pass over in that frame. Lexical-address-lookup will produce the value of the variable stored at that lexical address relative to the current environment. If we add the lexical-address-lookup operation to our machine, we can make the compiler generate code that references variables using this operation, rather than lookup-variable-value. Similarly, our compiled code can use a new lexical-address-set! operation instead of set-variable-value!.

In order to generate such code, the compiler must be able to determine the lexical address of a variable it is about to compile a reference to. The lexical address of a variable in a program depends on where one is in the code. For example, in the following program, the address of x in expression <e1> is (2,0) -- two frames back and the first variable in the frame. At that point y is at address (0,0) and c is at address (1,2). In expression <e2>, x is at (1,0), y is at (1,1), and c is at (0,2).

((lambda (x y) (lambda (a b c d e) ((lambda (y z) <e1>) <e2> (+ c d x)))) 3 4)

One way for the compiler to produce code that uses lexical addressing is to maintain a data structure called a compile-time environment. This keeps track of which variables will be at which positions in which frames in the run-time environment when a particular variable-access operation is executed. The compile-time environment is a list of frames, each containing a list of variables. (There will of course be no values bound to the variables, since values are not computed at compile time.) The compile-time environment becomes an additional argument to compile and is passed along to each code generator. The top-level call to compile uses an empty compile-time environment. When a lambda body is compiled, compile-lambda-body extends the compile-time environment by a frame containing the procedure's parameters, so that the sequence making up the body is compiled with that extended environment. At each point in the compilation, compile-variable and compile-assignment use the compile-time environment in order to generate the appropriate lexical addresses.

Exercises 5.39 through 5.43 describe how to complete this sketch of the lexical-addressing strategy in order to incorporate lexical lookup into the compiler. Exercise 5.44 describes another use for the compile-time environment.

Exercise 5.39. Write a procedure lexical-address-lookup that implements the new lookup operation. It should take two arguments -- a lexical address and a run-time environment -- and return the value of the variable stored at the specified lexical address. Lexical-address-lookup should signal an error if the value of the variable is the symbol *unassigned*.^{noteThis is the modification to variable lookup required if we implement the scanning method to eliminate internal definitions (exercise 5.43). We will need to eliminate these definitions in order for lexical addressing to work.} Also write a procedure lexical-address-set! that implements the operation that changes the value of the variable at a specified lexical address.

Exercise 5.40. Modify the compiler to maintain the compile-time environment as described above. That is, add a compile-time-environment argument to compile and the various code generators, and extend it in compile-lambda-body.

Exercise 5.41. Write a procedure find-variable that takes as arguments a variable and a compile-time environment and returns the lexical address of the variable with respect to that environment. For example, in the program fragment that is shown above, the compile-time environment during the compilation of expression <e1> is ((y z) (a b c d e) (x y)). Find-variable should produce

(find-variable 'c '((y z) (a b c d e) (x y))) (1 2) (find-variable 'x '((y z) (a b c d e) (x y))) (2 0) (find-variable 'w '((y z) (a b c d e) (x y))) not-found

Exercise 5.42. Using find-variable from exercise 5.41, rewrite compile-variable and compile-assignment to output lexical-address instructions. In cases where find-variable returns not-found (that is, where the variable is not in the compile-time environment), you should have the code generators use the evaluator operations, as before, to search for the binding. (The only place a variable that is not found at compile time can be is in the global environment, which is part of the run-time environment but is not part of the compile-time environment.^{noteLexical addresses cannot be used to access variables in the global environment, because these names can be defined and redefined interactively at any time. With internal definitions scanned out, as in exercise 5.43, the only definitions the compiler sees are those at top level, which act on the global environment. Compilation of a definition does not cause the defined name to be entered in the compile-time environment.} Thus, if you wish, you may have the evaluator operations look directly in the global environment, which can be obtained with the operation (op get-global-environment), instead of having them search the whole run-time environment found in env.) Test the modified compiler on a few simple cases, such as the nested lambda combination at the beginning of this section.

Exercise 5.43. We argued in section 4.1.6 that internal definitions for block structure should not be considered "real" defines. Rather, a procedure body should be interpreted as if the internal variables being defined were installed as ordinary lambda variables initialized to their correct values using set!. Section 4.1.6 and exercise 4.16 showed how to modify the metacircular interpreter to accomplish this by scanning out internal definitions. Modify the compiler to perform the same transformation before it compiles a procedure body.

Exercise 5.44. In this section we have focused on the use of the compile-time environment to produce lexical addresses. But there are other uses for compile-time environments. For instance, in exercise 5.38 we increased the efficiency of compiled code by open-coding primitive procedures. Our implementation treated the names of open-coded procedures as reserved words. If a program were to rebind such a name, the mechanism described in exercise 5.38 would still open-code it as a primitive, ignoring the new binding. For example, consider the procedure

(lambda (+ * a b x y) (+ (* a x) (* b y)))

which computes a linear combination of x and y. We might call it with arguments +matrix, *matrix, and four matrices, but the open-coding compiler would still open-code the + and the * in (+ (* a x) (* b y)) as primitive + and *. Modify the open-coding compiler to consult the compile-time environment in order to compile the correct code for expressions involving the names of primitive procedures. (The code will work correctly as long as the program does not define or set! these names.)