Partial Evaluation and Memory Optimisation

Missing runtime information in the compiler

When developing any reasonably large application it becomes important to think about code reusability. So, you start writing more generic code, abstracting away from the details. For example, you write a matrix multiplication function for matrices of arbitrary size, instead of a function for matrices of specific a size. The downside here is that the compiler now has the difficult task of trying to fill in these details and if the compiler is not capable of doing so, this gets postponed until runtime. And herein lies the problem, the code generated for filling in the details at runtime slows down the program.

One solution to this problem would be to take the generic code and generate specific functions for the situations where enough information is available. For example, say you have written a function for matrix multiplication, for matrices of any size. Now if you know — or the compiler can find out — that at runtime this function gets used (quite) a few times on matrices of 1000 × 1000 elements then you can gain speed by specialising your function for matrices of this size. In, for example, the Single assignment C compiler (sac2c) this can be achieved by adding a directive to your code requesting the generation of a specialised version of the matrix multiplication function.

Introduction to partial evaluation

Now, to make this idea a little more formal, I will introduce the topic of Partial evaluation. Partial evaluation is a program specialisation technique that takes a source program and a part of the program's input and generates a residual program that runs on the remainder of the input. The residual program tends to run much faster than the original ¹. Partial evaluation has been researched since the 1970's and has some very interesting applications. For example, as described by Y. Futamura in ² it is possible, given a self applicable partial evaluator, to generate a compiler from an interpreter or a compiler generator from a compiler.

Now, the main goal of partial evaluation is reducing the amount of computation by computing as much as possible during the partial evaluation process. There are two general approaches to partial evaluation, a) online partial evaluation and b) offline partial evaluation. Online partial evaluation can be seen as interpreting the given program for some of the input and generating code for those parts that depend on the remainder of the input. This happens online, i.e. the source program is run on the input. For offline partial evaluation we first pre-process the program, performing binding-time analysis. What happens in this step is we take a specification of the input, telling us which parts of the input will be known (static) and which will be unknown (dynamic). With this information we annotate the program. After this step the program is interpreted with the actual input, but here the partial evaluator just follows the annotations instead of making the decision 'online'.

What should be noticed here, is that the resulting residual programs have lost all their genericity. This, however, is relatively easily mended by including the generic code and generating a dispatch function that chooses the right function at runtime, depending on the arguments. This is the solution that has been implemented for Single assignment C, the compiler generates so-called wrapper functions that choose the correct, specific function at runtime. To reduce the overhead of this wrapper function, the compiler will dispatch statically whenever possible ³.

Reducing execution time as well as memory usage

So, the main goal of partial evaluation is reducing computation, but what about reducing memory usage? At the moment partial evaluation is not strictly targeted towards optimising memory usage. However, the residual programs that we get, do in fact expose much more of the memory usage of a program. So, we may have some very interesting opportunities for optimising memory usage here. For example, we may have a program that allocates some memory but only uses this memory in certain parts of the program. We could think of allocating memory for a data structure that we only use when the program is run with the input debug = 1. Now, if we partially evaluate the program providing debug = 0, the memory will go unused in the residual program.

Also, due to the fact that the allocation and deallocation of this memory are seen as computations, they remain in the residual program. Removing these allocations and deallocations will reduce the memory usage of the program and decrease the execution time. Besides reducing the overall memory usage of a program, it is possible to reduce the memory in use at a given time. Consider the following example program:


void foo(int debug, int size, ...) {  
    x = malloc(size * sizeof(somestruct));

    // Computation which uses x [...]

    y = malloc(size * sizeof(somestruct));

    z = malloc(size * sizeof(somestruct));

    // Computation which uses y and z [...]

    // Computation which uses z [...]

    // Perform an amount of computation not using x, y, or z [...]

    free(x);
    free(y);
    free(z);
}

Partial evaluation and use-def analysis can transform the program listed above into a more memory efficient residual program. The transformation applied is called allocation migration and will result in the following specialised program:


void foo_specialised(int debug, int size, ...) {  
    x = malloc(size * sizeof(somestruct));

    // Computation which uses x [...]

    free(x);

    y = malloc(size * sizeof(somestruct));

    z = malloc(size * sizeof(somestruct));

    // Computation which uses y and z [...]

    free(y);

    // Computation which uses z [...]

    free(z);

    // Perform an amount of computation not using x, y, or z [...]
}

When looking at residual programs we notice that the structure of a program is much more exposed, including the way it uses memory. Recognising and exploiting opportunities for memory optimisation has proven to be an interesting new sub-area of research, which, as it would seem, has yet to be explored. In ⁴ some interesting ways of exploiting these opportunities are described along with another method for using partial evaluation for memory optimisation. The article also discusses partial evaluation in more detail including the Futamura Projections and the Specialiser Projections.

We are currently in the process of implementing some of the techniques discussed in ⁵ and will report back to you when we have some results.

N. D. Jones, C. K. Gomard, and P. Sestoft. Partial evaluation and automatic program generation. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993. ^[return]
Y. Futamura. Partial Evaluation of Computation Process - An Approach to a Compiler-Compiler. Higher-Order and Symbolic Computation, 12:381–391, 1999. 10.1023/A:1010095604496. ^[return]
C. Grelck, T. van Deurzen, S. Herhut, SB. Scholz -Asynchronous Adaptive Optimisation for Generic Data-Parallel Array Programming. Concurrency and Computation: Practice and Experience (not yet published). ^[return]
T. van Deurzen, V. Shah - Implementing Memory Optimisation using Partial Evaluation and The Interpretive Approach. Not yet published, under revision. ^[return]
T. van Deurzen, V. Shah - Implementing Memory Optimisation using Partial Evaluation and The Interpretive Approach. Not yet published, under revision. ^[return]