A Crash Course for the MinCaml Compiler

Register Allocation (`regAlloc.ml`)

[Update on September 17, 2008: The register allocator now uses a simpler algorithm. It omits the backtracking (ToSpill and NoSpill) explained below.]

The most complex process in the MinCaml compiler is register allocation, which implements infinite number of variables by finite number of registers.

First, as a function calling convention, we assign arguments from the first register toward the last register. (The MinCaml Compiler does not support too many arguments that do not fit in registers. They must be handled by programmers, for example by using tuples.) We set return values to the first register. These are processed in RegAlloc.h which allocates registers in each top-level function.

After that, we allocate resisters in function bodies and main routine. RegAlloc.g takes an instruction sequence with a mapping regenv (from variables to registers) representing the current register assignment, and returns the instruction sequence after register allocation. The basic policy of register allocation is "to avoid registers on which live (still to be used) variables are assigned." The "variables still to be used" are calculated by SparcAsm.fv. However, when allocating registers in e1 of Let(x, e1, e2), not only e1 but also the "continuing" instruction sequence e2 must be taken into account for the calculation of the "variables to be used." For this reason, RegAlloc.g and RegAlloc.g', which allocates registers in expressions, also take the "still continuing " instruction sequence cont and use it in the calculation of live variables.

But we sometimes cannot allocate any register that is not live, since the number of variables is infinite while that of registers is not. In this case, we have to save the current value of some register to memory. This process is called register spilling. Unlike in imperative languages, the value of a variable in functional languages does not change after its definition. Therefore, it is better to save it as early as possible (if ever) in order to make the room.

Thus, when a variable x needs to be saved, RegAlloc.g returns a data type value ToSpill representing this need, and thereby goes back to the definition of x to insert a virtual instruction Save. In addition, since we want to remove x from the set of "live variables" at the point where x is spilled, we insert virtual instruction Forget to exclude x from the set of free variables. For this purpose, ToSpill keeps not only (the list of) spilled variables, but also the instruction sequence e in which Forget has been inserted. In this way, after saving x, we redo the register allocation against e.

Saving is necessary not only when resisters are spilled, but also when functions are called. MinCaml adopts the so-called caller-save convention, so every function call destroys the values of registers. Therefore, we need to save the values of all registers that are live at that point. This is why ToSpill holds the list of spilled variables.

When saving is unnecessary, we return the register-allocated instruction sequence e' (with new regenv) in the other data type value NoSpill.

A spilled variable will be used sooner or later, in which case RegAlloc.g' (the function that allocates registers in expressions) raises an exception because it cannot find the variable in regenv. This exception is handled in function RegAlloc.g'_and_restore, where virtual instruction Restore is inserted, which restores the value of a variable from the memory to a register.

When allocating registers, we not only "avoid live registers" but also try to "reduce unnecessary mov in the future." This is called register targeting or register coalescing. For example, if a variable being defined will be the second argument of a function call, we try to allocate it on the second register. For another example, we try to allocate a variable that will be returned as the result of a function on the first register. These are implemented in RegAlloc.target. For this purpose, RegAlloc.g and RegAlloc.g' also takes register dest where the result of computation will be set.

Register Allocation (regAlloc.ml)

Register Allocation (`regAlloc.ml`)