A Crash Course for the MinCaml Compiler

Policy of Compilation

In general, compilation means translating programs from a high-level language to a lower-level language. For example, the following MinCaml function, which computes the greatest common divisor of two non-negative integers,

let rec gcd m n = if m = 0 then n else if m <= n then gcd m (n - m) else gcd n (m - n)

is compiled into the following SPARC assembly.

gcd.7: cmp %i2, 0 bne be_else.18 nop mov %i3, %i2 retl nop be_else.18: cmp %i2, %i3 bg ble_else.19 nop sub %i3, %i2, %i3 b gcd.7 nop ble_else.19: sub %i2, %i3, %o5 mov %i3, %i2 mov %o5, %i3 b gcd.7 nop

At a first glance, there is a huge gap between the two languages. The MinCaml compiler bridges this gap by defining appropriate intermediate languages and applying simple translations one by one. The major five gaps between MinCaml and SPARC assembly are:

Types. MinCaml has types and type checking, but assembly has no such mechanism.
Nested expressions. In MinCaml, you can write as complex expressions as you want at once, like 1+2-(3+(4-5)), but assembly can execute only one operation by one instruction.
Nested function definitions. In MinCaml, you can define a function inside another function, like
let rec make_adder x = let rec adder y = x + y in adder in (make_adder 3) 7
but assembly has only top-level "labels."
MinCaml has data structures such as tuples and arrays, while assembly does not.
In MinCaml, you can use as many variables as you want, but only a limited number of registers can be used in assembly.

To bridge these gaps, MinCaml applies translations such as

Type inference
K-normalization
Closure conversion
Virtual machine code generation
Register allocation

in this order. We will explain these translations and optimizations in what follows.