Policy of Compilation

In general, compilation means translating programs from a high-level language to a lower-level language. For example, the following MinCaml function, which computes the greatest common divisor of two non-negative integers,

let rec gcd m n =
  if m = 0 then n else
  if m <= n then gcd m (n - m) else
  gcd n (m - n)

is compiled into the following SPARC assembly.

gcd.7:
        cmp     %i2, 0
        bne     be_else.18
        nop
        mov     %i3, %i2
        retl
        nop
be_else.18:
        cmp     %i2, %i3
        bg      ble_else.19
        nop
        sub     %i3, %i2, %i3
        b       gcd.7
        nop
ble_else.19:
        sub     %i2, %i3, %o5
        mov     %i3, %i2
        mov     %o5, %i3
        b       gcd.7
        nop

At a first glance, there is a huge gap between the two languages. The MinCaml compiler bridges this gap by defining appropriate intermediate languages and applying simple translations one by one. The major five gaps between MinCaml and SPARC assembly are:

  1. Types. MinCaml has types and type checking, but assembly has no such mechanism.
  2. Nested expressions. In MinCaml, you can write as complex expressions as you want at once, like 1+2-(3+(4-5)), but assembly can execute only one operation by one instruction.
  3. Nested function definitions. In MinCaml, you can define a function inside another function, like
    let rec make_adder x =
      let rec adder y = x + y in
      adder in
    (make_adder 3) 7

    but assembly has only top-level "labels."
  4. MinCaml has data structures such as tuples and arrays, while assembly does not.
  5. In MinCaml, you can use as many variables as you want, but only a limited number of registers can be used in assembly.

To bridge these gaps, MinCaml applies translations such as

  1. Type inference
  2. K-normalization
  3. Closure conversion
  4. Virtual machine code generation
  5. Register allocation

in this order. We will explain these translations and optimizations in what follows.

Next