A Crash Course for the MinCaml Compiler

K-Normalization (`kNormal.ml`)

We stated that "compilation is about bridging the gaps between a high-level language and a low-level language," one of which was "nested expressions." For example, ML (as well as most languages) can compute at once the values of expressions like a + b + c - d, but ordinary assembly does not have such an instruction.

This gap is bridged by a translation called K-normalization, which defines every intermediate result of computation as a variable. (The name K-normalization originates in a compiler called ML Kit.) For example, the previous expression can be translated to:

let tmp1 = a + b in let tmp2 = tmp1 + c in tmp2 - d

In MinCaml, KNormal.g is the core of K-normalization and KNormal.t is the data type of K-normalized expressions (called K-normal forms). KNormal.g takes an expression before K-normalization with type environment env, and returns the expression after K-normalization paired with its type. (Types are passed around just for annotating variable definitions in let, and are not central to the essence of K-normalization.)

For example, in the case of e1 + e2, e1 is first K-normalized by g env e1, whose result is bound by let to variable x. Then, e2 is also K-normalized by g env e2, whose result is bound by let to variable y, and the expression x + y is returned with its type int.

In order to insert such let, we use an auxiliary function named insert_let. It takes expression e, creates a fresh variable x, and returns the expression let x = e in ... (though, if e is a variable in the first place, insert_let uses this variable as x and does not insert let in fact). It also takes the "continuation" function k, apply it to x, and uses its result for the expression after in (the body "..." above).

By the way, in addition to K-normalization, KNormal.g also translates booleans true and false to integers 1 and 0. In addition, comparisons and conditional branches are translated into combined forms like if e1 <= e2 then e3 else e4 from separate forms such as e1 <= e2 and if e then e1 else e2. This translation bridges one of the gaps between MinCaml and ordinary assembly where comparison and branches are combined. If condition e is not a comparison, it is translated into a comparison as in if e <> 0 then e1 else e2. Also by translating if (not e) then e1 else e2 to if e then e2 else e1 etc., all boolean expressions and conditional branches end up in either of the two forms

if e1 = e2 then e3 else e4

and

if e1 <= e2 then e3 else e4.

This translation is different from K-normalization, but implemented together with KNormal.g just because separating them requires extra work to define the intermediate data type.

In addition, uses of external variables are limited to calls to external functions (KNormal.ExtFunApp) or references to external arrays (KNormal.ExtArray) because they are necessary and sufficient for most applications. This restriction is also implemented together with K-normalization.

K-Normalization (kNormal.ml)

K-Normalization (`kNormal.ml`)