Parsing (parser.mly)

Now that lexical analysis is done, instead of strings like

'1' '2' '3' '-' '4' '5' '6' '+' '7' '8' '9'

we have sequences of tokens like

"123" "-" "456" "+" "789".

However, we still cannot do any high-level processing on such a flat sequence as above. For example, we have to recognize "123-456+789" to be "(123-456)+789", not "123-(456+789)". In terms of the data type Syntax.t defined in, we need a parse tree like:

Add(Sub(Int 123, Int 456), Int 789)

This process of translating a sequence of tokens into a parse tree is called parsing. The MinCaml compiler uses a tool called ocamlyacc to implement parsing in file parser.mly.

The contents of parser.mly are more or less similar to those of lexer.mll. It lists pattern matching from sequences of tokens to parse trees, for example like:

| exp PLUS exp
    { Add($1, $3) }

$1 and $3 mean the first and third syntax elements (which are both exp in this case).

This definition of syntax is almost the same as the previously described expressions e, but one point needs caution. In ML, function applcation is denoted by just a sequence of expressions. Therefore, if you write x - y, it is ambiguous whether you are subtracting integer y from integer x or applying function x to argument -y! Thus, we distinguish simple expressions simple_exp, which can be function arguments without extra parentheses, from general expressions exp. For example, since -y is not a simple_exp, the previous expression is known to be integer subtraction, not function application.

parser.mly also defines the priority of various syntax and binary operators as well as the data type of tokens mentioned in lexer.mll.

By the way, whenever the type of a variable is required (as in let), it is given by a fresh, undefined type variable Var(ref None) for the moment. This point will be explained next in type inference.