Multi-staging

Unstaged untyped LC interpreter

Recall that a simple interpreter for the untyped lambda calculus looks like this:

 type dom = 
 | VInt of int
 | VFun of (dom -> dom)

 type exp =
 | Var of string
 | App of exp * exp
 | Lam of string * exp
 | Int of int

 let envZero x = raise Not_found

 let ext env x v = 
   \lambda y. if x = y then v else env y
 
 exception Error

 let unVFun v =
   match v with
   | VFun f -> f
   | _ -> raise Error


 let rec eval e env =
   match e with
   | Int i -> VInt i
   | Var s -> env s
   | Lam(x,e) -> VFun (\lambda v. eval e (ext env x v))
   | App(e1,e2) -> (unVFun (eval e1 env))(eval e2 env)

And the type of eval is:

eval: exp -> (string -> dom) -> dom
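As a quick check, the interpreter can be run on small terms in plain OCaml. The following is a minimal sketch rather than the definitive code from these notes: it writes \lambda as fun, declares its own Error exception, and the names id_applied and k_applied are hypothetical, introduced only for illustration.

```ocaml
(* Values: tagged integers and tagged functions over values. *)
type dom =
  | VInt of int
  | VFun of (dom -> dom)

type exp =
  | Var of string
  | App of exp * exp
  | Lam of string * exp
  | Int of int

(* Assumption: a dedicated exception for applying a non-function. *)
exception Error

(* The empty environment binds no variables. *)
let envZero (_ : string) : dom = raise Not_found

(* Extend env with x bound to v; other names fall through to env. *)
let ext env x v = fun y -> if x = y then v else env y

(* Remove the VFun tag, failing on any other value. *)
let unVFun v =
  match v with
  | VFun f -> f
  | _ -> raise Error

let rec eval e env =
  match e with
  | Int i -> VInt i
  | Var s -> env s
  | Lam (x, e) -> VFun (fun v -> eval e (ext env x v))
  | App (e1, e2) -> (unVFun (eval e1 env)) (eval e2 env)

(* (\lambda x. x) 5 evaluates to VInt 5 *)
let id_applied = eval (App (Lam ("x", Var "x"), Int 5)) envZero

(* (\lambda x. \lambda y. x) 1 2 evaluates to VInt 1 *)
let k_applied =
  eval (App (App (Lam ("x", Lam ("y", Var "x")), Int 1), Int 2)) envZero
```

Note that every application re-enters eval on the body of the function, which is the second overhead discussed in the next section.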

Staging the LC interpreter

This interpreter's performance is not the best attainable because it has two overheads:

First, the use of tags and the need to untag values. We can't get rid of this because we are writing an interpreter for an untyped language and the tags have an important role here.
Second, the Lam(x,e) branch constructs a function whose body contains a call to eval. That call cannot run until the function is applied, because the value of v is not known before then, so the term e is interpreted again on every application. The solution to this is staging. A first attempt modifies the eval function as follows:

 let rec eval e env =
   match e with
   | Int i -> VInt < i >
   | Var s -> env s
   | Lam(x,e) -> < VFun (\lambda v. ~(eval e (ext env x < v >))) >
   | App(e1,e2) -> < (unVFun ~(eval e1 env)) ~(eval e2 env) >

and the type of eval becomes:

eval: exp -> (string -> < dom >) -> < dom >

Is this correct? No. Let's look at the first branch.

The question is whether we want:

   | Int i -> < VInt i >

or:

   | Int i -> VInt < i >

The second looks better because it delays less of the computation, but will it type check?

With the first version, the return type of this branch is < dom >. With the second version the type is different: it would have to be something like dom', defined as:

 type dom' = 
 | VInt of <int>
 | VFun of (<dom> -> <dom>)

This will change the type of eval to:

eval: exp -> (string -> dom') -> dom'

What about the second branch, Var s?

It works in both cases, since it simply returns whatever the environment holds.

What about the third branch?

dom' will not work because in this case we would return:

VFun < (\lambda v. ~(eval e (ext env x < v >))) >

Note that (eval e (ext env x < v >)) would then have type dom', which is not a code type, so it cannot appear under an escape (~).

Therefore we have to go back to the < dom > definition and forget about dom', because it will not work. It is still a worthwhile exercise to try to get dom' to work in one way or another, although success is very unlikely.

So the final definition of staged eval should be:

 let rec eval e env =
   match e with
   | Int i -> < VInt i >
   | Var s -> env s
   | Lam(x,e) -> < VFun (\lambda v. ~(eval e (ext env x < v >))) >
   | App(e1,e2) -> < (unVFun ~(eval e1 env)) ~(eval e2 env) >

with type:

eval: exp -> (string -> < dom >) -> < dom >

Example

What do we get if we run the following?

eval(parse "\lambda x.x") envZero
= eval (Lam ("x", Var "x")) envZero
= < VFun (\lambda v.v) >

eval(parse "\lambda x. \lambda y. x") envZero
= eval (Lam ("x", (Lam ("y", Var "x")))) envZero
= < VFun (\lambda v. VFun (\lambda v. v)) >

The problem here is that this answer is wrong: the inner binder captures the occurrence of v that should refer to the outer binder. The correct answer is:

< VFun (\lambda v1. VFun (\lambda v2. v1)) >

This is what MetaOCaml actually produces, through a mechanism called hygienic renaming: every time a binder (\lambda) appears inside brackets, a fresh variable name is generated and used for all bound occurrences of that variable.
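The renaming rule can be imitated in plain OCaml. This is only a sketch under loud assumptions, not how MetaOCaml is implemented: generated code is represented as a string instead of a < dom > value, and the names code, gensym, gen and k_code are hypothetical. The structure of gen mirrors the staged eval, except that each Lam asks for a fresh name before generating its body.

```ocaml
type exp =
  | Var of string
  | App of exp * exp
  | Lam of string * exp
  | Int of int

(* Assumption: a stand-in for < dom >, with generated code kept as text. *)
type code = string

(* Fresh-name supply: v1, v2, ... *)
let counter = ref 0
let gensym () =
  incr counter;
  "v" ^ string_of_int !counter

(* Environments now map source variables to generated names. *)
let envZero (_ : string) : code = raise Not_found
let ext env x v = fun y -> if x = y then v else env y

(* Same shape as the staged eval, but each binder gets a fresh name. *)
let rec gen e env : code =
  match e with
  | Int i -> "VInt (" ^ string_of_int i ^ ")"
  | Var s -> env s
  | Lam (x, body) ->
    let v = gensym () in
    "VFun (fun " ^ v ^ " -> " ^ gen body (ext env x v) ^ ")"
  | App (e1, e2) ->
    let c1 = gen e1 env in
    let c2 = gen e2 env in
    "(unVFun (" ^ c1 ^ ")) (" ^ c2 ^ ")"

(* \lambda x. \lambda y. x, generated with a fresh counter *)
let k_code = gen (Lam ("x", Lam ("y", Var "x"))) envZero
(* k_code = "VFun (fun v1 -> VFun (fun v2 -> v1))" *)
```

Because the environment maps x to the name generated for its own binder, the inner binder can no longer capture the reference to the outer variable.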