My links:

edit

Start of topic | Skip to actions

How to Extract OCaml Types in Coq and Use them in a Certified Interpreter

Joseph Young

OCaml is well-suited for implementing interpreters. For example, simply typed lambda calculus has the grammar and types

e::=#n | lambda.e | app e e | i
t::=int | t->t

Notice that we are using de Bruijn notation for variables. This translates nicely into OCaml with the type definitions

type value=
| VInt of int
| VFun of (value -> value)
;;

type exp=
| Var0 
| VarS of exp
| Lam of exp
| App of exp*exp
| Int of int
;;

We also need to define a type for our environment which we define as

type environment=
| Empty
| Ext of value * environment

We can now use this type to define a few simple functions such as

let exp1=App ((Lam Var0),(Int 5));;
let exp2=App (App ((Lam (Lam (VarS Var0))),(Int 5)),(Int 4));;
let exp3=App (App ((Lam (Lam Var0)),(Int 5)),(Int 4));;

This allows us to define a simple interpreter in OCaml with the code

exception EvalErr of string;;

let rec eval e env=
    match e with
    | Var0->
        (match env with
        | Empty ->
            raise (EvalErr "Tried to retrieve a var from an empty env.")
        | Ext (x,xs)->
            x)
    | VarS e->
        let env=
            (match env with
            | Empty ->
                raise (EvalErr "Tried to retrieve a var from an empty env.")
            | Ext (x,xs)->
                xs)
        in
        eval e env
    | Lam e -> VFun (fun v->eval e (Ext (v,env)))
    | App (rator,rand) ->
        (match (eval rator env) with
        | VFun f-> f (eval rand env) 
        | _ -> raise (EvalErr "Tried to evaluate with a nonfunction."))
    | Int i -> VInt i
;;

This examples demonstrates where OCaml's type system can become verbose and inefficient when interpreting. Even after we type check our program, we are still forced wrap and rewrap our values with tags defined by our value type. For example, when we want to apply a function, we need to check that the operator is a function. When we evaluate the operator, we are returned something of type value. We then need to unwrap this value, insure it's a VFun, then apply it to the operand. This process of unwrapping and rewrapping values can be very expensive. When a richer type system we can certify that our operand will always have the correct type and eliminate the need for unwrapping and rewrapping values.

Concoqtion enriches OCaml's type system with dependent types. This allow to create programs that are guaranteed to be type safe. For example, we can insure that the operator of an application will always be a function. This will allow us to avoid wrapping and rewrapping our values; we don't need to check since our type system will guarantee the application is correct.

Before we begin, we need to establish typing rules for lambda calculus


-------------- (Var0)
Gamma,t |- #0: t

Gamma |- #(n) : t
--------------------- (VarS)
Gamma,t' |- #(n+1): t

Gamma,t1 |- e: t2
------------------------- (Abs)
Gamma |- lambda.e: t1->t2 

Gamma |- e1: t1->t2   Gamma |- e2:t1
------------------------------------ (App)
Gamma |- e1 e2:t2

Our strategy for defining certified terms will be to create types that depend on assertions above the line and yield types defined below the line. In other words, our types will define a series of propositions; namely, we have a well typed term. In order to prove these propositions, we need to pass in proofs (terms with specific types) of other propositions. We accomplish this in Concoqtion with the following code

coq
    Inductive Typ:Set :=
    | Int : Typ
    | Arrow : Typ->Typ->Typ.

    Inductive Env:Set :=
    | Empty : Env
    | Ext : Env->Typ->Env.
end;;

type ('e:'(Env),'t:'(Typ)) tterm=
| Var0 of let 't:'(Typ) 'e:'(Env) in unit : ('(Ext e t),'(t)) tterm
| VarS of let 't1:'(Typ) 't2:'(Typ) 'e:'(Env) in ('(e),'(t1)) tterm
    : ('(Ext e t2),'(t1)) tterm
| Abs of let 't1:'(Typ) 't2:'(Typ) 'e:'(Env) in ('(Ext e t1),'(t2)) tterm
    : ('(e),'(Arrow t1 t2)) tterm
| App of let 't1:'(Typ) 't2:'(Typ) 'e:'(Env) 
    in ('(e),'(Arrow t1 t2)) tterm * ('(e),'(t1)) tterm
    : ('(e),'(t2)) tterm
| Int of let 'e:'(Env) in int : ('(e),'(Int)) tterm
;;

We can use this type to define a simple program

let exp=App .|'(Int), '(Int), '(Empty)| (Abs .|'(Int), '(Int), '(Empty)| (Var0 .|'(Int),'(Empty)| ()),Int .|'(Empty)| 1);;

Unfortunately, there's a separation between the type systems in Concoqtion and Coq. Thus, we need to define two functions that will allow us to map a Coq type into an OCaml type. We can accomplish this with the following code

coq
    Fixpoint evalT (t:Typ) : OCamlType :=
    match t with
    | Int => OCaml_int
    | Arrow dom cod => OCaml_Arr (evalT dom) (evalT cod)
    end.

    Fixpoint evalE (e:Env) : OCamlType :=
    match e with
    | Empty => OCaml_unit
    | Ext e t => OCaml_Tuple (OCaml_List_cons (evalT t)
        (OCaml_List_cons (evalE e) OCaml_List_nil))
    end.
end;;

Thus, the entire interpreter is now given as

let rec eval .|e:'(Env),t:'(Typ)| (exp:('(e),'(t)) tterm) (env':'(evalE e))
    : '(evalT t) =
    match exp  in '(evalT t) with
    | Var0 .|t:'(Typ),e:'(Env)| _ ->
        fst env'
    | VarS .|t1:'(Typ), t2:'(Typ), e1:'(Env)| x->
        eval .|'(e1), '(t1)| x (snd env')
    | Abs .|t1:'(Typ), t2:'(Typ), e:'(Env)| body->
        (fun x->eval .|'(Ext e t1), '(t2)| body (x,env'))
    | App .|t1:'(Typ), t2:'(Typ), e:'(Env)| (rator,rand)->
        (eval .|'(e),'(Arrow t1 t2)| rator env')
            (eval .|'(e), '(t1)| rand env')
    | Int .|e:'(Env)| i-> i
;;

This interpreter looks slightly strange for a few reasons. Notice that we are actually passing in four arguments into the interpreter. The first gives the type of the items in the environment. The second gives the type that the evaluated expression will give. The third argument is the actual expression that we are evaluating. The fourth argument is the environment that actually holds values. Again, this is necessary since there is a separation between Coq and OCaml's type system.

As an alternative approach, could we have simply implemented this interpreter in Coq and extracted OCaml code? As a first step, we need to duplicate the interpreter above. We can define well typed terms with the Coq type

Inductive Typ: Set:=
| Int : Typ
| Arr : Typ->Typ->Typ.

Inductive Env:Set :=
| Empty : Env
| Ext : Env->Typ->Env.

Axioms OCamlInt: Set.
Extract Constant OCamlInt=>"int".

Set Implicit Arguments.
Inductive tterm : Env->Typ->Set :=
| Var0 : forall e:Env, forall t:Typ,
    (tterm (Ext e t) t)
| VarS : forall e:Env, forall t1 t2:Typ,
    (tterm e t1)->(tterm (Ext e t2) t1)
| Abs : forall e:Env, forall t1 t2:Typ,
    (tterm (Ext e t1) t2)->(tterm e (Arr t1 t2))
| App : forall e:Env, forall t1 t2:Typ,
    (tterm e (Arr t1 t2))->(tterm e t1)->(tterm e t2)
| Num: forall e:Env, OCamlInt->(tterm e Int) .
Unset Implicit Arguments.

This code is very similar to that of Concoqtion. The only significant difference is the section of code

Axioms OCamlInt: Set.
Extract Constant OCamlInt=>"int".

We can see that this extracts into

Coq < Recursive Extraction OCamlInt.
type oCamlInt = int

This is how we will generate valid OCaml code with Coq. It also creates an interesting problem. Proof systems like Coq are only as accurate as the axioms they possess. Thus, we can create problematic code if we make a mistake when defining our axioms.

Next, similar to the Concoqtion code, we need to create two functions that map Coq types into OCaml types:

Fixpoint evalT (t:Typ) : Set :=
    match t with
    | Int => OCamlInt
    | Arr a b => (evalT a) -> (evalT b)
    end.

Axiom OCamlTuple: Set->Set->Set.
Extract Constant OCamlTuple "'a" "'b"=>"'a*'b".
Axiom OCamlUnit: Set.
Extract Constant OCamlUnit => "unit".

Fixpoint evalE (e:Env) : Set:=
    match e with
    | Empty => OCamlTuple OCamlUnit OCamlUnit
    | Ext e t => OCamlTuple (evalT t) (evalE e)
    end.

This allows us to define our interpreter:

Set Implicit Arguments.
Axiom OCaml_fst: forall a b:Set, (OCamlTuple a b)->a.
Axiom OCaml_snd: forall a b:Set, (OCamlTuple a b)->b.
Axiom OCaml_pair: forall a b:Set, a->b->(OCamlTuple a b).
Unset Implicit Arguments.
Extract Constant OCaml_fst=>"fun x->fst x".
Extract Constant OCaml_snd=>"fun x->snd x".
Extract Constant OCaml_pair=>"fun x y->x,y".

Fixpoint eval (e:Env) (t:Typ) (exp:tterm e t) (env:evalE e) {struct exp} : evalT t :=
    match exp in (tterm e t) return (evalE e -> evalT t) with
    | Var0 e t => fun (x:evalE (Ext e t)) => OCaml_fst x
    | VarS e t1 t2 exp =>
        fun (env:evalE (Ext e t2)) => eval e t1 exp (OCaml_snd env)
    | Abs e t1 t2 body =>
        fun (env:evalE e) (x:evalT t1) =>
            eval (Ext e t1) t2 body (OCaml_pair x env)
    | App e t1 t2 rator rand=>
        fun (env:evalE e) =>
            (eval e (Arr t1 t2) rator env) (eval e t1 rand env)
    | Num e i => fun _ : evalE e => i
    end env.

Writing this code required a few tricks in Coq that are not immediately obvious. At first, we might have been tempted to write the routine

Fixpoint eval (e:Env) (t:Typ) (exp:tterm e t) (env:evalE e) {struct exp} : (evalT t) :=
    match exp in (tterm e t) return (evalT t) with
    | Var0 e t => OCaml_fst env                         
    | VarS e t1 t2 exp =>
        eval e t1 exp (OCaml_snd env)
    | Abs e t1 t2 body =>
        fun (x:evalT t1)=>
            eval (Ext e t1) t2 body (OCaml_pair x env)
    | App e t1 t2 rator rand=>
        (eval e (Arr t1 t2) rator env) (eval e t1 rand env)
    | Num e i => i
    end.

Unfortunately, this produces the error:

Error:
In environment
eval : forall (e : Env) (t : Typ), tterm e t -> evalE e -> evalT t
e : Env
t : Typ
exp : tterm e t
env : evalE e
e0 : Env
t0 : Typ
The term "env" has type "evalE e" while it is expected to have type
 "OCamlTuple ?75 ?76"

The problem is that the type of env is bound before the match statement. Thus, even though after the match statement we have more information about the contents of e, and thus evalE e, Coq can not automatically infer this information. In order to fix this situation, the match statement instead returns a function that accepts an argument of type evalE arg where arg is the type that e should have within the match statement. This allows Coq to correctly determine evalE e in each branch in the match statement.

When we finally output the resulting OCaml code, we have the program

type __ = Obj.t

type typ =
  | Int
  | Arr of typ * typ

type env =
  | Empty
  | Ext of env * typ

type oCamlInt = int

type tterm =
  | Var0 of env * typ
  | VarS of env * typ * typ * tterm
  | Abs of env * typ * typ * tterm
  | App of env * typ * typ * tterm * tterm
  | Num of env * oCamlInt

type ('a, 'b) oCamlTuple = 'a*'b

type evalT = __

type evalE = __

(** val oCaml_fst : ('a1, 'a2) oCamlTuple -> 'a1 **)

let oCaml_fst = fun x->fst x

(** val oCaml_snd : ('a1, 'a2) oCamlTuple -> 'a2 **)

let oCaml_snd = fun x->snd x

(** val oCaml_pair : 'a1 -> 'a2 -> ('a1, 'a2) oCamlTuple **)

let oCaml_pair = fun x y->x,y

(** val eval : env -> typ -> tterm -> evalE -> evalT **)

let rec eval e t exp env0 =
  match exp with
    | Var0 (e0, t0) -> oCaml_fst (Obj.magic env0)
    | VarS (e0, t1, t2, exp0) -> eval e0 t1 exp0 (oCaml_snd (Obj.magic env0))
    | Abs (e0, t1, t2, exp0) ->
        Obj.magic (fun x ->
          eval (Ext (e0, t1)) t2 exp0 (Obj.magic (oCaml_pair x env0)))
    | App (e0, t1, t2, rator, rand) ->
        Obj.magic eval e0 (Arr (t1, t2)) rator env0 (eval e0 t1 rand env0)
    | Num (e0, i) -> Obj.magic i

This code is pretty nasty, but we can have confidence it does the right thing since Coq certified the code. It prevents unwrapping and rewrapping values by using the unsafe cast Obj.magic. From a software engineering point of view, these routines should form a module of code that we use, but do not modify directly.

Of course, unlike Concoqtion, there's no separation between OCaml types and Coq types when we correctly define our axioms. Thus, it is logical to think that we can eliminate the functions evalT and evalE and implement these types directly. This leads us to our new definition of tterm as

Set Implicit Arguments.
Inductive tterm : Set->Set->Type :=
| Var0 : forall (e t: Set),
    (tterm (OCamlTuple t e) t)
| VarS : forall (e t1 t2: Set),
    (tterm e t1)->(tterm (OCamlTuple t2 e) t1)
| Abs : forall (e t1 t2: Set),
    (tterm (OCamlTuple t1 e) t2)->(tterm e (t1->t2))
| App : forall (e t1 t2: Set),
    (tterm e (t1->t2))->(tterm e t1)->(tterm e t2)
| Int: forall (e: Set),
    OCamlInt->(tterm e OCamlInt) .
Unset Implicit Arguments.

Notice that tterm no longer takes in arguments of type Env and Typ. Instead, it directly depends on Coq Sets. This allows us to write a new evaluation function

Fixpoint eval (e t:Set) (exp:(tterm e t)) {struct exp}:(e->t) :=
    match exp in (tterm e t) return (e->t) with
    | Var0 e t =>
        fun (env:Ext e t) => OCaml_fst env
    | VarS e t1 t2 exp=>
        fun (env:Ext e t2) => eval e t1 exp (OCaml_snd env)
    | Abs e t1 t2 body=>
        fun (env:e) (t:t1) =>
            eval (Ext e t1) t2 body (OCaml_pair t env)
    | App e t1 t2 rator rand=>
        fun (env:e)=>
            (eval e (t1->t2) rator env) (eval e t1 rand env)
    | Int e i =>
        fun _ : e => i
    end.

Similar to our problems above, we don't have enough information about e in each branch of the match statement. Thus, we have to return a function that takes e as an argument and returns a value of the correct type. This extracts to the code:

type __ = Obj.t

type oCamlInt = int

type ('a, 'b) oCamlTuple = 'a*'b

(** val oCaml_fst : ('a1, 'a2) oCamlTuple -> 'a1 **)

let oCaml_fst = fun x->fst x

(** val oCaml_snd : ('a1, 'a2) oCamlTuple -> 'a2 **)

let oCaml_snd = fun x->snd x

(** val oCaml_pair : 'a1 -> 'a2 -> ('a1, 'a2) oCamlTuple **)

let oCaml_pair = fun x y->x,y

type ('x0, 'x) tterm =
  | Var0
  | VarS of (__, 'x) tterm
  | Abs of ((__, 'x0) oCamlTuple, __) tterm
  | App of ('x0, __ -> 'x) tterm * ('x0, __) tterm
  | Int of oCamlInt

(** val eval : ('a1, 'a2) tterm -> 'a1 -> 'a2 **)

let rec eval exp x =
  match exp with
    | Var0 -> oCaml_fst (Obj.magic x)
    | VarS exp0 -> eval exp0 (oCaml_snd (Obj.magic x))
    | Abs body -> Obj.magic (fun t -> eval body (oCaml_pair t x))
    | App (rator, rand) -> eval rator x (eval rand x)
    | Int i -> Obj.magic i

However, when we run this OCaml, we receive the following error:

On line:
    | Abs body -> Obj.magic (fun t -> eval.reccall eval body (oCaml_pair t env))
At the expression:
    body
This expression has type ((__, __) oCamlTuple, __) tterm
but is here used with type (__, 'a) tterm
Type (__, __) oCamlTuple = __ * __ is not compatible with type __ = Obj.t

At first, this is quite a bit puzzling, but it gives some good insight into OCaml's type system. In Hinley-Milner type systems, the type of the function may be inferred from the first type the function is called. In our case, eval has been called in a case where the first argument has type,

(__, 'a) tterm

. This was most likely bound in the VarS^? case of the match statement. When we try to call eval again in the Abs case, body has type,

(__, __) oCamlTuple, __) tterm

. Thus, we have a type error. Of course, why did Coq generate this code in the first place? The answer lies within the definitions of evalT and evalE. If you look at the code extracted above, these functions erase entirely. In the definition of tterm, they prevent us from explicitely operating on Sets. Extracting types from Coq that depend on Sets can lead us to trouble as we've seen above.

There are a few different options to rectify this error. Notice that this function would type check correctly if OCaml had polymorphic recursive. Although it doesn't explicitly support polymorphic recursion, there's a hack using record types that allows us to simulate it. Consider the following code

type q = { reccall : 'a 'b. q -> ('a,'b) tterm -> 'a -> 'b }

let eval eval exp env =
  match exp with 
    | Var0 -> oCaml_fst (Obj.magic env)
    | VarS exp0 -> eval.reccall eval exp0 (oCaml_snd (Obj.magic env))
    | Abs body -> Obj.magic (fun t -> eval.reccall eval body (oCaml_pair t env))
    | App (rator, rand) -> eval.reccall eval rator env (eval.reccall eval rand env)     
    | Int i -> Obj.magic i

let myeval exp env=
    let r = {reccall = eval} in
    eval r exp env
;;

Record types in OCaml allow universal quantifiers inside the type. This differs from regular type declarations which require that universal quantifiers exist outside the type. This allows us to define a nonrecursive version of eval where we explicitely perform the recursive call using our record type. Then, the resulting function myeval typechecks correctly.

Of course, the polymorphic recursion trick is a dirty hack. From a software engineering point of view, we do not want to touch the code the Coq extracts since it contains many unsafe constructs. Thus, we will be far better off creating a program in Coq that extracts OCaml code that typechecks correctly. At this point, we make one important observation. Whenever we evaluate a new program, our initial environment is empty. Therefore, we know the type of the environment for every new, unevaluated program. Thus, we can add one level of indirection with the following Coq code:

Axiom OCaml_unit:OCamlUnit.
Extract Constant OCaml_unit=>"()".
Set Implicit Arguments. 
Definition eval (t:Set) (exp:(tterm OCamlUnit t)) :=
    let eval:=fix eval (e t:Set) (exp:(tterm e t)) {struct exp}:(e->t) :=
        match exp in (tterm e t) return (e->t) with
        | Var0 e t =>
            fun (env:Ext e t) => OCaml_fst env
        | VarS e t1 t2 exp=>
            fun (env:Ext e t2) => eval e t1 exp (OCaml_snd env)
        | Abs e t1 t2 body=>
            fun (env:e) (t:t1) =>
                eval (Ext e t1) t2 body (OCaml_pair t env)
        | App e t1 t2 rator rand=>
            fun (env:e)=>
                (eval e (t1->t2) rator env) (eval e t1 rand env)
        | Int e i =>
            fun _ : e => i
        end 
    in
    eval OCamlUnit t exp OCaml_unit.
Unset Implicit Arguments.

This function uses the val function as we've defined above, but it restricts all inputed programs to have an initial empty environment. This extracts to the code:

type __ = Obj.t
let __ = let rec f _ = Obj.repr f in Obj.repr f

type oCamlInt = int

type ('a, 'b) oCamlTuple = 'a*'b

type oCamlUnit = unit

(** val oCaml_fst : ('a1, 'a2) oCamlTuple -> 'a1 **)

let oCaml_fst = fun x->fst x

(** val oCaml_snd : ('a1, 'a2) oCamlTuple -> 'a2 **)

let oCaml_snd = fun x->snd x

(** val oCaml_pair : 'a1 -> 'a2 -> ('a1, 'a2) oCamlTuple **)

let oCaml_pair = fun x y->x,y

type ('x0, 'x) tterm =
  | Var0
  | VarS of (__, 'x) tterm
  | Abs of ((__, 'x0) oCamlTuple, __) tterm
  | App of ('x0, __ -> 'x) tterm * ('x0, __) tterm
  | Int of oCamlInt

(** val oCaml_unit : oCamlUnit **)

let oCaml_unit = ()

(** val eval : (oCamlUnit, 'a1) tterm -> 'a1 **)

let eval exp =
  Obj.magic (fun _ _ exp0 env ->
    let rec eval0 exp1 env0 =
      match exp1 with
        | Var0 -> oCaml_fst env0
        | VarS exp2 ->
            Obj.magic (fun _ _ exp3 env1 -> eval0 exp3 env1) __ __ exp2
              (oCaml_snd env0)
        | Abs body -> (fun t ->
            Obj.magic (fun _ _ exp2 env1 -> eval0 exp2 env1) __ __ body
              (oCaml_pair t env0))
        | App (rator, rand) ->
            Obj.magic (fun _ _ exp2 env1 -> eval0 exp2 env1) __ __ rator env0
              (eval0 rand env0)
        | Int i -> Obj.magic i
    in eval0 exp0 env) __ __ exp oCaml_unit

This code typechecks correctly in OCaml. Thus, what we've done is eliminated the need for polymorphic recursion by using the knowledge that all unevaluated programs should begin with an empty environment.

Now that we know that we can implement a certified interpreter in Coq, what are the advantages or disadvantages to this approach as opposed to Concoqtion? One obvious advantage of Concoqtion is that readily allows mutation while Coq does not. Now, that doesn't mean that we can not produce OCaml code in Coq that performs mutation. For example, we can create OCaml references in Coq with the following

Axiom OCamlRef: Set->Set.
Extract Constant OCamlRef "'a"=>"'a ref".

But, these constructs are far more convenient on Concoqtion. Also, in Coq, we need to declare an Axiom for every OCaml function that we use. This could potentially introduce errors if we make a mistake when defining our axioms. However, this penalty can be mitigated once a standard library has been created and throughly checked. Finally, in Concoqtion, we have context refinement. This prevents us from using the strange match statment trick where we return a function. The obvious advantage of Coq over Concoqtion is that there's no separation between different kinds of types. This gives us far more flexibility when designing the interpreter. Of course, we must be careful with how the interpreter is created, otherwise we will not generate valid OCaml code.

End of topic
Skip to actions | Back to top

Web Actions:

Preferences

This work is licensed under a Creative Commons Attribution 2.5 License. Please follow our citation guidelines.