Compositional type-checking is a neat technique that I first saw in a paper by Olaf Chitil¹. He introduces a system of principal typings, as opposed to a system of principal types, as a way to address the bad type errors that many functional programming languages with type systems based on Hindley-Milner suffer from.
Today I want to present a small type checker for a core ML (with, notably, no data types or modules) based roughly on the ideas from that paper. This post is almost literate Haskell, but it’s not a complete program: it only implements the type checker. If you actually want to play with the language, grab the unabridged code here.
module Typings where
import qualified Data.Map.Merge.Strict as Map
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Foldable
import Data.List
import Data.Char
import Control.Monad.Except
We’ll begin, like always, by defining data structures for the language.
Now, this is a bit against my style, but this system (which I shall call ML - but only because it sounds cool) is not presented as a pure type system - there are separate grammars for terms and types. Assume that Var is a suitable member of all the appropriate type classes.
data Exp
  = Lam Var Exp
  | App Exp Exp
  | Use Var
  | Let (Var, Exp) Exp
  | Num Integer
  deriving (Eq, Show, Ord)
data Type
  = TyVar Var
  | TyFun Type Type
  | TyCon Var
  deriving (Eq, Show, Ord)
ML is painfully simple: it's a lambda calculus extended with Let, since there needs to be a demonstration of recursion and polymorphism, and numbers, so there can be a base type. It has no unusual features - in fact, it doesn't have many features at all: no rank-N types, GADTs, type classes, row-polymorphic records, tuples or even algebraic data types.
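For concreteness, here's how a tiny program would be encoded in this AST. I'm assuming Var is String below - the unabridged code fixes a concrete choice, and it doesn't matter which.

-- let id = fun x -> x in id id, written out as an Exp.
selfApp :: Exp
selfApp = Let ("id", Lam "x" (Use "x"))
              (App (Use "id") (Use "id"))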
I believe that a fully-featured programming language along the lines of Haskell could be shaped out of a type system like this; however, I am not smart enough to do it, and could not find any prior literature on the topic. Sadly, it seems that compositional typings aren't a very active area of research at all.
The novelty starts to show up when we define data to represent the different kinds of scopes that crop up. There are monomorphic Δ-contexts, which assign types to names, and also polymorphic Γ-contexts, which assign typings to names instead. While we're defining newtypes over Maps, let's also get substitutions out of the way.
newtype Delta = Delta (Map.Map Var Type)
  deriving (Eq, Ord, Semigroup, Monoid)

newtype Subst = Subst (Map.Map Var Type)
  deriving (Eq, Show, Ord, Monoid)

newtype Gamma = Gamma (Map.Map Var Typing)
  deriving (Eq, Show, Ord, Semigroup, Monoid)
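A few helpers are used throughout the rest of the code but only defined in the unabridged program: ftv computes free type variables, apply applies a substitution to a type, and applyDelta maps one over a Δ-context. Here's a minimal sketch of how they might look - the names match their uses below, but the bodies are my best guess, again with Var as String:

-- Free type variables of a type.
ftv :: Type -> Set.Set Var
ftv (TyVar v)   = Set.singleton v
ftv (TyFun a b) = ftv a <> ftv b
ftv (TyCon _)   = Set.empty

-- Apply a substitution to a type, leaving unmapped variables alone.
apply :: Subst -> Type -> Type
apply (Subst m) ty@(TyVar v) = Map.findWithDefault ty v m
apply sub (TyFun a b)        = TyFun (apply sub a) (apply sub b)
apply _ ty@TyCon{}           = ty

-- Apply a substitution to every type recorded in a Δ-context.
applyDelta :: Subst -> Delta -> Delta
applyDelta sub (Delta m) = Delta (fmap (apply sub) m)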
The stars of the show, of course, are the typings themselves. A typing is a pair of a (monomorphic) type and a Δ-context, and in a way it packages both the type of an expression and the variables it'll use from the scope.
data Typing = Typing Delta Type
  deriving (Eq, Show, Ord)
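As a concrete value, the typing { x :: a } ⊢ a that shows up later in the post would be built like this (Var as String again):

-- { x :: a } ⊢ a: "this has type a, provided the scope gives x the type a".
xTyping :: Typing
xTyping = Typing (Delta (Map.singleton "x" (TyVar "a"))) (TyVar "a")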
With this, we’re ready to look at how inference proceeds for ML. I make no effort at relating the rules implemented in code to anything except a vague idea of the rules in the paper: Those are complicated, especially since they deal with a language much more complicated than this humble calculus. In an effort not to embarrass myself, I’ll also not present anything “formal”.
infer :: Exp    -- The expression we're computing a typing for
      -> Gamma  -- The Γ context
      -> [Var]  -- A supply of fresh variables
      -> Subst  -- The ambient substitution
      -> Either TypeError ( Typing -- The typing
                          , [Var]  -- New variables
                          , Subst  -- New substitution
                          )
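The [Var] argument is the infinite supply of fresh variables mentioned in footnote 2. One way to build such a supply, assuming Var is String:

-- a, b, ..., z, aa, ab, ...: an infinite list of names, using replicateM
-- from Control.Monad (re-exported by Control.Monad.Except).
supply :: [Var]
supply = [ name | n <- [1 ..], name <- replicateM n ['a' .. 'z'] ]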
There are two cases when dealing with variables. Either a typing is present in the environment Γ, in which case we just use that, with some retouching to make sure type variables aren't repeated - this takes the place of instantiating type schemes in Hindley-Milner. However, a variable might also not be in the environment Γ, in which case we invent a fresh type variable² for it and insist on the monomorphic typing { v :: α } ⊢ α.
infer (Use v) (Gamma env) (new:xs) sub =
  case Map.lookup v env of
    Just ty -> -- Use the typing that was looked up
      pure ((\(a, b) -> (a, b, sub)) (refresh ty xs))
    Nothing -> -- Make a new one!
      let new_delta = Delta (Map.singleton v new_ty)
          new_ty = TyVar new
       in pure (Typing new_delta new_ty, xs, sub)
Interestingly, this allows for (principal!) typings to be given even to code containing free variables. The typing for the expression x, for instance, is reported to be { x :: a } ⊢ a. Since this isn't meant to be a compiler, there's no handling for variables being out of scope, so the full inferred typings are printed on the REPL- err, RETL? A read-eval-type-loop!
> x
{ x :: a } ⊢ a
Moreover, this system does not have type schemes: Typings subsume those as well. Typings explicitly carry information regarding which type variables are polymorphic and which are constrained by something in the environment, avoiding an HM-like generalisation step.
  where
    refresh :: Typing -> [Var] -> (Typing, [Var])
    refresh (Typing (Delta delta) tau) xs =
      let tau_fv = Set.toList (ftv tau `Set.difference` foldMap ftv delta)
          (used, xs') = splitAt (length tau_fv) xs
          sub = Subst (Map.fromList (zip tau_fv (map TyVar used)))
       in (Typing (applyDelta sub (Delta delta)) (apply sub tau), xs')
refresh is responsible for ML's analogue of instantiation: new, fresh type variables are invented for each type variable that is free in the type but not also free in the context Δ. Refreshing { x :: a } ⊢ a -> b, for instance, renames only b: since a is pinned down by the context, it is not up for renaming. Whether or not this is better than quantifiers is up for debate, but it is jolly neat.
The case for application might be the most interesting. We infer two typings, Δf ⊢ τf and Δa ⊢ τa, for the function and the argument respectively, then unify τf with τa → α, where α is a fresh type variable.
infer (App f a) env (alpha:xs) sub = do
  (Typing delta_f type_f, xs, sub) <- infer f env xs sub
  (Typing delta_a type_a, xs, sub) <- infer a env xs sub

  mgu <- unify (TyFun type_a (TyVar alpha)) type_f
This is enough to make sure that the expressions involved are compatible, but it does not ensure that the contexts attached are also compatible. So, the substitution is applied to both contexts and they are merged - variables present in one but not in the other are kept, and variables present in both have their types unified.
  let delta_f' = applyDelta mgu delta_f
      delta_a' = applyDelta mgu delta_a
  delta_fa <- mergeDelta delta_f' delta_a'

  pure (Typing delta_fa (apply mgu (TyVar alpha)), xs, sub <> mgu)
If a variable x has, say, type Bool in the function's context but Int in the argument's context, that's a type error - one which can be very precisely reported as an inconsistency in the types x is used at when trying to type some function application. This is much better than the HM approach, which would just claim the latter usage is wrong. There are three spans of interest, not one.
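Neither unify nor mergeDelta is shown in this post; both live in the unabridged code. What follows is a rough sketch of what they could look like. The TypeError constructors here are stand-ins of mine (the real ones carry enough information to print the traces shown at the end of the post), and since Semigroup is conspicuously absent from Subst's deriving list, I assume the real program gives it a composing instance - below, composition is spelled out as a plain function instead.

-- Stand-in error type; hypothetical constructors.
data TypeError = NotEqual Type Type | Occurs Var Type
  deriving (Eq, Show)

-- First-order unification, producing a most general unifier.
unify :: Type -> Type -> Either TypeError Subst
unify (TyVar v) t = bind v t
unify t (TyVar v) = bind v t
unify (TyCon a) (TyCon b)
  | a == b = pure mempty
unify (TyFun a b) (TyFun a' b') = do
  s1 <- unify a a'
  s2 <- unify (apply s1 b) (apply s1 b')
  pure (s2 `compose` s1)
unify a b = throwError (NotEqual a b)

-- Bind a variable to a type, with an occurs check so that no
-- infinite types are built.
bind :: Var -> Type -> Either TypeError Subst
bind v t
  | t == TyVar v         = pure mempty
  | v `Set.member` ftv t = throwError (Occurs v t)
  | otherwise            = pure (Subst (Map.singleton v t))

-- Substitution composition: apply s1 over s2's range, keep s1's bindings.
compose :: Subst -> Subst -> Subst
compose s1@(Subst m1) (Subst m2) = Subst (fmap (apply s1) m2 <> m1)

-- Merge two Δ-contexts: keep variables unique to either side, and
-- unify the types of variables that occur in both.
mergeDelta :: Delta -> Delta -> Either TypeError Delta
mergeDelta (Delta da) (Delta db) =
  Delta <$> Map.mergeA Map.preserveMissing Map.preserveMissing
              (Map.zipWithAMatched (\_ a b -> (`apply` a) <$> unify a b))
              da db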
Inference for abstractions is simple: We invent a fresh monomorphic typing for the bound variable, add it to the context when inferring a type for the body, then remove that one specifically from the typing of the body when creating one for the overall abstraction.
infer (Lam v b) (Gamma env) (alpha:xs) sub = do
  let ty = TyVar alpha
      mono_typing = Typing (Delta (Map.singleton v ty)) ty
      new_env = Gamma (Map.insert v mono_typing env)

  (Typing (Delta body_delta) body_ty, xs, sub) <- infer b new_env xs sub

  let delta' = Delta (Map.delete v body_delta)
  pure (Typing delta' (apply sub (TyFun ty body_ty)), xs, sub)
Care is taken to apply the ambient substitution to the type of the abstraction so that details learned about the bound variable inside the body will be reflected in the type. This could also be extracted from the typing of the body, I suppose, but eh.
let expressions are very easy, especially since generalisation is implicit in the structure of typings. We simply compute a typing for the bound expression, reduce it with respect to the let-bound variable, add it to the environment, and infer a typing for the body.
infer (Let (var, exp) body) gamma@(Gamma env) xs sub = do
  (exp_t, xs, sub) <- infer exp gamma xs sub
  let exp_s = reduceTyping var exp_t
      gamma' = Gamma (Map.insert var exp_s env)
  infer body gamma' xs sub
Reduction w.r.t. a variable x is a very simple operation that makes typings as polymorphic as possible, by deleting the entry for x along with any entries whose free type variables are disjoint from those of the overall type.
reduceTyping :: Var -> Typing -> Typing
reduceTyping x (Typing (Delta delta) tau) =
  let tau_fv = ftv tau
      delta' = Map.filter keep (Map.delete x delta)
      keep sigma = not $ Set.null (ftv sigma `Set.intersection` tau_fv)
   in Typing (Delta delta') tau
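A tiny worked example, with Var as String once more: reducing { x :: a, y :: b, z :: c } ⊢ b with respect to x deletes the entry for x outright, keeps y (its type variable b occurs in the overall type), and drops z (c is disjoint), leaving { y :: b } ⊢ b.

-- reduceTyping "x" on { x :: a, y :: b, z :: c } ⊢ b yields { y :: b } ⊢ b.
reduced :: Typing
reduced = reduceTyping "x" $
  Typing (Delta (Map.fromList [ ("x", TyVar "a")
                              , ("y", TyVar "b")
                              , ("z", TyVar "c") ]))
         (TyVar "b")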
Parsing, error reporting and user interaction do not have interesting implementations, so I have chosen not to include them here.
Compositional typing is, in my opinion, a very promising approach for languages with simple polymorphic type systems, because it presents a very cheap way of providing accurate error messages - much better than those of Haskell, OCaml, and even Elm, a language for which good error messages are an explicit goal.
As an example of this, consider the expression fun x -> if x (add x 0) 1 (or, in Haskell, \x -> if x then (x + (0 :: Int)) else (1 :: Int) - the type annotations are to emulate ML's insistence on monomorphic numbers).
Types Bool and Int aren't compatible
  When checking that all uses of 'x' agree
  When checking that 'if x' (of type e -> e -> e)
  can be applied to 'add x 0' (of type Int)

Typing conflicts:
· x : Bool vs. Int
The error message generated here is much better than the one GHC reports, if you ask me. It points out not that x has some “actual” type distinct from its “expected” type, as HM would conclude from its left-to-right bias, but rather that two uses of x aren't compatible.
<interactive>:4:18: error:
Couldn't match expected type ‘Int’ with actual type ‘Bool’
• In the expression: (x + 0 :: Int)
• In the expression: if x then (x + 0 :: Int) else 0
In the expression: \ x -> if x then (x + 0 :: Int) else 0
Of course, the prototype doesn’t care for positions, so the error message is still not as good as it could be.
Perhaps it should be further investigated whether this approach scales to at least type classes (since a form of ad-hoc polymorphism is absolutely needed) and polymorphic records, so that it can be used in a real language. I have my doubts as to whether a system like this could reasonably be extended to support rank-N types, since it does not have quantifiers.
UPDATE: I found out that extending a compositional typing system to support type classes is not only possible, it was also Gergő Érdi’s MSc. thesis!
UPDATE: Again! This is new. Anyway, I’ve cleaned up the code and thrown it up on GitHub.
Again, a full program implementing ML is available here. Thank you for reading!
1. Olaf Chitil. 2001. Compositional explanation of types and algorithmic debugging of type errors. In Proceedings of the Sixth ACM SIGPLAN International Conference on Functional Programming (ICFP '01). ACM, New York, NY, USA, 193-204. DOI.↩︎
Since I couldn’t be arsed to set up monad transformers and all, we’re doing this the lazy way (ba dum tss): an infinite list of variables, and hand-rolled reader/state monads.↩︎