Dependent types are pretty cool, yo. This post is a semi-structured ramble about `dtt`, a small dependently-typed “programming language” inspired by Thierry Coquand’s Calculus of (inductive) Constructions (though note that the induction part is still lacking: There is support for defining inductive data types, and destructuring them by pattern matching, but since there’s no totality checker, recursion is disallowed).
`dtt` is written in Haskell, and served as a learning experience both in type theory and in writing programs using extensible effects. I do partly regret the implementation of effects I chose (the more popular `extensible-effects` did not build on the Nixpkgs channel I had, so I went with `freer`; refactoring between these should be easy enough, but I still haven’t gotten around to it yet).
I originally intended for this post to be a Literate Haskell file, interleaving explanation with code. However, for a pet project, `dtt`’s code base quickly spiralled out of control, and is now over a thousand lines long: It’s safe to say I did not expect this one bit.
## The language
`dtt` is a very standard calculus. We have all 4 axes of Barendregt’s lambda cube, by virtue of having types be first-class values: values depending on values (functions), values depending on types (polymorphism), types depending on types (type operators), and types depending on values (dependent types). This places dtt squarely at the top, along with other type theories such as the Calculus of Constructions (the theoretical basis for the Coq proof assistant) and TT (the type theory behind the Idris programming language).
The syntax is very simple. We have the standard lambda calculus constructs - λ-abstraction, application and variables - along with `let`-bindings, pattern-matching `case` expressions, and the dependent type goodies: Π-abstraction and `Set`.
As an aside, pi types are so called because the dependent function space may (if you follow the “types are sets of values” line of thinking) be viewed as the cartesian product of types. Consider a type `A` with inhabitants `Foo`, `Bar` and a type `B` with inhabitant `Quux`. A dependent product `(x : A) -> B`, then, has inhabitants `(Foo, Quux)` and `(Bar, Quux)`.
You’ll notice that dtt does not have a dedicated arrow type. Indeed, the dependent product subsumes both the ∀ quantifier of System F, and the arrow type of the simply-typed lambda calculus. Keep this in mind: It’ll be important later.
Since dtt’s syntax is unified (i.e., there’s no stratification of terms and types), the language can be - and is - entirely contained in a single algebraic data type. All binders are explicitly typed, seeing as inference for dependent types is undecidable (and, therefore, bad).¹
```haskell
type Type = Term

data Term
  = Variable Var
  | Set Int
  | TypeHint Term Type
  | Pi Var Type Type
  | Lam Var Type Term
  | Let Var Term Term
  | App Term Term
  | Match Term [(Pattern, Term)]
  deriving (Eq, Show, Ord)
```
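To get a feel for the representation, here’s the polymorphic identity function written out as a `Term` - a hand-rolled example using the `Var` type shown a little below; the names are purely illustrative:

```haskell
-- \(a : Set 0) -> \(x : a) -> x, as a syntax tree:
identity :: Term
identity =
  Lam (Name "a") (Set 0) $                 -- bind a small type, a : Set 0
    Lam (Name "x") (Variable (Name "a")) $ -- bind a value, x : a
      Variable (Name "x")                  -- and return it
```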
The `TypeHint` term constructor, not mentioned before, is merely a convenience: It allows the programmer to check their assumptions and help the type checker by supplying a type. (Note that we don’t assume this type is correct, as you’ll see later; it merely helps guide inference.)
Variables aren’t merely strings because of the large number of substitutions we have to perform: For this, instead of generating a new name, we increment a counter attached to the variable - the pretty printer uses the original name to great effect, when unambiguous.
```haskell
data Var
  = Name String
  | Refresh String Int
  | Irrelevant
  deriving (Eq, Show, Ord)
```
The `Irrelevant` variable constructor is used to support `α -> β` as sugar for `(x : α) -> β` when `x` does not appear free in `β`. As soon as the type checker encounters an `Irrelevant` variable, it is refreshed with a new name.
`dtt` does not have implicit parameters (as in Idris), so all parameters, including type parameters, must be bound explicitly. For this, we support several kinds of syntactic sugar. First, all abstractions support multiple variables in a binding group. This allows the programmer to write `(a, b, c : α) -> β` instead of `(a : α) -> (b : α) -> (c : α) -> β`. Furthermore, there is special syntax `/\a` for single-parameter abstraction with type `Set 0`, and lambda abstractions support multiple binding groups.
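As a rough sketch of what these sugars might elaborate to in terms of the `Term` type above (this is my reading of the description, not code lifted from dtt):

```haskell
-- Hypothetical desugarings; the real elaboration may differ.
--
--   (a, b : α) -> β   becomes   Pi a α (Pi b α β)
--   α -> β            becomes   Pi Irrelevant α β
--   /\a -> e          becomes   Lam (Name "a") (Set 0) e  -- my guess
```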
As mentioned before, the language does not support recursion (either general or well-founded). Though I would like to, writing a totality checker is hard - way harder than type checking, in fact. However, an alternative way of inspecting inductive values does exist: eliminators. These are dependent versions of catamorphisms, and basically encode a proof by induction. An inductive data type such as `Nat` gives rise to an eliminator much like it gives rise to a natural catamorphism.
```
inductive Nat : Type of {
  Z : Nat;
  S : Nat -> Nat
}

natElim : (P : Nat -> Type)
       -> P Z
       -> ((k : Nat) -> P k -> P (S k))
       -> (n : Nat)
       -> P n
```
If you squint, you’ll see that the eliminator models a proof by induction (of the proposition `P`) on the natural number `n`: The type signature basically states “Given a proposition `P` on `Nat`, a proof of `P Z`, a proof that `P (S k)` follows from `P k`, and a natural number `n`, I’ll give you a proof of `P n`.”
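For intuition, here’s the non-dependent shadow of `natElim` in Haskell: fixing the motive `P` to a constant type collapses the eliminator into the familiar recursor, which is already enough to define addition without general recursion. (This is an illustration in the implementation language; it’s not dtt code.)

```haskell
data Nat = Z | S Nat

-- natElim with a constant motive: a base case, a step case (which,
-- like the dependent version, also receives the predecessor k), and
-- a Nat to consume.
natRec :: r -> (Nat -> r -> r) -> Nat -> r
natRec z _ Z     = z
natRec z s (S k) = s k (natRec z s k)

-- Addition, eliminator-style: induction on the first argument.
plus :: Nat -> Nat -> Nat
plus n m = natRec m (\_ r -> S r) n
```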
This understanding of computations as proofs and types as propositions, by the way, is called the Curry-Howard Isomorphism. The regular, simply-typed lambda calculus corresponds to natural deduction, while dependently-typed calculi correspond to predicate logic.
## The type system

*Should this be called the term system?*
Our type inference algorithm, contrary to what you might expect for such a complicated system, is actually quite simple. Unfortunately, the code isn’t, and thus isn’t reproduced in its entirety below.
### Variables
The simplest case in any type system. The typing judgement that gives rise to this case is pretty much the identity: if $(x : \tau) \in \Gamma$, then $\Gamma \vdash x : \tau$. If, from the current typing context we know that $x$ has type $\tau$, then we know that $x$ has type $\tau$.
```haskell
Variable x -> do
  ty <- lookupType x -- (I)
  case ty of
    Just t -> pure t -- (II)
    Nothing -> throwError (NotFound x) -- (III)
```
1. Look up the type of the variable in the current context.
2. If we found a type for it, then return that (this is the happy path).
3. If we didn’t find a type for it, we raise a type error.
### `Set`s

Since dtt has a cumulative hierarchy of universes, $\Gamma \vdash \mathrm{Set}\ k : \mathrm{Set}\ (k + 1)$. This helps us avoid the logical inconsistency introduced by having type-in-type², i.e. $\mathrm{Set} : \mathrm{Set}$. We say that `Set 0` is the type of small types: in fact, `Set 0` is where most computation actually happens, seeing as `Set k` for $k \geq 1$ is reserved for Π-abstractions quantifying over such types.
```haskell
Set k -> pure . Set . (+1) $ k
```
### Type hints
Type hints are the first appearance of the unification engine, by far the most complex part of dtt’s type checker. But for now, it suffices to know that ``t1 `assertEquality` t2`` errors if the types `t1` and `t2` can’t be made to line up, i.e., unify.
For type hints, we infer the type of the given expression, and compare it against the user-provided type, raising an error if they don’t match. Because of how the unification engine works, the given type may be more general (or specific) than the inferred one.
```haskell
TypeHint v t -> do
  it <- infer v
  t `assertEquality` it
  pure t
```
### Π-abstractions

This is where it starts to get interesting. First, we mandate that the parameter type is inhabited (basically, that it is, in fact, a type). A dependent product whose parameter “type” is a value, say `(x : Z) -> Nat`, while allowed by the language’s grammar, is entirely meaningless: There’s no way to construct an inhabitant of `Z`, and thus this function may never be applied.

Then, in the context extended with `x : τ`, we require that the consequent is also a type itself: The function `(x : Nat) -> Z`, while again a valid parse, is also meaningless.
The type of the overall abstraction is, then, `Set` of the maximum of the universe indices of the parameter and the consequent.
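In rule form, the `Pi` case below implements:

$$\frac{\Gamma \vdash \tau : \mathrm{Set}\ k_1 \qquad \Gamma, x : \tau \vdash \sigma : \mathrm{Set}\ k_2}{\Gamma \vdash \prod_{(x : \tau)} \sigma : \mathrm{Set}\ (\max(k_1, k_2))}$$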
```haskell
Pi x p c -> do
  k1 <- inferSet p
  k2 <- local (insertType (x, p)) $
    inferSet c
  pure $ Set (k1 `max` k2)
```
### λ-abstractions

Much like in the simply-typed lambda calculus, the type of a λ-abstraction is an arrow between the type of its parameter and the type of its body. Of course, dtt incurs the additional constraint that the type of the parameter is inhabited.

Alas, we don’t have arrows. So, we “lift” the lambda’s parameter to the type level, and bind it in a Π-abstraction.
```haskell
Lam x t b -> do
  _ <- inferSet t
  Pi x t <$> local (insertType (x, t)) (infer b)
```
Note that, much like in the `Pi` case, we type-check the body in a context extended with the parameter’s type.
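In rule form, with the lifted parameter visible in the conclusion:

$$\frac{\Gamma \vdash \tau : \mathrm{Set}\ k \qquad \Gamma, x : \tau \vdash e : \sigma}{\Gamma \vdash \lambda (x : \tau).\ e : \prod_{(x : \tau)} \sigma}$$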
### Application

Application is the most interesting rule, as it has to not only handle inference, it also has to handle instantiation of Π-abstractions.

Instantiation is, much like application, handled by β-reduction, with the difference being that instantiation happens during type checking (applying a Π-abstraction is meaningless) and application happens during normalisation (instancing a λ-abstraction is meaningless).
The type of the function being applied needs to be a Π-abstraction, while the type of the operand needs to be inhabited. Note that the second constraint is not written out explicitly: It’s handled by the `Pi` case above, and furthermore by the unification engine.
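As a rule, with the substitution made explicit:

$$\frac{\Gamma \vdash f : \prod_{(x : \tau)} \sigma \qquad \Gamma \vdash e : \tau}{\Gamma \vdash f\ e : \sigma[e/x]}$$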
```haskell
App e1 e2 -> do
  t1 <- infer e1
  case t1 of
    Pi vr i o -> do
      t2 <- infer e2
      t2 `assertEquality` i
      N.normalise =<< subst [(vr, e2)] o -- (I)
    e -> throwError (ExpectedPi e) -- (II)
```
Notice that, here, we don’t substitute the Π-bound variable by the *type* of `e2`: That’d make us equivalent to System F. The whole deal with dependent types is that types depend on values, and that entirely stems from this one line. By instancing a type variable with a value, we allow types to depend on values.

Oh, and if we didn’t get a Π-abstraction, error.
You’ll notice that two typing rules are missing here: One for handling `let`s, which was not included because it is entirely uninteresting, and one for `case ... of` expressions, which was redacted because it is entirely a mess. Hopefully, in the future, the typing of `case` expressions will be simpler - if not, they’ll probably be replaced by eliminators.
## Unification and Constraint Solving
The unification engine is the man behind the curtain in type checking: We often don’t pay attention to it, but it’s the driving force behind it all. Fortunately, in our case, unification is entirely trivial: Solving is the hard bit.
The job of the unification engine is to produce a set of constraints that have to be satisfied in order for two types to be equal. Then, the solver is run on these constraints to assert that they are logically consistent, and potentially produce substitutions that reify those constraints. Our solver isn’t that cool, though, so it just verifies consistency.
The kinds of constraints we can generate are as in the data type below.
```haskell
data Constraint
  = Instance Var Term -- (1)
  | Equal Term Term -- (2)
  | EqualTypes Type Type -- (3)
  | IsSet Type -- (4)
  deriving (Eq, Show, Ord)
```
1. The constraint `Instance v t` corresponds to a substitution between `v` and the term `t`.
2. A constraint `Equal a b` states that the two terms `a` and `b` are equal under normalisation.
3. Ditto, but with their types (we normalise, infer, and check for equality).
4. A constraint `IsSet t` asserts that the provided type has inhabitants.
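As a concrete (if hypothetical) example, lining up a lone variable `f` with the small type `Nat -> Nat` should produce exactly one substitution and nothing else - assuming `Nat` is represented by some `Term`-level `nat`:

```haskell
-- A hypothetical constraint set: one Instance, no other obligations.
-- The arrow is spelled with an Irrelevant binder, as described earlier.
fConstraints :: [Constraint]
fConstraints = [Instance (Name "f") (Pi Irrelevant nat nat)]
  where nat = Variable (Name "Nat") -- stand-in for the Nat type
```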
### Unification
Unification of most terms is entirely uninteresting. Simply line up the structures and produce the appropriate equality (or instance) constraints.
```haskell
unify (Variable a) b = instanceC a b
unify b (Variable a) = instanceC a b
unify (Set a) (Set b) | a == b = pure []
unify (App x y) (App x' y') =
  (++) <$> unify x x' <*> unify y y'
unify (TypeHint a b) (TypeHint c d) =
  (++) <$> unify a c <*> unify b d
unify a b = throwError (NotEqual a b)
```
Those are all the boring cases, and I’m not going to comment on them. Similarly boring are binders, which were abstracted out because hlint told me to.
```haskell
unify (Lam v1 t1 b1) (Lam v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)
unify (Pi v1 t1 b1) (Pi v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)
unify (Let v1 t1 b1) (Let v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)

unifyBinder (v1, v2) (t1, t2) (b1, b2) = do
  (a, b) <- (,) <$> unify (Variable v1) (Variable v2) <*> unify t1 t2
  ((a ++ b) ++) <$> unify b1 b2
```
There are two interesting cases: Unification between some term and a pi abstraction, and unification between two variables.
```haskell
unify ta@(Variable a) tb@(Variable b)
  | a == b = pure []
  | otherwise = do
      (x, y) <- (,) <$> lookupType a <*> lookupType b
      case (x, y) of
        (Just _, Just _) -> do
          ca <- equalTypesC ta tb
          cb <- equalC ta tb
          pure (ca ++ cb)
        (Just x', Nothing) -> instanceC b x'
        (Nothing, Just x') -> instanceC a x'
        (Nothing, Nothing) -> instanceC a (Variable b)
```
If the variables are syntactically the same, then we’re done, and no constraints have to be generated (Technically you could generate an entirely trivial equality constraint, but this puts unnecessary pressure on the solver).
If either variable has a known type, then we generate an instance constraint between the unknown variable and the known one.
If both variables have a known type, we equate both their types and the variables themselves. This is done mostly for error messages’ sake, seeing as if two values are propositionally equal, so are their types.
Unification between a term and a Π-abstraction is the most interesting case: We check that the abstraction abstracts over a type (i.e., that it corresponds to a System F ∀ instead of a simply-typed arrow), and instance the Π with a fresh type variable.
```haskell
unifyPi v1 t1 b1 a = do
  id <- refresh Irrelevant
  ss <- isSetC t1
  pi' <- subst [(v1, Variable id)] b1
  (++ ss) <$> unify a pi'

unify a (Pi v1 t1 b1) = unifyPi v1 t1 b1 a
unify (Pi v1 t1 b1) a = unifyPi v1 t1 b1 a
```
### Solving

Solving is a recursive function of the list of constraints (a catamorphism!) with some additional state: Namely, a strict map of already-performed substitutions. Let’s work through the cases in reverse order of complexity (and, interestingly, in reverse order of how they appear in the source code).
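The driver for all of this is presumably just `solveInner` seeded with an empty substitution map - a minimal sketch, assuming the checker’s effect monad (here given the stand-in name `TypeCheck`):

```haskell
import qualified Data.Map.Strict as M

-- Kick off constraint solving with no substitutions recorded.
-- `TypeCheck` is a placeholder for whatever effect stack the real
-- checker runs in.
solve :: [Constraint] -> TypeCheck ()
solve = solveInner M.empty
```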
#### No constraints

Solving an empty list of constraints is entirely trivial.

```haskell
solveInner _ [] = pure ()
```
#### IsSet

We infer the index of the universe of the given type, much like in the inference case for Π-abstractions, and check the remaining constraints.

```haskell
solveInner map (IsSet t:xs) = do
  _ <- inferSet t
  solveInner map xs
```
#### EqualTypes

We infer the types of both provided values, and generate an equality constraint.

```haskell
solveInner map (EqualTypes a b:xs) = do
  ta <- infer a
  tb <- infer b
  solveInner map (Equal ta tb:xs)
```
#### Equal

We merely have to check for syntactic equality of the (normal forms of) terms, because the hard lifting of destructuring and lining up was done by the unification engine.

```haskell
solveInner map (Equal a b:xs) = do
  a' <- N.normalise a
  b' <- N.normalise b
  eq <- equal a' b'
  if eq
    then solveInner map xs
    else throwError (NotEqual a b)
```
#### Instance

If the variable we’re instancing is already in the map, and the thing we’re instancing it to now is not the same as before, we have an inconsistent set of substitutions and must error.

```haskell
solveInner map (Instance a b:xs)
  | a `M.member` map
  , b /= map M.! a
  , Irrelevant /= a
  = throwError $ InconsistentSubsts (a, b) (map M.! a)
```
Otherwise, if we have a coherent set of instances, we add the instance both to scope and to our local state map and continue checking.
```haskell
  | otherwise =
      local (insertType (a, b)) $
        solveInner (M.insert a b map) xs
```
Now that we have both `unify` and `solve`, we can write `assertEquality`: We unify the two types, and then try to solve the set of constraints.

```haskell
assertEquality t1 t2 = do
  cs <- unify t1 t2
  solve cs
```
The real implementation will catch and re-throw any errors raised by `solve` to add appropriate context, and that’s not the only case where “real implementation” and “blag implementation” differ.
## Conclusion

Wow, that was a lot of writing. This conclusion begins on exactly the 500th line of the Markdown source of this article, and this is the longest article on this blag (by far). However, that’s not to say it’s bad: It was amazing to write, and writing `dtt` was also amazing. I am not good at conclusions.
`dtt` is available under the BSD 3-clause licence, though I must warn you that the source code hasn’t many comments.
I hope you learned nearly as much as I did writing this by reading it.
² See System U, also Girard’s paradox - the type theory equivalent of Russell’s paradox.