Word count: 2160

Dependent types are pretty cool, yo. This post is a semi-structured
ramble about dtt,
a small dependently-typed “programming language” inspired by Thierry
Coquand’s Calculus of (inductive) Constructions (though, note that the
*induction* part is still lacking: There is support for defining
inductive data types, and destructuring them by pattern matching, but
since there’s no totality checker, recursion is disallowed).

`dtt`

is written in Haskell, and served as a learning experience both in
type theory and in writing programs using extensible
effects. I *do* partly regret
the implementation of effects I chose (the more popular
`extensible-effects`

did not build on the Nixpkgs channel I had, so I went with `freer`

;
Refactoring between these should be easy enough, but I still haven’t
gotten around to it, yet)

I originally intended for this post to be a Literate Haskell file,
interleaving explanation with code. However, for a pet project, `dtt`

’s
code base quickly spiralled out of control, and is now over a thousand
lines long: It’s safe to say I did not expect this one bit.

### The language

`dtt`

is a very standard $\lambda_{\prod{}}$ calculus. We have all 4 axes of
Barendgret’s lambda cube, in virtue of having types be first class
values: Values depending on values (functions), values depending on
types (polymorphism), types depending on types (type operators), and
types depending on values (dependent types). This places dtt squarely at
the top, along with other type theories such as the Calculus of
Constructions (the theoretical basis for the Coq proof assistant) and TT
(the type theory behind the Idris programming language).

The syntax is very simple. We have the standard lambda calculus
constructs - $\lambda$-abstraction, application and variables - along
with `let`

-bindings, pattern matching `case`

expression, and
the dependent type goodies: $\prod$-abstraction and `Set`

.

*As an aside*, pi types are called as so because the dependent function
space may (if you follow the “types are sets of values” line of
thinking) be viewed as the cartesian product of types. Consider a type
`A`

with inhabitants `Foo`

, `Bar`

and
a type `B`

with inhabitant `Quux`

. A dependent
product $\displaystyle\prod_{(x: \mathtt{A})}\mathtt{B}$, then, has
inhabitants `(Foo, Quux)`

and `(Bar, Quux)`

.

You’ll notice that dtt does not have a dedicated arrow type. Indeed, the dependent product subsumes both the $\forall$ quantifier of System $F$, and the arrow type $\to$ of the simply-typed lambda calculus. Keep this in mind: It’ll be important later.

Since dtt’s syntax is unified (i.e., there’s no stratification of terms
and types), the language can be - and is - entirely contained in
a single algebraic data type. All binders are *explicitly typed*, seeing
as inference for dependent types is undecidable (and, therefore,
bad).^{1}

```
type Type = Term
data Term
= Variable Var
| Set Int
| TypeHint Term Type
| Pi Var Type Type
| Lam Var Type Term
| Let Var Term Term
| App Term Term
| Match Term [(Pattern, Term)]
deriving (Eq, Show, Ord)
```

The `TypeHint`

term constructor, not mentioned before, is
merely a convenience: It allows the programmer to check their
assumptions and help the type checker by supplying a type (Note that we
don’t assume this type is correct, as you’ll see later; It merely helps
guide inference.)

Variables aren’t merely strings because of the large amount of substitutions we have to perform: For this, instead of generating a new name, we increment a counter attached to the variable - the pretty printer uses the original name to great effect, when unambiguous.

```
data Var
= Name String
| Refresh String Int
| Irrelevant
deriving (Eq, Show, Ord)
```

The `Irrelevant`

variable constructor is used to support $a \to b$ as sugar for $\displaystyle\prod_{(x: a)} b$ when $x$ does not
appear free in $b$. As soon as the type checker encounters an
`Irrelevant`

variable, it is refreshed with a new name.

`dtt`

does not have implicit support (as in Idris), so all parameters,
including type parameters, must be bound explicitly. For this, we
support several kinds of syntatic sugar. First, all abstractions support
multiple variables in a *binding group*. This allows the programmer to
write `(a, b, c : α) -> β`

instead of `(a : α) -> (b : α) -> (c : α) -> β`

. Furthermore, there is special syntax `/\a`

for single-parameter
abstraction with type `Set 0`

, and lambda abstractions support
multiple binding groups.

As mentioned before, the language does not support recursion (either
general or well-founded). Though I would like to, writing a totality
checker is hard - way harder than type checking $\lambda_{\prod{}}$, in
fact. However, an alternative way of inspecting inductive values *does*
exist: eliminators. These are dependent versions of catamorphisms, and
basically encode a proof by induction. An inductive data type as Nat
gives rise to an eliminator much like it gives rise to a natural
catamorphism.

```
inductive Nat : Type of {
Z : Nat;
S : Nat -> Nat
}
natElim : (P : Nat -> Type)
-> P Z
-> ((k : Nat) -> P k -> P (S k))
-> (n : Nat)
-> P n
```

If you squint, you’ll see that the eliminator models a proof by induction (of the proposition $P$) on the natural number $n$: The type signature basically states “Given a proposition $P$ on $\mathbb{N}$, a proof of $P_0$, a proof that $P_{(k + 1)}$ follows from $P_k$ and a natural number $n$, I’ll give you a proof of $P_n$.”

This understanding of computations as proofs and types as propositions, by the way, is called the Curry-Howard Isomorphism. The regular, simply-typed lambda calculus corresponds to natural deduction, while $\lambda_{\prod{}}$ corresponds to predicate logic.

### The type system

Should this be called the term system?

Our type inference algorithm, contrary to what you might expect for such a complicated system, is actually quite simple. Unfortunately, the code isn’t, and thus isn’t reproduced in its entirety below.

#### Variables

The simplest case in any type system. The typing judgement that gives rise to this case is pretty much the identity: $\Gamma \vdash \alpha: \tau \therefore \Gamma \vdash \alpha: \tau$. If, from the current typing context we know that $\alpha$ has type $\tau$, then we know that $\alpha$ has type $\tau$.

```
Variable x -> do
<- lookupType x -- (I)
ty case ty of
Just t -> pure t -- (II)
Nothing -> throwError (NotFound x) -- (III)
```

- Look up the type of the variable in the current context.
- If we found a type for it, then return that (this is the happy path)
- If we didn’t find a type for it, we raise a type error.

`Set`

s

Since dtt has a cummulative hierarchy of universes, $\mathtt{Set}_k: \mathtt{Set}_{(k + 1)}$. This helps us avoid the logical inconsistency
introduced by having *type-in-type*^{2}, i.e. $\mathtt{Type}: \mathtt{Type}$. We say that $\mathtt{Set}_0$ is the type of *small
types*: in fact, $\mathtt{Set}_0$ is where most computation actually
happens, seeing as $\mathtt{Set}_k$ for $k \ge 1$ is reserved for
$\prod$-abstractions quantifying over such types.

`Set k -> pure . Set . (+1) $ k `

#### Type hints

Type hints are the first appearance of the unification engine, by far
the most complex part of dtt’s type checker. But for now, suffices to
know that `t1 `assertEquality` t2`

errors if the types t1
and t2 can’t be made to *line up*, i.e., unify.

For type hints, we infer the type of given expression, and compare it against the user-provided type, raising an error if they don’t match. Because of how the unification engine works, the given type may be more general (or specific) than the inferred one.

```
TypeHint v t -> do
<- infer v
it `assertEquality` it
t pure t
```

#### $\prod$-abstractions

This is where it starts to get interesting. First, we mandate that the
parameter type is inhabited (basically, that it *is*, in fact, a type).
The dependent product $\displaystyle\prod_{(x : 0)} \alpha$, while allowed by the
language’s grammar, is entirely meaningless: There’s no way to construct
an inhabitant of $0$, and thus this function may never be applied.

Then, in the context extended with $(\alpha : \tau)$, we require that the consequent is also a type itself: The function $\displaystyle\prod_{(x: \mathbb{N})} 0$, while again a valid parse, is also meaningless.

The type of the overall abstraction is, then, the maximum value of the indices of the universes of the parameter and the consequent.

```
Pi x p c -> do
<- inferSet tx
k1 <- local (insertType (x, p)) $
k2
inferSet cpure $ Set (k1 `max` k2)
```

#### $\lambda$-abstractions

Much like in the simply-typed lambda calculus, the type of a $\lambda$-abstraction is an arrow between the type of its parameter and the type of its body. Of course, $\lambda_{\prod{}}$ incurs the additional constraint that the type of the parameter is inhabited.

Alas, we don’t have arrows. So, we “lift” the lambda’s parameter to the type level, and bind it in a $\prod$-abstraction.

```
Lam x t b -> do
<- inferSet t
_ Pi x t <$> local (insertType (x, t)) (infer b)
```

Note that, much like in the `Pi`

case, we type-check the body
in a context extended with the parameter’s type.

#### Application

Application is the most interesting rule, as it has to not only handle inference, it also has to handle instantiation of $\prod$-abstractions.

Instantation is, much like application, handled by $\beta$-reduction, with the difference being that instantiation happens during type checking (applying a $\prod$-abstraction is meaningless) and application happens during normalisation (instancing a $\lambda$-abstraction is meaningless).

The type of the function being applied needs to be
a $\prod$-abstraction, while the type of the operand needs to be
inhabited. Note that the second constraint is not written out
explicitly: It’s handled by the `Pi`

case above, and
furthermore by the unification engine.

```
App e1 e2 -> do
<- infer e1
t1 case t1 of
Pi vr i o -> do
<- infer e2
t2 `assertEquality` i
t =<< subst [(vr, e2)] o -- (I)
N.normalise -> throwError (ExpectedPi e) -- (II) e
```

Notice that, here, we don’t substitute the $\prod$-bound variable by the type of $e_2$: That’d make us equivalent to System $F$. The whole

*deal*with dependent types is that types depend on values, and that entirely stems from this one line. By instancing a type variable with a value, we allow*types*to depend on*values*.Oh, and if we didn’t get a $\prod$-abstraction, error.

You’ll notice that two typing rules are missing here: One for handling
`let`

s, which was not included because it is entirely
uninteresting, and one for `case ... of`

expressions, which
was redacted because it is entirely a mess.

Hopefully, in the future, the typing of `case`

expressions is simpler
- if not, they’ll probably be replaced by eliminators.

### Unification and Constraint Solving

The unification engine is the man behind the curtain in type checking: We often don’t pay attention to it, but it’s the driving force behind it all. Fortunately, in our case, unification is entirely trivial: Solving is the hard bit.

The job of the unification engine is to produce a set of constraints
that have to be satisfied in order for two types to be equal. Then, the
solver is run on these constraints to assert that they are logically
consistent, and potentially produce substitutions that *reify* those
constraints.

Our solver isn’t that cool, though, so it just verifies consitency.

The kinds of constraints we can generate are as in the data type below.

```
data Constraint
= Instance Var Term -- (1)
| Equal Term Term -- (2)
| EqualTypes Type Type -- (3)
| IsSet Type -- (4)
deriving (Eq, Show, Ord)
```

- The constraint
`Instance v t`

corresponds to a substitution between`v`

and the term`t`

. - A constraint
`Equal a b`

states that the two terms`a`

and`b`

are equal under normalisation. - Ditto, but with their
*types*(We normalise, infer, and check for equality) - A constraint
`IsSet t`

asserts that the provided type has inhabitants.

#### Unification

Unification of most terms is entirely uninteresting. Simply line up the structures and produce the appropriate equality (or instance) constraints.

```
Variable a) b = instanceC a b
unify (Variable a) = instanceC a b
unify b (Set a) (Set b) | a == b = pure []
unify (App x y) (App x' y') =
unify (++) <$> unify x x' <*> unify y y'
(TypeHint a b) (TypeHint c d) =
unify (++) <$> unify a c <*> unify b d
(= throwError (NotEqual a b) unify a b
```

Those are all the boring cases, and I’m not going to comment on them. Similarly boring are binders, which were abstracted out because hlint told me to.

```
Lam v1 t1 b1) (Lam v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)
unify (Pi v1 t1 b1) (Pi v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)
unify (Let v1 t1 b1) (Let v2 t2 b2) = unifyBinder (v1, v2) (t1, t2) (b1, b2)
unify (= do
unifyBinder (v1, v2) (t1, t2) (b1, b2) <- (,) <$> unify (Variable v1) (Variable v2) <*> unify t1 t2
(a, b) ++ b) ++) <$> unify b1 b2 ((a
```

There are two interesting cases: Unification between some term and a pi abstraction, and unification between two variables.

```
@(Variable a) tb@(Variable b)
unify ta| a == b = pure []
| otherwise = do
<- (,) <$> lookupType a <*> lookupType b
(x, y) case (x, y) of
Just _, Just _) -> do
(<- equalTypesC ta tb
ca <- equalC ta tb
cb pure (ca ++ cb)
Just x', Nothing) -> instanceC b x'
(Nothing, Just x') -> instanceC a x'
(Nothing, Nothing) -> instanceC a (Variable b) (
```

If the variables are syntactically the same, then we’re done, and no constraints have to be generated (Technically you could generate an entirely trivial equality constraint, but this puts unnecessary pressure on the solver).

If either variable has a known type, then we generate an instance constraint between the unknown variable and the known one.

If both variables have a value, we equate their types’ types and their types. This is done mostly for error messages’ sakes, seeing as if two values are propositionally equal, so are their types.

Unification between a term and a $\prod$-abstraction is the most
interesting case: We check that the $\prod$ type abstracts over a type
(i.e., it corresponds to a System F $\forall$ instead of a System
F $\to$), and *instance* the $\prod$ with a fresh type variable.

```
= do
unifyPi v1 t1 b1 a id <- refresh Irrelevant
<- isSetC t1
ss <- subst [(v1, Variable id)] b1
pi' ++ ss) <$> unify a pi'
(
Pi v1 t1 b1) = unifyPi v1 t1 b1 a
unify a (Pi v1 t1 b1) a = unifyPi v1 t1 b1 a unify (
```

#### Solving

Solving is a recursive function of the list of constraints (a catamorphism!) with some additional state: Namely, a strict map of already-performed substitutions. Let’s work through the cases in reverse order of complexity (and, interestingly, reverse order of how they’re in the source code).

##### No constraints

Solving an empty list of constraints is entirely trivial.

`= pure () solveInner _ [] `

`IsSet`

We infer the index of the universe of the given type, much like in the inferrence case for $\prod$-abstractions, and check the remaining constraints.

```
map (IsSet t:xs) = do
solveInner <- inferSet t
_ map xs solveInner
```

`EqualTypes`

We infer the types of both provided values, and generate an equality constraint.

```
map (EqualTypes a b:xs) = do
solveInner <- infer a
ta <- infer b
tb map (Equal ta tb:xs) solveInner
```

`Equal`

We merely have to check for syntactic equality of the (normal forms of) terms, because the hard lifting of destructuring and lining up was done by the unification engine.

```
map (Equal a b:xs) = do
solveInner <- N.normalise a
a' <- N.normalise b
b' <- equal a' b'
eq if eq
then solveInner map xs
else throwError (NotEqual a b)
```

`Instance`

If the variable we’re instancing is already in the map, and the thing
we’re instancing it to *now* is not the same as before, we have an
inconsistent set of substitutions and must error.

```
map (Instance a b:xs)
solveInner | a `M.member` map
/= map M.! a
, b Irrelevant /= a
, = throwError $ InconsistentSubsts (a, b) (map M.! a)
```

Otherwise, if we have a coherent set of instances, we add the instance both to scope and to our local state map and continue checking.

```
| otherwise =
$
local (insertType (a, b)) map) xs solveInner (M.insert a b
```

Now that we have both `unify`

and `solve`

, we can write
`assertEquality`

: We unify the two types, and then try to solve the set
of constraints.

```
= do
assertEquality t1 t2 <- unify t1 t2
cs solve cs
```

The real implementation will catch and re-throw any errors raised by
`solve`

to add appropriate context, and that’s not the only case where
“real implementation” and “blag implementation” differ.

### Conclusion

Wow, that was a lot of writing. This conclusion begins on exactly the
500th line of the Markdown source of this article, and this is the
longest article on this blag (by far). However, that’s not to say it’s
bad: It was amazing to write, and writing `dtt`

was also amazing. I am
not good at conclusions.

`dtt`

is available under the BSD 3-clause licence, though I must warn
you that the source code hasn’t many comments.

I hope you learned nearly as much as I did writing this by reading it.

See System U, also Girard’s paradox - the type theory equivalent of Russell’s paradox.↩︎