Word count: 1371

Dependent types are a very useful feature - the gold standard of enforcing invariants at compile time. However, they are still very much not practical, especially considering inference for unrestricted dependent types is equivalent to higher-order unification, which was proven to be undecidable.

Fortunately, many of the benefits that dependent types bring aren’t
because of dependent products themselves, but instead because of
associated features commonly present in those programming languages. One
of these, which also happens to be especially easy to mimic, are
*inductive families*, a generalisation of inductive data types: instead
of defining a single type inductively, one defines an entire *family* of
related types.

Many use cases for inductive families are actually instances of a rather less general concept, that of generalised algebraic data types, or GADTs: Contrary to the indexed data types of full dependently typed languages, these can and are implemented in several languages with extensive inference, such as Haskell, OCaml and, now, Amulet.

Before I can talk about their implementation, I am legally obligated to
present the example of *length indexed vectors*, linked structures whose
size is known at compile time—instead of carrying around an integer
representing the number of elements, it is represented in the type-level
by a Peano^{1} natural number, as an *index* to the vector type. By
universally quantifying over the index, we can guarantee by
parametricity^{2} that functions operating on these don’t do inappropriate
things to the sizes of vectors.

```
type z ;;
type s 'k ;;
type vect 'n 'a =
| Nil : vect z 'a | Cons : 'a * vect 'k 'a -> vect (s 'k) 'a
```

Since the argument `'n`

to `vect`

(its length) varies with the constructor one
chooses, we call it an *index*; On the other hand, `'a`

, being uniform over all
constructors, is called a *parameter* (because the type is *parametric* over
the choice of `'a`

). These definitions bake the measure of length into
the type of vectors: an empty vector has length 0, and adding an element
to the front of some other vector increases the length by 1.

Matching on a vector reveals its index: in the `Nil`

case, it’s possible
to (locally) conclude that it had length `z`

. Meanwhile, the `Cons`

case
lets us infer that the length was the successor of some other natural
number, `s 'k`

, and that the tail itself has length `'k`

.

If one were to write a function to `map`

a function over a `vect`

or,
they would be bound by the type system to write a correct implementation
- well, either that or going out of their way to make a bogus one. It
would be possible to enforce total correctness of a function such as
this one, by adding linear types and making the vector parameter linear.

```
let map (f : 'a -> 'b) (xs : vect 'n 'a) : vect 'n 'b =
match xs with
| Nil -> Nil | Cons (x, xs) -> Cons (f x, map f xs) ;;
```

If we were to, say, duplicate every element in the list, an error would be reported. Unlike some others, this one is not very clear, and it definitely could be improved.

```
Occurs check: The type variable jx
occurs in the type s 'jx
· Arising from use of the expression
Cons (f x, Cons (f x, map f xs))
│
33 │ | Cons (x, xs) -> Cons (f x, Cons (f x, map f xs)) ;;
│ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This highlights the essence of GADTs: pattern matching on them reveals
equalities about types that the solver can later exploit. This is what
allows the programmer to write functions that vary their return types
based on their inputs - a very limited form of type-term dependency,
which brings us ever closer to the Calculus of Constructions corner of
Barendregt’s lambda cube^{3}.

The addition of generalised algebraic data types has been in planning for over two years—it was in the original design document. In a mission that not even my collaborator noticed, all of the recently-added type system and IR features were directed towards enabling the GADT work: bidirectional type checking, rank-N polymorphism and coercions.

All of these features had cover stories: higher-ranked polymorphism was motivated by monadic regions; bidirectional type checking was motivated by the aforementioned polymorphism; and coercions were motivated by newtype optimisation. But, in reality, it was a conspiracy to make GADTs possible: having support for these features simplified implementing our most recent form of fancy types, and while adding all of these in one go would be possible, doing it incrementally was a lot saner.

While neither higher-ranked types nor GADTs technically demand a
bidirectional type system, implementing them with such a specification
is considerably easier, removing the need for workarounds such as boxy
types and a distinction between rigid/wobbly type variables. Our
algorithm for GADT inference rather resembles Richard Eisenberg’s
Bake^{4}, in that it only uses local equalities in *checking*
mode.

Adding GADTs also lead directly to a rewrite of the solver, which now
has to work with *implication constraints*, of the form `(Q₁, ..., Qₙ) => Q`

, which should be read as “Assuming `Q₁`

through `Qₙ`

, conclude
`Q`

.” Pattern matching on generalised constructors, in checking mode,
captures every constraint generated by checking the right-hand side of a
clause and captures that as an implication constraint, with all the
constructor-bound equalities as assumptions. As an example, this lets us
write a type-safe cast function:

```
type eq 'a 'b = Refl : eq 'a 'a
(* an inhabitant of eq 'a 'b is a proof that 'a and 'b are equal *)
let subst (Refl : eq 'a 'b) (x : 'a) : 'b = x ;;
```

Unfortunately, to keep inference decidable, many functions that depend on generalised pattern matching need explicit type annotations, to guide the type checker.

When *checking* the body of the function, namely the variable reference
`x`

, the solver is working under an assumption `'a ~ 'b`

(i.e., `'a`

and
`'b`

stand for the same type), which lets us unify the stated type of
`x`

, namely `'a`

, with the return type of the function, `'b`

.

If we remove the local assumption, say, by not matching on
`Refl`

, the solver will not be allowed to unify the two type
variables `'a`

and `'b`

, and an error message will be reported^{5}:

```
examples/gadt/equality.ml[11:43 ..11:43]: error
Can not unify rigid type variable b with the rigid type variable a
· Note: the variable b was rigidified because of a type ascription
against the type forall 'a 'b. t 'a 'b -> 'a -> 'b
and is represented by the constant bq
· Note: the rigid type variable a, in turn,
was rigidified because of a type ascription
against the type forall 'a 'b. t 'a 'b -> 'a -> 'b
· Arising from use of the expression
x
│
11 │ let subst (_ : t 'a 'b) (x : 'a) : 'b = x ;;
│ ~
```

Our intermediate language was also extended, from a straightforward
System F-like lambda calculus with type abstractions and applications,
to a System F_{C}-like system with *coercions*, *casts*, and
*coercion abstraction*. Coercions are the evidence, produced by the
solver, that an expression is usable as a given type—GADT patterns
bind coercions like these, which are the “reification” of an implication
constraint. This lets us make type-checking on the intermediate language
fast and decidable^{6}, as a useful sanity check.

The two new judgements for GADT inference correspond directly to new
cases in the `infer`

and `check`

functions, the latter of which I
present here for completeness. The simplicity of this change serves as
concrete evidence of the claim that bidirectional systems extend readily
to new, complex features, producing maintainable and readable code.

```
Match t ps a) ty = do
check (<- infer t
(t, tt) <- for ps $ \(p, e) -> do
ps <- checkPattern p tt
(p', ms, cs) let tvs = Set.map unTvName (boundTvs p' ms)
<$> implies (Arm p e) tt cs
(p',) %~ Set.union tvs)
(local (typeVars
(extendMany ms (check e ty)))pure (Match t ps (a, ty))
```

This corresponds to the checking judgement for matches, presented below.
Note that in my (rather informal) theoretical presentation of Amulet
typing judgements, we present implication constraints as a lexical scope
of equalities conjoined with the scope of variables; Inference
judgements (with double right arrows, $\Rightarrow$) correspond to uses of
`infer`

, pattern checking judgements ($\Leftarrow_\text{pat}$)
correspond to `checkPattern`

, which also doubles as $\mathtt{binds}$ and
$\mathtt{cons}$, and the main checking judgement $\Leftarrow$ is the
function `check`

.

$\frac{\Gamma; \mathscr{Q} \vdash e \Rightarrow \tau \quad \Gamma \vdash p_i \Leftarrow_\text{pat} \tau \quad \Gamma, \mathtt{binds}(p_i); \mathscr{Q}, \mathtt{cons}(p_i) \vdash e_i \Leftarrow \sigma} {\Gamma; \mathscr{Q} \vdash \mathtt{match}\ e\ \mathtt{with}\ \{p_i \to e_i\} \Leftarrow \sigma}$

Our implementation of the type checker is a bit more complex, because it also does (some) elaboration and bookkeeping: tagging terms with types, blaming type errors correctly, etc.

This new, complicated feature was a lot harder to implement than
originally expected, but in the end it worked out. GADTs let us make the
type system *stronger*, while maintaining the decidable inference that
the non-fancy subset of the language enjoys.

The example presented here was the most boring one possible, mostly because two weeks ago I wrote about their impact on the language’s ability to make things safer.

Peano naturals are one particular formulation of the natural numbers, which postulates that zero (denoted

`z`

above) is a natural number, and any natural number’s successor (denoted`s 'k`

above) is itself natural.↩︎This is one application of Philip Wadler’s Theorems for Free technique: given a (polymorphic) type of some function, we can derive much of its behaviour.↩︎

Amulet is currently somewhere on the edge between λ2 - the second order lambda calculus, System F, and λP2, a system that allows quantification over types and terms using the dependent product form, which subsumes both the ∀ binder and the → arrow. Our lack of type functions currently leaves us very far from the CoC.↩︎

See his thesis. Our algorithm, of course, has the huge simplification of not having to deal with full dependent types.↩︎

And quite a good one, if I do say so! The compiler syntax highlights and pretty-prints both terms and types relevant to the error, as you can see here.↩︎

Even if we don’t do it yet—work is still ongoing to make the type checker and solver sane.↩︎