Posted on October 30, 2020
Word count: 3518

When shopping for a dependent type theory, many factors should be taken into consideration: how inductive data is represented (inductive schemas vs W-types), how inductive data computes (eliminators vs case trees), how types of types are represented (universes à la Tarski vs à la Russell). However, the most important factor is the theory’s treatment of equality.

Conor McBride, a prominent figure in type theory research, noted in a reddit comment that you should never trust a type theorist who has not changed their mind about equality (I’m paraphrasing). Recently, I’ve embarked on a journey to improve my credibility (or, at least, get out of the “instantly discarded opinion” pool): I’ve changed my mind about equality.

What’s the fuss about?

Equality is very prevalent when using dependently typed languages, whether as frameworks for writing mathematical proofs or for writing verified computer programs. Most properties of mathematical operators are expressed as equalities which they should respect. For example, a semigroup is a set $S$ with an associative $\times$ operator. The property of associativity just means the operator respects the equality $a \times (b \times c) \equiv (a \times b) \times c$ !

However, the equality relation which can be expressed with proofs is not the only equality which a dependently-typed system needs to consider. There’s also the judgmental equality, that is, which terms are always identified independently of their semantics. For example, $a + b$ and $b + a$ are propositionally equal, which can be shown by writing an inductive proof of this fact, but they’re judgmentally different, because they have different term structure (for variables $a$ and $b$ , neither side reduces to the other).
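To make the distinction concrete, here is a minimal sketch in plain Agda. Nothing is imported, so every name below (`ℕ`, `_+_`, `_≡_`, `+-zero`, `+-suc`, `+-comm` and friends) is defined locally for illustration and is not taken from any particular library. The closed equation is accepted by `refl` because both sides compute to the same normal form, while commutativity needs an induction.

```agda
data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

infixl 6 _+_
_+_ : ℕ → ℕ → ℕ
zero  + n = n
suc m + n = suc (m + n)

infix 4 _≡_
data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

sym : {A : Set} {x y : A} → x ≡ y → y ≡ x
sym refl = refl

trans : {A : Set} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
trans refl q = q

cong : {A B : Set} (f : A → B) {x y : A} → x ≡ y → f x ≡ f y
cong f refl = refl

-- Judgmental equality: both sides normalise to suc (suc zero),
-- so the type checker accepts refl without further help.
two-plus-zero : (suc (suc zero) + zero) ≡ suc (suc zero)
two-plus-zero = refl

-- Propositional equality: each of these needs an inductive proof.
+-zero : (n : ℕ) → (n + zero) ≡ n
+-zero zero    = refl
+-zero (suc n) = cong suc (+-zero n)

+-suc : (m n : ℕ) → (m + suc n) ≡ suc (m + n)
+-suc zero    n = refl
+-suc (suc m) n = cong suc (+-suc m n)

-- Writing "+-comm m n = refl" would be rejected: for variables m and n,
-- m + n and n + m are different normal forms.
+-comm : (m n : ℕ) → (m + n) ≡ (n + m)
+-comm zero    n = sym (+-zero n)
+-comm (suc m) n = trans (cong suc (+-comm m n)) (sym (+-suc n m))
```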

There are two ‘main’ traditions of Martin-Löf dependent type theory: the intensional type theories are as in the paragraph above, but the extensional type theories make me a liar. Extensional type theory is obtained by adding a rule of equality reflection, which collapses the distinction between propositional and judgmental equality: whenever there exists a proof of an equality between terms $x$ and $y$ , they’re considered judgmentally equal.

Adding equality reflection makes a type system more expressive with respect to equality: for example, in extensional type theories, one can derive the rule of function extensionality, which says two functions are equal when they are equal pointwise. However, equality reflection also makes a type system less expressive with respect to equality: there is only one way for two things to be equal!

Moreover, equality reflection complicates type checking to an unacceptable degree. Rather than being able to check the validity of a proof by comparing a term against a known type, an entire typing derivation is necessary as input to the type checker. To see why, consider the following derivation:

$\cfrac{ \cfrac{ \cfrac{}{\mathop{\mathop{\vdash}} (+) \mathop{:} \mathbb{N} \to \mathbb{N} \to \mathbb{N}} \quad \cfrac{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathtt{"foo"} \mathop{:} \text{Str} \quad \cfrac{ \cfrac{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \text{no} \mathop{:} \bot }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathrm{absurd}(\text{no}) \mathop{:} \text{Str} \equiv \mathbb{N} } }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \text{Str} = \mathbb{N} } }{ \text{no} \mathop{:} \bot\ \mathop{\mathop{\vdash}} \mathtt{"foo"} \mathop{:} \mathbb{N} } \quad \cfrac{}{\mathop{\mathop{\vdash}} 2 \mathop{:} \mathbb{N}} }{ \text{no} \mathop{:} \bot \mathop{\mathop{\vdash}} (2 + \mathtt{"foo"}) \mathop{:} \mathbb{N} } }{ \mathop{\mathop{\vdash}} \lambda \text{no} \to (2 + \mathtt{"foo"}) : \bot \to \mathbb{N} }$

If the context contains an element of the empty type, every term needs to be accepted

Here, the context contains an element of the empty type, written $\bot$ . It’s comparatively easy for the type checker to see that in this case, the context we’re working under is absurd, and any equality should be accepted. However, there are many such empty types: $\mathrm{Fin}(0)$ , for example. Consider the type family $\mathrm{SAT}$ , indexed by a Boolean clause, such that $\mathrm{SAT}(c)$ reduces to $\top$ when $c$ is satisfiable and $\bot$ otherwise. How would you type-check the program $\lambda x \to 2 + \mathtt{"foo"}$ at type $\mathrm{SAT}(c) \to \mathbb{N}$ , where $c$ is some adversarially-chosen, arbitrarily complex expression? How would you type check it at type $\mathrm{Halts}(m) \to \mathbb{N}$ ?

In contrast, Intensional Type Theory, or ITT for short, treats equality as if it were any other type. In ITT, equality is inductively generated by the constructor $\mathrm{refl}_x$ for any element $x$ of a type $A$ , which leads to an induction principle saying that, if

$A$ is a type, and
$C$ is a proposition indexed by $x, y : A$ and $p : x \equiv_{A} y$ , and
$p_{\mathrm{refl}}$ is a proof of $C(x, x, \mathrm{refl}_{x})$ for every $x : A$ , then

we can deduce, for all $x, y : A$ and $p : x \equiv_{A} y$ , that $C(x, y, p)$ holds.

Given this operator, generally called axiom J (since “J” is the letter after “I”, which stands for Identity, another word for “equality”), we can derive many of the properties of equality: transitivity (given $x \equiv y$ and $y \equiv z$ , get $x \equiv z$ ) and symmetry (given $x \equiv y$ , get $y \equiv x$ ) make $\equiv$ an equivalence relation, and substitutivity (assuming $P(x)$ and $x \equiv y$ , get $P(y)$ ) justifies calling it “equality”.
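The derivations mentioned above can be sketched in a few lines of plain Agda. This is only an illustration under my own naming: `J` is defined here by pattern matching for convenience (a theory that takes J as primitive would not need that), and `sym`, `subst` and `trans` are then written using nothing but `J`.

```agda
infix 4 _≡_
data _≡_ {A : Set} (x : A) : A → Set where
  refl : x ≡ x

-- The induction principle: to prove C for every equality,
-- it is enough to prove it for refl.
J : {A : Set} (C : (x y : A) → x ≡ y → Set)
  → ((x : A) → C x x refl)
  → (x y : A) (p : x ≡ y) → C x y p
J C c x .x refl = c x

-- Symmetry: the motive is "b ≡ a", which is trivially true when b is a.
sym : {A : Set} {x y : A} → x ≡ y → y ≡ x
sym {A} {x} {y} p = J (λ a b _ → b ≡ a) (λ a → refl) x y p

-- Substitutivity: the motive is "P a → P b"; for refl it is the identity.
subst : {A : Set} (P : A → Set) {x y : A} → x ≡ y → P x → P y
subst P {x} {y} p = J (λ a b _ → P a → P b) (λ a pa → pa) x y p

-- Transitivity: transport the proof p : x ≡ y along q : y ≡ z.
trans : {A : Set} {x y z : A} → x ≡ y → y ≡ z → x ≡ z
trans {A} {x} {y} {z} p q = subst (λ w → x ≡ w) q p
```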

However, axiom J is both weaker and stronger than equality reflection: for one, it doesn’t let us prove that functions are equal when they are pointwise equal, which leads to several complications. However, it also doesn’t let us prove that all equalities are equal to $\mathrm{refl}$ , which lets us strengthen equality by postulating, for example,

The univalence axiom of Voevodsky

The “tagline” for the univalence axiom, which lies at the center of Homotopy Type Theory (HoTT), is that “equality is equivalent to equivalence”. More specifically, given a function $f : A \to B$ , together with a proof that $f$ has left and right inverses, univalence gives us an equality $\mathrm{ua}(f)$ such that transporting “along” this path is the same as applying $f$ .

For example, we could define the “usual” unary naturals $\mathbb{N}$ (which are easy to use for proving but terrible computationally) and binary naturals $\mathbb{N}_2$ (which have more efficient computational behaviour at the cost of more complicated structure), demonstrate an equivalence $\mathrm{peano2Binary} : \mathbb{N} \cong \mathbb{N}_2$ , then transport proofs about $\mathbb{N}$ to proofs about $\mathbb{N}_2$ !

A recent paper by Tabareau et al explores the consequences of strengthening a type theory with univalence together with parametricity, unlocking efficient and automated (given the equivalence) transport between algebraic structures of types.

A note on terminology

The HoTT interpretation of “types as topological spaces” leads us to interpret the type of equalities as the type of paths in a space. From this, we get some terminology: instead of saying “cast $x$ with the equality $p$ ”, we can equivalently say “transport $x$ along the path $p$ ”.

Using the “path” terminology in a context with unicity of equality proofs is a bit misleading, but the terminology does not break down (a set-truncated system is just one where all paths are loops). Because of this, I’ll use “path” and “equality” interchangeably. Sorry if this causes any confusion.

The undecidability of type checking extensional type theory (ETT) and the comparative weakness of ITT have led many researchers to consider the following question:

Can we have:

Decidable type checking

A “more extensional” equality

Good computational behaviour

All at the same time?

Turns out, the answer is yes!

Observational Equality

One early positive answer is that of Observational Type Theory, presented in a 2007 paper by Altenkirch, McBride, and Swierstra. The basic idea is that, instead of defining the type $a \equiv_{A} b$ as an inductive family of types, we define it so that equality computes on the structure of $A$ , $a$ and $b$ to reduce to tractable forms. Observational type theory (OTT, from now on) has two universes (types whose elements are types), $\mathrm{Prop}$ and $\mathrm{Set}$ , such that $\mathrm{Prop} : \mathrm{Set}$ and $\mathrm{Set} : \mathrm{Set}$ .¹

The elements of $\mathrm{Prop}$ are taken to be propositions, not in the sense of propositions-as-types, but in the sense of HoTT. Propositions are the types $T$ for which, given $x, y : T$ , one has $x \equiv y$ : they’re types with at most one element. Some propositions are $\top$ , the trivially true proposition, and $\bot$ , the empty proposition.

Given $A : u$ , and some $B : v$ with one variable $x : A$ free (where $u$ and $v$ are possibly distinct universes), we can form the type $\prod_{x : A} B$ of dependent products from $A$ to $B$ . If $v$ is $\mathrm{Prop}$ , then the dependent product also lives in $\mathrm{Prop}$ . Otherwise, it lives in $\mathrm{Set}$ . Moreover, we can also form the type $\sum_{x : A} B$ of dependent sums of $A$ and $B$ , which always lives in $\mathrm{Set}$ .

Given a type $A$ and two elements $a, b : A$ , we can form the proposition $a \equiv_{A} b$ . Note the emphasis on the word proposition here! Since equality is a proposition, we have uniqueness of equality proofs by definition: there’s at most one way for things to be equal, conflicting with univalence. So we get some extensionality, namely of functions, but not for arbitrary types. Given types $A$ and $B$ , a proof $p : A \equiv B$ and $x : A$ , we have the term $\mathrm{coe}(A, B, p, x) : B$ , which represents the coercion of $x$ along the path $p$ .

Here is where my presentation of observational equality starts to differ from the paper’s: Altenkirch et al use heterogeneous equality, i.e. a 4-place relation $(x : A) \equiv (y : B)$ , where $A$ and $B$ are potentially distinct types. But! Their system only allows you to use an equality when $A \equiv B$ . The main motivation for heterogeneous equality is to “bunch up” as many equalities as possible to be eliminated all in one go, since coercion in their system does not compute. However, if coercion computes normally, then we don’t need to (and, in fact, can’t) do this “bunching”: one just uses coercion normally.

The key idea of OTT is to identify as “equal” objects which support the same observations: for functions, observation is application; for pairs, it’s projection, etc. This is achieved by making the definition of equality act as a “pattern matching function” on the structure of terms and types. For example, there is a rule which says an equality between functions is a function that returns equalities:

$\cfrac{}{f \equiv_{(x : A) \to B(x)} g \longrightarrow (x : A) \to (f\ x) \equiv_{B(x)} (g\ x)}$

Equality of functions is extensional by definition

So, not only do we have a term funext of type ((x : A) → f x == g x) → f == g but one with a stronger type, namely ((x : A) → f x == g x) == (f == g), and that term is.. refl!

OTT is appropriate, in my opinion, for doing set-level mathematics: where types have no “interesting” equality structure. However, it breaks down at higher h-levels, where there is interesting structure to be found in the equalities between elements of types. This is because OTT, by placing the $\equiv$ type in its universe of propositions, validates the principle of uniqueness of identity proofs, which says any two proofs of the same equality are themselves equal. UIP conflicts with the univalence axiom of Homotopy Type Theory, by (to use the HoTT terminology) saying that all types are sets.

Theorem. Suppose there exists a universe, $\mathscr{U}$ , which contains the type of Booleans. Assuming univalence, the type $a \equiv_{\mathscr{U}} b$ of equalities between types $a, b : \mathscr{U}$ is not, in general, a proposition.

Proof. The function $\mathrm{not} : 2 \to 2$ , which maps $\mathtt{tt}$ to $\mathtt{ff}$ and vice-versa, is an equivalence, being its own left and right inverses. Thus, by univalence, we have a path $\mathrm{ua}(\mathrm{not}) : 2 \equiv_{\mathscr{U}} 2$ .

To see that this path is different from $\mathrm{refl}_2$ , consider its behaviour with respect to transport: $\mathrm{transp}(\mathrm{refl}_2, \mathtt{tt}) \equiv \mathtt{tt}$ but $\mathrm{transp}(\mathrm{ua}(\mathrm{not}), \mathtt{tt}) \equiv \mathtt{ff}$ . Since $\mathtt{tt}$ is different from $\mathtt{ff}$ , it follows that $\mathrm{ua}(\mathrm{not})$ is different from $\mathrm{refl}_2$ . $\blacksquare$

So, if having univalence is desirable, OTT is off the table. However, the previous example of transporting proofs between equivalent types of natural numbers might not have convinced you that HoTT is indeed an interesting field of study, and univalence might seem mostly like a novelty, a shiny thing to pursue for its own sake (it certainly did to me, at first). So why is HoTT interesting?

HoTT

Between 2012 and 2013, a special year of research took place in the Institute for Advanced Study to develop a type theory that can be used as a foundation for mathematics “at large”. Their result: the book Homotopy Type Theory: Univalent Foundations of Mathematics. The IAS Special Year on Univalent Foundations would have been interesting even if it hadn’t started a new branch of type theory, just from the participants: Thierry Coquand, Thorsten Altenkirch, Andrej Bauer, Per Martin-Löf and, of course, the late Vladimir Voevodsky.

HoTT’s main contribution to the field, in my (lay) opinion, is the interpretation of types as spaces: by interpreting types as homotopy theoretical spaces, a semantics for types with more interesting “equality structure”, so to speak, arises. In the “classical” intensional type theory of Agda and friends, and indeed the theory of OTT commented on above, equality is a proposition. Agda (without the --without-K option) accepts the following proof:

uip : (A : Set) (x y : A) (p q : x ≡ y) → p ≡ q
uip A x .x refl refl = refl

Apart from ruling out univalence, UIP rules out another very interesting class of types which can be found in HoTT: the higher inductive types, which contain constructors for equalities as well as for values.

The simplest higher inductive type is the interval, which has two “endpoints” (i0 and i1) and a path between them:

data I : Type where
  i0 i1 : I
  seg   : i0 ≡ i1

The names $i_0$ , $i_1$ and $\mathrm{seg}$ were chosen to remind the reader of a line segment between a pair of points. Therefore, we may represent the type $\mathbb{I}$ as a diagram, with discrete points representing $i_0$ and $i_1$ , and a line, labelled $\mathrm{seg}$ , connecting them.

A diagrammatic representation of the type `I`

The real power of the I type comes from its induction principle, which says that, given a proposition $P : \mathbb{I} \to \mathrm{Type}$ , if:

There exists a pair of proofs $pi_0 : P(i_0)$ and $pi_1 : P(i_1)$ , and
$pi_0$ and $pi_1$ are equal “with respect to” the path $\mathrm{seg}$ , then

$P$ holds for every element of the interval. However, that second constraint, which relates the two proofs, needs a bit of explaining. What does it mean for two elements to be equal with respect to a path? Well, the “obvious” interpretation would be that we require a proof $p\mathrm{seg} : pi_0 \equiv pi_1$ . However, this isn’t well-typed! The type of $pi_0$ is $P(i_0)$ and the type of $pi_1$ is $P(i_1)$ , so we need a way to make them equal.

This is where the path $\mathrm{seg}$ comes in to save the day. Since it states $i_0$ and $i_1$ are equal, we can transport the proof $pi_0$ along the path $\mathrm{seg}$ to get an inhabitant of the type $P(i_1)$ , which makes our desired equality between $pi_0$ and $pi_1$ with respect to $\mathrm{seg}$ come out as $\mathrm{transp}(P, \mathrm{seg}, pi_0) \equiv pi_1$ , which is well-typed.
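For the curious, the induction principle just described can be sketched in Cubical Agda. Treat this as an assumption-laden illustration rather than the book’s formulation: I rename the interval to `Interval` with endpoints `zero` and `one` (to avoid clashing with Cubical Agda’s built-in `I`, `i0` and `i1`), and “equal with respect to $\mathrm{seg}$ ” is expressed with the built-in dependent path type `PathP` instead of an explicit transport. I have not checked it against a specific Agda version.

```agda
{-# OPTIONS --cubical #-}

open import Agda.Builtin.Cubical.Path using (PathP; _≡_)

-- The interval as a higher inductive type: two points and a path.
data Interval : Set where
  zero one : Interval
  seg      : zero ≡ one

-- Induction principle: a proof for each endpoint, plus a dependent path
-- lying over seg that connects them, gives a proof for every element.
elim : (P : Interval → Set)
     → (p0 : P zero) (p1 : P one)
     → PathP (λ i → P (seg i)) p0 p1
     → (x : Interval) → P x
elim P p0 p1 ps zero    = p0
elim P p0 p1 ps one     = p1
elim P p0 p1 ps (seg i) = ps i
```

The type `PathP (λ i → P (seg i)) p0 p1` is well-formed precisely because its endpoints live in `P zero` and `P one`, which is the typing problem the naive statement ran into.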

That’s fine, I hear you say, but there is a question: how is the interval type useful? It certainly looks as though it were useless, considering it’s just got one element pretending to be two! However, using the interval higher inductive type, we can actually prove functional extensionality, the principle that says two functions are equal when they’re equal pointwise, everywhere. The proof below can be found, in a more presentable manner, in the HoTT book, section 6.3.

Theorem. If $f$ and $g$ are two functions between $A$ and $B$ such that $f(x) \equiv g(x)$ for all elements $x : A$ , then $f \equiv g$ .

Proof. Call the proof we were given $p : \prod_{(x : A)} f(x) \equiv g(x)$ . We define, for all $x : A$ , the function $p^\prime_{x} : \mathbb{I} \to B$ by induction on $\mathbb{I}$ . Let $p^\prime_{x}(i_0) = f(x)$ , $p^\prime_{x}(i_1) = g(x)$ . The equality between these terms is given by $p^\prime_{x}(\mathrm{seg}) = p(x)$ .

Now, define $q : \mathbb{I} \to A \to B$ by $q(i) = \lambda x. p^\prime_x(i)$ . We have that $q(i_0)$ is the function $\lambda x. p^\prime_{x}(i_0)$ , which is defined to be $\lambda x. f(x)$ , which is $\eta$ -equal to $f$ . Similarly, $q(i_1)$ is equal to $g$ , and thus, $q(\mathrm{seg}) : f \equiv g$ . $\blacksquare$
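The same argument can be replayed as a Cubical Agda sketch. The names `Interval`, `zero`, `one`, `p′`, `q` and `funExt` are my own choices for this illustration (the interval is renamed to avoid Cubical Agda’s built-in `I`), and I have not checked the snippet against a specific Agda version, so take it as a transcription of the proof rather than a library definition.

```agda
{-# OPTIONS --cubical #-}

open import Agda.Builtin.Cubical.Path using (_≡_)

data Interval : Set where
  zero one : Interval
  seg      : zero ≡ one

module _ {A B : Set} {f g : A → B} (p : (x : A) → f x ≡ g x) where

  -- For each x, a map out of the interval sending the endpoints
  -- to f x and g x, and the segment to the given path p x.
  p′ : A → Interval → B
  p′ x zero    = f x
  p′ x one     = g x
  p′ x (seg i) = p x i

  -- Swap the argument order: a function-valued map out of the interval.
  q : Interval → A → B
  q j x = p′ x j

  -- Walking along seg gives a path from q zero (which is f, up to eta)
  -- to q one (which is g, up to eta).
  funExt : f ≡ g
  funExt i = q (seg i)
```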

Isn’t that cool? By augmenting our inductive types with the ability to additionally specify equalities between elements, we get a proof of function extensionality! I think that’s pretty cool. Of course, if HITs were limited to structures like the interval, spheres, and other abstract mathematical things, they wouldn’t be very interesting for programmers. However, the ability to endow types with additional equalities is also useful when doing down-to-earth programming! A 2017 paper by Basold et al explores three applications of HITs to programming, in addition to containing an accessible introduction to HoTT in general.

Another very general higher inductive type, one that might be more obviously useful, is the general type of quotients. Whenever $A$ is a type and $R$ is a binary, propositional relation between members of $A$ , we can form the quotient type $A/R$ , which is given by the following constructors²:

$\mathrm{intro}$ says that, for each $x : A$ , we can make an element of $A/R$ , and
$\mathrm{quot}$ which gives, for each $x, y : A$ which are related by $R$ , an equality between $\mathrm{intro}(x)$ and $\mathrm{intro}(y)$ .

The induction principle for quotients, which is far too complicated to include here (but can be derived mechanically from the specification given above and the knowledge of “equal with respect to some path” from the section on the interval), says roughly that we can pattern-match on $A/R$ if and only if the function we’re defining does not distinguish between elements related by $R$ .

This type is very general! For example, given a type of naturals $\mathbb{N}$ and a two-place relation $\mathrm{mod2}$ which holds for numbers congruent modulo 2, we can form the quotient type $\mathbb{N}/\mathrm{mod2}$ of naturals mod 2. Functions (say, $f$ ) defined on this type must then respect the relation that, whenever $x \equiv y \mod 2$ , $f(x) \equiv f(y)$ .
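As a rough Cubical Agda sketch of the above, here is the quotient with only its two constructors (no truncation constructor, see the footnote), a non-dependent recursor, and the mod 2 example. Everything here is an assumption made for illustration: the names `_/_`, `rec`, `even`, `Mod2` and `even/` are mine, and “congruent modulo 2” is modelled as “has the same parity”. I have not checked it against a specific Agda version.

```agda
{-# OPTIONS --cubical #-}

open import Agda.Builtin.Cubical.Path using (_≡_)

data _/_ (A : Set) (R : A → A → Set) : Set where
  intro : A → A / R
  quot  : (x y : A) → R x y → intro x ≡ intro y

-- To map out of the quotient, supply a function on A together with
-- a proof that it sends related elements to equal results.
rec : {A B : Set} {R : A → A → Set}
    → (f : A → B)
    → ((x y : A) → R x y → f x ≡ f y)
    → A / R → B
rec f resp (intro x)      = f x
rec f resp (quot x y r i) = resp x y r i

data Bool : Set where
  tt ff : Bool

not : Bool → Bool
not tt = ff
not ff = tt

data ℕ : Set where
  zero : ℕ
  suc  : ℕ → ℕ

even : ℕ → Bool
even zero    = tt
even (suc n) = not (even n)

-- Two naturals are related when they have the same parity.
Mod2 : ℕ → ℕ → Set
Mod2 x y = even x ≡ even y

-- Parity respects the relation by construction, so it descends to ℕ / Mod2.
even/ : ℕ / Mod2 → Bool
even/ = rec even (λ x y r → r)
```

A function that does not respect `Mod2` (say, the identity into `ℕ`) has no suitable second argument for `rec`, which is exactly the restriction described above.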

All of this talk about HoTT, and in fact the book itself, though, neglected to mention one thing. What is the computational content of the univalence axiom? What are the reduction rules for matching on higher inductive types? How do we take a proof, written in the language of HoTT, and run it? The Book does not address this, and in fact, it’s still a half-open problem, and has been since 2013. Computing in the presence of all these paths, paths between paths, ad infinitum is mighty complicated.

Cubical TT

The challenge of making a computational HoTT did not stop Cohen et al, who in a 2016 paper presented Cubical Type Theory, which, after three years of active research, provides a way to compute in the presence of univalence. There’s just one problem, though: cubical type theory is hella complicated.

The core idea is simple enough: we extend type theory with a set of names $\mathbb{I}$ , with points $0$ and $1$ and operations $\vee$ , $\wedge$ , and $1 - r$ , which behave like a de Morgan algebra. To represent equalities, the type $\mathrm{Path}\ A\ t\ u$ is introduced, together with a “name abstraction” operation $\langle i \rangle\ t$ and “path application” $t\ r$ , where $r$ is an element of the interval.

$\frac{\Gamma, i : \mathbb{I} \vdash t : A}{\Gamma \vdash \langle i \rangle\ t : \mathrm{Path}\ A\ t[0/i]\ t[1/i]}$

Path introduction

$\frac{\Gamma \vdash t : \mathrm{Path}\ A\ a\ b\quad\Gamma \vdash r : \mathbb{I}}{\Gamma \vdash t\ r : A}$

Path elimination

The intuition, the authors say, is that a term with $n$ variables of $\mathbb{I}$ -type free corresponds to an $n$ -dimensional cube.

$\cdot \vdash A : \mathrm{Type}$
$i : \mathbb{I} \vdash A : \mathrm{Type}$
$i, j : \mathbb{I} \vdash A : \mathrm{Type}$

This is about where anyone’s “intuition” for Cubical Type Theory, especially my own, flies out the window. Specifically, using abstraction and the de Morgan algebra on names, we can define operations such as reflexivity (introduced with $\langle i \rangle a : \mathrm{Path}\ A\ a\ a$ ), symmetry ( $\lambda p. \langle i \rangle p\ (1 - i) : \mathrm{Path}\ A\ a\ b \to \mathrm{Path}\ A\ b\ a$ ), congruence, and even function extensionality, which has a delightfully simple proof: $\lambda p. \langle i \rangle\ \lambda x. p\ x\ i$ .
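These one-liners translate almost verbatim to Cubical Agda, where name abstraction $\langle i \rangle$ is written as an ordinary $\lambda$ over the interval and $1 - i$ is written `~ i`. The sketch below uses only Agda’s built-in cubical modules; the definitions `refl`, `sym`, `cong` and `funExt` are written out by hand here rather than imported, and I have not checked it against a specific Agda version.

```agda
{-# OPTIONS --cubical #-}

open import Agda.Primitive.Cubical using (I) renaming (primINeg to ~_)
open import Agda.Builtin.Cubical.Path using (_≡_)

-- Reflexivity: the constant path at a.
refl : {A : Set} {a : A} → a ≡ a
refl {a = a} i = a

-- Symmetry: walk the path backwards.
sym : {A : Set} {a b : A} → a ≡ b → b ≡ a
sym p i = p (~ i)

-- Congruence: apply f at every point of the path.
cong : {A B : Set} (f : A → B) {a b : A} → a ≡ b → f a ≡ f b
cong f p i = f (p i)

-- Function extensionality: swap the two arguments.
funExt : {A B : Set} {f g : A → B} → ((x : A) → f x ≡ g x) → f ≡ g
funExt p i x = p x i
```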

However, to transport along these paths, the paper defines a “face lattice”, which consists of constraints on elements of the interval, and uses that to define “systems”, which are arbitrary restrictions of cubes. From systems, one can define “composition”, which computes the lid of an open box (yeah, I don’t get it either), “Kan filling”, and finally, transport. Since the authors give a semantics of Cubical Type Theory in a previously well-established model of cubical sets, I’ll just.. take their word on this.

The Cubical Agda documentation has a section explaining a generalised transport operation transp and the regular transport operation transport. I recommend that you go check that out, since explaining each is beyond my powers. However, this transport operation does let us prove that the J induction principle for equalities also holds for these cubical paths, and from that we can define all of the other nice operations!

Cubical Agda, a proof assistant based on Cubical Type Theory, exports a library which provides all of HoTT’s primitive notions (the identity type, transport, the J rule, function extensionality, univalence), all of which compute properly. Furthermore, it supports higher inductive types! However, as I understand it, these cannot be compiled into an executable program yet. This is because the transp operation, fundamental to computation in the presence of path types, is defined on the structure of the type we’re transporting over, and type structure is not preserved when compiling.

So What?

Even after all this explanation of fancy equality relations, you might still be unconvinced. I certainly was, for a while. But I’d argue that, if you care about typed programming enough to be interested in dependent types at all, you should be interested in, at least, the quotient inductive types of OTT, if not the more general higher inductive types of HoTT.

The reason for this is simple: inductive types let you restrict how elements of a type are formed. Quotient inductive types let you restrict, in a principled way, how elements of a type are used. Whereas in languages without quotients, like Haskell or even non-cubical Agda, one is forced to use the module system to hide the inductive structure of a type if they wish to prevent unauthorized fiddling with structure, in a language with quotients, we can have the type checker enforce, internally, that these invariants are maintained.

Just like quantitative type theory gives programmers a way to reason about resource usage, and efficiently implement mutable structures in a referentially-transparent way, I strongly believe that quotient types, and even univalent parametricity (when we figure out how to compile that) are the “next step forward” in writing reliable software using types as a verification method.

However, dependent types are not the only verification method! Indeed, there are a number of usability problems to solve with dependent types for them to be adopted by the mainstream. A grim fact that everyone who wants reliable software has to face every day is that most programmers are out there using JavaScript, and that the most popular typed language released recently is Go.³

So I leave you, dear reader, with this question: what can we do, as researchers and implementors, to make dependently typed languages more user-friendly? If you are not a researcher, and you have tried a dependently typed language, what are pain points you have encountered? And if you haven’t used a dependently typed language, why not?

With special thanks to

This is a simplified presentation that uses a single, inconsistent universe (Girard’s paradox applies). The actual presentation uses a stratified, predicative hierarchy of $\mathrm{Set}_i$ universes to avoid this paradox.↩︎
The presentation of quotients in the HoTT book also contains a 0-truncation constructor, which has to do with limiting quotients to work on sets only. The details are, IMO, out of scope for this post, so check out section 6.10 of the book to get all of them.↩︎
Please read the italics on this paragraph as derision.↩︎

Reflections on Equality