Translating mathematics into code that a proof assistant can understand
is a process that involves a fair bit of friction. There’s quite a big
difference between what humans are willing to suspend disbelief about,
and what a type checker will actually accept without proof. Reducing
this friction is the primary task of the authors of a proof assistant
— but, of course, it would be infeasible to modify the proof assistant
*itself* to cover a new problem domain every time one comes up. To this
end, we provide users with various mechanisms for automation, through
which users can encode their domain-specific knowledge.

Arguably one of the oldest forms of automation, **type classes**
originated in Haskell, all the way back in the 80’s. The idea is simple
enough: we can define *predicates* on types (the **classes**) and
*clauses* for solving those predicates (the **instances**) — the
connection with logic programming is not superficial. By “recording the
execution” of this logic program, we can get the elaborator to
automatically fill in terms, according to rules we specify.

In Agda, **instance search** has historically been a pretty simple
process: like GHC, we do not support *backtracking* when looking for a
given instance. Moreover, the implementation itself has cut some serious
corners. It works, but it’s not particularly clever, or particularly
*efficient*. This post details the first step taken in our journey to
bring the implementation of instance search up to parity with what users
of proof assistants expect in the 21st century.

# An example

To start with, let’s actually present an instance (ha) of the problem,
so that we have some code to refer back to. When doing mathematics (or
computation!) in a proof assistant, we often care about *decidability*:
for a particular proposition `P`, do we have a program that can tell us
whether `P` holds or fails? Keeping track of decidability is valuable even
when working in classical mathematics, since the automation *around* a
proof must obviously be executable — and *it* will use certain cases
of decidability, e.g., for looking things up in a list. It’s also
*really boring*, and generally extremely formulaic, to keep track of.

Instance search is the perfect solution for automating the boring cases
of proving things are decidable — when we can put together a decision
procedure simply by the *syntax* of the type we’re asking about. A basic
setup would look something like this:

```
data Dec {l} (A : Type l) : Type l where
  yes : A → Dec A
  no  : ¬ A → Dec A

instance -- The implementation of each instance is immaterial
  DecYes : Dec ⊤
  DecNo  : Dec ⊥
```

Agda allows us to do instance search for any data or record type, so we
can directly use `Dec` as a “type class”. The two definitions in the
`instance` block are the two trivial cases of putting together a
decision procedure: the unit type always holds, and the empty type
always fails. We can then *use* these instances by wrapping certain
function arguments in double braces: `⦃ ... ⦄`, or the ASCII
`{{ ... }}`. A simple use-case for the setup above is a function that
calls the decision procedure for a type and, if it succeeds, produces a
proof:

```
is-yes : ∀ {l} {A : Type l} → Dec A → Type
is-yes (yes _) = ⊤
is-yes (no _)  = ⊥

decide : ∀ {l} {A : Type l} ⦃ prf : Dec A ⦄ {_ : is-yes prf} → A
decide ⦃ yes p ⦄ = p

_ : ⊤
_ = decide -- boring!
```

Instance search gets more interesting in general when we consider
instances *with instance arguments*. In our example, they allow us to
implement decision procedures for type *formers*, where decidability of
the overall thing boils down to decidability of components. For example,
decidability is closed under negation and formation of product types:

```
instance
  DecNot  : ⦃ _ : Dec A ⦄ → Dec (¬ A)
  DecProd : ⦃ _ : Dec A ⦄ ⦃ _ : Dec B ⦄ → Dec (A × B)
```

A particularly common class of decision procedures are those for
*equality*. This is the type-theoretic version of Haskell’s `Eq`
class, and, correspondingly, it shows up quite a lot when writing
metaprograms. We can also use instance search to handle decidable
equality — the code below sets up some more instances. It also
demonstrates that we can use instance search for anything that
*evaluates to* a data (or record) type, not just applications of those
literal types.

```
Discrete : ∀ {l} → Type l → Type l
Discrete A = {x y : A} → Dec (x ≡ y)

instance
  DiscreteNat  : Discrete Nat
  DiscreteInt  : Discrete Int
  DiscreteProd : ⦃ _ : Discrete A ⦄ ⦃ _ : Discrete B ⦄ → Discrete (A × B)

_ : ¬ (2 + 2 ≡ 5)
_ = decide -- slightly more interesting
```

## Detailing the problem

To discuss Agda’s actual instance search algorithm, it’ll be best if I first write out the terms of art that will come up.

The algorithm we’re discussing is **instance search**. We’ll refer to the problems it solves as **instance goals**, since the term “instance constraint” might get conflated with the instance *context*, defined below, and “`FindInstance` constraint” is a bit verbose.

An **instance** is a declaration annotated with the `instance` keyword. Following GHC’s terminology, we will split the type of each instance into a **head** and a **context**:

```
instance
  Discrete-List
    : ∀ {l} {A : Type l} ⦃ da : Discrete A ⦄
    --                   ^^^^^^^^^^^^^^^^^^^ the context
    → Discrete (List A)
    -- ^^^^^^^^^^^^^^^^ the head
```

The head furthermore decomposes as a **head symbol** and a **spine**. The head symbol is the name of the “type class” we’re looking for: it could be a data type, a record type, or a postulate. The spine consists of the arguments to the head symbol. For `Discrete-List`, the head symbol is `Dec`, since `Discrete` is a definition and can reduce.

The **instance table** is some data structure that, given the type of an instance goal, produces a list of all instances that could match that goal.

A **candidate** for an instance goal is either an instance with matching head symbol, or a local variable with `⦃...⦄` visibility drawn from the context. For example, in the body of `Discrete-List`, the possible candidates for a `⊢ Dec ?0` goal would include all the instances (like `DecYes`), but also the local variable `da`.

Once we’re given an instance goal, with an identifiable head symbol, we can actually try to solve it. The procedure is simple:

1. The first thing to do is to verify that instance search even applies to this goal. This is a basic check on the shape of the type: it must be of the form `∀ {X Y Z} ⦃ _ : C X ⦄ ⦃ _ : D Y ⦄ → T x y z`, where `T` is one of those head symbols we mentioned before — a data type, a record type, or a postulate.

2. We then build a list of initial candidates, based on the definitions in the instance table and the variables in the context. The list of initial candidates is an *over*approximation of the set of all possible solutions — it’s possible, at this step, that some candidates are ill-typed.

3. We *narrow* those initial candidates to those that could actually solve the instance goal. This is the most complicated — and most expensive! — step. A candidate could be rejected for a variety of reasons:

   - It might be in the instance table despite not being in scope.
   - It might be in scope only under a qualified name, and the user has asked us not to use qualified names for instances.
   - It might *not even be type-correct*, since we consider all the local `⦃...⦄`-variables as candidates.

   However, unlike in Lean (but like in GHC), this narrowing does not do any *backtracking*. That is, when considering whether a candidate might solve a goal, we only consider its *head*, and not its context.

4. If we’re left with a single candidate, the instance goal is solved, and we’re technically done. If there are *no* candidates, that’s an immediate failure. However, if there are *multiple* candidates, we don’t immediately report a failure. The type of the instance goal might be insufficiently determined, e.g. because of a metavariable, so we set this particular instance goal aside to revisit later. The second time we try it, if there’s still overlap, we report the failure to the user.
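Schematically, the tail end of this procedure is a filter followed by a three-way decision. The following is a toy sketch, not Agda's actual code; `Outcome`, `narrow`, and the predicate `headMatches` are hypothetical names, with `headMatches` standing in for the expensive narrowing check:

```haskell
-- A toy model of the final decision in instance search. A real
-- candidate is a term with a type; a 'String' stands in for one here.
data Outcome = Solved String | Failed | Postponed [String]
  deriving (Eq, Show)

-- 'headMatches' abstracts the narrowing check, which looks only at a
-- candidate's head, never its context (no backtracking).
narrow :: (String -> Bool) -> [String] -> Outcome
narrow headMatches initial =
  case filter headMatches initial of
    [c] -> Solved c      -- unique candidate: the goal is solved
    []  -> Failed        -- no candidates: immediate failure
    cs  -> Postponed cs  -- overlap: set aside, a meta might fix it later
```

The `Postponed` constructor models setting the goal aside to revisit once more metavariables are solved.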

In most cases, the expensive part of executing a logic program *is the
backtracking*: the combinatorial explosion in cases to consider requires
complicated algorithms to deal with. It might then be surprising to
learn that Agda’s instance search can often be *quite* slow!

The problem is that discarding the ill-typed instance candidates in the
narrowing step can be quite expensive. Essentially, it’s a recursive
invocation of the type checker, with all of *its* complexity. If the set
of initial candidates is big, we may spend quite a long time just in the
narrowing phase! To motivate this complexity, note that Agda *accepts*
the following program:

```
data T : Type where
  foo bar : T

g : Bool → T
g false = foo
g true  = bar

postulate
  Cls : T → Type

instance
  Inst : ∀ {x} → Cls (g x)

t1 : ⦃ _ : Cls foo ⦄ → Type
t1 ⦃ x ⦄ = ⊤

t2 : t1
t2 = tt
```

To know that `t2 : t1` is justified, we have to know that `Inst` — the
only possible candidate, in this case — actually checks at the goal
type `Cls foo`. In this case, that involves the reduction of `g` *and*
applying injectivity. While it’s not a lot of work here, it’s easy to
imagine situations where Agda must do quite a bit of work before it can
*discard* a candidate.

## The jumpscare

In the vocabulary, the instance *table* was left purposefully
underspecified. That’s because it’s secretly the star of today’s show!
It’s also because it would have given away the ending way too soon:
foreshadowing, not spoilers. And now, the shocking conclusion!

*Up through Agda 2.6.4.3, the instance table is only indexed by the head
symbol.*

You read that right! See for yourself, if you find it hard to believe.
This means that, *at each instance goal*, we must discard in turn *every
single instance of that class*. And, just to stress this point,
discarding instance candidates can involve potentially unbounded work.
Yeesh.

It’s not a secret that this is something we needed to improve; we’ve known this for a while. In Jesper’s blog post about the improvements they would like to see in 2024, it was point 4. So why did it take so long to solve?

I think the reason is essentially lack of shininess. This is a pure
engineering problem. The heading in Jesper’s blog post
*includes the solution* — someone just had to find the time (and
spoons) to make it happen. It’s not the type of work that you could
reasonably turn into a paper, especially after the Lean folks did such
a stellar job, and it’s not the type of work that usually catches my
eye.

But, catch my eye it (eventually) did, and I think that indexing things by terms is a common enough problem that it falls on me (having solved the problem) to write up my solution — so that future compiler engineers have something to refer to for this bit of folklore.

# Discrimination trees

A **discrimination tree** is an associative data structure where the
indices are some sort of term — in our case, terms of Agda’s type
theory. Despite the fancy name, it’s basically a trie: you traverse the
term in some (consistent) order to turn it into a list of *keys*, which
you then look up in the trie.

Note that this process doesn’t exactly extend to higher-order terms, so
there’s an unavoidable amount of *inexactness* when looking up things
in a discrimination tree. We may also choose to *introduce* some
inexactness, when being exact would be prohibitively expensive, by
deliberately excluding some classes of terms from being represented as
keys.

Remember, our goal is to produce a small set of initial candidates, so that we don’t spend a lot of time discarding candidates. But it’d be a total waste to do this if turning terms into keys was too expensive! The core trade-off is the following: adding more types of key will result in smaller candidate sets, but we’ll spend longer producing those keys.

Anyway, what exactly goes into a key? A good start is the following:

```
data Key
= RigidK {-# UNPACK #-} !QName {-# UNPACK #-} !Int
-- ^ Rigid symbols (constructors, data types, record types,
-- postulates) identified by a QName.
--
-- The 'Int' argument is the *arity* of the symbol as we encountered
-- it.
| FlexK
-- ^ Literally anything else, treated as a wildcard.
```

This representation is extremely coarse, but it serves to convey the key
ideas. When traversing a term, if we see a rigid symbol (as in the
comment), we can turn it into a `RigidK`. Literally everything
else — lambda abstractions, function types, universe levels,
metavariables, local variables — will get turned into a `FlexK`.

One important thing to keep in mind is that, to tell if a term is rigid,
we need to do reduction — remember the `Discrete` vs. `Dec` instance
head? If we didn’t reduce, then we would mistakenly have a `RigidK`
for `Discrete`, which would cause us to miss `Discrete-List` if the goal
was instead phrased as `Dec (Path (List A) ...)`. Not good!

The main workhorse is the function `splitTermKey`, found here, which
simplifies to the following:

```
splitTermKey :: Term -> TCM (Key, [Term])
splitTermKey term = do
  hnf <- reduce term
  case hnf of
    Def head args ->
      if head is rigid -- some nonsense about blockers...
        then return (RigidK head (length args), args)
        else return (FlexK, [])
    _ -> return (FlexK, [])
```

As the name implies, `splitTermKey` splits a term into a key and
whatever arguments the rigid symbol had. The full traversal is then
elegantly implemented as a tail-recursive function. Full disclosure: the
implementation up to this point is heavily inspired by how it’s done in
Lean 4, but it’ll diverge in the next section.

```
termToKeys :: Term -> TCM [Key]
termToKeys t = loop [] [t] where
  loop acc (focus:rest) = do
    (key, args) <- splitTermKey focus
    loop (key:acc) (args ++ rest)
  loop acc [] = return (reverse acc)
```

The idea of the loop is that we have a term-to-inspect (the “focus”) and a number of other term parts that we should get to next — the “rest”. We split the focus into a key and its arguments, record the key, and “push” the arguments in front of the rest.
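To make the traversal concrete, here is a runnable sketch over a hypothetical first-order term type. Agda's real `Term` is far richer, and the real `splitTermKey` runs in `TCM` so it can reduce; the primed names mark this as a toy:

```haskell
-- A toy model of the key traversal. 'Other' stands for anything
-- flexible: universe levels, metavariables, lambdas, and so on.
data Term = Def String [Term] | Other
  deriving Show

data Key = RigidK String Int | FlexK
  deriving (Eq, Show)

-- Pure stand-in for splitTermKey: no reduction, no blockers.
splitTermKey' :: Term -> (Key, [Term])
splitTermKey' (Def f args) = (RigidK f (length args), args)
splitTermKey' Other        = (FlexK, [])

termToKeys' :: Term -> [Key]
termToKeys' t = loop [] [t] where
  loop acc (focus : rest) =
    let (key, args) = splitTermKey' focus
    in loop (key : acc) (args ++ rest)
  loop acc [] = reverse acc
```

Running `termToKeys' (Def "Dec" [Other, Def "Unit" []])` yields `[RigidK "Dec" 2, FlexK, RigidK "Unit" 0]`, mirroring the `Dec, *, ⊤` trace below (with `Unit` standing in for `⊤`).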

Tracing the execution for a particularly simple case might look like the
following. I’ll write Agda terms in Agda syntax in these
`⌜ fancy corners ⌝`.

```
loop [] [ ⌜ Dec {lzero} ⊤ ⌝ ]
```

The first thing we do is split the focus into a key and the arguments.
`Dec` is a rigid symbol, so we get a rigid key. This key is pushed onto
the accumulator and we move on to the arguments:

```
  -> splitTermKey ⌜ Dec {lzero} ⊤ ⌝
  -> Dec, [ ⌜ lzero ⌝ , ⌜ ⊤ ⌝ ]
```

The next thing to look at is the first argument of `Dec`, which was
`lzero`. Since universe levels are not `Def`s, they’re not represented
in our keys, and we get a “wildcard” `FlexK`.

```
loop [ Dec ] [ ⌜ lzero ⌝ , ⌜ ⊤ ⌝ ]
  -> splitTermKey ⌜ lzero ⌝
  -> *, []
```

Moving on again, we do the same thing for the unit type. It’s a record type, so it’s rigid, and we get an actual key. It has no arguments, though, so at this point, we’re done!

```
loop [ *, Dec ] [ ⌜ ⊤ ⌝ ]
  -> splitTermKey ⌜ ⊤ ⌝
  -> ⊤, []
loop [ ⊤, *, Dec ] []
  -> returns [ Dec, *, ⊤ ]
```

The keys that correspond to the instance head `Dec {lzero} ⊤` are
`Dec, *, ⊤`. We now know how to store our instances in the
discrimination trie!
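Taken literally, “storing instances in a trie” could look like the following sketch, with instance names as values. This is a hypothetical, naive representation, not what Agda ends up using, and the exact-match `lookupT` glosses over treating `FlexK` as a wildcard:

```haskell
import qualified Data.Map.Strict as Map

data Key = RigidK String Int | FlexK
  deriving (Eq, Ord, Show)

-- A trie from key sequences to sets of instance names.
data Trie = Trie [String] (Map.Map Key Trie)
  deriving Show

emptyT :: Trie
emptyT = Trie [] Map.empty

insertT :: [Key] -> String -> Trie -> Trie
insertT [] v (Trie vs bs)       = Trie (v : vs) bs
insertT (k : ks) v (Trie vs bs) =
  Trie vs (Map.alter (Just . insertT ks v . maybe emptyT id) k bs)

-- Exact lookup: follow each key in turn. A real lookup would also have
-- to explore wildcards on both sides.
lookupT :: [Key] -> Trie -> [String]
lookupT [] (Trie vs _)       = vs
lookupT (k : ks) (Trie _ bs) = maybe [] (lookupT ks) (Map.lookup k bs)
```

For example, inserting `DecYes` under `[RigidK "Dec" 2, FlexK, RigidK "Top" 0]` and looking up the same key sequence returns `["DecYes"]`.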

## Trouble in paradise

Now consider *looking up* instances, to solve a goal. For example, we
might be trying to solve `Dec (4 ≡ 5)`, which arises in the example
involving decidable equality above. We can turn this term into a key:

```
termToKeys ⌜ Dec (4 ≡ 5) ⌝ =
  [ Dec                              -- the class itself
  , *                                -- the universe level
  , _≡_                              -- the identity type
  , *                                -- another level
  , Nat                              -- the type of 4 and 5
  , suc, suc, suc, suc, zero         -- the number 4
  , suc, suc, suc, suc, suc, zero ]  -- the number 5
```

This utterly sucks, and not just for the obvious reason. To turn the
term into a key, we had to look at *every subterm*, and *reduce* all of
them, even though we know that there’s only one possible instance by the
time we get to `Nat`! When the numbers are 4 and 5, this isn’t an issue,
but imagine how embarrassing it would be *not* to solve the goal
`Dec (0 ≡ (2 ^ 2048))` because we got stuck generating a key with
eleventy quadspillion `suc`s?

There’s also another, hidden problem. The keys are bogged down with a lot more wildcards than we could possibly ever need. For example:

```
instance
  extensionality-uncurry
    : ∀ {ℓ ℓ' ℓ''} {A : Type ℓ} {B : A → Type ℓ'} {C : Type ℓ''}
    → Extensionality (Σ A B → C)
  -- key: Extensionality, *, Π, Σ, *, *, *, *, *
```

Taking these keys literally would lead to a pretty significant amount of
pointer-chasing when we’re following a deep sequence of `FlexK`s,
though this pales in comparison to the amount of extra normalisation we
would have to do.

I thought about these two problems for a literal entire week, and only
arrived at a solution on a day I accidentally took two doses of my ADHD
medication. Very elegantly, the solution to *not storing eighty
wildcards* turns out to be precisely the information we need to *avoid
normalising literally everything*!

# Case trees

In Agda, pattern-matching functions are efficiently represented using
*case trees*. I promise this digression is going somewhere. Consider
this slightly silly function:

```
data Foo : Type where
  foo bar baz : Foo

nat-to-foo : Nat → Foo
nat-to-foo 0 = foo
nat-to-foo 1 = bar
nat-to-foo _ = baz
```

Internally, we represent `nat-to-foo` like this. The `@0` is a de Bruijn
level: in this context, it means the first argument to `nat-to-foo`. We
have a tree indicating exactly what we have to evaluate:

```
nat-to-foo = case @0 of
  zero → foo
  suc → case @0 of
    zero → bar
    _ → baz
```

A crucial detail here is that we case on the first argument *twice*. The
inner case tree is oblivious to its enclosing (evaluation) context.
Therefore, *if* the `suc`

pattern matches, the first argument is
*replaced with the predecessor*. This is a bit technical, but it has a
clear interpretation in terms of contexts! If we have something like

```
data Bar : Type where
  mk : Nat → Nat → Bar
```

then, in the function

```
use : Bar → …
use x = ?
```

matching on `x` will *replace* the variable `x` in the context by two
new variables, corresponding to the arguments of the `mk` constructor:

```
use (mk n m) = ?
```

This procedure, *replacing* a matched term by its subterms, will be
pretty important in understanding the case trees below: while nested
case trees may *look like* they’re matching on the same argument
repeatedly, or matching on something that doesn’t exist, if you trace
the evolution of the “argument list” through the case trees, it’s
possible to work out exactly what subterm is being matched on.
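As a sketch of this evaluation discipline, here is a tiny case-tree evaluator in which a matched argument is literally replaced by its subterms. The names are illustrative, and real Agda case trees have many more constructors:

```haskell
import qualified Data.Map.Strict as Map

-- First-order constructor values, e.g. suc (suc zero).
data Val = Con String [Val]

data CaseTree
  = Done String  -- a right-hand side
  | Case Int (Map.Map String CaseTree) (Maybe CaseTree)
    -- ^ scrutinee position, branches, optional catch-all

eval :: [Val] -> CaseTree -> Maybe String
eval _    (Done r) = Just r
eval args (Case i bs def) =
  let (before, Con c fields : after) = splitAt i args
  in case Map.lookup c bs of
       -- on a match, the scrutinee is *replaced* by its fields
       Just br -> eval (before ++ fields ++ after) br
       Nothing -> def >>= eval args
```

Encoding `nat-to-foo` with `zero`/`suc` constructors, the inner `Case 0` really does inspect position 0 again: after the `suc` branch fires, position 0 holds the predecessor.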

## Case trees for discrimination

While the humble case tree was originally conceived to represent pattern
matching, it’s actually *exactly* the solution to our problems with the
literal implementation of discrimination trees as tries.

We might attempt a more literal solution (and I did think about it):
“run-length encoding” the sequences of wildcards in the trie
representation. When *looking up* an instance, the number of wildcards
to skip is exactly the index of the next subterm that we’ll have to turn
into a key. Anything that gets skipped doesn’t have to be forced!

But assuming that we’ll force everything, and marking what needs to be
skipped, is empirically backwards: in practice, *very few* term
positions need to be forced to narrow to a single potential instance. So
we choose instead to record the index of the next subterm *to inspect*
— put another way, the next *variable to case on!*

So what do these “instance-case” trees actually look like in practice?
We can ask Agda to dump them by passing the `-vtc.instance.discrim:60`
verbosity flag. Using the `Dec` and `Discrete` instances we have
written so far, the instance table looks like this.

```
case 0 of
  Dec² → case 1 of
    ⊤⁰ → done {DecYes}
    ⊥⁰ → done {DecNo}
    Σ⁴ → case 4 of
      Const → done {DecProd}
    Pi → case 2 of
      ⊥⁰ → done {DecNot}
    _≡_⁴ → case 2 of
      Int⁰ → done {DiscreteInt}
      Nat⁰ → done {DiscreteNat}
      Σ⁴ → case 5 of
        Const → done {DiscreteProd}
```

Here, we can see a few leaks of the `Key` data type that we actually use
in Agda. First, it represents Π-types using a `PiK` key, printed as
`Pi`. Second, it has a special case for constant functions — that’s
`ConstK`, and it’s printed as `Const`. Oh, by the way, the high
verbosity number means that this flag will print quite a lot of
stuff — and, indeed, it’ll print *the entire instance table* for
*every single instance goal*. You were warned.

## Finding instances

So, armed with the case tree above, how are we to find the candidates
for, e.g., `Dec (4 ≡ ack 4 4)`? First, we start, once again, by putting
the term to be matched in a list. We’ll call this list the **spine**.

```
match [ ⌜ Dec {lzero} (_≡_ {lzero} {Nat} 4 (ack 4 4)) ⌝ ] (case 0 of ...)
```

We have to case on the first argument, the element with index zero on
the spine. Here, casing means calling our good friend `splitTermKey` to
find out which branch to take. Dutifully, it splits `Dec` into its
arguments — and we have a branch for `Dec`, so our next state is

```
match [ ⌜ lzero ⌝ , ⌜ _≡_ {lzero} {Nat} 4 (ack 4 4) ⌝ ] (case 1 of ...)
```

We now case on the *second* argument, the element with index 1; since
the identity type is data, this also succeeds, and we shift into

```
match [ ⌜ lzero ⌝ , ⌜ lzero ⌝ , ⌜ Nat ⌝ , ⌜ 4 ⌝ , ⌜ ack 4 4 ⌝ ] (case 2 of ...)
```

The next thing we case on has index 2, so it’s the third thing on the
list: it’s `Nat`, which is rigid and has *no* arguments. We shift to

```
match [ ⌜ lzero ⌝ , ⌜ lzero ⌝ , ⌜ 4 ⌝ , ⌜ ack 4 4 ⌝ ] (done {DiscreteNat})
```

but this is a done state! We know what the instance we’re looking for
is. Though, note the slight oddity that results from matched patterns
being replaced with their subterms: `Nat` is gone. Anyway, please note
that, while the literal reading of *turn a term into a key* would’ve
gotten stuck exploding `ack 4 4`, the case-tree representation doesn’t
touch it at all!

We’ve solved both the problem of having long paths of wildcards *and*
the problem of having to potentially force every subterm. Nice!

### Meandering around

This case-tree stuff is all well and good, but what about finding
instances that *aren’t* fully determined? What if our instance goal was,
for example, `Discrete ?0`, where we don’t know the type we’re
comparing? This is where discrimination tree lookup diverges from
pattern matching. While applying a function to a metavariable is just
stuck, in instance lookup, a metavariable means that we should take
*every possible opportunity*.

This is the job of the `explore` state in the lookup function. If, at
any point, we `case` on a metavariable, instead of getting stuck, we
pretend, for each pattern in turn, that the metavariable was actually
that all along! For example, *this* state from before

```
match [ ⌜ lzero ⌝ , ⌜ lzero ⌝ , ⌜ ?0 ⌝ , ⌜ x ⌝ , ⌜ y ⌝ ] (case 2 of ...)
```

would result, after a trip through `explore`, in the same results as
*all three of*

```
match [ ⌜ lzero ⌝ , ⌜ lzero ⌝ , ⌜ x ⌝ , ⌜ y ⌝ ]
  (done {DiscreteNat})

match [ ⌜ lzero ⌝ , ⌜ lzero ⌝ , ⌜ x ⌝ , ⌜ y ⌝ ]
  (done {DiscreteInt})

match [ ⌜ lzero ⌝ , ⌜ lzero ⌝ , ⌜ ?1 ⌝ , ⌜ ?2 ⌝ , ⌜ ?3 ⌝ , ⌜ ?4 ⌝ , ⌜ x ⌝ , ⌜ y ⌝ ]
  (done {DiscreteProd})
```

Note that in the first two cases, the metavariable was simply removed
from the spine, since `Nat` and `Int` both have arity zero. In the
third case, we had to invent four new metas to stand in its place,
since `Σ` has *four* arguments. Knowing how many placeholders we should
insert is the reason that we have the arity in `RigidK`.
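Putting the whole lookup together, below is a self-contained toy model of `match`. It is a sketch, not Agda's real code; in particular, I'm assuming the “and-also” tree is matched against the unmodified spine, and a metavariable in focus triggers the `explore` behaviour by inventing fresh metas for each branch's arguments:

```haskell
import qualified Data.Map.Strict as Map

-- Toy terms: a rigid head applied to arguments, or a metavariable.
data Term = Def String [Term] | Meta

data Key = RigidK String Int
  deriving (Eq, Ord, Show)

data DiscrimTree a
  = EmptyDT
  | DoneDT [a]
  | CaseDT Int (Map.Map Key (DiscrimTree a)) (DiscrimTree a)

match :: [Term] -> DiscrimTree a -> [a]
match _     EmptyDT     = []
match _     (DoneDT vs) = vs
match spine (CaseDT i bs els) =
  let (before, focus : after) = splitAt i spine
      proper = case focus of
        -- 'explore': pretend the meta was each rigid key in turn,
        -- splicing in fresh metas for that key's arguments
        Meta -> concat
          [ match (before ++ replicate ar Meta ++ after) br
          | (RigidK _ ar, br) <- Map.toList bs ]
        -- otherwise, take the matching branch (if any), replacing
        -- the focus by its subterms
        Def f args -> maybe [] (match (before ++ args ++ after))
          (Map.lookup (RigidK f (length args)) bs)
  in proper ++ match spine els  -- the "and also" tree always runs
```

On the `CUnit`/`CAny` tree from the next section, matching either `C ⊤` or `C ?0` against this model finds both instances.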

### True overlap

The example from the previous section, where the *goal* had a
metavariable, is one source of overlap. However, it’s quite benign: we
know all the metavariables that caused us to go `explore`ing, so we know
which metavariables need to be solved to make progress on the overlap.
However, there are cases where overlap is unresolvable at the level of
the discrimination tree:

```
postulate
  C : Type → Type

instance
  CUnit : C ⊤
  CAny  : ∀ {A} → C A
```

The `CUnit` instance wants to case on the first argument twice: its
case tree is

```
case 0 of
  C¹ → case 0 of
    ⊤⁰ → done {CUnit}
```

But `CAny` does *not* do any matching on the subterm of `C`. Its case
tree looks like

```
case 0 of
  C¹ → done {CAny}
```

How do we represent this? Well, in traditional pattern matching, we
*can’t*. A wildcard pattern is only matched if none of the patterns
above match, but, if we’re looking up `C ⊤`, we should look up
*both* instances: the `CAny` instance is *not* simply a “fallback”
after `C ⊤`; it really does overlap. To this effect, our case trees have
an optional “and also” subtree: we evaluate the matching case (if any)
*and also* that case tree, totally independently from the other cases.

However, for ease of skimming, I elected to print the “and also” case tree as though it were a wildcard pattern: it just scans better. Just keep in mind that, when Agda says

```
case 0 of
  C¹ → case 0 of
    ⊤⁰ → done {CUnit}
    * → done {CAny}
```

the `* → done {CAny}` branch will always be evaluated.

## Merging case trees

Where does true overlap come from? Merging case trees. It’s inevitable
that we would come to this — after all, you *can* import instances,
and you *can* import multiple modules, so we would have to deal with
merging the instance tables of multiple modules anyway. However, the
algorithm is actually pretty simple!

First, let me transcribe the actual data type for case trees, just so we’re on the same page.

```
data DiscrimTree a
  = EmptyDT
  -- ^ The empty discrimination tree.
  | DoneDT (Set a)
  -- ^ Succeed with a given set of values.
  | CaseDT
  -- ^ Do case analysis on a term.
    {-# UNPACK #-} !Int        -- ^ The variable to case on.
    (Map Key (DiscrimTree a))  -- ^ The proper branches.
    (DiscrimTree a)            -- ^ A further tree, which should always be explored.
  deriving (Generic, Eq, Show)
```

The easiest cases are merging the empty tree against anything, and
merging a `DoneDT` against (an empty tree or) another `DoneDT`.

```
mergeDT EmptyDT x = x
mergeDT (DoneDT s) x = case x of
  EmptyDT   -> DoneDT s
  DoneDT s' -> DoneDT (s <> s') -- set union
```

What if we’re merging a `DoneDT` against a `CaseDT`, like in the
`CAny` example above? That’s what the “and-also” branch, the third
field of `CaseDT`, is for.

```
  CaseDT i bs els -> CaseDT i bs (mergeDT (DoneDT s) els)
```

If we have a `CaseDT` on the left, things get really interesting. The
first two cases are symmetric:

```
mergeDT (CaseDT i bs els) x = case x of
  EmptyDT  -> CaseDT i bs els
  DoneDT s -> CaseDT i bs (mergeDT (DoneDT s) els)
```

If we’re merging a legitimate `case` against another, there are two
possibilities: if they case on the same thing, we can merge the
corresponding branches from each, and if they don’t, we make an
arbitrary choice of which should be on the outside, and which should be
pushed into the “and-also” branch.

```
  CaseDT j bs' els' -> case compare i j of
    EQ -> CaseDT j (Map.unionWith mergeDT bs bs') (mergeDT els els')
    LT -> CaseDT i bs (mergeDT els (CaseDT j bs' els'))
    GT -> CaseDT j bs' (mergeDT els' (CaseDT i bs els))
```

In our implementation, we keep the match against the numerically earliest variable on the outside.
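To see the “and-also” branch appear, we can run a self-contained version of the merge on the `CUnit`/`CAny` trees from earlier. This is a sketch: lists instead of `Set`, `String` keys instead of `QName`s, and only the cases shown above:

```haskell
import qualified Data.Map.Strict as Map

data Key = RigidK String Int
  deriving (Eq, Ord, Show)

data DiscrimTree a
  = EmptyDT
  | DoneDT [a]
  | CaseDT Int (Map.Map Key (DiscrimTree a)) (DiscrimTree a)
  deriving (Eq, Show)

mergeDT :: DiscrimTree a -> DiscrimTree a -> DiscrimTree a
mergeDT EmptyDT x = x
mergeDT (DoneDT s) x = case x of
  EmptyDT         -> DoneDT s
  DoneDT s'       -> DoneDT (s <> s')
  CaseDT i bs els -> CaseDT i bs (mergeDT (DoneDT s) els)
mergeDT (CaseDT i bs els) x = case x of
  EmptyDT           -> CaseDT i bs els
  DoneDT s          -> CaseDT i bs (mergeDT (DoneDT s) els)
  CaseDT j bs' els' -> case compare i j of
    EQ -> CaseDT j (Map.unionWith mergeDT bs bs') (mergeDT els els')
    LT -> CaseDT i bs (mergeDT els (CaseDT j bs' els'))
    GT -> CaseDT j bs' (mergeDT els' (CaseDT i bs els))
```

Merging `CUnit`'s tree with `CAny`'s hits the `EQ` case at the root, and then the `DoneDT`-against-`CaseDT` case under the `C¹` branch, which is exactly where `done {CAny}` gets pushed into the and-also slot.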

# The upshot

To justify all this complexity, we need some performance numbers. To this effect, I added a ton more profiling points to the implementation of instance search, and checked the 1Lab with and without discrimination trees. The result was a fairly modest, but consistent, improvement in type-checking performance. Overall, we gained 10 seconds:

| Profiling point | 2.6.4.3 | 2.6.5 |
|---|---|---|
| Total | 703,384ms | 693,122ms |
| Typing | 48,922ms | 50,900ms |
| Typing.InstanceSearch | 918ms | 852ms |
| Typing.InstanceSearch.FilterCandidates | 64,238ms | 34,063ms |
| Typing.InstanceSearch.InitialCandidates | 1,954ms | 7,546ms |
| Typing.OccursCheck | 44,405ms | 30,683ms |

Looking at the table above, what sticks out is that the time spent
discarding candidates was *halved*. We pay for this improvement by
spending more time computing the initial candidate set. While 7s is a
big number, the 1Lab has *4,695,879 instance goals*, so, on a per-goal
basis, we didn’t get that much slower. Interestingly, it seems that a
significant portion of the occurs checker’s runtime — the part of the
elaborator responsible for checking that metavariables do not appear in
their own solutions — was also being spent in instance filtering.

Is this work particularly novel? No, not exactly — but, even though
it’s only a small component of instance *search* as a whole, we can
finally say that Agda now matches the state of the art in… looking up
things in collections indexed by terms? Oh, well. Thanks for reading —
I’m happy to answer any questions about this work on Mastodon.