Optimisation through Constraint Propagation

Posted on August 6, 2017
Word count: 1030

Constraint propagation is a new optimisation proposed for implementation in the Urn compiler¹. It is a variation on the idea of flow-sensitive typing in that it is not applied to increasing program safety, rather being used to improve speed.

Motivation

The Urn compiler is decently fast for being implemented in Lua. Currently, it manages to compile itself (and a decent chunk of the standard library) in about 4.5 seconds (when using LuaJIT; When using the lua.org interpreter, this time roughly doubles). Looking at a call-stack profile of the compiler, we notice a very interesting data point: about 11% of compiler runtime is spent in the (type) function.

There are two ways to fix this: Either we introduce a type system (which is insanely hard to do for a language as dynamic as Urn - or Lisp in general) or we reduce the number of calls to (type) by means of optimisation. Our current plan is to do the latter.

How

The proposed solution is to collect all the branches that the program has taken to end up in the state it currently is. Thus, every branch grows the set of “constraints” - the predicates which have been invoked to get the program here.

Most useful predicates involve a variable: Checking if it is or isn’t nil, if is positive or negative, even or odd, a list or a string, and etc. However, when talking about a single variable, this test only has to be performed once (in general - mutating the variable invalidates the set of collected constraints), and their truthiness can be kept, by the compiler, for later use.

As an example, consider the following code. It has three branches, all of which imply something different about the type of the variable x.

(cond
  [(list? x)]    ; first case
  [(string? x)]  ; second case
  [(number? x)]) ; third case

If, in the first case, the program then evaluated (car x), it’d end up doing a redundant type check. (car), is, in the standard library, implemented like so:

(defun car (x)
  (assert-type! x list)
  (.> x 0))

assert-type! is merely a macro to make checking the types of arguments more convenient. Let’s make the example of branching code a bit more complicated by making it take and print the car of the list.

(cond
  [(list? x)
   (print! (car x))])
  ; other branches elided for clarity

To see how constraint propagation would aid the runtime performance of this code, let’s play optimiser for a bit, and see what this code would end up looking like at each step.

First, (car x) is inlined.

(cond
  [(list? x)
   (print! (progn (assert-type! x list)
                  (.> x 0)))])

assert-type! is expanded, and the problem becomes apparent: the type of x is being computed twice!

(cond
  [(list? x)
   (print! (progn (if (! (list? x))
                    (error! "the argument x is not a list"))
                  (.> x 0)))])

If the compiler had constraint propagation (and the associated code motions), this code could be simplified further.

(cond
  [(list? x)
   (print! (.> x 0))])

Seeing as we already know that (list? x) is true, we don’t need to test anymore, and the conditional can be entirely eliminated. Figuring out (! (list? x)) from (list? x) is entirely trivial constant folding (the compiler already does it)

This code is optimal. The (list? x) test can’t be eliminated because nothing else is known about x. If its value were statically known, the compiler could eliminate the branch and invocation of (car x) completely by constant propagation and folding ((car) is, type assertion notwithstanding, a pure function - it returns the same results for the same inputs. Thus, it is safe to execute at compile time)

How, exactly

In this section I’m going to outline a very simple implementation of the constraint propagation algorithm to be employed in the Urn compiler. It’ll work on a simple Lisp with no quoting or macros (thus, basically the lambda calculus).

(lambda (var1 var2) exp) ; λ-abstraction
(foo bar baz) ; procedure application
var ; variable reference
(list x y z) ; list
t, nil ; boolean
(cond [t1 b1] [t2 b2]) ; conditional

The language has very simple semantics. It has three kinds of values (closures, lists and booleans), and only a couple reduction rules. The evaluation rules are presented as an interpretation function (in Urn, not the language itself).

(defun interpret (x env)
  (case x
    [(lambda ?params . ?body)
     `(:closure ,params ,body ,(copy env))] ; 1
    [(list . ?xs)
     (map (cut interpret <> env) xs)] ; 2
    [t true] [nil false] ; 3
    [(cond . ?alts) ; 4
     (interpret
       (block (map (lambda (alt)
                     (when (interpret (car alt) env)
                       (break (cdr alt))))))
       env)]
    [(?fn . ?args)
     (case (eval fn env)
       [(:closure ?params ?body ?cl-env) ; 5
        (map (lambda (a k)
               (.<! cl-env (symbol->string a) (interpret k env)))
             params args)
        (last (map (cut interpret <> env) body))]
       [_ (error! $"not a procedure: ${fn}")])]
    [else (.> env (symbol->string x))]))

In the case the expression currently being evaluated is a lambda, we make a copy of the current environment and store it in a closure.
If a list is being evaluated, we recursively evaluate each sub-expression and store all of them in a list.
If a boolean is being interpreted, they’re mapped to the respective values in the host language.
If a conditional is being evaluated, each test is performed in order, and we abort to interpret with the corresponding body.
When evaluating a procedure application, the procedure to apply is inspected: If it is a closure, we evaluate all the arguments, bind them along with the closure environment, and interpret the body. If not, an error is thrown.

Collecting constraints in a language as simple as this is fairly easy, so here’s an implementation.

(defun collect-constraints (expr (constrs '()))
  (case expr
    [(lambda ?params . ?body)
     `(:constraints (lambda ,params
                      ,@(map (cut collect-constraints <> constrs) body))
                    ,constrs)]

Lambda expressions incur no additional constraints, so the inner expressions (namely, the body) receive the old set. The same is true for lists:

    [(list . ?xs)
     `(:constraints (list ,@(map (cut collect-constraints <> constrs) xs))
                    ,constrs)]

Booleans are simpler:

    [t `(:constraints ,'t ,constrs)]
    [nil `(:constraints ,'nil ,constrs)]

Since there are no sub-expressions to go through, we only associate the constraints with the boolean values.

Conditionals are where the real work happens. For each case, we add that case’s test as a constraint in its body.

    [(cond . ?alts)
     `(:constraints
         (cond
           ,@(map (lambda (x)
                    `(,(collect-constraints (car x) constrs)
                      ,(collect-constraints (cadr x) (cons (car x) constrs))))
                  alts))
         ,constrs)]

Applications are as simple as lists. Note that we make no distinction between valid applications and invalid ones, and just tag both.

    [(?fn . ?args)
     `(:constraints
        (,(collect-constraints fn constrs)
         ,@(map (cut collect-constraints <> constrs)
                args))
        ,constrs)]

References are also straightforward:

    [else `(:constraints ,expr ,constrs)]))

That’s it! Now, this information can be exploited to select a case branch at compile time, and eliminate the overhead of performing the test again.

This is really easy to do in a compiler that already has constant folding of alternatives. All we have to do is associate constraints to truthy values. For instance:

(defun fold-on-constraints (x)
  (case x
    [((:constraints ?e ?x)
      :when (known? e x))
     't]
    [else x]))

That’s it! We check if the expression is in the set of known constraints, and if so, reduce it to true. Then, the constant folding code will take care of eliminating the redundant branches.

When

This is a really complicated question. The Urn core language, unfortunately, is a tad more complicated, as is the existing optimiser. Collecting constraints and eliminating tests would be in completely different parts of the compiler.

There is also a series of code motions that need to be in place for constraints to be propagated optimally, especially when panic edges are involved. Fortunately, these are all simple to implement, but it’s still a whole lot of work.

I don’t feel confident setting a specific timeframe for this, but I will post more blags on this topic. It’s fascinating (for me, at least) and will hopefully make the compiler faster!

The relevant merge request can be found here.↩︎