Equality Saturation in Haskell, a tutorial

Aug 23, 2022

A first tutorial for equality saturation with hegg, a library of e-graphs and fast equality saturation written in Haskell. This tutorial implements a symbolic manipulation library using hegg.

hegg is a Haskell-native library providing fast e-graphs and equality saturation, based on egg: Fast and Extensible Equality Saturation and Relational E-matching.

Suggested material on equality saturation and e-graphs for beginners

To get a feel for how we can use hegg and do equality saturation in Haskell, we’ll write a simple numeric symbolic manipulation library that can simplify expressions according to a set of rewrite rules by leveraging equality saturation.

I hope to eventually write a better exposition that assumes less prior knowledge which introduces first e-graphs-only workflows, and only then equality saturation, from a hegg user’s perspective. Until then, this rough tutorial serves as an alternative.

1 Symbolic Maths in E-graphs

If you’ve never heard of symbolic mathematics you might get some intuition from reading Let’s Program a Calculus Student first. First, we define our symbolic maths language and enable it to be represented by an e-graph using hegg.

1.1 Syntax

We’ll start by defining the abstract syntax tree for our simple symbolic expressions:

data SymExpr = Const Double
             | Symbol String
             | SymExpr :+: SymExpr
             | SymExpr :*: SymExpr
             | SymExpr :/: SymExpr
infix 6 :+:
infix 7 :*:, :/:

e1 :: SymExpr
e1 = (Symbol "x" :*: Const 2) :/: (Const 2) -- (x*2)/2

You might notice that (x*2)/2 is the same as just x. Our goal is to get equality saturation to do that for us.

Our second step is to instance Language for our SymExpr

1.2 Language

Language is the required constraint on expressions that are to be represented in e-graph and on which equality saturation can be run:

type Language l = (Traversable l, ∀ a. Ord a => Ord (l a))

To declare a Language we must write the “base functor” of SymExpr (i.e. use a type parameter where the recursion points used to be in the original SymExpr), then instance Traversable l, ∀ a. Ord a => Ord (l a) (we can do it automatically through deriving), and write an Analysis instance for it (see next section).

data SymExpr a = Const Double
               | Symbol String
               | a :+: a
               | a :*: a
               | a :/: a
               deriving (Eq, Ord, Show, Functor, Foldable, Traversable)
infix 6 :+:
infix 7 :*:, :/:

Suggested reading on defining recursive data types in their parametrized version: Introduction To Recursion Schemes

If we now wanted to represent an expression, we’d write it in its fixed-point form

e1 :: Fix SymExpr
e1 = Fix (Fix (Fix (Symbol "x") :*: Fix (Const 2)) :/: (Fix (Const 2))) -- (x*2)/2

Then, we define an Analysis for our SymExpr.

1.3 Analysis

E-class analysis is first described in egg: Fast and Extensible Equality Saturation as a way to make equality saturation more extensible.

With it, we can attach analysis data from a semilattice to each e-class. More can be read about e-class analysis in the Data.Equality.Analsysis module and in the paper.

We can easily define constant folding (2+2 being simplified to 4) through an Analysis instance.

An Analysis is defined over a domain and a language. To define constant folding, we’ll say the domain is Maybe Double to attach a value of that type to each e-class, where Nothing indicates the e-class does not currently have a constant value and Just i means the e-class has constant value i.

instance Analysis (Maybe Double) SymExpr
  makeA = ...
  joinA = ...
  modifyA = ...

Let’s now understand and implement the three methods of the analysis instance we want.

makeA is called when a new e-node is added to a new e-class, and constructs for the new e-class a new value of the domain to be associated with it, always by accessing the associated data of the node’s children data. Its type is l domain -> domain, so note that the e-node’s children associated data is directly available in place of the actual children.

We want to associate constant data to the e-class, so we must find if the e-node has a constant value or otherwise return Nothing:

makeA :: SymExpr (Maybe Double) -> Maybe Double
makeA = \case
  Const x -> Just x
  Symbol _ -> Nothing
  x :+: y -> (+) <$> x <*> y
  x :*: y -> (*) <$> x <*> y
  x :/: y -> (/) <$> x <*> y

joinA is called when e-classes c1 c2 are being merged into c. In this case, we must join the e-class data from both classes to form the e-class data to be associated with new e-class c. Its type is domain -> domain -> domain. In our case, to merge Just _ with Nothing we simply take the Just, and if we merge two e-classes with a constant value (that is, both are Just), then the constant value is the same (or something went very wrong) and we just keep it.

joinA :: Maybe Double -> Maybe Double -> Maybe Double
joinA Nothing (Just x) = Just x
joinA (Just x) Nothing = Just x
joinA Nothing Nothing  = Nothing
joinA (Just x) (Just y) = if x == y then Just x else error "ouch, that shouldn't have happened"

Finally, modifyA describes how an e-class should (optionally) be modified according to the e-class data and what new language expressions are to be added to the e-class also w.r.t. the e-class data. Its type is ClassId -> EGraph domain l -> EGraph domain l, where the first argument is the id of the class to modify (the class which prompted the modification), and then receives and returns an e-graph, in which the e-class has been modified. For our example, if the e-class has a constant value associated to it, we want to create a new e-class with that constant value and merge it to this e-class.

-- import Data.Equality.Graph.Lens ((^.), _class, _data)
modifyA :: ClassId -> EGraph (Maybe Double) SymExpr -> EGraph (Maybe Double) SymExpr
modifyA c egr
    = case egr ^._class c._data of
        Nothing -> egr
        Just i ->
          let (c', egr') = represent (Fix (Const i)) egr
           in snd $ merge c c' egr'

Modify is a bit trickier than the other methods, but it allows our e-graph to change based on the e-class analysis data. Note that the method is optional and there’s a default implementation for it which doesn’t change the e-class or adds anything to it. Analysis data can be otherwise used, e.g., to inform rewrite conditions.

By instancing this e-class analysis, all e-classes that have a constant value associated to them will also have an e-node with a constant value. This is great for our simple symbolic library because it means if we ever find e.g. an expression equal to 3+1, we’ll also know it to be equal to 4, which is a better result than 3+1 (we’ve then successfully implemented constant folding).

If, otherwise, we didn’t want to use an analysis, we could specify the analysis domain as () which will make the analysis do nothing, because there’s an instance polymorphic over lang for () that looks like this:

instance Analysis () lang where
  makeA _ = ()
  joinA _ _ = ()

2 Equality saturation on symbolic expressions

Equality saturation is defined as the function

equalitySaturation :: forall l. Language l
                   => Fix l             -- ^ Expression to run equality saturation on
                   -> [Rewrite l]       -- ^ List of rewrite rules
                   -> CostFunction l    -- ^ Cost function to extract the best equivalent representation
                   -> (Fix l, EGraph l) -- ^ Best equivalent expression and resulting e-graph

To recap, our goal is to reach x starting from (x*2)/2 by means of equality saturation.

We already have a starting expression, so we’re missing a list of rewrite rules ([Rewrite l]) and a cost function (CostFunction).

2.1 Cost function

Picking up the easy one first:

type CostFunction l cost = l cost -> cost

A cost function is used to attribute a cost to representations in the e-graph and to extract the best one. The first type parameter l is the language we’re going to attribute a cost to, and the second type parameter cost is the type with which we will model cost. For the cost function to be valid, cost must instance Ord.

We’ll say Consts and Symbols are the cheapest and then in increasing cost we have :+:, :*: and :/:, and model cost with the Int type.

cost :: CostFunction SymExpr Int
cost = \case
  Const  x -> 1
  Symbol x -> 1
  c1 :+: c2 -> c1 + c2 + 2
  c1 :*: c2 -> c1 + c2 + 3
  c1 :/: c2 -> c1 + c2 + 4

2.2 Rewrite rules

Rewrite rules are transformations applied to matching expressions represented in an e-graph.

We can write simple rewrite rules and conditional rewrite rules, but we’ll only look at the simple ones.

A simple rewrite is formed of its left hand side and right hand side. When the left hand side is matched in the e-graph, the right hand side is added to the e-class where the left hand side was found.

data Rewrite lang = Pattern lang := Pattern lang          -- Simple rewrite rule
                  | Rewrite lang :| RewriteCondition lang -- Conditional rewrite rule

A Pattern is basically an expression that might contain variables and which can be matched against actual expressions.

data Pattern lang
    = NonVariablePattern (lang (Pattern lang))
    | VariablePattern Var

A patterns is defined by its non-variable and variable parts, and can be constructed directly or using the helper function pat and using OverloadedStrings for the variables, where pat is just a synonym for NonVariablePattern and a string literal "abc" is turned into a Pattern constructed with VariablePattern.

We can then write the following very specific set of rewrite rules to simplify our simple symbolic expressions.

rewrites :: [Rewrite SymExpr]
rewrites =
  [ pat (pat ("a" :*: "b") :/: "c") := pat ("a" :*: pat ("b" :/: "c"))
  , pat ("x" :/: "x")               := pat (Const 1)
  , pat ("x" :*: (pat (Const 1)))   := "x"

2.3 Equality saturation, finally

We can now run equality saturation on our expression!

let expr = fst (equalitySaturation e1 rewrites cost)

And upon printing we’d see expr = Symbol "x"!

If we had instead e2 = Fix (Fix (Fix (Symbol "x") :/: Fix (Symbol "x")) :+: (Fix (Const 3))) -- (x/x)+3, we’d get expr = Const 4 because of our rewrite rules put together with our constant folding!

This was a first introduction which skipped over some details but that tried to walk through fundamental concepts for using e-graphs and equality saturation with this library.

The final code for this tutorial is available under test/SimpleSym.hs

A more complicated symbolic rewrite system which simplifies some derivatives and integrals was written for the testsuite. It can be found at test/Sym.hs.

This library could also be used not only for equality-saturation but also for the equality-graphs and other equality-things (such as e-matching) available. For example, using just the e-graphs from Data.Equality.Graph to improve GHC’s pattern match checker (https://gitlab.haskell.org/ghc/ghc/-/issues/19272).