`hegg`

is a Haskell-native library providing fast e-graphs and equality
saturation, based on *egg: Fast and Extensible Equality
Saturation* and *Relational
E-matching*.

Suggested material on equality saturation and e-graphs for beginners

To get a feel for how we can use `hegg`

and do equality saturation in Haskell,
we’ll write a simple numeric *symbolic* manipulation library that can simplify expressions
according to a set of rewrite rules by leveraging equality saturation.

I hope to eventually write a better exposition that assumes less prior
knowledge which introduces first e-graphs-only workflows, and only then equality
saturation, from a `hegg`

user’s perspective. Until then, this rough tutorial
serves as an alternative.

# 1 Symbolic Maths in E-graphs

If you’ve never heard of symbolic mathematics you might get some intuition from
reading Let’s Program a Calculus Student first.
First, we define our symbolic maths language and enable it to be represented by
an e-graph using `hegg`

.

## 1.1 Syntax

We’ll start by defining the abstract syntax tree for our simple symbolic expressions:

```
data SymExpr = Const Double
| Symbol String
| SymExpr :+: SymExpr
| SymExpr :*: SymExpr
| SymExpr :/: SymExpr
6 :+:
infix 7 :*:, :/:
infix
e1 :: SymExpr
= (Symbol "x" :*: Const 2) :/: (Const 2) -- (x*2)/2 e1
```

You might notice that `(x*2)/2`

is the same as just `x`

. Our goal is to get
equality saturation to do that for us.

Our second step is to instance `Language`

for our `SymExpr`

## 1.2 Language

`Language`

is the required constraint on *expressions* that are to be
represented in e-graph and on which equality saturation can be run:

`type Language l = (Traversable l, ∀ a. Ord a => Ord (l a))`

To declare a `Language`

we must write the “base functor” of `SymExpr`

(i.e. use
a type parameter where the recursion points used to be in the original
`SymExpr`

), then instance `Traversable l`

, `∀ a. Ord a => Ord (l a)`

(we can do
it automatically through deriving), and write an `Analysis`

instance for it (see
next section).

```
data SymExpr a = Const Double
| Symbol String
| a :+: a
| a :*: a
| a :/: a
deriving (Eq, Ord, Show, Functor, Foldable, Traversable)
6 :+:
infix 7 :*:, :/: infix
```

Suggested reading on defining recursive data types in their parametrized version: Introduction To Recursion Schemes

If we now wanted to represent an expression, we’d write it in its fixed-point form

```
e1 :: Fix SymExpr
= Fix (Fix (Fix (Symbol "x") :*: Fix (Const 2)) :/: (Fix (Const 2))) -- (x*2)/2 e1
```

Then, we define an `Analysis`

for our `SymExpr`

.

## 1.3 Analysis

E-class analysis is first described in *egg: Fast and Extensible Equality
Saturation* as a way to make equality
saturation more *extensible*.

With it, we can attach *analysis data* from a semilattice to each e-class. More
can be read about e-class analysis in the `Data.Equality.Analsysis`

module and
in the paper.

We can easily define constant folding (`2+2`

being simplified to `4`

) through
an `Analysis`

instance.

An `Analysis`

is defined over a `domain`

and a `language`

. To define constant
folding, we’ll say the domain is `Maybe Double`

to attach a value of that type to
each e-class, where `Nothing`

indicates the e-class does not currently have a
constant value and `Just i`

means the e-class has constant value `i`

.

```
instance Analysis (Maybe Double) SymExpr
= ...
makeA = ...
joinA = ... modifyA
```

Let’s now understand and implement the three methods of the analysis instance we want.

`makeA`

is called when a new e-node is added to a new e-class, and constructs
for the new e-class a new value of the domain to be associated with it, always
by accessing the associated data of the node’s children data. Its type is `l domain -> domain`

, so note that the e-node’s children associated data is
directly available in place of the actual children.

We want to associate constant data to the e-class, so we must find if the
e-node has a constant value or otherwise return `Nothing`

:

```
makeA :: SymExpr (Maybe Double) -> Maybe Double
= \case
makeA Const x -> Just x
Symbol _ -> Nothing
:+: y -> (+) <$> x <*> y
x :*: y -> (*) <$> x <*> y
x :/: y -> (/) <$> x <*> y x
```

`joinA`

is called when e-classes c1 c2 are being merged into c. In this case, we
must join the e-class data from both classes to form the e-class data to be
associated with new e-class c. Its type is `domain -> domain -> domain`

. In our
case, to merge `Just _`

with `Nothing`

we simply take the `Just`

, and if we
merge two e-classes with a constant value (that is, both are `Just`

), then the
constant value is the same (or something went very wrong) and we just keep it.

```
joinA :: Maybe Double -> Maybe Double -> Maybe Double
Nothing (Just x) = Just x
joinA Just x) Nothing = Just x
joinA (Nothing Nothing = Nothing
joinA Just x) (Just y) = if x == y then Just x else error "ouch, that shouldn't have happened" joinA (
```

Finally, `modifyA`

describes how an e-class should (optionally) be modified
according to the e-class data and what new language expressions are to be added
to the e-class also w.r.t. the e-class data.
Its type is `ClassId -> EGraph domain l -> EGraph domain l`

, where the first argument
is the id of the class to modify (the class which prompted the modification),
and then receives and returns an e-graph, in which the e-class has been
modified. For our example, if the e-class has a constant value associated to
it, we want to create a new e-class with that constant value and merge it to
this e-class.

```
-- import Data.Equality.Graph.Lens ((^.), _class, _data)
modifyA :: ClassId -> EGraph (Maybe Double) SymExpr -> EGraph (Maybe Double) SymExpr
modifyA c egr= case egr ^._class c._data of
Nothing -> egr
Just i ->
let (c', egr') = represent (Fix (Const i)) egr
in snd $ merge c c' egr'
```

Modify is a bit trickier than the other methods, but it allows our e-graph to change based on the e-class analysis data. Note that the method is optional and there’s a default implementation for it which doesn’t change the e-class or adds anything to it. Analysis data can be otherwise used, e.g., to inform rewrite conditions.

By instancing this e-class analysis, all e-classes that have a constant value
associated to them will also have an e-node with a constant value. This is great
for our simple symbolic library because it means if we ever find e.g. an
expression equal to `3+1`

, we’ll also know it to be equal to `4`

, which is a
better result than `3+1`

(we’ve then successfully implemented constant folding).

If, otherwise, we didn’t want to use an analysis, we could specify the analysis
domain as `()`

which will make the analysis do nothing, because there’s an
instance polymorphic over `lang`

for `()`

that looks like this:

```
instance Analysis () lang where
= ()
makeA _ = () joinA _ _
```

# 2 Equality saturation on symbolic expressions

Equality saturation is defined as the function

```
equalitySaturation :: forall l. Language l
=> Fix l -- ^ Expression to run equality saturation on
-> [Rewrite l] -- ^ List of rewrite rules
-> CostFunction l -- ^ Cost function to extract the best equivalent representation
-> (Fix l, EGraph l) -- ^ Best equivalent expression and resulting e-graph
```

To recap, our goal is to reach `x`

starting from `(x*2)/2`

by means of equality
saturation.

We already have a starting expression, so we’re missing a list of rewrite rules
(`[Rewrite l]`

) and a cost function (`CostFunction`

).

## 2.1 Cost function

Picking up the easy one first:

`type CostFunction l cost = l cost -> cost`

A cost function is used to attribute a cost to representations in the e-graph and to extract the best one.
The first type parameter `l`

is the language we’re going to attribute a cost to, and
the second type parameter `cost`

is the type with which we will model cost. For
the cost function to be valid, `cost`

must instance `Ord`

.

We’ll say `Const`

s and `Symbol`

s are the cheapest and then in increasing cost we
have `:+:`

, `:*:`

and `:/:`

, and model cost with the `Int`

type.

```
cost :: CostFunction SymExpr Int
= \case
cost Const x -> 1
Symbol x -> 1
:+: c2 -> c1 + c2 + 2
c1 :*: c2 -> c1 + c2 + 3
c1 :/: c2 -> c1 + c2 + 4 c1
```

## 2.2 Rewrite rules

Rewrite rules are transformations applied to matching expressions represented in an e-graph.

We can write simple rewrite rules and conditional rewrite rules, but we’ll only look at the simple ones.

A simple rewrite is formed of its left hand side and right hand side. When the left hand side is matched in the e-graph, the right hand side is added to the e-class where the left hand side was found.

```
data Rewrite lang = Pattern lang := Pattern lang -- Simple rewrite rule
| Rewrite lang :| RewriteCondition lang -- Conditional rewrite rule
```

A `Pattern`

is basically an expression that might contain variables and which can be matched against actual expressions.

```
data Pattern lang
= NonVariablePattern (lang (Pattern lang))
| VariablePattern Var
```

A patterns is defined by its non-variable and variable parts, and can be
constructed directly or using the helper function `pat`

and using
`OverloadedStrings`

for the variables, where `pat`

is just a synonym for
`NonVariablePattern`

and a string literal `"abc"`

is turned into a `Pattern`

constructed with `VariablePattern`

.

We can then write the following very specific set of rewrite rules to simplify our simple symbolic expressions.

```
rewrites :: [Rewrite SymExpr]
=
rewrites "a" :*: "b") :/: "c") := pat ("a" :*: pat ("b" :/: "c"))
[ pat (pat ("x" :/: "x") := pat (Const 1)
, pat ("x" :*: (pat (Const 1))) := "x"
, pat ( ]
```

## 2.3 Equality saturation, finally

We can now run equality saturation on our expression!

`let expr = fst (equalitySaturation e1 rewrites cost)`

And upon printing we’d see `expr = Symbol "x"`

!

If we had instead `e2 = Fix (Fix (Fix (Symbol "x") :/: Fix (Symbol "x")) :+: (Fix (Const 3))) -- (x/x)+3`

, we’d get `expr = Const 4`

because of our rewrite
rules put together with our constant folding!

This was a first introduction which skipped over some details but that tried to walk through fundamental concepts for using e-graphs and equality saturation with this library.

The final code for this tutorial is available under `test/SimpleSym.hs`

A more complicated symbolic rewrite system which simplifies some derivatives and
integrals was written for the testsuite. It can be found at `test/Sym.hs`

.

This library could also be used not only for equality-saturation but also for
the equality-graphs and other equality-things (such as e-matching) available.
For example, using just the e-graphs from `Data.Equality.Graph`

to improve GHC’s
pattern match checker (https://gitlab.haskell.org/ghc/ghc/-/issues/19272).