# Autodiff engine

KeOps provides a simple **Automatic Differentiation** (AD) engine for generic formulas.
This feature can be used **seamlessly** through the `Grad`

instruction
or the **PyTorch backend**: users don’t have to understand backpropagation
to enjoy our “free” gradients.
Nevertheless, for the sake of completeness, here is
a short introduction to the inner workings of KeOps.

## Backprop 101

### Gradient of a vector valued function

Let \(F:\mathbb R^n \to \mathbb R^d\) be a smooth function.
Around any given point \(x \in \mathbb R^n\), its variations are encoded in a linear application \(\text{d}F(x):\mathbb R^n \to \mathbb R^d\) called the **differential**
of \(F\): for all variation \(\delta x \in \mathbb R^n\) of \(x\),

Going further, we can define the **gradient** of \(F\) at \(x\)
as the adjoint of \(\text{d}F(x)\): it is the unique application
\(\partial F(x)=\text{d}F(x)^*:\mathbb R^d \to \mathbb R^n\) such that
for all vector \(e \in \mathbb R^d\) and
variation \(\delta x \in \mathbb R^n\) of \(x\), we have:

where \(\langle\,\cdot\,,\,\cdot\,\rangle\) denotes the standard scalar product.

When \(F\) is scalar-valued (i.e. when \(d=1\)),
the gradient is a linear application
from \(\mathbb{R}\) to \(\mathbb{R}^n\):
it is best understood as the vector of **partial derivatives** of \(F\).
In the general case, the matrix of the gradient in the canonical basis
is given by the **transpose of the Jacobian** of \(F\).

### Reverse mode AD = backpropagation = chain rule

Now, let’s assume that the function \(F:\mathbb R^n \to \mathbb R^d\) can be written as a composition \(F =F_p \circ \cdots \circ F_2 \circ F_1\) of \(p\) functions \(F_i:E_{i-1} \to E_{i}\), where \(E_i=\mathbb{R}^{d_i}\). With these notations \(d_0 = n\) and \(d_p = d\).

Evaluating the gradient of \(F\) with the backpropagation algorithm requires:

A

**Forward pass**to evaluate the functions\[\begin{split}\begin{array}{ccccl} F_i & : & E_{i-1} & \to & E_{i} \\ & & x & \mapsto & F_i(x) \end{array}\end{split}\]and thus compute the

**value**\(F(x)\) :A

**Backward pass**to evaluate the (adjoints of the) differentials\[\begin{split}\begin{array}{ccccl} \partial F_i & : & E_{i-1}\times E_{i} & \to & E_{i-1} \\ & & (x_{i-1},x_i^*) & \mapsto & [\text{d} F_i^*(x_{i-1})](x_i^*) = x_{i-1}^* \end{array}\end{split}\]and compute the

**gradient**of \(F\) at location \(x\), applied to an arbitrary vector \(e\) is the space of outputs:

This method relies on the chain-rule, as

When \(F\) is scalar-valued (i.e. \(d=1\)), this algorithm allows us to compute the vector of partial derivatives

with a mere **forward-backward pass** through the computational graph of \(F\)…
which is much **cheaper** than the naive evaluation of \(n\) finite differences of \(F\).

## The KeOps generic engine

Backpropagation has become the standard way of computing the gradients of arbitrary “Loss” functions in imaging and machine learning. Crucially, any backprop engine should be able to:

**Link together**the*forward*operations \(F_i\) with their*backward*counterparts \(\partial F_i\).**Store in memory**the intermediate results \(x_0,\dots,x_p\) before using them in the backward pass.

### The `Grad`

operator

At a low level, KeOps allows us to perform these tasks with the `Grad`

instruction:
given a formula \(F\), the symbolic expression `Grad(F, V, E)`

denotes the gradient \([\partial_V F(x)] (E)\) with respect to the variable \(V\) evaluated on the input variable \(E\).

If `V`

is a variable place-holder that appears in the expression of `F`

and if `E`

has the same dimension and category as `F`

, `Grad(F,V,E)`

can be fed to KeOps just like any other symbolic expression.
The resulting output will have the same dimension and category as the variable `V`

,
and can be used directly for gradient descent or **higher-order differentiation**:
operations such as `Grad(Grad(..,..,..),..,..)`

are fully supported.

### User interfaces

As evidenced by this example, the simple `Grad`

syntax can relieve us from the burden of differentiating symbolic formulas by hand.

Going further, our Python interface is fully compatible with the PyTorch library:
feel free to use the output of a `pykeops.torch`

routine **just like any other differentiable tensor**!
Thanks to the flexibility of the `torch.autograd`

engine,
end-to-end automatic differentiation is at hand:
see this example or this example for an introduction.

## An example

Coming back to our previous example where the formula

```
F = "Sum_Reduction(Square(Pm(0,1) - Vj(3,1)) * Exp(Vi(1,3) + Vj(2,3)), 1)"
```

was discussed, the symbolic expression

```
[grad_a F] = "Grad( Sum_Reduction(Square(Pm(0,1) - Vj(3,1)) * Exp(Vi(1,3) + Vj(2,3)), 1),
Vj(3,1), Vi(4,3) )"
```

allows us to compute the gradient of \(F\) with respect to \((a_j) \in \mathbb R^N\) (`= Vj(3,1)`

), applied to an arbitrary test vector \(e\in\mathbb R^{M\times 3}\) given as a fifth input `Vi(4,3)`

:

With aliases, this computation simply reads:

```
p=Pm(0,1), x=Vi(1,3), y=Vj(2,3), a=Vj(3,1), e=Vi(4,3)
[grad_a F](e) = "Grad( Sum_Reduction(Square(p-a)*Exp(x+y), 1), a, e)"
```