Backpropagation

Last but not least, KeOps fully supports automatic differentiation. Most of the magic required is implemented by the F::DiffT attributes of KeOps formulas and reductions, as discussed in previous pages.
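
For instance, here is a minimal sketch of what this looks like from the user's side, assuming that pykeops and torch are installed; the tensor sizes and the Gaussian kernel formula are illustrative choices, not part of the library's API:

```python
import torch
from pykeops.torch import LazyTensor

x = torch.randn(1000, 3, requires_grad=True)  # M points in R^3
y = torch.randn(2000, 3)                      # N points in R^3

x_i = LazyTensor(x[:, None, :])  # symbolic (M, 1, 3) tensor
y_j = LazyTensor(y[None, :, :])  # symbolic (1, N, 3) tensor

K_ij = (-((x_i - y_j) ** 2).sum(-1)).exp()  # symbolic Gaussian kernel matrix
a_i = K_ij.sum(dim=1)                       # Sum reduction -> genuine (M, 1) torch tensor

# The reduction's output behaves like any other torch tensor: its backward
# pass is handled by the formulas derived in the rest of this section.
[grad_x] = torch.autograd.grad(a_i.sum(), [x])
```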

Backprop through a Sum reduction

To implement the PyTorch backward of the KeOps Genred operator, we simply have to remember that if $(g_i) \in \mathbb{R}^{\mathrm{M}\times\mathrm{E}}$ is a “gradient to backpropagate” with respect to the output $(a_i) \in \mathbb{R}^{\mathrm{M}\times\mathrm{E}}$ of a Genred call with a Sum reduction, then for all variations $(\delta p, \delta x_i, \delta y_j)$ of the parameters, i- and j-variables, we can write, at order 1:

$$
\Big\langle\, \sum_{j=1}^{\mathrm{N}} \big[ F(p+\delta p,\, x_i+\delta x_i,\, y_j+\delta y_j) \,-\, F(p, x_i, y_j) \big]\,,\; g_i \,\Big\rangle_{\mathbb{R}^{\mathrm{M}\times\mathrm{E}}}
~=~
\sum_{i=1}^{\mathrm{M}} \sum_{j=1}^{\mathrm{N}}
\Big(
\big\langle \partial_{p} F(\cdots)^\top \cdot g_i \,,\, \delta p \big\rangle
\,+\,
\big\langle \partial_{x_i} F(\cdots)^\top \cdot g_i \,,\, \delta x_i \big\rangle
\,+\,
\big\langle \partial_{y_j} F(\cdots)^\top \cdot g_i \,,\, \delta y_j \big\rangle
\Big).
$$

Consequently, performing the appropriate permutations of sums:

$$
\begin{aligned}
\partial_{x_i} \Big[ \sum_{j=1}^{\mathrm{N}} F(p, x_i, y_j) \Big]^\top (g_i)
~&=~ \sum_{j=1}^{\mathrm{N}} \Big( \partial_{x_i} \big[ F(p, x_i, y_j) \big]^\top \cdot g_i \Big), \\
\partial_{y_j} \Big[ \sum_{j=1}^{\mathrm{N}} F(p, x_i, y_j) \Big]^\top (g_i)
~&=~ \sum_{i=1}^{\mathrm{M}} \Big( \partial_{y_j} \big[ F(p, x_i, y_j) \big]^\top \cdot g_i \Big), \\
\partial_{p} \Big[ \sum_{j=1}^{\mathrm{N}} F(p, x_i, y_j) \Big]^\top (g_i)
~&=~ \sum_{i=1}^{\mathrm{M}} \sum_{j=1}^{\mathrm{N}} \Big( \partial_{p} \big[ F(p, x_i, y_j) \big]^\top \cdot g_i \Big).
\end{aligned}
$$
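
As a sanity check, the sketch below verifies these three identities numerically with plain, dense PyTorch tensors, for the illustrative formula $F(p, x_i, y_j) = \exp(-p\,\|x_i - y_j\|^2)$ with scalar output ($\mathrm{E} = 1$); the variable names are ours, not KeOps':

```python
import torch

M, N, D = 5, 7, 3
p = torch.randn(1).exp().requires_grad_()  # positive parameter
x = torch.randn(M, D, requires_grad=True)  # i-variables
y = torch.randn(N, D, requires_grad=True)  # j-variables
g = torch.randn(M, 1)                      # gradient to backpropagate, (g_i)

D_ij = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # |x_i - y_j|^2, (M, N)
F_ij = (-p * D_ij).exp()                               # F(p, x_i, y_j), (M, N)
a = F_ij.sum(dim=1, keepdim=True)                      # Sum reduction, (M, 1)

# Reference gradients, as computed by torch.autograd:
grad_p, grad_x, grad_y = torch.autograd.grad(a, [p, x, y], grad_outputs=g)

# Hand-derived gradients, obtained by permuting the sums as above:
dF_dx = -2 * p * (x[:, None, :] - y[None, :, :]) * F_ij[:, :, None]  # d_{x_i} F, (M, N, D)
my_grad_x = (dF_dx * g[:, None, :]).sum(dim=1)   # sum over j
my_grad_y = -(dF_dx * g[:, None, :]).sum(dim=0)  # sum over i (d_{y_j} F = - d_{x_i} F here)
my_grad_p = (-D_ij * F_ij * g).sum()             # sum over both i and j

assert torch.allclose(grad_x, my_grad_x, atol=1e-5)
assert torch.allclose(grad_y, my_grad_y, atol=1e-5)
assert torch.allclose(grad_p, my_grad_p.view(1), atol=1e-5)
```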

Backprop through a Log-Sum-Exp reduction

Similarly, when $(a_i)$ is given through a Log-Sum-Exp reduction:

$$
a_i ~=~ \log \sum_{j=1}^{\mathrm{N}} \exp F(p, x_i, y_j),
$$

straightforward computations show that:

$$
\begin{aligned}
\partial_{x_i} \Big[ \log \sum_{j=1}^{\mathrm{N}} \exp F(p, x_i, y_j) \Big]^\top (g_i)
~&=~ \sum_{j=1}^{\mathrm{N}} e^{\,F(p, x_i, y_j) - a_i}\, \Big( \partial_{x_i} \big[ F(p, x_i, y_j) \big]^\top \cdot g_i \Big), \\
\partial_{y_j} \Big[ \log \sum_{j=1}^{\mathrm{N}} \exp F(p, x_i, y_j) \Big]^\top (g_i)
~&=~ \sum_{i=1}^{\mathrm{M}} e^{\,F(p, x_i, y_j) - a_i}\, \Big( \partial_{y_j} \big[ F(p, x_i, y_j) \big]^\top \cdot g_i \Big), \\
\partial_{p} \Big[ \log \sum_{j=1}^{\mathrm{N}} \exp F(p, x_i, y_j) \Big]^\top (g_i)
~&=~ \sum_{i=1}^{\mathrm{M}} \sum_{j=1}^{\mathrm{N}} e^{\,F(p, x_i, y_j) - a_i}\, \Big( \partial_{p} \big[ F(p, x_i, y_j) \big]^\top \cdot g_i \Big).
\end{aligned}
$$
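
Here is the same kind of dense, purely illustrative PyTorch check for the first identity, taking $F(x_i, y_j) = -\|x_i - y_j\|^2$ with no parameter $p$: the backward pass only differs from the Sum case by the softmax-like weights $e^{F(p, x_i, y_j) - a_i}$.

```python
import torch

M, N, D = 5, 7, 3
x = torch.randn(M, D, requires_grad=True)
y = torch.randn(N, D)
g = torch.randn(M, 1)  # gradient to backpropagate, (g_i)

F_ij = -((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # F(x_i, y_j), (M, N)
a = F_ij.logsumexp(dim=1, keepdim=True)                 # a_i, (M, 1)

# Reference gradient, as computed by torch.autograd:
[grad_x] = torch.autograd.grad(a, [x], grad_outputs=g)

# Hand-derived gradient: softmax weights exp(F - a_i) times d_{x_i}F, summed over j.
w_ij = (F_ij - a).exp()                                 # exp(F(x_i, y_j) - a_i), (M, N)
dF_dx = -2 * (x[:, None, :] - y[None, :, :])            # d_{x_i} F, (M, N, D)
my_grad_x = (w_ij[:, :, None] * dF_dx * g[:, None, :]).sum(dim=1)

assert torch.allclose(grad_x, my_grad_x, atol=1e-5)
```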

In other words, a backward pass through a Genred call that involves a Sum or a Log-Sum-Exp reduction can always be written as a symbolic Map-Reduce computation.
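
To illustrate this point, the hedged sketch below (assuming pykeops is installed; names and sizes are arbitrary) computes the gradient of a Gaussian-kernel Sum reduction with respect to the j-variable y in two ways: through torch.autograd, and by hand, as an explicit KeOps reduction over the index i.

```python
import torch
from pykeops.torch import LazyTensor

M, N, D = 100, 150, 3
x = torch.randn(M, D)
y = torch.randn(N, D, requires_grad=True)
g = torch.randn(M, 1)                         # gradient to backpropagate, (g_i)

x_i = LazyTensor(x[:, None, :])               # symbolic (M, 1, 3)
y_j = LazyTensor(y[None, :, :])               # symbolic (1, N, 3)
K_ij = (-((x_i - y_j) ** 2).sum(-1)).exp()    # F(x_i, y_j) = exp(-|x_i - y_j|^2)

a = K_ij.sum(dim=1)                           # forward pass: reduction over j, (M, 1)
[grad_y] = torch.autograd.grad(a, [y], grad_outputs=g)

# The same gradient, written by hand as another symbolic Map-Reduce call, over i this time:
g_i = LazyTensor(g[:, None, :])               # symbolic (M, 1, 1)
my_grad_y = (2 * g_i * (x_i - y_j) * K_ij).sum(dim=0)   # reduction over i, (N, 3)

assert torch.allclose(grad_y, my_grad_y, atol=1e-4)
```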

Bootstrapping derivatives of arbitrary order

Applying these commutation rules between the differential operator $\partial_V$ and the Sum or Log-Sum-Exp reductions, the pykeops/torch/generic/generic_red.py module provides full compatibility between KeOps LazyTensors and the torch.autograd package. Thanks to recursive calls to the Genred operator and to our symbolic math engine, everything works just fine, even for high-order derivatives.
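
For instance, the following sketch (again assuming that pykeops is installed, with illustrative names and sizes) differentiates twice through a KeOps Sum reduction by keeping the graph of the first backward pass:

```python
import torch
from pykeops.torch import LazyTensor

x = torch.randn(1000, 3, requires_grad=True)
y = torch.randn(2000, 3)

x_i = LazyTensor(x[:, None, :])
y_j = LazyTensor(y[None, :, :])
K_ij = (-((x_i - y_j) ** 2).sum(-1)).exp()    # Gaussian kernel, symbolic (M, N)

E = K_ij.sum(dim=1).sum()                     # scalar "energy"

# First-order gradient: computed by a recursive KeOps Map-Reduce call.
[g] = torch.autograd.grad(E, [x], create_graph=True)

# Second-order derivative: differentiate once more through the symbolic backward pass.
[h] = torch.autograd.grad(g.sum(), [x])
```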