Autodiff engine
KeOps provides a simple Automatic Differentiation (AD) engine for generic formulas.
This feature can be used seamlessly through the Grad
instruction
or the PyTorch backend: users don’t have to understand backpropagation
to enjoy our “free” gradients.
Nevertheless, for the sake of completeness, here is
a short introduction to the inner workings of KeOps.
Backprop 101
Gradient of a vector-valued function
Let $F : X = \mathbb{R}^n \to Y = \mathbb{R}^m$ be a smooth, vector-valued function. At any location $x \in X$, its differential $\mathrm{d}F(x) : \mathbb{R}^n \to \mathbb{R}^m$ is the linear map that best approximates $F$ around $x$.
Going further, we can define the gradient of $F$ at $x$ as the adjoint (i.e. transpose) of this differential,

$$\nabla F(x) \;=\; \bigl(\mathrm{d}F(x)\bigr)^{\ast} \,:\, \mathbb{R}^m \to \mathbb{R}^n,$$

where the adjoint is taken with respect to the standard Euclidean scalar products on $\mathbb{R}^n$ and $\mathbb{R}^m$.
When $m = 1$, i.e. when $F$ is scalar-valued, $\nabla F(x)$ applied to $1 \in \mathbb{R}$ is the usual gradient vector of partial derivatives.
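As a simple sanity check of this definition (a textbook example, not taken from the KeOps documentation), consider a linear map:

$$F(x) = A\,x, \quad A \in \mathbb{R}^{m \times n}
\qquad\Longrightarrow\qquad
\mathrm{d}F(x) = A, \qquad \nabla F(x) = A^{\top} : \mathbb{R}^m \to \mathbb{R}^n.$$

In particular, for a scalar-valued $F(x) = \langle v, x \rangle$ we recover $\nabla F(x) \cdot 1 = v$, the familiar gradient vector.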
Reverse mode AD = backpropagation = chain rule
Now, let’s assume that the function $F = F_p \circ \cdots \circ F_2 \circ F_1$ is a composition of simple blocks $F_i : \mathbb{R}^{n_{i-1}} \to \mathbb{R}^{n_i}$, with $n_0 = n$ and $n_p = m$.
Evaluating the gradient of $F$ at a location $x = x_0 \in \mathbb{R}^n$ can then be performed in two steps:

1. A forward pass to evaluate the functions $F_1, \dots, F_p$ at the intermediate locations $x_i = F_i(x_{i-1})$, and thus compute the value $F(x) = x_p$.
2. A backward pass to evaluate the (adjoints of the) differentials $\nabla F_p(x_{p-1}), \dots, \nabla F_1(x_0)$ and compute the gradient of $F$ at location $x$, applied to an arbitrary vector $a$ in the space of outputs:

$$\nabla F(x) \cdot a \;=\; \nabla F_1(x_0) \circ \nabla F_2(x_1) \circ \cdots \circ \nabla F_p(x_{p-1}) \cdot a.$$

This method relies on the chain rule, as

$$\mathrm{d}F(x) \;=\; \mathrm{d}F_p(x_{p-1}) \circ \cdots \circ \mathrm{d}F_1(x_0)
\qquad\Longrightarrow\qquad
\nabla F(x) \;=\; \bigl(\mathrm{d}F(x)\bigr)^{\ast} \;=\; \nabla F_1(x_0) \circ \cdots \circ \nabla F_p(x_{p-1}).$$

When $m = 1$, i.e. when $F$ is a scalar-valued “Loss” function, this allows us to compute the full gradient vector $\nabla F(x) \cdot 1 \in \mathbb{R}^n$ with a mere forward-backward pass through the computational graph of $F$.
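To make these two passes concrete, here is a toy NumPy sketch (purely illustrative, not KeOps code) that differentiates a two-block composition $F = F_2 \circ F_1$: the forward pass stores the intermediate value $x_1$, and the backward pass applies the adjoints $\nabla F_2$, then $\nabla F_1$, in reverse order.

import numpy as np

# Toy composition F = F2 o F1, with
#   F1(x) = A @ x          (linear map,    R^3 -> R^2)
#   F2(u) = sum(u ** 2)    (scalar-valued, R^2 -> R)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, -1.0, 3.0]])

def F1(x): return A @ x
def F2(u): return np.sum(u ** 2)

# Adjoints of the differentials, evaluated at the points stored during the forward pass:
def grad_F1(x, a): return A.T @ a    # dF1(x) = A,             so grad_F1(x) . a = A^T a
def grad_F2(u, a): return 2 * u * a  # dF2(u): h -> 2 <u, h>,  so grad_F2(u) . a = 2 u a

x0 = np.array([1.0, -2.0, 0.5])

# Forward pass: evaluate the blocks and store the intermediate result x1.
x1 = F1(x0)
value = F2(x1)

# Backward pass: apply the adjoints in reverse order to a = 1 (the output is a scalar).
grad = grad_F1(x0, grad_F2(x1, 1.0))  # = grad F(x0) . 1 = 2 A^T A x0

# Sanity check against a finite-difference approximation of the gradient.
eps = 1e-6
fd = np.array([(F2(F1(x0 + eps * e)) - value) / eps for e in np.eye(3)])
print(grad, fd)  # the two vectors should agree up to ~1e-5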
The KeOps generic engine
Backpropagation has become the standard way of computing the gradients of arbitrary “Loss” functions in imaging and machine learning. Crucially, any backprop engine should be able to:
- Link together the forward operations $F_i$ with their backward counterparts $\nabla F_i$.
- Store in memory the intermediate results $x_0, \dots, x_{p-1}$ before using them in the backward pass.
The Grad
operator
At a low level, KeOps allows us to perform these tasks with the Grad
instruction:
given a formula F, the expression Grad(F, V, E)
denotes the gradient of F with respect to the variable V, applied to the test vector E (that is, $\nabla_{V} F \cdot E$).
If V
is a variable place-holder that appears in the expression of F
and if E
has the same dimension and category as F
, Grad(F,V,E)
can be fed to KeOps just like any other symbolic expression.
The resulting output will have the same dimension and category as the variable V
,
and can be used directly for gradient descent or higher-order differentiation:
operations such as Grad(Grad(..,..,..),..,..)
are fully supported.
User interfaces
As evidenced by this example, the simple Grad
syntax can relieve us of the burden of differentiating symbolic formulas by hand.
Going further, our Python interface is fully compatible with the PyTorch library:
feel free to use the output of a pykeops.torch
routine just like any other differentiable tensor!
Thanks to the flexibility of the torch.autograd
engine,
end-to-end automatic differentiation is at hand:
see the examples in our gallery for an introduction.
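As a minimal sketch (the Gaussian kernel, the point counts and the variable names below are arbitrary illustrative choices, not taken from the KeOps documentation), a symbolic reduction built with pykeops.torch can be plugged directly into torch.autograd:

import torch
from pykeops.torch import LazyTensor

M, N = 1000, 2000
x = torch.randn(M, 3, requires_grad=True)  # i-indexed points
y = torch.randn(N, 3)                      # j-indexed points

x_i = LazyTensor(x[:, None, :])  # (M, 1, 3) symbolic tensor
y_j = LazyTensor(y[None, :, :])  # (1, N, 3) symbolic tensor

# Symbolic (M, N) Gaussian kernel matrix, reduced over j by a KeOps Sum reduction.
K_ij = (-((x_i - y_j) ** 2).sum(-1)).exp()
loss = K_ij.sum(dim=1).sum()

# The backward pass goes through the KeOps Grad engine, transparently.
(g,) = torch.autograd.grad(loss, [x])
print(g.shape)  # torch.Size([1000, 3])

Since the kernel matrix stays symbolic, neither the forward nor the backward pass ever materializes an M-by-N buffer in memory.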
An example
Coming back to our previous example where the formula
F = "Sum_Reduction(Square(Pm(0,1) - Vj(3,1)) * Exp(Vi(1,3) + Vj(2,3)), 1)"
was discussed, the symbolic expression
[grad_a F] = "Grad( Sum_Reduction(Square(Pm(0,1) - Vj(3,1)) * Exp(Vi(1,3) + Vj(2,3)), 1),
Vj(3,1), Vi(4,3) )"
allows us to compute the gradient of F with respect to the variable a (= Vj(3,1)), applied to an arbitrary test vector e (= Vi(4,3)):

$$\bigl[\partial_a F \cdot e\bigr]_j \;=\; -\,2\,(p - a_j)\,\sum_{i=1}^{M} \bigl\langle \exp(x_i + y_j),\, e_i \bigr\rangle, \qquad j = 1, \dots, N,$$

where the exponential of the 3-dimensional vector $x_i + y_j$ is taken pointwise.
With aliases, this computation simply reads:
p=Pm(0,1), x=Vi(1,3), y=Vj(2,3), a=Vj(3,1), e=Vi(4,3)
[grad_a F](e) = "Grad( Sum_Reduction(Square(p-a)*Exp(x+y), 1), a, e)"
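For reference, here is a minimal pykeops.numpy sketch of this computation (the sample sizes and random inputs are arbitrary and only serve as an illustration). Since a is a j-indexed variable, the gradient [grad_a F](e) is indexed by j and its reduction runs over the i index, which is why the axis argument switches from 1 to 0:

import numpy as np
from pykeops.numpy import Genred

M, N = 1000, 2000          # arbitrary numbers of i- and j-indexed samples
p = np.random.rand(1)      # Pm(0,1): scalar parameter
x = np.random.rand(M, 3)   # Vi(1,3): i-indexed, 3D
y = np.random.rand(N, 3)   # Vj(2,3): j-indexed, 3D
a = np.random.rand(N, 1)   # Vj(3,1): j-indexed, 1D
e = np.random.rand(M, 3)   # Vi(4,3): test vector, same dimension and category as F

# Original reduction: F_i = sum_j (p - a_j)^2 * exp(x_i + y_j), summed over j (axis=1).
F = Genred("Square(p - a) * Exp(x + y)",
           ["p = Pm(0,1)", "x = Vi(1,3)", "y = Vj(2,3)", "a = Vj(3,1)"],
           reduction_op="Sum", axis=1)

# [grad_a F](e): indexed by j like the variable a, so the sum now runs over i (axis=0).
grad_a_F = Genred("Grad(Square(p - a) * Exp(x + y), a, e)",
                  ["p = Pm(0,1)", "x = Vi(1,3)", "y = Vj(2,3)", "a = Vj(3,1)", "e = Vi(4,3)"],
                  reduction_op="Sum", axis=0)

print(F(p, x, y, a).shape)            # (M, 3): one 3D value per index i
print(grad_a_F(p, x, y, a, e).shape)  # (N, 1): same shape as the variable a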