Vectorial LogSumExp reduction
Let's compute the (3000,3) tensor $c$ whose rows $c_i$ are given by

$$c_i \;=\; \log \sum_{j=1}^{5000} \exp\big( (p - a_j)^2 \, \exp(x_i + y_j) \big)\, b_j,$$

where

- $x$ is a (3000,1) tensor, with entries $x_i$,
- $y$ is a (5000,1) tensor, with entries $y_j$,
- $a$ is a (5000,1) tensor, with entries $a_j$,
- $p$ is a scalar, encoded as a vector of size (1,),
- $b$ is a (5000,3) tensor, with entries $b_j$.
Setup
Standard imports:
import time
import matplotlib.pyplot as plt
import torch
from torch.autograd import grad
from pykeops.torch import Genred
Declare random inputs:
M = 3000
N = 5000
dtype = "float32" # Could be 'float32' or 'float64'
torchtype = torch.float32 if dtype == "float32" else torch.float64
x = torch.rand(M, 1, dtype=torchtype)
y = torch.rand(N, 1, dtype=torchtype, requires_grad=True)
a = torch.rand(N, 1, dtype=torchtype)
p = torch.rand(1, dtype=torchtype)
b = torch.rand(N, 3, dtype=torchtype)
Define a custom formula
formula = "Square(p-a)*Exp(x+y)"
formula2 = "b"
variables = [
    "x = Vi(1)",  # First arg:  i-variable, of size 1 (scalar)
    "y = Vj(1)",  # Second arg: j-variable, of size 1 (scalar)
    "a = Vj(1)",  # Third arg:  j-variable, of size 1 (scalar)
    "p = Pm(1)",  # Fourth arg: parameter, of size 1 (scalar)
    "b = Vj(3)",  # Fifth arg:  j-variable, of size 3 (vector)
]
start = time.time()
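Spelled out, for every pair of indices $(i, j)$ the string formula evaluates

$$F_{ij} \;=\; (p - a_j)^2\, \exp(x_i + y_j),$$

while formula2 = "b" supplies the vector $b_j$ that weights each exponential inside the reduction, exactly as in the naive comparison below.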
Our log-sum-exp reduction is performed over the index $j$, i.e. on the axis 1 of the kernel matrix. The output $c$ is an $x$-variable, indexed by $i$ and of dimension 3.
my_routine = Genred(
    formula, variables, reduction_op="LogSumExp", axis=1, formula2=formula2
)
c = my_routine(x, y, a, p, b, backend="CPU")
# N.B.: By specifying backend='CPU', we can make sure that the result is computed using a simple C++ for loop.
print(
    "Time to compute the convolution operation on the cpu: ",
    round(time.time() - start, 5),
    "s",
    end=" ",
)
Time to compute the convolution operation on the cpu: 0.04646 s
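As a side note, for sizes this small the same reduction can be checked against a dense PyTorch computation. The snippet below is only an illustrative sketch; the names f_ij and c_check are ours, not part of the example:

# Dense (M, N) matrix of formula values, built by broadcasting (fine here, since M*N is moderate).
f_ij = (p - a.t()) ** 2 * torch.exp(x + y.t())
# Naive weighted log-sum-exp over j; this should match the KeOps output up to floating-point error.
c_check = torch.log(torch.exp(f_ij) @ b)
print("Maximum deviation from the KeOps result:", (c_check - c).abs().max().item())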
We compare with the unstable, naive computation “Log of Sum of Exp”:
my_routine2 = Genred(
    "Exp(" + formula + ")*" + formula2,
    variables,
    reduction_op="Sum",
    axis=1,
)
c2 = torch.log(my_routine2(x, y, a, p, b, backend="CPU"))
print("(relative error: ", ((c2 - c).norm() / c.norm()).item(), ")")
# Plot the results next to each other:
for i in range(3):
    plt.subplot(3, 1, i + 1)
    plt.plot(c.detach().cpu().numpy()[:40, i], "-", label="KeOps - Stable")
    plt.plot(c2.detach().cpu().numpy()[:40, i], "--", label="KeOps - Unstable")
    plt.legend(loc="lower right")
plt.tight_layout()
plt.show()

(relative error: 5.1588305183258854e-08 )
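For context, here is the standard trick that makes such a reduction stable; this is a sketch of the usual log-sum-exp shift, not a description of KeOps internals:

$$\log \sum_j \exp(F_{ij})\, b_j \;=\; m_i \;+\; \log \sum_j \exp(F_{ij} - m_i)\, b_j, \qquad m_i = \max_j F_{ij},$$

so every exponential inside the shifted sum is at most 1 and cannot overflow, whereas the naive evaluation exponentiates $F_{ij}$ directly.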
Compute the gradient
Now, let's compute the gradient of $c$ with respect to $y$. Since $c$ is not scalar-valued, its "gradient" $\partial c$ should be understood as the adjoint of the differential operator, i.e. as the linear operator that:

- takes as input a new tensor $e$ with the shape of $c$,
- outputs a tensor $g$ with the shape of $y$,

such that for all variations $\delta y$ of $y$ we have

$$\langle \mathrm{d}c \cdot \delta y ,\, e \rangle \;=\; \langle g ,\, \delta y \rangle \;=\; \langle \delta y ,\, \partial c \cdot e \rangle.$$

Backpropagation is all about computing the tensor $g = \partial c \cdot e$ efficiently, for arbitrary values of $e$:
# Declare a new tensor of shape (M, 3) used as the input of the gradient operator.
# It can be understood as a "gradient with respect to the output c"
# and is thus called "grad_output" in the documentation of PyTorch.
e = torch.rand_like(c)
# Call the gradient op:
start = time.time()
g = grad(c, y, e)[0]
# PyTorch remark : grad(c, y, e) alone outputs a length 1 tuple, hence the need for [0] at the end.
print(
    "Time to compute gradient of convolution operation on the cpu: ",
    round(time.time() - start, 5),
    "s",
    end=" ",
)
Time to compute gradient of convolution operation on the cpu: 0.06648 s
We compare with the gradient of the naive "Log of Sum of Exp" computation:
g2 = grad(c2, y, e)[0]
print("(relative error: ", ((g2 - g).norm() / g.norm()).item(), ")")
# Plot the results next to each other:
plt.plot(g.detach().cpu().numpy()[:40], "-", label="KeOps - Stable")
plt.plot(g2.detach().cpu().numpy()[:40], "--", label="KeOps - Unstable")
plt.legend(loc="lower right")
plt.tight_layout()
plt.show()

(relative error: 5.8240928524355695e-08 )
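As with the forward pass, we can cross-check against autograd applied to the dense sketch introduced earlier; this is optional and reuses our illustrative c_check tensor, not anything from the original example:

# Vector-Jacobian product through the dense, naive computation.
g_check = grad(c_check, y, e)[0]
print("Maximum deviation from the KeOps gradient:", (g_check - g).abs().max().item())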
Same operations performed on the GPU
Of course, this will only work if you own a GPU…
if torch.cuda.is_available():
    # First, transfer the data to the GPU:
    pc, ac, xc, yc, bc, ec = p.cuda(), a.cuda(), x.cuda(), y.cuda(), b.cuda(), e.cuda()

    # Then call the operations:
    start = time.time()
    c3 = my_routine(xc, yc, ac, pc, bc, backend="GPU")
    print(
        "Time to compute convolution operation on the gpu:",
        round(time.time() - start, 5),
        "s ",
        end="",
    )
    print("(relative error:", float(torch.abs((c2 - c3.cpu()) / c2).mean()), ")")

    start = time.time()
    g3 = grad(c3, yc, ec)[0]
    print(
        "Time to compute gradient of convolution operation on the gpu:",
        round(time.time() - start, 5),
        "s ",
        end="",
    )
    print("(relative error:", float(torch.abs((g2 - g3.cpu()) / g2).mean()), ")")

    # Plot the results next to each other:
    for i in range(3):
        plt.subplot(3, 1, i + 1)
        plt.plot(c.detach().cpu().numpy()[:40, i], "-", label="KeOps - CPU")
        plt.plot(c3.detach().cpu().numpy()[:40, i], "--", label="KeOps - GPU")
        plt.legend(loc="lower right")
    plt.tight_layout()
    plt.show()

    # Plot the results next to each other:
    plt.plot(g.detach().cpu().numpy()[:40], "-", label="KeOps - CPU")
    plt.plot(g3.detach().cpu().numpy()[:40], "--", label="KeOps - GPU")
    plt.legend(loc="lower right")
    plt.tight_layout()
    plt.show()

Time to compute convolution operation on the gpu: 0.00289 s (relative error: 2.260827614009031e-08 )
Time to compute gradient of convolution operation on the gpu: 0.01904 s (relative error: 1.344030380323602e-07 )
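One caveat worth keeping in mind (an aside, not part of the original example): CUDA kernels are launched asynchronously, so bracketing a GPU call with time.time() alone may not measure the actual kernel time, depending on whether the routine synchronizes internally. A more careful timing pattern looks like this:

if torch.cuda.is_available():
    torch.cuda.synchronize()  # make sure all previously queued GPU work has finished
    start = time.time()
    c3 = my_routine(xc, yc, ac, pc, bc, backend="GPU")
    torch.cuda.synchronize()  # wait for the reduction to complete before reading the clock
    print("Synchronized GPU timing:", round(time.time() - start, 5), "s")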
Total running time of the script: (0 minutes 0.798 seconds)