Genred
Summary
This section contains the full API documentation for the PyTorch Generic reductions, with full support of PyTorch’s torch.autograd
engine.
Genred: Creates a new generic operation.
Genred.__init__: Instantiate a new generic operation.
Genred.__call__: Applies the routine to arbitrary torch Tensors.
Syntax

class pykeops.torch.Genred
Creates a new generic operation.
This is KeOps’ main function, whose usage is documented in the user guide, the gallery of examples and the high-level tutorials. Taking as input a handful of strings and integers that specify a custom MapReduce operation, it returns a C++ wrapper that can be called just like any other PyTorch function.
Note
Genred() is fully compatible with PyTorch’s autograd engine: you can backprop through the KeOps __call__() just as if it was a vanilla PyTorch operation (except for Min or Max reduction types, see reductions).
Example
>>> import torch
>>> from pykeops.torch import Genred
>>> my_conv = Genred('Exp(-SqNorm2(x - y))',  # formula
...                  ['x = Vi(3)',            # 1st input: dim-3 vector per line
...                   'y = Vj(3)'],           # 2nd input: dim-3 vector per column
...                  reduction_op='Sum',      # we also support LogSumExp, Min, etc.
...                  axis=1)                  # reduce along the lines of the kernel matrix
>>> # Apply it to 2d arrays x and y with 3 columns and a (huge) number of lines
>>> x = torch.randn(1000000, 3, requires_grad=True).cuda()
>>> y = torch.randn(2000000, 3).cuda()
>>> a = my_conv(x, y)  # a_i = sum_j exp(-|x_i-y_j|^2)
>>> print(a.shape)
torch.Size([1000000, 1])
>>> [g_x] = torch.autograd.grad((a ** 2).sum(), [x])  # KeOps supports autograd!
>>> print(g_x.shape)
torch.Size([1000000, 3])

__init__(formula, aliases, reduction_op='Sum', axis=0, dtype='float32', opt_arg=None, formula2=None, cuda_type=None, use_double_acc=False, use_BlockRed='auto', use_Kahan=False)
Instantiate a new generic operation.
Note
Genred relies on C++ or CUDA kernels that are compiled on-the-fly, and stored in a cache directory as shared libraries (“.so” files) for later use.
Parameters
formula (string) – The scalar- or vector-valued expression that should be computed and reduced. The correct syntax is described in the documentation, using appropriate mathematical operations.
aliases (list of strings) – A list of identifiers of the form "AL = TYPE(DIM)" that specify the categories and dimensions of the input variables. Here:
- AL is an alphanumerical alias, used in the formula.
- TYPE is a category. One of:
  - Vi: indexation by \(i\) along axis 0.
  - Vj: indexation by \(j\) along axis 1.
  - Pm: no indexation, the input tensor is a vector and not a 2d array.
- DIM is an integer, the dimension of the current variable.
As described below, __call__() will expect as input Tensors whose shapes are compatible with aliases.
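To make the "AL = TYPE(DIM)" format concrete, here is a minimal sketch in plain Python that splits an alias string into its three parts. The helper `parse_alias` is hypothetical and is not part of pykeops; it only handles the form described above.

```python
import re

# Hypothetical helper (not part of pykeops): matches "AL = TYPE(DIM)"
# where TYPE is one of the three categories listed above.
_ALIAS = re.compile(r"^\s*(\w+)\s*=\s*(Vi|Vj|Pm)\(\s*(\d+)\s*\)\s*$")


def parse_alias(alias):
    """Return (name, category, dim) for a string such as 'x = Vi(3)'."""
    match = _ALIAS.match(alias)
    if match is None:
        raise ValueError(f"not of the form 'AL = TYPE(DIM)': {alias!r}")
    name, category, dim = match.groups()
    return name, category, int(dim)


aliases = ["x = Vi(3)", "y = Vj(3)", "s = Pm(1)"]
parsed = [parse_alias(a) for a in aliases]
print(parsed)
```

Each tuple gives the alias used in the formula, whether the variable is indexed by \(i\), by \(j\) or not at all, and its dimension.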
 Keyword Arguments
reduction_op (string, default = "Sum") – Specifies the reduction operation that is applied to reduce the values of formula(x_i, y_j, ...) along axis 0 or axis 1. The supported values are listed under Reductions.
axis (int, default = 0) – Specifies the dimension of the “kernel matrix” that is reduced by our routine. The supported values are:
- axis = 0: reduction with respect to \(i\), outputs a Vj or “\(j\)” variable.
- axis = 1: reduction with respect to \(j\), outputs a Vi or “\(i\)” variable.
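The effect of axis can be illustrated with a dense reference loop in plain Python. This is only a sketch of the semantics for a "Sum" reduction of exp(-|x_i - y_j|^2): it builds the full kernel matrix, which is exactly what KeOps is designed to avoid, and the function name `dense_sum_reduction` is an assumption made for this example.

```python
import math


def dense_sum_reduction(x, y, axis):
    """Sum exp(-|x_i - y_j|^2) over i (axis=0) or over j (axis=1)."""
    M, N = len(x), len(y)
    # Full M-by-N "kernel matrix", stored explicitly for clarity only.
    kernel = [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(xi, yj))) for yj in y]
        for xi in x
    ]
    if axis == 0:  # reduce over i: one output line per j
        return [[sum(kernel[i][j] for i in range(M))] for j in range(N)]
    if axis == 1:  # reduce over j: one output line per i
        return [[sum(kernel[i][j] for j in range(N))] for i in range(M)]
    raise ValueError("axis must be 0 or 1")


x = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]                    # M = 2 lines
y = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]   # N = 3 lines

out_j = dense_sum_reduction(x, y, axis=0)  # N lines, 1 column
out_i = dense_sum_reduction(x, y, axis=1)  # M lines, 1 column
print(len(out_j), len(out_i))
```

With axis = 0 the result has one line per \(j\); with axis = 1 it has one line per \(i\).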
dtype (string, default = "float32") – Specifies the numerical dtype of the input and output arrays. The supported values are:
- dtype = "float32" or "float".
- dtype = "float64" or "double".
opt_arg (int, default = None) – If reduction_op is in ["KMin", "ArgKMin", "KMin_ArgKMin"], this argument allows you to specify the number K of neighbors to consider.
use_double_acc (bool, default False) – If True, accumulate results of reduction in float64 variables, before casting to float32. This can only be set to True when data is in float32, and reduction_op is one of: "Sum", "MaxSumShiftExp", "LogSumExp", "Max_SumShiftExpWeight", "LogSumExpWeight", "SumSoftMaxWeight". It improves the accuracy of results for large data, but is slower.
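The benefit of accumulating in float64 can be sketched in plain Python by emulating float32 rounding with the struct module. This illustrates the numerical idea only and says nothing about how KeOps implements it; `to_float32` is a helper written for this example.

```python
import struct


def to_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


term = to_float32(0.1)   # float32 input data
n_terms = 100_000

acc32 = 0.0  # accumulator rounded to float32 after every addition
acc64 = 0.0  # accumulator kept in float64
for _ in range(n_terms):
    acc32 = to_float32(acc32 + term)
    acc64 = acc64 + term
result64 = to_float32(acc64)  # cast to float32 once, at the very end

reference = n_terms * term
err32 = abs(acc32 - reference)
err64 = abs(result64 - reference)
print(err32, err64)
```

The float32 accumulator drifts because every addition is rounded, whereas the float64 accumulator is rounded to float32 only once.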
use_BlockRed (bool or "auto", default "auto") – If True, use an intermediate accumulator in each block before accumulating in the output. This improves accuracy for large data. This can only be set to True when reduction_op is one of: "Sum", "MaxSumShiftExp", "LogSumExp", "Max_SumShiftExpWeight", "LogSumExpWeight", "SumSoftMaxWeight". The default value "auto" sets it to True for these reductions.
use_Kahan (bool, default False) – Use the Kahan summation algorithm to compensate for round-off errors. This improves accuracy for large data. This can only be set to True when reduction_op is one of: "Sum", "MaxSumShiftExp", "LogSumExp", "Max_SumShiftExpWeight", "LogSumExpWeight", "SumSoftMaxWeight".
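For reference, the textbook Kahan summation algorithm can be sketched in a few lines of plain Python. This is the generic algorithm in float64, not KeOps' own kernel code, and the function names are chosen for this example only.

```python
import math


def naive_sum(values):
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = v - compensation            # re-inject what was lost previously
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# Ten tiny terms that a plain float64 accumulator starting at 1.0 drops.
values = [1.0] + [1e-16] * 10
exact = math.fsum(values)
naive = naive_sum(values)
kahan = kahan_sum(values)
print(naive, kahan, exact)
```

The naive loop returns exactly 1.0 because each 1e-16 term falls below half a unit in the last place of 1.0, while the compensated loop keeps track of the lost bits.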

__call__(*args, backend='auto', device_id=-1, ranges=None)
Applies the routine to arbitrary torch Tensors.
Warning
Even for variables of size 1 (e.g. \(a_i\in\mathbb{R}\) for \(i\in[0,M)\)), KeOps expects inputs to be formatted as 2d Tensors of size (M,dim). In practice, a.view(-1,1) should be used to turn a vector of weights into a list of scalar values.
Parameters
*args (2d Tensors (variables Vi(..), Vj(..)) and 1d Tensors (parameters Pm(..))) – The input numerical arrays, which should all have the same dtype, be contiguous and be stored on the same device. KeOps expects one array per alias, with the following compatibility rules:
- All Vi(Dim_k) variables are encoded as 2d-tensors with Dim_k columns and the same number of lines \(M\).
- All Vj(Dim_k) variables are encoded as 2d-tensors with Dim_k columns and the same number of lines \(N\).
- All Pm(Dim_k) variables are encoded as 1d-tensors (vectors) of size Dim_k.
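The compatibility rules above can be spelled out as a small shape check in plain Python. The function `infer_sizes` is a hypothetical helper written for this example (pykeops performs its own validation); it works on (category, dim) pairs and shape tuples rather than on real tensors.

```python
def infer_sizes(aliases, shapes):
    """Check shapes against (category, dim) aliases and return (M, N)."""
    if len(aliases) != len(shapes):
        raise ValueError("expected one array per alias")
    sizes = {"Vi": None, "Vj": None}
    for (category, dim), shape in zip(aliases, shapes):
        shape = tuple(shape)
        if category == "Pm":
            # Parameters are 1d vectors of size dim.
            if shape != (dim,):
                raise ValueError(f"Pm({dim}) expects shape ({dim},), got {shape}")
            continue
        if category not in sizes:
            raise ValueError(f"unknown category {category!r}")
        # Vi / Vj variables are 2d arrays with dim columns.
        if len(shape) != 2 or shape[1] != dim:
            raise ValueError(f"{category}({dim}) expects shape (*, {dim}), got {shape}")
        if sizes[category] is None:
            sizes[category] = shape[0]
        elif sizes[category] != shape[0]:
            raise ValueError(f"all {category} variables must share their number of lines")
    return sizes["Vi"], sizes["Vj"]


aliases = [("Vi", 3), ("Vj", 3), ("Pm", 1)]
M, N = infer_sizes(aliases, [(1000, 3), (2000, 3), (1,)])
print(M, N)
```

A vector of M scalar weights declared as Vi(1) would therefore need the shape (M, 1), not (M,).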
 Keyword Arguments
backend (string) – Specifies the map-reduce scheme. The supported values are:
- "auto" (default): let KeOps decide which backend is best suited to your data, based on the tensors’ shapes. "GPU_1D" will be chosen in most cases.
- "CPU": use a simple C++ for loop on a single CPU core.
- "GPU_1D": use a simple multithreading scheme on the GPU: basically, one thread per value of the output index.
- "GPU_2D": use a more sophisticated 2D parallelization scheme on the GPU.
- "GPU": let KeOps decide which one of the "GPU_1D" or the "GPU_2D" scheme will run faster on the given input.
device_id (int, default=-1) – Specifies the GPU that should be used to perform the computation; a negative value lets your system choose the default GPU. This parameter is only useful if your system has access to several GPUs.
ranges (6-uple of IntTensors, None by default) – Ranges of integers that specify a block-sparse reduction scheme with Mc clusters along axis 0 and Nc clusters along axis 1. If None (default), we simply loop over all indices \(i\in[0,M)\) and \(j\in[0,N)\).
The first three ranges will be used if axis = 1 (reduction along the axis of “\(j\) variables”), and to compute gradients with respect to Vi(..) variables:
- ranges_i, (Mc,2) IntTensor: slice indices \([\operatorname{start}^I_k,\operatorname{end}^I_k)\) in \([0,M]\) that specify our Mc blocks along the axis 0 of “\(i\) variables”.
- slices_i, (Mc,) IntTensor: consecutive slice indices \([\operatorname{end}^S_1, ..., \operatorname{end}^S_{M_c}]\) that specify Mc ranges \([\operatorname{start}^S_k,\operatorname{end}^S_k)\) in redranges_j, with \(\operatorname{start}^S_k = \operatorname{end}^S_{k-1}\). The first 0 is implicit, meaning that \(\operatorname{start}^S_0 = 0\), and we typically expect that slices_i[-1] == len(redranges_j).
- redranges_j, (Mcc,2) IntTensor: slice indices \([\operatorname{start}^J_l,\operatorname{end}^J_l)\) in \([0,N]\) that specify reduction ranges along the axis 1 of “\(j\) variables”.
If axis = 1, these integer arrays allow us to say that for k in range(Mc), the output values for indices i in range( ranges_i[k,0], ranges_i[k,1] ) should be computed using a MapReduce scheme over indices j in Union( range( redranges_j[l, 0], redranges_j[l, 1] )) for l in range( slices_i[k-1], slices_i[k] ).
Likewise, the last three ranges will be used if axis = 0 (reduction along the axis of “\(i\) variables”), and to compute gradients with respect to Vj(..) variables:
- ranges_j, (Nc,2) IntTensor: slice indices \([\operatorname{start}^J_k,\operatorname{end}^J_k)\) in \([0,N]\) that specify our Nc blocks along the axis 1 of “\(j\) variables”.
- slices_j, (Nc,) IntTensor: consecutive slice indices \([\operatorname{end}^S_1, ..., \operatorname{end}^S_{N_c}]\) that specify Nc ranges \([\operatorname{start}^S_k,\operatorname{end}^S_k)\) in redranges_i, with \(\operatorname{start}^S_k = \operatorname{end}^S_{k-1}\). The first 0 is implicit, meaning that \(\operatorname{start}^S_0 = 0\), and we typically expect that slices_j[-1] == len(redranges_i).
- redranges_i, (Ncc,2) IntTensor: slice indices \([\operatorname{start}^I_l,\operatorname{end}^I_l)\) in \([0,M]\) that specify reduction ranges along the axis 0 of “\(i\) variables”.
If axis = 0, these integer arrays allow us to say that for k in range(Nc), the output values for indices j in range( ranges_j[k,0], ranges_j[k,1] ) should be computed using a MapReduce scheme over indices i in Union( range( redranges_i[l, 0], redranges_i[l, 1] )) for l in range( slices_j[k-1], slices_j[k] ).
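The axis = 1 rule can be read as the following reference loop, written in plain Python with lists standing in for the IntTensors. It is a sketch of the indexing logic only, under the assumption that the first slice starts at 0 as stated above; the function `block_sparse_row_sums` and the toy kernel are made up for this example.

```python
def block_sparse_row_sums(kernel, M, ranges_i, slices_i, redranges_j):
    """Sum kernel(i, j) over the (i, j) pairs allowed by the three ranges."""
    out = [0.0] * M
    start = 0  # implicit first slice index
    for k, (i_start, i_end) in enumerate(ranges_i):
        end = slices_i[k]
        # Block k of rows is reduced over redranges_j[start:end].
        for l in range(start, end):
            j_start, j_end = redranges_j[l]
            for i in range(i_start, i_end):
                for j in range(j_start, j_end):
                    out[i] += kernel(i, j)
        start = end
    return out


# Mc = 2 blocks of rows over M = 4 rows and N = 6 columns.
ranges_i = [(0, 2), (2, 4)]              # rows 0-1, then rows 2-3
slices_i = [1, 3]                        # block 0 -> redranges_j[0:1], block 1 -> redranges_j[1:3]
redranges_j = [(0, 2), (1, 3), (4, 6)]   # column ranges

# A constant kernel simply counts the columns visited for each row.
row_sums = block_sparse_row_sums(lambda i, j: 1.0, 4, ranges_i, slices_i, redranges_j)
print(row_sums)
```

Rows 0 and 1 only see columns 0 and 1, while rows 2 and 3 see columns 1, 2, 4 and 5. The axis = 0 case is symmetric, with ranges_j, slices_j and redranges_i.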
 Returns
The output of the reduction, stored on the same device as the input Tensors. The output of a Genred call is always a 2d-tensor with \(M\) or \(N\) lines (if axis = 1 or axis = 0, respectively) and a number of columns that is inferred from the formula.
 Return type
(M,D) or (N,D) Tensor
