KeOps LazyTensors
Current state of the art.
The level of performance provided by KeOps
may surprise readers who grew accustomed to
the limitations of tensor-centric frameworks.
As discussed in previous sections,
common knowledge in the machine
learning community asserts that “kernel” computations cannot scale to
large point clouds with the CUDA backends of modern libraries.
Focusing on the key operation involved, the kernel dot product or discrete convolution

    a_i  ←  ∑_{j=1,…,N} k(x_i, y_j) b_j,    for i = 1, …, M,

most authors are tempted to introduce an explicit M-by-N kernel matrix (K_{x,y})_{i,j} = k(x_i, y_j) and to implement the reduction as a dense matrix-vector product, a strategy whose quadratic memory footprint quickly becomes prohibitive.
To accelerate computations, a flourishing literature has then focused
on the construction of low-rank approximations of the linear
operator K_{x,y}.
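For reference, here is what a naive, tensor-centric implementation of such a kernel dot product may look like in PyTorch; the Gaussian kernel and the helper name are illustrative choices, not part of any library API. The (M, N) kernel matrix is materialized explicitly, which becomes prohibitive once M and N reach the tens of thousands.

import torch

def gaussian_conv_dense(x, y, b, s=1.0):
    """Naive kernel dot product a_i = sum_j exp(-|x_i - y_j|^2 / (2 s^2)) b_j."""
    D2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # (M, N) squared distances
    K = torch.exp(-D2 / (2 * s ** 2))                    # (M, N) dense kernel matrix
    return K @ b                                         # (M, E) output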
Our focus: exact Map-Reduce computations
As discussed in detail in our conclusion, such approximation strategies have a long history and a clear intrinsic value. Nevertheless, acknowledging the fact that progress can also be made through low-level software engineering, we decided to tackle this problem in a completely different way. Brushing aside the elegant but inefficient matrix decompositions mentioned above, the KeOps package directly optimizes the kernel sum by understanding it as a Map-Reduce composition of two operators: a Map step that evaluates the coefficients k(x_i, y_j) b_j, and a Reduce step that sums them over the indexing variable j.
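To make this structure concrete, here is a schematic, tiled Map-Reduce version of the same Gaussian kernel sum in plain PyTorch. It is only a sketch of the pattern: the online CUDA schemes that KeOps actually implements, discussed in the next sections, are far more refined.

import torch

def gaussian_conv_tiled(x, y, b, s=1.0, tile=1024):
    """Tiled Map-Reduce kernel sum: never stores the full (M, N) kernel matrix."""
    out = torch.zeros(x.shape[0], b.shape[1], dtype=b.dtype, device=b.device)
    for start in range(0, y.shape[0], tile):
        y_t, b_t = y[start:start + tile], b[start:start + tile]
        # Map: kernel coefficients k(x_i, y_j) for the current tile of "j" indices.
        K_t = torch.exp(-((x[:, None, :] - y_t[None, :, :]) ** 2).sum(-1) / (2 * s ** 2))
        # Reduce: accumulate the partial sums over j.
        out += K_t @ b_t
    return out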
Parenthesis: are we re-inventing the wheel? This approach is common in the computer graphics literature but tends to remain confined to C++/CUDA programming guides: with its emphasis on real-time rendering and explicit models, the graphics community never felt the need to develop high-level libraries that would be suited to machine learning research.
In this context, our scientific contribution does not lie in any new theorem or algorithm. Described in the next few sections, the tools on which KeOps relies (the backpropagation algorithm, online Map-Reduce CUDA schemes and symbolic variadic templating) are all very well-known in their respective communities. But as we combine them in a versatile framework, endowed with a transparent interface and a comprehensive documentation, we allow them to reach a much wider audience and have, hopefully, a positive and fertilizing impact.
A generic framework
The seminal Theano library combined the flexibility of high-level Python frameworks with first-class support for convolutional architectures on the GPU. In the same vein, the KeOps package puts the spotlight on Map-Reduce schemes for (off-grid) sampled data, an algorithmic structure that we deem to be relevant in many fields related to data science and shape analysis.
Removing all the Python sugar-coating, the workhorse of our library is a Generic Reduction (Genred) operator that supports a wide family of formulas. Let us assume that we have:
1. A collection p^1, …, p^P of vectors.
2. A collection x^1_i, …, x^X_i of vector sequences, indexed by an integer i ranging from 1 to M.
3. A collection y^1_j, …, y^Y_j of vector sequences, indexed by an integer j ranging from 1 to N.
4. A vector-valued formula F(p^1, …, x^1_i, …, y^1_j, …) on these input vectors.
5. A Reduction operation that may be a sum, an arg-min, a log-sum-exp, etc.
Then, a single Genred call allows users to compute efficiently the expression

    a_i  =  Reduction_{j=1,…,N}  F(p^1, …, x^1_i, …, y^1_j, …),    for i = 1, …, M,

alongside its derivatives with respect to all variables and parameters. As showcased in our gallery of tutorials, this level of generality allows KeOps to handle K-Nearest-Neighbors classification, K-Means clustering, Gaussian Mixture Model fitting and many other tasks.
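As an illustration, a small Genred call with the PyTorch backend may look as follows; the Gaussian formula, aliases and array sizes are chosen for the example and follow the string-based syntax of the KeOps documentation.

import torch
from pykeops.torch import Genred

# a_i = sum_j exp(-g * |x_i - y_j|^2) * b_j, as a generic reduction over "j".
formula = "Exp(-g * SqDist(x, y)) * b"
aliases = [
    "x = Vi(3)",  # x_i : "i"-indexed variable, dimension 3
    "y = Vj(3)",  # y_j : "j"-indexed variable, dimension 3
    "b = Vj(2)",  # b_j : "j"-indexed variable, dimension 2
    "g = Pm(1)",  # g   : parameter, dimension 1
]
gaussian_conv = Genred(formula, aliases, reduction_op="Sum", axis=1)

x, y, b = torch.randn(1000, 3), torch.randn(2000, 3), torch.randn(2000, 2)
g = torch.tensor([0.5])
a = gaussian_conv(x, y, b, g)  # (1000, 2) tensor, differentiable w.r.t. all inputs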
The LazyTensor abstraction
Implementation details are covered in the next few sections but probably
won’t interest most mathematicians. Wary of making users step outside of
the convenient tensor-centric paradigm, we give a matrix-like interface
to the generic reduction above: through a new LazyTensor wrapper for
NumPy arrays and PyTorch tensors, users may specify formulas and
reductions with standard array operations, without ever materializing
the corresponding M-by-N matrices of values.
KeOps LazyTensors embody the concept of “symbolic” tensors: they are not sparse in the traditional sense, but can nevertheless be handled much more efficiently than large dense arrays. Under the hood, a LazyTensor is fully described by:

1. A symbolic mathematical formula F, the “.formula” attribute, encoded as a well-formed string, manipulated with Python operations and parsed at reduction time.
2. A collection of “small” data arrays p^k, x^k_i and y^k_j, the “.variables” list of parameters, i- and j-variables that are needed to evaluate the formula F.
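The example of the previous section is not reproduced here; the following sketch shows how such a symbolic Gaussian kernel matrix K_ij may be defined, with the point cloud q (1,000 points in R^3) and the bandwidth s being assumed values chosen to match the display below.

import torch
from pykeops.torch import LazyTensor

q = torch.randn(1000, 3, requires_grad=True)  # point cloud (assumed shape)
s = torch.tensor([1.0])                       # kernel bandwidth (assumed value)

q_i = LazyTensor(q[:, None, :])  # (1000, 1, 3) symbolic "i"-variable
q_j = LazyTensor(q[None, :, :])  # (1, 1000, 3) symbolic "j"-variable
s2 = LazyTensor(2 * s ** 2)      # (1,) symbolic parameter

D_ij = ((q_i - q_j) ** 2).sum(-1)  # symbolic (1000, 1000) squared distances
K_ij = (-D_ij / s2).exp()          # symbolic (1000, 1000) Gaussian kernel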
Coming back to the example of the previous section,
we may display the LazyTensor
K_ij using:
>>> print(K_ij)
... KeOps LazyTensor
... formula: Exp((Minus(Sum(Square((Var(0,3,0) - Var(1,3,1))))) / Var(2,1,2)))
... shape: (1000, 1000)
Here, the Var(index, dimension, [ i | j | parameter ]) placeholders
refer to the data arrays q_i, q_j and 2*s*s that are
stored in the list K_ij.variables. When a supported
reduction operator, such as the matrix dot-product “@”, is called on K_ij,
this information is fed to the Genred engine and the result is
returned as a genuine, differentiable PyTorch tensor: things just
work smoothly, with full support of operator broadcasting and
batch dimensions.
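Continuing the sketch above with a hypothetical signal p carried by the points, this end-to-end behaviour reads:

p = torch.randn(1000, 3)  # hypothetical vector signal attached to the points
v = K_ij @ p              # genuine (1000, 3) torch.Tensor: v_i = sum_j K_ij * p_j

# Gradients flow through the KeOps reduction as with any PyTorch operation.
(g,) = torch.autograd.grad(v.sum(), [q])
print(g.shape)            # torch.Size([1000, 3])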