# Why using KeOps?¶

## Scalable kernel operations¶

KeOps can now be used on a broad class of problems (A generic framework that suits your needs).
But at heart, the main motivation behind this library is the need to compute fast and scalable Gaussian convolutions (**RBF kernel products**). For
**very large values** of \(M\) and \(N\), given :

a

**target**point cloud \((x_i)_{i=1}^M \in \mathbb R^{M \times D}\),a

**source**point cloud \((y_j)_{j=1}^N \in \mathbb R^{N \times D}\),a

**signal**\((b_j)_{j=1}^N \in \mathbb R^{N}\) attached to the \(y_j\)’s,

KeOps allows you to compute efficiently the array \((a_i)_{i=1}^M \in \mathbb R^{M}\) given by

where \(K(x_i,y_j) = \exp(-\|x_i - y_j\|^2 / 2 \sigma^2)\).
On top of this, thanks to KeOps’ **automatic differentiation** module,
you can also get access to the gradient of the \(a_i\)’s with respect to the \(x_i\)’s:

without having to code the formula \(\partial_x K(x_i,y_j) = -\tfrac{1}{\sigma^2}(x_i - y_j) \exp(-\|x_i - y_j\|^2 / 2 \sigma^2)\) !

## High performances¶

In recent years, Deep Learning frameworks such as PyTorch or TensorFlow have evolved into fully-fledged applied math libraries: with negligible overhead, they bring **automatic differentiation** and **seamless GPU support** to research communities that were used to Matlab, NumPy and other tensor-centric frameworks.

Unfortunately, though, *no magic* is involved: optimised CUDA codes still have to be written for every single operation provided to end-users. Supporting all the standard mathematical routines thus comes at a **huge engineering cost** for the developers of the main frameworks. As of 2019, this effort has been mostly restricted to the operations needed to implement **Convolutional Neural Networks**: linear algebra routines and convolutions on *grids*, images and volumes.

Consequently, in array-centric frameworks, the standard way of computing Gaussian convolutions is to create and store in memory the full \(M\)-by-\(N\) kernel matrix \(K_{i,j}=K(x_i,y_j)\), before computing \((a_i) = (K_{i,j}) (b_j)\) as a matrix product.
But for large datasets (say, \(M,N \geqslant 10,000\)), this is not realistic: **large matrices just don’t fit in GPU memories**.

KeOps is all about **letting researchers break through this memory bottleneck**. Relying on **online map-reduce schemes**, we provide CUDA routines that “sum” the coefficients \(K_{i,j}\cdot b_j\) as they are computed, without ever storing the full matrix \(K\) in memory.

As evidenced by our benchmarks,
the KeOps routines **outperform** their standard GPU-tensorized counterparts
**by two orders of magnitude** on modern hardware:
on top of a reduced memory usage, they can also bring
a considerable speed-up to all applications
that rely on massive - but simple - map-reduce operations.

## A generic framework that suits your needs¶

KeOps supports **generic operations**, way beyond the simple case of kernel convolutions.
Let’s say that you have at hand:

a collection \(p^1, p^2, ..., p^P\) of vectors.

a collection \(x^1_i, x^2_i, ..., x^X_i\) of vector sequences, indexed by an integer \(i\) ranging from 1 to \(M\).

a collection \(y^1_j, y^2_j, ..., y^Y_j\) of vector sequences, indexed by an integer \(j\) ranging from 1 to \(N\).

a vector-valued function \(f(p^1, p^2,..., x^1_i, x^2_i,..., y^1_j, y^2_j, ...)\) on these input vectors.

Then, referring to the \(p\)’s as **parameters**, the \(x_i\)’s as **x-variables** and the \(y_j\)’s as **y-variables**, the KeOps library allows you to compute efficiently *any* expression \(a_i\) of the form

alongside its **derivatives** with respect to all the variables and parameters.

Feel free to browse through our gallery of tutorials for examples of applications.

## Features¶

Most common reduction operations: Summation, stabilized LogSumExp reduction, Min, Max, SoftMin, Softmax…

Block-sparse reductions and kernel matrices.

Custom high-level (

`'gaussian(x,y) * (1+linear(u,v)**2)'`

) and low-level (`'Exp(-G*SqDist(X,Y)) * ( IntCst(1) + Pow((U|V), 2) )'`

) syntaxes to compute general formulas.High-order derivatives with respect to all parameters and variables.

Inverse of positive definite linear systems through the classes

`torch.KernelSolve`

(see also here) and`numpy.KernelSolve`

(see also here)