Licence and Copyright: see https://github.com/getkeops/keops/blob/master/licence.txt

# Authors¶

Feel free to contact us for any bug report or feature request, you can also fill an issue report on GitHub.

## KeOps, PyKeOps, KeOpsLab¶

## RKeOps¶

## Contributors¶

François-David Collin

# What is RKeOps?¶

RKeOps is the R package interfacing the KeOps library. Here you can find a few slides explaining functionalities of the KeOps library.

## KeOps¶

Seamless Kernel Operations on GPU (or CPU), with auto-differentiation and without memory overflows

The KeOps library (http://www.kernel-operations.io) provides routines to compute generic reductions of large 2d arrays whose entries are given by a mathematical formula. Using a C++/CUDA-based implementation with GPU support, it combines a tiled reduction scheme with an automatic differentiation engine. Relying on online map-reduce schemes, it is perfectly suited to the scalable computation of kernel dot products and the associated gradients, even when the full kernel matrix does not fit into the GPU memory.

KeOps is all about breaking through this memory bottleneck and making GPU power available for seamless standard mathematical routine computations. As of 2019, this effort has been mostly restricted to the operations needed to implement Convolutional Neural Networks: linear algebra routines and convolutions on grids, images and volumes. KeOps provides CPU and GPU support without the cost of developing a specific CUDA implementation of your custom mathematical operators.

To ensure its versatility, KeOps can be used through Matlab, Python (NumPy or PyTorch) and R back-ends.

## RKeOps¶

- Compute
**generic reduction**(row-wise or column-wise) of very large array/matrices, i.e. \[\sum_{i=1}^M a_{ij} \ \ \ \ \text{or}\ \ \ \ \sum_{j=1}^N a_{ij}\] for some matrix \(A = [a_{ij}]_{M \times N}\) with \(M\) rows and \(N\) columns, whose entries \(a_{ij}\) can be defined with basic math formulae or matrix operators. - Compute
**kernel dot products**, i.e. \[\sum_{i=1}^M K(\mathbf x_i, \mathbf y_j)\ \ \ \ \text{or}\ \ \ \ \sum_{j=1}^N K(\mathbf x_i, \mathbf y_j)\] for a kernel function \(K\) and some vectors \(\mathbf x_i\), \(\mathbf y_j\in \mathbb{R}^D\) that are generally rows of some data matrices \(\mathbf X = [x_{ik}]_{M \times D}\) and \(\mathbf Y = [y_{jk}]_{N \times D}\) respectively. - Compute the
**associated gradients*****Applications***: RKeOps can be used to implement a wide range of problems encountered in***machine learning***,***statistics***and more: such as \(k\)-nearest neighbor classification, \(k\)-means clustering, Gaussian-kernel-based problems (e.g. linear system with Ridge regularization), etc.

## Why using RKeOps?¶

- an API to create
**user-defined operators**based on generic mathematical formulae, that can be applied to data matrices such as \(\mathbf X = [x_{ik}]_{M \times D}\) and \(\mathbf Y = [y_{jk}]_{N \times D}\). - fast computation on
**GPU**without memory overflow, especially to process**very large dimensions**\(M\) and \(N\) (e.g. \(\approx 10^4\) or \(10^6\)) over indexes \(i\) and \(j\). - automatic differentiation and
**gradient computations**for user-defined operators.

# Matrix reduction and kernel operator¶

- \(\boldsymbol{\sigma}\in\mathbb R^L\) is a vector of parameters
- \(\mathbf x_i\in \mathbb{R}^D\) and \(\mathbf y_j\in \mathbb{R}^{D’}\) are two vectors of data (potentially with different dimensions)
- \(G: \mathbb R^L \times \mathbb R^D \times \mathbb R^{D’} \to \mathbb R\) is a function of the data and the parameters, that can be expressed through a composition of generic operators
- \(\text{reduction}\) is a generic reduction operation over the index \(i\) or \(j\) (e.g. sum)

*Note:*You can use a wide range of reduction such as`sum`

,`min`

,`argmin`

,`max`

,`argmax`

, etc.

## What you need to do¶

- ones whose rows (or columns) are indexed by \(i=1,…,M\) such as \(\mathbf X = [x_{ik}]_{M \times D}\)
- others whose rows (or columns) are indexed by \(j=1,…,N\) such as \(\mathbf Y = [y_{ik’}]_{N \times D’}\)

More details about input matrices (size, storage order) are given in the vignette ‘Using RKeOps’.

## Example in R¶

We want to implement with RKeOps the following mathematical formula \[\sum_{j=1}^{N} \exp\Big(-\sigma || \mathbf x_i - \mathbf y_j ||_2^{\,2}\Big)\,\mathbf b_j\] with

- parameter: \(\sigma\in\mathbb R\)
- \(i\)-indexed variables \(\mathbf X = [\mathbf x_i]_{i=1,…,M} \in\mathbb R^{M\times 3}\)
- \(j\)-indexed variables \(\mathbf Y = [\mathbf y_j]_{j=1,…,N} \in\mathbb R^{N\times 3}\) and \(\mathbf B = [\mathbf b_j]_{j=1,…,N} \in\mathbb R^{N\times 6}\)

In R, we can define the corresponding KeOps formula as a **simple text
string**:

```
formula = "Sum_Reduction(Exp(-s * SqNorm2(x - y)) * b, 0)"
```

`SqNorm2`

= squared \(\ell_2\) norm`Exp`

= exponential`Sum_reduction(..., 0)`

= sum reduction over the dimension 0 i.e. sum on the \(j\)’s (1 to sum over the \(i\)’s)

and the corresponding arguments of the formula, i.e. parameters or variables indexed by \(i\) or \(j\) with their corresponding inner dimensions:

```
args = c("x = Vi(3)", # vector indexed by i (of dim 3)
"y = Vj(3)", # vector indexed by j (of dim 3)
"b = Vj(6)", # vector indexed by j (of dim 6)
"s = Pm(1)") # parameter (scalar)
```

Then we just compile the corresponding operator and apply it to some data

```
# compilation
op <- keops_kernel(formula, args)
# data and parameter values
nx <- 100
ny <- 150
X <- matrix(runif(nx*3), nrow=nx) # matrix 100 x 3
Y <- matrix(runif(ny*3), nrow=ny) # matrix 150 x 3
B <- matrix(runif(ny*6), nrow=ny) # matrix 150 x 6
s <- 0.2
# computation (order of the input arguments should be similar to `args`)
res <- op(list(X, Y, B, s))
```

## Generic kernel function¶

- the linear kernel (standard scalar product) \(K(\mathbf x_i, \mathbf y_j) = \big\langle \mathbf x_i \, , \, \mathbf y_j \big\rangle\)
- the Gaussian kernel \(K(\mathbf x_i, \mathbf y_j) = \exp\left(-\frac{1}{2\sigma^2} || \mathbf x_i - \mathbf y_j ||_2^{\,2}\right)\) with \(\sigma>0\)
- and more…

Kernel reduction: \[\sum_{i=1}^M K(\mathbf x_i, \mathbf y_j)\ \ \ \ \text{or}\ \ \ \ \sum_{j=1}^N K(\mathbf x_i, \mathbf y_j)\]

- Convolution-like operations: \[\sum_{i=1}^M K(\mathbf x_i, \mathbf y_j)\boldsymbol\beta_j\ \ \ \ \text{or}\ \ \ \ \sum_{j=1}^N K(\mathbf x_i, \mathbf y_j)\boldsymbol\beta_j\] for some vectors \((\boldsymbol\beta_j)_{j=1,…,N} \in \mathbb R^{N\times D}\)
More complex operations: \[\sum_{i=1}^{M}\, K_1(\mathbf x_i, \mathbf y_j)\, K_2(\mathbf u_i, \mathbf v_j)\,\langle \boldsymbol\alpha_i\, ,\,\boldsymbol\beta_j\rangle \ \ \ \ \text{or}\ \ \ \ \sum_{j=1}^{N}\, K_1(\mathbf x_i, \mathbf y_j)\, K_2(\mathbf u_i, \mathbf v_j)\,\langle \boldsymbol\alpha_i\, ,\,\boldsymbol\beta_j\rangle\] for some kernel \(K_1\) and \(K_2\), and some \(D\)-vectors \((\mathbf x_i)_{i=1,…,M}, (\mathbf u_i)_{i=1,…,M}, (\boldsymbol\alpha_i)_{i=1,…,M} \in \mathbb R^{M\times D}\) and \((\mathbf y_j)_{j=1,…,N}, (\mathbf v_j)_{j=1,…,N}, (\boldsymbol\beta_j)_{j=1,…,N} \in \mathbb R^{N\times D}\)

## CPU and GPU computing¶

Based on your formulae, RKeOps compile on the fly operators that can be used to run the corresponding computations on CPU or GPU, it uses a tiling scheme to decompose the data and avoid (i) useless and costly memory transfers between host and GPU (performance gain) and (ii) memory overflow.

*Note:*You can use the same code (i.e. define the same operators) for CPU or GPU computing. The only difference will be the compiler used for the compilation of your operators (upon the availability of CUDA on your system).

To use CPU computing mode, you can call `use_cpu()`

(with an optional
argument `ncore`

specifying the number of cores used to run parallel
computations).

To use GPU computing mode, you can call `use_gpu()`

(with an optional
argument `device`

to choose a specific GPU id to run computations).

# Installing and using RKeOps¶

See the specific vignette **Using RKeOps**.