Using RKeOps
2023-03-31
Thanks to RKeOps, you can use GPU computing directly inside R without the cost of developing a specific CUDA implementation of your custom mathematical operators.
Installing RKeOps
Requirements
R (tested with R >= 4.2)
C++ compiler (g++ >=7 or clang) for CPU computing or CUDA compiler (nvcc >=10) and CUDA libs for GPU computing
Python (>= 3.10)
Important: Python is a requirement as an intern machinery for the package to work but you will not need to create nor manipulate Python codes to use the RKeOps package.
Disclaimer: KeOps (including RKeOps) is not functional on Windows, it was only tested on Linux and MacOS.
Python requirement
RKeOps now uses PyKeOps Python package under the hood thanks to
the `reticulate
<https://rstudio.github.io/reticulate/>`__ R
package that provides an “R Interface to Python”.
We recommend to use a dedicated Python environment (through
reticulate
) to install and use RKeOps Python dependencies.
To do so, you can run the following code before installing
RKeOps:
install.packages("reticulate")
library(reticulate)
virtualenv_create("rkeops")
use_virtualenv(virtualenv = "rkeops", required = TRUE)
py_config()
Note: To get more information about managing which version of Python you are using, you can refer to
reticulate
documentation about “Python Version Configuration”. In particular, if you are a Miniconda/Anaconda Python distribution user, you can either user a Python virtual environment (c.f. above) or a Conda environment with the same results. Please refer to thereticulate
documentation in this case.
Install from CRAN
RKeOps >=2 is not available on CRAN yet, please refer to the next section.
Install development version
install.packages("remotes")
remotes::install_github("getkeops/keops", subdir = "rkeops")
How to use RKeOps
If you are using a Python environment (c.f. previously), you should always run the following commands before loading RKeOps:
use_virtualenv(virtualenv = "rkeops", required = TRUE)
py_config()
Load RKeOps in R:
library(rkeops)
You can verify that every thing is ok:
check_rkeops()
RKeOps allows to define and compile new operators that run computations on GPU.
Example
# implementation of a convolution with a Gaussian kernel
formula = "Sum_Reduction(Exp(-s * SqNorm2(x - y)) * b, 0)"
# input arguments
args = c("x = Vi(3)", # vector indexed by i (of dim 3)
"y = Vj(3)", # vector indexed by j (of dim 3)
"b = Vj(6)", # vector indexed by j (of dim 6)
"s = Pm(1)") # parameter (scalar)
# compilation
op <- keops_kernel(formula, args)
# data and parameter values
nx <- 100
ny <- 150
X <- matrix(runif(nx*3), nrow=nx) # matrix 100 x 3
Y <- matrix(runif(ny*3), nrow=ny) # matrix 150 x 3
B <- matrix(runif(ny*6), nrow=ny) # matrix 150 x 6
s <- 0.2
# to run computation on CPU (default mode)
use_cpu()
# to run computations on GPU (to be used only if relevant)
use_gpu()
# computation (order of the input arguments should be similar to `args`)
res <- op(list(X, Y, B, s))
The different elements (formula, arguments, compilation, computation) in the previous example will be detailed in the next sections.
Formula
To use RKeOps and define new operators, you need to write the corresponding formula which is a text string defining a composition of mathematical operations. It should be characterized by two elements:
a composition of generic functions applied to some input matrices, whose one of their dimensions is either indexed by \(i=1,…,M\) or \(j=1,…,N\)
a reduction over indexes \(i=1,…,M\) (row-wise) or \(j=1,…,N\) (column-wise) of the \(M \times N\) matrix whose entries are defined by 1.
Example: We want to implement the following kernel-based reduction (convolution with a Gaussian kernel): \[\sum_{j=1}^{N} \exp\Big(-\sigma || \mathbf x_i - \mathbf y_j ||_2^{\,2}\Big)\,\mathbf b_j\] with
- parameter: \(\sigma\in\mathbb R\)
- \(i\)-indexed variables \([\mathbf x_i]_{i=1,…,M} \in\mathbb R^{M\times 3}\)
- \(j\)-indexed variables \([\mathbf y_j]_{j=1,…,N} \in\mathbb R^{N\times 3}\) and \([\mathbf b_j]_{j=1,…,N} \in\mathbb R^{N\times 6}\)
In R, we can define the corresponding KeOps formula as a simple text string:
formula = "Sum_Reduction(Exp(-s * SqNorm2(x - y)) * b, 0)"
SqNorm2
= squared \(\ell_2\) normExp
= exponentialSum_reduction(..., 0)
= sum reduction over the dimension 0 i.e. sum on the \(j\)’s (1 to sum over the \(i\)’s)
Arguments
The formula describing your computation can take several input arguments: variables and parameters. The input variables will generally corresponds to rows or columns of your data matrices, you need to be cautious with their dimensions.
Input matrix
- ones whose rows (or columns) are indexed by \(i=1,…,M\) such as \(\mathbf X = [x_{ik}]_{M \times D}\)
- others whose rows (or columns) are indexed by \(j=1,…,N\) such as \(\mathbf Y = [y_{ik’}]_{N \times D’}\)
- Outer dimensions \(M\) and \(N\) (over indexes \(i\) and \(j\) respectively) can be very large, even too large for GPU memory.
- Inner dimensions \(D\) and \(D’\) should be small enough to fit in GPU memory, in particular to ensure data colocality and avoid useless memory transfers. Corresponding columns (or rows) should be contiguous in memory (this point is handled for you in RKeOps, see this section).
Note 1: The outer dimension can correspond to the rows or the columns of the input matrices (and vice-versa for the inner dimension). The optimal orientation of input matrices is discussed in this section .
Note 2: All matrices indexed by \(i\) should have the same outer dimension \(M\) over \(i\), same for all matrices indexed by \(j\) (outer dimension \(N\)). Only the inner dimensions \(D\) and \(D’\) should be known for the compilation of your operators. The respective outer dimensions \(M\) and \(N\) are discovered at runtime (and can change from one run to another).
Notations
Input arguments of the formula are defined by using keywords, they can be of different types:
keyword |
meaning |
---|---|
|
variable indexed by |
|
variable indexed by |
|
parameter |
You should provide a vector of text string specifying the name and the type of all arguments in your formula.
"Vi(D)"
, same for
a \(D\)-dimensional variable indexed by \(j\) being
"Vj(D)"
or a \(D\)-dimensional parameter "Pm(D)"
.The vector of arguments should be
args = c("<name1>=<type1>(dim1)", "<name2>=<type2>(dim2)", "<nameX>=<typeX>(dimX)")
where
<nameX>
is the name<type1>
is the type (amongVi
,Vj
orPm
)<dimX>
is the inner dimension
X
\(^\text{th}\) variable in the formula.Important: The names should correspond to the ones used in the formula. The input parameter order will be the one used when calling the compiled operator.
Example: We define the corresponding arguments of the previous formula, i.e. parameters or variables indexed by \(i\) or \(j\) with their corresponding inner dimensions:
args = c("x = Vi(3)", # vector indexed by i (of dim 3)
"y = Vj(3)", # vector indexed by j (of dim 3)
"b = Vj(6)", # vector indexed by j (of dim 6)
"s = Pm(1)") # parameter (scalar)
Creating a new operator
By using the function keops_kernel
, based on the formula and
its arguments that we previously defined, we can compile and load
into R the corresponding operator:
# compilation
op <- keops_kernel(formula, args)
keops_kernel(formula, args)
returns a function that
can be later used to run computations on your data with your
value of parameters. You should only be cautious with the
similarity of each argument inner dimension.The returned function (here op
) expects a list of input values
in the order specified in the vector args
.
The result of compilation (shared library file) is stored on the
system and will be reused when calling again the function
keops_kernel
on the same formula with the same arguments and
the same conditions (e.g. precision), to avoid useless
recompilation.
Run computations
We generate data with inner dimensions (number of columns)
corresponding to each arguments expected by the operator op
.
The function op
takes in input a list of input arguments. If
the list if named, op
checks the association between the
supplied names and the names of the formula arguments. In this
case only, it can also correct the order of the input list to
match the expected order of arguments.
# data and parameter values
nx <- 100
ny <- 150
X <- matrix(runif(nx*3), nrow=nx) # matrix 100 x 3
Y <- matrix(runif(ny*3), nrow=ny) # matrix 150 x 3
B <- matrix(runif(ny*6), nrow=ny) # matrix 150 x 6
s <- 0.2
# to run computation on CPU (default mode)
use_cpu()
# to run computations on GPU (to be used only if relevant)
use_gpu()
# computation (order of the input arguments should be similar to `args`)
res <- op(list(x, y, beta, s))
Computing gradients
You can define gradients directly in the formula, e.g.
# defining a formula with a Gradient
formula <- "Grad(Sum_Reduction(SqNorm2(x-y), 0), x, eta)"
args <- c("x=Vi(0,3)", "y=Vj(1,3)", "eta=Vi(2,1)")
# compiling the corresponding operator
op <- keops_kernel(formula, args)
# data
nx <- 100
ny <- 150
x <- matrix(runif(nx*3), nrow=nx, ncol=3) # matrix 100 x 3
y <- matrix(runif(ny*3), nrow=ny, ncol=3) # matrix 150 x 3
eta <- matrix(runif(nx*1), nrow=nx, ncol=1) # matrix 100 x 1
# computation
input <- list(x, y, eta)
res <- op(input)
where eta
is the new variable at which the gradient is
computed, its dimension should correspond to the output dimension
of the operation inside the gradient (here SqNorm2(x-y)
is of
dimension 1).
You can also use the function keops_grad
to compute partial
derivative for existing KeOps operators.
# defining an operator (reduction on squared distance)
formula <- "Sum_Reduction(SqNorm2(x-y), 0)"
args <- c("x=Vi(0,3)", "y=Vj(1,3)")
op <- keops_kernel(formula, args)
# defining its gradient regarding x
grad_op <- keops_grad(op, var="x")
# data
nx <- 100
ny <- 150
x <- matrix(runif(nx*3), nrow=nx, ncol=3) # matrix 100 x 3
y <- matrix(runif(ny*3), nrow=ny, ncol=3) # matrix 150 x 3
eta <- matrix(runif(nx*1), nrow=nx, ncol=1) # matrix 100 x 1
# computation
input <- list(x, y, eta)
res <- grad_op(input)
Note: when defining a gradient, the operator created by
keops_grad
requires an additional variable, of the same type
(Vi
, Vj
, Pm
) as the variable with respect to which the
differentiation is made (here x
hence Vi
), and whose inner
dimension corresponds to the output dimension of the derived
formula (here SqNorm2(x-y)
is a real-valued function, hence
dimension 1). In addition, (if Vi
, Vj
variable), its outer
dimension should corresponds to the outer dimension of the
variable regarding which the gradient is taken (here x
).
RKeOps options
RKeOps behavior is driven by specific options in R
global
options scope. Such options are set up when loading RKeOps
(i.e. by calling library(rkeops)
).
You can get the current values of RKeOps options with
get_rkeops_options()
To (re)set RKeOps options to default values, run:
set_rkeops_options()
You can either use set_rkeops_options()
to modify a specific
options (c.f. ?set_rkeops_options
) or you can use helper
functions, c.f. below.
Choosing CPU or GPU computing at runtime
By default, RKeOps runs computations on CPU (even for GPU-compiled operators). To enable GPU computing, you can run (before calling your operator):
rkeops_use_gpu()
You can also specify the GPU id that you want to use,
e.g. rkeops_use_gpu(device=0)
to use GPU 0 (default) for
instance.
To deactivate GPU computations, you can run
rkeops_use_cpu()
.
In CPU mode, you can control the number of CPU cores used by RKeOps for computations, e.g. with
rkeops_use_cpu(ncore = 2)
to run on 2 cores.
Computation precision
By default, RKeOps uses 32bits float precision for
computations. Since R only considers 64bits floating point
numbers, if you want to use float 32bits, input data and output
results will be casted befors and after computations
respectively in your RKeOps operator. If your application
requires to use 64bits float (double) precision, keep in mind
that you will suffer a performance loss (potentially not an
issue on high-end GPUs). In any case, compensated summation
reduction is available in KeOps to correct for 32bits floating
point arithmetic errors (see sum_scheme
argument for
keops_kernel()
function, c.f. ?keops_kernel
).
You use the following functions to choose between 32bits float and 64bits float precision (respectively):
rkeops_use_float32()
rkeops_use_float64()
Verbosity
You can use the following functions to respectively enable and disable compilation verbosity:
rkeops_enable_verbosity()
rkeops_disable_verbosity()
Data storage orientation
Important: In machine learning and statistics, we generally use data matrices where each sample/observation/individual is a row, i.e. matrices where the outer dimensions correspond to rows, e.g. \(\mathbf X = [x_{ik}]_{M \times D}\), \(\mathbf Y = [y_{ik’}]_{N \times D’}\). This is the default using case of RKeOps. RKeOps will then automatically process the input data to enforce colocality along the inner dimension.
If you want to use data where the inner dimension corresponds to
rows of your matrices, i.e. \(\mathbf X^{t} = [x_{ki}]_{D
\times M}\) or \(\mathbf Y^{t} = [y_{k’i}]_{D’ \times
N}\), you just need to specify the input parameter
inner_dim="row"
when calling your operator.
Example:
# standard column reduction of a matrix product
op <- keops_kernel(formula = "Sum_Reduction((x|y), 1)",
args = c("x=Vi(3)", "y=Vj(3)"))
# data (inner dimension = columns)
nx <- 10
ny <- 15
# x_i = rows of the matrix X
X <- matrix(runif(nx*3), nrow=nx, ncol=3)
# y_j = rows of the matrix Y
Y <- matrix(runif(ny*3), nrow=ny, ncol=3)
# computing the result (here, by default `inner_dim="col"` and columns
# corresponds to the inner dimension)
res <- op(list(X,Y))
# data (inner dimension = rows)
nx <- 10
ny <- 15
# x_i = columns of the matrix X
X <- matrix(runif(nx*3), nrow=3, ncol=nx)
# y_j = columns of the matrix Y
Y <- matrix(runif(ny*3), nrow=3, ncol=ny)
# computing the result (we specify `inner_dim="row"` to indicate that rows
# corresponds to the inner dimension)
res <- op(list(X,Y), inner_dim="row")
Compilation files and cleaning
The compilation of new operators produces shared library (or
share object .so
) files stored in a build
sub-directory
of the package installation directory, to be reused and avoid
recompilation of already defined operators.
You can check where your compiled operators are stored by
running get_rkeops_build_dir()
. To clean RKeOps install and
remove all shared library files, you can run
clean_rkeops()
.