Structure of the repository
But how does KeOps handle symbolic formulas on the GPU? How can its routines outperform the CUDA backends of Deep Learning frameworks by such a wide margin? To answer these questions, we need to dive into the mixed C++/Python/Matlab codebase of the KeOps package, whose structure may be summarized as follows:
The pykeops/ folder, with its common/, numpy/ and torch/ subfolders, contains our Python wrappers and relies on the fantastic PyBind11 library.
The keopslab/ folder provides a collection of entry points for Matlab scripts.
The keops/ folder contains our C++ files and the associated compilation scripts. The generic KeOps engine that we are now about to discuss is implemented in the core/ subfolder which contains:
The link_autodiff.cpp and link_autodiff.cu “main” C++ files, which define the methods that binding libraries may use to create high-level modules.
The pack/ subfolder, which defines abstract types for lists and tuples within the C++ templating system. Using advanced features that were introduced with the C++11 standard, this code allows us to drive the nvcc compiler with declarative “variadic templating” and generate routines that manipulate an arbitrary number of parameters, \(i\)-variables and \(j\)-variables.
The autodiff/ subfolder, which defines the primitives of the KeOps symbolic syntax: variables, abstract unary and binary operations, gradients.
The mapreduce/GpuConv*_.cu CUDA files, which implement our massively parallel Map-Reduce schemes. These files contain the core logic of the KeOps library.
The mapreduce/CpuConv*_.cpp C++ files, which implement simple Map-Reduce schemes using standard “for” loops. They may be used to test the correctness of our parallel implementations and provide a fall-back mode to users who do not have access to GPU chips on their machines.
The reductions/ subfolder, which implements the supported \(\operatorname{Reduction}\) operations: sum, arg-min, log-sum-exp, etc.
The formulas/ subfolder, which implements the atomic operations that users may combine to define vector-valued formulas \(F\).
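To make the role of the simple “for”-loop Map-Reduce schemes more concrete, here is a minimal pure-Python sketch of a sum reduction over the \(j\)-variables. It is only an illustration of the idea, not the code of the mapreduce/ files: the names `gaussian_kernel` and `cpu_map_reduce_sum`, the choice of a Gaussian formula and the sample data are all assumptions made for this example.

```python
import math


def gaussian_kernel(x_i, y_j, b_j, sigma=1.0):
    """Hypothetical formula F(x_i, y_j, b_j) = exp(-|x_i - y_j|^2 / (2 sigma^2)) * b_j."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, y_j))
    return math.exp(-sq_dist / (2 * sigma ** 2)) * b_j


def cpu_map_reduce_sum(formula, xs, ys, bs):
    """Return a list whose i-th entry is the sum over j of formula(x_i, y_j, b_j)."""
    out = []
    for x_i in xs:                      # outer loop over the M "i"-variables
        acc = 0.0                       # accumulator of the sum reduction
        for y_j, b_j in zip(ys, bs):    # inner loop over the N "j"-variables
            acc += formula(x_i, y_j, b_j)
        out.append(acc)
    return out


# Small example: M = 2 points x_i, N = 3 points y_j with scalar weights b_j.
xs = [(0.0, 0.0), (1.0, 0.0)]
ys = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
bs = [1.0, 2.0, 3.0]
result = cpu_map_reduce_sum(gaussian_kernel, xs, ys, bs)
```

A sequential loop of this kind is easy to check by hand, which is why such a scheme is a convenient reference when testing a parallel implementation.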
As evidenced here, the KeOps engine is heavily reliant on modern features of the C++ language: every time Genred encounters a new kind of generic operation (up to the values of \(\mathrm{M}\), \(\mathrm{N}\) and the data arrays which are free to change between every call), the string that specifies a generic formula is parsed by the compiler and a new “.dll” or “.so” shared object is generated before being executed on the relevant Python or Matlab tensors.
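The “compile once per formula, reuse for any \(\mathrm{M}\) and \(\mathrm{N}\)” behaviour described above can be mimicked with a toy cache. The sketch below is an analogy only: Python's built-in `compile` stands in for the C++/CUDA compiler, an in-memory dictionary stands in for the generated shared objects, and formulas are written as plain Python expressions rather than in the KeOps syntax. The names `get_routine`, `reduce_sum` and `compile_count` are hypothetical.

```python
import math

_compiled_cache = {}   # hypothetical stand-in for the generated shared objects
compile_count = 0      # number of times a "compilation" actually took place


def get_routine(formula, arg_names):
    """Return a callable for `formula`, compiling it only the first time it is seen."""
    global compile_count
    key = (formula, tuple(arg_names))
    if key not in _compiled_cache:
        compile_count += 1
        code = compile(formula, "<formula>", "eval")   # stand-in for the real compiler

        def routine(*args, _code=code, _names=tuple(arg_names)):
            env = dict(zip(_names, args))
            env["exp"] = math.exp                      # the only function exposed to formulas
            return eval(_code, {"__builtins__": {}}, env)

        _compiled_cache[key] = routine
    return _compiled_cache[key]


def reduce_sum(formula, xs, ys):
    """Sum formula(x_i, y_j) over j, for every i, using the cached routine."""
    f = get_routine(formula, ("x", "y"))
    return [sum(f(x, y) for y in ys) for x in xs]


a = reduce_sum("exp(-(x - y) ** 2)", [0.0, 1.0], [0.0, 1.0, 2.0])  # new formula: compiled
b = reduce_sum("exp(-(x - y) ** 2)", [0.5] * 5, [0.0] * 7)         # same formula, new M and N: reused
c = reduce_sum("(x - y) ** 2", [0.0], [1.0, 2.0])                  # new formula: compiled again
```

In this toy setting, the second call changes both array sizes without triggering a new compilation, whereas the third call, which uses a different formula string, does.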