Structure of the repository

But how does KeOps handle symbolic formulas on the GPU? How can its routines outperform the CUDA backends of Deep Learning frameworks by such a wide margin? To answer these questions, we need to dive into the mixed C++/Python/Matlab codebase of the KeOps package, whose structure may be summarized as follows:

  • The pykeops/ folder, with its common/, numpy/ and torch/ subfolders, contains our Python wrappers and relies on the fantastic PyBind11 library.

  • The keopslab/ folder provides a collection of entry points for Matlab scripts.

  • The keops/ folder contains our C++ files and the associated compilation scripts. The generic KeOps engine that we are now about to discuss is implemented in the core/ subfolder, which contains:

    • The link_autodiff.cpp and link_autodiff.cu “main” C++ files, which define the methods that binding libraries may use to create high-level modules.

    • The pack/ subfolder, which defines abstract types for lists and tuples within the C++ templating system. Using advanced concepts that were introduced with the C++11 revision, these headers allow us to drive the nvcc compiler with declarative “variadic templating” and generate routines that manipulate an arbitrary number of parameters, \(i\)- and \(j\)-variables; a minimal sketch of this idea is given right after this list.

    • The autodiff/ subfolder, which defines the primitives of the KeOps symbolic syntax: variables, abstract unary and binary operations, gradients. A toy encoding of this mechanism is sketched after this list.

    • The mapreduce/GpuConv*_.cu CUDA files, which implement our massively parallel Map-Reduce schemes. These files contain the core logic of the KeOps library.

    • The mapreduce/CpuConv*_.cpp C++ files, which implement simple Map-Reduce schemes using standard “for” loops; one such loop is sketched after this list. They may be used to test the correctness of our parallel implementations and provide a fallback mode to users who do not have access to GPU chips on their machines.

    • The reductions/ subfolder, which implements the supported \(\operatorname{Reduction}\) operations: sum, arg-min, log-sum-exp (whose streaming, numerically stable update is illustrated in the sketch below), etc.

    • The formulas/ subfolder, which implements the atomic operations that users may combine to define vector-valued formulas \(F\).
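
To make this more concrete, here is a minimal, self-contained sketch of the variadic-templating trick that underlies the pack/ headers. The names used here (pack, SIZE, SUM) are illustrative and do not match the actual KeOps sources; the point is simply that the number of variables and their dimensions become compile-time constants that the compiler can use to unroll loops and memory transfers.

```cpp
// A minimal sketch of the "pack" idea, with made-up names that do not
// match the actual KeOps headers: a compile-time list of variable
// dimensions whose length and total size are known to the compiler.
#include <iostream>

template <int... DIMS> struct pack;

// Base case: the empty pack.
template <> struct pack<> {
  static const int SIZE = 0;  // number of variables
  static const int SUM  = 0;  // total number of coordinates
};

// Recursive case: peel off the first dimension and recurse on the rest.
template <int FIRST, int... REST> struct pack<FIRST, REST...> {
  static const int SIZE = 1 + pack<REST...>::SIZE;
  static const int SUM  = FIRST + pack<REST...>::SUM;
};

int main() {
  // Three variables of dimensions 3, 3 and 2, e.g. x_i, y_j and b_j in a
  // Gaussian kernel product: both quantities are resolved at compile time,
  // whatever the number of variables.
  using dims = pack<3, 3, 2>;
  std::cout << dims::SIZE << " variables, "
            << dims::SUM  << " coordinates in total.\n";
  return 0;
}
```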
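
In the same illustrative spirit, and again with made-up names, a formula such as \(\exp(-\|x-y\|^2)\) can be encoded as a C++ type: variables carry their index and dimension, operations are templates that combine sub-formulas, and the evaluation code is resolved at compile time. The real KeOps types are richer, since each operation also carries the rules needed to build its gradient.

```cpp
// A toy sketch of a symbolic formula encoded as a C++ type
// (illustrative names only, not the actual KeOps classes).
#include <cmath>
#include <iostream>

// Variable number N of dimension DIM, read from an array of pointers.
template <int N, int DIM> struct Var {
  static const int Dim = DIM;
  static void Eval(float* out, float** args) {
    for (int k = 0; k < DIM; k++) out[k] = args[N][k];
  }
};

// Squared Euclidean distance between two formulas of the same dimension.
template <class FA, class FB> struct SqDist {
  static const int Dim = 1;
  static void Eval(float* out, float** args) {
    float a[FA::Dim], b[FB::Dim];
    FA::Eval(a, args);
    FB::Eval(b, args);
    out[0] = 0.f;
    for (int k = 0; k < FA::Dim; k++)
      out[0] += (a[k] - b[k]) * (a[k] - b[k]);
  }
};

// Pointwise exponential of minus a formula.
template <class F> struct ExpMinus {
  static const int Dim = F::Dim;
  static void Eval(float* out, float** args) {
    float tmp[F::Dim];
    F::Eval(tmp, args);
    for (int k = 0; k < F::Dim; k++) out[k] = std::exp(-tmp[k]);
  }
};

int main() {
  // exp(-||x - y||^2) with x = variable 0 and y = variable 1, both in R^3.
  using F = ExpMinus<SqDist<Var<0, 3>, Var<1, 3>>>;
  float x[3] = {0.f, 0.f, 0.f}, y[3] = {1.f, 2.f, 2.f};
  float* args[2] = {x, y};
  float res[F::Dim];
  F::Eval(res, args);
  std::cout << "exp(-9) = " << res[0] << "\n";  // ~1.23e-4
  return 0;
}
```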

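For reference, here is what such a sequential fallback boils down to, specialized (with our own function names, for illustration only) to a Gaussian kernel with a Sum reduction, followed by a log-sum-exp variant that performs the reduction online with a running maximum for numerical stability. The key point is that the reduction is computed on the fly, so that no \(\mathrm{M}\)-by-\(\mathrm{N}\) buffer is ever stored in memory.

```cpp
// A sketch of the plain "for"-loop Map-Reduce scheme implemented by the
// CpuConv fallbacks, written with our own names and a hard-coded Gaussian
// formula for the sake of illustration.
#include <algorithm>
#include <cmath>
#include <vector>

// Sum reduction:  out[i] = sum_j exp(-||x_i - y_j||^2) * b[j],  x_i, y_j in R^3.
void gaussian_conv_cpu(const std::vector<float>& x,   // M x 3
                       const std::vector<float>& y,   // N x 3
                       const std::vector<float>& b,   // N
                       std::vector<float>& out,       // M
                       int M, int N) {
  for (int i = 0; i < M; i++) {      // "Map" over the i-variables,
    float acc = 0.f;
    for (int j = 0; j < N; j++) {    // "Reduce" over the j-variables.
      float d2 = 0.f;
      for (int k = 0; k < 3; k++) {
        float diff = x[3 * i + k] - y[3 * j + k];
        d2 += diff * diff;
      }
      acc += std::exp(-d2) * b[j];
    }
    out[i] = acc;
  }
}

// Log-sum-exp reduction:  out[i] = log sum_j exp(-||x_i - y_j||^2),
// accumulated online with a running maximum for numerical stability.
void gaussian_lse_cpu(const std::vector<float>& x, const std::vector<float>& y,
                      std::vector<float>& out, int M, int N) {
  for (int i = 0; i < M; i++) {
    float m = -INFINITY, s = 0.f;    // running max and rescaled sum
    for (int j = 0; j < N; j++) {
      float d2 = 0.f;
      for (int k = 0; k < 3; k++) {
        float diff = x[3 * i + k] - y[3 * j + k];
        d2 += diff * diff;
      }
      float a = -d2;                  // current term, in log-space
      float m_new = std::max(m, a);
      s = s * std::exp(m - m_new) + std::exp(a - m_new);
      m = m_new;
    }
    out[i] = m + std::log(s);
  }
}
```
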
As evidenced here, the KeOps engine relies heavily on modern features of the C++ language: every time Genred encounters a new kind of generic operation (up to the values of \(\mathrm{M}\), \(\mathrm{N}\) and the data arrays, which are free to change between calls), the string that specifies the generic formula is parsed by the compiler and a new “.dll” or “.so” shared object is generated, loaded and executed on the relevant Python or Matlab tensors.
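
The workflow behind this just-in-time mechanism can be illustrated with a deliberately simplified, self-contained sketch: the file names and the trivial stand-in formula below are ours, whereas the real KeOps pipeline goes through the compilation scripts mentioned above and caches the resulting shared objects between calls.

```cpp
// jit_sketch.cpp -- a bare-bones illustration (our own code, not the KeOps
// build system) of the compile-then-load pattern described above: generate
// a source file for the requested formula, compile it into a shared object,
// then load it and call its entry point on the user's data.
// Build on Linux with:  c++ jit_sketch.cpp -o jit_sketch -ldl
#include <cstdio>
#include <cstdlib>
#include <dlfcn.h>

int main() {
  // 1. Emit a C++ source file for the requested formula. In KeOps, this is
  //    where the string coming from Python or Matlab enters the picture.
  std::FILE* src = std::fopen("formula.cpp", "w");
  std::fprintf(src, "extern \"C\" float eval(float x) { return x * x; }\n");
  std::fclose(src);

  // 2. Compile it into a shared object (".so" on Linux, ".dll" on Windows).
  if (std::system("c++ -O3 -shared -fPIC formula.cpp -o formula.so") != 0)
    return 1;

  // 3. Load the shared object and retrieve its entry point.
  void* handle = dlopen("./formula.so", RTLD_NOW);
  if (!handle) return 1;
  typedef float (*eval_t)(float);
  eval_t eval = (eval_t)dlsym(handle, "eval");

  // 4. Execute it on the caller's data; subsequent calls with the same
  //    formula can reuse the cached shared object.
  std::printf("eval(3) = %g\n", eval(3.f));
  dlclose(handle);
  return 0;
}
```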