Following the same design principles, \(\operatorname{Reduction}\) operators are implemented in the keops/core/reductions/*.h headers. Taking an arbitrary symbolic formula \(F\) as input, Reduction templates encode generic Map-Reduce schemes and must implement a few standard routines.

In the case of the simple Sum reduction (Sum_Reduction.h header), these can be described as:

  1. An InitializeReduction method, which fills up the running buffer \(a\) of our Map-Reduce algorithm – a vector of size F::DIM – with zeros before the start of the loop on the reduction index \(j\).

  2. A ReducePair method, which takes as input a pointer to the running buffer \(a\), a pointer to the result \(F_{i,j} = F(p^1,\dots,x^1_i,\dots,y^1_j,\dots)\) and implements the in-place reduction:

    \[\begin{aligned} a~\gets~a~+~F_{i,j}. \end{aligned}\]
  3. A FinalizeOutput method, which post-processes the buffer \(a\) before saving its value in the output array. This step is useful for argmin-like reductions, but in the case of the sum, no post-processing is needed.
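The three routines above can be sketched as follows in plain C++. This is an illustrative mock-up, not the actual KeOps template code: the struct and its signatures are hypothetical, with the buffer size passed as a template parameter in place of F::DIM.

```cpp
#include <cassert>

// Hypothetical sketch of the three standard routines of a Sum reduction.
// DIM stands in for F::DIM, the dimension of the formula's output.
template <int DIM>
struct SumReductionSketch {
    // 1. Fill the running buffer a with zeros before the j-loop.
    static void InitializeReduction(double* a) {
        for (int k = 0; k < DIM; k++) a[k] = 0.0;
    }
    // 2. In-place reduction: a <- a + F_ij.
    static void ReducePair(double* a, const double* F_ij) {
        for (int k = 0; k < DIM; k++) a[k] += F_ij[k];
    }
    // 3. No post-processing is needed for the sum: copy a to the output.
    static void FinalizeOutput(const double* a, double* out) {
        for (int k = 0; k < DIM; k++) out[k] = a[k];
    }
};
```

Calling InitializeReduction once, ReducePair for every value of the reduction index \(j\), and FinalizeOutput at the end reproduces the full Map-Reduce loop for one output index \(i\).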

The online Log-Sum-Exp trick

More interestingly, the Max_SumShiftExp_Reduction.h header implements an online version of the well-known Log-Sum-Exp trick: a factorization of the maximum in the computation of

\[\begin{aligned} \log \sum_{j=1}^\mathrm{N} \exp (F_{i,j}) ~=~ m_i~+~ \log \sum_{j=1}^\mathrm{N} \exp (F_{i,j}\,-\,m_i), ~~\text{with}~~ m_i~=~ \max_{j=1}^\mathrm{N} F_{i,j} \label{eq:logsumexp_trick}\end{aligned}\]

which allows us to compute this important quantity – the linchpin of maximum likelihood estimators and entropic Optimal Transport solvers – without numerical overflow.
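The benefit of the factorization is easy to check numerically. The sketch below (illustrative function names, not KeOps code) contrasts a naive log-sum-exp with the max-shifted variant: for inputs around \(1000\), the naive version overflows to \(+\infty\) in double precision, while the shifted one stays finite because every exponentiated term is at most \(e^0 = 1\).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Naive log-sum-exp: exp(f) overflows a double as soon as f > ~709.
double lse_naive(const std::vector<double>& F) {
    double s = 0.0;
    for (double f : F) s += std::exp(f);
    return std::log(s);
}

// Max-shifted log-sum-exp: the arguments f - m are all <= 0, so safe.
double lse_shifted(const std::vector<double>& F) {
    double m = *std::max_element(F.begin(), F.end());
    double s = 0.0;
    for (double f : F) s += std::exp(f - m);
    return m + std::log(s);
}
```

With `F = {1000, 1000}`, `lse_naive` returns \(+\infty\) whereas `lse_shifted` returns the exact value \(1000 + \log 2\).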

Merging the content of our C++ header with that of the Python post-processing step implemented in pykeops/common/, and assuming that \(F_{i,j} = F(p^1,\dots,x^1_i,\dots,y^1_j,\dots)\) is a scalar quantity, we may describe its behaviour as follows:

  1. The InitializeReduction method ensures that our running buffer \(a\) is a vector of size 2 that encodes the current value of the inner summation as an explicit (exponent, mantissa) or “(maximum, residual)” pair of float numbers: at any stage of the computation, the pair \((m,r)\) encodes the positive number \(e^{m}\cdot r\) with the required precision. We initially set the value of \(a\) to \((-\infty, 0)\simeq e^{-\infty}\cdot 0\).

  2. The ReducePair method takes as input a pointer to the result \(F_{i,j}\) of the computation, a pointer to the running buffer \(a = (m, r) \simeq e^m\cdot r\) and implements the in-place update:

    \[\begin{split}\begin{aligned} (m,r) ~\gets~ \begin{cases} \big( ~m~, ~\,r + \phantom{r\cdot{}} e^{F_{i,j} - m} \big) & \text{if}~ m \geqslant F_{i,j}\\ \big( F_{i,j},~ 1 + r \cdot e^{m - F_{i,j}} \big) & \text{otherwise.} \end{cases} \end{aligned}\end{split}\]

    This is a numerically stable way of writing the sum reduction:

    \[\begin{split}\begin{aligned} e^m \cdot r ~\gets~ e^m\cdot r \, +\, e^{F_{i,j}} ~=~ \begin{cases} ~e^m~\cdot(\,r+ \phantom{r\cdot{}} e^{F_{i,j}-m}) & \text{if}~ m \geqslant F_{i,j}\\ e^{F_{i,j}}\cdot (1 + r\cdot e^{m-F_{i,j}}) & \text{otherwise.} \end{cases} \end{aligned}\end{split}\]
  3. FinalizeOutput post-processes the buffer \(a = (m,r) \simeq e^{m}\cdot r\) by applying the final “\(\log\)” operation, returning a value of \(m+\log(r)\) for the full reduction.