PyTorch API
geomloss - Geometric Loss functions, with full support of PyTorch’s autograd engine.
- class geomloss.SamplesLoss(loss='sinkhorn', p=2, blur=0.05, reach=None, diameter=None, scaling=0.5, truncate=5, cost=None, kernel=None, cluster_scale=None, debias=True, potentials=False, verbose=False, backend='auto')
Creates a criterion that computes distances between sampled measures on a vector space.
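A minimal usage sketch (the shapes and blur value below are illustrative; when called with two arguments, the samples are weighted uniformly):

```python
import torch
from geomloss import SamplesLoss

# Two point clouds in R^3, with uniform weights:
x = torch.randn(1000, 3, requires_grad=True)
y = torch.randn(2000, 3)

# De-biased Sinkhorn divergence with a small entropic blur:
loss = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)
L = loss(x, y)  # differentiable scalar Tensor
L.backward()    # x.grad now holds the gradient of the divergence w.r.t. x
```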
Warning
If loss is "sinkhorn" and reach is None (balanced Optimal Transport), the resulting routine will expect measures whose total masses are equal to each other (a weighted call with normalized weights is sketched at the end of this section).

Parameters:
- loss (string, default="sinkhorn") – The loss function to compute. The supported values are:
  - "sinkhorn": (Un-biased) Sinkhorn divergence, which interpolates between Wasserstein (blur=0) and kernel (blur=\(+\infty\)) distances.
  - "hausdorff": Weighted Hausdorff distance, which interpolates between the ICP loss (blur=0) and a kernel distance (blur=\(+\infty\)).
  - "energy": Energy Distance MMD, computed using the kernel \(k(x,y) = -\|x-y\|_2\).
  - "gaussian": Gaussian MMD, computed using the kernel \(k(x,y) = \exp \big( -\|x-y\|_2^2 \,/\, 2\sigma^2 \big)\) of standard deviation \(\sigma\) = blur.
  - "laplacian": Laplacian MMD, computed using the kernel \(k(x,y) = \exp \big( -\|x-y\|_2 \,/\, \sigma \big)\) of standard deviation \(\sigma\) = blur.
- p (int, default=2) – If loss is "sinkhorn" or "hausdorff", specifies the ground cost function between points. The supported values are:
  - p = 1: \(C(x,y) = \|x-y\|_2\).
  - p = 2: \(C(x,y) = \tfrac{1}{2}\|x-y\|_2^2\).
- blur (float, default=.05) – The finest level of detail that should be handled by the loss function, in order to prevent overfitting on the samples’ locations.
  - If loss is "gaussian" or "laplacian", it is the standard deviation \(\sigma\) of the convolution kernel.
  - If loss is "sinkhorn" or "hausdorff", it is the typical scale \(\sigma\) associated to the temperature \(\varepsilon = \sigma^p\). The default value of .05 is sensible for input measures that lie in the unit square/cube.

  Note that the Energy Distance is scale-equivariant, and won’t be affected by this parameter.
- reach (float, default=None = \(+\infty\)) – If loss is "sinkhorn" or "hausdorff", specifies the typical scale \(\tau\) associated to the constraint strength \(\rho = \tau^p\).
- diameter (float, default=None) – A rough indication of the maximum distance between points, which is used to tune the \(\varepsilon\)-scaling descent and provide a default heuristic for clustering multiscale schemes. If None, a conservative estimate will be computed on-the-fly.
- scaling (float, default=.5) – If loss is "sinkhorn", specifies the ratio between successive values of \(\sigma = \varepsilon^{1/p}\) in the \(\varepsilon\)-scaling descent. This parameter allows you to specify the trade-off between speed (scaling < .4) and accuracy (scaling > .9).
- truncate (float, default=None = \(+\infty\)) – If backend is "multiscale", specifies the effective support of a Gaussian/Laplacian kernel as a multiple of its standard deviation. If truncate is not None, kernel truncation steps will assume that \(\exp(-x/\sigma)\) or \(\exp(-x^2/2\sigma^2)\) are zero when \(|x| \,>\, \text{truncate} \cdot \sigma\).
- cost (function or string, default=None) – If loss is "sinkhorn" or "hausdorff", specifies the cost function that should be used instead of \(\tfrac{1}{p}\|x-y\|^p\):
  - If backend is "tensorized", cost should be a python function that takes as input a (B,N,D) torch Tensor x and a (B,M,D) torch Tensor y, and returns a batched cost matrix as a (B,N,M) Tensor (a sketch is given after this parameter list).
  - Otherwise, if backend is "online" or "multiscale", cost should be a KeOps formula, given as a string, with variables X and Y. The default values are "Norm2(X-Y)" (for p = 1) and "(SqDist(X,Y) / IntCst(2))" (for p = 2).
- cluster_scale (float, default=None) – If backend is "multiscale", specifies the coarse scale at which cluster centroids will be computed. If None, a conservative estimate will be computed from diameter and the ambient space’s dimension, making sure that memory overflows won’t take place.
- debias (bool, default=True) – If loss is "sinkhorn", specifies if we should compute the unbiased Sinkhorn divergence instead of the classic, entropy-regularized “SoftAssign” loss.
- potentials (bool, default=False) – When this parameter is set to True, the SamplesLoss layer returns a pair of optimal dual potentials \(F\) and \(G\), sampled on the input measures, instead of a differentiable scalar value. These dual vectors \((F(x_i))\) and \((G(y_j))\) are encoded as Torch tensors, with the same shape as the input weights \((\alpha_i)\) and \((\beta_j)\) (a sketch is given after this parameter list).
- verbose (bool, default=False) – If backend is "multiscale", specifies whether information on the clustering and \(\varepsilon\)-scaling descent should be displayed in the standard output.
- backend (string, default="auto") – The implementation that will be used in the background; this choice has a major impact on performance. The supported values are:
  - "auto": Choose automatically, using a simple heuristic based on the inputs’ shapes.
  - "tensorized": Relies on a full cost/kernel matrix, computed once and for all and stored on the device memory. This method is fast, but has a quadratic memory footprint and does not scale beyond ~5,000 samples per measure.
  - "online": Computes cost/kernel values on-the-fly, leveraging online map-reduce CUDA routines provided by the pykeops library.
  - "multiscale": Fast implementation that scales to millions of samples in dimension 1-2-3, relying on the block-sparse reductions provided by the pykeops library.