PyTorch API
geomloss - Geometric Loss functions, with full support of PyTorch’s autograd engine.
- class geomloss.SamplesLoss(loss='sinkhorn', p=2, blur=0.05, reach=None, diameter=None, scaling=0.5, truncate=5, cost=None, kernel=None, cluster_scale=None, debias=True, potentials=False, verbose=False, backend='auto')
Creates a criterion that computes distances between sampled measures on a vector space.
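A minimal usage sketch (the shapes and blur value below are illustrative; when called with two arguments, the samples are weighted uniformly):

```python
import torch
from geomloss import SamplesLoss

# Two point clouds in R^3, with uniform weights:
x = torch.randn(1000, 3, requires_grad=True)
y = torch.randn(2000, 3)

# De-biased Sinkhorn divergence with a small entropic blur:
loss = SamplesLoss(loss="sinkhorn", p=2, blur=0.05)
L = loss(x, y)  # differentiable scalar Tensor
L.backward()    # x.grad now holds the gradient of the divergence w.r.t. x
```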
Warning
If loss is "sinkhorn" and reach is None (balanced Optimal Transport), the resulting routine will expect measures whose total masses are equal to each other (a weighted call with normalized weights is sketched at the end of this section).

Parameters:
- loss (string, default="sinkhorn") – The loss function to compute. The supported values are:
  - "sinkhorn": (Un-biased) Sinkhorn divergence, which interpolates between Wasserstein (blur=0) and kernel (blur=\(+\infty\)) distances.
  - "hausdorff": Weighted Hausdorff distance, which interpolates between the ICP loss (blur=0) and a kernel distance (blur=\(+\infty\)).
  - "energy": Energy Distance MMD, computed using the kernel \(k(x,y) = -\|x-y\|_2\).
  - "gaussian": Gaussian MMD, computed using the kernel \(k(x,y) = \exp \big( -\|x-y\|_2^2 \,/\, 2\sigma^2 \big)\) of standard deviation \(\sigma\) = blur.
  - "laplacian": Laplacian MMD, computed using the kernel \(k(x,y) = \exp \big( -\|x-y\|_2 \,/\, \sigma \big)\) of standard deviation \(\sigma\) = blur.
- p (int, default=2) – If loss is "sinkhorn" or "hausdorff", specifies the ground cost function between points. The supported values are:
  - p = 1: \(C(x,y) = \|x-y\|_2\).
  - p = 2: \(C(x,y) = \tfrac{1}{2}\|x-y\|_2^2\).
- blur (float, default=.05) – The finest level of detail that should be handled by the loss function, in order to prevent overfitting on the samples’ locations.
  - If loss is "gaussian" or "laplacian", it is the standard deviation \(\sigma\) of the convolution kernel.
  - If loss is "sinkhorn" or "hausdorff", it is the typical scale \(\sigma\) associated to the temperature \(\varepsilon = \sigma^p\). The default value of .05 is sensible for input measures that lie in the unit square/cube.

  Note that the Energy Distance is scale-equivariant, and won’t be affected by this parameter.
- reach (float, default=None = \(+\infty\)) – If loss is "sinkhorn" or "hausdorff", specifies the typical scale \(\tau\) associated to the constraint strength \(\rho = \tau^p\).
- diameter (float, default=None) – A rough indication of the maximum distance between points, which is used to tune the \(\varepsilon\)-scaling descent and provide a default heuristic for clustering multiscale schemes. If None, a conservative estimate will be computed on-the-fly.
- scaling (float, default=.5) – If loss is "sinkhorn", specifies the ratio between successive values of \(\sigma = \varepsilon^{1/p}\) in the \(\varepsilon\)-scaling descent. This parameter allows you to specify the trade-off between speed (scaling < .4) and accuracy (scaling > .9).
- truncate (float, default=None = \(+\infty\)) – If backend is "multiscale", specifies the effective support of a Gaussian/Laplacian kernel as a multiple of its standard deviation. If truncate is not None, kernel truncation steps will assume that \(\exp(-x/\sigma)\) or \(\exp(-x^2/2\sigma^2)\) are zero when \(|x| \,>\, \text{truncate} \cdot \sigma\).
- cost (function or string, default=None) – If loss is "sinkhorn" or "hausdorff", specifies the cost function that should be used instead of \(\tfrac{1}{p}\|x-y\|^p\):
  - If backend is "tensorized", cost should be a python function that takes as input a (B,N,D) torch Tensor x and a (B,M,D) torch Tensor y, and returns a batched cost matrix as a (B,N,M) Tensor (a sketch is given after this parameter list).
  - Otherwise, if backend is "online" or "multiscale", cost should be a KeOps formula, given as a string, with variables X and Y. The default values are "Norm2(X-Y)" (for p = 1) and "(SqDist(X,Y) / IntCst(2))" (for p = 2).
- cluster_scale (float, default=None) – If backend is "multiscale", specifies the coarse scale at which cluster centroids will be computed. If None, a conservative estimate will be computed from diameter and the ambient space’s dimension, making sure that memory overflows won’t take place.
- debias (bool, default=True) – If loss is "sinkhorn", specifies if we should compute the unbiased Sinkhorn divergence instead of the classic, entropy-regularized “SoftAssign” loss.
- potentials (bool, default=False) – When this parameter is set to True, the SamplesLoss layer returns a pair of optimal dual potentials \(F\) and \(G\), sampled on the input measures, instead of a differentiable scalar value. These dual vectors \((F(x_i))\) and \((G(y_j))\) are encoded as Torch tensors, with the same shape as the input weights \((\alpha_i)\) and \((\beta_j)\) (a sketch is given after this parameter list).
- verbose (bool, default=False) – If backend is "multiscale", specifies whether information on the clustering and \(\varepsilon\)-scaling descent should be displayed in the standard output.
- backend (string, default="auto") – The implementation that will be used in the background; this choice has a major impact on performance. The supported values are:
  - "auto": Choose automatically, using a simple heuristic based on the inputs’ shapes.
  - "tensorized": Relies on a full cost/kernel matrix, computed once and for all and stored on the device memory. This method is fast, but has a quadratic memory footprint and does not scale beyond ~5,000 samples per measure.
  - "online": Computes cost/kernel values on-the-fly, leveraging online map-reduce CUDA routines provided by the pykeops library.
  - "multiscale": Fast implementation that scales to millions of samples in dimension 1-2-3, relying on the block-sparse reductions provided by the pykeops library.