Gradient flows in 2D
Let’s showcase the properties of kernel MMDs, Hausdorff and Sinkhorn divergences on a simple toy problem: the registration of one blob onto another.
Setup
import numpy as np
import matplotlib.pyplot as plt
import time
import torch
from geomloss import SamplesLoss
use_cuda = torch.cuda.is_available()
dtype = torch.cuda.FloatTensor if use_cuda else torch.FloatTensor
Display routine
import numpy as np
import torch
from random import choices
from imageio.v2 import imread  # v2 entry point, as advised by ImageIO's deprecation warning
from matplotlib import pyplot as plt
def load_image(fname):
    img = imread(fname, as_gray=True)  # Grayscale
    img = (img[::-1, :]) / 255.0
    return 1 - img
def draw_samples(fname, n, dtype=torch.FloatTensor):
    A = load_image(fname)
    xg, yg = np.meshgrid(
        np.linspace(0, 1, A.shape[0]),
        np.linspace(0, 1, A.shape[1]),
        indexing="xy",
    )

    grid = list(zip(xg.ravel(), yg.ravel()))
    dens = A.ravel() / A.sum()
    dots = np.array(choices(grid, dens, k=n))
    dots += (0.5 / A.shape[0]) * np.random.standard_normal(dots.shape)

    return torch.from_numpy(dots).type(dtype)
def display_samples(ax, x, color):
    x_ = x.detach().cpu().numpy()
    ax.scatter(x_[:, 0], x_[:, 1], 25 * 500 / len(x_), color, edgecolors="none")
Dataset
Our source and target samples are drawn from densities stored in simple PNG files and define discrete probability measures on the unit square:
N, M = (100, 100) if not use_cuda else (10000, 10000)
X_i = draw_samples("data/density_a.png", N, dtype)
Y_j = draw_samples("data/density_b.png", M, dtype)
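By default, SamplesLoss treats x_i and y_j as uniform measures, with each sample carrying a mass of 1/N (resp. 1/M). If you need explicit weights, the loss can also be called in its weighted form (α_i, x_i, β_j, y_j); a minimal sketch with uniform weights, not reused below:

# Uniform weight vectors: α_i = 1/N, β_j = 1/M.
a_i = torch.ones(N).type(dtype) / N
b_j = torch.ones(M).type(dtype) / M

# Weighted call, equivalent here to the unweighted loss(X_i, Y_j):
loss = SamplesLoss("sinkhorn", p=2, blur=0.05)
print(loss(a_i, X_i, b_j, Y_j).item())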
Wasserstein gradient flow
To study the influence of the loss function on this registration problem, we perform gradient descent on the positions x_i of the samples that make up our source measure, using a simple Euler scheme:
def gradient_flow(loss, lr=0.05):
    """Flows along the gradient of the cost function, using a simple Euler scheme.

    Parameters:
        loss ((x_i, y_j) -> torch float number):
            Real-valued loss function.
        lr (float, default = .05):
            Learning rate, i.e. time step.
    """

    # Parameters for the gradient descent
    Nsteps = int(5 / lr) + 1
    display_its = [int(t / lr) for t in [0, 0.25, 0.50, 1.0, 2.0, 5.0]]

    # Use colors to identify the particles
    colors = (10 * X_i[:, 0]).cos() * (10 * X_i[:, 1]).cos()
    colors = colors.detach().cpu().numpy()

    # Make sure that we won't modify the reference samples
    x_i, y_j = X_i.clone(), Y_j.clone()

    # We're going to perform gradient descent on Loss(α, β)
    # wrt. the positions x_i of the Dirac masses that make up α:
    x_i.requires_grad = True

    t_0 = time.time()
    plt.figure(figsize=(12, 8))
    k = 1
    for i in range(Nsteps):  # Euler scheme ===============
        # Compute cost and gradient
        L_αβ = loss(x_i, y_j)
        [g] = torch.autograd.grad(L_αβ, [x_i])

        if i in display_its:  # display
            ax = plt.subplot(2, 3, k)
            k = k + 1
            plt.set_cmap("hsv")
            plt.scatter([10], [10])  # shameless hack to prevent a slight change of axis...

            display_samples(ax, y_j, [(0.55, 0.55, 0.95)])
            display_samples(ax, x_i, colors)
            ax.set_title("t = {:1.2f}".format(lr * i))

            plt.axis([0, 1, 0, 1])
            plt.gca().set_aspect("equal", adjustable="box")
            plt.xticks([], [])
            plt.yticks([], [])
            plt.tight_layout()

        # In-place modification of the tensor's values
        x_i.data -= lr * len(x_i) * g

    plt.title(
        "t = {:1.2f}, elapsed time: {:.2f}s/it".format(
            lr * i, (time.time() - t_0) / Nsteps
        )
    )
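Note the len(x_i) factor in the update rule: since α gives a weight of 1/N to each sample, the gradient of Loss(α, β) with respect to a single position x_i carries a 1/N factor, and multiplying it by N turns the descent direction into a genuine velocity field that does not vanish as the sampling gets finer. A single Euler step, written in isolation with a hypothetical euler_step helper that mirrors the loop above:

def euler_step(loss, x, y, lr=0.05):
    # One step of x(t + lr) = x(t) - lr * N * ∇_x Loss(α_t, β).
    x = x.clone().requires_grad_(True)
    L_ab = loss(x, y)
    [g] = torch.autograd.grad(L_ab, [x])
    return (x - lr * len(x) * g).detach()

x_1 = euler_step(SamplesLoss("energy"), X_i, Y_j)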
Kernel norms, MMDs
Gaussian MMD
The smooth Gaussian kernel k(x, y) = exp(−‖x − y‖² / (2σ²)) is blind to details which are smaller than the blurring scale σ: with a large blur, its gradient stops being informative as soon as α and β match each other "up to the high frequencies".
gradient_flow(SamplesLoss("gaussian", blur=0.5))

On the other hand, if the radius σ of the kernel is too small, interactions become purely local: samples that lie more than a few σ away from the target receive an exponentially small gradient, and the flow stalls before reaching a full registration.
gradient_flow(SamplesLoss("gaussian", blur=0.1))

Laplacian MMD
The pointy exponential kernel k(x, y) = exp(−‖x − y‖ / σ) tends to provide a better fit: its heavier tails preserve some long-range attraction, and its non-smooth tip remains sensitive to fine details.
gradient_flow(SamplesLoss("laplacian", blur=0.1))

Energy Distance MMD
The scale-equivariant kernel k(x, y) = −‖x − y‖ induces the Energy Distance, a robust baseline that requires no blur parameter at all:
# sphinx_gallery_thumbnail_number = 4
gradient_flow(SamplesLoss("energy"))

Sinkhorn divergence
(Unbiased) Sinkhorn divergences have recently been introduced in the machine learning literature, and can be understood as modern iterations of the classic SoftAssign algorithm from economics and computer vision.
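The "unbiased" part is important: the raw entropic cost OT_ε(α, β) does not vanish when α = β, whereas the Sinkhorn divergence S_ε(α, β) = OT_ε(α, β) − ½ OT_ε(α, α) − ½ OT_ε(β, β) does, which makes it suitable as a loss. In geomloss, this correction is controlled by the debias flag (enabled by default); a quick sketch:

# Debiased (default) vs. raw entropic OT: only the former is ≈ 0 when α = β.
S_eps = SamplesLoss("sinkhorn", p=2, blur=0.05, debias=True)
OT_eps = SamplesLoss("sinkhorn", p=2, blur=0.05, debias=False)
print(S_eps(X_i, X_i).item(), OT_eps(X_i, X_i).item())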
Wasserstein-1 distance
When p = 1, the Sinkhorn divergence interpolates between the Energy Distance (when the blur parameter is large):
gradient_flow(SamplesLoss("sinkhorn", p=1, blur=1.0))

And the Earth-Mover's (Wasserstein-1) distance (when the blur parameter is small):
gradient_flow(SamplesLoss("sinkhorn", p=1, blur=0.01))

Wasserstein-2 distance
When p = 2, the Sinkhorn divergence interpolates between a degenerate kernel norm which only registers the means of both measures with each other (when the blur parameter is large):
gradient_flow(SamplesLoss("sinkhorn", p=2, blur=1.0))

And the quadratic, Wasserstein-2 Optimal Transport distance, which has been studied so extensively by mathematicians from the 80's onwards (when the blur parameter is small):
gradient_flow(SamplesLoss("sinkhorn", p=2, blur=0.01))

Introduced in 2016-2018, the unbalanced
setting (Gaussian-Hellinger, Wasserstein-Fisher-Rao, etc.)
provides a principled way of introducing a threshold
in Optimal Transport computations:
it allows you to introduce laziness in the transportation problem by replacing the raw distance fields ‖x − y‖ with costs that saturate beyond a given reach ρ, so that mass which lies too far from the target is simply left in place.
In real-life applications, this tunable parameter could allow you to be a little bit more robust to outliers!
gradient_flow(SamplesLoss("sinkhorn", p=2, blur=0.01, reach=0.3))

Total running time of the script: ( 0 minutes 20.557 seconds)