Welcome to Wasserstein

Features

The Wasserstein package computes Wasserstein distances and related quantities efficiently. It contains an implementation of the network simplex algorithm that originated in the LEMON graph library and was successively modified by Nicolas Bonneel, by the authors of the Python Optimal Transport (POT) library, and by Patrick Komiske for this package. The main code is written in C++, with a NumPy-based Python wrapper provided via SWIG.

To get started, check out the Python Binder Demo or the C++ Examples.

The following classes contain the main functionalities of Wasserstein:

  • EMD: Computes the Wasserstein distance between two distributions, including a possible penalty term. Can use either the built-in Euclidean ground distance (optionally raised to a power beta) or a custom ground distance.

  • PairwiseEMD: Computes pairwise Wasserstein distances between collections of distributions. Multi-threading support is provided via OpenMP.

  • CorrelationDimension: Computes the correlation dimension, a type of fractal dimension that estimates the dimensionality of the underlying data manifold on which the distributions live. It has been applied to CMS Open Data.
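
To make the three quantities above concrete, here is a minimal pure-Python sketch. It is not the package's API and does not use the network simplex algorithm: it assumes one-dimensional coordinates and normalizes both weight vectors, so it can use the closed-form 1D Wasserstein distance (the integral of the absolute difference of the two cumulative distributions) and needs no penalty term. The function names `wasserstein_1d`, `pairwise_distances`, and `correlation_dimension`, along with the toy data, are hypothetical and chosen only for illustration.

```python
import math


def wasserstein_1d(weights0, coords0, weights1, coords1):
    """1-Wasserstein distance between two weighted point sets on a line.

    Assumption: both weight vectors are normalized to unit total, so no
    penalty term for unequal totals is needed.
    """
    total0, total1 = sum(weights0), sum(weights1)
    # Signed mass steps: positive for distribution 0, negative for distribution 1.
    events = [(x, w / total0) for w, x in zip(weights0, coords0)]
    events += [(x, -w / total1) for w, x in zip(weights1, coords1)]
    events.sort()

    distance, cdf_gap = 0.0, 0.0
    for (x, step), (x_next, _) in zip(events, events[1:]):
        cdf_gap += step  # difference of the two CDFs on [x, x_next)
        distance += abs(cdf_gap) * (x_next - x)
    return distance


def pairwise_distances(distributions):
    """Symmetric matrix of distances between (weights, coords) pairs."""
    n = len(distributions)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_1d(*distributions[i], *distributions[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def correlation_dimension(matrix, q_low, q_high):
    """Slope of log(pair count) versus log(scale) between two scales.

    Both scales must contain at least one pair, otherwise log(0) is hit.
    """
    n = len(matrix)

    def pair_count(q):
        return sum(1 for i in range(n) for j in range(i + 1, n)
                   if matrix[i][j] < q)

    log_ratio = math.log(pair_count(q_high)) - math.log(pair_count(q_low))
    return log_ratio / (math.log(q_high) - math.log(q_low))


# Hypothetical toy data: unit-mass points evenly spaced along a line.
toy = [([1.0], [0.01 * k]) for k in range(200)]
toy_matrix = pairwise_distances(toy)
toy_dim = correlation_dimension(toy_matrix, q_low=0.105, q_high=0.405)
```

Because the toy distributions are spread along a line, `toy_dim` should come out close to 1. The package itself handles the general case (arbitrary dimension, custom ground distances, unnormalized weights with a penalty term), which this sketch deliberately leaves out.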

The current version is 1.1.0. Changes are summarized in the Release Notes. Using the most up-to-date version is recommended. As of version 0.2.0, tests have been written covering the majority of the code. The source code can be found on GitHub.

References

[1] N. Bonneel, M. van de Panne, S. Paris, W. Heidrich, Displacement interpolation using Lagrangian mass transport, ACM Trans. Graph. 30 (2011).

[2] P. T. Komiske, E. M. Metodiev, and J. Thaler, The Metric Space of Collider Events, Phys. Rev. Lett. 123 (2019) 041801 [1902.02346].

[3] P. T. Komiske, E. M. Metodiev, and J. Thaler, The Hidden Geometry of Particle Collisions, JHEP 07 (2020) 006 [2004.04159].

Wasserstein is licensed under the GNU General Public License v3. See the LICENSE file for detailed copyright information.