External EMD Handlers


Python

ExternalEMDHandler

Base class for all external EMD handlers. Cannot be directly instantiated. This takes care of thread safety (when used with PairwiseEMD) and tracks the number of calls to the handler. In Python, the floating point type is selected by using one of ExternalEMDHandlerFloat64 or ExternalEMDHandlerFloat32.
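The design described above (an abstract base that counts calls and guards shared state, with subclasses supplying the actual processing) can be illustrated with a pure-Python analogue. This is only a sketch of the pattern, not the library's implementation: the names SketchHandler, WeightedSum, and _handle are hypothetical, and counting one call per processed EMD value is an assumption.

```python
import threading
from abc import ABC, abstractmethod


class SketchHandler(ABC):
    """Hypothetical pure-Python analogue of an external EMD handler."""

    def __init__(self):
        self._lock = threading.Lock()  # guards shared state across threads
        self._num_calls = 0

    @abstractmethod
    def description(self):
        """Return a string describing the handler."""

    @abstractmethod
    def _handle(self, emd, weight):
        """Process one (emd, weight) pair; supplied by the subclass."""

    def __call__(self, emd, weight=1.0):
        # Assumption: each processed EMD value counts as one call.
        with self._lock:
            self._handle(emd, weight)
            self._num_calls += 1

    def num_calls(self):
        return self._num_calls


class WeightedSum(SketchHandler):
    """Toy subclass that accumulates the weighted sum of EMD values."""

    def __init__(self):
        super().__init__()
        self.total = 0.0

    def description(self):
        return "WeightedSum sketch handler"

    def _handle(self, emd, weight):
        self.total += emd * weight


handler = WeightedSum()
handler(2.0)              # unweighted EMD value
handler(3.0, weight=0.5)  # weighted EMD value
```

Like the real base class, SketchHandler cannot be instantiated directly because its abstract methods are unimplemented.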

evaluate

evaluate(*args)

Evaluates the ExternalEMDHandler on a collection of (weighted) EMD values.

Arguments

  • *args : one or two numpy.ndarray
    • args[0] should be an array of EMD values. args[1] is optional, but if present it should be the same length as args[0] and is the weight associated with that EMD value (typically the product of the event weights corresponding to that EMD).

evaluate_symmetric

evaluate_symmetric(emds, event_weights)

Evaluates the ExternalEMDHandler on a collection of weighted EMD values, where the weights are provided as event weights and the EMDs are provided as the upper-triangular part of a symmetric distance matrix for the events.

Arguments

  • emds : numpy.ndarray
    • EMD values between all pairs of events, such as those returned by raw_emds after calling the PairwiseEMD object on a single set of events. Should have length n*(n-1)/2 where n is the number of events.
  • event_weights : numpy.ndarray
    • Event weights for the events the EMDs were computed between. Should be length n. The weight associated to the EMD between events i and j is event_weights[i] * event_weights[j].
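The pairing of event weights with the flattened upper triangle can be sketched in plain Python. The helper symmetric_pair_weights is hypothetical, and the row-major ordering of pairs (0,1), (0,2), ..., (n-2,n-1) is an assumption about how the upper-triangular EMDs are laid out.

```python
def symmetric_pair_weights(event_weights):
    """Weight for each pair i < j, in assumed row-major upper-triangular order."""
    n = len(event_weights)
    return [event_weights[i] * event_weights[j]
            for i in range(n)
            for j in range(i + 1, n)]


event_weights = [1.0, 2.0, 0.5, 4.0]
n = len(event_weights)

# One weight per unordered pair of events, so n*(n-1)/2 entries in total.
pair_weights = symmetric_pair_weights(event_weights)
```

The resulting list has the same length as the emds array expected by evaluate_symmetric, with one product of event weights per EMD value.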

__call__

__call__(emd, weight=1)

Evaluates the ExternalEMDHandler on a single EMD value, optionally weighted.

Arguments

  • emd : float
    • The EMD value to process.
  • weight : float
    • The (optional) weight associated to the EMD value.

num_calls

num_calls()

Returns

  • int
    • The number of times that the handler has been called.

description

description()

Returns

  • str
    • A string describing the handler.

Histogram1DHandler

Histograms the EMD values into a pre-determined histogram. Histogram1DHandler uses a linearly-spaced axis whereas Histogram1DHandlerLog uses a log-spaced axis. The underlying C++ class uses the Boost Histogram Package.

wasserstein.Histogram1DHandlerFloat64(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerFloat32(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerLogFloat64(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerLogFloat32(nbins, axis_min, axis_max)

The Float64 versions use double-precision and the Float32 versions use single-precision.

Arguments

  • nbins : int
    • The number of bins to create in the histogram.
  • axis_min : float
    • The lower bound of the axis.
  • axis_max : float
    • The upper bound of the axis.

hist_vals_errs

hist_vals_errs(overflows=True)

This accesses the histogram values and errors (which are the square root of the sum of the squared weights).

Arguments

  • overflows : bool
    • Whether or not to include the overflow bins as the first and last entry of the histogram contents and errors.

Returns

  • (numpy.ndarray, numpy.ndarray)
    • A pair of numpy arrays, the first is the values of the histogram and the second is the errors. If overflows is True then each will have length nbins+2, otherwise they will have length nbins.
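To make the relation between values, variances, and errors concrete, here is a minimal pure-Python sketch of a weighted histogram with underflow and overflow bins. The function weighted_hist is hypothetical and does not use the library; the half-open bin convention (values equal to axis_max go to the overflow bin) is an assumption.

```python
import math


def weighted_hist(values, weights, nbins, axis_min, axis_max):
    """Return (sums, sums of squared weights, errors), each of length nbins+2."""
    # Index 0 is the underflow bin, 1..nbins are regular bins, nbins+1 is overflow.
    sums = [0.0] * (nbins + 2)
    sumsq = [0.0] * (nbins + 2)
    width = (axis_max - axis_min) / nbins
    for v, w in zip(values, weights):
        if v < axis_min:
            idx = 0
        elif v >= axis_max:
            idx = nbins + 1
        else:
            idx = 1 + min(int((v - axis_min) / width), nbins - 1)
        sums[idx] += w        # histogram value: sum of weights
        sumsq[idx] += w * w   # variance: sum of squared weights
    errs = [math.sqrt(s) for s in sumsq]  # error: square root of the variance
    return sums, sumsq, errs


vals, variances, errs = weighted_hist(
    [0.5, 1.5, 1.7, 5.0], [1.0, 3.0, 4.0, 2.0],
    nbins=2, axis_min=0.0, axis_max=2.0)
```

Dropping the first and last entries of each list corresponds to passing overflows=False.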

hist_vals_vars

hist_vals_vars(overflows=True)

This accesses the histogram values and variances (the sum of the squared weights).

Arguments

  • overflows : bool
    • Whether or not to include the overflow bins as the first and last entry of the histogram contents and variances.

Returns

  • (numpy.ndarray, numpy.ndarray) (first version only)
    • A pair of numpy arrays, the first is the values of the histogram and the second is the variances. If overflows is True then each will have length nbins+2, otherwise they will have length nbins.

bin_centers

bin_centers()

Returns

  • numpy.ndarray
    • The centers of each of the bins as a nbins length numpy array. For the linearly-spaced axis this is the arithmetic mean of the bin edges and for the log-spaced axis this is the geometric mean of the bin edges.
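The two definitions of a bin center can be compared with a short pure-Python sketch. The helper functions below are hypothetical and only reproduce the arithmetic described above; they do not call the library.

```python
import math


def linear_edges(nbins, axis_min, axis_max):
    """nbins+1 linearly spaced edges."""
    step = (axis_max - axis_min) / nbins
    return [axis_min + i * step for i in range(nbins + 1)]


def log_edges(nbins, axis_min, axis_max):
    """nbins+1 log-spaced edges (axis_min must be positive)."""
    lo, hi = math.log(axis_min), math.log(axis_max)
    step = (hi - lo) / nbins
    return [math.exp(lo + i * step) for i in range(nbins + 1)]


def arithmetic_centers(edges):
    """Arithmetic mean of adjacent edges, as for the linear axis."""
    return [(a + b) / 2 for a, b in zip(edges, edges[1:])]


def geometric_centers(edges):
    """Geometric mean of adjacent edges, as for the log axis."""
    return [math.sqrt(a * b) for a, b in zip(edges, edges[1:])]


lin_centers = arithmetic_centers(linear_edges(4, 0.0, 2.0))
log_centers = geometric_centers(log_edges(2, 1.0, 100.0))
```

On a log axis the geometric mean sits at the midpoint of each bin in log space, which is why it is the natural center there.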

bin_edges

bin_edges()

Returns

  • numpy.ndarray
    • The bin edges as a nbins+1 length numpy array.

print_axis

print_axis()

Returns

  • str
    • A textual representation of the histogram axis.

print_hist

print_hist()

Returns

  • str
    • A textual representation of the histogram.

nbins

nbins()

Returns

  • int
    • The number of bins of the histogram axis.

axis_min

axis_min()

Returns

  • float
    • The lower bound of the histogram axis.

axis_max

axis_max()

Returns

  • float
    • The upper bound of the histogram axis.

CorrelationDimension

This class inherits from Histogram1DHandlerLog and can be used to compute the correlation dimension of the collection of EMDs.

wasserstein.CorrelationDimension(nbins, axis_min, axis_max, dtype='float64')

Arguments

The first three arguments are the same as those of Histogram1DHandlerLog. The dtype argument should be either 'float64' or 'float32' and selects the floating-point precision.

corrdims

corrdims(eps=1e-100)

Arguments

  • eps : float
    • The epsilon value to use to avoid dividing by zero or taking the log of zero.

Returns

  • (numpy.ndarray, numpy.ndarray)
    • A pair of numpy arrays, the first is the correlation dimension values and the second is the correlation dimension errors. Each of these will be length nbins-1 because a derivative was taken.
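The loss of one entry from taking a derivative can be seen in a finite-difference sketch of a correlation dimension, taken here as the slope of the log of the cumulative histogram with respect to the log of the distance scale. This is an assumed formula for illustration only, and the library's exact expression and error propagation may differ. The function corrdim_sketch and the variable names are hypothetical.

```python
import math


def corrdim_sketch(cumulative, scales, eps=1e-100):
    """Finite-difference log-log slope between adjacent bins (assumed formula)."""
    dims = []
    for i in range(len(cumulative) - 1):
        # eps guards against log(0) and division by zero.
        dlog_c = math.log(cumulative[i + 1] + eps) - math.log(cumulative[i] + eps)
        dlog_r = math.log(scales[i + 1]) - math.log(scales[i])
        dims.append(dlog_c / (dlog_r + eps))
    return dims


# Toy input: a cumulative count growing like r**3 has dimension 3 at every scale.
scales = [1.0, 2.0, 4.0, 8.0]
cumulative = [r ** 3 for r in scales]
dims = corrdim_sketch(cumulative, scales)
```

With four bins the sketch yields three slopes, matching the nbins-1 length stated above.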

corrdim_bins

corrdim_bins()

The EMD bins corresponding to the correlation dimension values and errors returned by corrdims.

Returns

  • numpy.ndarray
    • The distance scales of the correlation dimension values. This has length nbins-1 because a derivative was taken.

cumulative_vals_vars

cumulative_vals_vars()

Accesses the raw cumulative histogram of EMD values and their variances, excluding the overflow bins.

Returns

  • (numpy.ndarray, numpy.ndarray)
    • A pair of numpy arrays, the first is the cumulative EMD histogram values and the second is the variances of these bins. Each of these will be length nbins.

C++

ExternalEMDHandler

The functionality mirrors that of the Python wrapper. We list here the declarations of the methods of this class:

template<typename Value>
wasserstein::ExternalEMDHandler();

virtual std::string description() const = 0;
std::size_t num_calls() const;
void operator()(Value emd, Value weight = 1);
void evaluate(const std::vector<Value> & emds, const std::vector<Value> & weights = {});
void evaluate(const Value * emds, std::size_t num_emds, const Value * weights = nullptr);
void evaluate_symmetric(const std::vector<Value> & emds, const std::vector<Value> & weights);
void evaluate_symmetric(const Value * emds, std::size_t nev, const Value * weights);

protected:
  virtual void handle(Value emd, Value weight) = 0;

Derived classes should implement the description and handle methods.

Histogram1DHandler

The functionality mirrors that of the Python wrapper. We list here the declarations of the methods of this class:

template<class Transform, typename Value>
wasserstein::Histogram1DHandler(unsigned nbins, Value axis_min, Value axis_max);

unsigned nbins() const;
Value axis_min() const;
Value axis_max() const;
std::string description() const;

std::vector<Value> bin_centers() const;
std::vector<Value> bin_edges() const;
std::pair<std::vector<Value>, std::vector<Value>> hist_vals_vars(bool overflows = true) const;

// return textual representations of axis/hist
std::string print_axis() const;
std::string print_hist() const;

// access underlying boost histogram
auto & hist();
auto & axis();
const auto & hist() const;
const auto & axis() const;

CorrelationDimension

The functionality mirrors that of the Python wrapper. This class inherits from Histogram1DHandler<boost::histogram::axis::transform::log, Value>. We list here the declarations of the methods of this class:

template<typename Value>
wasserstein::CorrelationDimension(unsigned nbins, Value axis_min, Value axis_max);

std::pair<std::vector<Value>, std::vector<Value>>
corrdims(Value eps = std::numeric_limits<Value>::epsilon()) const;

std::vector<Value> corrdim_bins() const;
std::pair<std::vector<Value>, std::vector<Value>> cumulative_vals_vars() const;