External EMD Handlers
Python
ExternalEMDHandler
Base class for all external EMD handlers. Cannot be directly instantiated. This takes care of thread safety (when used with PairwiseEMD) and tracks the number of calls to the handler. In Python, the floating point type is selected by using one of ExternalEMDHandlerFloat64 or ExternalEMDHandlerFloat32.
evaluate
evaluate(*args)
Evaluates the ExternalEMDHandler on a collection of (weighted) EMD values.
Arguments
- *args : one or two numpy.ndarray
- args[0] should be an array of EMD values. args[1] is optional, but if present it should be the same length as args[0] and is the weight associated with that EMD value (typically the product of the event weights corresponding to that EMD).
evaluate_symmetric
evaluate_symmetric(emds, event_weights)
Evaluates the ExternalEMDHandler on a collection of weighted EMD values, where the weights are provided as event weights and the EMDs are provided as the upper-triangular part of a symmetric distance matrix for the events.
Arguments
- emds : numpy.ndarray
- EMD values between all pairs of events, such as those returned by raw_emds after calling the PairwiseEMD object on a single set of events. Should have length n*(n-1)/2 where n is the number of events.
- event_weights : numpy.ndarray
- Event weights for the events the EMDs were computed between. Should be length n. The weight associated to the EMD between events i and j is event_weights[i] * event_weights[j].
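The relationship between event weights and per-EMD weights can be illustrated with a short standalone sketch. It does not use the wasserstein package; the helper name symmetric_pair_weights is hypothetical, and the row-by-row ordering of the pairs (i < j) is an assumption made for illustration rather than something stated above.

```python
def symmetric_pair_weights(event_weights):
    """Return one weight per unordered pair of events (i < j).

    Assumption: pairs are ordered row by row, i.e.
    (0,1), (0,2), ..., (0,n-1), (1,2), ...
    """
    n = len(event_weights)
    return [event_weights[i] * event_weights[j]
            for i in range(n) for j in range(i + 1, n)]


event_weights = [0.5, 2.0, 1.0, 4.0]
pair_weights = symmetric_pair_weights(event_weights)

# One weight per pair, so the length is n*(n-1)/2.
n = len(event_weights)
print(len(pair_weights) == n * (n - 1) // 2)
print(pair_weights)
```

With four events there are six pairs, matching the n*(n-1)/2 length expected for emds.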
__call__
__call__(emd, weight=1)
Evaluates the ExternalEMDHandler on a single EMD value, optionally weighted.
Arguments
- emd : float
- The EMD value to process.
- weight : float
- The (optional) weight associated to the EMD value.
num_calls
num_calls()
Returns
- int
- The number of times that the handler has been called.
description
description()
Returns
- str
- A string describing the handler.
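The pattern described in this section (a base class that counts calls and guards access with a lock, with subclasses supplying the actual processing) can be sketched in plain Python. This is only an analogue for illustration: SketchHandler, WeightedSum, and _handle are hypothetical names, not classes or methods of the wasserstein package, and counting one call per EMD value is a choice made in this sketch.

```python
import threading
from abc import ABC, abstractmethod


class SketchHandler(ABC):
    """Hypothetical pure-Python analogue of an external EMD handler."""

    def __init__(self):
        self._lock = threading.Lock()
        self._num_calls = 0

    @abstractmethod
    def description(self):
        """Return a string describing the handler."""

    @abstractmethod
    def _handle(self, emd, weight):
        """Process a single weighted EMD value."""

    def __call__(self, emd, weight=1.0):
        # The lock serializes access when several threads share the handler.
        with self._lock:
            self._handle(emd, weight)
            self._num_calls += 1

    def evaluate(self, emds, weights=None):
        # Weights are optional; default to unit weight for every EMD.
        if weights is None:
            weights = [1.0] * len(emds)
        if len(weights) != len(emds):
            raise ValueError("emds and weights must have the same length")
        for emd, weight in zip(emds, weights):
            self(emd, weight)

    def num_calls(self):
        return self._num_calls


class WeightedSum(SketchHandler):
    """Example subclass: accumulates the sum of weight * emd."""

    def __init__(self):
        super().__init__()
        self.total = 0.0

    def description(self):
        return "WeightedSum: accumulates the sum of weight * emd"

    def _handle(self, emd, weight):
        self.total += weight * emd


handler = WeightedSum()
handler(2.0)                              # single EMD with default weight
handler.evaluate([1.0, 3.0], [0.5, 2.0])  # two weighted EMDs
print(handler.total, handler.num_calls())
```

The base class cannot be instantiated because its abstract methods are unimplemented, which mirrors the statement that ExternalEMDHandler cannot be directly instantiated.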
Histogram1DHandler
Histograms the EMD values into a pre-determined histogram. Histogram1DHandler uses a linearly-spaced axis whereas Histogram1DHandlerLog uses a log-spaced axis. The underlying C++ class uses the Boost Histogram Package.
wasserstein.Histogram1DHandlerFloat64(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerFloat32(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerLogFloat64(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerLogFloat32(nbins, axis_min, axis_max)
The Float64 versions use double-precision and the Float32 versions use single-precision.
Arguments
- nbins : int
- The number of bins to create in the histogram.
- axis_min : float
- The lower bound of the axis.
- axis_max : float
- The upper bound of the axis.
hist_vals_errs
hist_vals_errs(overflows=True)
This accesses the histogram values and errors (which are the square root of the sum of the squared weights).
Arguments
- overflows : bool
- Whether or not to include the overflow bins as the first and last entry of the histogram contents and errors.
Returns
- (numpy.ndarray, numpy.ndarray)
- A pair of numpy arrays, the first is the values of the histogram and the second is the errors. If overflows is True then each will have length nbins+2, otherwise they will have length nbins.
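To make the meaning of the values, errors, and overflow bins concrete, here is a minimal standalone sketch of a weighted histogram on a linearly-spaced axis. It does not call the wasserstein package or Boost Histogram; the function weighted_hist and its bin-assignment details are assumptions made for illustration.

```python
import math


def weighted_hist(emds, weights, nbins, axis_min, axis_max):
    """Return (values, variances), each of length nbins + 2.

    Index 0 is the underflow bin and index nbins + 1 is the overflow bin.
    """
    vals = [0.0] * (nbins + 2)
    variances = [0.0] * (nbins + 2)
    width = (axis_max - axis_min) / nbins
    for emd, w in zip(emds, weights):
        if emd < axis_min:
            idx = 0
        elif emd >= axis_max:
            idx = nbins + 1
        else:
            idx = 1 + min(int((emd - axis_min) / width), nbins - 1)
        vals[idx] += w           # bin value: sum of weights
        variances[idx] += w * w  # bin variance: sum of squared weights
    return vals, variances


vals, variances = weighted_hist(
    [0.1, 0.2, 0.9, 1.5], [1.0, 2.0, 3.0, 4.0],
    nbins=2, axis_min=0.0, axis_max=1.0)

# Errors are the square roots of the variances.
errs = [math.sqrt(v) for v in variances]

# Dropping the first and last entries corresponds to overflows=False.
vals_no_overflow = vals[1:-1]
print(vals, errs, vals_no_overflow)
```

Here the EMD of 1.5 lies above axis_max, so its weight lands in the final (overflow) entry.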
hist_vals_vars
hist_vals_vars(overflows=True)
This accesses the histogram values and variances (the sum of the squared weights).
Arguments
- overflows : bool
- Whether or not to include the overflow bins as the first and last entry of the histogram contents and variances.
Returns
- (numpy.ndarray, numpy.ndarray) (first version only)
- A pair of numpy arrays, the first is the values of the histogram and the second is the variances. If overflows is True then each will have length nbins+2, otherwise they will have length nbins.
bin_centers
bin_centers()
Returns
- numpy.ndarray
- The centers of each of the bins as a numpy array of length nbins. For the linearly-spaced axis this is the arithmetic mean of the bin edges and for the log-spaced axis this is the geometric mean of the bin edges.
bin_edges
bin_edges()
Returns
- numpy.ndarray
- The bin edges as a numpy array of length nbins+1.
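The difference between the two axis types can be seen in a small standalone sketch that builds nbins+1 edges and nbins centers by hand. The helper functions below are hypothetical and written only for illustration; they are not part of the wasserstein package.

```python
import math


def linear_edges(nbins, axis_min, axis_max):
    """nbins + 1 edges, equally spaced in the axis variable."""
    step = (axis_max - axis_min) / nbins
    return [axis_min + i * step for i in range(nbins + 1)]


def log_edges(nbins, axis_min, axis_max):
    """nbins + 1 edges, equally spaced in the log of the axis variable."""
    lo, hi = math.log(axis_min), math.log(axis_max)
    step = (hi - lo) / nbins
    return [math.exp(lo + i * step) for i in range(nbins + 1)]


def arithmetic_centers(edges):
    """Centers for a linear axis: arithmetic mean of adjacent edges."""
    return [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]


def geometric_centers(edges):
    """Centers for a log axis: geometric mean of adjacent edges."""
    return [math.sqrt(a * b) for a, b in zip(edges[:-1], edges[1:])]


lin = linear_edges(4, 0.0, 2.0)
lin_centers = arithmetic_centers(lin)

log = log_edges(3, 1.0, 1000.0)
log_centers = geometric_centers(log)

print(lin, lin_centers)
print(log, log_centers)
```

The geometric mean places each center at the midpoint of the bin in log space, which is why it is the natural choice for a log-spaced axis. Note that a log-spaced axis requires a strictly positive lower bound.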
print_axis
print_axis()
Returns
- str
- A textual representation of the histogram axis.
print_hist
print_hist()
Returns
- str
- A textual representation of the histogram.
nbins
nbins()
Returns
- int
- The number of bins of the histogram axis.
axis_min
axis_min()
Returns
- float
- The lower bound of the histogram axis.
axis_max
axis_max()
Returns
- float
- The upper bound of the histogram axis.
CorrelationDimension
This class inherits from Histogram1DHandlerLog and can be used to compute the correlation dimension of the collection of EMDs.
wasserstein.CorrelationDimension(nbins, axis_min, axis_max, dtype='float64')
Arguments
The first three arguments are the same as those of Histogram1DHandlerLog. The dtype argument should be either 'float64' or 'float32' and selects the floating-point precision.
corrdims
corrdims(eps=1e-100)
Arguments
- eps : float
- The epsilon value to use to avoid dividing by zero or taking the log of zero.
Returns
- (numpy.ndarray, numpy.ndarray)
- A pair of numpy arrays, the first is the correlation dimension values and the second is the correlation dimension errors. Each of these will be length nbins-1 because a derivative was taken.
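As a rough illustration of why the result has length nbins-1, the correlation dimension can be viewed as the logarithmic derivative of the cumulative EMD histogram with respect to the distance scale, approximated by finite differences between adjacent bins. The sketch below is a simplified standalone version under that assumption: corrdim_sketch is a hypothetical function, it omits the error propagation, and it is not necessarily how the library computes its values.

```python
import math


def corrdim_sketch(scales, cumulative, eps=1e-100):
    """Finite-difference estimate of d ln N(Q) / d ln Q.

    scales     : distance scales Q_k (positive, increasing)
    cumulative : cumulative histogram values N(Q_k)
    eps        : small offset that avoids taking the log of zero
    """
    dims = []
    for k in range(len(scales) - 1):
        dlog_n = (math.log(cumulative[k + 1] + eps)
                  - math.log(cumulative[k] + eps))
        dlog_q = math.log(scales[k + 1]) - math.log(scales[k])
        dims.append(dlog_n / dlog_q)
    return dims


# A cumulative distribution that scales as Q**2 has dimension 2 everywhere.
scales = [1.0, 2.0, 4.0, 8.0]
cumulative = [q ** 2 for q in scales]
dims = corrdim_sketch(scales, cumulative)
print(dims)
```

Four input bins yield three dimension values, one per adjacent pair, and the eps offset keeps the logarithm finite when a cumulative bin is empty.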
corrdim_bins
corrdim_bins()
The EMD bins corresponding to the correlation dimension values and errors returned by corrdims.
Returns
- numpy.ndarray
- The distance scales of the correlation dimension values. This has length nbins-1 because a derivative was taken.
cumulative_vals_vars
cumulative_vals_vars()
Accesses the raw cumulative histogram of EMD values and their variances, excluding the overflow bins.
Returns
- (numpy.ndarray, numpy.ndarray)
- A pair of numpy arrays, the first is the cumulative EMD histogram values and the second is the variances of these bins. Each of these will be length nbins.
C++
ExternalEMDHandler
The functionality mirrors that of the Python wrapper. We list here the declarations of the methods of this class:
template<typename Value>
wasserstein::ExternalEMDHandler();
virtual std::string description() const = 0;
std::size_t num_calls() const;
void operator()(Value emd, Value weight = 1);
void evaluate(const std::vector<Value> & emds, const std::vector<Value> & weights = {});
void evaluate(const Value * emds, std::size_t num_emds, const Value * weights = nullptr);
void evaluate_symmetric(const std::vector<Value> & emds, const std::vector<Value> & weights);
void evaluate_symmetric(const Value * emds, std::size_t nev, const Value * weights);
protected:
virtual void handle(Value emd, Value weight) = 0;
Derived classes should implement the description and handle methods, which are declared pure virtual in the base class.
Histogram1DHandler
The functionality mirrors that of the Python wrapper. We list here the declarations of the methods of this class:
template<class Transform, typename Value>
wasserstein::Histogram1DHandler(unsigned nbins, Value axis_min, Value axis_max);
unsigned nbins() const;
Value axis_min() const;
Value axis_max() const;
std::string description() const;
std::vector<Value> bin_centers() const;
std::vector<Value> bin_edges() const;
std::pair<std::vector<Value>, std::vector<Value>> hist_vals_vars(bool overflows = true) const;
// return textual representations of axis/hist
std::string print_axis() const;
std::string print_hist() const;
// access underlying boost histogram
auto & hist();
auto & axis();
const auto & hist() const;
const auto & axis() const;
CorrelationDimension
The functionality mirrors that of the Python wrapper. This class inherits from Histogram1DHandler<boost::histogram::axis::transform::log, Value>. We list here the declarations of the methods of this class:
template<typename Value>
wasserstein::CorrelationDimension(unsigned nbins, Value axis_min, Value axis_max);
std::pair<std::vector<Value>, std::vector<Value>>
corrdims(Value eps = std::numeric_limits<Value>::epsilon()) const;
std::vector<Value> corrdim_bins() const;
std::pair<std::vector<Value>, std::vector<Value>> cumulative_vals_vars() const;