External EMD Handlers
Python
ExternalEMDHandler
Base class for all external EMD handlers; it cannot be instantiated directly. It takes care of thread safety (when used with PairwiseEMD) and tracks the number of calls to the handler. In Python, the floating-point type is selected by using either ExternalEMDHandlerFloat64 or ExternalEMDHandlerFloat32.
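The base-class design described above combines a lock around each call, a call counter, and an abstract per-EMD hook. It can be illustrated with a pure-Python sketch. This is a hypothetical analog of the pattern, not the library's implementation (the real classes are C++). The names `ToyEMDHandler`, `SumHandler`, and `handle` are assumptions modeled loosely on the C++ declarations listed later on this page.

```python
import threading
from abc import ABC, abstractmethod


class ToyEMDHandler(ABC):
    """Hypothetical pure-Python analog of an external EMD handler base class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._num_calls = 0

    def __call__(self, emd, weight=1.0):
        # Serialize access so concurrent callers cannot interleave updates.
        with self._lock:
            self._num_calls += 1
            self.handle(emd, weight)

    def num_calls(self):
        return self._num_calls

    @abstractmethod
    def description(self):
        """Return a string describing the handler."""

    @abstractmethod
    def handle(self, emd, weight):
        """Process one (weighted) EMD value; implemented by subclasses."""


class SumHandler(ToyEMDHandler):
    """Hypothetical handler that accumulates the weighted sum of EMDs."""

    def __init__(self):
        super().__init__()
        self.total = 0.0

    def description(self):
        return "SumHandler (hypothetical)"

    def handle(self, emd, weight):
        self.total += emd * weight


handler = SumHandler()


def worker():
    for _ in range(1000):
        handler(0.5, 2.0)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(handler.num_calls(), handler.total)  # 4000 4000.0
```

Because the abstract methods are declared with `abc`, `ToyEMDHandler()` itself raises `TypeError`. This mirrors the "cannot be instantiated directly" rule.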
evaluate
evaluate(*args)
Evaluates the ExternalEMDHandler on a collection of (weighted) EMD values.
Arguments
- *args : one or two numpy.ndarray
- args[0] should be an array of EMD values. args[1] is optional, but if present it should have the same length as args[0] and holds the weight associated with each EMD value (typically the product of the event weights corresponding to that EMD).
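The hypothetical helper below gives a rough illustration of the weighting convention for evaluate. It feeds each EMD to a handler callable together with its weight, and uses a weight of 1 when no weights are given. Plain Python lists stand in for numpy arrays. The names `evaluate_like` and `WeightedMean` are invented for this example and are not part of the library.

```python
class WeightedMean:
    """Hypothetical handler computing the weighted mean of the EMDs it sees."""

    def __init__(self):
        self.sum_w = 0.0
        self.sum_wx = 0.0

    def __call__(self, emd, weight=1.0):
        self.sum_w += weight
        self.sum_wx += weight * emd

    def mean(self):
        return self.sum_wx / self.sum_w


def evaluate_like(handler, emds, weights=None):
    """Mimic evaluate(*args): one array of EMDs, optionally one of weights."""
    if weights is None:
        weights = [1.0] * len(emds)
    if len(weights) != len(emds):
        raise ValueError("emds and weights must have the same length")
    for emd, w in zip(emds, weights):
        handler(emd, w)


# Weights are typically products of the two event weights behind each EMD.
h = WeightedMean()
evaluate_like(h, [1.0, 2.0, 4.0], [0.5, 0.25, 0.25])
print(h.mean())  # 0.5*1 + 0.25*2 + 0.25*4 = 2.0
```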
evaluate_symmetric
evaluate_symmetric(emds, event_weights)
Evaluates the ExternalEMDHandler on a collection of weighted EMD values, where the weights are provided as event weights and the EMDs are provided as the upper-triangular part of a symmetric distance matrix for the events.
Arguments
- emds : numpy.ndarray
- EMD values between all pairs of events, such as those returned by raw_emds after calling the PairwiseEMD object on a single set of events. Should have length n*(n-1)/2, where n is the number of events.
- event_weights : numpy.ndarray
- Event weights for the events the EMDs were computed between. Should have length n. The weight associated with the EMD between events i and j is event_weights[i] * event_weights[j].
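This pure-Python sketch makes the layout concrete. It expands a condensed upper-triangular array of length n*(n-1)/2 into (emd, weight) pairs with weight event_weights[i] * event_weights[j]. It assumes the pairs are stored row by row (i < j, with j varying fastest), the same convention as SciPy's condensed distance matrices. That ordering is an assumption here, so check it against the output of PairwiseEMD before relying on it. The helper name `symmetric_pairs` is hypothetical.

```python
def symmetric_pairs(emds, event_weights):
    """Yield (i, j, emd, weight) for an assumed row-major upper-triangular array."""
    n = len(event_weights)
    if len(emds) != n * (n - 1) // 2:
        raise ValueError("emds must have length n*(n-1)/2")
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            yield i, j, emds[k], event_weights[i] * event_weights[j]
            k += 1


# Three events -> three pairs: (0, 1), (0, 2), (1, 2).
pairs = list(symmetric_pairs([0.1, 0.2, 0.3], [1.0, 2.0, 3.0]))
for p in pairs:
    print(p)
```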
__call__
__call__(emd, weight=1)
Evaluates the ExternalEMDHandler
on a single EMD value, optionally weighted.
Arguments
- emd : float
- The EMD value to process.
- weight : float
- The optional weight associated with the EMD value.
num_calls
num_calls()
Returns
- int
- The number of times that the handler has been called.
description
description()
Returns
- str
- A string describing the handler.
Histogram1DHandler
Histograms the EMD values into a predetermined histogram. Histogram1DHandler uses a linearly spaced axis, whereas Histogram1DHandlerLog uses a log-spaced axis. The underlying C++ class uses the Boost.Histogram library.
wasserstein.Histogram1DHandlerFloat64(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerFloat32(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerLogFloat64(nbins, axis_min, axis_max)
wasserstein.Histogram1DHandlerLogFloat32(nbins, axis_min, axis_max)
The Float64 versions use double precision and the Float32 versions use single precision.
Arguments
- nbins : int
- The number of bins to create in the histogram.
- axis_min : float
- The lower bound of the axis.
- axis_max : float
- The upper bound of the axis.
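Computing bin edges directly shows the difference between the two axis types. This is only an illustration of linear versus logarithmic spacing. It assumes the log axis is uniform in log(EMD) between axis_min and axis_max. The actual edges come from Boost.Histogram and should be read back with bin_edges(). The function names below are hypothetical.

```python
import math


def linear_edges(nbins, axis_min, axis_max):
    """nbins+1 edges, uniformly spaced in the EMD value."""
    width = (axis_max - axis_min) / nbins
    return [axis_min + k * width for k in range(nbins + 1)]


def log_edges(nbins, axis_min, axis_max):
    """nbins+1 edges, uniformly spaced in log(EMD); requires axis_min > 0."""
    lo, hi = math.log(axis_min), math.log(axis_max)
    step = (hi - lo) / nbins
    return [math.exp(lo + k * step) for k in range(nbins + 1)]


print(linear_edges(4, 0.0, 1.0))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(log_edges(3, 0.01, 10.0))   # approximately [0.01, 0.1, 1.0, 10.0]
```

A log-spaced axis needs a strictly positive axis_min, since log(0) is undefined.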
hist_vals_errs
hist_vals_errs(overflows=True)
Accesses the histogram values and errors (the square root of the sum of the squared weights in each bin).
Arguments
- overflows : bool
- Whether or not to include the overflow bins as the first and last entry of the histogram contents and errors.
Returns
- (numpy.ndarray, numpy.ndarray)
- A pair of numpy arrays: the first holds the histogram values and the second the errors. If overflows is True, each has length nbins+2; otherwise each has length nbins.
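This minimal pure-Python sketch approximates what hist_vals_errs reports for a linear axis. Each bin accumulates the sum of weights and the sum of squared weights, and the error is the square root of the latter. Values below axis_min, or at or above axis_max, land in the underflow and overflow entries. Those entries are placed first and last when overflows is True. The half-open bin convention [low, high) and the helper name `fill_linear` are assumptions for illustration.

```python
import math


def fill_linear(emds, weights, nbins, axis_min, axis_max, overflows=True):
    """Return (values, errors) for a linear-axis weighted histogram."""
    sums = [0.0] * (nbins + 2)   # index 0 = underflow, nbins+1 = overflow
    sums2 = [0.0] * (nbins + 2)
    width = (axis_max - axis_min) / nbins
    for x, w in zip(emds, weights):
        if x < axis_min:
            idx = 0
        elif x >= axis_max:
            idx = nbins + 1
        else:
            idx = 1 + min(int((x - axis_min) / width), nbins - 1)
        sums[idx] += w
        sums2[idx] += w * w
    errs = [math.sqrt(s2) for s2 in sums2]
    if not overflows:
        return sums[1:-1], errs[1:-1]
    return sums, errs


vals, errs = fill_linear([-1.0, 0.1, 0.2, 0.7, 5.0], [1.0, 3.0, 4.0, 1.0, 2.0],
                         nbins=2, axis_min=0.0, axis_max=1.0)
print(vals)  # [1.0, 7.0, 1.0, 2.0]
print(errs)  # [1.0, 5.0, 1.0, 2.0]
```

In the first in-range bin the weights 3 and 4 give a value of 7 and an error of sqrt(9 + 16) = 5.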
hist_vals_vars
hist_vals_vars(overflows=True)
Accesses the histogram values and variances (the sum of the squared weights in each bin).
Arguments
- overflows : bool
- Whether or not to include the overflow bins as the first and last entries of the histogram contents and variances.
Returns
- (numpy.ndarray, numpy.ndarray)
- A pair of numpy arrays: the first holds the histogram values and the second the variances. If overflows is True, each has length nbins+2; otherwise each has length nbins.
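Two facts tie the variance and error accessors together. The errors are the square roots of the variances, and overflows=False omits the first and last entries. The sketch below shows that relationship. Plain lists stand in for the returned numpy arrays, and `vals_vars_to_errs` is a hypothetical helper.

```python
import math


def vals_vars_to_errs(vals, variances, drop_overflows=False):
    """Convert (values, variances) into (values, errors)."""
    errs = [math.sqrt(v) for v in variances]
    if drop_overflows:
        # Overflow entries are the first and last elements when present.
        return vals[1:-1], errs[1:-1]
    return list(vals), errs


vals, errs = vals_vars_to_errs([1.0, 7.0, 1.0, 2.0], [1.0, 25.0, 1.0, 4.0],
                               drop_overflows=True)
print(vals, errs)  # [7.0, 1.0] [5.0, 1.0]
```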
bin_centers
bin_centers()
Returns
- numpy.ndarray
- The centers of the bins as a numpy array of length nbins. For the linearly spaced axis this is the arithmetic mean of the bin edges, and for the log-spaced axis it is the geometric mean of the bin edges.
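The two definitions of a bin center can be computed directly from a list of edges. The helper below is a hypothetical illustration, not a library function.

```python
import math


def bin_centers(edges, log_axis=False):
    """Arithmetic-mean centers for a linear axis, geometric-mean for a log axis."""
    pairs = zip(edges[:-1], edges[1:])
    if log_axis:
        return [math.sqrt(lo * hi) for lo, hi in pairs]
    return [0.5 * (lo + hi) for lo, hi in pairs]


print(bin_centers([0.0, 0.5, 1.0]))                  # [0.25, 0.75]
print(bin_centers([0.01, 0.1, 1.0], log_axis=True))  # approximately [0.0316, 0.316]
```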
bin_edges
bin_edges()
Returns
- numpy.ndarray
- The bin edges as a numpy array of length nbins+1.
print_axis
print_axis()
Returns
- str
- A textual representation of the histogram axis.
print_hist
print_hist()
Returns
- str
- A textual representation of the histogram.
nbins
nbins()
Returns
- int
- The number of bins of the histogram axis.
axis_min
axis_min()
Returns
- float
- The lower bound of the histogram axis.
axis_max
axis_max()
Returns
- float
- The upper bound of the histogram axis.
CorrelationDimension
This class inherits from Histogram1DHandlerLog and can be used to compute the correlation dimension of the collection of EMDs.
wasserstein.CorrelationDimension(nbins, axis_min, axis_max, dtype='float64')
Arguments
The first three arguments are the same as those of Histogram1DHandlerLog. The dtype argument should be either 'float64' or 'float32' and selects the floating-point precision.
corrdims
corrdims(eps=1e-100)
Arguments
- eps : float
- The epsilon value to use to avoid dividing by zero or taking the log of zero.
Returns
- (numpy.ndarray, numpy.ndarray)
- A pair of numpy arrays: the first holds the correlation dimension values and the second their errors. Each has length nbins-1 because a derivative was taken.
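The correlation dimension is the logarithmic slope d ln C(r) / d ln r of the cumulative EMD distribution. Taking finite differences between neighboring scales is why nbins-1 values come out. The sketch below shows the idea under stated assumptions. It uses a plain finite difference in log space, places each value at the geometric midpoint of adjacent scales, and omits error propagation. The library's exact choice of scales and its error formula may differ, and `corrdim_sketch` is a hypothetical name.

```python
import math


def corrdim_sketch(cumulative, scales, eps=1e-100):
    """Finite-difference estimate of d ln C(r) / d ln r.

    cumulative: cumulative (weighted) counts C(r_k) at increasing scales r_k > 0.
    Returns (dims, mid_scales), each of length len(cumulative) - 1.
    """
    # eps guards against log(0) for empty cumulative bins.
    log_c = [math.log(c + eps) for c in cumulative]
    log_r = [math.log(r) for r in scales]
    dims, mids = [], []
    for k in range(len(cumulative) - 1):
        dims.append((log_c[k + 1] - log_c[k]) / (log_r[k + 1] - log_r[k]))
        mids.append(math.sqrt(scales[k] * scales[k + 1]))  # geometric midpoint
    return dims, mids


# If C(r) scales like r**2, the estimated dimension is 2 at every scale.
scales = [0.01, 0.1, 1.0]
cumulative = [r ** 2 for r in scales]
dims, mids = corrdim_sketch(cumulative, scales)
print(dims)  # approximately [2.0, 2.0]
```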
corrdim_bins
corrdim_bins()
The EMD bins corresponding to the correlation dimension values and errors returned by corrdims.
Returns
- numpy.ndarray
- The distance scales of the correlation dimension values, as a numpy array of length nbins-1 because a derivative was taken.
cumulative_vals_vars
cumulative_vals_vars()
Accesses the raw cumulative histogram of EMD values and their variances, excluding the overflow bins.
Returns
- (numpy.ndarray, numpy.ndarray)
- A pair of numpy arrays: the first holds the cumulative EMD histogram values and the second the variances of these bins. Each has length nbins.
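The sketch below relates the binned and cumulative quantities. It assumes cumulative_vals_vars is a running sum over the in-range bins (overflows excluded, as stated above). It also assumes the bins are independent, so their variances add. Whether any underflow contribution enters the first cumulative bin is not stated on this page, and this sketch leaves it out. The helper name is hypothetical.

```python
from itertools import accumulate


def cumulative_vals_vars_sketch(vals, variances):
    """Running sums of in-range bin values and of their variances."""
    return list(accumulate(vals)), list(accumulate(variances))


cum_vals, cum_vars = cumulative_vals_vars_sketch([7.0, 1.0, 2.0], [25.0, 1.0, 4.0])
print(cum_vals, cum_vars)  # [7.0, 8.0, 10.0] [25.0, 26.0, 30.0]
```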
C++
ExternalEMDHandler
The functionality mirrors that of the Python wrapper. We list here the declarations of the methods of this class:
template<typename Value>
wasserstein::ExternalEMDHandler();
virtual std::string description() const = 0;
std::size_t num_calls() const;
void operator()(Value emd, Value weight = 1);
void evaluate(const std::vector<Value> & emds, const std::vector<Value> & weights = {});
void evaluate(const Value * emds, std::size_t num_emds, const Value * weights = nullptr);
void evaluate_symmetric(const std::vector<Value> & emds, const std::vector<Value> & weights);
void evaluate_symmetric(const Value * emds, std::size_t nev, const Value * weights);
protected:
virtual void handle(Value emd, Value weight) = 0;
Derived classes should implement the description and handle methods.
Histogram1DHandler
The functionality mirrors that of the Python wrapper. We list here the declarations of the methods of this class:
template<class Transform, typename Value>
wasserstein::Histogram1DHandler(unsigned nbins, Value axis_min, Value axis_max);
unsigned nbins() const;
Value axis_min() const;
Value axis_max() const;
std::string description() const;
std::vector<Value> bin_centers() const;
std::vector<Value> bin_edges() const;
std::pair<std::vector<Value>, std::vector<Value>> hist_vals_vars(bool overflows = true) const;
// return textual representations of axis/hist
std::string print_axis() const;
std::string print_hist() const;
// access underlying boost histogram
auto & hist();
auto & axis();
const auto & hist() const;
const auto & axis() const;
CorrelationDimension
The functionality mirrors that of the Python wrapper. This class inherits from Histogram1DHandler<boost::histogram::axis::transform::log, Value>. We list here the declarations of the methods of this class:
template<typename Value>
wasserstein::CorrelationDimension(unsigned nbins, Value axis_min, Value axis_max);
std::pair<std::vector<Value>, std::vector<Value>>
corrdims(Value eps = std::numeric_limits<Value>::epsilon()) const;
std::vector<Value> corrdim_bins() const;
std::pair<std::vector<Value>, std::vector<Value>> cumulative_vals_vars() const;