timeeval.metrics package¶

This module contains all metrics that can be used with TimeEval. The metrics are divided into five different categories:

Classification-metrics: These metrics are defined over binary classification predictions (zeros or ones), thus they require a thresholding strategy to convert anomaly scorings to binary classification results.
AUC-metrics: All AUC-Metrics support continuous scorings, and calculate the area under a custom curve function.
- RocAUC
- PrAUC
Range-metrics: Range-metrics compute the quality scores for anomaly ranges (windows) instead of each point in the time series.
VUS-metrics: The metrics of this category share a custom definition of range-based recall and range-based precision [PaparrizosEtAl2022].
Other-metrics: Metrics that don’t belong to any of the above categories:

All metrics inherit from the abstract base class Metric and implement its score method, its supports_continuous_scorings method, and its name property; they are invoked through the __call__ method. This allows them to be used within TimeEval and on their own. You can also implement your own metrics by inheriting from timeeval.metrics.Metric (see its documentation for more information).

Examples

Using the default metric list that just contains ROC_AUC:

>>> from timeeval import TimeEval, DefaultMetrics
>>> TimeEval(dataset_mgr=..., datasets=[], algorithms=[],
>>>          metrics=DefaultMetrics.default_list())

Using a custom selection of metrics:

>>> from timeeval import TimeEval
>>> from timeeval.metrics import RangeRocAUC, RangeRocVUS, RangePrAUC, RangePrVUS
>>> TimeEval(dataset_mgr=..., datasets=[], algorithms=[],
>>>          metrics=[RangeRocAUC(buffer_size=100), RangeRocVUS(max_buffer_size=100),
>>>                  RangePrAUC(buffer_size=100), RangePrVUS(max_buffer_size=100)])

Using the metrics without TimeEval:

>>> import numpy as np
>>> from timeeval import DefaultMetrics
>>> from timeeval.metrics import F1Score, RangePrAUC
>>> from timeeval.metrics.thresholding import PercentileThresholding
>>> rng = np.random.default_rng(42)
>>> y_true = rng.random(100) > 0.5
>>> y_score = rng.random(100)
>>> metrics = [
>>>     # default metrics are already parameterized objects:
>>>     DefaultMetrics.ROC_AUC,
>>>     # all metrics (in general) are classes that need to be instantiated with their parameterization:
>>>     RangePrAUC(buffer_size=100),
>>>     # classification metrics need a thresholding strategy for continuous scorings:
>>>     F1Score(PercentileThresholding(percentile=95))
>>> ]
>>> # compute the metrics
>>> for m in metrics:
>>>     metric_value = m(y_true, y_score)
>>>     print(f"{m.name} = {metric_value}")

timeeval.metrics.Metric¶

class timeeval.metrics.Metric¶

Bases: ABC

Base class for metric implementations that score anomaly scorings against ground truth binary labels. Every subclass must implement name(), score(), and supports_continuous_scorings().

Examples

You can implement a new TimeEval metric easily by inheriting from this base class. A simple metric, for example, uses a fixed threshold to get binary labels and computes the false positive rate:

>>> import numpy as np
>>> from timeeval.metrics import Metric
>>> class FPR(Metric):
>>>     def __init__(self, threshold: float = 0.8):
>>>         self._threshold = threshold
>>>     @property
>>>     def name(self) -> str:
>>>         return f"FPR@{self._threshold}"
>>>     def score(self, y_true: np.ndarray, y_score: np.ndarray) -> float:
>>>         y_true = y_true.astype(bool)  # "~" only negates boolean arrays correctly
>>>         y_pred = y_score >= self._threshold
>>>         fp = np.sum(y_pred & ~y_true)   # false positives
>>>         tn = np.sum(~y_pred & ~y_true)  # true negatives
>>>         return fp / (fp + tn)
>>>     def supports_continuous_scorings(self) -> bool:
>>>         return True

This metric can then be used in TimeEval:

>>> from timeeval import TimeEval
>>> from timeeval.metrics import DefaultMetrics
>>> timeeval = TimeEval(dataset_mgr=..., datasets=[], algorithms=[],
>>>                     metrics=[FPR(threshold=0.8), DefaultMetrics.ROC_AUC])

abstract property name: str¶: Returns the unique name of this metric.

abstract score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

abstract supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).
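
The split between __call__() and score() can be pictured with a small stand-in class. The sketch below is not TimeEval's code: SketchMetric, its input checks, and MeanAnomalyScore are hypothetical names chosen only to show why calling the object is preferable to calling score() directly.

```python
from abc import ABC, abstractmethod

import numpy as np


class SketchMetric(ABC):
    """Hypothetical stand-in for a metric base class (not TimeEval's code)."""

    def __call__(self, y_true, y_score) -> float:
        # Shared input checks live here, so every subclass benefits from them.
        y_true = np.asarray(y_true)
        y_score = np.asarray(y_score, dtype=float)
        if y_true.shape != y_score.shape:
            raise ValueError("y_true and y_score must have the same shape")
        if not np.all(np.isfinite(y_score)):
            raise ValueError("y_score must not contain NaN or infinite values")
        return float(self.score(y_true, y_score))

    @property
    @abstractmethod
    def name(self) -> str:
        ...

    @abstractmethod
    def score(self, y_true: np.ndarray, y_score: np.ndarray) -> float:
        ...


class MeanAnomalyScore(SketchMetric):
    """Toy metric: mean score assigned to the truly anomalous points."""

    @property
    def name(self) -> str:
        return "MEAN_ANOMALY_SCORE"

    def score(self, y_true: np.ndarray, y_score: np.ndarray) -> float:
        return y_score[y_true.astype(bool)].mean()


metric = MeanAnomalyScore()
value = metric([0, 1, 1, 0], [0.1, 0.4, 0.35, 0.8])
```

Calling the object runs the shared checks first; calling score() directly would skip them.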

timeeval.metrics.RocAUC¶

class timeeval.metrics.RocAUC(plot: bool = False, plot_store: bool = False)¶

Bases: AucMetric

Computes the area under the receiver operating characteristic curve.

Parameters

plot (bool) – Set this parameter to True to plot the curve.
plot_store (bool) – If this parameter is True the curve plot will be saved in the current working directory under the name template “fig-{metric-name}.pdf”.

See also

https://en.wikipedia.org/wiki/Receiver_operating_characteristic : Explanation of the ROC-curve.
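
As an intuition for the value this metric returns, the area under the ROC curve equals the probability that a randomly chosen anomalous point receives a higher score than a randomly chosen normal point, with ties counted as one half. The function below is a minimal NumPy sketch of that pairwise view; roc_auc_pairwise is a name made up for illustration and is not part of TimeEval.

```python
import numpy as np


def roc_auc_pairwise(y_true, y_score) -> float:
    """ROC AUC as the fraction of (anomalous, normal) pairs ranked correctly."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one anomalous and one normal point")
    # Compare every anomalous score with every normal score.
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return float((wins + 0.5 * ties) / (pos.size * neg.size))


auc = roc_auc_pairwise(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
```

For the four-point input used throughout this page, two of the four pairs are ranked correctly, which matches the 0.5 shown in the examples.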

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.PrAUC¶

class timeeval.metrics.PrAUC(plot: bool = False, plot_store: bool = False)¶

Bases: AucMetric

Computes the area under the precision-recall curve.

Parameters

plot (bool) – Set this parameter to True to plot the curve.
plot_store (bool) – If this parameter is True the curve plot will be saved in the current working directory under the name template “fig-{metric-name}.pdf”.
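
The following NumPy sketch shows one way to obtain the area under a precision-recall curve: evaluate precision and recall at every distinct score threshold and integrate with the trapezoidal rule. The function name, the anchor point at recall 0 with precision 1, and the integration rule are assumptions for illustration; the page does not state how TimeEval integrates the curve.

```python
import numpy as np


def pr_auc_trapezoid(y_true, y_score) -> float:
    """Trapezoidal area under the precision-recall curve (illustrative sketch)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    if not y_true.any():
        raise ValueError("y_true contains no anomalous points")
    order = np.argsort(-y_score, kind="stable")  # highest scores first
    hits = y_true[order]
    sorted_scores = y_score[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    # Keep one curve point per distinct threshold (last index of each tie group).
    last_of_group = np.r_[np.diff(sorted_scores) != 0, True]
    tp, fp = tp[last_of_group], fp[last_of_group]
    precision = np.r_[1.0, tp / (tp + fp)]  # assumed anchor: precision 1 at recall 0
    recall = np.r_[0.0, tp / hits.sum()]
    widths = np.diff(recall)
    heights = (precision[1:] + precision[:-1]) / 2
    return float(np.sum(widths * heights))


area = pr_auc_trapezoid([0, 1, 1, 0], [0.1, 0.4, 0.35, 0.8])
```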

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.RangePrecisionRangeRecallAUC¶

class timeeval.metrics.RangePrecisionRangeRecallAUC(max_samples: int = 50, r_alpha: float = 0.5, p_alpha: float = 0, cardinality: str = 'reciprocal', bias: str = 'flat', plot: bool = False, plot_store: bool = False, name: str = 'RANGE_PR_AUC')¶

Bases: AucMetric

Computes the area under the precision-recall curve when using the range-based precision and range-based recall metric introduced by Tatbul et al. at NeurIPS 2018 [TatbulEtAl2018].

Parameters

max_samples (int) – TimeEval uses a community implementation of the range-based precision and recall metrics, which is quite slow. To prevent long runtimes for scorings with many distinct values (and therefore many candidate thresholds), only a limited number of thresholds is sampled. This parameter controls the maximum number of thresholds; too low numbers degrade the metrics’ quality.
r_alpha (float) – Weight of the existence reward for the range-based recall.
p_alpha (float) – Weight of the existence reward for the range-based precision. For most, if not all, cases, p_alpha should be set to 0.
cardinality ({'reciprocal', 'one', 'udf_gamma'}) – Cardinality type.
bias ({'flat', 'front', 'middle', 'back'}) – Positional bias type.
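plot (bool) – Set this parameter to True to plot the curve.
plot_store (bool) – If this parameter is True the curve plot will be saved in the current working directory under the name template “fig-{metric-name}.pdf”.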
name (str) – Custom name for this metric (e.g. including your parameter changes).
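
The effect of max_samples can be illustrated with a small sketch that limits the number of candidate thresholds. The helper sample_thresholds and its evenly spaced selection are assumptions made for illustration; TimeEval's actual sampling scheme may differ.

```python
import numpy as np


def sample_thresholds(y_score, max_samples: int = 50) -> np.ndarray:
    """Return at most max_samples distinct score values as candidate thresholds."""
    thresholds = np.unique(y_score)  # sorted, distinct score values
    if thresholds.size <= max_samples:
        return thresholds
    # Pick evenly spaced positions, always keeping the smallest and largest value.
    idx = np.linspace(0, thresholds.size - 1, num=max_samples).round().astype(int)
    return thresholds[np.unique(idx)]


scores = np.linspace(0.0, 1.0, 1000)
sampled = sample_thresholds(scores, max_samples=50)
few = sample_thresholds([0.1, 0.2, 0.2, 0.9], max_samples=50)
```

With fewer thresholds the curve is coarser, which is why very small values of max_samples degrade the result.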

References

[TatbulEtAl2018] Tatbul, Nesime, Tae Jun Lee, Stan Zdonik, Mejbah Alam, and Justin Gottschlich. “Precision and Recall for Time Series.” In Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 1920–30. 2018. http://papers.nips.cc/paper/7462-precision-and-recall-for-time-series.pdf.

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.AveragePrecision¶

class timeeval.metrics.AveragePrecision(**kwargs)¶

Bases: Metric

Computes the average precision metric over all possible thresholds.

This metric is an approximation of the timeeval.metrics.PrAUC-metric.

Parameters: kwargs (dict) – Keyword arguments that get passed down to sklearn.metrics.average_precision_score()

See also

sklearn.metrics.average_precision_score: Implementation of the average precision metric.
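
Average precision is commonly defined as the sum of the precision at each threshold weighted by the increase in recall since the previous threshold, AP = Σ (R_n − R_{n−1}) · P_n. The sketch below spells this out in NumPy; average_precision_sketch is an illustrative name, and the code is not the scikit-learn implementation that TimeEval delegates to.

```python
import numpy as np


def average_precision_sketch(y_true, y_score) -> float:
    """Step-wise sum of precision weighted by recall increments."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    if not y_true.any():
        raise ValueError("y_true contains no anomalous points")
    order = np.argsort(-y_score, kind="stable")  # highest scores first
    hits = y_true[order]
    sorted_scores = y_score[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    # One point per distinct threshold (last index of each group of tied scores).
    last_of_group = np.r_[np.diff(sorted_scores) != 0, True]
    tp, fp = tp[last_of_group], fp[last_of_group]
    precision = tp / (tp + fp)
    recall = tp / hits.sum()
    recall_steps = np.diff(np.r_[0.0, recall])
    return float(np.sum(recall_steps * precision))


ap = average_precision_sketch([0, 1, 1, 0], [0.1, 0.4, 0.35, 0.8])
```

Because the sum uses rectangles rather than interpolating between curve points, it approximates the area under the precision-recall curve instead of matching a trapezoidal integration exactly.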

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.Precision¶

class timeeval.metrics.Precision(thresholding_strategy: ThresholdingStrategy)¶

Bases: ClassificationMetric

Computes the precision metric.

Parameters: thresholding_strategy (ThresholdingStrategy) – Thresholding strategy used to transform the anomaly scorings to binary classification predictions.

See also

sklearn.metrics.precision_score: Implementation of the precision metric.
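
To make the role of the thresholding strategy concrete, the sketch below converts a continuous scoring into binary predictions with a percentile threshold and then computes point-wise precision. The helper names and the choice of `>=` for the comparison are assumptions for illustration and do not describe TimeEval's PercentileThresholding internals.

```python
import numpy as np


def percentile_threshold(y_score, percentile: float = 95.0) -> float:
    """Score value below which the given percentage of scores fall."""
    return float(np.percentile(np.asarray(y_score, dtype=float), percentile))


def precision_at_threshold(y_true, y_score, threshold: float) -> float:
    """Point-wise precision after thresholding (assumes score >= threshold is anomalous)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_score, dtype=float) >= threshold
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    return float(tp / (tp + fp)) if tp + fp > 0 else 0.0


y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
y_score = np.array([0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.9, 0.8, 0.7])
threshold = percentile_threshold(y_score, percentile=80)
precision = precision_at_threshold(y_true, y_score, threshold)
```

Here the 80th percentile lies between 0.7 and 0.8, so only the two highest-scoring points are predicted as anomalous, and both are true anomalies.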

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.Recall¶

class timeeval.metrics.Recall(thresholding_strategy: ThresholdingStrategy)¶

Bases: ClassificationMetric

Computes the recall metric.

Parameters: thresholding_strategy (ThresholdingStrategy) – Thresholding strategy used to transform the anomaly scorings to binary classification predictions.

See also

sklearn.metrics.recall_score: Implementation of the recall metric.

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.F1Score¶

class timeeval.metrics.F1Score(thresholding_strategy: ThresholdingStrategy)¶

Bases: ClassificationMetric

Computes the F1 metric, which is the harmonic mean of precision and recall.

Parameters: thresholding_strategy (ThresholdingStrategy) – Thresholding strategy used to transform the anomaly scorings to binary classification predictions.

See also

sklearn.metrics.f1_score: Implementation of the F1 metric.
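
The harmonic mean mentioned above can be computed directly from binary labels. This is a small illustrative sketch (f1_from_labels is a made-up name), not the scikit-learn implementation referenced in this entry.

```python
import numpy as np


def f1_from_labels(y_true, y_pred) -> float:
    """F1 = 2 * precision * recall / (precision + recall) on binary labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0 or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


f1 = f1_from_labels([0, 1, 1, 0, 1], [0, 1, 0, 1, 1])
```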

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.RangePrecision¶

class timeeval.metrics.RangePrecision(thresholding_strategy: ThresholdingStrategy = NoThresholding(), alpha: float = 0, cardinality: str = 'reciprocal', bias: str = 'flat', name: str = 'RANGE_PRECISION')¶

Bases: Metric

Computes the range-based precision metric introduced by Tatbul et al. at NeurIPS 2018 [TatbulEtAl2018].

Range precision is the average precision of each predicted anomaly range. For each predicted continuous anomaly range, the overlap size, position, and cardinality are considered.

Parameters

thresholding_strategy (ThresholdingStrategy) – Strategy used to find a threshold over continuous anomaly scores to get binary labels. Use timeeval.metrics.thresholding.NoThresholding for results that already contain binary labels.
alpha (float) – Weight of the existence reward. Because precision by definition emphasizes prediction quality, there is no need for an existence reward and this value should always be set to 0.
cardinality ({'reciprocal', 'one', 'udf_gamma'}) – Cardinality type.
bias ({'flat', 'front', 'middle', 'back'}) – Positional bias type.
name (str) – Custom name for this metric (e.g. including your parameter changes).

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.RangeRecall¶

class timeeval.metrics.RangeRecall(thresholding_strategy: ThresholdingStrategy = NoThresholding(), alpha: float = 0, cardinality: str = 'reciprocal', bias: str = 'flat', name: str = 'RANGE_RECALL')¶

Bases: Metric

Computes the range-based recall metric introduced by Tatbul et al. at NeurIPS 2018 [TatbulEtAl2018].

Range recall is the average recall of each real anomaly range. For each real anomaly range the overlap size, position, and cardinality with predicted anomaly ranges are considered. In addition, an existence reward can be given that boosts the recall even if just a single point of the real anomaly is in the predicted ranges.

Parameters

thresholding_strategy (ThresholdingStrategy) – Strategy used to find a threshold over continuous anomaly scores to get binary labels. Use timeeval.metrics.thresholding.NoThresholding for results that already contain binary labels.
alpha (float) – Weight of the existence reward. If 0: no existence reward, if 1: only existence reward. The existence reward is given if the real anomaly range has overlap with even a single point of the predicted anomaly range.
cardinality ({'reciprocal', 'one', 'udf_gamma'}) – Cardinality type.
bias ({'flat', 'front', 'middle', 'back'}) – Positional bias type.
name (str) – Custom name for this metric (e.g. including your parameter changes).
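
The interplay of alpha, the overlap, and the cardinality factor can be sketched for the simplest configuration (bias='flat', cardinality='reciprocal'), following the definition in [TatbulEtAl2018]. The code below is an illustrative re-implementation with made-up helper names, not the community implementation that TimeEval wraps, and it expects binary labels for both inputs.

```python
from typing import List, Tuple

import numpy as np


def to_ranges(labels) -> List[Tuple[int, int]]:
    """Convert binary labels to a list of [start, end) index ranges."""
    padded = np.r_[0, np.asarray(labels).astype(int), 0]
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def range_recall_sketch(y_true, y_pred, alpha: float = 0.0) -> float:
    """Range-based recall with flat positional bias and reciprocal cardinality."""
    real, predicted = to_ranges(y_true), to_ranges(y_pred)
    if not real:
        return 0.0
    total = 0.0
    for r_start, r_end in real:
        # Sizes of the overlaps between this real range and each predicted range.
        overlaps = [
            min(r_end, p_end) - max(r_start, p_start)
            for p_start, p_end in predicted
            if min(r_end, p_end) > max(r_start, p_start)
        ]
        existence = 1.0 if overlaps else 0.0
        # Reciprocal cardinality: penalize fragmentation into several predictions.
        cardinality = 1.0 / len(overlaps) if len(overlaps) > 1 else 1.0
        overlap_reward = cardinality * sum(overlaps) / (r_end - r_start)
        total += alpha * existence + (1.0 - alpha) * overlap_reward
    return total / len(real)


y_true = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
recall_overlap_only = range_recall_sketch(y_true, y_pred, alpha=0.0)
recall_mixed = range_recall_sketch(y_true, y_pred, alpha=0.5)
recall_existence_only = range_recall_sketch(y_true, y_pred, alpha=1.0)
```

In this example the first real anomaly is half covered and the second is missed, so raising alpha lifts the score of the first anomaly only.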

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.RangeFScore¶

class timeeval.metrics.RangeFScore(thresholding_strategy: ThresholdingStrategy = NoThresholding(), beta: float = 1, p_alpha: float = 0, r_alpha: float = 0.5, cardinality: str = 'reciprocal', p_bias: str = 'flat', r_bias: str = 'flat', name: Optional[str] = None)¶

Bases: Metric

Computes the range-based F-score using the recall and precision metrics by Tatbul et al. at NeurIPS 2018 [TatbulEtAl2018].

The F-beta score is the weighted harmonic mean of precision and recall, reaching its optimal value at 1 and its worst value at 0. This implementation uses the range-based precision and range-based recall as basis.

Parameters

thresholding_strategy (ThresholdingStrategy) – Strategy used to find a threshold over continuous anomaly scores to get binary labels. Use timeeval.metrics.thresholding.NoThresholding for results that already contain binary labels.
beta (float) – F-score beta determines the weight of recall in the combined score. beta < 1 lends more weight to precision, while beta > 1 favors recall.
p_alpha (float) – Weight of the existence reward for the range-based precision. For most, if not all, cases, p_alpha should be set to 0.
r_alpha (float) – Weight of the existence reward. If 0: no existence reward, if 1: only existence reward.
cardinality ({'reciprocal', 'one', 'udf_gamma'}) – Cardinality type.
p_bias ({'flat', 'front', 'middle', 'back'}) – Positional bias type.
r_bias ({'flat', 'front', 'middle', 'back'}) – Positional bias type.
name (str) – Custom name for this metric (e.g. including your parameter changes). If None, will include the beta-value in the name: “RANGE_F{beta}_SCORE”.
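
The influence of beta follows from the standard F-beta formula, F_beta = (1 + beta²) · P · R / (beta² · P + R). The sketch below applies it to plain precision and recall numbers; f_beta is an illustrative helper and the input values are made up.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical range-based precision and recall values.
precision, recall = 0.5, 1.0
f_half = f_beta(precision, recall, beta=0.5)  # leans towards precision
f_one = f_beta(precision, recall, beta=1.0)   # balanced
f_two = f_beta(precision, recall, beta=2.0)   # leans towards recall
```

Since recall is the larger of the two values here, the score grows as beta increases.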

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.FScoreAtK¶

class timeeval.metrics.FScoreAtK(k: Optional[int] = None)¶

Bases: Metric

Computes the F-score at k based on anomaly ranges.

This metric only considers the top-k predicted anomaly ranges within the scoring by finding a threshold on the scoring that produces at least k anomaly ranges. If k is not specified, the number of anomalies within the ground truth is used as k.

Parameters: k (int (optional)) – Number of top anomalies used to calculate the F-score. If k is not specified (None) the number of true anomalies (based on the ground truth values) is used.

See also

timeeval.metrics.thresholding.TopKRangesThresholding: Thresholding approach used.
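
Finding a threshold that produces at least k anomaly ranges can be sketched as a search over the distinct score values from highest to lowest. The helpers below are illustrative assumptions (including the `>=` comparison) and do not reproduce TopKRangesThresholding.

```python
import numpy as np


def count_ranges(y_pred) -> int:
    """Number of contiguous runs of positive predictions."""
    padded = np.r_[0, np.asarray(y_pred).astype(int)]
    return int(np.sum(np.diff(padded) == 1))


def top_k_ranges_threshold(y_score, k: int) -> float:
    """Highest threshold whose predictions contain at least k anomaly ranges."""
    y_score = np.asarray(y_score, dtype=float)
    for threshold in np.unique(y_score)[::-1]:  # highest score first
        if count_ranges(y_score >= threshold) >= k:
            return float(threshold)
    # Fewer than k ranges are reachable: fall back to the lowest score.
    return float(y_score.min())


y_score = np.array([0.1, 0.9, 0.8, 0.1, 0.2, 0.7, 0.1, 0.3, 0.1])
threshold_k2 = top_k_ranges_threshold(y_score, k=2)
threshold_k3 = top_k_ranges_threshold(y_score, k=3)
```

Lowering the threshold can merge neighbouring ranges, so the number of ranges is not monotonic in the threshold; the sketch therefore stops at the first threshold that reaches k.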

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.PrecisionAtK¶

class timeeval.metrics.PrecisionAtK(k: Optional[int] = None)¶

Bases: Metric

Computes the Precision at k based on anomaly ranges.

This metric only considers the top-k predicted anomaly ranges within the scoring by finding a threshold on the scoring that produces at least k anomaly ranges. If k is not specified, the number of anomalies within the ground truth is used as k.

Parameters: k (int (optional)) – Number of top anomalies used to calculate precision. If k is not specified (None) the number of true anomalies (based on the ground truth values) is used.

See also

timeeval.metrics.thresholding.TopKRangesThresholding: Thresholding approach used.

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.RangePrAUC¶

class timeeval.metrics.RangePrAUC(buffer_size: Optional[int] = None, compatibility_mode: bool = False, max_samples: int = 250, plot: bool = False, plot_store: bool = False)¶

Bases: RangeAucMetric

Computes the area under the precision-recall-curve using the range-based precision and range-based recall definition from Paparrizos et al. published at VLDB 2022 [PaparrizosEtAl2022].

We first extend the anomaly labels by two slopes of buffer_size//2 length on both sides of each anomaly, uniformly sample thresholds from the anomaly score, and then compute the confusion matrix for all thresholds. Using the resulting precision and recall values, we can plot a curve and compute its area.

We make some changes to the original implementation from [PaparrizosEtAl2022] because we do not agree with the original assumptions. To reproduce the original results, you can set the parameter compatibility_mode=True. This will compute exactly the same values as the code by the authors of the paper.

The following things are different in TimeEval compared to the original version:

For the recall (TPR) existence reward, we count anomalies as separate events, even if the added slopes overlap.
Overlapping slopes don’t sum up in their anomaly weight; instead, we take the maximum anomaly weight for each point in the ground truth.
The original slopes are asymmetric: The slopes at the end of anomalies are a single point shorter than the ones at the beginning of anomalies. We use symmetric slopes of the same size for the beginning and end of anomalies.
We use a linear approximation of the slopes instead of the convex slope shape presented in the paper.

Parameters

buffer_size (Optional[int]) – Size of the buffer region around an anomaly. We add an increasing slope of size buffer_size//2 to the beginning of anomalies and a decreasing slope of size buffer_size//2 to the end of anomalies. By default (when buffer_size is None), buffer_size is the median length of the anomalies within the time series. However, you can also set it to the period size of the dominant frequency or any other desired value.
compatibility_mode (bool) – When set to True, produces exactly the same output as the metric implementation by the original authors. Otherwise, TimeEval uses a slightly improved implementation that fixes some bugs and uses linear slopes.
max_samples (int) – Calculating precision and recall for many thresholds is quite slow. We, therefore, uniformly sample thresholds from the available score space. This parameter controls the maximum number of thresholds; too low numbers degrade the metrics’ quality.
plot (bool) –
plot_store (bool) –
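
The label extension described above can be sketched as follows: each anomaly gets a rising linear slope of buffer_size//2 points before it and a falling one after it, and overlapping slopes are combined by taking the maximum. The function name and the exact slope values (evenly spaced strictly between 0 and 1) are assumptions for illustration; TimeEval's implementation may use different values.

```python
import numpy as np


def extend_labels_with_slopes(y_true, buffer_size: int) -> np.ndarray:
    """Return soft labels in [0, 1] with symmetric linear slopes around anomalies."""
    weights = np.asarray(y_true).astype(float)
    half = buffer_size // 2
    if half == 0:
        return weights
    n = weights.size
    edges = np.diff(np.r_[0, (weights > 0).astype(int), 0])
    starts = np.flatnonzero(edges == 1)   # first index of each anomaly
    stops = np.flatnonzero(edges == -1)   # first index after each anomaly
    slope = np.linspace(0.0, 1.0, half + 2)[1:-1]  # rising values, length `half`
    for start, stop in zip(starts, stops):
        # Rising slope before the anomaly, clipped at the series start.
        lo = max(0, start - half)
        rising = slope[half - (start - lo):]
        weights[lo:start] = np.maximum(weights[lo:start], rising)
        # Falling slope after the anomaly, clipped at the series end.
        hi = min(n, stop + half)
        falling = slope[::-1][: hi - stop]
        weights[stop:hi] = np.maximum(weights[stop:hi], falling)
    return weights


single = extend_labels_with_slopes([0, 0, 0, 1, 1, 0, 0, 0], buffer_size=4)
overlapping = extend_labels_with_slopes([1, 0, 0, 1], buffer_size=4)
```

In the second call the slopes of the two anomalies overlap, and every point keeps the larger of the two weights instead of their sum.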

anomaly_bounds(y_true: ndarray) → Tuple[ndarray, ndarray]¶: corresponds to range_convers_new
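
Since anomaly_bounds returns two arrays, a plausible reading is that they hold the begin and end indices of the anomalies. The sketch below works under that assumption (with exclusive end indices, which is also an assumption) and shows how the median anomaly length, the default buffer_size described above, could be derived from them.

```python
import numpy as np


def anomaly_bounds_sketch(y_true):
    """Begin (inclusive) and end (exclusive) indices of each anomaly."""
    labels = np.asarray(y_true).astype(int)
    edges = np.diff(np.r_[0, labels, 0])
    begins = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return begins, ends


begins, ends = anomaly_bounds_sketch([0, 1, 1, 1, 0, 0, 1, 0, 1, 1])
lengths = ends - begins
median_length = int(np.median(lengths))  # candidate default for buffer_size
```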

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.RangeRocAUC¶

class timeeval.metrics.RangeRocAUC(buffer_size: Optional[int] = None, compatibility_mode: bool = False, max_samples: int = 250, plot: bool = False, plot_store: bool = False)¶

Bases: RangeAucMetric

Computes the area under the receiver-operating-characteristic-curve using the range-based TPR and range-based FPR definition from Paparrizos et al. published at VLDB 2022 [PaparrizosEtAl2022].

We first extend the anomaly labels by two slopes of buffer_size//2 length on both sides of each anomaly, uniformly sample thresholds from the anomaly score, and then compute the confusion matrix for all thresholds. Using the resulting true positive rates (TPR) and false positive rates (FPR), we can plot a curve and compute its area.

We make some changes to the original implementation from [PaparrizosEtAl2022] because we do not agree with the original assumptions. To reproduce the original results, you can set the parameter compatibility_mode=True. This will compute exactly the same values as the code by the authors of the paper.

The following things are different in TimeEval compared to the original version:

For the recall (TPR) existence reward, we count anomalies as separate events, even if the added slopes overlap.
Overlapping slopes don’t sum up in their anomaly weight; instead, we take the maximum anomaly weight for each point in the ground truth.
The original slopes are asymmetric: The slopes at the end of anomalies are a single point shorter than the ones at the beginning of anomalies. We use symmetric slopes of the same size for the beginning and end of anomalies.
We use a linear approximation of the slopes instead of the convex slope shape presented in the paper.

Parameters

buffer_size (Optional[int]) – Size of the buffer region around an anomaly. We add an increasing slope of size buffer_size//2 to the beginning of anomalies and a decreasing slope of size buffer_size//2 to the end of anomalies. By default (when buffer_size is None), buffer_size is the median length of the anomalies within the time series. However, you can also set it to the period size of the dominant frequency or any other desired value.
compatibility_mode (bool) – When set to True, produces exactly the same output as the metric implementation by the original authors. Otherwise, TimeEval uses a slightly improved implementation that fixes some bugs and uses linear slopes.
max_samples (int) – Calculating precision and recall for many thresholds is quite slow. We, therefore, uniformly sample thresholds from the available score space. This parameter controls the maximum number of thresholds; too low numbers degrade the metrics’ quality.
plot (bool) –
plot_store (bool) –

See also

https://en.wikipedia.org/wiki/Receiver_operating_characteristic : Explanation of the ROC-curve.

anomaly_bounds(y_true: ndarray) → Tuple[ndarray, ndarray]¶: corresponds to range_convers_new

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.RangePrVUS¶

class timeeval.metrics.RangePrVUS(max_buffer_size: int = 500, compatibility_mode: bool = False, max_samples: int = 250)¶

Bases: RangeAucMetric

Computes the volume under the precision-recall-buffer_size-surface using the range-based precision and range-based recall definition from Paparrizos et al. published at VLDB 2022 [PaparrizosEtAl2022].

For all buffer sizes from 0 to max_buffer_size, we first extend the anomaly labels by two slopes of buffer_size//2 length on both sides of each anomaly, uniformly sample thresholds from the anomaly score, and then compute the confusion matrix for all thresholds. The resulting precision and recall values for each buffer size span a surface, and the metric is the volume under it.

This metric includes similar changes as RangePrAUC, which can be disabled using the compatibility_mode parameter.

Parameters

max_buffer_size (int) – Maximum size of the buffer region around an anomaly. We iterate over all buffer sizes from 0 to max_buffer_size to create the surface.
compatibility_mode (bool) – When set to True, produces exactly the same output as the metric implementation by the original authors. Otherwise, TimeEval uses a slightly improved implementation that fixes some bugs and uses linear slopes.
max_samples (int) – Calculating precision and recall for many thresholds is quite slow. We, therefore, uniformly sample thresholds from the available score space. This parameter controls the maximum number of thresholds; too low numbers degrade the metrics’ quality.

See also

timeeval.metrics.RangePrAUC: Area under the curve version using a single buffer size.
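
The relation between the volume and the single-buffer AUC can be sketched by integrating the AUC values along the buffer-size axis. In the code below, auc_for_buffer is a hypothetical placeholder for any function that returns a range-based AUC for one buffer size, and the trapezoidal integration with normalization by max_buffer_size is an assumption; the page does not state how TimeEval normalizes the volume.

```python
from typing import Callable

import numpy as np


def volume_under_surface(
    auc_for_buffer: Callable[[int], float],
    max_buffer_size: int,
) -> float:
    """Integrate AUC(buffer_size) over 0..max_buffer_size, normalized to [0, 1]."""
    buffer_sizes = np.arange(0, max_buffer_size + 1)
    aucs = np.array([auc_for_buffer(int(b)) for b in buffer_sizes], dtype=float)
    if buffer_sizes.size == 1:
        return float(aucs[0])
    # Trapezoidal rule along the buffer-size axis.
    volume = np.sum((aucs[1:] + aucs[:-1]) / 2 * np.diff(buffer_sizes))
    return float(volume / max_buffer_size)


def toy_auc(buffer_size: int) -> float:
    """Made-up AUC that grows linearly with the buffer size (for demonstration)."""
    return 0.5 + 0.001 * buffer_size


vus = volume_under_surface(toy_auc, max_buffer_size=100)
```

Under these assumptions the volume behaves like an average of the AUC values over all buffer sizes, which removes the need to pick a single buffer size.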

anomaly_bounds(y_true: ndarray) → Tuple[ndarray, ndarray]¶: corresponds to range_convers_new

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.RangeRocVUS¶

class timeeval.metrics.RangeRocVUS(max_buffer_size: int = 500, compatibility_mode: bool = False, max_samples: int = 250)¶

Bases: RangeAucMetric

Computes the volume under the receiver-operating-characteristic-buffer_size-surface using the range-based TPR and range-based FPR definition from Paparrizos et al. published at VLDB 2022 [PaparrizosEtAl2022].

For all buffer sizes from 0 to max_buffer_size, we first extend the anomaly labels by two slopes of buffer_size//2 length on both sides of each anomaly, uniformly sample thresholds from the anomaly score, and then compute the confusion matrix for all thresholds. The resulting true positive rates (TPR) and false positive rates (FPR) for each buffer size span a surface, and the metric is the volume under it.

This metric includes similar changes as RangeRocAUC, which can be disabled using the compatibility_mode parameter.

Parameters

max_buffer_size (int) – Maximum size of the buffer region around an anomaly. We iterate over all buffer sizes from 0 to max_buffer_size to create the surface.
compatibility_mode (bool) – When set to True, produces exactly the same output as the metric implementation by the original authors. Otherwise, TimeEval uses a slightly improved implementation that fixes some bugs and uses linear slopes.
max_samples (int) – Calculating precision and recall for many thresholds is quite slow. We, therefore, uniformly sample thresholds from the available score space. This parameter controls the maximum number of thresholds; too low numbers degrade the metrics’ quality.

See also

https://en.wikipedia.org/wiki/Receiver_operating_characteristic : Explanation of the ROC-curve.
timeeval.metrics.RangeRocAUC: Area under the curve version using a single buffer size.

References

[PaparrizosEtAl2022] John Paparrizos, Paul Boniol, Themis Palpanas, Ruey S. Tsay, Aaron Elmore, and Michael J. Franklin. Volume Under the Surface: A New Accuracy Evaluation Measure for Time-Series Anomaly Detection. PVLDB, 15(11): 2774 - 2787, 2022. doi:10.14778/3551793.3551830

anomaly_bounds(y_true: ndarray) → Tuple[ndarray, ndarray]¶: corresponds to range_convers_new

property name: str¶: Returns the unique name of this metric.

score(y_true: ndarray, y_score: ndarray) → float¶

Implementation of the metric’s scoring function.

Please use __call__() instead of calling this function directly!

Examples

Instantiate a metric and call it using the __call__ method:

>>> import numpy as np
>>> from timeeval.metrics import RocAUC
>>> metric = RocAUC(plot=False)
>>> metric(np.array([0, 1, 1, 0]), np.array([0.1, 0.4, 0.35, 0.8]))
0.5

supports_continuous_scorings() → bool¶: Whether this metric accepts continuous anomaly scorings as input (True) or binary classification labels (False).

timeeval.metrics.DefaultMetrics¶

class timeeval.metrics.DefaultMetrics¶

Default metrics of TimeEval that can be used directly for time series anomaly detection algorithms without further configuration.

Examples

Using the default metric list that just contains ROC_AUC:

>>> from timeeval import TimeEval, DefaultMetrics
>>> TimeEval(dataset_mgr=..., datasets=[], algorithms=[],
>>>          metrics=DefaultMetrics.default_list())

You can also specify multiple default metrics:

>>> from timeeval import TimeEval, DefaultMetrics
>>> TimeEval(dataset_mgr=..., datasets=[], algorithms=[],
>>>          metrics=[DefaultMetrics.ROC_AUC, DefaultMetrics.PR_AUC, DefaultMetrics.FIXED_RANGE_PR_AUC])

AVERAGE_PRECISION = <timeeval.metrics.other_metrics.AveragePrecision object>¶

FIXED_RANGE_PR_AUC = <timeeval.metrics.range_metrics.RangePrecisionRangeRecallAUC object>¶

PR_AUC = <timeeval.metrics.auc_metrics.PrAUC object>¶

RANGE_F1 = <timeeval.metrics.range_metrics.RangeFScore object>¶

RANGE_PRECISION = <timeeval.metrics.range_metrics.RangePrecision object>¶

RANGE_PR_AUC = <timeeval.metrics.range_metrics.RangePrecisionRangeRecallAUC object>¶

RANGE_RECALL = <timeeval.metrics.range_metrics.RangeRecall object>¶

ROC_AUC = <timeeval.metrics.auc_metrics.RocAUC object>¶

static default() → Metric¶: TimeEval’s default metric ROC_AUC.

static default_list() → List[Metric]¶: The list containing TimeEval’s single default metric ROC_AUC. It is provided for convenience and serves as the default parameter value in many TimeEval library functions.